Java Spark Operator: distinct

Isoke · Updated 2024-11-13

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import java.util.Arrays;
import java.util.List;

/**
 * distinct() operator
 * Removes duplicate elements from an RDD.
 */
public class DistinctDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("spark");
        JavaSparkContext sc = new JavaSparkContext(conf);

        List<String> list = Arrays.asList("a", "b", "c", "a", "b", "c", "d");
        JavaRDD<String> javaRDD = sc.parallelize(list);

        // distinct operator: remove duplicates
        JavaRDD<String> distinctRDD = javaRDD.distinct();
        System.out.println(distinctRDD.collect()); // e.g. [d, a, b, c], order not guaranteed

        sc.close();
    }
}
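Because distinct() deduplicates by shuffling data across partitions, JavaRDD also provides a distinct(int numPartitions) overload that controls how many partitions the resulting RDD has. Below is a minimal sketch of that variant; the class name DistinctPartitionsDemo, the partition counts, and the sample data are illustrative, not from the original article.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import java.util.Arrays;
import java.util.List;

public class DistinctPartitionsDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("spark");
        JavaSparkContext sc = new JavaSparkContext(conf);

        List<String> list = Arrays.asList("a", "b", "c", "a", "b", "c", "d");
        // Spread the input over 4 partitions before deduplicating (illustrative choice).
        JavaRDD<String> javaRDD = sc.parallelize(list, 4);

        // distinct(numPartitions): deduplicate and place the result in 2 partitions.
        JavaRDD<String> distinctRDD = javaRDD.distinct(2);
        System.out.println(distinctRDD.getNumPartitions()); // 2
        System.out.println(distinctRDD.collect());          // e.g. [d, a, b, c], order not guaranteed

        sc.close();
    }
}

Note that the order of elements after distinct() is not deterministic, since the shuffle redistributes records across partitions.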
Author: 默默倾听全世界



Tags: Java, Spark, distinct
