2025-02-25 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how to write a Spark UDAF (user-defined aggregate function) to compute the median; interested readers may wish to take a look. The method introduced here is simple, fast, and practical. Let's walk through how to write a Spark UDAF that finds the median.
package com.frank.sparktest.java;

import org.apache.spark.sql.Row;
import org.apache.spark.sql.expressions.MutableAggregationBuffer;
import org.apache.spark.sql.expressions.UserDefinedAggregateFunction;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MedianUdaf extends UserDefinedAggregateFunction {

    private final StructType inputSchema;
    private final StructType bufferSchema;

    public MedianUdaf() {
        List<StructField> inputFields = new ArrayList<>();
        inputFields.add(DataTypes.createStructField("nums", DataTypes.IntegerType, true));
        inputSchema = DataTypes.createStructType(inputFields);

        List<StructField> bufferFields = new ArrayList<>();
        bufferFields.add(DataTypes.createStructField("datas", DataTypes.StringType, true));
        bufferSchema = DataTypes.createStructType(bufferFields);
    }

    @Override
    public StructType inputSchema() { return inputSchema; }

    @Override
    public StructType bufferSchema() { return bufferSchema; }

    @Override
    public DataType dataType() { return DataTypes.DoubleType; }

    @Override
    public boolean deterministic() { return true; }

    @Override
    public void initialize(MutableAggregationBuffer buffer) {
        // Start from an empty string; input values are appended comma-separated.
        buffer.update(0, "");
    }

    @Override
    public void update(MutableAggregationBuffer buffer, Row input) {
        if (!input.isNullAt(0)) {
            buffer.update(0, buffer.getString(0) + "," + input.getInt(0));
        }
    }

    @Override
    public void merge(MutableAggregationBuffer buffer1, Row buffer2) {
        // Both buffers hold comma-separated strings; concatenate them.
        buffer1.update(0, buffer1.getString(0) + buffer2.getString(0));
    }

    @Override
    public Object evaluate(Row buffer) {
        List<Integer> list = new ArrayList<>();
        for (String s : buffer.getString(0).split(",")) {
            if (!s.isEmpty()) {
                list.add(Integer.valueOf(s));
            }
        }
        if (list.isEmpty()) {
            return null;
        }
        Collections.sort(list);
        int size = list.size();
        if (size % 2 == 1) {
            // Odd count: the middle element (0-based index size / 2).
            return (double) list.get(size / 2);
        }
        // Even count: the average of the two middle elements.
        return (list.get(size / 2 - 1) + list.get(size / 2)) / 2.0;
    }
}
The above is a code snippet that can be used directly.
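The core of evaluate() — parsing the comma-separated buffer string, sorting, and picking the middle value(s) — can be exercised without Spark at all. Below is a minimal plain-Java sketch of that step; the class and method names here are illustrative, not part of the UDAF above:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MedianSketch {
    // Mirrors evaluate(): parse the comma-separated buffer string,
    // sort the values, and return the median as a Double.
    public static Double medianOf(String buffer) {
        List<Integer> list = new ArrayList<>();
        for (String s : buffer.split(",")) {
            if (!s.isEmpty()) {            // skip the empty token before the first comma
                list.add(Integer.valueOf(s));
            }
        }
        if (list.isEmpty()) {
            return null;                   // no non-null inputs seen
        }
        Collections.sort(list);
        int size = list.size();
        if (size % 2 == 1) {
            return (double) list.get(size / 2);                        // middle element
        }
        return (list.get(size / 2 - 1) + list.get(size / 2)) / 2.0;    // average of the two middles
    }

    public static void main(String[] args) {
        System.out.println(medianOf(",3,1,2"));     // odd count  -> 2.0
        System.out.println(medianOf(",4,1,3,2"));   // even count -> 2.5
    }
}
```

Note that the buffer string starts with a leading comma (update() always prepends one), so the empty first token must be skipped before parsing.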
Here is the test program:
package com.frank.sparktest.java;

import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF2;
import org.apache.spark.sql.types.DataTypes;

import java.util.stream.IntStream;

public class DemoUDAF {
    public static void main(String[] args) {
        SQLContext sqlContext = SparkSession.builder()
                .master("local")
                .getOrCreate()
                .sqlContext();

        // Helper UDF that generates the integer array [start, end].
        sqlContext.udf().register("generate",
                (UDF2<Integer, Integer, Integer[]>) (start, end) ->
                        IntStream.range(start, end + 1).boxed().toArray(Integer[]::new),
                DataTypes.createArrayType(DataTypes.IntegerType));

        // Register the median UDAF under the name "median".
        sqlContext.udf().register("median", new MedianUdaf());

        sqlContext.sql("select generate(1, 10)").show();
    }
}

At this point, I believe you have a deeper understanding of how to write a Spark UDAF to find the median. You might as well try it out in practice, and follow us to continue learning.
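To see how the whole aggregation lifecycle fits together — initialize() per partition, update() per row, merge() across partitions, then evaluate() — the string-buffer scheme can be simulated in plain Java without a cluster. This is an illustrative sketch of the data flow, not Spark API code:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class UdafLifecycleDemo {
    // Simulates the UDAF's buffer handling for two input partitions
    // and returns the final median, as evaluate() would.
    public static double runPipeline(int[] part1, int[] part2) {
        // initialize(): each partition starts from an empty string buffer.
        String buf1 = "";
        String buf2 = "";

        // update(): each row's value is appended, comma-separated.
        for (int v : part1) buf1 += "," + v;
        for (int v : part2) buf2 += "," + v;

        // merge(): the partial buffers are concatenated into one.
        String merged = buf1 + buf2;

        // evaluate(): parse, sort, and take the middle value(s).
        List<Integer> list = new ArrayList<>();
        for (String s : merged.split(",")) {
            if (!s.isEmpty()) list.add(Integer.valueOf(s));
        }
        Collections.sort(list);
        int size = list.size();
        return (size % 2 == 1)
                ? list.get(size / 2)
                : (list.get(size / 2 - 1) + list.get(size / 2)) / 2.0;
    }

    public static void main(String[] args) {
        // 1..5 split across two partitions; the median is 3.0.
        System.out.println(runPipeline(new int[]{5, 1}, new int[]{3, 2, 4}));
    }
}
```

The string buffer keeps the example simple, but it collects every value onto the driver at evaluate() time; for large columns, an approximate approach such as Spark's built-in percentile_approx is usually more practical.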
© 2024 shulou.com SLNews company. All rights reserved.