

How to Integrate MongoDB with Spark




Introduction to Spark

Officially, Spark is a general-purpose, fast engine for large-scale data processing.

Generality: Spark SQL covers general-purpose analytics, Spark Streaming handles streaming data, and MLlib supports machine learning. Support for Java, Python, Scala, and R is another aspect of this versatility.
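As a rough sketch of that unified surface (illustrative, not from the original article; it assumes only Spark SQL on the classpath), the same SparkSession entry point drives both the DataFrame API and SQL queries:

import org.apache.spark.sql.SparkSession

object UnifiedApiDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("UnifiedApiDemo")
      .getOrCreate()
    import spark.implicits._

    // The same SparkSession drives DataFrame construction and SQL queries
    val df = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "label")
    df.createOrReplaceTempView("demo")
    spark.sql("SELECT id, label FROM demo WHERE id > 1").show()

    spark.stop()
  }
}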

Fast: This was probably one of the original reasons for Spark's success, and it is largely due to Spark's memory-based computation model. When data must be iterated over repeatedly, Spark can keep it in memory instead of writing intermediate results back to disk the way MapReduce does. According to official figures, it can be up to 100 times faster than traditional MapReduce.
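A minimal sketch of what this looks like in practice (illustrative, not part of the original article): marking an RDD with cache() keeps it in memory after the first action, so later passes over the same data skip recomputation:

import org.apache.spark.sql.SparkSession

object CacheDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("CacheDemo")
      .getOrCreate()

    // cache() asks Spark to keep the RDD in memory after the first action
    val data = spark.sparkContext.parallelize(1 to 1000000).cache()

    // The first action materializes the RDD; the second reads it from memory
    println(data.sum())
    println(data.max())

    spark.stop()
  }
}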

Large-scale: Spark supports HDFS natively, and its compute nodes scale elastically, so it can harness large pools of cheap compute resources in parallel to process data at scale.
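For instance, reading from HDFS takes only a URI, and the resulting RDD is partitioned across the available executors. A sketch, where the namenode address and file path are placeholders:

import org.apache.spark.sql.SparkSession

object HdfsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HdfsDemo")
      .getOrCreate()

    // The namenode address and file path below are placeholders
    val lines = spark.sparkContext.textFile("hdfs://namenode:8020/data/input.txt")

    // The file is split into partitions and processed in parallel by the executors
    println(s"lines: ${lines.count()}, partitions: ${lines.getNumPartitions}")

    spark.stop()
  }
}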

Environment preparation

Download MongoDB

Unpack and install it

Start the MongoDB service:

$MONGODB_HOME/bin/mongod --fork --dbpath=/root/data/mongodb/ --logpath=/root/data/log/mongodb/mongodb.log

POM dependency

<dependency>
    <groupId>org.mongodb.spark</groupId>
    <artifactId>mongo-spark-connector_2.11</artifactId>
    <version>${spark.version}</version>
</dependency>
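If you build with sbt rather than Maven, the equivalent declaration is a single line in build.sbt. A sketch; the version shown is illustrative, so pick the connector release that matches your Spark and Scala versions:

// build.sbt -- "2.4.1" is an example version, not taken from the original article
libraryDependencies += "org.mongodb.spark" %% "mongo-spark-connector" % "2.4.1"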

Example code

import com.mongodb.spark.MongoSpark
import org.apache.spark.sql.SparkSession
import org.bson.Document

object ConnAppTest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("ConnAppTest")
      // Specify the MongoDB input collection
      .config("spark.mongodb.input.uri", "mongodb://192.168.31.136/testDB.testCollection")
      // Specify the MongoDB output collection
      .config("spark.mongodb.output.uri", "mongodb://192.168.31.136/testDB.testCollection")
      .getOrCreate()

    // Generate test data
    val documents = spark.sparkContext.parallelize((1 to 10).map(i => Document.parse(s"{test: $i}")))

    // Save the data to MongoDB
    MongoSpark.save(documents)

    // Load the data back from MongoDB as a DataFrame
    val df = MongoSpark.load(spark)

    // Print the result
    df.show()
  }
}
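Since MongoSpark.load(spark) returns a DataFrame, the loaded data can be queried with the usual DataFrame or SQL API. A short sketch continuing the example above (the view name is arbitrary):

val df = MongoSpark.load(spark) // reuses the SparkSession from the example above
df.createOrReplaceTempView("testCollection")
// Query the MongoDB-backed data with plain Spark SQL
spark.sql("SELECT test FROM testCollection WHERE test > 5").show()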

Summary

The above is how to integrate MongoDB with Spark. I hope it helps you. If you have any questions, please leave a message and I will reply as soon as possible. Thank you very much for your support!



