2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
1. Brief introduction
This article shows how to use Java to generate data in Avro format and how to convert Avro data files into a Dataset and a DataFrame with Spark.
1.1 What is Apache Avro?
Apache Avro is a data serialization system. Avro provides APIs for Java, Python, C, C++, C#, and other languages. This article uses Java to illustrate Avro serialization and deserialization.
Avro provides rich data structures; a fast, compressible binary data format; a container file for storing persistent data; remote procedure call (RPC); and simple integration with dynamic languages.
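As a minimal sketch of serialization and deserialization, the following class encodes one record to Avro's binary format in memory and decodes it again. The class name AvroRoundTrip and the two-field City schema are hypothetical and exist only for this illustration; the code assumes the org.apache.avro:avro 1.8.1 library is on the classpath.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class AvroRoundTrip {
    // Hypothetical two-field schema, used only for this illustration.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"City\",\"namespace\":\"example\","
      + "\"fields\":[{\"name\":\"id\",\"type\":\"long\"},"
      + "{\"name\":\"name\",\"type\":\"string\"}]}");

    // Encode one record to Avro binary (no file header, no embedded schema).
    static byte[] serialize(GenericRecord record) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(SCHEMA).write(record, encoder);
        encoder.flush();
        return out.toByteArray();
    }

    // Decode the bytes back into a record; the reader needs the same schema.
    static GenericRecord deserialize(byte[] bytes) throws IOException {
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
        return new GenericDatumReader<GenericRecord>(SCHEMA).read(null, decoder);
    }

    public static void main(String[] args) throws IOException {
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("id", 1L);
        record.put("name", "Fairview");
        byte[] bytes = serialize(record);
        GenericRecord copy = deserialize(bytes);
        System.out.println(bytes.length + " bytes -> " + copy);
    }
}
```

Note that decoded string fields come back as Avro Utf8 objects, so call toString() before comparing them with a java.lang.String.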
2. Avro data generation
2.1 Define the schema file
1. Download avro-tools-1.8.1.jar
Avro official website: http://avro.apache.org/. Avro version: 1.8.1. Download the Avro jar package avro-tools-1.8.1.jar; it is mainly used to generate the corresponding Java file from the defined schema file.
2. Define a schema file named CustomerAddress.avsc
{
  "namespace": "com.peach.arvo",
  "type": "record",
  "name": "CustomerAddress",
  "fields": [
    {"name": "ca_address_sk", "type": "long"},
    {"name": "ca_address_id", "type": "string"},
    {"name": "ca_street_number", "type": "string"},
    {"name": "ca_street_name", "type": "string"},
    {"name": "ca_street_type", "type": "string"},
    {"name": "ca_suite_number", "type": "string"},
    {"name": "ca_city", "type": "string"},
    {"name": "ca_county", "type": "string"},
    {"name": "ca_state", "type": "string"},
    {"name": "ca_zip", "type": "string"},
    {"name": "ca_country", "type": "string"},
    {"name": "ca_gmt_offset", "type": "double"},
    {"name": "ca_location_type", "type": "string"}
  ]
}
Schema description:
namespace: the package path of the generated Java class
type: one of Avro's complex types (record, enum, array, map, union, and fixed)
name: the class name of the generated Java file
fields: the fields of the schema and their types
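To make these four attributes concrete, the sketch below parses a schema and prints its namespace, name, type, and fields. The class name DescribeSchema and its describe helper are hypothetical; the code assumes the Avro 1.8.1 library is on the classpath and, when run without arguments, that CustomerAddress.avsc is in the working directory.

```java
import java.io.File;
import java.io.IOException;
import java.util.stream.Collectors;

import org.apache.avro.Schema;

public class DescribeSchema {
    // Summarize a record schema as "namespace.name (type): field:type, ...".
    static String describe(Schema schema) {
        String fields = schema.getFields().stream()
            .map(f -> f.name() + ":" + f.schema().getType().getName())
            .collect(Collectors.joining(", "));
        return schema.getFullName() + " (" + schema.getType().getName() + "): " + fields;
    }

    public static void main(String[] args) throws IOException {
        // Default to the schema file defined above when no path is given.
        String path = args.length > 0 ? args[0] : "CustomerAddress.avsc";
        Schema schema = new Schema.Parser().parse(new File(path));
        System.out.println(describe(schema));
    }
}
```

Parsing the file this way is also a quick check that the .avsc is valid JSON and a valid Avro schema before running the code generator.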
3. Generate the Java code files
Generate the Java code using the avro-tools-1.8.1.jar package downloaded in step 1:
java -jar avro-tools-1.8.1.jar compile schema CustomerAddress.avsc .
The trailing "." tells avro-tools to generate the Java code in the current directory. On success, the generated class CustomerAddress.java is written under the namespace path (com/peach/arvo).
2.2 Generate Avro files using Java
1. Create a Java project using Maven
Add the following dependency to the pom.xml file
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>1.8.1</version>
</dependency>
2. Create a new Java class GenerateDataApp
The class generates the Avro file dynamically: it wraps each row of data in a GenericRecord object and writes the records to the Avro file.
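The original listing is not reproduced here, so the following is only a sketch of what such a class could look like, not the author's code. It fills each field with placeholder values chosen by type, writes the records to an Avro container file with DataFileWriter, and reads the file back to count them. The helper names buildRecord, write, and count, the output file name customer_address.avro, and the placeholder values are all assumptions.

```java
import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class GenerateDataApp {
    // Fill every field with placeholder data chosen by its Avro type.
    // Only long, double, and string fields are handled, as in CustomerAddress.avsc.
    static GenericRecord buildRecord(Schema schema, long i) {
        GenericRecord record = new GenericData.Record(schema);
        for (Schema.Field field : schema.getFields()) {
            switch (field.schema().getType()) {
                case LONG:   record.put(field.name(), i); break;
                case DOUBLE: record.put(field.name(), -8.0); break;
                default:     record.put(field.name(), field.name() + "_" + i);
            }
        }
        return record;
    }

    // Write `count` records to an Avro container file.
    static void write(Schema schema, File out, int count) throws IOException {
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            writer.create(schema, out); // writes the file header, which embeds the schema
            for (long i = 1; i <= count; i++) {
                writer.append(buildRecord(schema, i));
            }
        }
    }

    // Read the file back and count its records; the schema comes from the file header.
    static long count(File in) throws IOException {
        long n = 0;
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(in, new GenericDatumReader<GenericRecord>())) {
            while (reader.hasNext()) {
                reader.next();
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        Schema schema = new Schema.Parser().parse(new File("CustomerAddress.avsc"));
        File out = new File("customer_address.avro"); // hypothetical output path
        write(schema, out, 100);
        System.out.println("Wrote " + count(out) + " records to " + out);
    }
}
```

Because the container file stores the schema in its header, the reader in count does not need the .avsc file; this is what lets other tools read the file without the original schema definition.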
3. Reading the Avro file with Spark
1. Create a Scala project using Maven
Add the following dependencies to the pom.xml file
2. Scala example code snippet
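The original snippet is Scala and is not reproduced here. To stay in one language with the rest of the examples, the sketch below uses Spark's Java API instead, where a DataFrame is a Dataset<Row>. It assumes Spark 2.x with spark-sql and the com.databricks:spark-avro package on the classpath; on Spark 2.4 and later with the org.apache.spark:spark-avro module, the format name would be "avro". The class name AvroToSparkApp, the helper names, and the input path customer_address.avro are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class AvroToSparkApp {
    // Assumption: the Databricks spark-avro data source is available.
    // With the built-in module of Spark 2.4+, use "avro" instead.
    static final String AVRO_FORMAT = "com.databricks.spark.avro";

    // Load an Avro file as a DataFrame; the schema is taken from the file header.
    static Dataset<Row> readAvro(SparkSession spark, String path) {
        return spark.read().format(AVRO_FORMAT).load(path);
    }

    // Convert one column of the DataFrame into a typed Dataset.
    static Dataset<String> cities(Dataset<Row> df) {
        return df.select("ca_city").as(Encoders.STRING());
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("AvroToSpark")
            .master("local[*]")
            .getOrCreate();
        Dataset<Row> df = readAvro(spark, "customer_address.avro"); // hypothetical path
        df.printSchema();
        df.show(5);
        cities(df).distinct().show(5);
        spark.stop();
    }
}
```

The Scala equivalent follows the same shape: spark.read.format(...).load(path) yields the DataFrame, and .as[T] with an implicit encoder yields the Dataset.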
3. Spark run results
Source code:
https://github.com/javaxsky/avrotospark