2025-02-23 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/02 Report --
This article explains how to generate Apache Avro data. The content is simple and clear and easy to learn; follow along to study how Avro data is generated and parsed.
Introduction to Avro
Avro is a data serialization system that provides:
Rich data structure
Compact, fast, binary data format
A file format for storing persistent data
Remote procedure call system (RPC)
Simple integration with dynamic languages. Code generation is not required to read or write data files, nor to use or implement the RPC protocol. Code generation is an optional optimization, and is only worthwhile for statically typed languages.
Background
With the rapid development of the Internet, cutting-edge technologies such as cloud computing, big data, artificial intelligence (AI), and the Internet of Things have become mainstream, powering e-commerce sites, face recognition, self-driving cars, smart homes, smart cities, and more. Behind these conveniences in daily life, large amounts of data are continuously collected, cleaned, and analyzed by a variety of platforms, so guaranteeing low latency, high throughput, and security for that data is especially important. Apache Avro serializes data against a schema for binary transmission, which ensures both high-speed transfer and data safety. Avro is used more and more widely across industries, so knowing how to process and parse Avro data matters. This article demonstrates how to serialize Avro data and parse it with FlinkSQL.
This article is a demo of Avro parsing. At present, FlinkSQL is only suitable for parsing simple Avro data; complex nested Avro data is not yet supported.
Scenario introduction
This article covers the following three key topics:
How to serialize and generate Avro data
How to deserialize and parse Avro data
How to use FlinkSQL to parse Avro data
Prerequisites
Know what Avro is; the Apache Avro official website's Quick Start guide is a good reference.
Understand Avro application scenarios.
Operation steps
1. Create a new Avro Maven project and configure the pom dependencies
The pom file is as follows:
<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.huawei.bigdata</groupId>
  <artifactId>avrodemo</artifactId>
  <version>1.0-SNAPSHOT</version>

  <dependencies>
    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro</artifactId>
      <version>1.8.1</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.12</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro-maven-plugin</artifactId>
        <version>1.8.1</version>
        <executions>
          <execution>
            <phase>generate-sources</phase>
            <goals>
              <goal>schema</goal>
            </goals>
            <configuration>
              <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
              <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.6</source>
          <target>1.6</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
Note: the pom above configures the paths for automatically generated classes. With ${project.basedir}/src/main/avro/ and ${project.basedir}/src/main/java/ configured, the plugin will, when the mvn command is executed, compile the .avsc schema files in the former directory and place the generated classes in the latter. If the avro directory does not exist, create it manually.
2. Define schema
Use JSON to define a schema for Avro. A schema consists of primitive types (null, boolean, int, long, float, double, bytes, and string) and complex types (record, enum, array, map, union, and fixed). For example, the following defines a schema for a user. Create an avro directory under the main directory, then create a new file user.avsc under the avro directory:
{
  "namespace": "lancoo.ecbdc.pre",
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["int", "null"]},
    {"name": "favorite_color", "type": ["string", "null"]}
  ]
}
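Since the schema is plain JSON, it can be sanity-checked with any JSON parser before building. A minimal Python sketch (not part of the original tutorial; it only uses the standard library) that loads the schema above and lists which fields are nullable unions:

```python
import json

# The user.avsc schema from the article as a JSON string
# (note the comma between "string" and "null" -- required by JSON).
schema_text = '''
{
  "namespace": "lancoo.ecbdc.pre",
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["int", "null"]},
    {"name": "favorite_color", "type": ["string", "null"]}
  ]
}
'''

schema = json.loads(schema_text)
print(schema["name"])                                  # record name: User
print([f["name"] for f in schema["fields"]])           # declared fields

# A field typed as ["int", "null"] is an Avro union: the value may be
# an int or null, which is how optional fields are expressed.
nullable = [f["name"] for f in schema["fields"]
            if isinstance(f["type"], list) and "null" in f["type"]]
print(nullable)   # ['favorite_number', 'favorite_color']
```

If the JSON is malformed (for example, a missing comma inside a union), json.loads raises an error immediately, which is cheaper than discovering it during the Maven build.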
3. Compile schema
Click compile under Maven Projects to build the project; the namespace path and the User class code will be generated automatically.
4. Serialization
Create a TestUser class to serialize the generated data
User user1 = new User();
user1.setName("Alyssa");
user1.setFavoriteNumber(256);
// Leave favorite color null

// Alternate constructor
User user2 = new User("Ben", 7, "red");

// Construct via builder
User user3 = User.newBuilder()
        .setName("Charlie")
        .setFavoriteColor("blue")
        .setFavoriteNumber(null)
        .build();

// Serialize user1, user2 and user3 to disk
DatumWriter<User> userDatumWriter = new SpecificDatumWriter<>(User.class);
DataFileWriter<User> dataFileWriter = new DataFileWriter<>(userDatumWriter);
dataFileWriter.create(user1.getSchema(), new File("user_generic.avro"));
dataFileWriter.append(user1);
dataFileWriter.append(user2);
dataFileWriter.append(user3);
dataFileWriter.close();
After executing the serializer, the Avro data file is generated in the project's sibling directory.
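For context on why the resulting file is compact: the Avro specification encodes int and long values with a ZigZag mapping followed by a variable-length base-128 (varint) encoding. A small Python sketch of that encoding (illustrative only, not the Java API used above; it assumes 64-bit long semantics):

```python
def zigzag_varint(n: int) -> bytes:
    """Encode an integer the way the Avro spec encodes int/long:
    ZigZag-map the signed value, then emit a little-endian base-128
    varint where the high bit of each byte is a continuation flag."""
    z = (n << 1) ^ (n >> 63)  # ZigZag: 0,-1,1,-2,... -> 0,1,2,3,...
    out = bytearray()
    while True:
        if z > 0x7F:
            out.append((z & 0x7F) | 0x80)  # more bytes follow
            z >>= 7
        else:
            out.append(z)
            return bytes(out)

# favorite_number = 256 from user1 above takes two bytes:
print(zigzag_varint(256).hex())  # '8004'
# Small magnitudes fit in a single byte, keeping records compact:
print(zigzag_varint(7).hex())    # '0e' (user2's favorite_number)
print(zigzag_varint(-1).hex())   # '01'
```

This is why the serialized records are far smaller than their JSON equivalents: no field names are stored, and most small numbers occupy a single byte.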
The user_generic.avro content is as follows:
Objavro.schema name {"type": "record", "name": "User", "namespace": "lancoo.ecbdc.pre", "fields": [{"name": "name", "type": "string"}, {"name": "favorite_number", "type": ["int", "null"]}, {"name": "favorite_color", "type": ["string", "null"]}]}
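The leading "Objavro.schema" in the dump is not corruption: an Avro object container file begins with the 4-byte magic (ASCII "Obj" plus the version byte 0x01), followed by a file metadata map whose keys include "avro.schema", holding the embedded writer schema. A small Python sketch (illustrative; the byte string below is simulated rather than read from the real file, to keep the sketch self-contained):

```python
# Magic at the start of every Avro object container file: "Obj" + 0x01.
MAGIC = b"Obj\x01"

def looks_like_avro(header: bytes) -> bool:
    """True if the given leading bytes carry the Avro container magic."""
    return header[:4] == MAGIC

# Simulated leading bytes of user_generic.avro: magic, then the start
# of the metadata map carrying the "avro.schema" key (the exact map
# framing bytes are omitted here).
sample = b"Obj\x01avro.schema{...}"
print(looks_like_avro(sample))    # True
print(b"avro.schema" in sample)   # True
print(looks_like_avro(b"{}"))     # False: plain JSON is not a container
```

This is also a handy smoke test when debugging: if a file handed to a downstream consumer does not start with the magic, it is not an Avro container file, whatever its extension says.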
5. Deserialization
Parsing avro data through deserialization code
// Deserialize Users from disk
DatumReader<User> userDatumReader = new SpecificDatumReader<>(User.class);
DataFileReader<User> dataFileReader = new DataFileReader<>(new File("user_generic.avro"), userDatumReader);
User user = null;
while (dataFileReader.hasNext()) {
    // Reuse user object by passing it to next(). This saves us from
    // allocating and garbage collecting many objects for files with
    // many items.
    user = dataFileReader.next(user);
    System.out.println(user);
}
Execute the deserialization code to parse user_generic.avro.
Avro data parsing succeeded.
6. Upload user_generic.avro to an HDFS path
hdfs dfs -mkdir -p /tmp/lztest/
hdfs dfs -put user_generic.avro /tmp/lztest/
7. Configure FlinkServer
Prepare the avro jar package
Put flink-sql-avro-*.jar and flink-sql-avro-confluent-registry-*.jar into the FlinkServer lib directory, and execute the following commands on all FlinkServer nodes:
cp /opt/huawei/Bigdata/FusionInsight_Flink_8.1.2/install/FusionInsight-Flink-1.12.2/flink/opt/flink-sql-avro*.jar /opt/huawei/Bigdata/FusionInsight_Flink_8.1.3/install/FusionInsight-Flink-1.12.2/flink/lib
chmod 500 flink-sql-avro*.jar
chown omm:wheel flink-sql-avro*.jar
Then restart the FlinkServer instance. After the restart, check whether the avro package has been uploaded:
hdfs dfs -ls /FusionInsight_FlinkServer/8.1.2-312005/lib
8. Write FlinkSQL
CREATE TABLE testHdfs (
  name String,
  favorite_number int,
  favorite_color String
) WITH (
  'connector' = 'filesystem',
  'path' = 'hdfs:///tmp/lztest/user_generic.avro',
  'format' = 'avro'
);

CREATE TABLE KafkaTable (
  name String,
  favorite_number int,
  favorite_color String
) WITH (
  'connector' = 'kafka',
  'topic' = 'testavro',
  'properties.bootstrap.servers' = '96.10.2.1:21005',
  'properties.group.id' = 'testGroup',
  'scan.startup.mode' = 'latest-offset',
  'format' = 'avro'
);

insert into KafkaTable select * from testHdfs;
Save and submit the task.
9. Check whether there is any data in the corresponding topic
FlinkSQL successfully parsed avro data.
Thank you for reading. The above is the content of "how to generate Apache Avro data"; after studying this article, you should have a deeper understanding of how to generate Apache Avro data, though specific usage still needs to be verified in practice. The editor will push more articles on related topics; welcome to follow!
© 2024 shulou.com SLNews company. All rights reserved.