Example Analysis of Apache Avro data

2025-01-16 Update From: SLTechnology News&Howtos > Servers


Shulou (Shulou.com) 05/31 report --

This article walks through an example of working with Apache Avro data. The walkthrough is detailed and should serve as a useful reference; if the topic interests you, read on!

With the rapid growth of the Internet, technologies such as cloud computing, big data, artificial intelligence (AI), and the Internet of Things have moved into the technological mainstream, powering e-commerce websites, face recognition, self-driving cars, smart homes, smart cities, and more. Beyond making daily life more convenient, these systems collect, cleanse, and analyze large volumes of data around the clock, which makes low latency, high throughput, and data security especially important. Apache Avro serializes data into a compact binary form described by a Schema, which supports both fast transmission and safe interpretation of the data. As Avro sees wider and wider use across industries, knowing how to process and parse Avro data becomes particularly important. This article demonstrates how to serialize and generate Avro data, and how to parse it with FlinkSQL.

Note: this article is a demo of Avro parsing. At the time of writing, FlinkSQL is suitable only for flat Avro records; complex nested Avro data is not yet supported.

Scenario introduction

This article covers the following three key points:

How to serialize and generate Avro data

How to deserialize and analyze Avro data

How to use FlinkSQL to parse Avro data

Prerequisites

Know what Avro is; the Quick Start guide on the official Apache Avro website is a good reference.

Understand typical Avro application scenarios.

Operation steps

1. Create a new Avro Maven project and configure the pom dependencies

The pom file is as follows:

<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.huawei.bigdata</groupId>
  <artifactId>avrodemo</artifactId>
  <version>1.0-SNAPSHOT</version>
  <dependencies>
    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro</artifactId>
      <version>1.8.1</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.12</version>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.avro</groupId>
        <artifactId>avro-maven-plugin</artifactId>
        <version>1.8.1</version>
        <executions>
          <execution>
            <phase>generate-sources</phase>
            <goals>
              <goal>schema</goal>
            </goals>
            <configuration>
              <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
              <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.6</source>
          <target>1.6</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>

Note: the pom above configures the avro-maven-plugin to generate classes automatically. With the source directory ${project.basedir}/src/main/avro/ and the output directory ${project.basedir}/src/main/java/ configured, running the mvn command makes the plugin compile the .avsc schema files in the former directory into Java classes in the latter. If the avro directory does not exist, create it manually.

2. Define schema

Use JSON to define the Avro schema. A schema is built from primitive types (null, boolean, int, long, float, double, bytes, and string) and complex types (record, enum, array, map, union, and fixed). For example, the following defines a schema for a user: create an avro directory under the main directory, then create a new file user.avsc in it:

{
  "namespace": "lancoo.ecbdc.pre",
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["int", "null"]},
    {"name": "favorite_color", "type": ["string", "null"]}
  ]
}

3. Compile schema

Run the compile goal from the Maven Projects panel; the package path for the namespace and the User class source are generated automatically.
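
For orientation, the generated class is a SpecificRecord with getters, setters, constructors, and a builder. The following is a hypothetical, simplified sketch of its data surface only (the real generated lancoo.ecbdc.pre.User extends SpecificRecordBase and also carries the parsed Schema):

```java
// Simplified sketch of the shape of the generated class -- illustration
// only, not the actual avro-maven-plugin output.
public class User {
    private CharSequence name;            // "string"
    private Integer favoriteNumber;       // union ["int", "null"] -> nullable
    private CharSequence favoriteColor;   // union ["string", "null"] -> nullable

    public User() {}

    public User(CharSequence name, Integer favoriteNumber, CharSequence favoriteColor) {
        this.name = name;
        this.favoriteNumber = favoriteNumber;
        this.favoriteColor = favoriteColor;
    }

    public CharSequence getName() { return name; }
    public void setName(CharSequence v) { this.name = v; }
    public Integer getFavoriteNumber() { return favoriteNumber; }
    public void setFavoriteNumber(Integer v) { this.favoriteNumber = v; }
    public CharSequence getFavoriteColor() { return favoriteColor; }
    public void setFavoriteColor(CharSequence v) { this.favoriteColor = v; }

    public static Builder newBuilder() { return new Builder(); }

    // Fluent builder, mirroring the style the Avro codegen produces.
    public static class Builder {
        private final User u = new User();
        public Builder setName(CharSequence v) { u.name = v; return this; }
        public Builder setFavoriteNumber(Integer v) { u.favoriteNumber = v; return this; }
        public Builder setFavoriteColor(CharSequence v) { u.favoriteColor = v; return this; }
        public User build() { return u; }
    }
}
```

Note how each union with null in the schema becomes a nullable boxed field (Integer rather than int), which is why null can be passed for favorite_number in the next step.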

4. Serialization

Create a TestUser class to generate and serialize the data:

User user1 = new User();
user1.setName("Alyssa");
user1.setFavoriteNumber(256);
// Leave favorite color null

// Alternate constructor
User user2 = new User("Ben", 7, "red");

// Construct via builder
User user3 = User.newBuilder()
        .setName("Charlie")
        .setFavoriteColor("blue")
        .setFavoriteNumber(null)
        .build();

// Serialize user1, user2 and user3 to disk
DatumWriter<User> userDatumWriter = new SpecificDatumWriter<>(User.class);
DataFileWriter<User> dataFileWriter = new DataFileWriter<>(userDatumWriter);
dataFileWriter.create(user1.getSchema(), new File("user_generic.avro"));
dataFileWriter.append(user1);
dataFileWriter.append(user2);
dataFileWriter.append(user3);
dataFileWriter.close();

After the serialization code runs, the Avro data file user_generic.avro is generated in the directory alongside the project sources.

The user_generic.avro content is as follows:

Obj avro.schema {"type": "record", "name": "User", "namespace": "lancoo.ecbdc.pre", "fields": [{"name": "name", "type": "string"}, {"name": "favorite_number", "type": ["int", "null"]}, {"name": "favorite_color", "type": ["string", "null"]}]}

So far, the avro data has been generated.
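
The readable header above (the "Obj" magic plus the embedded avro.schema metadata) is followed by the records themselves in pure binary. For instance, Avro encodes int values as zig-zag base-128 varints. The following standalone sketch, written from scratch for illustration (it does not call into the Avro library), reproduces that encoding:

```java
// How Avro's binary encoding packs an int: zig-zag, then base-128 varint.
// Illustration only -- the Avro library does this internally.
public class ZigZagVarint {
    // Zig-zag maps small-magnitude ints (positive or negative) to small
    // unsigned values: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static int zigZag(int n) {
        return (n << 1) ^ (n >> 31);
    }

    // Base-128 varint: 7 data bits per byte, high bit set on every byte
    // except the last.
    static byte[] encodeInt(int n) {
        int v = zigZag(n);
        java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // user1's favorite_number, 256, encodes in two bytes.
        for (byte b : encodeInt(256)) {
            System.out.printf("%02x ", b);  // prints "80 04"
        }
        System.out.println();
    }
}
```

This is why the file stays compact: user1's favorite_number of 256 occupies only two bytes (0x80 0x04), and small values like 7 need just one.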

5. Deserialization

Parse the Avro data with the following deserialization code:

// Deserialize Users from disk
DatumReader<User> userDatumReader = new SpecificDatumReader<>(User.class);
DataFileReader<User> dataFileReader = new DataFileReader<>(new File("user_generic.avro"), userDatumReader);
User user = null;
while (dataFileReader.hasNext()) {
    // Reuse user object by passing it to next(). This saves us from
    // allocating and garbage collecting many objects for files with
    // many items.
    user = dataFileReader.next(user);
    System.out.println(user);
}

Execute deserialization code to parse user_generic.avro

Avro data parsing succeeded.
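
For reference (an expectation rather than captured output): Avro's SpecificRecord.toString() renders records in a JSON-like form, so the loop above should print the three users roughly as follows, though the exact rendering can vary by Avro version:

```json
{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}
{"name": "Ben", "favorite_number": 7, "favorite_color": "red"}
{"name": "Charlie", "favorite_number": null, "favorite_color": "blue"}
```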

6. Upload user_generic.avro to an HDFS path

hdfs dfs -mkdir -p /tmp/lztest/
hdfs dfs -put user_generic.avro /tmp/lztest/

7. Configure FlinkServer

Prepare the avro jar package

Put flink-sql-avro-*.jar and flink-sql-avro-confluent-registry-*.jar into the FlinkServer lib directory, executing the following commands on every FlinkServer node:

cp /opt/huawei/Bigdata/FusionInsight_Flink_8.1.2/install/FusionInsight-Flink-1.12.2/flink/opt/flink-sql-avro*.jar /opt/huawei/Bigdata/FusionInsight_Flink_8.1.3/install/FusionInsight-Flink-1.12.2/flink/lib
chmod 500 flink-sql-avro*.jar
chown omm:wheel flink-sql-avro*.jar

At the same time, restart the FlinkServer instance. After the restart, check whether the avro package has been uploaded.

hdfs dfs -ls /FusionInsight_FlinkServer/8.1.2-312005/lib

8. Write FlinkSQL

CREATE TABLE testHdfs (
  name String,
  favorite_number int,
  favorite_color String
) WITH (
  'connector' = 'filesystem',
  'path' = 'hdfs:///tmp/lztest/user_generic.avro',
  'format' = 'avro'
);

CREATE TABLE KafkaTable (
  name String,
  favorite_number int,
  favorite_color String
) WITH (
  'connector' = 'kafka',
  'topic' = 'testavro',
  'properties.bootstrap.servers' = '96.10.2.1:21005',
  'properties.group.id' = 'testGroup',
  'scan.startup.mode' = 'latest-offset',
  'format' = 'avro'
);

insert into KafkaTable select * from testHdfs;

Save and submit the task.

9. Check whether there is any data in the corresponding topic

FlinkSQL successfully parsed avro data.

That is all of "Example Analysis of Apache Avro data". Thanks for reading! We hope the content shared here is helpful; for more related knowledge, follow our industry information channel.
