Updated 2025-01-17. From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article explains how to use Protobuf for data exchange in Go. The material is kept simple and is worked through step by step.
Protocol buffers (Protobufs), like XML and JSON, allow applications written in different languages and running on different platforms to exchange data. For example, a sender written in Go can encode a Go-specific sales order in Protobuf, and a receiver written in Java can then decode it to get a Java-specific representation of the received order.
Compared to XML and JSON, Protobuf encoding is binary rather than text, which can complicate debugging. However, as the code examples in this article confirm, the Protobuf encoding is significantly more compact than the XML or JSON encoding.
Protobuf is efficient in another way as well. At the implementation level, Protobuf and other encoding systems serialize and deserialize structured data. Serialization converts a language-specific data structure into a byte stream, and deserialization is the inverse operation that converts the byte stream back into a language-specific data structure. Serialization and deserialization can become bottlenecks in data exchange because these operations are CPU-intensive. Efficient serialization and deserialization is another Protobuf design goal.
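To make the serialize/deserialize pair concrete, here is a minimal sketch of a round trip in Go. It uses the standard library's encoding/json package rather than Protobuf so that it runs without generated code; the Order type and its fields are hypothetical and exist only for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Order is a hypothetical record used only for this illustration.
type Order struct {
	ID    int64   `json:"id"`
	Item  string  `json:"item"`
	Price float32 `json:"price"`
}

// roundTrip serializes an Order to bytes and then deserializes those
// bytes into a fresh Order value.
func roundTrip(in Order) (Order, []byte, error) {
	data, err := json.Marshal(in) // serialize: Go struct -> byte stream
	if err != nil {
		return Order{}, nil, err
	}
	var out Order
	err = json.Unmarshal(data, &out) // deserialize: byte stream -> Go struct
	return out, data, err
}

func main() {
	in := Order{ID: 42, Item: "widget", Price: 9.5}
	out, data, err := roundTrip(in)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%d bytes on the wire: %s\n", len(data), data)
	fmt.Println("round trip preserved the value:", in == out)
}
```

The same two-step shape (marshal on the sending side, unmarshal on the receiving side) applies to the Protobuf calls shown later in the article.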
Recent encoding technologies, such as Protobuf and FlatBuffers, derive from the DCE/RPC (Distributed Computing Environment/Remote Procedure Call) initiative of the early 1990s. Like DCE/RPC, Protobuf contributes both an IDL (interface definition language) and an encoding layer to data exchange.
This article looks at these two layers and then provides code examples in Go and Java to flesh out the details of Protobuf and show that it is easy to use.
Protobuf as an IDL and encoding layer
Like Protobuf, DCE/RPC is designed to be language- and platform-neutral. The appropriate libraries and utilities allow any language and platform to take part in DCE/RPC. The DCE/RPC architecture is also elegant: an IDL document is the contract between the remote procedure on one side and its callers on the other. Protobuf, too, centers on an IDL document.
An IDL document is text and, in DCE/RPC, uses basic C syntax along with syntactic extensions for metadata (square brackets) and a few new keywords such as interface. Here is an example:
    [uuid(2d6ead46-05e3-11ca-7dd1-426909beabcd), version(1.0)]
    interface echo {
       const long int ECHO_SIZE = 512;
       void echo(
          [in]          handle_t h,
          [in, string]  idl_char from_client[],
          [out, string] idl_char from_server[ECHO_SIZE]
       );
    }
The IDL document declares a procedure named echo that takes three arguments: the [in] arguments of type handle_t (an implementation pointer) and idl_char (an array of ASCII characters) are passed to the remote procedure, whereas the [out] argument (also a string) is passed back from the procedure. In this example, the echo procedure does not explicitly return a value (the void to the left of echo), but it could. A return value, together with one or more [out] arguments, allows the remote procedure to return arbitrarily many values. The next section introduces a Protobuf IDL, which differs in syntax but likewise serves as a contract in data exchange.
The IDL documents in DCE/RPC and Protobuf are inputs to utilities that create infrastructure code for exchanging data:
    IDL document -> DCE/RPC or Protobuf utilities -> support code for data exchange
As relatively straightforward text, the IDL is also human-readable documentation of the specifics of a data exchange, in particular the number of data items exchanged and the data type of each item.
Protobuf can be used in a modern RPC system such as gRPC; nonetheless, Protobuf on its own provides only an IDL layer and an encoding layer for messages passed from a sender to a receiver. Protobuf encoding, like the DCE/RPC original, is binary, but it is more efficient.
At present, XML and JSON encodings still dominate data exchange through technologies such as web services, which make use of in-place infrastructure such as web servers, transport protocols (e.g., TCP, HTTP), and standard libraries and utilities for processing XML and JSON documents. Moreover, database systems of various kinds can store XML and JSON documents, and even legacy relational systems readily generate XML encodings of query results. Every general-purpose programming language now has libraries that support XML and JSON. What, then, recommends a return to a binary encoding system such as Protobuf?
Consider the negative decimal value -128. In the two's complement binary representation (which dominates across systems and languages), this value can be stored in a single 8-bit byte: 10000000. The text encoding of this integer value in XML or JSON requires multiple bytes. For example, UTF-8 encoding requires four bytes for the string "-128", one byte per character (in hex, the values are 0x2d, 0x31, 0x32, and 0x38). XML and JSON also add markup characters, such as angle brackets and braces. Details about Protobuf encoding come later, but the point for now is a general one: text encodings tend to be significantly less compact than binary ones.
Example of using Protobuf in Go
My code examples focus on Protobuf rather than RPC. Here is an overview of the first example:
An IDL file named dataitem.proto defines a Protobuf message with eight fields of different types: integer values with different ranges, floating-point values of a fixed size, and strings of two different lengths.
The Protobuf compiler uses the IDL file to generate a Go-specific version (and, later, a Java-specific version) of the Protobuf message together with supporting functions.
A Go application populates the native Go data structure with randomly generated values and then serializes the result to a local file. For comparison, XML and JSON encodings are also serialized to local files.
As a test, the Go application reconstructs an instance of its native data structure by deserializing the contents of the Protobuf file.
As a language-neutrality test, a Java application also deserializes the contents of the Protobuf file to get an instance of a native data structure.
The IDL file, along with two Go source files and one Java source file, is available as a ZIP file on my website.
The all-important Protobuf IDL document is shown below. The document is stored in the file dataitem.proto, with the customary .proto extension.
Example 1. Protobuf IDL document

    syntax = "proto3";

    package main;

    message DataItem {
      int64  oddA  = 1;
      int64  evenA = 2;
      int32  oddB  = 3;
      int32  evenB = 4;
      float  small = 5;
      float  big   = 6;
      string short = 7;
      string long  = 8;
    }
This IDL uses the current proto3 rather than the earlier proto2 syntax. The package name (in this case, main) is optional but customary; it is used to avoid name conflicts. The structured message contains eight fields, each of which has a Protobuf data type (e.g., int64, string), a name (e.g., oddA, short), and a numeric tag (the key) after the equals sign. The tags, 1 through 8 in this example, are unique integer identifiers that determine the order in which the fields are serialized.
Protobuf messages can be nested to any level, and one message can be the field type of another message. This is an example of using a DataItem message as a field type:
    message DataItems {
      repeated DataItem item = 1;
    }

A single DataItems message consists of repeated (zero or more) DataItem messages.
For clarity, Protobuf also supports enumerated types:
    enum PartnershipStatus {
      reserved "FREE", "CONSTRAINED", "OTHER";
    }
The reserved qualifier ensures that the numeric values used to implement these three symbolic names cannot be reused.
To generate one or more language-specific versions of the declared Protobuf message structures, the IDL file containing them is passed to the protoc compiler (available in the Protobuf GitHub repository). For Go code, the supporting Protobuf library can be installed in the usual way (with % as the command-line prompt):
% go get github.com/golang/protobuf/proto
The command to compile the Protobuf IDL file dataitem.proto into Go source code is:
% protoc --go_out=. dataitem.proto
The flag --go_out directs the compiler to generate Go source code; there are similar flags for other languages. The result, in this case, is a file named dataitem.pb.go, which is small enough that its essentials can be copied into a Go application. Here are the essentials of the generated code:
    var _ = proto.Marshal

    type DataItem struct {
       OddA  int64   `protobuf:"varint,1,opt,name=oddA" json:"oddA,omitempty"`
       EvenA int64   `protobuf:"varint,2,opt,name=evenA" json:"evenA,omitempty"`
       OddB  int32   `protobuf:"varint,3,opt,name=oddB" json:"oddB,omitempty"`
       EvenB int32   `protobuf:"varint,4,opt,name=evenB" json:"evenB,omitempty"`
       Small float32 `protobuf:"fixed32,5,opt,name=small" json:"small,omitempty"`
       Big   float32 `protobuf:"fixed32,6,opt,name=big" json:"big,omitempty"`
       Short string  `protobuf:"bytes,7,opt,name=short" json:"short,omitempty"`
       Long  string  `protobuf:"bytes,8,opt,name=long" json:"long,omitempty"`
    }

    func (m *DataItem) Reset()         { *m = DataItem{} }
    func (m *DataItem) String() string { return proto.CompactTextString(m) }
    func (*DataItem) ProtoMessage()    {}
    func init() {}
The compiler-generated code has a Go structure DataItem, which exports the Go fields (the names are now capitalized) that match the names declared in the Protobuf IDL. The structure fields have standard Go data types: int32, int64, float32, and string. At the end of each field line, as a string, is metadata that describes the Protobuf type, gives the numeric tag from the Protobuf IDL document, and provides information about JSON, which is discussed later.
There are also functions; the most important is proto.Marshal, which serializes an instance of the DataItem structure into Protobuf format. The helper functions include Reset, which clears a DataItem structure, and String, which produces a one-line string representation of a DataItem.
The metadata that describes Protobuf encoding deserves a closer look before analyzing the Go program in more detail.
Protobuf encoding
Structurally, a Protobuf message is a collection of key/value pairs, with the numeric tags as the keys and the corresponding fields as the values. The field names, such as oddA and small, are for human readability, but the protoc compiler does use them to generate language-specific counterparts. For example, the oddA and small names in the Protobuf IDL become the fields OddA and Small, respectively, in the Go structure.
The keys and their values both get encoded, but with an important difference: some numeric values have a fixed-size encoding of 32 or 64 bits, whereas others (including the message tags) are varint-encoded, so the number of bits depends on the size of the integer value. For example, the keys for tags 1 through 15 take 8 bits to encode, whereas the keys for tags 16 through 2047 take 16 bits, because a key packs the tag together with a 3-bit type code. A varint-encoded field value of 0 through 127 fits in one byte, and a value of 128 through 16383 fits in two. The varint encoding is similar in spirit to UTF-8 (but differs in detail) and favors small integer values over large ones. (For a detailed analysis, see the Protobuf encoding guide.) The upshot is that a Protobuf message should have small integer values in fields, if possible, and as few keys as possible, although one key per field is unavoidable.
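The byte counts for varints can be explored with Go's standard encoding/binary package, whose PutUvarint function implements the same base-128 varint format that Protobuf uses. This is only a sketch: varintLen and keyLen are helper names invented here, and the key formula (tag << 3) | wireType follows the Protobuf encoding guide.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// varintLen reports how many bytes the base-128 varint encoding of v needs.
func varintLen(v uint64) int {
	var buf [binary.MaxVarintLen64]byte
	return binary.PutUvarint(buf[:], v)
}

// keyLen reports the encoded size of a field key, which combines the
// field's tag number with a 3-bit wire type: (tag << 3) | wireType.
func keyLen(tag, wireType uint64) int {
	return varintLen(tag<<3 | wireType)
}

func main() {
	for _, v := range []uint64{1, 127, 128, 16383, 16384} {
		fmt.Printf("value %5d -> %d byte(s)\n", v, varintLen(v))
	}
	for _, tag := range []uint64{1, 15, 16, 2047, 2048} {
		fmt.Printf("tag   %5d -> %d byte(s)\n", tag, keyLen(tag, 0))
	}
}
```

The three bits reserved for the wire type are why the one-byte limit for a key falls at tag 15 rather than at 127.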
Table 1 below lists the main points of Protobuf coding:
    Encoding        Sample types              Length
    varint          int32, uint32, int64      variable length
    fixed           fixed32, float, double    fixed 32-bit or 64-bit length
    byte sequence   string, bytes             sequence length

Table 1. Protobuf data types
Integer types that are not explicitly fixed are varint-encoded; hence, in a varint type such as uint32 (u for unsigned), the number 32 describes the integer's range (in this case, 0 to 2^32 - 1) rather than its bit size, which varies with the value. For fixed types such as fixed32 or double, by contrast, the Protobuf encoding requires 32 and 64 bits, respectively. Strings in Protobuf are byte sequences; hence, the size of the field encoding is the length of the byte sequence.
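The fixed and byte-sequence rows of Table 1 can be illustrated by encoding two fields by hand. The sketch below uses only the standard library and follows the wire format as described in the Protobuf encoding guide (wire type 5 for 32-bit values, wire type 2 for length-delimited values); the function names are made up for this example, and a real program would call proto.Marshal instead.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// appendVarint appends the base-128 varint encoding of v to buf.
func appendVarint(buf []byte, v uint64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(tmp[:], v)
	return append(buf, tmp[:n]...)
}

// appendFloatField appends a float field: the key (wire type 5), then the
// IEEE 754 bits as exactly four little-endian bytes.
func appendFloatField(buf []byte, tag uint64, v float32) []byte {
	buf = appendVarint(buf, tag<<3|5)
	var tmp [4]byte
	binary.LittleEndian.PutUint32(tmp[:], math.Float32bits(v))
	return append(buf, tmp[:]...)
}

// appendStringField appends a string field: the key (wire type 2), the
// byte length as a varint, then the bytes themselves.
func appendStringField(buf []byte, tag uint64, s string) []byte {
	buf = appendVarint(buf, tag<<3|2)
	buf = appendVarint(buf, uint64(len(s)))
	return append(buf, s...)
}

func main() {
	small := appendFloatField(nil, 5, 0.7562016)
	short := appendStringField(nil, 7, "ClH1oDaTtoX$HBN5")
	fmt.Printf("float field:  %d bytes (% x)\n", len(small), small)
	fmt.Printf("string field: %d bytes\n", len(short))
}
```

Whatever its value, the float field costs one key byte plus four value bytes, whereas the string field grows with the length of the string.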
Another efficiency is worth mentioning. Recall the earlier example in which a DataItems message consists of repeated DataItem instances:

    message DataItems {
      repeated DataItem item = 1;
    }

The repeated qualifier means that the DataItem instances are grouped under a single field: the collection has a single tag, in this case 1. A DataItems message with repeated DataItem instances is therefore more economical than a message with multiple but separate DataItem fields, each of which would need a tag of its own. (On the wire, each embedded DataItem is still written with the field's key and a length prefix; packing many values after one key applies to repeated scalar numeric fields.)
With this background in mind, let's go back to the Go program.
Details of the dataItem program
The dataItem program creates a DataItem instance and populates its fields with randomly generated values of the appropriate types. Go has a rand package with functions for generating pseudorandom integer and floating-point values, and my randString function generates pseudorandom strings of specified lengths from a character set. The design goal is a DataItem instance whose field values differ in type and bit size. For example, the OddA and EvenA values are 64-bit non-negative integers of odd and even parity, respectively, whereas the OddB and EvenB variants are 32 bits in size and hold small integer values between 0 and 2047. The random floating-point values are 32 bits in size, and the strings are 16 (Short) and 32 (Long) characters in length. Here is the code segment that populates the DataItem structure with random values:
    // variable-length integers
    n1 := rand.Int63()               // large integer
    if (n1 & 1) == 0 { n1++ }        // make sure it is odd
    ...
    n3 := rand.Int31() % UpperBound  // small integer
    if (n3 & 1) == 0 { n3++ }        // make sure it is odd

    // fixed-length floating-point numbers
    ...
    t1 := rand.Float32()
    t2 := rand.Float32()
    ...
    // strings
    str1 := randString(StrShort)
    str2 := randString(StrLong)

    // the message
    dataItem := &DataItem{
       OddA:  n1,
       EvenA: n2,
       OddB:  n3,
       EvenB: n4,
       Big:   f1,
       Small: f2,
       Short: str1,
       Long:  str2,
    }
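The randString function itself is not shown in the article. A plausible minimal version might look like the following sketch; the character set is an assumption (the article does not list the one actually used), and the implementation is a guess at the idea rather than the author's code.

```go
package main

import (
	"fmt"
	"math/rand"
)

// charSet is an assumed character set; the article does not list the one
// that its randString function draws from.
const charSet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789#$%&^"

// randString returns a pseudorandom string of n characters from charSet.
func randString(n int) string {
	b := make([]byte, n)
	for i := range b {
		b[i] = charSet[rand.Intn(len(charSet))]
	}
	return string(b)
}

func main() {
	fmt.Println(randString(16)) // length used for the Short field
	fmt.Println(randString(32)) // length used for the Long field
}
```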
Once created and populated with values, the DataItem instance is encoded in XML, JSON, and Protobuf, with each encoding written to a local file:

    func encodeAndserialize(dataItem *DataItem) {
       bytes, _ := xml.MarshalIndent(dataItem, "", " ")  // XML to dataitem.xml
       ioutil.WriteFile(XmlFile, bytes, 0644)            // 0644 is file access permissions

       bytes, _ = json.MarshalIndent(dataItem, "", " ")  // JSON to dataitem.json
       ioutil.WriteFile(JsonFile, bytes, 0644)

       bytes, _ = proto.Marshal(dataItem)                // Protobuf to dataitem.pbuf
       ioutil.WriteFile(PbufFile, bytes, 0644)
    }
All three serialization functions use the term marshal, which is roughly synonymous with serialize. As the code indicates, each of the three Marshal functions returns a byte slice, which is then written to a file. (Error handling is ignored for simplicity.) In a sample run, the file sizes were:

    dataitem.xml:  262 bytes
    dataitem.json: 212 bytes
    dataitem.pbuf:  88 bytes

The Protobuf encoding is significantly smaller than the other two. The XML and JSON serializations could be reduced slightly in size by eliminating the indentation characters, in this case blanks and newlines.
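How much the indentation costs is easy to measure. The sketch below compares json.MarshalIndent with json.Marshal from the standard library and, unlike the article's code, checks the returned errors. The Item type is a hypothetical stand-in with just two of DataItem's fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Item is a hypothetical stand-in with two of DataItem's fields.
type Item struct {
	OddB  int32  `json:"oddB"`
	Short string `json:"short"`
}

// sizes returns the byte counts of the indented and compact JSON encodings.
func sizes(v interface{}) (indented, compact int, err error) {
	a, err := json.MarshalIndent(v, "", " ")
	if err != nil {
		return 0, 0, err
	}
	b, err := json.Marshal(v)
	if err != nil {
		return 0, 0, err
	}
	return len(a), len(b), nil
}

func main() {
	indented, compact, err := sizes(Item{OddB: 57, Short: "ClH1oDaTtoX$HBN5"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("indented: %d bytes, compact: %d bytes\n", indented, compact)
}
```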
Here is the dataitem.json file that results from the json.MarshalIndent call, with comments (which start with ##) added:

    {
     "oddA":  4744002665212642479,                    ## 64-bit >= 0
     "evenA": 2395006495604861128,                    ## ditto
     "oddB":  57,                                     ## 32-bit >= 0 but < 2048
     "evenB": 468,                                    ## ditto
     "small": 0.7562016,                              ## 32-bit floating-point
     "big":   0.85202795,                             ## ditto
     "short": "ClH1oDaTtoX$HBN5",                     ## 16 random chars
     "long":  "xId0rD3CR% 3Wt% ^ QjcFLJgyXBu9 ^ DZI"  ## 32 random chars
    }
Although this serialized data is written to a local file, the data can be written to the output stream of the network connection using the same method.
Test serialization and deserialization
The Go program next runs an elementary test by deserializing the bytes previously written to the dataitem.pbuf file into a DataItem instance. Here is the code segment, with the error-checking parts removed:

    filebytes, err := ioutil.ReadFile(PbufFile) // get the bytes from the file
    ...
    testItem.Reset()                            // clear the DataItem structure
    err = proto.Unmarshal(filebytes, testItem)  // deserialize into a DataItem instance

The proto.Unmarshal function for deserializing Protobuf is the inverse of the proto.Marshal function. The original DataItem and the deserialized copy are printed to confirm an exact match:

    Original:
    2041519981506242154 3041486079683013705 1192 1879
    0.572123 0.326855
    boPb#T0O8Xd&Ps5EnSZqDg4Qztvo7IIs 9vH66AiGSQgCDxk &

    Deserialized:
    2041519981506242154 3041486079683013705 1192 1879
    0.572123 0.326855
    boPb#T0O8Xd&Ps5EnSZqDg4Qztvo7IIs 9vH66AiGSQgCDxk &

A Java Protobuf client
The example in Java is meant to confirm Protobuf's language neutrality. The original IDL file could be used to generate the Java support code, which involves nested classes. To suppress a warning message, however, a small addition can be made. Here is the revision, which specifies DataMsg as the name of the outer class, with the inner class automatically named DataItem after the Protobuf message:

    syntax = "proto3";

    package main;

    option java_outer_classname = "DataMsg";

    message DataItem {
    ...
With this change, the protoc compilation is the same as before, except that the expected output is now Java instead of Go:
% protoc --java_out=. dataitem.proto
The resulting source file (in a subdirectory named main) is DataMsg.java and about 1,120 lines in length: Java is not terse. Compiling and then running the Java code requires a JAR file with the Protobuf library support. This file is available in the Maven repository.
With the pieces in place, my test code is relatively short (and is available in the ZIP file as Main.java):

    package main;
    import java.io.FileInputStream;

    public class Main {
       public static void main(String[] args) {
          String path = "dataitem.pbuf";  // from the Go program's serialization
          try {
             DataMsg.DataItem deserial =
                DataMsg.DataItem.newBuilder().mergeFrom(new FileInputStream(path)).build();

             System.out.println(deserial.getOddA()); // 64-bit odd
             System.out.println(deserial.getLong()); // 32-character string
          }
          catch(Exception e) { System.err.println(e); }
       }
    }
Of course, production-level testing will be more thorough, but even this preliminary test can prove Protobuf's language neutrality: the dataitem.pbuf file is the result of the Go program serializing the Go language version of DataItem, and the bytes in the file are deserialized to produce a DataItem instance of the Java language. The output of the Java test is the same as that of the Go test.
End with the numPairs program
Let's end with an example that highlights Protobuf efficiency but also underscores the cost involved in any encoding technology. Consider this Protobuf IDL file:

    syntax = "proto3";
    package main;

    message NumPairs {
      repeated NumPair pair = 1;
    }

    message NumPair {
      int32 odd = 1;
      int32 even = 2;
    }

A NumPair message consists of two int32 values together with an integer tag for each field. A NumPairs message is a sequence of embedded NumPair messages.
The numPairs program in Go (shown below) creates two million NumPair instances, each of which is appended to the NumPairs message. This message can be serialized and deserialized in the usual way.
Example 2. The numPairs program

    package main

    import (
       "math/rand"
       "time"
       "encoding/xml"
       "encoding/json"
       "io/ioutil"
       "github.com/golang/protobuf/proto"
    )

    // protoc-generated code: start
    var _ = proto.Marshal
    type NumPairs struct {
       Pair []*NumPair `protobuf:"bytes,1,rep,name=pair" json:"pair,omitempty"`
    }

    func (m *NumPairs) Reset()         { *m = NumPairs{} }
    func (m *NumPairs) String() string { return proto.CompactTextString(m) }
    func (*NumPairs) ProtoMessage()    {}
    func (m *NumPairs) GetPair() []*NumPair {
       if m != nil { return m.Pair }
       return nil
    }

    type NumPair struct {
       Odd  int32 `protobuf:"varint,1,opt,name=odd" json:"odd,omitempty"`
       Even int32 `protobuf:"varint,2,opt,name=even" json:"even,omitempty"`
    }

    func (m *NumPair) Reset()         { *m = NumPair{} }
    func (m *NumPair) String() string { return proto.CompactTextString(m) }
    func (*NumPair) ProtoMessage()    {}
    func init() {}
    // protoc-generated code: finish

    var numPairsStruct NumPairs
    var numPairs = &numPairsStruct

    func encodeAndserialize() {
       // XML encoding
       filename := "./pairs.xml"
       bytes, _ := xml.MarshalIndent(numPairs, "", " ")
       ioutil.WriteFile(filename, bytes, 0644)

       // JSON encoding
       filename = "./pairs.json"
       bytes, _ = json.MarshalIndent(numPairs, "", " ")
       ioutil.WriteFile(filename, bytes, 0644)

       // ProtoBuf encoding
       filename = "./pairs.pbuf"
       bytes, _ = proto.Marshal(numPairs)
       ioutil.WriteFile(filename, bytes, 0644)
    }

    const HowMany = 2 * 1000 * 1000 // two million

    func main() {
       rand.Seed(time.Now().UnixNano())

       // uncomment the modulus operations to get the more efficient version
       for i := 0; i < HowMany; i++ {
          n1 := rand.Int31() // % 2047
          if (n1 & 1) == 0 { n1++ } // ensure it's odd
          n2 := rand.Int31() // % 2047
          if (n2 & 1) == 1 { n2++ } // ensure it's even

          next := &NumPair {
             Odd:  n1,
             Even: n2,
          }
          numPairs.Pair = append(numPairs.Pair, next)
       }
       encodeAndserialize()
    }
The randomly generated odd and even values in each NumPair range from zero to just over two billion. In terms of raw rather than encoded data, the integers generated in the Go program add up to 16MB: two integers per NumPair for a total of four million integers, each of which is four bytes in size.
For comparison, the table below has entries for the XML, JSON, and Protobuf encodings of the two million NumPair instances in the sample NumPairs message. The raw data is included as well. Because the numPairs program generates random values, output differs across sample runs but stays close to the sizes shown in the table.
    Encoding   File          Byte size   Pbuf/other ratio
    None       pairs.raw     16MB        169%
    Protobuf   pairs.pbuf    27MB        -
    JSON       pairs.json    100MB       27%
    XML        pairs.xml     126MB       21%

Table 2. Encoding overhead for 16MB of integers
As expected, Protobuf compares very favorably with XML and JSON. The Protobuf encoding is about a quarter of the JSON one and about a fifth of the XML one. But the raw data makes clear that Protobuf incurs encoding overhead as well: the serialized Protobuf message is 11MB larger than the raw data. Any encoding, including Protobuf, involves structuring the data, which unavoidably adds bytes.
Each of the two million serialized NumPair instances contains four integer values: one apiece for the Odd and Even fields in the Go structure, and one tag per field in the Protobuf encoding. As raw rather than encoded data, this would come to 16 bytes per instance, and there are two million instances in the sample NumPairs message. But the Protobuf tags, like the int32 values in the NumPair fields, use varint encoding and therefore vary in byte length; in particular, small integer values (which include the tags, in this case) require fewer than four bytes to encode.
If the numPairs program is revised so that the two NumPair fields hold values less than 2048, which have encodings of either one or two bytes, then the Protobuf encoding drops from 27MB to 16MB, the very size of the raw data. The table below summarizes the new encoding sizes from a sample run.
    Encoding   File          Byte size   Pbuf/other ratio
    None       pairs.raw     16MB        100%
    Protobuf   pairs.pbuf    16MB        -
    JSON       pairs.json    77MB        21%
    XML        pairs.xml     103MB       15%

Table 3. Encoding 16MB of integers less than 2048
In summary, the modified numPairs program, with field values less than 2048, reduces the four-byte size of each integer value in the raw data. But the Protobuf encoding still requires tags, which add bytes to the Protobuf message. Protobuf encoding does have a cost in message size, but this cost can be reduced by the varint factor if relatively small integer values, whether in fields or keys, are being encoded.
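The sizes in Tables 2 and 3 can be roughly reproduced by hand. The sketch below estimates, without the Protobuf library, how many bytes one NumPair contributes to a NumPairs message. It rests on assumptions taken from the Protobuf encoding guide rather than from the article: each embedded message is written as a one-byte key, a varint length, and its body; each int32 field is a one-byte key plus a varint; and proto3 omits zero-valued fields. The pairSize helper is invented for this illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// varintLen reports how many bytes the base-128 varint encoding of v needs.
func varintLen(v uint64) int {
	var buf [binary.MaxVarintLen64]byte
	return binary.PutUvarint(buf[:], v)
}

// pairSize estimates the bytes that one NumPair adds to a NumPairs message,
// assuming non-negative field values (as produced by rand.Int31).
func pairSize(odd, even int32) int {
	inner := 0
	if odd != 0 { // proto3 does not serialize zero values
		inner += 1 + varintLen(uint64(odd)) // key + value
	}
	if even != 0 {
		inner += 1 + varintLen(uint64(even)) // key + value
	}
	// key of the repeated field + length prefix + embedded message body
	return 1 + varintLen(uint64(inner)) + inner
}

func main() {
	const howMany = 2 * 1000 * 1000
	large := pairSize(math.MaxInt32, math.MaxInt32-1)
	small := pairSize(2047, 2046)
	fmt.Printf("large values: %d bytes per pair, %d bytes in total\n", large, large*howMany)
	fmt.Printf("small values: %d bytes per pair, %d bytes in total\n", small, small*howMany)
}
```

Under these assumptions, a pair of large values needs up to 14 bytes and a pair of values below 2048 at most 8, which is roughly in line with the 27MB and 16MB measured for two million pairs.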
For moderately sized messages that consist of structured data of mixed types, with relatively small integer values, Protobuf has a clear advantage over options such as XML and JSON. In other cases, the data may not be suited to Protobuf encoding. For example, if two applications need to share a huge set of text records or large integer values, then compression rather than encoding is the better choice.