Getting started with gRPC (1): introduction to Protobuf

2025-07-13 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/03 Report--

I. Introduction to Protobuf

1. Introduction to Protobuf

Protobuf (Protocol Buffers) is a language-neutral, platform-neutral mechanism developed by Google for serializing structured data. It is a flexible and efficient serialization protocol.

Compared with XML and JSON, Protobuf is smaller, faster, and more convenient. It ships with a compiler (protoc) that turns a .proto file into Java, Python, C++, C#, Go, or other language code. The generated code already contains the serialization and parsing logic, so it can be used directly without writing any extra code.

You define the structured data to be serialized only once, in a .proto file, and then use the generated source code to read and write that data to and from a variety of data streams. You can even update the data structure definition in the .proto file without breaking programs that rely on the old format.

GitHub address: https://github.com/protocolbuffers/protobuf

Download address of source code version in different languages:

https://github.com/protocolbuffers/protobuf/releases/latest

2. Advantages and disadvantages of Protobuf

The advantages of Protobuf are as follows:

A. Good performance and high efficiency

Serialized data is 3 to 10 times smaller than the equivalent XML, and serialization is 20 to 100 times faster than with XML.

B. Code generation

Operations on structured data are encapsulated in generated classes, which are easy to use.

C. Backward and forward compatibility

When a client and a server share a protocol and one side adds a field to it, the side that still uses the old definition is not affected.

D. Support for multiple programming languages

Protobuf currently supports Java, C++, Python, Go, Ruby, and other languages.

The disadvantages of Protobuf are as follows:

A. The binary format is hard for humans to read

B. Lack of self-description: the data cannot be interpreted without the corresponding .proto definition

II. Protobuf compiler installation

1. C++ version Protobuf compiler installation

Download the C++ version of the Protobuf source code protobuf-cpp-3.6.1.tar.gz

Decompress the Protobuf source code:

tar -zxvf protobuf-cpp-3.6.1.tar.gz

Enter the protobuf-3.6.1 source directory:

cd protobuf-3.6.1

Configure the build:

./configure --prefix=/usr/local/protobuf

Compile:

make

Check, test:

make check

Installation:

sudo make install

Set the environment variables:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/protobuf/lib
export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/protobuf/lib
export PATH=$PATH:/usr/local/protobuf/bin

Check the version number:

protoc --version

2. Using the Protobuf compiler

Protobuf provides the protoc compiler, which generates Java, Python, C++, Ruby, Objective-C, C#, Go, and other language code from .proto definition files.

protoc --proto_path=IMPORT_PATH --cpp_out=DST_DIR --java_out=DST_DIR --python_out=DST_DIR --go_out=DST_DIR --ruby_out=DST_DIR --javanano_out=DST_DIR --objc_out=DST_DIR --csharp_out=DST_DIR path/to/file.proto

(1) Import directory settings

IMPORT_PATH specifies a directory in which to look for .proto files when resolving import directives. If it is omitted, the current directory is used. Multiple import directories can be specified by passing --proto_path several times; they are searched in order. -I=IMPORT_PATH is a short form of --proto_path.

(2) Output language settings

--cpp_out: generate C++ code in the target directory DST_DIR
--java_out: generate Java code in the target directory DST_DIR
--python_out: generate Python code in the target directory DST_DIR
--go_out: generate Go code in the target directory DST_DIR
--ruby_out: generate Ruby code in the target directory DST_DIR
--javanano_out: generate JavaNano code in the target directory DST_DIR
--objc_out: generate Objective-C code in the target directory DST_DIR
--csharp_out: generate C# code in the target directory DST_DIR
--php_out: generate PHP code in the target directory DST_DIR

(3) Specifying the input .proto files

One or more .proto files must be given as input, and several .proto files can be specified at once. Although the file paths are relative to the current directory, each file must reside in one of the IMPORT_PATH directories so that the compiler can determine its canonical name.

(4) Generated language-specific code

When you run the Protobuf compiler on a .proto file, it generates code in the selected language for working with the message types defined in the file, including getting and setting field values, serializing messages to an output stream, and parsing messages from an input stream.

For C++, the compiler generates a .h file and a .cc file for each .proto file, with a class for each message type defined in the file.

For Java, the compiler generates a .java file with a class for each message type, as well as a special Builder class for creating instances of each message class.

For Go, the compiler generates a .pb.go file with a type for each message type in the file.

For Ruby, the compiler generates a .rb file with a Ruby module containing the message classes.

III. Protobuf3 syntax

1. Message definition

In Protobuf, messages are structured data.

message Person {
  string name = 1;
  int32 id = 2;
  string email = 3;
}

The Person message format has three fields, and the data carried in the message corresponds to each field, each of which has a name and a type.

Multiple message types can be defined in a message file .proto, which is useful when defining multiple related messages.

// [START declaration]
syntax = "proto3";
package Company.Person;

import "google/protobuf/timestamp.proto";
// [END declaration]

// [START messages]
message Person {
  string name = 1;
  int32 id = 2;  // Unique ID number for this person.
  string email = 3;
}
// [END messages]

The first non-empty, non-comment line of a .proto file must be the syntax version declaration:

syntax = "proto3";

If you do not use the proto3 version declaration, the Protobuf compiler defaults to the proto2 version.

The name of the Proto message file is as follows:

PackageName.MessageName.proto

PackageName is the package name declared by package

MessageName is the name of the message

2. Add comments

You can add comments using the C/C++/Java-style double slash (//) syntax.

3. Package

An optional package declaration can be added to a .proto file to prevent name clashes between message types. How the package declaration affects the generated code depends on the language:

A. For C++, the generated classes are wrapped in a C++ namespace.

B. For Java, the package is used as the Java package, unless an explicit java_package option is provided in the .proto file.

C. For Go, the package is used as the Go package name, unless an explicit go_package option is provided in the .proto file.

Type name resolution in Protobuf works as in C++: the innermost scope is searched first, then the next one out, and so on, with each package treated as "inner" to its parent package. A name with a leading "." (for example, .Company.Person) is resolved starting from the outermost scope instead.

The Protobuf compiler resolves all type names by parsing the imported .proto files. The code generator for each language knows how to refer to each type in that language, even if the language has different scoping rules.

4. Field type

Field types include scalar types and composite types.

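Scalar types include: double, float, int32, int64, uint32, uint64, sint32, sint64, fixed32, fixed64, sfixed32, sfixed64, bool, string, and bytes.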

Composite types include enumerations or other message types.

5. Identifier

In the message definition, each field has a unique numeric identifier. Identifiers are used to identify fields in the binary format of the message and cannot be changed once used.

The smallest identifier is 1 and the largest is 2^29 - 1 (536870911). The identifiers 19000 to 19999 cannot be used: they are reserved for the Protobuf implementation (FieldDescriptor::kFirstReservedNumber through FieldDescriptor::kLastReservedNumber). If a .proto file uses one of these reserved identifiers, the compiler complains at compile time.
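The range rules above can be expressed as a small check. This is only an illustrative sketch: the function name is_valid_field_number is made up for this example and is not part of any Protobuf API.

```python
# Illustrative only: is_valid_field_number is a hypothetical helper,
# not something protoc or the protobuf runtime provides.
MAX_FIELD_NUMBER = 2**29 - 1          # 536870911
RESERVED_RANGE = range(19000, 20000)  # 19000..19999 inclusive


def is_valid_field_number(number: int) -> bool:
    """Return True if number may be used as a field identifier."""
    return 1 <= number <= MAX_FIELD_NUMBER and number not in RESERVED_RANGE


for candidate in (0, 1, 18999, 19000, 19999, 20000, 2**29 - 1, 2**29):
    print(candidate, is_valid_field_number(candidate))
```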

Identifiers in the range [1, 15] take one byte to encode. Identifiers in the range [16, 2047] take two bytes. Therefore, you should reserve the identifiers in [1, 15] for frequently occurring message elements.
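The one-byte and two-byte boundaries follow from how a field key is encoded: the identifier is shifted left by three bits, combined with a three-bit wire type, and written as a base-128 varint that carries seven payload bits per byte. A minimal sketch of that arithmetic, where the helper name tag_size is an assumption made for this example:

```python
# Sketch of the key-size arithmetic; tag_size is a hypothetical helper.
def tag_size(field_number: int, wire_type: int = 0) -> int:
    """Number of bytes needed to encode the key of a field."""
    key = (field_number << 3) | wire_type
    size = 1
    while key >= 0x80:  # each varint byte holds 7 bits of payload
        key >>= 7
        size += 1
    return size


for number in (1, 15, 16, 2047, 2048):
    print(number, tag_size(number))
```

Identifier 15 gives a key of at most 127, which still fits in seven bits, while identifier 16 gives 128 and needs a second byte.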

6. Field rules

The field modifier for the message must be one of the following:

A. singular: a well-formed message can have zero or one of this field (but not more than one).

B. repeated: in a well-formed message, this field can be repeated any number of times (including zero), and the order of the repeated values is preserved.

In proto3, repeated fields of scalar numeric types use packed encoding by default.

7. Reserved identifiers

If you update a message type by deleting a field or commenting it out, future users may reuse its identifier when they make their own changes to the type. If old versions of the same .proto file are loaded later, this can cause serious problems, including data corruption and privacy bugs. To prevent this, declare the identifiers of deleted fields as reserved (field names can be reserved too, since reusing a name can cause problems with JSON serialization). The Protobuf compiler then complains if a future user tries to use those identifiers.

Do not mix field names and identifiers in the same reserved statement.

message Foo {
  reserved 2, 15, 9 to 11;
  reserved "foo", "bar";
}

8. Default value

When a message is parsed, if the encoded message does not contain a specific singular element, the field corresponding to the parsed object is set to a default value, and the default values for different types are as follows:

For string, the default is an empty string

For bytes, the default is an empty bytes

For bool, the default is false

For numeric types, the default is 0

For enumerations, the default is the first defined enumeration value, which must be 0

For message fields, the field is not set; its exact value depends on the language.

For repeated fields, the default is empty (usually an empty list in the corresponding language).

For scalar fields, once a message has been parsed there is no way to tell whether a field was explicitly set to its default value or not set at all, so keep this in mind when defining message types.
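The reason a parsed message cannot distinguish "set to the default" from "never set" is that proto3 does not write default-valued scalar fields to the wire. The sketch below mimics this with a deliberately tiny hand-written encoder that handles only varint integer fields. The functions encode_int_fields and decode_int_fields are assumptions made for this example and are not the Protobuf library.

```python
# Toy encoder for varint integer fields only; not the protobuf runtime.
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int):
    """Decode one varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_int_fields(fields: dict) -> bytes:
    """Encode {field_number: value}, skipping zero values as proto3 does."""
    out = bytearray()
    for number, value in sorted(fields.items()):
        if value == 0:
            continue  # default value: nothing is written
        out += encode_varint(number << 3)  # wire type 0 (varint)
        out += encode_varint(value)
    return bytes(out)


def decode_int_fields(data: bytes, known_fields) -> dict:
    """Decode varint fields; absent fields come back as the default 0."""
    result = {number: 0 for number in known_fields}
    pos = 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        value, pos = decode_varint(data, pos)
        result[key >> 3] = value
    return result


explicit_zero = encode_int_fields({1: 0, 2: 7})
never_set = encode_int_fields({2: 7})
print(explicit_zero == never_set)
print(decode_int_fields(never_set, known_fields=(1, 2)))
```

Both inputs produce the same bytes, so the receiver sees field 1 as 0 in either case.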

9. Enumerations

When defining a message type, you may want a field to take only one value from a predefined list. Enumerations define such a list. For example, you can add a PhoneType field to the Person message whose value can be MOBILE, HOME, or WORK.

message Person {
  string name = 1;
  int32 id = 2;  // Unique ID number for this person.
  string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    string number = 1;
    PhoneType type = 2;
  }

  repeated PhoneNumber phones = 4;

  google.protobuf.Timestamp last_updated = 5;
}

Every enumeration must map its first constant to 0.

You can assign the same value to different enumeration constants by setting the allow_alias option to true; otherwise the compiler reports an error where it finds an alias.

enum EnumAllowingAlias {
  option allow_alias = true;
  UNKNOWN = 0;
  STARTED = 1;
  RUNNING = 1;
}

enum EnumNotAllowingAlias {
  UNKNOWN = 0;
  STARTED = 1;
  // RUNNING = 1;  // Not allowed: allow_alias is not set in EnumNotAllowingAlias.
}

Enumeration constants must be within the range of a 32-bit integer. Because enum values are varint-encoded on the wire, negative values are inefficient and therefore not recommended.
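To see why negative values are costly, recall that a varint stores seven bits per byte and that a negative value is sign-extended to 64 bits before encoding. The following sketch counts the bytes. The helper name varint_size is an assumption for this example.

```python
# Sketch: byte count of a varint; varint_size is a hypothetical helper.
def varint_size(value: int) -> int:
    """Bytes needed to varint-encode value as a 64-bit two's complement number."""
    value &= 0xFFFFFFFFFFFFFFFF  # negative numbers become large unsigned ones
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


for constant in (0, 1, 127, 128, -1):
    print(constant, varint_size(constant))
```

A small positive constant fits in a single byte, whereas -1 occupies all 64 bits and therefore needs ten bytes.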

Enumerations can be defined inside or outside a message definition, and enumerations can be reused in any message definition in a .proto file. You can declare an enumeration type in one message and use enumerations in a different message (in the syntax format of MessageType.EnumType).

When you run the Protobuf compiler on a .proto file that uses an enumeration, the generated code contains a corresponding enum for Java or C++, or, for Python, a special EnumDescriptor class that is used to create a set of symbolic constants with integer values in the runtime-generated class.

During deserialization, unrecognized enum values are preserved in the message, although how they are represented depends on the language. In languages that support open enum types with values outside the range of the specified constants (such as C++ and Go), the unrecognized value is stored as its underlying integer. In languages with closed enum types (such as Java), a dedicated case of the enum represents the unrecognized value, and the underlying integer can be read through special accessors. In either case, if the message is serialized again, the unrecognized value is still serialized with it.

10. Reference other message types

You can use other message types as field types. For messages defined within the same message file, message types can be referenced directly within other messages; for message types defined in other message files, the corresponding message types can be used by importing definitions from other message files. If you use the google.protobuf.Timestamp message type, you need to import the corresponding message file:

import "google/protobuf/timestamp.proto";

If you want to reuse the message type outside the parent message type, you need to use it in the form of Parent.Type.

11. Any type

The Any message type lets you use a message as an embedded type without having its .proto definition. An Any contains an arbitrary serialized message as bytes, together with a URL that acts as a globally unique identifier for that message's type and resolves to it.

To use the Any type, you need to import google/protobuf/any.proto.

import "google/protobuf/any.proto";

message ErrorStatus {
  string message = 1;
  repeated google.protobuf.Any details = 2;
}

The default type URL for a given message type is type.googleapis.com/packagename.messagename.

The implementations for the different languages provide runtime library helpers to pack and unpack Any values in a type-safe manner. For example, in Java the Any type has special pack() and unpack() accessors, and in C++ there are PackFrom() and UnpackTo() methods.

12. Oneof

A oneof definition states that at most one field in the group can be set at any given time.

message SampleMessage {
  oneof test_oneof {
    string name = 4;
    SubMessage sub_message = 9;
  }
}

In the definition above, either name or sub_message can be set, but not both, and repeated fields cannot appear inside a oneof. If several fields of the same oneof are set one after another, only the last one takes effect and the earlier ones are cleared.
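The "last one set wins" behavior can be modeled in a few lines. The class below is a hypothetical stand-in written for this example. It is not the code that protoc generates, and its method names (set, get, which_oneof) are assumptions.

```python
# Hypothetical model of oneof semantics; not protoc-generated code.
class SampleMessageSketch:
    """Holds at most one of the members of test_oneof at a time."""

    _MEMBERS = ("name", "sub_message")

    def __init__(self):
        self._case = None   # name of the member currently set, if any
        self._value = None

    def set(self, field: str, value) -> None:
        if field not in self._MEMBERS:
            raise KeyError(field)
        # Setting one member implicitly clears whichever was set before.
        self._case, self._value = field, value

    def which_oneof(self):
        return self._case

    def get(self, field: str):
        return self._value if field == self._case else None


message = SampleMessageSketch()
message.set("name", "first")
message.set("sub_message", {"id": 1})
print(message.which_oneof(), message.get("name"))
```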

13. Map

To create an associative map, Protobuf provides a shortcut syntax:

map<key_type, value_type> map_field = N;

Here key_type can be any integral or string type (that is, any scalar type except floating-point types and bytes), and value_type can be any type except another map.

For example, to create a map of projects in which each Project message is keyed by a string:

map<string, Project> projects = 3;

Map fields cannot be repeated.

The wire-format ordering and the iteration ordering of map values are undefined, so do not rely on map items being in any particular order.

When text format is generated for a message, maps are sorted by key, and numeric keys are sorted numerically.

When parsing from the wire format or when merging, if there are duplicate map keys, the last key seen is used. When parsing a map from text format, parsing may fail if there are duplicate keys.
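As a rough illustration of duplicate-key handling on the wire, assume the documented behavior in which the last value seen for a key wins. A map arrives as a sequence of key/value entries, and a parser can fold them into a dictionary in arrival order. The function merge_map_entries is a hypothetical name used only for this sketch.

```python
# Sketch of last-key-wins merging; merge_map_entries is a hypothetical helper.
def merge_map_entries(entries):
    """Fold (key, value) entries into a dict; later duplicates overwrite earlier ones."""
    result = {}
    for key, value in entries:
        result[key] = value
    return result


wire_entries = [("alpha", 1), ("beta", 2), ("alpha", 3)]
print(merge_map_entries(wire_entries))
```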

Backward compatibility

On the wire, the map syntax is equivalent to the following, so Protobuf implementations that do not support maps can still process the data:

message MapFieldEntry {
  key_type key = 1;
  value_type value = 2;
}

repeated MapFieldEntry map_field = N;

14. Define the service

If you want to use your message types with an RPC (remote procedure call) system, you can define an RPC service interface in the .proto file, and the Protobuf compiler generates the service interface code and stubs in the language of your choice. For example, to define an RPC service with a Search method that takes a SearchRequest and returns a SearchResponse, write the following in the .proto file:

service SearchService {
  rpc Search (SearchRequest) returns (SearchResponse);
}

The most straightforward RPC system to use with Protobuf is gRPC, a language-neutral, platform-neutral open source RPC system developed by Google. gRPC works particularly well with Protobuf: with a special protoc plug-in, you can generate the relevant RPC code directly from .proto files.

If you don't want to use gRPC, you can use Protobuf with your own RPC implementation.

15. JSON mapping

Proto3 defines a canonical JSON encoding, which makes it easy to share data between different systems.

If a value is missing from the JSON-encoded data, or if it is null, it is interpreted as the default value when parsed into Protobuf. If a field holds the default value in Protobuf, it is omitted from the JSON-encoded data by default to save space.
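The two rules (defaults are dropped on output, and missing or null values become defaults on input) can be imitated with the standard json module. This is only a sketch of the idea using a plain dictionary. The DEFAULTS table and the to_json and from_json helpers are assumptions for this example, not the Protobuf JSON API.

```python
import json

# Hypothetical defaults for the three Person fields used in this article.
DEFAULTS = {"name": "", "id": 0, "email": ""}


def to_json(message: dict) -> str:
    """Serialize, leaving out every field that holds its default value."""
    visible = {k: v for k, v in message.items() if v != DEFAULTS[k]}
    return json.dumps(visible, sort_keys=True)


def from_json(text: str) -> dict:
    """Parse, treating missing or null values as the default."""
    message = dict(DEFAULTS)
    for key, value in json.loads(text).items():
        if key in DEFAULTS and value is not None:
            message[key] = value
    return message


print(to_json({"name": "Ann", "id": 0, "email": ""}))
print(from_json('{"name": "Ann", "id": null}'))
```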

16. Update message types

If an existing message type no longer meets your needs, for example because the message needs an extra field, but you still want to use code written against the old format, you can update the message type without breaking any existing code. The rules for updating messages are as follows:

A. Do not change the identifiers of any existing fields.

B. If you add new fields, messages serialized by code using the old format can still be parsed by the newly generated code. Keep the default values of the new fields in mind so that the new code can interact properly with data generated by the old code. Messages generated by the new code can also be parsed by the old code, which simply ignores the new fields. In early proto3 releases unrecognized fields were discarded during deserialization, so a message passed through old code lost the new fields; since version 3.5 unrecognized fields are preserved and serialized again.

C. Fields can be removed, as long as the identifier is not used again in the updated message type. You may want to rename the field instead (for example by adding the prefix "OBSOLETE_") or reserve the identifier.

D. int32, uint32, int64, uint64, and bool are all compatible and can be converted to each other without breaking forward or backward compatibility.

E. sint32 and sint64 are compatible with each other, but not with the other integer types.

F. string and bytes are compatible, as long as the bytes are valid UTF-8.

G. Embedded messages are compatible with bytes, as long as the bytes contain an encoded version of the message.

H. fixed32 is compatible with sfixed32, and fixed64 is compatible with sfixed64.

I. Enumerated types are compatible with int32, uint32, int64, and uint64 (note that values are truncated if they do not fit), but client code may treat them differently after deserialization: for example, unrecognized proto3 enum values are preserved in the message, but how they are represented depends on the language. Fields of an int type always keep their value.

J. You can add new singular or repeated fields, but they must use new identifiers (identifiers that have never been used in the message, including by fields that have since been deleted).
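The truncation mentioned for compatible integer types behaves like a C++ cast to the narrower type: only the low 32 bits survive. A minimal sketch follows, in which read_as_int32 is a hypothetical helper that imitates what happens when a value written as int64 is read through an int32 field.

```python
# Sketch of 64-bit to 32-bit truncation; read_as_int32 is a hypothetical helper.
def read_as_int32(value: int) -> int:
    """Keep the low 32 bits of value and reinterpret them as a signed int32."""
    low = value & 0xFFFFFFFF
    return low - (1 << 32) if low >= (1 << 31) else low


for written in (7, 2**31, 2**32 + 5):
    print(written, read_as_int32(written))
```

Values that fit in 32 bits are unchanged, while larger ones silently wrap, which is why widening a field is safer than narrowing it.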

17. Options

Individual declarations in a .proto file can be annotated with a number of options. Options do not change the overall meaning of a declaration, but they may affect how it is handled in a particular context. The complete list of available options is defined in google/protobuf/descriptor.proto.

Some options are file-level, meaning that they are written at the top-level scope, not inside any message, enum, or service definition. Some options are message-level, meaning that they are written inside message definitions. Options can also be written on fields, enum types, enum values, service types, and service methods, although no useful options currently exist for any of these.

optimize_for (file option): can be set to SPEED, CODE_SIZE, or LITE_RUNTIME. These values affect the C++ and Java code generators in the following ways:

SPEED (default): the Protobuf compiler generates code for serializing, parsing, and performing other common operations on the message types. This code is highly optimized.

CODE_SIZE: the Protobuf compiler generates minimal classes and relies on shared, reflection-based code to implement serialization, parsing, and various other operations. The generated code is much smaller than with SPEED, but operations are slower. The classes still expose exactly the same public API as in SPEED mode. This mode is most useful in applications that contain a very large number of .proto files and do not need all of them to be as fast as possible.

LITE_RUNTIME: the Protobuf compiler generates classes that depend only on the "lite" runtime library (libprotobuf-lite instead of libprotobuf). The lite runtime is much smaller than the full library because it omits features such as descriptors and reflection. This mode is particularly useful on constrained platforms such as mobile phones. The compiler still generates fast implementations of all methods, as in SPEED mode. The generated classes implement only the MessageLite interface, which provides a subset of the methods of the full Message interface.

option optimize_for = CODE_SIZE;

cc_enable_arenas (file option): enables arena allocation for the generated C++ code.

objc_class_prefix (file option): sets the Objective-C class prefix, which is prepended to all classes and enumerated types generated for Objective-C from this .proto file. There is no default value. As recommended by Apple, use a prefix of 3 to 5 uppercase characters. Note that all 2-letter prefixes are reserved by Apple.

deprecated (field option): if set to true, the field is deprecated and should not be used by new code. In most languages this has no actual effect.

int32 old_field = 6 [deprecated = true];

java_package (file option): specifies the package for the generated Java classes. If java_package is not explicitly declared in the .proto file, the package declared with the package keyword is used by default. This option has no effect if no Java code is generated.

java_outer_classname (file option): specifies the name of the outermost generated Java class (and hence the file name). If java_outer_classname is not explicitly declared in the .proto file, the class name is derived by converting the .proto file name to camel case (for example, foo_bar.proto becomes FooBar.java). This option has no effect if no Java code is generated.

IV. Proto file coding specification

The coding conventions for .proto files are as follows:

A. Description files use .proto as the file suffix.

B. Every statement ends with a semicolon, except structure definitions (message, service, and enum). An rpc method definition ends with either a semicolon or an options block in braces.

C. Message names use CamelCase; field names use lowercase letters separated by underscores.

D. Enum type names use CamelCase; enum value names use uppercase letters separated by underscores.

E. Service names and rpc method names use CamelCase.
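The naming rules above are regular enough to check mechanically. The sketch below expresses them as regular expressions. The pattern names are invented for this example, and real style checkers may apply stricter or different rules.

```python
import re

# Hypothetical patterns for the naming conventions listed above.
CAMEL_CASE = re.compile(r"^[A-Z][A-Za-z0-9]*$")                    # messages, enums, services, rpc methods
LOWER_SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")    # field names
UPPER_SNAKE_CASE = re.compile(r"^[A-Z][A-Z0-9]*(_[A-Z0-9]+)*$")    # enum value names

checks = [
    ("SearchRequest", CAMEL_CASE),
    ("last_updated", LOWER_SNAKE_CASE),
    ("MOBILE", UPPER_SNAKE_CASE),
    ("lastUpdated", LOWER_SNAKE_CASE),  # violates the field naming rule
]
for name, pattern in checks:
    print(name, bool(pattern.match(name)))
```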
