2025-01-18 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article is a tutorial on Git's three transport protocols (HTTP, Git, and SSH) and on how to implement a simple server for each of them.
HTTP transport protocol
Git's HTTP transport comes in two variants: the dumb protocol and the smart protocol. The smart protocol is the one commonly used by Git hosting services.
Official document: https://github.com/git/git/blob/master/Documentation/technical/http-protocol.txt
HTTP Dumb Protocol
Before Git 1.6.6, only the dumb protocol was available. It needs nothing more than a standard static HTTP file service: as long as files can be downloaded, the Git client traverses the repository and pulls the files it needs on its own.
With either the dumb or the smart protocol, a Fetch over HTTP always starts by requesting the info/refs file, which lives in the bare repository directory. In a repository you have already cloned, the file is at .git/info/refs under the repository root. It usually does not exist, however; you have to run git update-server-info in the appropriate directory to generate it:
➜ .git git:(master) cat info/refs
21f45f60fa582d085497fb2d3bb50163e59891ee	refs/heads/history
ef8021acf4c29eb35e3084b7dc543c173d67ad2a	refs/heads/master
The file lists the object ID that each reference on the server currently points to. After fetching it, the client compares these references with its local ones and downloads the missing object files over HTTP.
Tip 1: for more information on the Git storage format, see https://github.com/git/git/blob/master/Documentation/gitrepository-layout.txt; it will be introduced in a later article.
Tip 2: do you have to run update-server-info by hand after every update? No. You can configure a post-receive hook on the Git server to run it automatically.
A Clone over the dumb protocol therefore proceeds as follows (U: user, C: client, S: server):
U: git clone https://gitee.com/kesin/taskover.git
C: GET https://gitee.com/kesin/taskover.git/info/refs
S: Response with taskover.git/info/refs
C: GET https://gitee.com/kesin/taskover.git/HEAD (default branch)
S: Response with taskover.git/HEAD
C: GET https://gitee.com/kesin/taskover.git/objects/ef/8021acf4c29eb35e3084b7dc543c173d67ad2a
C: Starts traversing objects, finds those that are missing locally, and fetches them from the server. If an object cannot be fetched directly, the client looks for it in the pack files, until everything has been retrieved.
C: Checks out the working tree locally according to the default branch in HEAD.
The addresses above are for demonstration only. Gitee actually supports only the smart protocol, not the dumb one; after all, the dumb protocol is not secure enough for a public cloud service. How objects are traversed is not covered here and will be described in a later article.
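The two lookups in the flow above (reading info/refs, then requesting a loose object by its ID) can be sketched in Go. This is only an illustration under stated assumptions: the function names parseInfoRefs and looseObjectURL are hypothetical, and the sketch ignores pack files entirely.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseInfoRefs parses the body of a dumb-protocol info/refs file, where
// each line is "<object id><whitespace><ref name>", into a map keyed by
// ref name. (Hypothetical helper, not part of Git or of the article's code.)
func parseInfoRefs(body string) map[string]string {
	refs := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			continue // skip blank or malformed lines
		}
		refs[fields[1]] = fields[0]
	}
	return refs
}

// looseObjectURL builds the URL of a loose object: the first two hex
// characters of the ID name the directory, the rest name the file.
// It returns "" when the ID is too short to split.
func looseObjectURL(repoURL, id string) string {
	if len(id) < 3 {
		return ""
	}
	return fmt.Sprintf("%s/objects/%s/%s", strings.TrimSuffix(repoURL, "/"), id[:2], id[2:])
}

func main() {
	body := "21f45f60fa582d085497fb2d3bb50163e59891ee\trefs/heads/history\n" +
		"ef8021acf4c29eb35e3084b7dc543c173d67ad2a\trefs/heads/master\n"
	refs := parseInfoRefs(body)
	fmt.Println(looseObjectURL("https://gitee.com/kesin/taskover.git", refs["refs/heads/master"]))
}
```

With the sample refs shown earlier, the URL printed for refs/heads/master is the same objects/ef/8021... address that appears in the Clone flow.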
The dumb protocol is very simple to implement. With nginx, you only need to configure a static file server and put the Git repositories in a directory it serves. You can also use Go to quickly implement a simple Git HTTP dumb server:
// From: https://gitee.com/kesin/go-git-protocols/blob/master/http-dumb/git-http-dumb.go
// Usage: ./http-dumb -repo=/xxx/xxx/xxx/ -port=8881
package main

import (
	"flag"
	"fmt"
	"net/http"
)

func main() {
	repo := flag.String("repo", "/Users/zoker/Tmp/repositories", "Specify a repositories root dir.")
	port := flag.String("port", "8881", "Specify a port to start process.")
	flag.Parse()

	// serve the repositories root as plain static files
	http.Handle("/", http.FileServer(http.Dir(*repo)))
	fmt.Printf("Dumb http server start at port %s on dir %s \n", *port, *repo)
	_ = http.ListenAndServe(fmt.Sprintf(":%s", *port), nil)
}

HTTP Smart Protocol
The biggest difference between the smart and the dumb protocol is where the initiative lies. With the dumb protocol, the client has to name the network address of every file it wants and reaches its goal through many separate downloads. With the smart protocol, the initiative lies with the server: the info/refs response is generated dynamically, and the server determines the minimal set of objects the client needs from the parameters the client sends. The server then packs those objects and sends them to the client, which unpacks them to get the data it wants.
By capturing the traffic on the corresponding port, we can see that the client sends two requests during the whole process:
Reference Discovery GET https://gitee.com/kesin/taskover/info/refs?service=git-{upload|receive}-pack
Data transmission POST https://gitee.com/kesin/taskover/git-{upload|receive}-pack
The Git HTTP protocol requires that both download and upload operations must first perform reference discovery, that is, you need to know the version information of each reference on the server, so that the server or client can know the difference between the two parties and what kind of data they need.
1. Reference discovery
Unlike the dumb protocol, the server side of the smart protocol is dynamic and can tailor the reference information it returns. You can decide what the client is allowed to see according to your own business needs. By capturing packets, we can see the request sent by the client and the format of the reference information returned by the Gitee server:
# Request
GET http://git.oschina.net/kesin/getingblog.git/info/refs?service=git-upload-pack HTTP/1.1
Host: git.oschina.net
User-Agent: git/2.24.3 (Apple Git-128)
Accept-Encoding: deflate, gzip
Proxy-Connection: Keep-Alive
Pragma: no-cache

# Gitee response
HTTP/1.1 200 OK
Cache-Control: no-cache, max-age=0, must-revalidate
Connection: keep-alive
Content-Type: application/x-git-upload-pack-advertisement
Expires: Fri, 01 Jan 1980 00:00:00 GMT
Pragma: no-cache
Server: nginx
X-Frame-Options: DENY
X-Gitee-Server: Brzox/3.2.3
X-Request-Id: 96e0af82-dffe-4352-9fa5-92f652ed39c7
Transfer-Encoding: chunked

001e# service=git-upload-pack
0000010fca6ce400113082241c1f45daa513fabacc66a20d HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed no-done symref=HEAD:refs/heads/testbody object-format=sha1 agent=git/2.29.2
003c351bad7fdb498c9634442f0c3f60396e8b92f4fb refs/heads/dev
004092ad3c48e627782980f82b0a8b05a1a5221d8b74 refs/heads/dev-pro
0040ae747d0a0094af3d27ee86c33e645139728b2a9a refs/heads/develop
0000
The information we care about is in the headers and the body. Only a brief introduction is given here; for details, see the http-protocol.txt document mentioned above. The headers follow some conventions:
Cache-Control must disable caching; otherwise the client may not see the latest commits.
Content-Type must be application/x-$servicename-advertisement; otherwise the client falls back to the dumb protocol.
The client must check the returned status code. If it is 401, it prompts for a user name and password.
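The client-side decision described by these conventions can be sketched as a small Go function. This is an illustrative simplification, not how the Git client is actually written; the name classifyDiscovery and its string results are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
)

// classifyDiscovery decides how a client might treat a reference discovery
// response, given its status code, its Content-Type header, and the service
// that was requested (for example "git-upload-pack").
// The returned labels are made up for this sketch.
func classifyDiscovery(status int, contentType, service string) string {
	switch {
	case status == http.StatusUnauthorized:
		return "auth" // prompt for user name and password
	case status != http.StatusOK:
		return "error"
	case contentType == fmt.Sprintf("application/x-%s-advertisement", service):
		return "smart"
	default:
		return "dumb" // any other content type: fall back to the dumb protocol
	}
}

func main() {
	fmt.Println(classifyDiscovery(200, "application/x-git-upload-pack-advertisement", "git-upload-pack"))
	fmt.Println(classifyDiscovery(200, "text/plain", "git-upload-pack"))
	fmt.Println(classifyDiscovery(401, "", "git-upload-pack"))
}
```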
In addition, the returned body differs from the info/refs content used by the dumb protocol. It uses the format agreed on by the smart protocol, from which the client learns the supported capabilities and verifies the response. The data is in pkt-line format:
The client must verify that the first five characters match the regular expression ^[0-9a-f]{4}#, where the four hex digits give the length of the line, including the four digits themselves.
The client must verify that the first line is # service=$servicename.
The server must make sure that every line ends with an LF newline character.
The server must end the response with the flush packet 0000.
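To make the pkt-line rules above concrete, here is a minimal Go sketch of encoding and decoding. The helper names (encodePktLine, readPktLine, splitPktLines) are hypothetical, and the sketch covers only length-prefixed lines and the 0000 flush packet.

```go
package main

import (
	"fmt"
	"io"
	"strconv"
	"strings"
)

// encodePktLine prefixes payload with its total length (payload plus the
// 4-byte prefix itself), written as four lowercase hex digits.
func encodePktLine(payload string) string {
	return fmt.Sprintf("%04x%s", len(payload)+4, payload)
}

// readPktLine reads one pkt-line from r. The boolean result is true for
// the special "0000" flush packet, which carries no payload.
func readPktLine(r io.Reader) (string, bool, error) {
	head := make([]byte, 4)
	if _, err := io.ReadFull(r, head); err != nil {
		return "", false, err
	}
	n, err := strconv.ParseUint(string(head), 16, 16)
	if err != nil {
		return "", false, fmt.Errorf("bad pkt-line length %q: %w", head, err)
	}
	if n == 0 {
		return "", true, nil
	}
	if n < 4 {
		return "", false, fmt.Errorf("invalid pkt-line length %d", n)
	}
	data := make([]byte, n-4)
	if _, err := io.ReadFull(r, data); err != nil {
		return "", false, err
	}
	return string(data), false, nil
}

// splitPktLines decodes pkt-lines from data until the first flush packet
// or the end of the input.
func splitPktLines(data string) ([]string, error) {
	r := strings.NewReader(data)
	var lines []string
	for {
		payload, flush, err := readPktLine(r)
		if err == io.EOF || flush {
			return lines, nil
		}
		if err != nil {
			return nil, err
		}
		lines = append(lines, payload)
	}
}

func main() {
	first := encodePktLine("# service=git-upload-pack\n")
	fmt.Printf("%q\n", first)
	lines, err := splitPktLines(first + "0000")
	fmt.Printf("%q %v\n", lines, err)
}
```

The encoded first line starts with 001e, which matches the first line of the Gitee response captured above.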
The HEAD reference is followed by a list of server capabilities, which tells the client what the server supports, such as exchanging data in multi_ack mode; they are not discussed here. After that comes the information for each reference, and the first four characters of each line are the length of that line.
When introducing the dumb protocol, we used the info/refs file generated by git update-server-info. That file cannot be used directly in the smart protocol, because it is not in pkt-line format. Git therefore provides another way: the git upload-pack command can output the latest reference information directly in pkt-line format. Its supported options are:
--[no-]strict
    Do not try <directory>/.git/ if <directory> is no Git directory.
--timeout=<n>
    Interrupt transfer after <n> seconds of inactivity.
--stateless-rpc
    Perform only a single read-write cycle with stdin and stdout. This fits with the HTTP POST request processing model where a program may read the request, write a response, and must exit.
--advertise-refs
    Only the initial ref advertisement is output, and the program exits immediately. This fits with the HTTP GET request model, where no request content is received but a response must be produced.
<directory>
    The repository to sync from.
upload-pack is the server-side program that sends objects to the client. Its --stateless-rpc and --advertise-refs options let us print the current reference state and exit immediately. Running it in the server's bare repository directory gives us the latest reference information directly:
➜ .git git:(master) git upload-pack --stateless-rpc --advertise-refs .
010aef8021acf4c29eb35e3084b7dc543c173d67ad2a HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed no-done symref=HEAD:refs/heads/master agent=git/2.24.3.(Apple.Git-128)
003fef8021acf4c29eb35e3084b7dc543c173d67ad2a refs/heads/master
0000%
Does this output look familiar? It has the same format as the reference data returned by Gitee in the packet capture above, except that the first line, # service=git-upload-pack, is missing. The approach is now clear: we can first implement the server-side handling of reference discovery. By parsing the request parameters we get the repository name and the requested service, and from those we can assemble the response in the format the client expects:
// Reference discovery handler.
// Assumes a gorilla/mux route with a {repo} variable and a package-level repoRoot flag.
func handleRefs(w http.ResponseWriter, r *http.Request) {
	vars := mux.Vars(r)
	repoName := vars["repo"]
	repoPath := fmt.Sprintf("%s%s", *repoRoot, repoName)
	service := r.FormValue("service")
	// only the two smart-protocol services are valid; this also keeps service[4:] safe
	if service != "git-upload-pack" && service != "git-receive-pack" {
		http.Error(w, "Not allowed service.", http.StatusForbidden)
		return
	}
	pFirst := fmt.Sprintf("# service=%s\n", service) // this example only deals with protocol v1
	handleRefsHeader(&w, service)                    // set response headers

	cmdRefs := exec.Command("git", service[4:], "--stateless-rpc", "--advertise-refs", repoPath)
	refsBytes, _ := cmdRefs.Output() // get pkt-line data
	// prepend the "# service=..." pkt-line and a flush packet to the advertisement
	responseBody := fmt.Sprintf("%04x# service=%s\n0000%s", len(pFirst)+4, service, string(refsBytes))
	_, _ = w.Write([]byte(responseBody))
}

// set Headers
func handleRefsHeader(w *http.ResponseWriter, service string) {
	cType := fmt.Sprintf("application/x-%s-advertisement", service)
	(*w).Header().Add("Content-Type", cType)
	(*w).Header().Set("Expires", "Fri, 01 Jan 1980 00:00:00 GMT")
	(*w).Header().Set("Pragma", "no-cache")
	(*w).Header().Set("Cache-Control", "no-cache, max-age=0, must-revalidate")
}
As mentioned above, both pull and push have to go through reference discovery first. The only difference between upload-pack and receive-pack here is the command that gets called, and handleRefs already handles both, so it is not discussed further.
2. Data transmission
Data transmission covers two directions: client to server (Push) and server to client (Fetch). The difference between the two is:
In a Fetch, after reference discovery the client works out which data it wants and POSTs that request to the server in pkt-line format. The server computes the required objects, builds a pack, and sends it to the client as the POST response. The client unpacks it and updates its references.
In a Push, after obtaining the server's reference list the client works out locally which data the server is missing, packs it, and POSTs it to the server. The server unpacks it and updates its references.
The Fetch operation actually uses the upload-pack program mentioned above, which sends objects to the client. To implement pull, we only need to start git upload-pack --stateless-rpc on the server. The command blocks while it waits for a series of parameters; these arrive in the client's second request, and we pass them to the command. Git then computes and packs the minimal set of objects the client needs and returns the pack as a stream. We just need to send that stream to the client as the response to the POST request.
So what exactly does the client send in the second (POST) request of a Fetch? Let's capture a packet and analyze it:
POST http://gitee.com/kesin/bigRepo/git-upload-pack HTTP/1.1
Host: gitee.com
User-Agent: git/2.24.3 (Apple Git-128)
Accept-Encoding: deflate, gzip
Proxy-Connection: Keep-Alive
Content-Type: application/x-git-upload-pack-request
Accept: application/x-git-upload-pack-result
Content-Length: 443

00b4want bee4d57e3adaddf355315edf2c046db33aa299e8 multi_ack_detailed no-done side-band-64k thin-pack include-tag ofs-delta deepen-since deepen-not agent=git/2.24.3.(Apple.Git-128)
0000
0032have 82a8768e7fd48f76772628d5a55475c51ea4fa2f
0032have 4f7a2ea0920751a5501debbbc1debc403b46d7a0
0032have 7c141974a30bd218d4257d4292890a9008d30111
0032have f6bb00364bd5000a45185b9b16c028f485e842db
0032have 47b7bd17fcb7de646cf49a26efb43c7b708498f3
0009done
After receiving the reference discovery response, the client computes the data for its second request from the capability list and the refs list provided by the server. For example, it picks the capabilities to use for the rest of the exchange from the server's capability list, so the server must support every capability the client asks for. The data sent by the client must also contain at least one want line. When we Clone a repository, the request contains only want lines and no have lines, because nothing exists locally; when we Fetch into a repository that already has data, have lines are included. From the returned reference information the client works out the commits it needs, the common commits, and the commits the server does not have, and sends all of this to the server in the second request. For more on the client's negotiation process, see http-protocol.txt.
After receiving the data, the server first checks that the objects named by the want lines can be found among its references. If there is no want line, or a named object does not exist on the server, it returns an error message to the client. Otherwise the server uses this information to compute the set of objects the client needs, packs them, and sends the pack back. The client then unpacks it and updates its references.
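As a rough illustration of the request body described above, the following Go sketch assembles want, have, and done lines in pkt-line format. It is a simplification under stated assumptions: the function names are hypothetical, everything is sent in a single round as in the capture above, and real clients negotiate in more elaborate ways.

```go
package main

import (
	"fmt"
	"strings"
)

// pktLine prefixes payload with its length (including the 4-byte prefix)
// as four hex digits.
func pktLine(payload string) string {
	return fmt.Sprintf("%04x%s", len(payload)+4, payload)
}

// buildUploadPackRequest assembles a single-round upload-pack request:
// want lines (capabilities ride on the first one), a flush packet,
// optional have lines, and a final "done".
// Hypothetical helper for illustration only.
func buildUploadPackRequest(wants, haves, caps []string) string {
	var b strings.Builder
	for i, w := range wants {
		if i == 0 && len(caps) > 0 {
			b.WriteString(pktLine("want " + w + " " + strings.Join(caps, " ") + "\n"))
			continue
		}
		b.WriteString(pktLine("want " + w + "\n"))
	}
	b.WriteString("0000") // flush packet ends the want list
	for _, h := range haves {
		b.WriteString(pktLine("have " + h + "\n"))
	}
	b.WriteString(pktLine("done\n"))
	return b.String()
}

func main() {
	// A clone has nothing locally, so it sends only want lines.
	body := buildUploadPackRequest(
		[]string{"bee4d57e3adaddf355315edf2c046db33aa299e8"},
		nil,
		[]string{"side-band-64k", "ofs-delta"},
	)
	fmt.Printf("%q\n", body)
}
```

Feeding this function the want, have, and capability values from the captured request yields a body of 443 bytes, the same as the Content-Length in the capture.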
The Push operation is much the same, except that in the second step the client works out which objects the server needs from the server's reference information and sends them directly in the POST request. The request also carries command lines that say which references are created, deleted, or updated, together with the IDs before and after the update, in the following format:
// https://github.com/git/git/blob/master/Documentation/technical/http-protocol.txt#L474
C: POST $GIT_URL/git-receive-pack HTTP/1.0
C: Content-Type: application/x-git-receive-pack-request
C:
C: ....0a53e9ddeaddad63ad106860237bbf53411d11a7 441b40d833fdfa93eb2908e52742248faf0ee993 refs/heads/maint\0 report-status
C: 0000
C: PACK....
The pack data that follows the commands starts with the signature PACK. After receiving the request, the server starts the receive-pack command and feeds the data to it through a pipe.
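The command line shown in the push request can be parsed along the following lines. This Go sketch is illustrative only: the type and function names are hypothetical, it assumes SHA-1 object IDs (so an all-zero ID is 40 zeros), and it relies on the pack protocol's convention that an all-zero old ID means a reference is being created and an all-zero new ID means it is being deleted. It needs Go 1.18 or later for strings.Cut.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// zeroID is the all-zero object ID, assuming SHA-1 (40 hex digits).
const zeroID = "0000000000000000000000000000000000000000"

// refUpdate is one "<old id> <new id> <ref>" command from a push request.
type refUpdate struct {
	Old, New, Ref string
	Caps          []string // capabilities, present only on the first command
}

// parseRefUpdate parses the payload of one command pkt-line (without the
// 4-byte length prefix). Capabilities follow a NUL byte after the ref name.
func parseRefUpdate(payload string) (refUpdate, error) {
	payload = strings.TrimSuffix(payload, "\n")
	cmd, caps, _ := strings.Cut(payload, "\x00")
	parts := strings.SplitN(cmd, " ", 3)
	if len(parts) != 3 {
		return refUpdate{}, errors.New("malformed ref update command")
	}
	u := refUpdate{Old: parts[0], New: parts[1], Ref: parts[2]}
	if caps != "" {
		u.Caps = strings.Fields(caps)
	}
	return u, nil
}

// Kind classifies the update as "create", "delete", or "update".
func (u refUpdate) Kind() string {
	switch {
	case u.Old == zeroID:
		return "create"
	case u.New == zeroID:
		return "delete"
	default:
		return "update"
	}
}

// hasPackSignature reports whether data begins with the "PACK" signature.
func hasPackSignature(data []byte) bool {
	return len(data) >= 4 && string(data[:4]) == "PACK"
}

func main() {
	u, err := parseRefUpdate("0a53e9ddeaddad63ad106860237bbf53411d11a7 " +
		"441b40d833fdfa93eb2908e52742248faf0ee993 refs/heads/maint\x00 report-status")
	fmt.Println(u.Ref, u.Kind(), u.Caps, err)
	fmt.Println(hasPackSignature([]byte("PACK....")))
}
```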
The whole data transmission step therefore comes down to the client and the server exchanging data in the agreed format through upload-pack and receive-pack. Following this idea, we can extend our smart Git HTTP server to handle the second step:
// Assumes a gorilla/mux route with {repo} and {service} variables and a package-level repoRoot flag.
func processPack(w http.ResponseWriter, r *http.Request) {
	vars := mux.Vars(r)
	repoName := vars["repo"]
	// request repo not end with .git is supported with upload-pack
	repoPath := fmt.Sprintf("%s%s", *repoRoot, repoName)
	service := vars["service"]
	handlePackHeader(&w, service)

	// start a process; data is exchanged through standard input and output
	cmdPack := exec.Command("git", service[4:], "--stateless-rpc", repoPath)
	cmdStdin, err := cmdPack.StdinPipe()
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	cmdStdout, err := cmdPack.StdoutPipe()
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	if err = cmdPack.Start(); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}

	// client and server data exchange
	go func() {
		_, _ = io.Copy(cmdStdin, r.Body)
		_ = cmdStdin.Close()
	}()
	_, _ = io.Copy(w, cmdStdout)
	_ = cmdPack.Wait() // wait for std complete
}

Git and SSH transport protocols
The Git and SSH protocols run directly over a layer 4 (transport layer) connection, while HTTP is a layer 7 (application layer) protocol. Because of the characteristics of HTTP, Git operations over HTTP run into problems such as transfer limits and timeouts, especially with large repositories. Compared with HTTP, the Git and SSH protocols are simpler and more stable for transmission.
Official document: https://github.com/git/git/blob/master/Documentation/technical/pack-protocol.txt
Git protocol
The biggest advantage of the Git protocol is its speed: it carries neither the overhead of HTTP nor the cost of SSH encryption and decryption. Because the protocol has no authentication or encryption, however, it is typically used for downloading open source projects and not as a transport for private projects.
When we looked at the implementation of the HTTP smart protocol above, we saw that the interaction between the Git client and the server consists of two steps:
Get the reference on the server side
The client exchanges data with the server according to the reference data of the server.
The same is true of the Git protocol, except that, unlike HTTP, the Git protocol establishes a layer 4 connection with the server and completes both steps over that single long-lived connection.
When operating over the Git protocol, the client first sends the request information to the server, also in pkt-line format:
003egit-upload-pack /project.git\0host=myserver.com\0\0version=1\0
It contains the command, the repository name, the host, and other related information. After the server accepts the connection and receives this line, it has to process it and locate the corresponding repository, that is, its directory. Once everything checks out, all that is left is to start the upload-pack command on the server. Note that the --stateless-rpc option is not needed here. Run git upload-pack {repo_path} directly; the command immediately prints the reference information and then blocks, waiting for further input:
➜ hello git:(master) ✗ git upload-pack .
010234d8ed9a9f73d2cac9f50a8c8c03e4643990a2bf HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed symref=HEAD:refs/heads/master agent=git/2.24.3.(Apple.Git-128)
003f34d8ed9a9f73d2cac9f50a8c8c03e4643990a2bf refs/heads/master
0000
At this point our job is simply to forward data. We send the command's standard output to the client unchanged. The client processes it much as it does over HTTP, generates its request, and sends it to the server, where we pass it unchanged to the standard input of git upload-pack {repo_path}. The command then writes the corresponding pack to standard output, which we again forward to the client unchanged, completing the Fetch. A Push through receive-pack works the same way, so it is not repeated here.
Note that if the information sent by the client does not meet the requirements, or something goes wrong during processing, we return an error to the client. The error is also in pkt-line format and starts with ERR:
// error-line = PKT-LINE("ERR" SP explanation-text)
func exitSession(conn net.Conn, err error) {
	errStr := fmt.Sprintf("ERR %s", err.Error())
	// four hex digits of length (payload plus the prefix itself), then the payload
	pktData := fmt.Sprintf("%04x%s", len(errStr)+4, errStr)
	_, _ = conn.Write([]byte(pktData))
	_ = conn.Close()
}
After receiving this message, the client prints it and closes the connection. The data of the whole exchange can be observed through packet capture; interested readers can capture packets to learn more about how the Git protocol transfers data.
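The first line sent by the client, shown earlier as 003egit-upload-pack /project.git\0host=myserver.com\0\0version=1\0, could be parsed roughly as follows. This is a hypothetical Go sketch: the type gitRequest and the function parseGitProtoRequest are illustration names, the input is the pkt-line payload with the 4-byte length already removed, and extra parameters such as version=1 are simply ignored. It needs Go 1.18 or later for strings.Cut.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// gitRequest holds the fields of the initial Git protocol request.
type gitRequest struct {
	Service string // "git-upload-pack" or "git-receive-pack"
	Path    string // repository path as sent by the client
	Host    string // value of the host= parameter, if any
}

// parseGitProtoRequest parses "<service> <path>\0host=<host>\0[\0<extra>\0...]".
func parseGitProtoRequest(payload string) (gitRequest, error) {
	fields := strings.Split(payload, "\x00")
	service, path, ok := strings.Cut(fields[0], " ")
	if !ok || path == "" {
		return gitRequest{}, errors.New("malformed request line")
	}
	if service != "git-upload-pack" && service != "git-receive-pack" {
		return gitRequest{}, fmt.Errorf("not allowed command %q", service)
	}
	req := gitRequest{Service: service, Path: path}
	for _, f := range fields[1:] {
		if strings.HasPrefix(f, "host=") {
			req.Host = strings.TrimPrefix(f, "host=")
		}
	}
	return req, nil
}

func main() {
	payload := "git-upload-pack /project.git\x00host=myserver.com\x00\x00version=1\x00"
	fmt.Printf("%04x\n", len(payload)+4) // length prefix of the request line
	req, err := parseGitProtoRequest(payload)
	fmt.Printf("%+v %v\n", req, err)
}
```

The length prefix computed in main comes out as 003e, matching the example request line.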
Now that we understand how the Git protocol works, we can implement a simple Git protocol server in code:
func handleRequest(conn net.Conn) {
	// process the data sent by the client for the first time to get the action and repository
	service, repo, err := parseData(conn)
	if err != nil {
		exitSession(conn, err)
		return
	}
	// only Push and Fetch operations are supported
	if service != "git-upload-pack" && service != "git-receive-pack" {
		exitSession(conn, errors.New("Not allowed command. \n"))
		return
	}
	repoPath := fmt.Sprintf("%s%s", *repoRoot, repo)

	cmdPack := exec.Command("git", service[4:], repoPath)
	cmdStdin, err := cmdPack.StdinPipe()
	if err != nil {
		exitSession(conn, err)
		return
	}
	cmdStdout, err := cmdPack.StdoutPipe()
	if err != nil {
		exitSession(conn, err)
		return
	}
	if err = cmdPack.Start(); err != nil {
		exitSession(conn, err)
		return
	}

	// client and server data exchange
	go func() { _, _ = io.Copy(cmdStdin, conn) }()
	_, _ = io.Copy(conn, cmdStdout)
	_ = cmdPack.Wait()
	_ = conn.Close()
}

SSH protocol
SSH is also a widely used Git transport protocol. Compared with the Git protocol, it is more secure in both data transmission and authentication. It is slightly slower because of the cost of encryption and decryption, but that cost is entirely acceptable in exchange for security. The difference from the Git protocol is that the data sent over SSH is encrypted; what the two have in common is that the transmission process is the same.
An SSH clone address usually looks like git@gitee.com:kesin/go-git-protocols.git. When you Clone or Push, it is translated into a command such as:
ssh user@example.com "git-upload-pack '/project.git'"
So only the way the initial parameters are passed differs between the SSH protocol and the Git protocol. Everything else, such as reference discovery, the packfile mechanism, and error handling, is basically the same; it is not expanded on here, and you can refer to the official documentation.
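How a server might pull the service name and repository path out of such a command string can be sketched in Go. This is a hypothetical helper, not the article's code: it only strips surrounding single quotes instead of implementing real shell quoting, and the rejection of ".." in paths is an extra precaution added for this sketch. It needs Go 1.18 or later for strings.Cut.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseSSHCommand splits a command such as "git-upload-pack '/project.git'"
// into the service name and the repository path.
func parseSSHCommand(cmd string) (string, string, error) {
	service, arg, ok := strings.Cut(strings.TrimSpace(cmd), " ")
	if !ok {
		return "", "", errors.New("missing repository argument")
	}
	if service != "git-upload-pack" && service != "git-receive-pack" {
		return "", "", fmt.Errorf("not allowed command %q", service)
	}
	// Simplification: drop surrounding single quotes; no shell unescaping.
	repo := strings.Trim(strings.TrimSpace(arg), "'")
	if repo == "" {
		return "", "", errors.New("empty repository path")
	}
	// Assumption for this sketch: refuse paths that try to climb out of the root.
	if strings.Contains(repo, "..") {
		return "", "", errors.New("invalid repository path")
	}
	return service, repo, nil
}

func main() {
	service, repo, err := parseSSHCommand("git-upload-pack '/project.git'")
	fmt.Println(service, repo, err)
}
```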
With the SSH protocol understood, implementing a Git SSH server is straightforward: we only need to implement an SSH server and forward the data within each session. A simple Git SSH service looks like this:
func main() {
	// init host key and public key authentication
	var hostOption ssh.Option
	hostOption = ssh.HostKeyFile(*hostKeyPath)
	keyHandler := func(ctx ssh.Context, key ssh.PublicKey) bool {
		// replace your public key auth logic here
		pubKeyStr := gossh.MarshalAuthorizedKey(key)
		_ = pubKeyStr // placeholder: compare against your stored keys
		return true   // set false to use password authentication
	}
	keyOption := ssh.PublicKeyAuth(keyHandler)

	// password authentication
	pwdHandler := func(ctx ssh.Context, password string) bool {
		// replace your own password auth logic here
		if ctx.User() == "zoker" && password == "zoker" {
			return true
		}
		return false
	}
	pwdOption := ssh.PasswordAuth(pwdHandler)

	// process ssh session pack
	ssh.Handle(func(s ssh.Session) {
		handlePack(s) // process request
	})

	addr := fmt.Sprintf(":%s", *port)
	log.Printf("Starting ssh server on port %s \n", *port)
	log.Fatal(ssh.ListenAndServe(addr, nil, hostOption, pwdOption, keyOption))
}

func handlePack(s ssh.Session) {
	args := s.Command()
	if len(args) < 2 {
		exitSession(s, errors.New("Not allowed command. \n"))
		return
	}
	service := args[0]
	repoName := args[1]
	// allowed command
	if service != "git-upload-pack" && service != "git-receive-pack" {
		exitSession(s, errors.New("Not allowed command. \n"))
		return
	}
	repoPath := fmt.Sprintf("%s%s", *repoRoot, repoName)

	// Start the command and exchange data through standard input and output.
	// Does this look familiar? The Git protocol is handled in the same way.
	cmdPack := exec.Command("git", service[4:], repoPath)
	cmdStdin, err := cmdPack.StdinPipe()
	if err != nil {
		exitSession(s, err)
		return
	}
	cmdStdout, err := cmdPack.StdoutPipe()
	if err != nil {
		exitSession(s, err)
		return
	}
	if err = cmdPack.Start(); err != nil {
		exitSession(s, err)
		return
	}
	go func() { _, _ = io.Copy(cmdStdin, s) }()
	_, _ = io.Copy(s, cmdStdout)
	_ = cmdPack.Wait()
}

That covers Git's three transport protocols and how to implement them. Some of these details are likely to come up in day-to-day work, and hopefully this article helps you understand them better.