Summarize from basic Git instruction to the principle behind it

2025-05-02 Update From: SLTechnology News&Howtos

This article works from git's basic low-level commands down to the principles behind them, in a way that is simple to follow and easy to learn.

1. Init

Before learning how git works, let's first set aside the familiar git commands such as commit, branch, and tag. We will come back to what they really are later.

Git was written by Linus Torvalds while he was developing Linux, to manage Linux's own versions, so recording how a project's files change from version to version is git's core function.

Experts always work with abstractions when designing software, so to understand their design we have to think in terms of those abstractions. That may sound a little mysterious, but every abstraction eventually lands in code, so don't worry: it is easy to follow.

First, we need the concept of an object. It is git's lowest-level abstraction, and you can think of git as an object database.

Enough talk. Follow the commands below and you will see git in a new light. First, let's create a git repository in any directory:

My environment is Windows 10 with Git Bash.

$ git init git-test
Initialized empty Git repository in C:/git-test/.git/

You can see that git has created an empty repository for us, containing a .git directory with the following structure:

$ ls
config description HEAD hooks/ info/ objects/ refs/

Inside .git, let's focus on the .git/objects directory. We said at the start that git is an object database, and this directory is where git stores its objects.

Inside .git/objects there are two directories, info and pack. They have nothing to do with the core function; all we need to know is that .git/objects currently contains nothing but these two empty directories.

Let's pause here and implement this part. The logic is simple: write an entry function that parses the command-line arguments and, on receiving the init command, creates the corresponding directories and files under the specified directory.
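The init step described above can be sketched in Go roughly as follows. This is a minimal sketch, not the linked implementation: the function names `initRepo` and `demoInit` are made up for illustration, the config and description files are left empty, and the HEAD content is taken from the `cat HEAD` output shown later in this article.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// initRepo creates a minimal .git skeleton under root, mirroring the
// entries in the listing above. The files are deliberately left almost empty.
func initRepo(root string) error {
	gitDir := filepath.Join(root, ".git")
	dirs := []string{"hooks", "info", "objects/info", "objects/pack", "refs/heads", "refs/tags"}
	for _, d := range dirs {
		if err := os.MkdirAll(filepath.Join(gitDir, filepath.FromSlash(d)), 0o755); err != nil {
			return err
		}
	}
	files := map[string]string{
		"HEAD":        "ref: refs/heads/master\n",
		"config":      "",
		"description": "",
	}
	for name, content := range files {
		if err := os.WriteFile(filepath.Join(gitDir, name), []byte(content), 0o644); err != nil {
			return err
		}
	}
	return nil
}

// demoInit (hypothetical helper) runs initRepo in a throwaway directory
// and returns the content of the HEAD file it created.
func demoInit() (string, error) {
	root, err := os.MkdirTemp("", "jun-init")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(root)
	if err := initRepo(root); err != nil {
		return "", err
	}
	head, err := os.ReadFile(filepath.Join(root, ".git", "HEAD"))
	return string(head), err
}

func main() {
	head, err := demoInit()
	if err != nil {
		fmt.Println("init failed:", err)
		return
	}
	fmt.Print(head)
}
```

Unlike the linked version, this sketch does return errors; dropping those checks would make it shorter at the cost of silent failures.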

Here is my implementation: init

To keep the code easy to read, there is no error handling when creating files and directories.

I gave it a slightly rustic name, jun. Really, you can call it anything.

2. Object

Next, let's go into the git repository directory and add a file:

$ echo "version1" > file.txt

Next we add a record of this file to git. Note that we will not use the add command for now, even though that is what we would normally do. This article is about the underlying principles, so here we introduce a command you may not have come across before: git hash-object.

$ git hash-object -w file.txt
5bdcfc19f119febc749eef9a9551bc335cb965e2

The command returns a hash value. What it actually did was add the contents of file.txt to the object database as an object, and this hash value identifies that object.

To verify that git wrote the object to the database (as a file), let's look at the .git/objects directory:

$ find .git/objects/ -type f   # -type filters by type; f means regular file
.git/objects/5b/dcfc19f119febc749eef9a9551bc335cb965e2

There is a new directory 5b containing a file named dcfc19f119febc749eef9a9551bc335cb965e2. In other words, git uses the first two characters of the object's hash as the directory name and the remaining 38 characters as the file name.

According to the official documentation, git hash-object computes the ID of an object. -w is an optional flag that writes the object into the object database; another flag, -t, specifies the object type, which defaults to blob if omitted.

You may now wonder what is stored inside the object. Let's check with the git cat-file command:

$ git cat-file -p 5bdc   # -p: print the object's content; a prefix of the hash is enough
version1
$ git cat-file -t 5bdc   # -t: print the object's type
blob

With the above groundwork, let's uncover the secret of version control in git!

We change the contents of file.txt and write it to the object database again:

$ echo "version2" > file.txt
$ git hash-object -w file.txt
df7af2c382e49245443687973ceb711b2b74cb4a

The console returns a new hash. Let's look at the object database:

$ find .git/objects -type f
.git/objects/5b/dcfc19f119febc749eef9a9551bc335cb965e2
.git/objects/df/7af2c382e49245443687973ceb711b2b74cb4a

There is one more object! Let's look at its contents:

$ git cat-file -p df7a
version2
$ git cat-file -t df7a
blob

By now you probably have a better feel for the idea that git is an object database: git saves the contents of each version of the file in an object.

To restore file.txt to its first version, just do this:

$ git cat-file -p 5bdc > file.txt

Then look at the contents of file.txt:

$ cat file.txt
version1

At this point we have a complete version control system that can record the versions of a file and restore the file to any of them.

Not so hard, is it? You can think of git as a key-value database in which each hash maps to an object.

Let's stop here and implement this part.

At first I was curious why git provides a dedicated git cat-file command instead of letting us inspect an object with plain cat. On reflection, git surely does not store the file contents verbatim in the object; they must be compressed, so a dedicated command is needed to decompress and read them.

Let's follow the official approach to implement these two commands, starting with git hash-object. An object is stored as follows:

The first step is to construct the header, which consists of the object type, a space, the number of bytes in the content, and a null byte. For our first file it looks like this (the 9 bytes are version1 plus the newline that echo appends):

blob 9\u0000

Then the header is concatenated with the original data:

blob 9\u0000version1

The SHA-1 hash of this concatenation is the object's ID. Finally, the concatenated data is compressed with zlib and saved in the object file.
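The storage steps just described can be sketched in Go as below. This is an illustrative sketch under the assumptions stated in the text (header, SHA-1 over header plus data, zlib compression); the function names are made up here and are not taken from the linked implementation.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// encodeObject prepends the "<type> <size>\x00" header to the raw data.
func encodeObject(objType string, data []byte) []byte {
	header := fmt.Sprintf("%s %d\x00", objType, len(data))
	return append([]byte(header), data...)
}

// hashObject returns the hex SHA-1 of header+data, which serves as the object ID.
func hashObject(objType string, data []byte) string {
	sum := sha1.Sum(encodeObject(objType, data))
	return hex.EncodeToString(sum[:])
}

// compressObject zlib-compresses header+data, ready to be written to disk.
func compressObject(objType string, data []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := zlib.NewWriter(&buf)
	if _, err := w.Write(encodeObject(objType, data)); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// objectPath splits the ID into a 2-character directory and a 38-character file name.
func objectPath(gitDir, id string) string {
	return filepath.Join(gitDir, "objects", id[:2], id[2:])
}

func main() {
	data := []byte("version1\n") // what `echo "version1"` writes: 9 bytes
	id := hashObject("blob", data)
	fmt.Println(id)
	fmt.Println(filepath.ToSlash(objectPath(".git", id)))
}
```

If file.txt really holds exactly those 9 bytes, the printed ID should match the one git returned in the session above. Writing the result of compressObject to the path from objectPath would complete the -w behaviour.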

git cat-file works in reverse: decompress the data stored in the object file with zlib, split the result at the first space and at the null byte, and then return the object's type or content depending on whether -t or -p was given.
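Here is a hedged Go sketch of that reverse direction. The names `parseObject` and `deflate` are made up for this example; `deflate` exists only to produce input, where a real tool would read the bytes from the object file instead.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"errors"
	"fmt"
	"io"
)

// deflate is only here so the example can produce input for parseObject.
func deflate(raw []byte) []byte {
	var buf bytes.Buffer
	w := zlib.NewWriter(&buf)
	w.Write(raw)
	w.Close()
	return buf.Bytes()
}

// parseObject inflates a stored object and splits "<type> <size>\x00<content>".
func parseObject(compressed []byte) (string, []byte, error) {
	r, err := zlib.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return "", nil, err
	}
	defer r.Close()
	raw, err := io.ReadAll(r)
	if err != nil {
		return "", nil, err
	}
	nul := bytes.IndexByte(raw, 0)
	if nul < 0 {
		return "", nil, errors.New("missing null byte after header")
	}
	var objType string
	var size int
	if _, err := fmt.Sscanf(string(raw[:nul]), "%s %d", &objType, &size); err != nil {
		return "", nil, err
	}
	body := raw[nul+1:]
	if size != len(body) {
		return "", nil, errors.New("size in header does not match content length")
	}
	return objType, body, nil
}

func main() {
	stored := deflate([]byte("blob 9\x00version1\n"))
	objType, content, err := parseObject(stored)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(objType)       // what cat-file -t would show
	fmt.Print(string(content)) // what cat-file -p would show
}
```

Checking the size recorded in the header against the actual content length is an extra safety step added in this sketch; the description above does not require it.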

Here is my implementation: hash-object and cat-file

I went with a simple, rough procedural implementation, but I already have a vague feeling that many of these functions will be reused later, so I wrote unit tests first to make later refactoring easier.

3. Tree object

In the previous chapter, careful readers may have noticed that git saves the contents of our files as objects of type blob. A blob object seems to store only the file's contents, not its name.

And a real project never consists of a single file. We usually need to version a whole project, which contains many files and directories.

So the basic blob object is not enough. We need a new kind of object, the tree object, which not only records file names but also organizes multiple files together.

Introducing the concept is easy, but how do we implement it in code? My first thought was to create a tree object in memory and then add content to it. But that seems troublesome: every time you add something, you have to supply the tree object's hash. It would also make the tree object mutable, and a mutable object defeats the purpose of recording a fixed version.

Let's see how git approaches this. Git introduces a concept called the staging area for creating tree objects, and it is a good idea! Think about it: a tree object records the version of the entire project, and the project has many files, so we first put all the files into the staging area, and git then creates a tree object from the staging area's contents in one go. That is how a version gets recorded!

Let's work with git's staging area to deepen our understanding. First we introduce a new command, git update-index, which adds a file to the staging area by hand. The --add flag is required because the file was not in the staging area before.

$ git update-index --add file.txt

Then let's look at what changed in the .git directory:

$ ls
config description HEAD hooks/ index info/ objects/ refs/
$ find .git/objects/ -type f
.git/objects/5b/dcfc19f119febc749eef9a9551bc335cb965e2
.git/objects/df/7af2c382e49245443687973ceb711b2b74cb4a

The .git directory now contains an extra file named index, which is presumably our staging area. The objects in the objects directory have not changed.

Let's look at the contents of the staging area with another command, git ls-files --stage:

$ git ls-files --stage
100644 df7af2c382e49245443687973ceb711b2b74cb4a 0 file.txt

The staging area stores each record as: the file mode, the hash of the blob object holding the file's contents, a stage number, and the file name.

Next we save the current contents of the staging area as a tree object, using a new command, git write-tree:

$ git write-tree
907aa76a1e4644e31ae63ad932c99411d0dd9417

The command prints the hash of the newly created tree object. Let's verify that it exists and look at its contents:

$ find .git/objects/ -type f
.git/objects/5b/dcfc19f119febc749eef9a9551bc335cb965e2   # blob object with version1 content
.git/objects/90/7aa76a1e4644e31ae63ad932c99411d0dd9417   # new tree object
.git/objects/df/7af2c382e49245443687973ceb711b2b74cb4a   # blob object with version2 content
$ git cat-file -p 907a
100644 blob df7af2c382e49245443687973ceb711b2b74cb4a file.txt

By now you should have a preliminary understanding of how the staging area and the tree object relate.

Now let's look at two more things: how git records a file whose contents it has not seen yet, and how it records a directory.

Let's take it step by step: create a new file and add it to the staging area:

$ echo abc > new.txt
$ git update-index --add new.txt
$ git ls-files --stage
100644 df7af2c382e49245443687973ceb711b2b74cb4a 0 file.txt
100644 8baef1b4abc478178b004d62031cf7fe6db6f903 0 new.txt

Looking at the staging area, we see that a record for the new file has been appended, and it too has a hash value. Let's look at what that hash refers to:

$ find .git/objects/ -type f
.git/objects/5b/dcfc19f119febc749eef9a9551bc335cb965e2   # blob object with version1 content
.git/objects/8b/aef1b4abc478178b004d62031cf7fe6db6f903   # the new object
.git/objects/90/7aa76a1e4644e31ae63ad932c99411d0dd9417   # tree object
.git/objects/df/7af2c382e49245443687973ceb711b2b74cb4a   # blob object with version2 content
$ git cat-file -p 8bae
abc
$ git cat-file -t 8bae
blob

So when new.txt is added to the staging area, git automatically creates a blob object for its contents.

Now let's try creating a directory and adding it to the staging area:

$ mkdir dir
$ git update-index --add dir
error: dir: is a directory - add files inside instead
fatal: Unable to process path dir

Git tells us that a directory itself cannot be added and that we should add the files inside it instead. So we create a file in the directory and add that to the staging area:

$ echo 123 > dir/dirFile.txt
$ git update-index --add dir/dirFile.txt

Success! Now view the contents of the staging area:

$ git ls-files --stage
100644 190a18037c64c43e6b11489df4bf0b9eb6d2c9bf 0 dir/dirFile.txt
100644 df7af2c382e49245443687973ceb711b2b74cb4a 0 file.txt
100644 8baef1b4abc478178b004d62031cf7fe6db6f903 0 new.txt
$ git cat-file -t 190a
blob

As in the previous demonstration, git automatically creates a blob object for the file's contents.

Next, we save the current staging area as a tree object:

$ git write-tree
dee1f9349126a50a52a4fdb01ba6f573fa309e8f
$ git cat-file -p dee1
040000 tree 374e190215e27511116812dc3d2be4c69c90dbb0 dir
100644 blob df7af2c382e49245443687973ceb711b2b74cb4a file.txt
100644 blob 8baef1b4abc478178b004d62031cf7fe6db6f903 new.txt

The new tree object records the current version of the staging area. Note that the staging area records dir/dirFile.txt as a blob object, whereas when saving the tree object git creates a separate tree object for the directory dir. Let's verify:

$ git cat-file -p 374e
100644 blob 190a18037c64c43e6b11489df4bf0b9eb6d2c9bf dirFile.txt
$ git cat-file -t 374e
tree

The tree object created for the dir directory holds the record for dirFile.txt. Doesn't that look familiar? A tree object is a model of a file system directory!

Let's stop here and implement it!

This time we need to implement the three commands above:

git update-index --add

git update-index updates the staging area. The official command takes many options; we only implement --add, which adds a file to the staging area. The overall flow is: if this is the first file added to the staging area, create the index file; if the index file already exists, read the staging area's contents from it, remembering to decompress them. Then add the new file's record to the staging area, compress the contents, and save them back to the index file.

This involves serialization and deserialization, which I lazily handle with JSON.

git ls-files --stage

git ls-files shows information about files in the staging area and the working tree, and it too has many options. We only implement --stage, which prints the contents of the staging area (without options, ls-files lists the tracked files under the current directory, including subdirectories). The flow: read the staging area's contents from the index file, decompress them, and print them to standard output in the expected format.
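The flow of the two commands above can be sketched in Go as follows. This follows the lazy approach described here (JSON for serialization, zlib for compression), which is not how real git lays out its index file. The type `IndexEntry` and all function names are made up for this sketch.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"encoding/json"
	"fmt"
	"io"
	"sort"
	"strings"
)

// IndexEntry holds the four fields printed by ls-files --stage.
type IndexEntry struct {
	Mode  string `json:"mode"`
	Hash  string `json:"hash"`
	Stage int    `json:"stage"`
	Path  string `json:"path"`
}

// addEntry inserts or replaces the entry for e.Path and keeps paths sorted.
func addEntry(entries []IndexEntry, e IndexEntry) []IndexEntry {
	for i := range entries {
		if entries[i].Path == e.Path {
			entries[i] = e
			return entries
		}
	}
	entries = append(entries, e)
	sort.Slice(entries, func(i, j int) bool { return entries[i].Path < entries[j].Path })
	return entries
}

// saveIndex serializes entries as JSON and compresses them with zlib.
func saveIndex(entries []IndexEntry) ([]byte, error) {
	raw, err := json.Marshal(entries)
	if err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	w := zlib.NewWriter(&buf)
	if _, err := w.Write(raw); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// loadIndex reverses saveIndex: decompress, then decode the JSON.
func loadIndex(data []byte) ([]IndexEntry, error) {
	r, err := zlib.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer r.Close()
	raw, err := io.ReadAll(r)
	if err != nil {
		return nil, err
	}
	var entries []IndexEntry
	err = json.Unmarshal(raw, &entries)
	return entries, err
}

// formatStage renders entries one per line, like ls-files --stage.
func formatStage(entries []IndexEntry) string {
	var b strings.Builder
	for _, e := range entries {
		fmt.Fprintf(&b, "%s %s %d\t%s\n", e.Mode, e.Hash, e.Stage, e.Path)
	}
	return b.String()
}

func main() {
	var entries []IndexEntry
	entries = addEntry(entries, IndexEntry{"100644", "df7af2c382e49245443687973ceb711b2b74cb4a", 0, "file.txt"})
	entries = addEntry(entries, IndexEntry{"100644", "8baef1b4abc478178b004d62031cf7fe6db6f903", 0, "new.txt"})
	entries = addEntry(entries, IndexEntry{"100644", "190a18037c64c43e6b11489df4bf0b9eb6d2c9bf", 0, "dir/dirFile.txt"})
	data, err := saveIndex(entries)
	if err != nil {
		fmt.Println("save failed:", err)
		return
	}
	loaded, err := loadIndex(data)
	if err != nil {
		fmt.Println("load failed:", err)
		return
	}
	fmt.Print(formatStage(loaded))
}
```

Sorting by path is a choice made in this sketch so that the listing comes out in the same order as in the git session earlier (dir/dirFile.txt, file.txt, new.txt).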

git write-tree

git write-tree converts the contents of the staging area into a tree object. As the earlier example showed, directories require us to descend recursively and build a tree object for each one, which is probably the hardest part of this chapter.
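The recursion can be sketched in Go like this. One assumption is not stated in this article and comes from my understanding of git's on-disk format: each tree entry is serialized as mode, space, name, a null byte, and then the raw 20-byte hash, with directories using mode 40000. The types and the in-memory `store` map are made up for illustration; a real tool would compress each tree and write it under .git/objects.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// Entry is one staged file: its mode, blob hash and slash-separated path.
type Entry struct {
	Mode, Hash, Path string
}

// treeItem is one record inside a tree object.
type treeItem struct {
	mode, name string
	id         []byte // raw 20-byte SHA-1
}

// writeTree builds tree objects for entries, recursing into sub-directories.
// It returns the hex ID of the top-level tree and records every tree it
// creates in store (ID -> serialized object, header included).
func writeTree(entries []Entry, store map[string][]byte) (string, error) {
	var items []treeItem
	subdirs := map[string][]Entry{}
	for _, e := range entries {
		if i := strings.IndexByte(e.Path, '/'); i >= 0 {
			dir := e.Path[:i]
			subdirs[dir] = append(subdirs[dir], Entry{e.Mode, e.Hash, e.Path[i+1:]})
			continue
		}
		id, err := hex.DecodeString(e.Hash)
		if err != nil {
			return "", err
		}
		items = append(items, treeItem{e.Mode, e.Path, id})
	}
	for dir, children := range subdirs {
		subID, err := writeTree(children, store)
		if err != nil {
			return "", err
		}
		id, err := hex.DecodeString(subID)
		if err != nil {
			return "", err
		}
		items = append(items, treeItem{"40000", dir, id})
	}
	// Sort by name, treating directory names as if they ended with "/".
	sortKey := func(it treeItem) string {
		if it.mode == "40000" {
			return it.name + "/"
		}
		return it.name
	}
	sort.Slice(items, func(i, j int) bool { return sortKey(items[i]) < sortKey(items[j]) })

	var body []byte
	for _, it := range items {
		line := it.mode + " " + it.name + "\x00"
		body = append(body, line...)
		body = append(body, it.id...)
	}
	obj := append([]byte(fmt.Sprintf("tree %d\x00", len(body))), body...)
	sum := sha1.Sum(obj)
	id := hex.EncodeToString(sum[:])
	store[id] = obj
	return id, nil
}

func main() {
	store := map[string][]byte{}
	root, err := writeTree([]Entry{
		{"100644", "190a18037c64c43e6b11489df4bf0b9eb6d2c9bf", "dir/dirFile.txt"},
		{"100644", "df7af2c382e49245443687973ceb711b2b74cb4a", "file.txt"},
		{"100644", "8baef1b4abc478178b004d62031cf7fe6db6f903", "new.txt"},
	}, store)
	if err != nil {
		fmt.Println("write-tree failed:", err)
		return
	}
	fmt.Println(root)
	fmt.Println(len(store), "tree objects created")
}
```

With the three staged files from the session, two tree objects are created: one for dir and one for the project root. If the format assumption holds, the root ID should match the one git printed.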

The code is as follows: update-index --add, ls-files --stage, write-tree

Feeling that object could be abstracted, I refactored the object-related code: refactor object part

With this part complete, we already have a system that can version whole directories.

4. Commit object

Although a tree object can already represent the version of an entire project, it still has some shortcomings:

A tree object only records the version of the files. Who made this version? Why was it changed? Which version came before it? None of this information is preserved.

This is where the commit object comes in! Isn't it satisfying to work all the way up from the bottom?

Let's try it in git before thinking about the implementation. We use the commit-tree command to create a commit object that points to the tree object generated at the end of Chapter 3.

$ git commit-tree dee1 -m 'first commit'
893fba19d63b401ae458c1fc140f1a48c23e4873

Because the timestamp and the author differ, the hash you get will be different. Let's look at the newly created commit object:

$ git cat-file -p 893f
tree dee1f9349126a50a52a4fdb01ba6f573fa309e8f
author liuyj24 1608981484 +0800
committer liuyj24 1608981484 +0800

first commit

As you can see, the commit object points to a tree object; the second and third lines hold the author and committer information, and the commit message follows the blank line.

Let's modify our project to simulate a change of version:

$ echo version3 > file.txt
$ git update-index --add file.txt
$ git write-tree
ff998d076c02acaf1551e35d76368f10e78af140

Then we create a new commit object whose parent is the first commit object:

$ git commit-tree ff99 -m 'second commit' -p 893f
b05c65b6fdd7e13a51aaf1abb8ff3e795835bfb0

We modify the project again and create a third commit object:

$ echo version4 > file.txt
$ git update-index --add file.txt
$ git write-tree
1403e859154aee76360e0082c4b272e5d145e13e
$ git commit-tree 1403 -m 'third commit' -p b05c
fe2544fb26a26f0412ce32f7418515a66b31b22d

Then we run git log to view the commit history:

$ git log fe25
commit fe2544fb26a26f0412ce32f7418515a66b31b22d
Author: liuyj24
Date: Sat Dec 26 19:36:31 2020 +0800

    third commit

commit b05c65b6fdd7e13a51aaf1abb8ff3e795835bfb0
Author: liuyj24
Date: Sat Dec 26 19:34:25 2020 +0800

    second commit

commit 893fba19d63b401ae458c1fc140f1a48c23e4873
Author: liuyj24
Date: Sat Dec 26 19:18:04 2020 +0800

    first commit

How about that? Does everything suddenly make sense?

Let's stop and implement this part.

There are two commands altogether.

commit-tree

Create a commit object that points to a tree object, and add the author information, committer information, commit message, and a parent node (the parent is optional). For now we hard-code the author and committer information. In git these are set with the git config command; look at .git/config and you will see that it is really just reading and writing a configuration file.
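As a rough illustration of what commit-tree has to produce, the Go sketch below renders the text that cat-file -p showed earlier. The `Commit` type and `encodeCommit` are names invented for this example, and the e-mail address is a placeholder, since the session output above does not show one.

```go
package main

import (
	"fmt"
	"strings"
)

// Commit mirrors the fields printed by cat-file -p for a commit object.
type Commit struct {
	Tree      string
	Parents   []string // empty for the very first commit
	Author    string   // identity, timestamp and time zone in one string
	Committer string
	Message   string
}

// encodeCommit renders the body of a commit object. The "commit <size>\x00"
// header would be added by the generic object-writing code.
func encodeCommit(c Commit) string {
	var b strings.Builder
	fmt.Fprintf(&b, "tree %s\n", c.Tree)
	for _, p := range c.Parents {
		fmt.Fprintf(&b, "parent %s\n", p)
	}
	fmt.Fprintf(&b, "author %s\n", c.Author)
	fmt.Fprintf(&b, "committer %s\n", c.Committer)
	b.WriteString("\n") // blank line separates the headers from the message
	b.WriteString(c.Message)
	b.WriteString("\n")
	return b.String()
}

func main() {
	who := "liuyj24 <liuyj24@example.com> 1608981484 +0800" // placeholder e-mail
	fmt.Print(encodeCommit(Commit{
		Tree:      "ff998d076c02acaf1551e35d76368f10e78af140",
		Parents:   []string{"893fba19d63b401ae458c1fc140f1a48c23e4873"},
		Author:    who,
		Committer: who,
		Message:   "second commit",
	}))
}
```

Hashing and storing this text works exactly like a blob, only with the object type set to commit.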

log

Starting from the hash of the given commit object, follow the parent links upward and print each commit's information. This is quick to implement with recursion.
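A minimal Go sketch of that recursion follows. To stay self-contained it keeps the commit bodies in a map instead of reading them from the object database, and it follows only the first parent; `parseCommit` and `logFrom` are names invented here, and the tree values are shortened placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// parseCommit extracts the first parent ID (if any) and the message from a commit body.
func parseCommit(body string) (parent, message string) {
	parts := strings.SplitN(body, "\n\n", 2)
	for _, line := range strings.Split(parts[0], "\n") {
		if parent == "" && strings.HasPrefix(line, "parent ") {
			parent = strings.TrimPrefix(line, "parent ")
		}
	}
	if len(parts) == 2 {
		message = strings.TrimSpace(parts[1])
	}
	return parent, message
}

// logFrom walks the parent links starting at id and returns one line per
// commit, newest first.
func logFrom(commits map[string]string, id string) []string {
	body, ok := commits[id]
	if !ok {
		return nil
	}
	parent, message := parseCommit(body)
	line := id + " " + message
	if parent == "" {
		return []string{line}
	}
	return append([]string{line}, logFrom(commits, parent)...)
}

func main() {
	// Commit IDs are the ones from the session above; trees are shortened.
	commits := map[string]string{
		"893fba19d63b401ae458c1fc140f1a48c23e4873": "tree dee1\nauthor liuyj24\ncommitter liuyj24\n\nfirst commit\n",
		"b05c65b6fdd7e13a51aaf1abb8ff3e795835bfb0": "tree ff99\nparent 893fba19d63b401ae458c1fc140f1a48c23e4873\nauthor liuyj24\ncommitter liuyj24\n\nsecond commit\n",
		"fe2544fb26a26f0412ce32f7418515a66b31b22d": "tree 1403\nparent b05c65b6fdd7e13a51aaf1abb8ff3e795835bfb0\nauthor liuyj24\ncommitter liuyj24\n\nthird commit\n",
	}
	for _, line := range logFrom(commits, "fe2544fb26a26f0412ce32f7418515a66b31b22d") {
		fmt.Println(line)
	}
}
```

Running it lists the third, second and first commit in that order, the same order git log printed.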

Here is my implementation: commit-tree, log

5. References

In the previous four chapters we laid a lot of groundwork with git's low-level commands. Starting with this chapter we turn to git's everyday features, and with that groundwork in place they should come easily.

Although a commit object can now fully record a version, there is still a fatal drawback: we have to locate a version by a long SHA-1 hash. Imagine saying to a colleague during development:

Hey! Could you review version 32h62342 of the code for me?

He will surely reply: Huh... which version was that again? (+_+)?

So we should consider giving our commit objects names, such as master.

Let's actually do it in git and name our latest commit object master:

$ git update-ref refs/heads/master fe25

Then view the commit history using the new name:

$ git log master
commit fe2544fb26a26f0412ce32f7418515a66b31b22d (HEAD -> master)
Author: liuyj24
Date: Sat Dec 26 19:36:31 2020 +0800

    third commit

commit b05c65b6fdd7e13a51aaf1abb8ff3e795835bfb0
Author: liuyj24
Date: Sat Dec 26 19:34:25 2020 +0800

    second commit

commit 893fba19d63b401ae458c1fc140f1a48c23e4873
Author: liuyj24
Date: Sat Dec 26 19:18:04 2020 +0800

    first commit

Well, well (→_→), why not give this feature a proper name and call it a branch!

At this point you may be thinking: when we normally commit on the master branch with a single git commit -m command, the principle behind it now seems clear:

First, the write-tree command writes the staging area's records to a tree object and gives us the tree object's SHA-1 value.

Then the commit-tree command creates a new commit object.

The question is: commit-tree has the tree object's SHA-1 value and the -m message, but how do we get the SHA-1 value of the parent commit object?

This is where the HEAD reference comes in! You will find a HEAD file in the .git directory. Let's look at its contents:

$ ls
config description HEAD hooks/ index info/ logs/ objects/ refs/
$ cat HEAD
ref: refs/heads/master

So when we commit, git reads the current reference from the HEAD file, looks up the SHA-1 value of the commit object that reference points to, and uses it as the parent of the new commit object. That is how the whole commit history is chained together!

Having seen this, do you now have a sense of how git branch creates branches and git checkout switches between them?

Now that we have three commit objects, let's create a branch on the second one. Again using a low-level command, we create a reference to the second commit with git update-ref:

$ git update-ref refs/heads/bugfix b05c
$ git log bugfix
commit b05c65b6fdd7e13a51aaf1abb8ff3e795835bfb0 (bugfix)
Author: liuyj24
Date: Sat Dec 26 19:34:25 2020 +0800

    second commit

commit 893fba19d63b401ae458c1fc140f1a48c23e4873
Author: liuyj24
Date: Sat Dec 26 19:18:04 2020 +0800

    first commit

Then we change the current branch, that is, change the value of the .git/HEAD file, using the git symbolic-ref command:

$ git symbolic-ref HEAD refs/heads/bugfix

We run log once more. With no arguments, it shows the current branch by default:

$ git log
commit b05c65b6fdd7e13a51aaf1abb8ff3e795835bfb0 (HEAD -> bugfix)
Author: liuyj24
Date: Sat Dec 26 19:34:25 2020 +0800

    second commit

commit 893fba19d63b401ae458c1fc140f1a48c23e4873
Author: liuyj24
Date: Sat Dec 26 19:18:04 2020 +0800

    first commit

The current branch has been switched to bugfix!

Let's stop and implement this part, which is mostly simple file reading and writing.

update-ref

Write the hash of the commit object to the specified file under .git/refs/heads. Because our earlier implementation of the log command was incomplete, we need to refactor it here so that it can look up commits by ref name.

symbolic-ref

This command modifies a ref. We implement it simply: it only modifies the HEAD file.
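Both commands boil down to small file operations, as the Go sketch below tries to show. The function names are invented for this example, full 40-character hashes are passed in (real git also accepts abbreviations), and anything beyond plain ref files, such as packed refs, is ignored.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// updateRef writes a commit ID into a ref file such as refs/heads/master.
func updateRef(gitDir, ref, id string) error {
	path := filepath.Join(gitDir, filepath.FromSlash(ref))
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return err
	}
	return os.WriteFile(path, []byte(id+"\n"), 0o644)
}

// symbolicRef points HEAD at a ref, e.g. refs/heads/bugfix.
func symbolicRef(gitDir, ref string) error {
	return os.WriteFile(filepath.Join(gitDir, "HEAD"), []byte("ref: "+ref+"\n"), 0o644)
}

// resolveHead follows HEAD to the commit ID of the current branch.
// It returns "" when the branch has no commit yet.
func resolveHead(gitDir string) (string, error) {
	head, err := os.ReadFile(filepath.Join(gitDir, "HEAD"))
	if err != nil {
		return "", err
	}
	target := strings.TrimSpace(string(head))
	if !strings.HasPrefix(target, "ref: ") {
		return target, nil // HEAD holds a commit ID directly
	}
	ref := strings.TrimPrefix(target, "ref: ")
	id, err := os.ReadFile(filepath.Join(gitDir, filepath.FromSlash(ref)))
	if os.IsNotExist(err) {
		return "", nil
	}
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(id)), nil
}

// demoRefs (hypothetical helper) replays the master/bugfix session in a
// temporary directory and returns the commit that HEAD ends up on.
func demoRefs() (string, error) {
	gitDir, err := os.MkdirTemp("", "jun-refs")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(gitDir)
	if err := updateRef(gitDir, "refs/heads/master", "fe2544fb26a26f0412ce32f7418515a66b31b22d"); err != nil {
		return "", err
	}
	if err := updateRef(gitDir, "refs/heads/bugfix", "b05c65b6fdd7e13a51aaf1abb8ff3e795835bfb0"); err != nil {
		return "", err
	}
	if err := symbolicRef(gitDir, "refs/heads/bugfix"); err != nil {
		return "", err
	}
	return resolveHead(gitDir)
}

func main() {
	id, err := demoRefs()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(id)
}
```

resolveHead is the piece that log needs in order to accept no argument at all: it turns the current branch into a commit hash.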

commit

With the two commands above as a foundation, we can implement the commit command. To repeat the process: first, the write-tree command writes the staging area's records to a tree object and gives us the tree object's SHA-1 value. Then the commit-tree command creates a new commit object, whose parent is obtained through the HEAD file. Finally, the branch's ref is updated to point to the new commit object.
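The commit flow can be sketched in Go with an in-memory stand-in for the repository. This is a simplification for illustration only: the `Repo` type and its fields are invented here, the tree ID is passed in as if write-tree had already run, and the author string is a placeholder.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// Repo is an in-memory stand-in for the .git directory.
type Repo struct {
	Head    string            // ref that HEAD points at, e.g. "refs/heads/master"
	Refs    map[string]string // ref name -> commit ID
	Objects map[string][]byte // object ID -> serialized object
}

// put stores header+body under its SHA-1 and returns the object ID.
func (r *Repo) put(objType string, body []byte) string {
	obj := append([]byte(fmt.Sprintf("%s %d\x00", objType, len(body))), body...)
	sum := sha1.Sum(obj)
	id := hex.EncodeToString(sum[:])
	r.Objects[id] = obj
	return id
}

// Commit creates a commit object for treeID whose parent is whatever the
// current branch points at, then moves the branch to the new commit.
func (r *Repo) Commit(treeID, who, message string) string {
	body := "tree " + treeID + "\n"
	if parent := r.Refs[r.Head]; parent != "" {
		body += "parent " + parent + "\n"
	}
	body += "author " + who + "\ncommitter " + who + "\n\n" + message + "\n"
	id := r.put("commit", []byte(body))
	r.Refs[r.Head] = id
	return id
}

func main() {
	repo := &Repo{
		Head:    "refs/heads/master",
		Refs:    map[string]string{},
		Objects: map[string][]byte{},
	}
	who := "liuyj24 <liuyj24@example.com> 1608981484 +0800" // placeholder identity
	first := repo.Commit("dee1f9349126a50a52a4fdb01ba6f573fa309e8f", who, "first commit")
	second := repo.Commit("ff998d076c02acaf1551e35d76368f10e78af140", who, "second commit")
	fmt.Println("first: ", first)
	fmt.Println("second:", second)
	fmt.Println("master:", repo.Refs[repo.Head])
}
```

After the second call, master points at the second commit, and that commit's body carries a parent line naming the first one.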

This is my implementation: update-ref, symbolic-ref, commit

Having got this far, you have probably lost interest in commands like checkout and branch: checkout is essentially a wrapper around symbolic-ref, and branch is a wrapper around update-ref.

To make its commands more flexible, git gives them many optional parameters, but underneath they all call these low-level commands. With these low-level commands in hand, you will find that other extensions are easy to implement, so we will not go into them here.

6. Tag

Having completed the features above, you probably understand git more deeply, but you may have noticed a small problem:

Once we have branches, we manage versions on them. But whenever a branch receives a new commit, the branch moves to point at the new commit object, which means a branch keeps changing. There are always some important versions we want to record, and for that we need something that does not change to mark a particular commit.

And since remembering the SHA-1 value of a commit is not practical, we give these important commits a name and store them as tags. When implementing references you may have noticed that, besides heads, there is also a tags directory under .git/refs/. The principle is the same as for a reference: a tag records the hash of a commit object. Let's use git to put the tag v1.0 on the first commit object of the current branch.

Then check this tag:

$ git show v1.0
commit 893fba19d63b401ae458c1fc140f1a48c23e4873 (tag: v1.0)
Author: liuyj24
Date: Sat Dec 26 19:18:04 2020 +0800

    first commit

In this way, we can locate a particular version through the v1.0 tag.

I won't implement this one (→_→).

7. More

I wrote this article while reading the official documentation. By this point the overall outline of git is quite clear. Git itself is good enough that there is no need to rewrite it; the purpose of this article is to learn git's core idea, namely how to build an object database for version management.

We can also think ahead to some of git's other features (→_→):

add command: this is really a wrapper around our update-index command. We usually run add . directly to add all modified files to the staging area. To implement it, you can traverse the directory recursively and, with the help of a diff tool, run update-index once for each modified file.

merge command: this one feels hard to implement. My current idea is to recurse through the two projects and, with the help of a diff tool, add the parts that exist only on the merged-in side to the merge result; if diff reports a conflict, let the user resolve it.

rebase command: this really changes the order of commit objects, and the concrete implementation is to change their parent values. It is like inserting a node, or another list, into the middle of a linked list: a matter of adjusting the links.

Beyond these, git also has the concept of a remote repository. A remote repository is essentially the same as a local one, but it involves a lot of synchronization and collaboration. I feel that learning the rest of git will now be easier, and I can approach it with more confidence!

Finally, here is a review of my own mini git: what I implemented, and what could be improved compared with the real open source code.

No repository lookup is implemented. Git can work from any directory inside a repository, while mine only works from the repository root. There should be a function that finds the .git directory of the current repository, so that the whole system has a single entry point when resolving file paths.

The abstraction of object is incomplete. The mini project only implements adding versions to the object database and cannot restore a version from it. To support restoring, each kind of object needs a corresponding deserialization method; that is, object should implement an interface like this:

type obj interface {
	serialize(Object) []byte
	deserialize([]byte) Object
}

Path separators are a problem. Because I developed on Windows and tested in Git Bash, I wrote every separator as /, which is not good.

At present you can commit over and over. The commit command should check whether the staging area has changed, and refuse to commit when nothing has been updated.

The handling of command-line arguments is a bit ugly, and I have not yet found a good way to do it.
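For the first point in the list above, locating the .git directory from anywhere inside a repository, one possible approach is to walk up the directory tree until a .git directory appears. The Go sketch below is only a suggestion under that assumption; `findGitDir` and `demoFind` are invented names, and special cases that real git handles are ignored.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// findGitDir walks up from start until it finds a directory containing
// ".git", and returns the path of that .git directory.
func findGitDir(start string) (string, error) {
	dir, err := filepath.Abs(start)
	if err != nil {
		return "", err
	}
	for {
		candidate := filepath.Join(dir, ".git")
		if info, err := os.Stat(candidate); err == nil && info.IsDir() {
			return candidate, nil
		}
		parent := filepath.Dir(dir)
		if parent == dir { // reached the file system root
			return "", errors.New("not inside a repository")
		}
		dir = parent
	}
}

// demoFind (hypothetical helper) builds repo/.git and repo/a/b in a
// temporary directory and checks that the lookup from a/b finds repo/.git.
func demoFind() (bool, error) {
	root, err := os.MkdirTemp("", "jun-find")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(root)
	gitDir := filepath.Join(root, "repo", ".git")
	nested := filepath.Join(root, "repo", "a", "b")
	if err := os.MkdirAll(gitDir, 0o755); err != nil {
		return false, err
	}
	if err := os.MkdirAll(nested, 0o755); err != nil {
		return false, err
	}
	found, err := findGitDir(nested)
	if err != nil {
		return false, err
	}
	return found == gitDir, nil
}

func main() {
	ok, err := demoFind()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("found repository root:", ok)
}
```

Building every path with filepath.Join, as this sketch does, would also take care of the hard-coded / separators mentioned above.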
