What is the principle of Git?

2025-07-12 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/02 Report

This article introduces how Git works internally. Many people who use Git every day still have questions about what happens under the hood, so the walkthrough below sticks to simple commands that you can run yourself.

Everyone here is presumably familiar with Git. As programmers, apart from eating and sleeping, we spend our days on GitHub, jokingly known as the world's largest "dating site" for developers, and Git is the most basic skill you need there. Today, instead of covering everyday Git usage, let's look at how Git works internally.

Git describes itself as a content-addressable filesystem. When you run the git init command in a directory, a .git directory is created. Its structure looks like this:

.git/
├── branches
├── config
├── description
├── HEAD
├── hooks
│   ├── applypatch-msg.sample
│   ├── commit-msg.sample
│   ├── post-update.sample
│   ├── pre-applypatch.sample
│   ├── pre-commit.sample
│   ├── prepare-commit-msg.sample
│   ├── pre-push.sample
│   ├── pre-rebase.sample
│   └── update.sample
├── info
│   └── exclude
├── objects
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags

The branches directory is no longer used, the description file is only used by the GitWeb program, and the config file holds the project's configuration.

What we need to focus on are the HEAD and index files and the objects and refs directories. The index file, which only appears once something has been staged, holds the staging area information and will not be covered in detail here.

Objects directory

This directory is used to store Git objects (including tree objects, commit objects, and blob objects). For an initial Git repository, there are only info and pack subdirectories in the objects directory, and there are no regular files. As the project progresses, the files we create, as well as some operation records, will be stored in this directory as Git objects.

Each object in this directory is stored as a file named after the object's SHA-1 checksum: Git creates a subdirectory named after the first two characters of the 40-character hexadecimal checksum and saves the file under the remaining 38 characters.
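As a rough illustration of this naming scheme, here is a minimal Python sketch. The function names `git_object_id` and `loose_object_path` are made up for this example, and the sketch assumes the usual loose-object convention in which the SHA-1 is computed over a short header (`<type> <size>` plus a NUL byte) followed by the content.

```python
import hashlib


def git_object_id(obj_type: str, data: bytes) -> str:
    """Return the SHA-1 hex digest used as the object's name (assumed header format)."""
    # The header is "<type> <size in bytes>" followed by a single NUL byte.
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + data).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Split a 40-character id into a 2-character directory and a 38-character file name."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


# `echo 'test content'` writes the text plus a trailing newline.
oid = git_object_id("blob", b"test content\n")
print(oid)                     # d670460b4b4aece5915caf5c68d12f560a9fe3e4
print(loose_object_path(oid))  # .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
```

Because the name depends only on the type and the content, two files with identical content map to the same object.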

Let's take a look at exactly what Git does when we make a commit.

$ echo 'test content' > test.txt
$ git add .

After executing the above command, the objects directory structure is as follows:

.git/objects/
├── d6
│   └── 70460b4b4aece5915caf5c68d12f560a9fe3e4
├── info
└── pack

There is a new subdirectory. As mentioned above, it holds an object that Git created for us. We can use the low-level cat-file command to see the object's type and what it stores.

$ git cat-file -t d670460b4b4aece5915caf5c68d12f560a9fe3e4
blob
$ git cat-file -p d670460b4b4aece5915caf5c68d12f560a9fe3e4
test content

As you can see, this is a blob object, and its content is the content of the file we just created. Next, let's commit.
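To make the type and content lookup a little more concrete, here is a minimal Python sketch of how a loose object could be stored and read back. It assumes the object file is the zlib-compressed header plus content; `encode_loose_object` and `decode_loose_object` are hypothetical helper names, not Git APIs.

```python
import zlib
from typing import Tuple


def encode_loose_object(obj_type: str, data: bytes) -> bytes:
    """Build the bytes assumed to be stored on disk: zlib-compressed header + content."""
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\x00"
    return zlib.compress(header + data)


def decode_loose_object(raw: bytes) -> Tuple[str, bytes]:
    """Return (type, content), roughly what cat-file -t and cat-file -p show for a blob."""
    decompressed = zlib.decompress(raw)
    # The header ends at the first NUL byte; everything after it is the content.
    header, _, content = decompressed.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content


stored = encode_loose_object("blob", b"test content\n")
obj_type, content = decode_loose_object(stored)
print(obj_type)                  # blob
print(content.decode(), end="")  # test content
```

On a real repository, the same decoding could be applied to the bytes of a file under .git/objects, as long as the object has not been moved into a packfile.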

$ git commit -m 'test message'
[master (root-commit) 2b00dca] test message
 1 file changed, 1 insertion(+)
 create mode 100644 test.txt
$ tree .git/objects/
.git/objects/
├── 2b
│   └── 00dcae50af70bb5722033b3fe75281206c74da
├── 80
│   └── 865964295ae2f11d27383e5f9c0b58a8ef21da
├── d6
│   └── 70460b4b4aece5915caf5c68d12f560a9fe3e4
├── info
└── pack

Now there are two more objects in the objects directory. Then use the cat-file command to look at these two files.

$ git cat-file -t 2b00dcae50af70bb5722033b3fe75281206c74da
commit
$ git cat-file -p 2b00dcae50af70bb5722033b3fe75281206c74da
tree 80865964295ae2f11d27383e5f9c0b58a8ef21da
author jackeyzhe 1534670725 +0800
committer jackeyzhe 1534670725 +0800

test message
$ git cat-file -t 80865964295ae2f11d27383e5f9c0b58a8ef21da
tree
$ git cat-file -p 80865964295ae2f11d27383e5f9c0b58a8ef21da
100644 blob d670460b4b4aece5915caf5c68d12f560a9fe3e4	test.txt

You can see that one is a commit object and the other is a tree object. A commit object usually consists of four parts:

The hash of the tree object that snapshots the working directory (the tree value)

The commit message

The author and committer information

The hash of the parent commit

Since this is the first commit, it has no parent hash.
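The four parts can be picked out of a commit body with a few lines of Python. This is only a sketch: `parse_commit` is a hypothetical helper, and the sample text reuses the identity lines exactly as printed in the transcript above.

```python
from typing import Dict, List


def parse_commit(body: str) -> Dict[str, object]:
    """Split a commit body into its header fields and its message."""
    # Header lines and the message are separated by the first blank line.
    header_text, _, message = body.partition("\n\n")
    parents: List[str] = []
    fields: Dict[str, object] = {}
    for line in header_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            parents.append(value)  # a root commit has no parent line at all
        else:
            fields[key] = value
    fields["parents"] = parents
    fields["message"] = message.strip()
    return fields


root_commit = (
    "tree 80865964295ae2f11d27383e5f9c0b58a8ef21da\n"
    "author jackeyzhe 1534670725 +0800\n"
    "committer jackeyzhe 1534670725 +0800\n"
    "\n"
    "test message\n"
)

parsed = parse_commit(root_commit)
print(parsed["tree"])     # 80865964295ae2f11d27383e5f9c0b58a8ef21da
print(parsed["parents"])  # []
print(parsed["message"])  # test message
```

Keeping the parents in a list is a design choice for the sketch: it handles a root commit (no parents) and an ordinary commit (one parent) uniformly.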

A tree object can be thought of as a directory in a UNIX filesystem: it records the blob objects (files) and the other tree objects (subdirectories) that the directory contains. Next, let's take a look at how Git does version control.
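Before moving on, the directory-like structure of a tree can be sketched in Python. This sketch assumes that each entry is stored as `<mode> <name>`, a NUL byte, and the 20 raw bytes of the entry's object id; `build_tree_body` and `parse_tree_body` are illustrative names, and sorting plainly by name is a simplification.

```python
import hashlib
from typing import List, Tuple

Entry = Tuple[str, str, str]  # (mode, name, hex object id)


def build_tree_body(entries: List[Entry]) -> bytes:
    """Serialize entries in the assumed tree layout."""
    body = b""
    # Simplification: entries are sorted by name only.
    for mode, name, object_id in sorted(entries, key=lambda e: e[1]):
        body += f"{mode} {name}".encode("utf-8") + b"\x00" + bytes.fromhex(object_id)
    return body


def parse_tree_body(body: bytes) -> List[Entry]:
    """Read the entries back, similar in spirit to cat-file -p on a tree."""
    entries = []
    pos = 0
    while pos < len(body):
        nul = body.index(b"\x00", pos)
        mode, _, name = body[pos:nul].decode("utf-8").partition(" ")
        object_id = body[nul + 1:nul + 21].hex()  # 20 raw bytes = 40 hex characters
        entries.append((mode, name, object_id))
        pos = nul + 21
    return entries


entries = [("100644", "test.txt", "d670460b4b4aece5915caf5c68d12f560a9fe3e4")]
body = build_tree_body(entries)
header = f"tree {len(body)}".encode("ascii") + b"\x00"
print(len(body))                               # 36
print(parse_tree_body(body) == entries)        # True
print(hashlib.sha1(header + body).hexdigest())
```

If the assumed layout matches what Git writes, the last line should print the same id as the tree object holding only test.txt in the transcript above.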

$ echo 'version1' > version.txt
$ git add .
$ git commit -m 'first version'
[master 702193d] first version
 1 file changed, 1 insertion(+)
 create mode 100644 version.txt
$ echo 'version2' > version.txt
$ git add .
$ git commit -m 'second version'
[master 5333a75] second version
 1 file changed, 1 insertion(+), 1 deletion(-)
$ tree .git/objects/
.git/objects/
├── 1f
│   └── a5aab2a3cf025d06479b9eab9a7f66f60dbfc1
├── 29
│   └── 13bfa5cf9fb6f893bec60ac11d86129d56fcbe
├── 2b
│   └── 00dcae50af70bb5722033b3fe75281206c74da
├── 53
│   └── 33a759c4bdcdc6095b4caac19743d9445ca516
├── 5b
│   └── dcfc19f119febc749eef9a9551bc335cb965e2
├── 70
│   └── 2193d62ffd797155e4e21eede20897890da12a
├── 80
│   └── 865964295ae2f11d27383e5f9c0b58a8ef21da
├── d6
│   └── 70460b4b4aece5915caf5c68d12f560a9fe3e4
├── df
│   └── 7af2c382e49245443687973ceb711b2b74cb4a
├── info
└── pack
$ git cat-file -p 1fa5aab2a3cf025d06479b9eab9a7f66f60dbfc1
100644 blob d670460b4b4aece5915caf5c68d12f560a9fe3e4	test.txt
100644 blob 5bdcfc19f119febc749eef9a9551bc335cb965e2	version.txt
$ git cat-file -p 2913bfa5cf9fb6f893bec60ac11d86129d56fcbe
100644 blob d670460b4b4aece5915caf5c68d12f560a9fe3e4	test.txt
100644 blob df7af2c382e49245443687973ceb711b2b74cb4a	version.txt

For an unchanged file, Git simply stores the existing blob's hash in the new tree object. For a modified file, it creates a new blob object and records that new hash in the tree. Let's take another look at the commit objects.
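Because an unchanged file keeps the same blob hash, comparing two trees comes down to comparing hashes per path. The following Python sketch models the two trees listed above as plain dictionaries; `changed_paths` is a hypothetical helper, not a Git command.

```python
from typing import Dict, List

# Path -> blob hash, copied from the two tree listings above.
first_version_tree = {
    "test.txt": "d670460b4b4aece5915caf5c68d12f560a9fe3e4",
    "version.txt": "5bdcfc19f119febc749eef9a9551bc335cb965e2",
}
second_version_tree = {
    "test.txt": "d670460b4b4aece5915caf5c68d12f560a9fe3e4",
    "version.txt": "df7af2c382e49245443687973ceb711b2b74cb4a",
}


def changed_paths(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Return paths whose blob hash differs, including added or removed paths."""
    all_paths = old.keys() | new.keys()
    return sorted(path for path in all_paths if old.get(path) != new.get(path))


print(changed_paths(first_version_tree, second_version_tree))  # ['version.txt']
```

Only version.txt needed a new blob; test.txt is shared between both snapshots.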

$ git cat-file -p 5333a759c4bdcdc6095b4caac19743d9445ca516
tree 2913bfa5cf9fb6f893bec60ac11d86129d56fcbe
parent 702193d62ffd797155e4e21eede20897890da12a
author jackeyzhe 1534672270 +0800
committer jackeyzhe 1534672270 +0800

second version
$ git cat-file -p 702193d62ffd797155e4e21eede20897890da12a
tree 1fa5aab2a3cf025d06479b9eab9a7f66f60dbfc1
parent 2b00dcae50af70bb5722033b3fe75281206c74da
author jackeyzhe 1534672248 +0800
committer jackeyzhe 1534672248 +0800

first version

Now the commit objects carry parent information, so we can walk back through the history one commit at a time by following the parent hashes. Doing this by hand is tedious, though, so in practice we use git log to view the commit history.
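Following the parent hashes is essentially a linked-list traversal. Here is a minimal Python sketch that uses an in-memory dictionary in place of the object store; `walk_history` is an illustrative name, and the sketch assumes each commit has at most one parent (merge commits, which have several, are ignored).

```python
from typing import Dict, Iterator, Optional, Tuple

# Commit hash -> (parent hash or None, message), taken from the output above.
commits: Dict[str, Tuple[Optional[str], str]] = {
    "5333a759c4bdcdc6095b4caac19743d9445ca516":
        ("702193d62ffd797155e4e21eede20897890da12a", "second version"),
    "702193d62ffd797155e4e21eede20897890da12a":
        ("2b00dcae50af70bb5722033b3fe75281206c74da", "first version"),
    "2b00dcae50af70bb5722033b3fe75281206c74da":
        (None, "test message"),
}


def walk_history(start: str) -> Iterator[Tuple[str, str]]:
    """Yield (short hash, message) from the start commit back to the root."""
    current: Optional[str] = start
    while current is not None:
        parent, message = commits[current]
        yield current[:7], message
        current = parent


for short_hash, message in walk_history("5333a759c4bdcdc6095b4caac19743d9445ca516"):
    print(short_hash, message)
# 5333a75 second version
# 702193d first version
# 2b00dca test message
```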

Refs directory

Before we introduce the refs directory, let's take a look at the directory structure.

$ tree .git/refs/
.git/refs/
├── heads
│   └── master
└── tags

2 directories, 1 file
$ cat .git/refs/heads/master
5333a759c4bdcdc6095b4caac19743d9445ca516

In a newly initialized Git repository, the refs directory contains only two empty subdirectories, heads and tags. Because we have already made commits, Git has automatically created a reference called master, whose content is the hash of the latest commit object. At this point you might think: if we created such a reference for every commit, we would not need to remember each commit's hash. To go back to a version, we would just look up the reference and copy its value. That would make going back easy, but it would not be very useful, because we rarely need to go back, and for old versions the chance is close to zero. Git puts references to a more meaningful use: branches.

When we create a new branch, Git creates a new file in the .git/refs/heads directory. The new reference initially points to the latest commit of the branch we are currently on. Normally we do not edit these reference files by hand, but when we need to, Git provides the update-ref command, which changes a reference so that it points to a different commit object.
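The idea that a branch is just a small file holding a commit hash can be sketched in Python. This is a toy model working in a temporary directory, not a replacement for update-ref: `write_ref` and `read_ref` are hypothetical helpers, and the sketch ignores details such as locking and the fact that Git may also keep references in a packed-refs file.

```python
import os
import tempfile


def write_ref(git_dir: str, ref_name: str, object_id: str) -> None:
    """Store a commit hash in the file named by the reference."""
    path = os.path.join(git_dir, *ref_name.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="ascii") as f:
        f.write(object_id + "\n")


def read_ref(git_dir: str, ref_name: str) -> str:
    """Read back the commit hash a reference points to."""
    with open(os.path.join(git_dir, *ref_name.split("/")), encoding="ascii") as f:
        return f.read().strip()


with tempfile.TemporaryDirectory() as git_dir:
    write_ref(git_dir, "refs/heads/master", "5333a759c4bdcdc6095b4caac19743d9445ca516")
    # "Creating a branch" here just copies the current commit hash into a new file.
    write_ref(git_dir, "refs/heads/test", read_ref(git_dir, "refs/heads/master"))
    print(read_ref(git_dir, "refs/heads/test"))  # 5333a759c4bdcdc6095b4caac19743d9445ca516
    # Moving a branch means overwriting that one line.
    write_ref(git_dir, "refs/heads/test", "702193d62ffd797155e4e21eede20897890da12a")
    print(read_ref(git_dir, "refs/heads/test"))  # 702193d62ffd797155e4e21eede20897890da12a
```

This is why creating a branch is cheap: no objects are copied, only a 40-character hash is written.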

The files in the tags directory record which commit each tag refers to. When you tag a commit, a file named after the tag is created in the tags directory, and its content is the hash of that commit (for a lightweight tag; an annotated tag stores the hash of a tag object, which in turn points to the commit).

HEAD

Once there are several branches, how does Git know which branch we are currently on, and how does it switch between branches? The answer is in the HEAD file.

$ cat .git/HEAD
ref: refs/heads/master
$ git checkout test
Switched to branch 'test'
$ cat .git/HEAD
ref: refs/heads/test

As you can see, the HEAD file stores a reference to the current branch. When we switch branches and commit again, Git reads the value of the reference that HEAD points to and uses it as the parent of the new commit. We can also set HEAD manually with the symbolic-ref command, but only to a reference under refs.
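Resolving HEAD to a commit is a two-step lookup: read HEAD, then read the reference it names. The Python sketch below models this in a temporary directory. `resolve_head` is a hypothetical helper; the branch with a raw hash in HEAD (a detached HEAD) is an extra case not covered in the text above.

```python
import os
import tempfile
from typing import Optional, Tuple


def resolve_head(git_dir: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (reference name, commit hash) for the current HEAD."""
    with open(os.path.join(git_dir, "HEAD"), encoding="ascii") as f:
        head = f.read().strip()
    if head.startswith("ref: "):
        ref_name = head[len("ref: "):]
        ref_path = os.path.join(git_dir, *ref_name.split("/"))
        if not os.path.exists(ref_path):
            return ref_name, None  # branch exists in name only, no commit yet
        with open(ref_path, encoding="ascii") as f:
            return ref_name, f.read().strip()
    return None, head  # HEAD holds a raw hash (detached HEAD)


with tempfile.TemporaryDirectory() as git_dir:
    os.makedirs(os.path.join(git_dir, "refs", "heads"))
    with open(os.path.join(git_dir, "refs", "heads", "master"), "w", encoding="ascii") as f:
        f.write("5333a759c4bdcdc6095b4caac19743d9445ca516\n")
    with open(os.path.join(git_dir, "HEAD"), "w", encoding="ascii") as f:
        f.write("ref: refs/heads/master\n")
    print(resolve_head(git_dir))
    # ('refs/heads/master', '5333a759c4bdcdc6095b4caac19743d9445ca516')
```

In this model, switching branches only rewrites the single line in HEAD; the hash found through it becomes the parent of the next commit.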

Packfiles

That covers the files and directories we set out to examine at the beginning of the article. But as a filesystem, Git has another problem to deal with: space. As mentioned earlier, when a file is modified and committed, Git stores a new snapshot of it. Over a long period this can take up a lot of storage. Older versions are rarely needed, so it makes sense to store them more compactly and free up space.

The good news is that Git has its own gc (garbage collection) mechanism. When there are too many loose objects in the repository, Git runs the git gc command (we can also run it manually) to pack those objects. After packing, two new files appear: an .idx index file and a .pack file. The index file records the offset of each object within the packfile, so an object can be located quickly. Inside the pack, the latest version of a file is stored in full, while earlier versions are stored only as differences. This is how the space is saved.
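The idea of keeping one version in full and another only as differences can be shown with a toy example in Python. This is not Git's actual packfile or delta format: `make_delta` and `apply_delta` are made-up helpers built on the standard difflib module, and they only illustrate the principle of rebuilding an older version from a newer one plus a short list of copy and insert instructions.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

Op = Tuple  # ("copy", offset, length) or ("insert", literal bytes)


def make_delta(base: bytes, target: bytes) -> List[Op]:
    """Describe target as copies from base plus literal inserts."""
    ops: List[Op] = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no instruction: those base bytes are simply not copied.
    return ops


def apply_delta(base: bytes, ops: List[Op]) -> bytes:
    """Rebuild the target from the base and the instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


latest = b"line 1\nline 2\nline 3 changed\n"  # stored in full
older = b"line 1\nline 2\nline 3\n"           # stored only as a delta
delta = make_delta(latest, older)
print(apply_delta(latest, delta) == older)    # True
```

Storing the newest version in full favors the common case, since the latest content is read far more often than old history.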

That concludes our look at how Git works internally. Theory sticks best when combined with practice, so go and try these commands yourself!
