How to use Git to manage binary large objects 07/02 Update SLTechnology News&Howtos

2025-07-02 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01 Report

Use Git to manage so-called binary assets.

What everyone seems to agree on is that Git does not handle large binary files (blobs) well. Keep in mind that a binary blob is different from a large text file. Git has no problem versioning large text files, but it can do little with opaque binary files beyond committing each one as a large, solid black box.

Imagine that you are creating a complex 3D model for an exciting new first-person puzzle game. The source file is saved in a binary format and is 1GB in size. You commit it once, and your Git repository's history gains a 1GB commit. Later, you change the shape of a character's hair and commit the update. Because Git cannot separate the hair from the head or the rest of the model, you have to commit the entire 1GB again. Next, you change the character's eye color and commit that change: another 1GB commit. A few minor changes to one model have produced three commits of about 1GB each. That is a serious problem for a project that wants to version all of a game's assets.

A text format such as .obj is different. A commit still captures the full state of the file, just as it does for any other format, but an .obj file is a series of plain-text lines describing the model. If you modify the model and save it back out as .obj, Git can compare the two versions line by line, create a diff of the changes, and record a fairly small commit. The more refined the model becomes, the smaller the commits get, which is the standard Git use case. The file itself may be large, but Git uses a kind of overlay, or sparse storage, method to build a complete picture of the current state of your data.
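A quick way to see this difference is to let Git diff both kinds of file. The shell sketch below is only an illustration: model.obj and model.bin are hypothetical stand-ins, and their sizes are arbitrary. It commits a 1,000-line text model and 1 MiB of random bytes, changes one vertex in the text file and four bytes in the binary file, and then asks `git diff --numstat` for per-file line counts.

```shell
#!/bin/sh
# Sketch: how Git reports a small edit to a text model vs. a binary file.
# model.obj and model.bin are hypothetical stand-ins for real assets.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example User"

# Text model: 1000 vertex lines in Wavefront .obj syntax.
i=0
while [ "$i" -lt 1000 ]; do
  echo "v $i.0 0.0 0.0"
  i=$((i + 1))
done > model.obj

# Binary "model": 1 MiB of random bytes. It almost certainly contains
# NUL bytes, so Git classifies it as binary.
head -c 1048576 /dev/urandom > model.bin

git add model.obj model.bin
git commit -q -m "initial model"

# Edit one vertex in the text file...
sed 's/^v 500\.0 0\.0 0\.0$/v 500.0 1.0 0.0/' model.obj > model.obj.new
mv model.obj.new model.obj
# ...and overwrite four bytes near the start of the binary file.
printf 'edit' | dd of=model.bin bs=1 seek=100 conv=notrunc 2>/dev/null

# --numstat prints "added<TAB>removed<TAB>path"; binary files show "-".
git diff --numstat
```

For model.obj, Git reports one line added and one removed. For model.bin, it prints `-` in both count columns because it has no lines to compare, which is the black-box behavior described above.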

Not everything is plain text, though, and people want to use Git for all of it anyway, so solutions were needed. Several have emerged.

OSTree first emerged as a GNOME project designed to manage operating system binaries. It doesn't apply here, so I'll just skip it.

Git Large File Storage (LFS) is an open source project from GitHub that began as a fork of git-media. Git-media and git-annex are extensions to Git for managing large files. They are two different approaches to the same problem, and each has its own advantages. These are not official statements from either project, but in my experience, their distinguishing characteristics are:

Git-media uses a centralized model, with a repository for common assets. You tell git-media where your large files are stored, whether that is a hard drive, a server, or a cloud storage service, and every user in the project treats that location as the central, authoritative location for large files.

Git-annex favors a distributed model. Each user creates their own repository, and each repository gets a local .git/annex directory where large files are stored. The annexes are synchronized regularly, so every asset is available to each user as needed. Unless configured otherwise through annex-cost, git-annex prefers local storage to off-site storage.
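The annex-cost setting is a per-remote Git configuration value that git-annex reads. According to the git-annex documentation, lower costs are preferred, and the defaults are 100 for local repositories and 200 for remote ones. The sketch below assumes a remote named origin (the name, URL, and the value 50 are illustrative) and lowers its cost so git-annex would favor it. It only writes configuration, so it runs with plain Git.

```shell
#!/bin/sh
# Sketch: make git-annex prefer a particular remote by lowering its cost.
# The remote name "origin", its URL, and the value 50 are illustrative.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git remote add origin seth@example.com:/opt/jupiter.git

# remote.<name>.annex-cost: lower is preferred. git-annex's documented
# defaults are 100 for local repositories and 200 for remote ones.
git config remote.origin.annex-cost 50

git config --get remote.origin.annex-cost
```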

I have used both git-media and git-annex in production, so here is an overview of how each one works.

Git-media

Git-media is written in Ruby and is installed as a gem (a Ruby package); the installation instructions are on its website. Every user who wants to use git-media must install it, but because the gem is cross-platform, that is not a problem on any platform.

After installing git-media, you need to set some Git configuration options. These only need to be set once per machine, so write them to your global configuration:

$ git config --global filter.media.clean "git-media filter-clean"
$ git config --global filter.media.smudge "git-media filter-smudge"

In each repository where you want to use git-media, set an attribute that attaches the filter you just created to each file type you want to classify as "media". Don't get hung up on the terminology. "Asset" might be a better term, since "media" usually suggests audio, video, and photos, but you can just as easily classify 3D models, bakes, textures, and so on as media.

For example:

$ echo "*.mp4 filter=media -crlf" >> .gitattributes
$ echo "*.mkv filter=media -crlf" >> .gitattributes
$ echo "*.flac filter=media -crlf" >> .gitattributes

When you stage a file of one of these types, it is copied to the .git/media directory.
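Git-media does this staging-time copy through the clean and smudge filters configured earlier. To see the mechanism in isolation, the sketch below wires up a deliberately simplified, hypothetical filter named toy; it is not git-media's code. On `git add`, the clean command moves the file's bytes into .git/toy/<hash> and hands Git only the hash. On checkout, the smudge command swaps the hash back for the stored bytes. It needs only Git and a POSIX shell.

```shell
#!/bin/sh
# Toy clean/smudge filter pair in the spirit of git-media. The filter name
# "toy" and the .git/toy directory are hypothetical; this is not git-media.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example User"

# clean runs when a matching file is staged: it moves the real bytes into
# .git/toy/<hash> and hands Git only the hash to store in the commit.
git config filter.toy.clean \
  'tmp=$(mktemp); cat > "$tmp"; h=$(git hash-object --stdin < "$tmp"); d=$(git rev-parse --git-dir)/toy; mkdir -p "$d"; mv "$tmp" "$d/$h"; echo "$h"'
# smudge runs on checkout: it reads the stored hash and emits the real bytes.
git config filter.toy.smudge \
  'h=$(cat); cat "$(git rev-parse --git-dir)/toy/$h"'

echo "*.mp4 filter=toy -crlf" >> .gitattributes
head -c 1048576 /dev/urandom > clip.mp4   # stand-in for a large video

git add .gitattributes clip.mp4
git commit -q -m "add clip through the toy filter"

# The committed blob holds only a hash, not 1 MiB of video.
git cat-file -s HEAD:clip.mp4

# Delete the working copy and check it out again; smudge restores it.
rm clip.mp4
git checkout -- clip.mp4
ls -l clip.mp4
```

The committed blob stays a few dozen bytes no matter how large the file is. The trade-off is that the stored bytes live outside Git's object database, so something has to copy them to a shared location. In git-media's case, that is the job of `git media sync`.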

Assuming you already have a Git repository on the server, the final step is to tell your repository where the "mothership" is, that is, where media files are stored when they are pushed for all users to share. Set this in the repository's .git/config file, substituting your own user name, host, and path:

[git-media]
  transport = scp
  autodownload = false # defaults to true, which downloads assets automatically
  scpuser = seth
  scphost = example.com
  scppath = /opt/jupiter.git
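If you prefer not to edit .git/config by hand, `git config` can write the same section. This is only a sketch. The keys and values come from the example above (user seth, host example.com, path /opt/jupiter.git), and they are written into a throwaway repository so the result can be checked with `git config --get-regexp`.

```shell
#!/bin/sh
# Sketch: write the [git-media] section with git config instead of an editor.
# Values are the article's placeholders; substitute your own.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

git config git-media.transport scp
git config git-media.autodownload false   # git-media's default is true
git config git-media.scpuser seth
git config git-media.scphost example.com
git config git-media.scppath /opt/jupiter.git

# Show everything now stored under the [git-media] section.
git config --get-regexp '^git-media\.'
```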

If your server's SSH setup is more complex, for example a non-standard port or a non-default SSH key file, use ~/.ssh/config to set defaults for that host.
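Suppose the server listened on port 2222 and you used a dedicated key. A Host entry like the one below would let the plain `scphost = example.com` setting keep working. The port and the key file name are hypothetical. `Host`, `User`, `Port`, and `IdentityFile` are standard ssh_config keywords. The sketch appends the entry to your real ~/.ssh/config, so review it before running.

```shell
#!/bin/sh
# Sketch: per-host SSH defaults so git-media's scp transport can stay simple.
# Port 2222 and the key path are hypothetical; use your server's values.
set -e
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"
cat >> "$HOME/.ssh/config" <<'EOF'

Host example.com
    User seth
    Port 2222
    IdentityFile ~/.ssh/id_ed25519_assets
EOF
chmod 600 "$HOME/.ssh/config"
```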

Using git-media is no different from working with ordinary files: you stage and commit blobs exactly as you do any other file. The only difference in the workflow is that, at some point, you need to sync your assets (or media) to the shared repository.

When you want to publish assets for your team or back up your own data, use the following command:

$ git media sync

To replace a file in git-media with a changed version (for example, an audio file that has been sweetened, a matte painting that has been finished, or a video file that has been color graded), you must explicitly tell Git to update the media. This overrides git-media's default behavior of not copying a file that already exists remotely:

$ git update-index --really-refresh

When other members of your team (or you, on another machine) clone the repository, no assets are downloaded by default if the autodownload option in .git/config is set to false. A final git media sync takes care of that.

Git-annex

Git-annex has a slightly different workflow and defaults to local repositories, but the basic ideas are the same. You should be able to install git-annex from your distribution's software repository, or you can get it from its website. As with git-media, every user who uses git-annex must install it on their machine.

Its initial setup is simpler than git-media's. To create a bare repository on your server, run this command, substituting your own path:

$ git init --bare --shared /opt/jupiter.git

Then clone it onto your local computer and mark the clone as a git-annex location:

$ git clone seth@example.com:/opt/jupiter.git jupiter.clone
Cloning into 'jupiter.clone'...
warning: You appear to have cloned an empty repository.
Checking connectivity... done.
$ cd jupiter.clone
$ git annex init "seth workstation"
init seth workstation ok
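You can also rehearse the same setup on one machine, with a local directory standing in for the server. This is handy for checking the steps before touching a real host. In this sketch, the scratch paths are hypothetical, and the `git annex init` step runs only if git-annex is installed, since the Git half works without it.

```shell
#!/bin/sh
# Sketch: rehearse the server setup locally; a scratch directory plays the
# role of seth@example.com:/opt/jupiter.git.
set -e
base=$(mktemp -d)

# Bare, group-shared repository (the "server" side).
git init -q --bare --shared "$base/jupiter.git"

# Clone it; Git warns that the repository is empty, which is expected.
git clone "$base/jupiter.git" "$base/jupiter.clone"
cd "$base/jupiter.clone"

# Mark the clone as a git-annex location, if git-annex is available.
if command -v git-annex >/dev/null 2>&1; then
  git annex init "seth workstation"
else
  echo "git-annex not installed; skipping git annex init"
fi

git remote -v
```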

Rather than using filters to identify media assets or large files, you choose which files count as large by adding them with the git annex command:

$ git annex add bigblobfile.flac
add bigblobfile.flac (checksum) ok
(Recording state in Git...)

Commit it just like a normal file:

$ git commit -m 'added flac source for sound fx'
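To see what `git annex add` does to the file itself, the throwaway sketch below adds a 1 MiB stand-in for bigblobfile.flac (random bytes, not real audio) and commits it. On filesystems that support symbolic links, git-annex replaces the file with a symlink into .git/annex/objects. Git versions the small link, while the bytes stay in the annex. The script exits early if git-annex is not installed.

```shell
#!/bin/sh
# Sketch: what "git annex add" leaves behind in the working tree.
# Requires git-annex; bigblobfile.flac here is just random bytes.
set -e
if ! command -v git-annex >/dev/null 2>&1; then
  echo "git-annex not installed; nothing to show"
  exit 0
fi

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example User"
git annex init "sketch"

head -c 1048576 /dev/urandom > bigblobfile.flac
git annex add bigblobfile.flac
git commit -q -m 'added flac source for sound fx'

# The tracked path is now a symlink into .git/annex/objects.
ls -l bigblobfile.flac
# git-annex records which repositories hold the content.
git annex whereis bigblobfile.flac
```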

Pushing is different, though, because git annex uses its own branch to track assets. Depending on how you manage your repository, the first push may need the -u option:

$ git push -u origin master git-annex
To seth@example.com:/opt/jupiter.git
 * [new branch]      master -> master
 * [new branch]      git-annex -> git-annex

As with git-media, a normal git push does not copy your assets to the server; it only sends information about them. To actually share the files, run a sync command:

$ git annex sync --content

If someone else has committed shared assets, you need to pull them. Running git annex sync with the --content option, as above, also downloads assets that exist on the server but are not yet on your machine.

Both git-media and git-annex are flexible enough to use local repositories instead of a server, so they are just as useful for managing private local projects.

Git is a powerful and extensible system, and there is no reason to hesitate to use it. Try it today!
