Understanding git for real by exploring the .git directory

“Whoah, I’ve just read this quick tuto about git and oh my god it is cool. I feel now super comfortable using it, and I’m not afraid at all to break something.”— said no one ever.

Using git as a beginner is like visiting a new country for someone who can’t read/speak the local language. As soon as you know where you are and where to go, everything is fine, but the moment you get lost, the big troubles begin (#badMetaphor).

There are a lot of posts out there about learning the basic commands of git, this is not one of them. What I’m going to try here is a different approach.

New users are usually afraid by git and really, it is hard not to be. It is a powerful tool for sure but it is not really user friendly. Lots of new concepts, commands doing completely different things if a file is passed as a parameter or not, cryptic feedback …

I think that one way to overcome those first difficulties is to do a little more than just git commit/push, I think that if we take the time to really understand what git is really made of, it can save you from a lot of troubles.

Get into the .git

So, let’s begin. When you create a git repo, using git init, git creates this wonderful directory: the .git. This folder contains all the informations needed for git to work. To be clear, if you want to remove git from your project, but keep your project files, just delete the .git folder. But come on, why would you do that?

├── HEAD
├── branches
├── config
├── description
├── hooks
│ ├── pre-commit.sample
│ ├── pre-push.sample
│ └── ...
├── info
│ └── exclude
├── objects
│ ├── info
│ └── pack
└── refs
├── heads
└── tags

Here is what’s your .git will look like before your first commit:

HEAD

We’ll come to this later

config

This file contains the settings for your repository, here will be written the url of the remote for example, your mail, username, … . Every-time you use ‘git config …’ in the console it ends here.

description

Used by gitweb (kind of an ancestor of github) to display the description of the repo.

hooks

Here is an interesting feature. Git comes with a set of script that you can automatically run at every meaningful git phase. Those scripts, called hooks, can be run before/after a commit/rebase/pull… The name of the script dictate when to execute it. An example of a useful pre-push hook would be to test that all the styling rules are respected to keep consistency in the remote (the distant repository).

info — exclude

So you can put the files you don’t want git to deal with in your .gitignore file. Well the exclude file is the same except that it won’t be shared. If you don’t want to track your custom IDE related config files for example, even though most of the time .gitignore is enough (please tell me in the comments if you really use this one).

What’s inside a commit ?

Every-time you create a file, and track it, git compresses it and stores it into its own data structure. The compressed object will have a unique name, a hash, and will be stored under the object directory.

Before exploring the object directory we’ll have to ask ourselves what is a commit. So a commit is kind of a snapshot of your working directory, but it is a little bit more than that.

In fact when you commit git does only two things in order to create the snapshot of your working directory:
1. If the file didn’t change, git just adds the name of the compressed file (the hash) into the snapshot.
2. If the file has changed, git compresses it, stores the compressed file in the object folder. Finally it adds the name (the hash) of this compressed file into the snapshot.

This is a simplification, this whole process is a little bit complicated and will be part of a future post.

And once that snapshot is created, it will also be compressed and be name with an hash, and where all those compressed objects end up ? In the object folder.

├── 4c
│ └── f44f1e3fe4fb7f8aa42138c324f63f5ac85828 // hash
├── 86
│ └── 550c31847e518e1927f95991c949fc14efc711 // hash
├── e6
│ └── 9de29bb2d1d6434b8b29ae775ad8c2e48c5391 // hash
├── info // let's ignore that
└── pack // let's ignore that too

This is what the object directory looked like after I created one empty file file_1.txt and commited it. Please note that if the hash of your file is “4cf44f1e…”, git will store this file under a “4c” subdirectory and then name the file “f44f1…”. This little trick reduces by 255 the size of the /objects directory.

You see 3 hash right. So one would be for my file_1.txt, the other would be for the snapshot created when I commited. What is the third one ? Well because a commit is an object in itself, it is also compressed and stored in the object folder.

What you need to remember is that a commit is made of 4 things :

The name (a hash) of the working directory’s snapshot
A comment
Commiter information
Hash of the parent commit

And that’s it, look by yourself what happen if we uncompressed the commit file :

// by looking at the history you can easily find your commit hash
// you also don't have to paste the whole hash, only enough
// characters to make the hash unique

git cat-file -p 4cf44f1e3fe4fb7f8aa42138c324f63f5ac85828

This is what I get

tree 86550c31847e518e1927f95991c949fc14efc711
author Pierre De Wulf <>@gmail.com> 1455775173 -0500
committer Pierre De Wulf test@gmail.com> 1455775173 -0500

commit A

You see, we got as expected, the snapshot hash, the author, and my commit message. Two things are important here :

As expected, the snapshot hash “86550…” is also an object and can be found in the object folder.
Because it was my first commit, there is no parent.

What’s in my snapshot for real ?

git cat-file -p 86550c31847e518e1927f95991c949fc14efc711

100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 file_1.txt

And here we find the last object that was previously in our object store, the only object that was in our snapshot. It’s a blob, but that’s another story.

branch, tags, HEAD all the same

So now you understand that everything in git can be reached with the correct hash. Let’s take a look at the HEAD now. So what’s in that HEAD?

cat HEAD
ref: refs/heads/master

Okay, this is not an hash, and it makes sense, because the HEAD can be considered as a pointer to the tip of the branch you’re working on. And now if we look at what is in refs/heads/master here what we’ll see :

cat refs/heads/master
4cf44f1e3fe4fb7f8aa42138c324f63f5ac85828

Does that look familiar? Yes this is the exact same hash of our first commit. This shows you that branches and tags are nothing more than a pointer to a commit. Meaning that you can delete all the branches you want, all the tags you want, the commit they were pointing to are still going to be here. There are only be much more difficult to access. If you want to know more about all a this, go check the git book.

One last thing

So by now you should understand that all git does when you commit is “zipping” your current working directory and store into the objects folder with a bunch of other informations. But if you’re familiar enough with the tool you’ll now that you have complete control on what files should be included in the commit and what files should not.

I mean a commit isn’t really a snapshot of your working directory, it is a snapshot of the files you want to commit. And where does git store those file you want to commit before making the actual ? Well it stores them into the index file. We’re not going to dig deeper into it now, meanwhile if you’re really curious you can always take a look at this.

本站仅提供存储服务，所有内容均由用户发布，如发现有害或侵权内容，请点击举报。