Git Internals

Git under the hood

Git is a database that stores snapshots of a codebase throughout its development. Although most developers are familiar with the basic commands, many are unaware of the internal workings.

This is a hands-on tutorial on how git works internally.

Getting Started

Let’s set up a local git repository.

$ mkdir git-internals-tutorial ; cd $_ # create a new directory
$ git init # create a git repository locally

A git repository stores the snapshots in a .git folder. We can visualize the folder with a tree command.

$ tree .git
.git
├── HEAD
├── config
├── hooks
├── objects
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags

Git Hash Object

The git hash-object command computes the hash of a file's contents. This hash is the id under which the contents are stored in the database.

Let's say we have a file, file1.txt:

$ echo 'hello' > file1.txt
$ cat file1.txt
hello
$ git hash-object file1.txt
ce013625030ba8dba906f756967f9e9ca394464a

The hash is the id of the content in the git database. The object id is computed solely from the content and does not depend on the file name:

$ echo 'hello' > file2.md
$ git hash-object file2.md
ce013625030ba8dba906f756967f9e9ca394464a

The object id may be computed even without creating a file.

$ echo 'hello' | git hash-object --stdin
ce013625030ba8dba906f756967f9e9ca394464a
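
To see why the id depends only on the content, here is a minimal Python sketch of the computation, assuming a repository that uses SHA-1 (the default). Git hashes a short header, made of the object type, the content length, and a NUL byte, followed by the content itself. The function name blob_id is my own and is not part of git.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Compute the id git gives a blob, assuming a SHA-1 repository."""
    # Header format: "blob <size in bytes>" followed by a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# echo 'hello' writes the five letters plus a trailing newline.
print(blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

The file name never enters the hash, which is why file1.txt and file2.md produce the same id.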

Saving Git Blob object to Database

To write the content to the database, add the -w flag:

$ echo 'hello' | git hash-object --stdin -w
ce013625030ba8dba906f756967f9e9ca394464a

This stores the contents as a file in the objects directory. The first two characters of the object id become the directory name and the remaining 38 characters become the file name:

$ tree .git
.git
├── HEAD
├── config
├── hooks
├── objects
│   ├── ce
│   │   └── 013625030ba8dba906f756967f9e9ca394464a
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags

(Figure: blob object)

The git cat-file command reads the contents back from a git object:

$ git cat-file -p ce013625030ba8dba906f756967f9e9ca394464a
hello

The -p flag pretty-prints the contents of the object. Similarly, the -t flag outputs the type of the object, which in our case is a blob:

$ git cat-file -t ce013625030ba8dba906f756967f9e9ca394464a
blob
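
What cat-file does for a loose object can be sketched in a few lines of Python. This is a simplified illustration, assuming loose (unpacked) objects, which git stores zlib-compressed with the same "type size" header that is used for hashing. The function names are hypothetical, and the sketch works on bytes in memory instead of touching a real repository.

```python
import zlib
from typing import Tuple


def loose_object_path(object_id: str) -> str:
    """Path of a loose object: two characters for the directory, the rest for the file."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


def encode_loose_object(obj_type: str, content: bytes) -> bytes:
    """Build the bytes git would write to disk for a loose object."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return zlib.compress(header + content)


def decode_loose_object(raw: bytes) -> Tuple[str, bytes]:
    """Return (type, content), roughly what cat-file -t and cat-file -p report."""
    data = zlib.decompress(raw)
    header, _, content = data.partition(b"\0")  # split at the first NUL only
    obj_type, _, size = header.partition(b" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type.decode("ascii"), content


raw = encode_loose_object("blob", b"hello\n")
print(loose_object_path("ce013625030ba8dba906f756967f9e9ca394464a"))
print(decode_loose_object(raw))
```

Git may later move objects into pack files under objects/pack, which use a different layout that this sketch does not cover.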

Creating a Git Tree Object

Like a blob, a tree is another type of object in git. A tree contains blobs and other trees, much as a directory contains files and subdirectories.

Use the git write-tree command to write the files in the index to a tree:

$ git add file1.txt # add file1.txt to the index (staging area)
$ git write-tree # write the index contents to the database as a tree object
dca98923d43cd634f4359f8a1f897bf585100cfe

This writes to the database a tree that references the file's contents as a blob. The git ls-tree command shows the contents of the tree:

(Figure: tree with one blob object)

$ git ls-tree dca98923d43cd634f4359f8a1f897bf585100cfe
100644 blob ce013625030ba8dba906f756967f9e9ca394464a file1.txt
$ git cat-file -t dca98923d43cd634f4359f8a1f897bf585100cfe
tree
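
The ls-tree output is a readable view of a compact binary format. Below is a rough Python sketch of how a tree object is laid out and hashed, assuming a SHA-1 repository: each entry is the mode, a space, the name, a NUL byte, and the 20 raw bytes of the referenced object id. The helper names are mine, and the sketch leaves it to the caller to pass entries in the sorted order git expects.

```python
import hashlib
from typing import List


def tree_entry(mode: str, name: str, object_id: str) -> bytes:
    """One tree entry: "<mode> <name>", a NUL byte, then the raw 20-byte object id."""
    return (
        mode.encode("ascii")
        + b" "
        + name.encode("utf-8")
        + b"\0"
        + bytes.fromhex(object_id)
    )


def tree_id(entries: List[bytes]) -> str:
    """Hash a tree body with the usual "tree <size>" header in front of it."""
    body = b"".join(entries)
    header = b"tree " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# The single entry that ls-tree listed above.
entry = tree_entry("100644", "file1.txt", "ce013625030ba8dba906f756967f9e9ca394464a")
print(len(entry))
print(tree_id([entry]))
```

If the sketch matches git's format, the printed id should equal the tree id that write-tree returned above. That is worth checking locally instead of taking on trust.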

The tree object can also be found in the .git folder:

$ tree .git
.git
├── HEAD
├── config
├── hooks
├── objects
│   ├── ce
│   │   └── 013625030ba8dba906f756967f9e9ca394464a
│   ├── dc
│   │   └── a98923d43cd634f4359f8a1f897bf585100cfe
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags

Making A Git Commit

Most readers will be familiar with the git commit command, which creates a commit from the changes in the staging area and updates the branch reference to point to that commit. Let's use the low-level commands to do the same.

A commit object in git points to a tree, links to its parent commit (if any), and records information such as the commit message and details about the author and the committer.
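
As a rough illustration of that structure, the Python sketch below assembles the text of a commit object and hashes it, assuming a SHA-1 repository. The identity string and timestamp are made-up placeholders, so the resulting id will not match any id in this article. The function names are hypothetical as well.

```python
import hashlib
from typing import Optional


def commit_bytes(tree: str, parent: Optional[str], author: str,
                 committer: str, message: str) -> bytes:
    """Assemble the body of a commit object: header lines, a blank line, the message."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")  # omitted for the very first commit
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")


def commit_id(body: bytes) -> str:
    """Hash the body with the "commit <size>" header in front of it."""
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Placeholder identity: name, email, Unix timestamp and timezone offset.
who = "A U Thor <author@example.com> 1700000000 +0000"
body = commit_bytes("dca98923d43cd634f4359f8a1f897bf585100cfe", None, who, who,
                    "Commit Message")
print(body.decode("utf-8"))
print(commit_id(body))
```

Because the timestamps are part of the hashed text, running the same commit-tree command at a different time produces a different commit id, even for an identical tree and message.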

Let's start with the dca98923d43cd634f4359f8a1f897bf585100cfe tree that we created in the last section.

$ git commit-tree dca98923d43cd634f4359f8a1f897bf585100cfe -m "Commit Message" # create a commit that points to the tree, with the given message
1185a9903f20ca3059dcc96662fb05cc219bd654

(Figure: a commit has a message and points to a tree)

A commit with the id 1185a9903f20ca3059dcc96662fb05cc219bd654 has been created. You can view it with git log:

$ git log 1185a9903f20ca3059dcc96662fb05cc219bd654
commit 1185a9903f20ca3059dcc96662fb05cc219bd654
Author: Praveen Mathew <email@email.com>
Date:   Wed Feb 17 19:18:40 2021 +0530

    Commit Message

Let's create another commit on top of it using the low-level git commands:

$ touch file2.txt
$ git add file2.txt
$ git write-tree
5d649a7d0557d17655fbdd362b34158a36b0a39d
$ git commit-tree 5d649a7d0557d17655fbdd362b34158a36b0a39d -m "Second Commit Message" -p 1185a9903f20ca3059dcc96662fb05cc219bd654 # -p names the parent commit
7a4834354b351022ea9ddb818b7b2a889bdbb3cf

(Figure: a commit also points to its parent commit)

Now the log of commit 7a4834354b351022ea9ddb818b7b2a889bdbb3cf shows both commits:

$ git log 7a4834354b351022ea9ddb818b7b2a889bdbb3cf --oneline
7a48343 Second Commit Message
1185a99 Commit Message
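
The log can list both commits because each commit stores the id of its parent. A minimal sketch of that walk in Python, using an in-memory dictionary in place of the object database, might look like the following. The dictionary layout and the function name are my own, and the sketch follows a single parent only, whereas real git history can contain merge commits with several parents.

```python
from typing import Dict, Iterator, Optional, Tuple

# Stand-in for the object database: commit id -> parent id and message.
COMMITS: Dict[str, Dict[str, Optional[str]]] = {
    "1185a9903f20ca3059dcc96662fb05cc219bd654": {
        "parent": None,
        "message": "Commit Message",
    },
    "7a4834354b351022ea9ddb818b7b2a889bdbb3cf": {
        "parent": "1185a9903f20ca3059dcc96662fb05cc219bd654",
        "message": "Second Commit Message",
    },
}


def walk_history(commits: Dict[str, Dict[str, Optional[str]]],
                 start: str) -> Iterator[Tuple[str, str]]:
    """Yield (short id, message) from start back to the root commit."""
    current: Optional[str] = start
    while current is not None:
        commit = commits[current]
        yield current[:7], commit["message"] or ""
        current = commit["parent"]  # None at the root ends the walk


for short_id, message in walk_history(COMMITS, "7a4834354b351022ea9ddb818b7b2a889bdbb3cf"):
    print(short_id, message)
```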

However, notice that the current branch, the one HEAD points to, still has no commits. The new commits do not show up there:

$ git log
fatal: your current branch 'main' does not have any commits yet

Apart from creating the commit object, the git commit command does one more thing: it updates the branch reference that HEAD points to. The git update-ref command does this:

$ git update-ref refs/heads/main 7a4834354b351022ea9ddb818b7b2a889bdbb3cf # points main reference to the commit

(Figure: the reference has been updated to point to the latest commit)

Now the main branch points to the commit:

$ git log --oneline
7a48343 (HEAD -> main) Second Commit Message
1185a99 Commit Message
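
In the simplest case, a branch reference is a small text file under .git/refs/heads that holds a commit id, and HEAD is a text file that names the current branch. The Python sketch below imitates that on a temporary directory. It is a simplification under stated assumptions: real git also handles packed refs, reflogs and locking, and the function names here are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional


def update_ref(git_dir: Path, ref: str, commit_id: str) -> None:
    """Point a reference such as refs/heads/main at a commit id."""
    path = git_dir / ref
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(commit_id + "\n")


def resolve_head(git_dir: Path) -> Optional[str]:
    """Return the commit id HEAD resolves to, or None if the branch has no commits."""
    head = (git_dir / "HEAD").read_text().strip()
    if not head.startswith("ref: "):
        return head  # detached HEAD: the file holds a commit id directly
    ref_path = git_dir / head[len("ref: "):]
    if not ref_path.exists():
        return None  # branch exists in name only, like main before update-ref
    return ref_path.read_text().strip()


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    git_dir.mkdir()
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    print(resolve_head(git_dir))  # None: no commits on main yet
    update_ref(git_dir, "refs/heads/main", "7a4834354b351022ea9ddb818b7b2a889bdbb3cf")
    print(resolve_head(git_dir))  # 7a4834354b351022ea9ddb818b7b2a889bdbb3cf
```

The first lookup corresponds to the "does not have any commits yet" error shown earlier, and the second to the log output after update-ref.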

Thus, we have a branch with some commits.


Git Internals was originally published in Level Up Coding on Medium.


This content originally appeared on Level Up Coding - Medium and was authored by Praveen Mathew
