Extremely Linear Git History

By Gustav Westling

🔥 This blog post is introducing git-linearize aka zegl/extremely-linear.

One of the things I like to do in my projects is to make the git history as linear as possible.

Usually this means rebasing commits onto the main branch, but it can also mean only allowing merges in one direction: from feature branches into main, never the other way around. It kind of depends on the project.

Today I'm taking this one step further, and I'm introducing a new concept: extremely linear git history.

With our extremely linear history, the first commit in a repo has a hash that starts with 0000000, the second commit is 0000001, the third is 0000002, and so on!

Incremental version numbers make it easy to talk about revisions. You immediately know that version 230 comes after 200, and if you create 10 new versions per day, it's easy to have an intuition for how old a commit is based on your current latest version.

Figure: Extremely Linear Git History (on GitHub)

Backstory

In git, commits are referred to by the SHA-1 sum of the commit object itself. We can inspect a raw commit object using git cat-file, and verify its hash by reading the object from disk (where it is stored compressed with zlib) and computing the checksum. The resulting checksum should always be the same as the object's path: the first two hex characters are the directory name, and the remaining 38 are the file name.

$ git cat-file commit d9ef231178b5004c17fe4e4e1807728567a69b84
tree 2ccda48edc8ed3a96ac7576c57a5d645de2396f6
parent ad877a8e0240bdec6757781fdc3f2b45b8ced7a2
author Gustav Westling <gustav@westling.dev> 1669040942 +0100
committer Gustav Westling <gustav@westling.dev> 1669040995 +0100

blog: start working on a blogpost for extremely linear git history
$ pigz -d < .git/objects/d9/ef231178b5004c17fe4e4e1807728567a69b84 | sha1sum
d9ef231178b5004c17fe4e4e1807728567a69b84  -
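The same check can be sketched in Python. This is a minimal illustration, not part of git-linearize: the function names are made up for this example. It relies on the fact that git hashes a small header ("<type> <size>" plus a NUL byte) followed by the object body, and that a loose object file on disk is simply that same byte string compressed with zlib.

```python
import hashlib
import zlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the SHA-1 object ID git assigns to an object with this type and body."""
    # Git hashes "<type> <size in bytes>\0" followed by the raw body.
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def loose_object_id(compressed: bytes) -> str:
    """Hash the contents of a loose object file (the bytes under .git/objects/xx/...)."""
    # The file already contains header + body, so we only need to decompress it.
    return hashlib.sha1(zlib.decompress(compressed)).hexdigest()


if __name__ == "__main__":
    # The empty blob has a well-known ID, which makes a handy sanity check.
    print(git_object_id("blob", b""))
```

To check a real commit, pass the output of `git cat-file commit <id>` as the body with the type "commit", or feed the raw bytes of the file under .git/objects to loose_object_id.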

The message of the commit comes last in the object, and is easily modifiable. To change the checksum of the object, we can append junk data to the commit message. We just don't know yet which junk payload gives us the hash we want.

Crunching the numbers

There is no way to easily create content with the desired prefix (that would defeat the whole point of checksums). So we only have one option: testing many combinations of junk data until we find one that passes our criteria. It's basically the same mechanism that powers Bitcoin and other proof-of-work systems. In this case I guess it could be called "proof-of-work, for your work". Or "proof-of-proof-of-work-work"? Heh.
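A toy version of that search might look like the sketch below. It is not how githashcrash works internally: the decimal counter used as junk data and the names commit_id and find_magic are assumptions made for illustration, and single-threaded Python is far too slow for a real 7 or 8 character prefix. A 3-character prefix needs only about 4096 attempts on average, so it finishes almost instantly.

```python
import hashlib

# The commit body from the example above, as printed by `git cat-file commit`.
BODY = (
    b"tree 2ccda48edc8ed3a96ac7576c57a5d645de2396f6\n"
    b"parent ad877a8e0240bdec6757781fdc3f2b45b8ced7a2\n"
    b"author Gustav Westling <gustav@westling.dev> 1669040942 +0100\n"
    b"committer Gustav Westling <gustav@westling.dev> 1669040995 +0100\n"
    b"\n"
    b"blog: start working on a blogpost for extremely linear git history\n"
)


def commit_id(body: bytes) -> str:
    """SHA-1 of a commit object: the header "commit <size>\\0" followed by the body."""
    header = b"commit " + str(len(body)).encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def find_magic(body: bytes, prefix: str, max_tries: int = 1_000_000):
    """Append "magic: <counter>" lines until the commit ID starts with prefix.

    Returns (counter, new_body), or None if nothing matched within max_tries.
    """
    for nonce in range(max_tries):
        candidate = body + b"\nmagic: " + str(nonce).encode() + b"\n"
        if commit_id(candidate).startswith(prefix):
            return nonce, candidate
    return None


if __name__ == "__main__":
    found = find_magic(BODY, "000")
    if found is not None:
        nonce, new_body = found
        print(f"magic: {nonce} -> {commit_id(new_body)}")
```

Note that the size in the header changes whenever the junk data gets longer, which is why the sketch rebuilds the header for every candidate.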

I've been doing the commit message crunching using githashcrash by Mattias Appelgren.

On my MacBook, I'm able to generate and test ~15 million hashes per second. Since we're looking for an input that creates a predefined 8-character prefix, I should get a hit every 16^8 = 4 294 967 296 iterations on average, or about once every 5 minutes! You can make this process faster by using shorter prefixes: a 6-character prefix takes only about 1 second to generate (but it doesn't look as nice in some Git UIs, and since this is a project that prioritizes form over function, I'm willing to waste more CPU cycles).
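The arithmetic can be written out as a small sketch. The function names are made up for this example, and the 15 million hashes per second figure is the rate quoted above. Each hex character multiplies the search space by 16, so every extra character makes the search 16 times slower.

```python
def expected_attempts(prefix_len: int) -> int:
    """Average number of hashes to try before one matches a hex prefix of this length."""
    return 16 ** prefix_len


def expected_seconds(prefix_len: int, hashes_per_second: float) -> float:
    """Average wall-clock time to find a matching hash at the given hash rate."""
    return expected_attempts(prefix_len) / hashes_per_second


if __name__ == "__main__":
    rate = 15e6  # ~15 million hashes per second, as measured above
    for length in (6, 7, 8):
        print(f"{length} characters: {expected_seconds(length, rate):.1f} s on average")
```

These are averages only. Every attempt is an independent roll of the dice, so any single run can finish much sooner or take several times longer.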

For the commit example above, we can use githashcrash to find junk data that changes the hash prefix to be whatever we want, like 00000000. After some crunching, githashcrash finds that we can append magic: MTQIpN2AmwQA to our commit message to create our desired hash! Aaaah, it's glorious!

2022/11/21 15:56:04 Time: 12m19.962437166s
2022/11/21 15:56:04 Tested: 1.0845869819e+10
2022/11/21 15:56:04 14.66 MH/s
2022/11/21 15:56:04 Found: 00000000508749e5231fa5b43efcf7ac31385058
$ git cat-file commit 00000000508749e5231fa5b43efcf7ac31385058
tree 2ccda48edc8ed3a96ac7576c57a5d645de2396f6
parent ad877a8e0240bdec6757781fdc3f2b45b8ced7a2
author Gustav Westling <gustav@westling.dev> 1669040942 +0100
committer Gustav Westling <gustav@westling.dev> 1669041794 +0100

blog: start working on a blogpost for extremely linear git history

magic: MTQIpN2AmwQA

Automating

With some bash glue we can automate this process and extremely-linearize your branches in a single command. To test it out (please don't), install with brew install zegl/tap/git-linearize and in any repository run git linearize to "fix" it!

I've converted a recent toy project of mine to use this format of prefixes, and honestly, it looks really neat!

Appendix: leet haxor prefixes

  • Prefix all commits with "C0DE": git linearize --format "c0de"
  • Full collision (entire hash is zeros, then 000...1, etc.): git linearize --format "%040d" (takes ~10^33 years to run per commit)
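To make the format strings a bit more concrete, here is a sketch of how a printf-style format could map a commit's position in the history to its target prefix. This is an illustration only: the default of "%07d" is an assumption based on the 7-digit prefixes shown earlier, and target_prefix and years_to_crunch are hypothetical helpers, not functions from git-linearize.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def target_prefix(index: int, fmt: str = "%07d") -> str:
    """Hash prefix wanted for the commit at position index (0 for the first commit)."""
    # A format without a conversion, such as "c0de", gives every commit the same prefix.
    if "%" not in fmt:
        return fmt
    return fmt % index


def years_to_crunch(prefix_len: int, hashes_per_second: float) -> float:
    """Average number of years needed to find a hash with a given hex prefix length."""
    return 16 ** prefix_len / hashes_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(target_prefix(0), target_prefix(1), target_prefix(230))
    print(target_prefix(1, "%040d"))
    print(f"{years_to_crunch(40, 15e6):.1e} years per commit for a full collision")
```

At ~15 million hashes per second, a full 40-character match works out to a few times 10^33 years per commit, which is where the estimate in the list comes from.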

Check out zegl/extremely-linear on GitHub for testing git-linearize and the "shit" ("short git") wrapper!

Update: GPU-powered invisible hash crashing

git-linearize now uses lucky-commit as its hash generation backend. It's using your GPU for generating hashes, and is about 20x faster than the CPU based implementation. Wow! lucky-commit also cleverly uses only invisible whitespace characters for padding the commit messages.
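One way to picture invisible padding is to write a counter in binary, using a space for each 0 bit and a tab for each 1 bit. The sketch below shows only that general idea. It is not lucky-commit's actual encoding or placement, and whitespace_padding is a hypothetical helper.

```python
def whitespace_padding(nonce: int, width: int = 48) -> bytes:
    """Encode nonce as width characters of whitespace: space for a 0 bit, tab for a 1 bit."""
    bits = format(nonce, f"0{width}b")
    return bits.replace("0", " ").replace("1", "\t").encode()


if __name__ == "__main__":
    message = b"blog: start working on a blogpost for extremely linear git history\n"
    padded = message + b"\n" + whitespace_padding(5, width=8) + b"\n"
    # The padding changes the bytes being hashed, but not the visible text.
    print(padded.decode().strip() == message.decode().strip())
```

As long as the counter fits in the chosen width, the padding has a fixed length, so the object size recorded in the commit header stays the same for every candidate.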

Thanks to kinduff on Hacker News for telling me about lucky-commit.


By Gustav Westling,
2022-11-22 (wow, cool date!)
