One of the things that I like to do in my projects is to make the git history as linear as possible.
Usually this means rebasing commits onto the main branch, but it can also mean only allowing merges in one direction, from feature branches into main, never the other way around. It kind of depends on the project.
Today I'm taking this one step further, and I'm introducing a new concept: extremely linear git history.
With our extremely linear history, the first commit in a repo has a hash that starts with 0000000, the second commit is 0000001, the third is 0000002, and so on!
Incremental version numbers make it easy to talk about revisions. You immediately know that version 230 comes after 200, and if you create 10 new versions per day, it's easy to have an intuition for how old a commit is based on your current latest version.
Extremely Linear Git History (on GitHub)
In git, commits are referred to by the SHA-1 sum of the commit object itself. We can inspect a raw commit object using git cat-file, and verify its hash by reading the object from disk (compressed with zlib) and computing the checksum. The resulting checksum should always be the same as the file name.
$ git cat-file commit d9ef231178b5004c17fe4e4e1807728567a69b84
tree 2ccda48edc8ed3a96ac7576c57a5d645de2396f6
parent ad877a8e0240bdec6757781fdc3f2b45b8ced7a2
author Gustav Westling <email@example.com> 1669040942 +0100
committer Gustav Westling <firstname.lastname@example.org> 1669040995 +0100

blog: start working on a blogpost for extremely linear git history

$ pigz -d < .git/objects/d9/ef231178b5004c17fe4e4e1807728567a69b84 | sha1sum
d9ef231178b5004c17fe4e4e1807728567a69b84  -
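The same check can be sketched in Python. This is only a minimal illustration, not how git itself is implemented, and the helper names (git_object_id, loose_object_bytes, verify_loose_object) are made up for this post. The one fact it relies on is that git hashes a small header, the object type and size followed by a NUL byte, in front of the object body, and that this header plus body is what gets zlib-compressed on disk.

```python
import hashlib
import zlib


def git_object_id(kind: str, body: bytes) -> str:
    """Return the SHA-1 object ID git assigns to an object of this kind."""
    # Git hashes "<kind> <size>\0" followed by the raw body.
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def loose_object_bytes(kind: str, body: bytes) -> bytes:
    """Build the zlib-compressed bytes as they would sit under .git/objects/."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return zlib.compress(header + body)


def verify_loose_object(compressed: bytes, expected_id: str) -> bool:
    """Decompress a loose object and check that its SHA-1 matches its name."""
    return hashlib.sha1(zlib.decompress(compressed)).hexdigest() == expected_id


# The empty blob has a well-known ID, which makes for an easy sanity check.
print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Round trip: compress an object, then verify it against its own ID.
body = b"hello\n"
print(verify_loose_object(loose_object_bytes("blob", body), git_object_id("blob", body)))
```

For a real commit you would read the file under .git/objects/ in binary mode and pass its contents to verify_loose_object together with the directory and file name joined into the 40-character ID.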
The message of the commit comes last in the object, and is easily modifiable. To change the checksum of the object, we can append junk data to the commit message. We just don't know in advance which junk payload produces the hash we want.
There is no easy way to create content whose hash has the desired prefix (that would defeat the whole point of checksums). So we only have one option: testing many combinations of junk data until we find one that passes our criteria. It's basically the same mechanism that powers Bitcoin and other proof-of-work systems. In this case I guess it could be called "proof-of-work, for your work". Or "proof-of-proof-of-work-work"? Heh.
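To make the brute-force idea concrete, here is a small sketch of the search loop in Python. It is not the algorithm any particular tool uses: the "magic: <n>" counter trailer, the function names, and the example commit (which points at git's empty tree and uses a placeholder author) are all assumptions for illustration. It uses a 3-character prefix so that it finishes after a few thousand attempts.

```python
import hashlib
from itertools import count

# A hypothetical commit body; the tree is git's well-known empty tree.
EXAMPLE_COMMIT = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author Example Author <author@example.com> 1669040942 +0100\n"
    b"committer Example Author <author@example.com> 1669040942 +0100\n"
    b"\n"
    b"blog: an example commit message\n"
)


def commit_id(body: bytes) -> str:
    """SHA-1 of a commit object: header 'commit <size>\\0' plus the body."""
    header = f"commit {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def find_padding(body: bytes, prefix: str) -> tuple[bytes, str, int]:
    """Append 'magic: <n>' trailers until the commit ID starts with prefix.

    Returns the padded body, its ID, and the number of attempts made.
    """
    for attempt in count():
        candidate = body + b"\nmagic: " + str(attempt).encode() + b"\n"
        oid = commit_id(candidate)
        if oid.startswith(prefix):
            return candidate, oid, attempt + 1


new_body, oid, tries = find_padding(EXAMPLE_COMMIT, "000")
print(f"found {oid} after {tries} attempts")
print(new_body.decode())
```

Every extra hex character in the prefix multiplies the expected number of attempts by 16, which is why real tools lean on fast native code or the GPU.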
I've been doing the commit message crunching using githashcrash by Mattias Appelgren.
On my MacBook, I'm able to generate and test ~15 million hashes per second. Since we're looking for an input that creates a predefined 8-character prefix, I should get a hit every 16^8 = 4 294 967 296 iterations on average, or about once every 5 minutes! You can make this process faster by using shorter prefixes: a 6-character prefix takes only about 1 second to generate (but it doesn't look as nice in some Git UIs, and since this is a project that prioritizes form over function, I'm willing to waste more CPU cycles).
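The arithmetic behind those numbers is short enough to sketch. Each attempt matches an n-character hex prefix with probability 1/16^n, so the expected number of attempts is 16^n. The function names below are made up, and the 15 million hashes per second is just the rate measured above.

```python
def expected_attempts(prefix_len: int) -> int:
    """Average number of hashes needed to hit a given hex prefix length."""
    return 16 ** prefix_len


def expected_seconds(prefix_len: int, hashes_per_second: float) -> float:
    """Average wall-clock time at a given hash rate."""
    return expected_attempts(prefix_len) / hashes_per_second


RATE = 15_000_000  # ~15 million hashes per second

for n in (6, 7, 8):
    print(n, expected_attempts(n), round(expected_seconds(n, RATE), 1))
# 6 16777216 1.1
# 7 268435456 17.9
# 8 4294967296 286.3
```

These are averages: any single search can finish much sooner or take several times longer.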
For the commit example above, we can use githashcrash to find junk data that changes the commit prefix to be whatever we want, like 00000000. After some crunching, githashcrash finds that we can append magic: MTQIpN2AmwQA to our commit message to create our desired hash! Aaaah, it's glorious!
2022/11/21 15:56:04 Time: 12m19.962437166s
2022/11/21 15:56:04 Tested: 1.0845869819e+10
2022/11/21 15:56:04 14.66 MH/s
2022/11/21 15:56:04 Found: 00000000508749e5231fa5b43efcf7ac31385058

$ git cat-file commit 00000000508749e5231fa5b43efcf7ac31385058
tree 2ccda48edc8ed3a96ac7576c57a5d645de2396f6
parent ad877a8e0240bdec6757781fdc3f2b45b8ced7a2
author Gustav Westling <email@example.com> 1669040942 +0100
committer Gustav Westling <firstname.lastname@example.org> 1669041794 +0100

blog: start working on a blogpost for extremely linear git history

magic: MTQIpN2AmwQA
With some Bash glue we can automate this process, and extremely-linearize your branches in one single command. To test it out (please don't), install with brew install zegl/tap/git-linearize and, in any repository, run git linearize to "fix" it!
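Rewriting a whole branch has to go oldest-first, because every commit embeds its parent's hash: once a commit gets a new ID, all of its descendants change too. The sketch below shows that idea in Python on a toy in-memory chain. It is not how git-linearize is implemented; the function names, the placeholder author, the empty tree, and the "magic: <n>" trailer are all assumptions, and it uses a 2-digit "%02d" prefix so it runs in a fraction of a second.

```python
import hashlib
from itertools import count

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # git's empty tree


def commit_id(body: bytes) -> str:
    """SHA-1 of a commit object: header 'commit <size>\\0' plus the body."""
    header = f"commit {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def build_commit(parent, message: str) -> bytes:
    """Assemble a toy commit body; the root commit has no parent line."""
    lines = [f"tree {EMPTY_TREE}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines.append("author Example <dev@example.com> 1669040942 +0100")
    lines.append("committer Example <dev@example.com> 1669040942 +0100")
    return ("\n".join(lines) + f"\n\n{message}\n").encode()


def linearize(messages, fmt="%02d"):
    """Rewrite a chain oldest-first so commit i gets an ID starting with fmt % i."""
    parent = None
    ids = []
    for index, message in enumerate(messages):
        prefix = fmt % index
        base = build_commit(parent, message)  # uses the parent's *new* ID
        for attempt in count():
            oid = commit_id(base + f"\nmagic: {attempt}\n".encode())
            if oid.startswith(prefix):
                break
        ids.append(oid)
        parent = oid  # the next commit must point at the rewritten one
    return ids


for oid in linearize(["first", "second", "third"]):
    print(oid)
```

The important detail is the last line of the loop: the search for commit N+1 cannot start until commit N's final hash is known, so the work cannot be spread across the commits of one branch in parallel.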
I've converted a recent toy project of mine to use this format of prefixes, and honestly, it looks really neat!
git linearize --format "c0de"
git linearize --format "%040d" (takes ~10^33 years to run per commit)
Check out zegl/extremely-linear on GitHub for testing git-linearize and the "shit" ("short git") wrapper!
git-linearize now uses lucky-commit as its hash generation backend.
It uses your GPU for generating hashes, and is about 20x faster than the CPU-based implementation. Wow!
lucky-commit also cleverly uses only invisible whitespace characters for padding the commit messages.
Thanks to kinduff on Hacker News for telling me about lucky-commit.