One of the things that I like to do in my projects is to make the git history as linear as possible.
Usually this means rebasing commits onto the main branch, but it can also mean only allowing merges in one direction, from feature branches into main, never the other way around. It kind of depends on the project.
Today I'm taking this one step further, and I'm introducing a new concept: extremely linear git history.
With our extremely linear history, the first commit in a repo has a hash that starts with 0000000, the second commit is 0000001, the third is 0000002, and so on!
Incremental version numbers make it easy to talk about revisions. You immediately know that version 230 comes after 200, and if you create 10 new versions per day, it's easy to have an intuition for how old a commit is based on your current latest version.
Extremely Linear Git History (on GitHub)
In git, commits are referred to by the SHA‑1 sum of the commit object itself. We can inspect a raw commit object using git cat-file and verify its hash by reading it from disk (compressed with zlib) and testing the checksum. The resulting checksum should always be the same as the file name.
$ git cat-file commit d9ef231178b5004c17fe4e4e1807728567a69b84
tree 2ccda48edc8ed3a96ac7576c57a5d645de2396f6
parent ad877a8e0240bdec6757781fdc3f2b45b8ced7a2
author Gustav Westling <gustav@westling.dev> 1669040942 +0100
committer Gustav Westling <gustav@westling.dev> 1669040995 +0100

blog: start working on a blogpost for extremely linear git history
$ pigz -d < .git/objects/d9/ef231178b5004c17fe4e4e1807728567a69b84 | sha1sum
d9ef231178b5004c17fe4e4e1807728567a69b84 -
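The same check can be reproduced outside of git. Here is a minimal Python sketch, assuming the usual loose-object layout: a header of the form "<type> <size>", a NUL byte, then the body, all zlib-compressed on disk. The function names are made up for illustration.

```python
import hashlib
import zlib


def git_object_hash(obj_type: str, body: bytes) -> str:
    """SHA-1 of a git object: '<type> <size>' header, a NUL byte, then the body."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def loose_object_hash(compressed: bytes) -> str:
    """Hash the raw bytes of a file under .git/objects/ (zlib-compressed)."""
    return hashlib.sha1(zlib.decompress(compressed)).hexdigest()


# The empty blob is a handy sanity check: its hash is the same in every repository.
print(git_object_hash("blob", b""))
```

For a commit, the body is exactly what git cat-file commit prints, so passing "commit" and those bytes to git_object_hash should give back the commit's own hash.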
The message of the commit comes last in the object, and is easily modifiable. To change the checksum of the object, we can append junk data to the commit message. We just don't know in advance what the junk payload needs to be.
There is no way to easily create content whose hash has the desired prefix (that would defeat the whole point of checksums). So we only have one option: testing many combinations of junk data until we find one that meets our criteria. It's basically the same mechanism that powers Bitcoin and other proof-of-work systems. In this case I guess it could be called "proof-of-work, for your work". Or "proof-of-proof-of-work-work"? Heh.
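To make the brute-force idea concrete, here is a minimal Python sketch. It is not how githashcrash works internally; it simply appends a "magic: <n>" line with an increasing counter until the hash has the wanted prefix. The commit body, author, and function names are hypothetical, and the tree is git's well-known empty tree.

```python
import hashlib
from itertools import count


def commit_hash(body: bytes) -> str:
    """SHA-1 of a commit object: 'commit <size>' header, a NUL byte, then the body."""
    header = b"commit " + str(len(body)).encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def find_magic(commit: bytes, prefix: str) -> tuple[str, bytes]:
    """Append 'magic: <n>' lines until the commit hash starts with prefix."""
    for n in count():
        candidate = commit + b"\nmagic: " + str(n).encode() + b"\n"
        digest = commit_hash(candidate)
        if digest.startswith(prefix):
            return digest, candidate


# Hypothetical commit body, for illustration only.
commit = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 1669040942 +0100\n"
    b"committer A U Thor <author@example.com> 1669040942 +0100\n"
    b"\n"
    b"example commit message\n"
)

# A 3 character prefix needs about 16^3 = 4096 tries on average.
digest, padded = find_magic(commit, "000")
print(digest)
```

Pure Python is far too slow for 7 or 8 character prefixes, which is why a dedicated tool is used for the real thing.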
I've been doing the commit message crunching using githashcrash by Mattias Appelgren.
On my MacBook, I'm able to generate and test ~15 million hashes per second. Since we're looking for an input that creates a predefined 8 character prefix, I should get a hit every 16^8 => 4 294 967 296 iterations, about once every 5 minutes on average! You can make this process faster by using shorter prefixes: a 6 character prefix takes only about 1 second to generate (but it doesn't look as nice in some Git UIs, and since this is a project that prioritizes form over function, I'm willing to waste more CPU cycles).
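The arithmetic behind those estimates fits in a few lines. This is a small sketch assuming every hex character of the hash is uniformly random, so one specific prefix of length n matches with probability 1/16^n. The function name and the RATE constant are illustrative.

```python
def expected_seconds(prefix_len: int, hashes_per_second: float) -> float:
    """Average time to hit one specific hex prefix of the given length."""
    # Each hex character has 16 possible values, so a given prefix matches
    # with probability 16**-prefix_len. The expected number of tries is the
    # inverse of that, 16**prefix_len.
    return 16 ** prefix_len / hashes_per_second


RATE = 15_000_000  # ~15 million hashes per second, the figure quoted above

for length in (6, 7, 8):
    print(f"{length} chars: {expected_seconds(length, RATE):.1f} s")
```

At this rate the three prefix lengths come out at roughly 1 second, 18 seconds, and just under 5 minutes. These are averages: any single search can finish much sooner or take several times longer.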
For the commit example above, we can use githashcrash to find junk data that changes the commit prefix to be whatever we want, like 00000000. After some crunching, githashcrash finds that we can append magic: MTQIpN2AmwQA to our commit message to create our desired hash! Aaaah, it's glorious!
2022/11/21 15:56:04 Time: 12m19.962437166s
2022/11/21 15:56:04 Tested: 1.0845869819e+10
2022/11/21 15:56:04 14.66 MH/s
2022/11/21 15:56:04 Found: 00000000508749e5231fa5b43efcf7ac31385058
$ git cat-file commit 00000000508749e5231fa5b43efcf7ac31385058
tree 2ccda48edc8ed3a96ac7576c57a5d645de2396f6
parent ad877a8e0240bdec6757781fdc3f2b45b8ced7a2
author Gustav Westling <gustav@westling.dev> 1669040942 +0100
committer Gustav Westling <gustav@westling.dev> 1669041794 +0100

blog: start working on a blogpost for extremely linear git history

magic: MTQIpN2AmwQA
With some bash-glue we can automate this process, and extremely-linearize your branches in a single command. To test it out (please don't), install with brew install zegl/tap/git-linearize and in any repository run git linearize to "fix" it!
I've converted a recent toy project of mine to use this format of prefixes, and honestly, it looks really neat!
git linearize --format "c0de"
git linearize --format "%040d" (takes ~10^33 years to run per commit)
Check out zegl/extremely-linear on GitHub for testing git-linearize and the "shit" ("short git") wrapper!
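The two --format examples above suggest a printf-style pattern applied to the commit's position in the history. The sketch below shows how such a pattern could map an index to a target prefix, and where the 10^33 years figure comes from. The exact semantics of --format are an assumption here, and target_prefix is a hypothetical helper, not part of git-linearize.

```python
def target_prefix(fmt: str, index: int) -> str:
    """Apply a printf-style format to a commit index (assumed semantics)."""
    # A format without a placeholder, like "c0de", is used as-is.
    return fmt % index if "%" in fmt else fmt


SECONDS_PER_YEAR = 365.25 * 24 * 3600
RATE = 15_000_000  # ~15 million hashes per second on the CPU

print(target_prefix("%07d", 2))   # third commit, counting from zero
print(target_prefix("c0de", 2))

# "%040d" pins all 40 hex characters, i.e. the entire SHA-1.
years = 16 ** 40 / RATE / SECONDS_PER_YEAR
print(f"{years:.1e} years")
```

Decimal digits are also valid hex digits, which is why a plain decimal counter works as a hash prefix at all.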
git-linearize now uses lucky-commit as its hash generation backend. It uses your GPU to generate hashes, and is about 20x faster than the CPU-based implementation. Wow! lucky-commit also cleverly uses only invisible whitespace characters for padding the commit messages.
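Whitespace-only padding can carry just as much variation as a visible magic string. Here is a minimal sketch of one possible encoding, assuming a counter written in binary with a space for each 0 bit and a tab for each 1 bit. This is an illustration of the idea only and is not taken from lucky-commit's source, which may encode its padding differently.

```python
def whitespace_padding(counter: int, width: int = 48) -> str:
    """Encode counter as width characters: space for a 0 bit, tab for a 1 bit."""
    bits = format(counter, f"0{width}b")
    return "".join("\t" if bit == "1" else " " for bit in bits)


# 5 is 00000101 in binary, so the padding has tabs in the two '1' positions.
padding = whitespace_padding(5, width=8)
print(repr(padding))
```

Every counter value gives a different byte sequence, so each one produces a different commit hash, while the rendered commit message looks unchanged.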
Thanks to kinduff on Hacker News for telling me about lucky-commit.
By Gustav Westling,
2022-11-22 (wow, cool date!)
Discuss on Hacker News