More about fit

fit is my rewrite of Git from scratch. I’ve particularly enjoyed working through the low-level details and delighting in each milestone. On this page I have documented, and will continue to document, some points of interest along the way. This has helped me collect my thoughts, and I hope it also serves as a good roadmap if you attempt this journey yourself.

Existing resources

I initially stumbled upon Jon Gjengset’s stream, in which he follows the CodeCrafters course that walks through a basic Git implementation. His stream was so good that I stopped watching immediately. Ever wary of tutorial hell, I decided to attempt his journey myself. What he was doing sounded interesting, but the joy of solving the problem would’ve been lost had I watched him come up with the answer.

There’s also Thibault Polge’s Write Yourself a Git! which, while quite nice, makes some opinionated design decisions and leans quite heavily on Python’s relaxed duck typing and its indifference toward memory usage. My goal is to make design decisions that are sound in principle, even if they’re not perfect.

Rust is great for a write-up like this because those who wish to follow in an interpreted language like Python can easily do so by relaxing the rules of Rust, while those using a language like C++ can benefit from a discussion of lower-level details like memory management.

To avoid spoiling the answer for you in turn, I’ll do my best here to stay away from fully realized code. I found joy in writing low-level parsing code so I’ll try not to take that away.

Prerequisites

I can’t explain everything here, so there are some things I’ll assume are known to all of us:

Beginning

I balked at the idea that an interpreter runtime would be necessary to run fit, especially considering that I intended to actually use it. Thus Python and friends were out of the question.

I eventually settled on Rust because it is the compiled language with which I’m most familiar. There are also many other benefits to using Rust which have been argued many times in many places. Of course, use whatever you’d like if you do end up experimenting with a project like this.

For Rust users in particular, I agree with Jon’s decision to use clap, a very common command-line argument parser. Of course it would be faster and lighter on code size to write my own minimal parser, but for this project I wasn’t interested in squeezing out every single optimization available. I also think his incorporation of anyhow is reasonable because, again, “mostly fast” is sufficient for me.

Given these, here’s what one command looks like under this architecture:

// main.rs
use clap::{Parser, Subcommand};

mod commands;

#[derive(Subcommand, Debug)]
enum Command {
    /// docs here
    Init(commands::init::Options),
}

/// docs here
#[derive(Parser, Debug)]
#[command(version, about, long_about = None)]
struct Args {
    #[command(subcommand)]
    subcommand: Command,
}

fn main() -> anyhow::Result<()> {
    let args = Args::parse();

    match args.subcommand {
        Command::Init(options) => commands::init::invoke(options)?,
    }

    Ok(())
}

Then the command can live in its own file:

// of course, the corresponding mod declaration must exist in src/commands/mod.rs
// then, assuming we're in src/commands/init.rs:

use anyhow::Context;

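// clap::Args is the derive intended for a subcommand's option struct
#[derive(Debug, clap::Args)]
pub(crate) struct Options {
    /// docs
    // without `long` (or `short`), clap would treat this field as a positional argument
    #[arg(long)]
    some_option: bool,
}

pub(crate) fn invoke(options: Options) -> anyhow::Result<()> {
    // placeholder body: any fallible call can be annotated with `.context(...)`
    // so that the error message names the step that failed
    let cwd = std::env::current_dir().context("determine the current directory")?;
    if options.some_option {
        println!("running in {}", cwd.display());
    }
    Ok(())
}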

Following CodeCrafters

For this part, the CodeCrafters course linked above remains a pretty good resource. I find their examples a little unclear sometimes, so I’ve rephrased some here. However, feel free to use their course as a reference as well. In fact, if you subscribe to their platform, they’ll even test your implementation as you go.

First command: init

CodeCrafters begins with a pretty simple command: creating .git with the init subcommand. You just have to ensure the following tree exists:

.git
├── HEAD
├── objects
└── refs

3 directories, 1 file

HEAD should contain exactly the string ref: refs/heads/main\n. Eventually, you’d want to read the init.defaultBranch key from the Git config and respect it.
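Since init is the one command simple enough that there’s little to spoil, here is a minimal sketch of the layout described above. The name init_repository and its root parameter are my own placeholders for illustration, not fit’s actual API, and the sketch sticks to the standard library rather than anyhow so that it stands alone. It hard-codes main instead of consulting init.defaultBranch.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Create the minimal repository layout under `root`.
/// Hypothetical helper for illustration only; not fit's actual API.
pub fn init_repository(root: &Path) -> io::Result<()> {
    let git_dir = root.join(".git");

    // create_dir_all also creates `.git` itself and does not fail
    // if the directories already exist.
    fs::create_dir_all(git_dir.join("objects"))?;
    fs::create_dir_all(git_dir.join("refs"))?;

    // Assumption: the default branch is hard-coded to `main`;
    // a fuller version would respect init.defaultBranch.
    fs::write(git_dir.join("HEAD"), "ref: refs/heads/main\n")?;

    Ok(())
}
```

Note that this sketch overwrites an existing HEAD when run twice in the same directory, which you would probably want to guard against in a real implementation.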

finding .git

cat-file over blobs only

hash-object with --write

ls-tree

Leaving the CodeCrafters path

At this point I think CodeCrafters makes a serious error in the interest of brevity: they ask you to write a tree object directly from the filesystem, ignoring the index. Git has no corresponding facility, and the index really is an integral part of the flow of using Git. From here on I found Thibault Polge’s guide more helpful.
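To give a feel for what working with the index involves without giving away the interesting part, here is a small sketch that reads only the fixed 12-byte header at the start of .git/index: the signature DIRC, then a version number and an entry count, each a 32-bit big-endian integer. The names IndexHeader and parse_index_header are hypothetical, chosen for this example, and the String error type is a stand-in for whatever error handling you prefer. The entries themselves are deliberately left out.

```rust
/// The fixed-size header at the start of `.git/index`.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq, Eq)]
pub struct IndexHeader {
    pub version: u32,
    pub entry_count: u32,
}

/// Parse the first 12 bytes of an index file.
/// Hypothetical helper; it stops before the entries.
pub fn parse_index_header(bytes: &[u8]) -> Result<IndexHeader, String> {
    if bytes.len() < 12 {
        return Err(format!("index too short: {} bytes", bytes.len()));
    }
    if !bytes.starts_with(b"DIRC") {
        return Err("missing DIRC signature".to_string());
    }

    // Both fields are stored as 32-bit big-endian integers.
    let version = u32::from_be_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    let entry_count = u32::from_be_bytes([bytes[8], bytes[9], bytes[10], bytes[11]]);

    Ok(IndexHeader { version, entry_count })
}
```

Taking a byte slice rather than a file handle keeps the function easy to test against hand-built input; a real implementation would likely also reject versions it doesn’t understand.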

references

read config

tags

read index

ignores

compare index to disk

mutate the index: add and rm

commit

checkout

unpack-objects

fetch

clone

push