I’ve been committing generated code and vendor folders for over a year, and I’m delighted with the overall simplification of our continuous integration, build, and deployment flows. They all complete faster and more reliably. Teams are happier and more productive with these changes. Let me convince you to give it a try!

If you had asked me a couple of years ago what I think about committing generated code or third-party dependencies, I would probably have hit you with a stick. Why would you add code that you can download with one command?! I’ve turned exactly 180° on that.

Small note: My primary language is Go. We don’t use bloated frameworks, if we use one at all. Keeping dependencies low is one of the language’s core values (“A little copying is better than a little dependency”). This influences the benefits-and-disadvantages ratio. I probably wouldn’t commit node_modules ;)

Onboarding is such a joy

I recently had to change my work laptop. I was well versed in how the entire project was set up, yet setting it up from the ground up had always come with some issues or inconveniences. This time, I cloned the repository and ran the service I needed to test. Clone and run. That’s it.
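In practice, that really was just the following (the repository name and service path below are made up for illustration; with the vendor directory committed, the Go toolchain needs nothing else):

git clone git@github.com:example/our-monorepo.git
cd our-monorepo
go run ./cmd/some-service   # hypothetical path; dependencies are already in ./vendor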

Of course, with time, you need to set up all the other parts (access to private libraries, installing code generators, adding linters). However, hitting the ground running within fifteen minutes is pure joy. When you see the code for the first time, you can focus on the issue, instead of fighting with multiple problems, usually all at once. Let me configure that access to private libraries once I actually have to update one of them.

Working with it daily

One of the stepping stones toward my appreciation of generated code was always having it at my fingertips, whether I was reviewing a pull request or working on it.

When you are forced to include changed generated code or modified dependencies in the pull request, those changes are plainly visible during review. Just glancing at a pull request before asking for reviews makes it obvious whether I made unintentional changes. As a reviewer, I also appreciate being able to navigate through the code easily, without pulling and generating it myself.

While working locally, updating from the main branch is much quicker without re-downloading dependencies and re-generating code (this becomes an annoyance as the application grows). It reduces the possibility of running not-up-to-date code (I can’t be the only one who spent hours debugging, only to realize that I forgot to regenerate the code).
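To make the difference concrete, here is roughly what syncing with main looks like in both setups (the commands are illustrative; the generator step depends on your project):

# with committed dependencies and generated code
git pull --rebase origin main   # done - the tree already contains everything

# without them, you also have to remember to run
go mod download     # re-fetch dependencies
go generate ./...   # re-run code generators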

Continuous integration and building

The setup and the speed of building, testing, and deploying are significantly better. In our case, we achieved pipeline times three times shorter, while simplifying the CI jobs’ scripts.

The speed comes from parallelizing jobs: we run the build, unit tests, integration tests, and e2e tests simultaneously. At the same time, these jobs became simpler, as no additional setup is required (there is one complication - we need an extra job that checks whether the committed files are up to date). Achieving the same speed with caching and proxying is usually impossible, and certainly a lot harder to set up and to understand later.

The simplification comes from not having to worry about the dependencies. Building scripts can assume that all dependencies are there. In our case, this meant removing many lines and executing more linearly (for example, setting up access to private libraries is usually different between the local environment and CI). Without the burden of performance implications, the order of CI jobs can stay natural.
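As an illustration of the kind of lines that disappear, here is a sketch of a typical private-module setup (a common pattern, not necessarily our exact script; the token variable and organization name are placeholders):

# before: every CI job had to authenticate and fetch dependencies first
export GOPRIVATE='github.com/example-org/*'
git config --global url."https://${CI_TOKEN}@github.com/".insteadOf "https://github.com/"
go mod download
go build ./...

# after: everything the build needs is already in the repository
go build ./...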

Pull requests

While reading this article, one of my coworkers mentioned the size of pull requests as an issue. As of today, GitHub has a limit of 300 files for diffing. This API limit can make some processes, like linting, fail. It can be annoying when updating dependencies with breaking changes. (We wouldn’t be updating dependencies AND code in the same pull request otherwise, right?)

Safety?

Removed libraries, changed versions, or changed generator tools do not affect us when we roll back - we will always have the same state as the previous deployment. A library’s changed source code won’t make it to production without the developer’s approval. The safety benefit depends heavily on the tooling. In Go, these concerns are already taken care of by official tools, which makes this point moot.

(In Go, package consistency is ensured through checksums. This makes all published libraries and their versions immutable (version tags and the code can not change). Since libraries are proxied through proxy.golang.org, we do not have to worry about packages we depend on disappearing.)
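If you want to double-check that consistency locally, the standard Go toolchain already covers it:

go mod verify   # confirms modules in the local cache haven't been modified since download
go mod vendor   # regenerates ./vendor from go.mod/go.sum, so the committed copy stays reproducible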

A few tips

In case you are interested in trying it for yourself, here are a few tips I picked up along the way:

Mark desired files as generated

If your code host doesn’t collapse these files in diffs automatically, you should be able to configure it by marking them with linguist-generated=true in .gitattributes:

path/to/file.txt linguist-generated=true
*.generated.go   linguist-generated=true
/vendor/**       linguist-generated=true
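You can quickly verify that the attribute is picked up for the paths you care about (the path below is just an example):

git check-attr linguist-generated -- vendor/modules.txt
# expected output: vendor/modules.txt: linguist-generated: true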

Check if files are up to date in your CI

A lack of consistency would make the entire ordeal not worth it. Having a committed dependency that differs from the one specified in go.mod (or its equivalent) is a nightmare to debug. To combat this, our CI runs the same scripts we use to generate code and update dependencies. Afterward, we only have to check whether that left any uncommitted changes behind.

# First update dependencies and generate code

CHANGES=$(git status --porcelain=v1 | grep -v -e 'codecov.out' | wc -l)
if [ "$CHANGES" -ne 0 ]
then
	echo "Dependencies and/or generated code have changes"
	git status
	exit 1
fi

The grep -v -e part is there to ignore files that are dynamic and not required for running the code - for example, the code coverage report.
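The first comment in that snippet glosses over the update step itself; in a Go project it can be as simple as the following sketch (your code generators will differ):

go mod tidy         # sync go.mod and go.sum with the source code
go mod vendor       # refresh the committed ./vendor directory
go generate ./...   # re-run generators referenced by //go:generate directives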

What about the size of the repository?

I was surprised to learn that there is no easy way to check repository size over time (at least on our GitHub account). I would rather show you a pretty graph than ask you to take my word for it: except for the first commit, where we added all modules, committing dependencies had a negligible effect on our git repository. There was no slowdown on daily pulls, no issues with deployments, and no need to add --depth 1 to git clone. (For size perspective: the repository weighs 190 MB, while the external libraries weigh 120 MB.)
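If you want to check your own repository, git can at least report the current on-disk size of the object database, even if it won’t draw a graph over time:

git count-objects -vH
# size-pack shows the compressed size of everything stored in .git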

Convinced to give it a try?