Last night, Lars, Mark, and I discussed Jeremy Kun's The communicative value of using Git well post. While a lot of our discussion was spawned by the article, we did go off-piste a little, and I hope that my notes below will enlighten you all as to a bit of how we see revision control these days. It was remarkably pleasant to read an article where the comments section wasn't a cesspool of horror, so if this posting encourages you to go and read the article, don't stop when you reach the bottom -- the comments are good and useful too.


This was a fairly non-contentious article for us though each of us had points we wished to bring up and chat about it turned into a very convivial chat. We saw the main thrust of the article as being about using the metadata of revision control to communicate intent, process, and decision making. We agreed that it must be possible to do so effectively with Mercurial (thus deciding that the mention of it was simply a bit of clickbait / red herring) and indeed Mark figured that he was doing at least some of this kind of thing way back with CVS.

We all discussed how knowing the fundamentals of Git's data model improved our ability to work wih the tool. Lars and I mentioned how jarring it has originally been to come to Git from revision control systems such as Bazaar (bzr) but how over time we came to appreciate Git for what it is. For Mark this was less painful because he came to Git early enough that there was little more than the fundamental data model, without much of the porcelain which now exists.

One point which we all, though Mark in particular, felt was worth considering was that of distinguishing between published and unpublished changes. The article touches on it a little, but one of the benefits of the workflow which Jeremy espouses is that of using the revision control system as an integral part of the review pipeline. This is perhaps done very well with Git based workflows, but can be done with other VCSs.

With respect to the points Jeremy makes regarding making commits which are good for reviewing, we had a general agreement that things were good and sensible, to a point, but that some things were missed out on. For example, I raised that commit messages often need to be more thorough than one-liners, but Jeremy's examples (perhaps through expedience for the article?) were all pretty trite one-liners which perhaps would have been entirely obvious from the commit content. Jeremy makes the point that large changes are hard to review, and Lars poined out that Greg Wilson did research in this area, and at least one article mentions 200 lines as a maximum size of a reviewable segment.

I had a brief warble at this point about how reviewing needs to be able to consider the whole of the intended change (i.e. a diff from base to tip) not just individual commits, which is also missing from Jeremy's article, but that such a review does not need to necessarily be thorough and detailed since the commit-by-commit review remains necessary. I use that high level diff as a way to get a feel for the full shape of the intended change, a look at the end-game if you will, before I read the story of how someone got to it. As an aside at this point, I talked about how Jeremy included a 'style fixes' commit in his example, but I loathe seeing such things and would much rather it was either not in the series because it's unrelated to it; or else the style fixes were folded into the commits they were related to.

We discussed how style rules, as well as commit-bisectability, and other rules which may exist for a codebase, the adherence to which would form part of the checks that a code reviewer may perform, are there to be held to when they help the project, and to be broken when they are in the way of good communication between humans.

In this, Lars talked about how revision control histories provide high level valuable communication between developers. Communication between humans is fraught with error and the rules are not always clear on what will work and what won't, since this depends on the communicators, the context, etc. However whatever communication rules are in place should be followed. We often say that it takes two people to communicate information, but when you're writing commit messages or arranging your commit history, the second party is often nebulous "other" and so the code reviewer fulfils that role to concretise it for the purpose of communication.

At this point, I wondered a little about what value there might be (if any) in maintaining the metachanges (pull request info, mailing list discussions, etc) for historical context of decision making. Mark suggested that this is useful for design decisions etc but not for the style/correctness discussions which often form a large section of review feedback. Maybe some of the metachange tracking is done automatically by the review causing the augmentation of the changes (e.g. by comments, or inclusion of design documentation changes) to explain why changes are made.

We discussed how the "rebase always vs. rebase never" feeling flip-flopped in us for many years until, as an example, what finally won Lars over was that he wants the history of the project to tell the story, in the git commits, of how the software has changed and evolved in an intentional manner. Lars said that he doesn't care about the meanderings, but rather about a clear story which can be followed and understood.

I described this as the switch from "the revision history is about what I did to achieve the goal" to being more "the revision history is how I would hope someone else would have done this". Mark further refined that to "The revision history of a project tells the story of how the project, as a whole, chose to perform its sequence of evolution."

We discussed how project history must necessarily then contain issue tracking, mailing list discussions, wikis, etc. There are exist free software projects where part of their history is forever lost because, for example, the project moved from Sourceforge to Github, but made no effort (or was unable) to migrate issues or somesuch. Linkages between changes and the issues they relate to can easily be broken, though at least with mailing lists you can often rewrite URLs if you have something consistent like a Message-Id.

We talked about how cover notes, pull request messages, etc. can thus also be lost to some extent. Is this an argument to always use merges whose message bodies contain those details, rather than always fast-forwarding? Or is it a reason to encapsulate all those discussions into git objects which can be forever associated with the changes in the DAG?

We then diverted into discussion of CI, testing every commit, and the benefits and limitations of automated testing vs. manual testing; though I think that's a little too off-piste for even this summary. We also talked about how commit message audiences include software perhaps, with the recent movement toward conventional commits and how, with respect to commit messages for machine readability, it can become very complex/tricky to craft good commit messages once there are multiple disparate audiences. For projects the size of the Linux kernel this kind of thing would be nearly impossible, but for smaller projects, perhaps there's value.

Finally, we all agreed that we liked the quote at the end of the article, and so I'd like to close out by repeating it for you all...

Hal Abelson famously said:

Programs must be written for people to read, and only incidentally for machines to execute.

Jeremy agrees, as do we, and extends that to the metacommit information as well.