Conventionally we communicate programming ideas with talks, papers, and blog posts. But we can also communicate ideas with entire codebases. If someone finds a security exploit, she’ll sometimes publish a proof of concept to prove the exploit isn’t just theoretical.

Now let’s say the exploit PoC comes with a ton of command-line flags: verbose mode, configuration options, output formats, the whole works. The writer is now communicating something subtly different: not just that the exploit exists, but that she wants you to experiment with it. She’s making it as easy as possible for you to play with the exploit yourself and come up with variations and consequences.
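To make that concrete, here is a rough sketch of what such an interface might look like. Everything in it is hypothetical: the program name `poc`, the flag names, and the `run` function are made up for illustration, and the "exploit" does nothing but echo its arguments.

```python
import argparse
import json


def build_parser():
    # Hypothetical flags: each one is an invitation for the reader to experiment.
    parser = argparse.ArgumentParser(
        prog="poc",
        description="Harmless stand-in for a proof of concept.",
    )
    parser.add_argument("target", help="placeholder name of the thing to test")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="include extra detail in the report")
    parser.add_argument("--config", default=None,
                        help="path to an optional configuration file")
    parser.add_argument("--format", choices=["text", "json"], default="text",
                        help="how to print the report")
    return parser


def run(argv):
    args = build_parser().parse_args(argv)
    # A real PoC would do its work here; this sketch only simulates a result.
    report = {"target": args.target, "status": "simulated"}
    if args.verbose:
        report["config"] = args.config
    if args.format == "json":
        return json.dumps(report, sort_keys=True)
    return " ".join(f"{key}={value}" for key, value in report.items())


print(run(["demo-target", "--format", "json"]))
```

A bare script with none of these flags would prove the same point. The flags are what tell the reader she's expected to poke at it.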

This makes codebases like any other kind of communication medium. There are different styles you can use to say subtly different things. There are also different “genres”, or overt things you use the codebase to say. Some examples:

Of Genres

This is by no means exhaustive. A codebase can:

Of Styles

Styles are qualities inside a codebase that encourage certain kinds of reader responses. Done right, they also signal intent: that you want those responses. Again, not exhaustive.

Repeated style patterns across codebases can communicate fundamental values. If a company releases a lot of open source projects, and many of them have a single flourish, it shows that engineers at the company get to work with exciting technology. If everything uses TDD, it shows that the company treats TDD as a fundamental part of the development process.

Like other forms of media, a codebase style can show unintentional things about the author. What does it say about a programmer when their favorite test string is “boobies”?

Considerations

For a codebase to communicate well, the message needs to be easy to understand. So the code layout needs to be very simple. Most production code doesn’t communicate well because it’s split across files in nested folders. That’s hard to navigate. Try to have only a few files and make the most important ones clear.

Boilerplate code hampers communication by adding noise to the code. If you’re trying to communicate something that’s language-agnostic, use low-boilerplate languages for the codebase.

Science code is often bad at communicating because it’s a spaghetti mess. At the same time, lots of abstractions and indirection tax the reader’s working memory. The code needs to be organized but not too organized.

If you have an intentional genre and style, you should make that clear in a project readme. Better overexplain than underexplain. I think it also makes sense to talk directly to the reader. Call out specific lines, go into detail.

Unlike in regular programs, redundancy helps with communication. Present the same thing in increasingly complex ways, or show variations and their consequences.
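As a small illustration of that kind of redundancy, here is a sketch that says the same thing three times. The functions and the pricing example are invented for this purpose. In a regular program you'd keep only one of them.

```python
# Version 1: the plainest form, one step per line.
def total_v1(prices):
    total = 0
    for price in prices:
        total += price
    return total


# Version 2: the same result, compressed into the builtin.
def total_v2(prices):
    return sum(prices)


# Version 3: a variation with a consequence. Adding a discount
# changes the answer, and the reader can see exactly where.
def total_v3(prices, discount=0.0):
    return sum(price * (1 - discount) for price in prices)


prices = [10, 20]
print(total_v1(prices), total_v2(prices), total_v3(prices, discount=0.5))
```

Reading the three side by side shows what stays fixed and what changes, which a single "best" version can't show.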

All this means that communicative codebases will look different from other kinds of code. That’s definitely true of genre. Production code can also have styles, but it’ll be harder for the casual reader to notice them.

Questions I have about this


This was issue #181 of Computer Things.