(This article is a repost from my personal blog at https://marccgk.github.io)
I recently wrote about why I think Object Oriented Programming is not a good tool to write code.
In that post I wrote:
However, in my experience, OOP is taken as the gold standard for software development by the majority of professionals.
Object Oriented Programming is often taught and, therefore, learned, right after the basic programming constructs: variables, conditionals, loops, basic types and, in languages like C++, pointers and basic memory allocation / deallocation. Hence, most people have only approached medium to large codebases through an OOP lens. Although there are large codebases written in non-OOP languages, e.g. the Linux kernel, written in C, I’d argue most sizable codebases currently in development are built using Object Oriented Design (not an empirical study; this is based on my experience working on C++ codebases).
In this blog post I’d like to present a different starting point for approaching writing software. Obligatory caveat: this is not a one-size-fits-all solution, nor does it pretend to be. Writing code is still a craft more than a science, so, in my opinion, every “do this, don’t do that” piece of advice should be paired with measurable pros and cons.
Having said that, I should also note that, as a community, software developers don’t even agree on which characteristics belong on the list of advantages and disadvantages, nor on what priority each should be given.
Data-Oriented Design
I assume you’re familiar with Mike Acton’s Data-Oriented Design CppCon 2014 talk. If not, go watch it now, I’ll wait.
Data-Oriented Design (DOD) is a fundamental concept: understanding how the hardware works (at a high level) is a prerequisite to writing instructions for a computer to execute. However, DOD doesn’t tell you how to write code. In this conversation with Christer Ericson that Mike Acton made public, Christer explains why DOD is not a modeling approach:
Q: Isn’t Data-Oriented Design just dataflow programming?
Or: Why DoD isn’t a modelling approach at all.
I’ll quote an exchange with Christer Ericson’s answer (with permission) on this subject:
No, not dataflow programming.
Dataflow programming, as well as OOD for that matter, is a modelling approach, and specifically for dataflow programming, by expressing data connectivity as a graph.
While neither me, nor Mike [Acton], nor Noel [Llopis] has ever provided an “official definition” of DOD (nor have we really been interested in doing so, nor would we necessarily 100% agree on one), I would argue that DOD is not a modelling approach, in fact it’s the opposite thereof.
As Mike has eloquently pointed out elsewhere, computation is a transformation of data from one form into another. DOD is a methodology (or just a way of thinking) where we focus on streamlining that transformation by focusing on the input and output data, and making changes to the formats to make the transformation “as light” as possible. (Here there are two definitions of “light.” Mike would probably say that “light” means efficient in terms of compute cycles. I would probably say “light” means in terms of code complexity. They’re obviously related/connected. The truth might be in between.)
I say this is the opposite of a modelling approach, because modelling implies that you are abstracting or not dealing with the actual data, but in DOD we do the opposite, we focus on the actual data, to such a degree that we redefine its actual layout to serve the transformation.
DOD is, in essence, anti-abstraction (and therefore not-modelling).
In practice, we find a balance between the anti-abstraction of pure DOD and the needs of the code’s architecture and its components.
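To make that idea concrete, here’s a minimal sketch of my own (it’s not from the exchange above, and the particle example and field names are made up): the same transformation written twice, first over an “object-centric” layout, then over data laid out specifically to serve that transformation.

    // Object-centric layout: every field of a particle travels together,
    // even though this particular update only touches position and velocity.
    struct Particle {
        float px, py, pz;
        float vx, vy, vz;
        float mass;
        int   materialId;
    };

    void update_positions_aos(Particle* particles, int count, float dt) {
        for (int i = 0; i < count; ++i) {
            particles[i].px += particles[i].vx * dt;
            particles[i].py += particles[i].vy * dt;
            particles[i].pz += particles[i].vz * dt;
        }
    }

    // Data-oriented layout: only the fields this transformation reads and
    // writes are stored together, so every cache line loaded is fully used.
    struct Particles {
        float* px; float* py; float* pz;
        float* vx; float* vy; float* vz;
        int    count;
    };

    void update_positions_soa(Particles* p, float dt) {
        for (int i = 0; i < p->count; ++i) {
            p->px[i] += p->vx[i] * dt;
            p->py[i] += p->vy[i] * dt;
            p->pz[i] += p->vz[i] * dt;
        }
    }

Nothing about the problem was modeled or abstracted; the data layout itself was changed to make the transformation lighter.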
What modeling approach should I follow, then?
I don’t know. I think different people will find that different modeling approaches work better or worse for them, and that is alright.
Personally, I just like to keep things simple, really simple, and build complexity only when it’s 100% warranted. I’ve seen enough “clean up”, “refactor”, “simplify”, and “address technical debt” tasks to understand that complex solutions in the name of some abstract goal (e.g. extensibility, the single responsibility principle, DRY, etc…) don’t work out in the end.
So, what does simple mean to me?
Simplicity
Simple means straightforward. Simple code is code that is easy to understand by itself, no knowledge of other code or foreign concepts needed.
This immediately rules out most of the C++ STL, like:
    std::vector<int> v; // initialized somehow
    int sum = std::accumulate(v.begin(), v.end(), 0);
This is not simple code; it’s just short, and short doesn’t make it simple. To understand this 2 line snippet you must be familiar with a lot of concepts: std::vector (more complex than arrays), iterators, STL algorithms, templates or generic metaprogramming, etc…
On the other hand, this is a lot easier to understand:
    int* values;    // initialized somehow
    int valueCount; // initialized somehow
    int sum = 0;
    for (int i = 0; i < valueCount; ++i) {
        sum += values[i];
    }
This last snippet only requires basic programming concepts to be completely understood: variables, pointers, loops, etc…
Warranted complexity
There’s only so much that can be done with simple loops and accumulating values. With a growing number of features and, therefore, a growing amount of code, complexity will invariably materialize. However, striving for simplicity should always be at the front of your mind: fight entropy and don’t give in to the path of least resistance.
There are inherently complex problems, though, and they require complex solutions. One such problem is extensibility, that is, a way for the user to add custom functionality to a piece of software without modifying its source code. This problem justifies, and in fact necessitates, a complex solution, e.g. a plugin architecture.
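To give a rough idea of the shape such a solution can take (this is a hedged sketch; the interface, names and limits are hypothetical, not a recommendation of any particular design), a C-style plugin boundary might look like this:

    // Hypothetical plugin interface: the host defines a small, stable struct
    // of function pointers and each plugin fills one in.
    struct PluginAPI {
        const char* name;
        void (*on_load)(void);    // called once after the plugin is loaded
        void (*update)(float dt); // called every frame / tick
        void (*on_unload)(void);  // called before the plugin is unloaded
    };

    // A plugin would export one well-known entry point, e.g.
    //   extern "C" PluginAPI* GetPluginAPI(void);
    // which the host resolves from a shared library (dlsym / GetProcAddress)
    // and registers here.
    struct PluginList {
        PluginAPI* plugins[64];
        int        count;
    };

    void register_plugin(PluginList* list, PluginAPI* api) {
        if (list->count < 64) {
            list->plugins[list->count++] = api;
            if (api->on_load) { api->on_load(); }
        }
    }

    void update_plugins(PluginList* list, float dt) {
        for (int i = 0; i < list->count; ++i) {
            if (list->plugins[i]->update) { list->plugins[i]->update(dt); }
        }
    }

The complexity is real (a stable binary interface, symbol loading, lifetime rules), but it buys something concrete: users can extend the program without touching its source.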
How do you translate that into a modeling approach?
As I said, this can take different forms for different people. I can only speak for myself, but what follows is what I’ve found works for me and what I’ve learned from other people who take a similar approach to code.
Write straightforward code
Code is easier to understand when it’s written linearly, in a procedural way. That generally takes the form of long functions that do one thing conceptually, but might be composed of multiple sub-tasks that accomplish the main one. These sub-tasks are not extracted into their own standalone functions, though, until there’s a reason for it, that is, until the code is already duplicated in 2 or 3 places and the commonality is large enough (i.e. 90%+) that the cognitive overhead of another function is outweighed by its usefulness.
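As an illustration of what that looks like in practice (the task and names here are made up for the example), here is one conceptual task written linearly, with its sub-tasks kept inline as commented blocks instead of being extracted:

    // Compute a few statistics over a buffer of samples, in straight-line
    // code. Each sub-task stays inline until duplication elsewhere
    // justifies pulling it out into its own function.
    void process_samples(const float* samples, int sampleCount,
                         float* outMean, float* outMin, float* outMax)
    {
        if (sampleCount <= 0) {
            *outMean = 0.0f; *outMin = 0.0f; *outMax = 0.0f;
            return;
        }

        // Sub-task: accumulate the sum for the mean.
        float sum = 0.0f;
        for (int i = 0; i < sampleCount; ++i) {
            sum += samples[i];
        }

        // Sub-task: track the min/max range.
        float minValue = samples[0];
        float maxValue = samples[0];
        for (int i = 1; i < sampleCount; ++i) {
            if (samples[i] < minValue) { minValue = samples[i]; }
            if (samples[i] > maxValue) { maxValue = samples[i]; }
        }

        // Sub-task: write the results out.
        *outMean = sum / (float)sampleCount;
        *outMin = minValue;
        *outMax = maxValue;
    }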
However, there’s also room for small functions. In contrast with large functions that implement main features, short functions tend to be helper functions that get called from the large ones and return immediately. Some examples of such functions are allocations (e.g. a temporary allocator might just bump a pointer), make/create functions, math libraries (e.g. vectors, matrices), etc…
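For instance, a temporary allocator that just bumps a pointer really is only a handful of lines. This is a sketch under the usual caveats (no alignment handling, and no policy on running out of memory beyond returning null):

    // Scratch/temporary allocator: allocation is a pointer bump, and
    // everything is "freed" at once by resetting the offset.
    struct ScratchAllocator {
        unsigned char* buffer; // backing memory, owned elsewhere
        int capacity;
        int used;
    };

    void* scratch_alloc(ScratchAllocator* a, int size) {
        if (a->used + size > a->capacity) {
            return 0; // out of scratch memory; the caller decides what to do
        }
        void* result = a->buffer + a->used;
        a->used += size;
        return result;
    }

    void scratch_reset(ScratchAllocator* a) {
        a->used = 0; // release everything allocated this frame / task
    }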
Physically separating the code like this has several advantages both for the programmer and the compiler:
Context: code is rarely useful in isolation; context matters a lot. Having code close together provides more context than spreading it across multiple functions.
Shallow call stacks: deep call stacks are a symptom of complexity. “Vertical implementations” (i.e. functions that call other functions) keep code away from useful context, making understanding harder.
Compiler optimizations: keeping secondary functions short and shallow helps the compiler with better optimization opportunities, like inlining.
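As a small (made-up) illustration of that last point: a short, shallow helper defined next to its caller is trivial for the compiler to inline, and the call stack never gets deeper than one level:

    // Short helper: visible in the same translation unit, easy to inline.
    static float clampf(float x, float lo, float hi) {
        return (x < lo) ? lo : (x > hi) ? hi : x;
    }

    // The long, linear function calls it and gets the value back immediately.
    void apply_brightness(float* pixels, int pixelCount, float amount) {
        for (int i = 0; i < pixelCount; ++i) {
            pixels[i] = clampf(pixels[i] + amount, 0.0f, 1.0f);
        }
    }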