
25/03/2019

How to Write 80% Less Code

by Jos Jong

The list of attempts to make it possible to write less code is endless. Many of them use knowledge of the data model to automate a considerable part of the database access, such as these:

  • Fourth generation programming languages (4GL)
  • Computer Assisted Software Engineering (CASE) tools
  • Model-driven development (MDD)
  • Domain specific languages
  • Object relational mapping tools (ORM)
  • Object-oriented databases (OODBMS)
  • Modern no-code and low-code environments (e.g., Codemotion and Mendix)

And, of course, using the data model as a basis for standard behavior makes a lot of sense. Unless you think that passing data from one format to another, gluing strings together to form database queries, and manually handling the results of those queries should be regarded as business logic. I dare to claim that, on average, only 20% of the code that we write is pure, actual business logic.

But, then again, why did these things never go mainstream? Where did they go wrong?

It’s not just that these tools were proprietary (vendor lock-in), had slow builds (e.g., as a result of code generation), or were not general purpose enough. We can solve all of that with the right implementation and by making things open source. No, there are far bigger issues with all the attempts listed above.

These issues are the subject of my recent book, Vertically Integrated Architectures. Instead of proposing yet another variant on the same theme, I went back to the drawing board, so to speak, and asked some fundamental questions. Why do so many of these solutions end up needing manual performance tuning? How compatible is a model-driven solution with a heterogeneous application environment? How general purpose can we make them? And what is the fundamental cause of the so-called database impedance mismatch?

The core of my analysis comes down to these two challenges:

  1. Data model versioning: Any solution that supports only a single version of the data model cannot automatically cope with data model migrations involving external systems and non-web clients.
  2. When to compile: Any solution that tries to generate and/or compile code ahead of any client request coming in is doomed to be inefficient as soon as requests get more complex.

Let’s go into more detail on both subjects.


Data Model Versioning

Imagine a completely isolated application with no external interfaces that only supports its own web client. In such a scenario, changes to the data model never result in compatibility issues. You might only have to create a script to migrate a test or production database.

However, as soon as we add an external API or a native (mobile) client, we eventually have to cope with API versioning. This is because data model changes are likely to impact the API, and it is not always possible to force external systems to keep to the same release schedule. Even when we are dealing with a system under the control of the same owner (company, department), not being dependent on each other’s release schedules is, in a way, the whole point of having subsystems.

So, what do we do in such a scenario? We add a version number to the API, and we write extra code to convert data where needed. This means there must be a place to put that code: the service implementation. This is why code generation doesn’t work in such scenarios. Code generated based on the current version of the data model will not be able to automatically cope with older versions of that same data model.
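
As a rough sketch, assume a hypothetical customer model whose v2 split the single v1 "name" field into first and last name. All class and field names below are illustrative; the point is that this hand-written conversion glue has to live somewhere in the service implementation:

```java
// Minimal sketch (hypothetical names): the conversion code a versioned API
// forces into the service layer when the data model has moved on.
public class CustomerService {

    record CustomerV1(String name, String email) {}                         // older wire format
    record CustomerV2(String firstName, String lastName, String email) {}   // current model

    // Conversion glue for clients still sending the v1 payload.
    static CustomerV2 fromV1(CustomerV1 v1) {
        String[] parts = v1.name().split(" ", 2);
        return new CustomerV2(parts[0], parts.length > 1 ? parts[1] : "", v1.email());
    }

    public static void main(String[] args) {
        CustomerV1 oldPayload = new CustomerV1("Ada Lovelace", "ada@example.com");
        CustomerV2 current = fromV1(oldPayload);
        System.out.println(current);
        // Code generated from the *current* model alone has no place to hang
        // this kind of version-specific conversion.
    }
}
```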

I dedicated a whole chapter in my book to the solution that I propose: making a data model aware of its own version history.

 

When to Compile

The when-to-compile dilemma relates to the languages that we use. No matter how fancy the language, 3GL programs get translated into in-memory instructions: allocate memory, do some calculations, copy data from one memory location to another. As soon as we want to persist data, we must manually write code to call a database API. And this is where the trouble starts. We could save a lot of code if the language automatically read from and wrote to the database whenever needed. But a 3GL compiler doesn’t look ahead. It does not know that, after loading customer data, we also want to load the customers’ orders and related product data.
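
A minimal sketch of what that hand-written database access looks like, using plain JDBC against a made-up shop schema (it assumes the H2 in-memory driver is on the classpath; all table and column names are illustrative). Each step is a separate network round trip, because the compiler has no way of knowing that the orders and products are needed next:

```java
import java.sql.*;

public class ManualLoading {
    public static void main(String[] args) throws SQLException {
        try (Connection con = DriverManager.getConnection("jdbc:h2:mem:shop")) {
            setUpSampleData(con);

            // Round trip 1: the customer
            try (PreparedStatement ps = con.prepareStatement(
                    "SELECT name FROM customer WHERE id = ?")) {
                ps.setLong(1, 1);
                try (ResultSet rs = ps.executeQuery()) {
                    rs.next();
                    System.out.println("customer: " + rs.getString("name"));
                }
            }

            // Round trip 2: the customer's orders
            try (PreparedStatement ps = con.prepareStatement(
                    "SELECT id, product_id FROM orders WHERE customer_id = ?")) {
                ps.setLong(1, 1);
                try (ResultSet orders = ps.executeQuery()) {
                    while (orders.next()) {
                        // Round trips 3..N+2: one product lookup per order (the classic N+1)
                        try (PreparedStatement pp = con.prepareStatement(
                                "SELECT name FROM product WHERE id = ?")) {
                            pp.setLong(1, orders.getLong("product_id"));
                            try (ResultSet product = pp.executeQuery()) {
                                product.next();
                                System.out.println("  product: " + product.getString("name"));
                            }
                        }
                    }
                }
            }
        }
    }

    // Illustrative schema and data, only here to make the sketch self-contained.
    static void setUpSampleData(Connection con) throws SQLException {
        try (Statement st = con.createStatement()) {
            st.execute("CREATE TABLE customer(id BIGINT PRIMARY KEY, name VARCHAR(50))");
            st.execute("CREATE TABLE product(id BIGINT PRIMARY KEY, name VARCHAR(50))");
            st.execute("CREATE TABLE orders(id BIGINT PRIMARY KEY, customer_id BIGINT, product_id BIGINT)");
            st.execute("INSERT INTO customer VALUES (1, 'Ada')");
            st.execute("INSERT INTO product VALUES (10, 'Keyboard'), (11, 'Mouse')");
            st.execute("INSERT INTO orders VALUES (100, 1, 10), (101, 1, 11)");
        }
    }
}
```

Every extra association we touch adds another round trip, and none of this plumbing is business logic.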

Object-relational mappers (ORMs) and object databases tried to solve this. But without special trickery (smart caching, API options, hints, etc.), they simply fail, because the language, and thus the compiler, cannot know whether certain queries to the database should be combined. Remember that network round trips are extremely costly! On the one hand, we need so-called lazy loading to prevent loading the whole database for every query; on the other hand, we cannot do without combining data requests (non-lazy loading) to get reasonable performance in all scenarios.
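
To illustrate the dilemma, here is a JPA-style sketch (it assumes a JPA provider such as Hibernate and a configured persistence unit named "shop"; the entity names are illustrative). The lazy mapping avoids loading everything, but combining the queries still has to be spelled out by hand, because the compiler cannot decide it for us:

```java
import jakarta.persistence.*;
import java.util.List;

@Entity
class Customer {
    @Id Long id;
    String name;

    // Lazy by default: touching customer.orders later triggers extra queries.
    @OneToMany(mappedBy = "customer", fetch = FetchType.LAZY)
    List<Order> orders;
}

@Entity @Table(name = "orders")
class Order {
    @Id Long id;
    @ManyToOne Customer customer;
    @ManyToOne Product product;
}

@Entity
class Product {
    @Id Long id;
    String name;
}

class FetchingDemo {
    public static void main(String[] args) {
        EntityManager em = Persistence
                .createEntityManagerFactory("shop")
                .createEntityManager();

        // Lazy: one query now, plus a surprise query for every association touched later.
        Customer lazy = em.find(Customer.class, 1L);

        // Combined by hand: we must *tell* the ORM to fetch orders and products
        // together, via an explicit fetch join.
        List<Customer> combined = em.createQuery(
                "SELECT DISTINCT c FROM Customer c " +
                "JOIN FETCH c.orders o JOIN FETCH o.product " +
                "WHERE c.id = :id", Customer.class)
            .setParameter("id", 1L)
            .getResultList();
    }
}
```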

The only way out is a language that can be analyzed in the context of a complete service request, just like database queries (for example, SQL) are only analyzed and compiled based on the full request that comes in.
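
For contrast, here is a hypothetical single statement over the same made-up shop schema used in the JDBC sketch above. Because the whole request is known up front, a query engine can plan and compile it as one unit and fetch customer, orders, and products in a single round trip:

```java
// Illustrative only: the same data as the N+1 sketch, expressed as one request.
public class CombinedQuery {
    public static void main(String[] args) {
        String sql =
            "SELECT c.name AS customer, p.name AS product " +
            "FROM customer c " +
            "JOIN orders o ON o.customer_id = c.id " +
            "JOIN product p ON p.id = o.product_id " +
            "WHERE c.id = 1";
        System.out.println(sql);  // one statement, one network round trip
    }
}
```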


About the Author

Jos Jong is a self-employed independent senior software engineer and software architect. He has been developing software for more than 35 years, in both technical and enterprise environments. His knowledge ranges from mainframes, C++, and Smalltalk to Python, Java, and Objective-C. He has worked with numerous different platforms and kept studying to learn about other programming languages and programming concepts. In addition to developing many generic components, some code generators, and advanced data synchronization solutions, he has prototyped several innovative database and programming language concepts. He is an abstract thinker who loves to study the fundamentals of software engineering, and is always eager to reflect on new trends. You can connect with him on Twitter @jos_jong_nl.

This article was contributed by Jos Jong, author of Vertically Integrated Architectures.