Development → the Fewer, the Better – Programming Languages Opportunities

29.08.2016

Pavel Cherniavskiy


Many programming languages include redundant features, and much of the progress in language design comes from making that unnecessary functionality go away.

There are many programming languages out there, and yet new ones keep appearing. Are they better than the languages that already exist? To answer that, we first need to define what “better” means with regard to a programming language.

If we look at the historical trends, one recurring approach stands out: language designers examine an existing language, identify the redundant functionality in it, and create a new language without that functionality.

According to Antoine de Saint-Exupéry, perfection is achieved not when there is nothing more to add, but when there is nothing left to take away.

This article explores several features of popular programming languages that are commonly recognized as redundant. We will also review a few features that might soon be classified as redundant as well.

Unlimited Room for Error

When the first computers were invented, the only way to write a program was to use machine code or assembly language. Machine code can express virtually anything the processor can do, because it corresponds directly to the processor’s instruction set. Decent software can, of course, be written this way, but among the millions of programs written in machine code there are some that freeze, crash, or misuse the hardware they run on.

Let’s be realistic. Today you probably write code in a high-level programming language, yet even then it is not easy to get things right. For every correct program there is a multitude of similar programs that simply do not work.

With machine code, the opportunities to create faulty software are practically unlimited. After all, you can execute any processor instruction at any point in time. The correct instruction sequences are just a small subset of all possible instruction sequences.

The first programmers quickly discovered that writing programs in machine code inevitably leads to errors. Moreover, the resulting code is long and very difficult to read. Assembly language was invented to address this, but in reality it did not help much: assembly programs were slightly easier to read, but they still mirrored the machine instructions one for one, so the risk of writing an invalid program remained unchanged.

High-Level Programming Languages

Some time later, having suffered through machine code and assembly language, programmers finally came up with high-level programming languages. The first of these languages are not that popular today. C, however, is an example of an early high-level language, although it still operates at a rather low level.

You can write any valid program in C, but there is still a good chance that you end up with an invalid program that quickly fails. Since the language is an abstraction over machine code, some sequences of machine instructions cannot be expressed in it at all. Most of those sequences are invalid programs, so an abstraction that makes them impossible to write is a good thing.

Now, let’s take a close look at what happened: the transition from machine code to high-level programming languages eliminated certain capabilities. Machine code lets us write virtually anything, while a high-level language narrows that range to some extent. We have made our peace with that, because the advantages of this approach outweigh its drawbacks.

GOTO

In 1968, Edsger W. Dijkstra published his famous letter “Go To Statement Considered Harmful”. According to Dijkstra, the very concept of the goto statement is flawed, and programs would be much better off without it. This sparked a decades-long dispute, which eventually led to the understanding that Dijkstra was, in fact, right. Many popular programming languages today do quite well without a goto statement (Java and JavaScript, to name a few).

Because goto turned out to be redundant, removing it from a language does not reduce the number of valid programs that can be written in that language. It does, however, reduce the number of invalid ones. Do you see the pattern? Remove something redundant and you move forward. This is, in fact, nothing new: Robert C. Martin said it many years ago.
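To make the point concrete, here is a minimal sketch in C#, which still happens to have a goto statement. The class and method names are made up for illustration. Both methods compute the same sum, so the structured version loses nothing by giving up goto:

```csharp
using System;

public static class GotoDemo
{
    // Sums 1..n with explicit jumps, the way unstructured code would.
    public static int SumWithGoto(int n)
    {
        int i = 1, sum = 0;
    loop:
        if (i > n) goto done;
        sum += i;
        i++;
        goto loop;
    done:
        return sum;
    }

    // The same computation with a structured loop and no jumps.
    public static int SumWithLoop(int n)
    {
        int sum = 0;
        for (int i = 1; i <= n; i++)
        {
            sum += i;
        }
        return sum;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(GotoDemo.SumWithGoto(5)); // 15
        Console.WriteLine(GotoDemo.SumWithLoop(5)); // 15
    }
}
```

The goto version also makes it easy to jump to the wrong label by accident; the loop version has no such failure mode.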

We cannot remove just any feature from a language, because we would then risk losing the ability to write certain valid programs.

But we can eliminate some features without causing any harm to the language.

Exceptions

No one today would dispute that errors must be handled through a well-defined mechanism. At the very least, there is no doubt that error codes alone are not enough. We need something more robust, but also something that lets us keep the balance between usefulness and performance.

Exceptions in our programming languages are problematic in that they are GOTO statements in disguise. And as we have already discussed, using GOTO is not a good idea.

A better approach could be a composite type that carries either the result of a successfully executed block of code or information about the error that occurred.
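As a rough sketch of such a composite type, consider the C# below. Result<T>, Parser, and ParsePositive are hypothetical names invented for this example, not part of any standard library:

```csharp
using System;

// Hypothetical composite type: holds either a value or an error message.
public sealed class Result<T>
{
    private readonly T value;
    private readonly string error;

    private Result(T value, string error, bool isSuccess)
    {
        this.value = value;
        this.error = error;
        IsSuccess = isSuccess;
    }

    public bool IsSuccess { get; }

    public static Result<T> Success(T value) => new Result<T>(value, null, true);

    public static Result<T> Failure(string error) => new Result<T>(default(T), error, false);

    // The caller has to supply a handler for both outcomes.
    public TOut Match<TOut>(Func<T, TOut> onSuccess, Func<string, TOut> onFailure) =>
        IsSuccess ? onSuccess(value) : onFailure(error);
}

public static class Parser
{
    // Reports failure through the return value instead of throwing.
    public static Result<int> ParsePositive(string text)
    {
        if (!int.TryParse(text, out var number))
            return Result<int>.Failure("'" + text + "' is not an integer");
        if (number <= 0)
            return Result<int>.Failure(number + " is not positive");
        return Result<int>.Success(number);
    }
}

public static class Program
{
    public static void Main()
    {
        foreach (var input in new[] { "42", "-7", "abc" })
        {
            var message = Parser.ParsePositive(input).Match(
                n => "parsed " + n,
                e => "error: " + e);
            Console.WriteLine(message);
        }
    }
}
```

Control flow stays linear here: the error travels back through the ordinary return path, and the signature of ParsePositive tells the caller that failure is possible.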

Pointers

As Robert C. Martin has rightly pointed out, legacy programming languages like C and C++ let us manipulate pointers. But once we introduce polymorphism, we no longer need pointers in their pure sense. There are no pointers in Java, just as there are none in JavaScript. C# does have pointers, but we only need them in rare cases, such as calling a WinAPI function directly.

All of these languages have proven that we do not need pointers in order to pass values around by reference. We could easily get rid of pointers.

Number Types

Most strongly typed programming languages let you pick between several number types: 32-bit unsigned integer, 32-bit integer, 16-bit integer, decimal floating-point number, and so on. That made sense in the 1950s, but it no longer does. We spend a lot of time on micro-optimizations like selecting the correct numeric type, and we lose sight of the big picture. As Douglas Crockford puts it, JavaScript has only a single number type, and that in itself is a great idea; the shame, in his view, is that it is the wrong type.

Given the resources of today’s computers, we can easily afford a programming language with exactly one number type. Such a language would let us throw away all the chaos that comes with multiple numeric types.
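Here is a small illustration of that chaos in C#. The NumberTypes helper is a made-up name, and BigInteger is used only as a stand-in for what a single unbounded number type could look like. The same arithmetic gives different answers depending on the type that was picked:

```csharp
using System;
using System.Numerics;

public static class NumberTypes
{
    // 32-bit signed addition wraps around silently in an unchecked context.
    public static int AddInt32(int a, int b) => unchecked(a + b);

    // 32-bit unsigned subtraction wraps around below zero.
    public static uint SubtractUInt32(uint a, uint b) => unchecked(a - b);

    // An arbitrary-precision integer has no fixed width to overflow.
    public static BigInteger AddBig(BigInteger a, BigInteger b) => a + b;
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(NumberTypes.AddInt32(int.MaxValue, 1)); // -2147483648
        Console.WriteLine(NumberTypes.SubtractUInt32(0, 1));      // 4294967295
        Console.WriteLine(NumberTypes.AddBig(int.MaxValue, 1));   // 2147483648
    }
}
```

Only the last result matches what the arithmetic actually means; the first two depend on a width that the programmer had to choose up front.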

Null Pointers

The null pointer is one of the most misunderstood concepts in programming languages. There is nothing wrong with the idea that a value may be either present or missing. This concept is found in many programming languages: it is called Maybe in Haskell, option in F#, and null in T-SQL. What all of these languages have in common is that the feature is opt-in: you can declare a value nullable, but values are not nullable by default.

However, due to Tony Hoare’s mistake, which he admits to and calls his billion-dollar mistake, many languages still have null pointers. Examples are C, C++, Java, and C#. The problem is not the concept of the null pointer per se, but the fact that any pointer can be null by default. That makes it impossible to tell the situations in which null is an expected and valid value apart from those in which it is a defect.

If we design a language without null pointers, we will never have to think about raising or handling null pointer errors.

If Tony Hoare is right in his billion-dollar assessment of the damage, taking away the null pointer right here and now could save us all a lot of money in the long run. Because Turing-complete languages without the null pointer (like the aforementioned F#, Haskell, and T-SQL) do exist, we know for sure that any valid program can be written without this concept. This means that eliminating the null pointer would do away with a whole class of errors.
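The opt-in approach can be sketched in C# as well. Option<T>, UserDirectory, and FindEmail are hypothetical names used purely for illustration; the idea loosely mirrors Maybe in Haskell and option in F#:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical opt-in "maybe" type: absence is visible in the signature.
public readonly struct Option<T>
{
    private readonly T value;
    private readonly bool hasValue;

    private Option(T value)
    {
        this.value = value;
        hasValue = true;
    }

    public static Option<T> Some(T value) => new Option<T>(value);

    // The default struct value has hasValue == false, which represents "no value".
    public static Option<T> None => default(Option<T>);

    // The caller has to say what happens in both cases.
    public TOut Match<TOut>(Func<T, TOut> some, Func<TOut> none) =>
        hasValue ? some(value) : none();
}

public static class UserDirectory
{
    private static readonly Dictionary<string, string> Emails =
        new Dictionary<string, string> { ["ann"] = "ann@example.com" };

    // The return type states that an email may be missing.
    public static Option<string> FindEmail(string user) =>
        Emails.TryGetValue(user, out var email)
            ? Option<string>.Some(email)
            : Option<string>.None;
}

public static class Program
{
    public static void Main()
    {
        foreach (var user in new[] { "ann", "bob" })
        {
            Console.WriteLine(UserDirectory.FindEmail(user).Match(
                e => user + ": " + e,
                () => user + ": no email on file"));
        }
    }
}
```

A method that returns a plain string promises a string; only a method that returns Option<string> admits that the value may be absent, and the caller cannot forget to handle that case.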

Changing Values of Variables

The ability to change a variable’s value while a program runs is one of the central concepts of the object-oriented, procedural, and imperative styles. That is exactly why a variable is called a variable. It seems logical and intuitive, since the processor contains registers, and all a running program does is write data into registers, execute instructions, and read the results. It is also logical from another angle: the overwhelming majority of programs are designed to alter the state of the outside world in some way, whether that means writing data to a database, sending an e-mail, displaying an image on the screen, or printing a document.

However, it turns out that changing the values of variables causes a huge number of software errors. Consider this line of C# code:

var r = this.mapper.Map(rendition);

When the Map method returns, has the rendition argument been modified? If you follow certain principles, it should not have been. However, the only way to find out for sure is to read the code of the Map method. And what if it passes the value on to other methods? What if one of those methods changes it? What if that only happens in certain cases? Debugging becomes very complicated. C# (just like Java and JavaScript) offers no general way to prevent this problem from occurring.

In a sophisticated program with a deep call stack, it is virtually impossible to say anything definitive about the code, because any method can change anything. And once the number of variables in a project reaches dozens or even hundreds, keeping their state in your head is no longer possible. “Has the isDirty flag been changed? Where? How is this linked to the customerStatus?”
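The article does not show the Map method, so the sketch below invents one. Rendition, RenditionDto, and Mapper are hypothetical types, and the “helpful” trimming is an assumption made only to show how such a side effect can hide behind an innocent-looking call:

```csharp
using System;

// Hypothetical types; the real ones are not shown in the article.
public class Rendition
{
    public string Title { get; set; }
}

public class RenditionDto
{
    public string Title { get; set; }
}

public class Mapper
{
    public RenditionDto Map(Rendition rendition)
    {
        // A "helpful" normalization that silently changes the caller's object.
        rendition.Title = rendition.Title.Trim();
        return new RenditionDto { Title = rendition.Title };
    }
}

public static class Program
{
    public static void Main()
    {
        var rendition = new Rendition { Title = "  Draft  " };
        var r = new Mapper().Map(rendition);

        // The argument no longer holds the value the caller put in.
        Console.WriteLine("[" + rendition.Title + "]"); // [Draft]
        Console.WriteLine("[" + r.Title + "]");         // [Draft]
    }
}
```

Nothing at the call site hints that rendition comes back changed; the signature of Map looks exactly the same as that of a method that leaves its argument alone.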

Let’s imagine for a minute that the ability to change variables was removed from a programming language.

Most popular programming languages would not go that far, but Haskell, which is a Turing-complete language, shows that virtually any program can be written without implicitly mutating variables.

Many of you could argue that Haskell is overcomplicated and non-intuitive, but to me this falls into the same category as the standard arguments in defense of the goto statement. Even if you have relied on goto for a long time, sooner or later you have to learn to write code without it. The same applies to mutable variables: you will eventually need to find ways to model the same behavior without them.
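Even in C# it is possible to model a state change without mutation. The sketch below assumes C# 9 or later for records; Customer and CustomerOps are made-up names. Note that this is an opt-in, shallow form of immutability, not the language-wide guarantee that Haskell gives:

```csharp
using System;

// A record with positional parameters gets init-only properties (C# 9+).
public record Customer(string Name, string Status);

public static class CustomerOps
{
    // Returns a modified copy; the original customer is left untouched.
    public static Customer Promote(Customer customer) =>
        customer with { Status = "Gold" };
}

public static class Program
{
    public static void Main()
    {
        var original = new Customer("Ann", "Regular");
        var promoted = CustomerOps.Promote(original);

        Console.WriteLine(original.Status); // Regular
        Console.WriteLine(promoted.Status); // Gold
    }
}
```

Because Promote cannot change its argument, the caller knows what original contains after the call without reading the method body.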

Comparison of References

In object-oriented languages like C# or Java, the default equality comparison for reference types is a comparison of references. If two variables point to the same memory address, they are treated as equal. If they point to two different memory blocks (even when those blocks are filled with identical data), they are not considered equal. This is not intuitive and inevitably leads to bugs.

What if the comparison of references were eliminated from the programming language altogether?

What if we compared objects based on their content instead?

I am not one hundred percent positive about this, but judging from my experience, when we compare two objects, what we want to know is whether their content is equal, not whether the reference is the same. Reference comparison might come in handy for some rare optimization purposes, but for that we could keep it somewhere in the standard library rather than retaining it as the default behavior.
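The difference is easy to demonstrate in C#. In this sketch PointClass and PointRecord are hypothetical types, and records assume C# 9 or later. A plain class compares by reference by default, while a record compares by content and leaves reference comparison available as a library call:

```csharp
using System;

// A plain class: equality is reference equality unless overridden.
public class PointClass
{
    public PointClass(int x, int y)
    {
        X = x;
        Y = y;
    }

    public int X { get; }
    public int Y { get; }
}

// A record: equality is based on the values of its properties.
public record PointRecord(int X, int Y);

public static class Program
{
    public static void Main()
    {
        var a = new PointClass(1, 2);
        var b = new PointClass(1, 2);
        Console.WriteLine(a.Equals(b)); // False: same data, different objects

        var c = new PointRecord(1, 2);
        var d = new PointRecord(1, 2);
        Console.WriteLine(c.Equals(d));                 // True: compared by content
        Console.WriteLine(object.ReferenceEquals(c, d)); // False: still two objects
    }
}
```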

Inheritance

Even today, wherever we look, we see inheritance. Yet some 20 years ago the Gang of Four told us to favor composition over inheritance. There is nothing you can do with inheritance that you cannot do with composition and interfaces. The converse is not true for single-inheritance languages: there are things you can do with interfaces (like implementing more than one) that you cannot do with inheritance. In essence, composition is a superset of inheritance.

And this is not pure theory. For many years now I have managed to write code avoiding inheritance. Once you get the feel of it, it will become easy and familiar.
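As a minimal sketch of what this looks like in practice, the C# below adds behavior to a logger by wrapping it rather than deriving from it. ILogger, ListLogger, and PrefixLogger are names invented for this example:

```csharp
using System;
using System.Collections.Generic;

public interface ILogger
{
    void Log(string message);
}

// A simple logger that keeps messages in memory.
public class ListLogger : ILogger
{
    public List<string> Messages { get; } = new List<string>();

    public void Log(string message) => Messages.Add(message);
}

// Adds behavior by wrapping another ILogger instead of inheriting from one.
public class PrefixLogger : ILogger
{
    private readonly ILogger inner;
    private readonly string prefix;

    public PrefixLogger(ILogger inner, string prefix)
    {
        this.inner = inner;
        this.prefix = prefix;
    }

    public void Log(string message) => inner.Log(prefix + message);
}

public static class Program
{
    public static void Main()
    {
        var sink = new ListLogger();
        ILogger logger = new PrefixLogger(sink, "[app] ");
        logger.Log("started");
        Console.WriteLine(sink.Messages[0]); // [app] started
    }
}
```

PrefixLogger works with any ILogger, including another PrefixLogger, which a subclass tied to one base class could not do.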

Interfaces

Interfaces, which are available in many strongly typed programming languages (like Java and C#), offer a mechanism for implementing polymorphism: you can group different operations together as methods of one interface. However, one consequence of applying the SOLID principles is that you should prefer interfaces that define one particular role over interfaces that bundle sets of methods. Taken to its logical conclusion, each interface should have exactly one method. At that point the name and the declaration of the interface become redundant: all we are interested in is the operation itself, its parameters, and its result. And there are other ways to express that, such as delegates in C# and lambdas in Java.

There is nothing new or scary about that. Functional languages have used the function as the basic unit of composition for many years.

In my experience, virtually anything can be modeled using interfaces with one method. It follows that the same things can be modeled without the interfaces themselves. To repeat, there are no astonishing discoveries here: this is exactly how functional languages operate, and they remain Turing complete.
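A short C# sketch shows how little the interface adds once it has a single method. IPriceRule, TenPercentOff, and Checkout are hypothetical names; the delegate version relies only on the standard Func type:

```csharp
using System;

// A single-method interface...
public interface IPriceRule
{
    decimal Apply(decimal price);
}

public class TenPercentOff : IPriceRule
{
    public decimal Apply(decimal price) => price * 0.9m;
}

public static class Checkout
{
    // ...and the same dependency expressed as a plain function.
    public static decimal TotalWithInterface(decimal price, IPriceRule rule) =>
        rule.Apply(price);

    public static decimal TotalWithDelegate(decimal price, Func<decimal, decimal> rule) =>
        rule(price);
}

public static class Program
{
    public static void Main()
    {
        var viaInterface = Checkout.TotalWithInterface(100m, new TenPercentOff());
        var viaLambda = Checkout.TotalWithDelegate(100m, price => price * 0.9m);

        // An existing implementation can be passed as a function, too.
        var viaMethodGroup = Checkout.TotalWithDelegate(100m, new TenPercentOff().Apply);

        Console.WriteLine(viaInterface == viaLambda && viaLambda == viaMethodGroup); // True
    }
}
```

The delegate version needs neither the interface declaration nor the class that implements it.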

Reflection

If you have ever come across meta-programming in .NET or Java, you most likely know what reflection is. It is a set of APIs and language or platform features that let us extract information about code and execute that code.

Meta-programming is a very useful and irreplaceable tool, so it would be a shame to leave it behind. But reflection is not the only way to do meta-programming. Some programming languages are homoiconic. That means that programs written in these languages are themselves structured data, which we can inspect or manipulate. Such languages do not require reflection as a separate language feature, because meta-programming is already built into the language, only in a different way.

In other words, reflection exists purely as an instrument for meta-programming. If meta-programming could be achieved through homoiconicity instead, reflection would become just another redundant language feature.
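For readers who have not used it, here is a minimal sketch of reflection in C#. Greeter and ReflectionDemo are made-up names; the calls themselves (GetType, GetMethod, GetMethods, Invoke) are standard System.Reflection APIs:

```csharp
using System;
using System.Reflection;

public class Greeter
{
    public string Greet(string name) => "Hello, " + name;
}

public static class ReflectionDemo
{
    // Looks up a public instance method by name at run time and calls it.
    public static string InvokeByName(object target, string methodName, params object[] args)
    {
        MethodInfo method = target.GetType().GetMethod(methodName);
        return (string)method.Invoke(target, args);
    }
}

public static class Program
{
    public static void Main()
    {
        // Extract information about the code: list the methods Greeter declares.
        var flags = BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly;
        foreach (MethodInfo m in typeof(Greeter).GetMethods(flags))
        {
            Console.WriteLine(m.Name); // Greet
        }

        // Execute the code: call a method whose name is known only as a string.
        Console.WriteLine(ReflectionDemo.InvokeByName(new Greeter(), "Greet", "world")); // Hello, world
    }
}
```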

Cyclic Dependencies

Although null pointers may be the largest source of code issues, there is another problem that causes trouble of a similar scale when it comes to maintaining a growing code base. That problem is coupling, and one of its aspects is cyclic dependencies. Languages like C# and Java offer no built-in features to avoid cyclic dependencies.

Below is one of my own mistakes, which I found only because I specifically searched for it. One of the seemingly decent AtomEventStore modules contained the IXmlWritable interface:

public interface IXmlWritable
{
    void WriteTo(XmlWriter xmlWriter, IContentSerializer serializer);
}

As we can see, the WriteTo method takes an IContentSerializer argument, which is declared as follows:

public interface IContentSerializer
{
    void Serialize(XmlWriter xmlWriter, object value);
    XmlAtomContent Deserialize(XmlReader xmlReader);
}

Please note that Deserialize() returns XmlAtomContent. So how is XmlAtomContent defined? Here it is:

public class XmlAtomContent : IXmlWritable

Look, it implements IXmlWritable. And there it is: a cyclic dependency in which IXmlWritable is, in effect, declared in terms of itself!

I always double-check my code for issues like this, but this error managed to sneak in. In F# (just like in OCaml, if I am not mistaken) such code would not even compile. Although F# does support small cyclic dependencies at the level of modules through the keywords and and rec, it is impossible to end up with a cyclic dependency by mistake. You have to state your intention to establish such a coupling explicitly. And even then, the cycle cannot cross the boundary of a single module or library.

What an excellent way to protect against tightly coupled code! Eliminate the possibility of accidental cyclic dependencies, and what you get is an improved programming language.

This topic was even examined in the field: Scott Wlaschin has analyzed real projects developed in C# and F#. What he discovered was that the F# projects have fewer cyclic dependencies than the C# projects. This study was then continued by Evelina Gabasova in her work.

Conclusion

In this article, I have tried to show how a programming language can be made better by removing some of its features. Eliminate a redundant feature, and you still have a Turing-complete programming language that can express anything you want (well, almost anything). But the number of ways to end up with errors will be much smaller.

It is quite possible that an ideal programming language is a language without:

GOTO
Exceptions
Pointers
Multiple number types
Null pointers
Variable mutation
Comparison of references
Inheritance
Interfaces
Reflection
Cyclic dependencies

Have I listed all the redundant features out there? Probably not. That means the designers of new programming languages still have a chance to create something even more advanced by removing something else from the existing ones.
