Saturday, July 22, 2017

Down with NULL!

The introduction of null is arguably the biggest mistake in the history of computer science. Usage of null makes for sloppy code, cascading and often redundant null checks, buggy code due to missed null checks, and it makes for poor APIs.

Tony Hoare, the inventor of ALGOL W, calls it his billion-dollar mistake.

“I call it my billion-dollar mistake…At that time, I was designing the first comprehensive type system for references in an object-oriented language. My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn’t resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.”.

Unfortunately, he wasn’t the only programming language inventor that couldn’t resist the temptation to put in a null reference. The inventors of Java, C# and many more fell into the same trap, causing years of pain and misery.

The designers of newer languages like Rust and Swift realized that that only way to eliminate the destructive side effects of null is by avoiding it altogether.

These languages use the optional feature instead of null.

The beauty of the optional feature is that only optional properties or methods can be null. If you try to set a non-optional property to null, you'll get compilation error.

That's an excellent language feature, because it allows to reliably exclude the possibility of null everywhere in the code, except for the parts that use optional, which should be the exception rather than the rule.

Swift is a great example. Instead of allowing every property (or method) to return nil, Swift forces you to explicitly declare that a property can be nil, otherwise, it assumes that the property is initialized.

So the below code will not compile, since the property ‘color’ is not initialized!

class Box {
   var color: String
}

You must do something like this:

class Box {
   var color: String = "Red"
}

Now, even if you never saw the implementation of Box, you know that you can safely use the property color without nil check..

// Create instance of class.
var example = Box()
if (example.color != nil) ... // NO NEED!

You can explicitly declare that a property or a method can be nil, by making it optional (adding ‘?’ post-fix after the type). However, especially for properties, that should be the exception rather than the rule.

class Box{
   var color: String = "Read"
   var colorOptional: String?
}

Now the caller knows that nil check is required. The caller doesn’t have direct access to the property, it has to unwrap it first using ‘!’

// Create Box instance.
var box = Box()

if (box.colorOptional == nil){
   print("It is nil")
}

box.colorOptional = "Blue"

if let c = box.colorOptional {
   print("Box has color \(c)")
   // The optional can be accessed with an exclamation mark.
   print(box.colorOptional!)
}

Languages like c++ and Java have added optional support via a community library, or directly to the standard library. It helps, it’s better than nothing, but it’s really too little too late.

Starting from Java SE 8 you can do this:

class Box {
    Optional<Integer> color;
}
Box box = new Box();
box.color.ifPresent(x -> System.out.println(x));

However, since Java’s type system allows null everywhere (If you try to set a non-optional property to null, the compiler wouldn’t mind), we can’t reliably exclude the possibility of null, and we still end up with null checks everywhere.

if (str != null && !str.equals("")) {
   // Do something with str
}

Unfortunately, in order to effectively combat the the terror of null, the optional feature must be backed into the language from the very beginning, and it must be enforced wholesale by the compiler.

Saturday, April 9, 2016

Scale up and scale out with .NET framework

Recently I came across a question about the ability of .NET to scale, comparing to other frameworks. Here’s my answer. I hope you’ll find it useful.

While the scalability of an application is mostly determined by the way in which the code is written, the framework / platform that is being used can significantly influence the amount of effort required to produce an application that scales gracefully to many cores (scale up) and many machines (scale out).

Before joining Microsoft, I was part of a team that built a distributed, mission critical Command and Control system using .NET technologies (almost exclusively). The applications that make up the system are deployed on powerful servers, they are all stateful, massively concurrent, with strict throughput requirements. We were able to build a quality, maintainable, production ready system in less than 3 years (which is a record time for such system).

While working for Microsoft in the past 6 years, I worked on hyper scale services deployed on thousands of machines, all of which have been written using .NET tech. In most cases, unless there’s a good reason, .NET is used for the front ends (ASP.NET MVC), mid-tier and backend services.

Here’s how .NET empowers applications that need to scale up and scale out.

Scale up:

Building a reliable, high performance application that scales gracefully to many core hardware is hard to do. One of the challenges associated with scalability is finding the most efficient way to control the concurrency of the application. Practically, we need to figure out a way to divide the work and distribute it among the threads such that we put the maximum amount of cores to work. Another challenge that many developers struggle with is synchronizing access to shared mutable state (to avoid data corruption, race conditions and the like) while minimizing contentions between the threads.

Concurrency Control

Take the below concurrency/throughput analysis for example, note how throughput peaks at concurrency level of 20 (threads) and degrades when concurrency level exceeds 25. 

​So how can a framework help you maximize throughput and improve the scalability characteristic of your application?

It can make concurrency control dead simple. It can include tools that allow you to visualize, debug and reason about your concurrent code. It can have first-class language support for writing asynchronous code​. It can include best in class synchronization and coordination primitives and collections.

In the past 13 years I’ve been following the incremental progress that the developer division made in this space - it has been a fantastic journey.

In the first .NET version the managed Thread Pool was introduced to provide a convenient way to run asynchronous work. The Thread Pool optimizes the creation and destruction of threads (according to JoeDuffy, it cost ~200,000 cycles to create a thread) through the use of heuristic ‘thread injection and retirement algorithm’ that determines the optimal number of threads by looking at the machine architecture, rate of incoming work and current CPU utilization.

In .NET 4.0, TPL (Task Parallel Library) was introduced. The Task Parallel Library includes many features that enable the application to scale better. It supports worker thread local pool (to reduce contentions on the global queue) with work stealing capabilities, and the support for concurrency levels tuning (setting the number of tasks that are allowed to run in parallel)

In .NET 4.5 -  the async-await keywords were introduced, making asynchronicity a first class language feature, using compiler magic to make code that looks synchronous run asynchronously. As a result – we get all the advantages of asynchronous programming with a fraction of the effort.

Consistency / Synchronization

Although more and more code is now tempted to run in parallel, protecting shared mutable data from concurrent access (without killing scalability) is still a huge challenge. Some applications can get away relatively easy by sharing only immutable objects, or using lock free synchronization and coordination primitives (e.g. ConcurrentDictionary) which eliminate the need for locks almost entirely. However, in order to achieve greater scalability there’s not escape from using fine-grained locks.

In an attempt to provide a solution for mutable in-memory data sharing that on one hand scales, and on the other hand easy to use and less error prone than fine grained locks – the team worked on a Software Transactional Memory support for .NET that would have ease the tension between lock granularity and concurrency. With STM, instead of using multiple locks of various kinds to synchronize access to shared objects, you simply wrap all the code that access those objects in a transaction and let the runtime execute it atomically and in isolation by doing the appropriate synchronization behind the scenes. Unfortunately, this project never materialized.

As far as I know, the .NET team was the only one that even made a serious effort to make fine grand concurrency simpler to use in a non functional language.

Speaking of functional languages, F# is a great choice for building massively concurrent applications. Since  in F# structures are immutable by default, sharing state and avoiding locks is much easier. F# also integrates seamlessly with the .NET ecosystem, which gives you access to all the third party .NET libraries and tools (including TPL).

Scale out:

Say that you are building a stateless website/service that needs to scale to support millions of users: You can deploy your ASP.NET application as WebSite or Cloud Service to Microsoft Azure public cloud (and soon Microsoft Azure Stack for on premises) and run it on thousands of machines. You get automatic load balancing inside the data-center, and you can use Traffic Manager to load balance requests cross data-centers. All with very little effort.

If you are building a statesful service (or combination of stateful and staeless), you can use the Azure Service Fabric, which will allow you deploy and manage hundreds or thousandof .NET applications on a cluster of machines. You can scale up or scale down your cluster easily, knowing that the applications scale according to available resources.

Note that you can use the above with none .NET applications. But most of the tooling and libraries are optimized for .NET.