Saturday, April 9, 2016

Scale up and scale out with .NET framework

Recently I came across a question about the ability of .NET to scale, comparing to other frameworks. Here’s my answer. I hope you’ll find it useful.

While the scalability of an application is mostly determined by the way in which the code is written, the framework / platform that is being used can significantly influence the amount of effort required to produce an application that scales gracefully to many cores (scale up) and many machines (scale out).

Before joining Microsoft, I was part of a team that built a distributed, mission critical Command and Control system using .NET technologies (almost exclusively). The applications that make up the system are deployed on powerful servers, they are all stateful, massively concurrent, with strict throughput requirements. We were able to build a quality, maintainable, production ready system in less than 3 years (which is a record time for such system).

While working for Microsoft in the past 6 years, I worked on hyper scale services deployed on thousands of machines, all of which have been written using .NET tech. In most cases, unless there’s a good reason, .NET is used for the front ends (ASP.NET MVC), mid-tier and backend services.

Here’s how .NET empowers applications that need to scale up and scale out.

Scale up:

Building a reliable, high performance application that scales gracefully to many core hardware is hard to do. One of the challenges associated with scalability is finding the most efficient way to control the concurrency of the application. Practically, we need to figure out a way to divide the work and distribute it among the threads such that we put the maximum amount of cores to work. Another challenge that many developers struggle with is synchronizing access to shared mutable state (to avoid data corruption, race conditions and the like) while minimizing contentions between the threads.

Concurrency Control

Take the below concurrency/throughput analysis for example, note how throughput peaks at concurrency level of 20 (threads) and degrades when concurrency level exceeds 25. 

​So how can a framework help you maximize throughput and improve the scalability characteristic of your application?

It can make concurrency control dead simple. It can include tools that allow you to visualize, debug and reason about your concurrent code. It can have first-class language support for writing asynchronous code​. It can include best in class synchronization and coordination primitives and collections.

In the past 13 years I’ve been following the incremental progress that the developer division made in this space - it has been a fantastic journey.

In the first .NET version the managed Thread Pool was introduced to provide a convenient way to run asynchronous work. The Thread Pool optimizes the creation and destruction of threads (according to JoeDuffy, it cost ~200,000 cycles to create a thread) through the use of heuristic ‘thread injection and retirement algorithm’ that determines the optimal number of threads by looking at the machine architecture, rate of incoming work and current CPU utilization.

In .NET 4.0, TPL (Task Parallel Library) was introduced. The Task Parallel Library includes many features that enable the application to scale better. It supports worker thread local pool (to reduce contentions on the global queue) with work stealing capabilities, and the support for concurrency levels tuning (setting the number of tasks that are allowed to run in parallel)

In .NET 4.5 -  the async-await keywords were introduced, making asynchronicity a first class language feature, using compiler magic to make code that looks synchronous run asynchronously. As a result – we get all the advantages of asynchronous programming with a fraction of the effort.

Consistency / Synchronization

Although more and more code is now tempted to run in parallel, protecting shared mutable data from concurrent access (without killing scalability) is still a huge challenge. Some applications can get away relatively easy by sharing only immutable objects, or using lock free synchronization and coordination primitives (e.g. ConcurrentDictionary) which eliminate the need for locks almost entirely. However, in order to achieve greater scalability there’s not escape from using fine-grained locks.

In an attempt to provide a solution for mutable in-memory data sharing that on one hand scales, and on the other hand easy to use and less error prone than fine grained locks – the team worked on a Software Transactional Memory support for .NET that would have ease the tension between lock granularity and concurrency. With STM, instead of using multiple locks of various kinds to synchronize access to shared objects, you simply wrap all the code that access those objects in a transaction and let the runtime execute it atomically and in isolation by doing the appropriate synchronization behind the scenes. Unfortunately, this project never materialized.

As far as I know, the .NET team was the only one that even made a serious effort to make fine grand concurrency simpler to use in a non functional language.

Speaking of functional languages, F# is a great choice for building massively concurrent applications. Since  in F# structures are immutable by default, sharing state and avoiding locks is much easier. F# also integrates seamlessly with the .NET ecosystem, which gives you access to all the third party .NET libraries and tools (including TPL).

Scale out:

Say that you are building a stateless website/service that needs to scale to support millions of users: You can deploy your ASP.NET application as WebSite or Cloud Service to Microsoft Azure public cloud (and soon Microsoft Azure Stack for on premises) and run it on thousands of machines. You get automatic load balancing inside the data-center, and you can use Traffic Manager to load balance requests cross data-centers. All with very little effort.

If you are building a statesful service (or combination of stateful and staeless), you can use the Azure Service Fabric, which will allow you deploy and manage hundreds or thousandof .NET applications on a cluster of machines. You can scale up or scale down your cluster easily, knowing that the applications scale according to available resources.

Note that you can use the above with none .NET applications. But most of the tooling and libraries are optimized for .NET.

No comments:

Post a Comment