Sunday, October 18, 2009

Concurrency Levels Tuning with Task Parallel Library (How Many Threads to Use?)

In the post ‘Scaling Up with Task Parallel Library’ we reviewed the way in which the TPL (Task Parallel Library) facilitates the process of adding concurrency to applications. One of TPL’s features that I think is worth a separate discussion is that it enables developers to manually tune the number of threads (the concurrency level) that will be used to serve a given set of tasks.

Manual Concurrency Control?

Relying on TPL to determine the concurrency level is probably the best idea for most applications. As of now, TPL’s default configuration sets the concurrency level to one thread per processor, while in cases of low CPU utilization it may take the liberty to add threads as appropriate. This scheme is often a good solution, since having two or more threads running on the same processor results in time-slicing overhead, which can drastically harm performance.

The primary overheads of time slicing are 1) saving a thread’s register state when suspending it and restoring the state when resuming it, 2) restoring a thread’s cache state, and 3) thrashing virtual memory.

However, the thread-per-processor scheme may result in low CPU utilization, since a running thread may be interrupted (e.g. by a mouse event) or blocked (e.g. while waiting for an I/O operation to complete) for a while, and while it blocks, the matching processor doesn’t do any actual work.

Since most of the tasks in an application do block some of the time (some for long periods), it’s safe to say that appropriate concurrency-level tuning leads to improvements in performance. The only problem is that it can be really hard to calculate the “appropriate” number of threads that will yield optimal performance, while miscalculating can lead to degraded performance and unexpected behavior: concurrency levels that are too high can result in excessive context switching or resource contention, and concurrency levels that are too low can result in low CPU utilization.


Putting All the Cores to Work - All the Time - With Minimum Contention

The following figure (from MSDN) shows throughput (number of completed work items per second, while each work item has 100ms of execution time with 10ms of CPU usage and 90ms of waiting) as a function of concurrency level (number of threads) running on a dual-core 2GHz machine.

[Figure (from MSDN): throughput as a function of concurrency level]

As you can see in the figure above, throughput peaks at a concurrency level of 20 threads and degrades once the concurrency level exceeds 25 threads, the degradation being mainly due to context switching. Clearly we can’t conclude that 20 is the right number of threads for any dual-core machine, as in reality most work items don’t wait 90% of the time. However, it’s pretty clear that we should consider the percentage of time that tasks will block in order to correctly tune the concurrency level – if high throughput is indeed our primary goal.

Since the TPL default policy is to use one thread per processor, we can conclude that TPL initially assumes that the workload of a task is ~100% working and 0% waiting; if the initial assumption fails and the task enters a waiting state (i.e. starts blocking), TPL will take the liberty to add threads as appropriate.

Calculating the ‘Appropriate’ Amount of Threads

The following simplified formula calculates the “ideal” number of threads according to the number of CPUs on the machine and the percentage of time that the tasks will block.

NumThreads = NumCPUs / (1 – BlockingTimeInPercentage)

Although we can’t really calculate the actual blocking time, this formula can give us a clue about the number of threads that will produce the best performance.
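As a quick sanity check of the formula, the numbers behind the MSDN figure above plug straight into it: with 2 CPUs and work items that block 90% of the time, the formula predicts 20 threads – right where the measured throughput peaks. The sketch below is merely illustrative; the helper name is my own, not part of any library.

```python
# Ideal thread count per the formula above:
# NumThreads = NumCPUs / (1 - BlockingTimeInPercentage).
def ideal_thread_count(num_cpus, blocking_fraction):
    return num_cpus / (1.0 - blocking_fraction)

# The MSDN scenario: dual-core machine, work items waiting 90ms out of 100ms.
print(round(ideal_thread_count(2, 0.90)))  # → 20

# Purely CPU-bound tasks (0% blocking) reduce to TPL's default
# of one thread per processor.
print(ideal_thread_count(2, 0.0))          # → 2.0
```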

Calculating Maximum Speedup

Another way of addressing the problem is by calculating the maximum number of tasks that can run in parallel before beginning to see degradation in performance rather than a continued speedup. Amdahl's Law suggests the following equation (S is the percentage of the work that cannot be parallelized):

SpeedUp = 1 / (S + ((1 – S) / NumCPUs))
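To make the equation concrete, here is a small illustrative sketch (the function name is mine, not from any library) that evaluates Amdahl’s Law for a few core counts. Note how a 10% serial portion caps the speedup at 1/S = 10x no matter how many CPUs are added – which is exactly why adding threads beyond a certain point buys nothing.

```python
# Amdahl's Law: SpeedUp = 1 / (S + (1 - S) / NumCPUs),
# where S is the fraction of the work that cannot be parallelized.
def amdahl_speedup(serial_fraction, num_cpus):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / num_cpus)

# With 10% serial work, doubling the cores yields diminishing returns:
print(amdahl_speedup(0.10, 2))      # ≈ 1.82
print(amdahl_speedup(0.10, 4))      # ≈ 3.08
# Even with an enormous number of cores, the speedup stays below 1/S = 10:
print(amdahl_speedup(0.10, 10**6))  # ≈ 10.0
```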

Concurrency Control in TPL (C# .NET Source Code)

In the code snippets below, the ‘ideal threads per processor’ is set to 2, yielding a concurrency level of 4 threads on a dual-core machine. This means that TPL will ideally use 4 threads to parallelize the 3 tasks that were assigned with the scheduler/parallel-options. However, TPL may exceed 4 threads if some of the tasks are blocking, in order to keep CPU utilization high.

Visual Studio 2008 – With TaskManagerPolicy
ThreadPriority priority = ThreadPriority.Highest;

// Start from the default policy and override the ideal threads per processor.
TaskManagerPolicy defaultPolicy = TaskManager.Default.Policy;
int minProcessors = defaultPolicy.MinProcessors;
int idealProcessors = defaultPolicy.IdealProcessors;
int idealThreadsPerProcessor = 2;
int maxStackSize = defaultPolicy.MaxStackSize;

TaskManagerPolicy policy = new TaskManagerPolicy(
    minProcessors, idealProcessors, idealThreadsPerProcessor,
    maxStackSize, priority);

TaskManager tm = new TaskManager(policy);

// The three tasks are scheduled on the custom task manager.
Task t1 = Task.Factory.StartNew(delegate { taskWork(); }, tm);
Task t2 = Task.Factory.StartNew(delegate { taskWork(); }, tm);
Task t3 = Task.Factory.StartNew(delegate { taskWork(); }, tm);
Visual Studio 2010 - With ‘ParallelOptions’

const int idealThreadsPerProcessor = 2;

int concurrencyLevel =
    Environment.ProcessorCount * idealThreadsPerProcessor;

// Cap the degree of parallelism at the calculated concurrency level.
ParallelOptions options = new ParallelOptions
{
    MaxDegreeOfParallelism = concurrencyLevel
};

Parallel.For(0, 3, options, delegate(int param)
{
    taskWork();
});

Further Reading

Why Too Many Threads Hurts Performance, and What to do About It.

CLR Inside Out - Using concurrency for scalability (Joe Duffy)

Saturday, October 3, 2009

UML Use Case Diagrams – Modeling the System Functionality

UML use case diagrams are used to illustrate the behavior of the system during requirements analysis; they show system-wide use cases and point out which use case is performed for which actor.

A use case describes a sequence of actions that make up one or more business processes and provide something of measurable value to an actor. An actor is a person, organization, or external system that plays a role in one or more interactions with your system.

In this post we’ll take a step-by-step tour through constructing a use case diagram and writing a use case.

Constructing a Use Case Diagram

Just after the system engineers are done translating the customer requests into system requirements, they start designing the business processes that will implement the requirements, taking advantage of the use case diagram in order to describe the various business processes from the standpoint of the user.

The figure below shows a use case diagram which presents the processes involved in an ‘automatic threat detection’ system that periodically processes images coming from a surveillance video camera and uses an external threat evaluation system in order to detect threats approaching a secured area.

[Figure: use case diagram of the automatic threat detection system]

As you can see, the diagram consists of 5 use cases that describe: 1) the system initialization process, which can be initiated by a simple user or by an operator user, 2) the login process, which can be initiated only by an operator user, 3) the image processing process, which is driven by a video stream coming from the surveillance camera, 4) the threat evaluation process, which is part of the image processing process and uses a sub-system referred to as the ‘threat evaluation system’, and 5) the threats distribution process, which is enabled only under some conditions.

The Relationships

The ‘User’ actor is associated with the ‘Initiate System’ use case, which implies that there’s a business process that describes what happens during system initialization, and that the ‘User’ is one of the actors that can trigger it.

The ‘Evaluate Threat’ use case is associated with the ‘Threat Evaluation System’ actor which implies that during the threat evaluation process the ‘Threat Evaluation System’ is called into action. 

The generalization relationship between the ‘User’ and the ‘Operator’ actors implies that all the use cases associated with the ‘User’ (parent) actor are inherently associated with the ‘Operator’ (child) actor.

The ‘Login’ use case depends on the ‘Initiate System’ use case, which means that the operator is not allowed to initiate the login process if the system wasn’t initialized before. The same goes for the ‘Process Camera Image’ use case, which doesn’t start until the system is fully initialized.

The ‘Process Camera Image’ use case includes the ‘Evaluate Threat’ use case which implies that the behavior of the ‘Evaluate Threat’ use case is inserted into the behavior of the ‘Process Camera Image’ use case, or in other words, “The act of processing camera image includes a sub process of evaluating the image for threats”.

We often use the ‘Include’ association to extract common behaviors from multiple use cases into a single description, but it’s also common to use the ‘Include’ association to divide a use case (container) into several smaller use cases (parts).

The ‘Distribute Threats Information’ use case extends the ‘Evaluate Threat’ use case which implies that the behavior of the ‘Distribute Threats Information’ use case may be inserted in the ‘Evaluate Threat’ use case under some conditions, or in other words, “In case the distribution feature is enabled, the threats evaluation process is extended with the distribution of the detected threats”.

Writing a Use Case

After drawing an overall view of the system use cases, we switch from the UML tool to our favorite document editor and start describing the use cases one by one. We’ll usually use a template that consists of several core sections that the engineering team agreed upon. The template that I use consists of the sections described below.

[Figure: use case description template]

The ‘Pre Conditions’ section defines the conditions that must be met in order for the use case to execute. The ‘Triggers’ section defines one or more actions that can trigger the use case. The ‘Main Success Scenario’ section describes the actions that make up the use case – it can be brief, providing a few sentences summarizing the actions, or it can provide a detailed review of each action, including alternative paths etc. The ‘Post Conditions’ section describes what the change in the state of the system will be after the use case completes.

Links

PPT - Use Case Diagrams