Revisiting Exception Handling in async methods

It is interesting to observe how exceptions behave in async code. While the .NET Framework tries to make handling failures in async methods as similar as possible to handling them in synchronous methods, there are subtle differences that are worth understanding.

Let us examine the following code.

async Task<string> Foo()
{
	var client = new HttpClient();
	try
	{
		return await client.GetStringAsync("http://www.InvalidUrl.com");
	}
	catch(HttpRequestException ex)
	{
		Console.WriteLine($"Exception of Type {ex.GetType()} has been raised");
	}
	catch(AggregateException ex)
	{
		Console.WriteLine($"Exception of Type {ex.GetType()} has been raised");
	}
	catch(Exception ex)
	{
		Console.WriteLine($"Exception of Type {ex.GetType()} has been raised");
	}
	return default;
}

What would be the output of the above code? Specifically, what type of exception would be caught, assuming the URL is invalid?

Before we get to the answer, let us examine how the returned Task<string> object indicates the failure.

Property/Method    Indication
Status             Faulted
IsFaulted          true
Exception          AggregateException
Wait()             Throws AggregateException
Result             Throws AggregateException
await Task         Throws the first exception within the AggregateException

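The last 3 rows are important to our question. Let us reexamine the method call.

return await client.GetStringAsync("http://www.InvalidUrl.com");

In the above case, because the task is awaited, the HttpRequestException is rethrown directly and is handled by the first catch block. Had we blocked on the task with Wait() or Result instead, we would receive an AggregateException with the HttpRequestException within it.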
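The difference between awaiting a failed task and blocking on it can be sketched with a small, self-contained example. The names AwaitVersusWait, FailAsync, DescribeAwaitAsync and DescribeWait are hypothetical and exist only for illustration, and an InvalidOperationException stands in for the HttpRequestException so that no network access is needed.

```csharp
using System;
using System.Threading.Tasks;

public static class AwaitVersusWait
{
    // Hypothetical helper: an async method that always fails.
    public static async Task<string> FailAsync()
    {
        await Task.Yield();
        throw new InvalidOperationException("boom");
    }

    // Awaiting the task rethrows the original exception.
    public static async Task<string> DescribeAwaitAsync()
    {
        try
        {
            await FailAsync();
            return "no exception";
        }
        catch (Exception ex)
        {
            return ex.GetType().Name;
        }
    }

    // Blocking on the task with Wait() surfaces an AggregateException instead.
    public static string DescribeWait()
    {
        try
        {
            FailAsync().Wait();
            return "no exception";
        }
        catch (Exception ex)
        {
            return ex.GetType().Name;
        }
    }
}
```

Under these assumptions, DescribeAwaitAsync reports InvalidOperationException while DescribeWait reports AggregateException, with the original exception available through its InnerException property.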

When an Exception causes Status != Faulted

As mentioned above, the status of the Task would be set to Faulted in most cases, except for one particular kind of exception, the OperationCanceledException.

Let us write some code before we discuss this further.

async void Main()
{
	var task = Foo();
	try
	{
		await task;
	}
	catch(Exception Ex)
	{
		Console.WriteLine($"Task Status:{task.Status}, Exception:{Ex.Message}");
	}
}

async Task Foo()
{
	throw new OperationCanceledException();
}

Examining the output

Task Status:Canceled, Exception:The operation was canceled.

The TPL uses OperationCanceledException when a token from a CancellationTokenSource is canceled by the calling method. If a method, like the one in the code above, decides to throw this special exception, then the Status is set to Canceled instead of Faulted.
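The same behavior can be observed when the exception originates from a real cancellation token. The following is a minimal sketch; the class CancellationStatusDemo and its methods are hypothetical names introduced for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationStatusDemo
{
    // Hypothetical worker that observes the token after yielding once.
    public static async Task WorkAsync(CancellationToken token)
    {
        await Task.Yield();
        token.ThrowIfCancellationRequested(); // throws OperationCanceledException
    }

    // Hypothetical worker that fails with an ordinary exception.
    public static async Task FailAsync()
    {
        await Task.Yield();
        throw new InvalidOperationException("ordinary failure");
    }

    // Waits for the task to finish, ignores the failure and reports the final status.
    public static TaskStatus FinalStatus(Task task)
    {
        try
        {
            task.Wait();
        }
        catch (AggregateException)
        {
            // Only the resulting status matters here.
        }
        return task.Status;
    }

    public static TaskStatus StatusWhenCanceled()
    {
        using (var cts = new CancellationTokenSource())
        {
            cts.Cancel(); // request cancellation before the worker checks the token
            return FinalStatus(WorkAsync(cts.Token));
        }
    }

    public static TaskStatus StatusWhenFaulted() => FinalStatus(FailAsync());
}
```

StatusWhenCanceled ends in TaskStatus.Canceled, whereas StatusWhenFaulted ends in TaskStatus.Faulted.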

Lazy Exceptions

There is another aspect of exception handling in async methods that is worth examining. An async method would not throw an exception directly; instead, it would return a faulted Task. The significance of this can be better understood with the help of a bit of code.

async void Main()
{
	try
	{
		var fooTask = Foo(-3);
		Console.WriteLine("Task is not awaited yet");
		await fooTask;
		Console.WriteLine("Task Completed");
	}
	catch(ArgumentException ex)
	{
		Console.WriteLine($"{nameof(ArgumentException)} has been raised");
	}
}


public async Task<string> Foo(int value)
{
	Console.WriteLine($"Method {nameof(Foo)} Invoked");
	if(value<0)
	{
		throw new ArgumentException();
	}
	
	Console.WriteLine($"Method {nameof(Foo)} mocking real task via Delay");
	await Task.Delay(1000);
	return default;
}

The Foo() method has a precondition check which validates that the passed argument is not negative; otherwise, it raises an ArgumentException. In the example code, the caller passes a negative value to the method, and should hit the precondition block.

Let us examine the output and discuss further.

Method Foo Invoked
Task is not awaited yet
ArgumentException has been raised

As you can observe, the message “Task is not awaited yet” is displayed before the exception is thrown. This is because the exception is stored in the returned task and is not raised until the task is awaited. This lazy evaluation of exceptions is useful most of the time, but in cases such as the one above, where preconditions need to be evaluated and the developer would prefer early evaluation, a slight workaround is needed.

The idea, similar to how iterator methods can be made to evaluate exceptions early (as Jon Skeet mentions in his invaluable book C# in Depth), lies in introducing a synchronous method which does the argument validation and which in turn calls the original method. If the original method is moved into the proposed method as a local function, it can safely assume that the arguments have been validated.

public Task<string> Foo(int value)
{
	Console.WriteLine($"Method {nameof(Foo)} Invoked");
	if(value<0)
	{
		throw new ArgumentException();
	}
	
	async Task<string> FooAsync()
	{
		Console.WriteLine($"Method {nameof(Foo)} mocking real task via Delay");
		await Task.Delay(1000);
		return default;
	}
	return FooAsync();
}

This ensures the validation and subsequent exception is evaluated early. Let’s hit F5 and run our code now.

Method Foo Invoked
ArgumentException has been raised

As observed, the exception has been evaluated early and we get the much desired result.

That’s all for now, see you soon again

C# 8 : Using Declaration

The next feature we will explore in C# 8 is more of a syntactic sugar, but nevertheless it is important to understand the difference between the new syntax and the original feature it builds on. We are all aware of the using statement, which ensures correct usage of IDisposable objects.

Using Statement

Let us begin by writing an example with the old way of doing things. We will first introduce our custom object that implements IDisposable.

public interface ITalk
{
    void Talk(string message);
}

public class CustomDisposibleObject : IDisposable,ITalk
{
    public void Dispose()
    {
        Console.WriteLine($"Disposing {nameof(CustomDisposibleObject)}");
    }

    public void Talk(string message)
    {
        Console.WriteLine($"{nameof(CustomDisposibleObject)}-{nameof(ITalk.Talk)} : {message}");
    }
}

We will now use CustomDisposibleObject with the Using Statement.

public int UsingStatement()
{
    using (var customDisposibleObject = new CustomDisposibleObject())
    {
        customDisposibleObject.Talk(nameof(IExample.UsingStatement));
        return default;
    }
}

The using statement is, of course, syntactic sugar over the try-finally block. The compiler translates the above code as follows (you can check this using Telerik’s JustDecompile)

public int UsingStatement()
{
    int num;
    CustomDisposibleObject customDisposibleObject = new CustomDisposibleObject();
    try
    {
        customDisposibleObject.Talk("UsingStatement");
        num = 0;
    }
    finally
    {
        if (customDisposibleObject != null)
        {
            customDisposibleObject.Dispose();
        }
    }
    return num;
}

So far, so good. So what is the real problem with the using statement, and why would one introduce the using declaration? To begin with, the using statement syntax is rather verbose and breaks the normal flow of the code. This is where the using declaration comes into the picture.

Using Declaration

The using declaration syntax removes much of the ceremony associated with using statement blocks. For example, the above code could be rewritten with the new using declaration as follows.

public int UsingDeclaration()
{
    using var customDisposibleObject = new CustomDisposibleObject();
    customDisposibleObject.Talk(nameof(IExample.UsingDeclaration));
    return default;
}

The ceremonial braces are now gone and the code looks less verbose. The obvious question would be: when is the object disposed?

The object is disposed when it leaves the scope, which in the case of the code above is the method.
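The scope rule can be made visible with a small sketch that records when each object is created and disposed. The types Tracked and UsingScopeDemo are hypothetical and exist only for this illustration; an explicit inner block is used so that the end of the enclosing scope is easy to see.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical disposable that records its lifetime events in a shared log.
public sealed class Tracked : IDisposable
{
    private readonly string _name;
    private readonly List<string> _log;

    public Tracked(string name, List<string> log)
    {
        _name = name;
        _log = log;
        _log.Add("create " + name);
    }

    public void Dispose() => _log.Add("dispose " + _name);
}

public static class UsingScopeDemo
{
    public static List<string> Run()
    {
        var log = new List<string>();
        {
            using var first = new Tracked("first", log);
            using var second = new Tracked("second", log);
            log.Add("end of inner block");
        } // both objects leave scope here: second is disposed, then first
        log.Add("after inner block");
        return log;
    }
}
```

The log shows that both objects stay alive until the closing brace of the enclosing block and are then disposed in the reverse order of their declaration.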

So how does the new syntax get translated by the compiler? Let us rerun JustDecompile and check.

public int UsingDeclaration()
{
    int num;
    CustomDisposibleObject customDisposibleObject = new CustomDisposibleObject();
    try
    {
        customDisposibleObject.Talk("UsingDeclaration");
        num = 0;
    }
    finally
    {
        if (customDisposibleObject != null)
        {
            customDisposibleObject.Dispose();
        }
    }
    return num;
}

As you can observe, there is no difference at all. The new syntax removes the ceremonial braces, making the code look less nested.

Sample Code for this article could be found in my Github.

C# 8 : Index and Range

While C# as a language has grown by leaps and bounds, one area that was least addressed was the manipulation of collections (arrays in particular). C# 8 looks to change exactly that by introducing two new types.

System.Index

System.Index is a structure that can be used to Index a collection either from the start or the end. In previous versions of the language, there was no direct way to index a collection from the end. For example,

// Get the 2nd element from end in the array
var secondElementFromLast = arr[arr.Length-2];

This changes with the introduction of System.Index, which fortunately comes with its own syntactic sugar. The code above, for accessing the second-to-last element, could now be rewritten as

// Get the 2nd element from end in the array
var secondElementFromLast = arr[^2]; // With the unary prefix 'hat' operator

To access elements from the beginning of the array, you could use

// Get 2nd element from the start in the array
var secondElementFromStart = arr[1]; // No changes here, indexes from the start remain zero-based

System.Range

In previous versions of C#, there was no easy way to get a slice of a collection. Let’s say you wanted to get all elements from the 2nd to the 5th element in the array. You could achieve this with LINQ, using the Enumerable.Skip and Enumerable.Take methods. For example

var slice = list.Skip(1).Take(4);

C# 8.0 introduces the System.Range type, again with supporting syntactic sugar to ease the life of developers. The above code could now be rewritten as

var slice = list[1..5];

The Range structure represents a range that has a start and an end index, and is written with the binary infix operator x..y. The start index is inclusive and the end index is exclusive. Do note that either operand of the range could be omitted to provide different meanings. For example

var slice1 = list[5..]; // All elements starting from index 5 (the 6th element) in collection
var slice2 = list[..5]; // First 5 elements in collection
var slice3 = list[..^5]; // All elements except the last 5
var slice4 = list[^5..]; // Last 5 elements in collection
var slice5 = list[..]; // Entire collection
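The index and range expressions above can be checked against a concrete array. The sketch below is illustrative only: RangeDemo and its members are hypothetical names, and it uses an array because arrays support both operators out of the box.

```csharp
using System;

public static class RangeDemo
{
    // Sample data: seven elements so that every slice is easy to verify by eye.
    public static int[] Numbers() => new[] { 10, 20, 30, 40, 50, 60, 70 };

    // The hat operator counts from the end: ^1 is the last element, ^2 the one before it.
    public static int SecondFromEnd(int[] arr) => arr[^2];

    // 1..5 starts at index 1 (inclusive) and stops at index 5 (exclusive).
    public static int[] SecondToFifth(int[] arr) => arr[1..5];

    // A Range is an ordinary value, so it can be passed around like any other argument.
    public static int[] SliceWith(int[] arr, Range range) => arr[range];
}
```

For the sample data, SecondFromEnd returns 60, SecondToFifth returns 20, 30, 40 and 50, SliceWith(arr, ..^5) returns 10 and 20, and SliceWith(arr, ^5..) returns the last five elements.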

We will continue exploring newer features of the language in coming blog posts.

Automapper – Use IoC for creating Destinations

Automapper by default creates new instances of the destination using the default constructor. If you need to ask the IoC container to create the new instance, it actually turns out to be pretty simple.
We will begin by setting up our Unity Container to register the IMapper.

var mapper = MappingProfile.InitializeAutoMapper(_unityContainer).CreateMapper();

_unityContainer.RegisterInstance<IMapper>(mapper);

As you can see, you are also initializing Automapper with certain configurations. Let’s see what exactly they are. Following is the definition of MappingProfile.

public static class MappingProfile
{
    public static MapperConfiguration InitializeAutoMapper(IUnityContainer container)
    {
        MapperConfiguration config = new MapperConfiguration(cfg =>
        {
            cfg.ConstructServicesUsing(type => container.Resolve(type));
            cfg.AddProfile(new AssemblyProfile());
        });
        return config;
    }
}

As you can observe, you are configuring the Automapper to Construct using the IUnityContainer. The following line does the magic for you as it uses the existing Container to create new instances (if registered with IoC) each time Automapper finds a particular type in destination.

cfg.ConstructServicesUsing(type => container.Resolve(type));

In the next blog post, we will investigate Automapper a bit more deeply. For now, please refer to the sample code demonstrating this blog post at my Github

Awaitable Pattern

How do you determine what types could be awaited? That is one question that often comes to mind, and the most common answer would be the return types of async methods

  • Task
  • Task<TResult>
  • void – Valid as an async return type, though it cannot itself be awaited and should be strictly avoided

However, are we truly restricted to them? What are the other types that could be awaited? The answer lies in the awaitable pattern.

Awaitable Pattern

The awaitable pattern requires the type to have a parameterless, non-void GetAwaiter method, either an instance method or an extension method, that returns an awaiter type.

public T GetAwaiter()

Where T, the awaiter type,
* Implements INotifyCompletion or ICriticalNotifyCompletion
* Has a boolean instance property IsCompleted
* Has a non-generic parameterless instance method GetResult
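The three requirements can be seen in the smallest possible awaiter, one that is always complete. This is only a sketch to illustrate the pattern: ImmediateAwaiter, IntAwaitExtensions and AwaitablePatternDemo are hypothetical names, and making int awaitable has no practical use beyond the demonstration.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

// Hypothetical awaiter that satisfies the pattern and is always already completed.
public readonly struct ImmediateAwaiter : INotifyCompletion
{
    private readonly int _value;

    public ImmediateAwaiter(int value) => _value = value;

    // Requirement: boolean instance property IsCompleted.
    public bool IsCompleted => true;

    // Requirement: INotifyCompletion. Not reached here because IsCompleted is true.
    public void OnCompleted(Action continuation) => continuation();

    // Requirement: parameterless instance method GetResult.
    public int GetResult() => _value;
}

public static class IntAwaitExtensions
{
    // An extension GetAwaiter is enough to make an existing type awaitable.
    public static ImmediateAwaiter GetAwaiter(this int value) => new ImmediateAwaiter(value);
}

public static class AwaitablePatternDemo
{
    public static async Task<int> DoubleAsync(int input)
    {
        int value = await input; // compiles because int now follows the awaitable pattern
        return value * 2;
    }
}
```

Calling AwaitablePatternDemo.DoubleAsync(21) produces a task whose result is 42.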

Approach 01 – Use TaskAwaiter

Let’s begin with an example that reuses the awaiter of Task or Task<TResult> instead of creating our own. For demonstration purposes, we assume a requirement wherein we should be able to use the Process class to execute a given command asynchronously and return the result. Ideally, we should be able to do the following.

var result = await "dir";

The above statement should execute the “dir” command using Process and return its output. We will begin by writing the GetAwaiter extension method for string.

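using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

public static class CommandExtension
{
    // The extension GetAwaiter makes a string awaitable: awaiting it runs the string as a command.
    public static TaskAwaiter<string> GetAwaiter(this string command)
    {
        var tcs = new TaskCompletionSource<string>();
        var process = new Process();
        process.StartInfo.FileName = "cmd.exe";
        process.StartInfo.Arguments = $"/C {command}";
        process.StartInfo.UseShellExecute = false;
        process.StartInfo.RedirectStandardOutput = true;
        process.EnableRaisingEvents = true;
        // Complete the task with the captured output once the process exits.
        process.Exited += (s, e) => tcs.TrySetResult(process.StandardOutput.ReadToEnd());
        process.Start();
        return tcs.Task.GetAwaiter();
    }
}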

The above code reuses the awaiter of Task<TResult>. The method starts the Process and uses TaskCompletionSource to set the result in the Exited event of the Process. If you examine the source code of TaskAwaiter<TResult>, you can observe that it implements the ICriticalNotifyCompletion interface and has the IsCompleted property as well as the GetResult method.

Let us now write some demonstrative code.

private async void btnDemoUsingTaskAwaiter_Click(object sender, EventArgs e)
{
    AppendToLog($"Started Method {nameof(btnDemoUsingTaskAwaiter_Click)}");
    await InvokeAsyncCall();
    AppendToLog($"Continuing Method {nameof(btnDemoUsingTaskAwaiter_Click)}");
}
private async Task InvokeAsyncCall()
{
    AppendToLog($"Starting Method {nameof(InvokeAsyncCall)}");
    var result = await "dir";
    AppendToLog($"Recieved Result, Continuing Method {nameof(InvokeAsyncCall)}");
    AppendToLog(result);
    AppendToLog($"Ending Method {nameof(InvokeAsyncCall)}");
}
public void AppendToLog(string message)
{
    logText.Text += $"{Environment.NewLine}{message}";
}

Approach 02 – Implement Custom Awaiter

Let us now assume another situation wherein the method is invoked on a non-UI thread, and the continuation requires you to update controls in the UI (in other words, it needs the UI thread). For the purpose of learning, let us solve the problem by implementing a custom awaiter.

We will begin by defining our custom awaiter, which satisfies the rules defined in the Awaitable Pattern section above.

public static class CommandExtension
{
    public static UIThreadAwaiter GetAwaiter(this string command)
    {
        var tcs = new TaskCompletionSource<string>();
        Task.Run(() =>
        {
            var process = new Process();
            process.StartInfo.FileName = "cmd.exe";
            process.StartInfo.Arguments = $"/C {command}";
            process.StartInfo.UseShellExecute = false;
            process.StartInfo.RedirectStandardOutput = true;
            process.EnableRaisingEvents = true;
            process.Exited += (s, e) => tcs.TrySetResult(process.StandardOutput.ReadToEnd());

            process.Start();
        });

        // Note: GetResult() blocks the calling thread until the process has exited.
        return new UIThreadAwaiter(tcs.Task.GetAwaiter().GetResult());
    }
}
public class UIThreadAwaiter : INotifyCompletion
{
    bool isCompleted = false;
    string resultFromProcess;

    public UIThreadAwaiter(string result)
    {
        resultFromProcess = result;
    }
    public bool IsCompleted => isCompleted;
    public void OnCompleted(Action continuation)
    {
        // Marshal the continuation to the UI thread when called from another thread.
        if (Application.OpenForms[0].InvokeRequired)
            Application.OpenForms[0].BeginInvoke((Delegate)continuation);
        else
            continuation(); // Already on the UI thread; without this the awaiting method would never resume.
    }

    public string GetResult()
    {
        return resultFromProcess;
    }
}

The UIThreadAwaiter implements the INotifyCompletion interface. As one could assume from the code above, the continuation is marshalled to the UI thread with the help of BeginInvoke.

Let us now write some demo code to demonstrate the custom awaiter.

private void btnExecuteOnDifferentThread_Click(object sender, EventArgs e)
{
    AppendToLog($"Started Method {nameof(btnExecuteOnDifferentThread_Click)}");
    Task.Run(() => InvokeAsyncCall()).ConfigureAwait(false);
    AppendToLog($"Continuing Method {nameof(btnExecuteOnDifferentThread_Click)}");
}

private async Task InvokeAsyncCall()
{
    var result = await "dir";
    AppendToLog($"Recieved Result, Continuing Method {nameof(InvokeAsyncCall)}");
    AppendToLog(result);
    AppendToLog($"Ending Method {nameof(InvokeAsyncCall)}");
}
public void AppendToLog(string message)
{
    try
    {
        txtLog.Text += $"{Environment.NewLine}{message}";
    }
    catch (Exception ex)
    {
        var errorMessage = $"Exception:{ex.Message}{Environment.NewLine}{Environment.NewLine}Message:{message}";
        MessageBox.Show(errorMessage, "Error");
    }
}

private async void btnExecuteOnSameThread_Click(object sender, EventArgs e)
{
    AppendToLog($"Started Method {nameof(btnExecuteOnSameThread_Click)}");
    await InvokeAsyncCall();
    AppendToLog($"Continuing Method {nameof(btnExecuteOnSameThread_Click)}");
}

As demonstrated in the examples above, the await keyword is not restricted to a few types. Instead, we could create our own custom awaiter that satisfies the awaitable pattern.

Complete code for this post is available on my Github

Member Serialization

Serialization at times throws up curious situations that make us look beyond the normal usage. Here is another situation that took some time to find a solution for. Let’s examine the following code.

public class Test:BaseClass
{
  public string Name {get;set;}
}
[DataContract]
public class BaseClass
{
}

What would be the output if one were to serialize an instance of the Test class?

var instance = new Test{Name="abc"};
var result = JsonConvert.SerializeObject(instance);

Let’s examine the output

{}

Not surprising, right? After all, the base class has been decorated with DataContractAttribute. This ensures that only the members (including members of derived classes) decorated with DataMemberAttribute are serialized.

As seen in the code above, while the base class doesn’t have any property, the child class has a single property (Name), which is NOT decorated with the mentioned attribute.
This is as per the design and works well in most cases. If one needs to ensure that the child class members are serialized, one needs to decorate them with the DataMemberAttribute.

public class Test:BaseClass
{
  [DataMember]
  public string Name {get;set;}
}
[DataContract]
public class BaseClass
{
}

This would ensure the Property Name is serialized.

{"Name":"abc"}

The other option we have is to remove the DataContractAttribute from the BaseClass, which would produce the same result.

But what if the developer has the following constraints
* Cannot access/change the BaseClass
* Should not use the DataMemberAttribute

The second constraint is chiefly driven by the factor that there are many Properties in the Child Class. This would require the developer to use the DataMemberAttribute on each of them, which is quite painful and naturally, one would want to avoid it.

The solution lies in a lesser-known property of the well-known JsonObjectAttribute. The MemberSerialization.OptOut enumeration value ensures the following behavior.

All public members are serialized by default. Members can be excluded using JsonIgnoreAttribute or NonSerializedAttribute.

While this is the default member serialization mode, it gets overridden due to the presence of DataContractAttribute in the parent class.
Let’s modify our code again.

[JsonObject(MemberSerialization.OptOut)]
public class Test:BaseClass
{
  public string Name {get;set;}
}
[DataContract]
public class BaseClass
{
}

As seen in the code above, the only change required is to decorate the child class with JsonObjectAttribute, passing in the MemberSerialization.OptOut value as the member serialization mode.
This would produce the desired output

{"Name":"abc"}

A second look at char.IsSymbol()

Let us begin by examining a rather simple looking code.

var input = "abc#ef";
var result = input.Any(char.IsSymbol);

What would be the output of the above code? Let’s hit F5 and check it.

False

Surprised? One need not feel guilty about being surprised. After all, char.IsSymbol is a rather underused method, and few of us look behind it to understand what exactly it does.

So why this peculiar behavior? What exactly is a symbol according to the char.IsSymbol() method? The answer lies in the documentation of the method.

Valid symbols are members of the following categories in UnicodeCategory: MathSymbol, CurrencySymbol, ModifierSymbol, and OtherSymbol.

The character ‘#’ doesn’t fall under any of the required categories. Now, with that understanding, let us examine a few other characters.

var charList = new[]{'!','@','$','*','+','%','-'};
foreach(var ch in charList)
{
    Console.WriteLine($"{ch} = IsSymbol:{char.IsSymbol(ch)}");
}

The output, again, has a few curious facts to verify. Let’s check the output first.

! = IsSymbol:False
@ = IsSymbol:False
$ = IsSymbol:True
* = IsSymbol:False
+ = IsSymbol:True
% = IsSymbol:False
- = IsSymbol:False

Some of the results are self-explanatory, but what looks interesting for us are the characters "*", "-", and "%". All three of them look like mathematical symbols. This might raise eyebrows as to why they weren’t recognized as symbols.

The answer lies in the UnicodeCategory of the character. Let us change the code a bit to include the Unicode category as well for each character.

var charList = new[]{'!','@','$','*','+','%','-'};
foreach(var ch in charList)
{
    Console.WriteLine($"{ch} = IsSymbol:{char.IsSymbol(ch)}"
        + $"UnicodeCategory:{Char.GetUnicodeCategory(ch)}");
}

Before further discussion let us examine the output as well.

! = IsSymbol:FalseUnicodeCategory:OtherPunctuation
@ = IsSymbol:FalseUnicodeCategory:OtherPunctuation
$ = IsSymbol:TrueUnicodeCategory:CurrencySymbol
* = IsSymbol:FalseUnicodeCategory:OtherPunctuation
+ = IsSymbol:TrueUnicodeCategory:MathSymbol
% = IsSymbol:FalseUnicodeCategory:OtherPunctuation
- = IsSymbol:FalseUnicodeCategory:DashPunctuation

The answer to the previous question now stares at us. The characters "*" and "%" fall under the OtherPunctuation category, while "-" falls under DashPunctuation.

That explains the behavior of char.IsSymbol(). In most cases, it would be better to use Regex for validating passwords or other strings that need to be checked for special characters.
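Two possible ways of checking for special characters are sketched below. Both are assumptions about what counts as "special" rather than a definitive rule: the first treats any Unicode symbol or punctuation character as special, and the second uses a regular expression that treats anything other than an ASCII letter or digit as special. The class SpecialCharacterCheck and its members are hypothetical names.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class SpecialCharacterCheck
{
    // Combines the symbol and punctuation categories, so '#', '*', '%' and '-' are all covered.
    public static bool IsSpecial(char ch) => char.IsSymbol(ch) || char.IsPunctuation(ch);

    public static bool ContainsSpecial(string input) => input.Any(IsSpecial);

    // Regex variant: any character that is not an ASCII letter or digit counts as special.
    public static bool ContainsSpecialRegex(string input) => Regex.IsMatch(input, "[^a-zA-Z0-9]");
}
```

With either approach, the string "abc#ef" from the beginning of the post is reported as containing a special character. Note that the two definitions differ for whitespace and non-ASCII letters, so the choice depends on the validation rule you need.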

Deserialize Json to Generic Type

One of the recent questions on Stack Overflow that I found interesting was about a JSON which needs to be deserialized to a generic class. What made the question interesting was that the generic property would have a different JSON property name depending on the type T. Consider the following JSON

{
    status: false,
    employee:
    {
        firstName: "Test",
        lastName: "Test_Last"
    }
}

This needs to be deserialized to the following class structures

public class Response<T>
{
    [JsonProperty(PropertyName = "status")]
    public bool Status {get;set;}

    public T Item {get;set;}
}

[JsonObject(Title = "employee")]
public class Employee
{
    [JsonProperty(PropertyName = "firstName")]
    public string FirstName {get; set;}

    [JsonProperty(PropertyName = "lastName")]
    public string LastName {get; set;}
}

However, Response<T> being a generic class, would need to support additional Types as well. For example, the Json could also look like the following

{
    status: false,
    company:
    {
        companyname: "company name",
        headquaters: "location"
    }
}

Where the company needs to be deserialized to

[JsonObject(Title = "company")]
public class Company
{
    [JsonProperty(PropertyName = "companyname")]
    public string CompanyName {get; set;}

    [JsonProperty(PropertyName = "headquaters")]
    public string HeadQuaters {get; set;}
}

The solution lies in writing a Custom Contract Resolver, which does the magic. Let’s go ahead and write the ContractResolver.

using System.Reflection;
using Newtonsoft.Json;
using Newtonsoft.Json.Serialization;

public class GenericContractResolver<T> : DefaultContractResolver
{
    protected override JsonProperty CreateProperty(MemberInfo member, MemberSerialization memberSerialization)
    {
        var property = base.CreateProperty(member, memberSerialization);
        // Only the generic Item property needs a type-dependent JSON name.
        if (property.UnderlyingName == nameof(Response<T>.Item))
        {
            foreach (var attribute in System.Attribute.GetCustomAttributes(typeof(T)))
            {
                // Use the Title of the JsonObjectAttribute on T as the JSON property name.
                if (attribute is JsonObjectAttribute jobject)
                {
                    property.PropertyName = jobject.Title;
                }
            }
        }
        return property;
    }
}

The role of the ContractResolver is pretty simple. As soon as it recognizes the Generic Type passed, it would replace the property name with the name described in JsonObjectAttribute.

Now you can use the ContractResolver to deserialize the Json. For example

var result = JsonConvert.DeserializeObject<Response<Employee>>(json,
    new JsonSerializerSettings
    {
        ContractResolver = new GenericContractResolver<Employee>()
    });

Demo Samples could be found here in my C# Fiddles

String or Array Converter : Json

Imagine you have a method which returns a JSON string of the following format.

{Name:'Anu Viswan',Languages:'CSharp'}

In order to deserialize the JSON, you could define a class as the following.

public class Student
{
    public string Name{get;set;}
    public string Languages{get;set;}
}

This works flawlessly. But imagine a situation where your method could return either a single language, as seen in the example above, or a JSON which has multiple languages. Consider the following JSON

{Name:'Anu Viswan',Languages:['CSharp','Python']}

This might break your deserialization using the Student class. If you want to continue using the Student class with both scenarios, then you could make use of a custom converter which would convert a single string to a collection. For example, consider the following converter.

using System;
using System.Collections.Generic;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

class SingleOrArrayConverter<T> : JsonConverter
{
    public override bool CanConvert(Type objectType)
    {
        return (objectType == typeof(List<T>));
    }

    public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
    {
        JToken token = JToken.Load(reader);
        // An array is converted as-is; a single value is wrapped in a one-element list.
        if (token.Type == JTokenType.Array)
        {
            return token.ToObject<List<T>>();
        }
        return new List<T> { token.ToObject<T>() };
    }

    // The converter is only used for reading.
    public override bool CanWrite
    {
        get { return false; }
    }

    public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
    {
        throw new NotImplementedException();
    }
}

Now, you could redefine your Student class as

public class Student
{
    public string Name{get;set;}
    [JsonConverter(typeof(SingleOrArrayConverter<string>))]
    public List<string> Languages{get;set;}
}

This would now work with both string and arrays.

Case 1 : Output

[Screenshot: output for Case 1]

Case 2 : Output

[Screenshot: output for Case 2]

Revisiting Threads – Overhead of explicit threads

Recently I had the good fortune to read some invaluable books such as CLR via C# by Jeffrey Richter, C# in Depth by Jon Skeet and Writing High-Performance .NET Code by Ben Watson. They allowed me to revisit some of the basics of threads, and I thought I would write down my notes from the books. In this first part on asynchronous programming, we will begin by examining (or revisiting) the internals of a thread and thereby understand why creating explicit threads is such a bad idea.

The typical overhead of threads can be classified into two broad categories.
* Space, in terms of memory consumption
* Time, in terms of execution performance

Keeping the overheads in mind, let us look at what happens when a new thread is created.

Memory Allocation

For each new thread that is created, the operating system assigns each of the following data structures

Thread Kernel Object

The Thread Kernel Object is a data structure/memory block allocated by the OS, which can be accessed only by the kernel. The key objective of the Thread Kernel Object is to store information regarding the particular thread, including the thread context. The thread context includes the state of the CPU registers when the thread last executed.

In addition, the Thread Kernel Object also stores statistical information regarding the thread, such as the creation time, state, priority, number of context switches, kernel mode time and user mode time, among others.

Furthermore, the Thread Kernel Object also contains a stack pointer pointing to the starting location of the stack frame of the function currently being executed in the thread, and an instruction pointer to the instruction that was being executed by the CPU. It also contains the addresses of the TEB and the stacks (user mode and kernel mode).

Thread Environment Block (TEB)

The TEB, or Thread Environment Block, is a block of memory allocated in user mode (and hence accessible to the application) for each thread, which typically consumes 1 page (4 KB on most common processors) of memory.

One of the key objectives of the TEB is to hold the head of the thread’s exception handling chain. Each try block the thread enters adds a node to the head of the chain, and the node is removed each time the code exits the try block.

The TEB also holds the thread’s thread-local storage and data structures used by GDI and OpenGL.

User Mode Stack

The User Mode Stack stores the return address, indicating what the thread needs to execute once the current method ends; the entry is removed when the method returns. It is also used for storing the local variables and method parameters used in the method.

Windows by default reserves 1 MB of address space per thread for this stack, and commits physical storage to it as the stack grows.

Kernel Mode Stack

When the method calls a kernel mode function, the arguments of the call are stored in a different data structure called the Kernel Mode Stack. The application cannot directly access the Kernel Mode Stack. For security reasons, the OS copies the parameters from the User Mode Stack to the Kernel Mode Stack during the execution of kernel functions.

On a 32-bit system, the kernel mode stack is typically 12 KB, and it is 24 KB on 64-bit machines.

Unmanaged DLLs

One of the policies that the Windows operating system follows requires that, for every new thread that is created, all unmanaged DLLs in the process have their DllMain method called with the DLL_THREAD_ATTACH flag. Similarly, DLL_THREAD_DETACH is passed when the thread dies. This is required by some DLLs for initialization and clean up.

This, understandably, has a performance implication every time a thread is created.

Context Switching

Every processor core can run only a single thread at a time. Each thread is allowed to run for a specified slice of time (known as the thread quantum), typically around 15-20 ms. When the thread quantum expires, the scheduler picks another thread from the ready queue, allowing it to use the processor.

The OS thread scheduler stores the thread kernel objects in different queues based on the state of the thread (Ready, Waiting and Exiting). When the thread quantum finishes for a thread, the scheduler checks the Ready queue and picks a new thread, causing a context switch.

Context switching is the process of storing and restoring the state of a thread so that it can be resumed later. This includes restoring the state of the CPU registers from the state stored in the Thread Kernel Object.

Every context switch requires the following steps
* Save the state of the CPU registers for the current thread in that thread’s Kernel Object.
* Pick another thread.
* Load the state of the CPU registers for the new thread, which was previously stored in the new thread’s Kernel Object.

Additionally, when the context switch occurs, the CPU is already processing a thread, and the executing thread’s code and data reside in the CPU’s cache. The cache exists to avoid frequent access to RAM, which is considerably slower than the CPU’s own cache. After the switch, the CPU must access RAM again to populate its cache for the new thread.

This whole process has to repeat every 15-20 ms, which is a performance overhead. The obvious question that arises is: wouldn’t that happen even with the thread pool?

The answer is yes; however, one of the critical decisions the thread pool makes is maintaining an optimal number of threads. We will go into the details of the thread pool later, but the point of interest for now is how the thread pool ensures that the number of threads remains optimal and doesn’t get out of hand. Also, with fewer threads, there would be a higher chance for your thread to get an opportunity to run.

Garbage Collection

When the garbage collector runs, the CLR suspends all the threads and walks through their stacks to find the roots used to mark the objects in the heap. The GC then walks through the stacks again to update the roots once the objects have been moved.

This is another case where a smaller, optimal number of threads would improve the performance.

Summary

All the above factors highlight why it is a bad choice to create threads explicitly. While threads are highly useful for implementing asynchronous operations in your application, one needs to strike the right balance as far as the number of threads that are alive at a given moment is concerned. Considering the amount of memory overhead required for allocating a thread, it would be highly useful if one could reuse threads. This is exactly what the thread pool does.

Having said so, there are cases when creating threads explicitly could be recommended.
* By default, all thread pool threads run at Normal priority. When you need to run a thread at a non-Normal priority, you have the option to create an explicit thread.
* You need to create a foreground thread. The threads in the thread pool are background threads.
* If you have an extremely long-running compute-bound task and you want to avoid taxing the thread pool logic, you have a case where you could depend on an explicit thread.
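The cases above can be sketched with a dedicated thread that is configured before it starts. This is only an illustration: DedicatedThreadDemo, SumOnDedicatedThread and the thread name are hypothetical, and the summation loop merely stands in for a long-running compute-bound task.

```csharp
using System.Threading;

public static class DedicatedThreadDemo
{
    public static long SumOnDedicatedThread(int upTo)
    {
        long sum = 0;

        var worker = new Thread(() =>
        {
            // Stand-in for a long-running, compute-bound piece of work.
            for (int i = 1; i <= upTo; i++)
            {
                sum += i;
            }
        })
        {
            IsBackground = false,                  // foreground thread, unlike thread pool threads
            Priority = ThreadPriority.AboveNormal, // non-Normal priority
            Name = "dedicated-worker"              // hypothetical name, useful while debugging
        };

        worker.Start();
        worker.Join(); // wait for the dedicated thread to finish before reading the result
        return sum;
    }
}
```

Calling DedicatedThreadDemo.SumOnDedicatedThread(1000) returns 500500. In a real application the caller would rarely block with Join immediately; it is done here only so that the result can be returned from the method.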

In the next part, we will examine the thread pool and how it manages the optimal thread count balance.