String or Array Converter : Json

Imagine you have a method which returns a Json String of following format.

{Name:'Anu Viswan',Languages:'CSharp'}

In order to deserialize the JSON, you could define a class as the following.

public class Student
{
public string Name{get;set;}
public string Languages{get;set;}
}

This work flawlessly. But imagine a situation when your method could return either a single Language as seen the example above, but it could additionally return a json which has multiple languages. Consider the following json

{Name:'Anu Viswan',Languages:['CSharp','Python']}

This might break your deserialization using the Student class. If you want to continue using Student Class with both scenarios, then you could make use of a Custom Convertor which would string to a collection. For example, consider the following Converter.

class SingleOrArrayConverter : JsonConverter
{
public override bool CanConvert(Type objectType)
{
return (objectType == typeof(List));
}

public override object ReadJson(JsonReader reader, Type objectType, object existingValue, JsonSerializer serializer)
{
JToken token = JToken.Load(reader);
if (token.Type == JTokenType.Array)
{
return token.ToObject<List>();
}
return new List { token.ToObject() };
}

public override bool CanWrite
{
get { return false; }
}

public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
{
throw new NotImplementedException();
}
}

Now, you could redefine your Student class as

public class Student
{
public string Name{get;set;}
[JsonConverter(typeof(SingleOrArrayConverter))]
public List Languages{get;set;}
}

This would now work with both string and arrays.

Case 1 : Output

case 1

Case 2 : Output

case 2

Oxyplot : Selectable Point

While OxyPlot continues to be one of the attractive plotting library for .Net developers, there are times when you find longing for features that could make it even more better.

One of such feature is ability to select/highlight a point in a series. While there is a Selectable Property with LineSeries, it doesn’t quite let you highlight the Data Point.

However, this can be easily achieved using another series with has a single point (the point which was selected). I guess it is easier to show the code than describe it. So let’s go ahead and hit Visual Studio.

We would begin by defining couple of Custom Series. For the sake of example, I am considering LineSeries in this case. The first Custom Series we would define would be the series which displays the Selected Item.

public class SelectedLineSeries:LineSeries
{
}

As you can observe, it hardly does anything new. The purpose of the definition is to allow is to easily differenciate between our special Series from Line Series. This would be further clarified when introduce our second custom series, which denotes a series which can be selected.

public class SelectableLineSeries:LineSeries
{
public bool IsDataPointSelectable { get; set; }

public DataPoint CurrentSelection { get; set; }

public OxyColor SelectedDataPointColor { get; set; } = OxyColors.Red;

public double SelectedMarkerSize { get; set; }

public SelectableLineSeries()
{
SelectedMarkerSize = MarkerSize;
MouseDown += SelectableLineSeries_MouseDown;
}

private void SelectableLineSeries_MouseDown(object sender, OxyMouseDownEventArgs e)
{
if (IsDataPointSelectable)
{
var activeSeries = (sender as OxyPlot.Series.Series);
var currentPlotModel = activeSeries.PlotModel;
var nearestPoint = activeSeries.GetNearestPoint(e.Position, false);
CurrentSelection = nearestPoint.DataPoint;

currentPlotModel = ClearCurrentSelection(currentPlotModel);

var selectedSeries = new SelectedLineSeries
{
MarkerSize = MarkerSize + 2,
MarkerFill = SelectedDataPointColor,
MarkerType = MarkerType
};

selectedSeries.Points.Add(CurrentSelection);
currentPlotModel.Series.Add(selectedSeries);
currentPlotModel.InvalidatePlot(true);
}
}

private PlotModel ClearCurrentSelection(PlotModel plotModel)
{
while(plotModel.Series.Any(x=> x is SelectedLineSeries))
{
plotModel.Series.Remove(plotModel.Series.First(x=> x is SelectedLineSeries));
}
return plotModel;
}

}

The core functionality of the Class is defined in the Series Mouse Down event. This ensures that whenever a point is selected, a new series is added to the Parent PlotModel, which has a single Data Point ( same as the currently selected data point).

Well, that’s all our definition is about. Now we can go ahead and define our PlotModel to be used along with Series definition.

var random = new Random();
var collection = Enumerable.Range(1, 5).Select(x => new DataPoint(x, random.Next(100, 400))).ToList();
var series = new SelectableLineSeries
{
IsDataPointSelectable = true,
MarkerFill = OxyColors.Blue,
MarkerType = MarkerType.Square,
LineStyle = LineStyle.Solid,
Color = OxyColors.Blue,
ItemsSource = collection,
MarkerSize = 5
};

GraphModel = new PlotModel();

GraphModel.Axes.Add(new OxyPlot.Axes.LinearAxis
{
Position = Axes.AxisPosition.Bottom
});

GraphModel.Axes.Add(new OxyPlot.Axes.LinearAxis
{
Position = Axes.AxisPosition.Left
});

GraphModel.Series.Add(series);
GraphModel.InvalidatePlot(true);

That’s it. Hit F5 and test your Selectable Line Series. You can access the source code shown in the demo in my Github.

Oxyplot Selectable

Revisiting Threads – Overhead of explicit threads

Recently I had the good fortune to read some of the invaluable books such as CLR via C# by Jeffery Rictcher, C# in Depth by John Skeet and Writing High Performance code in .Net by Ben Watson. It allowed me to revisit some of the basics on Threads and I thought to write down my notes from the books. In this first part on Asynchronous Programming, we will begin by examining (or revisiting) internals of a thread and thereby understanding why creating explicit threads are such a bad idea.

Typical possible overhead of threads can be classified into two broad categories.
* Space , in terms of Memory Consumption
* Time, in terms of execution performace

Keeping the overheads in mind, let us look at what happens when a new thread is created.

Memory Allocation

For each new thread that is created, the operating system assigns each of the following data structures

Thread Kernel Object

Thread Kernel Object is a data structure/memory block allocated by the OS, which can be accessed only by the Kernel. The key objective of the Thread Kernel Object is to store information regarding the particular thread, including the thread context.The thread context includes states of CPU registers when the thread was last executed.

In addition the Thread Kernal Object also stores statistical information regarding the thread such as the Creation Time, State, Priority, Number of Context Switches done, Kernal Mode Time and User Mode Time among others.

Further more, the Thread Kernal Object also contains Stack pointer pointing to the starting location of stackframe of current function that is being executed in the thread and Instruction pointer to the current instruction that was executed by the CPU.It also contains address spaces refering the TEB and Stacks (User Mode and Kernal Mode).

Thread Environment Block (TEB)

The TEB, or Thread Environment Block is a block of memory allocated in the user mode (and hence accessible for application) for each thread which typically consumes 1 Page (4 Kb in most common processors) of Memory.

One of the key objectives of the TEB is to maintain a stack comprising of head of an exception handling chain. The node is removed each time the code exists the try block.

The TEB is also responsible for Threads Local Storage and data structures to be used for GDI/Open GL.

User Mode Stack

The User Mode Stack maintains reference to the address space indicating what the thread needs to execute once the method ends, which it removes when the method ends. It is also used for storing all the local variables and method parameters used in the method.

Windows by default allocates 1 MB per thread, but it can grow if the requirement arises.

Kernal Mode Stack

When the method access a Kernal Mode function, the arguements of the methods are stored in a different data structure called Kernal Model Stack. The application cannot directly access the Kernal Mode Stack. This is done for security reasons and during execution of Kernal functions, the OS copies the parameters from User Mode Stack to Kernal Mode Stack.

For a 32 bit System, the Kernal mode stack is typically 12Kb and 24Kb in case of 64 bit machines.

Unmanaged DLLs

One of the policies that Windows Operating System follows requires that for every new thread that is created, all unmanaged DLLs in the process should invoke their DLL_Main called with DLL_THREAD_ATTACH flag passed. Similarly, DLL_THREAD_DETACH is oassed when the Thread dies. This is required by some DLLs for initialization and clean up.

This,understandably has a performance implication every time a thread is created.

Context Switching

Every processor can run only a single thread at a time. Each thread is allowed to run for a specified sclice of time,(known as Thread Quantum) typically around 15-20 ms. When the thread quantum expires, the scheduler picks another thread from the another thread, allowing it to use the processor.

The OS Thread scheduler stores the kernel thread object in different queues based on the state of the thread (Ready, Waiting and Exiting). When the thread quantum finishes for a thread, the scheduler checks the Ready Queue, and picks a new thread causing a context switching.

Context Switching is the process of storing/restoring state of the given thread so that it can be resumed. This includes restoring the state of CPU registers with the states stored in Thread Kernel Object

Every context switching requires
* Save state of CPU registers for current thread in the Threads Kernel Object.
* Picks another thread.
* Load state of CPU registers for new thread, which has been previously stored in the new thread’s Kernel object

Additionally, when the context switching occurs, the CPU is already processing a thread and the executing threads code/data resides in the CPU’s cache. This is done to avoid frequent access to RAM, which is slighly slower compared to CPU’s own cache. CPU now must now access RAM to populate CPU’s cache

This whole proces has to repeat every 15/20 ms, which is a performance overhead. Obvious question that rises in mind is, wouldn’t that happen even with the Thread Pool.

The answer is Yes, but however, the one of the critical decission which the Thread Pool makes is maintaining optimal amount of threads. We will go into details of thread pool later, but the point of interest at this point would be how the thread pool ensures the number of threads remained optimal and doesn’t go out of hand. Also, with lesser threads, there would be higher chance for your thread to get an oppurtunity to schedule its run.

Garbage Collection

When the Garbage collector runs, the CLR suspends all the threads and walk through the stack to find roots to mark the object in heap. The GC would again walk though the stack again to update the roots once the objects has been moved.

This is another case where lesser or optimal number of threads would improve the performance.

Summary

All the above factors highlights why it is a bad choice to create threads explicitly. While threads are highly useful for employing asynchronous operations in your application, one needs to strike the right balance as far as the number of threads that are alive at a moment. Considering the amount of memory overhead required for allocating the thread, it would be highly useful if one could reuse the threads. This is exactly what the thread pool does.

Having said so, there are cases when creating threads explicitly could be recommended.
* By default, all thread pool threads are running in Normal Priority. When you need to run a thread in a non-Normal priority, you have the option to create explicit threads.

  • You need to create a Foreground threads. The threads in the threadpool are background threads.

  • If you have a extremely long running compute bound task, and you want avoid taxing the thread pool logic, you have a case where you could depend on explicit thread.

In the next part, we would examine Thread Pool and how it manages the optimal thread count balance.

Partitioner and Parallel Loops

Two common traps when using Parallel Loops could be summarized as following.
*  The amount of work done in the loop is not significantly larger than the amount of time spend in synchronizing any shared states.
*  Amount of work done is less than the cost of delegate or method invocation.

Both of the problems results in significant performance implications. However, both issues can be easily solved using the Partitioner.

Partitioner splits the range into set of tuples that describes a subset range that needs be iterated over the original collection. Let’s write some code with and without Partitioner and benchmark them.

[Benchmark]
public void ParallelLoopWithoutPartioner()
{
var maxValue = 100000;
var sum = 0L;

Parallel.For(0, maxValue, (value) =>
{
Interlocked.Add(ref sum, value);
});
}

[Benchmark]
public void ParallelLoopWithPartioner()
{
var maxValue = 100000;
var sum = 0L;
var partioner = Partitioner.Create(0,maxValue);

Parallel.ForEach(partioner, range =>
{
var (minValueInRange, maxValueInRange) = range;
var subTotal = 0;
for (int value = minValueInRange; value < maxValueInRange; value++)
{
subTotal += value;
}
Interlocked.Add(ref sum, subTotal);
});
}

Both methods calculates the Sum of first N Numbers using Parallel Loops by accessing a shared variable sum.

This creates a significant ‘wait delay’ when using the first approach. The second approach, which uses the Partitioner, splits the range into subsets and access the shared state less frequently. The results of Benchmark are shown below.

benchmark

SOLID : Liskov Substitution Principle (Part 2)

In the first part of the post, we visited the core definition of Liskov Substitution principle and took time to understand the problem using an example. In this second part, we would take time to understand some of the governing rules of the principle.

The rules that needs to be followed for LSP compliance can be broadly categorized into mainly two categories.

  • Contract Rules
  • Variance Rules

Let us examine the Contract Rules first.

Contact Rules
The three contact rules are defined as follows.
* Preconditions cannot be strengthened in a subtype.
* Postconditions cannot be weakened in a subtype.
* Invariants—conditions that must remain true—of the supertype must be preserved in a subtype

Let us examine each of them breifly.

Preconditions cannot be strengthend in a subtype.

Preconditions are conditions that are necessary for a method to run reliably. For example, consider a method in a school application, which returns the count of students above specified age.

public virtual int GetStudentCount(int minAge)
{
if(minAge <=0)
throw new ArgumentOutOfRangeException("Invalid Age");
return default; // For demo purpose
}

As observed in the above method, a non-positive value for age needs to be flagged as an exception and method should stop executing. The method does this check as the first step before it executes any logic related to the funcationlity of the method and thereby preventing the method execution unless the parameters are valid. This is a typical example of preconditions. There are other ways to add preconditions, but that would be out of scope of the current context.

So what happens if the precondition is strengthed. Let us assume that the application for Kindergarden for the school derieves from the same class, but adds another precondition.

public override int GetStudentCount(int minAge)
{
if(minAge <=0) throw new ArgumentOutOfRangeException("Invalid Age"); if(minAge >=5)
throw new ArgumentOutOfRangeException("Invalid Age");
return default; // For demo purpose
}

This creates a conflict in the calling class. The caller was invoking the method from the base Class, it would not throw an exception if the value is greater then 5. However, if the caller was invoking the method from the Child Class, it would now throw an exception, thanks to the new precondition we have added. Let’s demonstrate the same using Unit Tests.

[Test]
[TestCaseSource(typeof(Preconditions), nameof(TestCases))]
public void GetStudentCount(School school,int age)
{
Assert.GreaterOrEqual(school.GetStudentCount(age), 0);
}

public static IEnumerable TestCases
{
get
{
yield return new TestCaseData(new School(), 7);
yield return new TestCaseData(new Kindergarden(), 7);
}
}
}

The Unit Tests show that the strengthening of Preconditions would now indirectly force the caller to violate the fact that it should not make any assumptions about the type it is acting on (and thereby violating Open Closed Principle).

Post Conditions cannot be weakend in a subtype

During the execution of the method, the state of the object is most likely to be mutated, opening the door of possiblity that the method might leave the object in an invalid state due to errors in the method. Post Conditions ensures that the state of the object is in a valid state when the method exits.

Much similiar to the Preconditions, the Post Conditions are commonly implemented using Guard clauses, which ensure the state is valid after the execution of logic (any code that might alter the state of the object) associated with the method. For the same reason, the Post Conditions are placed at the end of the method.

The behavior of Post Conditions in sub-classes is the complete opposite of what happens with Pre Conditions. The Post Conditions, if weakened, could l;ead to existing clients to behave incorrectly. Let us explore a scenario. We will write the Unit Tests first.

[Test]
[TestCaseSource(typeof(Postconditions), nameof(TestCases))]
public void TestLabFeeForSchool(School school, long studentId)
{
Assert.Greater(school.CalculateLabFees(studentId),0);
}

public static IEnumerable TestCases
{
get
{
yield return new TestCaseData(new School(), 7);
}
}

The above Test Case ensures that the CalculateLabFees method returns a positive and non-zero value as result. Let us assume that School.CalculateLabFees method was defined as following.

public class School
{
public virtual decimal CalculateLabFees(long studentId)
{
var labFee = MockMethodForCalculatingLabFees(studentId);

if (labFee <= 0)
return 1000;
return labFee;
}

private decimal MockMethodForCalculatingLabFees(long studentID)
{
return default;
}
}

The Post Condition in the base class ensures that if the labFee is less than or equal to zero, a default value of 1000 is returned. The Test Case would pass successfully with the instance of class. The CalculateLabFees method would return a positive, non-zero value as expected by the Unit Test.

Now, let us assume the Kindergarden Implementation. The Kindergarden doesn’t have Lab Fees and hence, would return a default value 0 for the CalculateLabFee method.

public class Kindergarden : School
{
public override decimal CalculateLabFees(long studentId)
{
return 0;
}
}

As you can notice the Post Condition has been weakened in the Child Class. This would cause issues with client which would expect a positive response, as with the Unit Test Case. This highlights another instance where significance of honoring LSP is highlighted.

Invariants must be maintained

An invariant describes a condition of a process that remains true before the process begins and after the process ends, until ofcourse the object is out of scope.

Consider the following base class.

public class Student
{
public string Name{ get; set; }

public void CheckInvariants()
{
if (string.IsNullOrEmpty(Name))
Name = "Name Not Assigned";
}
public virtual void AssignName(string name)
{
CheckInvariants();
Name = name;
CheckInvariants();
}
}

The class requires the Name property to be always have a value. The CheckInvariants method assigns a default value if the clause is violated. Let us assume we have an sub class NurseryStudent.

public class NurseryStudent : Student
{
public override void AssignName(string name)
{
Name = name;
}

}

The author of NurseryStudent forgots to ensure the Invariants holds true after the AssignName exits. This is a violation of LSP as a client using the SuperClass would expect the Name to always have a value. Consider the following code.

[Test]
[TestCaseSource(typeof(Invariants), nameof(TestCases))]
public void TestInvariants(Student student,string name)
{
student.AssignName(name);
Assert.Greater(student.Name.Length, 0);
}

public static IEnumerable TestCases
{
get
{
yield return new TestCaseData(new Student(), string.Empty);
yield return new TestCaseData(new NurseryStudent(), string.Empty);
}
}

The Unit Test Case would fail

Let’s write the Unit Test Case before writing the example for sub class.

[Test]
[TestCaseSource(typeof(Invariants), nameof(TestCases))]
public void TestInvariants(Student student,string name)
{
student.AssignName(name);
Assert.Greater(student.Name.Length, 0);
}

public static IEnumerable TestCases
{
get
{
yield return new TestCaseData(new Student(), null);
yield return new TestCaseData(new NurseryStudent(), null);
}
}

The Test Case would fail when an instance of NurserStudent is passed to the method as the Name property would have an unexpected value of null when leaving the AssignName method. This would cause exceptions in all clients which expects the behavior of invariants are retained by any subclass of the Student.

In this post we explored the Contract Rules for LSP. In the part, we would explore the variance rules involved for conforming with LSP. All code samples given in the post are available in my Github

Conditional Serialization using NewtonSoft Json

One of the least explored feature of Newtonsoft Json is the ability serialize properties conditionally. Consider the hypothetical situation wherein you want to serialize a property in a class only if a condition is satisfied. For example,

public class User
{
public string Name {get;set;}
public string Department {get;set;}
public bool IsActive {get;set;}
}

If the requirement is that you need to include serialize the Department Property only if the User Is Active, then the easiest way to do it would be to use the Conditional Serialization functionality of Json.Net. All you need to do is include a method that
a) Returns a boolean indicating whether to serialize or not.
b) Should be named with Property named prefixed with ‘ShouldSerialize’

For example, for the Property Department, the method should be named ‘ShouldSerializeDepartment’. Example,

public bool ShouldSerializeDepartment()=> IsActive;

Complete Code

public class User
{
public string Name {get;set;}
public string Department{get;set;}
public bool IsActive {get;set;}
public bool ShouldSerializeDepartment()=> IsActive;
}

Client Code

var user = new User{ Name = "Anu Viswan", IsActive = false} ;
var result = JsonConvert.SerializeObject(user);

Output

{"Name":"Anu Viswan","IsActive":false}

Serializing/Deserializing Dictionaries with Tuple as Key

Sometimes you run into things that might look trivial but it just do not work as expected. One such example is when you attempt to serialize/Deserialize a Dictionary with Tuples as the key. For example

var dictionary = new Dictionary<(string, string), int>
{
[("firstName1", "lastName1")] = 5,
[("firstName2", "lastName2")] = 5
};

var json = JsonConvert.SerializeObject(dictionary);
var result = JsonConvert.DeserializeObject<Dictionary<(string, string), string>>(json);

The above code would trow an JsonSerializationException when deserializing. But the good part is, the exception tells you exactly what needs to be done. You need to use an TypeConverter here.

Let’s define our required TypeConverter

public class TupleConverter<T1, T2> : TypeConverter
{
public override bool CanConvertFrom(ITypeDescriptorContext context, Type sourceType)
{
return sourceType == typeof(string) || base.CanConvertFrom(context, sourceType);
}

public override object ConvertFrom(ITypeDescriptorContext context, CultureInfo culture, object value)
{
var elements = Convert.ToString(value).Trim('(').Trim(')').Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
return (elements.First(), elements.Last());
}
}

And now, you can alter the above code as

TypeDescriptor.AddAttributes(typeof((string, string)), new TypeConverterAttribute(typeof(TupleConverter<string, string>)));
var json = JsonConvert.SerializeObject(dictionary);
var result = JsonConvert.DeserializeObject<Dictionary<(string, string), string>>(json);

With the magic portion of TypeConverter in place, your code would now work fine. Happy Coding.