.Net 6: Benchmark performance of JsonSerializer.DeserializeAsyncEnumerable

This should have been part of my earlier post on System.Text.Json support for IAsyncEnumerable, but it slipped my mind. So here we are.

To understand the significance of this feature in .Net 6, one needs to understand the circumstances under which it might be useful. The first of those would be, of course, that we could consume the data even while the rest of the JSON is yet to be deserialized.

The significance is further amplified when you are only interested in an earlier part of the data. You no longer need to deserialize the entire JSON (considering it is a huge one), hold it all in your buffers, and then use only a fraction of it. This could provide an immense performance boost to the application.

Let us compare and benchmark the performance of various methods exposed by System.Text.Json for deserialization and attempt to understand it better.

There are 3 methods which we will be placing under the hammer:

  • JsonSerializer.Deserialize<T>
  • JsonSerializer.DeserializeAsync<T>
  • JsonSerializer.DeserializeAsyncEnumerable<T>

Let us write some code to benchmark them.
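
The post does not show the benchmark scaffolding, so here is a minimal sketch of the setup the methods below assume. The class name, item count, and DATA_TO_CONSUME threshold are illustrative assumptions, not the values behind the published numbers.

using System;
using System.Linq;
using System.Text.Json;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class DeserializationBenchmarks
{
    // Illustrative assumptions; the actual item count used for the
    // published numbers is not shown in the post.
    private const int TOTAL_ITEMS = 100_000;
    private const int DATA_TO_CONSUME = TOTAL_ITEMS / 5; // the "first 20%" scenario

    private string serializedString;

    [GlobalSetup]
    public void Setup()
    {
        // Generate a large JSON array of Data items to deserialize.
        var random = new Random();
        var data = Enumerable.Range(0, TOTAL_ITEMS)
                             .Select(i => new Data { Id = i, Value = random.Next().ToString() });
        serializedString = JsonSerializer.Serialize(data);
    }
}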

[Benchmark]
public void TestDeserialize()
{
    foreach (var item in DeserializeWithoutStreaming().TakeWhile(x => x.Id < DATA_TO_CONSUME))
    {
        // DoSomeWork
    }
}

public IEnumerable<Data> DeserializeWithoutStreaming()
{
    var deserializedData = JsonSerializer.Deserialize<IEnumerable<Data>>(serializedString);
    return deserializedData;
}

[Benchmark]
public async Task TestDeserializeAsync()
{
    foreach (var item in (await DeserializeAsync()).TakeWhile(x => x.Id < DATA_TO_CONSUME))
    {
        // DoSomeWork
    }
}

public async Task<IEnumerable<Data>> DeserializeAsync()
{
    using var memStream = new MemoryStream(Encoding.UTF8.GetBytes(serializedString));
    var deserializedData = await JsonSerializer.DeserializeAsync<IEnumerable<Data>>(memStream);
    return deserializedData;
}

[Benchmark]
public async Task TestDeserializeAsyncEnumerable()
{
    // TakeWhile over IAsyncEnumerable comes from the System.Linq.Async package.
    await foreach (var item in DeserializeWithStreaming().TakeWhile(x => x.Id < DATA_TO_CONSUME))
    {
        // DoSomeWork
    }
}

public async IAsyncEnumerable<Data> DeserializeWithStreaming()
{
    using var memStream = new MemoryStream(Encoding.UTF8.GetBytes(serializedString));
    await foreach (var item in JsonSerializer.DeserializeAsyncEnumerable<Data>(memStream))
    {
        yield return item;
    }
}
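
To reproduce the numbers below, the benchmarks can be executed with BenchmarkDotNet's runner; the class name here is the one assumed in the setup sketch above.

using BenchmarkDotNet.Running;

public class Program
{
    public static void Main(string[] args)
        => BenchmarkRunner.Run<DeserializationBenchmarks>();
}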

Scenario 1: Consuming only the first 20% of the JSON Data

The first scenario we need to consider is when only a fairly small amount of the JSON data is consumed, say the first 20%. While Deserialize<T> and DeserializeAsync would need to deserialize the entire JSON even if the client consumes only the first 20% of the data, DeserializeAsyncEnumerable deserializes on demand. This is evident in the benchmark results as well, where DeserializeAsyncEnumerable performs about 3 times better.

| Method                         |     Mean |     Error |    StdDev |
|--------------------------------|---------:|----------:|----------:|
| TestDeserialize                | 4.810 ms | 0.0952 ms | 0.2573 ms |
| TestDeserializeAsync           | 5.166 ms | 0.1008 ms | 0.1161 ms |
| TestDeserializeAsyncEnumerable | 1.531 ms | 0.0305 ms | 0.0825 ms |

Scenario 2: Consuming about 80% of the JSON Data

In the second scenario, we will consider the case when the client consumes 80% of the data. As one would expect, a larger part of the JSON data now has to be deserialized, and hence the performance margin decreases.

| Method                         |     Mean |     Error |    StdDev |
|--------------------------------|---------:|----------:|----------:|
| TestDeserialize                | 4.960 ms | 0.0974 ms | 0.1877 ms |
| TestDeserializeAsync           | 5.238 ms | 0.0997 ms | 0.1297 ms |
| TestDeserializeAsyncEnumerable | 4.851 ms | 0.0859 ms | 0.0804 ms |

This is expected too; as more of the JSON is deserialized, the performance difference becomes hardly significant, if not non-existent. But there is still an advantage to using DeserializeAsyncEnumerable: you would not have to wait for the entire JSON to be deserialized, as the on-demand streaming approach allows you to consume the data as soon as parts of the JSON are deserialized.

I feel this is a huge improvement, especially when the concerned JSON is significantly large. Like many others, I am excited to see the improvements in .Net in recent years and am looking forward to the release of .Net 6.

.Net 6: System.Text.Json support for IAsyncEnumerable

As Preview 4 of .Net 6 becomes available, one of the things that excites me is the System.Text.Json support for IAsyncEnumerable. IAsyncEnumerable, introduced in .Net Core 3.0 and C# 8, enables us to iterate over asynchronous enumerables. The newer version extends this support to System.Text.Json.

Consider the following data.

[{"Id":0,"Value":"915777539"},{"Id":1,"Value":"1332243482"},{"Id":2,"Value":"306207588"},
 {"Id":3,"Value":"1413388423"},{"Id":4,"Value":"2145941621"},{"Id":5,"Value":"1041779876"},
 {"Id":6,"Value":"1121436961"},{"Id":7,"Value":"520045044"},{"Id":8,"Value":"1357859915"},
 {"Id":9,"Value":"1340510964"},{"Id":10,"Value":"1183306988"},{"Id":11,"Value":"502467538"},
 {"Id":12,"Value":"31513434"},{"Id":13,"Value":"999086707"},{"Id":14,"Value":"961728759"},
 {"Id":15,"Value":"1756662810"},{"Id":16,"Value":"1018107007"},{"Id":17,"Value":"433502262"},
 {"Id":18,"Value":"1784715926"},{"Id":19,"Value":"1418088822"},{"Id":20,"Value":"645106286"},
 {"Id":21,"Value":"1720929044"},{"Id":22,"Value":"1102142546"},{"Id":23,"Value":"2138442183"},
 {"Id":24,"Value":"208176799"},{"Id":25,"Value":"1700100438"},{"Id":26,"Value":"769308703"},
 "Id":27,"Value":"1558581057"},{"Id":28,"Value":"352810944"},{"Id":29,"Value":"299925316"}]

We could now write a streaming deserialization method using JsonSerializer.DeserializeAsyncEnumerable. For example:

public async IAsyncEnumerable<T> DeserializeStreaming<T>(string data)
{
    using var memStream = new MemoryStream(Encoding.UTF8.GetBytes(data));

    await foreach (var item in JsonSerializer.DeserializeAsyncEnumerable<T>(memStream))
    {
        Console.WriteLine("Inside Deserializing..");
        yield return item;
        await Task.Delay(1000); // delay to make the on-demand behaviour visible
    }
}

// Data
public class Data
{
    public int Id { get; set; }
    public string Value { get; set; }
}

The async stream of deserialized data provides an opportunity to deserialize on demand, which could be a great addition, particularly when deserializing large data.

var instance = new StreamingSerializationTest();
await foreach (var item in instance.DeserializeStreaming<Data>(dataString))
{
    Console.WriteLine($"Data Item: {nameof(Data.Id)}={item.Id} , {nameof(Data.Value)}={item.Value}");

    if(item.Id > 5)
    {
        break;
    }
}

As you can observe in the output below, this deserializes only on demand.

Streaming Deserialize Demo
Inside Deserializing..
Data Item: Id=0 , Value=915777539
Inside Deserializing..
Data Item: Id=1 , Value=1332243482
Inside Deserializing..
Data Item: Id=2 , Value=306207588
Inside Deserializing..
Data Item: Id=3 , Value=1413388423
Inside Deserializing..
Data Item: Id=4 , Value=2145941621
Inside Deserializing..
Data Item: Id=5 , Value=1041779876
Inside Deserializing..
Data Item: Id=6 , Value=1121436961

At the moment, the deserialization is severely limited to root-level JSON arrays, but I guess that would improve over time as .Net 6 reaches release. Let us take a look at serialization as well now. Turns out that is easy too.

private async IAsyncEnumerable<Data> Generate(int maxItems)
{
    var random = new Random();
    for (int i = 0; i < maxItems; i++)
    {
        await Task.Yield(); // without an await the compiler warns (CS1998); this keeps the iterator genuinely asynchronous
        yield return new Data
        {
            Id = i,
            Value = random.Next().ToString()
        };
    }
}

public async Task SerializeStream()
{
    using var stream = Console.OpenStandardOutput();
    var data = new { Data = Generate(30) };
    await JsonSerializer.SerializeAsync(stream, data);
}
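
For completeness, a hypothetical invocation, assuming Generate and SerializeStream live on the same StreamingSerializationTest class used earlier; the serializer writes each array element as the generator produces it.

var instance = new StreamingSerializationTest();
await instance.SerializeStream();
// Writes to stdout: {"Data":[{"Id":0,"Value":"..."},{"Id":1,"Value":"..."}, ... ]}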

At this point, I am not quite as excited about streaming serialization as I am about streaming deserialization, because of the fewer use cases it might support. But I am not denying there could be use cases, and over time, I might become equally excited about it as well.

The complete sample of the code in this demo can be found in my Github.

We will continue exploring the .Net 6 features in the upcoming posts as well. Until then, enjoy coding…

Additional Release Burndown Charts

To say that Burndown Charts are significant for any Agile team would be to state the obvious. Anyone who has worked on an Agile project could tell you how useful these charts are in understanding the overall progress of the project.

A typical burndown chart during development of the project might look like the following.

Traditional Release Burndown Chart

The blue line indicates the ideal or desired completion of work, while the red line indicates the actual completed work. The vertical axis indicates the total story points, while the horizontal axis indicates time (iterations).

As observed in the chart above, there are a few curious things here. By the 3rd iteration, the team manages to complete about 20 User Points and seems to be pretty much on track; however, in the next iteration, the pending work seems to have suddenly increased. This seems to have happened again in the 6th iteration. So did the team fail to do any work? Or were new User Stories added? It could also be the case that the team re-estimated some of the User Stories.

Similarly, what is not obvious in this chart is that during the 6th iteration, the team managed to complete 10 User Points, but at the same time, the Product Owner decided to remove User Stories worth 10 points from the backlog.

Burndown Bar Chart

While the traditional burndown chart provides a lot of information, there is one place it is found lacking. More often than not, the scope of the project could expand (or shrink) during the course of development. The Product Owner could add (or remove) User Stories to the backlog. This could happen as an impact of market changes or as the understanding of the features expands. Furthermore, the team might gain a better understanding of some of the User Stories and re-estimate them. Both these activities could significantly affect the remaining work.

The traditional burndown chart fails to indicate this expansion of work clearly. This is where a Burndown Bar Chart comes in handy. At this point, I should point out that this is not exactly a replacement for the traditional burndown chart, but an additional chart which could provide more clarity on the progress. In this case, the bar chart provides more clarity about velocity and scope changes.

A typical Burndown Bar chart might look like the following.

Release Burndown Bar Charts

Each bar represents the amount of work that is left in the release prior to the start of an iteration. During the iteration, the team would work on various User Stories, and most of them would ideally be completed. The completed work is indicated by lowering the top of the bar for the next iteration. The difference between the tops of two adjacent bars indicates the velocity of the iteration between them.

When the Product Owner adds or removes User Stories in the backlog, the scope change is reflected by lowering or raising the bottom of the bar. Similarly, if the team re-estimates the work, the top of the bar is lowered or raised. The sketch below illustrates this arithmetic.
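
A minimal sketch of that bookkeeping, with hypothetical types invented for illustration: the top of the bar drops with completed work and moves with re-estimates, while the bottom drops (or rises) as scope is added (or removed). The bar height, Top minus Bottom, is the remaining work.

using System.Collections.Generic;

public record IterationChange(int CompletedPoints, int ReEstimateDelta, int ScopeAddedPoints);

public static class BurndownBars
{
    // Returns one (Top, Bottom) pair per bar; Top - Bottom is the remaining work.
    public static IEnumerable<(int Top, int Bottom)> Build(
        int initialBacklogPoints, IEnumerable<IterationChange> iterations)
    {
        int top = initialBacklogPoints;
        int bottom = 0;

        yield return (top, bottom); // the bar before the first iteration

        foreach (var it in iterations)
        {
            top -= it.CompletedPoints;     // completed work lowers the top
            top += it.ReEstimateDelta;     // re-estimates raise or lower the top
            bottom -= it.ScopeAddedPoints; // added scope lowers the bottom; removed scope raises it
            yield return (top, bottom);
        }
    }
}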

In the graph above, the Product Owner adds User Stories worth 30 User Points in the 3rd iteration. In the 4th iteration, another 40 User Points worth of User Stories are added to the backlog. This is indicated by lowering the bottom of the bar.

During the 6th iteration, the Product Owner decides to remove User Stories worth 10 User Points. This is indicated by raising the bottom of the bar.

This is a useful way to indicate scope changes, and it supports the traditional burndown chart by adding more clarity to it. It ensures the stakeholders understand why the burndown chart behaves as we discussed earlier.

Parking Lot Chart

The Parking Lot Chart is yet another representation of the remaining work. It provides a bird's-eye view of the work remaining.

Parking lot chart

As seen in the image above, the Parking Lot Chart contains a group of rectangles. Each rectangle is annotated with:

  • Theme
  • Number of User Stories in the theme
  • Total User Points in the Theme
  • Percentage of completion of stories in the Theme

The intention behind the graph is to compress a lot of information into a smaller space, thus providing a high-level view of the progress of the project. The representation is based on themes, which are groupings of similar User Stories. A sketch of this per-theme roll-up follows.
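
A minimal sketch (with hypothetical types) of the per-theme roll-up a Parking Lot Chart displays: story count, total points, and percentage complete per theme.

using System.Collections.Generic;
using System.Linq;

public record UserStory(string Theme, int Points, bool Done);

public static class ParkingLot
{
    // Groups the backlog by theme and computes the figures each rectangle shows.
    public static IEnumerable<(string Theme, int Stories, int TotalPoints, double PercentComplete)>
        Summarize(IEnumerable<UserStory> backlog) =>
        backlog.GroupBy(s => s.Theme)
               .Select(g => (
                   Theme: g.Key,
                   Stories: g.Count(),
                   TotalPoints: g.Sum(s => s.Points),
                   PercentComplete: 100.0 * g.Where(s => s.Done).Sum(s => s.Points)
                                          / g.Sum(s => s.Points)));
}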

The boxes could be colored to indicate whether the work remaining in the theme is on schedule or falling behind schedule and needs attention.

Summary

As seen, there are various charts which could be added in addition to the traditional burndown chart to help the stakeholders understand the progress of the project. One could also innovate and merge some of the charts together. For example, you could modify the traditional burndown chart to include the burndown bar chart representation so that you do not need to maintain two separate graphs (rather, a single merged version).

Of course, there could be other useful graphs as well which could indicate the progress of the project in a useful manner, and we will continue to explore them in upcoming posts.

Prioritizing Features using Kano Model

The Product Backlog provides a collection of features that the Product should ideally implement. But not every feature has the same priority. Some features are more important than others, and of course, the Product Owner doesn't go around picking random features while prioritizing them.

There are various models for this, and in this post, we will explore the Kano Model.

The Kano Model

As per Noriaki Kano's model, the features could be broadly categorized into 3 categories.

  • Threshold/Must-Have Features: These represent the minimum set of features that should be present to meet User expectations. Improving the must-have features beyond a limit would have little impact on customer satisfaction. For example, for accommodation, a minimum requirement of the User would be a clean room with basic amenities.
  • Linear Features: Features for which customer satisfaction increases as the feature improves are known as Linear Features. These include the size of the room or bed, freebies in the room, etc.
  • Delighters: Delighters, on the other hand, are features which add to the premium quality of the product, often adding greatly to customer satisfaction. These could include private pools. These are features which the Customer might not quite miss if absent, but would be delighted by if present.

Now that we have understood how we would like to group the features, the process of actually grouping them begins. As per Kano, this could be done by asking two questions per feature to the user group.

  • How would you feel if the feature were present, also known as the Functional question
  • How would you feel if the feature were not present, also known as the Dysfunctional question

The answers to these questions are typically collected as:

  1. I like it that way
  2. I expected it that way
  3. I am neutral
  4. I can live with it that way
  5. I dislike it that way

The answers could be mapped to the 3 feature groups using the standard Kano evaluation table. A simplified sketch of that mapping is shown below.
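
A simplified sketch of that mapping, with hypothetical enum names; the full evaluation table also produces Indifferent, Reverse, and Questionable outcomes, which are collapsed to null here for brevity.

public enum KanoAnswer { Like, Expect, Neutral, LiveWith, Dislike }
public enum KanoCategory { MustHave, Linear, Delighter }

public static class KanoClassifier
{
    public static KanoCategory? Classify(KanoAnswer functional, KanoAnswer dysfunctional)
    {
        // Liking the feature when present while tolerating its absence marks a Delighter.
        if (functional == KanoAnswer.Like &&
            dysfunctional is KanoAnswer.Expect or KanoAnswer.Neutral or KanoAnswer.LiveWith)
            return KanoCategory.Delighter;

        // Liking the feature when present and disliking its absence marks a Linear feature.
        if (functional == KanoAnswer.Like && dysfunctional == KanoAnswer.Dislike)
            return KanoCategory.Linear;

        // Merely tolerating the feature when present but disliking its absence marks a Must-Have.
        if (functional is KanoAnswer.Expect or KanoAnswer.Neutral or KanoAnswer.LiveWith &&
            dysfunctional == KanoAnswer.Dislike)
            return KanoCategory.MustHave;

        return null; // Indifferent / Reverse / Questionable in the full table
    }
}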

The answers from the user groups (typically 20-30 users) are aggregated, and their distribution is observed to determine the priority group.

The responses with the highest counts are considered; the category that dominates the distribution determines the feature's group.

This was one approach to User Story prioritization. In the next post, we will explore the Relative Weighting approach.