Benchmarking Span<T> Performance

Span<T> is yet another addition from the C# 7.x timeframe and is particularly useful when developing memory-intensive applications. So what is Span<T> all about?

 

As Microsoft describes it, Span<T> is a new value type that enables the representation of contiguous regions of arbitrary memory, regardless of whether that memory is associated with a managed object, is provided by native code, or is on the stack, with performance characteristics like those of an array.

 

It feels like using pointers, but without entering ‘unsafe’ code. That is definitely interesting. We will look into Span<T> more closely in upcoming posts, but in this post we will compare Span with strings when parsing substrings. We will begin with how substrings work, keeping it simple for the sake of the example. Consider the following code.
public void DummyMethod(string data) { }

string keyString = "234567";
DummyMethod(keyString.Substring(0, 2));
DummyMethod(keyString.Substring(2, 2));
DummyMethod(keyString.Substring(4, 2));

Remember that string is immutable: each time we invoke the Substring method, a new string is allocated to hold the substring. That is not a situation you want to be in if you are working on a memory-intensive application and need to parse a really long string. But what if we could parse the string right from the memory already allocated for keyString? That would be extremely efficient, right?

That’s where Span<T> comes in. It would allow us to point to a contiguous region of memory, and allow us to parse through it without needing a different memory allocation. Let’s rewrite the above code using Span<T>.

public void DummyMethod(ReadOnlySpan<char> data) { }

ReadOnlySpan<char> keyString = "234567".AsSpan();
DummyMethod(keyString.Slice(0, 2));
DummyMethod(keyString.Slice(2, 2));
DummyMethod(keyString.Slice(4, 2));

Notice that string has a nice little extension method to create a ReadOnlySpan<char>. We are also using the Slice method (instead of Substring) to access a specific part of the memory. Let's do a bit of benchmarking to understand the performance implications. We will create a BenchmarkDemo class for demonstration purposes. The complete source code is available on my GitHub.
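Before moving on to the benchmark, here is a minimal sketch of what parsing directly from slices could look like. ParseField is a hypothetical helper made up for this illustration, and the sketch assumes a runtime where int.Parse accepts a ReadOnlySpan<char> (.NET Core 2.1 or later) and where the string extension method is named AsSpan, as it is in released versions.

```csharp
using System;

public static class SpanSliceDemo
{
    // Hypothetical helper (not from the original post): parses the
    // two-character field starting at 'start' straight from the span,
    // without creating an intermediate string.
    public static int ParseField(ReadOnlySpan<char> source, int start)
    {
        ReadOnlySpan<char> field = source.Slice(start, 2);
        return int.Parse(field);
    }

    public static void Main()
    {
        ReadOnlySpan<char> keyString = "234567".AsSpan();

        Console.WriteLine(ParseField(keyString, 0)); // 23
        Console.WriteLine(ParseField(keyString, 2)); // 45
        Console.WriteLine(ParseField(keyString, 4)); // 67
    }
}
```

Each Slice call only produces a new view (a reference plus a length) over the characters of the original string, so no substring is allocated along the way.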

[Benchmark]
public void UsingSubString()
{
    string keyString = "long string";
    for (int i = 0; i < IterationLimit; i++)
        DummyStringMethod(keyString.Substring(0, i));
}
void DummyStringMethod(string _) { }

[Benchmark]
public void UsingSpan()
{
    ReadOnlySpan<char> keyString = "long string".AsSpan();
    for (int i = 0; i < IterationLimit; i++)
        DummySpanMethod(keyString.Slice(0, i));
}

void DummySpanMethod(ReadOnlySpan<char> _) { }

The above code is for demonstration purposes only, hence the dummy string "long string". In the example on GitHub, it is replaced with a lorem ipsum text.

Alright, now let's run the benchmark code and analyse the memory allocation for iteration limits of 10, 100, and 400. We are using BenchmarkDotNet for this example.

[Figure: Benchmark Span]

As you can see, the “UsingSubString” method allocates more and more memory as the number of Substring calls increases. The Span-based method, on the other hand, allocates hardly any, and that does not change as the number of calls increases.
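If you want a rough cross-check of this behaviour without setting up BenchmarkDotNet, the sketch below compares managed allocations for the two approaches using GC.GetAllocatedBytesForCurrentThread. This is not the benchmark from the GitHub sample: the AllocationCheck class, the sample text, and the modulo used to keep the slice length in range are all assumptions made for illustration, and the numbers it prints will vary by runtime.

```csharp
using System;

public static class AllocationCheck
{
    // Illustrative sample text (an assumption, not the original lorem ipsum).
    private const string Text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit";

    // Accumulates results so the loops have an observable effect.
    private static int _sink;

    // Returns the managed bytes allocated on this thread while taking substrings.
    public static long MeasureSubstring(int iterations)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
            _sink += Text.Substring(0, i % Text.Length).Length;
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // Returns the managed bytes allocated on this thread while slicing a span.
    public static long MeasureSpan(int iterations)
    {
        ReadOnlySpan<char> span = Text.AsSpan();
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
            _sink += span.Slice(0, i % Text.Length).Length;
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    public static void Main()
    {
        foreach (int limit in new[] { 10, 100, 400 })
        {
            Console.WriteLine(
                $"{limit} iterations: Substring = {MeasureSubstring(limit)} bytes, " +
                $"Span = {MeasureSpan(limit)} bytes");
        }
    }
}
```

The Substring figure should grow with the iteration count while the Span figure stays flat, which is the same trend BenchmarkDotNet reports, although BenchmarkDotNet remains the better tool for trustworthy measurements.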

That’s it for now, we will investigate more on Span<T> in upcoming posts.

Ref Value Type

In an earlier post, we discussed the readability of reference semantics, mentioning how they can make the code somewhat less readable. However, that doesn't take away from the opportunities the new features open up. Consider the following scenario.

As a developer, you want to write an extension method called Increment
for the type "int", which increments the value by 1.
The extension method should have a void return type.
Ideally, we would like to do the following.

int testValue = 1;
testValue.Increment();
Console.WriteLine(testValue);   // This should print 2

Now, prior to C# 7.2, this was difficult. There was no way to pass a value type by reference to an extension method. Every time we called the extension method, we would be passing a copy of the value rather than a reference; the method would increment the copy and then throw it away. The above code would print “1”.

This is where the new reference semantics come into play. We can now mark the ‘this’ parameter of an extension method on a value type as ref. Let’s write the code using C# 7.2 now.
public static void Increment(this ref int val) => ++val;
Now if we were to run our code, testValue would show “2” as desired. That’s interesting, right? Of course, there are other benefits too. Consider a scenario where you have a gigantic struct that gets passed to different methods (whether a value type is the right choice when the type is expected to be huge is another question, but for the sake of the demonstration we will use a struct here). Each time you call a method, you create a copy of the gigantic struct, copying all of its data. This is going to be troublesome if you are working in a memory-critical environment and the struct is passed around a lot.
Since C# 7.0, you can not only pass a value type by reference, you can also return one by reference. That is a brilliant move by the language designers. Let’s look at the syntax by modifying our extension method so that it returns a reference as well.
public static ref int Increment(this ref int data)
{
    ++data;
    return ref data;
}

int testValue = 1;
ref int pointsToTestValue = ref testValue.Increment();
Console.WriteLine(testValue);
pointsToTestValue++;
Console.WriteLine(testValue);

The Increment method increments the value as it did earlier. In addition, it now returns a reference to that same value. Note the changes in syntax both for defining and for invoking a method that returns a ref.

The output of the above code would be:

2
3

There might be scenarios where you have to pass data by reference but must ensure that it doesn’t change. This is where the new ‘in’ parameter modifier comes in. The “in” keyword accepts the data by reference, but ensures that a developer who works on the method in the future doesn’t change the value accidentally. “in” can be used in interface method declarations as well, so that implementing classes cannot change the value inside the method.

public int Increment(in int data)
{
    ++data; // this is not allowed: an 'in' parameter is read-only
    return data;
}
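To tie this back to the gigantic struct scenario, here is a small sketch of passing a struct by read-only reference. BigValue and Sum are hypothetical names invented for this example, and the struct is kept tiny for readability; imagine it with many more fields.

```csharp
using System;

// Hypothetical "large" struct, used only for illustration.
// Declaring it readonly lets the compiler skip defensive copies
// when its members are accessed through an 'in' parameter.
public readonly struct BigValue
{
    public readonly long A, B, C, D;

    public BigValue(long a, long b, long c, long d)
    {
        A = a; B = b; C = c; D = d;
    }
}

public static class InParameterDemo
{
    // 'in' passes a read-only reference: the struct is not copied,
    // and the method cannot assign to 'value'.
    public static long Sum(in BigValue value)
    {
        // value = default; // compile error: 'value' is read-only here
        return value.A + value.B + value.C + value.D;
    }

    public static void Main()
    {
        var big = new BigValue(1, 2, 3, 4);
        Console.WriteLine(Sum(in big)); // 10
    }
}
```

The in keyword at the call site is optional; writing it out simply makes it visible that the argument is passed by reference.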
Update:
You could also make a returned reference read-only, though here we go back to the readonly keyword rather than “in” (which makes sense, too). The syntax is as follows.

 

public static ref readonly int Increment(ref int data) => ref data;

// h is an int variable declared elsewhere
ref readonly int readonlyReference = ref Increment(ref h);
readonlyReference++; // This is not allowed.

One obvious question about ref returns is: what happens to scope? Does the GC need to be aware of references that outlive the scope of a method? Not exactly. The language team has thought this through and implemented certain safety constraints. A method can only return a ref to something that outlives the call: a reference that came in as a ref parameter, or storage that lives on the heap, such as a class field or an array element. You cannot return a reference to a local variable created within the method.
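A short sketch of these rules follows. The RefReturnRules class and its method names are made up for illustration; the rejected case is left commented out because the compiler refuses to build it.

```csharp
using System;

public static class RefReturnRules
{
    // Allowed: the array lives on the heap, so the element outlives this call.
    public static ref int FirstNegative(int[] values)
    {
        for (int i = 0; i < values.Length; i++)
        {
            if (values[i] < 0)
                return ref values[i];
        }
        throw new InvalidOperationException("No negative value found.");
    }

    // Allowed: 'data' refers to storage owned by the caller.
    public static ref int Same(ref int data) => ref data;

    // Not allowed: 'local' stops existing when the method returns.
    // public static ref int Broken()
    // {
    //     int local = 42;
    //     return ref local; // compile error
    // }

    public static void Main()
    {
        int[] numbers = { 3, -7, 5 };

        ref int slot = ref FirstNegative(numbers);
        slot = 0;                      // writes through to the array element
        Console.WriteLine(numbers[1]); // 0

        int value = 1;
        Same(ref value)++;             // increments 'value' through the returned ref
        Console.WriteLine(value);      // 2
    }
}
```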

We will cover some more features of C# 7.x in coming posts, especially the ones that have to do with reference semantics.