Avoiding Unnecessary byte[] Allocations in HttpContent

Over the past year and a half I’ve been actively contributing to .NET Core, mostly chipping away at low-hanging performance optimizations and other minor improvements. I thought it’d be interesting to highlight some of that work.

Today I’ll be going over an improvement I made to HttpContent.ReadAsStringAsync() to avoid unnecessary byte[] allocations when detecting the content’s encoding.

When reading the content as a string, HttpContent attempts to use the encoding specified in the Content-Type header’s charset parameter (if present), otherwise it tries to detect the encoding by looking for a byte order mark (BOM) at the start of the data, falling back to UTF8 if no BOM is detected.

The way the encoding detection was previously implemented could result in up to 4 unnecessary byte[] allocations every time the response content is read as a string.

Here’s essentially how it used to be implemented:

private static readonly Encoding[] s_encodingsWithBom =
{
    Encoding.UTF8, // EF BB BF
    // UTF32 must be before Unicode because its BOM is similar but longer.
    Encoding.UTF32, // FF FE 00 00
    Encoding.Unicode, // FF FE
    Encoding.BigEndianUnicode, // FE FF
};

private static bool TryDetectEncoding(byte[] data, int dataLength, out Encoding encoding, out int preambleLength)
{
    byte[] preamble;
    foreach (Encoding testEncoding in s_encodingsWithBom)
    {
        preamble = testEncoding.GetPreamble();
        if (ByteArrayHasPrefix(data, dataLength, preamble))
        {
            encoding = testEncoding;
            preambleLength = preamble.Length;
            return true;
        }
    }
    
    encoding = null;
    preambleLength = 0;
    return false;
}

private static bool ByteArrayHasPrefix(byte[] byteArray, int dataLength, byte[] prefix)
{
    if (prefix == null || byteArray == null || prefix.Length > dataLength || prefix.Length == 0)
        return false;
    for (int i = 0; i < prefix.Length; i++)
    {
        if (prefix[i] != byteArray[i])
            return false;
    }
    return true;
}

TryDetectEncoding loops over a static array of Encoding instances, calling GetPreamble() on each to get that encoding’s BOM, and then checks whether the data starts with that BOM. The main problem is GetPreamble itself: it allocates and returns a new byte[] on every call, so we potentially allocate up to four byte[]s each time TryDetectEncoding is called.
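To make the allocation concrete, here’s a small standalone snippet (my own sketch, not the HttpContent code) showing that every GetPreamble() call hands back a brand new array:

```csharp
using System;
using System.Text;

class GetPreambleAllocationDemo
{
    static void Main()
    {
        // GetPreamble() allocates a fresh byte[] on every call,
        // so two calls never return the same instance.
        byte[] first = Encoding.UTF8.GetPreamble();
        byte[] second = Encoding.UTF8.GetPreamble();

        Console.WriteLine(ReferenceEquals(first, second)); // False
        Console.WriteLine(BitConverter.ToString(first));   // EF-BB-BF
    }
}
```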

One simple way to avoid these allocations would be to pre-allocate and cache each of the known preamble byte[]s.

Something like:

private static readonly KeyValuePair<byte[], Encoding>[] s_preambleEncodingPairs =
{
    new KeyValuePair<byte[], Encoding>(Encoding.UTF8.GetPreamble(), Encoding.UTF8),
    // UTF32 must be before Unicode because its BOM is similar but longer.
    new KeyValuePair<byte[], Encoding>(Encoding.UTF32.GetPreamble(), Encoding.UTF32),
    new KeyValuePair<byte[], Encoding>(Encoding.Unicode.GetPreamble(), Encoding.Unicode),
    new KeyValuePair<byte[], Encoding>(Encoding.BigEndianUnicode.GetPreamble(), Encoding.BigEndianUnicode),
};

private static bool TryDetectEncoding(byte[] data, int dataLength, out Encoding encoding, out int preambleLength)
{
    foreach (KeyValuePair<byte[], Encoding> pair in s_preambleEncodingPairs)
    {
        byte[] preamble = pair.Key;
        if (ByteArrayHasPrefix(data, dataLength, preamble))
        {
            encoding = pair.Value;
            preambleLength = preamble.Length;
            return true;
        }
    }

    encoding = null;
    preambleLength = 0;
    return false;
}

This is around 2.5x faster and avoids the repeated byte[] allocations. But it’s still looping through each preamble/encoding pair to detect the encoding. Can we do better?

Here’s what I ended up with:

private const int UTF8PreambleLength = 3;
private const int UTF8PreambleFirst2Bytes = 0xEFBB;
private const byte UTF8PreambleByte2 = 0xBF;

private const int UTF32PreambleLength = 4;
private const int UTF32OrUnicodePreambleFirst2Bytes = 0xFFFE;
private const byte UTF32PreambleByte2 = 0x00;
private const byte UTF32PreambleByte3 = 0x00;

private const int UnicodePreambleLength = 2;

private const int BigEndianUnicodePreambleLength = 2;
private const int BigEndianUnicodePreambleFirst2Bytes = 0xFEFF;

private static bool TryDetectEncoding(byte[] data, int dataLength, out Encoding encoding, out int preambleLength)
{
    if (dataLength >= 2)
    {
        int first2Bytes = data[0] << 8 | data[1];

        switch (first2Bytes)
        {
            case UTF8PreambleFirst2Bytes:
                if (dataLength >= UTF8PreambleLength && data[2] == UTF8PreambleByte2)
                {
                    encoding = Encoding.UTF8;
                    preambleLength = UTF8PreambleLength;
                    return true;
                }
                break;

            case UTF32OrUnicodePreambleFirst2Bytes:
                if (dataLength >= UTF32PreambleLength && data[2] == UTF32PreambleByte2 && data[3] == UTF32PreambleByte3)
                {
                    encoding = Encoding.UTF32;
                    preambleLength = UTF32PreambleLength;
                }
                else
                {
                    encoding = Encoding.Unicode;
                    preambleLength = UnicodePreambleLength;
                }
                return true;

            case BigEndianUnicodePreambleFirst2Bytes:
                encoding = Encoding.BigEndianUnicode;
                preambleLength = BigEndianUnicodePreambleLength;
                return true;
        }
    }

    encoding = null;
    preambleLength = 0;
    return false;
}

If the data is at least 2 bytes in length, the first two bytes are combined into a single int (data[0] << 8 | data[1]) and we switch on that value. From there, any remaining preamble bytes are checked, and if there’s a match, the detected encoding and preambleLength out parameters are set and true is returned.
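If the bit-packing isn’t obvious, this tiny standalone sketch shows how the first two bytes of a UTF-8 BOM become the switch-case constant:

```csharp
using System;

class First2BytesDemo
{
    static void Main()
    {
        // The UTF-8 BOM starts with EF BB; shifting the first byte left
        // by 8 and OR-ing in the second yields the constant 0xEFBB.
        byte[] data = { 0xEF, 0xBB, 0xBF };
        int first2Bytes = data[0] << 8 | data[1];

        Console.WriteLine(first2Bytes == 0xEFBB);     // True
        Console.WriteLine(first2Bytes.ToString("X")); // EFBB
    }
}
```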

This approach is around 10.5x faster than the original implementation and avoids all unnecessary allocations.

The new implementation isn’t quite as straightforward as the original approach, but in this case the tradeoff is worth it for the reduced memory allocations and improved speed in the HTTP client stack.

What is the difference between Array.Empty<T>() and Enumerable.Empty<TResult>()?

When I wrote about Array.Empty<T>(), one of the common questions readers asked was:

What is the difference between Array.Empty<T>() and Enumerable.Empty<TResult>()?

The primary difference is the return type: Array.Empty<T>() returns T[] (an array), whereas Enumerable.Empty<TResult>() returns IEnumerable<TResult> (an enumerable).

You’ll be able to do more with the result of Array.Empty<T>() than with Enumerable.Empty<TResult>(), as arrays implement the full gamut of collection interfaces (e.g. IList<T>, IReadOnlyList<T>, ICollection<T>, IReadOnlyCollection<T>, IEnumerable<T>, etc.), in addition to all the functionality Array itself provides.
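To illustrate the practical difference, here’s a quick standalone sketch (my own example, just for demonstration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class EmptyComparisonDemo
{
    static void Main()
    {
        byte[] array = Array.Empty<byte>();
        IEnumerable<byte> enumerable = Enumerable.Empty<byte>();

        // The array result satisfies array/collection APIs directly.
        Console.WriteLine(array.Length);       // 0
        IReadOnlyList<byte> list = array;      // arrays implement IReadOnlyList<T>
        Console.WriteLine(list.Count);         // 0

        // The enumerable result only promises IEnumerable<T>,
        // so counting it goes through LINQ.
        Console.WriteLine(enumerable.Count()); // 0
    }
}
```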

There’s also a difference in layering:

  • Array.Empty<T>() is at the lowest layer on the System.Array type in System.Runtime.dll in .NET Core (mscorlib.dll in .NET Framework).

  • Enumerable.Empty<TResult>() is in a higher layer on the System.Linq.Enumerable type in System.Linq.dll in .NET Core (System.Core.dll in .NET Framework).

In .NET Framework, Array.Empty<T>() and Enumerable.Empty<TResult>() each have their own separate caches, but I recently changed Enumerable.Empty<TResult>() in .NET Core to simply return the result of calling Array.Empty<T>(), so both are now backed by the same cache in .NET Core. I’m not sure when/if this change will flow into a future version of .NET Framework.

Going forward, I’ll be using Array.Empty<T>() over Enumerable.Empty<TResult>(), as Array.Empty<T>()’s return type provides more functionality (an array vs. an IEnumerable<T>) and it requires fewer dependencies (it lives in a lower layer).

Empty Array Enumerators

Speaking of Array.Empty<T>(), there’s a related change in .NET Framework 4.6 and .NET Core: empty arrays now return a single cached instance of IEnumerator<T> when GetEnumerator() is called.

Previously, calls to GetEnumerator() on an empty array would always return a new enumerator instance. That’s no longer the case!
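Here’s a small standalone snippet to observe this (my own sketch; whether ReferenceEquals returns true depends on the runtime you’re running on, so I print the result rather than assume it):

```csharp
using System;
using System.Collections.Generic;

class CachedEnumeratorDemo
{
    static void Main()
    {
        int[] empty = Array.Empty<int>();

        IEnumerator<int> e1 = ((IEnumerable<int>)empty).GetEnumerator();
        IEnumerator<int> e2 = ((IEnumerable<int>)empty).GetEnumerator();

        // On runtimes with this optimization, both calls return
        // the same cached enumerator instance.
        Console.WriteLine(ReferenceEquals(e1, e2));

        // Either way, the enumerator is empty.
        Console.WriteLine(e1.MoveNext()); // False
    }
}
```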

Array.Empty<T>()

I occasionally find myself in situations where it is desirable to use or return an empty array or collection. If the empty case is common, I’ll usually cache and reuse the empty array as an optimization to avoid unnecessary future allocations, particularly when writing high performance library code.

For example:

private static byte[] s_empty;

public byte[] GenerateBytes()
{
    // Fast path for the empty case
    if (IsEmpty)
    {
        return s_empty ?? (s_empty = new byte[0]);
    }

    // Actual operation here...
}

This can lead to many private static s_empty fields sprinkled throughout a code base, with each field holding a reference to its own private empty array instance. Of course, a central cache can be used to share empty array instances within a code base, but such coupling isn’t always desirable and it doesn’t enable sharing across all application code and libraries (including third party and .NET Framework libraries) used in an application.

.NET Framework 4.6 and .NET Core address this with the new Array.Empty<T>() method, which, as the name implies, returns an empty array. Array.Empty<T>() provides a central cache of empty array instances, enabling sharing of the empty instances across all libraries and application code that use it.

It’s currently implemented as follows:

public abstract class Array : ...
{
    ...

    public static T[] Empty<T>()
    {
        return EmptyArray<T>.Value;
    }

    ...
}

internal static class EmptyArray<T>
{
    public static readonly T[] Value = new T[0];
}

The first time Array.Empty<T>() is called for a given T, a new empty T[] array is created, cached, and returned; subsequent calls return the cached instance.
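A quick standalone snippet demonstrating the per-T caching (my own sketch; the same-instance behavior reflects the current implementation rather than a documented guarantee):

```csharp
using System;

class ArrayEmptyCacheDemo
{
    static void Main()
    {
        // Repeated calls for the same T return the same cached instance
        // (true with the current implementation).
        Console.WriteLine(ReferenceEquals(Array.Empty<byte>(), Array.Empty<byte>())); // True

        // Each T gets its own empty array; byte[] and object[] are
        // necessarily distinct objects.
        Console.WriteLine(ReferenceEquals(Array.Empty<byte>(), Array.Empty<object>())); // False

        Console.WriteLine(Array.Empty<byte>().Length); // 0
    }
}
```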

Note: While this is how Array.Empty<T>() is currently implemented, the documentation only states that it “returns an empty array” – there’s no mention of caching or returning the same cached instances. Presumably this is intentional to allow for potential future fine-tuning of the cache strategy (e.g. I could imagine future changes that allowed for unused instances to be garbage collected after some expiration threshold).

Regardless, I’d still recommend Array.Empty<T>() as the preferred method of retrieving empty arrays going forward.

With this, the earlier example can now be updated to simply return Array.Empty<byte>() directly:

public byte[] GenerateBytes()
{
    // Fast path for the empty case
    if (IsEmpty)
    {
        return Array.Empty<byte>();
    }

    // Actual operation here...
}

The fast path returns the same empty byte[] array instance as any other caller of Array.Empty<byte>().


Most uses of new T[0] throughout the .NET Framework libraries have been replaced with Array.Empty<T>() (uses of new T[0] that were missed for .NET Framework 4.6 have since been addressed in .NET Core, with a final use case still under discussion).

One interesting place where Array.Empty<T>() is now used is with params arrays.

Consider the following Log method:

public void Log(string message, params object[] args)
{
}

params allows Log to be called without specifying any args:

Log("Hello World!"); // no args passed

The C# compiler used in Visual Studio 2013 and earlier will compile this as:

Log("Hello World!", new object[0]);

The Roslyn C# compiler used in Visual Studio 2015 and later, on the other hand, will use Array.Empty<T>(), if it’s available:

Log("Hello World!", Array.Empty<object>());

Related: empty array enumerators are now cached.
