
C# - Memory leaking in .NET HttpClient, JsonSerializer or misused Stream?

2024-03-11 01:30:08

I have a basic background class in an otherwise empty ASP.NET Core 8 Minimal API project.

App startup is just:

builder.Services.AddHttpClient();
builder.Services.AddHostedService<SteamAppListDumpService>();

The background class is for saving snapshots of a Steam API endpoint, all basic stuff:

public class SteamAppListDumpService : BackgroundService
{
    static TimeSpan RepeatDelay = TimeSpan.FromMinutes(30);
    private readonly IHttpClientFactory _httpClientFactory;

    private string GetSteamKey() => "...";

    private string GetAppListUrl(int? lastAppId = null)
    {
        return $"https://api.steampowered.com/IStoreService/GetAppList/v1/?key={GetSteamKey()}" +
            (lastAppId.HasValue ? $"&last_appid={lastAppId}" : "");
    }

    public SteamAppListDumpService(IHttpClientFactory httpClientFactory)
    {
        _httpClientFactory = httpClientFactory;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            await DumpAppList();
            await Task.Delay(RepeatDelay, stoppingToken);
        }
    }

    public record SteamApiGetAppListApp(int appid, string name, int last_modified, int price_change_number);
    public record SteamApiGetAppListResponse(List<SteamApiGetAppListApp> apps, bool have_more_results, int last_appid);
    public record SteamApiGetAppListOuterResponse(SteamApiGetAppListResponse response);

    protected async Task DumpAppList()
    {
        try
        {
            var httpClient = _httpClientFactory.CreateClient();
            var appList = new List<SteamApiGetAppListApp>();
            int? lastAppId = null;
            do
            {
                using var response = await httpClient.GetAsync(GetAppListUrl(lastAppId));
                if (!response.IsSuccessStatusCode) throw new Exception($"API Returned Invalid Status Code: {response.StatusCode}");

                var responseString = await response.Content.ReadAsStringAsync();
                var responseObject = JsonSerializer.Deserialize<SteamApiGetAppListOuterResponse>(responseString)!.response;
                appList.AddRange(responseObject.apps);
                lastAppId = responseObject.have_more_results ? responseObject.last_appid : null;

            } while (lastAppId != null);

            var contentBytes = JsonSerializer.SerializeToUtf8Bytes(appList);
            using var output = File.OpenWrite(Path.Combine(Config.DumpDataPath, DateTime.UtcNow.ToString("yyyy-MM-dd__HH-mm-ss") + ".json.gz"));
            using var gz = new GZipStream(output, CompressionMode.Compress);
            gz.Write(contentBytes, 0, contentBytes.Length);
        }
        catch (Exception ex)
        {
            Trace.TraceError("skipped...");
        }
    }
}

The API returns approximately 16 MB of data in total, which is then compressed and saved to a 4 MB file every 30 minutes; nothing else happens. In between runs, once the garbage collector runs I would expect memory consumption to drop to almost nothing, but instead it increases over time: for example, after running for 2 hours on my PC it is consuming 700 MB of memory, and after running for 24 hours on my server it is now consuming 2.5 GB.

As far as I can tell all the streams are disposed, and the HttpClient is created using the recommended IHttpClientFactory, so does anyone know why this basic functionality consumes so much memory even after garbage collection? I've tried inspecting it with the Visual Studio managed memory dump tooling but can't find anything useful. Does this point to a memory leak in one of the framework classes (i.e. HttpClient / SerializeToUtf8Bytes), or am I missing something?

The responseString and contentBytes are usually around 2 MB each.

Solution:

Any time you allocate a contiguous block of memory >= 85,000 bytes in size, it goes onto the large object heap. Unlike the regular heap, it isn't compacted unless you do so manually[1], so it can grow due to fragmentation, giving the appearance of a memory leak. See Why Large Object Heap and why do we care?.
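You can see this threshold in action with a quick sketch: objects on the large object heap are reported as generation 2 even when freshly allocated (the 85,000-byte threshold and generation numbers are CoreCLR implementation details, so treat the exact values as indicative):

```csharp
using System;

// Arrays of >= 85,000 bytes are allocated directly on the large object heap,
// which the runtime reports as generation 2 even for brand-new objects.
var small = new byte[1_000];
var large = new byte[85_000];

Console.WriteLine(GC.GetGeneration(small)); // typically 0: small object heap, gen 0
Console.WriteLine(GC.GetGeneration(large)); // 2: large object heap
```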

As your responseString and contentBytes are usually around 2 MB, I would recommend rewriting your code to eliminate them. Instead, stream asynchronously from the server directly to your JSON file using the relevant built-in APIs, like so:

const int BufferSize = 16384;
const bool UseAsyncFileStreams = true; //https://learn.microsoft.com/en-us/dotnet/api/system.io.filestream.-ctor?view=net-5.0#System_IO_FileStream__ctor_System_String_System_IO_FileMode_System_IO_FileAccess_System_IO_FileShare_System_Int32_System_Boolean_

protected async Task DumpAppList()
{
    try
    {
        var httpClient = _httpClientFactory.CreateClient();
        var appList = new List<SteamApiGetAppListApp>();
        int? lastAppId = null;
        do
        {
            // Get the SteamApiGetAppListOuterResponse directly from JSON using HttpClientJsonExtensions.GetFromJsonAsync() without the intermediate string.
            // https://learn.microsoft.com/en-us/dotnet/api/system.net.http.json.httpclientjsonextensions.getfromjsonasync
            // If you need customized error handling see 
            // https://stackoverflow.com/questions/65383186/using-httpclient-getfromjsonasync-how-to-handle-httprequestexception-based-on
            var responseObject = (await httpClient.GetFromJsonAsync<SteamApiGetAppListOuterResponse>(GetAppListUrl(lastAppId)))
                !.response;
            appList.AddRange(responseObject.apps);
            lastAppId = responseObject.have_more_results ? responseObject.last_appid : null;

        } while (lastAppId != null);

        await using var output = new FileStream(Path.Combine(Config.DumpDataPath, DateTime.UtcNow.ToString("yyyy-MM-dd__HH-mm-ss") + ".json.gz"),
                                                FileMode.Create, FileAccess.Write, FileShare.None, bufferSize: BufferSize, useAsync: UseAsyncFileStreams);
        await using var gz = new GZipStream(output, CompressionMode.Compress);
        // See https://faithlife.codes/blog/2012/06/always-wrap-gzipstream-with-bufferedstream/ for a discussion of buffer sizes vs compression ratios.
        await using var buffer = new BufferedStream(gz, BufferSize);
        // Serialize directly to the buffered, compressed output stream without the intermediate in-memory array.
        await JsonSerializer.SerializeAsync(buffer, appList);
    }
    catch (Exception ex)
    {
        Trace.TraceError("skipped...");
    }
}

Notes:

  • GZipStream does not buffer its input so there is a chance that streaming to it incrementally can result in worse compression ratios. However, as discussed by Bradley Grainger in Always wrap GZipStream with BufferedStream, buffering the incremental writes using a buffer that is 8K or larger effectively eliminates the problem.

  • According to the docs, the useAsync argument to the FileStream constructor

    Specifies whether to use asynchronous I/O or synchronous I/O. However, note that the underlying operating system might not support asynchronous I/O, so when specifying true, the handle might be opened synchronously depending on the platform. When opened asynchronously, the BeginRead(Byte[], Int32, Int32, AsyncCallback, Object) and BeginWrite(Byte[], Int32, Int32, AsyncCallback, Object) methods perform better on large reads or writes, but they might be much slower for small reads or writes. If the application is designed to take advantage of asynchronous I/O, set the useAsync parameter to true. Using asynchronous I/O correctly can speed up applications by as much as a factor of 10, but using it without redesigning the application for asynchronous I/O can decrease performance by as much as a factor of 10.

    Thus you may need to test to see whether, in practice, you get better performance with UseAsyncFileStreams equal to true or false. You may also need to play around with the buffer sizes to get the best performance and compression ratio -- always being sure to keep the buffer smaller than 85,000 bytes.

  • If you think large object heap fragmentation may be a problem, see the "A debugger" section of the MSFT article The large object heap on Windows systems for suggestions on how to investigate further.

  • Since your DumpAppList() method only runs every half hour anyway, you might try compacting the large object heap manually after each run to see if that helps:

     protected override async Task ExecuteAsync(CancellationToken stoppingToken)
     {
         while (!stoppingToken.IsCancellationRequested)
         {
             await DumpAppList();
             GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
             GC.Collect();

             await Task.Delay(RepeatDelay, stoppingToken);
         }
     }
  • You may want to pass the CancellationToken stoppingToken into DumpAppList().
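    As a sketch, plumbing the token through might look like the fragment below (abbreviated; the elided body is otherwise unchanged from the code above, and the commented calls show overloads of GetFromJsonAsync and SerializeAsync that accept a CancellationToken):

    ```csharp
    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            await DumpAppList(stoppingToken);
            await Task.Delay(RepeatDelay, stoppingToken);
        }
    }

    protected async Task DumpAppList(CancellationToken cancellationToken)
    {
        // ... same as above, but pass the token to each awaited call, e.g.:
        // await httpClient.GetFromJsonAsync<SteamApiGetAppListOuterResponse>(
        //     GetAppListUrl(lastAppId), cancellationToken);
        // await JsonSerializer.SerializeAsync(buffer, appList, cancellationToken: cancellationToken);
    }
    ```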


[1] Do note that, in Memory management and garbage collection (GC) in ASP.NET Core: Large object heap, MSFT writes:

In containers using .NET Core 3.0 and later, the LOH is automatically compacted.

So my statement about when LOH compaction occurs may be out of date on certain platforms.
