Skip to content

DotNetStats.UpdateMetrics() blocks multiple threads causing lock contention and threadpool exhaustion. #499

@baal2000

Description

@baal2000

Description

A .NET application could experience theese symptoms after a short CPU spike, and then contributing to further process performance degradation that could last for minutes.

Image

The process dump shows 100s of Kestrel threads congregating on a lock inside Prometheus.DotNetStats.UpdateMetrics()

lock (_updateLock)

Prometheus.DotNetStats.UpdateMetrics()
Prometheus.CollectorRegistry+<RunBeforeCollectCallbacksAsync>d__22.MoveNext()
System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Prometheus.CollectorRegistry+<RunBeforeCollectCallbacksAsync>d__22, Prometheus.NetStandard]](<RunBeforeCollectCallbacksAsync>d__22 ByRef)
Prometheus.CollectorRegistry+<CollectAndSerializeAsync>d__21.MoveNext()
System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Prometheus.CollectorRegistry+<CollectAndSerializeAsync>d__21, Prometheus.NetStandard]](<CollectAndSerializeAsync>d__21 ByRef)
Prometheus.MetricServerMiddleware+<Invoke>d__7.MoveNext()
System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Prometheus.MetricServerMiddleware+<Invoke>d__7, Prometheus.AspNetCore]](<Invoke>d__7 ByRef)
Microsoft.AspNetCore.Builder.Extensions.MapMiddleware+<InvokeCore>d__4.MoveNext()
System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Microsoft.AspNetCore.Builder.Extensions.MapMiddleware+<InvokeCore>d__4, Microsoft.AspNetCore.Http.Abstractions]](<InvokeCore>d__4 ByRef)
Microsoft.AspNetCore.Builder.Extensions.MapMiddleware.Invoke(Microsoft.AspNetCore.Http.HttpContext)
Microsoft.AspNetCore.ResponseCompression.ResponseCompressionMiddleware+<InvokeCore>d__4.MoveNext()
System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Microsoft.AspNetCore.ResponseCompression.ResponseCompressionMiddleware+<InvokeCore>d__4, Microsoft.AspNetCore.ResponseCompression]](<InvokeCore>d__4 ByRef)
Microsoft.AspNetCore.ResponseCompression.ResponseCompressionMiddleware.Invoke(Microsoft.AspNetCore.Http.HttpContext)
Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol+<ProcessRequests>d__238`1[[System.__Canon, System.Private.CoreLib]].MoveNext()
System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol+<ProcessRequests>d__238`1[[System.__Canon, System.Private.CoreLib]], Microsoft.AspNetCore.Server.Kestrel.Core]](<ProcessRequests>d__238`1<System.__Canon> ByRef)
Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequests[[System.__Canon, System.Private.CoreLib]](Microsoft.AspNetCore.Hosting.Server.IHttpApplication`1<System.__Canon>)
Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol+<ProcessRequestsAsync>d__237`1[[System.__Canon, System.Private.CoreLib]].MoveNext()
System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol+<ProcessRequestsAsync>d__237`1[[System.__Canon, System.Private.CoreLib]], Microsoft.AspNetCore.Server.Kestrel.Core]](<ProcessRequestsAsync>d__237`1<System.__Canon> ByRef)
Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequestsAsync[[System.__Canon, System.Private.CoreLib]](Microsoft.AspNetCore.Hosting.Server.IHttpApplication`1<System.__Canon>)
Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection+<ProcessRequestsAsync>d__12`1[[System.__Canon, System.Private.CoreLib]].MoveNext()
System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection+<ProcessRequestsAsync>d__12`1[[System.__Canon, System.Private.CoreLib]], Microsoft.AspNetCore.Server.Kestrel.Core]](<ProcessRequestsAsync>d__12`1<System.__Canon> ByRef)
Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection.ProcessRequestsAsync[[System.__Canon, System.Private.CoreLib]](Microsoft.AspNetCore.Hosting.Server.IHttpApplication`1<System.__Canon>)
Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Infrastructure.KestrelConnection`1+<ExecuteAsync>d__8[[System.__Canon, System.Private.CoreLib]].MoveNext()
System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[[Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Infrastructure.KestrelConnection`1+<ExecuteAsync>d__8[[System.__Canon, System.Private.CoreLib]], Microsoft.AspNetCore.Server.Kestrel.Core]](<ExecuteAsync>d__8<System.__Canon> ByRef)
Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Infrastructure.KestrelConnection`1[[System.__Canon, System.Private.CoreLib]].ExecuteAsync()
Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Infrastructure.KestrelConnection`1[[System.__Canon, System.Private.CoreLib]].System.Threading.IThreadPoolWorkItem.Execute()
System.Threading.ThreadPoolWorkQueue.Dispatch()
System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart()
DebuggerU2MCatchHandlerFrame

A single thread is inside the locked section:

Interop+Sys.Open(System.String, OpenFlags, Int32)
Microsoft.Win32.SafeHandles.SafeFileHandle.Open(System.String, System.IO.FileMode, System.IO.FileAccess, System.IO.FileShare, System.IO.FileOptions, Int64, System.IO.UnixFileMode, Int64 ByRef, System.IO.UnixFileMode ByRef, Boolean, Boolean ByRef, System.Func`4<ErrorInfo,OpenFlags,System.String,System.Exception>)
System.IO.Strategies.OSFileStreamStrategy..ctor(System.String, System.IO.FileMode, System.IO.FileAccess, System.IO.FileShare, System.IO.FileOptions, Int64, System.Nullable`1<System.IO.UnixFileMode>)
Interop+procfs.TryReadFile(System.String, System.String ByRef)
Interop+procfs.TryParseStatFile(System.String, ParsedStat ByRef)
System.Diagnostics.ProcessManager.CreateProcessInfo(ParsedStat ByRef, ParsedStatus ByRef, System.String)
System.Diagnostics.ProcessManager.CreateProcessInfo(Int32)
System.Diagnostics.Process.EnsureState(State)
Prometheus.DotNetStats.UpdateMetrics()
...

Configuration

  • .NET 8
  • Ubuntu 20.04 x64

Analysis

The lock had been introduced in #173 as a solution to the Process instance thread safety concern.

Suggestion

  1. If it is the responsibility of the caller to protect the Prometheus endpoint through appropriate Kestrel limits, then add an appropriate section similar to Protecting the metrics endpoint from unauthorized access .
  2. As precaution the library could use a caching approach to collecting the metrics by storing in memory for a short period of time to reuse by concurrent requests. After the cache expires it would refresh the metrics and repeat the cycle.Something like:
            TimeSpan elapsed = cachedMetricsExpiryStopwatch.Elapsed;
            var collectedMetrics = collectedMetricsCached;

            // Check if the cache exists and not has expired. If either is false then collect the metrics into collectedMetricsCached struct.
            // LastWriteRecheckInterval is a reasonably small value that limits time resolution from infinite a range of 100 milliseconds to a second, 
            if (collectedMetrics is null || elapsed > LastWriteRecheckInterval)
            {
                lock (collectProcessMetricStopwatch)
                {
                    collectedMetrics = collectedMetricsCached;
                    elapsed = cachedMetricsExpiryStopwatch.Elapsed;

                    if (collectedMetrics is null || elapsed > LastWriteRecheckInterval)
                    {
                         // TBD: GetProcessMetrics. It refreshes Process instance, collects the metrics and stores as fields
                         // into collectedMetricsCached structure, resets cachedMetricsExpiryStopwatch.
                         collectedMetrics= GetProcessMetrics(); 
                    }
                }
            }
            
            // Pass the cached metrics to the requestor.
            for (var gen = 0; gen <= GC.MaxGeneration; gen++)
                _collectionCounts[gen].IncTo(collectedMetrics.CollectionCount[gen]);

            _totalMemory.Set(collectedMetrics.TotalMemory);
            _virtualMemorySize.Set(collectedMetrics.VirtualMemorySize64);
            _workingSet.Set(collectedMetrics.WorkingSet64);
            _privateMemorySize.Set(collectedMetrics.PrivateMemorySize64);
            _cpuTotal.IncTo(collectedMetrics.TotalProcessorTime.TotalSeconds);
            _openHandles.Set(collectedMetrics.HandleCount);
            _numThreads.Set(collectedMetrics.Threads.Count);

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions