Description
We have observed a severe performance issue in our high-throughput application using prometheus-net (v8.2.0+).
The issue stems from the use of ReaderWriterLockSlim in ManagedLifetimeMetricHandle to protect metric leases. ReaderWriterLockSlim enforces a Writer Preference policy: when the background "Reaper" task requests the write lock to clean up expired metrics, the lock immediately blocks all new concurrent readers (metric updates), even before the writer has actually acquired it.
The Mechanism of Failure (a minimal repro sketch follows this list):
- High Throughput Readers: Our application calls GetOrCreateLifetimeAndIncrementLeaseCount (acquiring the read lock) thousands of times per second.
- Suspected infrequent Writers:
  a) The background "Reaper" task (TakeExpiredLeases) wakes up periodically to clean up expired metrics and calls EnterWriteLock().
  b) GetOrCreateLifetimeAndIncrementLeaseCount calls EnterWriteLock() to initialize a new metric if it is not found.
- The "Dam" Effect: As soon as a writer requests the lock, ReaderWriterLockSlim blocks all new readers to prevent writer starvation.
- The Stampede and Kernel Thrashing: Due to the high throughput, thousands of reader threads queue up instantly while waiting for the writer to enter and complete. This stampede causes extreme CPU contention (kernel time) as threads spin and call Thread.Yield/do_sched_yield, leading to thread pool starvation and application unresponsiveness.
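To illustrate the "dam" behavior in isolation, here is a minimal stand-alone sketch (ours, not prometheus-net code) of the pattern described above: many reader threads on the read lock and one periodic writer.

using System;
using System.Threading;
using System.Threading.Tasks;

// Minimal repro sketch: many reader threads hammer EnterReadLock while one
// periodic writer requests EnterWriteLock. As soon as the writer is queued,
// new readers are no longer admitted until the writer has entered and exited,
// so the readers pile up behind it and spin/yield.
class WriterPreferenceDemo
{
    static readonly ReaderWriterLockSlim RwLock = new ReaderWriterLockSlim();

    static void Main()
    {
        // Hot path: simulate high-throughput metric updates taking the read lock.
        for (int i = 0; i < Environment.ProcessorCount * 4; i++)
        {
            Task.Run(() =>
            {
                while (true)
                {
                    RwLock.EnterReadLock();
                    try { /* record a measurement */ }
                    finally { RwLock.ExitReadLock(); }
                }
            });
        }

        // "Reaper": an infrequent writer that holds the write lock briefly.
        while (true)
        {
            Thread.Sleep(1000);
            RwLock.EnterWriteLock();
            try { Thread.Sleep(50); /* sweep expired leases */ }
            finally { RwLock.ExitWriteLock(); }
        }
    }
}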
Expected Behavior
Background maintenance tasks (like cleaning up expired metrics or adding a new metric) should not block the hot path of metric collection, or at least should not strictly prioritize themselves over the application's primary workload to the point of creating a denial-of-service.
Actual Behavior
The application enters a "death spiral." The write lock request effectively pauses the application's metric recording. The resulting queue of blocked threads causes the lock primitives to thrash the CPU scheduler upon release.
Evidence / Stack Traces
We captured the crash using Linux perf. The trace shows threads stuck in ReaderWriterLockSlim.SpinLock.EnterSpin, calling Thread.Sleep(1) (mapped to do_sched_yield), consuming 100% CPU.
Stack Trace:
kernel.kallsyms!do_sched_yield
...
System.Threading.ReaderWriterLockSlim+SpinLock.EnterSpin(...)
System.Threading.ReaderWriterLockSlim.TryEnterReadLockCore(...)
Prometheus.ManagedLifetimeMetricHandle...GetOrCreateLifetimeAndIncrementLeaseCount(...)
Prometheus.ManagedLifetimeMetricHandle....WithLease(...)
Prometheus.MeterAdapter.OnMeasurementRecorded(...)
Attached screenshots: CPU stats, the thread pool stats, the Linux perf trace, the starved readers, and the Reaper's writer lock.
Related Issues
This investigation may have identified the likely root cause of #499. In that report we observed identical thread pool exhaustion. We captured a process dump at the time, but it was inconclusive: a dump provides a static snapshot of thread states (Waiting/Running) but cannot capture CPU contention dynamics. It showed threads waiting on locks, which could have been just a symptom.
Relevant Code
- The usage of ReaderWriterLockSlim on the hot path:
  _lifetimesLock.EnterReadLock();
- The "Reaper" task taking the write lock:
  _lifetimesLock.EnterWriteLock();
- GetOrCreateLifetimeAndIncrementLeaseCount taking the write lock to initialize a missing metric (see the paraphrased sketch below):
  _lifetimesLock.EnterWriteLock();
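Put together, the pattern these call sites describe looks roughly like the following paraphrase. Everything except _lifetimesLock, GetOrCreateLifetimeAndIncrementLeaseCount and TakeExpiredLeases (the Lifetime class, the string key, the field layout) is an illustrative placeholder, not the actual prometheus-net source:

using System;
using System.Collections.Generic;
using System.Threading;

// Paraphrased sketch of the locking pattern described above.
public sealed class LifetimeTableSketch
{
    private sealed class Lifetime { public int LeaseCount; public DateTime LastLeaseReturned; }

    private readonly ReaderWriterLockSlim _lifetimesLock = new ReaderWriterLockSlim();
    private readonly Dictionary<string, Lifetime> _lifetimes = new Dictionary<string, Lifetime>();

    public void GetOrCreateLifetimeAndIncrementLeaseCount(string key)
    {
        // Hot path: taken thousands of times per second by metric updates.
        _lifetimesLock.EnterReadLock();
        try
        {
            if (_lifetimes.TryGetValue(key, out var lifetime))
            {
                Interlocked.Increment(ref lifetime.LeaseCount);
                return;
            }
        }
        finally { _lifetimesLock.ExitReadLock(); }

        // Miss: the hot path itself becomes a writer (case b above).
        _lifetimesLock.EnterWriteLock();
        try
        {
            if (!_lifetimes.TryGetValue(key, out var lifetime))
                _lifetimes[key] = lifetime = new Lifetime();
            Interlocked.Increment(ref lifetime.LeaseCount);
        }
        finally { _lifetimesLock.ExitWriteLock(); }
        // (Lease release / LastLeaseReturned update omitted for brevity.)
    }

    public void TakeExpiredLeases(TimeSpan expireAfter)
    {
        // The "Reaper": an infrequent but global writer.
        _lifetimesLock.EnterWriteLock();
        try
        {
            foreach (var key in new List<string>(_lifetimes.Keys))
            {
                var lifetime = _lifetimes[key];
                if (lifetime.LeaseCount == 0 && DateTime.UtcNow - lifetime.LastLeaseReturned > expireAfter)
                    _lifetimes.Remove(key);
            }
        }
        finally { _lifetimesLock.ExitWriteLock(); }
    }
}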
Suggested Fixes
- Alternative Primitives: Consider a standard lock (Monitor or the .NET 9+ System.Threading.Lock). While a standard lock also blocks, it lacks the strict "Writer Preference" that dams up readers before acquisition, which might reduce the severity of the queue pile-up. Ironically, ReaderWriterLockSlim replaced a standard lock in "Rewrite lifetime tracking in ManagedLifetimeMetricHandle to be leaner", part of Performance optimizations #458, but we could not find any actual issue explaining why that change was made.
- Granularity: Use finer-grained locking (ConcurrentDictionary) or lock-free structures for the lease handles to avoid a global lock on the hot path (a sketch follows this list).
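To make the granularity suggestion concrete, here is a rough sketch under the assumption that each lease handle only needs an atomic counter and a last-returned timestamp; the LeaseTable/LeaseEntry names are illustrative, not proposed API:

using System;
using System.Collections.Concurrent;
using System.Threading;

// Illustrative sketch only: a lock-free lease table using ConcurrentDictionary,
// so the hot path never waits behind a global writer.
public sealed class LeaseTable
{
    private sealed class LeaseEntry
    {
        public int LeaseCount;          // active leases on this metric child
        public long LastReturnedTicks;  // set when LeaseCount drops to zero
    }

    private readonly ConcurrentDictionary<string, LeaseEntry> _entries =
        new ConcurrentDictionary<string, LeaseEntry>();

    public void AcquireLease(string key)
    {
        // Lock-free hot path: create-if-missing and increment without a global lock.
        var entry = _entries.GetOrAdd(key, _ => new LeaseEntry());
        Interlocked.Increment(ref entry.LeaseCount);
    }

    public void ReleaseLease(string key)
    {
        if (_entries.TryGetValue(key, out var entry) &&
            Interlocked.Decrement(ref entry.LeaseCount) == 0)
        {
            Interlocked.Exchange(ref entry.LastReturnedTicks, DateTime.UtcNow.Ticks);
        }
    }

    public void ReapExpired(TimeSpan expireAfter)
    {
        // The reaper only touches individual entries; readers are never blocked globally.
        var cutoff = DateTime.UtcNow - expireAfter;
        foreach (var pair in _entries)
        {
            var entry = pair.Value;
            if (Volatile.Read(ref entry.LeaseCount) == 0 &&
                new DateTime(Interlocked.Read(ref entry.LastReturnedTicks)) < cutoff)
            {
                // Note: a real implementation must handle the race where a new lease
                // is taken between this check and the removal.
                _entries.TryRemove(pair.Key, out _);
            }
        }
    }
}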
A Workaround
Please confirm that it is possible to effectively disable the Reaper task by setting
MeterAdapterOptions.MetricsExpireAfter = Timeout.InfiniteTimeSpan;
and that this is a safe thing to do for an application that suppresses Debug Metrics but collects Process Metrics, Event Counters and Meters with the configuration below (a combined sketch follows the snippet)?
Prometheus.Metrics.SuppressDefaultMetrics(new SuppressDefaultMetricOptions
{
SuppressDebugMetrics = true,
SuppressProcessMetrics = false,
SuppressEventCounters = false,
SuppressMeters = false
});
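For clarity, the combined configuration we have in mind would look roughly like this; we are assuming Metrics.ConfigureMeterAdapter is the intended way to apply MeterAdapterOptions.MetricsExpireAfter:

using System.Threading;
using Prometheus;

public static class MetricsSetup
{
    public static void Configure()
    {
        // Keep process metrics, event counters and meters; drop only debug metrics.
        Metrics.SuppressDefaultMetrics(new SuppressDefaultMetricOptions
        {
            SuppressDebugMetrics = true,
            SuppressProcessMetrics = false,
            SuppressEventCounters = false,
            SuppressMeters = false
        });

        // Never expire meter-derived metrics, so the Reaper has nothing to sweep.
        // (Assumption: ConfigureMeterAdapter is the right entry point for this option.)
        Metrics.ConfigureMeterAdapter(options =>
        {
            options.MetricsExpireAfter = Timeout.InfiniteTimeSpan;
        });
    }
}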