Summary
Instant::into_mach_absolute_time_ceil (library/std/src/sys/time/unix.rs:102) calls mach_timebase_info on every invocation, without caching. The returned numer/denom values are fixed at boot and never change within a process, so repeating the lookup (Mach trap / commpage read) is wasted work on every call.
This function is on the hot path of std::thread::sleep_until on every Apple target (#[cfg(target_vendor = "apple")]).
A cache for exactly this value used to exist (added in #77727, 2020). It was removed in bc300102d4 together with the broader feature it backed (clock_gettime migration). The cache was not restored when mach_timebase_info was reintroduced for sleep_until.
Reproduction
Code-level evidence:
grep -rn 'mach_timebase_info' library/std/src/sys/
# library/std/src/sys/time/unix.rs:104,110,115,116 -- only call site, no cache scaffolding
git log --all --oneline -S 'mach_timebase_info' -- library/std/
# f30cc74fb41 -- cache introduced (#77727, 2020)
# bc300102d43 -- function removed entirely (Sep 2023, clock_gettime migration)
# 959be82effe -- function re-introduced WITHOUT cache (#151004, 2026)
Micro-benchmark on stable rustc (no nightly, no custom std required) comparing the per-call cost of the uncached call against a cached read:
use std::hint::black_box;
use std::sync::OnceLock;
use std::time::Instant;

#[repr(C)]
struct MachTimebaseInfo { numer: u32, denom: u32 }

unsafe extern "C" {
    fn mach_timebase_info(info: *mut MachTimebaseInfo) -> i32;
}

#[inline(never)]
fn uncached() -> (u32, u32) {
    let mut t = MachTimebaseInfo { numer: 0, denom: 0 };
    let kr = unsafe { mach_timebase_info(&mut t) };
    assert_eq!(kr, 0);
    (t.numer, t.denom)
}

#[inline(never)]
fn cached() -> (u32, u32) {
    static T: OnceLock<(u32, u32)> = OnceLock::new();
    *T.get_or_init(uncached)
}

fn main() {
    let _ = cached(); // warm-up
    const N: u32 = 10_000_000;

    let s = Instant::now();
    let mut acc = 0u64;
    for _ in 0..N {
        let (a, b) = uncached();
        acc = acc.wrapping_add(a as u64).wrapping_add(b as u64);
    }
    let d1 = s.elapsed();
    black_box(acc);

    let s = Instant::now();
    let mut acc = 0u64;
    for _ in 0..N {
        let (a, b) = cached();
        acc = acc.wrapping_add(a as u64).wrapping_add(b as u64);
    }
    let d2 = s.elapsed();
    black_box(acc);

    println!("uncached: {:?}/call", d1 / N);
    println!("cached: {:?}/call", d2 / N);
}
Result on x86_64-apple-darwin, macOS 15.7.7, rustc -O:
uncached: 3ns/call (10M iters in 34.5ms)
cached: 1ns/call (10M iters in 10.9ms)
In this run the uncached call (trap or commpage read) costs about 3x as much as the cached read, which is only the OnceLock fast path (a single atomic load). Every sleep_until call on Apple pays that difference unnecessarily.
Suggested fix
Extract a private helper in the same module that memoizes the pair via crate::sync::OnceLock<(u32, u32)>; into_mach_absolute_time_ceil then reads the cached pair and does only the arithmetic. Change is purely additive, no API change, no behavior change, stays inside the existing #[cfg(target_vendor = "apple")] gate. Mirrors the precedent set by #77727.
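To make the shape of the proposal concrete, here is a minimal sketch of such a helper. It is not the actual patch: the names query_timebase, timebase, and nanos_to_ticks_ceil are hypothetical, it uses std::sync::OnceLock rather than the in-tree crate::sync::OnceLock, and the non-Apple stub returning (1, 1) exists only so the sketch compiles on other hosts.

```rust
use std::sync::OnceLock;

#[cfg(target_vendor = "apple")]
#[repr(C)]
struct MachTimebaseInfo { numer: u32, denom: u32 }

#[cfg(target_vendor = "apple")]
unsafe extern "C" {
    fn mach_timebase_info(info: *mut MachTimebaseInfo) -> i32;
}

// Hypothetical helper: performs the real lookup once on Apple targets.
#[cfg(target_vendor = "apple")]
fn query_timebase() -> (u32, u32) {
    let mut t = MachTimebaseInfo { numer: 0, denom: 0 };
    let kr = unsafe { mach_timebase_info(&mut t) };
    assert_eq!(kr, 0);
    (t.numer, t.denom)
}

// Assumption for illustration only: a 1:1 stand-in so the sketch
// builds on non-Apple hosts. The real code stays Apple-gated.
#[cfg(not(target_vendor = "apple"))]
fn query_timebase() -> (u32, u32) {
    (1, 1)
}

// Memoized accessor: the first call runs the lookup, later calls
// only read the stored pair.
fn timebase() -> (u32, u32) {
    static CACHE: OnceLock<(u32, u32)> = OnceLock::new();
    *CACHE.get_or_init(query_timebase)
}

// Hypothetical conversion: nanoseconds to Mach ticks, rounded up.
// Since nanos = ticks * numer / denom, ticks = nanos * denom / numer.
// Returns None if the result does not fit in a u64.
fn nanos_to_ticks_ceil(nanos: u64) -> Option<u64> {
    let (numer, denom) = timebase();
    let scaled = u128::from(nanos) * u128::from(denom);
    u64::try_from(scaled.div_ceil(u128::from(numer))).ok()
}
```

With a helper like this, the conversion path reads the cached pair and does only the arithmetic, as in the benchmark's cached() function.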
I have a working patch and can open a PR if useful.
Meta
rustc 1.93.1 (01f6ddf75 2026-02-11)
host: x86_64-apple-darwin
macOS 15.7.7 (Sequoia)
Affects every Apple target: x86_64-apple-darwin, aarch64-apple-darwin, iOS, tvOS, watchOS, visionOS - the regressed code path is gated #[cfg(target_vendor = "apple")].