Contributor

@cstyan cstyan commented Dec 16, 2025

Another optimization with the help of tasks.

GetWorkspaceBuildByJobID has been our most expensive query over the last 7 days, with 3.3M calls in that period. As far as I can tell, most of those calls come from the <provider>-instance-identity endpoints, which I believe are hit during agent -> coderd connection establishment. We may also have an issue with connections being re-established too frequently, but either way the chain of DB calls in this function (handleAuthInstanceID) triggers at least two cascaded calls to this query.

Looking at a trace for this endpoint, we can see at least 13 round trips to the database, 2 of which are GetWorkspaceBuildByJobID. These endpoints also receive ~0.4 calls per second, which means ~0.8 calls per second to GetWorkspaceBuildByJobID, or ~484k per week. Measured against the 3.3M weekly total, removing those calls should cut the call volume for this query by just under 15%.
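For anyone who wants to check the estimate, here is a small sketch of the arithmetic. The function names are made up for illustration; the inputs (0.4 req/s, 2 query calls per request, 3.3M calls per 7 days) are the figures quoted above.

```go
package main

import "fmt"

// Number of seconds in the 7-day window the figures above refer to.
const secondsPerWeek = 7 * 24 * 60 * 60

// weeklyQueryCalls estimates how often a query runs per week, given the
// endpoint request rate and how many times each request triggers the query.
// (Hypothetical helper, not part of the PR.)
func weeklyQueryCalls(requestsPerSecond float64, queriesPerRequest int) float64 {
	return requestsPerSecond * float64(queriesPerRequest) * secondsPerWeek
}

// reductionPercent is the share of the total weekly call volume that
// disappears if `removed` calls are no longer made.
func reductionPercent(removed, total float64) float64 {
	return removed / total * 100
}

func main() {
	removed := weeklyQueryCalls(0.4, 2)
	fmt.Printf("calls removed per week: %.0f\n", removed)
	fmt.Printf("reduction: %.1f%%\n", reductionPercent(removed, 3.3e6))
}
```

This comes out to roughly 484k calls per week and a reduction of a little under 15%, matching the estimate above. It assumes the request rate is steady across the week.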

We achieve this by bypassing the calls to api.Database.GetWorkspaceResourceByID and api.Database.GetProvisionerJobByID, both of which eventually call GetWorkspaceBuildByJobID as part of their auth checks in dbauthz. Instead, we call a new function/query, GetWorkspaceResourceWithJobByID, which combines the two queries we actually need into one and checks the authorization context for the right permissions, so the auth check no longer needs an additional DB query.
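A minimal sketch of the general pattern, for illustration only: check the caller's permission in memory first, then fetch the resource and its job in one round trip. Every type, field, and function name below (Store, Actor, CanReadSystem, GetResourceWithJob) is a hypothetical stand-in using an in-memory map, not the real dbauthz or generated query code in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// Every type and name here is a hypothetical stand-in for illustration;
// none of it is coder's real database or dbauthz API.

var (
	ErrUnauthorized = errors.New("unauthorized")
	ErrNotFound     = errors.New("not found")
)

// Actor models the subject carried in the authorization context.
type Actor struct {
	CanReadSystem bool // assumed permission name
}

type Resource struct{ ID, JobID string }

type Job struct{ ID, Type string }

// ResourceWithJob is the row shape a joined query might return.
type ResourceWithJob struct {
	Resource Resource
	Job      Job
}

// Store is an in-memory stand-in for the database.
// Queries counts simulated round trips.
type Store struct {
	Resources map[string]Resource
	Jobs      map[string]Job
	Queries   int
}

// GetResourceWithJob checks the actor's permission in memory first, then
// performs a single simulated joined lookup.
func (s *Store) GetResourceWithJob(actor Actor, resourceID string) (ResourceWithJob, error) {
	if !actor.CanReadSystem {
		// Rejected before any round trip is made.
		return ResourceWithJob{}, ErrUnauthorized
	}
	s.Queries++ // one round trip covers both the resource and its job
	res, ok := s.Resources[resourceID]
	if !ok {
		return ResourceWithJob{}, ErrNotFound
	}
	job, ok := s.Jobs[res.JobID]
	if !ok {
		return ResourceWithJob{}, ErrNotFound
	}
	return ResourceWithJob{Resource: res, Job: job}, nil
}

func main() {
	store := &Store{
		Resources: map[string]Resource{"res-1": {ID: "res-1", JobID: "job-1"}},
		Jobs:      map[string]Job{"job-1": {ID: "job-1", Type: "workspace_build"}},
	}

	// Authorized caller: one simulated round trip returns resource and job.
	row, err := store.GetResourceWithJob(Actor{CanReadSystem: true}, "res-1")
	fmt.Println(row.Job.Type, err, store.Queries)

	// Unauthorized caller: rejected without incrementing the query counter.
	_, err = store.GetResourceWithJob(Actor{}, "res-1")
	fmt.Println(err, store.Queries)
}
```

The point of the sketch is the ordering: the permission check reads only what the caller already carries, so a denied request costs no database work and an allowed one costs a single lookup.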

github-actions bot commented Dec 16, 2025

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@cstyan cstyan force-pushed the callum/workspacebyjob-perf branch from 73627a0 to 915b2b0 Compare December 16, 2025 20:40
…ance-identity

Optimize handleAuthInstanceID to use a new combined query that fetches
both workspace resource and provisioner job information in a single
database call.

Before: GetWorkspaceResourceByID cascades to GetProvisionerJobByID
(which calls GetWorkspaceBuildByJobID), then we explicitly call
GetProvisionerJobByID again (triggering GetWorkspaceBuildByJobID again).

After: GetWorkspaceResourceWithJobByID fetches both resource and job
info in one query, checking first that the right authorizeContext
permissions are present.

This reduces GetWorkspaceBuildByJobID calls from 2 to 0 per
instance-identity request (~0.4 req/s).
@cstyan cstyan force-pushed the callum/workspacebyjob-perf branch from f48b1de to 19cc851 Compare December 16, 2025 22:51

3 participants