perf: reduce calls to GetWorkspaceBuildByJobID #21298
Open
+147
−13
Another optimization with the help of tasks.
`GetWorkspaceBuildByJobID` is our most expensive query in the last 7d, called 3.3M times in that period. AFAICT it is called most frequently as a result of the `<provider>-instance-identity` endpoint calls, which I believe are associated with agent -> coderd connection establishment. We may have an issue with connections having to be re-established frequently, but in any case we have at least two cascaded calls to this query as a result of the chain of DB calls in this function.

Looking at a trace for this endpoint, we can see at least 13 round trips to the database per request, 2 of which are `GetWorkspaceBuildByJobID`. There are also ~0.4 calls per second to these endpoints, which means 0.8 calls per second to `GetWorkspaceBuildByJobID`, or ~484k per week. So this should result in just under a 15% reduction in the call volume to this query.

We're achieving this by circumventing the calls to `api.Database.GetWorkspaceResourceByID` and `api.Database.GetProvisionerJobByID`, both of which eventually call `GetWorkspaceBuildByJobID` as part of their auth checks in dbauthz. Instead, we call a new function/query, `GetWorkspaceResourceWithJobByID`, which combines the two queries we actually need into one and checks the authorization context to ensure we have the right permissions, without needing an additional DB query to do so.