-
Notifications
You must be signed in to change notification settings - Fork 1.1k
perf: remove calls to AgentFn in sub APIs that can use static values #21327
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Previously, MetadataAPI, AppsAPI, StatsAPI, and ConnectionLogAPI called GetWorkspaceAgentByID on every request to retrieve the agent ID and Name. For deployments with 1000+ agents, this resulted in 4000+ queries/second. This change introduces agent field caching to eliminate these redundant database queries. Since agent ID and Name are static fields that never change during an agent connection lifetime, we can safely cache them when the agent API is initialized. Changes: - Add CachedAgentFields struct with thread-safe ID and Name getters - Update agentapi.New() to accept agent parameter and initialize cache - Modify MetadataAPI, AppsAPI, StatsAPI, and ConnectionLogAPI to use cached fields instead of calling AgentFn - Update all test files to initialize agent cache properly - Add fallback to AgentFn if cache is not populated (safety) This reduces database query load by ~4000 queries/second for deployments with 1000 agents, while maintaining backward compatibility with existing code paths. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit refactors the agent caching implementation to ensure type safety and prevent functions from accessing non-cached agent fields. Key changes: 1. **CachedAgentFields API Pattern**: Updated CachedAgentFields to follow the same pattern as CachedWorkspaceFields: - Added AsAgentID() returning (uuid.UUID, bool) - Added AsAgentFields() returning (id, name, ok) - Removed IsPopulated() in favor of the new pattern 2. **publishWorkspaceUpdate Signature**: Changed from taking *database.WorkspaceAgent to *CachedAgentFields. This enforces at compile time that only cached fields can be accessed. 3. **Agent Caching Safety**: Added documentation clarifying that agent ID and Name are immutable fields that never change during an agent's lifetime, unlike workspace fields which can change during prebuild claims. 4. **Sub-API Updates**: Updated AppsAPI, LifecycleAPI, and LogsAPI to use the new signature. APIs that need full agent data (LifecycleAPI, LogsAPI) create temporary CachedAgentFields instances from retrieved agents. 5. **Test Updates**: Updated all test mock functions to match the new PublishWorkspaceUpdateFn signature. All tests pass and the implementation maintains backward compatibility while improving type safety.
This commit fixes the issues identified in code review: 1. **Removed RWMutex**: CachedAgentFields no longer needs a mutex since values are set once at initialization and never modified after that, unlike CachedWorkspaceFields which can be updated via the refresh loop. 2. **Added proper ok checks**: All API methods now use the ok pattern to check if cached fields are populated: - BatchUpdateAppHealths: Uses AsAgentID() with ok check - UpdateStats: Uses AsAgentFields() with ok check - BatchUpdateMetadata: Uses AsAgentID() with ok check - Connection: Uses AsAgentFields() with ok check 3. **Removed ID() and Name() methods**: These were removed in favor of AsAgentID() and AsAgentFields() which return (value, ok) pairs following the same pattern as CachedWorkspaceFields. 4. **Cleaned up imports**: Removed unused uuid import from stats.go. All tests pass and the implementation now properly follows the established patterns in the codebase.
When the cache is not populated and we fall back to querying the agent, we need to create a CachedAgentFields from the fallback agent to pass to PublishWorkspaceUpdateFn. Previously we were still passing a.Agent which could be unpopulated, causing the publish function to receive an empty cache. Now we create a local agentCache variable that is either set to a.Agent (if populated) or a newly created CachedAgentFields from the fallback agent.
LogsAPI and LifecycleAPI don't have cached agent fields, so they should continue using the database.WorkspaceAgent signature. Added publishWorkspaceUpdateLegacy wrapper to handle these APIs.
- Remove publishWorkspaceUpdateLegacy wrapper - publishWorkspaceUpdate now takes uuid.UUID directly instead of *CachedAgentFields - AppsAPI.PublishWorkspaceUpdateFn signature simplified to take uuid.UUID - BatchUpdateAppHealths gets agent ID from cache or fallback and passes it directly - LogsAPI and LifecycleAPI use inline closures to extract agent.ID
- Change cachedAgentFields from *CachedAgentFields to CachedAgentFields (value type) - Remove AsAgentID() and AsAgentFields() methods with ok pattern - Add simple ID() and Name() accessors - Remove all fallback logic in sub-APIs since agent cache is always initialized - Agent ID and Name are set once at initialization and never change
…-APIs Removes AgentFn field from AppsAPI, StatsAPI, MetadataAPI, and ConnLogAPI since all these APIs now get agent ID and name from the cached agent fields. The AgentFn callback was only needed as a fallback mechanism, but since the cache is always populated on initialization, the fallback was dead code. Updates: - coderd/agentapi/apps.go: Remove AgentFn field - coderd/agentapi/stats.go: Remove AgentFn field - coderd/agentapi/metadata.go: Remove AgentFn field - coderd/agentapi/connectionlog.go: Remove AgentFn field - coderd/agentapi/api.go: Remove AgentFn from sub-API initialization - coderd/agentapi/*_test.go: Update tests to remove AgentFn references 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fixes two issues from previous commit: 1. In stats.go and connectionlog.go: Remove unnecessary RBAC wrapping since when AsWorkspaceIdentity() returns !ok (cache is empty), calling it again inside the fallback block will also return !ok, making the RBAC wrapping dead code. The RBAC fast path is only useful when the cache HAS a workspace identity to inject. 2. In connectionlog_test.go: Correctly initialize workspaceAsCacheFields to populate the cache, preventing unnecessary GetWorkspaceByAgentID calls. The test comment explicitly states "With agent/workspace caching, GetWorkspaceByAgentID is not called", so we need a populated cache. Changes: - coderd/agentapi/stats.go: Remove RBAC wrapping, use base ctx directly - coderd/agentapi/connectionlog.go: Remove RBAC wrapping, use base ctx - coderd/agentapi/connectionlog_test.go: Restore workspaceAsCacheFields 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Since CachedAgentFields and all its usage are within the agentapi package, we don't need accessor methods. Make the fields public (ID, Name) and access them directly instead of using ID() and Name() methods. This simplifies the code while maintaining the same functionality. The fields remain read-only in practice since UpdateValues is only called once at initialization. Changes: - coderd/agentapi/cached_agent.go: Remove ID() and Name() methods, make fields public - coderd/agentapi/apps.go: Use a.Agent.ID instead of a.Agent.ID() - coderd/agentapi/stats.go: Use a.Agent.ID and a.Agent.Name directly - coderd/agentapi/metadata.go: Use a.Agent.ID directly - coderd/agentapi/connectionlog.go: Use a.Agent.ID and a.Agent.Name directly 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Remove the UpdateValues method from CachedAgentFields and use struct literal initialization instead. This is more idiomatic Go since the fields are now public. Changes: - Remove UpdateValues method from cached_agent.go - Update api.go to use struct literal initialization - Update all test files to use inline initialization instead of empty struct + UpdateValues call - Remove unused agentCache variable from metadata_test.go All tests pass successfully.
Use a.Agent.ID and a.Agent.Name directly instead of allocating intermediate variables. This reduces unnecessary allocations and makes the code more concise. Changes: - connectionlog.go: Remove agentID and agentName variables - stats.go: Remove agentID and agentName variables - metadata.go: Remove agentID variable - apps.go: Remove agentID variable All tests pass successfully.
Remove unnecessary database.WorkspaceAgent allocation in stats reporting by passing only the required fields (agentID and agentName) directly to ReportAgentStats. Changes: - ReportAgentStats: Accept agentID and agentName parameters instead of full database.WorkspaceAgent object - stats.go: Remove minimalAgent variable allocation, pass cached fields directly - agentapi.New: Keep full database.WorkspaceAgent parameter, extract fields internally for caching This eliminates the minimalAgent struct allocation while keeping the agentapi.New API clean by accepting the full agent object. All tests pass successfully.
|
I have read the CLA Document and I hereby sign the CLA Callum Styan seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. |
In a number of the AgentAPI sub APIs we have frequent calls to
AgentFn(GetWorkspaceAgentByID) even when the functions present in that API will really only use the agent ID (and sometimes the agent Name).We already have caching of the workspace itself when the workspace is not a prebuild, and use it when needing values from the workspace row and for RBAC fast path purposes. However, the agent can also be cached in cases when we only need to use static fields, unless there is an additional RBAC/auth related purpose to the AgentFn calls that I am missing here.
The main outcome of this PR is that for frequently called AgentAPI endpoints, especially the metadata updates, we will eliminate the need to call
GetWorkspaceAgentByIDcompletely. In the case of recent scaletesting for 600 workspaces with 15 metadata each at a 10s interval this was ~300 requests per second.