-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat(agent): support deleting dev containers #21247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
49794cc to
4504755
Compare
4504755 to
0fcab3f
Compare
mafredri
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a quick review, didn't really dig into the tests yet.
agent/agentcontainers/api.go
Outdated
| return | ||
| } | ||
|
|
||
| // Check if the devcontainer is currently starting - if so, we can't delete it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be rewritten and explain why we've decided to have this limitation. Technically there's nothing preventing us from implementing cancellation of create, etc.
PS. I see you -.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Believe it or not, I think I wrote that 😂
agent/agentcontainers/api.go
Outdated
| } | ||
|
|
||
| // Similarly, if already stopping, don't allow another delete. | ||
| if dc.Status == codersdk.WorkspaceAgentDevcontainerStatusStopping { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we make this state deleting instead? I would like to keep stopping as a separate state for the future, one that does not mean delete (i.e. start/stop/delete for the full suite).
Logically a delete could be a stop followed by a delete. I think both are valid use-cases (a user may want to just quickly restart the container to reset the processes, deletion will take potentially much longer).
| Detail: err.Error(), | ||
| }) | ||
| return | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error paths seem to be lacking state handling for the devcontainer. Should probably update dc.Error, perhaps others?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have gone ahead and implemented some better error handling
agent/agentcontainers/api.go
Outdated
| delete(api.recreateSuccessTimes, workspaceFolder) | ||
| delete(api.recreateErrorTimes, workspaceFolder) | ||
| delete(api.usingWorkspaceFolderName, workspaceFolder) | ||
| delete(api.injectedSubAgentProcs, workspaceFolder) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check: Will this leave the devcontainer available in the dashboard to start, like we do for e.g. detected containers?
Just because the user deleted it, doesn't necessarily mean they won't want to start it again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of this PR is to explicitly support removing it, not stopping it (as per #19062)
| } | ||
|
|
||
| // Delete the subagent if it exists. | ||
| if subAgentID != uuid.Nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we run this cleanup even if stop/remove fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I can't see a reason for not cleaning it up
| httpapi.Write(ctx, w, http.StatusInternalServerError, codersdk.Response{ | ||
| Message: "An internal error occurred deleting the agent", | ||
| Detail: err.Error(), | ||
| }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These failures leave the API in a weird state, we've entered a terminal state (stopping), and can't re-invoke delete even if the errors are ephemeral. The container will become immortal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've slightly reworked this. On error they now enter WorkspaceAgentDevcontainerStatusError with the Error field filled in.
| return nil, f.execErr | ||
| } | ||
|
|
||
| func (f *fakeContainerCLI) Stop(ctx context.Context, name string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're introducing mutation, we'll need to add a mutex to these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added!
| }, nil | ||
| } | ||
|
|
||
| func (api *API) devcontainerByIDLocked(devcontainerID string) (codersdk.WorkspaceAgentDevcontainer, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With methods like this it's nice to add a comment that screams at you to make sure you hold the mutex when you call it.
Also, it would make sense to use this method in handleDevcontainerRecreate().
agent/agentcontainers/api.go
Outdated
| if dc.Status == codersdk.WorkspaceAgentDevcontainerStatusDeleting { | ||
| api.mu.Unlock() | ||
| httpapi.Write(ctx, w, http.StatusConflict, codersdk.Response{ | ||
| Message: "Devcontainer is already being deleted", | ||
| Detail: fmt.Sprintf("Devcontainer %q is already being deleted.", dc.Name), | ||
| }) | ||
| return | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably also short-circuit this in the recreate handler. If someone tries to delete and then immediately recreate a devcontainer while it's in the "deleting" state, I would imagine a 409 Conflict is the correct response.
agent/agentcontainers/api.go
Outdated
|
|
||
| // NOTE(DanielleMaywood): | ||
| // We currently do not support canceling the startup of a dev container. | ||
| if dc.Status == codersdk.WorkspaceAgentDevcontainerStatusStarting { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Would it make sense to define a method IsTerminal() on the WorkspaceAgentDevcontainerStatus type that returns true for any *-ing state? My understanding is that we don't want to allow any modifications to *-ing devcontainers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added an IsTransitional method for seeing if we're in Starting Stopping or Deleting. I didn't think it made sense to add Running in this since we're wanting to delete from that state.
We don't allow transitioning a Dev Container when it is currently being transitioned
| WorkspaceAgentDevcontainerStatusError WorkspaceAgentDevcontainerStatus = "error" | ||
| ) | ||
|
|
||
| func (s WorkspaceAgentDevcontainerStatus) IsTransitional() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Naming, I believe Transitioning better captures the what this is (Is could be dropped to as it's inferred via ing).
| api.mu.Unlock() | ||
|
|
||
| httpapi.Write(ctx, w, http.StatusConflict, codersdk.Response{ | ||
| Message: "Unable to delete transitioning Devcontainer", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Message: "Unable to delete transitioning Devcontainer", | |
| Message: "Unable to delete transitioning devcontainer", |
| proc.stop() | ||
| } | ||
|
|
||
| api.mu.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We didn't update devcontainer state, re-injection can happen at this point.
Also, why do we unlock just to re-acquire lock 3 lines below?
Suggestion: Drop this Unlock and both next Lock's
Alternatively: Move stopping here, the no-container case will just transition from stopping very quickly which probably fine?
| api.mu.Unlock() | ||
|
|
||
| httpapi.Write(ctx, w, http.StatusInternalServerError, codersdk.Response{ | ||
| Message: "An internal error occurred stopping the container", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Message: "An internal error occurred stopping the container", | |
| Message: "An error occurred stopping the container", |
Nit: Internal suggests fault in api.
| Detail: err.Error(), | ||
| }) | ||
| return | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If following above advice, Lock here.
| api.mu.Unlock() | ||
|
|
||
| httpapi.Write(ctx, w, http.StatusInternalServerError, codersdk.Response{ | ||
| Message: "An internal error occurred removing the container", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Message: "An internal error occurred removing the container", | |
| Message: "An error occurred removing the container", |
| httpapi.Write(ctx, w, http.StatusConflict, codersdk.Response{ | ||
| Message: "Devcontainer recreation already in progress", | ||
| Detail: fmt.Sprintf("Recreation for devcontainer %q is already underway.", dc.Name), | ||
| Message: "Unable to recreate transitioning Devcontainer", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Message: "Unable to recreate transitioning Devcontainer", | |
| Message: "Unable to recreate transitioning devcontainer", |
| containers: codersdk.WorkspaceAgentListContainersResponse{ | ||
| Containers: []codersdk.WorkspaceAgentContainer{devContainer1}, | ||
| }, | ||
| arch: "<none>", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Missing comment explaining weirdness, I suppose this also suggests we don't want subagent injection for this test? A few other too.
| }, | ||
| }, | ||
| lister: &fakeContainerCLI{ | ||
| arch: "<none>", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of these tests seem to verify what happens with subagents present?
Seems relevant considering the race I identified with mutex unlocking and injection.
Relates to #19062
Add logic to the agent to allow deleting Dev Containers
PR Stack: