Skip to content

Conversation

@DanielleMaywood
Copy link
Contributor

@DanielleMaywood DanielleMaywood commented Dec 12, 2025

@DanielleMaywood DanielleMaywood force-pushed the danielle/devcontainers/delete/agent branch from 49794cc to 4504755 Compare December 12, 2025 12:12
@DanielleMaywood DanielleMaywood force-pushed the danielle/devcontainers/delete/agent branch from 4504755 to 0fcab3f Compare December 12, 2025 14:43
@DanielleMaywood DanielleMaywood marked this pull request as ready for review December 12, 2025 15:19
Copy link
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick review, didn't really dig into the tests yet.

return
}

// Check if the devcontainer is currently starting - if so, we can't delete it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be rewritten and explain why we've decided to have this limitation. Technically there's nothing preventing us from implementing cancellation of create, etc.

PS. I see you -.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Believe it or not, I think I wrote that 😂

}

// Similarly, if already stopping, don't allow another delete.
if dc.Status == codersdk.WorkspaceAgentDevcontainerStatusStopping {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make this state deleting instead? I would like to keep stopping as a separate state for the future, one that does not mean delete (i.e. start/stop/delete for the full suite).

Logically a delete could be a stop followed by a delete. I think both are valid use-cases (a user may want to just quickly restart the container to reset the processes, deletion will take potentially much longer).

Detail: err.Error(),
})
return
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error paths seem to be lacking state handling for the devcontainer. Should probably update dc.Error, perhaps others?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have gone ahead and implemented some better error handling

delete(api.recreateSuccessTimes, workspaceFolder)
delete(api.recreateErrorTimes, workspaceFolder)
delete(api.usingWorkspaceFolderName, workspaceFolder)
delete(api.injectedSubAgentProcs, workspaceFolder)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check: Will this leave the devcontainer available in the dashboard to start, like we do for e.g. detected containers?

Just because the user deleted it, doesn't necessarily mean they won't want to start it again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of this PR is to explicitly support removing it, not stopping it (as per #19062)

}

// Delete the subagent if it exists.
if subAgentID != uuid.Nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we run this cleanup even if stop/remove fails?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I can't see a reason for not cleaning it up

httpapi.Write(ctx, w, http.StatusInternalServerError, codersdk.Response{
Message: "An internal error occurred deleting the agent",
Detail: err.Error(),
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These failures leave the API in a weird state, we've entered a terminal state (stopping), and can't re-invoke delete even if the errors are ephemeral. The container will become immortal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've slightly reworked this. On error they now enter WorkspaceAgentDevcontainerStatusError with the Error field filled in.

return nil, f.execErr
}

func (f *fakeContainerCLI) Stop(ctx context.Context, name string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're introducing mutation, we'll need to add a mutex to these.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added!

}, nil
}

func (api *API) devcontainerByIDLocked(devcontainerID string) (codersdk.WorkspaceAgentDevcontainer, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With methods like this it's nice to add a comment that screams at you to make sure you hold the mutex when you call it.

Also, it would make sense to use this method in handleDevcontainerRecreate().

Comment on lines 1282 to 1289
if dc.Status == codersdk.WorkspaceAgentDevcontainerStatusDeleting {
api.mu.Unlock()
httpapi.Write(ctx, w, http.StatusConflict, codersdk.Response{
Message: "Devcontainer is already being deleted",
Detail: fmt.Sprintf("Devcontainer %q is already being deleted.", dc.Name),
})
return
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably also short-circuit this in the recreate handler. If someone tries to delete and then immediately recreate a devcontainer while it's in the "deleting" state, I would imagine a 409 Conflict is the correct response.


// NOTE(DanielleMaywood):
// We currently do not support canceling the startup of a dev container.
if dc.Status == codersdk.WorkspaceAgentDevcontainerStatusStarting {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Would it make sense to define a method IsTerminal() on the WorkspaceAgentDevcontainerStatus type that returns true for any *-ing state? My understanding is that we don't want to allow any modifications to *-ing devcontainers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added an IsTransitional method for seeing if we're in Starting Stopping or Deleting. I didn't think it made sense to add Running in this since we're wanting to delete from that state.

We don't allow transitioning a Dev Container when it is currently being
transitioned
WorkspaceAgentDevcontainerStatusError WorkspaceAgentDevcontainerStatus = "error"
)

func (s WorkspaceAgentDevcontainerStatus) IsTransitional() bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Naming, I believe Transitioning better captures the what this is (Is could be dropped to as it's inferred via ing).

api.mu.Unlock()

httpapi.Write(ctx, w, http.StatusConflict, codersdk.Response{
Message: "Unable to delete transitioning Devcontainer",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Message: "Unable to delete transitioning Devcontainer",
Message: "Unable to delete transitioning devcontainer",

proc.stop()
}

api.mu.Unlock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We didn't update devcontainer state, re-injection can happen at this point.

Also, why do we unlock just to re-acquire lock 3 lines below?

Suggestion: Drop this Unlock and both next Lock's

Alternatively: Move stopping here, the no-container case will just transition from stopping very quickly which probably fine?

api.mu.Unlock()

httpapi.Write(ctx, w, http.StatusInternalServerError, codersdk.Response{
Message: "An internal error occurred stopping the container",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Message: "An internal error occurred stopping the container",
Message: "An error occurred stopping the container",

Nit: Internal suggests fault in api.

Detail: err.Error(),
})
return
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If following above advice, Lock here.

api.mu.Unlock()

httpapi.Write(ctx, w, http.StatusInternalServerError, codersdk.Response{
Message: "An internal error occurred removing the container",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Message: "An internal error occurred removing the container",
Message: "An error occurred removing the container",

httpapi.Write(ctx, w, http.StatusConflict, codersdk.Response{
Message: "Devcontainer recreation already in progress",
Detail: fmt.Sprintf("Recreation for devcontainer %q is already underway.", dc.Name),
Message: "Unable to recreate transitioning Devcontainer",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Message: "Unable to recreate transitioning Devcontainer",
Message: "Unable to recreate transitioning devcontainer",

containers: codersdk.WorkspaceAgentListContainersResponse{
Containers: []codersdk.WorkspaceAgentContainer{devContainer1},
},
arch: "<none>",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Missing comment explaining weirdness, I suppose this also suggests we don't want subagent injection for this test? A few other too.

},
},
lister: &fakeContainerCLI{
arch: "<none>",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of these tests seem to verify what happens with subagents present?

Seems relevant considering the race I identified with mutex unlocking and injection.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants