Skip to content

Conversation

@zedkipp
Copy link
Contributor

@zedkipp zedkipp commented Dec 20, 2025

Add the agent forwarding of boundary audit logs from workspaces to coderd via the agent API, and re-emission of boundary logs to coderd stderr.

Log format example:

[API] 2025-12-23 18:31:46.755 [info] coderd.agentrpc: boundary_request owner=.. workspace_name=.. agent_name=.. decision=.. workspace_id=.. http_method=.. http_url=.. event_time=.. request_id=..

Corresponding boundary PR: coder/boundary#124
RFC: https://www.notion.so/coderhq/Agent-Boundary-Logs-2afd579be59280f29629fc9823ac41ba
#21280

@zedkipp zedkipp force-pushed the zedkipp/boundary-logs branch 3 times, most recently from 334f1ca to ab19300 Compare December 22, 2025 17:55
@zedkipp zedkipp force-pushed the zedkipp/boundary-logs branch 4 times, most recently from ced7c18 to adf8dda Compare December 23, 2025 19:01
@zedkipp zedkipp marked this pull request as ready for review December 23, 2025 21:39
Add receiving boundary logs via stream unix socket in the agent, forwarding
of boundary audit logs from agent to coderd via agent API, and re-emission
of boundary logs to coderd stderr.

Log format example:
[API] 2025-12-23 18:31:46.755 [info] coderd.agentrpc: boundary_request owner=.. workspace_name=.. agent_name=.. decision=.. workspace_id=.. http_method=.. http_url=.. event_time=.. request_id=..
@zedkipp zedkipp force-pushed the zedkipp/boundary-logs branch from adf8dda to 752536f Compare December 24, 2025 00:23
Copy link
Contributor

@dannykopping dannykopping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we proceed, we need to extract the message framing into its own package so it can be shared. There's a lot of duplication across this PR and https://github.com/coder/boundary/pull/124, and it will lead to inconsistencies.

We also need a more resilient approach towards the socket addressing; /tmp is not guaranteed to exist.

conn, err := s.listener.Accept()
if err != nil {
if ctx.Err() != nil {
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd advise against silently dropping context errors.

In this case you could log that the accept loop has terminated.

Comment on lines +134 to +139
s.wg.Add(1)
go func() {
defer s.wg.Done()
<-ctx.Done()
_ = conn.Close()
}()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you not move the conn.Close to the same defer as cancel() but run it afterwards?
I'm not seeing why we need the extra goroutine.

// Even though the length of data received can be larger than maxMsgSize,
// practically they are not expected to be. This is a sanity check and
// allows re-using a small fixed size read buffer.
const maxMsgSize = 1 << 15
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The wire format allows for 2^28=256MiB but here you're imposing a 2^15=32KiB limit.
It'd be helpful to move this higher up, or document it along with the wire protocol description.


// startBoundaryLogProxyServer starts the boundary log proxy socket server.
func (a *agent) startBoundaryLogProxyServer() {
const boundaryAuditSocketPath = "/tmp/boundary-audit.sock"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/tmp is not guaranteed to exist.

I'd suggest passing an env var down with the path to the socket which both the agent and boundary will use.

Comment on lines +24 to +48
func sendMessage(t *testing.T, conn net.Conn, req *agentproto.ReportBoundaryLogsRequest) {
t.Helper()

data, err := proto.Marshal(req)
if err != nil {
//nolint:gocritic // In tests we're not worried about conn being nil.
t.Errorf("%s marshal req: %s", conn.LocalAddr().String(), err)
}

var header uint32
//nolint:gosec // In tests we're not worried about possible integer overflow.
header |= uint32(len(data))
header |= 1 << 28

err = binary.Write(conn, binary.BigEndian, header)
if err != nil {
//nolint:gocritic // In tests we're not worried about conn being nil.
t.Errorf("%s write conn length: %s", conn.LocalAddr().String(), err)
}

_, err = conn.Write(data)
if err != nil {
//nolint:gocritic // In tests we're not worried about conn being nil.
t.Errorf("%s write conn data: %s", conn.LocalAddr().String(), err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicated code from sendBoundaryLogsRequest; see my review of the boundary code which suggests extract the framing into its own package for reuse.

// practically they are not expected to be. This is a sanity check and
// allows re-using a small fixed size read buffer.
const maxMsgSize = 1 << 15
buf := make([]byte, maxMsgSize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each conn is prospectively allocating 32KiB; we may never use that full buffer.

I'd suggest preallocating something smaller & more reasonable, and grow the slice if/when needed.

Comment on lines +33 to +37
slog.F("decision", allowBoolToString(l.Allowed)),
slog.F("workspace_id", a.WorkspaceID.String()),
slog.F("http_method", r.HttpRequest.Method),
slog.F("http_url", r.HttpRequest.Url),
slog.F("event_time", logTime.Format(time.RFC3339Nano)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are the other fields injected?

  • owner
  • workspace_name
  • agent_name
  • request_id

Does the logger already have these?

@@ -0,0 +1,588 @@
//go:build linux || darwin
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about Windows? :thinking_face:

switch r := l.Resource.(type) {
case *agentproto.BoundaryLog_HttpRequest_:
if r.HttpRequest == nil {
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could hide protocol bugs; you should log this.

slog.F("event_time", logTime.Format(time.RFC3339Nano)),
}
if l.Allowed {
fields = append(fields, slog.F("matched_rule", r.HttpRequest.MatchedRule))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to log which rule caused the denial? That would also be very useful.

}

a.Log.With(fields...).Info(ctx, "boundary_request")
default:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably handle the default case a bit better. Maybe log something like "unexpected type"?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants