-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat: add boundary log forwarding from agent to coderd #21345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
334f1ca to
ab19300
Compare
ced7c18 to
adf8dda
Compare
Add receiving boundary logs via stream unix socket in the agent, forwarding of boundary audit logs from agent to coderd via agent API, and re-emission of boundary logs to coderd stderr. Log format example: [API] 2025-12-23 18:31:46.755 [info] coderd.agentrpc: boundary_request owner=.. workspace_name=.. agent_name=.. decision=.. workspace_id=.. http_method=.. http_url=.. event_time=.. request_id=..
adf8dda to
752536f
Compare
dannykopping
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before we proceed, we need to extract the message framing into its own package so it can be shared. There's a lot of duplication across this PR and https://github.com/coder/boundary/pull/124, and it will lead to inconsistencies.
We also need a more resilient approach towards the socket addressing; /tmp is not guaranteed to exist.
| conn, err := s.listener.Accept() | ||
| if err != nil { | ||
| if ctx.Err() != nil { | ||
| return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd advise against silently dropping context errors.
In this case you could log that the accept loop has terminated.
| s.wg.Add(1) | ||
| go func() { | ||
| defer s.wg.Done() | ||
| <-ctx.Done() | ||
| _ = conn.Close() | ||
| }() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you not move the conn.Close to the same defer as cancel() but run it afterwards?
I'm not seeing why we need the extra goroutine.
| // Even though the length of data received can be larger than maxMsgSize, | ||
| // practically they are not expected to be. This is a sanity check and | ||
| // allows re-using a small fixed size read buffer. | ||
| const maxMsgSize = 1 << 15 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The wire format allows for 2^28=256MiB but here you're imposing a 2^15=32KiB limit.
It'd be helpful to move this higher up, or document it along with the wire protocol description.
|
|
||
| // startBoundaryLogProxyServer starts the boundary log proxy socket server. | ||
| func (a *agent) startBoundaryLogProxyServer() { | ||
| const boundaryAuditSocketPath = "/tmp/boundary-audit.sock" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/tmp is not guaranteed to exist.
I'd suggest passing an env var down with the path to the socket which both the agent and boundary will use.
| func sendMessage(t *testing.T, conn net.Conn, req *agentproto.ReportBoundaryLogsRequest) { | ||
| t.Helper() | ||
|
|
||
| data, err := proto.Marshal(req) | ||
| if err != nil { | ||
| //nolint:gocritic // In tests we're not worried about conn being nil. | ||
| t.Errorf("%s marshal req: %s", conn.LocalAddr().String(), err) | ||
| } | ||
|
|
||
| var header uint32 | ||
| //nolint:gosec // In tests we're not worried about possible integer overflow. | ||
| header |= uint32(len(data)) | ||
| header |= 1 << 28 | ||
|
|
||
| err = binary.Write(conn, binary.BigEndian, header) | ||
| if err != nil { | ||
| //nolint:gocritic // In tests we're not worried about conn being nil. | ||
| t.Errorf("%s write conn length: %s", conn.LocalAddr().String(), err) | ||
| } | ||
|
|
||
| _, err = conn.Write(data) | ||
| if err != nil { | ||
| //nolint:gocritic // In tests we're not worried about conn being nil. | ||
| t.Errorf("%s write conn data: %s", conn.LocalAddr().String(), err) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicated code from sendBoundaryLogsRequest; see my review of the boundary code which suggests extract the framing into its own package for reuse.
| // practically they are not expected to be. This is a sanity check and | ||
| // allows re-using a small fixed size read buffer. | ||
| const maxMsgSize = 1 << 15 | ||
| buf := make([]byte, maxMsgSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each conn is prospectively allocating 32KiB; we may never use that full buffer.
I'd suggest preallocating something smaller & more reasonable, and grow the slice if/when needed.
| slog.F("decision", allowBoolToString(l.Allowed)), | ||
| slog.F("workspace_id", a.WorkspaceID.String()), | ||
| slog.F("http_method", r.HttpRequest.Method), | ||
| slog.F("http_url", r.HttpRequest.Url), | ||
| slog.F("event_time", logTime.Format(time.RFC3339Nano)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are the other fields injected?
- owner
- workspace_name
- agent_name
- request_id
Does the logger already have these?
| @@ -0,0 +1,588 @@ | |||
| //go:build linux || darwin | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about Windows? :thinking_face:
| switch r := l.Resource.(type) { | ||
| case *agentproto.BoundaryLog_HttpRequest_: | ||
| if r.HttpRequest == nil { | ||
| continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could hide protocol bugs; you should log this.
| slog.F("event_time", logTime.Format(time.RFC3339Nano)), | ||
| } | ||
| if l.Allowed { | ||
| fields = append(fields, slog.F("matched_rule", r.HttpRequest.MatchedRule)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to log which rule caused the denial? That would also be very useful.
| } | ||
|
|
||
| a.Log.With(fields...).Info(ctx, "boundary_request") | ||
| default: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably handle the default case a bit better. Maybe log something like "unexpected type"?
Add the agent forwarding of boundary audit logs from workspaces to coderd via the agent API, and re-emission of boundary logs to coderd stderr.
Log format example:
Corresponding boundary PR: coder/boundary#124
RFC: https://www.notion.so/coderhq/Agent-Boundary-Logs-2afd579be59280f29629fc9823ac41ba
#21280