feat: filter irrelevant GitHub events before rule evaluation #50
dkargatzis merged 4 commits into warestack:main from
Conversation
📝 Walkthrough
This PR introduces centralized GitHub webhook event filtering by creating a new event filter module.
Changes
Sequence Diagram
sequenceDiagram
participant Client as Client
participant Dispatcher as Dispatcher
participant EventFilter as EventFilter
participant Queue as Queue
participant Processor as Processor
Client->>Dispatcher: POST webhook event
Dispatcher->>EventFilter: should_process_event(event)
alt Event is filtered
EventFilter-->>Dispatcher: FilterResult(should_process=false, reason)
Dispatcher-->>Client: return filtered status
else Event passes filter
EventFilter-->>Dispatcher: FilterResult(should_process=true)
Dispatcher->>Queue: enqueue event
Queue->>Processor: deliver event
Processor->>Processor: enrich & evaluate rules
Processor-->>Client: return processing result
end
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Possibly related issues
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
Codecov Report
❌ Patch coverage is
❌ Your project status has failed because the head coverage (67.5%) is below the target coverage (80.0%). You can increase the head coverage or adjust the target coverage.
@@ Coverage Diff @@
## main #50 +/- ##
=======================================
+ Coverage 67.0% 67.5% +0.5%
=======================================
Files 151 153 +2
Lines 9384 9533 +149
=======================================
+ Hits 6290 6443 +153
+ Misses 3094 3090 -4
Continue to review the full report in Codecov by Sentry.
🧹 Nitpick comments (6)
src/event_processors/push.py (2)
40-47: Good early exit, but the zero-SHA string is hardcoded (duplicated). Import NULL_SHA from src.core.utils.event_filter instead of repeating the magic string. This was already flagged in the event filter module review.
♻️ Suggested change
+from src.core.utils.event_filter import NULL_SHA
+...
-    if payload.get("deleted") or not payload.get("after") or payload.get("after") == "0000000000000000000000000000000000000000":
+    if payload.get("deleted") or not payload.get("after") or payload.get("after") == NULL_SHA:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/event_processors/push.py` around lines 40 - 47, Replace the hardcoded zero-SHA string in the early-exit check inside push.py with the shared constant NULL_SHA from src.core.utils.event_filter: import NULL_SHA at top, then use NULL_SHA in the condition that currently compares payload.get("after") == "0000000000000000000000000000000000000000". Keep the same logger.info("push_skipped_deleted_or_empty") and return of ProcessingResult unchanged.
117-119: Unreachable null-SHA branch — dead code after the early exit at line 40. The early return on line 40 already catches payload.get("after") == "000…0" and not payload.get("after"), so by the time execution reaches line 117, sha (which is payload.get("after")) can never be falsy or equal to the null SHA. This check is now dead code. Consider removing it or, if you want to keep it for safety, add a comment explaining it's a defensive guard.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/event_processors/push.py` around lines 117 - 119, The check of sha = payload.get("after") followed by if not sha or sha == "0000...0" in push.py is dead code because an earlier guard already returns when payload.get("after") is falsy or the null-SHA; remove the redundant conditional and its logger.warning to clean up the code, or if you prefer to keep it as a defensive guard, replace the conditional with a single-line comment noting it's defensive and leave the code path unreachable in normal flow; reference the variable sha and payload.get("after") when making the change so you update the correct block.
src/core/utils/event_filter.py (3)
50-58: Add full type annotations to dict parameters. The payload parameter uses bare dict without type arguments across all internal functions. Per coding guidelines, use dict[str, Any].
♻️ Suggested change
-def _apply_filters(event_type: EventType, payload: dict) -> FilterResult:
+def _apply_filters(event_type: EventType, payload: dict[str, Any]) -> FilterResult:
Apply the same to _filter_pull_request, _filter_push, and _is_repo_archived.
As per coding guidelines:
**/*.py: Use modern typing only: dict[str, Any], list[str], str | None (no Dict, List, Optional)
Also applies to: 61-77, 80-88, 91-93
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/core/utils/event_filter.py` around lines 50 - 58, Update all bare dict parameter annotations to use dict[str, Any] and add the necessary import of Any: change the signature of _apply_filters(event_type: EventType, payload: dict) -> FilterResult to payload: dict[str, Any], and do the same for _filter_pull_request, _filter_push, and _is_repo_archived (and any other internal helpers referenced in the diff). Ensure you import Any from typing at the top of the module and keep the return types unchanged (e.g., FilterResult) so type checking follows the project's modern typing conventions.
21-26: Consider making FilterResult frozen for immutability. Since FilterResult instances are never mutated after creation, frozen=True would enforce that invariant and align with the coding guidelines' preference for immutable internal state via dataclasses.
♻️ Suggested change
-@dataclass
+@dataclass(frozen=True)
 class FilterResult:
     """Result of event filter check."""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/core/utils/event_filter.py` around lines 21 - 26, FilterResult is mutable but never changed after creation; make it immutable by adding frozen=True to the dataclass decorator (i.e., change `@dataclass` to `@dataclass`(frozen=True)) so instances of FilterResult are read-only and enforce the intended invariant.
16-16: NULL_SHA is duplicated as hardcoded strings in src/event_processors/push.py (lines 40 and 118). The push processor inline-checks the same zero SHA string instead of importing this constant. Consider importing NULL_SHA from src/core/utils/event_filter in push.py to maintain a single source of truth.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/core/utils/event_filter.py` at line 16, Replace the duplicated hardcoded zero SHA strings in the push processor with the single source constant: import NULL_SHA from src.core.utils.event_filter into the push module and use NULL_SHA wherever the literal "0000000000000000000000000000000000000000" is currently compared (both occurrences in the push processor logic); update any conditional checks or assignments in functions within src/event_processors/push.py that reference the literal so they reference NULL_SHA instead.
src/event_processors/pull_request/processor.py (1)
1-1: Standard logging is used instead of structlog with structured fields. The coding guidelines specify structured logging at boundaries with fields like operation, subject_ids, decision, latency_ms. This file uses the standard logging module throughout (pre-existing), but the new log at lines 52-57 would benefit from structured logging, especially since the filter module (event_filter.py) already uses structlog. This is a pre-existing pattern in the file, so flagging as optional.
Also applies to: 14-14
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/event_processors/pull_request/processor.py` at line 1, This file imports the stdlib logging and adds a plain log where structured logging is required; change to structlog by importing structlog and acquiring a logger via structlog.get_logger(), then replace the plain log call (the new log around lines 52-57 in this module) with a structured log call that includes the fields operation, subject_ids, decision, and latency_ms (match the field names used in event_filter.py) — e.g., logger.info(...) or logger.bind(...).info(...) with those fields and a clear message; leave other pre-existing stdlib logging untouched if not needed but prefer structlog at this boundary.
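For illustration, the key-value style structlog produces can be approximated with the stdlib alone. The field names follow the guideline (operation, subject_ids, decision, latency_ms), while the event name and values below are hypothetical:

```python
import json
import logging


def log_structured(logger: logging.Logger, event: str, **fields: object) -> str:
    """Emit one JSON line per event, structlog-style, using stdlib logging."""
    line = json.dumps({"event": event, **fields}, sort_keys=True)
    logger.info(line)
    return line


line = log_structured(
    logging.getLogger("demo"),
    "pr_skipped_not_open",
    operation="process_pull_request",
    subject_ids={"repo": "warestack/watchflow"},
    decision="skip",
    latency_ms=3,
)
```

With structlog itself, the equivalent is `structlog.get_logger().info("pr_skipped_not_open", operation=..., ...)`; the point is that every boundary log carries the same machine-readable keys.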
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
src/core/utils/event_filter.py
src/event_processors/pull_request/processor.py
src/event_processors/push.py
src/webhooks/dispatcher.py
src/webhooks/handlers/pull_request.py
tests/integration/webhooks/test_webhook_flow.py
tests/unit/core/test_event_filter.py
tests/unit/event_processors/test_pull_request_processor.py
tests/unit/event_processors/test_push_processor.py
💤 Files with no reviewable changes (1)
- src/webhooks/handlers/pull_request.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/guidelines.mdc)
**/*.py: Use modern typing only: dict[str, Any], list[str], str | None (no Dict, List, Optional)
GitHub/HTTP/DB calls must be async def; avoid blocking calls (time.sleep, sync HTTP) in async paths
All agent outputs and external payloads must use validated BaseModel from Pydantic
Use dataclasses for internal immutable state where appropriate
Use structured logging at boundaries with fields: operation, subject_ids, decision, latency_ms
Implement Agent pattern: single-responsibility agents with typed inputs/outputs
Use Decorator pattern for retries, metrics, caching as cross-cutting concerns
Agent outputs must include: decision, confidence (0..1), short reasoning, recommendations, strategy_used
Implement confidence policy: reject or route to human-in-the-loop when confidence < 0.5
Use minimal, step-driven prompts; provide Chain-of-Thought only for complexity > 0.7 or ambiguity > 0.6
Strip secrets/PII from agent prompts; scope tools; keep raw reasoning out of logs (store summaries only)
Cache idempotent lookups; lazy-import heavy dependencies; bound fan-out with asyncio.Semaphore
Avoid redundant LLM calls; memoize per event when safe
Use domain errors (e.g., AgentError) with error_type, message, context, timestamp, retry_count
Use exponential backoff for transient failures; circuit-break noisy integrations when needed
Fail closed for risky decisions; provide actionable remediation in error paths
Validate all external inputs; verify webhook signatures
Implement prompt-injection hardening; sanitize repository content passed to LLMs
Performance targets: static validation typically <100ms, hybrid decisions sub-second when cache warm, budget LLM paths thoughtfully
Reject old typing syntax (Dict, List, Optional) in code review
Reject blocking calls in async code; reject bare except: clauses; reject swallowed errors
Reject LLM calls for trivial/deterministic checks
Reject unvalidated agent outputs and missing confidenc...
Files:
src/core/utils/event_filter.py
tests/unit/event_processors/test_pull_request_processor.py
tests/integration/webhooks/test_webhook_flow.py
src/event_processors/push.py
src/webhooks/dispatcher.py
tests/unit/core/test_event_filter.py
tests/unit/event_processors/test_push_processor.py
src/event_processors/pull_request/processor.py
tests/**/*.py
📄 CodeRabbit inference engine (.cursor/rules/guidelines.mdc)
tests/**/*.py: Write unit tests for deterministic rule evaluation (pass/warn/block), model validation, and error paths
Write integration tests for webhook parsing, idempotency, multi-agent coordination, and state persistence
Use pytest.mark.asyncio for async tests; avoid live network calls; freeze time and seed randomness
Write regression tests for every bug fix; keep CI coverage thresholds green
Files:
tests/unit/event_processors/test_pull_request_processor.py
tests/integration/webhooks/test_webhook_flow.py
tests/unit/core/test_event_filter.py
tests/unit/event_processors/test_push_processor.py
🧠 Learnings (1)
📚 Learning: 2026-01-31T19:35:22.504Z
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Applies to tests/**/*.py : Write integration tests for webhook parsing, idempotency, multi-agent coordination, and state persistence
Applied to files:
tests/integration/webhooks/test_webhook_flow.py
🧬 Code graph analysis (6)
src/core/utils/event_filter.py (1)
src/core/models.py (2)
EventType (91-105), WebhookEvent (116-144)
tests/unit/event_processors/test_pull_request_processor.py (4)
tests/unit/event_processors/test_push_processor.py (2)
processor (29-38), task (42-52)
src/tasks/task_queue.py (2)
Task (19-47), installation_id (41-47)
tests/unit/event_processors/pull_request/test_enricher.py (1)
enricher (14-15)
src/event_processors/pull_request/enricher.py (1)
enrich_event_data (39-90)
tests/integration/webhooks/test_webhook_flow.py (2)
src/webhooks/dispatcher.py (2)
WebhookDispatcher (13-57), register_handler (25-28)
src/tasks/task_queue.py (2)
TaskQueue (64-233), start_workers (209-215)
src/event_processors/push.py (1)
src/event_processors/base.py (1)
ProcessingResult (16-23)
src/webhooks/dispatcher.py (2)
src/core/utils/event_filter.py (1)
should_process_event (29-47)
src/webhooks/handlers/issue_comment.py (1)
event_type(18-19)
tests/unit/core/test_event_filter.py (2)
src/core/models.py (2)
EventType (91-105), WebhookEvent (116-144)
src/core/utils/event_filter.py (2)
FilterResult (22-26), should_process_event (29-47)
🪛 GitHub Actions: Run pre-commit hooks
src/event_processors/push.py
[error] 1-1: Ruff detected formatting issues; changes were applied by the hook during the run.
tests/unit/core/test_event_filter.py
[error] 1-1: Ruff detected formatting changes in test file; 2 files reformatted by ruff-format.
🔇 Additional comments (9)
src/core/utils/event_filter.py (1)
29-47: LGTM on the public API and logging. The should_process_event function provides clean structured logging with event_type, repo, and reason fields, and the overall dispatch to internal filters is well organized.
tests/unit/event_processors/test_pull_request_processor.py (1)
59-78: LGTM — good coverage for the closed-PR skip path. The test correctly validates the early-exit behavior: no enrichment or agent calls for closed/merged PRs, and the result is successful with no violations.
tests/unit/event_processors/test_push_processor.py (1)
89-106: LGTM — good coverage for the deleted-branch skip path. Correctly validates that the push processor's early exit skips engine execution and check run creation for deletion events.
src/webhooks/dispatcher.py (1)
45-47: LGTM — clean pre-filter integration. The filter is correctly placed after handler lookup (avoiding unnecessary filtering for unhandled events) and before enqueue. The return format is consistent with other dispatch outcomes.
src/event_processors/pull_request/processor.py (1)
51-63: LGTM — defense-in-depth guard for invalid PR states. Good secondary check in case the dispatcher's pre-filter is bypassed (e.g., direct processor invocation). The early return avoids unnecessary API calls and enrichment.
tests/integration/webhooks/test_webhook_flow.py (2)
235-276: LGTM — solid integration test for filtered event suppression. Good end-to-end validation that a deleted-branch push event is filtered at the dispatcher level and never reaches the handler. The payload is realistic and the assertion on call_count == 0 is precise.
38-57: Good payload updates to align with filter requirements. The addition of state, merged, and draft fields to the PR payload ensures these tests exercise the real filter path rather than accidentally getting filtered out.
tests/unit/core/test_event_filter.py (2)
1-8: Unused pytest import. pytest is imported but not used in any test (no parametrize, fixtures, marks, or raises). However, per review guidelines, I won't flag unused imports unless indicated by static analysis. The pipeline failure mentions Ruff formatting issues, which may address this.
11-134: Good test coverage across all filter paths. The tests thoroughly cover PR actions (opened/synchronize/reopened pass; closed filtered), PR states (merged, draft), push scenarios (valid, deleted, null SHA, empty after), archived repos, and passthrough for other event types. The _make_event helper keeps tests concise.
One minor gap: there's no test for a push event where the "after" key is entirely absent from the payload (as opposed to an empty string). This is a subtle edge case handled by the code (payload.get("after") returns None, and not None is True), but an explicit test would document the behavior.
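The missing edge case could be covered by a test along these lines; `should_process_push` below is a stand-in for the real filter, since the actual helper signatures are not shown here:

```python
NULL_SHA = "0" * 40


def should_process_push(payload: dict) -> bool:
    # A missing "after" key yields None, which is falsy, so the event is filtered.
    after = payload.get("after")
    return not (payload.get("deleted") or not after or after == NULL_SHA)


def test_push_filtered_when_after_key_absent():
    payload = {"ref": "refs/heads/main", "deleted": False}  # no "after" key at all
    assert should_process_push(payload) is False
```

Making the absent-key case explicit documents that `payload.get("after")` returning None is filtered, not just the empty-string and null-SHA variants.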
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/core/utils/event_filter.py`:
- Around line 50-58: Update all bare dict parameter annotations to use dict[str,
Any] and add the necessary import of Any: change the signature of
_apply_filters(event_type: EventType, payload: dict) -> FilterResult to payload:
dict[str, Any], and do the same for _filter_pull_request, _filter_push, and
_is_repo_archived (and any other internal helpers referenced in the diff).
Ensure you import Any from typing at the top of the module and keep the return
types unchanged (e.g., FilterResult) so type checking follows the project's
modern typing conventions.
- Around line 21-26: FilterResult is mutable but never changed after creation;
make it immutable by adding frozen=True to the dataclass decorator (i.e., change
`@dataclass` to `@dataclass`(frozen=True)) so instances of FilterResult are
read-only and enforce the intended invariant.
- Line 16: Replace the duplicated hardcoded zero SHA strings in the push
processor with the single source constant: import NULL_SHA from
src.core.utils.event_filter into the push module and use NULL_SHA wherever the
literal "0000000000000000000000000000000000000000" is currently compared (both
occurrences in the push processor logic); update any conditional checks or
assignments in functions within src/event_processors/push.py that reference the
literal so they reference NULL_SHA instead.
In `@src/event_processors/pull_request/processor.py`:
- Line 1: This file imports the stdlib logging and adds a plain log where
structured logging is required; change to structlog by importing structlog and
acquiring a logger via structlog.get_logger(), then replace the plain log call
(the new log around lines 52-57 in this module) with a structured log call that
includes the fields operation, subject_ids, decision, and latency_ms (match the
field names used in event_filter.py) — e.g., logger.info(...) or
logger.bind(...).info(...) with those fields and a clear message; leave other
pre-existing stdlib logging untouched if not needed but prefer structlog at this
boundary.
In `@src/event_processors/push.py`:
- Around line 40-47: Replace the hardcoded zero-SHA string in the early-exit
check inside push.py with the shared constant NULL_SHA from
src.core.utils.event_filter: import NULL_SHA at top, then use NULL_SHA in the
condition that currently compares payload.get("after") ==
"0000000000000000000000000000000000000000". Keep the same
logger.info("push_skipped_deleted_or_empty") and return of ProcessingResult
unchanged.
- Around line 117-119: The check of sha = payload.get("after") followed by if
not sha or sha == "0000...0" in push.py is dead code because an earlier guard
already returns when payload.get("after") is falsy or the null-SHA; remove the
redundant conditional and its logger.warning to clean up the code, or if you
prefer to keep it as a defensive guard, replace the conditional with a
single-line comment noting it's defensive and leave the code path unreachable in
normal flow; reference the variable sha and payload.get("after") when making the
change so you update the correct block.
dkargatzis left a comment
The code changes look very useful for optimizing event handling! However, the PR title is misleading and there are some opportunities to reduce duplication and improve typing.
src/core/utils/event_filter.py
Outdated
    return result


def _apply_filters(event_type: EventType, payload: dict) -> FilterResult:
Typing: Please update dict to dict[str, Any] for all payload parameters (here and in helper functions) to align with project standards.
    return ProcessingResult(
        success=True,
        violations=[],
        api_calls_made=0,
DRY / Constant: Please import NULL_SHA from src.core.utils.event_filter instead of hardcoding the "0000..." string here. This ensures consistency if we ever need to update how null SHAs are handled.
src/core/utils/event_filter.py
Outdated
PR_ACTIONS_PROCESS = frozenset({"opened", "synchronize", "reopened"})


@dataclass
Suggestion: Make FilterResult immutable by adding frozen=True to the @dataclass decorator. This prevents accidental modification of the filter decision downstream.
@dkargatzis
Actionable comments posted: 2
🧹 Nitpick comments (3)
src/event_processors/push.py (1)
119-120: Hardcoded null SHA still present despite importing NULL_SHA. Line 7 imports NULL_SHA and the new early-exit guard (line 41) uses it, but this pre-existing check at line 119 still embeds the literal 40-zero string. This was the root of the past review comment and should be updated to complete the fix.
♻️ Use the constant
-    if not sha or sha == "0000000000000000000000000000000000000000":
+    if not sha or sha == NULL_SHA:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/event_processors/push.py` around lines 119 - 120, Replace the hardcoded 40-zero SHA literal in the early-exit guard with the imported NULL_SHA constant: update the conditional that currently reads if not sha or sha == "0000000000000000000000000000000000000000": to use NULL_SHA (i.e., if not sha or sha == NULL_SHA) and keep the existing logger.warning call (logger.warning("No valid commit SHA found, skipping check run")) unchanged; this ensures consistency with the earlier guard that already references NULL_SHA.
src/core/utils/event_filter.py (2)
72-73: merged check is unreachable dead code. GitHub's API guarantees that a merged PR has state == "closed" and merged == True. Any PR that reaches line 72 has already passed the state != "open" guard on line 69, so merged can never be True at that point. The check is a no-op and adds confusion about invariants.
♻️ Remove unreachable check
     if state != "open":
         return FilterResult(should_process=False, reason=f"PR state '{state}' not open")
-    if pr.get("merged"):
-        return FilterResult(should_process=False, reason="PR already merged")
     if pr.get("draft"):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/core/utils/event_filter.py` around lines 72 - 73, The pr.get("merged") branch in event_filter.py is unreachable because any merged PR will already satisfy state == "closed" and has passed the earlier state != "open" guard, so remove the if pr.get("merged") -> return FilterResult(...) block; update any comments/tests that reference this redundant merged check and keep the existing state-based filtering logic (the FilterResult usage and surrounding function in event_filter.py should remain unchanged).
44-44: Unnecessary hasattr guard — EventType is always an Enum. event_type is typed as EventType, which extends both str and Enum, so .value always exists. The ternary is dead defensive code.
♻️ Simplify
-    event_type=event_type.value if hasattr(event_type, "value") else str(event_type),
+    event_type=event_type.value,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/core/utils/event_filter.py` at line 44, The ternary using hasattr(event_type, "value") is unnecessary because EventType is an Enum with a .value; replace the expression event_type=event_type.value if hasattr(event_type, "value") else str(event_type) with a direct use of event_type.value (i.e., always use event_type.value) in the code that constructs the event (referencing the event_type variable / EventType enum in event_filter.py).
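A quick check confirms why the hasattr guard is redundant for a str-backed Enum; this EventType is a minimal stand-in for the real one in src/core/models.py:

```python
from enum import Enum


class EventType(str, Enum):
    """Minimal stand-in; the real enum lives in src/core/models.py."""
    PUSH = "push"
    PULL_REQUEST = "pull_request"


# Every Enum member carries .value, so hasattr(event_type, "value") is always True.
assert hasattr(EventType.PUSH, "value")
assert EventType.PUSH.value == "push"
assert isinstance(EventType.PUSH, str)  # str subclass, usable directly in f-strings
```

Since the annotation already restricts the argument to EventType, `event_type.value` can be used unconditionally.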
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/core/utils/event_filter.py`:
- Around line 42-47: The structured log call for "event_filtered" is missing
required fields; update the logger.info invocation (the one using event_type,
event.repo_full_name and result.reason in event_filter.py) to include operation
(e.g., "event_filter" or event_type), subject_ids (pull from event.subject_ids
or event.get_subject_ids()), decision (use result.decision or
result.allowed/blocked), and latency_ms (capture a start timestamp before
filtering and compute int((time.time() - start_ts) * 1000) after; import time if
needed); keep the existing repo and reason fields and add these four keys to the
structured payload.
In `@src/event_processors/push.py`:
- Around line 41-48: The info log emitted when skipping deleted/empty pushes is
unstructured; update the logger.info call inside the push skip branch (the block
guarded by payload.get("deleted") or not payload.get("after") or
payload.get("after") == NULL_SHA) to emit structured fields including operation
(e.g., "push.process"), subject_ids (derive from payload — e.g.,
payload.get("repository", {}).get("id") or payload.get("ref")/commit id),
decision ("skipped_deleted_or_empty"), and latency_ms (use the computed
int((time.time() - start_time) * 1000) value already used for
ProcessingResult.processing_time_ms). Keep the ProcessingResult return as-is but
ensure the log fields match those names exactly so callers can correlate logs
and the returned processing_time_ms.
---
Nitpick comments:
In `@src/core/utils/event_filter.py`:
- Around line 72-73: The pr.get("merged") branch in event_filter.py is
unreachable because any merged PR will already satisfy state == "closed" and has
passed the earlier state != "open" guard, so remove the if pr.get("merged") ->
return FilterResult(...) block; update any comments/tests that reference this
redundant merged check and keep the existing state-based filtering logic (the
FilterResult usage and surrounding function in event_filter.py should remain
unchanged).
- Line 44: The ternary using hasattr(event_type, "value") is unnecessary because
EventType is an Enum with a .value; replace the expression
event_type=event_type.value if hasattr(event_type, "value") else str(event_type)
with a direct use of event_type.value (i.e., always use event_type.value) in the
code that constructs the event (referencing the event_type variable / EventType
enum in event_filter.py).
In `@src/event_processors/push.py`:
- Around line 119-120: Replace the hardcoded 40-zero SHA literal in the
early-exit guard with the imported NULL_SHA constant: in
src/event_processors/push.py update the conditional that currently reads if not
sha or sha == "0000000000000000000000000000000000000000": to use NULL_SHA (i.e.,
if not sha or sha == NULL_SHA) and keep the existing logger.warning call
(logger.warning("No valid commit SHA found, skipping check run")) unchanged;
this ensures consistency with the earlier guard that already references
NULL_SHA.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/core/utils/event_filter.py
src/event_processors/push.py
tests/unit/core/test_event_filter.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/unit/core/test_event_filter.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.cursor/rules/guidelines.mdc)
Files:
src/core/utils/event_filter.py
src/event_processors/push.py
🧠 Learnings (2)
📚 Learning: 2026-01-31T19:35:22.504Z
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Applies to **/*.py : Use modern typing only: `dict[str, Any]`, `list[str]`, `str | None` (no `Dict`, `List`, `Optional`)
Applied to files:
src/core/utils/event_filter.py
📚 Learning: 2026-01-31T19:35:22.504Z
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Applies to **/*.py : Reject old typing syntax (`Dict`, `List`, `Optional`) in code review
Applied to files:
src/core/utils/event_filter.py
🧬 Code graph analysis (1)
src/core/utils/event_filter.py (1)
src/core/models.py (2)
EventType (91-105), WebhookEvent (116-144)
logger.info(
    "event_filtered",
    event_type=event_type.value if hasattr(event_type, "value") else str(event_type),
    repo=event.repo_full_name,
    reason=result.reason,
)
Structured log at filter boundary is missing required fields per guidelines
The coding guidelines require structured log entries at boundaries to carry operation, subject_ids, decision, and latency_ms. The current log omits all four.
♻️ Proposed fix
- logger.info(
- "event_filtered",
- event_type=event_type.value if hasattr(event_type, "value") else str(event_type),
- repo=event.repo_full_name,
- reason=result.reason,
- )
+ logger.info(
+ "event_filtered",
+ operation="should_process_event",
+ subject_ids={"repo": event.repo_full_name, "delivery_id": event.delivery_id},
+ decision="skip",
+ event_type=event_type.value,
+ reason=result.reason,
+ )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/core/utils/event_filter.py` around lines 42 - 47, The structured log call
for "event_filtered" is missing required fields; update the logger.info
invocation (the one using event_type, event.repo_full_name and result.reason in
event_filter.py) to include operation (e.g., "event_filter" or event_type),
subject_ids (pull from event.subject_ids or event.get_subject_ids()), decision
(use result.decision or result.allowed/blocked), and latency_ms (capture a start
timestamp before filtering and compute int((time.time() - start_ts) * 1000)
after; import time if needed); keep the existing repo and reason fields and add
these four keys to the structured payload.
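The latency_ms capture suggested above can be sketched as follows; the filter body is a placeholder, only the timing pattern matters:

```python
import time


def filter_with_latency(payload: dict) -> tuple[bool, int]:
    """Time the filter decision so the boundary log can carry latency_ms."""
    start_ts = time.time()
    allowed = bool(payload.get("after"))  # placeholder for the real filter logic
    latency_ms = int((time.time() - start_ts) * 1000)
    return allowed, latency_ms


allowed, latency_ms = filter_with_latency({"after": "abc123"})
```

The returned latency_ms is then passed as a field on the structured log call rather than interpolated into the message string.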
if payload.get("deleted") or not payload.get("after") or payload.get("after") == NULL_SHA:
    logger.info("push_skipped_deleted_or_empty")
    return ProcessingResult(
        success=True,
        violations=[],
        api_calls_made=0,
        processing_time_ms=int((time.time() - start_time) * 1000),
    )
Unstructured log for the new skip path; missing required guideline fields
logger.info("push_skipped_deleted_or_empty") is a bare string with no structured fields. Guidelines require operation, subject_ids, decision, and latency_ms at processing boundaries.
♻️ Add structured fields
- logger.info("push_skipped_deleted_or_empty")
+ logger.info(
+ "push_skipped_deleted_or_empty",
+ operation="process_push",
+ subject_ids={"repo": task.repo_full_name, "ref": ref},
+ decision="skip",
+ latency_ms=int((time.time() - start_time) * 1000),
+ )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/event_processors/push.py` around lines 41 - 48, The info log emitted when
skipping deleted/empty pushes is unstructured; update the logger.info call
inside the push skip branch (the block guarded by payload.get("deleted") or not
payload.get("after") or payload.get("after") == NULL_SHA) to emit structured
fields including operation (e.g., "push.process"), subject_ids (derive from
payload — e.g., payload.get("repository", {}).get("id") or
payload.get("ref")/commit id), decision ("skipped_deleted_or_empty"), and
latency_ms (use the computed int((time.time() - start_time) * 1000) value
already used for ProcessingResult.processing_time_ms). Keep the ProcessingResult
return as-is but ensure the log fields match those names exactly so callers can
correlate logs and the returned processing_time_ms.
- Add centralized event_filter.py module for filtering irrelevant events
- Filter archived repositories, closed/merged/draft PRs, deleted branches
- Add FilterResult dataclass (frozen=True for immutability)
- Add NULL_SHA constant for consistent null SHA handling
- Migrate push.py and pull_request.py from stdlib logging to structlog
- Add event filtering at dispatcher level before handler dispatch
- Add delivery_id parameter to WebhookEvent for better tracing
- Add comprehensive unit tests for event filtering logic

Closes #50

Signed-off-by: Dimitris Kargatzis <dkargatzis@gmail.com>
feat: add event filtering to prevent rule evaluation on irrelevant GitHub events
Implement intelligent event filtering to skip rule engine evaluation on
irrelevant GitHub events. This prevents unnecessary processing and user
confusion from comments on closed/merged PRs or branch deletion events.
Changes:
fix: #25