
fix: resolve deployment status blocking without clear reason#49

Merged
dkargatzis merged 6 commits into warestack:main from naaa760:fix/res-dep-sts-blc-res
Feb 26, 2026

Conversation

Contributor

@naaa760 naaa760 commented Feb 17, 2026

fix: #26

fix: resolve deployment status blocking without clear reason

  • Always call callback URL on exceptions (fallback approval)
  • Add 30s timeout for agent execution
  • Validate callback_url and environment before API calls
  • Add retry with exponential backoff for GitHub API calls
  • Fix scheduler re-evaluation (use evaluation_result.violations)
  • Remove non-time violations from scheduler
  • Add structured logging and unit tests
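For readers outside the repo, the timeout and retry behavior described above can be sketched roughly as follows. This is a minimal asyncio illustration, not the repo's code: the actual execute_with_timeout and retry_async helpers in src/core/utils may differ in signature and defaults.

```python
import asyncio
import random

AGENT_TIMEOUT_SECONDS = 30  # matches the 30s agent timeout described above

async def execute_with_timeout(coro_fn, timeout, *args, **kwargs):
    # Cancel the agent call if it exceeds the budget; a TimeoutError
    # propagates to the caller, which then takes the fallback-approval path.
    return await asyncio.wait_for(coro_fn(*args, **kwargs), timeout=timeout)

async def retry_async(fn, *args, retries=3, base_delay=1.0, max_delay=10.0, **kwargs):
    # Exponential backoff with jitter for transient GitHub API failures.
    for attempt in range(retries + 1):
        try:
            return await fn(*args, **kwargs)
        except Exception:
            if attempt == retries:
                raise
            delay = min(base_delay * (2 ** attempt), max_delay)
            await asyncio.sleep(delay + random.uniform(0, 0.1))
```

The key property is that failures are bounded in both directions: a hung agent cannot block the deployment past the timeout, and a flaky GitHub call is retried a fixed number of times before the error surfaces.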

Summary by CodeRabbit

  • New Features

    • Added timeout and retry safeguards for agent analysis and approval submissions; structured processing events and richer deployment metadata.
    • Scheduler now detects and removes expired or invalid pending deployments.
  • Bug Fixes

    • Improved validation and early-exit for missing/invalid callback URLs or environments; fallback approval on certain failures.
    • Better serialization and handling of violation details with more robust approval/rejection retries.
  • Tests

    • Added unit tests for protection flows, fallback behavior, timeouts, and validation helpers.

@naaa760 naaa760 requested a review from dkargatzis as a code owner February 17, 2026 12:48

coderabbitai bot commented Feb 17, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds timeout and retry around agent execution and GitHub API calls, adds callback/environment validators, restructures approval/rejection flows with fallback approvals and structured logging, and changes scheduler re-evaluation to return (should_approve, should_remove).

Changes

Cohort / File(s) Summary
Deployment Protection Rule Processing
src/event_processors/deployment_protection_rule.py
Added AGENT_TIMEOUT_SECONDS; wrap agent execution with execute_with_timeout; add _is_valid_callback_url and _is_valid_environment; add retry (retry_async) for approve/reject calls via new _send_deployment_review; emit deployment_processing_start/complete/error events; serialize violations; fallback approval on failures; loosen _format_violations_comment typing.
Deployment Scheduler Re-evaluation
src/tasks/scheduler/deployment_scheduler.py
Added AGENT_TIMEOUT_SECONDS, MAX_CONSECUTIVE_FAILURES; use execute_with_timeout for engine/agent calls; _re_evaluate_deployment now returns (should_approve, should_remove); add _approve_deployment with retry; normalized UTC timestamps and structured logging; improved pending removal/time-based logic.
Unit Tests for Processor
tests/unit/event_processors/test_deployment_protection_rule.py
New tests covering agent exceptions/timeouts, fallback approval behavior, missing/invalid callback URL handling, validators _is_valid_callback_url/_is_valid_environment, retry exhaustion, and time-based violation scheduling; uses monkeypatch/mocks.
Utilities / Imports
src/core/utils/retry.py, src/core/utils/timeout.py (imports used)
Imported and applied retry_async and execute_with_timeout in processor and scheduler flows to add retries and timeouts (usage only; implementations unchanged).

Sequence Diagram(s)

sequenceDiagram
    participant Proc as Processor
    participant Timeout as Timeout Wrapper (execute_with_timeout)
    participant Agent as AI Agent
    participant Retry as retry_async
    participant API as GitHub API

    Proc->>Timeout: execute_with_timeout(agent.execute, AGENT_TIMEOUT_SECONDS)
    alt agent completes
        Timeout->>Agent: execute
        Agent-->>Timeout: result (violations)
        Timeout-->>Proc: result
        Proc->>Proc: normalize/serialize violations
        alt violations present
            Proc->>Retry: retry_async(_reject_deployment)
            Retry->>API: POST rejection callback
            API-->>Retry: success / error
            Retry-->>Proc: done
        else no violations
            Proc->>Retry: retry_async(_approve_deployment)
            Retry->>API: POST approval callback
            API-->>Retry: success / error
            Retry-->>Proc: done
        end
    else timeout/error
        Timeout-->>Proc: Exception
        Proc->>Proc: log error, compute processing_time
        alt valid callback & env
            Proc->>Retry: retry_async(_approve_deployment) with fallback comment
            Retry->>API: POST approval callback
            API-->>Proc: result
        else
            Proc-->>Proc: early return (no callback/action)
        end
    end
    Proc-->>Proc: emit deployment_processing_complete / error events
sequenceDiagram
    participant Scheduler as Deployment Scheduler
    participant Timeout as Timeout Wrapper (execute_with_timeout)
    participant Agent as AI Agent
    participant Evaluator as _re_evaluate_deployment
    participant API as GitHub API

    Scheduler->>Scheduler: _check_pending_deployments loop
    Scheduler->>Timeout: execute_with_timeout(agent.command, AGENT_TIMEOUT_SECONDS)
    alt execution succeeds
        Timeout->>Agent: run command
        Agent-->>Timeout: result
        Timeout-->>Scheduler: result
        Scheduler->>Evaluator: _re_evaluate_deployment(deployment)
        Evaluator-->>Scheduler: (should_approve, should_remove)
        alt should_approve
            Scheduler->>API: approve via retry_async
            API-->>Scheduler: success/failure
        else should_remove
            Scheduler->>Scheduler: remove pending deployment
        else
            Scheduler->>Scheduler: reschedule (time-based)
        end
    else timeout/error
        Timeout-->>Scheduler: error
        Scheduler->>Scheduler: increment failure_count / possibly remove after MAX_CONSECUTIVE_FAILURES
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through timeouts, retried with care,

When agents falter, I nudge approvals fair.
Violations shaped, in logs I leave a clue,
No deployment waiting—I'll see it through. 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 27.27%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)

  • Description Check — ✅ Passed: Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed: The title 'fix: resolve deployment status blocking without clear reason' accurately reflects the main objective of preventing deployments from remaining stuck in a waiting state.
  • Linked Issues Check — ✅ Passed: The implementation meets the core objectives: adds a 30s timeout (AGENT_TIMEOUT_SECONDS), validates callback_url and environment, implements retry logic, adds fallback approval, restructures scheduler re-evaluation, and includes comprehensive tests.
  • Out of Scope Changes Check — ✅ Passed: All changes directly address the issue #26 requirements: timeout handling, validation, retry logic, fallback approvals, scheduler fixes, and test coverage. No out-of-scope modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/tasks/scheduler/deployment_scheduler.py (1)

74-90: ⚠️ Potential issue | 🟠 Major

callback_url missing from required fields can silently leave deployments blocked.

callback_url is not listed in required_fields (Lines 76–86), so a deployment can be added to the scheduler without one. Later, _approve_deployment (Line 324) silently returns when callback_url is missing, meaning the deployment will never be approved or rejected — exactly the "stuck in waiting" state this PR aims to fix.

Also, the falsy check on Line 87 (not deployment_data.get(field)) will reject valid values like deployment_id=0 or installation_id=0. Use a None check instead.

Proposed fix
         required_fields = [
             "deployment_id",
             "repo",
             "installation_id",
             "environment",
             "event_data",
             "rules",
             "violations",
             "time_based_violations",
             "created_at",
+            "callback_url",
         ]
-        missing_fields = [field for field in required_fields if not deployment_data.get(field)]
+        missing_fields = [field for field in required_fields if deployment_data.get(field) is None]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tasks/scheduler/deployment_scheduler.py` around lines 74 - 90, The
validation in required_fields is missing "callback_url" and uses a falsy check
that rejects valid zero values; update the required_fields list to include
"callback_url" and change the missing_fields computation from using not
deployment_data.get(field) to an explicit check like deployment_data.get(field)
is None so only None/missing values fail; ensure this validation (around the
required_fields and missing_fields logic that logs via logger.error and returns)
prevents silently enqueuing deployments that _approve_deployment will later skip
due to a missing callback_url.
src/event_processors/deployment_protection_rule.py (2)

176-189: ⚠️ Potential issue | 🟡 Minor

Inconsistent environment key in scheduler payload.

Line 181 uses deployment.get("environment") (from the nested deployment dict in the webhook payload), while Line 44 extracts environment from the top-level payload.get("environment"). These could differ. Use the already-extracted environment variable for consistency with the rest of the method.

Proposed fix
-                        "environment": deployment.get("environment"),
+                        "environment": environment,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/event_processors/deployment_protection_rule.py` around lines 176 - 189,
The scheduler payload is using deployment.get("environment") while the handler
earlier extracted environment = payload.get("environment"); change the payload
to use that already-extracted environment variable instead of reading from the
nested deployment dict so the value is consistent across the method; update the
dict passed to get_deployment_scheduler().add_pending_deployment(...) to set
"environment": environment (and leave all other keys as-is) to ensure
get_deployment_scheduler and downstream code receive the same environment value
the rest of the function uses.

332-348: ⚠️ Potential issue | 🟡 Minor

_convert_rules_to_new_format will crash on dict rules; add dict handling.

This method accesses rule.description, rule.enabled, rule.severity, rule.event_types directly, assuming Rule objects. However, the upstream filtering logic at lines 80–100 explicitly handles both objects (hasattr(r, "event_types")) and dicts (isinstance(r, dict)), meaning dicts can reach this converter. The scheduler's static method DeploymentScheduler._convert_rules_to_new_format uses only .get() calls, safe for dicts. Add a type check (isinstance(rule, dict)) or ensure only Rule objects are passed downstream.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/event_processors/deployment_protection_rule.py` around lines 332 - 348,
The _convert_rules_to_new_format function currently assumes rule is an object
and will attribute-access fields (description, enabled, severity, event_types,
parameters) which crashes for dict-form rules; update
_convert_rules_to_new_format to first check if isinstance(rule, dict) and, for
dicts, read values via rule.get(...) with safe defaults (e.g., empty dict/list),
otherwise keep the existing object-path behavior; also handle
backwards-compatible extraction from "conditions" for both dicts (e.g., iterate
rule.get("conditions", [])) and objects (rule.conditions) and normalize
severity/event_types by checking for .value or using the raw value as currently
done.
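A defensive converter along the lines this comment suggests might look like the following. Names and defaults are illustrative; the real Rule fields and conversion logic live in the repo.

```python
from typing import Any

def convert_rule(rule: Any) -> dict[str, Any]:
    # Handle both dict-form rules and Rule-like objects instead of
    # assuming attribute access, which crashes on dicts.
    if isinstance(rule, dict):
        description = rule.get("description", "")
        enabled = rule.get("enabled", True)
        severity = rule.get("severity", "")
        event_types = rule.get("event_types", [])
    else:
        description = getattr(rule, "description", "")
        enabled = getattr(rule, "enabled", True)
        severity = getattr(rule, "severity", "")
        event_types = getattr(rule, "event_types", [])
    # Normalize enum-like values (e.g., Severity.HIGH -> "high").
    severity = getattr(severity, "value", severity)
    event_types = [getattr(e, "value", e) for e in event_types]
    return {
        "description": description,
        "enabled": enabled,
        "severity": severity,
        "event_types": event_types,
    }
```

This mirrors the scheduler's `.get()`-only approach while still supporting Rule objects, so either shape can flow through the same code path.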
🧹 Nitpick comments (5)
src/event_processors/deployment_protection_rule.py (3)

1-12: Uses stdlib logging while scheduler uses structlog.

This file imports logging (Line 1) while deployment_scheduler.py uses structlog. The structured logging calls here pass extra={} dicts, which works with stdlib but won't produce the same output format as structlog's keyword arguments. For consistency across the deployment processing pipeline, consider switching to structlog here as well.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/event_processors/deployment_protection_rule.py` around lines 1 - 12, The
module currently imports stdlib logging and creates logger =
logging.getLogger(__name__), causing structured logs to differ from the
scheduler; replace the stdlib logger with structlog (import structlog and create
logger = structlog.get_logger(__name__)), then update all logging calls in this
file (references to logger.debug/info/warning/error) to pass structured context
as keyword args instead of using extra={} so they match the structlog style used
by get_deployment_scheduler and the rest of the pipeline.

30-36: URL validator accepts schemes like "httpsomething://".

url.strip().startswith("http") will match any string beginning with "http", including non-URL strings like "httpfoo". A stricter check avoids false positives:

Proposed fix
     @staticmethod
     def _is_valid_callback_url(url: str | None) -> bool:
-        return bool(url and isinstance(url, str) and url.strip().startswith("http"))
+        return bool(url and isinstance(url, str) and (url.strip().startswith("https://") or url.strip().startswith("http://")))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/event_processors/deployment_protection_rule.py` around lines 30 - 36, The
URL validator _is_valid_callback_url currently accepts any string starting with
"http" (e.g., "httpsomething://"); change it to parse and validate the URL
properly by using urllib.parse.urlparse (or an equivalent) inside
_is_valid_callback_url and ensure the parsed scheme is exactly "http" or "https"
and that parsed.netloc is non-empty (and optionally ensure the path/hostname
meet expected constraints) so only well-formed http(s) URLs are accepted; keep
the function signature and return a boolean.

14-14: AGENT_TIMEOUT_SECONDS is duplicated across two files.

This constant is defined identically in both deployment_scheduler.py (Line 18) and here (Line 14). Extract it to a shared location (e.g., src/core/constants.py or src/core/utils/timeout.py alongside execute_with_timeout) so it stays consistent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/event_processors/deployment_protection_rule.py` at line 14,
AGENT_TIMEOUT_SECONDS is duplicated; extract it into a shared module (e.g.,
core.constants) and update both modules to import it instead of redefining it:
create a single AGENT_TIMEOUT_SECONDS constant in core.constants, remove the
local AGENT_TIMEOUT_SECONDS definitions in deployment_protection_rule.py and
deployment_scheduler.py, and replace their usages (including any calls to
execute_with_timeout) with the imported constant to ensure a single source of
truth.
src/tasks/scheduler/deployment_scheduler.py (2)

264-287: Time-based violation detection via keyword matching is fragile.

The keyword list ("hour", "day", "time", "monday", etc.) is matched against free-text rule_description and message. This can produce false positives (e.g., a rule mentioning "daytime" or "timestamp") or false negatives (e.g., a time rule phrased differently). Consider tagging violations with a structured violation_type field at the source (agent/engine) instead of relying on text matching.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tasks/scheduler/deployment_scheduler.py` around lines 264 - 287, The
current free-text keyword matching inside the violations loop (where violations
are inspected and appended to time_based_violations or other_violations) is
fragile; update the logic to first check for a structured field like
violation.get("violation_type") or violation.get("type") and treat values like
"time", "time-based", or "schedule" as time-based, falling back only if that
field is absent to a safer text check (use whole-word/regex token matching or a
normalized token set rather than naive substring checks) before appending to
time_based_violations or other_violations; ensure you update the code paths
around the violations loop and any callers that create violations to prefer
setting violation_type.
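The structured-field-first classification suggested here can be sketched as follows; the violation_type values and keyword set are illustrative assumptions, not the repo's actual schema:

```python
import re

TIME_TYPE_VALUES = {"time", "time-based", "schedule"}
TIME_KEYWORDS = {"hour", "day", "time", "monday", "friday", "weekend"}

def is_time_based(violation: dict) -> bool:
    # Prefer a structured violation_type/type field set at the source.
    vtype = violation.get("violation_type") or violation.get("type")
    if vtype:
        return str(vtype).lower() in TIME_TYPE_VALUES
    # Fall back to whole-word token matching, which avoids false positives
    # from substrings like "daytime" or "timestamp".
    text = f"{violation.get('rule_description', '')} {violation.get('message', '')}".lower()
    tokens = set(re.findall(r"[a-z]+", text))
    return bool(tokens & TIME_KEYWORDS)
```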

89-101: Mixed logging styles: f-strings vs structlog keyword args.

Lines 89, 101, 114, and 186 use logger.error(f"...") / logger.info(f"..."), while the rest of the file correctly uses structlog's keyword-argument style (e.g., Line 92–97). f-string interpolation bypasses structlog's structured field extraction. As per coding guidelines, use structured logging at boundaries with fields like operation, subject_ids, etc.

Example fix for Line 89
-            logger.error(f"Missing required fields for deployment scheduler: {missing_fields}")
+            logger.error("deployment_scheduler_missing_fields", missing=missing_fields)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tasks/scheduler/deployment_scheduler.py` around lines 89 - 101, Replace
the f-string logging calls with structlog-style keyword args so fields are
captured: change logger.error(f"Missing required fields for deployment
scheduler: {missing_fields}") to a structured logger.error call that passes a
message string plus missing_fields=missing_fields (and include
operation="deployment_scheduler_add" or similar); likewise replace
logger.info(...) that currently uses an f-string to pass the static message and
fields via keyword args (use deployment_id=deployment_data["deployment_id"],
repo=deployment_data["repo"], time_based_violations=len(...)). Also update the
exception handler logger.error(f"Error adding deployment to scheduler: {e}") to
logger.error with a clear message and error/exception passed as a kwarg or use
exc_info=True so structured logging captures the exception; refer to
logger.error, logger.info, deployment_data, missing_fields and
pending_deployments.append to find the exact spots to change.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8302582 and c42d7c9.

📒 Files selected for processing (3)
  • src/event_processors/deployment_protection_rule.py
  • src/tasks/scheduler/deployment_scheduler.py
  • tests/unit/event_processors/test_deployment_protection_rule.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/guidelines.mdc)

**/*.py: Use modern typing only: dict[str, Any], list[str], str | None (no Dict, List, Optional)
GitHub/HTTP/DB calls must be async def; avoid blocking calls (time.sleep, sync HTTP) in async paths
All agent outputs and external payloads must use validated BaseModel from Pydantic
Use dataclasses for internal immutable state where appropriate
Use structured logging at boundaries with fields: operation, subject_ids, decision, latency_ms
Implement Agent pattern: single-responsibility agents with typed inputs/outputs
Use Decorator pattern for retries, metrics, caching as cross-cutting concerns
Agent outputs must include: decision, confidence (0..1), short reasoning, recommendations, strategy_used
Implement confidence policy: reject or route to human-in-the-loop when confidence < 0.5
Use minimal, step-driven prompts; provide Chain-of-Thought only for complexity > 0.7 or ambiguity > 0.6
Strip secrets/PII from agent prompts; scope tools; keep raw reasoning out of logs (store summaries only)
Cache idempotent lookups; lazy-import heavy dependencies; bound fan-out with asyncio.Semaphore
Avoid redundant LLM calls; memoize per event when safe
Use domain errors (e.g., AgentError) with error_type, message, context, timestamp, retry_count
Use exponential backoff for transient failures; circuit-break noisy integrations when needed
Fail closed for risky decisions; provide actionable remediation in error paths
Validate all external inputs; verify webhook signatures
Implement prompt-injection hardening; sanitize repository content passed to LLMs
Performance targets: Static validation ~<100ms typical, hybrid decisions sub-second when cache warm, budget LLM paths thoughtfully
Reject old typing syntax (Dict, List, Optional) in code review
Reject blocking calls in async code; reject bare except: clauses; reject swallowed errors
Reject LLM calls for trivial/deterministic checks
Reject unvalidated agent outputs and missing confidenc...

Files:

  • src/tasks/scheduler/deployment_scheduler.py
  • tests/unit/event_processors/test_deployment_protection_rule.py
  • src/event_processors/deployment_protection_rule.py
tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/guidelines.mdc)

tests/**/*.py: Write unit tests for deterministic rule evaluation (pass/warn/block), model validation, and error paths
Write integration tests for webhook parsing, idempotency, multi-agent coordination, and state persistence
Use pytest.mark.asyncio for async tests; avoid live network calls; freeze time and seed randomness
Write regression tests for every bug fix; keep CI coverage thresholds green

Files:

  • tests/unit/event_processors/test_deployment_protection_rule.py
🧠 Learnings (2)
📚 Learning: 2026-01-31T19:35:22.504Z
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Applies to tests/**/*.py : Write unit tests for deterministic rule evaluation (pass/warn/block), model validation, and error paths

Applied to files:

  • tests/unit/event_processors/test_deployment_protection_rule.py
📚 Learning: 2026-01-31T19:35:22.504Z
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Applies to tests/**/*.py : Write integration tests for webhook parsing, idempotency, multi-agent coordination, and state persistence

Applied to files:

  • tests/unit/event_processors/test_deployment_protection_rule.py
🧬 Code graph analysis (3)
src/tasks/scheduler/deployment_scheduler.py (4)
src/core/utils/retry.py (1)
  • retry_async (81-135)
src/core/utils/timeout.py (1)
  • execute_with_timeout (15-45)
src/event_processors/deployment_protection_rule.py (3)
  • _approve_deployment (258-287)
  • _convert_rules_to_new_format (332-348)
  • _do_approve (259-266)
src/integrations/github/api.py (2)
  • get_installation_access_token (98-128)
  • review_deployment_protection_rule (628-663)
tests/unit/event_processors/test_deployment_protection_rule.py (3)
src/event_processors/deployment_protection_rule.py (3)
  • process (38-248)
  • _is_valid_callback_url (31-32)
  • _is_valid_environment (35-36)
src/tasks/task_queue.py (2)
  • Task (19-47)
  • installation_id (41-47)
src/integrations/github/api.py (1)
  • review_deployment_protection_rule (628-663)
src/event_processors/deployment_protection_rule.py (5)
src/core/utils/retry.py (1)
  • retry_async (81-135)
src/core/utils/timeout.py (1)
  • execute_with_timeout (15-45)
src/event_processors/base.py (3)
  • BaseEventProcessor (26-61)
  • ProcessingResult (16-23)
  • _get_rule_provider (53-55)
src/tasks/task_queue.py (3)
  • Task (19-47)
  • repo_full_name (33-38)
  • installation_id (41-47)
src/integrations/github/api.py (1)
  • review_deployment_protection_rule (628-663)
🪛 GitHub Actions: Run pre-commit hooks
src/tasks/scheduler/deployment_scheduler.py

[error] 145-145: UP038 Use X | Y in isinstance call instead of (X, Y)


[error] 379-379: UP038 Use X | Y in isinstance call instead of (X, Y)

src/event_processors/deployment_protection_rule.py

[warning] 1-1: ruff-format reformatted files during pre-commit (formatting changes applied)

🪛 GitHub Actions: Unit and Integration Tests
tests/unit/event_processors/test_deployment_protection_rule.py

[error] 51-51: Test 'test_exception_calls_fallback_approval' failed: expected result.success to be False but was True.


[error] 74-74: Test 'test_exception_without_callback_skips_approval' failed: expected result.success to be False but was True.

🔇 Additional comments (4)
src/tasks/scheduler/deployment_scheduler.py (3)

237-254: Timeout-wrapped agent execution and violations extraction look good.

The execute_with_timeout integration with AGENT_TIMEOUT_SECONDS aligns with the PR objective. Defensive getattr usage in Lines 250–254 handles possible missing attributes gracefully. However, if result itself is None (e.g., timeout raises but is caught by the outer except), the code path is safe since Line 248 guards with if result.data.

One note: result.data access on Line 248 would raise AttributeError if execute_with_timeout were to return None. That can't happen with the current execute_with_timeout implementation (it raises on timeout), but it's worth a comment for clarity.


340-369: Retry wrapping for scheduled approval — LGTM.

The retry_async with 3 retries, exponential backoff, and bounded max_delay is well-configured. Validation of callback_url and installation_id before attempting the API call prevents unnecessary retries on clearly invalid state.


145-146: ⚠️ Potential issue | 🟡 Minor

Pipeline failure: use int | float in isinstance call (UP038).

The pre-commit hook flags this line. Same issue on Line 379 in get_status.

Proposed fix
-                if isinstance(created_at, (int, float)):
+                if isinstance(created_at, int | float):

And similarly for Line 379:

-                    if isinstance(created_at, (int, float)):
+                    if isinstance(created_at, int | float):

Likely an incorrect or invalid review comment.

src/event_processors/deployment_protection_rule.py (1)

258-287: Retry-wrapped approve/reject methods — LGTM.

The retry wrapping with retry_async and proper None-result handling provides the resilience needed for transient GitHub API failures. The pattern is consistent between _approve_deployment and _reject_deployment.

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/event_processors/deployment_protection_rule.py`:
- Around line 213-248: The fallback_comment currently embeds the raw exception
string (str(e)) which can leak internal details; update the exception handling
in deployment_protection_rule.py so before constructing fallback_comment you
produce a sanitized and truncated error excerpt (e.g., remove newlines, redact
file paths/absolute paths, strip stack traces, replace long tokens, and limit to
a safe length like 200 chars), then use that sanitized_excerpt in
fallback_comment; keep the full error in logger.error but ensure the call to
_approve_deployment(...) uses only the sanitized excerpt and a generic prefix
like "Processing failed: <sanitized_excerpt>. Approved as fallback to avoid
indefinite blocking." Reference symbols: _approve_deployment, fallback_comment,
_is_valid_callback_url, _is_valid_environment, and the exception block
constructing ProcessingResult.
- Around line 143-161: Change the direct attribute accesses on violation objects
to defensive getattr calls (providing sensible defaults) to avoid AttributeError
when fields are missing: in the block that builds violation_dict from
eval_result.violations (where variables are analysis_result / eval_result /
violation and the resulting list is violations), replace expressions like
violation.rule_description, violation.severity, violation.message,
violation.details, violation.how_to_fix, violation.docs_url,
violation.validation_strategy.value (or .validation_strategy) and
violation.execution_time_ms with getattr(violation, "<attr>", "<default>")
patterns (for validation_strategy preserve previous logic but guard with getattr
and check for .value safely), mirroring the approach used in
deployment_scheduler.py so missing attributes produce safe defaults (e.g., empty
string) instead of raising.

In `@src/tasks/scheduler/deployment_scheduler.py`:
- Around line 379-380: The timestamp handling in get_status is producing a naive
datetime when created_at is numeric; update the conversion that currently uses
datetime.fromtimestamp(created_at).isoformat() to create a UTC-aware datetime
instead (e.g., use datetime.fromtimestamp(created_at, tz=UTC).isoformat()),
matching the file's use of UTC (see other usages like
datetime.fromtimestamp(..., tz=UTC) and datetime.now(UTC)) so all comparisons
remain tz-aware and consistent.

In `@tests/unit/event_processors/test_deployment_protection_rule.py`:
- Around line 1-109: Add unit tests exercising the missing error and happy
paths: (1) simulate agent.execute hanging past AGENT_TIMEOUT_SECONDS and assert
execute_with_timeout triggers fallback approval/handled error when calling
DeploymentProtectionRuleProcessor.process; (2) mock retry_async to exhaust
retries for DeploymentProtectionRuleProcessor._approve_deployment and
_reject_deployment and assert the final failure behavior and that
github_client.review_deployment_protection_rule is (not) called accordingly; (3)
create a task whose agent returns violations containing time-based keywords and
assert get_deployment_scheduler().add_pending_deployment is called with the
expected repo/deployment info; (4) add a happy-path test where agent.execute
returns violations and assert processor.process results in
github_client.review_deployment_protection_rule being called with
state="rejected" and appropriate comment; use the symbols
DeploymentProtectionRuleProcessor.process, _approve_deployment,
_reject_deployment, execute_with_timeout, retry_async, AGENT_TIMEOUT_SECONDS,
get_deployment_scheduler, and add_pending_deployment to locate code.
- Around line 78-95: The test currently sets
processor.rule_provider.get_rules.return_value = [] so it hits the "no rules"
path; change it to return at least one rule (e.g.,
processor.rule_provider.get_rules.return_value = [MagicMock()]) and make the
agent evaluation return a non-empty violations list
(mock_agent.execute.return_value = MagicMock(data={"evaluation_result":
MagicMock(violations=[MagicMock()])})), keep deployment_callback_url as
"not-a-valid-url", then assert that
processor.github_client.review_deployment_protection_rule.assert_not_called()
and that the returned result indicates rejection (e.g., result.success is False)
so the test verifies invalid callback URL prevents the API call when
rules/violations exist.
- Around line 44-56: The tests never reach agent execution because
processor.rule_provider.get_rules yields an empty MagicMock iterable; update
both tests (test_exception_calls_fallback_approval and
test_exception_without_callback_skips_approval) to configure
processor.rule_provider.get_rules to return at least one rule whose event_types
includes "deployment" so deployment_rules is non-empty and code reaches
self.engine_agent.execute; e.g., set
processor.rule_provider.get_rules.return_value (or .return_value for the
AsyncMock) to an iterable like [{"event_types": ["deployment"], "name":
"test-rule"}] before calling await processor.process(task).

---

Outside diff comments:
In `@src/event_processors/deployment_protection_rule.py`:
- Around line 176-189: The scheduler payload is using
deployment.get("environment") while the handler earlier extracted environment =
payload.get("environment"); change the payload to use that already-extracted
environment variable instead of reading from the nested deployment dict so the
value is consistent across the method; update the dict passed to
get_deployment_scheduler().add_pending_deployment(...) to set "environment":
environment (and leave all other keys as-is) to ensure get_deployment_scheduler
and downstream code receive the same environment value the rest of the function
uses.
- Around line 332-348: The _convert_rules_to_new_format function currently
assumes rule is an object and will attribute-access fields (description,
enabled, severity, event_types, parameters) which crashes for dict-form rules;
update _convert_rules_to_new_format to first check if isinstance(rule, dict)
and, for dicts, read values via rule.get(...) with safe defaults (e.g., empty
dict/list), otherwise keep the existing object-path behavior; also handle
backwards-compatible extraction from "conditions" for both dicts (e.g., iterate
rule.get("conditions", [])) and objects (rule.conditions) and normalize
severity/event_types by checking for .value or using the raw value as currently
done.
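A minimal sketch of the dual-form field access suggested above; the helper name and the object-rule fields are illustrative, not the repo's actual API:

```python
from typing import Any


class ObjRule:
    # Hypothetical object-form rule mirroring the attribute access
    # described in the comment above.
    description = "No weekend deploys"
    enabled = True
    severity = "high"
    event_types = ["deployment"]
    parameters: dict[str, Any] = {}


def rule_field(rule: Any, name: str, default: Any) -> Any:
    """Read a field from either a dict-form or object-form rule."""
    if isinstance(rule, dict):
        return rule.get(name, default)
    return getattr(rule, name, default)


dict_rule = {"description": "No weekend deploys", "severity": "high"}
assert rule_field(dict_rule, "severity", "low") == "high"
assert rule_field(ObjRule(), "event_types", []) == ["deployment"]
assert rule_field(dict_rule, "parameters", {}) == {}  # safe default for dicts
```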

In `@src/tasks/scheduler/deployment_scheduler.py`:
- Around line 74-90: The validation in required_fields is missing "callback_url"
and uses a falsy check that rejects valid zero values; update the
required_fields list to include "callback_url" and change the missing_fields
computation from using not deployment_data.get(field) to an explicit check like
deployment_data.get(field) is None so only None/missing values fail; ensure this
validation (around the required_fields and missing_fields logic that logs via
logger.error and returns) prevents silently enqueuing deployments that
_approve_deployment will later skip due to a missing callback_url.
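The difference between the falsy check and the explicit `is None` check can be seen in a small sketch (field names taken from this comment; the values are hypothetical):

```python
deployment_data = {
    "deployment_id": 0,  # falsy but valid
    "repo": "acme/widgets",
    "callback_url": "https://api.github.com/example",
    "environment": "production",
}
required_fields = ["deployment_id", "repo", "callback_url", "environment"]

# Falsy check (current code): wrongly flags deployment_id=0 as missing
buggy_missing = [f for f in required_fields if not deployment_data.get(f)]

# Explicit None check (suggested): only truly absent values fail
missing = [f for f in required_fields if deployment_data.get(f) is None]

assert buggy_missing == ["deployment_id"]
assert missing == []
```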

---

Nitpick comments:
In `@src/event_processors/deployment_protection_rule.py`:
- Around line 1-12: The module currently imports stdlib logging and creates
logger = logging.getLogger(__name__), causing structured logs to differ from the
scheduler; replace the stdlib logger with structlog (import structlog and create
logger = structlog.get_logger(__name__)), then update all logging calls in this
file (references to logger.debug/info/warning/error) to pass structured context
as keyword args instead of using extra={} so they match the structlog style used
by get_deployment_scheduler and the rest of the pipeline.
- Around line 30-36: The URL validator _is_valid_callback_url currently accepts
any string starting with "http" (e.g., "httpsomething://"); change it to parse
and validate the URL properly by using urllib.parse.urlparse (or an equivalent)
inside _is_valid_callback_url and ensure the parsed scheme is exactly "http" or
"https" and that parsed.netloc is non-empty (and optionally ensure the
path/hostname meet expected constraints) so only well-formed http(s) URLs are
accepted; keep the function signature and return a boolean.
- Line 14: AGENT_TIMEOUT_SECONDS is duplicated; extract it into a shared module
(e.g., core.constants) and update both modules to import it instead of
redefining it: create a single AGENT_TIMEOUT_SECONDS constant in core.constants,
remove the local AGENT_TIMEOUT_SECONDS definitions in
deployment_protection_rule.py and deployment_scheduler.py, and replace their
usages (including any calls to execute_with_timeout) with the imported constant
to ensure a single source of truth.

In `@src/tasks/scheduler/deployment_scheduler.py`:
- Around line 264-287: The current free-text keyword matching inside the
violations loop (where violations are inspected and appended to
time_based_violations or other_violations) is fragile; update the logic to first
check for a structured field like violation.get("violation_type") or
violation.get("type") and treat values like "time", "time-based", or "schedule"
as time-based, falling back only if that field is absent to a safer text check
(use whole-word/regex token matching or a normalized token set rather than naive
substring checks) before appending to time_based_violations or other_violations;
ensure you update the code paths around the violations loop and any callers that
create violations to prefer setting violation_type.
- Around line 89-101: Replace the f-string logging calls with structlog-style
keyword args so fields are captured: change logger.error(f"Missing required
fields for deployment scheduler: {missing_fields}") to a structured logger.error
call that passes a message string plus missing_fields=missing_fields (and
include operation="deployment_scheduler_add" or similar); likewise replace
logger.info(...) that currently uses an f-string to pass the static message and
fields via keyword args (use deployment_id=deployment_data["deployment_id"],
repo=deployment_data["repo"], time_based_violations=len(...)). Also update the
exception handler logger.error(f"Error adding deployment to scheduler: {e}") to
logger.error with a clear message and error/exception passed as a kwarg or use
exc_info=True so structured logging captures the exception; refer to
logger.error, logger.info, deployment_data, missing_fields and
pending_deployments.append to find the exact spots to change.
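The structured-field-first classification suggested in the first nitpick above might look like this sketch; the field names and keyword sets are assumptions drawn from this review, not the repo's final values:

```python
import re

TIME_TYPES = {"time", "time-based", "schedule"}
TIME_TOKENS = {
    "hour", "hours", "day", "weekend", "time",
    "monday", "tuesday", "wednesday", "thursday",
    "friday", "saturday", "sunday",
}


def is_time_based(violation: dict) -> bool:
    # Prefer the structured field; fall back to whole-word token matching
    # on the free-text description only when the field is absent.
    vtype = violation.get("violation_type") or violation.get("type")
    if vtype is not None:
        return vtype in TIME_TYPES
    text = violation.get("rule_description", "").lower()
    tokens = set(re.findall(r"[a-z-]+", text))
    return not TIME_TOKENS.isdisjoint(tokens)


assert is_time_based({"violation_type": "schedule"})
assert not is_time_based({"violation_type": "secrets"})
assert is_time_based({"rule_description": "No deploys on Monday"})
# Naive substring matching would wrongly see "day" inside "holiday":
assert not is_time_based({"rule_description": "Holiday freeze window"})
```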

@codecov-commenter

codecov-commenter commented Feb 18, 2026

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 63.24503% with 111 lines in your changes missing coverage. Please review.

❌ Your patch status has failed because the patch coverage (63.2%) is below the target coverage (80.0%). You can increase the patch coverage or adjust the target coverage.
❌ Your project status has failed because the head coverage (68.2%) is below the target coverage (80.0%). You can increase the head coverage or adjust the target coverage.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

@@           Coverage Diff           @@
##            main     #49     +/-   ##
=======================================
+ Coverage   67.0%   68.2%   +1.2%     
=======================================
  Files        151     152      +1     
  Lines       9384    9560    +176     
=======================================
+ Hits        6290    6524    +234     
+ Misses      3094    3036     -58     

Continue to review full report in Codecov by Sentry.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8302582...75785f4. Read the comment docs.


@naaa760
Contributor Author

naaa760 commented Feb 18, 2026

@dimeloper
I fixed the failing checks and addressed CodeRabbit's comments.
Deployments won’t get stuck in “waiting” anymore (timeout, retries, validation, and fallback approval added).
You can re-run the checks now.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/event_processors/deployment_protection_rule.py (2)

252-258: ⚠️ Potential issue | 🟠 Major

_check_time_based_violations keyword set is narrower than the scheduler's — day-of-week violations bypass scheduling.

_check_time_based_violations uses ["hours", "weekend", "time", "day"]. The scheduler's _re_evaluate_deployment (Lines 283–298 in deployment_scheduler.py) uses a superset that includes "hour" (singular) and all seven weekday names. Any violation whose rule_description contains only "monday", "tuesday", etc., or only the singular "hour", will not be classified as time-based here, so it will never be added to the scheduler — the deployment is immediately rejected with no chance of re-evaluation. Meanwhile those same keywords in the scheduler can never be triggered because such violations never reach it.

🔧 Proposed fix — align keyword sets
-        if any(k in v.get("rule_description", "").lower() for k in ["hours", "weekend", "time", "day"])
+        rule_text = (v.get("rule_description", "") + " " + v.get("message", "")).lower()
+        if any(
+            k in rule_text
+            for k in [
+                "hour", "day", "weekend", "time",
+                "monday", "tuesday", "wednesday", "thursday",
+                "friday", "saturday", "sunday",
+            ]
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/event_processors/deployment_protection_rule.py` around lines 252 - 258,
The time-based keyword set in _check_time_based_violations is too narrow and
omits tokens used by the scheduler; update _check_time_based_violations to use
the same keyword set as _re_evaluate_deployment (include "hour" and all seven
weekday names like "monday","tuesday",... as well as existing tokens) or better
yet extract the keywords into a shared constant used by both functions so
day-of-week and singular "hour" descriptions are classified as time-based and
will be routed to the scheduler.

174-194: ⚠️ Potential issue | 🟠 Major

Fix the reject-then-approve sequence: callback URL is consumed after first use and cannot be reused.

When time_based_violations is non-empty, the code stores callback_url in the scheduler (line 183) and immediately calls _reject_deployment with the same URL (line 192). GitHub's deployment protection rule API allows only one terminal state change per callback URL; once rejected, the URL cannot be reused to approve. When the scheduler later attempts _approve_deployment with that same URL, GitHub returns 422 ("No pending custom deployment requests"), the error is silently caught in the exception handler, and the deployment remains permanently rejected.

Solution: Skip the rejection when only time-based violations are present. Only reject if there are non-time-based violations that require immediate action. Let the scheduler handle approval when the time window opens.

             if time_based_violations:
                 await get_deployment_scheduler().add_pending_deployment(...)
-                logger.info("Time-based violations detected, added to scheduler for re-evaluation")
+                logger.info("Time-based violations detected, held for scheduler re-evaluation (not rejected now)")

-            if can_call_callback:
-                await self._reject_deployment(deployment_callback_url, environment, violations, installation_id)
+            # Only reject when there are non-time-based violations
+            non_time_violations = [v for v in violations if v not in time_based_violations]
+            if non_time_violations and can_call_callback:
+                await self._reject_deployment(deployment_callback_url, environment, non_time_violations, installation_id)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/event_processors/deployment_protection_rule.py` around lines 174 - 194,
The code currently rejects the deployment immediately even when only time-based
violations exist, consuming deployment_callback_url and preventing later
approval; update the logic in the block that calls
self._check_time_based_violations so that if time_based_violations is non-empty
you add the pending item via get_deployment_scheduler().add_pending_deployment
(preserving deployment_callback_url) but do NOT call self._reject_deployment;
only call self._reject_deployment when there are non-time-based violations that
require immediate rejection (i.e., violations minus time_based_violations is
non-empty), allowing the scheduler to call self._approve_deployment later with
the preserved callback URL.
🧹 Nitpick comments (1)
src/event_processors/deployment_protection_rule.py (1)

353-353: Missing type annotation on violations parameter.

_format_violations_comment(violations) lacks a type annotation. As per coding guidelines (**/*.py): use modern typing only.

💡 Proposed fix
-    def _format_violations_comment(violations):
+    def _format_violations_comment(violations: list[dict[str, Any]]) -> str:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/event_processors/deployment_protection_rule.py` at line 353, The function
_format_violations_comment currently has an untyped parameter; add a modern
typing annotation (change def _format_violations_comment(violations) to def
_format_violations_comment(violations: list[dict[str, Any]]) -> str, importing
only Any from typing at the top of the module) so the parameter and return value
follow the repository's typing guidelines; update any local variable annotations
inside _format_violations_comment if needed to match the chosen types.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c42d7c9 and 60b820f.

📒 Files selected for processing (3)
  • src/event_processors/deployment_protection_rule.py
  • src/tasks/scheduler/deployment_scheduler.py
  • tests/unit/event_processors/test_deployment_protection_rule.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/guidelines.mdc)

**/*.py: Use modern typing only: dict[str, Any], list[str], str | None (no Dict, List, Optional)
GitHub/HTTP/DB calls must be async def; avoid blocking calls (time.sleep, sync HTTP) in async paths
All agent outputs and external payloads must use validated BaseModel from Pydantic
Use dataclasses for internal immutable state where appropriate
Use structured logging at boundaries with fields: operation, subject_ids, decision, latency_ms
Implement Agent pattern: single-responsibility agents with typed inputs/outputs
Use Decorator pattern for retries, metrics, caching as cross-cutting concerns
Agent outputs must include: decision, confidence (0..1), short reasoning, recommendations, strategy_used
Implement confidence policy: reject or route to human-in-the-loop when confidence < 0.5
Use minimal, step-driven prompts; provide Chain-of-Thought only for complexity > 0.7 or ambiguity > 0.6
Strip secrets/PII from agent prompts; scope tools; keep raw reasoning out of logs (store summaries only)
Cache idempotent lookups; lazy-import heavy dependencies; bound fan-out with asyncio.Semaphore
Avoid redundant LLM calls; memoize per event when safe
Use domain errors (e.g., AgentError) with error_type, message, context, timestamp, retry_count
Use exponential backoff for transient failures; circuit-break noisy integrations when needed
Fail closed for risky decisions; provide actionable remediation in error paths
Validate all external inputs; verify webhook signatures
Implement prompt-injection hardening; sanitize repository content passed to LLMs
Performance targets: Static validation ~<100ms typical, hybrid decisions sub-second when cache warm, budget LLM paths thoughtfully
Reject old typing syntax (Dict, List, Optional) in code review
Reject blocking calls in async code; reject bare except: clauses; reject swallowed errors
Reject LLM calls for trivial/deterministic checks
Reject unvalidated agent outputs and missing confidenc...

Files:

  • src/tasks/scheduler/deployment_scheduler.py
  • tests/unit/event_processors/test_deployment_protection_rule.py
  • src/event_processors/deployment_protection_rule.py
tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/guidelines.mdc)

tests/**/*.py: Write unit tests for deterministic rule evaluation (pass/warn/block), model validation, and error paths
Write integration tests for webhook parsing, idempotency, multi-agent coordination, and state persistence
Use pytest.mark.asyncio for async tests; avoid live network calls; freeze time and seed randomness
Write regression tests for every bug fix; keep CI coverage thresholds green

Files:

  • tests/unit/event_processors/test_deployment_protection_rule.py
🧠 Learnings (4)
📚 Learning: 2026-01-31T19:35:22.504Z
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Applies to tests/**/*.py : Write unit tests for deterministic rule evaluation (pass/warn/block), model validation, and error paths

Applied to files:

  • tests/unit/event_processors/test_deployment_protection_rule.py
📚 Learning: 2026-01-31T19:35:22.504Z
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Applies to tests/**/*.py : Write integration tests for webhook parsing, idempotency, multi-agent coordination, and state persistence

Applied to files:

  • tests/unit/event_processors/test_deployment_protection_rule.py
📚 Learning: 2026-01-31T19:35:22.504Z
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Applies to tests/**/*.py : Write regression tests for every bug fix; keep CI coverage thresholds green

Applied to files:

  • tests/unit/event_processors/test_deployment_protection_rule.py
📚 Learning: 2026-01-31T19:35:22.504Z
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Applies to **/*.py : Fail closed for risky decisions; provide actionable remediation in error paths

Applied to files:

  • src/event_processors/deployment_protection_rule.py
🧬 Code graph analysis (1)
tests/unit/event_processors/test_deployment_protection_rule.py (4)
src/core/models.py (1)
  • EventType (91-105)
src/event_processors/deployment_protection_rule.py (3)
  • process (38-250)
  • _is_valid_callback_url (31-32)
  • _is_valid_environment (35-36)
src/rules/models.py (2)
  • Rule (50-64)
  • RuleSeverity (12-20)
src/tasks/task_queue.py (2)
  • Task (19-47)
  • installation_id (41-47)
🪛 GitHub Actions: Run pre-commit hooks
src/tasks/scheduler/deployment_scheduler.py

[error] 146-146: UP038 Use X | Y in isinstance call instead of (X, Y)


[error] 405-405: UP038 Use X | Y in isinstance call instead of (X, Y)

🔇 Additional comments (3)
tests/unit/event_processors/test_deployment_protection_rule.py (1)

56-89: Previously failing tests are now correctly configured.

Both test_exception_calls_fallback_approval (Line 58) and test_exception_without_callback_skips_approval (Line 83) now set rule_provider.get_rules.return_value = [_make_deployment_rule()], which ensures agent execution is reached and the fallback path is exercised. This directly addresses the root cause of the prior CI failures.

src/tasks/scheduler/deployment_scheduler.py (1)

324-340: Failure-count tracking on generic exceptions addresses the "stuck indefinitely" risk.

Incrementing deployment["failure_count"] in the except block and returning (False, True) after MAX_CONSECUTIVE_FAILURES ensures that a persistently failing deployment is eventually removed rather than spinning forever. This directly addresses the prior concern about (False, False) keeping deployments in the scheduler indefinitely.

src/event_processors/deployment_protection_rule.py (1)

131-139: Remove this review comment—execute_with_timeout is correctly designed and used.

The function explicitly expects a Coroutine object (line 16 of src/core/utils/timeout.py: coro: Coroutine[Any, Any, Any]), not a callable/factory. It uses asyncio.wait_for(), the standard pattern for timeout enforcement, which properly handles coroutine cancellation when a timeout occurs. Both call sites (lines 131–139 in deployment_protection_rule.py and line 248 in deployment_scheduler.py) correctly pass coroutine objects by invoking self.engine_agent.execute(...) directly. There is no issue with pre-created coroutines or timeout enforcement; the implementation and usage are correct.

Likely an incorrect or invalid review comment.
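For readers unfamiliar with the pattern, the `asyncio.wait_for` usage described above can be sketched as a minimal stand-alone version (the helper and agent names are illustrative, not the repo's code):

```python
import asyncio
from collections.abc import Coroutine
from typing import Any


async def execute_with_timeout(coro: Coroutine[Any, Any, Any], timeout: float) -> Any:
    # asyncio.wait_for cancels the coroutine and raises TimeoutError on expiry.
    return await asyncio.wait_for(coro, timeout=timeout)


async def slow_agent() -> str:
    await asyncio.sleep(10)
    return "done"


async def fast_agent() -> str:
    return "approved"


async def main() -> tuple[str, str]:
    ok = await execute_with_timeout(fast_agent(), timeout=1.0)
    try:
        await execute_with_timeout(slow_agent(), timeout=0.01)
        timed_out = "no"
    except asyncio.TimeoutError:
        timed_out = "yes"  # the fallback-approval path would run here
    return ok, timed_out


print(asyncio.run(main()))
```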

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/tasks/scheduler/deployment_scheduler.py`:
- Line 146: Replace the isinstance(...) calls that use a tuple of types with the
Python 3.10+ union type syntax: change occurrences like isinstance(created_at,
(int, float)) to isinstance(created_at, int | float); apply the same change in
the get_status function (the other isinstance that uses a tuple of numeric
types) so both checks use the | union form to satisfy Ruff UP038.
- Around line 176-180: The scheduler currently removes deployments
unconditionally after calling _approve_deployment because that helper swallows
exceptions and returns None; change _approve_deployment to return a boolean
(True on confirmed/verified approval, False on failure) and only add the index
to deployments_to_remove when should_approve is True AND
_approve_deployment(...) returns True; update _approve_deployment (function
name) to return True on successful GitHub confirmation and False on
non-retriable/final failure (while still logging errors) so the calling loop
(where deployments_to_remove is appended) only removes confirmed approvals.

---

Outside diff comments:
In `@src/event_processors/deployment_protection_rule.py`:
- Around line 252-258: The time-based keyword set in
_check_time_based_violations is too narrow and omits tokens used by the
scheduler; update _check_time_based_violations to use the same keyword set as
_re_evaluate_deployment (include "hour" and all seven weekday names like
"monday","tuesday",... as well as existing tokens) or better yet extract the
keywords into a shared constant used by both functions so day-of-week and
singular "hour" descriptions are classified as time-based and will be routed to
the scheduler.
- Around line 174-194: The code currently rejects the deployment immediately
even when only time-based violations exist, consuming deployment_callback_url
and preventing later approval; update the logic in the block that calls
self._check_time_based_violations so that if time_based_violations is non-empty
you add the pending item via get_deployment_scheduler().add_pending_deployment
(preserving deployment_callback_url) but do NOT call self._reject_deployment;
only call self._reject_deployment when there are non-time-based violations that
require immediate rejection (i.e., violations minus time_based_violations is
non-empty), allowing the scheduler to call self._approve_deployment later with
the preserved callback URL.

---

Duplicate comments:
In `@src/event_processors/deployment_protection_rule.py`:
- Around line 230-232: The fallback_comment currently embeds a truncated raw
exception string (err_msg) which can leak internal details; instead, change the
logic around the _is_valid_callback_url/_is_valid_environment branch so
fallback_comment contains a generic, non-sensitive message (e.g., "Processing
failed due to an internal error; approved as fallback to avoid blocking.") and
send the full exception details to internal logging (not the public GitHub
comment); update references to err_msg/fallback_comment in
deployment_protection_rule.py and ensure exc_callback_url/exc_environment checks
still gate the fallback comment creation.

In `@src/tasks/scheduler/deployment_scheduler.py`:
- Line 405: In get_status, replace the isinstance check using a tuple for
numeric types with a union type: change the isinstance(created_at, (int, float))
pattern to use isinstance(created_at, int | float) so the created_at check
follows the UP038 style; update the check in the get_status function (and any
identical occurrences) to use the int | float union form.

In `@tests/unit/event_processors/test_deployment_protection_rule.py`:
- Around line 1-109: Add unit tests to cover the missing error and happy-path
scenarios: (1) a test that mocks execute_with_timeout (or simulates
agent.execute sleeping) to exceed AGENT_TIMEOUT_SECONDS and asserts the
processor falls back to calling github_client.review_deployment_protection_rule
with an approval comment containing the timeout; (2) a test that mocks
retry_async to exhaust retries for _approve_deployment and _reject_deployment
(or stub retry_async to raise after retries) and asserts the processor records
failure and that review_deployment_protection_rule is called or skipped
appropriately; (3) a test that returns an agent result containing a time-keyword
violation (use mock_agent.execute returning data with
evaluation_result.violations containing a time-keyword) and assert
get_deployment_scheduler().add_pending_deployment is called and
_reject_deployment/github review is not invoked; and (4) a happy-path rejection
test where mock_agent.execute returns violations (non-time) and assert
_reject_deployment path runs and github_client.review_deployment_protection_rule
is called with state "rejected". Target functions/classes:
DeploymentProtectionRuleProcessor.process, mock_agent.execute,
execute_with_timeout/AGENT_TIMEOUT_SECONDS, retry_async, _approve_deployment,
_reject_deployment, and get_deployment_scheduler/add_pending_deployment.
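The `retry_async` helper referenced throughout these prompts is not shown in the diff excerpts; a hypothetical version with exponential backoff and jitter, matching the behavior the PR description claims, might look like:

```python
import asyncio
import random
from collections.abc import Awaitable, Callable
from typing import Any


async def retry_async(
    fn: Callable[[], Awaitable[Any]],
    retries: int = 3,
    base_delay: float = 0.01,
) -> Any:
    # Exponential backoff with jitter; re-raise after the final attempt
    # so callers can apply their own fallback (e.g., fallback approval).
    for attempt in range(retries):
        try:
            return await fn()
        except Exception:
            if attempt == retries - 1:
                raise
            await asyncio.sleep(base_delay * 2**attempt * random.uniform(0.5, 1.5))


calls = {"n": 0}


async def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "approved"


result = asyncio.run(retry_async(flaky))
print(result)
```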

---

Nitpick comments:
In `@src/event_processors/deployment_protection_rule.py`:
- Line 353: The function _format_violations_comment currently has an untyped
parameter; add a modern typing annotation (change def
_format_violations_comment(violations) to def
_format_violations_comment(violations: list[dict[str, Any]]) -> str, importing
only Any from typing at the top of the module) so the parameter and return value
follow the repository's typing guidelines; update any local variable annotations
inside _format_violations_comment if needed to match the chosen types.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/tasks/scheduler/deployment_scheduler.py (1)

75-91: ⚠️ Potential issue | 🔴 Critical

Critical bug: falsy-value check incorrectly rejects valid empty lists and zero values.

Line 88 uses not deployment_data.get(field) to detect missing fields, which treats [], {}, 0, and "" as missing. However, violations: [] and time_based_violations: [] are valid states (no violations), and created_at: 0 is a valid epoch timestamp. This causes legitimate deployments to be rejected.

Replace the check to only reject missing keys or explicit None:

-            missing_fields = [field for field in required_fields if not deployment_data.get(field)]
+            missing_fields = [field for field in required_fields if field not in deployment_data or deployment_data[field] is None]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tasks/scheduler/deployment_scheduler.py` around lines 75 - 91, The
falsy-value check in deployment_scheduler.py incorrectly treats valid empty
lists/zero values as missing; update the missing_fields computation to only mark
a field missing if the key is absent or its value is explicitly None (e.g., use
"if field not in deployment_data or deployment_data[field] is None" instead of
"not deployment_data.get(field)"); keep the existing required_fields list and
then log via logger.error the missing_fields and return as before.
🧹 Nitpick comments (4)
src/tasks/scheduler/deployment_scheduler.py (4)

59-74: Deployment data is an unvalidated dict[str, Any] — consider a typed model.

The deployment dict accumulates fields over its lifecycle (failure_count, last_checked, github_token) with no schema enforcement. A typo in a key name or an unexpected type would silently pass. A Pydantic BaseModel (or at minimum a TypedDict) would catch mismatches at the boundary and make the required-fields validation (line 77–91) unnecessary. As per coding guidelines, "All agent outputs and external payloads must use validated BaseModel from Pydantic" and "Use dataclasses for internal immutable state where appropriate".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tasks/scheduler/deployment_scheduler.py` around lines 59 - 74, The
add_pending_deployment function accepts an unvalidated dict which accumulates
fields over time and risks silent key/type errors; replace this ad-hoc dict with
a validated Pydantic model (e.g., create a DeploymentData BaseModel with
required fields like deployment_id, repo, installation_id, environment,
event_data, rules, violations, time_based_violations, created_at and optional
fields failure_count, last_checked, github_token) and update
add_pending_deployment to accept DeploymentData (or a TypedDict fallback) so
construction/validation happens at the boundary; remove the manual
required-fields checks in add_pending_deployment and adapt any code that mutates
or extends the dict to use model.copy(update=...) or convert to an internal
dataclass for immutable state where needed.
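A hedged sketch of the suggested Pydantic boundary model (field names are taken from this comment and are assumptions, not the repo's actual schema; requires pydantic v2):

```python
from pydantic import BaseModel, ValidationError


class DeploymentData(BaseModel):
    # Hypothetical schema mirroring the fields listed in the review comment.
    deployment_id: int
    repo: str
    callback_url: str
    environment: str
    violations: list[dict] = []
    time_based_violations: list[dict] = []
    failure_count: int = 0


ok = DeploymentData(
    deployment_id=0,  # zero is valid here, unlike with the falsy check
    repo="acme/widgets",
    callback_url="https://api.github.com/example",
    environment="production",
)
assert ok.violations == []  # empty list is a valid "no violations" state

try:
    DeploymentData(repo="acme/widgets", callback_url="x", environment="prod")
    raise AssertionError("expected ValidationError")
except ValidationError as e:
    assert "deployment_id" in str(e)  # missing field caught at the boundary
```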

321-322: Unreachable code — this branch can never execute.

Every violation in the for loop (lines 278–301) is appended to either time_based_violations or other_violations. Since violations is non-empty at this point (line 269 guard), at least one of those lists is non-empty, and both are handled with early returns at lines 310 and 319 respectively. Lines 321–322 are dead code.

🧹 Proposed cleanup
-            deployment["failure_count"] = 0
             if time_based_violations:
                 logger.info(
                     "deployment_scheduler_time_violations",
                     repo=deployment.get("repo"),
                     count=len(time_based_violations),
                 )
                 return False, False
-
-            logger.info("deployment_scheduler_all_resolved", repo=deployment.get("repo"))
-            return True, False

If you want to keep a defensive fallback, add an assertion or # pragma: no cover so it's clear it's unreachable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tasks/scheduler/deployment_scheduler.py` around lines 321 - 322, The
final return block after logger.info("deployment_scheduler_all_resolved",
repo=deployment.get("repo")) is unreachable because the earlier guard ensures
violations is non-empty and the loop always appends into either
time_based_violations or other_violations which are returned earlier; remove the
dead branch (lines containing that logger.info + return True, False) or replace
it with a defensive assertion (e.g., assert False, "unreachable") or mark it
with a coverage pragma (# pragma: no cover) so intent is explicit; update
references in deployment_scheduler.py around the handling of
time_based_violations and other_violations to keep control flow clear.

88-102: Inconsistent logging: f-strings vs. structured keyword args.

Lines 90, 102, 115, 189, and 433 embed data into the log message via f-strings, while the rest of the file correctly uses structured keyword arguments. This undermines log aggregation and filtering. As per coding guidelines, "Use structured logging at boundaries with fields: operation, subject_ids, decision, latency_ms".

Example fix for lines 90 and 102
-            logger.error(f"Missing required fields for deployment scheduler: {missing_fields}")
+            logger.error("deployment_scheduler_add_missing_fields", missing_fields=missing_fields)
             ...
-            logger.error(f"Error adding deployment to scheduler: {e}")
+            logger.error("deployment_scheduler_add_error", error=str(e))

Apply the same pattern to lines 115, 189, and 433.
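The difference is easy to see with a minimal stand-in for a structured logger (plain `json` here so the sketch is self-contained; the project itself uses structlog):

```python
import json

def log_event(event: str, **fields) -> str:
    """Minimal stand-in for a structlog call: stable event name, queryable fields."""
    return json.dumps({"event": event, **fields}, sort_keys=True)

# Instead of logger.error(f"Missing required fields ...: {missing_fields}"),
# data rides in keyword args and stays filterable by log aggregators:
line = log_event(
    "deployment_scheduler_add_missing_fields",
    operation="deployment_scheduler_add",
    missing_fields=["repo", "environment"],
)
print(line)
```

An f-string bakes the values into an opaque message string; the structured form lets you query `operation` or `missing_fields` directly.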

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tasks/scheduler/deployment_scheduler.py` around lines 88 - 102, Replace
all f-string log messages in deployment scheduler with structured logging
keyword args: for the missing fields case (variable missing_fields) change
logger.error(f"Missing required fields for deployment scheduler:
{missing_fields}") to logger.error("missing_required_fields",
operation="deployment_scheduler_add", missing_fields=missing_fields) and for the
exception handler at the end change logger.error(f"Error adding deployment to
scheduler: {e}") to logger.error("error_adding_deployment",
operation="deployment_scheduler_add", error=str(e)) ; apply the same pattern to
the other occurrences referenced (lines around where f-strings are used at the
blocks around lines 115, 189, and 433) ensuring you emit the required structured
fields at boundaries: operation, subject_ids (if applicable), decision (if
applicable), and latency_ms (if available) instead of inlining values into the
message text.

278-301: Fragile keyword-based classification of time-based vs. non-time violations.

String matching on keywords like "time", "day", "hour" is prone to false positives (e.g., "runtime error", "day-zero vulnerability", "first-time deploy") and false negatives (e.g., a rule phrased as "not during business hours" matching, but "only deploy between 9–17" not matching). A mismatch here silently changes whether a deployment is removed or kept pending.

Consider having the agent or rule engine tag violations with a structured type field (e.g., "time_based": true) rather than inferring it from free text.
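A sketch of that ordering, with a hypothetical `time_based` flag and a word-boundary regex fallback (the flag name and keyword list are illustrative, not the project's actual schema):

```python
import re

_TIME_RE = re.compile(
    r"\b(business hours|weekend|monday|tuesday|wednesday|thursday|friday|"
    r"saturday|sunday|\d{1,2}:\d{2})\b",
    re.IGNORECASE,
)
_FALSE_POSITIVES = ("runtime", "first-time", "day-zero")

def is_time_based(violation: dict) -> bool:
    """Trust an explicit flag first; only then fall back to a bounded regex heuristic."""
    if "time_based" in violation:
        return bool(violation["time_based"])
    text = violation.get("message", "").lower()
    if any(term in text for term in _FALSE_POSITIVES):
        return False
    return _TIME_RE.search(text) is not None
```

Structured metadata wins whenever present; the regex only handles legacy payloads that lack it.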

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tasks/scheduler/deployment_scheduler.py` around lines 278 - 301, The
current keyword-based classification in the violations loop (where violations
are split into time_based_violations and other_violations) is fragile; update
the logic to first prefer an explicit structured flag (e.g., check
violation.get("time_based") or violation.get("type") == "time") and only if
absent use a safer heuristic: use regex with word boundaries and phrase matching
for time expressions (e.g.,
r"\b(hour|day|weekend|monday|tuesday|wednesday|thursday|friday|saturday|sunday|business
hours|between \d{1,2}(:\d{2})?\b|\b\d{1,2}:\d{2}\b)\b"), explicitly exclude
common false-positive terms (e.g., "runtime", "first-time", "day-zero") and
prefer matches for numeric time ranges; update the loop around violations, the
classification branch that appends to time_based_violations or other_violations,
to implement this fallback strategy so structured metadata is used when present
and regex-based inference is more precise when needed.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 60b820f and 3cabf88.

📒 Files selected for processing (1)
  • src/tasks/scheduler/deployment_scheduler.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/guidelines.mdc)

**/*.py: Use modern typing only: dict[str, Any], list[str], str | None (no Dict, List, Optional)
GitHub/HTTP/DB calls must be async def; avoid blocking calls (time.sleep, sync HTTP) in async paths
All agent outputs and external payloads must use validated BaseModel from Pydantic
Use dataclasses for internal immutable state where appropriate
Use structured logging at boundaries with fields: operation, subject_ids, decision, latency_ms
Implement Agent pattern: single-responsibility agents with typed inputs/outputs
Use Decorator pattern for retries, metrics, caching as cross-cutting concerns
Agent outputs must include: decision, confidence (0..1), short reasoning, recommendations, strategy_used
Implement confidence policy: reject or route to human-in-the-loop when confidence < 0.5
Use minimal, step-driven prompts; provide Chain-of-Thought only for complexity > 0.7 or ambiguity > 0.6
Strip secrets/PII from agent prompts; scope tools; keep raw reasoning out of logs (store summaries only)
Cache idempotent lookups; lazy-import heavy dependencies; bound fan-out with asyncio.Semaphore
Avoid redundant LLM calls; memoize per event when safe
Use domain errors (e.g., AgentError) with error_type, message, context, timestamp, retry_count
Use exponential backoff for transient failures; circuit-break noisy integrations when needed
Fail closed for risky decisions; provide actionable remediation in error paths
Validate all external inputs; verify webhook signatures
Implement prompt-injection hardening; sanitize repository content passed to LLMs
Performance targets: Static validation ~<100ms typical, hybrid decisions sub-second when cache warm, budget LLM paths thoughtfully
Reject old typing syntax (Dict, List, Optional) in code review
Reject blocking calls in async code; reject bare except: clauses; reject swallowed errors
Reject LLM calls for trivial/deterministic checks
Reject unvalidated agent outputs and missing confidenc...

Files:

  • src/tasks/scheduler/deployment_scheduler.py
🔇 Additional comments (2)
src/tasks/scheduler/deployment_scheduler.py (2)

1-19: LGTM — imports and constants are clean.

The lazy-import pattern via TYPE_CHECKING, the module-level constants, and the structured logger setup all look good.


397-434: LGTM — past review issues (naive datetime, isinstance syntax) have been addressed.

The get_status method now uses UTC-aware timestamps and modern int | float union syntax correctly.

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/tasks/scheduler/deployment_scheduler.py`:
- Around line 206-215: The current presence check uses deployment.get(field)
which treats empty lists/dicts as missing (e.g., rules: []), so change the
missing_fields computation to only treat a field as missing when the key is
absent or its value is explicitly None; replace the list comprehension that
builds missing_fields (which currently uses deployment.get(field)) with one that
checks "field not in deployment or deployment[field] is None" so empty
containers are considered present; update the code in deployment_scheduler.py
where required_fields, missing_fields, and deployment.get(field) are referenced
to use this new existence/None check.
- Around line 258-267: The engine agent output (RuleEvaluationResult model
returned by engine_agent.execute) is missing required fields: confidence (0..1),
reasoning, recommendations, and strategy_used; update the RuleEvaluationResult
Pydantic model to add these fields (confidence: float, reasoning: str,
recommendations: List[str], strategy_used: str) and ensure engine_agent.execute
populates them; in deployment_scheduler.py, after obtaining
result/data/evaluation_result, implement a confidence gate that rejects or
routes to human-in-the-loop when evaluation_result.confidence < 0.5 (e.g., set
decision to "reject" or call the human review path) before processing
violations; finally remove the unreachable dead-code that follows the
time_based_violations/other_violations early returns (the code around
time_based_violations and other_violations handling which makes lines after the
returns unreachable).

---

Outside diff comments:
In `@src/tasks/scheduler/deployment_scheduler.py`:
- Around line 75-91: The falsy-value check in deployment_scheduler.py
incorrectly treats valid empty lists/zero values as missing; update the
missing_fields computation to only mark a field missing if the key is absent or
its value is explicitly None (e.g., use "if field not in deployment_data or
deployment_data[field] is None" instead of "not deployment_data.get(field)");
keep the existing required_fields list and then log via logger.error the
missing_fields and return as before.

---

Duplicate comments:
In `@src/tasks/scheduler/deployment_scheduler.py`:
- Around line 176-178: The code unconditionally appends an index to
deployments_to_remove even when _approve_deployment fails; change
_approve_deployment to return a boolean (True on confirmed approval, False on
failure) or raise on unrecoverable error, ensure it logs retries/errors, and
then in the scheduler loop (where should_approve is checked) only call
deployments_to_remove.append(i) when _approve_deployment indicates success
(i.e., if await self._approve_deployment(deployment):
deployments_to_remove.append(i)). Reference _approve_deployment and
deployments_to_remove in your changes and keep behavior consistent with
retry/error logging so failed approvals are not removed from the scheduler.
- Around line 375-395: The current _approve_deployment block swallows all
exceptions (exceptions=(Exception,)) and returns None on failure; change it so
successful approval returns True and failures return False, and narrow retries
to transient errors only (e.g., aiohttp.ClientError, asyncio.TimeoutError,
OSError or other network/5xx related exceptions) by replacing
exceptions=(Exception,) with an explicit transient-exceptions tuple when calling
retry_async(_do_approve,...). Also update the except branch to log the error
(keep logger.error(..., error=str(e))) and then return False; let non-retryable
exceptions propagate by not including them in the retry_async exceptions tuple
so they aren’t retried.
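A sketch of what such a narrowed retry helper looks like (the signature is assumed; the project's actual `retry_async` may differ):

```python
import asyncio

async def retry_async(func, *, retries=3, base_delay=0.01,
                      exceptions=(asyncio.TimeoutError, OSError)):
    """Retry only the listed transient errors with exponential backoff.

    Anything outside `exceptions` propagates immediately instead of being retried.
    """
    for attempt in range(retries):
        try:
            return await func()
        except exceptions:
            if attempt == retries - 1:
                raise
            await asyncio.sleep(base_delay * (2 ** attempt))

async def demo() -> str:
    calls = {"n": 0}

    async def flaky():
        calls["n"] += 1
        if calls["n"] < 3:
            raise OSError("transient network hiccup")
        return "approved"

    return await retry_async(flaky)

result = asyncio.run(demo())
```

A `ValueError` (or any other non-transient error) raised inside `func` surfaces on the first attempt, which is the behavior the review asks for.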

---

Nitpick comments:
In `@src/tasks/scheduler/deployment_scheduler.py`:
- Around line 59-74: The add_pending_deployment function accepts an unvalidated
dict which accumulates fields over time and risks silent key/type errors;
replace this ad-hoc dict with a validated Pydantic model (e.g., create a
DeploymentData BaseModel with required fields like deployment_id, repo,
installation_id, environment, event_data, rules, violations,
time_based_violations, created_at and optional fields failure_count,
last_checked, github_token) and update add_pending_deployment to accept
DeploymentData (or a TypedDict fallback) so construction/validation happens at
the boundary; remove the manual required-fields checks in add_pending_deployment
and adapt any code that mutates or extends the dict to use
model.copy(update=...) or convert to an internal dataclass for immutable state
where needed.
- Around line 321-322: The final return block after
logger.info("deployment_scheduler_all_resolved", repo=deployment.get("repo")) is
unreachable because the earlier guard ensures violations is non-empty and the
loop always appends into either time_based_violations or other_violations which
are returned earlier; remove the dead branch (lines containing that logger.info
+ return True, False) or replace it with a defensive assertion (e.g., assert
False, "unreachable") or mark it with a coverage pragma (# pragma: no cover) so
intent is explicit; update references in deployment_scheduler.py around the
handling of time_based_violations and other_violations to keep control flow
clear.
- Around line 88-102: Replace all f-string log messages in deployment scheduler
with structured logging keyword args: for the missing fields case (variable
missing_fields) change logger.error(f"Missing required fields for deployment
scheduler: {missing_fields}") to logger.error("missing_required_fields",
operation="deployment_scheduler_add", missing_fields=missing_fields) and for the
exception handler at the end change logger.error(f"Error adding deployment to
scheduler: {e}") to logger.error("error_adding_deployment",
operation="deployment_scheduler_add", error=str(e)) ; apply the same pattern to
the other occurrences referenced (lines around where f-strings are used at the
blocks around lines 115, 189, and 433) ensuring you emit the required structured
fields at boundaries: operation, subject_ids (if applicable), decision (if
applicable), and latency_ms (if available) instead of inlining values into the
message text.
- Around line 278-301: The current keyword-based classification in the
violations loop (where violations are split into time_based_violations and
other_violations) is fragile; update the logic to first prefer an explicit
structured flag (e.g., check violation.get("time_based") or
violation.get("type") == "time") and only if absent use a safer heuristic: use
regex with word boundaries and phrase matching for time expressions (e.g.,
r"\b(hour|day|weekend|monday|tuesday|wednesday|thursday|friday|saturday|sunday|business
hours|between \d{1,2}(:\d{2})?\b|\b\d{1,2}:\d{2}\b)\b"), explicitly exclude
common false-positive terms (e.g., "runtime", "first-time", "day-zero") and
prefer matches for numeric time ranges; update the loop around violations, the
classification branch that appends to time_based_violations or other_violations,
to implement this fallback strategy so structured metadata is used when present
and regex-based inference is more precise when needed.

Comment on lines 206 to 215
             # Validate required fields
             required_fields = ["repo", "environment", "installation_id", "event_data", "rules"]
             missing_fields = [field for field in required_fields if not deployment.get(field)]
             if missing_fields:
-                logger.error(f"Missing required fields for deployment re-evaluation: {missing_fields}")
-                return False
+                logger.error(
+                    "deployment_scheduler_missing_fields",
+                    repo=deployment.get("repo"),
+                    missing=missing_fields,
+                )
+                return False, True

⚠️ Potential issue | 🔴 Critical

Same falsy-value bug as add_pending_deployment: rules: [] will be treated as "missing".

Line 208 uses the same not deployment.get(field) pattern. An empty rules list or empty event_data dict would be treated as a missing field, causing the deployment to be removed from the scheduler with (False, True).

🐛 Proposed fix
-            missing_fields = [field for field in required_fields if not deployment.get(field)]
+            missing_fields = [field for field in required_fields if field not in deployment or deployment[field] is None]
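A quick demonstration of the difference (field names borrowed from the snippet above; `callback_url` added to show a genuinely absent key is still caught):

```python
deployment = {"repo": "acme/app", "environment": "prod", "rules": []}
required_fields = ["repo", "environment", "rules", "callback_url"]

# Falsy check: an empty rules list is wrongly reported as missing.
buggy = [f for f in required_fields if not deployment.get(f)]

# Existence/None check: only absent keys or explicit None count as missing.
fixed = [f for f in required_fields if f not in deployment or deployment[f] is None]

print(buggy)  # ['rules', 'callback_url'] — rules is a false positive
print(fixed)  # ['callback_url']
```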
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tasks/scheduler/deployment_scheduler.py` around lines 206 - 215, The
current presence check uses deployment.get(field) which treats empty lists/dicts
as missing (e.g., rules: []), so change the missing_fields computation to only
treat a field as missing when the key is absent or its value is explicitly None;
replace the list comprehension that builds missing_fields (which currently uses
deployment.get(field)) with one that checks "field not in deployment or
deployment[field] is None" so empty containers are considered present; update
the code in deployment_scheduler.py where required_fields, missing_fields, and
deployment.get(field) are referenced to use this new existence/None check.

Comment on lines +258 to +267
violations = []
eval_result = result.data.get("evaluation_result") if result.data else None
if eval_result and hasattr(eval_result, "violations"):
for v in eval_result.violations:
violations.append(
{
"rule_description": getattr(v, "rule_description", ""),
"message": getattr(v, "message", ""),
}
)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the full context of the _re_evaluate_deployment method
cat -n src/tasks/scheduler/deployment_scheduler.py | sed -n '199,340p'

Repository: warestack/watchflow

Length of output: 7085


🏁 Script executed:

# Find the execute_with_timeout function definition
rg -n "def execute_with_timeout" src/tasks/scheduler/deployment_scheduler.py -A 20

Repository: warestack/watchflow

Length of output: 45


🏁 Script executed:

# Check if there are any Pydantic BaseModel definitions for agent outputs
rg -n "class.*BaseModel" src/tasks/scheduler/deployment_scheduler.py -A 5

Repository: warestack/watchflow

Length of output: 45


🏁 Script executed:

# Check imports at the top of the file
head -50 src/tasks/scheduler/deployment_scheduler.py

Repository: warestack/watchflow

Length of output: 1648


🏁 Script executed:

# Search for execute_with_timeout across the codebase
rg -n "execute_with_timeout" --type=py -A 10 | head -100

Repository: warestack/watchflow

Length of output: 6400


🏁 Script executed:

# Search for engine_agent definition
rg -n "engine_agent" src/tasks/scheduler/deployment_scheduler.py -B 5 -A 2 | head -50

Repository: warestack/watchflow

Length of output: 1119


🏁 Script executed:

# Look for any Pydantic BaseModel definitions in the codebase
rg -n "class.*BaseModel" --type=py | head -30

Repository: warestack/watchflow

Length of output: 1868


🏁 Script executed:

# Check the execute_with_timeout function definition
cat -n src/core/utils/timeout.py

Repository: warestack/watchflow

Length of output: 2497


🏁 Script executed:

# Check the AgentResult BaseModel
cat -n src/agents/base.py | head -50

Repository: warestack/watchflow

Length of output: 1898


🏁 Script executed:

# Check what engine_agent.execute() returns
rg -n "class.*Agent|def execute" src/agents/engine_agent.py -A 3 | head -100

Repository: warestack/watchflow

Length of output: 129


🏁 Script executed:

# Find the engine agent file
fd -t f "engine" src/agents/

Repository: warestack/watchflow

Length of output: 45


🏁 Script executed:

# Look for agent execute methods
rg -n "async def execute" src/agents/ --type=py -A 10 | head -150

Repository: warestack/watchflow

Length of output: 6735


🏁 Script executed:

# Check what get_agent("engine") returns
rg -n "def get_agent" src/agents/ --type=py -A 15

Repository: warestack/watchflow

Length of output: 879


🏁 Script executed:

# Check the full engine_agent execute method
cat -n src/agents/engine_agent/agent.py | sed -n '76,130p'

Repository: warestack/watchflow

Length of output: 2808


🏁 Script executed:

# Check the EngineState and what it contains
rg -n "class EngineState" src/agents/engine_agent/models.py -A 30

Repository: warestack/watchflow

Length of output: 903


🏁 Script executed:

# Check what the engine agent returns in its data
rg -n "return AgentResult" src/agents/engine_agent/agent.py -B 5 -A 2

Repository: warestack/watchflow

Length of output: 1223


🏁 Script executed:

# Search for confidence field in any agent outputs
rg -n "confidence" src/agents/ --type=py

Repository: warestack/watchflow

Length of output: 1597


🏁 Script executed:

# Get the full return statement of engine_agent.execute to see what's in evaluation_result
cat -n src/agents/engine_agent/agent.py | sed -n '129,181p'

Repository: warestack/watchflow

Length of output: 3184


🏁 Script executed:

# Check if violations in EngineState/evaluation_result are already validated
rg -n "violations" src/agents/engine_agent/agent.py -B 2 -A 2 | head -60

Repository: warestack/watchflow

Length of output: 2706


🏁 Script executed:

# Find RuleEvaluationResult and RuleViolation definitions
rg -n "class RuleEvaluationResult|class RuleViolation" src/ --type=py -A 10

Repository: warestack/watchflow

Length of output: 1604


🏁 Script executed:

# Check the full structure of what's returned in AgentResult.data
cat -n src/agents/engine_agent/agent.py | sed -n '154,174p'

Repository: warestack/watchflow

Length of output: 1384


Engine agent output lacks required fields per coding guidelines.

The engine_agent.execute() returns a validated AgentResult(BaseModel), but the output is missing required fields: confidence (0..1), reasoning, recommendations, and strategy_used. Per coding guidelines: "Agent outputs must include: decision, confidence (0..1), short reasoning, recommendations, strategy_used" and "Implement confidence policy: reject or route to human-in-the-loop when confidence < 0.5."

Add a confidence field and supporting fields to the RuleEvaluationResult model, and implement confidence gating in the deployment scheduler to reject decisions where confidence < 0.5.

Additionally, lines 321–322 are unreachable dead code (all violations are classified into time_based_violations or other_violations with early returns at lines 310 and 319).
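A sketch of the confidence gate (the guidelines call for a validated Pydantic `BaseModel`; a dataclass is used here only to keep the sketch dependency-free, and the field names mirror the guideline, not the project's actual model):

```python
from dataclasses import dataclass, field

@dataclass
class RuleEvaluationResult:
    decision: str
    confidence: float  # expected in 0..1
    reasoning: str = ""
    recommendations: list[str] = field(default_factory=list)
    strategy_used: str = "hybrid"

def gate(result: RuleEvaluationResult) -> str:
    """Apply the confidence policy: below 0.5, defer to a human instead of acting."""
    return result.decision if result.confidence >= 0.5 else "human_review"
```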

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tasks/scheduler/deployment_scheduler.py` around lines 258 - 267, The
engine agent output (RuleEvaluationResult model returned by
engine_agent.execute) is missing required fields: confidence (0..1), reasoning,
recommendations, and strategy_used; update the RuleEvaluationResult Pydantic
model to add these fields (confidence: float, reasoning: str, recommendations:
List[str], strategy_used: str) and ensure engine_agent.execute populates them;
in deployment_scheduler.py, after obtaining result/data/evaluation_result,
implement a confidence gate that rejects or routes to human-in-the-loop when
evaluation_result.confidence < 0.5 (e.g., set decision to "reject" or call the
human review path) before processing violations; finally remove the unreachable
dead-code that follows the time_based_violations/other_violations early returns
(the code around time_based_violations and other_violations handling which makes
lines after the returns unreachable).

Member

@dkargatzis dkargatzis left a comment

Here are some critical security and robustness improvements, along with refactoring suggestions to reduce code duplication.

logger.info(f" Repo: {deployment_data['repo']}")
logger.info(f" Installation: {deployment_data['installation_id']}")
logger.info(f" Time-based violations: {len(deployment_data.get('time_based_violations', []))}")
logger.info(
Member

Critical: callback_url is missing from required_fields. If a deployment is added without it, _approve_deployment will silently fail later, causing the deployment to hang indefinitely.

processing_time_ms=processing_time,
error=str(e),
)

Member

Security Warning: While limiting length to 200 chars helps, str(e) can still leak internal paths or traceback snippets. Consider stripping file paths or using a generic error message for public comments.
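One way to do that stripping (a sketch; the regex covers common Unix and Windows path shapes, not every possible form):

```python
import re

_PATH_RE = re.compile(r"(/[\w./-]+|[A-Za-z]:\\[\w.\\-]+)")

def safe_error(exc: Exception, limit: int = 200) -> str:
    """Redact filesystem paths and truncate before posting an error publicly."""
    return _PATH_RE.sub("<path>", str(exc))[:limit]

err = FileNotFoundError("No such file: /srv/app/secrets/config.yaml")
print(safe_error(err))  # No such file: <path>
```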

eval_result = analysis_result.data["evaluation_result"]
if hasattr(eval_result, "violations"):
violations = [Violation.model_validate(v) for v in eval_result.violations]
# Convert RuleViolation objects to dictionaries
Member

Refactor: This manual loop (lines 147-160) to convert violations to dicts is verbose. Since we use Pydantic v2, you can replace this entire block with violations = [v.model_dump(mode='json') for v in eval_result.violations].

) -> None:
try:
result = await self.github_client.review_deployment_protection_rule(
async def _approve_deployment(self, callback_url: str, environment: str, comment: str, installation_id: int):
Member

Refactor (DRY): _approve_deployment and _reject_deployment share 90% of their logic. Consider unifying them into a single _send_deployment_review(..., state, comment) method to reduce duplication and ensure consistent retry handling.
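A sketch of that unification (method and client names assumed from the PR; the fake client stands in for the real GitHub client):

```python
import asyncio

class FakeClient:
    """Stand-in for the GitHub client; records calls instead of hitting the API."""
    def __init__(self):
        self.calls = []

    async def review_deployment_protection_rule(self, **kwargs):
        self.calls.append(kwargs)

class Scheduler:
    def __init__(self, client):
        self.client = client

    async def _send_deployment_review(
        self, callback_url: str, environment: str, state: str, comment: str
    ) -> bool:
        """Shared path for approve/reject; returns True only on a confirmed call."""
        try:
            await self.client.review_deployment_protection_rule(
                callback_url=callback_url,
                environment=environment,
                state=state,
                comment=comment,
            )
            return True
        except (asyncio.TimeoutError, OSError):
            return False

    async def approve(self, url: str, env: str, comment: str) -> bool:
        return await self._send_deployment_review(url, env, "approved", comment)

    async def reject(self, url: str, env: str, comment: str) -> bool:
        return await self._send_deployment_review(url, env, "rejected", comment)

client = FakeClient()
scheduler = Scheduler(client)
ok = asyncio.run(scheduler.approve("https://example/callback", "prod", "ok"))
```

Retry and logging decorators would wrap `_send_deployment_review` once instead of twice.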

@dkargatzis
Member

In addition to the specific comments above, please verify the open CodeRabbit findings. Several appear valid and unresolved in the current diff, particularly:

  1. The missing callback_url validation in the scheduler (which could cause deployments to hang).
  2. The potential crash if rules contains dictionary objects instead of Pydantic models in _convert_rules_to_new_format.
  3. The mixed use of f-strings vs. structlog keyword arguments (inconsistent with project style).

Please resolve these before merging.

@naaa760
Contributor Author

naaa760 commented Feb 23, 2026

@dkargatzis
Sir, I have applied your feedback: added callback_url to required fields, switched to a generic error message, refactored to _send_deployment_review, used model_dump for violations, and fixed rule conversion and logging. Tests and pre-commit are passing.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/tasks/scheduler/deployment_scheduler.py (1)

447-474: ⚠️ Potential issue | 🟠 Major

_convert_rules_to_new_format only handles dicts — will crash on Pydantic model rules.

Rules added via add_pending_deployment originate from deployment_protection_rule.py Line 169 as Pydantic Rule objects (from the rule loader). This static method calls .get() (Line 465) on each rule, which will raise AttributeError on Pydantic models.

The processor's version (deployment_protection_rule.py Lines 291–318) correctly handles both dicts and objects. The scheduler should do the same, or share a single implementation.

Proposed fix — handle both dicts and model objects
     @staticmethod
     def _convert_rules_to_new_format(rules: list[dict[str, Any]]) -> list[dict[str, Any]]:
         if not rules:
             return []

-        # Check if conversion is needed by inspecting the first rule
-        first_rule = rules[0]
-        if "rule_description" in first_rule and "event_types" not in first_rule:
-            # This looks like the old format
-            logger.info("Converting old rule format to new format")
-            converted_rules = []
-            for rule in rules:
-                converted_rules.append(
-                    {
-                        "description": rule.get("rule_description", ""),
-                        "severity": rule.get("severity", "medium"),
-                        "event_types": rule.get("event_types", ["deployment"]),
-                        "parameters": rule.get("parameters", {}),
-                    }
-                )
-            return converted_rules
-
-        # Already in new format
-        return rules
+        converted_rules = []
+        for rule in rules:
+            if isinstance(rule, dict):
+                converted_rules.append({
+                    "description": rule.get("description", rule.get("rule_description", "")),
+                    "enabled": rule.get("enabled", True),
+                    "severity": rule.get("severity", "medium"),
+                    "event_types": rule.get("event_types", ["deployment"]),
+                    "parameters": rule.get("parameters", {}),
+                })
+            else:
+                converted_rules.append({
+                    "description": getattr(rule, "description", ""),
+                    "enabled": getattr(rule, "enabled", True),
+                    "severity": (
+                        rule.severity.value if hasattr(rule.severity, "value")
+                        else getattr(rule, "severity", "medium")
+                    ),
+                    "event_types": [
+                        et.value if hasattr(et, "value") else et
+                        for et in getattr(rule, "event_types", [])
+                    ],
+                    "parameters": getattr(rule, "parameters", {}) or {},
+                })
+        return converted_rules
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tasks/scheduler/deployment_scheduler.py` around lines 447 - 474, The
_convert_rules_to_new_format static method currently assumes each rule is a dict
and calls .get(), which raises AttributeError for Pydantic Rule model instances;
update the method to handle both dict and model by normalizing each rule before
accessing fields (either call rule.dict()/model_dump() when hasattr(rule,
"dict"/"model_dump") or use getattr(rule, "field", default) / dict.get(...)
fallback), then build the converted dict using normalized keys
("rule_description" -> "description", "severity", "event_types", "parameters")
as in the processor implementation (see deployment_protection_rule.py handling)
so the scheduler accepts both dicts and Pydantic Rule objects.
♻️ Duplicate comments (5)
src/tasks/scheduler/deployment_scheduler.py (5)

313-323: Lines 322–323 are unreachable dead code.

Every element in violations is classified into either time_based_violations or other_violations (Lines 279–302). If violations is non-empty (Line 270 guard), at least one of these lists must be non-empty, so the if other_violations (Line 304) or if time_based_violations (Line 314) branches will always return first.

Proposed fix — remove unreachable path
             deployment["failure_count"] = 0
             if time_based_violations:
                 logger.info(
                     "deployment_scheduler_time_violations",
                     repo=deployment.get("repo"),
                     count=len(time_based_violations),
                 )
                 return False, False
-
-            logger.info("deployment_scheduler_all_resolved", repo=deployment.get("repo"))
-            return True, False
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tasks/scheduler/deployment_scheduler.py` around lines 313 - 323, The code
contains unreachable dead code: after classifying violations into
time_based_violations and other_violations, both branches (the if
other_violations and if time_based_violations) return, making the final return
path unreachable; remove the redundant return path by deleting the always-dead
branch (the else/return that follows these conditionals) and keep a single clear
return for the resolved case (ensure the logger call
"deployment_scheduler_all_resolved" and the return True, False remain reachable
when both lists are empty), updating any references to
deployment["failure_count"] or logging ("deployment_scheduler_time_violations")
if needed to preserve existing behavior.

343-396: _approve_deployment returns None in all paths — caller cannot distinguish success from failure.

This is the root cause of the issue flagged at Lines 175–181. The method should return a bool so the caller can decide whether to remove the deployment from the scheduler.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tasks/scheduler/deployment_scheduler.py` around lines 343 - 396, The
_approve_deployment function currently returns None on all paths so callers
can't tell success vs failure; change its signature/behavior to return a bool:
return True after successful approval (after the retry_async and logger.info in
_approve_deployment), return False when skipping due to missing callback_url or
installation_id (where logger.error is called), and return False on exception
(in the except block where logger.error is called). Ensure the return values are
set around the existing symbols: _approve_deployment,
github_client.review_deployment_protection_rule, retry_async, and the
logger.error/logger.info branches so the caller can decide whether to remove the
deployment from the scheduler.
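The bool-return fix this prompt describes can be sketched as follows. The names `_approve_deployment`, `should_approve`, and `deployments_to_remove` come from the review itself; the simulated `approve_call` and the literal deployment dicts are illustrative stand-ins, not code from the repository:

```python
import asyncio

async def _approve_deployment(deployment, approve_call):
    # Sketch of the suggested fix: return True only on confirmed approval
    # so the caller can decide whether to drop the deployment.
    if not deployment.get("callback_url") or not deployment.get("installation_id"):
        return False
    try:
        await approve_call(deployment)
        return True
    except RuntimeError:
        return False

async def approve_call(dep):
    # Simulated GitHub call that fails permanently for one deployment.
    if dep["installation_id"] == 2:
        raise RuntimeError("GitHub API failed after retries")

async def main():
    deployments = [
        {"callback_url": "https://cb.example", "installation_id": 1},
        {"callback_url": "https://cb.example", "installation_id": 2},
    ]
    deployments_to_remove = []
    for i, dep in enumerate(deployments):
        approved = await _approve_deployment(dep, approve_call)
        if approved:  # only remove once the approval is confirmed
            deployments_to_remove.append(i)
    return deployments_to_remove

removed = asyncio.run(main())
print(removed)  # [0]
```

The failed deployment stays in the scheduler for re-evaluation instead of being silently dropped.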

175-181: ⚠️ Potential issue | 🟠 Major

Deployment removed from scheduler even when GitHub approval API call fails.

_approve_deployment (Lines 343–396) still returns None in all paths — including when the API call fails after retries. Line 179 unconditionally appends to deployments_to_remove, so a failed approval silently drops the deployment from the scheduler while GitHub still shows it as "waiting."

Proposed fix — gate removal on confirmed approval
-    async def _approve_deployment(self, deployment: dict[str, Any]) -> None:
+    async def _approve_deployment(self, deployment: dict[str, Any]) -> bool:
         ...
         try:
             await retry_async(...)
             logger.info("deployment_scheduler_approved", ...)
+            return True
         except Exception as e:
             logger.error("deployment_approve_error", ...)
+            return False

         if should_approve:
-                await self._approve_deployment(deployment)
-                deployments_to_remove.append(i)
+                approved = await self._approve_deployment(deployment)
+                if approved:
+                    deployments_to_remove.append(i)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tasks/scheduler/deployment_scheduler.py` around lines 175 - 181, The
current logic appends an index to deployments_to_remove whenever should_approve
is True without confirming the GitHub approval succeeded, because
_approve_deployment currently returns None on failures; change
_approve_deployment to return a boolean success flag (True on confirmed
approval, False on permanent failure) and update the caller in
deployment_scheduler.py so that after calling await
self._approve_deployment(deployment) you only append i to deployments_to_remove
when the call returns True; keep the existing behavior for should_remove. Ensure
references: _re_evaluate_deployment, _approve_deployment, should_approve,
should_remove, and deployments_to_remove are updated accordingly.

207-216: Same falsy-value bug — not deployment.get(field) rejects empty containers.

rules: [] or event_data: {} would cause this check to treat the field as missing, returning (False, True) and removing the deployment.

Proposed fix
-            missing_fields = [field for field in required_fields if not deployment.get(field)]
+            missing_fields = [field for field in required_fields if field not in deployment or deployment[field] is None]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tasks/scheduler/deployment_scheduler.py` around lines 207 - 216, The
current presence check treats empty containers as missing because it uses "not
deployment.get(field)"; change the logic to only consider a field missing when
it is absent or explicitly None. Update the missing_fields computation in
deployment_scheduler (where required_fields, missing_fields and
deployment.get(field) are used) to detect missing keys with something like
"field not in deployment or deployment[field] is None" so that rules: [] and
event_data: {} are accepted while still flagging truly absent/None fields.

89-92: ⚠️ Potential issue | 🟠 Major

not deployment_data.get(field) treats empty lists/dicts as missing — rules: [] will be rejected.

Line 89 uses truthiness to check for presence, so any field with a falsy but valid value (empty list [], empty dict {}, 0) is treated as missing. This is the same bug previously flagged at Line 209.

Proposed fix
-            missing_fields = [field for field in required_fields if not deployment_data.get(field)]
+            missing_fields = [field for field in required_fields if field not in deployment_data or deployment_data[field] is None]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tasks/scheduler/deployment_scheduler.py` around lines 89 - 92, The
current missing_fields comprehension treats falsy but valid values (e.g., [] or
{}) as missing; update the check to only consider a field missing if it is
absent or explicitly None by replacing the condition with a presence/None check
(e.g., use "deployment_data.get(field) is None" or "field not in deployment_data
or deployment_data[field] is None") when building missing_fields for
required_fields so empty lists/dicts and 0 are accepted; keep the
logger.error("deployment_scheduler_missing_fields", missing=missing_fields) path
unchanged.
🧹 Nitpick comments (4)
src/event_processors/deployment_protection_rule.py (4)

14-14: AGENT_TIMEOUT_SECONDS is duplicated in both files.

This constant is defined identically here and in deployment_scheduler.py (Line 18). Extract it to a shared location (e.g., src/core/constants.py or alongside execute_with_timeout in src/core/utils/timeout.py where it's already the default) to avoid drift.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/event_processors/deployment_protection_rule.py` at line 14,
AGENT_TIMEOUT_SECONDS is duplicated; remove the local definition from
deployment_protection_rule.py and deployment_scheduler.py and centralize it by
exporting the single source of truth where execute_with_timeout already defines
the default (create or use the shared AGENT_TIMEOUT_SECONDS constant there),
then import that constant into both modules and update references to use the
imported symbol AGENT_TIMEOUT_SECONDS.

320-331: Missing type annotations on _format_violations_comment and _get_rule_provider.

Lines 321 and 338 omit return type and parameter annotations, which is inconsistent with the rest of the file.

Proposed fix
     @staticmethod
-    def _format_violations_comment(violations):
+    def _format_violations_comment(violations: list[dict[str, Any]]) -> str:
-    def _get_rule_provider(self):
+    def _get_rule_provider(self) -> Any:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/event_processors/deployment_protection_rule.py` around lines 320 - 331,
Add explicit type annotations for the two functions missing them: annotate
_format_violations_comment to accept violations: List[Dict[str, Any]] and return
str, and annotate _get_rule_provider to declare its parameter (e.g., rule: str)
and a return type (e.g., Optional[Callable[..., Any]] or Optional[str] to match
existing usages). Import required typing symbols (List, Dict, Any, Optional,
Callable) at the top if not already present and ensure the signatures for
_format_violations_comment and _get_rule_provider match the rest of the file's
typing style.

51-63: No early return when callback is invalid — processing continues but all API calls are skipped.

When can_call_callback is False, the processor runs the entire agent pipeline (rule loading, LLM execution) but can never act on the result. Consider returning early with an error to save resources and make the behavior clearer.

Proposed fix
             can_call_callback = self._is_valid_callback_url(deployment_callback_url) and self._is_valid_environment(
                 environment
             )
             if not can_call_callback:
                 logger.warning(
                     "deployment_status_skipped",
-                    extra={
-                        "operation": "deployment_protection_rule",
-                        "deployment_id": deployment_id,
-                        "environment": environment,
-                        "reason": "invalid or missing callback_url or environment",
-                    },
+                    extra={...},
                 )
+                return ProcessingResult(
+                    success=False,
+                    violations=[],
+                    api_calls_made=0,
+                    processing_time_ms=int((time.time() - start_time) * 1000),
+                    error="Invalid or missing callback_url or environment",
+                )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/event_processors/deployment_protection_rule.py` around lines 51 - 63, The
code currently sets can_call_callback using _is_valid_callback_url and
_is_valid_environment then logs a warning but continues processing; change this
to return early (e.g., return an error/skip result or raise a specific
exception) immediately after the logger.warning when can_call_callback is False
to avoid running the agent pipeline unnecessarily. Locate the block that
computes can_call_callback (using _is_valid_callback_url and
_is_valid_environment) and after the logger.warning for
"deployment_status_skipped" return a clear failure/skip value (or raise) that
includes deployment_id and environment so callers can short-circuit further rule
loading and LLM execution. Ensure the rest of the method (the agent pipeline
calls) only runs when can_call_callback is True.

1-16: Uses stdlib logging instead of structlog — inconsistent with the scheduler and project conventions.

deployment_scheduler.py uses structlog with keyword arguments, while this file uses logging with extra={} dicts and %-style formatting (e.g., Lines 116, 149). The reviewer (dkargatzis) also flagged this: consistently use structlog keyword arguments to align with project style.

Proposed fix — switch to structlog
-import logging
 import time
 from typing import Any

+import structlog
+
 from src.agents import get_agent
 from src.core.utils.retry import retry_async
 ...

-logger = logging.getLogger(__name__)
+logger = structlog.get_logger(__name__)

Then replace extra={...} patterns with structlog keyword args, e.g.:

-            logger.info(
-                "deployment_processing_start",
-                extra={
-                    "operation": "deployment_protection_rule",
-                    "deployment_id": deployment_id,
-                    ...
-                },
-            )
+            logger.info(
+                "deployment_processing_start",
+                operation="deployment_protection_rule",
+                deployment_id=deployment_id,
+                ...
+            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/event_processors/deployment_protection_rule.py` around lines 1 - 16, This
file uses the stdlib logging API (import logging and logger =
logging.getLogger(...)) and percent-style/extra={} logging calls; replace with
structlog to match project conventions: import structlog and set logger =
structlog.get_logger(__name__), then update all logging calls in this module
(e.g., the logger usage around the deployment protection processing at the
top-level and the calls flagged in the review) to use structlog keyword-argument
style instead of extra={} or % formatting (e.g., logger.info("msg", key=value) /
logger.error("msg", key=value) and use logger.exception(...) or
logger.error(..., exc_info=err) as appropriate).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/event_processors/deployment_protection_rule.py`:
- Around line 30-36: The callback URL validator _is_valid_callback_url currently
accepts any URL starting with "http" and must be tightened to require "https://"
specifically; update _is_valid_callback_url to verify url is a non-empty str and
that url.strip().lower().startswith("https://") (or use a URL parser to confirm
scheme == "https") while leaving _is_valid_environment unchanged; ensure the
change still handles None/empty inputs the same way.
- Around line 252-280: The retry is ineffective because
github_client.review_deployment_protection_rule swallows errors and returns
None, so _do_send never raises and retry_async won’t retry; modify _do_send (the
inner async function) to call review_deployment_protection_rule, check the
returned value, and raise a clear exception (e.g., RuntimeError or a custom
exception including environment/state info) when the result is None so
retry_async can perform retries; keep the outer try/except and logging
(logger.error "deployment_review_failed"/"deployment_review_error") but ensure
failures that should be retried are raised from _do_send before the retry_async
call completes.
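The raise-on-None pattern this prompt describes can be sketched like so. `retry_async` here is a simplified stand-in for the project's helper, and `review_deployment_protection_rule` is simulated rather than the real GitHub client call:

```python
import asyncio

async def retry_async(fn, max_retries=3, base_delay=0.01):
    # Simplified stand-in for the project's retry helper: it only retries
    # when fn raises, with exponential backoff between attempts.
    for attempt in range(max_retries):
        try:
            return await fn()
        except RuntimeError:
            if attempt == max_retries - 1:
                raise
            await asyncio.sleep(base_delay * 2**attempt)

calls = {"n": 0}

async def review_deployment_protection_rule():
    # Simulates a client that swallows errors and returns None on failure.
    calls["n"] += 1
    return None if calls["n"] < 3 else {"state": "approved"}

async def _do_send():
    result = await review_deployment_protection_rule()
    if result is None:
        # Raising here is what makes retry_async able to retry at all.
        raise RuntimeError("deployment review returned no result")
    return result

result = asyncio.run(retry_async(_do_send))
print(result, calls["n"])  # {'state': 'approved'} 3
```

Without the `raise`, the first `None` would be returned as a "success" and the retry wrapper would never fire.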

In `@src/tasks/scheduler/deployment_scheduler.py`:
- Around line 367-383: The retry is ineffective because
github_client.review_deployment_protection_rule swallows exceptions and returns
None, so _do_approve never raises; update _do_approve to inspect the result of
github_client.review_deployment_protection_rule and raise a transient exception
when it returns None (or an unexpected value) so retry_async can retry,
referencing the _do_approve wrapper, retry_async call, and
github_client.review_deployment_protection_rule; ensure the raised exception
type is included in the retry_async exceptions tuple.

---

Outside diff comments:
In `@src/tasks/scheduler/deployment_scheduler.py`:
- Around line 447-474: The _convert_rules_to_new_format static method currently
assumes each rule is a dict and calls .get(), which raises AttributeError for
Pydantic Rule model instances; update the method to handle both dict and model
by normalizing each rule before accessing fields (either call
rule.dict()/model_dump() when hasattr(rule, "dict"/"model_dump") or use
getattr(rule, "field", default) / dict.get(...) fallback), then build the
converted dict using normalized keys ("rule_description" -> "description",
"severity", "event_types", "parameters") as in the processor implementation (see
deployment_protection_rule.py handling) so the scheduler accepts both dicts and
Pydantic Rule objects.

---

Duplicate comments:
In `@src/tasks/scheduler/deployment_scheduler.py`:
- Around line 313-323: The code contains unreachable dead code: after
classifying violations into time_based_violations and other_violations, both
branches (the if other_violations and if time_based_violations) return, making
the final return path unreachable; remove the redundant return path by deleting
the always-dead branch (the else/return that follows these conditionals) and
keep a single clear return for the resolved case (ensure the logger call
"deployment_scheduler_all_resolved" and the return True, False remain reachable
when both lists are empty), updating any references to
deployment["failure_count"] or logging ("deployment_scheduler_time_violations")
if needed to preserve existing behavior.
- Around line 343-396: The _approve_deployment function currently returns None
on all paths so callers can't tell success vs failure; change its
signature/behavior to return a bool: return True after successful approval
(after the retry_async and logger.info in _approve_deployment), return False
when skipping due to missing callback_url or installation_id (where logger.error
is called), and return False on exception (in the except block where
logger.error is called). Ensure the return values are set around the existing
symbols: _approve_deployment, github_client.review_deployment_protection_rule,
retry_async, and the logger.error/logger.info branches so the caller can decide
whether to remove the deployment from the scheduler.
- Around line 175-181: The current logic appends an index to
deployments_to_remove whenever should_approve is True without confirming the
GitHub approval succeeded, because _approve_deployment currently returns None on
failures; change _approve_deployment to return a boolean success flag (True on
confirmed approval, False on permanent failure) and update the caller in
deployment_scheduler.py so that after calling await
self._approve_deployment(deployment) you only append i to deployments_to_remove
when the call returns True; keep the existing behavior for should_remove. Ensure
references: _re_evaluate_deployment, _approve_deployment, should_approve,
should_remove, and deployments_to_remove are updated accordingly.
- Around line 207-216: The current presence check treats empty containers as
missing because it uses "not deployment.get(field)"; change the logic to only
consider a field missing when it is absent or explicitly None. Update the
missing_fields computation in deployment_scheduler (where required_fields,
missing_fields and deployment.get(field) are used) to detect missing keys with
something like "field not in deployment or deployment[field] is None" so that
rules: [] and event_data: {} are accepted while still flagging truly absent/None
fields.
- Around line 89-92: The current missing_fields comprehension treats falsy but
valid values (e.g., [] or {}) as missing; update the check to only consider a
field missing if it is absent or explicitly None by replacing the condition with
a presence/None check (e.g., use "deployment_data.get(field) is None" or "field
not in deployment_data or deployment_data[field] is None") when building
missing_fields for required_fields so empty lists/dicts and 0 are accepted; keep
the logger.error("deployment_scheduler_missing_fields", missing=missing_fields)
path unchanged.

---

Nitpick comments:
In `@src/event_processors/deployment_protection_rule.py`:
- Line 14: AGENT_TIMEOUT_SECONDS is duplicated; remove the local definition from
deployment_protection_rule.py and deployment_scheduler.py and centralize it by
exporting the single source of truth where execute_with_timeout already defines
the default (create or use the shared AGENT_TIMEOUT_SECONDS constant there),
then import that constant into both modules and update references to use the
imported symbol AGENT_TIMEOUT_SECONDS.
- Around line 320-331: Add explicit type annotations for the two functions
missing them: annotate _format_violations_comment to accept violations:
List[Dict[str, Any]] and return str, and annotate _get_rule_provider to declare
its parameter (e.g., rule: str) and a return type (e.g., Optional[Callable[...,
Any]] or Optional[str] to match existing usages). Import required typing symbols
(List, Dict, Any, Optional, Callable) at the top if not already present and
ensure the signatures for _format_violations_comment and _get_rule_provider
match the rest of the file's typing style.
- Around line 51-63: The code currently sets can_call_callback using
_is_valid_callback_url and _is_valid_environment then logs a warning but
continues processing; change this to return early (e.g., return an error/skip
result or raise a specific exception) immediately after the logger.warning when
can_call_callback is False to avoid running the agent pipeline unnecessarily.
Locate the block that computes can_call_callback (using _is_valid_callback_url
and _is_valid_environment) and after the logger.warning for
"deployment_status_skipped" return a clear failure/skip value (or raise) that
includes deployment_id and environment so callers can short-circuit further rule
loading and LLM execution. Ensure the rest of the method (the agent pipeline
calls) only runs when can_call_callback is True.
- Around line 1-16: This file uses the stdlib logging API (import logging and
logger = logging.getLogger(...)) and percent-style/extra={} logging calls;
replace with structlog to match project conventions: import structlog and set
logger = structlog.get_logger(__name__), then update all logging calls in this
module (e.g., the logger usage around the deployment protection processing at
the top-level and the calls flagged in the review) to use structlog
keyword-argument style instead of extra={} or % formatting (e.g.,
logger.info("msg", key=value) / logger.error("msg", key=value) and use
logger.exception(...) or logger.error(..., exc_info=err) as appropriate).

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3cabf88 and 514c0ec.

📒 Files selected for processing (3)
  • src/event_processors/deployment_protection_rule.py
  • src/tasks/scheduler/deployment_scheduler.py
  • tests/unit/event_processors/test_deployment_protection_rule.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unit/event_processors/test_deployment_protection_rule.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/guidelines.mdc)

**/*.py: Use modern typing only: dict[str, Any], list[str], str | None (no Dict, List, Optional)
GitHub/HTTP/DB calls must be async def; avoid blocking calls (time.sleep, sync HTTP) in async paths
All agent outputs and external payloads must use validated BaseModel from Pydantic
Use dataclasses for internal immutable state where appropriate
Use structured logging at boundaries with fields: operation, subject_ids, decision, latency_ms
Implement Agent pattern: single-responsibility agents with typed inputs/outputs
Use Decorator pattern for retries, metrics, caching as cross-cutting concerns
Agent outputs must include: decision, confidence (0..1), short reasoning, recommendations, strategy_used
Implement confidence policy: reject or route to human-in-the-loop when confidence < 0.5
Use minimal, step-driven prompts; provide Chain-of-Thought only for complexity > 0.7 or ambiguity > 0.6
Strip secrets/PII from agent prompts; scope tools; keep raw reasoning out of logs (store summaries only)
Cache idempotent lookups; lazy-import heavy dependencies; bound fan-out with asyncio.Semaphore
Avoid redundant LLM calls; memoize per event when safe
Use domain errors (e.g., AgentError) with error_type, message, context, timestamp, retry_count
Use exponential backoff for transient failures; circuit-break noisy integrations when needed
Fail closed for risky decisions; provide actionable remediation in error paths
Validate all external inputs; verify webhook signatures
Implement prompt-injection hardening; sanitize repository content passed to LLMs
Performance targets: Static validation ~<100ms typical, hybrid decisions sub-second when cache warm, budget LLM paths thoughtfully
Reject old typing syntax (Dict, List, Optional) in code review
Reject blocking calls in async code; reject bare except: clauses; reject swallowed errors
Reject LLM calls for trivial/deterministic checks
Reject unvalidated agent outputs and missing confidenc...

Files:

  • src/tasks/scheduler/deployment_scheduler.py
  • src/event_processors/deployment_protection_rule.py
🧠 Learnings (5)
📚 Learning: 2026-01-31T19:35:22.504Z
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Applies to **/*.py : Reject unvalidated agent outputs and missing confidence gating

Applied to files:

  • src/tasks/scheduler/deployment_scheduler.py
📚 Learning: 2026-01-31T19:35:22.504Z
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Applies to **/*.py : Agent outputs must include: `decision`, `confidence (0..1)`, short `reasoning`, `recommendations`, `strategy_used`

Applied to files:

  • src/tasks/scheduler/deployment_scheduler.py
📚 Learning: 2026-01-31T19:35:22.504Z
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Applies to **/*.py : Implement confidence policy: reject or route to human-in-the-loop when confidence < 0.5

Applied to files:

  • src/tasks/scheduler/deployment_scheduler.py
📚 Learning: 2026-01-31T19:35:22.504Z
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Applies to **/*.py : Fail closed for risky decisions; provide actionable remediation in error paths

Applied to files:

  • src/tasks/scheduler/deployment_scheduler.py
  • src/event_processors/deployment_protection_rule.py
📚 Learning: 2026-01-31T19:35:22.504Z
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Applies to **/*.py : All agent outputs and external payloads must use validated `BaseModel` from Pydantic

Applied to files:

  • src/tasks/scheduler/deployment_scheduler.py
🧬 Code graph analysis (1)
src/event_processors/deployment_protection_rule.py (7)
src/core/utils/retry.py (1)
  • retry_async (81-135)
src/core/utils/timeout.py (1)
  • execute_with_timeout (15-45)
src/event_processors/base.py (3)
  • BaseEventProcessor (26-61)
  • ProcessingResult (16-23)
  • _get_rule_provider (53-55)
src/tasks/task_queue.py (3)
  • Task (19-47)
  • repo_full_name (33-38)
  • installation_id (41-47)
src/core/models.py (1)
  • repo_full_name (137-139)
src/rules/loaders/github_loader.py (1)
  • get_rules (37-77)
src/integrations/github/api.py (1)
  • review_deployment_protection_rule (628-663)
🔇 Additional comments (7)
src/tasks/scheduler/deployment_scheduler.py (3)

22-37: LGTM — lazy-loading pattern for engine agent.

Clean lazy initialization that avoids API key validation at import time.


134-170: LGTM — created_at handling with UTC-aware timestamps and expiry logic.

The isinstance(created_at, int | float) syntax, UTC-aware conversion, and 7-day expiry are correctly implemented. Good improvement over the previous version.


224-257: Token refresh and agent execution with timeout look solid.

Good use of execute_with_timeout and the failure-count tracking for token errors. One note: the TimeoutError raised by execute_with_timeout will be caught by the outer except Exception at Line 325 and correctly tracked via failure_count.
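The review does not show the helper's body, but `execute_with_timeout` plausibly wraps `asyncio.wait_for`, whose `TimeoutError` would indeed surface to an outer `except Exception`. A minimal sketch under that assumption:

```python
import asyncio

AGENT_TIMEOUT_SECONDS = 30

async def execute_with_timeout(coro, timeout=AGENT_TIMEOUT_SECONDS):
    # Assumed shape: wait_for cancels the awaited coroutine and raises
    # TimeoutError once the budget is exceeded.
    return await asyncio.wait_for(coro, timeout=timeout)

async def slow_agent():
    await asyncio.sleep(10)  # simulates an agent run that exceeds its budget
    return "done"

async def main():
    try:
        await execute_with_timeout(slow_agent(), timeout=0.05)
        return "completed"
    except asyncio.TimeoutError:
        return "timed_out"

outcome = asyncio.run(main())
print(outcome)  # timed_out
```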

src/event_processors/deployment_protection_rule.py (4)

141-151: Good use of model_dump for violation serialization — addresses past review feedback.

Using v.model_dump(mode="json") if hasattr(v, "model_dump") else v is a clean, defensive approach that correctly handles both Pydantic models and plain dicts.
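A self-contained illustration of that defensive pattern; `FakeViolation` is a stand-in for a Pydantic model, used here only so the snippet runs without the real violation type:

```python
class FakeViolation:
    # Minimal stand-in for a Pydantic model; real code would call
    # BaseModel.model_dump(mode="json").
    def __init__(self, **data):
        self._data = data

    def model_dump(self, mode="python"):
        return dict(self._data)

violations = [
    FakeViolation(rule="time_window", severity="high"),  # model-like object
    {"rule": "manual", "severity": "low"},               # already a plain dict
]

# Defensive serialization: models and plain dicts both become JSON-safe dicts.
serialized = [v.model_dump(mode="json") if hasattr(v, "model_dump") else v for v in violations]
print(serialized)
```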


199-234: Fallback approval on exception is a key improvement — directly addresses the PR objective.

The sanitized fallback_comment (Line 216) no longer leaks str(e), addressing the prior review feedback. The flow correctly validates callback_url and environment before attempting the fallback. Based on learnings, this aligns with the "fail closed for risky decisions; provide actionable remediation in error paths" guideline — though note that approving on failure is "fail open," which is the deliberate trade-off here to prevent indefinite blocking.


282-289: Good DRY refactor — _approve_deployment and _reject_deployment now route through _send_deployment_review.

This directly addresses the prior reviewer feedback about 90% shared logic.


291-318: LGTM — _convert_rules_to_new_format handles both dicts and Pydantic models.

Correctly uses isinstance(rule, dict) to branch between .get() and getattr() approaches, with proper guards for enum .value access and conditions fallback.
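A minimal sketch of that dict-or-model branching; the `Rule` class and the field names below are assumptions standing in for the project's Pydantic model:

```python
class Rule:
    # Stand-in for the project's Pydantic Rule model (assumed fields).
    def __init__(self, description, severity):
        self.description = description
        self.severity = severity

def convert_rule(rule):
    # Branch on isinstance(rule, dict) so plain dicts and model instances
    # are both accepted, mirroring the processor's approach.
    if isinstance(rule, dict):
        return {"description": rule.get("description"), "severity": rule.get("severity")}
    return {
        "description": getattr(rule, "description", None),
        "severity": getattr(rule, "severity", None),
    }

print(convert_rule({"description": "no Friday deploys", "severity": "high"}))
print(convert_rule(Rule("require approval", "medium")))
```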

Comment on lines +30 to +36
@staticmethod
def _is_valid_callback_url(url: str | None) -> bool:
return bool(url and isinstance(url, str) and url.strip().startswith("http"))

@staticmethod
def _is_valid_environment(env: str | None) -> bool:
return bool(env and isinstance(env, str) and env.strip())
⚠️ Potential issue | 🟡 Minor

Callback URL validator allows plain http:// — GitHub callbacks should always be HTTPS.

url.strip().startswith("http") matches both http:// and https://. For security, GitHub deployment callback URLs should use HTTPS.

Proposed fix
     @staticmethod
     def _is_valid_callback_url(url: str | None) -> bool:
-        return bool(url and isinstance(url, str) and url.strip().startswith("http"))
+        return bool(url and isinstance(url, str) and url.strip().startswith("https://"))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/event_processors/deployment_protection_rule.py` around lines 30 - 36, The
callback URL validator _is_valid_callback_url currently accepts any URL starting
with "http" and must be tightened to require "https://" specifically; update
_is_valid_callback_url to verify url is a non-empty str and that
url.strip().lower().startswith("https://") (or use a URL parser to confirm
scheme == "https") while leaving _is_valid_environment unchanged; ensure the
change still handles None/empty inputs the same way.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (1)
src/event_processors/deployment_protection_rule.py (1)

30-36: ⚠️ Potential issue | 🟡 Minor

Callback URL validator still allows plain http:// — GitHub callbacks should use HTTPS.

The validator url.strip().startswith("http") matches both http:// and https://. This was flagged in a previous review. For security, GitHub deployment callback URLs should require HTTPS to prevent man-in-the-middle attacks.

Proposed fix
     @staticmethod
     def _is_valid_callback_url(url: str | None) -> bool:
-        return bool(url and isinstance(url, str) and url.strip().startswith("http"))
+        return bool(url and isinstance(url, str) and url.strip().startswith("https://"))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/event_processors/deployment_protection_rule.py` around lines 30 - 36, The
callback URL validator _is_valid_callback_url currently permits both http and
https; update it to require HTTPS only by checking url is a non-empty string and
stripped lowercased value starts with "https://". Locate the
_is_valid_callback_url static method and replace the startswith("http") check
with a check like url.strip().lower().startswith("https://"), keeping the
existing None/type guards; no other behavior should change.
🧹 Nitpick comments (1)
tests/unit/event_processors/test_deployment_protection_rule.py (1)

159-170: Consider asserting exact retry count.

The test verifies call_count >= 1, but since retry_async is configured with max_retries=3, asserting call_count == 3 would verify the retry behavior more precisely.

Proposed fix
     assert result.success is False
     assert result.error == "agent failed"
-    assert processor.github_client.review_deployment_protection_rule.call_count >= 1
+    assert processor.github_client.review_deployment_protection_rule.call_count == 3
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/event_processors/test_deployment_protection_rule.py` around lines
159 - 170, The test currently asserts
processor.github_client.review_deployment_protection_rule.call_count >= 1 which
doesn't verify retry behavior; change that assertion in
test_retry_exhaustion_returns_failure so it asserts the exact retry count
matching the retry_async configuration (assert
processor.github_client.review_deployment_protection_rule.call_count == 3) to
ensure the code retried the expected number of times when mock_agent.execute
raises RuntimeError in process and review_deployment_protection_rule returned
None.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/event_processors/deployment_protection_rule.py`:
- Around line 281-288: Both _approve_deployment and _reject_deployment currently
call _send_deployment_review and ignore its return value, so failures after
retries are swallowed; change both methods to return the result of
_send_deployment_review (i.e., return await self._send_deployment_review(...))
instead of awaiting without returning, and ensure callers (and the exception
handler path that may perform fallback approval) check the returned value and
propagate or raise on a falsy/failed result so the deployment state isn't left
waiting silently.

In `@tests/unit/event_processors/test_deployment_protection_rule.py`:
- Around line 1-14: The test file was reformatted by ruff-format; update the
repository by replacing the existing
tests/unit/event_processors/test_deployment_protection_rule.py with the
ruff-formatted version and commit those changes so the pre-commit formatting is
preserved; ensure the reformatted file still imports the same symbols
(AGENT_TIMEOUT_SECONDS, DeploymentProtectionRuleProcessor, EventType, Rule,
RuleSeverity, Task) and run the test/format hooks locally before pushing.
- Around line 125-130: The test currently expects
DeploymentProtectionRuleProcessor._is_valid_callback_url("http://localhost/callback")
to be True, which conflicts with the security change to require HTTPS; update
the test to reflect the new policy by changing the assertion to expect False for
"http://localhost/callback" (or, if you need to allow HTTP for local dev, add a
brief comment and a separate test case that documents the exception and uses an
explicit flag or config to permit non-HTTPS) so the behavior of
_is_valid_callback_url aligns with the HTTPS-only requirement.

---

Duplicate comments:
In `@src/event_processors/deployment_protection_rule.py`:
- Around line 30-36: The callback URL validator _is_valid_callback_url currently
permits both http and https; update it to require HTTPS only by checking url is
a non-empty string and stripped lowercased value starts with "https://". Locate
the _is_valid_callback_url static method and replace the startswith("http")
check with a check like url.strip().lower().startswith("https://"), keeping the
existing None/type guards; no other behavior should change.

---

Nitpick comments:
In `@tests/unit/event_processors/test_deployment_protection_rule.py`:
- Around line 159-170: The test currently asserts
processor.github_client.review_deployment_protection_rule.call_count >= 1 which
doesn't verify retry behavior; change that assertion in
test_retry_exhaustion_returns_failure so it asserts the exact retry count
matching the retry_async configuration (assert
processor.github_client.review_deployment_protection_rule.call_count == 3) to
ensure the code retried the expected number of times when mock_agent.execute
raises RuntimeError in process and review_deployment_protection_rule returned
None.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 514c0ec and 5f50c21.

📒 Files selected for processing (3)
  • src/event_processors/deployment_protection_rule.py
  • src/tasks/scheduler/deployment_scheduler.py
  • tests/unit/event_processors/test_deployment_protection_rule.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (.cursor/rules/guidelines.mdc)

**/*.py: Use modern typing only: dict[str, Any], list[str], str | None (no Dict, List, Optional)
GitHub/HTTP/DB calls must be async def; avoid blocking calls (time.sleep, sync HTTP) in async paths
All agent outputs and external payloads must use validated BaseModel from Pydantic
Use dataclasses for internal immutable state where appropriate
Use structured logging at boundaries with fields: operation, subject_ids, decision, latency_ms
Implement Agent pattern: single-responsibility agents with typed inputs/outputs
Use Decorator pattern for retries, metrics, caching as cross-cutting concerns
Agent outputs must include: decision, confidence (0..1), short reasoning, recommendations, strategy_used
Implement confidence policy: reject or route to human-in-the-loop when confidence < 0.5
Use minimal, step-driven prompts; provide Chain-of-Thought only for complexity > 0.7 or ambiguity > 0.6
Strip secrets/PII from agent prompts; scope tools; keep raw reasoning out of logs (store summaries only)
Cache idempotent lookups; lazy-import heavy dependencies; bound fan-out with asyncio.Semaphore
Avoid redundant LLM calls; memoize per event when safe
Use domain errors (e.g., AgentError) with error_type, message, context, timestamp, retry_count
Use exponential backoff for transient failures; circuit-break noisy integrations when needed
Fail closed for risky decisions; provide actionable remediation in error paths
Validate all external inputs; verify webhook signatures
Implement prompt-injection hardening; sanitize repository content passed to LLMs
Performance targets: Static validation ~<100ms typical, hybrid decisions sub-second when cache warm, budget LLM paths thoughtfully
Reject old typing syntax (Dict, List, Optional) in code review
Reject blocking calls in async code; reject bare except: clauses; reject swallowed errors
Reject LLM calls for trivial/deterministic checks
Reject unvalidated agent outputs and missing confidence gating

Files:

  • src/tasks/scheduler/deployment_scheduler.py
  • tests/unit/event_processors/test_deployment_protection_rule.py
  • src/event_processors/deployment_protection_rule.py
tests/**/*.py

📄 CodeRabbit inference engine (.cursor/rules/guidelines.mdc)

tests/**/*.py: Write unit tests for deterministic rule evaluation (pass/warn/block), model validation, and error paths
Write integration tests for webhook parsing, idempotency, multi-agent coordination, and state persistence
Use pytest.mark.asyncio for async tests; avoid live network calls; freeze time and seed randomness
Write regression tests for every bug fix; keep CI coverage thresholds green

Files:

  • tests/unit/event_processors/test_deployment_protection_rule.py
🧠 Learnings (8)
📚 Learning: 2026-01-31T19:35:22.504Z
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Applies to **/*.py : Reject unvalidated agent outputs and missing confidence gating

Applied to files:

  • src/tasks/scheduler/deployment_scheduler.py
📚 Learning: 2026-01-31T19:35:22.504Z
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Applies to **/*.py : Agent outputs must include: `decision`, `confidence (0..1)`, short `reasoning`, `recommendations`, `strategy_used`

Applied to files:

  • src/tasks/scheduler/deployment_scheduler.py
📚 Learning: 2026-01-31T19:35:22.504Z
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Applies to **/*.py : Implement confidence policy: reject or route to human-in-the-loop when confidence < 0.5

Applied to files:

  • src/tasks/scheduler/deployment_scheduler.py
📚 Learning: 2026-01-31T19:35:22.504Z
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Applies to **/*.py : Fail closed for risky decisions; provide actionable remediation in error paths

Applied to files:

  • src/tasks/scheduler/deployment_scheduler.py
  • src/event_processors/deployment_protection_rule.py
📚 Learning: 2026-01-31T19:35:22.504Z
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Applies to **/*.py : All agent outputs and external payloads must use validated `BaseModel` from Pydantic

Applied to files:

  • src/tasks/scheduler/deployment_scheduler.py
📚 Learning: 2026-01-31T19:35:22.504Z
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Applies to tests/**/*.py : Write unit tests for deterministic rule evaluation (pass/warn/block), model validation, and error paths

Applied to files:

  • tests/unit/event_processors/test_deployment_protection_rule.py
📚 Learning: 2026-01-31T19:35:22.504Z
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Applies to tests/**/*.py : Write integration tests for webhook parsing, idempotency, multi-agent coordination, and state persistence

Applied to files:

  • tests/unit/event_processors/test_deployment_protection_rule.py
📚 Learning: 2026-01-31T19:35:22.504Z
Learnt from: CR
Repo: warestack/watchflow PR: 0
File: .cursor/rules/guidelines.mdc:0-0
Timestamp: 2026-01-31T19:35:22.504Z
Learning: Applies to tests/**/*.py : Write regression tests for every bug fix; keep CI coverage thresholds green

Applied to files:

  • tests/unit/event_processors/test_deployment_protection_rule.py
🧬 Code graph analysis (2)
src/tasks/scheduler/deployment_scheduler.py (4)
src/core/utils/retry.py (1)
  • retry_async (81-135)
src/core/utils/timeout.py (1)
  • execute_with_timeout (15-45)
src/event_processors/deployment_protection_rule.py (1)
  • _convert_rules_to_new_format (290-317)
src/integrations/github/api.py (1)
  • get_installation_access_token (98-128)
tests/unit/event_processors/test_deployment_protection_rule.py (1)
src/event_processors/deployment_protection_rule.py (4)
  • DeploymentProtectionRuleProcessor (17-340)
  • process (38-234)
  • _is_valid_callback_url (31-32)
  • _is_valid_environment (35-36)
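The code graph above references `retry_async (81-135)` from `src/core/utils/retry.py`; that file is not shown in this review, but an exponential-backoff helper of roughly this shape (signature and defaults are assumptions) illustrates the pattern being discussed:

```python
import asyncio
import random

async def retry_async(fn, *, max_retries: int = 3, base_delay: float = 0.5):
    """Retry an async callable with exponential backoff plus jitter.

    Illustrative sketch only; the project's retry_async may differ.
    """
    last_exc: Exception | None = None
    for attempt in range(max_retries):
        try:
            return await fn()
        except Exception as exc:  # the real helper likely narrows the exception types
            last_exc = exc
            if attempt < max_retries - 1:
                # base, 2x base, 4x base, ... with jitter to avoid thundering herds
                await asyncio.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    raise last_exc  # exhausted: surface the final failure to the caller
```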
🪛 GitHub Actions: Run pre-commit hooks
tests/unit/event_processors/test_deployment_protection_rule.py

[error] 1-1: ruff-format reformatted 1 file. 1 file left unchanged. Changes were applied by the pre-commit hook.

🔇 Additional comments (21)
src/event_processors/deployment_protection_rule.py (6)

51-73: LGTM!

The validation pattern correctly computes can_call_callback early and uses it to gate API calls throughout the method. Logging the warning while continuing processing allows rule evaluation to complete even when the callback cannot be invoked.


131-148: LGTM!

The timeout-wrapped agent execution and concise model_dump(mode="json") conversion address the PR objectives and past review feedback effectively.
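`execute_with_timeout` itself is not shown in this diff; under the assumption that it wraps `asyncio.wait_for`, the pattern looks roughly like this (names mirror the review's references; the real helper in `src/core/utils/timeout.py` may differ):

```python
import asyncio

AGENT_TIMEOUT_SECONDS = 30

async def execute_with_timeout(coro, timeout: float = AGENT_TIMEOUT_SECONDS):
    """Run a coroutine with a deadline; return (result, True) or (None, False) on timeout."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout), True
    except asyncio.TimeoutError:
        # Timed out: the caller can fall back to approval instead of blocking forever.
        return None, False
```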


159-180: LGTM!

The time-based violation handling correctly schedules deployments for re-evaluation while rejecting them in the current cycle. This ensures deployments aren't left waiting indefinitely.


199-234: LGTM!

The exception handling implements the PR objective of fallback approval to prevent indefinite blocking. The generic error message in fallback_comment (Line 216) addresses the previous security concern about leaking internal details.


290-317: LGTM!

The rule conversion handles both dict and Pydantic model representations defensively, addressing backward compatibility concerns.


319-329: LGTM!

The violations comment formatting is clear and handles missing fields gracefully with .get() fallbacks.

tests/unit/event_processors/test_deployment_protection_rule.py (6)

16-56: LGTM!

The test fixtures are well-structured with proper mocking of external dependencies. Using a real Rule instance with EventType.DEPLOYMENT ensures tests exercise actual rule filtering logic.


59-92: LGTM!

These tests effectively verify the fallback approval behavior on exceptions, addressing the core PR objective. The rule setup (Line 61, 86) ensures tests reach the agent execution path.


95-122: LGTM!

This test correctly verifies that an invalid callback URL prevents API calls even when rules exist and violations are found, addressing the past review feedback.


139-156: LGTM!

This test verifies the timeout handling behavior, ensuring fallback approval is triggered when the agent exceeds AGENT_TIMEOUT_SECONDS.


173-201: LGTM!

This test effectively verifies that time-based violations are forwarded to the deployment scheduler with the correct metadata, addressing the test coverage gap identified in past reviews.


204-226: LGTM!

This test verifies the rejection flow when violations are present. The assertion correctly checks the API call state and comment content.

src/tasks/scheduler/deployment_scheduler.py (9)

18-19: LGTM!

The module-level constants AGENT_TIMEOUT_SECONDS and MAX_CONSECUTIVE_FAILURES provide clear configuration for timeout and failure handling, addressing the PR objectives.


87-92: LGTM!

Adding callback_url to required fields addresses the critical review comment. The validation at Line 89 correctly checks for key presence rather than falsy values.


207-223: LGTM!

The validation now correctly checks for key presence (f not in deployment) rather than falsy values, addressing the past review concern about empty lists/dicts being treated as missing.


175-194: LGTM!

The approval flow now correctly handles API failures by only removing deployments from the scheduler when _approve_deployment returns True. This addresses the past review concern about deployments being silently dropped.


266-280: Missing confidence gating per coding guidelines.

Per coding guidelines, agent outputs should include a confidence (0..1) field, and decisions with confidence < 0.5 should be rejected or routed to human-in-the-loop. The current implementation processes violations without validating confidence, which could lead to unreliable automated decisions.

This was flagged in a past review but appears unaddressed. Consider this a recommended improvement that could be deferred if agent output changes require broader refactoring.


328-344: LGTM!

The exception handling now tracks consecutive failures and removes deployments after MAX_CONSECUTIVE_FAILURES, preventing the "stuck in waiting" state that this PR aims to fix.
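The failure tracking described above reduces to a counter per deployment; a sketch (the key and constant names are assumptions based on the review text):

```python
MAX_CONSECUTIVE_FAILURES = 3

def record_failure(deployment: dict) -> bool:
    """Bump a deployment's failure counter; True means it should be removed."""
    deployment["consecutive_failures"] = deployment.get("consecutive_failures", 0) + 1
    return deployment["consecutive_failures"] >= MAX_CONSECUTIVE_FAILURES
```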


346-404: LGTM!

The _approve_deployment method correctly raises on None result to enable retries, returns a boolean to indicate success, and includes proper validation and logging. This addresses past review concerns about retry ineffectiveness and silent failure.


406-443: LGTM!

The get_status method now uses UTC-aware timestamps and modern int | float union syntax, addressing past review concerns about naive datetimes and the UP038 linting rule.


455-482: LGTM!

The rule conversion provides backward compatibility for the scheduler's stored deployment data format.

Comment on lines 281 to 288
    async def _approve_deployment(self, callback_url: str, environment: str, comment: str, installation_id: int):
        await self._send_deployment_review(callback_url, environment, "approved", comment, installation_id)

    async def _reject_deployment(
        self, callback_url: str, environment: str, violations: list[dict[str, Any]], installation_id: int
    ):
        comment_text = self._format_violations_comment(violations)
        await self._send_deployment_review(callback_url, environment, "rejected", comment_text, installation_id)

⚠️ Potential issue | 🟠 Major

_approve_deployment and _reject_deployment ignore failure result.

Both methods call _send_deployment_review but don't use its return value. If the review API call fails after retries, callers won't know. In the exception handler (Lines 217-219), the fallback approval could silently fail, leaving the deployment in "waiting" state — the exact problem this PR aims to fix.

Proposed fix — propagate failure to callers
-    async def _approve_deployment(self, callback_url: str, environment: str, comment: str, installation_id: int):
-        await self._send_deployment_review(callback_url, environment, "approved", comment, installation_id)
+    async def _approve_deployment(self, callback_url: str, environment: str, comment: str, installation_id: int) -> bool:
+        return await self._send_deployment_review(callback_url, environment, "approved", comment, installation_id)

-    async def _reject_deployment(
+    async def _reject_deployment(
         self, callback_url: str, environment: str, violations: list[dict[str, Any]], installation_id: int
-    ):
+    ) -> bool:
         comment_text = self._format_violations_comment(violations)
-        await self._send_deployment_review(callback_url, environment, "rejected", comment_text, installation_id)
+        return await self._send_deployment_review(callback_url, environment, "rejected", comment_text, installation_id)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

    async def _approve_deployment(self, callback_url: str, environment: str, comment: str, installation_id: int) -> bool:
        return await self._send_deployment_review(callback_url, environment, "approved", comment, installation_id)

    async def _reject_deployment(
        self, callback_url: str, environment: str, violations: list[dict[str, Any]], installation_id: int
    ) -> bool:
        comment_text = self._format_violations_comment(violations)
        return await self._send_deployment_review(callback_url, environment, "rejected", comment_text, installation_id)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/event_processors/deployment_protection_rule.py` around lines 281 - 288,
Both _approve_deployment and _reject_deployment currently call
_send_deployment_review and ignore its return value, so failures after retries
are swallowed; change both methods to return the result of
_send_deployment_review (i.e., return await self._send_deployment_review(...))
instead of awaiting without returning, and ensure callers (and the exception
handler path that may perform fallback approval) check the returned value and
propagate or raise on a falsy/failed result so the deployment state isn't left
waiting silently.

Comment on lines +1 to +14
"""Unit tests for DeploymentProtectionRuleProcessor."""

from unittest.mock import AsyncMock, MagicMock, patch

import pytest

from src.core.models import EventType
from src.event_processors.deployment_protection_rule import (
    AGENT_TIMEOUT_SECONDS,
    DeploymentProtectionRuleProcessor,
)
from src.rules.models import Rule, RuleSeverity
from src.tasks.task_queue import Task


⚠️ Potential issue | 🟡 Minor

Pipeline failure: ruff-format reformatted this file.

The pre-commit hook applied formatting changes. Ensure the reformatted version is committed.

🧰 Tools
🪛 GitHub Actions: Run pre-commit hooks

[error] 1-1: ruff-format reformatted 1 file. 1 file left unchanged. Changes were applied by the pre-commit hook.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/event_processors/test_deployment_protection_rule.py` around lines
1 - 14, The test file was reformatted by ruff-format; update the repository by
replacing the existing
tests/unit/event_processors/test_deployment_protection_rule.py with the
ruff-formatted version and commit those changes so the pre-commit formatting is
preserved; ensure the reformatted file still imports the same symbols
(AGENT_TIMEOUT_SECONDS, DeploymentProtectionRuleProcessor, EventType, Rule,
RuleSeverity, Task) and run the test/format hooks locally before pushing.

Comment on lines +125 to +130
def test_is_valid_callback_url():
    assert DeploymentProtectionRuleProcessor._is_valid_callback_url("https://api.github.com/callback") is True
    assert DeploymentProtectionRuleProcessor._is_valid_callback_url("http://localhost/callback") is True
    assert DeploymentProtectionRuleProcessor._is_valid_callback_url("") is False
    assert DeploymentProtectionRuleProcessor._is_valid_callback_url(None) is False
    assert DeploymentProtectionRuleProcessor._is_valid_callback_url("ftp://invalid") is False

⚠️ Potential issue | 🟡 Minor

Test asserts http:// URLs are valid — conflicts with security recommendation.

Line 127 asserts that http://localhost/callback returns True. If _is_valid_callback_url is updated to require HTTPS (as recommended), this test will fail. Either update the test to expect False for http:// URLs, or document why http:// is intentionally allowed (e.g., for local development).

Proposed fix if HTTPS is required
 def test_is_valid_callback_url():
     assert DeploymentProtectionRuleProcessor._is_valid_callback_url("https://api.github.com/callback") is True
-    assert DeploymentProtectionRuleProcessor._is_valid_callback_url("http://localhost/callback") is True
+    assert DeploymentProtectionRuleProcessor._is_valid_callback_url("http://localhost/callback") is False
     assert DeploymentProtectionRuleProcessor._is_valid_callback_url("") is False
     assert DeploymentProtectionRuleProcessor._is_valid_callback_url(None) is False
     assert DeploymentProtectionRuleProcessor._is_valid_callback_url("ftp://invalid") is False
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/event_processors/test_deployment_protection_rule.py` around lines
125 - 130, The test currently expects
DeploymentProtectionRuleProcessor._is_valid_callback_url("http://localhost/callback")
to be True, which conflicts with the security change to require HTTPS; update
the test to reflect the new policy by changing the assertion to expect False for
"http://localhost/callback" (or, if you need to allow HTTP for local dev, add a
brief comment and a separate test case that documents the exception and uses an
explicit flag or config to permit non-HTTPS) so the behavior of
_is_valid_callback_url aligns with the HTTPS-only requirement.

@dkargatzis dkargatzis merged commit bf5c971 into warestack:main Feb 26, 2026
3 checks passed
dkargatzis added a commit that referenced this pull request Feb 26, 2026
- Migrate deployment_protection_rule.py from stdlib logging to structlog
- Add AGENT_TIMEOUT_SECONDS constant for timeout configuration
- Add defensive getattr() calls for violation object attributes
- Migrate deployment_scheduler.py from stdlib logging to structlog
- Add callback_url to required_fields validation
- Fix falsy checks to is None checks for proper validation
- Fix timestamp handling to use timezone-aware datetime (UTC)
- Improve structured logging with consistent fields
- Fix UP038 isinstance and B011 assert False lint issues

Closes #49

Signed-off-by: Dimitris Kargatzis <dkargatzis@gmail.com>
Development

Successfully merging this pull request may close these issues.

fix: resolve deployment status blocking without clear reason
