Skip to content

Add actionlint, zizmor, gitleaks, and refurb gates to check.sh#30

Merged
alexkroman merged 5 commits into
mainfrom
claude/check-sh-quality-tNO0n
Jun 6, 2026
Merged

Add actionlint, zizmor, gitleaks, and refurb gates to check.sh#30
alexkroman merged 5 commits into
mainfrom
claude/check-sh-quality-tNO0n

Conversation

@alexkroman

Copy link
Copy Markdown
Collaborator

Extend the CI gate with four checks that target classes of issue the existing
tools don't cover:

  • actionlint: lints .github/workflows for bad expressions, undefined
    needs/matrix refs, and shell bugs in run: blocks (the workflow analogue of
    the existing shellcheck gate). Go binary -> self-skips locally, CI installs it.
  • zizmor: audits the workflows for CI security issues (script injection via
    untrusted ${{ github.* }} interpolation, over-broad token scopes, unpinned
    actions). Pip-installable, so it runs in the locked env as a hard gate;
    --offline keeps it deterministic.
  • gitleaks: secret scan defending the project's promise that credentials never
    land in the repo. Fake test/doc fixtures are allowlisted in .gitleaks.toml.
    Go binary -> self-skips locally, CI installs it.
  • refurb: modernization lint complementing ruff's UP rules. FURB123/159/184 are
    disabled in [tool.refurb] because they misfire here (FURB123 flags intentional
    buffer/tuple conversions; FURB159's rstrip rewrite changes behavior; FURB184
    would inline named intermediates kept for readability). Runs via uv run so
    PEP 695 generics parse under the project's 3.12+ interpreter.

Also apply the genuinely-correct refurb suggestions: contextlib.suppress for the
follow-mode KeyboardInterrupt, dict-union over {**a, ...} spreads, and
list.extend over paired appends.

https://claude.ai/code/session_01GJtgtguBfPxjk5JVcT19FG

claude added 5 commits June 6, 2026 21:30
Extend the CI gate with four checks that target classes of issue the existing
tools don't cover:

- actionlint: lints .github/workflows for bad expressions, undefined
  needs/matrix refs, and shell bugs in `run:` blocks (the workflow analogue of
  the existing shellcheck gate). Go binary -> self-skips locally, CI installs it.
- zizmor: audits the workflows for CI security issues (script injection via
  untrusted ${{ github.* }} interpolation, over-broad token scopes, unpinned
  actions). Pip-installable, so it runs in the locked env as a hard gate;
  --offline keeps it deterministic.
- gitleaks: secret scan defending the project's promise that credentials never
  land in the repo. Fake test/doc fixtures are allowlisted in .gitleaks.toml.
  Go binary -> self-skips locally, CI installs it.
- refurb: modernization lint complementing ruff's UP rules. FURB123/159/184 are
  disabled in [tool.refurb] because they misfire here (FURB123 flags intentional
  buffer/tuple conversions; FURB159's rstrip rewrite changes behavior; FURB184
  would inline named intermediates kept for readability). Runs via `uv run` so
  PEP 695 generics parse under the project's 3.12+ interpreter.

Also apply the genuinely-correct refurb suggestions: contextlib.suppress for the
follow-mode KeyboardInterrupt, dict-union over {**a, ...} spreads, and
list.extend over paired appends.

https://claude.ai/code/session_01GJtgtguBfPxjk5JVcT19FG
FileSource.__iter__ delegates to the _ffmpeg_chunks() generator with a plain
`for` loop. Closing the outer generator early (the streaming code stops pulling
chunks) raises GeneratorExit at its `yield`, but a plain loop does not forward
that into the inner generator, so ffmpeg's terminate/wait/close cleanup ran only
at GC instead of synchronously — leaking the child process until collection and
defeating the KeyboardInterrupt-during-wait handling. This regressed in the #29
refactor that split _ffmpeg_chunks() out of __iter__.

Wrap the delegation in try/finally and close the inner generator explicitly so
the subprocess teardown happens deterministically on early stop. The two
test_streaming_sources cleanup tests pass again.
Extend the "no new static-analysis escape hatches" diff gate with two test-suite
shortcuts, same net-new-only policy as the existing type-ignore/noqa checks:

- pytest.skip()/xfail() and @pytest.mark.skip/xfail: how a change makes a red
  test disappear instead of fixing it.
- time.sleep() in tests/: the classic flakiness source (prefer events/polling).

The legitimate existing skips guard the env-gated marker suites (e2e, install,
install_script) and the two real time.sleep() synchronizations all live on
origin/main, so they aren't added diff lines and don't trip the gate; a
genuinely-needed new one must update the gate deliberately. Scoped to tests/ so
production sleeps (e.g. real-time pacing) are unaffected. The matcher ignores the
injectable `sleep=`/`self._sleep(` test idiom.
Coverage proves a changed line executed; it can't prove a test would fail if that
line broke (the earlier ffmpeg-cleanup bug passed at 100% coverage). This adds a
mutation gate that closes that gap, scoped to the diff so it stays fast and only
judges changed code.

scripts/mutation_gate.py:
- Finds added lines in aai_cli/*.py vs the merge-base with origin/main.
- Mutates only those lines with a high-signal operator set (comparison negation,
  and/or swap, arithmetic swap, boolean/numeric constants) — `not`-removal and
  statement deletion are omitted to avoid equivalent mutants and parent rewiring.
- Reruns only the tests that cover each mutant, read from per-test coverage
  contexts in .coverage, so it's fast; a hang counts as killed.
- Fails on survivors. A genuinely-equivalent or intentionally-unasserted line can
  be marked `# pragma: no mutate` (scanned across the statement's whole span so a
  formatter-wrapped comment still counts).

check.sh: the main pytest run now writes per-test contexts (--cov-context=test);
the gate runs after diff-cover and self-skips when origin/main is absent, exactly
like diff-cover. coverage becomes an explicit dev dep (used directly by the gate;
DEP004-ignored since scripts/ is dev-only tooling never shipped in the wheel).

Two changed lines in sources.py are marked `# pragma: no mutate`: `produced +=`
is only tested for == 0 (sign is an equivalent mutant), and the real-time pacing
math is deliberately not asserted (tests inject a no-op sleep).
Aikido's license gate flagged refurb as GPL-3.0-only, which conflicts with this
repo's MIT license. refurb was a dev-only lint tool invoked as a subprocess (no
copyleft obligation on our code in practice), but keeping the dependency tree
GPL-free for a distributed OSS CLI is the cleaner call and avoids a recurring
audit flag. Its value over ruff's UP rules was marginal, and the concrete
rewrites it already suggested (contextlib.suppress, dict-union, list.extend)
remain in the code on their own.

Drops the dev dependency, the [tool.refurb] config, and the check.sh step.
zizmor (MIT) and coverage (Apache-2.0) are unaffected.
@alexkroman alexkroman merged commit df0e264 into main Jun 6, 2026
10 checks passed
@alexkroman alexkroman deleted the claude/check-sh-quality-tNO0n branch June 6, 2026 22:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants