Add actionlint, zizmor, gitleaks, and refurb gates to check.sh#30
Merged
Conversation
Extend the CI gate with four checks that target classes of issue the existing
tools don't cover:
- actionlint: lints .github/workflows for bad expressions, undefined
needs/matrix refs, and shell bugs in `run:` blocks (the workflow analogue of
the existing shellcheck gate). Go binary -> self-skips locally, CI installs it.
- zizmor: audits the workflows for CI security issues (script injection via
untrusted ${{ github.* }} interpolation, over-broad token scopes, unpinned
actions). Pip-installable, so it runs in the locked env as a hard gate;
--offline keeps it deterministic.
- gitleaks: secret scan defending the project's promise that credentials never
land in the repo. Fake test/doc fixtures are allowlisted in .gitleaks.toml.
Go binary -> self-skips locally, CI installs it.
- refurb: modernization lint complementing ruff's UP rules. FURB123/159/184 are
disabled in [tool.refurb] because they misfire here (FURB123 flags intentional
buffer/tuple conversions; FURB159's rstrip rewrite changes behavior; FURB184
would inline named intermediates kept for readability). Runs via `uv run` so
PEP 695 generics parse under the project's 3.12+ interpreter.
Also apply the genuinely-correct refurb suggestions: contextlib.suppress for the
follow-mode KeyboardInterrupt, dict-union over {**a, ...} spreads, and
list.extend over paired appends.
https://claude.ai/code/session_01GJtgtguBfPxjk5JVcT19FG
FileSource.__iter__ delegates to the _ffmpeg_chunks() generator with a plain `for` loop. Closing the outer generator early (the streaming code stops pulling chunks) raises GeneratorExit at its `yield`, but a plain loop does not forward that into the inner generator, so ffmpeg's terminate/wait/close cleanup ran only at GC instead of synchronously — leaking the child process until collection and defeating the KeyboardInterrupt-during-wait handling. This regressed in the #29 refactor that split _ffmpeg_chunks() out of __iter__. Wrap the delegation in try/finally and close the inner generator explicitly so the subprocess teardown happens deterministically on early stop. The two test_streaming_sources cleanup tests pass again.
Extend the "no new static-analysis escape hatches" diff gate with two test-suite shortcuts, same net-new-only policy as the existing type-ignore/noqa checks: - pytest.skip()/xfail() and @pytest.mark.skip/xfail: how a change makes a red test disappear instead of fixing it. - time.sleep() in tests/: the classic flakiness source (prefer events/polling). The legitimate existing skips guard the env-gated marker suites (e2e, install, install_script) and the two real time.sleep() synchronizations all live on origin/main, so they aren't added diff lines and don't trip the gate; a genuinely-needed new one must update the gate deliberately. Scoped to tests/ so production sleeps (e.g. real-time pacing) are unaffected. The matcher ignores the injectable `sleep=`/`self._sleep(` test idiom.
Coverage proves a changed line executed; it can't prove a test would fail if that line broke (the earlier ffmpeg-cleanup bug passed at 100% coverage). This adds a mutation gate that closes that gap, scoped to the diff so it stays fast and only judges changed code. scripts/mutation_gate.py: - Finds added lines in aai_cli/*.py vs the merge-base with origin/main. - Mutates only those lines with a high-signal operator set (comparison negation, and/or swap, arithmetic swap, boolean/numeric constants) — `not`-removal and statement deletion are omitted to avoid equivalent mutants and parent rewiring. - Reruns only the tests that cover each mutant, read from per-test coverage contexts in .coverage, so it's fast; a hang counts as killed. - Fails on survivors. A genuinely-equivalent or intentionally-unasserted line can be marked `# pragma: no mutate` (scanned across the statement's whole span so a formatter-wrapped comment still counts). check.sh: the main pytest run now writes per-test contexts (--cov-context=test); the gate runs after diff-cover and self-skips when origin/main is absent, exactly like diff-cover. coverage becomes an explicit dev dep (used directly by the gate; DEP004-ignored since scripts/ is dev-only tooling never shipped in the wheel). Two changed lines in sources.py are marked `# pragma: no mutate`: `produced +=` is only tested for == 0 (sign is an equivalent mutant), and the real-time pacing math is deliberately not asserted (tests inject a no-op sleep).
Aikido's license gate flagged refurb as GPL-3.0-only, which conflicts with this repo's MIT license. refurb was a dev-only lint tool invoked as a subprocess (no copyleft obligation on our code in practice), but keeping the dependency tree GPL-free for a distributed OSS CLI is the cleaner call and avoids a recurring audit flag. Its value over ruff's UP rules was marginal, and the concrete rewrites it already suggested (contextlib.suppress, dict-union, list.extend) remain in the code on their own. Drops the dev dependency, the [tool.refurb] config, and the check.sh step. zizmor (MIT) and coverage (Apache-2.0) are unaffected.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Extend the CI gate with four checks that target classes of issue the existing
tools don't cover:
needs/matrix refs, and shell bugs in
run:blocks (the workflow analogue ofthe existing shellcheck gate). Go binary -> self-skips locally, CI installs it.
untrusted ${{ github.* }} interpolation, over-broad token scopes, unpinned
actions). Pip-installable, so it runs in the locked env as a hard gate;
--offline keeps it deterministic.
land in the repo. Fake test/doc fixtures are allowlisted in .gitleaks.toml.
Go binary -> self-skips locally, CI installs it.
disabled in [tool.refurb] because they misfire here (FURB123 flags intentional
buffer/tuple conversions; FURB159's rstrip rewrite changes behavior; FURB184
would inline named intermediates kept for readability). Runs via
uv runsoPEP 695 generics parse under the project's 3.12+ interpreter.
Also apply the genuinely-correct refurb suggestions: contextlib.suppress for the
follow-mode KeyboardInterrupt, dict-union over {**a, ...} spreads, and
list.extend over paired appends.
https://claude.ai/code/session_01GJtgtguBfPxjk5JVcT19FG