Add SSRF guard for outbound URL fetches #261
Merged
…check)

The `read_url` live-agent tool, `speak --url`, and the podcast-feed probe all fetched URLs through a guard that only string-matched the literal host (`risk._LOCAL_HOST`). That missed DNS-based bypasses (a public hostname that resolves to 127.0.0.1/169.254.169.254), alternate IP spellings (decimal/hex IPv4, IPv4-mapped IPv6), and, critically, redirects: the fetch followed 30x hops with no re-check, so a public URL could redirect to the cloud-metadata endpoint and the body came back to the model.

Add `core/ssrf.py`: resolve the host via `getaddrinfo` and refuse any private/loopback/link-local/reserved/multicast IP via `ipaddress`, enforced on the initial URL and on every redirect hop. `core/webpage._fetch` and `app/transcribe/feed._fetch` now follow redirects manually and call the guard on each hop.

Also cap `webpage` response bodies at 10 MB so a hostile URL can't exhaust memory (the feed fetch was already capped).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AWsVSeWJjTXE6bsG4e1J3V
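To make the resolve-then-classify idea concrete, here is a minimal sketch of such a guard. It is not the code from `core/ssrf.py`: the helper names `is_internal_ip` and `resolve_host`, the injectable `resolve` parameter, and the `ValueError` base class are assumptions for illustration (the PR describes `BlockedURLError` as a `UsageError` subclass).

```python
import ipaddress
import socket
from urllib.parse import urlsplit


class BlockedURLError(ValueError):
    """Stand-in for the PR's BlockedURLError (a UsageError subclass there)."""


def is_internal_ip(text: str) -> bool:
    """Return True if the address is not safe to fetch from."""
    # Drop an IPv6 zone id such as "%eth0" before parsing.
    ip = ipaddress.ip_address(text.split("%", 1)[0])
    # Unwrap IPv4-mapped IPv6 (::ffff:127.0.0.1) so the IPv4 rules apply.
    mapped = getattr(ip, "ipv4_mapped", None)
    if mapped is not None:
        ip = mapped
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def resolve_host(host: str) -> list[str]:
    """Resolve a hostname to its distinct IP addresses via getaddrinfo."""
    infos = socket.getaddrinfo(host, None, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def assert_public_url(url: str, resolve=resolve_host) -> None:
    """Raise BlockedURLError unless every resolved address is public."""
    host = urlsplit(url).hostname
    if not host:
        raise BlockedURLError(f"no host in URL: {url}")
    try:
        addresses = resolve(host)
    except OSError as exc:  # socket.gaierror is an OSError subclass
        raise BlockedURLError(f"cannot resolve {host}") from exc
    if not addresses:
        raise BlockedURLError(f"no addresses for {host}")
    for address in addresses:
        if is_internal_ip(address):
            raise BlockedURLError(f"{host} resolves to internal address {address}")
```

Checking every resolved address, rather than only the first, means a hostname with one public and one internal record is still refused. Because the check runs on the resolved addresses, alternate spellings of an IP literal are classified the same way as the canonical form.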
Adds a new `ssrf` module that validates URLs before fetching them, preventing Server-Side Request Forgery attacks by refusing to connect to private, loopback, link-local, or otherwise internal addresses. The guard resolves hostnames to IP addresses and inspects them, catching both direct attempts to access internal addresses and public URLs that redirect to internal ones.

Key changes

- New `aai_cli/core/ssrf.py` module: Provides `assert_public_url()` to validate that a URL's hostname resolves only to public IP addresses. Covers IPv4 and IPv6 (including IPv4-mapped IPv6), and detects loopback, RFC 1918 private ranges, link-local (including the `169.254.169.254` cloud-metadata address), unique-local, and multicast addresses. Raises `BlockedURLError` (a `UsageError` subclass) for non-public hosts.
- Manual redirect handling in `webpage._fetch()`: Changed from `follow_redirects=True` to manual per-hop redirect following so the SSRF guard runs on every redirect target. A public URL can redirect to an internal one, so each hop must be validated. Implements a redirect hop cap (`MAX_REDIRECTS = 5`) to prevent infinite loops. Also adds response body size capping (`_MAX_BYTES = 10 MB`) to prevent memory exhaustion from hostile URLs, and proper charset decoding using the response's declared charset rather than assuming UTF-8.
- Manual redirect handling in `feed._fetch()`: Similarly updated to follow redirects manually with per-hop SSRF validation. Returns `None` on SSRF violations so the URL falls through to the API's server-side fetch.
- Comprehensive test coverage: New `tests/test_ssrf.py` tests IP classification (internal vs. public), URL validation, and error cases. Updated `tests/test_webpage.py` with fixtures that stub DNS to a public IP for hermeticity, plus new tests for SSRF blocking, redirect-to-internal detection, body size capping, charset decoding, redirect loops, and DNS failures. Updated `tests/test_transcribe_feed.py` with similar fixtures and tests for redirect following and SSRF violations.

Implementation details
- DNS resolution goes through `_resolve_host()` so tests can stub it without network access while still exercising the real `ipaddress`-based IP classification.
- The `BlockedURLError` exception is a `UsageError` subclass so it renders as a clean exit-2 message and integrates with existing error handling.
- Uses only standard-library modules (`socket`, `ipaddress`, `urllib.parse`) plus the existing `errors` module, keeping the import footprint minimal.

https://claude.ai/code/session_01AWsVSeWJjTXE6bsG4e1J3V
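The per-hop redirect handling described above can be sketched independently of any HTTP client. The function below is an illustration, not the PR's `_fetch`: `fetch_guarded`, the `send` and `check` callables, the lowercase `location` header key, and the choice to truncate (rather than reject) an oversized body are all assumptions. `send(url)` is expected to perform one request without following redirects and return `(status, headers, body_chunks)`; `check(url)` is expected to raise when the target is not public, as `assert_public_url()` does.

```python
from typing import Callable, Iterable, Mapping, Tuple
from urllib.parse import urljoin

MAX_REDIRECTS = 5
MAX_BYTES = 10 * 1024 * 1024  # 10 MB cap on the response body

REDIRECT_STATUSES = (301, 302, 303, 307, 308)

# Hypothetical response shape: status code, headers, iterator of body chunks.
Response = Tuple[int, Mapping[str, str], Iterable[bytes]]


class TooManyRedirects(Exception):
    """Raised when the redirect hop cap is exceeded."""


def fetch_guarded(
    url: str,
    send: Callable[[str], Response],
    check: Callable[[str], None],
    max_redirects: int = MAX_REDIRECTS,
    max_bytes: int = MAX_BYTES,
) -> bytes:
    """Fetch url, validating the initial URL and every redirect target."""
    # One initial request plus at most max_redirects redirect hops.
    for _ in range(max_redirects + 1):
        check(url)  # runs before any request is sent to this hop
        status, headers, chunks = send(url)
        if status in REDIRECT_STATUSES and "location" in headers:
            # Location may be relative; resolve it against the current URL.
            url = urljoin(url, headers["location"])
            continue
        body = bytearray()
        for chunk in chunks:
            body.extend(chunk)
            if len(body) >= max_bytes:
                del body[max_bytes:]  # truncate and stop reading
                break
        return bytes(body)
    raise TooManyRedirects(f"more than {max_redirects} redirects")
```

Calling the guard at the top of the loop, before `send`, is the point of the design: a redirect to an internal address is rejected without a request ever being issued to it.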