This gallery collects the workflows that usually take time to wire up by hand: browsers, media tools, parsers, metadata extractors, search indexes, and format-specific outputs. Choose a plugin to see how to run it, what it depends on, and what it produces.
screenshot
pdf
media
metadata
search
html/text
url parsing
pip install abx-dl
abx-dl install
mkdir archive
cd archive
abx-dl 'https://example.com'
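
The per-plugin commands in this gallery all share the shape `abx-dl --plugins=<name> <url>`. As a minimal sketch, a script could build that command and pass config keys as environment variables. The helper names `build_abx_command` and `run_plugin` are hypothetical, and reading config keys from the environment is an assumption for `abx-dl` (the gallery shows the environment-variable form only with `archivebox add`).

```python
import os
import subprocess
from typing import Dict, List, Optional


def build_abx_command(url: str, plugin: Optional[str] = None) -> List[str]:
    """Return the argv for one abx-dl run (hypothetical helper, not part of abx-dl)."""
    cmd = ["abx-dl"]
    if plugin:
        # Same flag shape as the per-plugin examples in this gallery.
        cmd.append(f"--plugins={plugin}")
    cmd.append(url)
    return cmd


def run_plugin(url: str, plugin: str, overrides: Optional[Dict[str, str]] = None) -> int:
    """Run abx-dl for one URL and return its exit code.

    Assumption: abx-dl picks up config keys such as YTDLP_TIMEOUT from the environment.
    """
    env = {**os.environ, **(overrides or {})}
    return subprocess.run(build_abx_command(url, plugin), env=env, check=False).returncode
```

For example, `run_plugin('https://example.com', 'ytdlp', {'YTDLP_TIMEOUT': '600'})` would start one run with a shorter download timeout, provided `abx-dl` is installed and on `PATH`.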
ytdlp
Download video and audio media with metadata, subtitles, thumbnails, and description sidecars.
yt-dlp
providers=env,pip,brew,apt
node
providers=env,apt,brew
ffmpeg
providers=env,apt,brew
abx-dl --plugins=ytdlp 'https://example.com'
YTDLP_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| YTDLP_ENABLED: Enable video/audio downloading with yt-dlp | true | boolean | MEDIA_ENABLED, SAVE_MEDIA, USE_MEDIA, USE_YTDLP, FETCH_MEDIA, SAVE_YTDLP |
| YTDLP_BINARY: Path to yt-dlp binary | "yt-dlp" | string | YOUTUBEDL_BINARY, YOUTUBE_DL_BINARY |
| NODE_BINARY: Path to Node.js binary for yt-dlp JS runtime | "node" | string | none |
| YTDLP_TIMEOUT: Timeout for yt-dlp downloads in seconds | 3600 | integer, min 30 | MEDIA_TIMEOUT; fallback: TIMEOUT |
| YTDLP_COOKIES_FILE: Path to cookies file | "" | string | fallback: COOKIES_FILE |
| YTDLP_MAX_SIZE: Maximum file size for yt-dlp downloads | "750m" | string, pattern ^\d+[kmgKMG]?$ | MEDIA_MAX_SIZE |
| YTDLP_CHECK_SSL_VALIDITY: Whether to verify SSL certificates | true | boolean | fallback: CHECK_SSL_VALIDITY |
| YTDLP_ARGS: Default yt-dlp arguments | ["--restrict-filenames", "--trim-filenames=128", "--write-description", "--write-info-json", "--write-thumbnail", "--write-sub", "--write-auto-subs", "--convert-subs=srt", "--yes-playlist", "--continue", "--no-abort-on-error", "--ignore-errors", "--geo-bypass", "--add-metadata", "--no-progress", "--remote-components=ejs:github", "-o", "%(title)s.%(ext)s"] | array | YTDLP_DEFAULT_ARGS |
| YTDLP_ARGS_EXTRA: Extra arguments to append to yt-dlp command | [] | array | YTDLP_EXTRA_ARGS |
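
The "Aliases / Fallback" column implies that one setting can be supplied under several names. The sketch below shows one plausible way to resolve `YTDLP_TIMEOUT`: try the key itself, then its alias `MEDIA_TIMEOUT`, then the fallback `TIMEOUT`, then the default of 3600, and finally enforce the minimum of 30. The lookup order and the helper names `resolve_setting` and `resolve_timeout` are assumptions for illustration, not the plugin's actual implementation.

```python
import os
from typing import Mapping, Optional, Sequence


def resolve_setting(
    key: str,
    default: str,
    aliases: Sequence[str] = (),
    fallback: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Look up key, then its aliases, then its fallback key, then the default.

    The lookup order is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    candidates = [key, *aliases]
    if fallback:
        candidates.append(fallback)
    for name in candidates:
        value = env.get(name)
        if value not in (None, ""):
            return value
    return default


def resolve_timeout(env: Mapping[str, str]) -> int:
    """Resolve YTDLP_TIMEOUT using the alias, fallback, default, and minimum from the table."""
    raw = resolve_setting(
        "YTDLP_TIMEOUT", "3600", aliases=["MEDIA_TIMEOUT"], fallback="TIMEOUT", env=env
    )
    timeout = int(raw)
    if timeout < 30:
        raise ValueError(f"YTDLP_TIMEOUT must be at least 30, got {timeout}")
    return timeout
```

The same pattern applies to the other tables in this gallery: only the key name, alias list, fallback key, default, and constraint change.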
gallerydl
Download image and media galleries along with metadata sidecars from supported sites.
gallery-dl
providers=env,pip,brew,apt
abx-dl --plugins=gallerydl 'https://example.com'
GALLERYDL_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| GALLERYDL_ENABLED: Enable gallery downloading with gallery-dl | true | boolean | SAVE_GALLERYDL, USE_GALLERYDL |
| GALLERYDL_BINARY: Path to gallery-dl binary | "gallery-dl" | string | none |
| GALLERYDL_TIMEOUT: Timeout for gallery downloads in seconds | 3600 | integer, min 30 | fallback: TIMEOUT |
| GALLERYDL_COOKIES_FILE: Path to cookies file | "" | string | fallback: COOKIES_FILE |
| GALLERYDL_CHECK_SSL_VALIDITY: Whether to verify SSL certificates | true | boolean | fallback: CHECK_SSL_VALIDITY |
| GALLERYDL_ARGS: Default gallery-dl arguments | ["--write-metadata", "--write-info-json"] | array | GALLERYDL_DEFAULT_ARGS |
| GALLERYDL_ARGS_EXTRA: Extra arguments to append to gallery-dl command | [] | array | GALLERYDL_EXTRA_ARGS |
forumdl
Download forum threads and exports in JSONL, WARC, and mailbox-style archive formats.
forum-dl
providers=env,pip
abx-dl --plugins=forumdl 'https://example.com'
FORUMDL_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| FORUMDL_ENABLED: Enable forum downloading with forum-dl | true | boolean | SAVE_FORUMDL, USE_FORUMDL |
| FORUMDL_BINARY: Path to forum-dl binary | "forum-dl" | string | none |
| FORUMDL_TIMEOUT: Timeout for forum downloads in seconds | 3600 | integer, min 30 | fallback: TIMEOUT |
| FORUMDL_OUTPUT_FORMAT: Output format for forum downloads | "jsonl" | string, one of: jsonl, warc, mbox, maildir, mh, mmdf, babyl | none |
| FORUMDL_ARGS: Default forum-dl arguments | [] | array | FORUMDL_DEFAULT_ARGS |
| FORUMDL_ARGS_EXTRA: Extra arguments to append to forum-dl command | [] | array | FORUMDL_EXTRA_ARGS |
git
Clone git repositories from supported repository URLs into the snapshot output directory.
git
providers=env,apt,brew
abx-dl --plugins=git 'https://example.com'
GIT_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| GIT_ENABLED: Enable git repository cloning | true | boolean | SAVE_GIT, USE_GIT |
| GIT_BINARY: Path to git binary | "git" | string | none |
| GIT_TIMEOUT: Timeout for git operations in seconds | 120 | integer, min 10 | fallback: TIMEOUT |
| GIT_DOMAINS: Comma-separated list of domains to treat as git repositories | "github.com,gitlab.com,bitbucket.org,gist.github.com,codeberg.org,gitea.com,git.sr.ht" | string | none |
| GIT_ARGS: Default git arguments | ["clone", "--depth=1", "--recursive"] | array | GIT_DEFAULT_ARGS |
| GIT_ARGS_EXTRA: Extra arguments to append to git command | [] | array | GIT_EXTRA_ARGS |
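
`GIT_DOMAINS` decides which URLs are treated as git repositories. A minimal sketch of such a check follows, assuming an exact, case-insensitive match between the URL's host and an entry in the comma-separated list. The real plugin may match differently (for example on subdomains or path suffixes), and `is_git_url` is a hypothetical name.

```python
from urllib.parse import urlsplit

# Default value of GIT_DOMAINS from the table above.
GIT_DOMAINS = "github.com,gitlab.com,bitbucket.org,gist.github.com,codeberg.org,gitea.com,git.sr.ht"


def is_git_url(url: str, git_domains: str = GIT_DOMAINS) -> bool:
    """Return True when the URL's host exactly matches one of the listed domains."""
    host = (urlsplit(url).hostname or "").lower()
    domains = {d.strip().lower() for d in git_domains.split(",") if d.strip()}
    return host in domains
```

Under this assumption, adding a self-hosted forge means appending its hostname to `GIT_DOMAINS`.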
wget
Archive pages and their requisites with wget, optionally writing WARC captures.
wget
providers=env,apt,brew
abx-dl --plugins=wget 'https://example.com'
WGET_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| WGET_ENABLED: Enable wget archiving | true | boolean | SAVE_WGET, USE_WGET |
| WGET_WARC_ENABLED: Save WARC archive file | true | boolean | SAVE_WARC, WGET_SAVE_WARC |
| WGET_BINARY: Path to wget binary | "wget" | string | none |
| WGET_TIMEOUT: Timeout for wget in seconds | 60 | integer, min 5 | fallback: TIMEOUT |
| WGET_USER_AGENT: User agent string for wget | "" | string | fallback: USER_AGENT |
| WGET_COOKIES_FILE: Path to cookies file | "" | string | fallback: COOKIES_FILE |
| WGET_CHECK_SSL_VALIDITY: Whether to verify SSL certificates | true | boolean | fallback: CHECK_SSL_VALIDITY |
| WGET_ARGS: Default wget arguments | ["--no-verbose", "--adjust-extension", "--convert-links", "--force-directories", "--backup-converted", "--span-hosts", "--no-parent", "--page-requisites", "--restrict-file-names=windows", "--tries=2", "-e", "robots=off"] | array | WGET_DEFAULT_ARGS |
| WGET_ARGS_EXTRA: Extra arguments to append to wget command | [] | array | WGET_EXTRA_ARGS |
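
To make the table concrete, here is a sketch of how these settings could be assembled into a single wget command line. The wget flags used (`--timeout`, `--user-agent`, `--load-cookies`, `--no-check-certificate`, `--warc-file`) are standard GNU Wget options, but the mapping from config keys to flags and the helper name `build_wget_command` are assumptions, not the plugin's confirmed behavior.

```python
from typing import List, Optional, Sequence

# Default value of WGET_ARGS from the table above.
WGET_ARGS = [
    "--no-verbose", "--adjust-extension", "--convert-links", "--force-directories",
    "--backup-converted", "--span-hosts", "--no-parent", "--page-requisites",
    "--restrict-file-names=windows", "--tries=2", "-e", "robots=off",
]


def build_wget_command(
    url: str,
    *,
    timeout: int = 60,
    user_agent: str = "",
    cookies_file: str = "",
    check_ssl: bool = True,
    warc_file: Optional[str] = None,
    extra_args: Sequence[str] = (),
) -> List[str]:
    """Assemble a wget argv from the defaults plus optional per-run settings."""
    cmd = ["wget", *WGET_ARGS, f"--timeout={timeout}"]
    if user_agent:
        cmd.append(f"--user-agent={user_agent}")
    if cookies_file:
        cmd.append(f"--load-cookies={cookies_file}")
    if not check_ssl:
        # Skips certificate verification, e.g. for self-signed hosts.
        cmd.append("--no-check-certificate")
    if warc_file:
        # wget appends the .warc(.gz) extension to this base name itself.
        cmd.append(f"--warc-file={warc_file}")
    cmd.extend(extra_args)  # plays the role of WGET_ARGS_EXTRA
    cmd.append(url)
    return cmd
```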
archivedotorg
Submit URLs to the Internet Archive Wayback Machine and save the resulting archive link.
abx-dl --plugins=archivedotorg 'https://example.com'
ARCHIVEDOTORG_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| ARCHIVEDOTORG_ENABLED: Submit URLs to archive.org Wayback Machine | true | boolean | SAVE_ARCHIVEDOTORG, USE_ARCHIVEDOTORG, SUBMIT_ARCHIVEDOTORG |
| ARCHIVEDOTORG_TIMEOUT: Timeout for archive.org submission in seconds | 60 | integer, min 10 | fallback: TIMEOUT |
| ARCHIVEDOTORG_USER_AGENT: User agent string | "" | string | fallback: USER_AGENT |
favicon
Fetch and save the site favicon or touch icon.
abx-dl --plugins=favicon 'https://example.com'
FAVICON_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| FAVICON_ENABLED: Enable favicon downloading | true | boolean | SAVE_FAVICON, USE_FAVICON |
| FAVICON_TIMEOUT: Timeout for favicon fetch in seconds | 30 | integer, min 5 | fallback: TIMEOUT |
| FAVICON_USER_AGENT: User agent string | "" | string | fallback: USER_AGENT |
| FAVICON_PROVIDER: Fallback favicon provider URL template. Supports either {} or {domain} placeholders. | "https://www.google.com/s2/favicons?domain={}&format=ico" | string | none |
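
`FAVICON_PROVIDER` is a URL template in which `{}` or `{domain}` stands for the site's domain. The sketch below shows how such a template could be expanded. The function name `favicon_fallback_url` is hypothetical, and using the URL's hostname as the "domain" is an assumption.

```python
from urllib.parse import urlsplit

# Default value of FAVICON_PROVIDER from the table above.
FAVICON_PROVIDER = "https://www.google.com/s2/favicons?domain={}&format=ico"


def favicon_fallback_url(page_url: str, template: str = FAVICON_PROVIDER) -> str:
    """Fill the {} or {domain} placeholder with the page's hostname."""
    domain = urlsplit(page_url).hostname or ""
    # Plain replacement avoids str.format errors on templates with other braces.
    return template.replace("{domain}", domain).replace("{}", domain)
```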
ublock
Install the uBlock Origin Lite extension to block ads, trackers, and other page clutter during archiving.
chromium
providers=env,puppeteer
ublock
providers=chromewebstore
abx-dl --plugins=ublock 'https://example.com'
UBLOCK_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| CHROME_BINARY: Path to Chromium, Chrome for Testing, or Chrome Canary binary | "chromium" | string | CHROMIUM_BINARY |
| UBLOCK_ENABLED: Enable uBlock Origin Lite browser extension for ad blocking | true | boolean | USE_UBLOCK |
istilldontcareaboutcookies
Install the I Still Don't Care About Cookies extension to dismiss cookie banners during archiving.
chromium
providers=env,puppeteer
istilldontcareaboutcookies
providers=chromewebstore
abx-dl --plugins=istilldontcareaboutcookies 'https://example.com'
ISTILLDONTCAREABOUTCOOKIES_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| CHROME_BINARY: Path to Chromium, Chrome for Testing, or Chrome Canary binary | "chromium" | string | CHROMIUM_BINARY |
| ISTILLDONTCAREABOUTCOOKIES_ENABLED: Enable I Still Don't Care About Cookies browser extension | true | boolean | USE_ISTILLDONTCAREABOUTCOOKIES |
twocaptcha
Install and configure the 2Captcha extension to solve CAPTCHAs during browser-based archiving.
chromium
providers=env,puppeteer
twocaptcha
providers=chromewebstore
abx-dl --plugins=twocaptcha 'https://example.com'
TWOCAPTCHA_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| CHROME_BINARY: Path to Chromium, Chrome for Testing, or Chrome Canary binary | "chromium" | string | CHROMIUM_BINARY |
| TWOCAPTCHA_ENABLED: Enable 2captcha browser extension for automatic CAPTCHA solving | true | boolean | CAPTCHA2_ENABLED, USE_CAPTCHA2, USE_TWOCAPTCHA |
| TWOCAPTCHA_API_KEY: 2captcha API key for CAPTCHA solving service (get from https://2captcha.com) | "" | string | API_KEY_2CAPTCHA, CAPTCHA2_API_KEY |
| TWOCAPTCHA_RETRY_COUNT: Number of times to retry CAPTCHA solving on error | 3 | integer, min 0 | CAPTCHA2_RETRY_COUNT |
| TWOCAPTCHA_RETRY_DELAY: Delay in seconds between CAPTCHA solving retries | 5 | integer, min 0 | CAPTCHA2_RETRY_DELAY |
| TWOCAPTCHA_TIMEOUT: Timeout for CAPTCHA solving in seconds | 60 | integer, min 5 | CAPTCHA2_TIMEOUT; fallback: TIMEOUT |
| TWOCAPTCHA_AUTO_SUBMIT: Automatically submit forms after CAPTCHA is solved | false | boolean | none |
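
`TWOCAPTCHA_RETRY_COUNT` and `TWOCAPTCHA_RETRY_DELAY` together describe a retry loop. As a rough sketch, and assuming "retry count" means extra attempts after the first failure (so the default of 3 allows up to 4 attempts), the behavior could look like this. `solve_with_retries` and the injected `solve` callable are hypothetical and do not call the 2Captcha service.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def solve_with_retries(
    solve: Callable[[], T],
    retry_count: int = 3,
    retry_delay: float = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call solve() once, then retry up to retry_count more times on error."""
    attempt = 0
    while True:
        try:
            return solve()
        except Exception:
            if attempt >= retry_count:
                raise  # retries exhausted: surface the last error
            attempt += 1
            sleep(retry_delay)  # wait before the next attempt
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays.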
modalcloser
Automatically dismiss dialogs, cookie banners, and framework modals while the page is being archived.
chromium
providers=env,puppeteer
abx-dl --plugins=modalcloser 'https://example.com'
MODALCLOSER_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| CHROME_BINARY: Path to Chromium, Chrome for Testing, or Chrome Canary binary | "chromium" | string | CHROMIUM_BINARY |
| MODALCLOSER_ENABLED: Enable automatic modal and dialog closing | true | boolean | CLOSE_MODALS, AUTO_CLOSE_MODALS |
| MODALCLOSER_TIMEOUT: Delay before auto-closing dialogs (ms) | 1250 | integer, min 100 | none |
| MODALCLOSER_POLL_INTERVAL: How often to check for CSS modals (ms) | 500 | integer, min 100 | none |
consolelog
Capture browser console messages emitted while the page loads.
chromium
providers=env,puppeteer
abx-dl --plugins=consolelog 'https://example.com'
CONSOLELOG_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| CHROME_BINARY: Path to Chromium, Chrome for Testing, or Chrome Canary binary | "chromium" | string | CHROMIUM_BINARY |
| CONSOLELOG_ENABLED: Enable console log capture | true | boolean | SAVE_CONSOLELOG, USE_CONSOLELOG |
| CONSOLELOG_TIMEOUT: Timeout for console log capture in seconds | 30 | integer, min 5 | fallback: TIMEOUT |
dns
Record DNS activity observed while loading the page in Chrome.
chromium
providers=env,puppeteer
abx-dl --plugins=dns 'https://example.com'
DNS_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| CHROME_BINARY: Path to Chromium, Chrome for Testing, or Chrome Canary binary | "chromium" | string | CHROMIUM_BINARY |
| DNS_ENABLED: Enable DNS traffic recording during page load | true | boolean | SAVE_DNS, USE_DNS |
| DNS_TIMEOUT: Timeout for DNS recording in seconds | 30 | integer, min 5 | fallback: TIMEOUT |
sslcerts
Capture TLS certificate and connection metadata for the loaded page.
chromium
providers=env,puppeteer
abx-dl --plugins=sslcerts 'https://example.com'
SSLCERTS_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| CHROME_BINARY: Path to Chromium, Chrome for Testing, or Chrome Canary binary | "chromium" | string | CHROMIUM_BINARY |
| SSLCERTS_ENABLED: Enable SSL certificate capture | true | boolean | SAVE_SSLCERTS, USE_SSLCERTS |
| SSLCERTS_TIMEOUT: Timeout for SSL capture in seconds | 30 | integer, min 5 | fallback: TIMEOUT |
responses
Capture HTTP response metadata for requests made during page load.
chromium
providers=env,puppeteer
abx-dl --plugins=responses 'https://example.com'
RESPONSES_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| CHROME_BINARY: Path to Chromium, Chrome for Testing, or Chrome Canary binary | "chromium" | string | CHROMIUM_BINARY |
| RESPONSES_ENABLED: Enable HTTP response capture | true | boolean | SAVE_RESPONSES, USE_RESPONSES |
| RESPONSES_TIMEOUT: Timeout for response capture in seconds | 30 | integer, min 5 | fallback: TIMEOUT |
redirects
Capture the redirect chain encountered while loading the page.
chromium
providers=env,puppeteer
abx-dl --plugins=redirects 'https://example.com'
REDIRECTS_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| CHROME_BINARY: Path to Chromium, Chrome for Testing, or Chrome Canary binary | "chromium" | string | CHROMIUM_BINARY |
| REDIRECTS_ENABLED: Enable redirect chain capture | true | boolean | SAVE_REDIRECTS, USE_REDIRECTS |
| REDIRECTS_TIMEOUT: Timeout for redirect capture in seconds | 30 | integer, min 5 | fallback: TIMEOUT |
staticfile
Detect and download static-file responses directly when a URL resolves to a non-HTML asset.
chromium
providers=env,puppeteer
abx-dl --plugins=staticfile 'https://example.com'
STATICFILE_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| CHROME_BINARY: Path to Chromium, Chrome for Testing, or Chrome Canary binary | "chromium" | string | CHROMIUM_BINARY |
| STATICFILE_ENABLED: Enable static file detection | true | boolean | SAVE_STATICFILE, USE_STATICFILE |
| STATICFILE_TIMEOUT: Timeout for static file detection in seconds | 30 | integer, min 5 | fallback: TIMEOUT |
headers
Capture HTTP headers for the main document response.
chromium
providers=env,puppeteer
abx-dl --plugins=headers 'https://example.com'
HEADERS_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| CHROME_BINARY: Path to Chromium, Chrome for Testing, or Chrome Canary binary | "chromium" | string | CHROMIUM_BINARY |
| HEADERS_ENABLED: Enable HTTP headers capture | true | boolean | SAVE_HEADERS, USE_HEADERS |
| HEADERS_TIMEOUT: Timeout for headers capture in seconds | 30 | integer, min 5 | fallback: TIMEOUT |
chrome
Launch and manage a shared Chromium-compatible CDP browser session for browser-driven plugins.
node
providers=env,apt,brew
chromium
providers=env,puppeteer
abx-dl --plugins=chrome 'https://example.com'
CHROME_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| CHROME_ENABLED: Enable Chrome browser integration for archiving | true | boolean | USE_CHROME |
| CHROME_BINARY: Path to Chromium, Chrome for Testing, or Chrome Canary binary | "chromium" | string | CHROMIUM_BINARY |
| NODE_BINARY: Path to Node.js binary | "node" | string | none |
| CHROME_TIMEOUT: Timeout for Chrome operations in seconds | 60 | integer, min 5 | fallback: TIMEOUT |
| CHROME_LAUNCH_ATTEMPTS: Maximum Chrome launch attempts before failing on transient startup errors | 3 | integer, min 1 | none |
| CHROME_HEADLESS: Run Chrome in headless mode | true | boolean | none |
| PERSONAS_DIR: Shared Chrome/browser personas root | "" | string | none |
| ACTIVE_PERSONA: Active browser persona name | "Default" | string | none |
| CHROME_SANDBOX: Enable Chrome sandbox (disable in Docker with --no-sandbox) | true | boolean | none |
| CHROME_RESOLUTION: Browser viewport resolution (width,height) | "1440,2000" | string, pattern ^\d+,\d+$ | fallback: RESOLUTION |
| CHROME_USER_DATA_DIR: Path to Chrome user data directory for persistent sessions (defaults to PERSONAS_DIR/ACTIVE_PERSONA/chrome_profile) | "" | string | none |
| CHROME_USER_AGENT: User agent string for Chrome | "" | string | fallback: USER_AGENT |
| CHROME_CDP_URL: Connect to an already-running browser over CDP instead of launching a new local Chrome process | "" | string | none |
| CHROME_IS_LOCAL: Whether the managed browser process is local and should have a live chrome.pid marker | true | boolean | none |
| CHROME_KEEPALIVE: Keep the browser alive after the owning crawl/snapshot hook exits instead of closing it during cleanup | false | boolean | none |
| CHROME_ISOLATION: Whether Chrome runs as one shared browser per crawl or a separate browser per snapshot | "crawl" | string, one of: crawl, snapshot | none |
| CHROME_ARGS: Default Chrome command-line arguments (static flags only, dynamic args like --user-data-dir are added at runtime) | ["--no-first-run", "--no-default-browser-check", "--disable-default-apps", "--disable-sync", "--disable-infobars", "--disable-blink-features=AutomationControlled", "--disable-component-update", "--disable-domain-reliability", "--disable-breakpad", "--disable-client-side-phishing-detection", "--disable-hang-monitor", "--disable-speech-synthesis-api", "--disable-speech-api", "--disable-print-preview", "--disable-notifications", "--disable-desktop-notifications", "--disable-popup-blocking", "--disable-prompt-on-repost", "--disable-external-intent-requests", "--disable-session-crashed-bubble", "--disable-search-engine-choice-screen", "--disable-datasaver-prompt", "--ash-no-nudges", "--hide-crash-restore-bubble", "--suppress-message-center-popups", "--noerrdialogs", "--no-pings", "--silent-debugger-extension-api", "--deny-permission-prompts", "--safebrowsing-disable-auto-update", "--metrics-recording-only", "--password-store=basic", "--use-mock-keychain", "--disable-cookie-encryption", "--font-render-hinting=none", "--force-color-profile=srgb", "--disable-partial-raster", "--disable-skia-runtime-opts", "--disable-2d-canvas-clip-aa", "--enable-webgl", "--hide-scrollbars", "--export-tagged-pdf", "--generate-pdf-document-outline", "--disable-lazy-loading", "--disable-renderer-backgrounding", "--disable-background-networking", "--disable-background-timer-throttling", "--disable-backgrounding-occluded-windows", "--disable-ipc-flooding-protection", "--disable-extensions-http-throttling", "--disable-field-trial-config", "--disable-back-forward-cache", "--autoplay-policy=no-user-gesture-required", "--disable-gesture-requirement-for-media-playback", "--lang=en-US,en;q=0.9", "--log-level=2", "--enable-logging=stderr"] | array | CHROME_DEFAULT_ARGS |
| CHROME_ARGS_EXTRA: Extra arguments to append to Chrome command (for user customization) | [] | array | CHROME_EXTRA_ARGS |
| CHROME_PAGELOAD_TIMEOUT: Timeout for page navigation/load in seconds | 60 | integer, min 5 | fallback: CHROME_TIMEOUT |
| CHROME_WAIT_FOR: Page load completion condition | "load" | string, one of: domcontentloaded, load, networkidle0, networkidle2 | none |
| CHROME_DELAY_AFTER_LOAD: Extra delay in seconds after page load completes before archiving (useful for JS-heavy SPAs) | 0 | number, min 0 | none |
| CHROME_CHECK_SSL_VALIDITY: Whether to verify SSL certificates (disable for self-signed certs) | true | boolean | fallback: CHECK_SSL_VALIDITY |
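
The table notes that `CHROME_ARGS` holds static flags only and that dynamic arguments such as `--user-data-dir` are added at runtime. The sketch below illustrates how that final argument list might be composed. The Chrome flags used are real command-line switches, but which settings map to which flags, the ordering, and the helper names `default_user_data_dir` and `build_chrome_args` are assumptions for illustration.

```python
import os
from typing import List, Sequence


def default_user_data_dir(personas_dir: str, active_persona: str = "Default") -> str:
    """PERSONAS_DIR/ACTIVE_PERSONA/chrome_profile, the documented default location."""
    return os.path.join(personas_dir, active_persona, "chrome_profile")


def build_chrome_args(
    static_args: Sequence[str],
    *,
    headless: bool = True,
    sandbox: bool = True,
    resolution: str = "1440,2000",
    user_data_dir: str = "",
    user_agent: str = "",
    check_ssl: bool = True,
    extra_args: Sequence[str] = (),
) -> List[str]:
    """Combine static CHROME_ARGS with flags derived from the other settings."""
    args = list(static_args)
    if headless:
        args.append("--headless=new")
    if not sandbox:
        args.append("--no-sandbox")  # typically needed inside Docker
    args.append(f"--window-size={resolution}")
    if user_data_dir:
        args.append(f"--user-data-dir={user_data_dir}")
    if user_agent:
        args.append(f"--user-agent={user_agent}")
    if not check_ssl:
        args.append("--ignore-certificate-errors")
    args.extend(extra_args)  # plays the role of CHROME_ARGS_EXTRA
    return args
```

Keeping the static list separate from the derived flags means a user can override `CHROME_ARGS` without accidentally dropping per-run values such as the profile directory.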
seo
Capture SEO-related metadata such as meta tags and Open Graph fields.
chromium
providers=env,puppeteer
abx-dl --plugins=seo 'https://example.com'
SEO_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| CHROME_BINARY: Path to Chromium, Chrome for Testing, or Chrome Canary binary | "chromium" | string | CHROMIUM_BINARY |
| SEO_ENABLED: Enable SEO metadata capture | true | boolean | SAVE_SEO, USE_SEO |
| SEO_TIMEOUT: Timeout for SEO capture in seconds | 30 | integer, min 5 | fallback: TIMEOUT |
accessibility
Capture the browser accessibility tree for the archived page.
chromium
providers=env,puppeteer
abx-dl --plugins=accessibility 'https://example.com'
ACCESSIBILITY_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| CHROME_BINARY: Path to Chromium, Chrome for Testing, or Chrome Canary binary | "chromium" | string | CHROMIUM_BINARY |
| ACCESSIBILITY_ENABLED: Enable accessibility tree capture | true | boolean | SAVE_ACCESSIBILITY, USE_ACCESSIBILITY |
| ACCESSIBILITY_TIMEOUT: Timeout for accessibility capture in seconds | 30 | integer, min 5 | fallback: TIMEOUT |
infiniscroll
Expand infinite-scroll pages and load additional content before downstream capture plugins run.
chromium
providers=env,puppeteer
abx-dl --plugins=infiniscroll 'https://example.com'
INFINISCROLL_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| CHROME_BINARY: Path to Chromium, Chrome for Testing, or Chrome Canary binary | "chromium" | string | CHROMIUM_BINARY |
| INFINISCROLL_ENABLED: Enable infinite scroll page expansion | true | boolean | SAVE_INFINISCROLL, USE_INFINISCROLL |
| INFINISCROLL_TIMEOUT: Maximum timeout for scrolling in seconds | 120 | integer, min 10 | none |
| INFINISCROLL_SCROLL_DELAY: Delay between scrolls in milliseconds | 2000 | integer, min 500 | none |
| INFINISCROLL_SCROLL_DISTANCE: Distance to scroll per step in pixels | 1600 | integer, min 100 | none |
| INFINISCROLL_SCROLL_LIMIT: Maximum number of scroll steps | 10 | integer, min 1 | none |
| INFINISCROLL_MIN_HEIGHT: Minimum page height to scroll to in pixels | 16000 | integer, min 1000 | none |
| INFINISCROLL_EXPAND_DETAILS: Expand `<details>` elements and click 'load more' buttons for comments | true | boolean | none |
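
The scroll settings interact: the step limit, the distance per step, and the delay per step together bound how far and how long a page is scrolled. The sketch below works through that arithmetic under a simplifying assumption that each step is one scroll followed by one delay and that the timeout can only cut the step count short. `scroll_budget` is a hypothetical helper, not the plugin's actual loop.

```python
from typing import Tuple


def scroll_budget(
    scroll_delay_ms: int = 2000,
    scroll_distance_px: int = 1600,
    scroll_limit: int = 10,
    timeout_s: int = 120,
) -> Tuple[int, float]:
    """Return (max pixels scrolled, max seconds spent waiting) for one page."""
    # How many scroll-then-wait steps fit inside the timeout.
    max_steps_by_time = (timeout_s * 1000) // scroll_delay_ms
    steps = min(scroll_limit, max_steps_by_time)
    return steps * scroll_distance_px, steps * scroll_delay_ms / 1000
```

With the defaults this gives 10 steps of 1600 px, so 16000 px in about 20 seconds, which lines up with the default `INFINISCROLL_MIN_HEIGHT` of 16000 and stays well inside the 120-second timeout.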
claudechrome
Use Claude computer-use to interact with pages in Chrome via CDP screenshots and the Anthropic API.
chromium
providers=env,puppeteer
node
providers=env,apt,brew
claudechrome
providers=chromewebstore
abx-dl --plugins=claudechrome 'https://example.com'
CLAUDECHROME_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| CHROME_BINARY: Path to Chromium, Chrome for Testing, or Chrome Canary binary | "chromium" | string | CHROMIUM_BINARY |
| NODE_BINARY: Path to Node.js binary | "node" | string | none |
| PERSONAS_DIR: Shared Chrome/browser personas root | "" | string | none |
| ACTIVE_PERSONA: Active browser persona name | "Default" | string | none |
| CLAUDECHROME_ENABLED: Enable Claude for Chrome browser extension for AI-driven page interaction | false | boolean | USE_CLAUDECHROME |
| CLAUDECHROME_PROMPT: Prompt for Claude to execute on the page. Claude can click buttons, fill forms, download files, and interact with any page element. | "Look at the current page. If there are any "expand", "show more", "load more", or similar buttons/links, click them all to reveal hidden content. Report what you did." | string | none |
| CLAUDECHROME_TIMEOUT: Timeout for Claude for Chrome operations in seconds | 120 | integer, min 10 | fallback: TIMEOUT |
| CLAUDECHROME_MODEL: Claude model to use (e.g. claude-sonnet-4-6, claude-opus-4-6, claude-haiku-4-5-20251001). Availability depends on your plan. | "claude-sonnet-4-6" | string | none |
| CLAUDECHROME_MAX_ACTIONS: Maximum number of agentic loop iterations (screenshots + actions) per page | 15 | integer, min 1 | none |
| ANTHROPIC_API_KEY: Anthropic API key for Claude for Chrome authentication | "" | string | none |
singlefile
Save a complete page as a single self-contained HTML file using the SingleFile extension or CLI.
chromium
providers=env,puppeteer
single-file
providers=env,npm
singlefile
providers=chromewebstore
abx-dl --plugins=singlefile 'https://example.com'
SINGLEFILE_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| CHROME_BINARY: Path to Chromium, Chrome for Testing, or Chrome Canary binary | "chromium" | string | CHROMIUM_BINARY |
| PERSONAS_DIR: Shared Chrome/browser personas root | "" | string | none |
| ACTIVE_PERSONA: Active browser persona name | "Default" | string | none |
| SINGLEFILE_ENABLED: Enable SingleFile archiving | true | boolean | SAVE_SINGLEFILE, USE_SINGLEFILE |
| SINGLEFILE_BINARY: Path to single-file binary | "single-file" | string | SINGLE_FILE_BINARY |
| NODE_BINARY: Path to Node.js binary | "node" | string | none |
| SINGLEFILE_TIMEOUT: Timeout for SingleFile in seconds | 60 | integer, min 10 | fallback: TIMEOUT |
| SINGLEFILE_USER_AGENT: User agent string | "" | string | fallback: USER_AGENT |
| SINGLEFILE_COOKIES_FILE: Path to cookies file | "" | string | fallback: COOKIES_FILE |
| SINGLEFILE_CHECK_SSL_VALIDITY: Whether to verify SSL certificates | true | boolean | fallback: CHECK_SSL_VALIDITY |
| SINGLEFILE_CHROME_ARGS: Chrome command-line arguments for SingleFile | [] | array | fallback: CHROME_ARGS |
| SINGLEFILE_ARGS: Default single-file arguments | ["--browser-headless"] | array | SINGLEFILE_DEFAULT_ARGS |
| SINGLEFILE_ARGS_EXTRA: Extra arguments to append to single-file command | [] | array | SINGLEFILE_EXTRA_ARGS |
screenshot
Capture a PNG screenshot of the rendered page.
chromium
providers=env,puppeteer
abx-dl --plugins=screenshot 'https://example.com'
SCREENSHOT_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| CHROME_BINARY: Path to Chromium, Chrome for Testing, or Chrome Canary binary | "chromium" | string | CHROMIUM_BINARY |
| SCREENSHOT_ENABLED: Enable screenshot capture | true | boolean | SAVE_SCREENSHOT, USE_SCREENSHOT |
| SCREENSHOT_TIMEOUT: Timeout for screenshot capture in seconds | 60 | integer, min 5 | fallback: TIMEOUT |
| SCREENSHOT_RESOLUTION: Screenshot resolution (width,height) | "1440,2000" | string, pattern ^\d+,\d+$ | fallback: RESOLUTION |
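
Resolution values such as `SCREENSHOT_RESOLUTION` are strings constrained by the pattern `^\d+,\d+$`. A small sketch of validating and splitting such a value is shown here; `parse_resolution` is a hypothetical helper name, and only the pattern and the default come from the table.

```python
import re
from typing import Tuple

# Pattern from the Type column: digits, a comma, digits.
RESOLUTION_PATTERN = re.compile(r"^\d+,\d+$")


def parse_resolution(value: str = "1440,2000") -> Tuple[int, int]:
    """Validate a 'width,height' string and return it as integers."""
    if not RESOLUTION_PATTERN.fullmatch(value):
        raise ValueError(f"resolution must look like '1440,2000', got {value!r}")
    width, height = value.split(",")
    return int(width), int(height)
```

Note that the pattern rejects the common `1440x2000` spelling, so values copied from other tools may need converting first.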
pdf
Render the current page to PDF using the shared Chrome session.
chromium
providers=env,puppeteer
abx-dl --plugins=pdf 'https://example.com'
PDF_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| CHROME_BINARY: Path to Chromium, Chrome for Testing, or Chrome Canary binary | "chromium" | string | CHROMIUM_BINARY |
| PDF_ENABLED: Enable PDF generation | true | boolean | SAVE_PDF, USE_PDF |
| PDF_TIMEOUT: Timeout for PDF generation in seconds | 60 | integer, min 5 | fallback: TIMEOUT |
| PDF_RESOLUTION: PDF page resolution (width,height) | "1440,2000" | string, pattern ^\d+,\d+$ | fallback: RESOLUTION |
dom
Save the fully rendered DOM HTML from the live page.
chromium
providers=env,puppeteer
abx-dl --plugins=dom 'https://example.com'
DOM_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| CHROME_BINARY: Path to Chromium, Chrome for Testing, or Chrome Canary binary | "chromium" | string | CHROMIUM_BINARY |
| DOM_ENABLED: Enable DOM capture | true | boolean | SAVE_DOM, USE_DOM |
| DOM_TIMEOUT: Timeout for DOM capture in seconds | 60 | integer, min 5 | fallback: TIMEOUT |
title
Capture the final document title from the rendered page.
chromium
providers=env,puppeteer
abx-dl --plugins=title 'https://example.com'
TITLE_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| CHROME_BINARY: Path to Chromium, Chrome for Testing, or Chrome Canary binary | "chromium" | string | CHROMIUM_BINARY |
| TITLE_ENABLED: Enable title extraction | true | boolean | SAVE_TITLE, USE_TITLE |
| TITLE_TIMEOUT: Timeout for title extraction in seconds | 30 | integer, min 5 | fallback: TIMEOUT |
readability
Extract article HTML, text, and metadata using Mozilla Readability.
readability-extractor
providers=env,npm
abx-dl --plugins=readability 'https://example.com'
READABILITY_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| READABILITY_ENABLED: Enable Readability text extraction | true | boolean | SAVE_READABILITY, USE_READABILITY |
| READABILITY_BINARY: Path to readability-extractor binary | "readability-extractor" | string | none |
| READABILITY_TIMEOUT: Timeout for Readability in seconds | 30 | integer, min 5 | fallback: TIMEOUT |
| READABILITY_ARGS: Default Readability arguments | [] | array | READABILITY_DEFAULT_ARGS |
| READABILITY_ARGS_EXTRA: Extra arguments to append to Readability command | [] | array | READABILITY_EXTRA_ARGS |
defuddle
Extract cleaned article HTML, text, and metadata from archived HTML using Defuddle.
defuddle
providers=env,npm
abx-dl --plugins=defuddle 'https://example.com'
DEFUDDLE_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| DEFUDDLE_ENABLED: Enable Defuddle text extraction | true | boolean | SAVE_DEFUDDLE, USE_DEFUDDLE |
| DEFUDDLE_BINARY: Path to defuddle binary | "defuddle" | string | — |
| DEFUDDLE_TIMEOUT: Timeout for Defuddle in seconds | 30 | integer, min 5 | fallback: TIMEOUT |
| DEFUDDLE_ARGS: Default Defuddle arguments | [] | array | DEFUDDLE_DEFAULT_ARGS |
| DEFUDDLE_ARGS_EXTRA: Extra arguments to append to Defuddle command | [] | array | DEFUDDLE_EXTRA_ARGS |
mercury
Extract article HTML, text, and metadata using the Postlight Mercury parser.
postlight-parser
providers=npm,env
abx-dl --plugins=mercury 'https://example.com'
MERCURY_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| MERCURY_ENABLED: Enable Mercury text extraction | true | boolean | SAVE_MERCURY, USE_MERCURY |
| MERCURY_BINARY: Path to Mercury/Postlight parser binary | "postlight-parser" | string | — |
| MERCURY_TIMEOUT: Timeout for Mercury in seconds | 30 | integer, min 5 | fallback: TIMEOUT |
| MERCURY_ARGS: Default Mercury parser arguments | [] | array | MERCURY_DEFAULT_ARGS |
| MERCURY_ARGS_EXTRA: Extra arguments to append to Mercury parser command | [] | array | MERCURY_EXTRA_ARGS |
claudecodeextract
Use Claude Code to generate clean Markdown from snapshot extractor outputs.
claude
providers=env,npm
abx-dl --plugins=claudecodeextract 'https://example.com'
CLAUDECODEEXTRACT_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| CLAUDECODE_BINARY: Path to Claude Code CLI binary | "claude" | string | — |
| CLAUDECODEEXTRACT_ENABLED: Enable Claude Code AI extraction | false | boolean | USE_CLAUDECODEEXTRACT |
| CLAUDECODEEXTRACT_TIMEOUT: Timeout for Claude Code extraction in seconds | 120 | integer, min 10 | fallback: CLAUDECODE_TIMEOUT |
| CLAUDECODEEXTRACT_PROMPT: Custom prompt for Claude Code extraction. Use this to define what Claude should extract or generate from the snapshot. | "Read all the previously extracted outputs in this snapshot directory (readability/, mercury/, defuddle/, htmltotext/, dom/, singlefile/, etc.). Using the best available source, generate a clean, well-formatted Markdown representation of the page content. Save the output as content.md in your output directory." | string | — |
| CLAUDECODEEXTRACT_MODEL: Claude model to use for extraction (e.g. claude-sonnet-4-6, claude-opus-4-6, claude-haiku-4-5-20251001) | "claude-sonnet-4-6" | string | fallback: CLAUDECODE_MODEL |
| CLAUDECODEEXTRACT_MAX_TURNS: Maximum number of agentic turns for extraction | 50 | integer, min 1 | fallback: CLAUDECODE_MAX_TURNS |
htmltotext
Convert archived HTML from other extractors into plain text for indexing and analysis.
abx-dl --plugins=htmltotext 'https://example.com'
HTMLTOTEXT_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| HTMLTOTEXT_ENABLED: Enable HTML to text conversion | true | boolean | SAVE_HTMLTOTEXT, USE_HTMLTOTEXT |
| HTMLTOTEXT_TIMEOUT: Timeout for HTML to text conversion in seconds | 30 | integer, min 5 | fallback: TIMEOUT |
trafilatura
Extract article content from archived HTML into text, markdown, HTML, CSV, JSON, and XML formats.
trafilatura
providers=env,pip
abx-dl --plugins=trafilatura 'https://example.com'
TRAFILATURA_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| TRAFILATURA_ENABLED: Enable Trafilatura extraction | true | boolean | SAVE_TRAFILATURA, USE_TRAFILATURA |
| TRAFILATURA_BINARY: Path to trafilatura binary | "trafilatura" | string | — |
| TRAFILATURA_TIMEOUT: Timeout for Trafilatura in seconds | 30 | integer, min 5 | fallback: TIMEOUT |
| TRAFILATURA_OUTPUT_FORMATS: Comma-separated trafilatura output formats to write (txt, markdown, html, csv, json, xml, xmltei) | "txt,markdown,html" | string | — |
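`TRAFILATURA_OUTPUT_FORMATS` is a single comma-separated string, so a consumer has to split and validate it before use. A minimal sketch of that step follows; `parse_output_formats` is a hypothetical helper, and the whitespace trimming, case folding, and de-duplication are assumptions rather than documented plugin behaviour.

```python
from typing import List

# Format names listed in the TRAFILATURA_OUTPUT_FORMATS description.
ALLOWED_FORMATS = ("txt", "markdown", "html", "csv", "json", "xml", "xmltei")


def parse_output_formats(value: str = "txt,markdown,html") -> List[str]:
    """Split a comma-separated format string into a validated list."""
    formats: List[str] = []
    for item in value.split(","):
        name = item.strip().lower()
        if not name:
            continue  # tolerate stray commas such as "txt,,html"
        if name not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported output format: {name!r}")
        if name not in formats:
            formats.append(name)  # keep first occurrence, preserve order
    return formats


default_formats = parse_output_formats()
```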
opendataloader
Extract structured text, tables, and metadata from PDFs using opendataloader-pdf. Supports OCR for scanned PDFs via hybrid backend.
opendataloader-pdf
providers=env,pip
java
providers=env,apt,brew
min_version=11.0.0
abx-dl --plugins=opendataloader 'https://example.com'
OPENDATALOADER_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| OPENDATALOADER_ENABLED: Enable PDF text extraction with opendataloader-pdf | true | boolean | SAVE_OPENDATALOADER, USE_OPENDATALOADER |
| OPENDATALOADER_BINARY: Path to opendataloader-pdf binary | "opendataloader-pdf" | string | — |
| OPENDATALOADER_JAVA_BINARY: Path to the Java runtime used by opendataloader-pdf | "java" | string | fallback: JAVA_BINARY |
| OPENDATALOADER_TIMEOUT: Timeout for PDF extraction in seconds | 120 | integer, min 10 | fallback: TIMEOUT |
| OPENDATALOADER_FORCE_OCR: Use hybrid OCR backend (--hybrid docling-fast) for scanned/image-based PDFs. Requires opendataloader-pdf-hybrid server running. | false | boolean | — |
| OPENDATALOADER_HYBRID_URL: URL of the opendataloader-pdf-hybrid server (e.g. http://localhost:5002). If empty, uses the default built-in URL. | "" | string | — |
| OPENDATALOADER_ARGS: Default opendataloader-pdf arguments | [] | array | OPENDATALOADER_DEFAULT_ARGS |
| OPENDATALOADER_ARGS_EXTRA: Extra arguments to append to opendataloader-pdf command | [] | array | OPENDATALOADER_EXTRA_ARGS |
liteparse
Extract text and metadata from PDFs and documents using LiteParse (by LlamaIndex). Supports OCR via Tesseract.js.
lit
providers=env,npm
abx-dl --plugins=liteparse 'https://example.com'
LITEPARSE_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| LITEPARSE_ENABLED: Enable LiteParse document extraction | true | boolean | SAVE_LITEPARSE, USE_LITEPARSE |
| LITEPARSE_BINARY: Path to lit binary | "lit" | string | — |
| LITEPARSE_TIMEOUT: Timeout for LiteParse extraction in seconds | 120 | integer, min 10 | fallback: TIMEOUT |
| LITEPARSE_ARGS: Default LiteParse arguments | [] | array | LITEPARSE_DEFAULT_ARGS |
| LITEPARSE_ARGS_EXTRA: Extra arguments to append to LiteParse command | [] | array | LITEPARSE_EXTRA_ARGS |
papersdl
Fetch downloadable academic papers from paper URLs and DOI targets.
papers-dl
providers=env,pip
abx-dl --plugins=papersdl 'https://example.com'
PAPERSDL_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| PAPERSDL_ENABLED: Enable paper downloading with papers-dl | true | boolean | SAVE_PAPERSDL, USE_PAPERSDL |
| PAPERSDL_BINARY: Path to papers-dl binary | "papers-dl" | string | — |
| PAPERSDL_TIMEOUT: Timeout for paper downloads in seconds | 300 | integer, min 30 | fallback: TIMEOUT |
| PAPERSDL_ARGS: Default papers-dl arguments | [ "fetch" ] | array | PAPERSDL_DEFAULT_ARGS |
| PAPERSDL_ARGS_EXTRA: Extra arguments to append to papers-dl command | [] | array | PAPERSDL_EXTRA_ARGS |
parse_html_urls
Parse HTML documents and emit discovered links as JSONL snapshot records.
abx-dl --plugins=parse_html_urls 'https://example.com'
PARSE_HTML_URLS_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| PARSE_HTML_URLS_ENABLED: Enable HTML URL parsing | true | boolean | USE_PARSE_HTML_URLS |
parse_txt_urls
Parse plain text documents and emit discovered URLs as JSONL snapshot records.
abx-dl --plugins=parse_txt_urls 'https://example.com'
PARSE_TXT_URLS_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| PARSE_TXT_URLS_ENABLED: Enable plain text URL parsing | true | boolean | USE_PARSE_TXT_URLS |
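The parser plugins all follow the same pattern: read a document, find URLs, and emit one JSON object per line. The sketch below illustrates that pattern for plain text. It is not the plugin's implementation: the regular expression, the punctuation trimming, and the record fields `type` and `url` are all hypothetical, since this page does not show the record schema.

```python
import json
import re

# Deliberately simple URL pattern; real parsers handle many more edge cases.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")


def urls_to_jsonl(text: str) -> str:
    """Return one JSON record per unique URL found in `text`."""
    seen = []
    for match in URL_RE.findall(text):
        url = match.rstrip(".,;:!?")  # drop trailing sentence punctuation
        if url not in seen:
            seen.append(url)
    # Field names below are hypothetical placeholders, not the real schema.
    return "\n".join(json.dumps({"type": "Snapshot", "url": url}) for url in seen)


sample = "See https://example.com/a, and http://example.org/b. Again: https://example.com/a"
jsonl = urls_to_jsonl(sample)
```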
parse_rss_urls
Parse RSS and Atom feeds and emit discovered entry URLs as JSONL snapshot records.
abx-dl --plugins=parse_rss_urls 'https://example.com'
PARSE_RSS_URLS_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| PARSE_RSS_URLS_ENABLED: Enable RSS/Atom feed URL parsing | true | boolean | USE_PARSE_RSS_URLS |
parse_netscape_urls
Parse Netscape bookmark HTML exports and emit discovered URLs as JSONL snapshot records.
abx-dl --plugins=parse_netscape_urls 'https://example.com'
PARSE_NETSCAPE_URLS_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| PARSE_NETSCAPE_URLS_ENABLED: Enable Netscape bookmarks HTML URL parsing | true | boolean | USE_PARSE_NETSCAPE_URLS |
parse_jsonl_urls
Parse JSONL bookmark exports and emit discovered URLs as JSONL snapshot records.
abx-dl --plugins=parse_jsonl_urls 'https://example.com'
PARSE_JSONL_URLS_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| PARSE_JSONL_URLS_ENABLED: Enable JSON Lines URL parsing | true | boolean | USE_PARSE_JSONL_URLS |
parse_dom_outlinks
Extract crawlable links from the rendered DOM and emit them as JSONL records.
chromium
providers=env,puppeteer
abx-dl --plugins=parse_dom_outlinks 'https://example.com'
PARSE_DOM_OUTLINKS_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| CHROME_BINARY: Path to Chromium, Chrome for Testing, or Chrome Canary binary | "chromium" | string | CHROMIUM_BINARY |
| PARSE_DOM_OUTLINKS_ENABLED: Enable DOM outlinks parsing from archived pages | true | boolean | SAVE_DOM_OUTLINKS, USE_PARSE_DOM_OUTLINKS |
| PARSE_DOM_OUTLINKS_TIMEOUT: Timeout for DOM outlinks parsing in seconds | 30 | integer, min 5 | fallback: TIMEOUT |
search_backend_sqlite
Index archived snapshot content into a SQLite FTS database for local search.
abx-dl --plugins=search_backend_sqlite 'https://example.com'
archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| SEARCH_BACKEND_ENGINE: Selected search backend implementation | "sqlite" | string | — |
| USE_INDEXING_BACKEND: Enable search indexing for archived snapshots | true | boolean | — |
| SEARCH_BACKEND_SQLITE_DB: SQLite FTS database filename | "search.sqlite3" | string | SQLITEFTS_DB |
| SEARCH_BACKEND_SQLITE_SEPARATE_DATABASE: Use separate database file for FTS index | true | boolean | FTS_SEPARATE_DATABASE, SQLITEFTS_SEPARATE_DATABASE |
| SEARCH_BACKEND_SQLITE_TOKENIZERS: FTS5 tokenizer configuration | "porter unicode61 remove_diacritics 2" | string | FTS_TOKENIZERS, SQLITEFTS_TOKENIZERS |
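To show what the default tokenizer string does, here is a minimal sketch that builds an in-memory FTS5 table with it using Python's standard `sqlite3` module. The table name `snapshot_fts` and its columns are hypothetical and do not reflect the plugin's real schema. The example also assumes the local SQLite build includes FTS5 and is recent enough (3.27 or later) to accept `remove_diacritics 2`.

```python
import sqlite3
from typing import Dict, List

# Default value of SEARCH_BACKEND_SQLITE_TOKENIZERS from the table above.
TOKENIZERS = "porter unicode61 remove_diacritics 2"


def build_index(docs: Dict[str, str]) -> sqlite3.Connection:
    """Create an in-memory FTS5 index mapping URL -> text content."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: the url column is stored but not tokenized.
    conn.execute(
        "CREATE VIRTUAL TABLE snapshot_fts "
        f"USING fts5(url UNINDEXED, content, tokenize='{TOKENIZERS}')"
    )
    conn.executemany(
        "INSERT INTO snapshot_fts (url, content) VALUES (?, ?)", docs.items()
    )
    return conn


def search(conn: sqlite3.Connection, query: str) -> List[str]:
    """Return URLs whose content matches `query`, best match first."""
    rows = conn.execute(
        "SELECT url FROM snapshot_fts WHERE snapshot_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in rows]


index = build_index({
    "https://example.com/a": "Running a café archive",
    "https://example.com/b": "Unrelated text",
})
```

The `porter` wrapper stems words, so a query for `run` matches "Running", and `remove_diacritics` folds accents, so `cafe` matches "café".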
search_backend_sonic
Index archived snapshot content into a Sonic search backend.
sonic
providers=env,apt,brew,cargo
abx-dl --plugins=search_backend_sonic 'https://example.com'
archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| SEARCH_BACKEND_ENGINE: Selected search backend implementation | "sonic" | string | — |
| USE_INDEXING_BACKEND: Enable search indexing for archived snapshots | true | boolean | — |
| SONIC_BINARY: Path to Sonic server binary | "sonic" | string | — |
| SONIC_DIR: Directory used to store the Sonic config, logs, and index data | "" | string | — |
| SEARCH_BACKEND_SONIC_HOST_NAME: Sonic server hostname | "127.0.0.1" | string | SEARCH_BACKEND_HOST_NAME, SONIC_HOST |
| SEARCH_BACKEND_SONIC_PORT: Sonic server port | 1491 | integer, min 1 | SEARCH_BACKEND_PORT, SONIC_PORT |
| SEARCH_BACKEND_SONIC_PASSWORD: Sonic server password | "SecretPassword" | string | SEARCH_BACKEND_PASSWORD, SONIC_PASSWORD |
| SEARCH_BACKEND_SONIC_COLLECTION: Sonic collection name | "archivebox" | string | SONIC_COLLECTION |
| SEARCH_BACKEND_SONIC_BUCKET: Sonic bucket name | "snapshots" | string | SONIC_BUCKET |
claudecodecleanup
Use Claude Code to deduplicate and clean up redundant snapshot extractor outputs.
abx-dl --plugins=claudecodecleanup 'https://example.com'
CLAUDECODECLEANUP_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| CLAUDECODE_BINARY: Path to Claude Code CLI binary | "claude" | string | — |
| CLAUDECODECLEANUP_ENABLED: Enable Claude Code AI cleanup of snapshot files | false | boolean | USE_CLAUDECODECLEANUP |
| CLAUDECODECLEANUP_TIMEOUT: Timeout for Claude Code cleanup in seconds | 180 | integer, min 10 | fallback: CLAUDECODE_TIMEOUT |
| CLAUDECODECLEANUP_PROMPT: Custom prompt for Claude Code cleanup. Defines what Claude should clean up and how to determine which duplicates to keep. | "Analyze all the extractor output directories in this snapshot. Look for duplicate or redundant outputs across plugins (e.g. multiple HTML extractions, multiple text extractions, multiple URL extraction outputs, etc.). For each group of similar outputs, inspect the content and determine which version is the best quality. Delete the inferior/redundant versions, keeping only the best one. Also remove any unnecessary temporary files, empty directories, or incomplete outputs. Write a summary of what you cleaned up to cleanup_report.txt in your output directory." | string | — |
| CLAUDECODECLEANUP_MODEL: Claude model to use for cleanup (e.g. claude-sonnet-4-6, claude-opus-4-6, claude-haiku-4-5-20251001) | "claude-sonnet-4-6" | string | fallback: CLAUDECODE_MODEL |
| CLAUDECODECLEANUP_MAX_TURNS: Maximum number of agentic turns for cleanup | 50 | integer, min 1 | fallback: CLAUDECODE_MAX_TURNS |
hashes
Generate a hash manifest for files produced in the snapshot directory.
abx-dl --plugins=hashes 'https://example.com'
HASHES_ENABLED=true archivebox add 'https://example.com'
Runtime plugins execute while archiving a URL.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| HASHES_ENABLED: Enable merkle tree hash generation | true | boolean | SAVE_HASHES, USE_HASHES |
| HASHES_TIMEOUT: Timeout for merkle tree generation in seconds | 30 | integer, min 5 | fallback: TIMEOUT |
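A hash manifest maps each file in a directory to a content digest. The sketch below is a simplified illustration, not the plugin's implementation: it assumes SHA-256, uses a made-up file layout, and folds the per-file hashes into one root digest instead of building the full merkle tree the config descriptions mention.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict


def hash_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to `root`) to its SHA-256 hex digest."""
    manifest = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def root_hash(manifest: Dict[str, str]) -> str:
    """Fold the sorted manifest entries into one digest (flat, not a true tree)."""
    digest = hashlib.sha256()
    for rel, file_hash in sorted(manifest.items()):
        digest.update(f"{rel}\0{file_hash}\n".encode("utf-8"))
    return digest.hexdigest()


# Demo on a throwaway directory; the file layout is hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    snapshot_dir = Path(tmp)
    (snapshot_dir / "title").mkdir()
    (snapshot_dir / "title" / "title.txt").write_bytes(b"Example Domain")
    manifest = hash_manifest(snapshot_dir)
    snapshot_digest = root_hash(manifest)
```

Sorting the paths before hashing keeps the root digest stable regardless of filesystem iteration order.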
apt
Install binaries through the Debian and Ubuntu APT package manager.
abx-dl plugins --install apt
archivebox init --setup
Setup plugins install dependencies or prepare shared runtime state.
config.json schema.
base
Provide shared utilities, helpers, and test support used by other plugins.
abx-dl plugins base
archivebox add 'https://example.com'
Utility plugins are typically consumed indirectly, so the example shows the closest inspection workflow.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| DATA_DIR: Base data directory for the current runtime | "" | string | — |
| ABX_RUNTIME: Current host runtime name, e.g. abx-dl or archivebox | "abx-dl" | string | — |
| ABX_INSTALL_CACHE: Runtime-derived install preflight cache keyed by binary name | {} | object | — |
| SNAP_DIR: Base snapshot directory for per-snapshot hook output | "" | string | — |
| CRAWL_DIR: Base crawl directory for per-crawl hook output | "" | string | — |
| LIB_DIR: Shared tools and binary installation root | "" | string | — |
| PERSONAS_DIR: Shared Chrome/browser personas root | "" | string | — |
| ACTIVE_PERSONA: Active browser persona name | "Default" | string | — |
| EXTRA_CONTEXT: JSON object merged into emitted JSONL event records | "" | string | — |
| TIMEOUT: Default timeout in seconds for hooks that support a TIMEOUT fallback | 60 | integer, min 0 | — |
| USER_AGENT: Default user agent string for HTTP requests and browser automation | "Mozilla/5.0 (compatible; ArchiveBox/1.0)" | string | — |
| PATH: Executable search path | "" | string | — |
| NODE_MODULES_DIR: Shared Node.js module resolution root | "" | string | — |
| NODE_MODULE_DIR: Legacy alias for NODE_MODULES_DIR | "" | string | — |
| NODE_PATH: Node.js module lookup path | "" | string | — |
| NODE_V8_COVERAGE: Optional V8 coverage output directory for Node.js hooks | "" | string | — |
| CHROME_BINARY: Resolved Chromium/Chrome binary path shared across plugins | "" | string | — |
| CHROME_USER_DATA_DIR: Chrome user data directory for persistent browser state (defaults to PERSONAS_DIR/ACTIVE_PERSONA/chrome_profile) | "" | string | — |
| CHROME_DOWNLOADS_DIR: Chrome downloads directory shared by browser plugins | "" | string | — |
| CHROME_EXTENSIONS_DIR: Chrome extensions directory shared by browser plugins | "" | string | — |
bash
Install binaries using arbitrary bash shell commands.
abx-dl plugins --install bash
archivebox init --setup
Setup plugins install dependencies or prepare shared runtime state.
config.json schema.
brew
Install binaries through the Homebrew package manager.
abx-dl plugins --install brew
archivebox init --setup
Setup plugins install dependencies or prepare shared runtime state.
config.json schema.
cargo
Install binaries through Rust's Cargo package manager.
abx-dl plugins --install cargo
archivebox init --setup
Setup plugins install dependencies or prepare shared runtime state.
config.json schema.
chromewebstore
Resolve Chrome Web Store extensions as installable binary-like artifacts.
node
providers=env,apt,brew
abx-dl plugins --install chromewebstore
archivebox init --setup
Setup plugins install dependencies or prepare shared runtime state.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| NODE_BINARY: Path to Node.js binary | "node" | string | — |
| PERSONAS_DIR: Shared Chrome/browser personas root | "" | string | — |
| ACTIVE_PERSONA: Active browser persona name | "Default" | string | — |
| CHROME_EXTENSIONS_DIR: Path to installed Chrome extensions directory | "" | string | — |
claudecode
Run Claude Code AI agent on snapshots to extract, analyze, or transform archived content.
node
providers=env,apt,brew
claude
providers=env,npm
abx-dl plugins claudecode
archivebox add 'https://example.com'
Utility plugins are typically consumed indirectly, so the example shows the closest inspection workflow.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| NODE_BINARY: Path to Node.js binary | "node" | string | — |
| CLAUDECODE_ENABLED: Enable Claude Code AI agent integration. Controls whether the Claude CLI dependency is resolved for this plugin; child plugins still need the claudecode plugin enabled and a working Claude binary. | false | boolean | USE_CLAUDECODE |
| CLAUDECODE_BINARY: Path to Claude Code CLI binary | "claude" | string | — |
| CLAUDECODE_TIMEOUT: Timeout for Claude Code operations in seconds | 120 | integer, min 10 | fallback: TIMEOUT |
| ANTHROPIC_API_KEY: Anthropic API key for Claude Code authentication | "" | string | — |
| CLAUDECODE_MODEL: Claude model to use (e.g. claude-sonnet-4-6, claude-opus-4-6, claude-haiku-4-5-20251001) | "claude-sonnet-4-6" | string | — |
| CLAUDECODE_MAX_TURNS: Maximum number of agentic turns per invocation | 50 | integer, min 1 | — |
env
Discover binaries that are already available on the system PATH.
abx-dl plugins --install env
archivebox init --setup
Setup plugins install dependencies or prepare shared runtime state.
config.json schema.
media
Provide a shared namespace for media-related plugin outputs and helpers.
abx-dl plugins media
archivebox add 'https://example.com'
Utility plugins are typically consumed indirectly, so the example shows the closest inspection workflow.
config.json schema.
npm
Install binaries from npm packages and expose Node module paths.
node
providers=env,apt,brew
abx-dl plugins --install npm
archivebox init --setup
Setup plugins install dependencies or prepare shared runtime state.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| NODE_BINARY: Path to Node.js binary | "node" | string | — |
| NPM_BINARY: Path to npm binary | "npm" | string | — |
pip
Install Python-based binaries into a managed virtual environment.
python
providers=env
abx-dl plugins --install pip
archivebox init --setup
Setup plugins install dependencies or prepare shared runtime state.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| PIP_VENV_PYTHON: Preferred Python interpreter for creating the shared pip virtualenv | "" | string | — |
puppeteer
Install and manage Chrome for Testing through the Puppeteer toolchain.
puppeteer
providers=npm
abx-dl plugins --install puppeteer
PUPPETEER_ENABLED=true archivebox init --setup
Setup plugins install dependencies or prepare shared runtime state.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| CHROME_BINARY: Path to Chromium, Chrome for Testing, or Chrome Canary binary | "chromium" | string | CHROMIUM_BINARY |
| PUPPETEER_ENABLED: Enable Puppeteer dependency installation during crawl setup | true | boolean | — |
| PUPPETEER_TIMEOUT: Timeout in seconds for Puppeteer-managed browser dependency installation | 900 | integer, min 60 | — |
search_backend_ripgrep
Search archived snapshot files directly with ripgrep instead of maintaining an index.
rg
providers=env,apt,brew
abx-dl plugins search_backend_ripgrep
archivebox add 'https://example.com'
Utility plugins are typically consumed indirectly, so the example shows the closest inspection workflow.
| Key | Default | Type | Aliases / Fallback |
|---|---|---|---|
| SEARCH_BACKEND_ENGINE: Selected search backend implementation | "ripgrep" | string | — |
| RIPGREP_BINARY: Path to ripgrep binary | "rg" | string | — |
| RIPGREP_TIMEOUT: Search timeout in seconds | 90 | integer, min 5 | SEARCH_BACKEND_TIMEOUT; fallback: TIMEOUT |
| RIPGREP_ARGS: Default ripgrep arguments | [ "--files-with-matches" "--no-messages" "--ignore-case" ] | array | RIPGREP_DEFAULT_ARGS |
| RIPGREP_ARGS_EXTRA: Extra arguments to append to ripgrep command | [] | array | RIPGREP_EXTRA_ARGS |
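The `*_ARGS` and `*_ARGS_EXTRA` pairs that recur across plugins suggest a command built from a binary, default arguments, and appended extras. The sketch below shows that composition for ripgrep. `build_rg_command` is a hypothetical helper: the argument order, the `--` separator, and the `./archive` search path are assumptions, and the defaults are only the three flags visible in the table above.

```python
from typing import List, Sequence

# The three flags shown for RIPGREP_ARGS on this page.
DEFAULT_ARGS = ("--files-with-matches", "--no-messages", "--ignore-case")


def build_rg_command(
    query: str,
    search_dir: str,
    binary: str = "rg",
    args: Sequence[str] = DEFAULT_ARGS,
    extra_args: Sequence[str] = (),
) -> List[str]:
    """Compose an argv list: binary, default args, extras, then query and path."""
    # "--" stops option parsing so a query starting with "-" is not read as a flag.
    return [binary, *args, *extra_args, "--", query, search_dir]


command = build_rg_command("example", "./archive", extra_args=["--max-count=1"])
```

Returning a list rather than a shell string means it can be passed straight to `subprocess.run` without quoting concerns.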
ssl
Utility plugin namespace reserved for SSL-related integration points and metadata.
abx-dl plugins ssl
archivebox add 'https://example.com'
Utility plugins are typically consumed indirectly, so the example shows the closest inspection workflow.
config.json schema.