ArchiveBox Plugin Gallery

yt-dlp

ytdlp

Download video and audio media with metadata, subtitles, thumbnails, and description sidecars.

#02 Snapshot Embed Fullscreen

Dependencies & Outputs

Required Binaries

yt-dlp providers=env,pip,brew,apt node providers=env,apt,brew ffmpeg providers=env,apt,brew

Output Mimetypes

audio video image application/x-subrip text/vtt application/json text/plain

Run It

abx-dl --plugins=ytdlp 'https://example.com'

YTDLP_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__02_ytdlp.finite.bg.py

order 02 background (finite) Snapshot
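
Each hook filename appears to encode the same fields as the summary line beneath it (event, order, and foreground or background mode). The sketch below decodes that convention in Python. The pattern is inferred only from the filenames listed in this gallery, and `parse_hook_name` is a hypothetical helper, not ArchiveBox's own loader.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from names such as on_Snapshot__02_ytdlp.finite.bg.py.
# This is an assumption based on the gallery listing, not ArchiveBox's parser.
HOOK_RE = re.compile(
    r"^on_(?P<event>[A-Za-z]+)__(?P<order>\d+)_(?P<name>\w+?)"
    r"(?:\.(?P<kind>finite|daemon))?(?P<bg>\.bg)?\.(?P<ext>py|js)$"
)


class Hook(NamedTuple):
    event: str            # e.g. "Snapshot" or "CrawlSetup"
    order: int            # numeric order prefix, e.g. 2
    name: str             # e.g. "ytdlp"
    kind: Optional[str]   # "finite", "daemon", or None when absent
    background: bool      # True when the name carries a ".bg" marker
    ext: str              # "py" or "js"


def parse_hook_name(filename: str) -> Hook:
    """Split a hook filename into the fields shown under 'Hook Scripts'."""
    m = HOOK_RE.match(filename)
    if m is None:
        raise ValueError(f"not a recognised hook filename: {filename!r}")
    return Hook(m["event"], int(m["order"]), m["name"],
                m["kind"], m["bg"] is not None, m["ext"])


print(parse_hook_name("on_Snapshot__02_ytdlp.finite.bg.py"))
```

Names without a `.bg` marker, such as `on_Snapshot__38_seo.js`, come back with `background=False`, matching the "foreground" label used in this gallery.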

Env Var Config Options

Key: Description	Default	Type	Aliases / Fallback
`YTDLP_ENABLED`: Enable video/audio downloading with yt-dlp	true	`boolean`	MEDIA_ENABLED, SAVE_MEDIA, USE_MEDIA, USE_YTDLP, FETCH_MEDIA, SAVE_YTDLP
`YTDLP_BINARY`: Path to yt-dlp binary	"yt-dlp"	`string`	YOUTUBEDL_BINARY, YOUTUBE_DL_BINARY
`NODE_BINARY`: Path to Node.js binary for yt-dlp JS runtime	"node"	`string`	—
`YTDLP_TIMEOUT`: Timeout for yt-dlp downloads in seconds	3600	`integer` min 30	MEDIA_TIMEOUT; fallback: `TIMEOUT`
`YTDLP_COOKIES_FILE`: Path to cookies file	""	`string`	fallback: `COOKIES_FILE`
`YTDLP_MAX_SIZE`: Maximum file size for yt-dlp downloads	"750m"	`string` pattern `^\d+[kmgKMG]?$`	MEDIA_MAX_SIZE
`YTDLP_CHECK_SSL_VALIDITY`: Whether to verify SSL certificates	true	`boolean`	fallback: `CHECK_SSL_VALIDITY`
`YTDLP_ARGS`: Default yt-dlp arguments	[ "--restrict-filenames" "--trim-filenames=128" "--write-description" "--write-info-json" "--write-thumbnail" "--write-sub" "--write-auto-subs" "--convert-subs=srt" "--yes-playlist" "--continue" "--no-abort-on-error" "--ignore-errors" "--geo-bypass" "--add-metadata" "--no-progress" "--remote-components=ejs:github" "-o" "%(title)s.%(ext)s" ]	`array`	YTDLP_DEFAULT_ARGS
`YTDLP_ARGS_EXTRA`: Extra arguments to append to yt-dlp command	[]	`array`	YTDLP_EXTRA_ARGS
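
`YTDLP_MAX_SIZE` must match the pattern `^\d+[kmgKMG]?$`. The sketch below validates a value against that pattern and converts it to bytes. It assumes binary multiples (1k = 1024 bytes), which is a guess about how the suffix is interpreted downstream; `parse_max_size` is a hypothetical helper.

```python
import re

# Same pattern as the table, with capture groups added for number and suffix.
SIZE_RE = re.compile(r"^(\d+)([kmgKMG]?)$")

# Assumption: suffixes are binary multiples (k = 1024, m = 1024**2, g = 1024**3).
MULTIPLIERS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_max_size(value: str) -> int:
    """Validate a size string such as "750m" and return it as a byte count."""
    m = SIZE_RE.fullmatch(value)
    if m is None:
        raise ValueError(f"invalid size: {value!r}")
    number, suffix = m.groups()
    return int(number) * MULTIPLIERS[suffix.lower()]


print(parse_max_size("750m"))  # 786432000
```

Values such as `1.5g` or `750MB` do not match the pattern and raise `ValueError` in this sketch.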

gallery-dl

gallerydl

Download image and media galleries along with metadata sidecars from supported sites.

#03 Snapshot Embed Fullscreen

Dependencies & Outputs

Required Binaries

gallery-dl providers=env,pip,brew,apt

Output Mimetypes

image video application/json text/plain application/zip

Run It

abx-dl --plugins=gallerydl 'https://example.com'

GALLERYDL_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__03_gallerydl.finite.bg.py

order 03 background (finite) Snapshot

Env Var Config Options

Key: Description	Default	Type	Aliases / Fallback
`GALLERYDL_ENABLED`: Enable gallery downloading with gallery-dl	true	`boolean`	SAVE_GALLERYDL, USE_GALLERYDL
`GALLERYDL_BINARY`: Path to gallery-dl binary	"gallery-dl"	`string`	—
`GALLERYDL_TIMEOUT`: Timeout for gallery downloads in seconds	3600	`integer` min 30	fallback: `TIMEOUT`
`GALLERYDL_COOKIES_FILE`: Path to cookies file	""	`string`	fallback: `COOKIES_FILE`
`GALLERYDL_CHECK_SSL_VALIDITY`: Whether to verify SSL certificates	true	`boolean`	fallback: `CHECK_SSL_VALIDITY`
`GALLERYDL_ARGS`: Default gallery-dl arguments	[ "--write-metadata" "--write-info-json" ]	`array`	GALLERYDL_DEFAULT_ARGS
`GALLERYDL_ARGS_EXTRA`: Extra arguments to append to gallery-dl command	[]	`array`	GALLERYDL_EXTRA_ARGS
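
The "Aliases / Fallback" column suggests that an option can be supplied under several names. The sketch below shows one plausible lookup order: the primary key, then its aliases, then the shared fallback, then the default. That precedence is an assumption, and `resolve_option` is a hypothetical helper rather than ArchiveBox's config loader.

```python
from typing import Mapping, Optional, Sequence


def resolve_option(env: Mapping[str, str], key: str, default: str,
                   aliases: Sequence[str] = (),
                   fallback: Optional[str] = None) -> str:
    """Return the first value set among key, aliases, then fallback; else default."""
    candidates = [key, *aliases]
    if fallback is not None:
        candidates.append(fallback)
    for name in candidates:
        if name in env:
            return env[name]
    return default


# Example environment: only an alias and the shared TIMEOUT are set.
env = {"USE_GALLERYDL": "false", "TIMEOUT": "90"}
enabled = resolve_option(env, "GALLERYDL_ENABLED", "true",
                         aliases=("SAVE_GALLERYDL", "USE_GALLERYDL"))
timeout = resolve_option(env, "GALLERYDL_TIMEOUT", "3600", fallback="TIMEOUT")
print(enabled, timeout)  # false 90
```

Passing `os.environ` as `env` would apply the same lookup to the real process environment.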

forum-dl

forumdl

Download forum threads and exports in JSONL, WARC, and mailbox-style archive formats.

#04 Snapshot Embed Fullscreen

Dependencies & Outputs

Required Binaries

forum-dl providers=env,pip

Output Mimetypes

application/x-ndjson application/warc message/rfc822

Run It

abx-dl --plugins=forumdl 'https://example.com'

FORUMDL_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__04_forumdl.finite.bg.py

order 04 background (finite) Snapshot

Env Var Config Options

Key: Description	Default	Type	Aliases / Fallback
`FORUMDL_ENABLED`: Enable forum downloading with forum-dl	true	`boolean`	SAVE_FORUMDL, USE_FORUMDL
`FORUMDL_BINARY`: Path to forum-dl binary	"forum-dl"	`string`	—
`FORUMDL_TIMEOUT`: Timeout for forum downloads in seconds	3600	`integer` min 30	fallback: `TIMEOUT`
`FORUMDL_OUTPUT_FORMAT`: Output format for forum downloads	"jsonl"	`string` jsonl | warc | mbox | maildir | mh | mmdf | babyl	—
`FORUMDL_ARGS`: Default forum-dl arguments	[]	`array`	FORUMDL_DEFAULT_ARGS
`FORUMDL_ARGS_EXTRA`: Extra arguments to append to forum-dl command	[]	`array`	FORUMDL_EXTRA_ARGS

Git

git

Clone git repositories from supported repository URLs into the snapshot output directory.

#05 Snapshot Embed

Dependencies & Outputs

Required Binaries

git providers=env,apt,brew

Output Mimetypes

text application image audio video font

Run It

abx-dl --plugins=git 'https://example.com'

GIT_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__05_git.finite.bg.py

order 05 background (finite) Snapshot

Env Var Config Options

Key: Description	Default	Type	Aliases / Fallback
`GIT_ENABLED`: Enable git repository cloning	true	`boolean`	SAVE_GIT, USE_GIT
`GIT_BINARY`: Path to git binary	"git"	`string`	—
`GIT_TIMEOUT`: Timeout for git operations in seconds	120	`integer` min 10	fallback: `TIMEOUT`
`GIT_DOMAINS`: Comma-separated list of domains to treat as git repositories	"github.com,gitlab.com,bitbucket.org,gist.github.com,codeberg.org,gitea.com,git.sr.ht"	`string`	—
`GIT_ARGS`: Default git arguments	[ "clone" "--depth=1" "--recursive" ]	`array`	GIT_DEFAULT_ARGS
`GIT_ARGS_EXTRA`: Extra arguments to append to git command	[]	`array`	GIT_EXTRA_ARGS
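
`GIT_DOMAINS` is a comma-separated host list. The sketch below shows how such a list could be used to decide whether a URL is worth cloning. It assumes an exact hostname match; the real plugin may also accept subdomains or other URL forms. `looks_like_git_repo` is a hypothetical helper.

```python
from urllib.parse import urlparse

# Default value copied from the table above.
GIT_DOMAINS = ("github.com,gitlab.com,bitbucket.org,gist.github.com,"
               "codeberg.org,gitea.com,git.sr.ht")


def looks_like_git_repo(url: str, domains: str = GIT_DOMAINS) -> bool:
    """True when the URL's host is in the comma-separated list (exact match assumed)."""
    host = (urlparse(url).hostname or "").lower()
    allowed = {d.strip().lower() for d in domains.split(",") if d.strip()}
    return host in allowed


print(looks_like_git_repo("https://github.com/ArchiveBox/ArchiveBox"))  # True
print(looks_like_git_repo("https://example.com"))  # False
```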

wget

wget

Archive pages and their requisites with wget, optionally writing WARC captures.

#06 Snapshot Embed

Dependencies & Outputs

Required Binaries

wget providers=env,apt,brew

Output Mimetypes

text/html application/warc application/gzip image text/css application/javascript font audio video

Run It

abx-dl --plugins=wget 'https://example.com'

WGET_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__06_wget.finite.bg.py

order 06 background (finite) Snapshot

Env Var Config Options

Key: Description	Default	Type	Aliases / Fallback
`WGET_ENABLED`: Enable wget archiving	true	`boolean`	SAVE_WGET, USE_WGET
`WGET_WARC_ENABLED`: Save WARC archive file	true	`boolean`	SAVE_WARC, WGET_SAVE_WARC
`WGET_BINARY`: Path to wget binary	"wget"	`string`	—
`WGET_TIMEOUT`: Timeout for wget in seconds	60	`integer` min 5	fallback: `TIMEOUT`
`WGET_USER_AGENT`: User agent string for wget	""	`string`	fallback: `USER_AGENT`
`WGET_COOKIES_FILE`: Path to cookies file	""	`string`	fallback: `COOKIES_FILE`
`WGET_CHECK_SSL_VALIDITY`: Whether to verify SSL certificates	true	`boolean`	fallback: `CHECK_SSL_VALIDITY`
`WGET_ARGS`: Default wget arguments	[ "--no-verbose" "--adjust-extension" "--convert-links" "--force-directories" "--backup-converted" "--span-hosts" "--no-parent" "--page-requisites" "--restrict-file-names=windows" "--tries=2" "-e" "robots=off" ]	`array`	WGET_DEFAULT_ARGS
`WGET_ARGS_EXTRA`: Extra arguments to append to wget command	[]	`array`	WGET_EXTRA_ARGS
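
The options above can be read as inputs to a single wget command line. The sketch below assembles one from the documented defaults. How the plugin actually maps each option onto a flag is not shown in this gallery, so the mapping here is an assumption; `build_wget_cmd` and the `warc_name` value are hypothetical, while the flags themselves (`--timeout`, `--warc-file`, `--user-agent`, `--load-cookies`, `--no-check-certificate`) are standard wget options.

```python
from typing import List, Sequence

# Default value of WGET_ARGS copied from the table above.
WGET_ARGS = [
    "--no-verbose", "--adjust-extension", "--convert-links",
    "--force-directories", "--backup-converted", "--span-hosts",
    "--no-parent", "--page-requisites", "--restrict-file-names=windows",
    "--tries=2", "-e", "robots=off",
]


def build_wget_cmd(url: str, *, binary: str = "wget", timeout: int = 60,
                   warc_enabled: bool = True, warc_name: str = "archive",
                   user_agent: str = "", cookies_file: str = "",
                   check_ssl: bool = True,
                   extra_args: Sequence[str] = ()) -> List[str]:
    """Assemble an argv list; the option-to-flag mapping is an assumption."""
    cmd = [binary, *WGET_ARGS, f"--timeout={timeout}"]
    if warc_enabled:                      # WGET_WARC_ENABLED
        cmd.append(f"--warc-file={warc_name}")
    if user_agent:                        # WGET_USER_AGENT
        cmd.append(f"--user-agent={user_agent}")
    if cookies_file:                      # WGET_COOKIES_FILE
        cmd.append(f"--load-cookies={cookies_file}")
    if not check_ssl:                     # WGET_CHECK_SSL_VALIDITY=false
        cmd.append("--no-check-certificate")
    return [*cmd, *extra_args, url]       # WGET_ARGS_EXTRA goes last, before the URL


print(" ".join(build_wget_cmd("https://example.com")))
```

Returning a list rather than a single string lets the result be passed to `subprocess.run` without shell quoting.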

Archive.org

archivedotorg

Submit URLs to the Internet Archive Wayback Machine and save the resulting archive link.

#08 Snapshot Embed

Dependencies & Outputs

Output Mimetypes

text/plain

Run It

abx-dl --plugins=archivedotorg 'https://example.com'

ARCHIVEDOTORG_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__08_archivedotorg.finite.bg.py

order 08 background (finite) Snapshot

Env Var Config Options

Key: Description	Default	Type	Aliases / Fallback
`ARCHIVEDOTORG_ENABLED`: Submit URLs to archive.org Wayback Machine	true	`boolean`	SAVE_ARCHIVEDOTORG, USE_ARCHIVEDOTORG, SUBMIT_ARCHIVEDOTORG
`ARCHIVEDOTORG_TIMEOUT`: Timeout for archive.org submission in seconds	60	`integer` min 10	fallback: `TIMEOUT`
`ARCHIVEDOTORG_USER_AGENT`: User agent string	""	`string`	fallback: `USER_AGENT`

Favicon

favicon

Fetch and save the site favicon or touch icon.

#11 Snapshot Embed

Dependencies & Outputs

Output Mimetypes

image

Run It

abx-dl --plugins=favicon 'https://example.com'

FAVICON_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__11_favicon.finite.bg.py

order 11 background (finite) Snapshot

Env Var Config Options

Key: Description	Default	Type	Aliases / Fallback
`FAVICON_ENABLED`: Enable favicon downloading	true	`boolean`	SAVE_FAVICON, USE_FAVICON
`FAVICON_TIMEOUT`: Timeout for favicon fetch in seconds	30	`integer` min 5	fallback: `TIMEOUT`
`FAVICON_USER_AGENT`: User agent string	""	`string`	fallback: `USER_AGENT`
`FAVICON_PROVIDER`: Fallback favicon provider URL template. Supports either {} or {domain} placeholders.	"https://www.google.com/s2/favicons?domain={}&format=ico"	`string`	—
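
`FAVICON_PROVIDER` accepts either a `{}` or a `{domain}` placeholder. The sketch below expands such a template for a page URL. `favicon_fallback_url` is a hypothetical helper, and using the URL's hostname as the domain is an assumption.

```python
from urllib.parse import urlparse

# Default value of FAVICON_PROVIDER copied from the table above.
DEFAULT_PROVIDER = "https://www.google.com/s2/favicons?domain={}&format=ico"


def favicon_fallback_url(page_url: str, template: str = DEFAULT_PROVIDER) -> str:
    """Fill the {} or {domain} placeholder with the page's hostname."""
    domain = urlparse(page_url).hostname or ""
    # Plain replacement instead of str.format, so any other braces
    # in a custom template are left untouched.
    return template.replace("{domain}", domain).replace("{}", domain)


print(favicon_fallback_url("https://example.com/some/page"))
# https://www.google.com/s2/favicons?domain=example.com&format=ico
```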

uBlock Origin Lite

ublock

Install the uBlock Origin Lite extension to block ads, trackers, and other page clutter during archiving.

#12 Snapshot

Dependencies & Outputs

Required Plugins

chrome

Required Binaries

chromium providers=env,puppeteer ublock providers=chromewebstore

Run It

abx-dl --plugins=ublock 'https://example.com'

UBLOCK_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__12_ublock.daemon.bg.js

order 12 background (daemon) Snapshot

Env Var Config Options

Key: Description	Default	Type	Aliases / Fallback
`CHROME_BINARY`: Path to Chromium, Chrome for Testing, or Chrome Canary binary	"chromium"	`string`	CHROMIUM_BINARY
`UBLOCK_ENABLED`: Enable uBlock Origin Lite browser extension for ad blocking	true	`boolean`	USE_UBLOCK

I Still Don't Care About Cookies

istilldontcareaboutcookies

Install the I Still Don't Care About Cookies extension to dismiss cookie banners during archiving.

#13 Snapshot

Dependencies & Outputs

Required Plugins

chrome

Required Binaries

chromium providers=env,puppeteer istilldontcareaboutcookies providers=chromewebstore

Run It

abx-dl --plugins=istilldontcareaboutcookies 'https://example.com'

ISTILLDONTCAREABOUTCOOKIES_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__13_istilldontcareaboutcookies.daemon.bg.js

order 13 background (daemon) Snapshot

Env Var Config Options

Key: Description	Default	Type	Aliases / Fallback
`CHROME_BINARY`: Path to Chromium, Chrome for Testing, or Chrome Canary binary	"chromium"	`string`	CHROMIUM_BINARY
`ISTILLDONTCAREABOUTCOOKIES_ENABLED`: Enable I Still Don't Care About Cookies browser extension	true	`boolean`	USE_ISTILLDONTCAREABOUTCOOKIES

2Captcha

twocaptcha

Install and configure the 2Captcha extension to solve CAPTCHAs during browser-based archiving.

#14 Snapshot

Dependencies & Outputs

Required Plugins

chrome

Required Binaries

chromium providers=env,puppeteer twocaptcha providers=chromewebstore

Run It

abx-dl --plugins=twocaptcha 'https://example.com'

TWOCAPTCHA_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_CrawlSetup__95_twocaptcha_config.js

order 95 foreground CrawlSetup

on_Snapshot__14_twocaptcha.daemon.bg.js

order 14 background (daemon) Snapshot

Env Var Config Options

Key: Description	Default	Type	Aliases / Fallback
`CHROME_BINARY`: Path to Chromium, Chrome for Testing, or Chrome Canary binary	"chromium"	`string`	CHROMIUM_BINARY
`TWOCAPTCHA_ENABLED`: Enable 2captcha browser extension for automatic CAPTCHA solving	true	`boolean`	CAPTCHA2_ENABLED, USE_CAPTCHA2, USE_TWOCAPTCHA
`TWOCAPTCHA_API_KEY`: 2captcha API key for CAPTCHA solving service (get from https://2captcha.com)	""	`string`	API_KEY_2CAPTCHA, CAPTCHA2_API_KEY
`TWOCAPTCHA_RETRY_COUNT`: Number of times to retry CAPTCHA solving on error	3	`integer` min 0	CAPTCHA2_RETRY_COUNT
`TWOCAPTCHA_RETRY_DELAY`: Delay in seconds between CAPTCHA solving retries	5	`integer` min 0	CAPTCHA2_RETRY_DELAY
`TWOCAPTCHA_TIMEOUT`: Timeout for CAPTCHA solving in seconds	60	`integer` min 5	CAPTCHA2_TIMEOUT; fallback: `TIMEOUT`
`TWOCAPTCHA_AUTO_SUBMIT`: Automatically submit forms after CAPTCHA is solved	false	`boolean`	—

Modal Closer

modalcloser

Automatically dismiss dialogs, cookie banners, and framework modals while the page is being archived.

#15 Snapshot

Dependencies & Outputs

Required Plugins

chrome

Required Binaries

chromium providers=env,puppeteer

Run It

abx-dl --plugins=modalcloser 'https://example.com'

MODALCLOSER_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__15_modalcloser.daemon.bg.js

order 15 background (daemon) Snapshot

Env Var Config Options

Key: Description	Default	Type	Aliases / Fallback
`CHROME_BINARY`: Path to Chromium, Chrome for Testing, or Chrome Canary binary	"chromium"	`string`	CHROMIUM_BINARY
`MODALCLOSER_ENABLED`: Enable automatic modal and dialog closing	true	`boolean`	CLOSE_MODALS, AUTO_CLOSE_MODALS
`MODALCLOSER_TIMEOUT`: Delay before auto-closing dialogs (ms)	1250	`integer` min 100	—
`MODALCLOSER_POLL_INTERVAL`: How often to check for CSS modals (ms)	500	`integer` min 100	—

Console Log

consolelog

Capture browser console messages emitted while the page loads.

#21 Snapshot

Dependencies & Outputs

Required Plugins

chrome

Required Binaries

chromium providers=env,puppeteer

Output Mimetypes

application/x-ndjson

Run It

abx-dl --plugins=consolelog 'https://example.com'

CONSOLELOG_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__21_consolelog.daemon.bg.js

order 21 background (daemon) Snapshot

Env Var Config Options

Key: Description	Default	Type	Aliases / Fallback
`CHROME_BINARY`: Path to Chromium, Chrome for Testing, or Chrome Canary binary	"chromium"	`string`	CHROMIUM_BINARY
`CONSOLELOG_ENABLED`: Enable console log capture	true	`boolean`	SAVE_CONSOLELOG, USE_CONSOLELOG
`CONSOLELOG_TIMEOUT`: Timeout for console log capture in seconds	30	`integer` min 5	fallback: `TIMEOUT`

DNS

dns

Record DNS activity observed while loading the page in Chrome.

#22 Snapshot

Dependencies & Outputs

Required Plugins

chrome

Required Binaries

chromium providers=env,puppeteer

Output Mimetypes

application/x-ndjson

Run It

abx-dl --plugins=dns 'https://example.com'

DNS_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__22_dns.daemon.bg.js

order 22 background (daemon) Snapshot

Env Var Config Options

Key: Description	Default	Type	Aliases / Fallback
`CHROME_BINARY`: Path to Chromium, Chrome for Testing, or Chrome Canary binary	"chromium"	`string`	CHROMIUM_BINARY
`DNS_ENABLED`: Enable DNS traffic recording during page load	true	`boolean`	SAVE_DNS, USE_DNS
`DNS_TIMEOUT`: Timeout for DNS recording in seconds	30	`integer` min 5	fallback: `TIMEOUT`

SSL Certificates

sslcerts

Capture TLS certificate and connection metadata for the loaded page.

#23 Snapshot

Dependencies & Outputs

Required Plugins

chrome

Required Binaries

chromium providers=env,puppeteer

Output Mimetypes

application/x-ndjson

Run It

abx-dl --plugins=sslcerts 'https://example.com'

SSLCERTS_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__23_sslcerts.daemon.bg.js

order 23 background (daemon) Snapshot

Env Var Config Options

Key: Description	Default	Type	Aliases / Fallback
`CHROME_BINARY`: Path to Chromium, Chrome for Testing, or Chrome Canary binary	"chromium"	`string`	CHROMIUM_BINARY
`SSLCERTS_ENABLED`: Enable SSL certificate capture	true	`boolean`	SAVE_SSLCERTS, USE_SSLCERTS
`SSLCERTS_TIMEOUT`: Timeout for SSL capture in seconds	30	`integer` min 5	fallback: `TIMEOUT`

Responses

responses

Capture HTTP response metadata for requests made during page load.

#24 Snapshot

Dependencies & Outputs

Required Plugins

chrome

Required Binaries

chromium providers=env,puppeteer

Output Mimetypes

application/x-ndjson text image audio video application font

Run It

abx-dl --plugins=responses 'https://example.com'

RESPONSES_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__24_responses.daemon.bg.js

order 24 background (daemon) Snapshot

Env Var Config Options

Key: Description	Default	Type	Aliases / Fallback
`CHROME_BINARY`: Path to Chromium, Chrome for Testing, or Chrome Canary binary	"chromium"	`string`	CHROMIUM_BINARY
`RESPONSES_ENABLED`: Enable HTTP response capture	true	`boolean`	SAVE_RESPONSES, USE_RESPONSES
`RESPONSES_TIMEOUT`: Timeout for response capture in seconds	30	`integer` min 5	fallback: `TIMEOUT`

Redirects

redirects

Capture the redirect chain encountered while loading the page.

#25 Snapshot

Dependencies & Outputs

Required Plugins

chrome

Required Binaries

chromium providers=env,puppeteer

Output Mimetypes

application/x-ndjson

Run It

abx-dl --plugins=redirects 'https://example.com'

REDIRECTS_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__25_redirects.daemon.bg.js

order 25 background (daemon) Snapshot

Env Var Config Options

Key: Description	Default	Type	Aliases / Fallback
`CHROME_BINARY`: Path to Chromium, Chrome for Testing, or Chrome Canary binary	"chromium"	`string`	CHROMIUM_BINARY
`REDIRECTS_ENABLED`: Enable redirect chain capture	true	`boolean`	SAVE_REDIRECTS, USE_REDIRECTS
`REDIRECTS_TIMEOUT`: Timeout for redirect capture in seconds	30	`integer` min 5	fallback: `TIMEOUT`

Static File

staticfile

Detect and download static-file responses directly when a URL resolves to a non-HTML asset.

#26 Snapshot Embed

Dependencies & Outputs

Required Plugins

chrome

Required Binaries

chromium providers=env,puppeteer

Output Mimetypes

application/pdf application/epub+zip image audio video application/json application/xml text/csv text/xml application/zip application/octet-stream application/x-

Run It

abx-dl --plugins=staticfile 'https://example.com'

STATICFILE_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__26_staticfile.daemon.bg.js

order 26 background (daemon) Snapshot

Env Var Config Options

Key: Description	Default	Type	Aliases / Fallback
`CHROME_BINARY`: Path to Chromium, Chrome for Testing, or Chrome Canary binary	"chromium"	`string`	CHROMIUM_BINARY
`STATICFILE_ENABLED`: Enable static file detection	true	`boolean`	SAVE_STATICFILE, USE_STATICFILE
`STATICFILE_TIMEOUT`: Timeout for static file detection in seconds	30	`integer` min 5	fallback: `TIMEOUT`

Headers

headers

Capture HTTP headers for the main document response.

#27 Snapshot

Dependencies & Outputs

Required Plugins

chrome

Required Binaries

chromium providers=env,puppeteer

Output Mimetypes

application/json

Run It

abx-dl --plugins=headers 'https://example.com'

HEADERS_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__27_headers.daemon.bg.js

order 27 background (daemon) Snapshot

Env Var Config Options

Key: Description	Default	Type	Aliases / Fallback
`CHROME_BINARY`: Path to Chromium, Chrome for Testing, or Chrome Canary binary	"chromium"	`string`	CHROMIUM_BINARY
`HEADERS_ENABLED`: Enable HTTP headers capture	true	`boolean`	SAVE_HEADERS, USE_HEADERS
`HEADERS_TIMEOUT`: Timeout for headers capture in seconds	30	`integer` min 5	fallback: `TIMEOUT`

Chrome

chrome

Launch and manage a shared Chromium-compatible CDP browser session for browser-driven plugins.

#30 Snapshot

Dependencies & Outputs

Required Plugins

puppeteer

Required Binaries

node providers=env,apt,brew chromium providers=env,puppeteer

Output Mimetypes

text/plain application/json

Run It

abx-dl --plugins=chrome 'https://example.com'

CHROME_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_CrawlSetup__89_chrome_kill_zombies.js

order 89 foreground CrawlSetup

on_CrawlSetup__90_chrome_launch.daemon.bg.js

order 90 background (daemon) CrawlSetup

on_CrawlSetup__91_chrome_wait.js

order 91 foreground CrawlSetup

on_Snapshot__09_chrome_launch.daemon.bg.js

order 09 background (daemon) Snapshot

on_Snapshot__10_chrome_tab.daemon.bg.js

order 10 background (daemon) Snapshot

on_Snapshot__11_chrome_wait.js

order 11 foreground Snapshot

on_Snapshot__30_chrome_navigate.js

order 30 foreground Snapshot

Env Var Config Options

Key: Description	Default	Type	Aliases / Fallback
`CHROME_ENABLED`: Enable Chrome browser integration for archiving	true	`boolean`	USE_CHROME
`CHROME_BINARY`: Path to Chromium, Chrome for Testing, or Chrome Canary binary	"chromium"	`string`	CHROMIUM_BINARY
`NODE_BINARY`: Path to Node.js binary	"node"	`string`	—
`CHROME_TIMEOUT`: Timeout for Chrome operations in seconds	60	`integer` min 5	fallback: `TIMEOUT`
`CHROME_LAUNCH_ATTEMPTS`: Maximum Chrome launch attempts before failing on transient startup errors	3	`integer` min 1	—
`CHROME_HEADLESS`: Run Chrome in headless mode	true	`boolean`	—
`PERSONAS_DIR`: Shared Chrome/browser personas root	""	`string`	—
`ACTIVE_PERSONA`: Active browser persona name	"Default"	`string`	—
`CHROME_SANDBOX`: Enable Chrome sandbox (disable in Docker with --no-sandbox)	true	`boolean`	—
`CHROME_RESOLUTION`: Browser viewport resolution (width,height)	"1440,2000"	`string` pattern `^\d+,\d+$`	fallback: `RESOLUTION`
`CHROME_USER_DATA_DIR`: Path to Chrome user data directory for persistent sessions (defaults to PERSONAS_DIR/ACTIVE_PERSONA/chrome_profile)	""	`string`	—
`CHROME_USER_AGENT`: User agent string for Chrome	""	`string`	fallback: `USER_AGENT`
`CHROME_CDP_URL`: Connect to an already-running browser over CDP instead of launching a new local Chrome process	""	`string`	—
`CHROME_IS_LOCAL`: Whether the managed browser process is local and should have a live chrome.pid marker	true	`boolean`	—
`CHROME_KEEPALIVE`: Keep the browser alive after the owning crawl/snapshot hook exits instead of closing it during cleanup	false	`boolean`	—
`CHROME_ISOLATION`: Whether Chrome runs as one shared browser per crawl or a separate browser per snapshot	"crawl"	`string` crawl | snapshot	—
`CHROME_ARGS`: Default Chrome command-line arguments (static flags only, dynamic args like --user-data-dir are added at runtime)	[ "--no-first-run" "--no-default-browser-check" "--disable-default-apps" "--disable-sync" "--disable-infobars" "--disable-blink-features=AutomationControlled" "--disable-component-update" "--disable-domain-reliability" "--disable-breakpad" "--disable-client-side-phishing-detection" "--disable-hang-monitor" "--disable-speech-synthesis-api" "--disable-speech-api" "--disable-print-preview" "--disable-notifications" "--disable-desktop-notifications" "--disable-popup-blocking" "--disable-prompt-on-repost" "--disable-external-intent-requests" "--disable-session-crashed-bubble" "--disable-search-engine-choice-screen" "--disable-datasaver-prompt" "--ash-no-nudges" "--hide-crash-restore-bubble" "--suppress-message-center-popups" "--noerrdialogs" "--no-pings" "--silent-debugger-extension-api" "--deny-permission-prompts" "--safebrowsing-disable-auto-update" "--metrics-recording-only" "--password-store=basic" "--use-mock-keychain" "--disable-cookie-encryption" "--font-render-hinting=none" "--force-color-profile=srgb" "--disable-partial-raster" "--disable-skia-runtime-opts" "--disable-2d-canvas-clip-aa" "--enable-webgl" "--hide-scrollbars" "--export-tagged-pdf" "--generate-pdf-document-outline" "--disable-lazy-loading" "--disable-renderer-backgrounding" "--disable-background-networking" "--disable-background-timer-throttling" "--disable-backgrounding-occluded-windows" "--disable-ipc-flooding-protection" "--disable-extensions-http-throttling" "--disable-field-trial-config" "--disable-back-forward-cache" "--autoplay-policy=no-user-gesture-required" "--disable-gesture-requirement-for-media-playback" "--lang=en-US,en;q=0.9" "--log-level=2" "--enable-logging=stderr" ]	`array`	CHROME_DEFAULT_ARGS
`CHROME_ARGS_EXTRA`: Extra arguments to append to Chrome command (for user customization)	[]	`array`	CHROME_EXTRA_ARGS
`CHROME_PAGELOAD_TIMEOUT`: Timeout for page navigation/load in seconds	60	`integer` min 5	fallback: `CHROME_TIMEOUT`
`CHROME_WAIT_FOR`: Page load completion condition (domcontentloaded, load, networkidle0, networkidle2)	"load"	`string` domcontentloaded | load | networkidle0 | networkidle2	—
`CHROME_DELAY_AFTER_LOAD`: Extra delay in seconds after page load completes before archiving (useful for JS-heavy SPAs)	0	`number` min 0	—
`CHROME_CHECK_SSL_VALIDITY`: Whether to verify SSL certificates (disable for self-signed certs)	true	`boolean`	fallback: `CHECK_SSL_VALIDITY`
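
`CHROME_USER_DATA_DIR` is described as defaulting to `PERSONAS_DIR/ACTIVE_PERSONA/chrome_profile` when left empty. The sketch below applies that rule. `chrome_user_data_dir` is a hypothetical helper, and `/data/personas` is an illustrative path rather than a documented default.

```python
from pathlib import PurePosixPath


def chrome_user_data_dir(personas_dir: str, active_persona: str = "Default",
                         user_data_dir: str = "") -> str:
    """Return the explicit directory if set, else the persona-derived default."""
    if user_data_dir:  # an explicit CHROME_USER_DATA_DIR takes priority
        return user_data_dir
    return str(PurePosixPath(personas_dir) / active_persona / "chrome_profile")


print(chrome_user_data_dir("/data/personas"))
# /data/personas/Default/chrome_profile
```

What ArchiveBox uses when `PERSONAS_DIR` itself is empty is not stated here; this sketch would simply return the relative path `Default/chrome_profile`.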

SEO

seo

Capture SEO-related metadata such as meta tags and Open Graph fields.

#38 Snapshot

Dependencies & Outputs

Required Plugins

chrome

Required Binaries

chromium providers=env,puppeteer

Output Mimetypes

application/json

Run It

abx-dl --plugins=seo 'https://example.com'

SEO_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__38_seo.js

order 38 foreground Snapshot

Env Var Config Options

Key: Description	Default	Type	Aliases / Fallback
`CHROME_BINARY`: Path to Chromium, Chrome for Testing, or Chrome Canary binary	"chromium"	`string`	CHROMIUM_BINARY
`SEO_ENABLED`: Enable SEO metadata capture	true	`boolean`	SAVE_SEO, USE_SEO
`SEO_TIMEOUT`: Timeout for SEO capture in seconds	30	`integer` min 5	fallback: `TIMEOUT`

Accessibility

accessibility

Capture the browser accessibility tree for the archived page.

#39 Snapshot

Dependencies & Outputs

Required Plugins

chrome

Required Binaries

chromium providers=env,puppeteer

Output Mimetypes

application/json

Run It

abx-dl --plugins=accessibility 'https://example.com'

ACCESSIBILITY_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__39_accessibility.js

order 39 foreground Snapshot

Env Var Config Options

Key: Description	Default	Type	Aliases / Fallback
`CHROME_BINARY`: Path to Chromium, Chrome for Testing, or Chrome Canary binary	"chromium"	`string`	CHROMIUM_BINARY
`ACCESSIBILITY_ENABLED`: Enable accessibility tree capture	true	`boolean`	SAVE_ACCESSIBILITY, USE_ACCESSIBILITY
`ACCESSIBILITY_TIMEOUT`: Timeout for accessibility capture in seconds	30	`integer` min 5	fallback: `TIMEOUT`

Infinite Scroll

infiniscroll

Expand infinite-scroll pages and load additional content before downstream capture plugins run.

#45 Snapshot

Dependencies & Outputs

Required Plugins

chrome

Required Binaries

chromium providers=env,puppeteer

Run It

abx-dl --plugins=infiniscroll 'https://example.com'

INFINISCROLL_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__45_infiniscroll.js

order 45 foreground Snapshot

Env Var Config Options

Key: Description	Default	Type	Aliases / Fallback
`CHROME_BINARY`: Path to Chromium, Chrome for Testing, or Chrome Canary binary	"chromium"	`string`	CHROMIUM_BINARY
`INFINISCROLL_ENABLED`: Enable infinite scroll page expansion	true	`boolean`	SAVE_INFINISCROLL, USE_INFINISCROLL
`INFINISCROLL_TIMEOUT`: Maximum timeout for scrolling in seconds	120	`integer` min 10	—
`INFINISCROLL_SCROLL_DELAY`: Delay between scrolls in milliseconds	2000	`integer` min 500	—
`INFINISCROLL_SCROLL_DISTANCE`: Distance to scroll per step in pixels	1600	`integer` min 100	—
`INFINISCROLL_SCROLL_LIMIT`: Maximum number of scroll steps	10	`integer` min 1	—
`INFINISCROLL_MIN_HEIGHT`: Minimum page height to scroll to in pixels	16000	`integer` min 1000	—
`INFINISCROLL_EXPAND_DETAILS`: Expand <details> elements and click 'load more' buttons for comments	true	`boolean`	—
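
The scroll settings interact arithmetically: with the defaults, 10 steps of 1600 pixels cover 16000 pixels, the same figure as `INFINISCROLL_MIN_HEIGHT`. The sketch below computes that budget. How the plugin actually combines the step limit, the minimum height, and the timeout is not described here, so `scroll_plan` is a hypothetical helper that only illustrates the numbers.

```python
from typing import Tuple


def scroll_plan(scroll_limit: int = 10, scroll_distance: int = 1600,
                scroll_delay_ms: int = 2000, timeout_s: int = 120) -> Tuple[int, float]:
    """Return (max pixels scrolled, worst-case seconds spent waiting between scrolls)."""
    max_pixels = scroll_limit * scroll_distance
    wait_seconds = scroll_limit * scroll_delay_ms / 1000
    # Assumption: the overall timeout caps the total time spent scrolling.
    return max_pixels, min(wait_seconds, timeout_s)


print(scroll_plan())  # (16000, 20.0)
```

Under these defaults the delays add up to 20 seconds, well inside the 120-second `INFINISCROLL_TIMEOUT`.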

Claude Chrome

claudechrome

Use Claude computer-use to interact with pages in Chrome via CDP screenshots and the Anthropic API.

#47 Snapshot Embed Fullscreen

Dependencies & Outputs

Required Plugins

chrome

Required Binaries

chromium providers=env,puppeteer node providers=env,apt,brew claudechrome providers=chromewebstore

Output Mimetypes

application/json image/png

Run It

abx-dl --plugins=claudechrome 'https://example.com'

CLAUDECHROME_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_CrawlSetup__96_claudechrome_config.js

order 96 foreground CrawlSetup

on_Snapshot__47_claudechrome.js

order 47 foreground Snapshot

Env Var Config Options

Key: Description	Default	Type	Aliases / Fallback
`CHROME_BINARY`: Path to Chromium, Chrome for Testing, or Chrome Canary binary	"chromium"	`string`	CHROMIUM_BINARY
`NODE_BINARY`: Path to Node.js binary	"node"	`string`	—
`PERSONAS_DIR`: Shared Chrome/browser personas root	""	`string`	—
`ACTIVE_PERSONA`: Active browser persona name	"Default"	`string`	—
`CLAUDECHROME_ENABLED`: Enable Claude for Chrome browser extension for AI-driven page interaction	false	`boolean`	USE_CLAUDECHROME
`CLAUDECHROME_PROMPT`: Prompt for Claude to execute on the page. Claude can click buttons, fill forms, download files, and interact with any page element.	"Look at the current page. If there are any "expand", "show more", "load more", or similar buttons/links, click them all to reveal hidden content. Report what you did."	`string`	—
`CLAUDECHROME_TIMEOUT`: Timeout for Claude for Chrome operations in seconds	120	`integer` min 10	fallback: `TIMEOUT`
`CLAUDECHROME_MODEL`: Claude model to use (e.g. claude-sonnet-4-6, claude-opus-4-6, claude-haiku-4-5-20251001). Availability depends on your plan.	"claude-sonnet-4-6"	`string`	—
`CLAUDECHROME_MAX_ACTIONS`: Maximum number of agentic loop iterations (screenshots + actions) per page	15	`integer` min 1	—
`ANTHROPIC_API_KEY`: Anthropic API key for Claude for Chrome authentication	""	`string`	—

SingleFile

singlefile

Save a complete page as a single self-contained HTML file using the SingleFile extension or CLI.

#50 Snapshot Embed

Dependencies & Outputs

Required Plugins

chrome

Required Binaries

chromium providers=env,puppeteer single-file providers=env,npm singlefile providers=chromewebstore

Output Mimetypes

text/html

Run It

abx-dl --plugins=singlefile 'https://example.com'

SINGLEFILE_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__50_singlefile.py

order 50 foreground Snapshot

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`CHROME_BINARY`Path to Chromium, Chrome for Testing, or Chrome Canary binary	"chromium"	`string`	CHROMIUM_BINARY
`PERSONAS_DIR`Shared Chrome/browser personas root	""	`string`	—
`ACTIVE_PERSONA`Active browser persona name	"Default"	`string`	—
`SINGLEFILE_ENABLED`Enable SingleFile archiving	true	`boolean`	SAVE_SINGLEFILE, USE_SINGLEFILE
`SINGLEFILE_BINARY`Path to single-file binary	"single-file"	`string`	SINGLE_FILE_BINARY
`NODE_BINARY`Path to Node.js binary	"node"	`string`	—
`SINGLEFILE_TIMEOUT`Timeout for SingleFile in seconds	60	`integer` min 10	fallback: `TIMEOUT`
`SINGLEFILE_USER_AGENT`User agent string	""	`string`	fallback: `USER_AGENT`
`SINGLEFILE_COOKIES_FILE`Path to cookies file	""	`string`	fallback: `COOKIES_FILE`
`SINGLEFILE_CHECK_SSL_VALIDITY`Whether to verify SSL certificates	true	`boolean`	fallback: `CHECK_SSL_VALIDITY`
`SINGLEFILE_CHROME_ARGS`Chrome command-line arguments for SingleFile	[]	`array`	fallback: `CHROME_ARGS`
`SINGLEFILE_ARGS`Default single-file arguments	[ "--browser-headless" ]	`array`	SINGLEFILE_DEFAULT_ARGS
`SINGLEFILE_ARGS_EXTRA`Extra arguments to append to single-file command	[]	`array`	SINGLEFILE_EXTRA_ARGS
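
The Aliases / Fallback column implies a lookup order for each option. The following is a minimal sketch of one plausible order, assuming the primary key wins over its aliases and the shared fallback is consulted last before the documented default. That precedence is an assumption rather than something the table states, and `resolve_option` is a hypothetical helper, not an ArchiveBox function.

```python
from typing import Mapping, Optional, Sequence


def resolve_option(
    env: Mapping[str, str],
    key: str,
    aliases: Sequence[str] = (),
    fallback: Optional[str] = None,
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the first value found for key, its aliases, then the fallback.

    Assumed precedence (not confirmed by the gallery):
    key > aliases > fallback > default.
    """
    for name in (key, *aliases):
        if name in env:
            return env[name]
    if fallback is not None and fallback in env:
        return env[fallback]
    return default


# SINGLEFILE_TIMEOUT is unset here, so the shared TIMEOUT fallback is used.
env = {"TIMEOUT": "90", "SINGLE_FILE_BINARY": "/usr/local/bin/single-file"}
timeout = resolve_option(env, "SINGLEFILE_TIMEOUT", fallback="TIMEOUT", default="60")
binary = resolve_option(
    env, "SINGLEFILE_BINARY", aliases=["SINGLE_FILE_BINARY"], default="single-file"
)
print(timeout, binary)
```

Values are kept as strings in this sketch; converting them to the type listed in the Type column would be a separate step.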

Screenshot

screenshot

Capture a PNG screenshot of the rendered page.

#51 Snapshot Embed Fullscreen

chromium (env,puppeteer)

image/png

Source on GitHub

Dependencies & Outputs

Required Plugins

chrome

Required Binaries

chromium providers=env,puppeteer

Output Mimetypes

image/png

Run It

abx-dl --plugins=screenshot 'https://example.com'

SCREENSHOT_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__51_screenshot.js

order 51 foreground Snapshot

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`CHROME_BINARY`Path to Chromium, Chrome for Testing, or Chrome Canary binary	"chromium"	`string`	CHROMIUM_BINARY
`SCREENSHOT_ENABLED`Enable screenshot capture	true	`boolean`	SAVE_SCREENSHOT, USE_SCREENSHOT
`SCREENSHOT_TIMEOUT`Timeout for screenshot capture in seconds	60	`integer` min 5	fallback: `TIMEOUT`
`SCREENSHOT_RESOLUTION`Screenshot resolution (width,height)	"1440,2000"	`string` pattern `^\d+,\d+$`	fallback: `RESOLUTION`
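
The pattern on `SCREENSHOT_RESOLUTION` describes a "width,height" pair of unsigned integers. As a small sketch of how such a value could be validated and split, the helper below reuses the pattern from the table; `parse_resolution` is a hypothetical name and the plugin's own parsing may differ.

```python
import re
from typing import Tuple

# Pattern copied from the SCREENSHOT_RESOLUTION row above.
RESOLUTION_PATTERN = re.compile(r"^\d+,\d+$")


def parse_resolution(value: str) -> Tuple[int, int]:
    """Split a "width,height" string into integers, rejecting other shapes."""
    if RESOLUTION_PATTERN.fullmatch(value) is None:
        raise ValueError(f"expected 'width,height', got {value!r}")
    width, height = value.split(",")
    return int(width), int(height)


print(parse_resolution("1440,2000"))
```

A value such as "1440x2000" or "1440, 2000" does not match the pattern and raises `ValueError` in this sketch.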

PDF

pdf

Render the current page to PDF using the shared Chrome session.

#52 Snapshot Embed Fullscreen

chromium (env,puppeteer)

application/pdf

Source on GitHub

Dependencies & Outputs

Required Plugins

chrome

Required Binaries

chromium providers=env,puppeteer

Output Mimetypes

application/pdf

Run It

abx-dl --plugins=pdf 'https://example.com'

PDF_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__52_pdf.js

order 52 foreground Snapshot

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`CHROME_BINARY`Path to Chromium, Chrome for Testing, or Chrome Canary binary	"chromium"	`string`	CHROMIUM_BINARY
`PDF_ENABLED`Enable PDF generation	true	`boolean`	SAVE_PDF, USE_PDF
`PDF_TIMEOUT`Timeout for PDF generation in seconds	60	`integer` min 5	fallback: `TIMEOUT`
`PDF_RESOLUTION`PDF page resolution (width,height)	"1440,2000"	`string` pattern `^\d+,\d+$`	fallback: `RESOLUTION`

DOM

dom

Save the fully rendered DOM HTML from the live page.

#53 Snapshot Embed

chromium (env,puppeteer)

text/html

Source on GitHub

Dependencies & Outputs

Required Plugins

chrome

Required Binaries

chromium providers=env,puppeteer

Output Mimetypes

text/html

Run It

abx-dl --plugins=dom 'https://example.com'

DOM_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__53_dom.js

order 53 foreground Snapshot

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`CHROME_BINARY`Path to Chromium, Chrome for Testing, or Chrome Canary binary	"chromium"	`string`	CHROMIUM_BINARY
`DOM_ENABLED`Enable DOM capture	true	`boolean`	SAVE_DOM, USE_DOM
`DOM_TIMEOUT`Timeout for DOM capture in seconds	60	`integer` min 5	fallback: `TIMEOUT`

Title

title

Capture the final document title from the rendered page.

#54 Snapshot

chromium (env,puppeteer)

text/plain

Source on GitHub

Dependencies & Outputs

Required Plugins

chrome

Required Binaries

chromium providers=env,puppeteer

Output Mimetypes

text/plain

Run It

abx-dl --plugins=title 'https://example.com'

TITLE_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__54_title.js

order 54 foreground Snapshot

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`CHROME_BINARY`Path to Chromium, Chrome for Testing, or Chrome Canary binary	"chromium"	`string`	CHROMIUM_BINARY
`TITLE_ENABLED`Enable title extraction	true	`boolean`	SAVE_TITLE, USE_TITLE
`TITLE_TIMEOUT`Timeout for title extraction in seconds	30	`integer` min 5	fallback: `TIMEOUT`

Readability

readability

Extract article HTML, text, and metadata using Mozilla Readability.

#56 Snapshot Embed Fullscreen

readability-extractor (env,npm)

text/html text/plain application/json

Source on GitHub

Dependencies & Outputs

Required Binaries

readability-extractor providers=env,npm

Output Mimetypes

text/html text/plain application/json

Run It

abx-dl --plugins=readability 'https://example.com'

READABILITY_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__56_readability.py

order 56 foreground Snapshot

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`READABILITY_ENABLED`Enable Readability text extraction	true	`boolean`	SAVE_READABILITY, USE_READABILITY
`READABILITY_BINARY`Path to readability-extractor binary	"readability-extractor"	`string`	—
`READABILITY_TIMEOUT`Timeout for Readability in seconds	30	`integer` min 5	fallback: `TIMEOUT`
`READABILITY_ARGS`Default Readability arguments	[]	`array`	READABILITY_DEFAULT_ARGS
`READABILITY_ARGS_EXTRA`Extra arguments to append to Readability command	[]	`array`	READABILITY_EXTRA_ARGS

Defuddle

defuddle

Extract cleaned article HTML, text, and metadata from archived HTML using Defuddle.

#57 Snapshot

defuddle (env,npm)

text/html text/plain application/json

Source on GitHub

Dependencies & Outputs

Required Binaries

defuddle providers=env,npm

Output Mimetypes

text/html text/plain application/json

Run It

abx-dl --plugins=defuddle 'https://example.com'

DEFUDDLE_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__57_defuddle.py

order 57 foreground Snapshot

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`DEFUDDLE_ENABLED`Enable Defuddle text extraction	true	`boolean`	SAVE_DEFUDDLE, USE_DEFUDDLE
`DEFUDDLE_BINARY`Path to defuddle binary	"defuddle"	`string`	—
`DEFUDDLE_TIMEOUT`Timeout for Defuddle in seconds	30	`integer` min 5	fallback: `TIMEOUT`
`DEFUDDLE_ARGS`Default Defuddle arguments	[]	`array`	DEFUDDLE_DEFAULT_ARGS
`DEFUDDLE_ARGS_EXTRA`Extra arguments to append to Defuddle command	[]	`array`	DEFUDDLE_EXTRA_ARGS

Mercury

mercury

Extract article HTML, text, and metadata using the Postlight Mercury parser.

#57 Snapshot Embed

postlight-parser (npm,env)

text/html text/plain application/json

Source on GitHub

Dependencies & Outputs

Required Binaries

postlight-parser providers=npm,env

Output Mimetypes

text/html text/plain application/json

Run It

abx-dl --plugins=mercury 'https://example.com'

MERCURY_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__57_mercury.py

order 57 foreground Snapshot

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`MERCURY_ENABLED`Enable Mercury text extraction	true	`boolean`	SAVE_MERCURY, USE_MERCURY
`MERCURY_BINARY`Path to Mercury/Postlight parser binary	"postlight-parser"	`string`	—
`MERCURY_TIMEOUT`Timeout for Mercury in seconds	30	`integer` min 5	fallback: `TIMEOUT`
`MERCURY_ARGS`Default Mercury parser arguments	[]	`array`	MERCURY_DEFAULT_ARGS
`MERCURY_ARGS_EXTRA`Extra arguments to append to Mercury parser command	[]	`array`	MERCURY_EXTRA_ARGS

Claude Code Extract

claudecodeextract

Use Claude Code to generate clean Markdown from snapshot extractor outputs.

#58 Snapshot Embed Fullscreen

claude (env,npm)

text/markdown

Source on GitHub

Dependencies & Outputs

Required Plugins

claudecode

Required Binaries

claude providers=env,npm

Output Mimetypes

text/markdown

Run It

abx-dl --plugins=claudecodeextract 'https://example.com'

CLAUDECODEEXTRACT_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__58_claudecodeextract.py

order 58 foreground Snapshot

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`CLAUDECODE_BINARY`Path to Claude Code CLI binary	"claude"	`string`	—
`CLAUDECODEEXTRACT_ENABLED`Enable Claude Code AI extraction	false	`boolean`	USE_CLAUDECODEEXTRACT
`CLAUDECODEEXTRACT_TIMEOUT`Timeout for Claude Code extraction in seconds	120	`integer` min 10	fallback: `CLAUDECODE_TIMEOUT`
`CLAUDECODEEXTRACT_PROMPT`Custom prompt for Claude Code extraction. Use this to define what Claude should extract or generate from the snapshot.	"Read all the previously extracted outputs in this snapshot directory (readability/, mercury/, defuddle/, htmltotext/, dom/, singlefile/, etc.). Using the best available source, generate a clean, well-formatted Markdown representation of the page content. Save the output as content.md in your output directory."	`string`	—
`CLAUDECODEEXTRACT_MODEL`Claude model to use for extraction (e.g. claude-sonnet-4-6, claude-opus-4-6, claude-haiku-4-5-20251001)	"claude-sonnet-4-6"	`string`	fallback: `CLAUDECODE_MODEL`
`CLAUDECODEEXTRACT_MAX_TURNS`Maximum number of agentic turns for extraction	50	`integer` min 1	fallback: `CLAUDECODE_MAX_TURNS`

HTML to Text

htmltotext

Convert archived HTML from other extractors into plain text for indexing and analysis.

#58 Snapshot

text/plain

Source on GitHub

Dependencies & Outputs

Output Mimetypes

text/plain

Run It

abx-dl --plugins=htmltotext 'https://example.com'

HTMLTOTEXT_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__58_htmltotext.py

order 58 foreground Snapshot

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`HTMLTOTEXT_ENABLED`Enable HTML to text conversion	true	`boolean`	SAVE_HTMLTOTEXT, USE_HTMLTOTEXT
`HTMLTOTEXT_TIMEOUT`Timeout for HTML to text conversion in seconds	30	`integer` min 5	fallback: `TIMEOUT`

Trafilatura

trafilatura

Extract article content from archived HTML into text, markdown, HTML, CSV, JSON, and XML formats.

#59 Snapshot

trafilatura (env,pip)

text/plain text/markdown text/html text/csv +3 more

Source on GitHub

Dependencies & Outputs

Required Binaries

trafilatura providers=env,pip

Output Mimetypes

text/plain text/markdown text/html text/csv application/json application/xml application/tei+xml

Run It

abx-dl --plugins=trafilatura 'https://example.com'

TRAFILATURA_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__59_trafilatura.py

order 59 foreground Snapshot

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`TRAFILATURA_ENABLED`Enable Trafilatura extraction	true	`boolean`	SAVE_TRAFILATURA, USE_TRAFILATURA
`TRAFILATURA_BINARY`Path to trafilatura binary	"trafilatura"	`string`	—
`TRAFILATURA_TIMEOUT`Timeout for Trafilatura in seconds	30	`integer` min 5	fallback: `TIMEOUT`
`TRAFILATURA_OUTPUT_FORMATS`Comma-separated trafilatura output formats to write (txt, markdown, html, csv, json, xml, xmltei)	"txt,markdown,html"	`string`	—
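
`TRAFILATURA_OUTPUT_FORMATS` is a comma-separated list drawn from the seven format names in its description. The sketch below shows one way to turn that string into a validated list. Trimming whitespace, lowercasing, and dropping duplicates are choices made for this example, and `parse_output_formats` is a hypothetical helper.

```python
from typing import List

# Format names listed in the TRAFILATURA_OUTPUT_FORMATS description.
KNOWN_FORMATS = ("txt", "markdown", "html", "csv", "json", "xml", "xmltei")


def parse_output_formats(value: str) -> List[str]:
    """Split a comma-separated list, skip blanks and repeats, reject unknown names."""
    formats: List[str] = []
    for raw in value.split(","):
        name = raw.strip().lower()
        if not name or name in formats:
            continue
        if name not in KNOWN_FORMATS:
            raise ValueError(f"unsupported trafilatura format: {name!r}")
        formats.append(name)
    return formats


print(parse_output_formats("txt,markdown,html"))
```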

OpenDataLoader

opendataloader

Extract structured text, tables, and metadata from PDFs using opendataloader-pdf. Supports OCR for scanned PDFs via the hybrid backend.

#60 Snapshot

opendataloader-pdf (env,pip) java>=11.0.0 (env,apt,brew)

text/plain text/markdown application/json

Source on GitHub

Dependencies & Outputs

Required Binaries

opendataloader-pdf providers=env,pip java providers=env,apt,brew min_version=11.0.0

Output Mimetypes

text/plain text/markdown application/json

Run It

abx-dl --plugins=opendataloader 'https://example.com'

OPENDATALOADER_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__60_opendataloader.py

order 60 foreground Snapshot

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`OPENDATALOADER_ENABLED`Enable PDF text extraction with opendataloader-pdf	true	`boolean`	SAVE_OPENDATALOADER, USE_OPENDATALOADER
`OPENDATALOADER_BINARY`Path to opendataloader-pdf binary	"opendataloader-pdf"	`string`	—
`OPENDATALOADER_JAVA_BINARY`Path to the Java runtime used by opendataloader-pdf	"java"	`string`	fallback: `JAVA_BINARY`
`OPENDATALOADER_TIMEOUT`Timeout for PDF extraction in seconds	120	`integer` min 10	fallback: `TIMEOUT`
`OPENDATALOADER_FORCE_OCR`Use hybrid OCR backend (--hybrid docling-fast) for scanned/image-based PDFs. Requires a running opendataloader-pdf-hybrid server.	false	`boolean`	—
`OPENDATALOADER_HYBRID_URL`URL of the opendataloader-pdf-hybrid server (e.g. http://localhost:5002). If empty, uses the default built-in URL.	""	`string`	—
`OPENDATALOADER_ARGS`Default opendataloader-pdf arguments	[]	`array`	OPENDATALOADER_DEFAULT_ARGS
`OPENDATALOADER_ARGS_EXTRA`Extra arguments to append to opendataloader-pdf command	[]	`array`	OPENDATALOADER_EXTRA_ARGS

LiteParse

liteparse

Extract text and metadata from PDFs and documents using LiteParse (by LlamaIndex). Supports OCR via Tesseract.js.

#61 Snapshot

lit (env,npm)

text/plain application/json

Source on GitHub

Dependencies & Outputs

Required Binaries

lit providers=env,npm

Output Mimetypes

text/plain application/json

Run It

abx-dl --plugins=liteparse 'https://example.com'

LITEPARSE_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__61_liteparse.py

order 61 foreground Snapshot

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`LITEPARSE_ENABLED`Enable LiteParse document extraction	true	`boolean`	SAVE_LITEPARSE, USE_LITEPARSE
`LITEPARSE_BINARY`Path to lit binary	"lit"	`string`	—
`LITEPARSE_TIMEOUT`Timeout for LiteParse extraction in seconds	120	`integer` min 10	fallback: `TIMEOUT`
`LITEPARSE_ARGS`Default LiteParse arguments	[]	`array`	LITEPARSE_DEFAULT_ARGS
`LITEPARSE_ARGS_EXTRA`Extra arguments to append to LiteParse command	[]	`array`	LITEPARSE_EXTRA_ARGS

papers-dl

papersdl

Fetch downloadable academic papers from paper URLs and DOI targets.

#66 Snapshot Embed Fullscreen

papers-dl (env,pip)

application/pdf

Source on GitHub

Dependencies & Outputs

Required Binaries

papers-dl providers=env,pip

Output Mimetypes

application/pdf

Run It

abx-dl --plugins=papersdl 'https://example.com'

PAPERSDL_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__66_papersdl.finite.bg.py

order 66 background (finite) Snapshot

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`PAPERSDL_ENABLED`Enable paper downloading with papers-dl	true	`boolean`	SAVE_PAPERSDL, USE_PAPERSDL
`PAPERSDL_BINARY`Path to papers-dl binary	"papers-dl"	`string`	—
`PAPERSDL_TIMEOUT`Timeout for paper downloads in seconds	300	`integer` min 30	fallback: `TIMEOUT`
`PAPERSDL_ARGS`Default papers-dl arguments	[ "fetch" ]	`array`	PAPERSDL_DEFAULT_ARGS
`PAPERSDL_ARGS_EXTRA`Extra arguments to append to papers-dl command	[]	`array`	PAPERSDL_EXTRA_ARGS

Parse HTML URLs

parse_html_urls

Parse HTML documents and emit discovered links as JSONL snapshot records.

#70 Snapshot

application/x-ndjson

Source on GitHub

Dependencies & Outputs

Output Mimetypes

application/x-ndjson

Run It

abx-dl --plugins=parse_html_urls 'https://example.com'

PARSE_HTML_URLS_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__70_parse_html_urls.py

order 70 foreground Snapshot

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`PARSE_HTML_URLS_ENABLED`Enable HTML URL parsing	true	`boolean`	USE_PARSE_HTML_URLS
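
To illustrate what "emit discovered links as JSONL" can look like, here is a rough sketch using only the Python standard library. The record fields (`type`, `url`) are hypothetical, as are the names `LinkCollector` and `html_to_jsonl`; the plugin's actual record schema is not shown on this page.

```python
import json
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute http(s) URLs from <a href> attributes."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        url = urljoin(self.base_url, href)
        if url.startswith(("http://", "https://")) and url not in self.urls:
            self.urls.append(url)


def html_to_jsonl(html: str, base_url: str) -> str:
    """Return one JSON object per line; the record fields are hypothetical."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return "\n".join(
        json.dumps({"type": "Snapshot", "url": url}) for url in collector.urls
    )


page = '<a href="/about">About</a> <a href="mailto:hi@example.com">Mail</a>'
print(html_to_jsonl(page, "https://example.com"))
```

Non-HTTP links such as `mailto:` are skipped here because they cannot be archived as pages; that filter is a design choice of the sketch.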

Parse Text URLs

parse_txt_urls

Parse plain text documents and emit discovered URLs as JSONL snapshot records.

#71 Snapshot

application/x-ndjson

Source on GitHub

Dependencies & Outputs

Output Mimetypes

application/x-ndjson

Run It

abx-dl --plugins=parse_txt_urls 'https://example.com'

PARSE_TXT_URLS_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__71_parse_txt_urls.py

order 71 foreground Snapshot

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`PARSE_TXT_URLS_ENABLED`Enable plain text URL parsing	true	`boolean`	USE_PARSE_TXT_URLS

Parse RSS URLs

parse_rss_urls

Parse RSS and Atom feeds and emit discovered entry URLs as JSONL snapshot records.

#72 Snapshot

application/x-ndjson

Source on GitHub

Dependencies & Outputs

Output Mimetypes

application/x-ndjson

Run It

abx-dl --plugins=parse_rss_urls 'https://example.com'

PARSE_RSS_URLS_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__72_parse_rss_urls.py

order 72 foreground Snapshot

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`PARSE_RSS_URLS_ENABLED`Enable RSS/Atom feed URL parsing	true	`boolean`	USE_PARSE_RSS_URLS

Parse Netscape URLs

parse_netscape_urls

Parse Netscape bookmark HTML exports and emit discovered URLs as JSONL snapshot records.

#73 Snapshot

application/x-ndjson

Source on GitHub

Dependencies & Outputs

Output Mimetypes

application/x-ndjson

Run It

abx-dl --plugins=parse_netscape_urls 'https://example.com'

PARSE_NETSCAPE_URLS_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__73_parse_netscape_urls.py

order 73 foreground Snapshot

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`PARSE_NETSCAPE_URLS_ENABLED`Enable Netscape bookmarks HTML URL parsing	true	`boolean`	USE_PARSE_NETSCAPE_URLS

Parse JSONL URLs

parse_jsonl_urls

Parse JSONL bookmark exports and emit discovered URLs as JSONL snapshot records.

#74 Snapshot

application/x-ndjson

Source on GitHub

Dependencies & Outputs

Output Mimetypes

application/x-ndjson

Run It

abx-dl --plugins=parse_jsonl_urls 'https://example.com'

PARSE_JSONL_URLS_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__74_parse_jsonl_urls.py

order 74 foreground Snapshot

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`PARSE_JSONL_URLS_ENABLED`Enable JSON Lines URL parsing	true	`boolean`	USE_PARSE_JSONL_URLS

Parse DOM Outlinks

parse_dom_outlinks

Extract crawlable links from the rendered DOM and emit them as JSONL records.

#75 Snapshot

chromium (env,puppeteer)

application/x-ndjson

Source on GitHub

Dependencies & Outputs

Required Plugins

chrome

Required Binaries

chromium providers=env,puppeteer

Output Mimetypes

application/x-ndjson

Run It

abx-dl --plugins=parse_dom_outlinks 'https://example.com'

PARSE_DOM_OUTLINKS_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__75_parse_dom_outlinks.js

order 75 foreground Snapshot

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`CHROME_BINARY`Path to Chromium, Chrome for Testing, or Chrome Canary binary	"chromium"	`string`	CHROMIUM_BINARY
`PARSE_DOM_OUTLINKS_ENABLED`Enable DOM outlinks parsing from archived pages	true	`boolean`	SAVE_DOM_OUTLINKS, USE_PARSE_DOM_OUTLINKS
`PARSE_DOM_OUTLINKS_TIMEOUT`Timeout for DOM outlinks parsing in seconds	30	`integer` min 5	fallback: `TIMEOUT`

SQLite Search

search_backend_sqlite

Index archived snapshot content into a SQLite FTS database for local search.

#90 Snapshot

application/vnd.sqlite3

Source on GitHub

Dependencies & Outputs

Output Mimetypes

application/vnd.sqlite3

Run It

abx-dl --plugins=search_backend_sqlite 'https://example.com'

archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__90_index_sqlite.py

order 90 foreground Snapshot

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`SEARCH_BACKEND_ENGINE`Selected search backend implementation	"sqlite"	`string`	—
`USE_INDEXING_BACKEND`Enable search indexing for archived snapshots	true	`boolean`	—
`SEARCH_BACKEND_SQLITE_DB`SQLite FTS database filename	"search.sqlite3"	`string`	SQLITEFTS_DB
`SEARCH_BACKEND_SQLITE_SEPARATE_DATABASE`Use separate database file for FTS index	true	`boolean`	FTS_SEPARATE_DATABASE, SQLITEFTS_SEPARATE_DATABASE
`SEARCH_BACKEND_SQLITE_TOKENIZERS`FTS5 tokenizer configuration	"porter unicode61 remove_diacritics 2"	`string`	FTS_TOKENIZERS, SQLITEFTS_TOKENIZERS
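
The default tokenizer string combines Porter stemming with the `unicode61` tokenizer and diacritic removal. The sketch below shows the effect of that configuration on a throwaway in-memory database. The table name `snapshot_fts` and its columns are hypothetical, and it assumes the local SQLite build includes FTS5.

```python
import sqlite3

# Default value of SEARCH_BACKEND_SQLITE_TOKENIZERS from the table above.
TOKENIZERS = "porter unicode61 remove_diacritics 2"

conn = sqlite3.connect(":memory:")  # the plugin's default file is search.sqlite3
# Table and column names are hypothetical; requires SQLite built with FTS5.
conn.execute(
    "CREATE VIRTUAL TABLE snapshot_fts USING fts5("
    f"snapshot_id UNINDEXED, content, tokenize='{TOKENIZERS}')"
)
conn.execute(
    "INSERT INTO snapshot_fts (snapshot_id, content) VALUES (?, ?)",
    ("abc123", "Archived snapshots of a café menu"),
)


def search(query: str) -> list:
    """Return snapshot ids whose content matches the FTS5 query."""
    rows = conn.execute(
        "SELECT snapshot_id FROM snapshot_fts WHERE snapshot_fts MATCH ?", (query,)
    )
    return [row[0] for row in rows]


print(search("snapshot"))  # stemming lets the singular form match "snapshots"
print(search("cafe"))      # diacritics are removed, so this matches "café"
```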

Sonic Search

search_backend_sonic

Index archived snapshot content into a Sonic search backend.

#91 Snapshot

sonic (env,apt,brew,cargo)

Source on GitHub

Dependencies & Outputs

Required Binaries

sonic providers=env,apt,brew,cargo

Run It

abx-dl --plugins=search_backend_sonic 'https://example.com'

archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_CrawlSetup__55_sonic_start.py

order 55 foreground CrawlSetup

on_Snapshot__91_index_sonic.py

order 91 foreground Snapshot
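
The hook filenames in this gallery appear to encode the same facts as the line printed under each one: the event, a numeric order, and optional `.finite` and `.bg` markers for background hooks. The sketch below parses that convention as inferred from the listed names only; the regular expression and `parse_hook` are illustrative and may not cover every rule the runtime applies.

```python
import re
from typing import Dict, Optional

# on_<Event>__<order>_<name>[.finite][.bg].<py|js>, inferred from the names above.
HOOK_NAME = re.compile(
    r"^on_(?P<event>[A-Za-z]+)__(?P<order>\d+)_(?P<name>\w+)"
    r"(?P<flags>(?:\.[a-z]+)*)\.(?P<ext>py|js)$"
)


def parse_hook(filename: str) -> Optional[Dict[str, object]]:
    """Split a hook filename into its parts, or return None if it does not match."""
    match = HOOK_NAME.match(filename)
    if match is None:
        return None
    flags = [flag for flag in match["flags"].split(".") if flag]
    return {
        "event": match["event"],
        "order": int(match["order"]),
        "name": match["name"],
        "background": "bg" in flags,
        "finite": "finite" in flags,
        "extension": match["ext"],
    }


for hook in ("on_CrawlSetup__55_sonic_start.py", "on_Snapshot__66_papersdl.finite.bg.py"):
    print(parse_hook(hook))
```

Sorting parsed hooks by `(event, order)` would reproduce the ordering shown in the gallery, assuming order numbers are compared per event.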

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`SEARCH_BACKEND_ENGINE`Selected search backend implementation	"sonic"	`string`	—
`USE_INDEXING_BACKEND`Enable search indexing for archived snapshots	true	`boolean`	—
`SONIC_BINARY`Path to Sonic server binary	"sonic"	`string`	—
`SONIC_DIR`Directory used to store the Sonic config, logs, and index data	""	`string`	—
`SEARCH_BACKEND_SONIC_HOST_NAME`Sonic server hostname	"127.0.0.1"	`string`	SEARCH_BACKEND_HOST_NAME, SONIC_HOST
`SEARCH_BACKEND_SONIC_PORT`Sonic server port	1491	`integer` min 1	SEARCH_BACKEND_PORT, SONIC_PORT
`SEARCH_BACKEND_SONIC_PASSWORD`Sonic server password	"SecretPassword"	`string`	SEARCH_BACKEND_PASSWORD, SONIC_PASSWORD
`SEARCH_BACKEND_SONIC_COLLECTION`Sonic collection name	"archivebox"	`string`	SONIC_COLLECTION
`SEARCH_BACKEND_SONIC_BUCKET`Sonic bucket name	"snapshots"	`string`	SONIC_BUCKET

Claude Code Cleanup

claudecodecleanup

Use Claude Code to deduplicate and clean up redundant snapshot extractor outputs.

#92 Snapshot Embed Fullscreen

claude (env,npm)

text/plain

Source on GitHub

Dependencies & Outputs

Required Plugins

claudecode

Required Binaries

claude providers=env,npm

Output Mimetypes

text/plain

Run It

abx-dl --plugins=claudecodecleanup 'https://example.com'

CLAUDECODECLEANUP_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__92_claudecodecleanup.py

order 92 foreground Snapshot

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`CLAUDECODE_BINARY`Path to Claude Code CLI binary	"claude"	`string`	—
`CLAUDECODECLEANUP_ENABLED`Enable Claude Code AI cleanup of snapshot files	false	`boolean`	USE_CLAUDECODECLEANUP
`CLAUDECODECLEANUP_TIMEOUT`Timeout for Claude Code cleanup in seconds	180	`integer` min 10	fallback: `CLAUDECODE_TIMEOUT`
`CLAUDECODECLEANUP_PROMPT`Custom prompt for Claude Code cleanup. Defines what Claude should clean up and how to determine which duplicates to keep.	"Analyze all the extractor output directories in this snapshot. Look for duplicate or redundant outputs across plugins (e.g. multiple HTML extractions, multiple text extractions, multiple URL extraction outputs, etc.). For each group of similar outputs, inspect the content and determine which version is the best quality. Delete the inferior/redundant versions, keeping only the best one. Also remove any unnecessary temporary files, empty directories, or incomplete outputs. Write a summary of what you cleaned up to cleanup_report.txt in your output directory."	`string`	—
`CLAUDECODECLEANUP_MODEL`Claude model to use for cleanup (e.g. claude-sonnet-4-6, claude-opus-4-6, claude-haiku-4-5-20251001)	"claude-sonnet-4-6"	`string`	fallback: `CLAUDECODE_MODEL`
`CLAUDECODECLEANUP_MAX_TURNS`Maximum number of agentic turns for cleanup	50	`integer` min 1	fallback: `CLAUDECODE_MAX_TURNS`

Hashes

hashes

Generate a hash manifest for files produced in the snapshot directory.

#93 Snapshot

application/json

Source on GitHub

Dependencies & Outputs

Output Mimetypes

application/json

Run It

abx-dl --plugins=hashes 'https://example.com'

HASHES_ENABLED=true archivebox add 'https://example.com'

Runtime plugins execute while archiving a URL.

Hook Scripts

on_Snapshot__93_hashes.py

order 93 foreground Snapshot

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`HASHES_ENABLED`Enable Merkle tree hash generation	true	`boolean`	SAVE_HASHES, USE_HASHES
`HASHES_TIMEOUT`Timeout for Merkle tree generation in seconds	30	`integer` min 5	fallback: `TIMEOUT`
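
As a rough illustration of a hash manifest, the sketch below hashes every file under a directory and derives one root digest from the sorted entries. This is a flat simplification rather than a full Merkle tree, and the manifest layout, the root-hash rule, and the name `build_manifest` are all assumptions made for the example, not the plugin's actual output format.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict


def build_manifest(snapshot_dir: Path) -> Dict[str, object]:
    """Hash every file under snapshot_dir and derive a single root digest.

    The manifest layout and root-hash rule are illustrative assumptions.
    """
    files: Dict[str, str] = {}
    for path in snapshot_dir.rglob("*"):
        if path.is_file():
            rel = path.relative_to(snapshot_dir).as_posix()
            files[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    # Root digest: SHA-256 over "path:digest" lines, ordered by relative path.
    lines = "\n".join(f"{name}:{files[name]}" for name in sorted(files))
    root = hashlib.sha256(lines.encode("utf-8")).hexdigest()
    return {"root": root, "files": {name: files[name] for name in sorted(files)}}


if __name__ == "__main__":
    # Prints a JSON manifest for the current directory.
    print(json.dumps(build_manifest(Path(".")), indent=2))
```

Because the root digest depends on every path and file digest, changing, adding, or removing any file changes the root, which is the property a manifest like this is meant to provide.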

APT

apt

Install binaries through the Debian and Ubuntu APT package manager.

Source on GitHub

Run It

abx-dl plugins --install apt

archivebox init --setup

Setup plugins install dependencies or prepare shared runtime state.

Hook Scripts

on_BinaryRequest__13_apt.py

order 13 foreground BinaryRequest

Env Var Config Options

This plugin does not define a config.json schema.

Base

base

Provide shared utilities, helpers, and test support used by other plugins.

Source on GitHub

Run It

abx-dl plugins base

archivebox add 'https://example.com'

Utility plugins are typically consumed indirectly, so the example shows the closest inspection workflow.

Hook Scripts

No hook scripts are defined in this plugin directory.

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`DATA_DIR`Base data directory for the current runtime	""	`string`	—
`ABX_RUNTIME`Current host runtime name, e.g. abx-dl or archivebox	"abx-dl"	`string`	—
`ABX_INSTALL_CACHE`Runtime-derived install preflight cache keyed by binary name	{}	`object`	—
`SNAP_DIR`Base snapshot directory for per-snapshot hook output	""	`string`	—
`CRAWL_DIR`Base crawl directory for per-crawl hook output	""	`string`	—
`LIB_DIR`Shared tools and binary installation root	""	`string`	—
`PERSONAS_DIR`Shared Chrome/browser personas root	""	`string`	—
`ACTIVE_PERSONA`Active browser persona name	"Default"	`string`	—
`EXTRA_CONTEXT`JSON object merged into emitted JSONL event records	""	`string`	—
`TIMEOUT`Default timeout in seconds for hooks that support a TIMEOUT fallback	60	`integer` min 0	—
`USER_AGENT`Default user agent string for HTTP requests and browser automation	"Mozilla/5.0 (compatible; ArchiveBox/1.0)"	`string`	—
`PATH`Executable search path	""	`string`	—
`NODE_MODULES_DIR`Shared Node.js module resolution root	""	`string`	—
`NODE_MODULE_DIR`Legacy alias for NODE_MODULES_DIR	""	`string`	—
`NODE_PATH`Node.js module lookup path	""	`string`	—
`NODE_V8_COVERAGE`Optional V8 coverage output directory for Node.js hooks	""	`string`	—
`CHROME_BINARY`Resolved Chromium/Chrome binary path shared across plugins	""	`string`	—
`CHROME_USER_DATA_DIR`Chrome user data directory for persistent browser state (defaults to PERSONAS_DIR/ACTIVE_PERSONA/chrome_profile)	""	`string`	—
`CHROME_DOWNLOADS_DIR`Chrome downloads directory shared by browser plugins	""	`string`	—
`CHROME_EXTENSIONS_DIR`Chrome extensions directory shared by browser plugins	""	`string`	—
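
`EXTRA_CONTEXT` is described as a JSON object merged into emitted JSONL event records. The sketch below shows one way such a merge could work. It assumes that fields already on the record win over `EXTRA_CONTEXT` keys on conflict, which the table does not specify, and `emit_record` is a hypothetical helper.

```python
import json
import os
from typing import Dict, Mapping


def emit_record(record: Dict[str, object], environ: Mapping[str, str] = os.environ) -> str:
    """Serialize one JSONL line, merging EXTRA_CONTEXT when it holds a JSON object.

    Assumption: record fields win over EXTRA_CONTEXT keys on conflict.
    """
    raw = environ.get("EXTRA_CONTEXT", "")
    extra = json.loads(raw) if raw else {}
    if not isinstance(extra, dict):
        raise ValueError("EXTRA_CONTEXT must be a JSON object")
    return json.dumps({**extra, **record})


line = emit_record(
    {"type": "Snapshot", "url": "https://example.com"},
    {"EXTRA_CONTEXT": '{"crawl_id": "c1"}'},
)
print(line)
```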

Bash

bash

Install binaries using arbitrary bash shell commands.

Source on GitHub

Run It

abx-dl plugins --install bash

archivebox init --setup

Setup plugins install dependencies or prepare shared runtime state.

Hook Scripts

on_BinaryRequest__14_bash.py

order 14 foreground BinaryRequest

Env Var Config Options

This plugin does not define a config.json schema.

Homebrew

brew

Install binaries through the Homebrew package manager.

Source on GitHub

Run It

abx-dl plugins --install brew

archivebox init --setup

Setup plugins install dependencies or prepare shared runtime state.

Hook Scripts

on_BinaryRequest__12_brew.py

order 12 foreground BinaryRequest

Env Var Config Options

This plugin does not define a config.json schema.

Cargo

cargo

Install binaries through Rust's Cargo package manager.

Source on GitHub

Run It

abx-dl plugins --install cargo

archivebox init --setup

Setup plugins install dependencies or prepare shared runtime state.

Hook Scripts

on_BinaryRequest__12_cargo.py

order 12 foreground BinaryRequest

Env Var Config Options

This plugin does not define a config.json schema.

Chrome Web Store Provider

chromewebstore

Resolve Chrome Web Store extensions as installable binary-like artifacts.

node (env,apt,brew)

Source on GitHub

Dependencies & Outputs

Required Plugins

chrome

Required Binaries

node providers=env,apt,brew

Run It

abx-dl plugins --install chromewebstore

archivebox init --setup

Setup plugins install dependencies or prepare shared runtime state.

Hook Scripts

on_BinaryRequest__90_chromewebstore.py

order 90 foreground BinaryRequest

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`NODE_BINARY`Path to Node.js binary	"node"	`string`	—
`PERSONAS_DIR`Shared Chrome/browser personas root	""	`string`	—
`ACTIVE_PERSONA`Active browser persona name	"Default"	`string`	—
`CHROME_EXTENSIONS_DIR`Path to installed Chrome extensions directory	""	`string`	—

Claude Code

claudecode

Run the Claude Code AI agent on snapshots to extract, analyze, or transform archived content.

Embed Fullscreen

node (env,apt,brew) claude (env,npm)

application/json

Source on GitHub

Dependencies & Outputs

Required Binaries

node providers=env,apt,brew claude providers=env,npm

Output Mimetypes

application/json

Run It

abx-dl plugins claudecode

archivebox add 'https://example.com'

Utility plugins are typically consumed indirectly, so the example shows the closest inspection workflow.

Hook Scripts

No hook scripts are defined in this plugin directory.

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`NODE_BINARY`Path to Node.js binary	"node"	`string`	—
`CLAUDECODE_ENABLED`Enable Claude Code AI agent integration. Controls whether the Claude CLI dependency is resolved for this plugin; child plugins still need the claudecode plugin enabled and a working Claude binary.	false	`boolean`	USE_CLAUDECODE
`CLAUDECODE_BINARY`Path to Claude Code CLI binary	"claude"	`string`	—
`CLAUDECODE_TIMEOUT`Timeout for Claude Code operations in seconds	120	`integer` min 10	fallback: `TIMEOUT`
`ANTHROPIC_API_KEY`Anthropic API key for Claude Code authentication	""	`string`	—
`CLAUDECODE_MODEL`Claude model to use (e.g. claude-sonnet-4-6, claude-opus-4-6, claude-haiku-4-5-20251001)	"claude-sonnet-4-6"	`string`	—
`CLAUDECODE_MAX_TURNS`Maximum number of agentic turns per invocation	50	`integer` min 1	—

Environment

env

Discover binaries that are already available on the system PATH.

Source on GitHub

Run It

abx-dl plugins --install env

archivebox init --setup

Setup plugins install dependencies or prepare shared runtime state.

Hook Scripts

on_BinaryRequest__00_env.py

order 00 foreground BinaryRequest

Env Var Config Options

This plugin does not define a config.json schema.

Media

media

Provide a shared namespace for media-related plugin outputs and helpers.

Source on GitHub

Run It

abx-dl plugins media

archivebox add 'https://example.com'

Utility plugins are typically consumed indirectly, so the example shows the closest inspection workflow.

Hook Scripts

No hook scripts are defined in this plugin directory.

Env Var Config Options

This plugin does not define a config.json schema.

npm

npm

Install binaries from npm packages and expose Node module paths.

node (env,apt,brew)

Source on GitHub

Dependencies & Outputs

Required Binaries

node providers=env,apt,brew

Run It

abx-dl plugins --install npm

archivebox init --setup

Setup plugins install dependencies or prepare shared runtime state.

Hook Scripts

on_BinaryRequest__10_npm.py

order 10 foreground BinaryRequest

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`NODE_BINARY` Path to Node.js binary	"node"	`string`	—
`NPM_BINARY` Path to npm binary	"npm"	`string`	—

pip

pip

Install Python-based binaries into a managed virtual environment.

python (env)

Source on GitHub

Dependencies & Outputs

Required Binaries

python providers=env

Run It

abx-dl plugins --install pip

archivebox init --setup

Setup plugins install dependencies or prepare shared runtime state.

Hook Scripts

on_BinaryRequest__11_pip.py

order 11 foreground BinaryRequest

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`PIP_VENV_PYTHON` Preferred Python interpreter for creating the shared pip virtualenv	""	`string`	—
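
To make the role of `PIP_VENV_PYTHON` concrete, here is a minimal sketch of how a preferred interpreter could be used to build a `python -m venv` command. Treating the empty default as "use the current interpreter" is an assumption, and `venv_command` and the example directory are hypothetical.

```python
import sys
from pathlib import Path
from typing import List, Mapping


def venv_command(venv_dir: Path, env: Mapping[str, str]) -> List[str]:
    """Build the command that would create a virtualenv in venv_dir."""
    # Assumption: an empty or missing PIP_VENV_PYTHON means "use the running interpreter".
    python = env.get("PIP_VENV_PYTHON") or sys.executable
    return [python, "-m", "venv", str(venv_dir)]


# Hypothetical interpreter path and target directory, for illustration only.
print(venv_command(Path("pip-venv"), {"PIP_VENV_PYTHON": "/usr/bin/python3.12"}))
```

The resulting list can be passed to `subprocess.run(..., check=True)` to actually create the environment.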

Puppeteer

puppeteer

Install and manage Chrome for Testing through the Puppeteer toolchain.

puppeteer (npm)

Source on GitHub

Dependencies & Outputs

Required Binaries

puppeteer providers=npm

Run It

abx-dl plugins --install puppeteer

PUPPETEER_ENABLED=true archivebox init --setup

Setup plugins install dependencies or prepare shared runtime state.

Hook Scripts

on_BinaryRequest__12_puppeteer.py

order 12 foreground BinaryRequest
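
The hook filenames in this gallery appear to encode an event name, a numeric order, and the plugin name. The sketch below parses that pattern and sorts the four `BinaryRequest` hooks listed above. The naming rule is inferred from the gallery listings rather than taken from the ArchiveBox source, and `parse_hook` is a hypothetical helper.

```python
import re
from typing import NamedTuple, Tuple

# Inferred pattern: on_<Event>__<order>_<name>[.<suffix>...].py
HOOK_RE = re.compile(
    r"^on_(?P<event>[A-Za-z]+)__(?P<order>\d+)_(?P<name>[^.]+)(?P<suffixes>(?:\.\w+)*)\.py$"
)


class Hook(NamedTuple):
    event: str
    order: int
    name: str
    suffixes: Tuple[str, ...]


def parse_hook(filename: str) -> Hook:
    """Split a hook filename into event, order, plugin name, and extra suffixes."""
    match = HOOK_RE.match(filename)
    if match is None:
        raise ValueError(f"not a recognised hook filename: {filename}")
    suffixes = tuple(part for part in match["suffixes"].split(".") if part)
    return Hook(match["event"], int(match["order"]), match["name"], suffixes)


filenames = [
    "on_BinaryRequest__12_puppeteer.py",
    "on_BinaryRequest__10_npm.py",
    "on_BinaryRequest__00_env.py",
    "on_BinaryRequest__11_pip.py",
]
ordered = sorted((parse_hook(f) for f in filenames), key=lambda hook: hook.order)
for hook in ordered:
    print(f"{hook.order:02d} {hook.event} {hook.name}")
```

Sorted this way, the env provider comes first, followed by npm, pip, and puppeteer, which matches the order numbers shown in each plugin's Hook Scripts entry.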

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`CHROME_BINARY` Path to Chromium, Chrome for Testing, or Chrome Canary binary	"chromium"	`string`	CHROMIUM_BINARY
`PUPPETEER_ENABLED` Enable Puppeteer dependency installation during crawl setup	true	`boolean`	—
`PUPPETEER_TIMEOUT` Timeout in seconds for Puppeteer-managed browser dependency installation	900	`integer` min 60	—

ripgrep Search

search_backend_ripgrep

Search archived snapshot files directly with ripgrep instead of maintaining an index.

rg (env,apt,brew)

Source on GitHub

Dependencies & Outputs

Required Binaries

rg providers=env,apt,brew

Run It

abx-dl plugins search_backend_ripgrep

archivebox add 'https://example.com'

Utility plugins are typically consumed indirectly, so the example shows the closest inspection workflow.

Hook Scripts

No hook scripts are defined in this plugin directory.

Env Var Config Options

Key	Default	Type	Aliases / Fallback
`SEARCH_BACKEND_ENGINE` Selected search backend implementation	"ripgrep"	`string`	—
`RIPGREP_BINARY` Path to ripgrep binary	"rg"	`string`	—
`RIPGREP_TIMEOUT` Search timeout in seconds	90	`integer` min 5	SEARCH_BACKEND_TIMEOUT fallback: `TIMEOUT`
`RIPGREP_ARGS` Default ripgrep arguments	["--files-with-matches", "--no-messages", "--ignore-case"]	`array`	RIPGREP_DEFAULT_ARGS
`RIPGREP_ARGS_EXTRA` Extra arguments to append to ripgrep command	[]	`array`	RIPGREP_EXTRA_ARGS
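
The settings above can be read as the pieces of a single `rg` invocation: binary, default arguments, extra arguments, then the pattern and the directory to search. The sketch below assembles such a command. It assumes array settings are supplied as JSON-encoded strings, which this page does not confirm, and `ripgrep_command` is a hypothetical helper rather than the plugin's own code.

```python
import json
from typing import List, Mapping

# Defaults copied from the table above.
DEFAULT_ARGS = ["--files-with-matches", "--no-messages", "--ignore-case"]


def ripgrep_command(pattern: str, search_dir: str, env: Mapping[str, str]) -> List[str]:
    """Assemble an rg command line from the documented settings."""
    binary = env.get("RIPGREP_BINARY", "rg")
    # Assumption: array-typed settings arrive as JSON strings such as '["--hidden"]'.
    args = json.loads(env["RIPGREP_ARGS"]) if "RIPGREP_ARGS" in env else list(DEFAULT_ARGS)
    extra = json.loads(env.get("RIPGREP_ARGS_EXTRA", "[]"))
    # -e marks the next argument as the pattern, so a leading dash is not read as a flag.
    return [binary, *args, *extra, "-e", pattern, search_dir]


print(ripgrep_command("example.com", "archive", {"RIPGREP_ARGS_EXTRA": '["--hidden"]'}))
```

The list can be handed to `subprocess.run(cmd, capture_output=True, text=True, timeout=90)`, where 90 mirrors the `RIPGREP_TIMEOUT` default; with `--files-with-matches`, each output line is the path of a file containing the pattern.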

SSL

ssl

Utility plugin namespace reserved for SSL-related integration points and metadata.

Source on GitHub

Run It

abx-dl plugins ssl

archivebox add 'https://example.com'

Utility plugins are typically consumed indirectly, so the example shows the closest inspection workflow.

Hook Scripts

No hook scripts are defined in this plugin directory.

Env Var Config Options

This plugin does not define a config.json schema.
