docs: document non-UTF-8 encoding behavior in JSON output #355

Merged
jiegec merged 1 commit into lsof-org:master from kurok:doc/json-utf8-encoding-caveat
Mar 13, 2026

kurok (Contributor) commented on Mar 12, 2026

Summary

  • Add character encoding caveat to man page (Lsof.8) under -J/-j flags
  • Add -J/-j options with UTF-8 note to docs/options.md
  • Replace "JSON is planned" placeholder in docs/tutorial.md with full JSON section including encoding caveat
  • Add source comment on json_puts_escaped() in src/print.c explaining the trade-off

Motivation

As noted by @stephane-chazelas in #353 and tracked in #354: Unix file names are arbitrary byte sequences and may contain bytes that are not valid UTF-8. The JSON output modes (-J/-j) pass such bytes through unchanged, producing output that is not strictly RFC 8259 conformant but preserves original file names. This is the same approach taken by lsfd(1), ip(8) -j, systemctl --output=json, and other Linux tools.

This behavior is worth documenting so that users know what to expect and how to handle it when strict UTF-8 is required (e.g. with iconv or Python's surrogateescape error handler).
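As an illustration of the surrogateescape approach, here is a minimal Python sketch. The sample record, its `name` key, and the helper names `parse_lenient`, `original_bytes`, and `to_strict_utf8` are hypothetical and chosen for illustration only; they are not taken from lsof's actual JSON schema or from this PR.

```python
import json

# Hypothetical sample: a JSON record whose file name contains a raw 0xE9 byte
# (Latin-1 "e acute"), which is not valid UTF-8 on its own. The "name" key is
# an assumption for illustration, not lsof's documented schema.
raw = b'{"name": "/tmp/caf\xe9.txt"}'


def parse_lenient(data: bytes):
    """Decode with surrogateescape so invalid bytes survive, then parse."""
    text = data.decode("utf-8", errors="surrogateescape")
    return json.loads(text)


def original_bytes(value: str) -> bytes:
    """Recover the exact on-disk bytes from a surrogate-escaped string."""
    return value.encode("utf-8", errors="surrogateescape")


def to_strict_utf8(value: str) -> str:
    """Produce valid Unicode text, replacing undecodable bytes with U+FFFD."""
    return original_bytes(value).decode("utf-8", errors="replace")


record = parse_lenient(raw)
print(original_bytes(record["name"]))   # byte-exact file name, usable with os.open()
print(to_strict_utf8(record["name"]))   # lossy but strictly valid text
```

The lenient path keeps file names byte-exact, so they can still be passed back to file system calls; the strict path is lossy and is only suitable for display or for consumers that reject invalid UTF-8.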

See https://unix.stackexchange.com/questions/757832/ for background.

Closes #354

Unix file names are arbitrary byte sequences and may contain bytes
that are not valid UTF-8.  The JSON output modes (-J/-j) pass such
bytes through unchanged, producing output that is not strictly
RFC 8259 conformant but preserves original file names.  This is
consistent with lsfd(1), ip(8) -j, and other Linux tools.

Document this behavior in the man page (Lsof.8), docs/options.md,
docs/tutorial.md, and add a source comment on json_puts_escaped().

Closes lsof-org#354
jiegec merged commit bf0457d into lsof-org:master on Mar 13, 2026
16 checks passed

Development

Successfully merging this pull request may close these issues.

[DOC], when strings in the output are encoded as JSON strings but is not proper JSON per-RFC ASCII or UTF-8.
