feat: add native JSON output (-J/-j flags)#353

Merged
jiegec merged 8 commits into lsof-org:master from kurok:feature/json-output on Mar 12, 2026

Conversation

kurok (Contributor) commented Mar 12, 2026

Summary

  • Add -J flag for nested JSON output (single JSON object with processes→files hierarchy)
  • Add -j flag for flat JSON Lines output (one denormalized JSON object per open file, one per line)
  • Both formats reuse the existing -F field selection mechanism for choosing which fields to include

Motivation

The legacy -F field output uses single-character field IDs with newline/NUL terminators. While machine-parseable, it requires custom parsers. Native JSON enables direct ingestion into logging pipelines (Datadog, Splunk, Elastic Stack) and seamless parsing by Python, Go, Nushell, and jq.
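
To illustrate the kind of custom parser the legacy format calls for, here is a minimal sketch in C. It assumes newline-terminated -F output in which a `p` line opens a process set and an `f` line opens a file set, and it handles only the `p`, `f`, and `n` fields. The struct and function names are hypothetical and are not part of lsof.

```c
#include <stdlib.h>
#include <string.h>

/* One open file recovered from -F output (hypothetical struct). */
struct ff_record {
    long pid;        /* from the most recent 'p' line */
    char fd[32];     /* from the 'f' line */
    char name[256];  /* from the 'n' line, if any */
};

/* Copy at most dstsz-1 bytes and always NUL-terminate. */
static void copy_field(char *dst, size_t dstsz, const char *src, size_t len)
{
    if (len >= dstsz)
        len = dstsz - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/*
 * Parse newline-terminated -F output in `text`.
 * Fills at most `max` records and returns the number filled.
 */
static size_t parse_field_output(const char *text, struct ff_record *out,
                                 size_t max)
{
    size_t n = 0;
    long cur_pid = -1;
    int have_file = 0;

    while (*text) {
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t)(eol - text) : strlen(text);

        if (len > 0) {
            const char *val = text + 1; /* value follows the 1-char ID */
            size_t vlen = len - 1;

            switch (text[0]) {
            case 'p': /* new process set */
                cur_pid = strtol(val, NULL, 10);
                have_file = 0;
                break;
            case 'f': /* new file set */
                if (n >= max)
                    return n;
                out[n].pid = cur_pid;
                copy_field(out[n].fd, sizeof(out[n].fd), val, vlen);
                out[n].name[0] = '\0';
                n++;
                have_file = 1;
                break;
            case 'n': /* name belongs to the current file set */
                if (have_file)
                    copy_field(out[n - 1].name, sizeof(out[n - 1].name),
                               val, vlen);
                break;
            default: /* every other field ID is ignored in this sketch */
                break;
            }
        }
        text = eol ? eol + 1 : text + len;
    }
    return n;
}
```

Even this reduced version has to track which process and file set each line belongs to, which is the bookkeeping a JSON consumer gets for free.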

Changes

| File | Changes |
| --- | --- |
| src/store.c | Add Fjson, Fjsonl, Fjson_first_proc global flags |
| lib/common.h | Extern declarations for new globals |
| src/main.c | Option parsing for -J/-j, mutual exclusivity validation, default field selection |
| src/print.c | JSON helpers (string escaping, field emitters), json_print_file(), json_print_proc_fields(), JSON branch in print_proc(), envelope functions |
| lib/proto.h | Declarations for json_open_envelope/json_close_envelope |
| lib/dialects/*/Makefile | Add version.h dependency for print.o |
| Lsof.8 | Man page documentation for -J and -j |
| tests/case-30-json-output.bash | 6 test cases for -J nested JSON |
| tests/case-31-jsonl-output.bash | 3 test cases for -j JSON Lines |
| Makefile.am | Register new tests |

Design decisions

  • No external dependencies: hand-written JSON formatting (lsof data is simple enough)
  • Large numeric values as strings: size, offset, inode emitted as JSON strings to avoid IEEE 754 precision loss
  • Numeric FDs as strings: directly addresses issue #311, "[BUG] fd numbers also truncated in -F fd *output for other programs*" (fd truncation in field output)
  • TCP/TPI as nested object: tcp_info: {state, recv_queue, send_queue, ...} in both formats
  • Fields with no value omitted: not set to null
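
The precision concern behind emitting large numbers as strings can be shown with a short sketch. Many JSON consumers store numbers as IEEE 754 doubles, which represent integers exactly only up to 2^53. The helper names below are hypothetical, the key is assumed to need no escaping, and this is not the code from the PR.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Returns 1 if `v` survives a round trip through an IEEE 754 double. */
static int survives_double(uint64_t v)
{
    volatile double d = (double)v; /* volatile: force real double precision */

    /* 2^64 does not fit in uint64_t, so casting it back would be undefined. */
    if (d >= 18446744073709551616.0)
        return 0;
    return (uint64_t)d == v;
}

/*
 * Writes "key":"<decimal value>" into buf, i.e. the number as a JSON string.
 * Returns the snprintf result. The key is assumed to need no escaping.
 */
static int json_u64_as_string(char *buf, size_t bufsz, const char *key,
                              uint64_t v)
{
    return snprintf(buf, bufsz, "\"%s\":\"%" PRIu64 "\"", key, v);
}
```

For example, 9007199254740993 (2^53 + 1) does not survive the round trip through a double, while its string form keeps every digit.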

Test plan

  • -J produces valid JSON (parseable by python3 -m json.tool)
  • -j produces valid JSON Lines (each line parseable)
  • Field selection with -Fpcfn limits output to selected fields only
  • -J and -j are mutually exclusive (error if both specified)
  • -J/-j and -t are mutually exclusive
  • Empty results produce valid JSON ({"lsof_version":"...","processes":[]} for -J, empty for -j)
  • +J/+j produce error messages
  • Clean build from scratch on Darwin

kurok added 7 commits March 12, 2026 12:51
Add Fjson/Fjsonl global flags, option parsing for -J (nested JSON)
and -j (JSON Lines), mutual exclusivity validation, and default
field selection when -F is not explicitly given.
Emits {"lsof_version":"...","processes":[...]} around process
objects. Handles empty results and suppresses -F marker output
in JSON modes during repeat cycles.
Refactors json_open_envelope/json_close_envelope into print.c
which already includes version.h via the correct build ordering.
Updates all dialect Makefiles to add version.h as a dependency
for print.o.
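
The envelope described in these commits can be sketched as follows. The sketch writes into a fixed buffer so it can be checked in isolation, and uses a "first process" flag to decide where commas go. All names are hypothetical, the `pid` key in the usage note is an assumption, and the version string is assumed to need no escaping. The actual json_open_envelope/json_close_envelope in src/print.c may differ.

```c
#include <string.h>

/* Small fixed-size output buffer so the sketch is testable without stdout. */
struct out_buf {
    char data[256];
    size_t len;
};

/* Append s, truncating silently if the buffer is full. */
static void ob_puts(struct out_buf *b, const char *s)
{
    size_t n = strlen(s);

    if (b->len + n >= sizeof(b->data))
        n = sizeof(b->data) - 1 - b->len;
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
}

/* Plays the role of a "first process" flag: no comma before the first item. */
static int first_proc;

static void envelope_open(struct out_buf *b, const char *version)
{
    ob_puts(b, "{\"lsof_version\":\"");
    ob_puts(b, version); /* assumed to contain no characters needing escape */
    ob_puts(b, "\",\"processes\":[");
    first_proc = 1;
}

/* Append one already-serialized process object. */
static void envelope_add_proc(struct out_buf *b, const char *proc_json)
{
    if (!first_proc)
        ob_puts(b, ",");
    first_proc = 0;
    ob_puts(b, proc_json);
}

static void envelope_close(struct out_buf *b)
{
    ob_puts(b, "]}");
}
```

Calling envelope_open() followed directly by envelope_close() yields the empty-result form with an empty processes array, so the output stays valid JSON when nothing matches.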
kurok (Contributor Author) commented Mar 12, 2026

CI Status

All 14 checks pass except the Cirrus CI / FreeBSD 14.3 job, which is a known pre-existing failure unrelated to this PR.

Evidence:

  • The same freebsd_instance:family/freebsd-14-3 check also fails on master and on PR #352.
  • As @jiegec noted on #352 ("Fix truncated fd numbers in -F field output"), this is likely because Cirrus CI places the repo on /tmp, where accurate file location is not available for lsof's test suite.
  • FreeBSD 15.0 passes on this PR. All Linux distros (Alpine, Arch, CentOS 9, Debian 11/12, Fedora 37/38, openSUSE 15, Ubuntu 22.04/24.04, NixOS) and macOS pass.

This PR touches only the CLI output path (src/print.c, src/main.c, src/store.c) — none of the liblsof internals that the FreeBSD test exercises.

src/main.c Outdated
        }
    }
    if (!has_fields) {
        for (i = 0; FieldSel[i].nm; i++) {
jiegec (Member) commented Mar 12, 2026

Lots of code duplication with existing -F handling, can we deduplicate them?

kurok (Contributor Author) commented

will do

kurok (Contributor Author) commented

Done in bb8bab1. Extracted select_default_fields() static helper in main.c — now called from both the case 'F': default path and the -J/-j default field setup. Removed ~40 lines of duplicated #ifdef logic.
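
The shape of that deduplication can be sketched roughly as below: one helper owns the default-selection loop, and both option paths call it. The struct layout, the table contents, and the `opt_in` flag are assumptions made for illustration. Only the `nm` and `st` member names and the loop condition come from the snippet under review, and the real select_default_fields() in main.c may differ.

```c
#include <stddef.h>

/* Hypothetical field table entry; lsof's real FieldSel layout may differ. */
struct field_sel {
    char id;        /* single-character -F field ID */
    const char *nm; /* field name; NULL terminates the table */
    int st;         /* 1 if the field is selected */
    int opt_in;     /* assumption: 1 = never selected by default */
};

/* Illustrative subset of fields, not lsof's full table. */
static struct field_sel fields[] = {
    {'p', "process ID", 0, 0},
    {'c', "command name", 0, 0},
    {'f', "file descriptor", 0, 0},
    {'n', "name", 0, 0},
    {'r', "raw device number", 0, 1},
    {0, NULL, 0, 0},
};

/* Returns 1 if the user already selected at least one field. */
static int any_selected(const struct field_sel *t)
{
    size_t i;

    for (i = 0; t[i].nm; i++)
        if (t[i].st)
            return 1;
    return 0;
}

/* The single place that knows which fields are on by default. */
static void select_default_fields(struct field_sel *t)
{
    size_t i;

    for (i = 0; t[i].nm; i++)
        if (!t[i].opt_in)
            t[i].st = 1;
}

/* Shared entry point for the -F path and the -J/-j path in this sketch. */
static void apply_default_fields(struct field_sel *t)
{
    if (!any_selected(t))
        select_default_fields(t);
}
```

With one helper, a platform-specific exclusion only has to be expressed once instead of in each option branch.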

src/print.c Outdated
static void json_print_str(int *sep, const char *key, const char *val) {
    if (*sep)
        putchar(',');
    printf("\"%s\":\"", key);
jiegec (Member) commented

The key might also require escape?

kurok (Contributor Author) commented

agree

kurok (Contributor Author) commented

Good catch. Fixed in bb8bab1 — extracted json_print_key() helper that escapes the key via json_puts_escaped(). All json_print_* functions now route through it.
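
A minimal sketch of routing both keys and values through one escaping function is shown below. It writes into a buffer rather than stdout so it can be checked, and it escapes only what RFC 8259 requires: the double quote, the backslash, and control characters below 0x20. The `jb_*` names are hypothetical stand-ins, not the json_print_key()/json_puts_escaped() code from the PR.

```c
#include <stdio.h>
#include <string.h>

/* Fixed-size output buffer; `data` must start out as an empty string. */
struct jbuf {
    char data[256];
    size_t len;
};

static void jb_putc(struct jbuf *b, char c)
{
    if (b->len + 1 < sizeof(b->data)) {
        b->data[b->len++] = c;
        b->data[b->len] = '\0';
    }
}

static void jb_puts(struct jbuf *b, const char *s)
{
    while (*s)
        jb_putc(b, *s++);
}

/* Escape the characters RFC 8259 requires: '"', '\\', and bytes < 0x20. */
static void jb_puts_escaped(struct jbuf *b, const char *s)
{
    for (; *s; s++) {
        unsigned char c = (unsigned char)*s;

        switch (c) {
        case '"':  jb_puts(b, "\\\""); break;
        case '\\': jb_puts(b, "\\\\"); break;
        case '\n': jb_puts(b, "\\n");  break;
        case '\r': jb_puts(b, "\\r");  break;
        case '\t': jb_puts(b, "\\t");  break;
        default:
            if (c < 0x20) {
                char tmp[7];

                snprintf(tmp, sizeof(tmp), "\\u%04x", c);
                jb_puts(b, tmp);
            } else {
                jb_putc(b, (char)c); /* bytes >= 0x80 pass through as-is */
            }
        }
    }
}

/* Emit the separator (if needed) and an escaped key followed by ':'. */
static void jb_key(struct jbuf *b, int *sep, const char *key)
{
    if (*sep)
        jb_putc(b, ',');
    *sep = 1;
    jb_putc(b, '"');
    jb_puts_escaped(b, key);
    jb_puts(b, "\":");
}

/* Every string emitter goes through jb_key(), so keys are always escaped. */
static void jb_str(struct jbuf *b, int *sep, const char *key, const char *val)
{
    jb_key(b, sep, key);
    jb_putc(b, '"');
    jb_puts_escaped(b, val);
    jb_putc(b, '"');
}
```

Because the key path is shared, a new emitter (for example one for single characters) inherits the escaping without repeating it.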

src/print.c Outdated
    if (FieldSel[LSOF_FIX_ACCESS].st) {
        char a[2] = {access_to_char(Lf->access), '\0'};
        if (a[0] != ' ')
            json_print_str(sep, "access", a);
jiegec (Member) commented

Maybe a json_print_char makes it simpler?

kurok (Contributor Author) commented

Makes sense.

kurok (Contributor Author) commented

Done in bb8bab1. Added json_print_char() and switched the access and lock fields to use it — avoids the char[2] array construction.

- Extract select_default_fields() to deduplicate field selection logic
  shared between -F and -J/-j default handling (jiegec review)
- Escape JSON keys via json_print_key() helper (jiegec review)
- Add json_print_char() for single-char fields like access and lock,
  avoiding unnecessary char[2] array construction (jiegec review)
jiegec merged commit 054198c into lsof-org:master Mar 12, 2026
15 of 16 checks passed
stephane-chazelas commented

FYI: as currently written, strings in the output are encoded as JSON strings, and since JSON more or less mandates UTF-8, this doesn't work properly for file names encoded in a character encoding other than ASCII or UTF-8.

From testing it looks like it replaces all sequences of 1 or more bytes that can't be decoded into UTF-8 with one character.

There's no good way to address that. That's a shortcoming of the JSON format.

lsfd and many other tools choose to dump those bytes as-is. That means it's not proper JSON per-RFC, but means the information can be extracted reliably provided you have JSON processing utilities that can cope with that.

I'm not saying lsof should or should not do the same, but either way, it would be good to document it.

See https://unix.stackexchange.com/questions/757832/how-to-process-json-with-strings-containing-invalid-utf-8 for more background on that.

stephane-chazelas commented

> From testing it looks like it replaces all sequences of 1 or more bytes that can't be decoded into UTF-8 with one character.
>
> There's no good way to address that. That's a shortcoming of the JSON format.
>
> lsfd and many other tools choose to dump those bytes as-is. That means it's not proper JSON per-RFC, but means the information can be extracted reliably provided you have JSON processing utilities that can cope with that.
> [...]

My bad, I messed up my testing. It looks like lsof behaves like lsfd and many other tools that dump those bytes as-is. The result is invalid JSON, but it still carries the information in a theoretically parseable form.

Still worth documenting.
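
For anyone who wants to detect the affected names, a well-formedness check along these lines could tell whether a byte string is valid UTF-8 and therefore safe to place in a strictly conforming JSON string. This is a sketch of the standard validation rules (RFC 3629), with a hypothetical function name. It is not something lsof is known to do.

```c
#include <stddef.h>

/* Returns 1 if s[0..n) is well-formed UTF-8 per RFC 3629, else 0. */
static int is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n) {
        unsigned char c = s[i];
        size_t need, k;
        unsigned long cp, min;

        if (c < 0x80) { /* ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) { /* 2-byte sequence */
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) { /* 3-byte sequence */
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) { /* 4-byte sequence */
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0; /* stray continuation byte or invalid lead byte */
        }

        if (n - i - 1 < need)
            return 0; /* sequence truncated at end of input */

        for (k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0; /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if (cp < min)
            return 0; /* overlong encoding */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0; /* out of range or surrogate */

        i += need + 1;
    }
    return 1;
}
```

A file name such as "café" encoded in Latin-1 ends in the single byte 0xE9, which fails this check, while the UTF-8 encoding (0xC3 0xA9) passes.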
