Add a new NDJSON / JSONL input source #4721

Merged
bradlarsen merged 12 commits into main from json-enumerator
Feb 17, 2026

@bradlarsen bradlarsen commented Jan 29, 2026

This adds a new input source to TruffleHog, accessible via trufflehog json-enumerator.

This input source requires a list of filenames, each of which is an NDJSON-formatted sequence of objects that take one of two forms:

Form 1: {"data": "utf-8 string", "metadata": <non-null JSON value>}
Form 2: {"data_b64": "base64-encoded bytestring", "metadata": <non-null JSON value>}

The data / data_b64 field specifies the content to be scanned. The metadata field is arbitrary, and is simply propagated downstream with scan results from the corresponding content.

Note that although trufflehog json-enumerator requires a list of filenames, the NDJSON data that you wish to scan need not be written to disk first. On Linux and macOS, at least, you can use shell process substitution to set up a named pipe from a producer process, like trufflehog json-enumerator <(some-program-that-emits-ndjson).
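As an illustration of what a producer on the other side of that pipe might look like, here is a small Go sketch that emits one record per line on stdout. The encodeRecord helper, the metadata shape (a single "name" key), and the rule of falling back to data_b64 for non-UTF-8 content are all assumptions, not part of this PR.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"os"
	"unicode/utf8"
)

// encodeRecord builds one NDJSON line for content. Valid UTF-8 goes in
// "data"; anything else is base64-encoded into "data_b64". The metadata
// object is a hypothetical choice; any non-null JSON value would do.
func encodeRecord(name string, content []byte) ([]byte, error) {
	rec := map[string]any{"metadata": map[string]string{"name": name}}
	if utf8.Valid(content) {
		rec["data"] = string(content)
	} else {
		rec["data_b64"] = base64.StdEncoding.EncodeToString(content)
	}
	line, err := json.Marshal(rec)
	if err != nil {
		return nil, err
	}
	// One object per line: terminate each record with a newline.
	return append(line, '\n'), nil
}

func main() {
	inputs := []struct {
		name    string
		content []byte
	}{
		{"greeting.txt", []byte("hello")},
		{"blob.bin", []byte{0xff, 0xfe, 0x00}},
	}
	for _, in := range inputs {
		line, err := encodeRecord(in.name, in.content)
		if err != nil {
			fmt.Fprintln(os.Stderr, "encode:", err)
			os.Exit(1)
		}
		if _, err := os.Stdout.Write(line); err != nil {
			fmt.Fprintln(os.Stderr, "write:", err)
			os.Exit(1)
		}
	}
}
```

Saved under a hypothetical name such as producer.go, this could stand in for some-program-that-emits-ndjson in the command above.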


Note

Medium Risk
Adds a new ingestion path that parses untrusted NDJSON and threads arbitrary per-record metadata through the scanning pipeline; issues here could lead to scan failures or unexpected memory/throughput behavior on large inputs.

Overview
Adds a new CLI subcommand, trufflehog json-enumerator, that scans one or more NDJSON/JSONL files containing per-line data (UTF-8 string) or data_b64 (bytes) plus required metadata.

Implements a new json_enumerator source and engine entrypoint (ScanJSONEnumeratorInput) that stream-decodes records, scans their payloads via existing file handling, and attaches the record’s metadata to findings via new protobuf types (sourcespb.JSONEnumerator and source_metadatapb.JSONEnumerator). Includes generated proto/validation updates and a focused unit test covering string vs bytes payloads and invalid metadata.

Written by Cursor Bugbot for commit 4beec34.

@bradlarsen bradlarsen requested a review from a team January 29, 2026 19:41
@bradlarsen bradlarsen requested review from a team as code owners January 29, 2026 19:41

@rosecodym rosecodym left a comment

This looks like a good start! In addition to my inline questions, I have this one: Are scans of individual paths cancellation-aware? It looks like the source is only cancellation-aware between paths.

@trufflesteeeve trufflesteeeve left a comment

This looks good to me, however I'm waiting on others to review before I give my own approval.

@camgunz camgunz left a comment

Couple of nits and suggestions, but super clean and straightforward--looks great 👍🏻


metadataJSON, err := entry.Metadata.MarshalJSON()
if err != nil {
	ctx.Logger().Error(err, "failed to convert metadata to JSON")

@camgunz camgunz Feb 10, 2026

suggestion: log the entry in a separate logging call at like... level 4 or 5 maybe

@bradlarsen bradlarsen merged commit c563a06 into main Feb 17, 2026
12 checks passed
@bradlarsen bradlarsen deleted the json-enumerator branch February 17, 2026 15:28

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

if common.IsDone(ctx) {
	return nil
}
s.SetProgressComplete(i, len(s.paths), fmt.Sprintf("Path: %s", path), "")

Progress never reaches completion

Low Severity

Chunks reports progress with zero-based i via SetProgressComplete, so the final update is (len(paths)-1)/len(paths). For a single path this stays at 0%, and for multiple paths it never reaches 100%, leaving json-enumerator jobs appearing incomplete.
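The arithmetic behind this report can be checked with a small Go sketch. The percentComplete function is a stand-in that assumes progress is computed as done/total. It is not TruffleHog's SetProgressComplete, and finalProgress is a hypothetical helper.

```go
package main

import "fmt"

// percentComplete mimics a progress calculation of the form done/total.
// It is a stand-in for illustration only.
func percentComplete(done, total int) int {
	if total == 0 {
		return 100
	}
	return done * 100 / total
}

// finalProgress returns the last value reported while iterating over n
// paths, reporting either the zero-based index i or the finished count i+1.
func finalProgress(n int, oneBased bool) int {
	last := 0
	for i := 0; i < n; i++ {
		done := i
		if oneBased {
			done = i + 1
		}
		last = percentComplete(done, n)
	}
	return last
}

func main() {
	for _, n := range []int{1, 3} {
		fmt.Printf("paths=%d zero-based=%d%% one-based=%d%%\n",
			n, finalProgress(n, false), finalProgress(n, true))
	}
	// Output:
	// paths=1 zero-based=0% one-based=100%
	// paths=3 zero-based=66% one-based=100%
}
```

Under that assumption, reporting i+1 after each path finishes (or issuing one final update after the loop) would let the job reach 100%.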

}
case jsonEnumeratorScan.FullCommand():
	cfg := sources.JSONEnumeratorConfig{Paths: *jsonEnumeratorPaths}
	if ref, err := eng.ScanJSONEnumeratorInput(ctx, cfg); err != nil {

Command accepts empty input silently

Medium Severity

json-enumerator runs even when no path arguments are provided. jsonEnumeratorPaths is optional and the command path does not validate len(*jsonEnumeratorPaths) > 0, so the scan completes successfully without processing any input, creating a false-success result.

6 participants