Add a new NDJSON / JSONL input source #4721
rosecodym
left a comment
This looks like a good start! In addition to my inline questions, I have this one: Are scans of individual paths cancellation-aware? It looks like the source is only cancellation-aware between paths.
trufflesteeeve
left a comment
This looks good to me; however, I'm waiting on others to review before I give my own approval.
camgunz
left a comment
Couple of nits and suggestions, but super clean and straightforward--looks great 👍🏻
metadataJSON, err := entry.Metadata.MarshalJSON()
if err != nil {
	ctx.Logger().Error(err, "failed to convert metadata to JSON")
suggestion: log the entry in a separate logging call at like... level 4 or 5 maybe
Cursor Bugbot has reviewed your changes and found 2 potential issues.
if common.IsDone(ctx) {
	return nil
}
s.SetProgressComplete(i, len(s.paths), fmt.Sprintf("Path: %s", path), "")
}
case jsonEnumeratorScan.FullCommand():
	cfg := sources.JSONEnumeratorConfig{Paths: *jsonEnumeratorPaths}
	if ref, err := eng.ScanJSONEnumeratorInput(ctx, cfg); err != nil {
Command accepts empty input silently
Medium Severity
json-enumerator runs even when no path arguments are provided. jsonEnumeratorPaths is optional and the command path does not validate len(*jsonEnumeratorPaths) > 0, so the scan completes successfully without processing any input, creating a false-success result.


This adds a new input source to TruffleHog, accessible via trufflehog json-enumerator.

This input source requires a list of filenames, each of which is an NDJSON-formatted sequence of objects that take one of two forms:

Form 1:
{"data": "utf-8 string", "metadata": <non-null JSON value>}

Form 2:
{"data_b64": "base64-encoded bytestring", "metadata": <non-null JSON value>}

The data/data_b64 field specifies the content to be scanned. The metadata field is arbitrary, and is simply propagated downstream with scan results from the corresponding content.

Note that although trufflehog json-enumerator requires a list of filenames to be given, the NDJSON data that you wish to scan may not need to be first written to disk. On Linux and macOS, at least, you can use shell process substitution to set up a named pipe from a producer process, like trufflehog json-enumerator <(some-program-that-emits-ndjson).

Note
Medium Risk
Adds a new ingestion path that parses untrusted NDJSON and threads arbitrary per-record metadata through the scanning pipeline; issues here could lead to scan failures or unexpected memory/throughput behavior on large inputs.
Overview
Adds a new CLI subcommand, trufflehog json-enumerator, that scans one or more NDJSON/JSONL files containing per-line data (UTF-8 string) or data_b64 (bytes) plus required metadata.

Implements a new json_enumerator source and engine entrypoint (ScanJSONEnumeratorInput) that stream-decodes records, scans their payloads via existing file handling, and attaches the record's metadata to findings via new protobuf types (sourcespb.JSONEnumerator and source_metadatapb.JSONEnumerator). Includes generated proto/validation updates and a focused unit test covering string vs bytes payloads and invalid metadata.

Written by Cursor Bugbot for commit 4beec34.