
Add line and column information to errors #410

Open
jakubadamw wants to merge 6 commits into 01mf02:main from jakubadamw:line-column-information

Conversation

@jakubadamw

I’ve been using jaq as a replacement for jq, and one thing I noticed is that for an invalid JSON file it does not report the location of the parsing error.

$ cat invalid.json | jaq
Error: failed to parse: comma or end of sequence expected

This PR adds line & column information to errors such that the output is now:

$ cat invalid.json | jaq
Error: failed to parse: at line 5, column 5: comma or end of sequence expected

Unlike jq, the offset calculation is character-based rather than byte-based, which I believe is an improvement.
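To illustrate the difference, here is a minimal standalone example (the input and the `char_column` helper are illustrative, not part of the PR): a byte-based column overshoots as soon as a multi-byte UTF-8 character precedes the error position.

```rust
/// Character-based column (0-indexed) for a byte offset within a line.
fn char_column(line: &str, byte_offset: usize) -> usize {
    line[..byte_offset].chars().count()
}

fn main() {
    let line = "héllo wörld";
    // Byte offset of 'w': 'é' occupies two bytes, so this is 7 ...
    let byte_offset = line.find('w').unwrap();
    assert_eq!(byte_offset, 7);
    // ... but only 6 characters precede 'w'.
    assert_eq!(char_column(line, byte_offset), 6);
    println!(
        "byte column {}, char column {}",
        byte_offset + 1,
        char_column(line, byte_offset) + 1
    );
}
```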

jq’s output for reference:

$ cat invalid.json | jq
jq: parse error: Expected separator between values at line 5, column 8

I also added a number of tests.

For iterator-based parsing, this works through a wrapper iterator that tracks the line/column position while counting bytes.

When the original sequence of bytes is still available on error, however, we calculate the line & column values from the byte offset by rescanning the sequence.
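The rescanning approach can be sketched roughly as follows (`line_col` is an illustrative name, not the PR’s actual API): count newlines up to the error offset for the line, then count non-continuation bytes since the last newline for the character-based column.

```rust
/// Compute a 1-based (line, column) from a byte offset by rescanning `bytes`.
/// The column counts characters, i.e. skips UTF-8 continuation bytes (0b10xxxxxx).
fn line_col(bytes: &[u8], offset: usize) -> (usize, usize) {
    let upto = &bytes[..offset.min(bytes.len())];
    // Line: number of newlines before the offset, plus one.
    let line = upto.iter().filter(|&&b| b == b'\n').count() + 1;
    // Column: characters since the start of the current line, plus one.
    let line_start = upto.iter().rposition(|&b| b == b'\n').map_or(0, |i| i + 1);
    let column = upto[line_start..]
        .iter()
        .filter(|&&b| b & 0xC0 != 0x80) // keep only non-continuation bytes
        .count()
        + 1;
    (line, column)
}

fn main() {
    let input = b"[1,\n 2,\n 3x]";
    assert_eq!(line_col(input, 10), (3, 3)); // the offending 'x'
}
```

This runs in O(offset), which is fine since it only happens on the error path.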
I introduced a new test macro and made the `golden_test()` function take
an extra argument denoting whether the input is expected to parse successfully.
@01mf02
Owner

01mf02 commented Mar 12, 2026

Thanks for your PR!

The parsing part looks quite good; I'm not yet sold on the reading part --- I'm particularly worried about the use of Rc<Cell<_>> in a hot path.
Could you benchmark reading performance (cat big_file_with_single_value | jaq empty) before and after your PR?
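For readers following along, the pattern in question is presumably something along these lines (a hypothetical sketch, not the PR's actual code): a position counter shared via Rc<Cell<_>> between the byte iterator and the error path, updated once per byte yielded.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical wrapper: yields bytes while bumping a shared offset counter.
struct Positioned<I> {
    inner: I,
    offset: Rc<Cell<usize>>,
}

impl<I: Iterator<Item = u8>> Iterator for Positioned<I> {
    type Item = u8;
    fn next(&mut self) -> Option<u8> {
        let byte = self.inner.next()?;
        // One extra get/set per byte: cheap in isolation,
        // but measurable when parsing gigabytes.
        self.offset.set(self.offset.get() + 1);
        Some(byte)
    }
}

fn main() {
    let offset = Rc::new(Cell::new(0));
    let iter = Positioned {
        inner: b"[1,2]".iter().copied(),
        offset: Rc::clone(&offset),
    };
    let bytes: Vec<u8> = iter.collect();
    assert_eq!(bytes.len(), 5);
    assert_eq!(offset.get(), 5); // the shared counter saw every byte
}
```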

@jakubadamw
Author

jakubadamw commented Mar 12, 2026

@01mf02, hello, thanks for taking a look and for hinting at the performance implications of this!

I generated a 4GiB JSON file consisting of a single array of consecutive numbers. Here are the numbers that I am seeing:

$ hyperfine --runs 3 'cat numbers.json | ./target/release/jaq empty' --reference 'cat numbers.json | jaq empty'
Benchmark 1: cat numbers.json | jaq empty
  Time (mean ± σ):     20.311 s ±  0.498 s    [User: 17.702 s, System: 4.057 s]
  Range (min … max):   19.772 s … 20.755 s    3 runs
 
Benchmark 2: cat numbers.json | ./target/release/jaq empty
  Time (mean ± σ):     23.575 s ±  0.320 s    [User: 21.485 s, System: 3.771 s]
  Range (min … max):   23.265 s … 23.904 s    3 runs

So you’re right, this can lead to a 15-20% performance regression. I suppose it’s not worth doing it for the stream-based path, then? What do you think?

@01mf02
Owner

01mf02 commented Mar 12, 2026

I generated a 4GB JSON file consisting of a single array of consecutive numbers.

It would be nice to have instructions to reproduce your experiment.
I think that such an array of consecutive numbers might even be quite nice in your case --- you could also try just generating a lot of whitespace. I think that would be the worst case for the change that you propose, because parsing whitespace is normally the fastest and would thus show the impact most faithfully.

So you’re right, this can lead to a 15-20% performance regression. I suppose it’s not worth doing it for the stream-based path, then? What do you think?

I agree with you. It's not worth it, also because you normally cannot just look up the line number of something that you piped into jaq. For that, you will probably look into a file. And if you are already doing that, then you should be using jaq ... file anyway, for performance. ;)

// TODO: replace with `!byte.is_utf8_continuation()` once stable.
let column = line_start
.iter()
.filter(|&&byte| byte & 0xC0 != 0x80)
Owner

Can you add a source to this magic trick?
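For context: in UTF-8 (RFC 3629), every continuation byte of a multi-byte character has the bit pattern 10xxxxxx, so `byte & 0xC0 == 0x80` holds exactly for continuation bytes. Filtering them out therefore counts characters (code points), as this standalone check shows:

```rust
fn main() {
    // 'a' = 1 byte, 'é' = 2 bytes, '€' = 3 bytes: 6 bytes, 3 characters.
    let s = "aé€";
    // Only the leading byte of each character survives the mask.
    let chars = s.bytes().filter(|&b| b & 0xC0 != 0x80).count();
    assert_eq!(chars, s.chars().count());
    assert_eq!(chars, 3);
}
```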

#[derive(Debug)]
pub struct Error(usize, hifijson::Error);
pub struct Error {
line: usize,
Owner

I would make a type LineCol = (usize, usize) and store that here, instead of having separate line and column fields. That would simplify the call sites of byte_offset_to_position below.
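The suggested shape might look like this (a sketch of the suggestion only; the `message` field stands in for the wrapped hifijson error):

```rust
/// 1-based line and column, as suggested.
type LineCol = (usize, usize);

#[derive(Debug)]
struct Error {
    position: LineCol,
    message: String, // stand-in for the wrapped hifijson::Error
}

fn main() {
    let e = Error {
        position: (5, 5),
        message: "comma or end of sequence expected".into(),
    };
    // One field to pass around and destructure at each call site.
    let (line, column) = e.position;
    println!("at line {line}, column {column}: {}", e.message);
}
```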

paste::paste! {
#[test]
fn [<$name _stdin>]() -> io::Result<()> {
golden_test($args, $input, false, $stderr)
Owner

I would actually just add golden_test_file after golden_test in the test! macro, and delete test_error.

@jakubadamw
Author

It would be nice to have instructions to reproduce your experiment. I think that such an array of consecutive numbers might even be quite nice in your case --- you could also try just generating a lot of whitespace. I think that would be the worst case for the change that you propose, because parsing whitespace is normally the fastest and would thus show the impact most faithfully.

Of course! This is the script I was using. I also added an option to generate a file containing a given ratio of whitespace bytes.

#!/usr/bin/env -S cargo +nightly -Zscript
---
[dependencies]
bytesize = "1"
structopt = "0.3"
---

use std::io::{BufWriter, Write};
use structopt::StructOpt;

#[derive(StructOpt)]
struct Args {
    /// Total output size, e.g. 4GiB.
    size: bytesize::ByteSize,
    /// Fraction of bytes that should be spaces, e.g. 0.4.
    #[structopt(default_value = "0")]
    whitespace_byte_count_ratio: f64,
}

fn write_spaces(out: &mut impl Write, count: usize) -> std::io::Result<()> {
    use std::io::Read;
    std::io::copy(&mut std::io::repeat(b' ').take(count as u64), out)?;
    Ok(())
}

fn main() -> std::io::Result<()> {
    let args = Args::from_args();
    let desired_total_byte_count = args.size.as_u64() as usize;
    let whitespace_byte_count_ratio = args.whitespace_byte_count_ratio;

    assert!(
        (0.0..1.0).contains(&whitespace_byte_count_ratio),
        "whitespace byte count ratio must be in [0, 1)"
    );

    let desired_non_whitespace_byte_count =
        (desired_total_byte_count as f64 * (1.0 - whitespace_byte_count_ratio)) as usize;

    let stdout = std::io::stdout();
    let mut out = BufWriter::new(stdout.lock());

    out.write_all(b"[")?;

    let mut non_whitespace_byte_count = 1usize; // Counts the opening '['.
    let mut whitespace_byte_count = 0usize;

    for i in 0u64.. {
        let number = i.to_string();
        let entry_byte_count = number.len() + if i == 0 { 0 } else { 1 }; // +1 for ','.

        if non_whitespace_byte_count + entry_byte_count + 1 > desired_non_whitespace_byte_count {
            // +1 for the closing ']'.
            break;
        }

        if i > 0 {
            out.write_all(b",")?;
            if i % 10 == 0 {
                out.write_all(b"\n")?;
                whitespace_byte_count += 1;
            }
        }

        // Add spaces proportionally to maintain the whitespace ratio.
        let desired_whitespace_byte_count = (non_whitespace_byte_count as f64
            * whitespace_byte_count_ratio
            / (1.0 - whitespace_byte_count_ratio)) as usize;
        let spaces_to_write = desired_whitespace_byte_count.saturating_sub(whitespace_byte_count);
        write_spaces(&mut out, spaces_to_write)?;
        whitespace_byte_count += spaces_to_write;

        out.write_all(number.as_bytes())?;
        non_whitespace_byte_count += entry_byte_count;
    }

    // Pad with remaining whitespace to reach desired_total_byte_count.
    let trailing_spaces = desired_total_byte_count
        .saturating_sub(non_whitespace_byte_count + whitespace_byte_count + 1);
    write_spaces(&mut out, trailing_spaces)?;
    whitespace_byte_count += trailing_spaces;

    out.write_all(b"]")?;
    non_whitespace_byte_count += 1;

    assert_eq!(
        non_whitespace_byte_count + whitespace_byte_count,
        desired_total_byte_count,
        "total bytes written does not match desired total"
    );

    Ok(())
}

So with a test file generated with:

./generate 4GiB 0.9 > numbers.json

(around 90% whitespace)

and the benchmark run:

hyperfine --warmup 1 --runs 5 'cat numbers.json | jaq empty' --reference 'cat numbers.json | ./target/release/jaq empty'

I am seeing these:

Benchmark 1: cat numbers.json | ./target/release/jaq empty
  Time (mean ± σ):     11.647 s ±  0.140 s    [User: 10.755 s, System: 2.665 s]
  Range (min … max):   11.404 s … 11.742 s    5 runs
 
Benchmark 2: cat numbers.json | jaq empty
  Time (mean ± σ):      9.787 s ±  0.043 s    [User: 9.079 s, System: 2.461 s]
  Range (min … max):    9.741 s …  9.835 s    5 runs
 
Summary
  cat numbers.json | ./target/release/jaq empty ran
    1.19 ± 0.02 times slower than cat numbers.json | jaq empty

so, again, a 20% slowdown.

So I’ll exclude the line/column information from the stream-based mode and make the other requested changes.
