When Streaming input, extra line inserted #9

@ethanone1

Describe the bug
I've got some very large files and I need to chunk the input. When doing so, the first line of each chunk gets an extra entry. Mostly it's blank, but occasionally it seems to include characters from other parts of the file.
Note: my file is 500,000 rows by 9000 columns. I'm just working with the first 100 rows, but all 9000 columns. In this example, I'm just pulling the first 12 columns.

How can I avoid the extra/strange entry at the beginning of each chunk?

To Reproduce
short bit of code...
```javascript
const fs = require("fs");
const { FixedWidthParser } = require("fixed-width-parser");

// Only the first 12 characters of each row are extracted here.
const fixedWidthParser = new FixedWidthParser([
  { name: "FIELD A", start: 0, width: 12 },
]);

async function logChunks(readable) {
  // outputPath and inputPath are not shown in this snippet.
  const output = fs.createWriteStream(outputPath, { encoding: "utf8" });
  for await (const chunk of readable) {
    // Each chunk is parsed on its own, exactly as it arrives from the stream.
    let result = fixedWidthParser.parse(chunk);
    console.log(result); // echo result to preview (below)
  }
}

const input = fs.createReadStream(inputPath, { encoding: "utf8" });
logChunks(input);
```

Expected behavior
[
  { 'FIELD A': '000000010002' }, <= This chunk is fine
  { 'FIELD A': '000000010003' },
  { 'FIELD A': '000000010004' },
  { 'FIELD A': '000000010005' },
  { 'FIELD A': '000000010006' },
  { 'FIELD A': '000000010007' },
  { 'FIELD A': '000000010008' },
  { 'FIELD A': '000000010009' }
]
[
  { 'FIELD A': '' }, <= This blank line is not in the data. Data goes right from 10009 to 10010
  { 'FIELD A': '000000010010' },
  { 'FIELD A': '000000010011' },
  { 'FIELD A': '000000010012' },
  { 'FIELD A': '000000010014' },
  { 'FIELD A': '000000010016' },
  { 'FIELD A': '000000010017' },
  { 'FIELD A': '000000010018' }
]
[
  { 'FIELD A': '' }, <= This blank line is not in the data. Data goes right from 10018 to 10019
  { 'FIELD A': '000000010019' },
  { 'FIELD A': '000000010020' },
  { 'FIELD A': '000000010021' },
  { 'FIELD A': '000000010022' },
  { 'FIELD A': '000000010023' },
  { 'FIELD A': '000000010024' },
  { 'FIELD A': '000000010025' }
]
[
  { 'FIELD A': '' }, <= This blank line is not in the data. Data goes right from 10025 to 10026
  { 'FIELD A': '000000010026' },
  { 'FIELD A': '000000010027' },
  { 'FIELD A': '000000010030' },
  { 'FIELD A': '000000010031' },
  { 'FIELD A': '000000010033' },
  { 'FIELD A': '000000010035' },
  { 'FIELD A': '000000010036' },
  { 'FIELD A': '000000010037' }
]
[
  { 'FIELD A': 'CT PT 6, ACR' }, <= This came from Col 1,174. I really don't understand what's happening here.
  { 'FIELD A': '000000010038' },
  { 'FIELD A': '000000010040' },
  { 'FIELD A': '000000010041' },
  { 'FIELD A': '000000010042' },
  { 'FIELD A': '000000010043' },
  { 'FIELD A': '000000010044' },
  { 'FIELD A': '000000010045' }
]
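
One possible explanation, offered as an assumption rather than a confirmed diagnosis: a Node.js read stream delivers chunks by size, not by line, so a chunk can end in the middle of a row. The next chunk then begins with the tail of that row, and parsing columns 0-12 of the tail would yield blanks or text from a later column. The sketch below shows one way to carry the partial row over to the next chunk so that only complete lines reach the parser. `createLineAligner`, `push`, and `flush` are hypothetical names for illustration and are not part of fixed-width-parser.

```javascript
// Hypothetical helper (not part of fixed-width-parser): regroups arbitrary
// string chunks so that every returned block ends on a line boundary.
function createLineAligner() {
  let carry = ""; // partial row left over from the previous chunk

  return {
    // Returns the complete lines available so far (without the final
    // newline), or null if the buffered text contains no newline yet.
    push(chunk) {
      const text = carry + chunk;
      const lastNewline = text.lastIndexOf("\n");
      if (lastNewline === -1) {
        carry = text; // still inside a single row; keep buffering
        return null;
      }
      carry = text.slice(lastNewline + 1); // save the incomplete tail
      return text.slice(0, lastNewline);
    },

    // Returns whatever is left after the last chunk (a final row with no
    // trailing newline), or null if nothing is buffered.
    flush() {
      const rest = carry;
      carry = "";
      return rest.length > 0 ? rest : null;
    },
  };
}
```

In `logChunks`, each `chunk` would be passed through `push` and only non-null results handed to `fixedWidthParser.parse`, with `flush` called once after the loop. The sketch splits on `\n` only, so any `\r` from Windows line endings stays in the returned text.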

Desktop (please complete the following information):

  • OS: Windows 10, Node.js 14.16
