Description
Describe the bug
I have some very large files and need to read the input in chunks. When I do, the first line of each chunk gets an extra entry. Mostly it's blank, but occasionally it seems to include characters from other parts of the file.
Note, my file is 500,000 rows by 9000 columns. I'm just working with the first 100 rows, but all 9000 columns. In this example, I'm just pulling the first 12 columns.
How can I avoid the extra/strange entry at the beginning of each chunk?
To Reproduce
short bit of code...
`const fs = require("fs");
const { FixedWidthParser } = require("fixed-width-parser");

// Only the first 12 columns of each row are extracted.
const fixedWidthParser = new FixedWidthParser([
  { name: "FIELD A", start: 0, width: 12 },
]);

// inputPath and outputPath are not defined in this excerpt.
async function logChunks(readable) {
  const output = fs.createWriteStream(outputPath, { encoding: "utf8" });
  // Each chunk is whatever the stream delivers, parsed as-is.
  for await (const chunk of readable) {
    const result = fixedWidthParser.parse(chunk);
    console.log(result); // echo result to preview (below)
  }
}

const input = fs.createReadStream(inputPath, { encoding: "utf8" });
logChunks(input);`
Expected behavior
[
  { 'FIELD A': '000000010002' }, <= This chunk is fine
  { 'FIELD A': '000000010003' },
  { 'FIELD A': '000000010004' },
  { 'FIELD A': '000000010005' },
  { 'FIELD A': '000000010006' },
  { 'FIELD A': '000000010007' },
  { 'FIELD A': '000000010008' },
  { 'FIELD A': '000000010009' }
]
[
  { 'FIELD A': '' }, <= This blank line is not in the data. Data goes right from 10009 to 10010
  { 'FIELD A': '000000010010' },
  { 'FIELD A': '000000010011' },
  { 'FIELD A': '000000010012' },
  { 'FIELD A': '000000010014' },
  { 'FIELD A': '000000010016' },
  { 'FIELD A': '000000010017' },
  { 'FIELD A': '000000010018' }
]
[
  { 'FIELD A': '' }, <= This blank line is not in the data. Data goes right from 10018 to 10019
  { 'FIELD A': '000000010019' },
  { 'FIELD A': '000000010020' },
  { 'FIELD A': '000000010021' },
  { 'FIELD A': '000000010022' },
  { 'FIELD A': '000000010023' },
  { 'FIELD A': '000000010024' },
  { 'FIELD A': '000000010025' }
]
[
  { 'FIELD A': '' }, <= This blank line is not in the data. Data goes right from 10025 to 10026
  { 'FIELD A': '000000010026' },
  { 'FIELD A': '000000010027' },
  { 'FIELD A': '000000010030' },
  { 'FIELD A': '000000010031' },
  { 'FIELD A': '000000010033' },
  { 'FIELD A': '000000010035' },
  { 'FIELD A': '000000010036' },
  { 'FIELD A': '000000010037' }
]
[
  { 'FIELD A': 'CT PT 6, ACR' }, <= This came from Col 1,174. I really don't understand what's happening here.
  { 'FIELD A': '000000010038' },
  { 'FIELD A': '000000010040' },
  { 'FIELD A': '000000010041' },
  { 'FIELD A': '000000010042' },
  { 'FIELD A': '000000010043' },
  { 'FIELD A': '000000010044' },
  { 'FIELD A': '000000010045' }
]
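One possible explanation, offered as an assumption rather than a confirmed diagnosis: `fs.createReadStream` delivers chunks by size (its default `highWaterMark` is 64 KiB), not by line, so a chunk boundary can fall in the middle of a row. The first "row" of the next chunk would then be the tail of the row that was cut, which would be blank in most cases and would occasionally contain text from a later column. A minimal sketch of a workaround is to hold back the trailing partial row and prepend it to the next chunk. `createLineBuffer` is a hypothetical helper written for this illustration, not part of fixed-width-parser.

```javascript
// Hypothetical helper: buffers the trailing partial line between chunks so
// that only complete rows are handed to the parser.
function createLineBuffer() {
  let leftover = "";
  return {
    // Returns all complete lines in (leftover + chunk), without the final
    // line terminator; keeps any trailing partial line for the next call.
    push(chunk) {
      const text = leftover + chunk;
      const lastNewline = text.lastIndexOf("\n");
      if (lastNewline === -1) {
        leftover = text; // no complete line yet
        return "";
      }
      leftover = text.slice(lastNewline + 1);
      const complete = text.slice(0, lastNewline);
      // Drop a trailing "\r" in case the file uses Windows line endings.
      return complete.endsWith("\r") ? complete.slice(0, -1) : complete;
    },
    // Returns whatever is left once the stream has ended.
    flush() {
      const rest = leftover;
      leftover = "";
      return rest;
    },
  };
}

// Simulated chunks whose boundary falls in the middle of a row.
const lineBuffer = createLineBuffer();
const demoChunks = ["000000010002 a\n0000000100", "03 b\n000000010004 c"];
for (const chunk of demoChunks) {
  console.log(JSON.stringify(lineBuffer.push(chunk)));
}
console.log(JSON.stringify(lineBuffer.flush()));

// Assumed usage inside logChunks (untested against fixed-width-parser):
//   const complete = lineBuffer.push(chunk);
//   if (complete) console.log(fixedWidthParser.parse(complete));
// and, after the loop, parse lineBuffer.flush() if it is non-empty.
```

If the leading entries disappear with this in place, that would support the chunk-boundary explanation. Node's built-in `readline` module is an alternative that yields one line at a time, at the cost of one parse call per row.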
Desktop (please complete the following information):
- OS: Windows 10, Node.js 14.16