Here's a minimal script that:
builds a table, exports it to a CSV string, reads the string back, and counts the rows
writes the same string to disk, reads the file back, and counts a different number of rows!
The result is consistent within a single run, but non-deterministic (a different wrong number each time) when run in a loop.
So the bugs are:
row counts should be preserved across writing a file and reading it back in
there should be no state carried from one invocation of loadCSV to the next
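To make the "run in a loop" observation easy to check, here is a minimal sketch. The helper names `collectRowCounts` and `allMatch` are hypothetical and not part of Arquero; the loader is passed in as a parameter, so the only assumption about the library is that `loadCSV` returns a promise of a table with a `numRows()` method, as the repro script uses it.

```javascript
// Hypothetical helper: calls an async loader repeatedly and records the
// row count of each result. `load` is any function that returns a promise
// of an object with a numRows() method.
async function collectRowCounts(load, path, runs = 5) {
  const counts = [];
  for (let i = 0; i < runs; i++) {
    const table = await load(path); // awaited in sequence, so calls never overlap
    counts.push(table.numRows());
  }
  return counts;
}

// True when every run reported the expected number of rows.
function allMatch(counts, expected) {
  return counts.every((n) => n === expected);
}
```

In the repro script this could be called as `const counts = await collectRowCounts((p) => aq.loadCSV(p), 'test.csv');` followed by `console.log(counts, allMatch(counts, numRows));`, before the file is deleted. If both expectations above held, every entry would equal `numRows`.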
Reproduced with Node 22 on Linux and Node 20 on Windows.
I also have a web-based version where you can play with the row and column counts to get slightly different bad behavior.
If it works with 10K rows and 20 columns, try 100K rows and 200 columns.
// csv_fail.mjs
// $ node csv_fail.mjs
import * as aq from "arquero";
import { writeFileSync, unlinkSync } from 'fs';
// make the test CSV
const numRows = 10000;
const numColumns = 20;
const one = Object.fromEntries(Array(numColumns).fill(null).map((_, i) => [`a${i + 1}`, '"']));
const many = Array(numRows).fill(one);
const built = aq.from(many);
const csvFromBuilt = built.toCSV();
// read it back from a string (good)
const fromCsvFromBuilt = await aq.fromCSV(csvFromBuilt);
// prints 10000 (good)
console.log(fromCsvFromBuilt.numRows());
// write it to disk and read it back (bad)
writeFileSync('test.csv', csvFromBuilt);
const loadCsvFromDisk = await aq.loadCSV('test.csv');
// prints 4587 (bad)
console.log(loadCsvFromDisk.numRows());
unlinkSync('test.csv');
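To rule out the write step as the culprit, the file contents can be counted independently of Arquero. The sketch below uses a hypothetical helper, `countCsvRecords`, that scans for unquoted newlines. It assumes standard CSV quoting, where an embedded quote is written as two quotes; blank lines are not counted.

```javascript
// Hypothetical helper, not part of Arquero: counts CSV records with a
// simple quote-aware scan. A doubled quote ("") toggles the state twice,
// so it leaves the in-quotes state unchanged.
function countCsvRecords(text) {
  let records = 0;
  let inQuotes = false;
  let sawContent = false; // true once the current record has any character
  for (const ch of text) {
    if (ch === '"') {
      inQuotes = !inQuotes;
      sawContent = true;
    } else if (ch === '\n' && !inQuotes) {
      if (sawContent) records++; // an unquoted newline ends the record
      sawContent = false;
    } else if (ch !== '\r' || inQuotes) {
      sawContent = true; // a bare \r before \n is not record content
    }
  }
  if (sawContent) records++; // last record without a trailing newline
  return records;
}
```

Run before the `unlinkSync` call, `countCsvRecords(readFileSync('test.csv', 'utf8')) - 1` (with `readFileSync` also imported from `fs`, and one record subtracted on the assumption that the file starts with a header row) gives a count to compare against `numRows`. If it matches, the file on disk is intact and the discrepancy comes from reading it back.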