Support arbitrary separators in read_csv
#61
Comments
I have a use case for this. Currently, read_csv() accepts only a single file path and hardcodes try_parse_dates(true) and missing_is_null(true). It would be great if polars-cli's read_csv() were at feature parity with the Python API's read_csv(). I was under the impression that someone had already started working on this a while ago, but I can take it.
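To make the parity idea concrete, here is a minimal sketch of how the two hardcoded settings could become overridable named arguments while keeping today's behavior as the default. Everything in it is hypothetical: `CsvArgs`, `apply_named_args`, and `parse_bool` are illustrative names, not code from polars-cli, and it assumes the arguments arrive as plain `name=value` string pairs.

```rust
/// Hypothetical option set for a SQL-level `read_csv` (not the polars-cli type).
#[derive(Debug, Clone, PartialEq)]
struct CsvArgs {
    try_parse_dates: bool,
    missing_is_null: bool,
}

impl Default for CsvArgs {
    fn default() -> Self {
        // Defaults reproduce the currently hardcoded behavior described above.
        CsvArgs { try_parse_dates: true, missing_is_null: true }
    }
}

/// Parse a boolean argument value, accepting only `true` or `false`.
fn parse_bool(value: &str) -> Result<bool, String> {
    value
        .parse::<bool>()
        .map_err(|_| format!("expected true or false, got {value:?}"))
}

/// Apply `name=value` pairs on top of the defaults, rejecting unknown names.
fn apply_named_args(pairs: &[(&str, &str)]) -> Result<CsvArgs, String> {
    let mut args = CsvArgs::default();
    for (name, value) in pairs {
        match *name {
            "try_parse_dates" => args.try_parse_dates = parse_bool(value)?,
            "missing_is_null" => args.missing_is_null = parse_bool(value)?,
            other => return Err(format!("unknown read_csv argument: {other}")),
        }
    }
    Ok(args)
}

fn main() {
    // No named arguments: the hardcoded defaults are preserved.
    println!("{:?}", apply_named_args(&[]));
    // One override: only the named setting changes.
    println!("{:?}", apply_named_args(&[("try_parse_dates", "false")]));
    // Unknown names are reported instead of being silently ignored.
    println!("{:?}", apply_named_args(&[("no_such_option", "1")]));
}
```

Rejecting unknown names, rather than ignoring them, keeps typos in a query from silently falling back to the defaults.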
For this:
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.36s
Running `target/debug/polars`
Polars CLI version 0.9.0
Type .help for help.
〉select * FROM read_csv('foods.csv');
┌────────────┬──────────┬────────┬──────────┐
│ category ┆ calories ┆ fats_g ┆ sugars_g │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ f64 ┆ i64 │
╞════════════╪══════════╪════════╪══════════╡
│ vegetables ┆ 45 ┆ 0.5 ┆ 2 │
│ seafood ┆ 150 ┆ 5.0 ┆ 0 │
│ meat ┆ 100 ┆ 5.0 ┆ 0 │
│ fruit ┆ 60 ┆ 0.0 ┆ 11 │
│ seafood ┆ 140 ┆ 5.0 ┆ 1 │
│ … ┆ … ┆ … ┆ … │
│ seafood ┆ 100 ┆ 5.0 ┆ 0 │
│ seafood ┆ 200 ┆ 10.0 ┆ 0 │
│ seafood ┆ 200 ┆ 7.0 ┆ 2 │
│ fruit ┆ 60 ┆ 0.0 ┆ 11 │
│ meat ┆ 110 ┆ 7.0 ┆ 0 │
└────────────┴──────────┴────────┴──────────┘
〉select * FROM read_csv('foods_semicolon.csv');
┌─────────────────────────────────┐
│ category;calories;fats_g;sugar… │
│ --- │
│ str │
╞═════════════════════════════════╡
│ vegetables;45;0.5;2 │
│ seafood;150;5;0 │
│ meat;100;5;0 │
│ fruit;60;0;11 │
│ seafood;140;5;1 │
│ … │
│ seafood;100;5;0 │
│ seafood;200;10;0 │
│ seafood;200;7;2 │
│ fruit;60;0;11 │
│ meat;110;7;0 │
└─────────────────────────────────┘
〉select * FROM read_csv('foods_semicolon.csv', separator=';');
┌────────────┬──────────┬────────┬──────────┐
│ category ┆ calories ┆ fats_g ┆ sugars_g │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ f64 ┆ i64 │
╞════════════╪══════════╪════════╪══════════╡
│ vegetables ┆ 45 ┆ 0.5 ┆ 2 │
│ seafood ┆ 150 ┆ 5.0 ┆ 0 │
│ meat ┆ 100 ┆ 5.0 ┆ 0 │
│ fruit ┆ 60 ┆ 0.0 ┆ 11 │
│ seafood ┆ 140 ┆ 5.0 ┆ 1 │
│ … ┆ … ┆ … ┆ … │
│ seafood ┆ 100 ┆ 5.0 ┆ 0 │
│ seafood ┆ 200 ┆ 10.0 ┆ 0 │
│ seafood ┆ 200 ┆ 7.0 ┆ 2 │
│ fruit ┆ 60 ┆ 0.0 ┆ 11 │
│ meat ┆ 110 ┆ 7.0 ┆ 0 │
└────────────┴──────────┴────────┴──────────┘
〉select * FROM read_csv('foods_hash.csv', separator=';');
┌─────────────────────────────────┐
│ category#calories#fats_g#sugar… │
│ --- │
│ str │
╞═════════════════════════════════╡
│ vegetables#45#0.5#2 │
│ seafood#150#5#0 │
│ meat#100#5#0 │
│ fruit#60#0#11 │
│ seafood#140#5#1 │
│ … │
│ seafood#100#5#0 │
│ seafood#200#10#0 │
│ seafood#200#7#2 │
│ fruit#60#0#11 │
│ meat#110#7#0 │
└─────────────────────────────────┘
〉select * FROM read_csv('foods_hash.csv', separator='#');
┌────────────┬──────────┬────────┬──────────┐
│ category ┆ calories ┆ fats_g ┆ sugars_g │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ f64 ┆ i64 │
╞════════════╪══════════╪════════╪══════════╡
│ vegetables ┆ 45 ┆ 0.5 ┆ 2 │
│ seafood ┆ 150 ┆ 5.0 ┆ 0 │
│ meat ┆ 100 ┆ 5.0 ┆ 0 │
│ fruit ┆ 60 ┆ 0.0 ┆ 11 │
│ seafood ┆ 140 ┆ 5.0 ┆ 1 │
│ … ┆ … ┆ … ┆ … │
│ seafood ┆ 100 ┆ 5.0 ┆ 0 │
│ seafood ┆ 200 ┆ 10.0 ┆ 0 │
│ seafood ┆ 200 ┆ 7.0 ┆ 2 │
│ fruit ┆ 60 ┆ 0.0 ┆ 11 │
│ meat ┆ 110 ┆ 7.0 ┆ 0 │
└────────────┴──────────┴────────┴──────────┘
〉SELECT *
〉FROM read_csv('foods_tab.csv', separator='\t')
〉WHERE fats_g >= 10;
┌──────────┬──────────┬────────┬──────────┐
│ category ┆ calories ┆ fats_g ┆ sugars_g │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ f64 ┆ i64 │
╞══════════╪══════════╪════════╪══════════╡
│ meat ┆ 120 ┆ 10.0 ┆ 1 │
│ seafood ┆ 200 ┆ 10.0 ┆ 0 │
└──────────┴──────────┴────────┴──────────┘
〉
stdin:
$ cat foods_query.sql
SELECT
category,
calories
FROM read_csv('foods_semicolon.csv', separator = ';')
WHERE calories > 100;
$ cat foods_query.sql | polars
┌──────────┬──────────┐
│ category ┆ calories │
│ --- ┆ --- │
│ str ┆ i64 │
╞══════════╪══════════╡
│ seafood ┆ 150 │
│ seafood ┆ 140 │
│ meat ┆ 120 │
│ seafood ┆ 130 │
│ meat ┆ 110 │
│ … ┆ … │
│ seafood ┆ 200 │
│ meat ┆ 110 │
│ seafood ┆ 200 │
│ seafood ┆ 130 │
│ fruit ┆ 130 │
└──────────┴──────────┘
$ Done:
Before continuing: @ritchie46, is there any specific reason why we shouldn't extend read_csv()? If not, should I first create an issue on the main repo? There's an explicit test on the number of arguments passed, so I'm not sure whether that's there for testing completeness or to block extension.
Description
TSV files are pretty common, but I couldn't make them work with `polars-cli`. Apparently, the `read_csv` function of `polars-cli` doesn't take the same arguments as the one in Python (maybe it doesn't support arguments at all?)

From what I've seen in `table_functions.rs`, `read_csv` uses a `LazyCsvReader`, which apparently supports the `separator` argument. I can't say I'm very familiar with Polars' code, though. I might be missing something obvious.
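As a rough illustration of what passing a separator through would involve, the sketch below converts the string given in SQL into a single byte, which is the form a byte-oriented CSV reader typically expects. `parse_separator` is a hypothetical helper, not a function from polars or polars-cli, and the handling of `\t` assumes the escape might reach the function unprocessed as a backslash followed by `t`.

```rust
/// Hypothetical helper: turn the string passed as `separator` into one byte.
fn parse_separator(raw: &str) -> Result<u8, String> {
    // Assumption: the SQL layer may hand over the two characters `\` and `t`
    // unprocessed, so map that sequence to a literal tab.
    let unescaped = match raw {
        "\\t" => "\t",
        other => other,
    };
    match unescaped.as_bytes() {
        // Exactly one byte: a usable separator.
        [byte] => Ok(*byte),
        [] => Err("separator must not be empty".to_string()),
        // Multi-character strings and multi-byte UTF-8 characters are rejected.
        _ => Err(format!("separator must be a single byte, got {raw:?}")),
    }
}

fn main() {
    for raw in [",", ";", "#", "\\t", "||"] {
        match parse_separator(raw) {
            Ok(byte) => println!("{raw:?} -> byte {byte}"),
            Err(msg) => println!("{raw:?} -> error: {msg}"),
        }
    }
}
```

With a helper along these lines, a TSV file and the semicolon or hash variants would all go through the same code path, differing only in the byte handed to the reader.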