Parquet reader can't handle float16 with BYTE_STREAM_SPLIT #21803

Open
2 tasks done
mwlon opened this issue Mar 17, 2025 · 0 comments
Labels
  • A-io-parquet (Area: reading/writing Parquet files)
  • bug (Something isn't working)
  • needs triage (Awaiting prioritization by a maintainer)
  • python (Related to Python Polars)

Comments

mwlon commented Mar 17, 2025

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl
import pyarrow as pa
from pyarrow import parquet as pq

t = pa.Table.from_pydict({'a': pa.array([1.0]).cast(pa.float16())})
pq.write_table(t, 'foo.parquet', use_dictionary=False, column_encoding='BYTE_STREAM_SPLIT')
pl.read_parquet('foo.parquet')

Log output

parquet scan with parallel = None
Traceback (most recent call last):
  File "/usr/local/home/mloncaric/temp.py", line 7, in <module>
    pl.read_parquet('foo.parquet')
  File "/dev/shm/uid-22467-gid-33123/dc9eeca1-seed-nspid4026531836-ns-4026531841-js-pip/polars/_utils/deprecation.py", line 114, in wrapper
    return function(*args, **kwargs)
  File "/dev/shm/uid-22467-gid-33123/dc9eeca1-seed-nspid4026531836-ns-4026531841-js-pip/polars/_utils/deprecation.py", line 114, in wrapper
    return function(*args, **kwargs)
  File "/dev/shm/uid-22467-gid-33123/dc9eeca1-seed-nspid4026531836-ns-4026531841-js-pip/polars/io/parquet/functions.py", line 252, in read_parquet
    return lf.collect()
  File "/dev/shm/uid-22467-gid-33123/dc9eeca1-seed-nspid4026531836-ns-4026531841-js-pip/polars/_utils/deprecation.py", line 88, in wrapper
    return function(*args, **kwargs)
  File "/dev/shm/uid-22467-gid-33123/dc9eeca1-seed-nspid4026531836-ns-4026531841-js-pip/polars/lazyframe/frame.py", line 2188, in collect
    return wrap_df(ldf.collect(engine, callback))
polars.exceptions.ComputeError: parquet: Not yet supported: Decoding FixedLenByteArray(2) "ByteStreamSplit"-encoded optional parquet pages not yet supported

Issue description

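The native Polars Parquet reader does not yet support float16 columns written with BYTE_STREAM_SPLIT encoding (stored as FixedLenByteArray(2), per the error above). I can get around this with use_pyarrow=True, but the failure is discouraging to people I'm trying to win over from Pandas.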
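For context on what the missing decode path involves, here is a minimal pure-Python sketch of BYTE_STREAM_SPLIT applied to 2-byte float16 values, as I understand the Parquet encoding: byte k of every value goes into stream k, and the streams are concatenated. The function names are hypothetical, this is not Polars or pyarrow code, and it ignores page headers and the definition levels that an optional column also carries.

```python
import struct

WIDTH = 2  # float16 is stored as FIXED_LEN_BYTE_ARRAY(2), little-endian


def byte_stream_split_encode(values):
    """Hypothetical helper: scatter byte k of each float16 into stream k."""
    raw = [struct.pack("<e", v) for v in values]  # 2 bytes per value
    # Stream 0 holds every value's first byte, stream 1 every second byte.
    return b"".join(bytes(r[k] for r in raw) for k in range(WIDTH))


def byte_stream_split_decode(page):
    """Hypothetical helper: gather byte i from each stream to rebuild value i."""
    n = len(page) // WIDTH  # number of values in the page
    streams = [page[k * n:(k + 1) * n] for k in range(WIDTH)]
    return [
        struct.unpack("<e", bytes(s[i] for s in streams))[0]
        for i in range(n)
    ]


values = [1.0, -2.5, 0.5]  # all exactly representable in float16
page = byte_stream_split_encode(values)
decoded = byte_stream_split_decode(page)
print(page.hex(), decoded)
```

The decode side is only a transpose of the byte streams followed by a reinterpretation as half-precision floats, which is why I'd hope this case can reuse whatever the reader already does for float32/float64 with the same encoding (that part is a guess on my side, I haven't read the reader code).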

Expected behavior

pl.read_parquet('foo.parquet') reads the file and returns the data without raising an error.

Installed versions

>>> pl.show_versions()
--------Version info---------
Polars:              1.25.0
Index type:          UInt32
Platform:            Linux-6.1.128-1.el8.jane1.x86_64-x86_64-with-glibc2.28
Python:              3.10.13 (main, Jun  2 2024, 23:07:15) [GCC 13.1.1 20230614 (Red Hat 13.1.1-4)]
LTS CPU:             False

----Optional dependencies----
Azure CLI            <not installed>
adbc_driver_manager  <not installed>
altair               5.5.0
azure.identity       <not installed>
boto3                1.35.30
cloudpickle          3.0.0
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               2024.10.0
gevent               <not installed>
google.auth          2.35.0
great_tables         0.10.0
matplotlib           3.9.2
numpy                1.26.3
openpyxl             3.1.5
pandas               1.5.3
polars_cloud         <not installed>
pyarrow              16.1.0.1+jane.el8
pydantic             2.9.2
pyiceberg            <not installed>
sqlalchemy           1.4.54
torch                2.5.0.3+jane
xlsx2csv             <not installed>
xlsxwriter           3.2.0

mwlon added the bug, needs triage, and python labels Mar 17, 2025
alexander-beedie added the A-io-parquet label Mar 18, 2025