Schema merge failed since switch to Datafusion if a field is a list of structs #3339
Comments
^ To clarify, if the schema does not contain a list field, the merge works as expected. EDIT: If I switch the |
@liamphmurphy yeah, this is not something we can fix in delta-rs; it's an unsupported cast in DataFusion. Can you please open an issue upstream at https://github.com/apache/datafusion? cc @alamb
Error message:
Upstream bug report: apache/datafusion#15338
The issue started with the release Here is the minimal code to reproduce. The new column can be added at the 1st position or in the middle - it will fail the same.

from deltalake import write_deltalake
import pyarrow as pa


def write_table_v1(table_path):
    schema_v1 = pa.schema([
        pa.field(
            "c1",
            pa.struct([
                pa.field("c2", pa.string()),
                pa.field("c3", pa.string()),
            ])
        )
    ])
    data = [{"c1": {"c2": "v2", "c3": "v3"}}]
    table = pa.Table.from_pylist(data, schema_v1)
    write_deltalake(
        table_or_uri=table_path,
        data=table,
        schema=schema_v1,
        mode="append",
        schema_mode="merge",
        engine="rust",
    )


def write_table_v2(table_path, new_field_type):
    schema_v2 = pa.schema([
        pa.field(
            "c1",
            pa.struct([
                pa.field("c2", pa.string()),
                pa.field("new_field", new_field_type),
                pa.field("c3", pa.string()),
            ])
        )
    ])
    data = [
        {
            "c1": {
                "c2": "v2",
                "new_field": None,
                "c3": "v3",
            }
        }
    ]
    table = pa.Table.from_pylist(data, schema_v2)
    write_deltalake(
        table_or_uri=table_path,
        data=table,
        schema=schema_v2,
        mode="append",
        schema_mode="merge",
        engine="rust",
    )


write_table_v1("table1")
for field_type in (pa.bool_(), pa.int64(), pa.list_(pa.int64())):
    try:
        write_table_v2("table1", field_type)
    except Exception as e:
        print(e)

Output
Environment
Delta-rs version: v0.25.4 (see below for specifics)
Binding: Python, rust engine
Environment:
Local, S3
Bug
What happened:
Since the adoption of DataFusion, delta-rs appears to struggle with schema merges if the originating table schema contains a list of structs (a PyArrow list, to be exact).
What you expected to happen:
Adding a non-list field to a schema that already contains a list-of-structs field should merge successfully, as it did previously.
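To make the expected outcome concrete, here is a toy model of a field-union merge in plain Python. It is only a sketch of the semantics I would expect from `schema_mode="merge"`, not delta-rs or DataFusion code: the type encoding, the helper names `merge_types` and `merge_fields`, and the column names `items` and `note` are all hypothetical, and the rule that new fields are appended after existing ones is an assumption.

```python
# Toy model of a recursive schema merge; NOT delta-rs or DataFusion code.
# Types are modelled as: a string for primitives, ("struct", {name: type})
# for structs, and ("list", element_type) for lists.


def merge_types(old, new):
    """Merge two toy types, unioning struct fields recursively."""
    if old == new:
        return old
    if isinstance(old, tuple) and isinstance(new, tuple) and old[0] == new[0]:
        if old[0] == "struct":
            return ("struct", merge_fields(old[1], new[1]))
        if old[0] == "list":
            return ("list", merge_types(old[1], new[1]))
    raise TypeError(f"cannot merge {old!r} with {new!r}")


def merge_fields(old, new):
    """Keep existing fields in order, then append fields only present in new."""
    merged = {}
    for name, old_type in old.items():
        merged[name] = merge_types(old_type, new[name]) if name in new else old_type
    for name, new_type in new.items():
        merged.setdefault(name, new_type)
    return merged


# Existing table schema: a single list-of-structs column (hypothetical names).
table_schema = {
    "items": ("list", ("struct", {"id": "int64", "name": "string"})),
}

# Incoming data keeps that column unchanged and adds a plain, non-list column.
incoming_schema = {
    "items": ("list", ("struct", {"id": "int64", "name": "string"})),
    "note": "string",
}

merged_schema = merge_fields(table_schema, incoming_schema)
print(merged_schema)
```

Under this model the list-of-structs column passes through untouched and the new column is simply appended, which is the result I expected from the real merge.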
How to reproduce it:
On v0.25.4, run the following Python code:
More details:
The above code works as expected on the last version I was using, v0.19.2.