As described in #844, predicate pushdown may degrade the quality of statistics provided by a connector.
If the connector cannot account for the pushed-down Filter in the statistics it returns, and the engine no longer has a Filter node, the statistics will not reflect the effect of filtering.
This can become a bigger issue as we progress with #18 and #4249, #6613 and #6620 in particular.
Here, I propose that the engine keep track of pre-pushdown statistics on TableHandle / TableScanNode and use them when the connector can no longer produce statistics.
By "pre-pushdown" I mean statistics derived from the TableScan plus the operation being pushed down (e.g. Filter, Aggregation, Join, TopN, etc.), calculated on the engine side.
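A minimal sketch of the idea, assuming simplified hypothetical types (`Stats`, `ScanNode`, `Connector`, `pushDownFilter`, `statsFor`) rather than Trino's actual SPI or planner classes. Before pushdown, the engine estimates Filter(TableScan) itself and stores that estimate on the scan node. After pushdown, it prefers the connector's statistics and falls back to the stored estimate only when the connector returns nothing. The fixed selectivity is also an assumption; a real stats calculator would derive it from column statistics.

```java
import java.util.Optional;

public class PrePushdownStatsSketch {
    // Hypothetical, simplified engine-side estimate (NaN row count = unknown).
    record Stats(double rowCount) {
        static final Stats UNKNOWN = new Stats(Double.NaN);

        boolean isUnknown() {
            return Double.isNaN(rowCount);
        }
    }

    // Hypothetical scan node: a table handle plus the statistics the engine
    // computed for "TableScan + pushed-down operation" before the pushdown.
    record ScanNode(String tableHandle, Optional<Stats> prePushdownStats) {}

    // Hypothetical connector callback; returns UNKNOWN when it cannot produce
    // statistics for a handle that already contains a pushed-down operation.
    interface Connector {
        Stats getTableStatistics(String tableHandle);
    }

    // Before pushdown: the engine estimates the filter itself (fixed selectivity
    // here, an assumption) and keeps the result on the new scan node.
    static ScanNode pushDownFilter(Connector connector, String handle, double selectivity) {
        Stats base = connector.getTableStatistics(handle);
        Optional<Stats> derived = base.isUnknown()
                ? Optional.empty()
                : Optional.of(new Stats(base.rowCount() * selectivity));
        // The new handle now encodes the filter; the engine-side estimate is retained.
        return new ScanNode(handle + "[filtered]", derived);
    }

    // After pushdown: prefer connector statistics, else use the retained estimate.
    static Stats statsFor(Connector connector, ScanNode scan) {
        Stats fromConnector = connector.getTableStatistics(scan.tableHandle());
        if (!fromConnector.isUnknown()) {
            return fromConnector;
        }
        return scan.prePushdownStats().orElse(Stats.UNKNOWN);
    }

    public static void main(String[] args) {
        // Knows the base table (1000 rows) but not the effect of a pushed-down filter.
        Connector connector = handle -> handle.equals("orders") ? new Stats(1000) : Stats.UNKNOWN;
        ScanNode scan = pushDownFilter(connector, "orders", 0.25);
        System.out.println(statsFor(connector, scan).rowCount()); // prints 250.0
    }
}
```

In this sketch the connector's answer still wins whenever it has one, so a connector that does understand the pushed-down operation is not overridden by the coarser engine-side estimate.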
This does not exactly solve #844, as a connector might support predicate pushdown but not support applying the pushed-down predicate when computing statistics.
For #844, I think the engine should first try to compute stats with the predicate applied and, if that returns empty statistics, compute stats without any predicate and apply the predicate with the stats calculator.
Anyway, I don't think io.trino.spi.connector.ConnectorMetadata#getTableStatistics is used (and implemented) correctly (e.g. see io.trino.plugin.hive.HiveMetadata#getTableStatistics).
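The fallback order suggested for #844 can be sketched as follows. `StatsSource`, `estimateRowCount`, and the 0.5 selectivity in `main` are hypothetical stand-ins, not Trino APIs; the engine-side stats calculator is reduced to a function from the unfiltered row count to a filtered one.

```java
import java.util.Optional;
import java.util.function.DoubleUnaryOperator;

public class StatsFallbackSketch {
    // Hypothetical stats lookup: empty means "no statistics", e.g. because the
    // connector does not take the predicate into account for stats computation.
    interface StatsSource {
        Optional<Double> rowCount(String table, Optional<String> predicate);
    }

    static Optional<Double> estimateRowCount(
            StatsSource source,
            String table,
            String predicate,
            DoubleUnaryOperator engineFilterEstimate) {
        // 1. Ask for statistics with the predicate applied by the connector.
        Optional<Double> withPredicate = source.rowCount(table, Optional.of(predicate));
        if (withPredicate.isPresent()) {
            return withPredicate;
        }
        // 2. Otherwise, fetch unfiltered statistics and let the engine-side
        //    calculator apply the predicate.
        return source.rowCount(table, Optional.empty())
                .map(engineFilterEstimate::applyAsDouble);
    }

    public static void main(String[] args) {
        // A connector that returns stats only when no predicate is requested.
        StatsSource source = (table, predicate) ->
                predicate.isPresent() ? Optional.<Double>empty() : Optional.of(1000.0);
        Optional<Double> rows = estimateRowCount(source, "orders", "status = 'F'", r -> r * 0.5);
        System.out.println(rows.orElseThrow()); // prints 500.0
    }
}
```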
cc @rzeyde-varada @losipiuk @sopel39 @martint @kokosing