Remember best known statistics for TableHandle on engine side #6998

Closed
findepi opened this issue Feb 23, 2021 · 2 comments · Fixed by #7913
Labels
enhancement New feature or request

Comments

@findepi
Member

findepi commented Feb 23, 2021

As described in #844, predicate pushdown may degrade the quality of statistics provided by a connector.
If the connector cannot take the pushed-down filter into account when computing statistics, and the engine no longer has a Filter node, the statistics will not reflect the effect of filtering.

This can become a bigger issue as we progress with #18 and, in particular, #4249, #6613 and #6620.

This issue proposes that the engine keep track of pre-pushdown statistics on the TableHandle / TableScanNode and use them when the connector can no longer produce statistics.
By "pre-pushdown" I mean statistics derived from the TableScan plus the operation being pushed down (e.g. Filter, Aggregation, Join, TopN), calculated on the engine side.
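A minimal sketch of the proposed fallback, using heavily simplified, hypothetical types: `Stats`, `Connector`, `TableScan` and `statsFor` are illustrative names only, not Trino's `TableHandle`, `TableScanNode` or statistics classes. The scan carries the engine-side estimate computed before pushdown, and that estimate is used only when the connector returns unknown statistics.

```java
import java.util.Optional;

// Hypothetical, simplified model; NOT Trino's TableHandle, TableScanNode or stats classes.
public class PrePushdownStatsSketch {
    // Simplified statistics: only an estimated row count (NaN means unknown).
    record Stats(double rowCount) {
        static final Stats UNKNOWN = new Stats(Double.NaN);

        boolean isUnknown() {
            return Double.isNaN(rowCount);
        }
    }

    // Stand-in for a connector that may or may not return statistics for a handle.
    interface Connector {
        Stats getTableStatistics(String tableHandle);
    }

    // Stand-in for a table scan after pushdown: the new handle, plus the statistics the
    // engine computed for TableScan + pushed-down operation before the pushdown happened.
    record TableScan(String tableHandle, Optional<Stats> prePushdownStats) {}

    // Prefer the connector's statistics; fall back to the remembered engine-side estimate.
    static Stats statsFor(TableScan scan, Connector connector) {
        Stats fromConnector = connector.getTableStatistics(scan.tableHandle());
        if (!fromConnector.isUnknown()) {
            return fromConnector;
        }
        return scan.prePushdownStats().orElse(Stats.UNKNOWN);
    }

    public static void main(String[] args) {
        // Connector that cannot produce statistics for the post-pushdown handle.
        Connector blind = handle -> Stats.UNKNOWN;
        // Engine estimated 1000 rows for Filter(TableScan) before pushing the filter down.
        TableScan scan = new TableScan("orders_filtered", Optional.of(new Stats(1000)));
        System.out.println(statsFor(scan, blind).rowCount()); // prints 1000.0
    }
}
```

Connector statistics still take precedence here, on the assumption that they are at least as accurate as the engine's pre-pushdown estimate whenever the connector can produce them.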

cc @rzeyde-varada @losipiuk @sopel39 @martint @kokosing

@findepi findepi added the enhancement New feature or request label Feb 23, 2021
@findepi
Member Author

findepi commented Feb 23, 2021

The computation would need to happen after a successful pushdown (eagerly or lazily; to be defined), based on the plan nodes as they were before the pushdown.
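One way the lazy option could look, as a sketch only: the pre-pushdown computation is captured in a memoizing `Supplier`, so it runs at most once and only if someone asks for the statistics. `LazyPrePushdownStats` and `memoize` are hypothetical names, and the lambda returning a constant stands in for running the stats calculator over the pre-pushdown plan nodes.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch: computing pre-pushdown statistics lazily and at most once.
public class LazyPrePushdownStats {
    // Wraps a supplier so the (possibly expensive) computation runs only on first use.
    static <T> Supplier<T> memoize(Supplier<T> delegate) {
        return new Supplier<T>() {
            private boolean computed;
            private T value;

            @Override
            public synchronized T get() {
                if (!computed) {
                    value = delegate.get();
                    computed = true;
                }
                return value;
            }
        };
    }

    public static void main(String[] args) {
        AtomicInteger computations = new AtomicInteger();
        // Stand-in for "run the stats calculator over the plan nodes captured before pushdown".
        Supplier<Double> prePushdownRowCount = memoize(() -> {
            computations.incrementAndGet();
            return 1000.0;
        });
        // Pushdown succeeded, but nothing has been computed yet.
        System.out.println(computations.get()); // prints 0
        prePushdownRowCount.get();
        prePushdownRowCount.get();
        System.out.println(computations.get()); // prints 1
    }
}
```

The eager variant would simply call `get()` right after the pushdown succeeds; the lazy one presumably saves work when the statistics are never requested, at the cost of keeping the pre-pushdown plan nodes reachable until then.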

@sopel39
Member

sopel39 commented Feb 25, 2021

This does not fully solve #844, as a connector might support predicate pushdown but not support the predicate when computing statistics.
For #844, I think the engine should first try to compute stats with the predicate applied and, if that returns empty statistics, compute stats without any predicate and then apply the predicate using the stats calculator.
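The two-step fallback described above can be sketched as follows. All types here are hypothetical simplifications, not Trino APIs: `Connector.rowCount` returning `Optional.empty()` stands in for "the connector returned empty statistics", and multiplying by a fixed `selectivity` stands in for the engine's stats calculator applying the predicate.

```java
import java.util.Optional;

// Hypothetical sketch of the fallback proposed for #844. None of these types are Trino APIs.
public class StatsFallbackSketch {
    // Simplified predicate: the selectivity the engine-side stats calculator would estimate.
    record FilterPredicate(String text, double selectivity) {}

    // Stand-in for a connector; Optional.empty() means "cannot compute stats for this request".
    interface Connector {
        Optional<Double> rowCount(Optional<FilterPredicate> predicate);
    }

    static double estimateRowCount(Connector connector, FilterPredicate predicate) {
        // 1. Ask the connector for statistics with the predicate applied.
        Optional<Double> withPredicate = connector.rowCount(Optional.of(predicate));
        if (withPredicate.isPresent()) {
            return withPredicate.get();
        }
        // 2. Otherwise ask for unfiltered statistics and apply the predicate on the engine side.
        return connector.rowCount(Optional.empty())
                .map(rows -> rows * predicate.selectivity())
                .orElse(Double.NaN); // still unknown
    }

    public static void main(String[] args) {
        // Connector that supports pushdown but cannot use predicates when computing stats.
        Connector c = p -> p.isPresent() ? Optional.<Double>empty() : Optional.of(10_000.0);
        System.out.println(estimateRowCount(c, new FilterPredicate("status = 'F'", 0.25))); // prints 2500.0
    }
}
```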

Anyway, I don't think io.trino.spi.connector.ConnectorMetadata#getTableStatistics is used (or implemented) correctly (e.g. see io.trino.plugin.hive.HiveMetadata#getTableStatistics).

2 participants