Add cautionary note about performance in get_row_count method #998
@matthewkrausse Do you think it might be worth changing this method to use the
That's a good point. I'm not sure how much of a concern the computational resources of running a big COUNT() would be. Maybe make COUNT() the default, and add an extra boolean flag for something like Maybe that's not something that would actually be helpful to the way people are using this method in production code? Maybe we could keep it simple and just always use COUNT.
Yeah, I think it's useful when someone is just looking to see if a table has 0 records, 10 records or 10 million records. But I don't feel like anyone is doing that programmatically. They are likely just using the BQ Studio.
So I'm happy to merge this as is, but another approach would be to create another method like
Let's go ahead and merge this as is.
LGTM
This pull request adds a cautionary note about the performance of the get_row_count method in the BigQuery materialization. The note warns that using SELECT COUNT(*) can be expensive for large tables, especially those with many columns, as BigQuery scans all table data to perform the count. The note aims to inform developers about the potential performance impact and encourage them to consider alternative approaches when dealing with large tables.
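For reference, the boolean-flag idea raised in the conversation could look roughly like the sketch below. This is not the code in this PR: the `use_metadata` parameter name and the function signature are hypothetical, and `client` is only assumed to behave like a `google.cloud.bigquery.Client` (its `query(...).result()` and `get_table(...).num_rows` are the only calls used). Note that `num_rows` comes from table metadata, so it may be unavailable for some table types and may not reflect very recently streamed rows.

```python
from typing import Any


def get_row_count(client: Any, table_name: str, use_metadata: bool = False) -> int:
    """Return the number of rows in `table_name`.

    Assumption: `client` behaves like google.cloud.bigquery.Client.
    `use_metadata` is a hypothetical flag, not part of the merged change.
    """
    if use_metadata:
        # Read the row count from table metadata; no query job is run.
        return client.get_table(table_name).num_rows

    # Default path: run an explicit COUNT(*) query against the table.
    query = f"SELECT COUNT(*) AS row_count FROM `{table_name}`"
    rows = client.query(query).result()
    return next(iter(rows))["row_count"]
```

Keeping COUNT(*) as the default preserves the current behavior, while callers who only need a rough size check (0, 10, or 10 million rows) could opt into the metadata lookup.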