Improve revised integration tests for local use by devs #17838
The goal of this work is to make running the revised ITs locally easier for devs without impacting the robustness of the ITs running in the GitHub Actions CI pipeline.
integration-tests-ex/cases/cluster/MultiStageQueryWithMM/docker-compose.py
@@ -74,6 +74,10 @@ public class ITSecurityBasicQuery
  public static final String USER_1_PASSWORD = "password1";
  private static final String EXPORT_TASK = "/indexer/export_task.json";

  // Time in ms to sleep after updating role permissions in each test. This intends to give the
  // underlying test cluster enough time to sync permissions and be ready when test execution starts.
  private static final int SYNC_SLEEP = 10000;
This will total 50s, but I guess if someone is bothered that these tests are slow, someone will take a look at that...
I personally like slow tests better than flaky ones :)
  def define_master_service(self, name, base) -> dict:
      '''
      Defines a "master" service: one which depends on the metadata service.
      '''
      service = self.define_druid_service(name, base)
      self.add_depends(service, [ZOO_KEEPER, METADATA])
      dep_dict = { ZOO_KEEPER: {'condition': 'service_started'}, METADATA: {'condition': 'service_healthy'}}
I feel like the original call sites were communicating the intention better; it also feels like `service_healthy` and `service_started` are abstraction leakage.
It would be nice to get the full condition differently. I feel like that has more to do with what the service wants to depend on, so the condition parts like `service_started` could be in a map or something, and `add_depends` could look that up. That would also leave all the call sites as is.
Having the startup-condition setup for dependency services use a centralized lookup with a sane default simplifies individual IT cases and will drive consistency across all existing and future IT cases.
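A minimal sketch of the centralized lookup discussed in this thread. The names `DEPENDENCY_CONDITIONS` and `DEFAULT_CONDITION`, the service-name values, and the standalone `add_depends` function are assumptions for illustration; the real `template.py` may be structured differently. Call sites keep passing a plain list of service names, and the condition for each dependency is resolved in one place.

```python
# Hypothetical service-name constants; the real template defines its own.
ZOO_KEEPER = 'zookeeper'
METADATA = 'metadata'

# Assumed default: only wait for the dependency's container to start.
DEFAULT_CONDITION = 'service_started'

# Central lookup for dependencies that need a stricter condition.
DEPENDENCY_CONDITIONS = {
    METADATA: 'service_healthy',
}

def add_depends(service: dict, items: list) -> None:
    '''
    Adds long-form depends_on entries to a service, resolving each
    dependency's startup condition from the central lookup.
    '''
    if not items:
        return
    depends = service.setdefault('depends_on', {})
    for item in items:
        condition = DEPENDENCY_CONDITIONS.get(item, DEFAULT_CONDITION)
        depends[item] = {'condition': condition}

# Call sites stay as they were: a service plus a list of names.
service = {}
add_depends(service, [ZOO_KEEPER, METADATA])
print(service)
```

With this shape, changing how a dependency is awaited means editing one map entry rather than every call site.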
looks great! thank you for the updates!
Marking as draft so this doesn't get merged over the weekend. I think I found an issue that could impact CI stability. Locally I've been hammering this docker-compose with ups and downs to test and noticed intermittent bad behavior where the https://github.com/apache/druid/blob/master/integration-tests-ex/cases/cluster.sh#L132 we are removing the
From the official mysql docker image docs:
Long story short, we can either have a proper seed file available even though it isn't doing anything, or we can just forget about it and nuke the extra mysql volume altogether.
As for why this now causes an issue even though it has been around for a long time: we were never waiting on container health, so docker-compose never cared that the container restarted. Now it seems to sometimes notice and blow up. It is inconsistent locally, maybe 10% of the time with the current code, but enough to cause CI nightmares if it also happens in GitHub Actions. The changes I have tested appear to completely eradicate the problem. I'll push one up shortly after I decide which I like more.
…ql docker entrypoint script
The mysql docker container allows you to provide startup scripts for the database in this entrypoint directory. We were mounting a directory named like a file, and it confuses the docker startup for mysql, forcing a restart on startup. This restart is now problematic since we are telling docker-compose to wait for mysql health before starting services that depend on it. Since we aren't using the startup script mechanism at all, simply removing it seems like the cleanest strategy.
Description
The main goal of this work is to make running the revised ITs locally easier for devs without impacting the robustness of the ITs already running successfully in the GitHub Actions CI pipeline.
Reduced flakiness of cluster startup caused by a race condition in container startup (Druid vs. MySQL)
MySQL has to be up and accepting connections before the coordinator/overlord containers start; otherwise there is a risk of them trying to create tables and indexes before MySQL is healthy. When this happened locally, it led to undesired behavior: services started up, failed to create metadata tables and indexes, and then complained about them not existing for the remainder of the cluster uptime.
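A rough sketch of the wait-for-health idea, written as the Python dict a compose generator might emit. The service names, the `mysqladmin ping` probe, and the timing values are assumptions for illustration and are not taken from this PR. The helper `missing_healthchecks` is also hypothetical; it flags any `service_healthy` dependency that has no healthcheck to wait on.

```python
# Hypothetical service names, used only for this sketch.
METADATA = 'metadata'
COORDINATOR = 'coordinator'

def define_services() -> dict:
    return {
        METADATA: {
            # Assumed health probe and timings; tune for the real image.
            'healthcheck': {
                'test': ['CMD', 'mysqladmin', 'ping', '-h', 'localhost'],
                'interval': '5s',
                'timeout': '5s',
                'retries': 12,
            },
        },
        COORDINATOR: {
            # Do not start until the metadata store reports healthy.
            'depends_on': {
                METADATA: {'condition': 'service_healthy'},
            },
        },
    }

def missing_healthchecks(services: dict) -> list:
    '''
    Returns (service, dependency) pairs where a service waits on
    service_healthy but the dependency defines no healthcheck.
    '''
    problems = []
    for name, service in services.items():
        for dep, spec in service.get('depends_on', {}).items():
            waits_for_health = spec.get('condition') == 'service_healthy'
            if waits_for_health and 'healthcheck' not in services.get(dep, {}):
                problems.append((name, dep))
    return problems

services = define_services()
print(missing_healthchecks(services))
```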
To accomplish this, I made two main changes:
Reduced flakiness of the Security IT suite
Locally I was still seeing flakiness that seemed to be caused by authz policies not syncing in time for each test. I saw that one test had its "policy sync sleep" increased to 4 seconds with the others left at 1. I centralized the sleep time in a static variable so we don't run into drift like this again. I also upped the sleep to 10 seconds, but am not wed to that number if we want to try to squeeze it down some. After this change, I saw much more consistent success running the Security suite locally.
Improved UX for folks using machines with Apple Silicon
In the past the docker-compose template had a commented-out `platform` key that told Apple Silicon users to uncomment it. This would inevitably lead to folks using Apple chips becoming fatigued with remembering this bit of information. Instead, I modified the template.py script that generates the docker-compose file to key off of the host platform and set the proper override if we are running on `arm`.
Migrated Query and MultiStageQueryWithMM
Query and MultiStageQueryWithMM were using their own static docker-compose files. This goes against the pattern of generating the compose file dynamically that the other IT categories follow. I suspect these tests were never fully migrated from the "old" IT pattern to the "revised" pattern. After migrating them to the generated compose file pattern, I had no issues locally. Interested to see them run in CI.
These categories rely on Kafka, which pushed me to add Kafka as a core dependency generated in the compose file by `template.py`. I updated the categories that don't need Kafka to exclude it from their compose files going forward.
Release note
N/A as this is test facing only.
For Druid Devs: Improve the user experience running integration-tests-ex on your local machine.
Key changed/added classes in this PR
integration-tests-ex/cases/cluster/template.py
integration-tests-ex/cases/src/test/java/org/apache/druid/testsEx/auth/ITSecurityBasicQuery.java
integration-tests-ex/cases/cluster/Common/dependencies.yaml
This PR has:
test Druid cluster
local dev env