Push auth fails 5 to 10 minutes after pull auth (with Workload Identity in GCP) #5852
Note that there are two timeouts here. When the registry asks for an authentication token, buildx contacts the auth service with your credentials to obtain one. This token carries an expiration time, and buildkit/buildx will refresh it and get a new one when the previous token gets close to expiration. This is in https://github.com/moby/buildkit/blob/master/util/resolver/authorizer.go#L330 . The other case is that, for some credential helpers, the credentials themselves are not static and will expire (there are multiple levels of credentials). Because credential helpers have significant overhead, buildx caches them. When the token expires, the build will try to generate a new one but may receive an error because the cached credentials no longer work. This is the 10-minute cache expiration time you are pointing to. You can try a custom build that changes the timeout to 5 minutes. If that works, we can consider making it configurable or lowering the default. If the error returned from the token endpoint is typed, we could also consider a fix where we retry with uncached credentials. Note that this is used on the client side, so if you are using
Thanks a lot @tonistiigi for the quick reply. In the meantime, my colleague @nobbs has modified buildx so that the relevant place ultimately uses a 5-minute timeout (a few seconds less would be even better, as I indicated above): nobbs/buildx@444aa01. Should I create a PR with the changes proposed in the issue description?
Yes, lowering the default to ~5 min seems OK, but make the change in buildkit instead of overriding it in buildx, so that all clients get a better default. FYI @cpuguy83
fixes moby#5852 Signed-off-by: Michael Korn <[email protected]>
Bug description
Using Google Artifact Registry and Workload Identity for authentication:
Image pushes fail authentication if the push happens between 5 and 10 minutes after the cache pull, with the following error:
The error seems to come from authprovider.go#L140, and the issue could stem from authprovider.go#L62.
I tried to change the code to:
However, these changes (I also tried changing the log output) are not reflected after I build the buildkit image and use it with buildx.
Reproduction
Watch the [auth] lines in the build log.
With --cache-from, the log shows: [auth] .../test-nginx-image:pull token for europe-west3-docker.pkg.dev
With --cache-to or --push, it shows: [auth] .../test-nginx-image:pull,push token for europe-west3-docker.pkg.dev
Sleeping between the two docker buildx build calls: sleep 270 works, sleep 300 fails, and sleep 600 (and much more) works fine again. --cache-from can stay in the second call; because everything is cached, no [auth] for the remote cache is needed (nor does one appear in the log) during the second run.
Version information