-
Notifications
You must be signed in to change notification settings - Fork 295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes #38269 - fix container-image-name #11336
base: master
Are you sure you want to change the base?
Conversation
In this case, I think we can resort to the "by ID" method that we have for ambiguous organization labels. We had to implement it since containers can't have uppercase letters. "Default_Org" and "default_org" would clash, for example. This method is documented at the bottom of the section here https://docs.theforeman.org/nightly/Managing_Content/index-katello.html#Using_Container_Registries_content-management Edit: my comment was related to container push only. However, for normal repositories, we could also rely on the clashing name error to pop up. In that case, users would need to edit their Library lifecycle environment container naming scheme for the org. |
for organization-labels starting and/or einding in '_'
2d71c39
to
8a85a19
Compare
Thanks for the hint @ianballou I added some code to use |
Ah, looks like some container VCRs need rerecording. Are those errors reproducible locally and make sense? |
@ianballou I also get these vcr-failures locally. I am a bit puzzled by them, because I would assume that the Organization labels in the tests should be fine and not fail the Container-Name-Validation 🤔 It might be a problem to have these values in the VCR-cassettes, because URIs will evaluate to something like |
The VCRs can be decoded to show the expected params, which could be used to diff the params being sent in the test. That's usually my last-resort debugging tool.
Ah yeah DB IDs sent into VCRs won't work. I think the best we can do for the 'by-ID push method' is to test up until the point that something is sent to Pulp. |
I see what you mean now though, we need to look into why the ID type of container path is getting used in the tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've learned some things:
@@ -8,7 +8,7 @@ def validate_each(record, attribute, value) | |||
end | |||
|
|||
def self.validate_name(name) | |||
if name.empty? || name.length > 255 || !/\A([a-z0-9]+[a-z0-9\-_.]*)+(\/[a-z0-9]+[a-z0-9\-_.]*)*\z/.match?(name) | |||
if name.empty? || name.length > 255 || !/\A([a-z0-9]+([\-_.]*[a-z0-9]+)*)+(\/[a-z0-9]+([\-_.]*[a-z0-9]+)*)*\z/.match?(name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the official regex from the OCI spec:
[a-z0-9]+([._-][a-z0-9]+)*(/[a-z0-9]+([._-][a-z0-9]+)*)*
https://specs.opencontainers.org/distribution-spec/?v=v1.0.0#DISTRIBUTION-SPEC-27
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be a small difference to how pulp_container defines it, they do a split('/')
first and check each element separately:
[a-z0-9]+(?:(?:[._]|__|[-]*)[a-z0-9])*
The official regex would suggest that .
, -
, and _
must be followed by alphanumerical character so a__b
or c---d
would not be allowed. While pulp seems to explicitly allow them 😵
If we choose the official Regexp we should make sure that pulp uses that one as well.
else | ||
items = [repository.organization.label, repository.content_view.label, repository.content_view_version.version, repository.product.label, repository.label] | ||
items += [repository.content_view.label, repository.content_view_version.version, repository.product.label, repository.label] | ||
end | ||
Repository.clean_container_name(items.compact.join("/")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see why the VCR tests are failing now. The label needs to have clean_container_name
applied to it in the new ID logic.
In fact, this raises the question if we even need to use this ID method at all. We already seem to have a precedent for cleaning up container names, which could cause them to clash. So, the question is then, what is missing from this clean_container_name
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what is missing is a check for an underscore before a forward slash, like:
weird_organization_/product/repository...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but the question is: what should it do with it?
If it drops the _
before the /
then this might create a new collision. I admit the case is far fetched that someone has Weird Organization
and Weird Organization!
with similar Container Image structure, but it would trigger undefined behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the collision is okay since it's the current norm. If an organization label contained any of the characters that are cleaned up today, it could already collide, which we have handling for. Theoretically, the error will then bubble up to the user that they need to update their name. In this case, it could mean removing the organization name from the lifecycle environment container naming pattern.
It's likely that the error won't immediately tell users what the issue is, but it would also (hopefully) be super rare that a user would have two organizations where the only difference is a special character at the end. And in that case, we have a workaround that they can apply (change the name pattern) once they dig into it.
The error would be Registry name pattern results in duplicate container image names for these repositories: %s.
.
The message could potentially be improved to guide the users to the fix.
The reason I'm thinking hard about the above is that this new logic would probably have users wondering why
their container names are deviating from the normal container naming pattern. It's possible that users have container images today that are being 'cleaned' -- these images would then change to the new ID-pattern (worth a test). Actually -- I think most users' container repos would end up in the ID category since the default organization label has capital letters!
[2] pry(main)> ::Organization.second.label
=> "Default_Organization"
[3] pry(main)> ::Katello::Repository.clean_container_name(::Organization.second.label)
=> "default_organization"
On top of that, since it's a very rare case, it makes me wonder if the regression risk is work the gain.
All that said, it's better for users if they have more flexibility. I think, if we go the ID route, we'll want to have it work as well for content views, repositories, lifecycle environments, and products. All of these records could suffer from the same collision issue.
I will add that this ID pattern could also be implemented by the user in the container naming pattern on the LCE. Perhaps our documentation could be updated to guide them there?
Edit: the naming pattern can't include IDs, but the pattern could be modified to not have the organization in it at least (or provide a custom label)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more thought, even if the orgs are the same, the other parts that make up the name would also have to collide for it to be a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not fix the underlying issue, yet!
The issue being that Organization Labels ending with non-alphanumeric characters will break the container-image-name used for publishing in pulp.
I am hesitant to just strip them, because it may happen that there are two Organizations
ACME Inc.
andACME Inc
. Stripping the_
from the Labels will end in both Orgs having the same container-image base-path.What are the changes introduced in this pull request?
Fixes the Container-Image-Name Validator to detect the same as pulp-container (see https://github.com/pulp/pulp_container/blob/bd2a27e9f581d675bf33232fede2a6466d56aa3d/pulp_container/app/serializers.py#L39).
With this fix it is no longer possible to create docker-type repos in Organizations whose Label ends in
_
. Creation worked before but synchronizing failed, because pulp rejects the container-image-name.I am open for suggestions on how to fix this thoroughly 🤔