-
-
Notifications
You must be signed in to change notification settings - Fork 9.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
URL has an invalid label. #3683
Comments
So this is arguably an increase in strictness thanks to our new idna library, but underscores are not allowed in hostnames. See kjd/idna#17 for more. While this previously worked, I suspect that really it shouldn't have. Underscores are perfectly allowed in DNS, which is why this never encountered a problem prior to this, but for hostnames they're entirely forbidden. I'm inclined to say that this is within the bounds of things that are allowed to stop working in a minor release, as it's simply outside the realm of what RFCs allow. |
To be clear, your suitable workarounds are either to downgrade Requests or to change your hostnames to use hyphens instead of underscores. I'm proposing not to consider this a Requests bug, so you'd either be stuck on 2.11.1 for the rest of eternity or you'll need to change your hostnames to use hyphens. |
Ok, thanks! I will change my urls. |
Thanks for the report! ✨ |
We encountered the same issue today and comes to the same conclusion that they are not officially supported. In our situation, we tried to use the devpi CLI against a docker image with the hostname We downgraded requests to bypass the problem. While underscores are not officially support it seems that both Firefox (https://bugzilla.mozilla.org/show_bug.cgi?id=479520#c45) and curl seems to works well with underscores:
I didn't saw any option in the idna library to accept these characters, so apart from avoiding requests>2.12.0 when having underscore hostnames, we have no choices, right? |
I'm afraid that's correct. Other workarounds include adding hosts or DNS aliases for hostnames with underscores in them. |
I have updated Unfortunately, it's not easy to remove Maybe you could add an option to disable this validation? |
@kottenator we don't add options for things like this. |
Just a quick question @sigmavirus24 / @Lukasa Services like Amazon's SES, and Sengrid, use CNAMES with underscores in them for DKIM authentication. http://docs.aws.amazon.com/ses/latest/DeveloperGuide/easy-dkim-dns-records.html As you've mentioned before, these are fine for DNS, but not for Hostnames. When you say Hostnames, I'm assuming you're talking about DNS records that are used for browsers, or similar tools, to access data, not for records that are used for stuff like the aforementioned DKIM records? |
@TetraEtc Exactly so. Underscores in DNS names are reasonably common (e.g. SRV records have mandatory underscores in them), but for hostnames (that is, anything in the host part of a URL) they are not allowed. |
Well, this is a pity, as e.g. gitlab-ci replaces / with __ to build the hostname from the docker image name, so my integration tests broke out-of-the-blue. |
@mfriedenhagen the GitLab team is very receptive to bug reports and I consider this to be a problem on their end. Regardless, I believe the solution being worked on in #3695 will help you |
Unfortunately, I don't think #3695 will address this problem. #3695 is solely for bypassing hostnames that were IDNA encoded before the request was made. I haven't seen any comment by @Lukasa about changing how we are now handling underscores in hostnames. @sigmavirus24's advice to raise a bug report with GitLab is the way to go. |
Yup, there's no plan to tolerate underscores: they're not valid in hostnames, and people should avoid using them. |
Hm, so a lot of other network utilities are just tolerant ( |
Understand the underscore, not valid. import requests
url = "http://[2003:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:ed3d]:12900/system"
response = requests.get(url, auth=('admin', 'foobar'))
print response.text Got the same exception with requests==2.12.1 (and >= 2.12) Traceback (most recent call last):
File "test_requests.py", line 6, in <module>
response = requests.get(url, auth=authentication)
File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 70, in get
return request('get', url, params=params, **kwargs)
File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 56, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 474, in request
prep = self.prepare_request(req)
File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 407, in prepare_request
hooks=merge_hooks(request.hooks, self.hooks),
File "/usr/local/lib/python2.7/site-packages/requests/models.py", line 302, in prepare
self.prepare_url(url, params)
File "/usr/local/lib/python2.7/site-packages/requests/models.py", line 372, in prepare_url
raise InvalidURL('URL has an invalid label.')
requests.exceptions.InvalidURL: URL has an invalid label.
|
@mfriedenhagen None of those tools tolerate IDNs. In this case it's a matter of picking what you want: do users want us to correctly tolerate internationalized domain names, or do they want to do their own IDN handling? I'm prepared to believe that most users do not want to do their own IDN handling. @rsaikali So that seems to be a bigger issue. And at this point we're in a bit of a bind because by default there aren't good tools to spot this specific problem (namely, we have an IP address here, not a hostname, so skip IDNA), which means we'd have to vendor the ipaddress module to get that to work. Alternatively, we could use some heuristics to detect possible IP addresses in hostnames and skip IDNA encoding for those as well. I think that's definitely a problem though. An alternative solution to this is to say that if the |
Do you want me to open a new ticket for that ? "InvalidURL: URL has an invalid label. (with IPv6)" |
Nope. =) |
Just my 1c: I am all for standards! But I am also pragmatic -- in my case with this upgrade of Actually! that restriction (no
which refers to "domain names" (once again which might be hostnames or some other DNS records). So as far as I see it, it is actually CC: @chaselgrove |
@yarikoptic It might have helped to read the discussion more carefully: the core devs are already in agreement with you, and we're taking steps to resolve the problem (see #3695). You could have saved yourself a bit of time by checking that first. However, addressing the substantive portion of your complaint:
A fuller read of that section will reveal to you that the only ambiguity is between However, we aren't dealing with general URIs. We only support URIs that contain either IP address literals (in which case we do no name resolution) or hostnames (in which case we do an A lookup). These are not expected to be generalised names. However, we need to follow the furrow that browsers have ploughed for us, and they have no interest in enforcing restrictions on URL formatting. This is why we're working on a fix. To all other readers of this issue: please do not add your 2¢ or "me too"s. We already acknowledge this to be an issue. We are working on a fix. Please wait patiently for that fix to come through. |
@Lukasa Thank you very much for the rapid reply!!! very much appreciated. And hopefully now ppl would see your statement in bold (feel free to remove my reply to not shadow it), and not e.g. |
Re reg-name argument, just for my education sake, could you please point to the portion of some relevant RFC or any other document stating that URL, as a subset of URIs must use hostname and not a domain name? (as far as I see it, |
|
still confused (e.g. on either you are saying that |
Hello, I tried understanding rfc 3986 (URIs) and 3987 (IRIs) As outlined in http://stackoverflow.com/questions/1856785/characters-allowed-in-a-url as well, the two rfcs seem to allow the usage of underscores in hostnames if I am not mistaken. |
Ok, so to be clear: I am talking specifically about hostnames. Not URLs: those are a more general construct. Many valid URLs are not valid HTTP urls, and many of those are not accepted by Requests. However, and I cannot stress this enough, we are already fixing the problem. There is no argument to be won here. So rather than fill up my inbox with mail and force me to reply to you all, if you're interested in seeing this issue fixed I highly recommend you wait patiently. To reduce the noise in this issue I'm locking it to contributors only until such time as we land the fix, at which point I will allow third party comments again. |
@mfriedenhagen to attempt to help with your confusion, you should be looking at the IDNA standard documents (which you can discover from https://tools.ietf.org/html/rfc5890) |
Should be resolved in v2.12.2. Thanks for your time all. |
requests v2.12.0
The text was updated successfully, but these errors were encountered: