Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URL has an invalid label. #3683

Closed
MikeVL opened this issue Nov 15, 2016 · 29 comments
Closed

URL has an invalid label. #3683

MikeVL opened this issue Nov 15, 2016 · 29 comments
Labels

Comments

@MikeVL
Copy link

MikeVL commented Nov 15, 2016

requests v2.12.0

>>> import requests
>>> requests.get('http://vluki_develop_624_solr:8983/solr/vluki')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 474, in request
    prep = self.prepare_request(req)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 407, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 302, in prepare
    self.prepare_url(url, params)
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 372, in prepare_url
    raise InvalidURL('URL has an invalid label.')
requests.exceptions.InvalidURL: URL has an invalid label.
@MikeVL MikeVL changed the title requests.exceptions.InvalidURL: URL has an invalid label. URL has an invalid label. Nov 15, 2016
@Lukasa
Copy link
Member

Lukasa commented Nov 15, 2016

So this is arguably an increase in strictness thanks to our new idna library, but underscores are not allowed in hostnames. See kjd/idna#17 for more. While this previously worked, I suspect that really it shouldn't have. Underscores are perfectly allowed in DNS, which is why this never encountered a problem prior to this, but for hostnames they're entirely forbidden.

I'm inclined to say that this is within the bounds of things that are allowed to stop working in a minor release, as it's simply outside the realm of what RFCs allow.

@Lukasa
Copy link
Member

Lukasa commented Nov 15, 2016

To be clear, your suitable workarounds are either to downgrade Requests or to change your hostnames to use hyphens instead of underscores. I'm proposing not to consider this a Requests bug, so you'd either be stuck on 2.11.1 for the rest of eternity or you'll need to change your hostnames to use hyphens.

@MikeVL
Copy link
Author

MikeVL commented Nov 15, 2016

Ok, thanks! I will change my urls.

@Lukasa
Copy link
Member

Lukasa commented Nov 15, 2016

Thanks for the report! ✨

@Lukasa Lukasa closed this as completed Nov 15, 2016
@Lothiraldan
Copy link

Lothiraldan commented Nov 15, 2016

We encountered the same issue today and comes to the same conclusion that they are not officially supported. In our situation, we tried to use the devpi CLI against a docker image with the hostname infrastructure_pythondevpi_1. The hostname was auto-generated by docker-compose with its own algorithm which is more or less: PWD-NAME_SERVICE-NAME_NUMBER. The issue is well-known but not yet fixed: docker/compose#229.

We downgraded requests to bypass the problem.

While underscores are not officially support it seems that both Firefox (https://bugzilla.mozilla.org/show_bug.cgi?id=479520#c45) and curl seems to works well with underscores:

curl -I http://infrastructure_pythondevpi_1:3141/root/staging/+api
HTTP/1.1 200 OK
Connection: close
Content-Type: application/json
Date: Tue, 15 Nov 2016 16:36:40 GMT

I didn't saw any option in the idna library to accept these characters, so apart from avoiding requests>2.12.0 when having underscore hostnames, we have no choices, right?

@Lukasa
Copy link
Member

Lukasa commented Nov 15, 2016

I'm afraid that's correct. Other workarounds include adding hosts or DNS aliases for hostnames with underscores in them.

@kottenator
Copy link

I have updated requests to 2.12.0 today and I have the same problem.

Unfortunately, it's not easy to remove _ from my hostnames - I have hundreds of them (legacy stuff). And I don't want to downgrade requests and to use an older version forever.

Maybe you could add an option to disable this validation?

@sigmavirus24
Copy link
Contributor

@kottenator we don't add options for things like this.

@StewPoll
Copy link
Contributor

Just a quick question @sigmavirus24 / @Lukasa

Services like Amazon's SES, and Sengrid, use CNAMES with underscores in them for DKIM authentication.

http://docs.aws.amazon.com/ses/latest/DeveloperGuide/easy-dkim-dns-records.html

As you've mentioned before, these are fine for DNS, but not for Hostnames.

When you say Hostnames, I'm assuming you're talking about DNS records that are used for browsers, or similar tools, to access data, not for records that are used for stuff like the aforementioned DKIM records?

@Lukasa
Copy link
Member

Lukasa commented Nov 16, 2016

@TetraEtc Exactly so. Underscores in DNS names are reasonably common (e.g. SRV records have mandatory underscores in them), but for hostnames (that is, anything in the host part of a URL) they are not allowed.

@mfriedenhagen
Copy link

Well, this is a pity, as e.g. gitlab-ci replaces / with __ to build the hostname from the docker image name, so my integration tests broke out-of-the-blue.

@sigmavirus24
Copy link
Contributor

@mfriedenhagen the GitLab team is very receptive to bug reports and I consider this to be a problem on their end. Regardless, I believe the solution being worked on in #3695 will help you

@nateprewitt
Copy link
Member

Unfortunately, I don't think #3695 will address this problem. #3695 is solely for bypassing hostnames that were IDNA encoded before the request was made. I haven't seen any comment by @Lukasa about changing how we are now handling underscores in hostnames. @sigmavirus24's advice to raise a bug report with GitLab is the way to go.

@Lukasa
Copy link
Member

Lukasa commented Nov 17, 2016

Yup, there's no plan to tolerate underscores: they're not valid in hostnames, and people should avoid using them.

@mfriedenhagen
Copy link

Hm, so a lot of other network utilities are just tolerant (ping, curl, wget, python.urllib, python.urllib2, java.net.URL) but requests will not be among them, that is sad.

@rsaikali
Copy link

rsaikali commented Nov 17, 2016

Understand the underscore, not valid.
But what about IPv6 ?

import requests
url = "http://[2003:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:ed3d]:12900/system"
response = requests.get(url, auth=('admin', 'foobar'))
print response.text

Got the same exception with requests==2.12.1 (and >= 2.12)

Traceback (most recent call last):
  File "test_requests.py", line 6, in <module>
    response = requests.get(url, auth=authentication)
  File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 474, in request
    prep = self.prepare_request(req)
  File "/usr/local/lib/python2.7/site-packages/requests/sessions.py", line 407, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/usr/local/lib/python2.7/site-packages/requests/models.py", line 302, in prepare
    self.prepare_url(url, params)
  File "/usr/local/lib/python2.7/site-packages/requests/models.py", line 372, in prepare_url
    raise InvalidURL('URL has an invalid label.')
requests.exceptions.InvalidURL: URL has an invalid label.

@Lukasa
Copy link
Member

Lukasa commented Nov 17, 2016

@mfriedenhagen None of those tools tolerate IDNs. In this case it's a matter of picking what you want: do users want us to correctly tolerate internationalized domain names, or do they want to do their own IDN handling? I'm prepared to believe that most users do not want to do their own IDN handling.

@rsaikali So that seems to be a bigger issue. And at this point we're in a bit of a bind because by default there aren't good tools to spot this specific problem (namely, we have an IP address here, not a hostname, so skip IDNA), which means we'd have to vendor the ipaddress module to get that to work. Alternatively, we could use some heuristics to detect possible IP addresses in hostnames and skip IDNA encoding for those as well.

I think that's definitely a problem though.

An alternative solution to this is to say that if the idna module fails to encode we'll just go ahead and try to encode as ASCII. If that works then we shrug our shoulders and say everything is probably ok, and if it fails we catch that and throw InvalidURL. @sigmavirus24, how does that sound?

@rsaikali
Copy link

Do you want me to open a new ticket for that ? "InvalidURL: URL has an invalid label. (with IPv6)"
This one is currently closed.

@Lukasa Lukasa reopened this Nov 17, 2016
@Lukasa
Copy link
Member

Lukasa commented Nov 17, 2016

Nope. =)

@yarikoptic
Copy link

Just my 1c: I am all for standards! But I am also pragmatic -- in my case with this upgrade of requests I started to experience this behavior, whenever any other tool (browsers, etc) is working just fine. Hostname in my case is not to be immediately renamed and underscore version immediately abandoned since this is a project which was out there for years, referenced in many scientific papers, etc. So what should we do now? we apparently can't simply adhere to standard 100%, we can't simply rename and forward (once again -- url is already spread out), we can't demand downgrades of requests...

Actually! that restriction (no _) applies to hostnames not DNS records (see e.g. http://stackoverflow.com/a/2183140/1265472). Here we are actually talking about URLs and I do not think that those must contain only hostnames! (needless to say that not every URL points to a different host ;) ) See https://tools.ietf.org/html/rfc3986#section-3.2.2 which defines reg-name (not a hostname) and here is relevant description:

 Such a name consists of a sequence of domain labels separated by ".",
   each domain label starting and ending with an alphanumeric character
   and possibly also containing "-" characters.  The rightmost domain
   label of a fully qualified domain name in DNS may be followed by a
   single "." and should be if it is necessary to distinguish between
   the complete domain name and some local domain.

      reg-name    = *( unreserved / pct-encoded / sub-delims )

which refers to "domain names" (once again which might be hostnames or some other DNS records). So as far as I see it, it is actually requests which by using idna to verify that reg-name is a standard compliant hostname, unnecessarily imposes restriction that URLs must contain only hostnames in reg-name.
What do you think? ;)

CC: @chaselgrove

@Lukasa
Copy link
Member

Lukasa commented Nov 21, 2016

@yarikoptic It might have helped to read the discussion more carefully: the core devs are already in agreement with you, and we're taking steps to resolve the problem (see #3695). You could have saved yourself a bit of time by checking that first.

However, addressing the substantive portion of your complaint:

So as far as I see it, it is actually requests which by using idna to verify that reg-name is a standard compliant hostname, unnecessarily imposes restriction that URLs must contain only hostnames in reg-name.

A fuller read of that section will reveal to you that the only ambiguity is between reg-name and IP-literal/IPv4address. Again, as noted above, the Requests team is already in agreement that Requests has inadvertently blocked IP addresses. However, the reg-name argument is a bad one. reg-name is defined with a very general ABNF because, as the RFC states, it is not limited to DNS or any specific form of lookups. The host portion of a general URI may cover quite a lot of name formats.

However, we aren't dealing with general URIs. We only support URIs that contain either IP address literals (in which case we do no name resolution) or hostnames (in which case we do an A lookup). These are not expected to be generalised names.

However, we need to follow the furrow that browsers have ploughed for us, and they have no interest in enforcing restrictions on URL formatting. This is why we're working on a fix.

To all other readers of this issue: please do not add your 2¢ or "me too"s. We already acknowledge this to be an issue. We are working on a fix. Please wait patiently for that fix to come through.

@yarikoptic
Copy link

@Lukasa Thank you very much for the rapid reply!!! very much appreciated. And hopefully now ppl would see your statement in bold (feel free to remove my reply to not shadow it), and not e.g. Yup, there's no plan to tolerate underscores: they're not valid in hostnames, and people should avoid using them. just from few days back. Also if label Propose Close was replaced with Fix pending or something like that -- could help as well. another way is to adjust initial bug report description with pointer to a tentative fix.
Cheers!

@yarikoptic
Copy link

Re reg-name argument, just for my education sake, could you please point to the portion of some relevant RFC or any other document stating that URL, as a subset of URIs must use hostname and not a domain name? (as far as I see it, A DNS record not necessarily refers to a hostname, as far as I see from e.g. http://www.freesoft.org/CIE/RFC/1035/34.htm )

@Lukasa
Copy link
Member

Lukasa commented Nov 21, 2016

A records aren't limited to hostnames: Requests is. Requests supports IP addresses (with no name resolution) or hostnames (with A/AAAA record lookup). We do not support any other method of identifying the authority.

@yarikoptic
Copy link

still confused (e.g. on either you are saying that requests intensionally not compliant with existing standards or limiting its applicability, or you are describing current broken state, or just me misunderstanding something) but that is ok -- I trust you guys to fix it properly and life becoming great again ;) Have a lovely week!

@mfriedenhagen
Copy link

Hello, I tried understanding rfc 3986 (URIs) and 3987 (IRIs) As outlined in http://stackoverflow.com/questions/1856785/characters-allowed-in-a-url as well, the two rfcs seem to allow the usage of underscores in hostnames if I am not mistaken.

@Lukasa
Copy link
Member

Lukasa commented Nov 21, 2016

Ok, so to be clear: I am talking specifically about hostnames. Not URLs: those are a more general construct. Many valid URLs are not valid HTTP urls, and many of those are not accepted by Requests.

However, and I cannot stress this enough, we are already fixing the problem. There is no argument to be won here. So rather than fill up my inbox with mail and force me to reply to you all, if you're interested in seeing this issue fixed I highly recommend you wait patiently.

To reduce the noise in this issue I'm locking it to contributors only until such time as we land the fix, at which point I will allow third party comments again.

@psf psf locked and limited conversation to collaborators Nov 21, 2016
@Lukasa Lukasa added Bug and removed Propose Close labels Nov 21, 2016
@sigmavirus24
Copy link
Contributor

@mfriedenhagen to attempt to help with your confusion, you should be looking at the IDNA standard documents (which you can discover from https://tools.ietf.org/html/rfc5890)

@Lukasa
Copy link
Member

Lukasa commented Nov 30, 2016

Should be resolved in v2.12.2. Thanks for your time all.

@Lukasa Lukasa closed this as completed Nov 30, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

10 participants