Better Unicode support for URLs #1

Closed
doga opened this issue Oct 19, 2024 · 0 comments

doga commented Oct 19, 2024

Problem

IRIs are Unicode-based but URLs aren't, so creating a URL object in JavaScript encodes the URL in ASCII:

let url = new URL('https://çağlayan.info/');
url.host // ⇒ "xn--alayan-vua36b.info"
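The same thing happens to the path, query and fragment, not just the host. A minimal sketch using only the built-in URL class (the `isAscii` helper is hypothetical, introduced here for illustration):

```javascript
// Unicode input with non-ASCII characters in the host, path, query and fragment.
const unicodeInput = 'https://çağlayan.info/user/çağlayan/?çağlayan#çağlayan';
const asciiUrl = new URL(unicodeInput);

// Hypothetical helper: true when the text contains only ASCII characters.
const isAscii = text => /^[\x00-\x7F]*$/.test(text);

console.log(isAscii(unicodeInput));   // false
console.log(isAscii(asciiUrl.href));  // true
console.log(asciiUrl.pathname);       // "/user/%C3%A7a%C4%9Flayan/"

// Parsing the ASCII form again changes nothing, so both spellings name the same resource.
console.log(new URL(asciiUrl.href).href === asciiUrl.href); // true
```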

Also, IRI currently uses an internal class derived from URL, but this derived class only produces Unicode from its toString() method; its other properties remain ASCII-encoded.
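To illustrate the limitation, here is a rough sketch of what such a derived class looks like. `ToStringOnlyUrl` is a hypothetical stand-in, not the library's actual internal class, and it uses an ASCII host so that no punycode conversion is needed:

```javascript
// Hypothetical stand-in for a URL subclass that only overrides toString().
class ToStringOnlyUrl extends URL {
  toString() {
    // Decode percent-escapes for display; the host is left untouched here.
    return decodeURI(this.href);
  }
}

const derived = new ToStringOnlyUrl('https://example.com/user/çağlayan/');
console.log(`${derived}`);     // "https://example.com/user/çağlayan/"
console.log(derived.pathname); // "/user/%C3%A7a%C4%9Flayan/" (still percent-encoded)
```

Only the string conversion is in Unicode; every property inherited from URL still returns the ASCII form.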

Solution

IRIs really should be manipulating URLs in their original Unicode form, and this can be achieved through an IRL class✤.

---
title: Class diagram
---

classDiagram
  punycode: toUnicode()
  punycode: toASCII()
  note for punycode "Converts between\nInternationalized Domain Names (IDNs)\nand punycode\n(an ASCII representation of IDNs)."

  URL: string host
  URL: string pathname
  URL: toString()
  URL: ...

  IRL: URL url
  IRL: string host
  IRL: string pathname
  IRL: toString()
  IRL: ...
  IRL ..> punycode : depends on
  IRL --* URL : has property

  IriParser: parse()
  IriParser ..> IRL : depends on
  IriParser ..> URN : depends on
---
title: Package dependencies
---

flowchart

subgraph PunycodeLibrary [External punycode library]
  punycode
end

subgraph IriLibrary [This IRI library]
  IriParser -->URN
  IriParser -->IRL  
end

IriLibrary --imports--> PunycodeLibrary
import { punycode } from 'https://esm.sh/gh/doga/[email protected]/mod.mjs';

class IRL{
  constructor(stringable){
    const
    url = new URL(`${stringable}`),
    credentials = url.username ? (url.password ? `${url.username}:${url.password}@` : `${url.username}@`) : '';

    this.protocol = url.protocol;
    this.username = url.username;
    this.password = url.password;
    this.hostname = punycode.toUnicode(url.hostname);
    this.port = url.port;
    this.host = this.port ? `${this.hostname}:${this.port}` : this.hostname;
    this.origin = `${this.protocol}//${this.host}`;
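    // decodeURI (unlike decodeURIComponent) leaves escaped delimiters such as
    // %2F, %3F and %23 encoded, so they are not turned into structural ones.
    this.pathname = decodeURI(url.pathname);
    this.search = decodeURI(url.search);
    this.hash = decodeURI(url.hash);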
    this.href = `${this.protocol}//${credentials}${this.host}${this.pathname}${this.search}${this.hash}`;

    this.url = url;
  }
  toString(){
    return this.href;
  }
  toJSON(){
    return this.toString();
  }
}

let irl = new IRL('https://çağlayan.info/user/çağlayan/?çağlayan#çağlayan');
irl.host // ⇒ "çağlayan.info"
irl.url.host // ⇒ "xn--alayan-vua36b.info"
irl.pathname // ⇒ "/user/çağlayan/"
irl.url.pathname // ⇒ "/user/%C3%A7a%C4%9Flayan/"
`${irl}` // ⇒ "https://çağlayan.info/user/çağlayan/?çağlayan#çağlayan"
`${irl.url}` // ⇒ "https://xn--alayan-vua36b.info/user/%C3%A7a%C4%9Flayan/?%C3%A7a%C4%9Flayan#%C3%A7a%C4%9Flayan"
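
One edge case worth noting: the URL parser keeps malformed percent sequences such as `%zz` as they are, and the built-in decoding functions throw a URIError on them, so the constructor above would throw for such input. A possible workaround, as a sketch only (`tolerant` and `safeDecode` are hypothetical names, not part of the proposal), is to wrap whichever decoder the constructor uses:

```javascript
// Hypothetical helper: wraps a decoder so that malformed escapes are kept
// verbatim instead of throwing.
const tolerant = decode => text => {
  try {
    return decode(text);
  } catch {
    return text; // e.g. URIError: leave the component percent-encoded
  }
};

const safeDecode = tolerant(decodeURIComponent);

const malformed = new URL('https://example.com/%zz');
console.log(malformed.pathname);                      // "/%zz" (kept as-is by the URL parser)
console.log(safeDecode(malformed.pathname));          // "/%zz"
console.log(safeDecode('/user/%C3%A7a%C4%9Flayan/')); // "/user/çağlayan/"
```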


✤ IRL stands for Internationalized Resource Locator; I define IRLs as URLs that keep their characters in Unicode form. Note that there is already a specification called IRL, which stands for Internet Resource Locator, but that is a different and relatively unknown concept.

@doga doga pinned this issue Oct 19, 2024
@doga doga self-assigned this Oct 20, 2024
@doga doga added the enhancement New feature or request label Oct 20, 2024
@doga doga closed this as completed Oct 21, 2024