Internationalized URL #matchurl.com

06.Nov.2021

Adding Internationalized URLs to your website is easy. This article will show you how.

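All modern browsers accept IRIs (Internationalized Resource Identifiers, RFC 3987) and convert them to plain ASCII URLs before sending a request, so non-ASCII characters can appear in every part of the URL after the scheme. Older browsers and noncompliant servers may still cause problems, but these should become rarer as IRI adoption spreads. The two parts that need special treatment are the domain name, which is converted with Punycode, and the path, which is percent-encoded; see below for details on each one.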

Domain Name

The script used in your internationalized domain name (IDN) does not by itself decide whether the name can be registered. What matters is whether it complies with the existing IDNA rules and with the IDN policy of the registry for your top-level domain. The name is registered and looked up in its Punycode form (an ASCII label starting with xn--), since this is what browsers convert it to before sending requests. The length limits apply to that ASCII form: each label is limited to 63 characters, and the whole name to 253 characters.
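As a rough illustration of the conversion, the sketch below uses Python's built-in "idna" codec, which implements the older IDNA 2003 rules; registries and browsers may apply newer rules, so treat the result as indicative only. The function name to_ascii_hostname and the sample name bücher.example are assumptions made for this example, and the limits checked are the usual DNS limits of 63 characters per label and 253 characters in total.

```python
def to_ascii_hostname(hostname: str) -> str:
    """Convert a Unicode hostname to its ASCII (Punycode) form and check lengths.

    Hypothetical helper for illustration; assumes no trailing dot.
    """
    # The built-in "idna" codec converts each label separately (IDNA 2003).
    ascii_name = hostname.encode("idna").decode("ascii")

    # The length limits apply to the encoded form, not to the Unicode form.
    for label in ascii_name.split("."):
        if not 1 <= len(label) <= 63:
            raise ValueError(f"label {label!r} must be 1 to 63 characters long")
    if len(ascii_name) > 253:
        raise ValueError("hostname is longer than 253 characters")
    return ascii_name


if __name__ == "__main__":
    print(to_ascii_hostname("bücher.example"))
```

Plain ASCII names pass through unchanged, so the same helper can be applied to every hostname before it is stored or compared.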

Path

To internationalize your website's path, use %HH encoding (percent-encoding): each non-ASCII character is first encoded as UTF-8, and every resulting byte is written as %HH, where HH is the byte's two-digit hexadecimal value. This applies to everything after the first / that follows the domain name. Periods do not need to be encoded, but the special segments /./ and /../ are resolved and removed when the URL is normalized, so they should not be relied on as part of the path. So for example:

www.example.com/blue/phébus/

would turn into:

www.example.com/blue/ph%C3%A9bus/

Note that % is also an important character, so a literal % in your path has to be written as %25. For the same reason, encoding only has to be done once in each URL: encoding the result above a second time would produce www.example.com/blue/ph%25C3%25A9bus/, which names a different resource. Likewise, a / that is data rather than a separator has to be written as %2F. Each escape is exactly 3 characters long, and one code point takes 1 to 4 UTF-8 bytes, so a single character expands to between 3 and 12 characters. After encoding, whether a URL still works without its trailing slash, and whether the path is case-sensitive, depends on the server; only the hexadecimal digits of an escape are case-insensitive (%c3%a9 and %C3%A9 are equivalent, and uppercase is preferred). Non-ASCII characters are allowed in directory names and filenames. Browsers follow the WHATWG URL Standard when encoding them, and because it is a living standard, some details may still change.
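A minimal sketch of path encoding with Python's standard urllib.parse module follows. The helper names encode_path and encode_segment are assumptions made for this example. It shows a raw path being encoded once, and what happens when an already encoded path is encoded again: every % is escaped as %25.

```python
from urllib.parse import quote, unquote


def encode_path(path: str) -> str:
    """Percent-encode a raw (not yet encoded) path, keeping '/' as the separator."""
    return quote(path, safe="/", encoding="utf-8")


def encode_segment(segment: str) -> str:
    """Percent-encode one path segment; a '/' inside it is data and becomes %2F."""
    return quote(segment, safe="", encoding="utf-8")


raw = "/blue/phébus/"
encoded = encode_path(raw)      # UTF-8 bytes of 'é' are written as %C3%A9
twice = encode_path(encoded)    # encoding again escapes each '%' as %25

if __name__ == "__main__":
    print(encoded)
    print(twice)
    print(unquote(encoded) == raw)
```

Keeping the raw path and the encoded path in separate variables, and encoding only at the point where the URL is built, is one simple way to avoid accidental double encoding.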

LDH Rule

The LDH rule (letters, digits, hyphen) applies to the domain name, not to the path. Each label of the ASCII form of a domain name, including the Punycode form of an internationalized label, may contain only the letters a-z (upper or lowercase), the digits 0-9, and the hyphen U+002D (-), and the hyphen may not be the first or last character of a label. Other ASCII punctuation, such as ! " # $ % & ' ( ) * + , / : ; < = > ? @ [ ] ^ _ ` { | } ~, is not allowed in a label. The period has a special role: U+002E (.) separates labels, and IDNA treats U+3002 (。), U+FF0E (．), and U+FF61 (｡) as equivalent separators, so would-be attackers cannot use these code points inside a label to imitate another domain.
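The check can be sketched in Python as below, assuming the usual reading of the LDH rule for hostname labels: letters, digits, and hyphens only, no leading or trailing hyphen, and at most 63 characters. The names split_labels, is_ldh_label, and is_ldh_hostname are hypothetical helpers for this example, and the four separators are the dot-like code points listed above.

```python
import re

# Dot-like code points treated as label separators:
# U+002E, U+3002, U+FF0E, U+FF61.
_SEPARATOR_RE = re.compile("[.\u3002\uff0e\uff61]")

# One LDH label: starts and ends with a letter or digit, hyphens only inside,
# 1 to 63 characters in total.
_LDH_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def split_labels(hostname: str) -> list[str]:
    """Split a hostname into labels on any of the four separators."""
    return _SEPARATOR_RE.split(hostname)


def is_ldh_label(label: str) -> bool:
    """Return True if the label satisfies the LDH rule as sketched here."""
    return _LDH_RE.fullmatch(label) is not None


def is_ldh_hostname(hostname: str) -> bool:
    """Return True if every label of the hostname is an LDH label."""
    return all(is_ldh_label(label) for label in split_labels(hostname))


if __name__ == "__main__":
    print(is_ldh_hostname("xn--bcher-kva.example"))
    print(is_ldh_hostname("bücher.example"))
```

A Unicode name is expected to fail this check until it has been converted to its Punycode form, which is why the conversion step comes first.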

 
