Why URLs Can't Contain Any Character
URLs were defined in RFC 1738 (1994) to use a subset of ASCII. The restricted character set exists because URLs need to be safely transmitted through systems designed for ASCII text — email headers, HTML attributes, log files, terminal output.
Some characters have reserved meanings in a URL structure: / separates path segments, ? starts the query string, # marks a fragment, & separates parameters, = separates keys from values. If your data contains these characters, they'd be misinterpreted as URL structure.
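A minimal sketch of that misinterpretation, using Python's standard urllib.parse. The parameter name company and its value are made up for illustration.

```python
from urllib.parse import parse_qs, quote

value = "AT&T=telecom"  # hypothetical data that contains & and =

# Pasted into the query string unencoded, the & and = are read as structure:
# the value is cut at & and a second, unintended parameter appears.
broken = parse_qs("company=" + value)

# Percent-encoding the value first keeps it in one piece.
fixed = parse_qs("company=" + quote(value, safe=""))

print(broken)  # {'company': ['AT'], 'T': ['telecom']}
print(fixed)   # {'company': ['AT&T=telecom']}
```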
Percent Encoding
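Any character that can't appear literally in a URL is converted to its UTF-8 bytes, and each byte is written as %XX, where XX is that byte's two-digit hexadecimal value. An ASCII character is a single byte, so it becomes a single %XX.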
- Space → %20 (hex 20 is decimal 32, the ASCII code for space)
- + → %2B
- / → %2F
- ? → %3F
- = → %3D
- & → %26
- # → %23
- @ → %40
- " → %22
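These mappings can be checked with urllib.parse.quote from Python's standard library. This is only a quick verification sketch; the dictionary name expected is illustrative.

```python
from urllib.parse import quote

# The characters listed above, mapped to the encodings the list claims.
expected = {
    " ": "%20", "+": "%2B", "/": "%2F", "?": "%3F", "=": "%3D",
    "&": "%26", "#": "%23", "@": "%40", '"': "%22",
}

# safe="" is needed because quote() leaves "/" unencoded by default.
encoded = {ch: quote(ch, safe="") for ch in expected}

for ch, code in encoded.items():
    print(repr(ch), "->", code)

assert encoded == expected
```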
Non-ASCII characters (accented letters, emoji, Chinese characters) are first converted to UTF-8 bytes, then each byte is percent-encoded. The emoji 😀 encodes to %F0%9F%98%80.
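The byte-by-byte rule can be sketched by hand in a few lines. The function name percent_encode is hypothetical and the sketch encodes everything outside the unreserved set; in real code a library routine such as urllib.parse.quote is the better choice.

```python
# Characters that never need encoding (letters, digits, and - _ . ~).
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_.~"
)

def percent_encode(text: str) -> str:
    """Percent-encode every character outside the unreserved set."""
    out = []
    for ch in text:
        if ch in UNRESERVED:
            out.append(ch)
        else:
            # Convert the character to UTF-8, then emit one %XX per byte.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)

print(percent_encode("😀"))    # %F0%9F%98%80
print(percent_encode("café"))  # caf%C3%A9
```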
The Space Problem: %20 vs +
There are two conventions for encoding spaces in URLs:
%20 is the standard RFC 3986 encoding for a literal space character in any part of a URL.
+ is a form encoding convention (application/x-www-form-urlencoded) where + means space in query strings only. This convention comes from HTML forms.
This creates bugs: if a literal + is left unencoded in a query string, a form decoder on the other end will read it as a space. Encode a literal + as %2B, use %20 for spaces in application code, and use a proper URL encoding library rather than replacing characters manually.
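A short sketch of both conventions and of the bug, using Python's urllib.parse. The sample text and the parameter name q are arbitrary.

```python
from urllib.parse import parse_qs, quote, quote_plus, unquote, unquote_plus

text = "a+b c"

rfc_style = quote(text, safe="")  # a%2Bb%20c  (space becomes %20)
form_style = quote_plus(text)     # a%2Bb+c    (space becomes +)

# Each form round-trips through its matching decoder.
assert unquote(rfc_style) == text
assert unquote_plus(form_style) == text

# The bug: the literal + was left as-is, so the form decoder turns it into a space.
buggy = parse_qs("q=a+b%20c")["q"][0]

# Encoding the + as %2B preserves it.
correct = parse_qs("q=" + rfc_style)["q"][0]

print(repr(buggy))    # 'a b c'
print(repr(correct))  # 'a+b c'
```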
Unreserved Characters (Never Encode These)
A-Z, a-z, 0-9, -, _, ., ~ — these can always appear literally in a URL without encoding.
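As a quick check, Python's urllib.parse.quote leaves exactly this set alone when no extra safe characters are allowed. The sketch assumes Python 3.7 or later, where ~ is treated as unreserved.

```python
import string
from urllib.parse import quote

unreserved = string.ascii_letters + string.digits + "-_.~"

# With safe="", quote() passes the unreserved characters through unchanged.
unchanged = quote(unreserved, safe="") == unreserved

# Every other printable ASCII character (codes 32 to 126) gets percent-encoded.
others = [chr(c) for c in range(32, 127) if chr(c) not in unreserved]
all_encoded = all(quote(ch, safe="").startswith("%") for ch in others)

print(unchanged, all_encoded)  # True True
```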
Encoding Context Matters
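**Path segments:** /search/hello%20world. The / characters that separate segments must not be encoded; a / that is part of a segment's data must be encoded as %2F. Spaces must be encoded.
**Query string values:** ?q=hello%20world. The = and & that form the query structure must not be encoded. Parameter keys and values must be encoded, so an & or = inside a value becomes %26 or %3D.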
**Full URL encoding:** When embedding a URL inside another URL (as a redirect parameter), the entire inner URL must be encoded: ?redirect=https%3A%2F%2Fexample.com%2Fpath
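The three contexts can be sketched with Python's urllib.parse. The segment list and the parameter names q, tag, and redirect are illustrative, not part of any real API.

```python
from urllib.parse import quote, urlencode

# Path: encode each segment on its own, then join with literal "/" separators.
segments = ["search", "hello world", "a/b"]  # hypothetical segments
path = "/" + "/".join(quote(s, safe="") for s in segments)

# Query: urlencode() encodes keys and values and adds the structural = and &.
# quote_via=quote makes it emit %20 for spaces instead of the form-style +.
query = urlencode({"q": "hello world", "tag": "R&D"}, quote_via=quote)

# Nested URL: the whole inner URL is treated as one parameter value.
inner = "https://example.com/path"
redirect = urlencode({"redirect": inner}, quote_via=quote)

print(path)      # /search/hello%20world/a%2Fb
print(query)     # q=hello%20world&tag=R%26D
print(redirect)  # redirect=https%3A%2F%2Fexample.com%2Fpath
```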
Common Bugs
**Double encoding:** Encoding an already-encoded URL. %20 becomes %2520 (because % gets encoded to %25). A single decode on the receiving end then yields the literal text %20 instead of a space.
**Forgetting to encode form values:** A form input containing & or = will break query string parsing if not encoded.
**Incorrectly encoding the full URL:** You should encode individual parameter values, not the entire URL. Encoding the whole URL will encode /, ?, and &, breaking the URL structure. The one exception is a URL that is itself being passed as a parameter value inside another URL.
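**UTF-8 assumptions:** Some old code encodes non-ASCII as Latin-1 bytes rather than UTF-8 bytes, so é arrives as %E9 instead of %C3%A9. Modern systems expect UTF-8 in URLs (RFC 3986 recommends it for new URI schemes), and a UTF-8 decoder will reject or mangle the Latin-1 form.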
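Two of these bugs are easy to reproduce with Python's urllib.parse. The sketch below uses arbitrary sample strings and relies on unquote() defaulting to UTF-8 with replacement of undecodable bytes.

```python
from urllib.parse import quote, unquote

# Double encoding: the % of %20 is itself encoded as %25.
once = quote("hello world", safe="")  # hello%20world
twice = quote(once, safe="")          # hello%2520world

# One decode of the double-encoded form returns the encoded text, not the original.
after_one_decode = unquote(twice)     # hello%20world

# Encoding mismatch: the same character under two byte encodings.
as_utf8 = quote("é", safe="")                         # %C3%A9
as_latin1 = quote("é", safe="", encoding="latin-1")   # %E9

# Decoded as UTF-8, the lone Latin-1 byte becomes U+FFFD (replacement character).
mangled = unquote(as_latin1)
recovered = unquote(as_latin1, encoding="latin-1")

print(twice, after_one_decode)
print(as_utf8, as_latin1, repr(mangled), recovered)
```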
Practical Tools
Browser DevTools → Network tab → right-click a request → "Copy as cURL" gives you a properly encoded URL. Using a URL library (new URL() in JavaScript, urllib.parse in Python) is safer than building URLs by string concatenation, because the library applies the right encoding to each component.
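A minimal sketch of the library approach in Python. The helper name add_params, the example URL, and the parameter names are all hypothetical.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit

def add_params(url: str, params: dict) -> str:
    """Append query parameters to a URL, letting the library do the encoding."""
    parts = urlsplit(url)
    # Decode the existing query into pairs so it is re-encoded exactly once.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    pairs.extend(params.items())
    new_query = urlencode(pairs, quote_via=quote)
    return urlunsplit(parts._replace(query=new_query))

result = add_params(
    "https://example.com/search?q=a%20b",
    {"utm_source": "spring sale", "ref": "x&y"},
)
print(result)
# https://example.com/search?q=a%20b&utm_source=spring%20sale&ref=x%26y
```

Decoding the existing query before re-encoding it avoids the double-encoding bug, and the & inside the ref value cannot leak into the URL structure.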
NoxaKit's UTM Parameter Builder assembles properly encoded tracking URLs. The URL Phishing Analyzer decodes obfuscated URLs that use percent encoding to hide their true destination.