Specification vs. implementation

There are a lot of times when the specification says one thing but common implementations do another. Here are some especially common examples to watch out for.
### Attribute quoting
According to the HTML specification, single-quoted and unquoted attributes are perfectly valid; for example, these HTML fragments should be absolutely equivalent:

    <a href="https://example.com/">Check out my website</a>
    <a href='https://example.com/'>Check out my website</a>
    <a href=https://example.com/>Check out my website</a>
However, there are _many, many_ client implementations which expect any quoted attributes to be double-quoted, and even some which do not support unquoted attributes at all. For example, I've seen many implementations assume that a single-quoted attribute is equivalent to an unquoted attribute, so they treat _these_ as equivalent:

    <a href='https://example.com/'>Check out my website</a>
    <a href="'https://example.com/'">Check out my website</a>
which is to say, if `<a href='https://example.com/'>Check out my website</a>` appears on the page `https://foo.example/~bob/homepage.html`, the URL is then interpreted as being `https://foo.example/~bob/%27https://example.com/%27` (`%27` being the percent-encoding of `'`).
Unquoted attributes are also often subject to all sorts of weird behavior, especially in how the entities within them get decoded.
Email systems are historically _particularly_ bad about this; the impetus for this article was discovering that Fastmail (which is otherwise amazing; use that link for 10% off your first year) does not support single-quoted attributes, and goes so far as to convert the quotes to HTML character entities, causing even more problems downstream.
So, for maximum compatibility, it's best to always use double-quoted attributes, regardless of what the HTML specification says.
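As a minimal sketch of that advice, assuming Python and a hypothetical helper named `render_tag` (it is not taken from any particular templating library), a serializer can always emit double-quoted attributes and escape the characters that would otherwise end the value early:

```python
from html import escape


def render_tag(name: str, attrs: dict[str, str], text: str) -> str:
    """Serialize one element, always double-quoting attribute values.

    `render_tag` is a hypothetical helper for illustration only.
    """
    rendered_attrs = "".join(
        # quote=True also escapes " and ', so the value cannot end early.
        f' {key}="{escape(value, quote=True)}"'
        for key, value in attrs.items()
    )
    return f"<{name}{rendered_attrs}>{escape(text, quote=False)}</{name}>"


print(render_tag("a", {"href": "https://example.com/"}, "Check out my website"))
```

The point is only that the quoting style is decided once, in one place, rather than left to whatever each template author happens to type.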
### Protocol-relative URLs
Back in the day, it was pretty common for websites to serve things up in a mixture of HTTP (plaintext) and HTTPS (encrypted), and there were reasons to want static pages to link to external resources with the same scheme (for example, an HTTP page referring to an external image with an HTTP URL, but the HTTPS version of the page using an HTTPS URL for the image).
I cannot find any official specification of these for HTML, but the commonly accepted standard comes from the generic URI syntax (RFC 3986), which calls such a link a network-path reference: the leading `//hostname` is the authority component, and only the scheme is inherited from the current page. That is to say, a protocol-relative URL of `//example.com/foo` should be resolved by adding the current page's scheme to it; for example, from `https://example.com/~bob/homepage.html`, a link to `//website.example/meow.gif` should be interpreted as `https://website.example/meow.gif`, while from `http://example.com/~alice/` the same link would become `http://website.example/meow.gif`.
Unfortunately, a _lot_ of software out there just sees that the link starts with a `/` and assumes it's a site-relative URL instead, so from `https://example.com/~bob/` it is interpreted as `https://example.com/website.example/meow.gif`.
You can see how your browser implements such a link.
In any case, it's better to be explicit about your URL scheme, and in general if a site supports `https` it's best to just link to that version anyway.
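To make the difference concrete, here is a small sketch using Python's standard `urllib.parse.urljoin` for the expected resolution. The `naive_resolve` function is a hypothetical stand-in for the mistaken behavior described above, not code from any real client:

```python
from urllib.parse import urljoin, urlsplit

REF = "//website.example/meow.gif"


def naive_resolve(base: str, ref: str) -> str:
    """Hypothetical buggy resolver: treats any leading '/' as site-relative."""
    parts = urlsplit(base)
    return f"{parts.scheme}://{parts.netloc}/{ref.lstrip('/')}"


# Standards-following resolution inherits only the scheme from the base URL.
print(urljoin("https://example.com/~bob/homepage.html", REF))
print(urljoin("http://example.com/~alice/", REF))

# The mistaken interpretation keeps the base host and mangles the path.
print(naive_resolve("https://example.com/~bob/", REF))
```

Writing the scheme out in the link sidesteps both code paths, since an absolute URL resolves to itself no matter what the base is.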
### Path coalescing
In UNIX and other operating systems, there is a convention that `..` refers to the parent directory, so for example the path `/foo/bar/../baz` is equivalent to `/foo/baz`. Additionally, `.` refers to the current directory, and `/` is seen as a path separator. So a path of `/foo/bar/./baz` is equivalent to `/foo/bar/baz`, for example.
Most web-based things will automatically apply these rules, even if it's technically incorrect; for example, both Apache and nginx will internally manipulate the URL to treat them as equivalent before it even touches the backing application, and even if they don't, it seems that most application stacks will also pre-coalesce the URL.
But on the client side, browsers will also automatically do this path coalescing before they even form the URL to be requested; for example, `../blog/` and even `https://junk.sockpuppet.band/foo/bar/../../songlets/` never even show up on the wire with any of the `..` components from most browsers (although I have seen some clients preserve them). Strictly speaking those URLs shouldn't even be equivalent, because `foo/bar` is a nonexistent path on both of those sites, so based purely on filesystem rules those _should_ result in a 404 Not Found error. But things are being short-circuited for the sake of friendliness. And if you enter a URL manually, by copy-pasting e.g. `https://beesbuzz.biz/foo/../code/` into your location bar, every browser I've tried will just automagically coalesce the path component.
(Note that how it coalesces `//` is inconsistent, in my experience; some browsers treat it as a subdirectory with an empty name, while others treat it as if it's the same as a single `/`, the same as UNIX.)
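The `.` and `..` rules described above can be sketched in a few lines of Python. This is a simplified take on dot-segment removal under two stated assumptions: the path is absolute, and empty segments (`//`) are kept as-is rather than merged. The function name `coalesce_dot_segments` is made up for this example.

```python
def coalesce_dot_segments(path: str) -> str:
    """Resolve '.' and '..' in an absolute URL path, leaving '//' alone."""
    if not path.startswith("/"):
        raise ValueError("this sketch only handles absolute paths")
    segments = path.split("/")
    output = []
    for index, segment in enumerate(segments):
        is_last = index == len(segments) - 1
        if segment == ".":
            # A trailing '.' still means "this directory", so keep the slash.
            if is_last:
                output.append("")
        elif segment == "..":
            # Never pop the leading empty segment that represents the root.
            if len(output) > 1:
                output.pop()
            if is_last:
                output.append("")
        else:
            output.append(segment)
    return "/".join(output)


print(coalesce_dot_segments("/foo/bar/../baz"))
print(coalesce_dot_segments("/foo/bar/./baz"))
```

Treating `//` as an empty-named directory is one of the two behaviors mentioned in the note above; a client that merges slashes instead would give different results for the same input.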
But it's not necessarily the case that the path _will_ be coalesced. For example, here's a trivial WSGI application that just passes through a couple of things from the request:

    def app(environ, start_response):
        """Echo the Host header and the raw request URI back as plain text."""
        start_response("200 OK", [("Content-Type", "text/plain")])
        # RAW_URI is a gunicorn-specific key; other WSGI servers may not set it.
        for key in ('HTTP_HOST', 'RAW_URI'):
            yield f'{key}: {environ[key]}\n'.encode('utf-8')
And here are some outputs when run through gunicorn. For starters, by default curl coalesces `/./` and `/../` (but not `//`) client-side:

    bean:~ $ curl -i http://localhost:8000/foo//moo/./bar/../baz/
    HTTP/1.1 200 OK
    Server: gunicorn
    Date: Wed, 08 Apr 2026 20:36:44 GMT
    Connection: close
    Transfer-Encoding: chunked
    Content-Type: text/plain

    HTTP_HOST: localhost:8000
    RAW_URI: /foo//moo/baz/

    bean:~ $ curl -i http://localhost:8000/foo//moo/../../bar/
    HTTP/1.1 200 OK
    Server: gunicorn
    Date: Wed, 08 Apr 2026 20:42:56 GMT
    Connection: close
    Transfer-Encoding: chunked
    Content-Type: text/plain

    HTTP_HOST: localhost:8000
    RAW_URI: /foo/bar/
But a request that uses the path as-is will still pass through directly, at least through gunicorn itself:

    bean:~ $ curl -i http://localhost:8000/foo//moo/./bar/../baz/ --path-as-is
    HTTP/1.1 200 OK
    Server: gunicorn
    Date: Wed, 08 Apr 2026 20:40:09 GMT
    Connection: close
    Transfer-Encoding: chunked
    Content-Type: text/plain

    HTTP_HOST: localhost:8000
    RAW_URI: /foo//moo/./bar/../baz/
But in other testing I have found that, at least with a stack of nginx+gunicorn+Flask, the path coalescing takes place _somewhere_ before it hits the actual application. (I do not have the patience to try to figure out where, exactly, not that it even matters.)
All this is to say, you _cannot_ expect runs of multiple `/` or paths containing `/./` or `/../` to remain intact, even when the request is being made at the wire level, but you also cannot assume that the path _will_ be pre-coalesced.
### Case-sensitivity/case-folding
Case-sensitivity and lack thereof in the hostname is also something you cannot rely on:

    $ curl -ivvv http://beesbuzz.biz/ | head
    * Host beesbuzz.biz:80 was resolved.
    [...]
    > GET / HTTP/1.1
    > Host: beesbuzz.biz
    [...]
    < link: <https://webmention.io/beesbuzz.biz/webmention>; rel="webmention"
    < Link: <https://beesbuzz.biz/_tokens>; rel="token_endpoint"
    [...]

    $ curl -sivvv http://BeesBuzz.Biz/ | head
    * Host BeesBuzz.Biz:80 was resolved.
    [...]
    > GET / HTTP/1.1
    > Host: BeesBuzz.Biz
    [...]
    < link: <https://webmention.io/beesbuzz.biz/webmention>; rel="webmention"
    < Link: <https://beesbuzz.biz/_tokens>; rel="token_endpoint"
    [...]
In this case, note that `curl` preserved the case of the domain name in the `Host:` header, but something within the stack converted the hostname to all-lowercase (as can be seen in the `link` headers in the response). Whether this is happening in nginx or Flask is uncertain (and I, again, do not feel a particular need to figure out where this takes place, although I'd assume it's at the vhost, and therefore nginx, level), but gunicorn does preserve the case of the hostname (using the same minimal WSGI app as above):

    bean:~ $ curl http://LocalHost:8000/
    HTTP_HOST: LocalHost:8000
    RAW_URI: /
So, as with path coalescing, you cannot assume that elements will be case-folded for you, but you also cannot assume that they _won't_ be.
And of course, path resolution for resources is up to the underlying implementation; a webserver running on macOS or Windows will (usually) treat `/foo.jpg` and `/foo.JPG` as the same resource, while on Linux, those are different resources. Of course the browser will _hopefully_ treat them as separate for the purpose of caching, but: **_you cannot guarantee this_**.
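One defensive approach, sketched here in Python, is to fold the case of the hostname yourself while leaving the path exactly as it was given. The name `fold_host` is hypothetical, and the sketch assumes plain hostnames: it drops any `user:password@` part and does not handle bracketed IPv6 literals.

```python
from urllib.parse import urlsplit, urlunsplit


def fold_host(url: str) -> str:
    """Lowercase the hostname of a URL without touching the path's case."""
    parts = urlsplit(url)
    # SplitResult.hostname is already lowercased by the standard library.
    host = parts.hostname or ""
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(fold_host("http://BeesBuzz.Biz/"))
print(fold_host("http://LocalHost:8000/Foo.JPG"))
```

The path is left alone on purpose: whether `/Foo.JPG` and `/foo.jpg` are the same resource is up to the server, so a client has no safe way to fold it.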
As one of my college professors once said, "If it makes a difference whether something is case-sensitive or not, you have made a mistake."
### http vs. https in general
Nothing in the HTTP specification says that the same path on two different schemes will reflect the same resource; for example, `http://example.com/` and `https://example.com/` can very well be completely different websites. But I have seen plenty of browsers, web crawlers, and other software assume that they are one and the same!
At its most trivial, this very site will have slightly different content for the two versions; there are a handful of places where out of necessity, some links do not appear on the `http` version, or where they are rendered as absolute links and will match the original requestās scheme rather than directing to `https`.
But things can be a lot more complicated. For example, once upon a time I ran a site where the `http` version was an informational page and the `https` version was the webmail for the domain. It was silly to do it that way, and I stopped doing it when browsers started being "helpful" about automatically converting http URLs to https (not to mention when I stopped hosting my own email and switched to other hosting providers), but you absolutely cannot just assume that two pages will be the same despite different URL schemes.
(Also, remember that URL schemes other than `http` and `https` exist! FTP, Gopher, and others might have fallen out of fashion, but they still exist. Not to mention nascent protocols like Gemini.)
From a server implementation standpoint, you should assume that clients can and will treat differing schemes as identical, so if a website is available from both protocols, the content should match between them, and if something is only available via `https`, then an `http` request to the same resource should redirect to the `https` one.
But from a client standpoint, you really should consider the scheme to be a part of the URL.
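For the server-side half of that advice, here is a rough WSGI middleware sketch in the same style as the minimal app earlier. The wrapper name `https_redirect` is made up for this example, and it assumes the server sees the real scheme in `wsgi.url_scheme`; behind a TLS-terminating proxy that value is often `http` for every request, so this would need adjusting. The choice of a 301 status is also just one option among the redirect types.

```python
from urllib.parse import quote


def https_redirect(app):
    """Wrap a WSGI app so plain-http requests are redirected to https."""

    def wrapper(environ, start_response):
        if environ.get("wsgi.url_scheme") == "http":
            host = environ.get("HTTP_HOST", "")
            # PATH_INFO arrives percent-decoded, so re-quote it for the header.
            path = quote(environ.get("SCRIPT_NAME", "")) + quote(environ.get("PATH_INFO", ""))
            query = environ.get("QUERY_STRING", "")
            location = f"https://{host}{path or '/'}" + (f"?{query}" if query else "")
            start_response(
                "301 Moved Permanently",
                [("Location", location), ("Content-Length", "0")],
            )
            return [b""]
        # Already https: hand the request to the wrapped application untouched.
        return app(environ, start_response)

    return wrapper
```

Note that the redirect keeps the path and query string intact, so the same resource is requested on the other scheme rather than dumping every visitor on the front page.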
### `www.` prefixes (and other subdomain issues)
Back in the early days of the Internet, it was common for a domain to host a whole bunch of different services, for example `ftp.example.com`, `irc.example.com`, `mail.example.com`, and so on, and many of these would even be hosted by separate physical servers with their own IP addresses. So when the web started up as an experimental thing it was super common to just spin up another server named `www`, and that was the one and only way that people would reference the website; there often wouldn't even _be_ a root domain `A` record.
In those days, the hostname used to resolve the site had no impact on the resource returned; in fact the `Host:` request header didn't even exist, and it wasn't until quite some time later that browsers started sending that, to support name-based virtual hosting. Every website needed its own IP address. (Note that many non-HTTP protocols still have this limitation.)
As the web became the primary use of the Internet, the `www` prefix convention remained, and you had a big hot mess of differing implementations:
* Dedicated-IP hosts that have the root record _and_ `www` resolve to the same server, which would then serve up the same content on either hostname
* Sites that would map both `example.com` and `www.example.com` to the same virtual host configuration
* Sites that would redirect `www.example.com` to `example.com`
* Sites that would redirect `example.com` to `www.example.com`
* Sites that serve up entirely different content for `example.com` vs. `www.example.com`
* Hosts that only resolve from one or the other
Pretty much all of these remain to this very day, and to make things even more fun, many clients try to do "helpful" things where, for example, if `example.com` doesn't resolve it'll automatically redirect to `www.example.com` (or put up a prompt to that effect), or if a web crawler sees both hostnames it'll just assume that both are the same, or follow the preference of whoever implemented it.
I don't even know what the best practice should be in this case. I guess it should be something like:
* Clients should assume that `www.example.com` and `example.com` are different websites and use canonical URLs to sort out which is the "real" one if they both exist, even if this means potentially crawling the same site twice
* Servers should redirect to the one that is correct
Then again, the same issue comes up with sites that are available from multiple separate domains, and I've also seen situations where badly-behaved crawlers will assume that _all_ subdomains are equivalent (e.g. `alice.example.com` and `bob.example.com`), sometimes even getting confused by ccTLDs that are multi-level (like `.uk`) and thinking that, for example, `example.co.uk` and `google.co.uk` are the same site because they're both subdomains of `co.uk`! (This was especially bad back when so-called "dynamic DNS" providers were super common.)
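The crawler mistake described above is easy to reproduce. In this Python sketch, `naive_site` is a hypothetical version of the buggy "keep the last two labels" heuristic, and `same_site_strict` is the conservative alternative of comparing full hostnames. Neither consults any suffix data (such as the Public Suffix List), which is what correctly grouping hosts by registrable domain would require.

```python
def naive_site(host: str) -> str:
    """Hypothetical buggy grouping: keep only the last two DNS labels."""
    return ".".join(host.lower().rstrip(".").split(".")[-2:])


def same_site_strict(host_a: str, host_b: str) -> bool:
    """Conservative client rule: hosts match only if the full names match."""
    return host_a.lower().rstrip(".") == host_b.lower().rstrip(".")


# Both of these collapse to "co.uk" under the naive rule.
print(naive_site("example.co.uk"), naive_site("google.co.uk"))

# The strict rule keeps www and the bare domain apart.
print(same_site_strict("www.example.com", "example.com"))
```

The strict rule will sometimes crawl the same content twice, which matches the trade-off suggested in the list above.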
### Redirections
There are so many different kinds of HTTP redirection, each with different implications on caching, HTTP method, and equivalence.
Clients should probably just note the type and target of a redirection rather than try to treat the URLs as equivalent; for example, `/code` and `/code/` are distinct URLs and should be treated as such.
Like, in theory, `/code` could _not_ redirect and instead have entirely different content from `/code/`, but in practice, this will almost certainly cause Problems, and I'm sure there are even crawlers out there which strip off trailing slashes and then expect the actual request to be redirected.
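A client that takes this approach might record each redirect along with what its status code implies, instead of silently merging the two URLs. The following Python sketch is one way to do that; `RedirectRecord` and `record_redirect` are names invented for this example, and the table covers only the five common redirect statuses.

```python
from dataclasses import dataclass
from http import HTTPStatus
from urllib.parse import urljoin

# (permanent?, request method preserved?) for the common redirect statuses.
# "Not preserved" means clients may (301, 302) or will (303) switch to GET.
REDIRECT_SEMANTICS = {
    HTTPStatus.MOVED_PERMANENTLY: (True, False),   # 301
    HTTPStatus.FOUND: (False, False),              # 302
    HTTPStatus.SEE_OTHER: (False, False),          # 303
    HTTPStatus.TEMPORARY_REDIRECT: (False, True),  # 307
    HTTPStatus.PERMANENT_REDIRECT: (True, True),   # 308
}


@dataclass(frozen=True)
class RedirectRecord:
    source: str
    target: str
    status: HTTPStatus

    @property
    def permanent(self) -> bool:
        return REDIRECT_SEMANTICS[self.status][0]

    @property
    def preserves_method(self) -> bool:
        return REDIRECT_SEMANTICS[self.status][1]


def record_redirect(source: str, status_code: int, location: str) -> RedirectRecord:
    """Note a redirect without treating source and target as the same URL."""
    status = HTTPStatus(status_code)
    if status not in REDIRECT_SEMANTICS:
        raise ValueError(f"{status_code} is not a redirect status handled here")
    # The Location header may be relative, so resolve it against the source.
    return RedirectRecord(source, urljoin(source, location), status)


record = record_redirect("https://example.com/code", 301, "/code/")
print(record.target, record.permanent, record.preserves_method)
```

Keeping `source` and `target` as separate fields means `/code` and `/code/` stay distinct in whatever index or cache sits on top of this.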
### In conclusion
When implementing client software, you should do whatever you can to follow the specification, but when implementing server software, you should also be aware of common client implementation issues.
If you do run into an implementation issue, it is of course a kindness to inform the implementor of the mistake, but some of these issues are common enough that it's best to accommodate the common misunderstandings and just sigh quietly about it.
Note: I may earn a commission on affiliated product links in this article. https://beesbuzz.biz/code/6604-Specification-vs.-implementation