URL Query Parameters and HTML Entities: The Case of the Missing Semicolon - Redfin Real Estate News

Updated on October 5th, 2020

What’s the difference between this HTML snippet:

    <a href="http://www.google.com/search?q=html&foo=0">foo=0</a>

and this?

    <a href="http://www.google.com/search?q=html&copy=0">copy=0</a>

Both of them look like simple Google searches (though they could have been anything; Google is just an example). One of them appends an extra “&foo=0” to the end of the URL; the other appends “&copy=0” instead.

Only the second snippet is valid in HTML 4.01 Strict, but that snippet doesn’t work the way you might expect. Neither snippet is valid in XHTML.

Give up? Load both snippets in a browser and click the links.

The first URL searches for “html,” but the other URL searches for “html©=0.”

Two weird things are happening here.

  • The first is that “&copy;” is the HTML entity for the copyright symbol “©.” That would have been more obvious if the URL had used a semicolon, like this:

        <a href="http://www.google.com/search?q=html&copy;=0">copy;=0</a>

    or if we’d used a more traditional HTML entity like this:

        <a href="http://www.google.com/search?q=html&quot;=0">quot;=0</a>
  • The second weird thing is a quirk in the HTML specification on character references:

    Note. In SGML, it is possible to eliminate the final “;” after a character reference in some cases (e.g., at a line break or immediately before a tag). In other circumstances it may not be eliminated (e.g., in the middle of a word). We strongly suggest using the “;” in all cases to avoid problems with user agents that require this character to be present.

    As a result, all modern browsers (Firefox 3, Internet Explorer 7, Opera 9, Safari 3.1) will helpfully notice possible entities like “&copy” and “&lt” and replace them with “©” and “<”; they assume you forgot the semicolon. This applies to all of the HTML entities, even the obscure ones like &empty (“∅”), &not (“¬”), &reg (“®”), &sub (“⊂”), and &lang (“⟨”). (Bizarrely, &Copy is left alone as “&Copy” but &COPY is replaced with “©”.)
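
To see the substitution without opening a browser, here is a minimal Python sketch. It assumes that the standard library's html.unescape is a reasonable stand-in for a lenient HTML parser: it follows the HTML5 rules for character references, which accept a fixed set of older names such as “&copy” without the semicolon, so its results for some names may differ from the browsers listed above. The helper name decoded_href is ours, purely for illustration.

```python
from html import unescape


def decoded_href(raw_attribute: str) -> str:
    """Approximate the URL a lenient parser extracts from raw href text.

    Assumption: html.unescape (HTML5 character-reference rules) stands in
    for the browser; real attribute parsing may differ in edge cases.
    """
    return unescape(raw_attribute)


BASE = "http://www.google.com/search?q=html"

if __name__ == "__main__":
    # "&foo" is not a known entity, so the ampersand survives untouched.
    print(decoded_href(BASE + "&foo=0"))
    # "&copy" is treated as "&copy;" and becomes the copyright sign.
    print(decoded_href(BASE + "&copy=0"))
    # Entity names are case-sensitive: "&Copy" is left alone, "&COPY" is not.
    print(decoded_href(BASE + "&Copy=0"))
    print(decoded_href(BASE + "&COPY=0"))
```

Under these assumptions the second and fourth URLs come out ending in “html©=0,” while the first and third are unchanged, which matches the behavior described above.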

We think there are two valuable lessons to learn from this story. The first lesson you may already know:

  1. The correct way to write a URL with a query parameter is to HTML-escape the URL, replacing every & with &amp; like this:

        <a href="http://www.google.com/search?q=html&amp;copy=0">copy=0</a>

    That’s also the only way to make the snippet XHTML compliant.

  2. Don’t use URL query parameters whose names are HTML entities. Never create a web service that accepts a query parameter like “&lang=en”. After all, there’s no way to know when your users might want to copy & paste your URLs into a blog, forum, or HTML email. Even if most developers are careful enough to HTML-escape their href links, not everyone will be, and you can save everybody some trouble by avoiding the dangerous entities altogether.
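
Both lessons can be sketched in a few lines of Python. This is one possible approach, not the only one: build_anchor and entity_like_names are hypothetical helper names of ours, and we assume the HTML5 entity table in html.entities is a good enough list of “dangerous” names to check against.

```python
from html import escape
from html.entities import html5
from urllib.parse import urlencode


def build_anchor(base_url: str, params: dict, text: str) -> str:
    """Lesson 1: HTML-escape the finished URL before placing it in href."""
    url = f"{base_url}?{urlencode(params)}"
    # escape() turns every "&" into "&amp;" (and quotes into entities too).
    return f'<a href="{escape(url, quote=True)}">{escape(text)}</a>'


def entity_like_names(param_names) -> list:
    """Lesson 2: flag parameter names that collide with HTML entity names.

    The html5 table stores most names with a trailing ";" (e.g. "lang;")
    and the older ones both with and without it (e.g. "copy", "copy;").
    """
    return [
        name for name in param_names
        if name in html5 or f"{name};" in html5
    ]


if __name__ == "__main__":
    params = {"q": "html", "copy": "0"}
    print(build_anchor("http://www.google.com/search", params, "copy=0"))
    print(entity_like_names(["q", "copy", "lang", "foo"]))
```

The first helper reproduces the escaped link from lesson 1. The second could run as a check when you design a web service's parameters, so names like “copy” and “lang” get caught before anyone pastes your URLs into unescaped HTML.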
