You may not use XHTML (anymore), but when you write HTML, you may be more influenced by XHTML than you think. You are very likely writing HTML, the XHTML way.
What is the XHTML way of writing HTML, and what is the HTML way of writing HTML? Let’s have a look.
HTML, XHTML, HTML
In the 1990s, there was HTML. In the 2000s, there was XHTML. Then, in the 2010s, we switched back to HTML. That’s the simple story.
You can tell by the rough dates of the specifications, too: HTML “1” 1992, HTML 2.0 1995, HTML 3.2 1997, HTML 4.01 1999; XHTML 1.0 2000, XHTML 1.1 2001; “HTML5” 2007.
XHTML became popular when everyone believed XML and XML derivatives were the future. “XML all the things.” For HTML, this had a profound effect: we learned to write it the XHTML way.
The XHTML way of writing HTML
The XHTML way is well-documented, because XHTML 1.0 describes it in great detail in its section on “Differences with HTML 4”:
- Documents must be well-formed.
- Element and attribute names must be in lower case.
- For non-empty elements, end tags are required.
- Attribute values must always be quoted.
- Attribute minimization is not supported.
- Empty elements need to be closed.
- White space handling in attribute values is done according to XML.
- Script and style elements need CDATA sections.
- SGML exclusions are not possible.
- The elements with id and name attributes, like a, applet, form, frame, iframe, img, and map, should only use id.
- Attributes with pre-defined value sets are case-sensitive.
- Entity references as hex values must be in lowercase.
Does this look familiar? With the exception of marking CDATA content, as well as dealing with SGML exclusions, you probably follow all of these rules. All of them.
Although XHTML is dead, many of these rules have never been questioned again. Some have even been elevated to “best practices” for HTML.
That is the XHTML way of writing HTML, and its lasting impact on the field.
The HTML way of writing HTML
One way of walking us back is to negate the rules imposed by XHTML. Let’s actually do this (without the SGML part, because HTML isn’t based on SGML anymore):
- Documents may not be well-formed.
- Element and attribute names may not be in lower case.
- For non-empty elements, end tags are not always required.
- Attribute values may not always be quoted.
- Attribute minimization is supported.
- Empty elements don’t need to be closed.
- White space handling in attribute values isn’t done according to XML.
- Script and style elements don’t need CDATA sections.
- The elements with id and name attributes may not only use id.
- Attributes with pre-defined value sets are not case-sensitive.
- Entity references as hex values may not only be in lowercase.
Let’s remove the esoteric things; the things that don’t seem relevant. This includes XML whitespace handling, CDATA sections, doubling of name attribute values, the case of pre-defined value sets, and hexadecimal entity references:
- Documents may not be well-formed.
- Element and attribute names may not be in lowercase.
- For non-empty elements, end tags are not always required.
- Attribute values may not always be quoted.
- Attribute minimization is supported.
- Empty elements don’t need to be closed.
Peeling away from these rules, this looks a lot less like we’re working with XML, and more like working with HTML. But we’re not done yet.
“Documents may not be well-formed” suggests that it is fine for HTML code to be invalid. It made sense for XHTML to insist on well-formedness because of XML’s strict error handling. But while HTML documents work even when they contain severe syntax and well-formedness issues, it is useful neither for the professional nor for our field to use and abuse this resilience. (I’ve argued this case before in my article, “In Critical Defense of Frontend Development.”)
The HTML way would therefore not suggest “documents may not be well-formed.” It would also be clear that not only end, but also start tags aren’t always required. Rephrasing and reordering, this is the essence:
- Start and end tags are not always required.
- Empty elements don’t need to be closed.
- Element and attribute names may be lower or upper case.
- Attribute values may not always be quoted.
- Attribute minimization is supported.
Examples
What does this look like in practice? For start and end tags, be aware that many tags are optional. A paragraph and a list, for example, are written like this in XHTML:
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
<ul>
<li>Praesent augue nisl</li>
<li>Lobortis nec bibendum ut</li>
<li>Dictum ac quam</li>
</ul>
In HTML, however, you can write them using only this code (which is valid):
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.
<ul>
<li>Praesent augue nisl
<li>Lobortis nec bibendum ut
<li>Dictum ac quam
</ul>
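To make the omitted end tags a little less mysterious, here is a toy sketch in Python of the idea that a new list item, or the end of the list, implies the end of the previous item. The class name ImpliedLiCloser and the helper tag_events are made up for this illustration. The sketch only covers flat lists, and real browsers follow the far more detailed parsing rules of the HTML standard.

```python
from html.parser import HTMLParser


class ImpliedLiCloser(HTMLParser):
    """Toy illustration (hypothetical helper): records tag events and
    adds the </li> events that the markup left out.

    Only flat, non-nested lists are handled; this is not a real
    implementation of the HTML parsing algorithm.
    """

    def __init__(self):
        super().__init__()
        self.events = []      # e.g. ["<ul>", "<li>", "</li>", "</ul>"]
        self.li_open = False  # is an <li> still waiting for its end tag?

    def _close_li(self):
        # Emit the implied </li> if an item is still open.
        if self.li_open:
            self.events.append("</li>")
            self.li_open = False

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._close_li()  # a new <li> implies the previous one ended
            self.li_open = True
        self.events.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag == "li":
            if not self.li_open:
                return        # stray </li>: ignore it in this sketch
            self.li_open = False
        elif tag == "ul":
            self._close_li()  # the end of the list implies the last </li>
        self.events.append(f"</{tag}>")


def tag_events(markup):
    """Return the list of start/end tag events for a piece of markup."""
    parser = ImpliedLiCloser()
    parser.feed(markup)
    parser.close()
    return parser.events


explicit = "<ul><li>One</li><li>Two</li></ul>"
minimal = "<ul><li>One<li>Two</ul>"
print(tag_events(explicit) == tag_events(minimal))
```

Both spellings produce the same sequence of events in this sketch, which is the point of optional tags: the structure is the same, only the amount of typing differs.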
Developers also learned to write void elements, like so:
<br />
This is something XHTML brought to HTML, but as the slash has no effect on void elements, you only need this:
<br>
In HTML, you can also just write everything in all caps:
<A HREF="https://css-tricks.com/">CSS-Tricks</A>
It looks like you’re yelling and you may not like it, but it’s okay to write it like this.
When you want to condense that link, HTML offers you the option to leave out certain quotes:
<A HREF=https://css-tricks.com/>CSS-Tricks</A>
As a rule of thumb, when the attribute value doesn’t contain a space or an equal sign, it’s usually fine to drop the quotes.
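The rule of thumb can be made a bit more precise. As far as I know, the HTML standard says an unquoted attribute value must not be empty and must not contain ASCII whitespace, quotation marks, apostrophes, equals signs, angle brackets, or backticks. The following Python sketch checks exactly that; the function name can_drop_quotes is made up for this example, and it deliberately ignores the separate rules around ampersands and character references.

```python
# Characters that may not appear in an unquoted attribute value:
# ASCII whitespace, both quote characters, "=", "<", ">", and "`".
FORBIDDEN_UNQUOTED = set(" \t\n\f\r\"'=<>`")


def can_drop_quotes(value: str) -> bool:
    """Return True if `value` could be written without quotes.

    Hypothetical helper for illustration; ampersand handling is ignored.
    """
    return value != "" and not any(ch in FORBIDDEN_UNQUOTED for ch in value)


print(can_drop_quotes("https://css-tricks.com/"))     # plain URL
print(can_drop_quotes("https://example.com/?q=html"))  # contains "="
```

Note that a URL with a query string usually contains an equals sign, so it is one of the common cases where the quotes need to stay.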
Finally, HTML–HTML (as opposed to XHTML–HTML) also allows you to minimize attributes. That is, instead of marking an input element as required and read-only, like this:
<input type="text" required="required" readonly="readonly">
You can minimize the attributes:
<input type="text" required readonly>
If you take advantage not only of the fact that the quotes aren’t needed, but also of the fact that text is the default for the type attribute here (there are more such unneeded attribute–value combinations), you get an example that shows HTML in all its minimal beauty:
<input required readonly>
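If you want to see how a parser reads these variants, Python’s standard html.parser module gives a rough impression. It is not a full browser-grade HTML parser, so treat this as a sketch; the class StartTagCollector and the helper parse_start_tags are names invented for this example.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Hypothetical helper: collects (tag, attributes) for each start tag."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, and reports a
        # minimized attribute (name only) with the value None.
        self.tags.append((tag, dict(attrs)))


def parse_start_tags(markup):
    """Return a list of (tag, attribute dict) pairs found in the markup."""
    collector = StartTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


print(parse_start_tags('<input type="text" required="required" readonly="readonly">'))
print(parse_start_tags("<INPUT TYPE=text REQUIRED READONLY>"))
```

The uppercase, unquoted, minimized variant yields the same element and attribute names as the verbose one; the only difference is that the minimized attributes carry no value, which is fine for boolean attributes because their presence alone is what counts.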
Write HTML, the HTML way
The above isn’t a representation of where HTML was in the 90s. HTML, back then, was loaded with <table> elements for layout, packed with presentational code, largely invalid (as it still is today), with wildly varying user agent support. Yet it’s the essence of what we would have wanted to keep if XML and XHTML hadn’t come around.
If you’re open to a suggestion of what a more comprehensive, contemporary way of writing HTML could look like, I have one. (HTML is my main focus area, so I’m augmenting this by links to some of my articles.)
- Respect syntax and semantics.
  - Validate your HTML, and ship only valid HTML.
- Use the options HTML gives you, as long as you do so consistently.
  - Remember that element and attribute names may be lowercase or uppercase.
- Keep use of HTML to the absolute minimum.
  - Remember that presentational and behavioral markup is to be handled by CSS and JavaScript instead.
  - Remember that start and end tags are not always required.
  - Remember that empty elements don’t need to be closed.
  - Remember that some attributes have defaults that allow these attribute–value pairs to be omitted.
  - Remember that attribute values may not always be quoted.
  - Remember that attribute minimization is supported.
It’s not a coincidence that this resembles the three ground rules for HTML, that it works with the premise of a smaller payload also leading to faster sites, and that this follows the school of minimal web development. None of this is new — our field could merely decide to rediscover it. Tooling is available, too: html-minifier is probably the most established and able to handle all HTML optimizations.
You’ve learned HTML the XHTML way. HTML isn’t XHTML. Rediscover HTML, and help shape a new, modern way of writing HTML — which acknowledges, but isn’t necessarily based on XML.
Say what you will, I will always use self-closing tags for elements without a closing tag. My OCD disallows me from being satisfied by an opening tag and no indication of the tag closing.
I don’t think I will ever be comfortable not using closing tags in my lists!
Great read, but I’d just give it to Pug to format everything for me.
My basic takeaway after reading this anarchic manifesto is: Just because you can, doesn’t mean you should. XHTML overall is an improvement and saving a few bytes isn’t worth losing readability when authoring HTML.
I think XHTML was too strict and put a high cost for a low gain, but it had a point. Readability counts (a lot) in any language syntax. Enforcement will not be the answer, but I think encouraging things like always closing tags and not suppressing default arguments helps (again, a lot) with the readability of the code.
Again, I don’t think enforcing a code style is the answer, but encouraging more readable code is always a good thing.
Most of them I follow yes. Except for a few
Closing empty tags. Used to do that a lot back in 2012 because of compatibility issues.
And some things I do that’s the XHTML way because it is easier
All hex entity references must be lowercase; be it color codes or IDs, it is much more manageable to keep it all lowercase.
Sorry, I cannot agree to most of this. Using XHTML syntax, I can visually check source code and see an appropriate structure. I don’t always have access to tools that validate the HTML.
For me, XHTML is more natural, because it requires beginnings and endings of most elements. In the case of the empty elements, the / signals its end. It also brings some sanity to those who come from backend languages or even JavaScript, as the syntax in those languages requires beginning and ending tokens.
I will still use name attributes on occasion, as that spec is an attribute and not concerned with beginning and ending an element. However, CSS selectors and parts of the JavaScript API use ID attributes, so that is likely why the name attribute fell out of favor, with the exception of forms.
I would argue that articles such as this, while innocently describing the true specification for HTML5, are actually normalising bad practice.
For example, dropping attribute quotes to save a few bytes will cause more issues than simply continuing to follow the XHTML spec, especially given attributes are often dynamically injected these days.
Similarly, dropping closing tags causes untold woes when the HTML is more than a few lines or worked on by a few developers, and you’re not sure if it’s supposed to be nested or simply an error.
There were reasons everyone preferred XHTML back in the early 00’s. We shouldn’t forget those.
Unfortunately, many non-browser parsers do expect optional and closing tags. Bing, for instance, expects optional head and body tags or it could fail to read some metadata. Some link previewers also fail.
Optional closing slashes, quotes, etc. are completely unnecessary and should be picked up by typical HTML minifiers.
A lot of the XHTML points actually made sense. You don’t HAVE TO close an element in HTML but it helps to know where it ends (but people don’t really understand the concept of “elements” in HTML).
One thing influenced by XHTML that has never gone away and is so deeply, deeply entrenched in the web dev world is the typing of HTML element selectors in lowercase within the CSS. This stems from when XHTML said that within the HTML that HTML element names should probably be in lowercase but would have to be if you wanted to validate your document as XML, and then if you did that you would have to type your references to them in your stylesheet in lowercase as well or otherwise the matching of elements in the HTML to their corresponding elements in the CSS would not work. Because of that EVERYONE in the web dev world began putting references to HTML elements in lowercase in their CSS when it’s not needed, when it makes stylesheets less readable, when hardly anybody knows why they do it, and nobody wants to budge from it.
When you see:
#header UL LI A:link
It is much easier to quickly tell that UL LI A refers to HTML elements than the following does:
#header ul li a:link
Especially when sifting through tons of CSS code. And then when that confusion occurs it’s harder for developers to grasp the difference between IDs, elements, classes, etc. And then that confusion makes it easier for devs to be too okay with DIV soup.
I would love to see the VS Code plugin that converts from the XHTML way of writing to the HTML way of writing.
I want to write HTML the HTML way but my IDE keeps shouting at me. ;) The default code style settings always seem to be XHTML-HTML.
I may be old fashioned (I first learned HTML in the ’90s and I made my first commission using it in the early 2000s) but I find that learning and respecting the XHTML mantra helps you be a better front end developer, because it’s less sloppy and more predictable. Just because HTML is more permissive doesn’t mean we should lower our standards to it. That’s how I feel about it, anyway.
I feel like minified attributes already became standard. But optional closing tags and for me are like semicolon in JavaScript. Yes, some people write without it and always keep an eye of the circumstances, but for me that harms the readability of the code.
I remember that I found a weird edge case where a library was not generating </li> tags and adding <!doctype html> caused the website layout I was working on to break. I just fixed the library to generate </li> tags and it solved the problem.
I like avoiding tags such as <head> because I know browsers are smart enough to add those and I’m not making websites for bots to spam my comments forms, but in general I close my tags. If some bot breaks because of that, then this bot is badly implemented and soon its developer will notice that when it finds a website using an HTML minifier (the one cited in the post has 3.9 million downloads per week). I still use <html> because of the lang attribute.
Weirdly I have seen more people closing self-closing tags (like </link> and </br>), because Firefox highlights those as errors in view-source:, than the opposite. Well, many people also forget to close non-self-closing tags: Firefox highlights those errors too and I have seen those a lot.
I’ve been through this whole path and even remember being shocked seeing <option> could be closed!
And I agree, XHTML was a bit too much constraint but it did a lot of good to HTML in my view. It brought cleaner code, less room for interpretation and more consistency.
Browsers are extremely tolerant but it doesn’t mean we have to push these boundaries, they bring pretty much nothing except a lighter code (which is still important but considering you ship JS libraries beside…).
Consistency is definitely the key, do what you want but stay consistent at least!
Good article. I hadn’t really given this much thought. Which is your point. :-)
One note of English syntax, “Documents may not be well-formed,” is an ambiguous construction. It sounds like it’s illegal for documents to be well-formed. Maybe, “Documents need not be well-formed,” or “Documents may be not-well-formed.”
Great read! The part about not closing tags reminds me a lot about those kind of people who write their JavaScript without semicolons. Yes it works, but I think it’s just barbaric ;-)
Minimalism isn’t always the best idea. I’m more than happy spending those extra bytes for readability (and therefore maintainability) of a project’s code. This applies to HTML as well as CSS and JS.
It’s possible to set up a minifier during build, so that you can keep your XHTML-HTML in the repo, and ship only clean HTML-HTML – that’s what I did and am happy ;)
There is one important consideration, though, that you might want to be aware of.
Is not, semantically, the same as
In the second case, image is, technically, a part of paragraph. It might also affect your presentation, if you are not careful.
Great! Personally, the less you write… the better!
For me, XHTML makes so much more sense; not because it’s XML compatible (though that’s a huge bonus); but because it means there’s a handful of rules which you can consistently obey to have valid code. When you start saying “actually HR doesn’t need a (self-)closing tag” then you start having to remember a list of “the rules do apply to these things, but not to these things, unless you’re in this context in which case you may also need to do this…”; i.e. needless confusion and complexity rather than simple consistency.
There are some things with the XHTML approach that are bad; e.g. 'readonly="readonly"' is a tautology; why not 'readonly="true"'… Just adding 'readonly' is sort of OK (it’s less wordy, and if the default’s always false it’s clear), but again that’s inconsistent, so I’d rather take the hit of having it equal something than just having the attribute name there.
I also wonder how many hours are spent globally in HTML design discussions over each element/attribute, which could be spared by saying “let’s just follow the generic rule” allowing time to be invested in more productive discussions about creating new functionality / adding value.