HTML entities: what to escape and why
Published: 2026-06-05
How HTML character references work, which characters must be escaped in text and attributes, and why entity encoding is not the same as URL encoding or full XSS protection.
HTML entities (formally character references) let you write characters that would otherwise be interpreted as markup. A browser sees `&lt;` as the literal character `<`, not the start of a tag. Escaping is how you safely embed user text, code snippets, and punctuation inside HTML documents, templates, and attributes, without the parser reshaping your string into live DOM nodes.
Character references in one minute
HTML supports three common forms:
| Form | Example | Meaning |
|---|---|---|
| Named | `&amp;`, `&lt;`, `&copy;` | A predefined name maps to one Unicode code point |
| Decimal numeric | `&#39;` | Code point U+0027 (apostrophe) |
| Hex numeric | `&#x27;` | Same code point, hexadecimal |
All three end with ;. In practice browsers are lenient about a missing semicolon in some cases, but always include the semicolon in generated output so parsers agree.
Decoding is the inverse: `&lt;div&gt;` becomes the text `<div>` when a parser interprets it as HTML. The raw source (for example, in a view-source window or a log file) still shows the undecoded references.
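As a quick check of the three forms, here is a minimal sketch using Python's standard `html` module. The choice of Python is an assumption made for illustration; the article itself is language-neutral.

```python
import html

# The same characters written as named, decimal, and hexadecimal references.
named = html.unescape("&lt;div&gt; &amp; &copy;")
decimal = html.unescape("&#60;div&#62; &#38; &#169;")
hexadecimal = html.unescape("&#x3C;div&#x3E; &#x26; &#xA9;")

print(named)                            # <div> & ©
print(named == decimal == hexadecimal)  # True
```

All three spellings decode to the same string, which is why generated output can pick whichever form suits the target format.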
What must be escaped (and where)
The dangerous characters are the ones HTML uses for syntax, not “every special symbol on the keyboard.”
Text content (between tags)
At minimum, escape these so user input cannot become markup:
| Character | Entity (common) | Why |
|---|---|---|
| `&` | `&amp;` | Starts every entity; must be first when escaping |
| `<` | `&lt;` | Starts tags |
| `>` | `&gt;` | Ends tags (less critical in plain text, still good practice) |
Example: the user name `Tom & Jerry` in a paragraph should be stored or emitted as `Tom &amp; Jerry`, otherwise some pipelines treat `& Jerry` as the start of a malformed entity.
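The text-content rule can be sketched with Python's standard `html.escape`. The helper name `escape_text` is hypothetical and exists only for this example.

```python
import html

def escape_text(value: str) -> str:
    """Escape a string for use as HTML text content (between tags)."""
    # quote=False leaves " and ' alone; only &, <, and > are replaced.
    return html.escape(value, quote=False)

user_name = "Tom & Jerry"
paragraph = f"<p>{escape_text(user_name)}</p>"
print(paragraph)  # <p>Tom &amp; Jerry</p>
```

Quotes are left untouched here because they carry no special meaning in a text node; attribute values need the stricter variant.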
Attribute values (quoted)
Attributes add quote characters to the set:
| Character | Entity (common) | Why |
|---|---|---|
| `"` | `&quot;` | Ends a double-quoted attribute |
| `'` | `&#39;` or `&apos;` | Ends a single-quoted attribute |
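If you emit:
<input value="O'Brien">
the apostrophe is harmless because the attribute is double-quoted, but the same value would end a single-quoted attribute early. Safer in either quoting style:
<input value="O&#39;Brien">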
Order matters when escaping with sequential replacements: replace `&` first, then `<`, `>`, and quotes. If you escape `&` last, the ampersands in the `&lt;`, `&gt;`, and `&quot;` you just produced are escaped again, yielding `&amp;lt;`, `&amp;gt;`, and `&amp;quot;`.
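To make the ordering rule concrete, the sketch below compares a hand-written escaper that replaces `&` first with a deliberately buggy one that replaces it last. Both function names are hypothetical; in real code a library routine is usually preferable to chained replacements.

```python
def escape_attr(value: str) -> str:
    """Escape for a quoted attribute value: & first, then the rest."""
    return (
        value.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace('"', "&quot;")
        .replace("'", "&#39;")
    )

def escape_attr_wrong_order(value: str) -> str:
    """Buggy variant: & is replaced last, so earlier output is re-escaped."""
    return (
        value.replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace('"', "&quot;")
        .replace("'", "&#39;")
        .replace("&", "&amp;")
    )

print(escape_attr("O'Brien <admin>"))
# O&#39;Brien &lt;admin&gt;
print(escape_attr_wrong_order("O'Brien <admin>"))
# O&amp;#39;Brien &amp;lt;admin&amp;gt;
```

The second output would render on the page as the visible text `O&#39;Brien &lt;admin&gt;` rather than the intended name.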
What you usually do not need to escape
In text nodes, characters like /, (, ), -, emojis, and most punctuation can appear literally once <, >, and & are handled. You do not need to entity-encode every non-ASCII letter—UTF-8 in the document is fine.
Named entities such as `&mdash;` or `&nbsp;` are for authoring convenience and typography, not a security checklist item.
Named vs numeric: when each appears
Named entities (`&copy;`, `&nbsp;`, `&rarr;`) are easy to read and stable in HTML docs. The HTML specification defines a large set; XML without a DTD guarantees only `&amp;`, `&lt;`, `&gt;`, `&quot;`, and `&apos;`.
Numeric entities work everywhere HTML accepts entities and are handy when:
- No widely supported name exists for the character.
- You are generating XML with a minimal entity set and prefer `&#…;` over adding DTD declarations.
- You need an explicit code point (e.g. `&#x200B;` for zero-width space).
For apostrophe in HTML5, `&#39;` is widely used; `&apos;` exists but is more associated with XML.
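Generating numeric references is mechanical, as the following sketch suggests. The function `to_numeric_refs` is a hypothetical helper, and it deliberately leaves ASCII characters (including `&`, `<`, and `>`) untouched, so it complements escaping rather than replacing it.

```python
def to_numeric_refs(text: str, hex_form: bool = False) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    out = []
    for ch in text:
        code_point = ord(ch)
        if code_point < 128:
            out.append(ch)                      # ASCII passes through unchanged
        elif hex_form:
            out.append(f"&#x{code_point:X};")   # hexadecimal form
        else:
            out.append(f"&#{code_point};")      # decimal form
    return "".join(out)

print(to_numeric_refs("\u00a9 2026 \u2192 next"))   # &#169; 2026 &#8594; next
print(to_numeric_refs("a\u200bb", hex_form=True))   # a&#x200B;b
```

For the decimal form, Python's built-in `xmlcharrefreplace` error handler gives a similar result: `text.encode("ascii", "xmlcharrefreplace")` returns the same references as bytes.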
HTML entities are not URL encoding
Percent-encoding (%20, %3C) solves URL structure—path segments, query values, fragments. Entity encoding solves HTML parsing. The same string might need both treatments in different layers:
User input: Tom & Jerry <3
In HTML text: Tom &amp; Jerry &lt;3
In a URL query value: Tom%20%26%20Jerry%20%3C3
Mixing them is a common bug: putting `%3C` in HTML body text does not produce a readable literal `<` for humans reading the page; putting `&lt;` in a URL path does not percent-encode it for HTTP. See URL encoding vs form encoding for the URL side.
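The two layers can be reproduced with Python's standard library, as a sketch. The `/search` path and `page` parameter are made up for illustration.

```python
import html
from urllib.parse import quote

user_input = "Tom & Jerry <3"

# HTML layer: escape markup-significant characters for a text node.
as_html_text = html.escape(user_input, quote=False)

# URL layer: percent-encode for a query value; safe="" also encodes "/".
as_query_value = quote(user_input, safe="")

print(as_html_text)    # Tom &amp; Jerry &lt;3
print(as_query_value)  # Tom%20%26%20Jerry%20%3C3

# Both layers at once: percent-encode the value first, then
# entity-escape the finished URL for use inside an href attribute.
href = html.escape(f"/search?q={as_query_value}&page=2", quote=True)
print(href)  # /search?q=Tom%20%26%20Jerry%20%3C3&amp;page=2
```

Each encoding is applied at the boundary of its own layer: the query value is percent-encoded when the URL is built, and the whole URL is entity-escaped only when it is written into HTML.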
Security: escaping helps, but context still rules
Entity-escaping user data before inserting it into HTML text or quoted attributes stops the classic “they typed <script> and it ran” mistake when your template layer actually emits entities and does not later decode them.
It is not a universal XSS vaccine:
- JavaScript strings and inline event handlers need JS escaping, not HTML entities.
- `javascript:` URLs, `onerror=` attributes, and CSS `url()` values have their own injection rules.
- `innerHTML` and `document.write` interpret HTML; escaping once at input but assigning to `innerHTML` without a sanitizer can still be unsafe.
- Double encoding (`&amp;lt;`) looks safe in source but may decode twice in some pipelines.
Treat HTML escaping as one layer at the output boundary for the correct sink (text node vs attribute vs JS vs CSS), not a substitute for a content security policy, framework auto-escaping, or a vetted sanitizer when you need rich HTML.
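One way to see why the sink matters: entity escaping leaves a `javascript:` URL untouched, so an `href` needs a separate scheme check. The sketch below is an illustration under assumptions, not a complete defense. `safe_href` and `ALLOWED_SCHEMES` are hypothetical names, and the allowlist is deliberately strict (absolute `http`/`https` URLs only).

```python
import html
from typing import Optional
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # hypothetical allowlist for this sketch

def safe_href(url: str) -> Optional[str]:
    """Return an attribute-escaped URL, or None if the scheme is not allowed."""
    try:
        scheme = urlsplit(url).scheme.lower()
    except ValueError:
        return None  # unparseable input is rejected outright
    if scheme not in ALLOWED_SCHEMES:
        return None
    return html.escape(url, quote=True)

payload = "javascript:alert(1)"
print(html.escape(payload))  # javascript:alert(1)  (escaping changes nothing)
print(safe_href(payload))    # None
print(safe_href("https://example.com/?a=1&b=2"))
# https://example.com/?a=1&amp;b=2
```

Relative URLs are rejected here as a simplification; a real application would likely need a more careful policy for them.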
Common pitfalls
- Double escaping: Running escape on already escaped text turns `&amp;` into `&amp;amp;`. Decode or detect before re-escaping.
- Wrong context: HTML entities in JSON or plain-text email bodies are literal ampersand sequences unless the consumer parses HTML.
- Unescaped `&` in attributes: `href="?a=1&b=2"` is tolerated by HTML parsers when quoted; in XML or strict serializers, `&` in attribute values must be written as `&amp;`.
- Assuming unescape is validation: Decoding `&lt;script&gt;` yields `<script>`; that is correct decoding, not proof the input is safe to inject as HTML.
- Newlines and whitespace: Entity encoding does not replace `<pre>` semantics; preserve formatting with CSS or explicit markup when needed.
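The double-escaping and unescape-is-not-validation pitfalls are easy to reproduce. This short sketch uses Python's `html` module purely for demonstration.

```python
import html

original = "Tom & Jerry"
once = html.escape(original)
twice = html.escape(once)  # escaping text that is already escaped

print(once)   # Tom &amp; Jerry
print(twice)  # Tom &amp;amp; Jerry

# One round of decoding undoes only one round of escaping.
print(html.unescape(twice))  # Tom &amp; Jerry

# Decoding is not validation: the result is live markup if injected as HTML.
decoded = html.unescape("&lt;script&gt;alert(1)&lt;/script&gt;")
print(decoded)  # <script>alert(1)</script>
```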
Snippets, CMS fields, and tests
Typical workflows:
- Paste raw markup or prose → escape → copy into a CMS HTML field, README, or email template that expects entities.
- Paste escaped source from a log or API → unescape → read or edit the original characters locally.
- Round-trip check: escape, then unescape, should recover the original for the five core characters when no other HTML normalization occurs.
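A round-trip check of this kind might look like the following sketch. `round_trips` is a hypothetical helper and the sample strings are arbitrary.

```python
import html

def round_trips(value: str) -> bool:
    """True if escaping and then unescaping recovers the original string."""
    return html.unescape(html.escape(value, quote=True)) == value

samples = [
    "Tom & Jerry <3",
    '<a href="x">O\'Brien</a>',
    "&amp; already escaped",  # literal entity text must survive too
    "plain text",
]

for sample in samples:
    assert round_trips(sample), sample
print("all samples round-trip")
```

The third sample matters most: input that already looks like an entity should come back unchanged, which only holds when `&` itself is escaped.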
Because escaping is a deterministic string replacement that needs no server, a local in-browser tool can process secrets or PII without anything leaving your device.
Try it locally in your browser
Use the HTML Entity Escape & Unescape tool to:
- Escape `<`, `>`, `&`, `"`, and `'` with standard entities (`&quot;` and `&#39;` for quotes).
- Unescape named and numeric references using the browser’s built-in decoder (`&copy;`, `&mdash;`, and similar).
- Swap output back into the input to chain steps; processing stays in your browser and text is never uploaded.
For percent-encoding URLs and query strings, use the URL Encoder, Decoder & URL Parser instead.
Related reading
- URL encoding vs form encoding: `%HH` encoding for URLs and forms; complements entity encoding for HTML.
- Validating and formatting XML in the browser: XML’s smaller default entity set and well-formedness rules.