Hex encoding: UTF-8 bytes explained
Published: 2026-06-26
How hexadecimal represents raw bytes, why UTF-8 matters when you encode text as hex, and how hex dumps differ from URL percent-encoding and HTML numeric entities.
Hex encoding (more precisely, writing bytes in hexadecimal) shows each byte as two digits from 0–9 and A–F, covering values 0–255 (0x00–0xFF). Developers use hex dumps to inspect UTF-8 text, compare fingerprints, debug APIs, and read logs where binary data is printed as readable characters. Hex is not encryption—it is a human-friendly view of bytes—but confusing characters, Unicode code points, and UTF-8 bytes causes many “why is é four hex digits?” mistakes.
Bytes and hex in one minute
Computers store text as bytes (8-bit numbers). Hex is just base 16 notation for those numbers:
| Decimal | Hex | Typical meaning in ASCII |
|---|---|---|
| 72 | 48 | letter H |
| 101 | 65 | letter e |
| 108 | 6c | letter l |
| 233 | e9 | é in Latin-1, outside ASCII (UTF-8 encodes é as c3 a9) |
The string Hello in UTF-8 is five bytes:
48 65 6c 6c 6f
Each pair of hex digits is one byte. Uppercase (6C) and lowercase (6c) mean the same value; tools often let you pick a display case.
Hex output can be continuous (48656c6c6f) or grouped with spaces (48 65 6c 6c 6f) for readability. Decoders usually ignore separators, 0x prefixes, commas, and line breaks—as long as the remaining digits form an even count.
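As a small illustration, here is a Python sketch (Python is chosen only for these examples; the article itself is language-neutral) that shows the same five bytes as decimal values, continuous hex, and spaced hex. The variable names are illustrative.

```python
# Sketch: view the bytes of "Hello" as decimal values and as hex.
data = "Hello".encode("utf-8")   # five bytes

decimal_values = list(data)      # each byte as an integer 0-255
continuous = data.hex()          # lowercase digits, no separators
spaced = data.hex(" ")           # one space between bytes (Python 3.8+)
upper = spaced.upper()           # same values, uppercase digits

print(decimal_values)
print(continuous)
print(spaced)
print(upper)
```

Uppercase and lowercase output parse back to the same bytes, which is why the display case is only a presentation choice.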
UTF-8: characters vs bytes
Unicode assigns code points to characters (for example U+0048 for H, U+00E9 for é). UTF-8 is the encoding that turns code points into a variable-length byte sequence on the wire or on disk.
Examples:
| Text | Code points (conceptual) | UTF-8 bytes (hex) |
|---|---|---|
| A | U+0041 | 41 (1 byte) |
| é | U+00E9 | c3 a9 (2 bytes) |
| 😀 | U+1F600 | f0 9f 98 80 (4 bytes) |
When you “encode text to hex,” you are almost always seeing UTF-8 bytes, not the Unicode code point number. é is c3a9, not e9 alone—unless your pipeline intentionally uses Latin-1 (ISO-8859-1), which is a different charset.
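The difference between a code point and its encoded bytes can be checked directly. The following is a minimal Python sketch; the helper name describe is hypothetical, and the characters are written as escapes so the source stays unambiguous.

```python
# Sketch: compare a character's code point with its encoded bytes.
def describe(ch: str) -> tuple[str, str]:
    """Return (code point label, UTF-8 bytes as spaced hex) for one character."""
    code_point = f"U+{ord(ch):04X}"
    utf8_hex = ch.encode("utf-8").hex(" ")
    return code_point, utf8_hex

# A, é (U+00E9), and the grinning face emoji (U+1F600)
for ch in ["A", "\u00e9", "\U0001F600"]:
    print(ch, *describe(ch))

# Latin-1 is a different charset: é is a single byte there.
latin1_hex = "\u00e9".encode("latin-1").hex()
print(latin1_hex)
```

The code point number and the UTF-8 bytes only coincide for ASCII characters such as A.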
Decoding hex back to text requires the reverse: interpret the byte sequence as UTF-8. If the bytes are not valid UTF-8 (wrong leading bytes, truncated sequences), a strict decoder reports an error even though the hex itself was syntactically fine.
Where hex shows up in real systems
Logs, dumps, and debugging
Packet captures, firmware logs, and xxd-style dumps print bytes as hex so you can spot magic numbers (89 50 4E 47 for PNG), length fields, and corrupted data without a binary viewer.
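Spotting a magic number can also be scripted. This Python sketch checks only the four PNG bytes quoted above (the full PNG signature is eight bytes long); the function name is hypothetical, and the input is assumed to be clean hex with optional spaces, since bytes.fromhex rejects anything else.

```python
# Sketch: check whether a hex dump starts with the PNG magic number.
PNG_MAGIC = bytes.fromhex("89 50 4E 47")   # the four bytes quoted in the text

def starts_with_png_magic(hex_dump: str) -> bool:
    """True if the dump's first four bytes match PNG_MAGIC."""
    data = bytes.fromhex(hex_dump)          # accepts whitespace between bytes
    return data.startswith(PNG_MAGIC)

print(starts_with_png_magic("89 50 4e 47 0d 0a 1a 0a"))  # full PNG signature
print(starts_with_png_magic("ff d8 ff e0"))              # some other file type
```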
Cryptography and hashing
Hash functions output fixed-length byte arrays. Tools display them as hex strings (e3b0c442…) because hex is compact and copy-paste friendly. The underlying value is still bytes; hex is only presentation.
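To make the bytes-versus-presentation point concrete, this Python sketch uses SHA-256 as an assumed example algorithm (the text does not name one). The digest of empty input under SHA-256 happens to begin with the digits e3b0c442.

```python
import hashlib

# Sketch: a digest is raw bytes; the hex string is only how it is displayed.
digest_bytes = hashlib.sha256(b"").digest()      # 32 raw bytes
digest_hex = hashlib.sha256(b"").hexdigest()     # the same value as 64 hex digits

print(len(digest_bytes), len(digest_hex))
print(digest_hex[:8])
print(digest_bytes.hex() == digest_hex)          # identical once formatted as hex
```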
URL percent-encoding
Percent-encoding writes the same byte values as %HH:
| UTF-8 byte | Hex alone | In a URL |
|---|---|---|
| space (32) | 20 | %20 |
| é first byte | c3 | %C3 |
| é second byte | a9 | %A9 |
So %C3%A9 and the hex dump c3 a9 describe the same two bytes. Percent-encoding adds % and is constrained by URL grammar; a plain hex dump is the raw byte view.
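A quick way to confirm that both notations carry the same bytes is to compare them in code. This is a Python sketch using the standard urllib.parse module; the variable names are illustrative.

```python
from urllib.parse import quote, unquote_to_bytes

# Sketch: percent-encoding and a hex dump describe the same UTF-8 bytes.
text = "\u00e9"                                   # é
hex_dump = text.encode("utf-8").hex(" ")          # plain hex view of the bytes
percent = quote(text, safe="")                    # percent-encoded form
space = quote(" ", safe="")                       # a space becomes %20

same_bytes = unquote_to_bytes(percent) == bytes.fromhex(hex_dump)

print(hex_dump, percent, space, same_bytes)
```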
HTML numeric entities (hex form)
HTML character references can use &#xHH; for a Unicode code point, not a UTF-8 byte sequence:
&#x27; → apostrophe (U+0027)
&#x1F600; → 😀 (U+1F600)
That is markup escaping, not a hex dump of stored bytes. &#xE9; means the character é (one code point), while UTF-8 hex for é is c3 a9 (two bytes). Same letter, different layers, which is another common source of confusion.
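The two layers can be put side by side in code. This Python sketch uses the standard html module; the helper to_hex_entity is a hypothetical name introduced for illustration.

```python
import html

# Sketch: a hex character reference names a code point, not UTF-8 bytes.
def to_hex_entity(ch: str) -> str:
    """Build a hexadecimal numeric character reference for one character."""
    return f"&#x{ord(ch):X};"

e_acute = "\u00e9"                          # é
entity = to_hex_entity(e_acute)             # built from the code point U+00E9
utf8_hex = e_acute.encode("utf-8").hex(" ") # built from the stored bytes

print(entity, utf8_hex)
print(html.unescape(entity) == e_acute)     # the reference round-trips to é
```

The entity contains the digits E9, while the byte dump contains c3 a9; neither can be pasted in place of the other.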
Encoding text → hex (typical workflow)
- Take the string as Unicode text.
- Encode with UTF-8 (browser TextEncoder, or your language’s UTF-8 encoder).
- Format each byte as two hex digits, optionally with spaces or chosen letter case.
Example with Hi!:
Text: H i !
Bytes: 48 69 21
Continuous: 486921
Spaced: 48 69 21
The byte count is the UTF-8 length (3 here), not the number of visible characters; emoji and many non-Latin letters take more than one byte each.
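The workflow above could be sketched in Python roughly as follows. The function name text_to_hex and its sep and upper parameters are assumptions made for this example, not part of any particular tool.

```python
# Sketch: encode text to hex via UTF-8, with optional separator and case.
def text_to_hex(text: str, sep: str = "", upper: bool = False) -> str:
    """Encode text as UTF-8, then format each byte as two hex digits."""
    data = text.encode("utf-8")                 # step 2: text -> bytes
    fmt = "02X" if upper else "02x"             # step 3: two digits per byte
    return sep.join(format(b, fmt) for b in data)

print(text_to_hex("Hi!"))                       # continuous
print(text_to_hex("Hi!", sep=" "))              # spaced
print(text_to_hex("\u00e9", sep=" ", upper=True))

# Character count and byte count differ for non-ASCII text.
emoji = "\U0001F600"
print(len(emoji), len(emoji.encode("utf-8")))
```

Note that len on a Python string counts code points, which is still not the same thing as the number of glyphs a user sees.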
Decoding hex → text
Decoding is not “read hex as ASCII” unless you know the data is 7-bit ASCII:
- Strip 0x prefixes and separators (spaces, newlines, commas). Remove each 0x as a unit so its leading 0 is not kept as a digit.
- Require an even number of hex digits; each byte needs exactly two nibbles.
- Parse pairs into bytes (48 → decimal 72).
- Decode bytes as UTF-8 to recover text.
Pitfalls:
- Odd digit count: 48656c6c6 is invalid; the last nibble has no partner.
- Valid hex, invalid UTF-8: ff fe parses as two bytes but fails UTF-8 decoding, because neither byte can appear in valid UTF-8.
- Wrong charset assumption: Latin-1 hex for é is e9 (one byte); UTF-8 is c3 a9 (two). Pick the encoding your source actually used.
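The decoding steps and pitfalls above can be sketched in Python as follows. The function name hex_to_text is hypothetical, and this sketch deliberately rejects unexpected characters instead of silently stripping them, which is a stricter choice than many tools make.

```python
import re

# Sketch: decode a loosely formatted hex dump to text, assuming UTF-8.
def hex_to_text(dump: str) -> str:
    # Drop 0x/0X prefixes as a unit so their leading 0 is not kept as a digit.
    cleaned = re.sub(r"0[xX]", "", dump)
    # Remove only known separators: whitespace (including newlines) and commas.
    cleaned = re.sub(r"[\s,]", "", cleaned)
    if not re.fullmatch(r"[0-9a-fA-F]*", cleaned):
        raise ValueError("unexpected non-hex character")
    if len(cleaned) % 2:
        raise ValueError("odd number of hex digits")
    data = bytes.fromhex(cleaned)               # pairs of digits -> bytes
    # Strict decoding: raises UnicodeDecodeError if the bytes are not UTF-8.
    return data.decode("utf-8")

print(hex_to_text("48 65 6c 6c 6f"))
print(hex_to_text("0x48, 0x69, 0x21"))

# Latin-1 data needs a Latin-1 decode; the single byte e9 is not valid UTF-8.
print(bytes.fromhex("e9").decode("latin-1"))
```

Calling hex_to_text("ff fe") raises UnicodeDecodeError even though the hex is well formed, while hex_to_text("48656c6c6") fails earlier on the odd digit count.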
Hex vs other byte representations
| Representation | What it shows | Typical use |
|---|---|---|
| Hex | Two digits per byte, easy to read | Dumps, hashes, debug |
| Decimal bytes | 72 101 108… | Teaching, some protocols |
| Base64 | Four ASCII chars per three bytes | JSON, email, data URLs |
| Binary | 8 bits per byte | Low-level bit masks |
Hex and Base64 both represent the same underlying bytes; they differ in density and what characters are safe in a given context. For URL query values you still need percent-encoding; for HTML text you need entity escaping—neither is replaced by a hex dump.
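For comparison, this Python sketch renders the same five bytes of Hello in each representation from the table. The variable names are illustrative.

```python
import base64

# Sketch: one byte sequence, four ways of writing it down.
data = "Hello".encode("utf-8")

as_hex = data.hex(" ")                               # two hex digits per byte
as_decimal = " ".join(str(b) for b in data)          # one decimal number per byte
as_base64 = base64.b64encode(data).decode("ascii")   # four chars per three bytes
as_binary = " ".join(f"{b:08b}" for b in data)       # eight bits per byte

print(as_hex)
print(as_decimal)
print(as_base64)
print(as_binary)
```

Every form decodes back to the same bytes; only the length and the character set of the output differ.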
Common pitfalls
- Treating hex as “the character code”: 41 is the UTF-8 byte for A, which matches ASCII and U+0041, but multi-byte characters need the full UTF-8 sequence, not one hex pair per letter you see on screen.
- Mixing layers: Do not paste a UTF-8 hex dump into HTML as &#x…; entities without converting to code points first.
- Silent stripping: Decoders that ignore garbage characters can hide typos (48g65 → 4865). Always verify byte count and round-trip when debugging.
- Assuming hex hides secrets: Hex is reversible and readable. Encoding is not encryption.
- Endianness in multi-byte numbers: Integer fields in binary protocols have a byte order (big-endian vs little-endian). The two bytes 00 01 are the integer 1 in big-endian and 256 in little-endian, and neither is the same as the four-character string "0001", whose UTF-8 bytes are 30 30 30 31. Context matters.
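Two of these pitfalls are easy to demonstrate. The following Python sketch shows how byte order changes the meaning of 00 01, and how a UTF-8 dump has to be decoded to code points before it can be written as HTML hex references. All names are illustrative.

```python
# Sketch: the same two bytes read as an integer in both byte orders.
raw = bytes.fromhex("00 01")
big = int.from_bytes(raw, "big")          # most significant byte first
little = int.from_bytes(raw, "little")    # least significant byte first

# The text "0001" is four characters, so four different bytes.
text_hex = "0001".encode("utf-8").hex(" ")

print(big, little, text_hex)

# Mixing layers: decode the UTF-8 dump to characters first, then build
# one hex character reference per code point (not one per byte).
dump = "c3 a9"
chars = bytes.fromhex(dump).decode("utf-8")
entities = "".join(f"&#x{ord(c):X};" for c in chars)

print(len(chars), entities)
```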
When a local hex tool helps
Typical tasks:
- Verify UTF-8 for a string before it goes into an API, database, or %-encoded URL.
- Paste a log dump (with spaces or 0x prefixes) and recover the original text.
- Compare byte length to character count when emoji or accented text behaves oddly in a legacy system.
- Inspect decimal byte values alongside hex when documentation lists offsets in decimal.
Because hex conversion is pure string and byte math, it is safe to run on tokens, payloads, or PII in the browser when nothing is uploaded to a server.
Try it locally in your browser
Use the Hex Encoder & Decoder to:
- Encode plain text to UTF-8 hex with optional spaces and uppercase or lowercase digits.
- Decode flexible hex dumps: non-hex characters are stripped so spaced groups, commas, 0x prefixes, and line breaks all work.
- Inspect UTF-8 byte count and decimal byte values in the byte view panel.
- Swap output back into the input to chain steps; processing stays in your browser.
For %HH rules in URLs and forms, see the URL Encoder, Decoder & URL Parser. For &lt; and &#x…; in markup, use the HTML Entity Escape & Unescape.
Related reading
- URL encoding vs form encoding (percent-encoding uses the same byte values as hex, with % delimiters and URL-specific rules).
- HTML entities: what to escape and why (&#x numeric references refer to Unicode code points, not UTF-8 byte dumps).