Hex encoding: UTF-8 bytes explained

Published: 2026-06-26

How hexadecimal represents raw bytes, why UTF-8 matters when you encode text as hex, and how hex dumps differ from URL percent-encoding and HTML numeric entities.

Hex encoding (more precisely, writing bytes in hexadecimal) shows each byte as two digits from 0–9 and A–F, covering values 0–255 (0x00–0xFF). Developers use hex dumps to inspect UTF-8 text, compare fingerprints, debug APIs, and read logs where binary data is printed as readable characters. Hex is not encryption—it is a human-friendly view of bytes—but confusing characters, Unicode code points, and UTF-8 bytes causes many “why is é four hex digits?” mistakes.

Bytes and hex in one minute

Computers store text as bytes (8-bit numbers). Hex is just base 16 notation for those numbers:

Decimal	Hex	Typical meaning
72	`48`	letter `H` (ASCII)
101	`65`	letter `e` (ASCII)
108	`6c`	letter `l` (ASCII)
233	`e9`	`é` in Latin-1; outside ASCII, and not valid UTF-8 on its own (UTF-8 for `é` is `c3 a9`)

The string Hello in UTF-8 is five bytes:

48 65 6c 6c 6f

Each pair of hex digits is one byte. Uppercase (6C) and lowercase (6c) mean the same value; tools often let you pick a display case.

Hex output can be continuous (48656c6c6f) or grouped with spaces (48 65 6c 6c 6f) for readability. Decoders usually ignore separators, 0x prefixes, commas, and line breaks—as long as the remaining digits form an even count.
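As a minimal sketch in Python (assuming Python 3.8 or later, which added the separator argument to `bytes.hex`), the continuous and spaced forms come from the same five bytes:

```python
# Encode the text as UTF-8, then render the bytes as hex.
data = "Hello".encode("utf-8")

continuous = data.hex()     # no separators
spaced = data.hex(" ")      # one space between bytes (Python 3.8+)
upper = spaced.upper()      # same byte values, uppercase digits

print(continuous)  # 48656c6c6f
print(spaced)      # 48 65 6c 6c 6f
print(upper)       # 48 65 6C 6C 6F

# bytes.fromhex skips ASCII whitespace and accepts either letter case,
# so all three forms decode back to the same bytes.
assert bytes.fromhex(continuous) == bytes.fromhex(spaced) == bytes.fromhex(upper) == data
```

Note that `bytes.fromhex` tolerates whitespace only. Commas and 0x prefixes have to be removed before calling it.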

UTF-8: characters vs bytes

Unicode assigns code points to characters (for example U+0048 for H, U+00E9 for é). UTF-8 is the encoding that turns code points into a variable-length byte sequence on the wire or on disk.

Examples:

Text	Code points (conceptual)	UTF-8 bytes (hex)
`A`	U+0041	`41` (1 byte)
`é`	U+00E9	`c3 a9` (2 bytes)
`😀`	U+1F600	`f0 9f 98 80` (4 bytes)

When you “encode text to hex,” you are almost always seeing UTF-8 bytes, not the Unicode code point number. é is c3a9, not e9 alone—unless your pipeline intentionally uses Latin-1 (ISO-8859-1), which is a different charset.
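The difference between a code point and its UTF-8 bytes can be checked directly. This is a small Python sketch; the variable names are illustrative only, and the characters are written as escapes so the file encoding cannot affect the result:

```python
# A, é, and 😀 written as escapes.
samples = ["A", "\u00e9", "\U0001F600"]

rows = []
for ch in samples:
    code_point = f"U+{ord(ch):04X}"          # the Unicode code point number
    utf8_hex = ch.encode("utf-8").hex(" ")   # the bytes UTF-8 stores for it
    rows.append((code_point, utf8_hex))
    print(f"{code_point}: {utf8_hex}")

# Latin-1 stores é as the single byte e9, which is why the charset matters.
latin1_hex = "\u00e9".encode("latin-1").hex()
print("Latin-1:", latin1_hex)
```

Only for code points below U+0080 does the UTF-8 byte equal the code point number.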

Decoding hex back to text requires the reverse: interpret the byte sequence as UTF-8. If the bytes are not valid UTF-8 (wrong leading bytes, truncated sequences), a strict decoder reports an error even though the hex itself was syntactically fine.

Where hex shows up in real systems

Logs, dumps, and debugging

Packet captures, firmware logs, and xxd-style dumps print bytes as hex so you can spot magic numbers (89 50 4E 47 for PNG), length fields, and corrupted data without a binary viewer.
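A magic-number check is a byte comparison against a known prefix. The sketch below is illustrative: `looks_like_png` and `sample` are hypothetical names, and the sample uses the full eight-byte PNG signature, of which the four bytes quoted above are the start.

```python
# First four bytes of the PNG signature, written as hex.
PNG_MAGIC = bytes.fromhex("89 50 4E 47")

def looks_like_png(data: bytes) -> bool:
    """Return True if the data starts with the PNG magic bytes."""
    return data.startswith(PNG_MAGIC)

# Hypothetical sample: the full 8-byte PNG signature plus placeholder bytes.
sample = bytes.fromhex("89504e470d0a1a0a") + b"placeholder"

print(sample[:8].hex(" "))            # 89 50 4e 47 0d 0a 1a 0a
print(looks_like_png(sample))         # True
print(looks_like_png(b"Hello"))       # False
```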

Cryptography and hashing

Hash functions output fixed-length byte arrays. Tools display them as hex strings (e3b0c442…) because hex is compact and copy-paste friendly. The underlying value is still bytes; hex is only presentation.
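To illustrate that the hex string is only a view of the digest bytes, here is a short sketch using Python's standard `hashlib`. The choice of SHA-256 over empty input is an assumption made for the example; the text above does not say which input produced its truncated digest, although the SHA-256 digest of empty input does begin with the same eight digits.

```python
import hashlib

# The digest itself is raw bytes: 32 of them for SHA-256.
digest = hashlib.sha256(b"").digest()

# The familiar hex string is just those bytes rendered two digits per byte.
hex_view = digest.hex()

print(len(digest))     # 32
print(len(hex_view))   # 64
print(hex_view[:8])    # e3b0c442

# hexdigest() is a convenience for the same rendering.
assert hex_view == hashlib.sha256(b"").hexdigest()
```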

URL percent-encoding

Percent-encoding writes the same byte values as %HH:

UTF-8 byte	Hex alone	In a URL
space (32)	`20`	`%20`
`é` first byte	`c3`	`%C3`
`é` second byte	`a9`	`%A9`

So %C3%A9 and the hex dump c3 a9 describe the same two bytes. Percent-encoding adds % and is constrained by URL grammar; a plain hex dump is the raw byte view.
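The equivalence is easy to confirm with Python's standard `urllib.parse` module. This is a sketch for illustration, not a full URL-building routine:

```python
from urllib.parse import quote, unquote_to_bytes

# quote() encodes the text as UTF-8 and writes each unsafe byte as %HH.
encoded = quote("\u00e9")          # é
print(encoded)                     # %C3%A9

# Reversing the percent-encoding yields the same two bytes a hex dump shows.
raw = unquote_to_bytes(encoded)
print(raw.hex(" "))                # c3 a9

# A space is byte 0x20 in both views.
space = quote(" ")
print(space)                       # %20
```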

HTML numeric entities (hex form)

HTML numeric character references in hex form, written as &#x followed by one or more hex digits and a semicolon, name a Unicode code point, not a UTF-8 byte sequence:

&#x27;   → apostrophe (U+0027)
&#x1F600; → 😀 (U+1F600)

That is markup escaping, not a hex dump of stored bytes. &#xE9; means the character é (one code point), while UTF-8 hex for é is c3 a9 (two bytes). Same letter, different layers, which is another common source of confusion.
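The two layers can be kept apart in code. In this sketch, `to_hex_entity` is a hypothetical helper written for the example, while `html.unescape` is Python's standard-library function. Going from a UTF-8 hex dump to an entity requires decoding the bytes to a character first:

```python
import html

def to_hex_entity(ch: str) -> str:
    """Render one character as a hex numeric character reference."""
    return f"&#x{ord(ch):X};"

# Entities carry code points: one reference per character.
entity = to_hex_entity("\u00e9")                  # é
print(entity)                                     # &#xE9;
print(html.unescape("&#x27;"))                    # '

# A UTF-8 hex dump carries bytes: decode to text before building an entity.
from_dump = bytes.fromhex("c3 a9").decode("utf-8")
print(to_hex_entity(from_dump))                   # &#xE9;
```

Pasting c3 and a9 into markup as two separate references would instead name the two unrelated code points U+00C3 and U+00A9.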

Encoding text → hex (typical workflow)

Take the string as Unicode text.
Encode with UTF-8 (browser TextEncoder, or your language’s UTF-8 encoder).
Format each byte as two hex digits, optionally with spaces or chosen letter case.

Example with Hi!:

Text:     H   i   !
Bytes:    48  69  21
Continuous: 486921
Spaced:     48 69 21

The byte count is the UTF-8 length (3 here), not the number of visible characters. Emoji and many non-Latin letters take more than one byte each.
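The workflow above can be sketched as one small Python function. The name `text_to_hex` and its parameters are assumptions made for this example, not the API of any particular tool:

```python
def text_to_hex(text: str, sep: str = "", upper: bool = False) -> str:
    """Encode text as UTF-8 and format each byte as two hex digits."""
    fmt = "{:02X}" if upper else "{:02x}"
    return sep.join(fmt.format(b) for b in text.encode("utf-8"))

print(text_to_hex("Hi!"))                       # 486921
print(text_to_hex("Hi!", sep=" "))              # 48 69 21
print(text_to_hex("Hi!", sep=" ", upper=True))  # 48 69 21

# Character count and UTF-8 byte count diverge for non-ASCII text.
emoji = "\U0001F600"                            # 😀
print(len(emoji), len(emoji.encode("utf-8")))   # 1 4
```

The letter case is applied per byte rather than to the finished string, so a separator that happens to contain letters is left untouched.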

Decoding hex → text

Decoding is not “read hex as ASCII” unless you know the data is 7-bit ASCII:

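Remove 0x prefixes and separators (spaces, newlines, commas). Drop each 0x as a unit, because its leading 0 is itself a hex digit and would otherwise be kept.
Require an even number of hex digits, since each byte needs exactly two nibbles.
Parse pairs into bytes (48 → decimal 72).
Decode the bytes as UTF-8 to recover text.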

Pitfalls:

Odd digit count: 48656c6c6 is invalid; the last nibble has no partner.
Valid hex, invalid UTF-8: ff fe parses as two bytes but fails strict UTF-8 decoding, because neither ff nor fe ever appears in valid UTF-8.
Wrong charset assumption: Latin-1 hex for é is e9 (one byte); UTF-8 is c3 a9 (two). Pick the encoding your source actually used.
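The steps and pitfalls above can be sketched as a strict decoder in Python. The function name `hex_to_text` is hypothetical. As a deliberate design choice, this sketch rejects unexpected characters instead of silently dropping them, which is stricter than the lenient decoders described earlier:

```python
import re

def hex_to_text(dump: str) -> str:
    """Decode a hex dump (spaces, commas, newlines, 0x prefixes allowed) as UTF-8."""
    # Drop 0x/0X prefixes as a unit: the leading 0 is itself a hex digit.
    cleaned = re.sub(r"\b0[xX]", "", dump)
    # Drop the separators we expect to see in dumps.
    cleaned = re.sub(r"[\s,]", "", cleaned)
    if not re.fullmatch(r"[0-9a-fA-F]*", cleaned):
        raise ValueError("non-hex character in input")
    if len(cleaned) % 2:
        raise ValueError("odd number of hex digits")
    # bytes.decode is strict by default and raises UnicodeDecodeError
    # when the bytes are not valid UTF-8.
    return bytes.fromhex(cleaned).decode("utf-8")

print(hex_to_text("48 65 6c 6c 6f"))                    # Hello
print(hex_to_text("0x48, 0x65, 0x6C,\n0x6c 0x6F"))      # Hello

for bad in ["48656c6c6", "ff fe", "48g65"]:
    try:
        hex_to_text(bad)
    except ValueError as err:   # UnicodeDecodeError is a ValueError subclass
        print(f"{bad!r}: {type(err).__name__}")
```

To decode Latin-1 data instead, the last step would use `.decode("latin-1")`, which accepts every byte value and therefore never reports an invalid sequence.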

Hex vs other byte representations

Representation	What it shows	Typical use
Hex	Two digits per byte, easy to read	Dumps, hashes, debug
Decimal bytes	`72 101 108…`	Teaching, some protocols
Base64	Four ASCII chars per three bytes	JSON, email, data URLs
Binary	8 bits per byte	Low-level bit masks

Hex and Base64 both represent the same underlying bytes; they differ in density and what characters are safe in a given context. For URL query values you still need percent-encoding; for HTML text you need entity escaping—neither is replaced by a hex dump.
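To make the comparison concrete, this short Python sketch renders the same five bytes in each representation from the table, using only the standard library:

```python
import base64

data = "Hello".encode("utf-8")

hex_view = data.hex(" ")                               # two digits per byte
decimal_view = " ".join(str(b) for b in data)          # decimal byte values
base64_view = base64.b64encode(data).decode("ascii")   # 4 chars per 3 bytes, padded
binary_view = " ".join(f"{b:08b}" for b in data)       # 8 bits per byte

print(hex_view)       # 48 65 6c 6c 6f
print(decimal_view)   # 72 101 108 108 111
print(base64_view)    # SGVsbG8=
print(binary_view)    # 01001000 01100101 01101100 01101100 01101111

# Every view decodes back to the same bytes.
assert bytes.fromhex(hex_view) == base64.b64decode(base64_view) == data
```

For five bytes, continuous hex needs 10 characters and Base64 needs 8 (including padding), which is the density difference mentioned above.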

Common pitfalls

Treating hex as “the character code”: 41 is the UTF-8 byte for A, which matches ASCII and U+0041—but multi-byte characters need the full UTF-8 sequence, not one hex pair per letter you see on screen.
Mixing layers: Do not paste a UTF-8 hex dump into HTML as &#x…; entities without converting code points first.
Silent stripping: Decoders that ignore garbage characters can hide typos (48g65 → 4865). Always verify byte count and round-trip when debugging.
Assuming hex hides secrets: Hex is reversible and readable. Encoding is not encryption.
Endianness in multi-byte numbers: Integer fields in binary protocols have a byte order (big-endian or little-endian). The bytes 00 01 are the integer 1 when read big-endian and 256 when read little-endian, and in neither case are they the four-character string "0001", whose UTF-8 bytes are 30 30 30 31. Context matters.
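Two of these pitfalls are easy to reproduce. The following Python sketch is illustrative only: it reads the bytes 00 01 under both byte orders, then shows how lenient stripping turns a typo into plausible-looking output.

```python
import re

# Endianness: the same two bytes give different integers.
raw = bytes.fromhex("00 01")
big = int.from_bytes(raw, "big")
little = int.from_bytes(raw, "little")
as_text = "0001".encode("utf-8").hex(" ")   # the string "0001" is four other bytes

print(big, little)   # 1 256
print(as_text)       # 30 30 30 31

# Silent stripping: dropping every non-hex character hides the stray "g".
typo = "48g65"
stripped = re.sub(r"[^0-9a-fA-F]", "", typo)
decoded = bytes.fromhex(stripped).decode("utf-8")

print(stripped)      # 4865
print(decoded)       # He
```

The intended input may have been three bytes, but the lenient path reports two without any warning, so comparing the byte count against what was expected is what catches the typo.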

When a local hex tool helps

Typical tasks:

Verify UTF-8 for a string before it goes into an API, database, or %-encoded URL.
Paste a log dump (with spaces or 0x prefixes) and recover the original text.
Compare byte length to character count when emoji or accented text behaves oddly in a legacy system.
Inspect decimal byte values alongside hex when documentation lists offsets in decimal.

Because hex conversion is pure string and byte math, it is safe to run on tokens, payloads, or PII in the browser when nothing is uploaded to a server.

Try it locally in your browser

Use the Hex Encoder & Decoder to:

Encode plain text to UTF-8 hex with optional spaces and uppercase or lowercase digits.
Decode flexible hex dumps—non-hex characters are stripped so spaced groups, commas, 0x prefixes, and line breaks all work.
Inspect UTF-8 byte count and decimal byte values in the byte view panel.
Swap output back into the input to chain steps; processing stays in your browser.

For %HH rules in URLs and forms, see the URL Encoder, Decoder & URL Parser. For &lt; and &#x…; in markup, use the HTML Entity Escape & Unescape.