From Encodings to XOR: Building the Foundations of Cryptography
Solving Challenges 1 & 2 of Cryptopals Set 1
Please refer to the earlier posts in this series on encoding & decoding as a prerequisite.
Introduction
Challenges 1 & 2 don't jump straight into encryption & decryption. They start with a conversion exercise: take a hex string and produce its base64 equivalent.
Challenge 1 sets the foundational rule the entire Cryptopals series is built on:
Always operate on raw bytes, never on encoded strings. Only use hex and base64 for pretty-printing.
Challenges 1 and 2, taken together, are designed to force that lesson before any real cryptography begins. If you understand what is actually happening in these two challenges at the byte level, the rest of Set 1 follows naturally.
Challenge 1 - Hex to Base64
Convert this hex string:
49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d
into its base64 representation:
SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t
What it is really teaching
The word "convert" is slightly misleading. There is no direct translation from hex to base64. Both are representations of the same underlying binary data, but they use different grouping schemes: hex represents one nibble (4 bits) per character, and base64 represents one sextet (6 bits) per character. The conversion therefore always passes through raw bytes:
\(\text{hex string} \xrightarrow{\text{decode}} \text{raw bytes} \xrightarrow{\text{encode}} \text{base64 string}\)
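To make the two grouping schemes concrete, here is a minimal Python sketch that regroups the bits of the first three bytes of the challenge input by hand. The variable names are illustrative and not part of the challenge; in practice you would use library routines rather than slicing bit strings.

```python
import base64

# First three bytes of the challenge input (hex "49276d").
chunk = bytes.fromhex("49276d")

# Write the 3 bytes out as one 24-bit string.
bits = "".join(f"{byte:08b}" for byte in chunk)

# Hex groups the bits 4 at a time, base64 groups them 6 at a time.
nibbles = [bits[i:i + 4] for i in range(0, len(bits), 4)]
sextets = [bits[i:i + 6] for i in range(0, len(bits), 6)]

HEX_ALPHABET = "0123456789abcdef"
B64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

# Each group is an index into the corresponding alphabet.
hex_chars = "".join(HEX_ALPHABET[int(n, 2)] for n in nibbles)
b64_chars = "".join(B64_ALPHABET[int(s, 2)] for s in sextets)

print(nibbles, "->", hex_chars)
print(sextets, "->", b64_chars)
```

The same 24 bits yield six hex characters or four base64 characters, which is why three bytes is the natural block size for base64.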
If you decode the hex string from the challenge, you recover the ASCII string:
I'm killing your brain like a poisonous mushroom
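The decode-then-encode path can be sketched in a few lines of Python using only the standard library. The function name `hex_to_base64` and the constant `HEX_INPUT` are chosen here for illustration; this is one possible solution, not the only one.

```python
import base64

def hex_to_base64(hex_string: str) -> str:
    """Decode hex text to raw bytes, then encode those bytes as base64 text."""
    raw = bytes.fromhex(hex_string)               # hex text -> raw bytes
    return base64.b64encode(raw).decode("ascii")  # raw bytes -> base64 text

HEX_INPUT = (
    "49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f"
    "69736f6e6f7573206d757368726f6f6d"
)

raw = bytes.fromhex(HEX_INPUT)
print(raw.decode("ascii"))       # the hidden ASCII message
print(hex_to_base64(HEX_INPUT))  # the base64 form of the same bytes
```

Note that the intermediate value `raw` is a `bytes` object. Hex and base64 appear only at the edges of the function.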
Challenge 2 - Fixed XOR
Take two equal-length hex strings:
1c0111001f010100061a024b53535009181c
686974207468652062756c6c277320657965
XOR them together byte-by-byte and produce the result as a hex string:
746865206b696420646f6e277420706c6179
Why XOR matters in cryptography
XOR is a foundational operation in symmetric cryptography. It is fast, commutative, and reversible: XORing the result with either input recovers the other.
\(A \oplus B = C \implies C \oplus B = A\)
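As a small worked example of this property, the Python sketch below XORs the first byte of each challenge input (`0x1c` and `0x68`) and then undoes the operation. The variable names are illustrative.

```python
a = 0x1c  # first byte of the first challenge input
b = 0x68  # first byte of the second challenge input

c = a ^ b  # XOR compares the two values bit by bit

# Show the operation in binary, then in hex.
print(f"{a:08b} ^ {b:08b} = {c:08b}")
print(f"0x{a:02x} ^ 0x{b:02x} = 0x{c:02x}")

# XORing the result with one input gives back the other input.
print(f"0x{c:02x} ^ 0x{b:02x} = 0x{c ^ b:02x}")
```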
So to solve this challenge:
\(\text{hex}_A \xrightarrow{\text{decode}} \text{bytes}_A \quad\quad \text{hex}_B \xrightarrow{\text{decode}} \text{bytes}_B\)
\(\text{bytes}_A \oplus \text{bytes}_B \xrightarrow{\text{encode}} \text{hex result}\)
Hex decoding happens first. XOR happens on uint8_t values. Hex encoding happens last, only for output. The XOR itself is never performed on characters or strings. It is performed on the underlying byte values, which is the only interpretation that is mathematically meaningful.
Check that both buffers have the same length before XORing. A naive XOR over unequal-length buffers either silently truncates to the shorter one or reads out of bounds.
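Putting the steps together, a minimal Python sketch of the challenge might look like the following. The name `fixed_xor` and the decision to raise `ValueError` on a length mismatch are assumptions made for this example. Each element of a Python `bytes` object is an integer from 0 to 255, which plays the role of the `uint8_t` values mentioned above.

```python
def fixed_xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings, one byte at a time."""
    # zip() would silently stop at the shorter input, so reject
    # mismatched lengths explicitly instead of truncating.
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} != {len(b)}")
    return bytes(x ^ y for x, y in zip(a, b))

HEX_A = "1c0111001f010100061a024b53535009181c"
HEX_B = "686974207468652062756c6c277320657965"

bytes_a = bytes.fromhex(HEX_A)  # decode first
bytes_b = bytes.fromhex(HEX_B)

result = fixed_xor(bytes_a, bytes_b)  # XOR on raw byte values

print(result.hex())  # encode last, only for output
```

Because XOR is reversible, calling `fixed_xor(result, bytes_b)` returns `bytes_a` again.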
Summary
As discussed in the previous posts, the encoding formats (hex, base64) are entry and exit points. Everything in between is bytes.

