Base64 Encoding and Decoding
A byte-level walkthrough of 6-bit regrouping, padding, and why base64 is the standard envelope for binary-in-text protocols
Bytes are the computational truth; every other representation is an interface format
Binary data cannot be safely passed through every channel. Email bodies, JSON fields, HTTP headers, PEM certificates, and HTML data-URIs all assume text, specifically printable ASCII. Raw bytes violate those assumptions the moment a value like 0x00, 0x0A, or anything above 0x7E appears. Systems that handle these bytes as text may truncate, escape, reinterpret, or silently corrupt them.
Base64 solves this by re-encoding binary into a 64-character alphabet drawn entirely from printable ASCII: A-Z, a-z, 0-9, + and /.
Every possible input byte combination is representable, no control characters appear in the output, and the encoded form survives every text-layer transport faithfully. The trade-off is size: base64 output is roughly 33% larger than its input.
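The overhead follows from the grouping: every group of up to three input bytes becomes four output characters. A minimal sketch of that arithmetic, assuming padded output with no line breaks (the name base64EncodedLength is illustrative, not taken from any library):

```cpp
#include <cstddef>

// Length of padded base64 output for n input bytes:
// each group of up to 3 bytes becomes 4 characters.
constexpr std::size_t base64EncodedLength(std::size_t n) {
    return 4 * ((n + 2) / 3);
}

static_assert(base64EncodedLength(0) == 0, "empty input encodes to empty output");
static_assert(base64EncodedLength(3) == 4, "3 bytes -> 4 characters");
static_assert(base64EncodedLength(4) == 8, "a 1-byte remainder still costs 4 characters");
```

For large inputs the ratio approaches 4/3, which is where the roughly 33% figure comes from; short inputs with a remainder pay proportionally more.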
Base64 Encoding
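Base64 encoding works by regrouping bits. Instead of the 8-bit grouping that binary data arrives in, base64 splits the stream into 6-bit units, each of which maps to exactly one character of the 64-character alphabet. Three input bytes (24 bits) therefore become exactly four output characters. The alphabet, in index order 0 to 63, is: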
[ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/]
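The regrouping of one full group can be sketched as follows. The function name regroup is hypothetical; it packs three bytes into a 24-bit value and slices it into four 6-bit indices, most significant bits first.

```cpp
#include <array>
#include <cstdint>

// Split three 8-bit bytes into four 6-bit indices (each in 0..63).
std::array<std::uint8_t, 4> regroup(std::uint8_t b0, std::uint8_t b1, std::uint8_t b2) {
    // Pack the three bytes into one 24-bit value, first byte in the top bits.
    const std::uint32_t bits =
        (std::uint32_t{b0} << 16) | (std::uint32_t{b1} << 8) | b2;
    return {
        static_cast<std::uint8_t>((bits >> 18) & 0x3F),  // bits 23..18
        static_cast<std::uint8_t>((bits >> 12) & 0x3F),  // bits 17..12
        static_cast<std::uint8_t>((bits >> 6) & 0x3F),   // bits 11..6
        static_cast<std::uint8_t>(bits & 0x3F),          // bits 5..0
    };
}
```

For the bytes of "Man" (0x4D, 0x61, 0x6E) the bit stream 01001101 01100001 01101110 regroups to 010011 010110 000101 101110, that is indices 19, 22, 5, 46, which select the characters T, W, F, u.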
When the input length is not divisible by 3, the final 1 or 2 bytes are handled separately. Their bits are padded with zeros up to a multiple of 6, which yields 2 or 3 characters, and = characters are appended (two after a 1-byte remainder, one after a 2-byte remainder) to bring the output to a multiple of 4 characters. Because the padding records how many bytes the final group held, the output can be decoded without any out-of-band length information.
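Putting the full groups and the remainder handling together, an encoder might look like the sketch below. It assumes the standard alphabet with = padding and no line wrapping; base64Encode and kAlphabet are illustrative names, not the author's code.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

std::string base64Encode(const std::vector<std::uint8_t>& in) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve(4 * ((in.size() + 2) / 3));

    std::size_t i = 0;
    // Full groups: 3 bytes (24 bits) -> four 6-bit indices.
    for (; i + 3 <= in.size(); i += 3) {
        const std::uint32_t bits = (std::uint32_t{in[i]} << 16) |
                                   (std::uint32_t{in[i + 1]} << 8) | in[i + 2];
        out += kAlphabet[(bits >> 18) & 0x3F];
        out += kAlphabet[(bits >> 12) & 0x3F];
        out += kAlphabet[(bits >> 6) & 0x3F];
        out += kAlphabet[bits & 0x3F];
    }

    const std::size_t rem = in.size() - i;
    if (rem == 1) {
        // 8 data bits, zero-padded to 12: two characters plus "==".
        const std::uint32_t bits = std::uint32_t{in[i]} << 16;
        out += kAlphabet[(bits >> 18) & 0x3F];
        out += kAlphabet[(bits >> 12) & 0x3F];
        out += "==";
    } else if (rem == 2) {
        // 16 data bits, zero-padded to 18: three characters plus "=".
        const std::uint32_t bits = (std::uint32_t{in[i]} << 16) |
                                   (std::uint32_t{in[i + 1]} << 8);
        out += kAlphabet[(bits >> 18) & 0x3F];
        out += kAlphabet[(bits >> 12) & 0x3F];
        out += kAlphabet[(bits >> 6) & 0x3F];
        out += '=';
    }
    return out;
}
```

With this sketch, the bytes of "Man" encode to TWFu, "Ma" to TWE=, and "M" to TQ==, which shows the three padding cases side by side.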
Base64 Decoding to bytes
Decoding inverts the bit-regrouping: four 6-bit base64 characters collapse back into three 8-bit bytes.
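Each input character is passed through a 256-entry lookup table indexed by the character's byte value. The table maps each character of the base64 alphabet to its index (0-63) and every other byte value to -1, which marks it as invalid. The padding character = is also mapped to -1, so it has to be checked separately before the lookup.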
static const signed char asciiToBase64LookUpTable[256] = { // signed: plain char may be unsigned, which would turn -1 into 255
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, // 0 - 15
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, // 16 - 31
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 62, -1, -1, -1, 63, // 32 - 47
52, 53, 54, 55, 56, 57, 58, 59, 60, 61, -1, -1, -1, -1, -1, -1, // 48 - 63
-1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, // 64 - 79
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, -1, -1, -1, -1, -1, // 80 - 95
-1, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, // 96 - 111
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, -1, -1, -1, -1, -1, // 112 - 127
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, // 128 - 143
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, // 144 - 159
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, // 160 - 175
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, // 176 - 191
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, // 192 - 207
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, // 208 - 223
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, // 224 - 239
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, // 240 - 255
};
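A decoder built on such a table could be sketched as follows. To stay self-contained, this version builds an equivalent table from the alphabet at startup instead of reusing the literal above. The names makeDecodeTable and base64Decode are hypothetical, and the sketch deliberately stays lenient: it stops at the first = and does not check that the input length is a multiple of 4.

```cpp
#include <array>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Build a 256-entry reverse table: alphabet characters map to 0..63,
// every other byte value (including '=') maps to -1.
static std::array<std::int8_t, 256> makeDecodeTable() {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::array<std::int8_t, 256> table;
    table.fill(-1);
    for (int i = 0; i < 64; ++i) {
        table[static_cast<unsigned char>(kAlphabet[i])] = static_cast<std::int8_t>(i);
    }
    return table;
}

std::vector<std::uint8_t> base64Decode(const std::string& in) {
    static const std::array<std::int8_t, 256> table = makeDecodeTable();
    std::vector<std::uint8_t> out;
    std::uint32_t buffer = 0;  // accumulates 6-bit values, newest in the low bits
    int bitCount = 0;          // number of not-yet-emitted bits in buffer
    for (char c : in) {
        if (c == '=') break;   // padding marks the end of the data
        const int value = table[static_cast<unsigned char>(c)];
        if (value < 0) throw std::invalid_argument("invalid base64 character");
        buffer = (buffer << 6) | static_cast<std::uint32_t>(value);
        bitCount += 6;
        if (bitCount >= 8) {
            // A full byte is available: emit the top 8 pending bits.
            bitCount -= 8;
            out.push_back(static_cast<std::uint8_t>((buffer >> bitCount) & 0xFF));
        }
    }
    // Any leftover bits are the encoder's zero padding and are dropped.
    return out;
}
```

Calling base64Decode("TWFu") yields the three bytes of "Man", and base64Decode("TWE=") yields the two bytes of "Ma". Casting each character to unsigned char before indexing matters: a plain char above 0x7F could otherwise become a negative index.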
Why decode to bytes?
Cryptopals challenge inputs encode arbitrary binary, not printable text, so the general order of operations on inputs in these challenges is:
Input arrives as base64 or hex → decode to raw bytes
All transforms (XOR, Hamming distance, key scoring) operate on byte arrays
Output is re-encoded to hex or base64 only for std::cout display
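The same decode, transform, re-encode order can be sketched with hex as the text format and a fixed XOR as the transform. All names here (Bytes, hexToBytes, xorBytes, bytesToHex) are hypothetical helpers written for this illustration, not part of any library.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Value of one hex digit; throws on anything else.
static std::uint8_t hexDigitValue(char c) {
    if (c >= '0' && c <= '9') return static_cast<std::uint8_t>(c - '0');
    if (c >= 'a' && c <= 'f') return static_cast<std::uint8_t>(c - 'a' + 10);
    if (c >= 'A' && c <= 'F') return static_cast<std::uint8_t>(c - 'A' + 10);
    throw std::invalid_argument("invalid hex digit");
}

// Step 1: text -> raw bytes.
Bytes hexToBytes(const std::string& hex) {
    if (hex.size() % 2 != 0) throw std::invalid_argument("odd-length hex string");
    Bytes out;
    out.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        out.push_back(static_cast<std::uint8_t>(
            (hexDigitValue(hex[i]) << 4) | hexDigitValue(hex[i + 1])));
    }
    return out;
}

// Step 2: the transform sees only bytes. Inputs must have equal length.
Bytes xorBytes(const Bytes& a, const Bytes& b) {
    if (a.size() != b.size()) throw std::invalid_argument("length mismatch");
    Bytes out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = static_cast<std::uint8_t>(a[i] ^ b[i]);
    }
    return out;
}

// Step 3: raw bytes -> text, only for display.
std::string bytesToHex(const Bytes& bytes) {
    static const char kDigits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (std::uint8_t b : bytes) {
        out += kDigits[b >> 4];
        out += kDigits[b & 0x0F];
    }
    return out;
}
```

For example, bytesToHex(xorBytes(hexToBytes("1c0111"), hexToBytes("686974"))) produces 746865. Keeping the text formats at the edges means the XOR step never needs to know whether its input started out as hex or base64.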

