Base64, Base32, Base16

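All three are encoding/decoding schemes. The number X in BaseX is the size of the alphabet, that is, how many distinct characters the encoded output can use.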

encode

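  1. If the source data is a string, convert it to bytes.
  2. Split the bits into fixed-size groups. Taking Base16 as an example, the lookup table is '0123456789ABCDEF'; 16 symbols need 4 bits, so each group is 4 bits.
  3. Treat each group as a number and use it as an index into the lookup table to get a character. For Base16, a 4-bit unsigned integer ranges over 0~15, which maps to 0~F. The output is 8/4 the size of the input, i.e. twice as large.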
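The three steps above can be sketched as one generic function. This is a minimal illustration, not a reference implementation: the name `basex_encode` and its parameters are made up for this example, and it assumes the alphabet size is a power of two and that no '=' padding is wanted.

```python
import base64


def basex_encode(data: bytes, alphabet: str, bits_per_char: int) -> str:
    """Encode bytes with a power-of-two alphabet (hypothetical helper, no '=' padding)."""
    # Step 1 is assumed done already: `data` is bytes, not str.
    bits = ''.join(format(byte, '08b') for byte in data)
    # Zero-pad on the right so the last group is complete.
    bits += '0' * (-len(bits) % bits_per_char)
    # Steps 2 and 3: split into groups and look each group up in the table.
    return ''.join(
        alphabet[int(bits[i:i + bits_per_char], 2)]
        for i in range(0, len(bits), bits_per_char)
    )


BASE16 = '0123456789ABCDEF'
encoded = basex_encode('zZA_A'.encode(), BASE16, 4)
print(encoded)  # 7A5A415F41
print(encoded == base64.b16encode(b'zZA_A').decode())  # True
```

Building a bit string is slow for large inputs, but it keeps the grouping step visible.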

Base16 Encode

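  1. Take the string zZA_A as an example
  2. Convert to ASCII codes: [122, 90, 65, 95, 65]
  3. Write as bits: 0111101001011010010000010101111101000001
  4. Split into groups of 4: ['0111', '1010', '0101', '1010', '0100', '0001', '0101', '1111', '0100', '0001']
  5. Convert to decimal: [7, 10, 5, 10, 4, 1, 5, 15, 4, 1]
  6. Look up in the table: 7A5A415F41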
In [1]: [ord(x) for x in 'zZA_A']
Out[1]: [122, 90, 65, 95, 65]

In [2]: [bin(ord(x))[2:].zfill(8) for x in 'zZA_A']
Out[2]: ['01111010', '01011010', '01000001', '01011111', '01000001']

In [3]: bits = ''.join([bin(ord(x))[2:].zfill(8) for x in 'zZA_A'])

In [4]: bits
Out[4]: '0111101001011010010000010101111101000001'

In [5]: l = len(bits)

In [6]: [bits[i:i+4] for i in range(0, l, 4)]
Out[6]:
['0111',
 '1010',
 '0101',
 '1010',
 '0100',
 '0001',
 '0101',
 '1111',
 '0100',
 '0001']

In [7]: [int(x,2) for x in [bits[i:i+4] for i in range(0, l, 4)]]
Out[7]: [7, 10, 5, 10, 4, 1, 5, 15, 4, 1]
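Because a Base16 group is exactly half a byte, the indices can also be taken straight from each byte with a shift and a mask, without building a bit string. The snippet below is a small sketch of that idea, followed by the table lookup and a cross-check against the standard library; the variable names are only illustrative.

```python
import base64

ALPHABET = '0123456789ABCDEF'
data = 'zZA_A'.encode('ascii')

indices = []
for byte in data:
    indices.append(byte >> 4)    # high 4 bits of the byte
    indices.append(byte & 0x0F)  # low 4 bits of the byte

encoded = ''.join(ALPHABET[i] for i in indices)
print(indices)  # [7, 10, 5, 10, 4, 1, 5, 15, 4, 1]
print(encoded)  # 7A5A415F41
print(encoded == base64.b16encode(data).decode())  # True
```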

Base32 Encode

Same procedure as above. The differences are that 32 symbols need 5 bits per group, and the output is 8/5 the size of the input.

In [1]: bits = ''.join([bin(ord(x))[2:].zfill(8) for x in 'zZA_A'])

In [2]: l = len(bits)

In [3]: [bits[i:i+5] for i in range(0, l, 5)]
Out[3]: ['01111', '01001', '01101', '00100', '00010', '10111', '11010', '00001']

In [4]: [int(x,2) for x in [bits[i:i+5] for i in range(0, l, 5)]]
Out[4]: [15, 9, 13, 4, 2, 23, 26, 1]
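The session stops at the indices. A possible sketch of the remaining steps for Base32 follows: look each index up in the table, then append '=' until the output length is a multiple of 8, which is what `base64.b32encode` produces. The function name `b32encode_sketch` is hypothetical. The 5-byte example happens to fill 8 characters exactly, so a 2-byte input is included to show the padding.

```python
import base64

ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567'


def b32encode_sketch(data: bytes) -> str:
    """Illustrative Base32 encoder with '=' padding (not the stdlib implementation)."""
    bits = ''.join(format(byte, '08b') for byte in data)
    bits += '0' * (-len(bits) % 5)  # complete the last 5-bit group with zeros
    chars = ''.join(
        ALPHABET[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5)
    )
    return chars + '=' * (-len(chars) % 8)  # pad the output to a multiple of 8


print(b32encode_sketch(b'zZA_A'))  # PJNECX2B
print(b32encode_sketch(b'zZ'))     # PJNA====
print(b32encode_sketch(b'zZ') == base64.b32encode(b'zZ').decode())  # True
```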

Base64 Encode

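64 symbols need 6 bits per group, and the output is 8/6 the size of the input. The 40 input bits do not divide evenly by 6, so the last group is zero-padded on the right.

In [5]: padded = bits + '0' * (-l % 6)

In [6]: [padded[i:i+6] for i in range(0, len(padded), 6)]
Out[6]: ['011110', '100101', '101001', '000001', '010111', '110100', '000100']

In [7]: [int(x,2) for x in [padded[i:i+6] for i in range(0, len(padded), 6)]]
Out[7]: [30, 37, 41, 1, 23, 52, 4]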
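The 8/4, 8/5 and 8/6 ratios describe the output before padding. The standard-library encoders also append '=' so that Base32 output is a multiple of 8 characters and Base64 output a multiple of 4, which makes the real length slightly larger for some inputs. The sketch below computes the expected lengths; the helper `encoded_lengths` is made up for this example.

```python
import base64
from math import ceil


def encoded_lengths(n: int) -> dict:
    """Expected output length, '=' padding included, for n input bytes."""
    return {
        'Base16': 2 * n,            # 2 characters per byte
        'Base32': 8 * ceil(n / 5),  # 8 characters per 5-byte block
        'Base64': 4 * ceil(n / 3),  # 4 characters per 3-byte block
    }


data = b'zZA_A'
print(encoded_lengths(len(data)))       # {'Base16': 10, 'Base32': 8, 'Base64': 8}
print(base64.b64encode(data).decode())  # elpBX0E=
```

For the 5-byte example, Base64 yields 7 data characters plus one '=', so it ends up the same length as Base32.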

decode

class BaseX:
    # Subclasses define `alphabet` (the lookup table) and
    # `base_size` (the number of bits per encoded character).

    @staticmethod
    def truncate(x, length):
        # Keep only the lowest `length` bits of x
        return x & ((2**length) - 1)

    def __init__(self):
        amount_alphabet = len(self.alphabet)
        if amount_alphabet != 2**self.base_size:
            raise ValueError('alphabet size must equal 2**base_size')
        self.decoder = {}
        self.encoder = {}
        for i in range(amount_alphabet):
            self.decoder[self.alphabet[i]] = i
            self.encoder[i] = self.alphabet[i]

    def _decode(self, code):
        buffer = 0
        buffer_len = 0
        for c in code:
            if c not in self.decoder:
                # Skip characters outside the alphabet, such as '=' padding
                continue
            buffer <<= self.base_size
            buffer |= self.decoder[c]
            buffer_len += self.base_size
            if buffer_len >= 8:
                # Emit the top 8 bits as one character, keep the rest
                yield chr(buffer >> (buffer_len - 8))
                buffer_len -= 8
                buffer = self.truncate(buffer, buffer_len)
        # Leftover bits that do not fill a whole byte are dropped

    def decode(self, code):
        result = self._decode(code)
        result = ''.join(result)
        return result


class Base16(BaseX):
    alphabet = '0123456789ABCDEF'
    base_size = 4


class Base32(BaseX):
    alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567'
    base_size = 5


class Base64(BaseX):
    alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
    base_size = 6


def get_random_string():
    from random import choice, randint
    string = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
    rand_str = ''.join(choice(string) for _ in range(randint(5, 40)))
    return rand_str


if __name__ == '__main__':
    import base64

    func = (
        ('Base16', base64.b16encode, Base16().decode),
        ('Base32', base64.b32encode, Base32().decode),
        ('Base64', base64.b64encode, Base64().decode),
    )

    for name, builtin, implemented in func:
        rand_str = get_random_string()
        code = builtin(rand_str.encode()).decode()
        print(name, ':', rand_str, '->', code)
        assert rand_str == implemented(code)
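The decoder above emits one `chr()` per byte, so it returns a str and round-trips ASCII input such as the random test strings. For arbitrary binary data or multi-byte UTF-8 text, a variant that collects raw bytes may be more convenient. The following is a sketch of the same buffer technique under that assumption; `basex_decode` is a hypothetical standalone function, not part of the class above.

```python
import base64


def basex_decode(code: str, alphabet: str, bits_per_char: int) -> bytes:
    """Decode to bytes using the same bit-buffer approach (illustrative only)."""
    lookup = {ch: i for i, ch in enumerate(alphabet)}
    buffer = 0
    buffer_len = 0
    out = bytearray()
    for ch in code:
        if ch not in lookup:
            continue  # skips '=' padding and any other stray character
        buffer = (buffer << bits_per_char) | lookup[ch]
        buffer_len += bits_per_char
        if buffer_len >= 8:
            buffer_len -= 8
            out.append(buffer >> buffer_len)  # top 8 bits form one byte
            buffer &= (1 << buffer_len) - 1   # keep only the leftover bits
    return bytes(out)  # leftover bits short of a full byte are dropped


B64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
text = 'naïve café'
code = base64.b64encode(text.encode('utf-8')).decode('ascii')
print(basex_decode(code, B64, 6).decode('utf-8') == text)  # True
```

Like the class-based decoder, this silently ignores invalid characters instead of raising an error, which is lenient compared with a strict decoder.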