
Base64, Base32, Base16

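All three are encoding/decoding schemes. The number X in BaseX is the number of distinct characters (the size of the alphabet) used in the encoded output.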
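As a reference point, Python's standard `base64` module implements all three encodings with the RFC 4648 alphabets. Encoding the example string zZA_A used in this post gives the results the hand-worked examples should reproduce (for Base32 and Base64 the library also applies RFC 4648 `=` padding when needed):

```python
import base64

data = b'zZA_A'

# Standard-library encoders, used here only as a reference for the manual steps
print(base64.b16encode(data))  # b'7A5A415F41'
print(base64.b32encode(data))  # b'PJNECX2B'
print(base64.b64encode(data))  # b'elpBX0E='
```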

encode

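  1. If the input is a string, convert it to bytes.
  2. Split the bits into groups of a fixed size. For Base16 the lookup table is '0123456789ABCDEF'; since 16 = 2^4, each group is 4 bits.
  3. Treat each group as an unsigned integer and use it as an index into the lookup table to get one output character. For Base16 a 4-bit value ranges from 0 to 15 and maps to 0 to F. Every 4 input bits become one 8-bit character, so the output is 8/4 = 2 times the size of the input.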
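The three steps above can be condensed into one generic function. This is a minimal sketch, not the author's code: `encode_bits` is a hypothetical name, it assumes the alphabet size is a power of two, and it zero-pads the last bit group but does not append the `=` padding that standard Base32/Base64 add.

```python
def encode_bits(data: bytes, alphabet: str) -> str:
    """Hypothetical helper (not from the original post).

    Splits `data` into fixed-size bit groups and maps each group to a character.
    Assumes len(alphabet) is a power of two; no '=' padding is added.
    """
    group = len(alphabet).bit_length() - 1  # bits per character: 16 -> 4, 32 -> 5, 64 -> 6
    if len(alphabet) != 1 << group:
        raise ValueError('alphabet size must be a power of two')
    # Step 1: bytes -> one long bit string, 8 bits per byte
    bits = ''.join(format(b, '08b') for b in data)
    # Step 2: cut into fixed-size groups, right-padding the last one with zeros
    groups = [bits[i:i + group].ljust(group, '0') for i in range(0, len(bits), group)]
    # Step 3: each group is an index into the lookup table
    return ''.join(alphabet[int(g, 2)] for g in groups)


print(encode_bits('zZA_A'.encode(), '0123456789ABCDEF'))  # 7A5A415F41
```

Passing the Base32 or Base64 alphabet instead yields PJNECX2B and elpBX0E (the latter without the trailing `=`).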

Base16 Encode

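  1. Take the string zZA_A as an example
  2. Convert to ASCII codes: [122, 90, 65, 95, 65]
  3. Write each code as 8 bits: 0111101001011010010000010101111101000001
  4. Split into groups of 4 bits: ['0111', '1010', '0101', '1010', '0100', '0001', '0101', '1111', '0100', '0001']
  5. Convert each group to decimal: [7, 10, 5, 10, 4, 1, 5, 15, 4, 1]
  6. Look up each value in the table: 7A5A415F41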
```
In [1]: [ord(x) for x in 'zZA_A']
Out[1]: [122, 90, 65, 95, 65]

In [2]: [bin(ord(x))[2:].zfill(8) for x in 'zZA_A']
Out[2]: ['01111010', '01011010', '01000001', '01011111', '01000001']

In [3]: bits = ''.join([bin(ord(x))[2:].zfill(8) for x in 'zZA_A'])

In [4]: bits
Out[4]: '0111101001011010010000010101111101000001'

In [5]: l = len(bits)

In [6]: [bits[i:i+4] for i in range(0, l, 4)]  # step by 4 so the groups don't overlap
Out[6]:
['0111',
 '1010',
 '0101',
 '1010',
 '0100',
 '0001',
 '0101',
 '1111',
 '0100',
 '0001']

In [7]: [int(x,2) for x in [bits[i:i+4] for i in range(0, l, 4)]]
Out[7]: [7, 10, 5, 10, 4, 1, 5, 15, 4, 1]
```
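Because 8 is a multiple of 4, Base16 never actually needs a bit string: each byte splits exactly into a high and a low 4-bit group. A sketch of that shortcut, using a hypothetical helper named `base16_encode` (the standard library's `bytes.hex()` produces the same digits in lowercase):

```python
HEX_ALPHABET = '0123456789ABCDEF'


def base16_encode(data: bytes) -> str:
    """Hypothetical helper (not from the original post): Base16 via nibble arithmetic."""
    out = []
    for b in data:
        out.append(HEX_ALPHABET[b >> 4])    # high 4 bits
        out.append(HEX_ALPHABET[b & 0x0F])  # low 4 bits
    return ''.join(out)


print(base16_encode('zZA_A'.encode()))  # 7A5A415F41
```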

Base32 Encode

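The process is the same as above. The differences are that 32 = 2^5, so each group is 5 bits, and every 5 input bits become one 8-bit character, so the output grows to 8/5 of the input size.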

```
In [1]: bits = ''.join([bin(ord(x))[2:].zfill(8) for x in 'zZA_A'])

In [2]: l = len(bits)

In [3]: [bits[i:i+5] for i in range(0, l, 5)]
Out[3]: ['01111', '01001', '01101', '00100', '00010', '10111', '11010', '00001']

In [4]: [int(x,2) for x in [bits[i:i+5] for i in range(0, l, 5)]]
Out[4]: [15, 9, 13, 4, 2, 23, 26, 1]
```
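The zZA_A example happens to be 40 bits, a multiple of 5, so no padding appears. In RFC 4648 Base32 the input is handled in 5-byte (40-bit) blocks that each yield 8 characters; a short final block is zero-padded and the output is filled up to 8 characters with `=`. A sketch under those assumptions, using integer shifts instead of bit strings (`base32_encode` is a hypothetical name, checked against `base64.b32encode`):

```python
import base64

B32_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567'


def base32_encode(data: bytes) -> str:
    """Hypothetical sketch (not from the original post) of RFC 4648 Base32."""
    out = []
    for start in range(0, len(data), 5):
        block = data[start:start + 5]
        n_chars = -(-len(block) * 8 // 5)  # ceil(real bits / 5): characters carrying data
        value = int.from_bytes(block.ljust(5, b'\0'), 'big')  # zero-padded 40-bit integer
        chars = [B32_ALPHABET[(value >> (35 - 5 * i)) & 0x1F] for i in range(8)]
        out.append(''.join(chars[:n_chars]) + '=' * (8 - n_chars))
    return ''.join(out)


print(base32_encode(b'zZA_A'))  # PJNECX2B
print(base32_encode(b'zZ'))     # PJNA====
print(base32_encode(b'zZ') == base64.b32encode(b'zZ').decode())  # True
```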

Base64 Encode

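64 = 2^6, so each group is 6 bits, and the output grows to 8/6 (that is, 4/3) of the input size. Here the 40 input bits are not a multiple of 6, so the last group has only 4 bits; it is right-padded with zero bits to a full 6 bits. Standard Base64 then also appends '=' so that the encoded length is a multiple of 4.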

```
In [5]: [bits[i:i+6].ljust(6, '0') for i in range(0, l, 6)]  # zero-pad the short last group
Out[5]: ['011110', '100101', '101001', '000001', '010111', '110100', '000100']

In [6]: [int(x,2) for x in [bits[i:i+6].ljust(6, '0') for i in range(0, l, 6)]]
Out[6]: [30, 37, 41, 1, 23, 52, 4]
```
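For Base64, 3 bytes (24 bits) is exactly four 6-bit groups, so implementations commonly read the input in 3-byte chunks. The sketch below assumes the standard RFC 4648 alphabet and `=` padding; `base64_encode` is a hypothetical name, and its result is checked against `base64.b64encode`:

```python
import base64

B64_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'


def base64_encode(data: bytes) -> str:
    """Hypothetical sketch (not from the original post) of standard Base64."""
    out = []
    for start in range(0, len(data), 3):
        chunk = data[start:start + 3]
        value = int.from_bytes(chunk.ljust(3, b'\0'), 'big')  # zero-padded 24-bit integer
        n_chars = len(chunk) + 1  # 1 byte -> 2 chars, 2 bytes -> 3, 3 bytes -> 4
        chars = [B64_ALPHABET[(value >> (18 - 6 * i)) & 0x3F] for i in range(4)]
        out.append(''.join(chars[:n_chars]) + '=' * (4 - n_chars))
    return ''.join(out)


print(base64_encode(b'zZA_A'))  # elpBX0E=
print(base64_encode(b'zZA_A') == base64.b64encode(b'zZA_A').decode())  # True
```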

decode

```python
import base64
from random import choice, randint


class BaseX:
    """Decoder base class. Subclasses set `alphabet` (the lookup table) and
    `base_size` (bits per character, so len(alphabet) == 2 ** base_size)."""

    @staticmethod
    def truncate(x, length):
        # keep only the lowest `length` bits of x
        return x & ((2**length) - 1)

    def __init__(self):
        amount_alphabet = len(self.alphabet)
        if amount_alphabet != 2**self.base_size:
            raise ValueError('alphabet must contain exactly 2 ** base_size characters')
        # decoder: character -> index; encoder: index -> character
        self.decoder = {}
        self.encoder = {}
        for i in range(amount_alphabet):
            self.decoder[self.alphabet[i]] = i
            self.encoder[i] = self.alphabet[i]

    def _decode(self, code):
        buffer = 0      # bits read but not yet emitted
        buffer_len = 0  # number of valid bits in buffer
        for c in code:
            if c not in self.decoder:
                continue  # skip characters outside the alphabet, e.g. '=' padding
            buffer <<= self.base_size
            buffer |= self.decoder[c]
            buffer_len += self.base_size
            if buffer_len >= 8:
                # emit the top 8 bits as one byte and keep the remainder
                yield buffer >> (buffer_len - 8)
                buffer_len -= 8
                buffer = self.truncate(buffer, buffer_len)
        # leftover bits that don't fill a whole byte are the encoder's padding: dropped

    def decode(self, code):
        # return bytes (like base64.b64decode) so non-ASCII data round-trips correctly
        return bytes(self._decode(code))


class Base16(BaseX):
    alphabet = '0123456789ABCDEF'
    base_size = 4


class Base32(BaseX):
    alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567'
    base_size = 5


class Base64(BaseX):
    alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
    base_size = 6


def get_random_string():
    charset = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
    return ''.join(choice(charset) for _ in range(randint(5, 40)))


if __name__ == '__main__':
    func = (
        ('Base16', base64.b16encode, Base16().decode),
        ('Base32', base64.b32encode, Base32().decode),
        ('Base64', base64.b64encode, Base64().decode),
    )

    # Encode with the standard library, decode with our implementation, compare
    for name, builtin, implemented in func:
        rand_str = get_random_string()
        code = builtin(rand_str.encode()).decode()
        print(name, ':', rand_str, '->', code)
        assert rand_str.encode() == implemented(code)
```
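To see what the buffer loop in `_decode` does, the worked example can be reversed step by step: map each character back to its index, write the index as a fixed-width bit group, then cut the bits into bytes and drop the leftover. This is an illustrative alternative that uses bit strings rather than an integer buffer, not the author's implementation; `decode_trace` is a hypothetical name.

```python
B64_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'


def decode_trace(code: str, alphabet: str) -> bytes:
    """Hypothetical helper (not from the original post), mirroring BaseX._decode."""
    size = len(alphabet).bit_length() - 1  # bits per character
    index = {ch: i for i, ch in enumerate(alphabet)}
    # character -> index -> fixed-width bit group; '=' and other unknown characters are skipped
    bits = ''.join(format(index[c], f'0{size}b') for c in code if c in index)
    # keep only whole bytes; the leftover bits are the zero padding added when encoding
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


print(decode_trace('elpBX0E=', B64_ALPHABET))  # b'zZA_A'
```

Here 'elpBX0E' carries 7 × 6 = 42 bits; the last 2 are padding, leaving exactly the 40 bits of zZA_A.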