Supported Encodings¶
chardet recognises 99 Python text encodings via
86 detection targets and their aliases, across six
encoding eras.
The default ALL era enables detection of all
encodings. Use MODERN_WEB to restrict
candidates to the encodings most commonly found on the web today.
Modern Web¶
Encoding |
Aliases |
Multi-byte |
|---|---|---|
ascii |
us-ascii |
No |
big5hkscs |
big5, big5-tw, csbig5, cp950 |
Yes |
cp874 |
windows-874 |
No |
cp932 |
ms932, mskanji, ms-kanji |
Yes |
cp949 |
ms949, uhc |
Yes |
euc-jis-2004 |
euc-jp, eucjp, ujis, u-jis, euc-jisx0213 |
Yes |
euc-kr |
euckr |
Yes |
gb18030 |
gb-18030, gb2312, gbk |
Yes |
hz-gb-2312 |
hz |
Yes |
iso-2022-kr |
csiso2022kr |
Yes |
iso2022-jp-2 |
iso-2022-jp, csiso2022jp, iso2022-jp-1 |
Yes |
iso2022-jp-2004 |
iso2022-jp-3 |
Yes |
iso2022-jp-ext |
— |
Yes |
koi8-r |
koi8r |
No |
koi8-u |
koi8u |
No |
shift_jis_2004 |
shift_jis, sjis, shiftjis, s_jis, shift-jisx0213 |
Yes |
tis-620 |
tis620, iso-8859-11 |
No |
utf-16 |
utf16 |
No |
utf-16-be |
utf-16be |
No |
utf-16-le |
utf-16le |
No |
utf-32 |
utf32 |
No |
utf-32-be |
utf-32be |
No |
utf-32-le |
utf-32le |
No |
utf-7 |
utf7 |
No |
utf-8 |
utf8 |
No |
utf-8-sig |
utf-8-bom |
No |
windows-1250 |
cp1250 |
No |
windows-1251 |
cp1251 |
No |
windows-1252 |
cp1252 |
No |
windows-1253 |
cp1253 |
No |
windows-1254 |
cp1254 |
No |
windows-1255 |
cp1255 |
No |
windows-1256 |
cp1256 |
No |
windows-1257 |
cp1257 |
No |
windows-1258 |
cp1258 |
No |
Legacy ISO¶
Encoding |
Aliases |
Multi-byte |
|---|---|---|
iso-8859-1 |
latin-1, latin1, iso8859-1 |
No |
iso-8859-10 |
latin-6, latin6, iso8859-10 |
No |
iso-8859-13 |
latin-7, latin7, iso8859-13 |
No |
iso-8859-14 |
latin-8, latin8, iso8859-14 |
No |
iso-8859-15 |
latin-9, latin9, iso8859-15 |
No |
iso-8859-16 |
latin-10, latin10, iso8859-16 |
No |
iso-8859-2 |
latin-2, latin2, iso8859-2 |
No |
iso-8859-3 |
latin-3, latin3, iso8859-3 |
No |
iso-8859-4 |
latin-4, latin4, iso8859-4 |
No |
iso-8859-5 |
iso8859-5, cyrillic |
No |
iso-8859-6 |
iso8859-6, arabic |
No |
iso-8859-7 |
iso8859-7, greek |
No |
iso-8859-8 |
iso8859-8, hebrew |
No |
iso-8859-9 |
latin-5, latin5, iso8859-9 |
No |
johab |
— |
Yes |
Legacy Mac¶
Encoding |
Aliases |
Multi-byte |
|---|---|---|
mac-cyrillic |
maccyrillic |
No |
mac-greek |
macgreek |
No |
mac-iceland |
maciceland |
No |
mac-latin2 |
maclatin2, maccentraleurope |
No |
mac-roman |
macroman, macintosh |
No |
mac-turkish |
macturkish |
No |
Legacy Regional¶
Encoding |
Aliases |
Multi-byte |
|---|---|---|
cp1006 |
— |
No |
cp1125 |
— |
No |
cp720 |
— |
No |
hp-roman8 |
roman8, r8, csHPRoman8 |
No |
koi8-t |
— |
No |
kz-1048 |
kz1048, strk1048-2002, rk1048 |
No |
ptcp154 |
pt154, cp154 |
No |
DOS¶
Encoding |
Aliases |
Multi-byte |
|---|---|---|
cp437 |
— |
No |
cp737 |
— |
No |
cp775 |
— |
No |
cp850 |
— |
No |
cp852 |
— |
No |
cp855 |
— |
No |
cp856 |
— |
No |
cp857 |
— |
No |
cp858 |
— |
No |
cp860 |
— |
No |
cp861 |
— |
No |
cp862 |
— |
No |
cp863 |
— |
No |
cp864 |
— |
No |
cp865 |
— |
No |
cp866 |
— |
No |
cp869 |
— |
No |
Mainframe (EBCDIC)¶
Encoding |
Aliases |
Multi-byte |
|---|---|---|
cp1026 |
— |
No |
cp1140 |
cp037 |
No |
cp273 |
— |
No |
cp424 |
— |
No |
cp500 |
— |
No |
cp875 |
— |
No |