Supported Encodings

chardet recognises 99 Python text encodings via 86 detection targets and their aliases, across six encoding eras. The default ALL era enables detection of all encodings. Use MODERN_WEB to restrict candidates to the encodings most commonly found on the web today.
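Because the names listed here are described as Python text encodings, one quick way to see how a given name or alias resolves on your interpreter is the standard library's `codecs.lookup`. The sketch below is illustrative only: the helper `resolve_codec` is a hypothetical name and not part of chardet, and some aliases in the tables may be chardet's own labels rather than spellings registered with Python.

```python
import codecs
from typing import Optional


def resolve_codec(name: str) -> Optional[str]:
    """Return Python's canonical codec name for `name`, or None if unknown.

    Hypothetical helper for illustration; not part of chardet.
    """
    try:
        return codecs.lookup(name).name
    except LookupError:
        # The interpreter has no codec registered under this spelling.
        return None


# A few names taken from the tables in this section.
for label in ("us-ascii", "latin-1", "windows-1252", "euckr", "utf8"):
    print(f"{label!r} -> {resolve_codec(label)!r}")
```

A `None` result only means that this interpreter has no codec registered under that spelling; it says nothing about whether chardet can report the encoding.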

Modern Web

| Encoding | Aliases | Multi-byte |
| --- | --- | --- |
| ascii | us-ascii | No |
| big5hkscs | big5, big5-tw, csbig5, cp950 | Yes |
| cp874 | windows-874 | No |
| cp932 | ms932, mskanji, ms-kanji | Yes |
| cp949 | ms949, uhc | Yes |
| euc-jis-2004 | euc-jp, eucjp, ujis, u-jis, euc-jisx0213 | Yes |
| euc-kr | euckr | Yes |
| gb18030 | gb-18030, gb2312, gbk | Yes |
| hz-gb-2312 | hz | Yes |
| iso-2022-kr | csiso2022kr | Yes |
| iso2022-jp-2 | iso-2022-jp, csiso2022jp, iso2022-jp-1 | Yes |
| iso2022-jp-2004 | iso2022-jp-3 | Yes |
| iso2022-jp-ext | | Yes |
| koi8-r | koi8r | No |
| koi8-u | koi8u | No |
| shift_jis_2004 | shift_jis, sjis, shiftjis, s_jis, shift-jisx0213 | Yes |
| tis-620 | tis620, iso-8859-11 | No |
| utf-16 | utf16 | No |
| utf-16-be | utf-16be | No |
| utf-16-le | utf-16le | No |
| utf-32 | utf32 | No |
| utf-32-be | utf-32be | No |
| utf-32-le | utf-32le | No |
| utf-7 | utf7 | No |
| utf-8 | utf8 | No |
| utf-8-sig | utf-8-bom | No |
| windows-1250 | cp1250 | No |
| windows-1251 | cp1251 | No |
| windows-1252 | cp1252 | No |
| windows-1253 | cp1253 | No |
| windows-1254 | cp1254 | No |
| windows-1255 | cp1255 | No |
| windows-1256 | cp1256 | No |
| windows-1257 | cp1257 | No |
| windows-1258 | cp1258 | No |

Legacy ISO

| Encoding | Aliases | Multi-byte |
| --- | --- | --- |
| iso-8859-1 | latin-1, latin1, iso8859-1 | No |
| iso-8859-10 | latin-6, latin6, iso8859-10 | No |
| iso-8859-13 | latin-7, latin7, iso8859-13 | No |
| iso-8859-14 | latin-8, latin8, iso8859-14 | No |
| iso-8859-15 | latin-9, latin9, iso8859-15 | No |
| iso-8859-16 | latin-10, latin10, iso8859-16 | No |
| iso-8859-2 | latin-2, latin2, iso8859-2 | No |
| iso-8859-3 | latin-3, latin3, iso8859-3 | No |
| iso-8859-4 | latin-4, latin4, iso8859-4 | No |
| iso-8859-5 | iso8859-5, cyrillic | No |
| iso-8859-6 | iso8859-6, arabic | No |
| iso-8859-7 | iso8859-7, greek | No |
| iso-8859-8 | iso8859-8, hebrew | No |
| iso-8859-9 | latin-5, latin5, iso8859-9 | No |
| johab | | Yes |

Legacy Mac

| Encoding | Aliases | Multi-byte |
| --- | --- | --- |
| mac-cyrillic | maccyrillic | No |
| mac-greek | macgreek | No |
| mac-iceland | maciceland | No |
| mac-latin2 | maclatin2, maccentraleurope | No |
| mac-roman | macroman, macintosh | No |
| mac-turkish | macturkish | No |

Legacy Regional

| Encoding | Aliases | Multi-byte |
| --- | --- | --- |
| cp1006 | | No |
| cp1125 | | No |
| cp720 | | No |
| hp-roman8 | roman8, r8, csHPRoman8 | No |
| koi8-t | | No |
| kz-1048 | kz1048, strk1048-2002, rk1048 | No |
| ptcp154 | pt154, cp154 | No |

DOS

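| Encoding | Aliases | Multi-byte |
| --- | --- | --- |
| cp437 | | No |
| cp737 | | No |
| cp775 | | No |
| cp850 | | No |
| cp852 | | No |
| cp855 | | No |
| cp856 | | No |
| cp857 | | No |
| cp858 | | No |
| cp860 | | No |
| cp861 | | No |
| cp862 | | No |
| cp863 | | No |
| cp864 | | No |
| cp865 | | No |
| cp866 | | No |
| cp869 | | No |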

Mainframe (EBCDIC)

| Encoding | Aliases | Multi-byte |
| --- | --- | --- |
| cp1026 | | No |
| cp1140 | cp037 | No |
| cp273 | | No |
| cp424 | | No |
| cp500 | | No |
| cp875 | | No |
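
The Mainframe table lists cp037 as an alias of cp1140. In Python's standard codecs the two are separate but closely related code pages. The sketch below uses only the standard library and makes no claim about how chardet itself treats the pair; it decodes every byte value with both codecs and collects the positions where the results differ.

```python
# Compare Python's cp037 and cp1140 codecs byte by byte.
# This is an illustration with the standard library only, not chardet code.
differences = []
for value in range(256):
    raw = bytes([value])
    as_cp037 = raw.decode("cp037")
    as_cp1140 = raw.decode("cp1140")
    if as_cp037 != as_cp1140:
        # Record the byte value and the character each codec produces.
        differences.append((hex(value), as_cp037, as_cp1140))

for entry in differences:
    print(entry)
```

Byte 0x9F is one such position: Python decodes it as the currency sign (U+00A4) under cp037 and as the euro sign (U+20AC) under cp1140, so text decoded under one name can differ from the other at that byte.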