Supported encodings

chardet supports over 70 character encodings, organized into tiers by the EncodingEra enum. By default, detect() only considers MODERN_WEB encodings. Pass encoding_era=EncodingEra.ALL to consider all tiers, or combine specific tiers with |.

MODERN_WEB

Encodings widely used on the modern web. This is the default tier for detect() and detect_all().

Unicode

  • UTF-8

  • UTF-8-SIG

  • UTF-16 (BE and LE variants)

  • UTF-32 (BE and LE variants)

Multi-byte (CJK)

  • Big5 (Traditional Chinese)

  • CP932 (Japanese)

  • CP949 (Korean)

  • EUC-JP (Japanese)

  • EUC-KR (Korean)

  • GB18030 (Simplified Chinese)

  • HZ-GB-2312 (Simplified Chinese)

  • ISO-2022-JP (Japanese)

  • ISO-2022-KR (Korean)

  • Shift-JIS (Japanese)

Single-byte (Windows)

  • ASCII

  • CP874 (Thai)

  • KOI8-R (Russian, Ukrainian)

  • KOI8-U (Ukrainian)

  • TIS-620 (Thai)

  • Windows-1250 (Croatian, Czech, Hungarian, Polish, Romanian, Slovak, Slovene)

  • Windows-1251 (Belarusian, Bulgarian, Macedonian, Russian, Serbian, Ukrainian)

  • Windows-1252 (Danish, Dutch, English, Finnish, French, German, Indonesian, Italian, Malay, Norwegian, Portuguese, Spanish, Swedish)

  • Windows-1253 (Greek)

  • Windows-1254 (Turkish)

  • Windows-1255 (Hebrew)

  • Windows-1256 (Arabic, Farsi)

  • Windows-1257 (Estonian, Latvian, Lithuanian)

  • Windows-1258 (Vietnamese)

LEGACY_ISO

ISO 8859 family and other well-known legacy standards. Include this tier when working with older web content or Unix-era text.

  • ISO-8859-1 (Danish, Dutch, English, Finnish, French, German, Icelandic, Indonesian, Italian, Malay, Norwegian, Portuguese, Spanish, Swedish)

  • ISO-8859-2 (Croatian, Czech, Hungarian, Polish, Romanian, Slovak, Slovene)

  • ISO-8859-3 (Esperanto, Maltese, Turkish)

  • ISO-8859-4 (Estonian, Latvian, Lithuanian)

  • ISO-8859-5 (Belarusian, Bulgarian, Macedonian, Russian, Serbian, Ukrainian)

  • ISO-8859-6 (Arabic, Farsi)

  • ISO-8859-7 (Greek)

  • ISO-8859-8 (Hebrew — visual and logical)

  • ISO-8859-9 (Turkish)

  • ISO-8859-10 (Icelandic)

  • ISO-8859-11 (Thai)

  • ISO-8859-13 (Estonian, Latvian, Lithuanian)

  • ISO-8859-14 (Breton, Irish, Scottish Gaelic, Welsh)

  • ISO-8859-15 (Danish, Dutch, Finnish, French, Italian, Norwegian, Portuguese, Spanish, Swedish)

  • ISO-8859-16 (Croatian, Hungarian, Polish, Romanian, Slovak, Slovene)

  • Johab (Korean)

LEGACY_MAC

Apple Macintosh-specific encodings.

  • MacCyrillic (Belarusian, Bulgarian, Macedonian, Russian, Serbian, Ukrainian)

  • MacGreek (Greek)

  • MacIceland (Icelandic)

  • MacLatin2 (Croatian, Czech, Hungarian, Polish, Romanian, Slovak, Slovene)

  • MacRoman (Danish, Dutch, English, Finnish, French, German, Icelandic, Indonesian, Italian, Malay, Norwegian, Portuguese, Spanish, Swedish)

  • MacTurkish (Turkish)

LEGACY_REGIONAL

Uncommon regional and national encodings.

  • CP720 (Arabic)

  • CP1006 (Urdu)

  • CP1125 (Ukrainian)

  • KOI8-T (Tajik)

  • KZ1048 (Kazakh)

  • PTCP154 (Kazakh)

DOS

DOS/OEM code pages.

  • CP437 (English)

  • CP737 (Greek)

  • CP775 (Estonian, Latvian, Lithuanian)

  • CP850 (Danish, Dutch, English, Finnish, French, German, Italian, Norwegian, Portuguese, Spanish, Swedish)

  • CP852 (Croatian, Czech, Hungarian, Polish, Romanian, Slovak, Slovene)

  • CP855 (Bulgarian, Macedonian, Russian, Serbian, Ukrainian)

  • CP856 (Hebrew)

  • CP857 (Turkish)

  • CP858 (Danish, Dutch, English, Finnish, French, German, Italian, Norwegian, Portuguese, Spanish, Swedish)

  • CP860 (Portuguese)

  • CP861 (Icelandic)

  • CP862 (Hebrew)

  • CP863 (French)

  • CP864 (Arabic)

  • CP865 (Danish, Norwegian)

  • CP866 (Belarusian, Russian, Ukrainian)

  • CP869 (Greek)

MAINFRAME

IBM EBCDIC mainframe encodings.

  • CP037 (Breton, Danish, Dutch, English, Finnish, French, German, Icelandic, Indonesian, Irish, Italian, Malay, Norwegian, Portuguese, Scottish Gaelic, Spanish, Swedish, Welsh)

  • CP424 (Hebrew)

  • CP500 (Breton, Danish, Dutch, English, Finnish, French, German, Icelandic, Indonesian, Irish, Italian, Malay, Norwegian, Portuguese, Scottish Gaelic, Spanish, Swedish, Welsh)

  • CP875 (Greek)

  • CP1026 (Turkish)