API Reference

Top-level Functions

chardet.detect(byte_str, should_rename_legacy=False, encoding_era=<EncodingEra.ALL: 63>, chunk_size=65536, max_bytes=200000, *, prefer_superset=False, compat_names=True, include_encodings=None, exclude_encodings=None, no_match_encoding='cp1252', empty_input_encoding='utf-8')

Detect the encoding of the given byte string.

Parameters:

byte_str (bytes | bytearray) – The byte sequence to detect encoding for.
should_rename_legacy (bool) – Deprecated alias for prefer_superset.
encoding_era (EncodingEra) – Restrict candidate encodings to the given era.
chunk_size (int) – Deprecated – accepted for backward compatibility but has no effect.
max_bytes (int) – Maximum number of bytes to examine from byte_str.
prefer_superset (bool) – If True, remap ISO subset encodings to their Windows/CP superset equivalents (e.g., ISO-8859-1 -> Windows-1252).
compat_names (bool) – If True (default), return encoding names compatible with chardet 5.x/6.x. If False, return raw Python codec names.
include_encodings (Iterable[str] | None) – If given, restrict detection to only these encodings (names or aliases).
exclude_encodings (Iterable[str] | None) – If given, remove these encodings from the candidate set.
no_match_encoding (str) – Encoding to return when no candidate survives the pipeline. Defaults to "cp1252".
empty_input_encoding (str) – Encoding to return for empty input. Defaults to "utf-8".

Returns:

A dictionary with keys "encoding", "confidence", and "language".

Return type:

DetectionDict
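
As a minimal sketch of consuming the returned dictionary, the helper below decodes bytes using a DetectionDict-shaped mapping. The name decode_with_detection and its fallback parameter are hypothetical and not part of chardet; the fallback value only mirrors the documented no_match_encoding default.

```python
from typing import Mapping


def decode_with_detection(
    data: bytes,
    detection: Mapping[str, object],
    fallback: str = "cp1252",  # hypothetical parameter, mirrors no_match_encoding
) -> str:
    """Decode data using the 'encoding' key of a detection mapping."""
    # 'encoding' is typed as str | None, so guard against a missing value.
    encoding = detection.get("encoding") or fallback
    try:
        return data.decode(str(encoding), errors="replace")
    except LookupError:
        # The reported name is not a codec Python knows; use the fallback.
        return data.decode(fallback, errors="replace")
```

In practice the mapping would come from a call such as chardet.detect(data).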

chardet.detect_all(byte_str, ignore_threshold=False, should_rename_legacy=False, encoding_era=<EncodingEra.ALL: 63>, chunk_size=65536, max_bytes=200000, *, prefer_superset=False, compat_names=True, include_encodings=None, exclude_encodings=None, no_match_encoding='cp1252', empty_input_encoding='utf-8')

Detect all possible encodings of the given byte string.

When ignore_threshold is False (the default), results with confidence <= MINIMUM_THRESHOLD (0.20) are filtered out. If no result exceeds the threshold, the full unfiltered list is returned as a fallback, so the caller always receives at least one result.

Parameters:

byte_str (bytes | bytearray) – The byte sequence to detect encoding for.
ignore_threshold (bool) – If True, return all candidate encodings regardless of confidence score.
should_rename_legacy (bool) – Deprecated alias for prefer_superset.
encoding_era (EncodingEra) – Restrict candidate encodings to the given era.
chunk_size (int) – Deprecated – accepted for backward compatibility but has no effect.
max_bytes (int) – Maximum number of bytes to examine from byte_str.
prefer_superset (bool) – If True, remap ISO subset encodings to their Windows/CP superset equivalents.
compat_names (bool) – If True (default), return encoding names compatible with chardet 5.x/6.x. If False, return raw Python codec names.
include_encodings (Iterable[str] | None) – If given, restrict detection to only these encodings (names or aliases).
exclude_encodings (Iterable[str] | None) – If given, remove these encodings from the candidate set.
no_match_encoding (str) – Encoding to return when no candidate survives the pipeline. Defaults to "cp1252".
empty_input_encoding (str) – Encoding to return for empty input. Defaults to "utf-8".

Returns:

A list of dictionaries, sorted by descending confidence.

Return type:

list[DetectionDict]
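
The threshold rule described for detect_all() can be sketched in plain Python. This is an illustration of the documented behavior, not the library's implementation; the function name filter_results is hypothetical.

```python
from typing import Dict, List

MINIMUM_THRESHOLD = 0.20  # value taken from the documentation above


def filter_results(
    results: List[Dict[str, object]],
    ignore_threshold: bool = False,
    threshold: float = MINIMUM_THRESHOLD,
) -> List[Dict[str, object]]:
    """Sort by descending confidence, then apply the threshold with fallback."""
    ranked = sorted(results, key=lambda r: r["confidence"], reverse=True)
    if ignore_threshold:
        return ranked
    # Results with confidence <= threshold are dropped.
    kept = [r for r in ranked if r["confidence"] > threshold]
    # If nothing survives, fall back to the full list so the caller
    # still receives every candidate that was passed in.
    return kept or ranked
```

A result sitting exactly at 0.20 is dropped, because the comparison is strict.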

UniversalDetector

class chardet.UniversalDetector(lang_filter=<LanguageFilter.ALL: 31>, should_rename_legacy=False, encoding_era=<EncodingEra.ALL: 63>, max_bytes=200000, *, prefer_superset=False, compat_names=True, include_encodings=None, exclude_encodings=None, no_match_encoding='cp1252', empty_input_encoding='utf-8')

Streaming character encoding detector.

Implements a feed/close pattern for incremental detection of character encoding from byte streams. Compatible with the chardet 6.x API.

All detection is performed by the same pipeline used by chardet.detect() and chardet.detect_all(), ensuring consistent results regardless of which API is used.

Note

This class is not thread-safe. Each thread should create its own UniversalDetector instance.

Parameters:

lang_filter (LanguageFilter)
should_rename_legacy (bool)
encoding_era (EncodingEra)
max_bytes (int)
prefer_superset (bool)
compat_names (bool)
include_encodings (Iterable[str] | None)
exclude_encodings (Iterable[str] | None)
no_match_encoding (str)
empty_input_encoding (str)

MINIMUM_THRESHOLD = 0.2

LEGACY_MAP: ClassVar[MappingProxyType] = mappingproxy({'ascii': 'cp1252', 'euc_kr': 'cp949', 'iso8859-1': 'cp1252', 'iso8859-2': 'cp1250', 'iso8859-5': 'cp1251', 'iso8859-6': 'cp1256', 'iso8859-7': 'cp1253', 'iso8859-8': 'cp1255', 'iso8859-9': 'cp1254', 'iso8859-11': 'cp874', 'iso8859-13': 'cp1257', 'tis-620': 'cp874'})
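
To illustrate how a mapping like LEGACY_MAP could drive the prefer_superset remapping, here is a small sketch. The function to_superset is hypothetical, and the lowercase key lookup is an assumption; the documentation does not say how the library normalizes names before consulting the map.

```python
from types import MappingProxyType
from typing import Optional

# Copied from the LEGACY_MAP value shown above.
LEGACY_MAP = MappingProxyType({
    "ascii": "cp1252", "euc_kr": "cp949",
    "iso8859-1": "cp1252", "iso8859-2": "cp1250", "iso8859-5": "cp1251",
    "iso8859-6": "cp1256", "iso8859-7": "cp1253", "iso8859-8": "cp1255",
    "iso8859-9": "cp1254", "iso8859-11": "cp874", "iso8859-13": "cp1257",
    "tis-620": "cp874",
})


def to_superset(encoding: Optional[str], prefer_superset: bool = False) -> Optional[str]:
    """Return the superset name for a subset encoding when remapping is on."""
    if encoding is None or not prefer_superset:
        return encoding
    # Assumption: keys are matched case-insensitively; unknown names pass through.
    return LEGACY_MAP.get(encoding.lower(), encoding)
```

Names without an entry, such as utf-8, are returned unchanged.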

feed(byte_str)

Feed a chunk of bytes to the detector.

Data is accumulated in an internal buffer. Once max_bytes have been buffered, done is set to True and further data is ignored until reset() is called.

Parameters: byte_str (bytes | bytearray) – The next chunk of bytes to examine.
Raises: ValueError – If called after close() without a reset().
Return type: None

close()

Finalize detection and return the best result.

Runs the full detection pipeline on the buffered data.

Returns: A dictionary with keys "encoding", "confidence", and "language".
Return type: DetectionDict

reset()

Reset the detector to its initial state for reuse.

Return type: None

property done: bool

Whether detection is complete and no more data is needed.

property result: DetectionDict

The current best detection result.
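
The feed/close pattern can be wrapped in a small helper. The sketch below relies only on the members documented above (reset(), feed(), done, close()); the helper name detect_stream is hypothetical, and the detector is passed in rather than constructed so the block stays independent of any particular chardet version.

```python
from typing import Any, Dict, Iterable


def detect_stream(detector: Any, chunks: Iterable[bytes]) -> Dict[str, Any]:
    """Feed chunks until the detector reports done, then finalize.

    `detector` is expected to expose reset(), feed(), done, and close(),
    as documented for UniversalDetector.
    """
    # feed() raises ValueError after close() without reset(), so reset
    # first to make a reused detector safe.
    detector.reset()
    for chunk in chunks:
        detector.feed(chunk)
        if detector.done:
            # Further data would be ignored, so stop reading early.
            break
    return detector.close()
```

A typical call passes a chardet.UniversalDetector() instance together with an iterator that reads a file in fixed-size blocks. Because the class is not thread-safe, each thread should pass its own detector.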

Enumerations

class chardet.EncodingEra(*values)

Bit flags representing encoding eras for filtering detection candidates.

MODERN_WEB = 1

LEGACY_ISO = 2

LEGACY_MAC = 4

LEGACY_REGIONAL = 8

DOS = 16

MAINFRAME = 32

ALL = 63
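
Because the eras are bit flags, they can be combined with bitwise operators. The sketch below uses a local stand-in class named Era that copies the values listed above; it is an assumption that the real EncodingEra behaves like a standard-library IntFlag.

```python
from enum import IntFlag


class Era(IntFlag):
    """Local mirror of the documented EncodingEra values (illustrative only)."""
    MODERN_WEB = 1
    LEGACY_ISO = 2
    LEGACY_MAC = 4
    LEGACY_REGIONAL = 8
    DOS = 16
    MAINFRAME = 32
    ALL = 63


# Combine two eras with bitwise OR.
web_and_iso = Era.MODERN_WEB | Era.LEGACY_ISO

# ALL contains MAINFRAME, so XOR clears exactly that bit.
without_mainframe = Era.ALL ^ Era.MAINFRAME

# ALL is the union of the six single-bit eras.
union = (Era.MODERN_WEB | Era.LEGACY_ISO | Era.LEGACY_MAC
         | Era.LEGACY_REGIONAL | Era.DOS | Era.MAINFRAME)
```

With the real enum, a combined value of this kind would be passed as the encoding_era argument.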

class chardet.LanguageFilter(*values)

Language filter flags for UniversalDetector (chardet 6.x API compat).

Accepted but not used: the detection pipeline does not filter by language group.

Deprecated: Retained only for backward compatibility with chardet 6.x callers. Will be removed in a future major version.

CHINESE_SIMPLIFIED = 1

CHINESE_TRADITIONAL = 2

JAPANESE = 4

KOREAN = 8

NON_CJK = 16

ALL = 31

CHINESE = 3

CJK = 15

Result Types

class chardet.DetectionResult(encoding, confidence, language, mime_type=None)

A single encoding detection result.

Frozen dataclass holding the encoding name, confidence score, optional language identifier, and optional MIME type returned by the detection pipeline.

Parameters:

encoding (str | None)
confidence (float)
language (str | None)
mime_type (str | None)

encoding: str | None

confidence: float

language: str | None

mime_type: str | None

to_dict()

Convert this result to a plain dict.

Returns: A dict with 'encoding', 'confidence', 'language', and 'mime_type' keys.
Return type: DetectionDict
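
For readers unfamiliar with frozen dataclasses, the following sketch shows the general shape of such a result type. ResultSketch is a hypothetical stand-in built from the fields listed above, not the library's class, and its to_dict() uses dataclasses.asdict as one plausible way to produce the documented keys.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ResultSketch:
    """Illustrative stand-in with the same fields as DetectionResult."""
    encoding: Optional[str]
    confidence: float
    language: Optional[str]
    mime_type: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        # Produces the keys 'encoding', 'confidence', 'language', 'mime_type'.
        return asdict(self)
```

Because the dataclass is frozen, assigning to a field after construction raises an error, which keeps a result safe to share between callers.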

class chardet.DetectionDict

Dictionary representation of a detection result.

Returned by chardet.detect(), chardet.detect_all(), and chardet.UniversalDetector.result.

encoding: str | None

confidence: float

language: str | None

mime_type: str | None

Constants

chardet.DEFAULT_MAX_BYTES: int = 200000

Default maximum number of bytes to examine during detection.

chardet.MINIMUM_THRESHOLD: float = 0.20

Default minimum confidence threshold for filtering results in chardet.detect_all().