API Reference

Top-level Functions

chardet.detect(byte_str, should_rename_legacy=True, encoding_era=<EncodingEra.ALL: 63>, chunk_size=65536, max_bytes=200000)

Detect the encoding of the given byte string.

Parameters match chardet 6.x for backward compatibility. chunk_size is accepted but has no effect.

Parameters:

byte_str (bytes | bytearray) – The byte sequence to detect encoding for.
should_rename_legacy (bool) – If True (the default), remap legacy encoding names to their modern equivalents.
encoding_era (EncodingEra) – Restrict candidate encodings to the given era.
chunk_size (int) – Deprecated – accepted for backward compatibility but has no effect.
max_bytes (int) – Maximum number of bytes to examine from byte_str.

Returns:

A dictionary with keys "encoding", "confidence", and "language".

Return type:

DetectionDict
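In practice the result is usually passed straight to bytes.decode(). The helper below is a hypothetical sketch, not part of chardet: it takes a DetectionDict-shaped result (a local TypedDict stands in for chardet.DetectionDict so the block runs on its own), falls back to an assumed default of "utf-8" when "encoding" is None, and decodes with errors="replace" so that a wrong guess degrades to replacement characters instead of raising.

```python
from typing import Optional, TypedDict


class DetectionDict(TypedDict):
    # Local stand-in for chardet.DetectionDict so the sketch is self-contained.
    encoding: Optional[str]
    confidence: float
    language: Optional[str]


def decode_detected(data: bytes, result: DetectionDict, fallback: str = "utf-8") -> str:
    """Hypothetical helper: decode `data` using a detection result.

    `fallback` is an assumption of this sketch, not a chardet default; it is
    used only when the detector reports no encoding.
    """
    encoding = result["encoding"] or fallback
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
    return data.decode(encoding, errors="replace")


# Hand-written result with illustrative values, not real detector output.
sample = "café".encode("windows-1252")
print(decode_detected(sample, {"encoding": "Windows-1252", "confidence": 0.9, "language": None}))
# prints: café
```

With the real library, the result argument would be the return value of chardet.detect(data).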

chardet.detect_all(byte_str, ignore_threshold=False, should_rename_legacy=True, encoding_era=<EncodingEra.ALL: 63>, chunk_size=65536, max_bytes=200000)

Detect all possible encodings of the given byte string.

Parameters match chardet 6.x for backward compatibility. chunk_size is accepted but has no effect.

When ignore_threshold is False (the default), results with confidence <= MINIMUM_THRESHOLD (0.20) are filtered out. If all results are below the threshold, the full unfiltered list is returned as a fallback so the caller always receives at least one result.
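The filtering rule can be expressed in a few lines. This is a sketch of the documented behavior, not chardet's implementation; filter_candidates is a hypothetical name, and the input list stands in for the pipeline's raw candidates.

```python
from typing import Optional, TypedDict


class DetectionDict(TypedDict):
    # Local stand-in for chardet.DetectionDict.
    encoding: Optional[str]
    confidence: float
    language: Optional[str]


MINIMUM_THRESHOLD = 0.20  # documented default, mirrored locally


def filter_candidates(
    results: list[DetectionDict],
    ignore_threshold: bool = False,
    threshold: float = MINIMUM_THRESHOLD,
) -> list[DetectionDict]:
    """Sketch of the documented detect_all() filtering rule (not chardet's code)."""
    # Results are documented as sorted by descending confidence.
    ordered = sorted(results, key=lambda r: r["confidence"], reverse=True)
    if ignore_threshold:
        return ordered
    # Keep only candidates strictly above the threshold.
    kept = [r for r in ordered if r["confidence"] > threshold]
    # Fallback: if everything is at or below the threshold, return the full list.
    return kept if kept else ordered
```

A candidate at exactly 0.20 is dropped, because the rule keeps only confidences strictly above the threshold. The fallback branch guarantees a non-empty result only when the pipeline produced at least one candidate.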

Parameters:

byte_str (bytes | bytearray) – The byte sequence to detect encoding for.
ignore_threshold (bool) – If True, return all candidate encodings regardless of confidence score.
should_rename_legacy (bool) – If True (the default), remap legacy encoding names to their modern equivalents.
encoding_era (EncodingEra) – Restrict candidate encodings to the given era.
chunk_size (int) – Deprecated – accepted for backward compatibility but has no effect.
max_bytes (int) – Maximum number of bytes to examine from byte_str.

Returns:

A list of dictionaries, each with keys "encoding", "confidence", and "language", sorted by descending confidence.

Return type:

list[DetectionDict]

UniversalDetector

class chardet.UniversalDetector(lang_filter=<LanguageFilter.ALL: 31>, should_rename_legacy=True, encoding_era=<EncodingEra.ALL: 63>, max_bytes=200000)

Streaming character encoding detector.

Implements a feed/close pattern for incremental detection of character encoding from byte streams. Compatible with the chardet 6.x API.

All detection is performed by the same pipeline used by chardet.detect() and chardet.detect_all(), ensuring consistent results regardless of which API is used.
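A typical feed/close loop reads the input in blocks and stops early once done becomes True. The sketch below relies only on the documented feed(), done, and close() members; detect_stream, the StreamingDetector protocol, and the 4096-byte block size are illustrative assumptions (the block size is unrelated to the deprecated chunk_size parameter).

```python
from typing import BinaryIO, Protocol


class StreamingDetector(Protocol):
    # Structural type covering only the documented members this loop uses.
    @property
    def done(self) -> bool: ...

    def feed(self, byte_str: bytes) -> None: ...

    def close(self) -> dict: ...


def detect_stream(detector: StreamingDetector, stream: BinaryIO, block_size: int = 4096) -> dict:
    """Hypothetical helper: feed `stream` to `detector` until it is done or the stream ends."""
    while not detector.done:
        chunk = stream.read(block_size)
        if not chunk:  # end of stream
            break
        detector.feed(chunk)
    # close() finalizes detection and returns the best result.
    return detector.close()
```

For a file, this could be called as detect_stream(chardet.UniversalDetector(), fh) with fh opened in binary mode. To reuse the same detector for another input, call reset() first.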

Note

This class is not thread-safe. Each thread should create its own UniversalDetector instance.

Parameters:

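lang_filter (LanguageFilter) – Accepted for chardet 6.x compatibility but not used; see LanguageFilter.
should_rename_legacy (bool) – If True (the default), remap legacy encoding names to their modern equivalents.
encoding_era (EncodingEra) – Restrict candidate encodings to the given era.
max_bytes (int) – Maximum number of bytes to buffer; once reached, done becomes True.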

MINIMUM_THRESHOLD = 0.2

LEGACY_MAP: ClassVar[MappingProxyType] = mappingproxy({'ascii': 'Windows-1252', 'euc-kr': 'CP949', 'iso-8859-1': 'Windows-1252', 'iso-8859-2': 'Windows-1250', 'iso-8859-5': 'Windows-1251', 'iso-8859-6': 'Windows-1256', 'iso-8859-7': 'Windows-1253', 'iso-8859-8': 'Windows-1255', 'iso-8859-9': 'Windows-1254', 'iso-8859-11': 'CP874', 'iso-8859-13': 'Windows-1257', 'tis-620': 'CP874'})
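LEGACY_MAP is a read-only MappingProxyType keyed by lowercase legacy names. The sketch below shows how such a table could be applied when should_rename_legacy is True; rename_legacy is hypothetical, only a subset of the documented entries is copied, and the case-insensitive lookup is an assumption about how detected names are matched against the keys.

```python
from types import MappingProxyType
from typing import Mapping, Optional

# A few entries copied from the documented LEGACY_MAP (not the full table).
LEGACY_MAP_SUBSET = MappingProxyType({
    "ascii": "Windows-1252",
    "iso-8859-1": "Windows-1252",
    "euc-kr": "CP949",
    "tis-620": "CP874",
})


def rename_legacy(encoding: Optional[str], legacy_map: Mapping[str, str] = LEGACY_MAP_SUBSET) -> Optional[str]:
    """Hypothetical illustration of should_rename_legacy=True.

    Assumption: names are lowercased before lookup; names not in the map
    are returned unchanged.
    """
    if encoding is None:
        return None
    return legacy_map.get(encoding.lower(), encoding)
```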

feed(byte_str)

Feed a chunk of bytes to the detector.

Data is accumulated in an internal buffer. Once max_bytes bytes have been buffered, done is set to True and further data is ignored until reset() is called.

Parameters:

byte_str (bytes | bytearray) – The next chunk of bytes to examine.

Raises:

ValueError – If called after close() without a reset().

Return type:

None
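The buffering rules above amount to a small state machine. FeedBufferSketch is a hypothetical model of that behavior, not chardet's code: its close() simply returns the buffered bytes where UniversalDetector would run the detection pipeline, and truncating the chunk that crosses max_bytes is an assumption (the text only says that data beyond the limit is ignored).

```python
from __future__ import annotations


class FeedBufferSketch:
    """Hypothetical model of the documented feed()/close()/reset() rules."""

    def __init__(self, max_bytes: int = 200_000) -> None:
        self.max_bytes = max_bytes
        self.reset()

    def reset(self) -> None:
        # Return to the initial state so the object can be reused.
        self._buffer = bytearray()
        self._closed = False
        self._done = False

    @property
    def done(self) -> bool:
        return self._done

    def feed(self, byte_str: bytes | bytearray) -> None:
        if self._closed:
            raise ValueError("feed() called after close(); call reset() first")
        if self._done:
            return  # data past max_bytes is ignored
        remaining = self.max_bytes - len(self._buffer)
        # Assumption: keep at most max_bytes, truncating the chunk that crosses the limit.
        self._buffer.extend(byte_str[:remaining])
        if len(self._buffer) >= self.max_bytes:
            self._done = True

    def close(self) -> bytes:
        # The real detector would run detection here; the sketch returns the buffer.
        self._closed = True
        return bytes(self._buffer)
```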

close()

Finalize detection and return the best result.

Runs the full detection pipeline on the buffered data.

Returns:

A dictionary with keys "encoding", "confidence", and "language".

Return type:

DetectionDict

reset()

Reset the detector to its initial state for reuse.

Return type:

None

property done: bool

Whether detection is complete and no more data is needed.

property result: DetectionDict

The current best detection result.

Enumerations

class chardet.EncodingEra(*values)

Bit flags representing encoding eras for filtering detection candidates.

MODERN_WEB = 1

LEGACY_ISO = 2

LEGACY_MAC = 4

LEGACY_REGIONAL = 8

DOS = 16

MAINFRAME = 32

ALL = 63
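Because each era is a distinct power of two, eras can be combined with | and tested with & or in. The sketch mirrors the documented values in a local IntFlag named EncodingEraSketch; whether chardet.EncodingEra is itself an IntFlag, and whether encoding_era accepts a combined value such as MODERN_WEB | LEGACY_ISO, are assumptions not stated here.

```python
from enum import IntFlag


class EncodingEraSketch(IntFlag):
    """Local mirror of the documented EncodingEra values (not imported from chardet)."""

    MODERN_WEB = 1
    LEGACY_ISO = 2
    LEGACY_MAC = 4
    LEGACY_REGIONAL = 8
    DOS = 16
    MAINFRAME = 32
    ALL = 63  # 1 | 2 | 4 | 8 | 16 | 32


# Combine eras with |; test membership with `in` or &.
web_and_iso = EncodingEraSketch.MODERN_WEB | EncodingEraSketch.LEGACY_ISO
```

ALL is simply the union of every individual era bit, which is why its value is 63.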

class chardet.LanguageFilter(*values)

Language filter flags for UniversalDetector, retained for chardet 6.x API compatibility.

Accepted but not used: the detection pipeline does not filter by language group.

Deprecated: Retained only for backward compatibility with chardet 6.x callers. Will be removed in a future major version.

CHINESE_SIMPLIFIED = 1

CHINESE_TRADITIONAL = 2

JAPANESE = 4

KOREAN = 8

NON_CJK = 16

ALL = 31

CHINESE = 3

CJK = 15

Result Types

class chardet.DetectionResult(encoding, confidence, language)

A single encoding detection result.

Frozen dataclass holding the encoding name, confidence score, and optional language identifier returned by the detection pipeline.

Parameters:

encoding (str | None)
confidence (float)
language (str | None)

encoding: str | None

confidence: float

language: str | None

to_dict()

Convert this result to a plain dict.

Returns:

A dict with "encoding", "confidence", and "language" keys.

Return type:

DetectionDict
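DetectionResultSketch below mirrors the documented fields of chardet.DetectionResult in a local frozen dataclass, together with a to_dict() that returns the three documented keys. It is a stand-in for illustration, not the library's source; the TypedDict is likewise a local substitute for chardet.DetectionDict.

```python
from dataclasses import dataclass
from typing import Optional, TypedDict


class DetectionDict(TypedDict):
    # Local stand-in for chardet.DetectionDict.
    encoding: Optional[str]
    confidence: float
    language: Optional[str]


@dataclass(frozen=True)
class DetectionResultSketch:
    """Local stand-in mirroring the documented fields of chardet.DetectionResult."""

    encoding: Optional[str]
    confidence: float
    language: Optional[str]

    def to_dict(self) -> DetectionDict:
        # Build the plain-dict form with the three documented keys.
        return {
            "encoding": self.encoding,
            "confidence": self.confidence,
            "language": self.language,
        }
```

Because the dataclass is frozen, assigning to a field raises dataclasses.FrozenInstanceError. dataclasses.asdict() would build the same mapping; an explicit to_dict() keeps the return type annotated as DetectionDict.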

class chardet.DetectionDict

Dictionary representation of a detection result.

Returned by chardet.detect(), chardet.UniversalDetector.close(), and chardet.UniversalDetector.result. chardet.detect_all() returns a list of these dictionaries.

encoding: str | None

confidence: float

language: str | None

Constants

chardet.DEFAULT_MAX_BYTES: int = 200000

Default maximum number of bytes to examine during detection.

chardet.MINIMUM_THRESHOLD: float = 0.20

Default minimum confidence threshold for filtering results in chardet.detect_all().