Frequently Asked Questions

Why does detect() return None for encoding?

chardet returns None when the data appears to be binary rather than text. This happens when the data contains null bytes or a high proportion of control characters that don’t match any known text encoding.

import chardet

result = chardet.detect(b"\x00\x01\x02\x03")
# {'encoding': None, 'confidence': 0.95, 'language': None, 'mime_type': 'application/octet-stream'}
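Because the encoding can be None, callers usually need a guard before decoding. The following is a minimal sketch of one way to do that, not part of chardet itself: decode_text is a hypothetical helper, and it takes the detection function as a parameter so that chardet.detect (or any function returning a dict with an 'encoding' key) can be passed in.

```python
from typing import Callable, Optional


def decode_text(data: bytes, detect: Callable[[bytes], dict]) -> Optional[str]:
    """Decode data with the detected encoding, or return None for binary data.

    `detect` is expected to behave like chardet.detect: it returns a dict
    whose 'encoding' value is a codec name or None.
    """
    result = detect(data)
    encoding = result.get("encoding")
    if encoding is None:
        # The detector judged the input to be binary; do not guess.
        return None
    # errors="replace" keeps the call from raising on a near-miss guess.
    return data.decode(encoding, errors="replace")


# Typical use (assumes chardet is installed):
#   import chardet
#   text = decode_text(payload, chardet.detect)
#   if text is None:
#       ...  # treat payload as binary
```

Returning None rather than raising keeps the binary case explicit at the call site; raising a ValueError would be an equally reasonable choice.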

How do I increase accuracy?

  • Provide more data. The default limit of 200,000 bytes is generous and most detections converge well within that. If you are passing very short strings (under a few hundred bytes), providing more data may help.

  • Restrict the encoding era. By default, chardet considers all supported encodings. If you know your data only uses modern web encodings, pass encoding_era=EncodingEra.MODERN_WEB to narrow the candidate set and reduce false positives.

  • Use detect_all(). If the top result is wrong, the correct encoding may be the second candidate. chardet.detect_all() returns all candidates ranked by confidence.

  • Use encoding filters. If you know exactly which encodings are possible, pass include_encodings to restrict the candidate set. Alternatively, use exclude_encodings to remove known false positives.
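As an illustration of the detect_all() tip, the sketch below walks a ranked candidate list and picks the first encoding that decodes the data without errors. first_decodable is a hypothetical helper, not a chardet function, and it only assumes that each candidate is a dict with an 'encoding' key, as in the detect() result shown earlier. A clean decode is a weak signal, since many single-byte encodings accept any byte sequence, so treat this as a fallback heuristic rather than a correctness check.

```python
from typing import Iterable, Optional


def first_decodable(data: bytes, candidates: Iterable[dict]) -> Optional[str]:
    """Return the first candidate encoding that decodes `data` strictly.

    `candidates` is assumed to be ordered by confidence, highest first,
    like the list returned by chardet.detect_all().
    """
    for candidate in candidates:
        encoding = candidate.get("encoding")
        if not encoding:
            # Skip binary/unknown candidates.
            continue
        try:
            data.decode(encoding)  # strict mode: raises on invalid bytes
        except (UnicodeDecodeError, LookupError):
            # Invalid bytes for this codec, or a codec Python does not know.
            continue
        return encoding
    return None


# Typical use (assumes chardet is installed):
#   import chardet
#   encoding = first_decodable(payload, chardet.detect_all(payload))
```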

How is chardet different from charset-normalizer?

charset-normalizer is an alternative encoding detector. Key differences:

  • Accuracy: chardet achieves 99.3% vs charset-normalizer’s 85.4% on the same test suite.

  • Speed: chardet compiled with mypyc is 1.5x faster (551 vs 376 files/s).

  • Memory: chardet uses 1.5x less peak memory (52.9 vs 78.8 MiB).

  • Language detection: chardet detects language with 95.7% accuracy vs charset-normalizer’s 59.2%.

How is chardet different from cchardet?

cchardet wraps Mozilla’s uchardet C/C++ library. Key differences:

  • Accuracy: chardet achieves 99.3% vs cchardet’s 55.9%.

  • Speed: cchardet is faster (1.3s vs 4.6s) because it is implemented in C.

  • Encoding breadth: chardet supports 49 more encodings than cchardet, including EBCDIC, Mac, Baltic, and BOM-less UTF-16/32.

  • Dependencies: chardet is pure Python with zero dependencies. cchardet requires a C compiler to build from source.

Is chardet thread-safe?

chardet.detect() and chardet.detect_all() are fully thread-safe and can be called concurrently from any number of threads.

UniversalDetector instances are not thread-safe. Create one instance per thread when using the streaming API.

UniversalDetector uses the same detection pipeline as detect() and detect_all(), so results are identical regardless of which API you use.
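One way to follow the one-instance-per-thread rule is to keep the detector in thread-local storage. The sketch below is an assumption-laden illustration, not a chardet API: thread_detector is a hypothetical helper, and it accepts any zero-argument factory so the pattern can be shown without depending on a particular import path.

```python
import threading
from typing import Any, Callable

# Each thread sees its own attributes on this object.
_local = threading.local()


def thread_detector(factory: Callable[[], Any]) -> Any:
    """Return the calling thread's detector, creating it on first use.

    `factory` is any zero-argument callable that builds a detector,
    for example the UniversalDetector class.
    """
    detector = getattr(_local, "detector", None)
    if detector is None:
        detector = factory()
        _local.detector = detector
    return detector


# Typical use (assumes chardet is installed and exports UniversalDetector):
#   from chardet import UniversalDetector
#   detector = thread_detector(UniversalDetector)
#   detector.reset()          # clear state left over from a previous stream
#   for chunk in chunks:
#       detector.feed(chunk)
#   detector.close()
#   print(detector.result)
```

Reusing a detector within a thread means it must be reset between streams; creating a fresh instance per stream avoids that step at the cost of an extra allocation.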

Does chardet work on PyPy?

Yes. chardet is pure Python and works on PyPy without modification. The optional mypyc compilation is CPython-only; PyPy uses the pure-Python code path automatically.