chardet documentation

chardet is a universal character encoding detector for Python. It analyzes byte strings and returns the detected encoding, confidence score, and language.

import chardet

result = chardet.detect("It\u2019s a lovely day \u2014 let\u2019s grab coffee.".encode("utf-8"))
print(result)
# {'encoding': 'utf-8', 'confidence': 0.99, 'language': 'es'}
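In practice the detect() result is usually passed straight to bytes.decode(). The sketch below shows that pattern. The helper name decode_bytes and its UTF-8 fallback policy are illustrative assumptions, not part of chardet's API. It guards against two cases: a reported encoding of None, for example on empty or undecidable input, and an encoding name that Python's codec registry does not recognize.

```python
import chardet


def decode_bytes(data: bytes, fallback: str = "utf-8") -> str:
    """Hypothetical helper: decode bytes using chardet's best guess.

    Falls back to `fallback` (an assumed policy, not chardet behavior) when
    no encoding is reported or Python does not know the reported name.
    """
    result = chardet.detect(data)
    encoding = result.get("encoding") or fallback
    try:
        # errors="replace" keeps a wrong guess from raising UnicodeDecodeError.
        return data.decode(encoding, errors="replace")
    except LookupError:
        # The detected name is not a codec Python recognizes.
        return data.decode(fallback, errors="replace")


text = "It\u2019s a lovely day \u2014 let\u2019s grab coffee."
decoded = decode_bytes(text.encode("utf-8"))
print(decoded == text)
```

Using errors="replace" trades strictness for robustness. Drop it if a misdetection should fail loudly.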

chardet 7.0 is a ground-up, MIT-licensed rewrite with the same package name and the same public API, so it is a drop-in replacement for chardet 5.x and 6.x. It requires Python 3.10 or later, has zero runtime dependencies, and works on PyPy.

  • 96.8% accuracy on 2,179 test files
  • 41x faster than chardet 6.0.0 when compiled with mypyc, 28x faster as pure Python
  • 7.5x faster than charset-normalizer when compiled with mypyc, 5.1x faster as pure Python
  • Language detection for every result (90.5% accuracy)
  • 99 encodings across six encoding eras
  • Thread-safe detect() and detect_all()
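Because detect() and detect_all() are listed as thread-safe, a batch of inputs can be handed to a thread pool without per-call locking. This is a minimal sketch. It assumes only the two documented function names and that detect_all() returns a list of result dictionaries, as it does in chardet 5.x. The sample inputs and pool size are arbitrary. On a standard CPython build, threads help most when detection overlaps with I/O, since CPU-bound work still contends for the GIL.

```python
from concurrent.futures import ThreadPoolExecutor

import chardet

# Arbitrary sample inputs in different encodings (illustrative only).
samples = [
    "plain ASCII text".encode("ascii"),
    "Gr\u00fc\u00dfe aus K\u00f6ln".encode("utf-8"),
    "Gr\u00fc\u00dfe aus K\u00f6ln".encode("latin-1"),
]

# detect() is documented as thread-safe, so worker threads can share it
# without extra locking.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(chardet.detect, samples))

for data, result in zip(samples, results):
    print(data[:12], result["encoding"], result["confidence"])

# detect_all() returns a list of candidate results rather than one best guess.
candidates = chardet.detect_all(samples[2])
for candidate in candidates:
    print(candidate["encoding"], candidate["confidence"])
```

pool.map preserves input order, so each result lines up with the sample it came from.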