Performance

All benchmarks use the 2,517 test files from the chardet test suite, and every detector is evaluated with the same equivalence rules. Numbers below were measured on CPython 3.14 unless noted.

Detecting a superset of the expected encoding is counted as correct, since the superset decodes the data without loss (e.g., detecting Windows-1252 when the expected answer is ISO-8859-1, or GB18030 when the expected answer is GB2312). Byte-order variants of the same encoding (e.g., UTF-16-LE vs UTF-16) are also treated as equivalent. These rules are applied equally to all detectors.
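The equivalence rules above can be sketched as a small comparison function. This is a minimal illustration, not the benchmark's actual code: the `SUPERSETS` and `BYTE_ORDER_FAMILIES` tables and the `is_correct` name are assumptions, and they contain only the examples named in the text.

```python
from typing import Optional

# Hypothetical tables holding only the examples from the text above;
# the real benchmark presumably covers many more encodings.
SUPERSETS = {
    "iso-8859-1": {"windows-1252"},  # expected -> acceptable supersets
    "gb2312": {"gb18030"},
}
BYTE_ORDER_FAMILIES = [
    {"utf-16", "utf-16-le", "utf-16-be"},
]


def is_correct(expected: str, detected: Optional[str]) -> bool:
    """Return True if `detected` counts as a correct answer for `expected`."""
    if detected is None:
        return False
    exp, det = expected.lower(), detected.lower()
    if exp == det:
        return True
    # A superset decodes the data without loss, so it is accepted.
    if det in SUPERSETS.get(exp, set()):
        return True
    # Byte-order variants of the same encoding are treated as equivalent.
    return any(exp in family and det in family for family in BYTE_ORDER_FAMILIES)
```

The superset rule is one-directional: detecting ISO-8859-1 when Windows-1252 is expected does not pass in this sketch, because the narrower encoding may not decode every byte.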

chardet’s statistical models are trained on CulturaX, MADLAD-400, and Wikipedia data. Test files are excluded from training via content fingerprinting to prevent train/test overlap (verified by scripts/verify_no_overlap.py).
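As a rough illustration of content fingerprinting, the sketch below hashes the raw bytes of each test file and drops any training document with a matching hash. The function names and the choice of SHA-256 over exact bytes are assumptions; the project's own fingerprinting may normalize content or use a different scheme.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Hash the exact byte content (assumed scheme: SHA-256, no normalization)."""
    return hashlib.sha256(data).hexdigest()


def drop_overlap(training_docs, test_docs):
    """Return the training documents whose fingerprint matches no test document."""
    test_prints = {fingerprint(doc) for doc in test_docs}
    return [doc for doc in training_docs if fingerprint(doc) not in test_prints]
```

An exact-content hash only catches byte-identical duplicates, so a real pipeline might also fingerprint normalized or chunked content.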

Accuracy

| Detector | Correct | Accuracy | Speed |
|---|---|---|---|
| chardet 7.4.0 (mypyc) | 2499/2517 | 99.3% | 551 files/s |
| chardet 6.0.0 | 2219/2517 | 88.2% | 12 files/s |
| charset-normalizer 3.4.6 (mypyc) | 2149/2517 | 85.4% | 376 files/s |
| cchardet 2.1.19 | 1407/2517 | 55.9% | 2,005 files/s |

chardet leads all detectors on accuracy: +11.1pp vs chardet 6.0.0, +13.9pp vs charset-normalizer 3.4.6, and +43.4pp vs cchardet 2.1.19.
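The percentage-point gaps quoted above follow directly from the correct-file counts. The short sketch below recomputes them; the counts come from the accuracy table, while the helper names are illustrative.

```python
TOTAL_FILES = 2517

# Correct-file counts copied from the accuracy table.
CORRECT = {
    "chardet 7.4.0": 2499,
    "chardet 6.0.0": 2219,
    "charset-normalizer 3.4.6": 2149,
    "cchardet 2.1.19": 1407,
}


def accuracy_pct(correct: int, total: int = TOTAL_FILES) -> float:
    """Accuracy as a percentage of all test files."""
    return 100.0 * correct / total


def gap_pp(other: str, baseline: str = "chardet 7.4.0") -> float:
    """Percentage-point lead of `baseline` over `other`, rounded to 0.1pp."""
    return round(accuracy_pct(CORRECT[baseline]) - accuracy_pct(CORRECT[other]), 1)
```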

Speed

| Detector | Files/s | Mean | Median | p90 | p95 |
|---|---|---|---|---|---|
| cchardet 2.1.19 | 2,005 | 0.50ms | 0.04ms | 0.64ms | 0.99ms |
| chardet 7.4.0 (mypyc) | 551 | 1.81ms | 0.54ms | 4.61ms | 5.84ms |
| charset-normalizer 3.4.6 (mypyc) | 376 | 2.65ms | 1.46ms | 6.86ms | 10.45ms |
| chardet 6.0.0 | 12 | 85.16ms | 1.70ms | 190.84ms | 394.63ms |

With mypyc compilation, chardet 7.4.0 is 47x faster than chardet 6.0.0 (1.81ms vs 85.16ms mean time per file) and 1.5x faster than charset-normalizer 3.4.6 (mypyc) (1.81ms vs 2.65ms). Its median time per file is 0.54ms.
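For readers who want to produce the same kind of summary from their own timings, here is a minimal sketch using the standard library. It assumes files are processed sequentially, so throughput is the inverse of the mean latency, and it uses the "inclusive" quantile method; the benchmark script's actual percentile method is not stated here, so results may differ slightly.

```python
import statistics


def summarize(latencies_ms):
    """Summarize per-file detection times given in milliseconds."""
    mean_ms = statistics.fmean(latencies_ms)
    # n=100 yields 99 cut points: index 89 is p90, index 94 is p95.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "files_per_s": 1000.0 / mean_ms,  # assumes sequential processing
        "mean_ms": mean_ms,
        "median_ms": statistics.median(latencies_ms),
        "p90_ms": cuts[89],
        "p95_ms": cuts[94],
    }
```

A few slow files pull the mean far above the median, which is the pattern visible in the chardet 6.0.0 row (85.16ms mean, 1.70ms median).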

Memory

| Detector | Import Time | Import Memory | Peak Memory | RSS |
|---|---|---|---|---|
| chardet 7.4.0 | 0.013s | 0 B * | 52.9 MiB | 137.0 MiB |
| chardet 6.0.0 | 0.053s | 13.0 MiB | 29.5 MiB | 122.3 MiB |
| charset-normalizer 3.4.6 | 0.013s | 3.4 MiB | 78.8 MiB | 238.9 MiB |
| cchardet 2.1.19 | 0.001s | 28.1 KiB | 155.0 KiB | 87.7 MiB |

* chardet 7.x uses lazy loading — models and the detection pipeline are not allocated until the first detect() call, so import chardet alone allocates effectively nothing. The full cost appears in Peak Memory.
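The general lazy-loading pattern behind this footnote can be sketched as follows. This is a generic illustration, not chardet's implementation: `load_models` and `detect_sketch` are hypothetical names, and the "model" is a trivial stand-in.

```python
import functools


@functools.lru_cache(maxsize=None)
def load_models() -> dict:
    """Stand-in for an expensive model load; the result is cached after the first call."""
    return {"ascii": frozenset(range(128))}


def detect_sketch(data: bytes) -> str:
    """Toy detector: nothing is loaded at import time, only on first use."""
    models = load_models()
    return "ascii" if set(data) <= models["ascii"] else "unknown"
```

Importing a module written this way allocates almost nothing; the cost moves to the first `detect_sketch()` call, which mirrors why the table shows the full cost under Peak Memory instead of Import Memory.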

chardet 7.4.0's peak memory is about 1.5x lower than charset-normalizer 3.4.6's (52.9 MiB vs 78.8 MiB), and its RSS is about 1.7x lower (137.0 MiB vs 238.9 MiB).

Language Detection

| Detector | Correct | Accuracy |
|---|---|---|
| chardet 7.4.0 | 2400/2509 | 95.7% |
| charset-normalizer 3.4.6 | 1486/2509 | 59.2% |
| chardet 6.0.0 | 1003/2509 | 40.0% |
| cchardet 2.1.19 | 0/2509 | 0.0% |

chardet detects language with 95.7% accuracy — +36.5pp vs charset-normalizer 3.4.6 and +55.7pp vs chardet 6.0.0. cchardet 2.1.19 does not report language.

Accuracy on charset-normalizer’s Test Set

charset-normalizer maintains its own test dataset at char-dataset. 469 of those files also exist in the chardet test suite (matched by content hash), so we can compare both detectors on charset-normalizer’s own ground truth. We filed an issue about the 5 files we excluded (4 ambiguous Cyrillic files and 1 corrupted Vietnamese file) and 2 we relabeled (UTF-8-SIG, not UTF-8).

| Detector | Correct | Encoding Accuracy | Language Accuracy |
|---|---|---|---|
| chardet 7.4.0 (mypyc) | 463/469 | 98.7% | 92.8% |
| charset-normalizer 3.4.6 (mypyc) | 453/469 | 96.6% | 85.9% |

chardet is +2.1pp more accurate than charset-normalizer 3.4.6 on charset-normalizer’s own test data, and +6.9pp on language detection.

You can reproduce these numbers with python scripts/compare_detectors.py --cn-dataset --cn --mypyc.

Thread Safety

chardet.detect() and chardet.detect_all() are fully thread-safe. Each call carries its own state with no shared mutable data between threads. Thread safety adds no measurable overhead (< 0.1%).
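Because each call carries its own state, a plain thread pool is enough to fan detection out over many inputs. The helper below is a minimal sketch; `detect_many` is a hypothetical name, and the detector is passed in as a callable so the example stays self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def detect_many(detect, payloads, max_workers=4):
    """Apply a thread-safe `detect` callable to each byte string, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(detect, payloads))
```

With chardet installed, a call would look like `detect_many(chardet.detect, payloads)`. Any wall-clock gain from extra threads depends on running a free-threaded build.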

On free-threaded Python (GIL disabled), detection scales with threads. Standard GIL Python shows no scaling — the GIL serializes threads. Benchmarked with 2,517 files, encoding_era=ALL:

| Python | 1 thread | 2 threads | 4 threads | 8 threads |
|---|---|---|---|---|
| 3.13 (pure) | 10,140ms | 10,040ms | 10,060ms | 10,130ms |
| 3.13t (pure) | 9,890ms | 7,930ms (1.2x) | 4,890ms (2.0x) | 4,720ms (2.1x) |
| 3.13 (mypyc) | 4,980ms | 4,930ms | 4,920ms | 4,930ms |
| 3.13t (mypyc) | 4,570ms | 2,450ms (1.9x) | 1,330ms (3.4x) | 1,040ms (4.4x) |
| 3.14 (pure) | 7,670ms | 7,720ms | 7,800ms | 7,880ms |
| 3.14t (pure) | 8,330ms | 6,160ms (1.4x) | 2,640ms (3.2x) | 2,070ms (4.0x) |
| 3.14 (mypyc) | 4,620ms | 4,590ms | 4,600ms | 4,590ms |
| 3.14t (mypyc) | 5,020ms | 2,650ms (1.9x) | 1,420ms (3.5x) | 1,180ms (4.3x) |

Individual UniversalDetector instances are not thread-safe. Create one instance per thread when using the streaming API.
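One way to follow the one-instance-per-thread rule is to keep detectors in thread-local storage. The sketch below is generic: `PerThread` is a hypothetical helper that accepts any zero-argument factory, so it makes no assumptions about the detector's own methods.

```python
import threading


class PerThread:
    """Lazily create and cache one object per thread from a zero-argument factory."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()

    def get(self):
        instance = getattr(self._local, "instance", None)
        if instance is None:
            # First access from this thread: build a private instance.
            instance = self._local.instance = self._factory()
        return instance
```

Passing `UniversalDetector` as the factory would give each worker thread its own streaming detector; a reused instance still needs to be reset between inputs.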

Optional mypyc Compilation

Prebuilt mypyc-compiled wheels are published to PyPI for CPython on Linux, macOS, and Windows. A regular pip install chardet will pick them up automatically — no extra flags needed.

| Build | Files/s | Speedup |
|---|---|---|
| Pure Python | 330 | baseline |
| mypyc compiled | 551 | 1.67x |

Pure-Python wheels are always available for PyPy and platforms without prebuilt binaries.

Historical Performance

This table shows the accuracy and speed of every Python 3-compatible chardet release, plus charade, its temporary Python 3-compatible fork, measured on the same 2,517-file test suite with the same equivalence rules. Versions before 7.0 run as pure Python on CPython 3.14; 7.0+ versions are mypyc-compiled, matching what pip install chardet delivers. The Language column shows “—” for versions that did not support language detection.

| Version | Date | Correct | Accuracy | Files/s | Language |
|---|---|---|---|---|---|
| charade 1.0.0 | 2012-12 | 716/2517 | 28.4% | 43 | — |
| charade 1.0.1 | 2012-12 | 714/2517 | 28.4% | 43 | — |
| charade 1.0.3 | 2013-01 | 1018/2517 | 40.4% | 48 | — |
| chardet 2.2.1 | 2013-12 | 1019/2517 | 40.5% | 47 | — |
| chardet 2.3.0 | 2014-10 | 1165/2517 | 46.3% | 48 | — |
| chardet 3.0.4 | 2017-06 | 1253/2517 | 49.8% | 56 | 16.2% |
| chardet 4.0.0 | 2020-12 | 1253/2517 | 49.8% | 59 | 16.9% |
| chardet 5.0.0 | 2022-06 | 1618/2517 | 64.3% | 57 | 16.9% |
| chardet 5.2.0 | 2023-08 | 1645/2517 | 65.4% | 55 | 16.7% |
| chardet 6.0.0 | 2026-02 | 2219/2517 | 88.2% | 11 | 40.0% |
| chardet 7.0.1 (mypyc) | 2026-03 | 2469/2517 | 98.1% | 551 | 95.2% |
| chardet 7.2.0 (mypyc) | 2026-03 | 2470/2517 | 98.1% | 540 | 95.3% |
| chardet 7.3.0 (mypyc) | 2026-03 | 2470/2517 | 98.1% | 623 | 95.3% |
| chardet 7.4.0 (mypyc) | 2026-03 | 2499/2517 | 99.3% | 551 | 95.7% |

chardet 3.0.1–3.0.4 had identical accuracy and speed; only 3.0.4 is shown. chardet 5.1.0–5.2.0 were likewise identical. chardet 7.1.0 and 7.2.0 had identical accuracy; only 7.2.0 is shown. charade 1.0.2 could not be installed on Python 3.14. chardet 3.0.0 crashed on Python 3.14 and is omitted.

Performance Across Python Versions

Benchmarked chardet 7.4.0 across all supported Python versions (macOS aarch64, 2,517 files, encoding_era=ALL). CPython versions install mypyc-compiled wheels automatically; PyPy receives the pure-Python wheel.

| Python | Wheel | Total | Files/s | Mean | Median | p90 | p95 |
|---|---|---|---|---|---|---|---|
| CPython 3.10 | mypyc | 4,015ms | 627 | 1.60ms | 0.55ms | 3.84ms | 4.79ms |
| CPython 3.10 | pure | 9,180ms | 274 | 3.65ms | 1.36ms | 8.46ms | 10.89ms |
| CPython 3.11 | mypyc | 3,939ms | 639 | 1.56ms | 0.53ms | 3.83ms | 4.77ms |
| CPython 3.11 | pure | 7,145ms | 352 | 2.84ms | 1.05ms | 6.61ms | 8.42ms |
| CPython 3.12 | mypyc | 4,429ms | 568 | 1.76ms | 0.51ms | 4.46ms | 5.58ms |
| CPython 3.12 | pure | 7,655ms | 329 | 3.04ms | 1.06ms | 7.17ms | 9.24ms |
| CPython 3.13 | mypyc | 4,914ms | 512 | 1.95ms | 0.58ms | 4.89ms | 6.03ms |
| CPython 3.13 | pure | 9,911ms | 254 | 3.94ms | 1.42ms | 9.20ms | 11.72ms |
| CPython 3.14 | mypyc | 4,564ms | 551 | 1.81ms | 0.54ms | 4.61ms | 5.84ms |
| CPython 3.14 | pure | 7,632ms | 330 | 3.03ms | 1.04ms | 7.18ms | 9.24ms |
| PyPy 3.10 | pure | 5,782ms | 435 | 2.30ms | 0.21ms | 4.73ms | 7.03ms |
| PyPy 3.11 | pure | 5,750ms | 438 | 2.28ms | 0.22ms | 4.69ms | 6.94ms |

CPython 3.11 + mypyc is the fastest combination at 639 files/s. mypyc provides a 1.7–2.3x speedup across CPython versions. PyPy’s JIT is competitive with mypyc: pure Python on PyPy (435–438 files/s) beats every pure CPython version and reaches 68–86% of mypyc-compiled CPython throughput.
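The speedup range and the PyPy comparison can be recomputed from the Files/s column. The sketch below copies those throughput figures from the table; the variable names are illustrative.

```python
# (mypyc files/s, pure-Python files/s) per CPython version, from the table.
CPYTHON = {
    "3.10": (627, 274),
    "3.11": (639, 352),
    "3.12": (568, 329),
    "3.13": (512, 254),
    "3.14": (551, 330),
}
PYPY = {"3.10": 435, "3.11": 438}  # pure-Python files/s on PyPy

# mypyc speedup over pure Python on the same CPython version.
speedups = {ver: round(mypyc / pure, 2) for ver, (mypyc, pure) in CPYTHON.items()}

# Slowest PyPy vs fastest mypyc build, and fastest PyPy vs slowest mypyc build.
mypyc_rates = [mypyc for mypyc, _ in CPYTHON.values()]
pypy_share_low = min(PYPY.values()) / max(mypyc_rates)
pypy_share_high = max(PYPY.values()) / min(mypyc_rates)

# True when even the slowest PyPy run outpaces the fastest pure CPython run.
pypy_beats_pure_cpython = min(PYPY.values()) > max(pure for _, pure in CPYTHON.values())
```

The per-version speedups span CPython 3.14 at the low end and CPython 3.10 at the high end, which is where the quoted 1.7–2.3x range comes from.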