Performance
All benchmarks use the 2,517 test files from the chardet test suite, and every detector is evaluated with the same equivalence rules. Numbers below are for CPython 3.14 unless noted.
Detecting a superset of the expected encoding is counted as correct, since the superset decodes the data without loss (e.g., detecting Windows-1252 when the expected answer is ISO-8859-1, or GB18030 when the expected answer is GB2312). Byte-order variants of the same encoding (e.g., UTF-16-LE vs UTF-16) are also treated as equivalent. These rules are applied equally to all detectors.
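The equivalence rules can be sketched as a small scoring helper. This is an illustration only, not the benchmark's actual code: the superset table lists just the two pairs named above, the byte-order grouping is an assumption, and `is_correct` and `normalize` are hypothetical names.

```python
import codecs


def normalize(name):
    """Map aliases such as 'Windows-1252' to Python's canonical codec name."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        # Unknown to Python's codec registry: fall back to a lowercase label.
        return name.lower()


# Expected encoding -> detections that decode the same bytes without loss.
# Only the pairs mentioned in the text; the real table is presumably larger.
SUPERSETS = {
    normalize(expected): {normalize(d) for d in detected}
    for expected, detected in {
        "ISO-8859-1": ["Windows-1252"],
        "GB2312": ["GB18030"],
    }.items()
}

# Byte-order variants that count as the same encoding (assumed grouping).
_FAMILIES = [
    ("UTF-16", "UTF-16-LE", "UTF-16-BE"),
    ("UTF-32", "UTF-32-LE", "UTF-32-BE"),
]
FAMILY = {normalize(n): i for i, names in enumerate(_FAMILIES) for n in names}


def is_correct(expected, detected):
    """Return True if `detected` is acceptable for a file labeled `expected`."""
    if detected is None:
        return False
    exp, det = normalize(expected), normalize(detected)
    if exp == det:
        return True
    if exp in FAMILY and FAMILY[exp] == FAMILY.get(det):
        return True  # byte-order variant of the same encoding
    return det in SUPERSETS.get(exp, set())  # lossless superset
```

Note that the superset rule is one-directional: under this sketch, detecting ISO-8859-1 for a file labeled Windows-1252 is not accepted.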
chardet’s statistical models are trained on CulturaX, MADLAD-400, and
Wikipedia data. Test files are excluded from training via content
fingerprinting to prevent train/test overlap (verified by
scripts/verify_no_overlap.py).
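As a rough illustration of fingerprint-based exclusion, the sketch below hashes each document and drops any training document whose hash also appears in the test set. The fingerprinting method chardet actually uses is not described here, so the SHA-256 digest of the raw bytes is an assumption, and `fingerprint` and `drop_test_overlap` are hypothetical names.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Content fingerprint: here simply the SHA-256 digest of the raw bytes."""
    return hashlib.sha256(data).hexdigest()


def drop_test_overlap(training_docs, test_docs):
    """Return the training documents whose fingerprint is not in the test set."""
    test_prints = {fingerprint(doc) for doc in test_docs}
    return [doc for doc in training_docs if fingerprint(doc) not in test_prints]
```

An exact-hash fingerprint only catches byte-identical copies; a real pipeline may normalize content first or use a fuzzier fingerprint.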
Accuracy
| Detector | Correct | Accuracy | Speed |
|---|---|---|---|
| chardet 7.4.0 (mypyc) | 2499/2517 | 99.3% | 551 files/s |
| chardet 6.0.0 | 2219/2517 | 88.2% | 12 files/s |
| charset-normalizer 3.4.6 (mypyc) | 2149/2517 | 85.4% | 376 files/s |
| cchardet 2.1.19 | 1407/2517 | 55.9% | 2,005 files/s |
chardet leads all detectors on accuracy: +11.1pp vs chardet 6.0.0, +13.9pp vs charset-normalizer 3.4.6, and +43.4pp vs cchardet 2.1.19.
Speed
| Detector | Files/s | Mean | Median | p90 | p95 |
|---|---|---|---|---|---|
| cchardet 2.1.19 | 2,005 | 0.50ms | 0.04ms | 0.64ms | 0.99ms |
| chardet 7.4.0 (mypyc) | 551 | 1.81ms | 0.54ms | 4.61ms | 5.84ms |
| charset-normalizer 3.4.6 (mypyc) | 376 | 2.65ms | 1.46ms | 6.86ms | 10.45ms |
| chardet 6.0.0 | 12 | 85.16ms | 1.70ms | 190.84ms | 394.63ms |
With mypyc compilation, chardet 7.4.0 is 47x faster than chardet 6.0.0 and 1.5x faster than charset-normalizer 3.4.6 (mypyc), measured by mean time per file (1.81ms vs 85.16ms and 2.65ms). Its median time per file is 0.54ms.
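The columns in the speed table can be derived from per-file timings. The following is a minimal sketch of such a measurement, not the project's benchmark script: `time_detector` is a hypothetical helper, the detector is passed in as any callable that accepts bytes, and the inclusive percentile method is an assumption.

```python
import statistics
import time


def time_detector(detect, blobs):
    """Time `detect` on each blob; return throughput and latency stats in ms.

    Needs at least two blobs so that percentiles are defined.
    """
    timings_ms = []
    for blob in blobs:
        start = time.perf_counter()
        detect(blob)
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    # 99 cut points; index 89 is the 90th percentile, index 94 the 95th.
    cuts = statistics.quantiles(timings_ms, n=100, method="inclusive")
    total_s = sum(timings_ms) / 1000.0
    return {
        "files_per_s": len(timings_ms) / total_s if total_s else float("inf"),
        "mean_ms": statistics.fmean(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "p90_ms": cuts[89],
        "p95_ms": cuts[94],
    }
```

With chardet installed, something like `time_detector(chardet.detect, blobs)` would produce one row of the table. A large gap between mean and median, as in the chardet 6.0.0 row, indicates that a few slow files dominate the total time.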
Memory
| Detector | Import Time | Import Memory | Peak Memory | RSS |
|---|---|---|---|---|
| chardet 7.4.0 | 0.013s | 0 B * | 52.9 MiB | 137.0 MiB |
| chardet 6.0.0 | 0.053s | 13.0 MiB | 29.5 MiB | 122.3 MiB |
| charset-normalizer 3.4.6 | 0.013s | 3.4 MiB | 78.8 MiB | 238.9 MiB |
| cchardet 2.1.19 | 0.001s | 28.1 KiB | 155.0 KiB | 87.7 MiB |
* chardet 7.x uses lazy loading — models and the detection
pipeline are not allocated until the first detect() call, so
import chardet alone allocates effectively nothing. The full cost
appears in Peak Memory.
chardet 7.4.0’s peak memory is about 1.5x lower than charset-normalizer 3.4.6’s (52.9 MiB vs 78.8 MiB), and its RSS is about 1.7x lower (137.0 MiB vs 238.9 MiB).
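One way to observe the effect of lazy loading is to measure what an import allocates on its own. The sketch below uses the standard-library `tracemalloc` module; `measure_import` is a hypothetical helper, and this is not necessarily how the Import Time and Import Memory columns were produced.

```python
import importlib
import time
import tracemalloc


def measure_import(module_name):
    """Import a module and report elapsed time and Python-level allocations.

    Run in a fresh interpreter: a module that is already imported costs nothing.
    """
    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    start = time.perf_counter()
    importlib.import_module(module_name)
    elapsed = time.perf_counter() - start
    after, _ = tracemalloc.get_traced_memory()
    if not was_tracing:
        tracemalloc.stop()
    return {"import_s": elapsed, "import_bytes": max(after - before, 0)}
```

For a lazily loading package, `import_bytes` stays small and the allocation shows up only after the first detection call. `tracemalloc` sees only allocations made through Python's allocator, so it will not match RSS.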
Language Detection
| Detector | Correct | Accuracy |
|---|---|---|
| chardet 7.4.0 | 2400/2509 | 95.7% |
| charset-normalizer 3.4.6 | 1486/2509 | 59.2% |
| chardet 6.0.0 | 1003/2509 | 40.0% |
| cchardet 2.1.19 | 0/2509 | 0.0% |
chardet detects language with 95.7% accuracy — +36.5pp vs charset-normalizer 3.4.6 and +55.7pp vs chardet 6.0.0. cchardet 2.1.19 does not report language.
Accuracy on charset-normalizer’s Test Set
charset-normalizer maintains its own test dataset at char-dataset. 469 of those files also exist in the chardet test suite (matched by content hash), so we can compare both detectors on charset-normalizer’s own ground truth. We filed an issue about the 5 files we excluded (4 ambiguous Cyrillic files and 1 corrupted Vietnamese file) and 2 we relabeled (UTF-8-SIG, not UTF-8).
| Detector | Correct | Encoding Accuracy | Language Accuracy |
|---|---|---|---|
| chardet 7.4.0 (mypyc) | 463/469 | 98.7% | 92.8% |
| charset-normalizer 3.4.6 (mypyc) | 453/469 | 96.6% | 85.9% |
On charset-normalizer’s own test data, chardet is 2.1pp more accurate than charset-normalizer 3.4.6 at encoding detection and 6.9pp more accurate at language detection.
You can reproduce these numbers with
python scripts/compare_detectors.py --cn-dataset --cn --mypyc.
Thread Safety
chardet.detect() and chardet.detect_all() are fully thread-safe.
Each call carries its own state with no shared mutable data between threads.
Thread safety adds no measurable overhead (< 0.1%).
On free-threaded Python (GIL disabled), detection scales with threads.
Standard GIL Python shows no scaling — the GIL serializes threads.
Benchmarked with 2,517 files, encoding_era=ALL:
| Python | 1 thread | 2 threads | 4 threads | 8 threads |
|---|---|---|---|---|
| 3.13 (pure) | 10,140ms | 10,040ms | 10,060ms | 10,130ms |
| 3.13t (pure) | 9,890ms | 7,930ms (1.2x) | 4,890ms (2.0x) | 4,720ms (2.1x) |
| 3.13 (mypyc) | 4,980ms | 4,930ms | 4,920ms | 4,930ms |
| 3.13t (mypyc) | 4,570ms | 2,450ms (1.9x) | 1,330ms (3.4x) | 1,040ms (4.4x) |
| 3.14 (pure) | 7,670ms | 7,720ms | 7,800ms | 7,880ms |
| 3.14t (pure) | 8,330ms | 6,160ms (1.4x) | 2,640ms (3.2x) | 2,070ms (4.0x) |
| 3.14 (mypyc) | 4,620ms | 4,590ms | 4,600ms | 4,590ms |
| 3.14t (mypyc) | 5,020ms | 2,650ms (1.9x) | 1,420ms (3.5x) | 1,180ms (4.3x) |
Individual UniversalDetector instances are not thread-safe.
Create one instance per thread when using the streaming API.
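The one-instance-per-thread rule for the streaming API can be sketched with `threading.local`. The helper names are hypothetical, and the sketch assumes the detector object exposes the long-standing `reset()`, `feed()`, `close()`, `done`, and `result` interface; the detector class is passed in as `make_detector` so the pattern does not depend on a particular import path.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Each worker thread gets its own attribute namespace here.
_thread_state = threading.local()


def _detector_for_this_thread(make_detector):
    """Create the detector on first use in a thread, then reuse it there."""
    detector = getattr(_thread_state, "detector", None)
    if detector is None:
        detector = make_detector()
        _thread_state.detector = detector
    return detector


def detect_stream(chunks, make_detector):
    """Feed one stream of byte chunks to this thread's private detector."""
    detector = _detector_for_this_thread(make_detector)
    detector.reset()  # clear state left over from the previous stream
    for chunk in chunks:
        detector.feed(chunk)
        if detector.done:  # detector is confident; stop reading early
            break
    detector.close()
    return detector.result


def detect_streams(streams, make_detector, workers=4):
    """Detect several streams concurrently, never sharing a detector."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: detect_stream(s, make_detector), streams))
```

With chardet installed, `make_detector` would typically be `chardet.UniversalDetector`. For whole in-memory buffers the per-thread bookkeeping is unnecessary, since `chardet.detect()` can be called from any thread directly.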
Optional mypyc Compilation
Prebuilt mypyc-compiled wheels are
published to PyPI for CPython on Linux, macOS, and Windows. A regular
pip install chardet will pick them up automatically — no extra flags
needed.
| Build | Files/s | Speedup |
|---|---|---|
| Pure Python | 330 | baseline |
| mypyc compiled | 551 | 1.67x |
Pure-Python wheels are always available for PyPy and platforms without prebuilt binaries.
Historical Performance
Accuracy and speed of every Python 3-compatible chardet release and its
temporary Python-3-compatible fork charade, measured on
the same 2,517-file test suite with the same equivalence rules. Pure
Python on CPython 3.14 for versions before 7.0; mypyc-compiled for
7.0+, matching what pip install chardet delivers. Language column
shows “—” for versions that did not support language detection.
| Version | Date | Correct | Accuracy | Files/s | Language |
|---|---|---|---|---|---|
| charade 1.0.0 | 2012-12 | 716/2517 | 28.4% | 43 | — |
| charade 1.0.1 | 2012-12 | 714/2517 | 28.4% | 43 | — |
| charade 1.0.3 | 2013-01 | 1018/2517 | 40.4% | 48 | — |
| chardet 2.2.1 | 2013-12 | 1019/2517 | 40.5% | 47 | — |
| chardet 2.3.0 | 2014-10 | 1165/2517 | 46.3% | 48 | — |
| chardet 3.0.4 | 2017-06 | 1253/2517 | 49.8% | 56 | 16.2% |
| chardet 4.0.0 | 2020-12 | 1253/2517 | 49.8% | 59 | 16.9% |
| chardet 5.0.0 | 2022-06 | 1618/2517 | 64.3% | 57 | 16.9% |
| chardet 5.2.0 | 2023-08 | 1645/2517 | 65.4% | 55 | 16.7% |
| chardet 6.0.0 | 2026-02 | 2219/2517 | 88.2% | 11 | 40.0% |
| chardet 7.0.1 (mypyc) | 2026-03 | 2469/2517 | 98.1% | 551 | 95.2% |
| chardet 7.2.0 (mypyc) | 2026-03 | 2470/2517 | 98.1% | 540 | 95.3% |
| chardet 7.3.0 (mypyc) | 2026-03 | 2470/2517 | 98.1% | 623 | 95.3% |
| chardet 7.4.0 (mypyc) | 2026-03 | 2499/2517 | 99.3% | 551 | 95.7% |
chardet 3.0.1–3.0.4 had identical accuracy and speed; only 3.0.4 is shown. chardet 5.1.0–5.2.0 were likewise identical. chardet 7.1.0 and 7.2.0 had identical accuracy; only 7.2.0 is shown. charade 1.0.2 could not be installed on Python 3.14. chardet 3.0.0 crashed on Python 3.14 and is omitted.
Performance Across Python Versions
chardet 7.4.0 was benchmarked across all supported Python versions
(macOS aarch64, 2,517 files, encoding_era=ALL). CPython versions
install mypyc-compiled wheels automatically; PyPy receives the
pure-Python wheel.
| Python | Wheel | Total | Files/s | Mean | Median | p90 | p95 |
|---|---|---|---|---|---|---|---|
| CPython 3.10 | mypyc | 4,015ms | 627 | 1.60ms | 0.55ms | 3.84ms | 4.79ms |
| CPython 3.10 | pure | 9,180ms | 274 | 3.65ms | 1.36ms | 8.46ms | 10.89ms |
| CPython 3.11 | mypyc | 3,939ms | 639 | 1.56ms | 0.53ms | 3.83ms | 4.77ms |
| CPython 3.11 | pure | 7,145ms | 352 | 2.84ms | 1.05ms | 6.61ms | 8.42ms |
| CPython 3.12 | mypyc | 4,429ms | 568 | 1.76ms | 0.51ms | 4.46ms | 5.58ms |
| CPython 3.12 | pure | 7,655ms | 329 | 3.04ms | 1.06ms | 7.17ms | 9.24ms |
| CPython 3.13 | mypyc | 4,914ms | 512 | 1.95ms | 0.58ms | 4.89ms | 6.03ms |
| CPython 3.13 | pure | 9,911ms | 254 | 3.94ms | 1.42ms | 9.20ms | 11.72ms |
| CPython 3.14 | mypyc | 4,564ms | 551 | 1.81ms | 0.54ms | 4.61ms | 5.84ms |
| CPython 3.14 | pure | 7,632ms | 330 | 3.03ms | 1.04ms | 7.18ms | 9.24ms |
| PyPy 3.10 | pure | 5,782ms | 435 | 2.30ms | 0.21ms | 4.73ms | 7.03ms |
| PyPy 3.11 | pure | 5,750ms | 438 | 2.28ms | 0.22ms | 4.69ms | 6.94ms |
CPython 3.11 + mypyc is the fastest combination at 639 files/s. mypyc provides a 1.7–2.3x speedup across CPython versions. PyPy’s JIT is competitive with mypyc: pure Python on PyPy (435–438 files/s) beats every pure CPython version and reaches 68–86% of mypyc-compiled CPython throughput.
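The ratios quoted above follow directly from the Files/s column. The short calculation below reproduces them from the table values; the variable names are illustrative only.

```python
# (mypyc files/s, pure-Python files/s) per CPython version, from the table.
CPYTHON_FILES_PER_S = {
    "3.10": (627, 274),
    "3.11": (639, 352),
    "3.12": (568, 329),
    "3.13": (512, 254),
    "3.14": (551, 330),
}
PYPY_FILES_PER_S = (435, 438)  # PyPy 3.10 and 3.11, pure Python

# mypyc speedup over pure Python on the same interpreter version.
speedups = {v: mypyc / pure for v, (mypyc, pure) in CPYTHON_FILES_PER_S.items()}

# PyPy throughput as a share of mypyc-compiled CPython: worst and best case.
mypyc_rates = [mypyc for mypyc, _ in CPYTHON_FILES_PER_S.values()]
pypy_share_low = min(PYPY_FILES_PER_S) / max(mypyc_rates)
pypy_share_high = max(PYPY_FILES_PER_S) / min(mypyc_rates)

print(f"mypyc speedup: {min(speedups.values()):.1f}x to {max(speedups.values()):.1f}x")
# prints "mypyc speedup: 1.7x to 2.3x"
print(f"PyPy vs mypyc: {pypy_share_low:.0%} to {pypy_share_high:.0%}")
# prints "PyPy vs mypyc: 68% to 86%"
```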