Performance
Benchmarked against 2,521 test files from the chardet test suite. All detectors are evaluated with the same equivalence rules. Numbers below are from CPython 3.14 unless noted.
Detecting a superset of the expected encoding is counted as correct, since the superset decodes the data without loss (e.g., detecting Windows-1252 when the expected answer is ISO-8859-1, or GB18030 when the expected answer is GB2312). Byte-order variants of the same encoding (e.g., UTF-16-LE vs UTF-16) are also treated as equivalent. These rules are applied equally to all detectors.
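The equivalence rules above can be sketched as a small scoring helper. This is only an illustration, not the benchmark's actual code: the `SUPERSETS` and `BYTE_ORDER_FAMILY` tables are hypothetical and list only the pairs named in the text, and the helper names `canonical`, `family`, and `is_correct` are assumptions.

```python
import codecs


def canonical(name):
    """Normalise an encoding label via Python's codec registry."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        # Unknown to Python: fall back to a case-insensitive label.
        return name.lower()


# Illustrative table only: expected encoding -> supersets accepted as correct.
SUPERSETS = {
    canonical("ISO-8859-1"): {canonical("Windows-1252")},
    canonical("GB2312"): {canonical("GB18030")},
}

# Byte-order variants collapse to a single family name.
BYTE_ORDER_FAMILY = {
    canonical("UTF-16-LE"): canonical("UTF-16"),
    canonical("UTF-16-BE"): canonical("UTF-16"),
    canonical("UTF-32-LE"): canonical("UTF-32"),
    canonical("UTF-32-BE"): canonical("UTF-32"),
}


def family(name):
    """Map an encoding label to its byte-order-independent family."""
    name = canonical(name)
    return BYTE_ORDER_FAMILY.get(name, name)


def is_correct(expected, detected):
    """True if `detected` equals `expected` or is an accepted superset of it."""
    if detected is None:
        return False
    exp, det = family(expected), family(detected)
    return det == exp or det in SUPERSETS.get(exp, set())
```

The relation is deliberately one-directional: detecting ISO-8859-1 when Windows-1252 was expected is not accepted, because the narrower encoding is not guaranteed to decode the data without loss.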
Accuracy
| Detector | Correct | Accuracy | Speed |
|---|---|---|---|
| chardet 7.3.0 (mypyc) | 2473/2521 | 98.1% | 582 files/s |
| chardet 6.0.0 | 2223/2521 | 88.2% | 12 files/s |
| charset-normalizer 3.4.6 (mypyc) | 2152/2521 | 85.4% | 373 files/s |
| cchardet 2.1.19 | 1410/2521 | 55.9% | 1,992 files/s |
chardet 7.3.0 leads all detectors on accuracy: +9.9pp vs chardet 6.0.0, +12.7pp vs charset-normalizer, and +42.2pp vs cchardet.
Speed
| Detector | Files/s | Mean | Median | p90 | p95 |
|---|---|---|---|---|---|
| cchardet 2.1.19 | 1,992 | 0.50ms | 0.04ms | 0.65ms | 1.00ms |
| chardet 7.3.0 (mypyc) | 582 | 1.72ms | 0.57ms | 4.29ms | 5.38ms |
| charset-normalizer 3.4.6 (mypyc) | 373 | 2.68ms | 1.47ms | 6.90ms | 10.67ms |
| chardet 6.0.0 | 12 | 85.68ms | 1.71ms | 190.06ms | 395.24ms |
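With mypyc compilation, chardet 7.3.0 is roughly 50x faster than chardet 6.0.0 (582 vs 12 files/s) and 1.6x faster than charset-normalizer 3.4.6 (mypyc). Median time per file is 0.57ms. cchardet is the fastest detector at 1,992 files/s, but also the least accurate (55.9%).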
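For readers who want to collect the same columns on their own files, here is a minimal sketch of a per-file timing loop. It is not the benchmark script itself: `latency_stats` is a hypothetical helper, `detect` is any callable that accepts bytes, and the percentile method is an assumption, so results may differ slightly from the table.

```python
import statistics
import time


def latency_stats(detect, payloads):
    """Time detect(data) per payload and summarise in milliseconds.

    Needs at least two payloads so that percentiles are defined.
    """
    samples = []
    for data in payloads:
        start = time.perf_counter()
        detect(data)
        samples.append((time.perf_counter() - start) * 1000.0)

    # quantiles(n=100) returns 99 cut points: index 89 is p90, index 94 is p95.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    total_s = sum(samples) / 1000.0
    return {
        "files_per_s": len(samples) / total_s if total_s > 0 else float("inf"),
        "mean_ms": statistics.fmean(samples),
        "median_ms": statistics.median(samples),
        "p90_ms": cuts[89],
        "p95_ms": cuts[94],
    }
```

With chardet installed, passing `chardet.detect` as `detect` and a list of file contents as `payloads` yields one row of the table. Files/s is the reciprocal of the mean (1 / 1.72ms is about 582 files/s), which is why a few slow files pull throughput down far more than they move the median.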
Memory
| Detector | Import Time | Import Memory | Peak Memory | RSS |
|---|---|---|---|---|
| chardet 7.3.0 | 0.015s | 1.9 MiB | 25.9 MiB | 115.6 MiB |
| chardet 6.0.0 | 0.053s | 13.0 MiB | 29.5 MiB | 122.3 MiB |
| charset-normalizer | 0.010s | 1.4 MiB | 101.3 MiB | 273.3 MiB |
| cchardet | 0.001s | 23.2 KiB | 26.8 KiB | 81.9 MiB |
chardet 7.3.0 uses 3.9x less peak memory than charset-normalizer (25.9 vs 101.3 MiB) and 2.4x less RSS (115.6 vs 273.3 MiB).
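One way to approximate the Import Memory and Peak Memory columns is Python's `tracemalloc`. This is a sketch under the assumption that Python-level allocation tracking is an acceptable proxy; the source does not say how its numbers were collected, and `traced_memory` is a hypothetical helper. RSS is an OS-level figure and is not covered here.

```python
import importlib
import tracemalloc


def traced_memory(module_name, workload=None):
    """Return (import_bytes, peak_bytes) of allocations traced by tracemalloc.

    Run in a fresh interpreter: a module that is already imported costs ~0.
    """
    tracemalloc.start()
    try:
        module = importlib.import_module(module_name)
        # Memory still held right after the import finishes.
        import_bytes, _ = tracemalloc.get_traced_memory()
        if workload is not None:
            workload(module)
        # High-water mark since tracing started.
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return import_bytes, peak_bytes
```

`tracemalloc` only sees allocations made through Python's memory allocators, so memory that a C extension allocates directly will not appear in either figure.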
Language Detection
| Detector | Correct | Accuracy |
|---|---|---|
| chardet 7.3.0 | 2393/2513 | 95.2% |
| charset-normalizer 3.4.6 | 1489/2513 | 59.3% |
| chardet 6.0.0 | 1004/2513 | 40.0% |
| cchardet 2.1.19 | 0/2513 | 0.0% |
chardet 7.3.0 detects language with 95.2% accuracy: +35.9pp vs charset-normalizer and +55.2pp vs chardet 6.0.0. cchardet does not report language, so it scores 0/2513.
Accuracy on charset-normalizer’s Test Set
charset-normalizer maintains its own test dataset at char-dataset. 469 of those files also exist in the chardet test suite (matched by content hash), so we can compare both detectors on charset-normalizer’s own ground truth. We filed an issue about the 5 files we excluded (4 ambiguous Cyrillic files and 1 corrupted Vietnamese file) and 2 we relabeled (UTF-8-SIG, not UTF-8).
| Detector | Correct | Encoding Accuracy | Language Accuracy |
|---|---|---|---|
| chardet 7.3.0 (mypyc) | 461/469 | 98.3% | 92.8% |
| charset-normalizer 3.4.6 (mypyc) | 453/469 | 96.6% | 85.9% |
chardet is 1.7pp more accurate than charset-normalizer on charset-normalizer’s own test data, and 6.9pp more accurate on language detection.
You can reproduce these numbers with:
python scripts/compare_detectors.py --cn-dataset --cn --mypyc
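The content-hash matching used to find the files common to both test suites can be sketched as follows. This is an assumption about the approach, not the project's script: SHA-256 and the helper names `content_hashes` and `shared_files` are illustrative choices.

```python
import hashlib
from pathlib import Path


def content_hashes(root):
    """Map SHA-256 digest -> first path found for every file under root."""
    hashes = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            # Keep the first path when a directory contains duplicates.
            hashes.setdefault(digest, path)
    return hashes


def shared_files(root_a, root_b):
    """Return (path_in_a, path_in_b) pairs whose contents are byte-identical."""
    a, b = content_hashes(root_a), content_hashes(root_b)
    return [(a[d], b[d]) for d in sorted(a.keys() & b.keys())]
```

Matching on content rather than file name means a file counts as shared even if the two suites store it under different names or directories.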
Thread Safety
chardet.detect() and chardet.detect_all() are fully thread-safe.
Each call carries its own state with no shared mutable data between threads.
Thread safety adds no measurable overhead (< 0.1%).
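On free-threaded Python (GIL disabled), detection scales with thread count: 4.0x to 5.2x at 8 threads in three of the four free-threaded configurations below, while pure-Python 3.13t peaks at 2.5x with 4 threads.
Standard GIL Python shows no scaling because the GIL serializes threads.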
Benchmarked with 2,521 files, encoding_era=ALL:
| Python | 1 thread | 2 threads | 4 threads | 8 threads |
|---|---|---|---|---|
| 3.13 (pure) | 8,100ms | 8,290ms | 8,280ms | 8,300ms |
| 3.13t (pure) | 9,690ms | 5,510ms (1.8x) | 3,820ms (2.5x) | 4,710ms (2.1x) |
| 3.13 (mypyc) | 4,380ms | 4,170ms | 4,170ms | 4,170ms |
| 3.13t (mypyc) | 4,400ms | 2,230ms (2.0x) | 1,180ms (3.7x) | 940ms (4.7x) |
| 3.14 (pure) | 6,260ms | 6,210ms | 6,230ms | 6,270ms |
| 3.14t (pure) | 6,760ms | 5,240ms (1.3x) | 2,840ms (2.4x) | 1,690ms (4.0x) |
| 3.14 (mypyc) | 4,370ms | 4,200ms | 4,190ms | 4,260ms |
| 3.14t (mypyc) | 5,080ms | 2,570ms (2.0x) | 1,350ms (3.8x) | 980ms (5.2x) |
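A scaling measurement like the one in the table can be sketched with a thread pool. This is an illustrative harness, not the one used for the published numbers: `wall_time_ms` and `scaling_table` are hypothetical names, and `detect` is any callable that accepts bytes.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def wall_time_ms(detect, payloads, workers):
    """Run detect over all payloads on `workers` threads; return (ms, results)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(detect, payloads))
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms, results


def scaling_table(detect, payloads, thread_counts=(1, 2, 4, 8)):
    """Map thread count -> (wall time in ms, speedup vs the first thread count)."""
    timings = {n: wall_time_ms(detect, payloads, n)[0] for n in thread_counts}
    baseline = timings[thread_counts[0]]
    return {n: (ms, baseline / ms) for n, ms in timings.items()}
```

Because each detection call carries its own state, the same `detect` callable can be shared by every worker; with chardet installed, `scaling_table(chardet.detect, payloads)` would produce one row of the table for the interpreter it runs on.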
Individual UniversalDetector instances are not thread-safe.
Create one instance per thread when using the streaming API.
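The one-instance-per-thread rule can be implemented with `threading.local`. The sketch below is a generic pattern rather than a chardet API: `PerThread` and `detect_stream` are hypothetical helpers, and `detect_stream` assumes the detector exposes the familiar `reset()`, `feed()`, `done`, `close()`, and `result` streaming interface.

```python
import threading


class PerThread:
    """Lazily create one object per thread from a zero-argument factory."""

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()

    def get(self):
        try:
            return self._local.instance
        except AttributeError:
            # First use on this thread: build a private instance.
            self._local.instance = self._factory()
            return self._local.instance


def detect_stream(detectors, chunks):
    """Feed chunks to the calling thread's own detector and return its result."""
    detector = detectors.get()
    detector.reset()
    for chunk in chunks:
        detector.feed(chunk)
        if detector.done:
            break
    detector.close()
    return detector.result
```

Under these assumptions, `detectors = PerThread(UniversalDetector)` can be created once at module level and `detect_stream(detectors, chunks)` called from any worker thread, since no two threads ever touch the same instance.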
Optional mypyc Compilation
Prebuilt mypyc-compiled wheels are published to PyPI for CPython on Linux, macOS, and Windows. A regular pip install chardet picks them up automatically; no extra flags are needed.
| Build | Files/s | Speedup |
|---|---|---|
| Pure Python | 415 | baseline |
| mypyc compiled | 588 | 1.42x |
Pure-Python wheels are always available for PyPy and platforms without prebuilt binaries.
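To check which kind of wheel an environment ended up with, one option is to look at a module's file suffix, since mypyc emits C extension modules. This is a sketch with a hypothetical helper, `is_extension_module`; the source does not say which chardet submodules are compiled, so the module name to inspect is left to the caller.

```python
import importlib
import importlib.machinery


def is_extension_module(module_name):
    """True if the module was loaded from a compiled extension file."""
    module = importlib.import_module(module_name)
    origin = getattr(module, "__file__", None) or ""
    # EXTENSION_SUFFIXES lists this interpreter's suffixes, e.g. ".so" or ".pyd".
    return origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))
```

A package's `__init__` may remain plain Python even when its submodules are compiled, so it is worth passing the dotted name of the specific submodule of interest.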
Performance Across Python Versions
Benchmarked chardet 7.3.0 across all supported Python versions
(macOS aarch64, 2,521 files, encoding_era=ALL). CPython versions
install mypyc-compiled wheels automatically; PyPy receives the
pure-Python wheel.
| Python | Wheel | Total | Files/s | Mean | Median | p90 | p95 |
|---|---|---|---|---|---|---|---|
| CPython 3.10 | mypyc | 4,057ms | 621 | 1.61ms | 0.53ms | 4.02ms | 5.16ms |
| CPython 3.10 | pure | 7,662ms | 329 | 3.04ms | 1.17ms | 7.07ms | 8.94ms |
| CPython 3.11 | mypyc | 3,454ms | 730 | 1.37ms | 0.45ms | 3.37ms | 4.30ms |
| CPython 3.11 | pure | 5,821ms | 433 | 2.31ms | 0.88ms | 5.41ms | 7.04ms |
| CPython 3.12 | mypyc | 3,994ms | 631 | 1.58ms | 0.53ms | 3.95ms | 5.07ms |
| CPython 3.12 | pure | 6,017ms | 419 | 2.39ms | 0.93ms | 5.53ms | 7.20ms |
| CPython 3.13 | mypyc | 4,260ms | 592 | 1.69ms | 0.56ms | 4.24ms | 5.24ms |
| CPython 3.13 | pure | 7,984ms | 316 | 3.17ms | 1.24ms | 7.26ms | 9.55ms |
| CPython 3.14 | mypyc | 4,283ms | 588 | 1.70ms | 0.57ms | 4.26ms | 5.39ms |
| CPython 3.14 | pure | 6,080ms | 415 | 2.41ms | 0.93ms | 5.58ms | 7.38ms |
| PyPy 3.10 | pure | 6,106ms | 413 | 2.42ms | 0.26ms | 5.03ms | 7.23ms |
| PyPy 3.11 | pure | 6,047ms | 417 | 2.40ms | 0.27ms | 5.03ms | 7.38ms |
CPython 3.11 + mypyc is the fastest combination at 730 files/s. mypyc provides a 1.4–1.9x speedup across CPython versions. PyPy’s JIT keeps pure Python competitive: PyPy 3.11 (417 files/s) beats pure CPython 3.10, 3.13, and 3.14, trails pure CPython 3.11 (433) and 3.12 (419), and reaches 57–71% of mypyc-compiled CPython throughput. PyPy also has the lowest median latency in the table (0.26–0.27ms).
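The speedup and throughput ranges quoted above follow directly from the Files/s column. As a worked check, with the values copied from the table:

```python
# Files/s values copied from the table above, keyed by CPython version.
MYPYC = {"3.10": 621, "3.11": 730, "3.12": 631, "3.13": 592, "3.14": 588}
PURE = {"3.10": 329, "3.11": 433, "3.12": 419, "3.13": 316, "3.14": 415}
PYPY_BEST = 417  # PyPy 3.11, pure-Python wheel

# mypyc speedup over the pure-Python wheel on the same CPython version.
speedups = {v: MYPYC[v] / PURE[v] for v in MYPYC}

# PyPy throughput as a share of each mypyc-compiled CPython build.
pypy_share = {v: PYPY_BEST / MYPYC[v] for v in MYPYC}

print(f"mypyc speedup: {min(speedups.values()):.2f}x to {max(speedups.values()):.2f}x")
print(f"PyPy vs mypyc: {min(pypy_share.values()):.0%} to {max(pypy_share.values()):.0%}")
```

The smallest speedup is on CPython 3.14 (588 / 415) and the largest on CPython 3.10 (621 / 329); the PyPy share is lowest against CPython 3.11 and highest against CPython 3.14.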