Benchmarks — Arnio

Important: benchmark numbers vary with CPU, operating system, compiler and build flags, and the Python, pandas, and NumPy versions. Compare results only when they come from the same environment and the same Arnio commit.

Current benchmark coverage

End-to-end CSV pipeline

benchmarks/benchmark_vs_pandas.py runs the same CSV pipeline in pandas and in Arnio on deterministic tall (many rows) and wide (many columns) datasets.
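"Deterministic" means every run sees identical input. One common way to get that is to derive all values from a fixed seed, as in the sketch below. It assumes a seeded pseudo-random generator is acceptable and is not necessarily how benchmarks/generate_data.py works; write_deterministic_csv and its parameters are hypothetical.

```python
import csv
import io
import random


def write_deterministic_csv(out, n_rows: int, n_cols: int, seed: int = 42) -> None:
    """Write an n_rows x n_cols CSV whose contents depend only on the arguments.

    Hypothetical helper for illustration; not part of Arnio.
    """
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    writer = csv.writer(out)
    writer.writerow([f"col_{j}" for j in range(n_cols)])
    for _ in range(n_rows):
        row = []
        for j in range(n_cols):
            if j % 3 == 0:
                row.append(rng.randint(0, 1_000_000))          # integer column
            elif j % 3 == 1:
                row.append(f"{rng.uniform(-1e3, 1e3):.4f}")    # float column, as text
            else:
                row.append(f"  item_{rng.randint(0, 99)}  ")   # string with padding whitespace
        writer.writerow(row)


# Tall: many rows, few columns. Wide: few rows, many columns.
tall = io.StringIO()
write_deterministic_csv(tall, n_rows=1_000, n_cols=4)
wide = io.StringIO()
write_deterministic_csv(wide, n_rows=10, n_cols=200)
```

Calling the helper twice with the same arguments and seed produces byte-identical output, which is what makes timings across runs comparable.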

Memory-focused paths

Dedicated scripts cover from_pandas, auto-clean memory, sparse nulls, and to-pandas overhead.
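The memory scripts' exact measurement method is not described here. One standard-library way to measure the peak memory of a single step is tracemalloc, sketched below with a hypothetical helper, peak_traced_bytes (requires Python 3.9+ for reset_peak). tracemalloc only sees allocations made through Python's allocators; memory that native code (such as Arnio's native operations) allocates directly is not counted, so treat the figure as a lower bound rather than total process memory.

```python
import tracemalloc
from typing import Any, Callable, Tuple


def peak_traced_bytes(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, int]:
    """Call fn(*args, **kwargs) and return (result, peak extra bytes traced during the call).

    Hypothetical helper. Native buffers allocated outside Python's memory
    allocators are invisible to tracemalloc and are not included.
    """
    was_tracing = tracemalloc.is_tracing()
    if not was_tracing:
        tracemalloc.start()
    try:
        baseline, _ = tracemalloc.get_traced_memory()
        tracemalloc.reset_peak()  # Python 3.9+: drop peaks recorded before this call
        result = fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        if not was_tracing:
            tracemalloc.stop()
    return result, peak - baseline


result, peak = peak_traced_bytes(lambda: [0] * 1_000_000)
print(f"peak traced allocation: {peak / 1e6:.1f} MB")
```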

Native operations

Benchmarks cover numeric parsing, strip-whitespace, combine-columns, clip-numeric, duplicate profiling, and GIL/threading behavior.
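A GIL/threading benchmark usually asks whether an operation releases the GIL, so that running it from several threads finishes sooner than running it serially. The harness below sketches that comparison; thread_speedup and busy_python_loop are hypothetical, and the workload is a placeholder rather than an Arnio call. On the default (GIL-enabled) CPython build, a pure-Python task should show a speedup near 1.0 or below, while a task that releases the GIL can approach n_threads, provided enough idle CPU cores are available.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def thread_speedup(task: Callable[[], object], n_tasks: int = 8, n_threads: int = 4) -> float:
    """Return serial_time / threaded_time for n_tasks calls of task.

    Hypothetical helper: values near 1.0 suggest the task holds the GIL,
    values approaching n_threads suggest it releases the GIL.
    """
    start = time.perf_counter()
    for _ in range(n_tasks):
        task()
    serial = time.perf_counter() - start

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(task) for _ in range(n_tasks)]
        for f in futures:
            f.result()  # re-raise any exception from a worker thread
    threaded = time.perf_counter() - start
    return serial / threaded


def busy_python_loop() -> int:
    # Pure-Python placeholder workload; holds the GIL while it runs.
    return sum(i * i for i in range(200_000))


print(f"speedup: {thread_speedup(busy_python_loop):.2f}x")
```

To probe a native operation, pass a zero-argument callable that invokes it on a prepared input.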

Current-main CI checks

Since the v1.18.0 release, the main branch adds lightweight benchmark regression checks and dry-run smoke tests, so CI detects when a benchmark script stops running.
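How the CI regression checks are implemented is not specified here. The core idea, comparing measured timings against stored baselines with a relative tolerance, can be sketched as follows. The function name, the JSON baseline layout, the benchmark names, and the 25% threshold are all assumptions, and the numbers are made up for illustration, not real Arnio results.

```python
import json
from typing import Dict, List


def find_regressions(
    current: Dict[str, float],
    baseline: Dict[str, float],
    tolerance: float = 0.25,
) -> List[str]:
    """Return a message for each benchmark slower than baseline * (1 + tolerance).

    Hypothetical helper. Benchmarks without a baseline entry are skipped,
    so drift is only checked where a baseline is configured.
    """
    regressions = []
    for name, seconds in sorted(current.items()):
        reference = baseline.get(name)
        if reference is None:
            continue  # no baseline configured for this benchmark
        if seconds > reference * (1 + tolerance):
            regressions.append(
                f"{name}: {seconds:.3f}s vs baseline {reference:.3f}s "
                f"(+{(seconds / reference - 1) * 100:.0f}%)"
            )
    return regressions


# Illustrative numbers only.
baseline = json.loads('{"read_csv_tall": 1.20, "strip_whitespace": 0.40}')
current = {"read_csv_tall": 1.30, "strip_whitespace": 0.62, "clip_numeric": 0.10}
for message in find_regressions(current, baseline):
    print("REGRESSION", message)
```

A generous tolerance keeps ordinary run-to-run noise from failing CI while still flagging large drift.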

Run locally

The full suite can generate large deterministic data files. Use dry-run mode first when checking a branch or CI environment.

Smoke check

python -m pip install -e ".[dev]"
python benchmarks/benchmark_vs_pandas.py --dry-run
pytest tests/test_benchmarks_smoke.py
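Dry-run mode is meant to exercise the same code path as a full run, only on tiny inputs. The sketch below shows one hypothetical way a benchmark script could wire such a flag with argparse. Only the --dry-run and --rows flag names come from the commands in this document; the helper names, the default sizes, and the placeholder workload are assumptions, not Arnio's actual script internals.

```python
import argparse
from typing import List, Optional

FULL_ROWS = 1_000_000  # assumed full-scale size
DRY_RUN_ROWS = 100     # assumed tiny size for smoke checks


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Hypothetical benchmark entry point")
    parser.add_argument("--dry-run", action="store_true",
                        help="run every step on tiny data to check that the script works")
    parser.add_argument("--rows", type=int, default=None,
                        help="override the number of generated rows")
    args = parser.parse_args(argv)
    if args.rows is None:
        # An explicit --rows wins; otherwise pick the size from the mode.
        args.rows = DRY_RUN_ROWS if args.dry_run else FULL_ROWS
    return args


def main(argv: Optional[List[str]] = None) -> int:
    args = parse_args(argv)
    data = list(range(args.rows))  # placeholder for generated benchmark data
    checksum = sum(data)           # placeholder for the benchmarked operation
    mode = "dry-run" if args.dry_run else "full"
    print(f"[{mode}] rows={args.rows} checksum={checksum}")
    return 0
```

A real script would typically finish with `if __name__ == "__main__": raise SystemExit(main())`, and a smoke test can call `main(["--dry-run"])` directly.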

Full reference run

python benchmarks/generate_data.py
python benchmarks/benchmark_vs_pandas.py
python benchmarks/benchmark_auto_clean_memory.py --rows 100000

How to read results

Signal	Interpretation
Wall-clock time	Useful for same-machine comparisons. Small differences are normal between runs.
Peak memory	Useful for catching broad regressions. Compare only runs that use the same Python, pandas, NumPy, and compiler stack.
Dry-run success	Confirms that benchmark scripts import, generate tiny datasets, and run to completion. It says nothing about full-scale performance.
Regression checks	Current-main CI guardrails catch accidental script breakage and large benchmark drift when baselines are configured.
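Because small run-to-run differences are normal, a single wall-clock number is fragile. A common mitigation is to warm up once, repeat the measurement, and report the median together with the spread. The sketch below does this with time.perf_counter; time_repeated is a hypothetical helper, and the Arnio scripts do not necessarily time things this way.

```python
import statistics
import time
from typing import Callable, Dict


def time_repeated(fn: Callable[[], object], repeats: int = 5, warmup: int = 1) -> Dict[str, float]:
    """Time fn() several times with perf_counter and summarize the samples."""
    for _ in range(warmup):
        fn()  # warm caches, imports, and lazy initialization before measuring
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "max_s": max(samples),
    }


stats = time_repeated(lambda: sorted(range(200_000), key=lambda x: -x))
print({k: round(v, 4) for k, v in stats.items()})
```

The standard library's timeit.repeat offers a similar repeat-and-collect pattern. A wide min-to-max range is itself a signal that the machine was noisy during the run.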

Benchmark scripts

benchmarks/benchmark_vs_pandas.py — reference pandas vs Arnio workflow.
benchmarks/benchmark_csv.py — parser-focused CSV work.
benchmarks/benchmark_strip_whitespace.py — native whitespace cleaning.
benchmarks/benchmark_sparse_nulls.py — sparse null workloads.
benchmarks/benchmark_from_pandas_memory.py — conversion memory behavior.
benchmarks/benchmark_auto_clean_memory.py — automatic cleaning memory behavior.

Reporting a benchmark

Include the OS, CPU, Python version, pandas and NumPy versions, compiler and build mode, Arnio commit, the exact command, and its full output. Without that context, raw timings are not actionable.
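Most of the environment fields listed above can be collected with the standard library. The sketch below is a hypothetical helper that assumes the package is installed under the distribution name "arnio" and that it runs from inside an Arnio git checkout. platform.python_compiler() reports the compiler that built the Python interpreter, not Arnio's native extension, so the build mode still needs to be stated by hand.

```python
import json
import platform
import subprocess
from importlib import metadata


def git_commit(path: str = ".") -> str:
    """Return the current git commit hash, or "unknown" if git is unavailable."""
    try:
        out = subprocess.run(["git", "rev-parse", "HEAD"], cwd=path,
                             capture_output=True, text=True, check=True)
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def environment_report() -> dict:
    """Collect environment details for a benchmark report (hypothetical helper)."""
    versions = {}
    for dist in ("pandas", "numpy", "arnio"):  # "arnio" distribution name is assumed
        try:
            versions[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            versions[dist] = "not installed"
    return {
        "os": platform.platform(),
        "cpu": platform.processor() or platform.machine(),
        "python": platform.python_version(),
        "python_implementation": platform.python_implementation(),
        "python_compiler": platform.python_compiler(),
        "packages": versions,
        "arnio_commit": git_commit(),
    }


print(json.dumps(environment_report(), indent=2))
```

Pasting this JSON next to the exact command and its full output covers most of the checklist above.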