Current (main): Merged

Production Readiness & Data Quality Engine

The current codebase goes beyond CSV cleanup and adds practical production data checks.

  • profile() for nulls, duplicate rows, uniqueness, whitespace, semantic hints, memory usage, and cleaning suggestions
  • suggest_cleaning() and auto_clean() for safe or strict cleanup workflows
  • Schema, Field, and validation helpers that surface row-level data-contract failures
  • CSV hardening for quoted multiline records, duplicate headers, non-UTF-8 encodings, and clearer Python exceptions
  • Python 3.13 classifier and CI coverage alongside existing supported versions
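The checks listed above can be sketched in pure Python. Note that the function and key names below are illustrative only and do not claim to reproduce Arnio's actual API:

```python
def profile_rows(rows):
    """Sketch of profile()-style checks: nulls, duplicate rows, stray
    whitespace. `rows` is a list of dicts; all names are hypothetical."""
    null_counts = {}
    whitespace_cols = set()
    seen, duplicates = set(), 0
    for row in rows:
        key = tuple(sorted(row.items()))  # fingerprint for duplicate detection
        if key in seen:
            duplicates += 1
        seen.add(key)
        for col, value in row.items():
            if value is None or value == "":
                null_counts[col] = null_counts.get(col, 0) + 1
            elif isinstance(value, str) and value != value.strip():
                whitespace_cols.add(col)
    return {"nulls": null_counts, "duplicate_rows": duplicates,
            "whitespace_columns": sorted(whitespace_cols)}

rows = [
    {"name": " Ada ", "age": 36},
    {"name": "Grace", "age": None},
    {"name": "Grace", "age": None},  # exact duplicate of the previous row
]
report = profile_rows(rows)
# one duplicate row, two nulls in "age", whitespace flagged in "name"
```

A real implementation would also report uniqueness ratios, semantic hints, and memory usage per column; the single-pass structure stays the same.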
v1.0.x: Released

Stable Release & Cross-Platform Packaging

The foundation release establishing Arnio as a production-ready library.

  • Cross-platform pre-compiled wheels via cibuildwheel — Windows, Linux (manylinux), macOS (Intel & Apple Silicon)
  • Google Colab compatibility out of the box
  • Production-grade packaging — resolved ModuleNotFoundError issues
  • Fully automated PyPI publishing pipeline via Trusted Publishing
  • CI/CD for Python 3.9–3.13 across all platforms
  • Stable public API marked "Production/Stable"
  • Zero-copy to_pandas() via NumPy buffer interfaces
  • Custom exception hierarchy: ArnioError, UnknownStepError, CsvReadError, TypeCastError
  • Pure-Python step registration via register_step()
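The zero-copy to_pandas() bullet rests on NumPy's buffer protocol. Independent of Arnio's internals, which this does not claim to reproduce, the underlying mechanism looks like:

```python
import numpy as np

# A mutable C-level buffer, standing in here for a native column buffer
raw = bytearray(np.arange(5, dtype="<i8").tobytes())

# np.frombuffer wraps the existing memory instead of copying it
col = np.frombuffer(raw, dtype="<i8")

# Proof of zero-copy: mutating the source bytes is visible through the array
raw[0:8] = (99).to_bytes(8, "little")
# col[0] now reads 99, with no copy ever having been made
```

Because no bytes are duplicated, converting even very large columns this way costs O(1) memory; the trade-off is that the array's lifetime is tied to the buffer that backs it.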
Next: Active Development

C++ Pipeline Optimization — Speed Parity

The primary engineering goal: match or exceed pandas execution speed on the standard benchmark.

  • Hash-based drop_duplicates — replace O(n²) naive comparison with O(n) hash deduplication
  • In-place strip_whitespace — eliminate unnecessary string copies
  • Optimize columnar iteration patterns in C++
  • Benchmark-driven development with CI-integrated regression detection
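The hash-based deduplication itself is planned for the C++ core; the algorithm, shown in Python for brevity, replaces pairwise comparison with a single pass over a hash set:

```python
def drop_duplicates_hashed(rows):
    """O(n) order-preserving deduplication via a hash set.

    Each row is hashed once; the naive O(n^2) version instead compares
    every row against every earlier row.
    """
    seen = set()
    out = []
    for row in rows:
        key = tuple(row)  # hashable fingerprint of the row's values
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

rows = [[1, "a"], [2, "b"], [1, "a"], [3, "c"]]
deduped = drop_duplicates_hashed(rows)
# → [[1, "a"], [2, "b"], [3, "c"]]
```

In C++ the same shape falls out of `std::unordered_set` over row hashes; the cost shifts from comparisons to hashing, which is why it scales linearly.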

This is where contributors make the biggest impact.

If you're comfortable with C++, optimizing drop_duplicates and strip_whitespace is the single highest-value contribution you can make. See the open issues.

Scaling: Planned

Chunked Processing & Format Expansion

Scaling Arnio to handle files that don't fit in memory and expanding beyond CSV.

  • Chunked CSV reading — process files larger than available RAM
  • Parquet support — read and write Apache Parquet files
  • JSON support — ingest newline-delimited JSON (NDJSON)
  • Streaming pipeline execution for memory-constrained environments
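Chunked reading is not implemented yet; a minimal sketch of the pattern, using the stdlib csv module rather than Arnio's C++ reader, keeps only one chunk resident at a time:

```python
import csv
import io

def read_csv_chunks(fileobj, chunk_size):
    """Yield (header, rows) pairs with at most chunk_size rows each,
    so memory use is bounded by the chunk, not the file."""
    reader = csv.reader(fileobj)
    header = next(reader)
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield header, chunk
            chunk = []
    if chunk:  # flush the final, possibly partial, chunk
        yield header, chunk

data = io.StringIO("a,b\n1,2\n3,4\n5,6\n")
chunks = list(read_csv_chunks(data, chunk_size=2))
# two chunks: rows [["1","2"], ["3","4"]], then [["5","6"]]
```

The same generator shape extends naturally to NDJSON (one json.loads per line) and to streaming pipeline steps that consume one chunk at a time.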

Want to influence the roadmap?

Open an Issue