Roadmap
Where Arnio is headed – a clear engineering plan with honest timelines.
Current main
Merged
Production Readiness & Data Quality Engine
The current codebase goes beyond CSV cleanup and adds practical production data checks.
- `profile()` for nulls, duplicate rows, uniqueness, whitespace, semantic hints, memory usage, and cleaning suggestions
- `suggest_cleaning()` and `auto_clean()` for safe or strict cleanup workflows
- `Schema`, `Field`, and validation helpers for row-level data contract failures
- CSV hardening for quoted multiline records, duplicate headers, non-UTF-8 encodings, and clearer Python exceptions
- Python 3.13 classifier and CI coverage alongside existing supported versions
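The kinds of checks `profile()` runs can be illustrated with a minimal pure-Python sketch. This is not Arnio's implementation or API; `tiny_profile` and its report keys are hypothetical names chosen for illustration.

```python
# Illustrative only: tiny_profile and its report keys are hypothetical,
# not Arnio's API. It mimics three of the checks profile() performs:
# null counts, duplicate rows, and leading/trailing whitespace.

def tiny_profile(rows, columns):
    report = {
        "null_counts": {c: 0 for c in columns},
        "duplicate_rows": 0,
        "whitespace_issues": {c: 0 for c in columns},
    }
    seen = set()
    for row in rows:
        key = tuple(row)
        if key in seen:
            report["duplicate_rows"] += 1
        seen.add(key)
        for col, value in zip(columns, row):
            if value is None or value == "":
                report["null_counts"][col] += 1
            elif isinstance(value, str) and value != value.strip():
                report["whitespace_issues"][col] += 1
    return report

rows = [["alice", "NY"], ["bob ", ""], ["alice", "NY"]]
print(tiny_profile(rows, ["name", "city"]))
```

A real profiler would add uniqueness ratios, semantic type hints, and memory estimates on top of these per-column counters.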
v1.0.x
Released
Stable Release & Cross-Platform Packaging
The foundation release establishing Arnio as a production-ready library.
- Cross-platform pre-compiled wheels via `cibuildwheel` – Windows, Linux (manylinux), macOS (Intel & Apple Silicon)
- Google Colab compatibility out of the box
- Production-grade packaging – resolved `ModuleNotFoundError` issues
- Fully automated PyPI publishing pipeline via Trusted Publishing
- CI/CD for Python 3.9–3.13 across all platforms
- Stable public API marked "Production/Stable"
- Zero-copy `to_pandas()` via NumPy buffer interfaces
- Custom exception hierarchy: `ArnioError`, `UnknownStepError`, `CsvReadError`, `TypeCastError`
- Pure-Python step registration via `register_step()`
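The zero-copy idea behind `to_pandas()` can be demonstrated in isolation with NumPy's buffer protocol. This is a generic sketch of the technique, not Arnio's internals; a `bytearray` stands in for memory that the C++ engine would own.

```python
import numpy as np

# A buffer allocated by the C++ engine would normally back this;
# here a writable bytearray stands in for it.
raw = bytearray(np.arange(4, dtype=np.float64).tobytes())

# np.frombuffer wraps the existing memory instead of copying it,
# so the array and the buffer share storage.
arr = np.frombuffer(raw, dtype=np.float64)

# Mutating the underlying buffer is visible through the array,
# which shows no copy was made.
raw[0:8] = np.float64(42.0).tobytes()
print(arr[0])  # 42.0
```

Handing such an array to pandas keeps the engine's memory as the single source of truth, avoiding a serialize/deserialize round trip.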
Next
Active Development
C++ Pipeline Optimization – Speed Parity
The primary engineering goal: match or exceed pandas execution speed on the standard benchmark.
- Hash-based `drop_duplicates` – replace O(n²) naive comparison with O(n) hash deduplication
- In-place `strip_whitespace` – eliminate unnecessary string copies
- Optimize columnar iteration patterns in C++
- Benchmark-driven development with CI-integrated regression detection
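The planned hash-based deduplication can be sketched in Python terms; the C++ version would hash row contents the same way, just without the interpreter overhead. `dedupe_rows` is a hypothetical name, not Arnio's function.

```python
def dedupe_rows(rows):
    """O(n) deduplication: one hash-set lookup per row, instead of
    comparing every row against every other row (the O(n^2) naive way)."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # rows must be hashable; tuples of values are
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

rows = [["a", 1], ["b", 2], ["a", 1]]
print(dedupe_rows(rows))  # [['a', 1], ['b', 2]]
```

The win comes from replacing n² pairwise comparisons with n hash operations, at the cost of O(n) extra memory for the seen set.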
This is where contributors make the biggest impact.
If you're comfortable with C++, optimizing these two functions is the single highest-value contribution you can make. See open issues.
Scaling
Planned
Chunked Processing & Format Expansion
Scaling Arnio to handle files that don't fit in memory, and expanding beyond CSV.
- Chunked CSV reading – process files larger than available RAM
- Parquet support – read and write Apache Parquet files
- JSON support – ingest newline-delimited JSON (NDJSON)
- Streaming pipeline execution for memory-constrained environments
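The chunked-reading approach can be prototyped with Python's standard library. This is a sketch of the idea, not Arnio's planned API; `read_csv_chunks` and its chunk size are illustrative.

```python
import csv
import io

def read_csv_chunks(fileobj, chunk_size=2):
    """Yield (header, rows) batches, holding at most chunk_size
    data rows in memory at a time instead of the whole file."""
    reader = csv.reader(fileobj)
    header = next(reader)
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) >= chunk_size:
            yield header, chunk
            chunk = []
    if chunk:  # flush the final partial batch
        yield header, chunk

data = io.StringIO("name,city\na,NY\nb,LA\nc,SF\n")
for header, chunk in read_csv_chunks(data):
    print(chunk)
```

Running each pipeline step per chunk, rather than per file, is what makes streaming execution viable on memory-constrained machines.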
Want to influence the roadmap?
Open an Issue