Quick Start

macOS / Linux

Terminal
git clone https://github.com/im-anishraj/arnio.git
cd arnio
make install
make test
make lint

Windows

Install Visual Studio Build Tools with the "Desktop development with C++" workload, then:

Terminal
git clone https://github.com/im-anishraj/arnio.git
cd arnio
pip install -e ".[dev]"
pre-commit install

Tip

Windows users can install make via Chocolatey with choco install make, or use WSL for a faster setup experience.

Adding a Python Pipeline Step

Many new features don't require touching C++. You can write a pure Python step and register it with Arnio, then add focused tests and documentation.

Step 1: Write the function

Python
import arnio as ar

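def remove_special_chars(df, columns=None):
    """Strip everything except ASCII letters, digits, and whitespace."""
    # Default to all string (object-dtype) columns when none are given.
    # An explicit None check also works when columns is a pandas Index.
    cols = df.select_dtypes("object").columns if columns is None else columns
    for col in cols:
        df[col] = df[col].str.replace(r"[^a-zA-Z0-9\s]", "", regex=True)
    return df

# Register the function under the name used in pipeline definitions.
ar.register_step("remove_special_chars", remove_special_chars)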
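
Before writing tests, it helps to know exactly what the pattern in Step 1 keeps and removes. The sketch below applies the same character class to plain strings with the standard library's re module, so it runs without Arnio or pandas. The helper name strip_special is hypothetical and exists only for this illustration.

```python
import re

# Same character class as in remove_special_chars: keep ASCII letters,
# digits, and whitespace; drop everything else.
PATTERN = re.compile(r"[^a-zA-Z0-9\s]")


def strip_special(text):
    """Plain-string equivalent of the per-cell replacement in Step 1."""
    return PATTERN.sub("", text)


print(strip_special("Alice!"))         # Alice
print(strip_special("O'Brien & Co."))  # OBrien  Co  (both spaces around "&" remain)
print(strip_special("Jos\u00e9"))      # Jos  (the accented letter is not ASCII)
```

Two things stand out: whitespace around removed characters is left in place, and non-ASCII letters are stripped along with punctuation. Both are worth covering in your tests if your data contains names or free text.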

Step 2: Write tests

tests/test_cleaning.py
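import arnio as ar

# remove_special_chars is the function from Step 1; import it from the
# module where you defined it.

def test_remove_special_chars(sample_csv):
    # Register the step under the name the pipeline refers to.
    ar.register_step("remove_special_chars", remove_special_chars)
    frame = ar.read_csv(sample_csv)

    # Each pipeline entry is a tuple whose first element is the step name.
    result = ar.pipeline(frame, [
        ("remove_special_chars",),
    ])
    df = ar.to_pandas(result)

    assert "name" in df.columns
    # Add your specific assertions here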
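
The test above depends on a sample_csv fixture that is not shown here. If you need an input file of your own to exercise a new step, the following is a minimal sketch of a helper that writes one using only the standard library. The function name write_messy_csv, the file name, and the rows are hypothetical and chosen for illustration; they are not part of Arnio or its test suite.

```python
import csv
import tempfile
from pathlib import Path


def write_messy_csv(directory):
    """Write a small CSV with punctuation in the "name" column; return its path."""
    path = Path(directory) / "messy.csv"
    rows = [
        {"name": "Alice!", "age": "30"},
        {"name": "Bob#42", "age": "25"},
    ]
    # newline="" lets the csv module control line endings itself.
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["name", "age"])
        writer.writeheader()
        writer.writerows(rows)
    return path


# Write the file into a throwaway directory and show its contents.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = write_messy_csv(tmp)
    print(csv_path.read_text(encoding="utf-8"))
```

In a pytest suite, the same helper could be wrapped in a fixture that passes pytest's built-in tmp_path as the directory, which gives each test its own file with known contents to assert against.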

Step 3: Open a PR

That's it. No build system changes, no C++ compiler, no pybind11.

Data Quality Contributions

The data quality layer is also friendly to Python contributors. High-impact areas include semantic detectors, validation rules, examples, and tests around messy real-world CSVs.

C++ Contributions

For developers comfortable with C++, the highest-impact work right now is performance optimization:

These two functions are the primary performance bottleneck. See Benchmarks for details.

Pull Request Process

  1. Fork the repo and create your branch from main.
  2. Conventional Commits: Ensure your PR title and commits follow the Conventional Commits specification (e.g., feat: add support for..., fix: resolve issue with...). This is required for our automated release system.
  3. If you've added code that should be tested, add tests.
  4. If you've changed APIs, update the documentation.
  5. Ensure the test suite passes: make test or pytest tests/ -v
  6. Ensure your code passes linting and formatting: make lint and pre-commit run --all-files
  7. Issue the pull request.
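
For step 2, a quick local check of a PR title can catch formatting slips before you push. This is a simplified sketch of the Conventional Commits header shape, type(optional-scope)!: description, and not the check Arnio's release automation actually runs. The list of accepted types is an assumption based on common usage; the specification itself only defines feat and fix, and the project may accept a different set.

```python
import re

# Assumed type list (common convention, not taken from Arnio's configuration).
TYPES = ("feat", "fix", "docs", "test", "refactor", "perf", "chore", "ci", "build")

# type, optional (scope), optional "!" for breaking changes, then ": description".
HEADER = re.compile(r"^(?:%s)(?:\([\w./-]+\))?!?: \S.*$" % "|".join(TYPES))


def is_conventional(title):
    """Return True if the title matches the simplified header pattern."""
    return HEADER.match(title) is not None


print(is_conventional("feat: add remove_special_chars step"))  # True
print(is_conventional("fix(csv): handle empty header row"))    # True
print(is_conventional("Add new step"))                         # False
print(is_conventional("feat:missing space"))                   # False
```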

Code Style

Arnio uses:

pre-commit runs these automatically before each commit if installed.
