API Reference

Complete function signatures for the arnio public API after the production-readiness and data-quality updates.

I/O — arnio.io

arnio.read_csv(path, *, delimiter=",", has_header=True, usecols=None, nrows=None, encoding="utf-8")

Read a CSV file into an ArFrame via the C++ backend.

| Parameter | Type | Description |
| --- | --- | --- |
| path | str \| os.PathLike | Path to CSV, TSV, or TXT file. |
| delimiter | str | Column delimiter. Default: "," |
| has_header | bool | Whether the first row is a header. Default: True |
| usecols | list[str] \| None | Subset of columns to load. |
| nrows | int \| None | Maximum number of rows to read. |
| encoding | str | File encoding. Default: "utf-8" |

Returns: ArFrame
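
How the keyword arguments combine is easier to see in code. The following is a minimal pure-Python model built on the standard csv module; it is a sketch of the documented options, not arnio's C++ reader. The function name read_rows and the generated col_0, col_1 names for headerless input are hypothetical.

```python
import csv
import io
from itertools import chain


def read_rows(stream, delimiter=",", has_header=True, usecols=None, nrows=None):
    """Return (column_names, rows) while honoring the documented options."""
    reader = csv.reader(stream, delimiter=delimiter)
    first = next(reader, None)
    if first is None:                      # empty input
        return [], []
    if has_header:
        names, pending = first, []
    else:
        # Hypothetical placeholder names; the first line is data, not a header.
        names, pending = [f"col_{i}" for i in range(len(first))], [first]
    # Column positions to keep, in file order.
    keep = [i for i, name in enumerate(names) if usecols is None or name in usecols]
    rows = []
    for raw in chain(pending, reader):
        if nrows is not None and len(rows) >= nrows:
            break                          # stop reading once the row cap is reached
        rows.append([raw[i] for i in keep])
    return [names[i] for i in keep], rows


tsv = io.StringIO("id\tname\tcity\n1\tAda\tLondon\n2\tLinus\tHelsinki\n3\tGrace\tArlington\n")
names, rows = read_rows(tsv, delimiter="\t", usecols=["id", "city"], nrows=2)
print(names, rows)
```

Going by the signature above, the matching arnio call would be arnio.read_csv(path, delimiter="\t", usecols=["id", "city"], nrows=2).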

arnio.scan_csv(path, *, delimiter=",", encoding="utf-8")

Return the schema (column names and inferred types) without loading data into memory.

| Parameter | Type | Description |
| --- | --- | --- |
| path | str \| os.PathLike | Path to CSV, TSV, or TXT file. |
| delimiter | str | Column delimiter. Default: "," |
| encoding | str | File encoding. Non-UTF-8 input is transcoded before native scanning. |

Returns: dict[str, str] — e.g. {"id": "int64", "name": "string"}
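
To illustrate what scanning a schema without loading the data can look like, here is a minimal pure-Python sketch. It streams the rows once and keeps only one small type marker per column. This shows the idea only and is not arnio's native inference; the name scan_schema, the "float64" type name, and the widening order are assumptions.

```python
import csv
import io

# Candidate types from narrowest to widest. "int64" and "string" mirror the
# example above; "float64" and this widening order are assumptions.
_ORDER = ["int64", "float64", "string"]


def _fits(value, type_name):
    """Check whether one raw CSV cell can be parsed as type_name."""
    if type_name == "string":
        return True
    parser = int if type_name == "int64" else float
    try:
        parser(value)
        return True
    except ValueError:
        return False


def scan_schema(stream, delimiter=","):
    """Stream rows once and return {column: inferred_type}, storing no rows."""
    reader = csv.reader(stream, delimiter=delimiter)
    header = next(reader)
    level = [0] * len(header)              # index into _ORDER per column
    for row in reader:
        for i, cell in enumerate(row[:len(header)]):
            if cell == "":
                continue                   # empty cells do not influence the type
            while not _fits(cell, _ORDER[level[i]]):
                level[i] += 1              # widen until the cell fits
    return {name: _ORDER[lvl] for name, lvl in zip(header, level)}


sample = io.StringIO("id,name,score\n1,Ada,9.5\n2,Linus,7\n")
schema = scan_schema(sample)
print(schema)
```

Memory use in this sketch depends on the number of columns, not the number of rows, which is the property the description above relies on.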

Cleaning — arnio.cleaning

arnio.drop_nulls(frame, *, subset=None)

Remove rows containing null or empty values.

| Parameter | Type | Description |
| --- | --- | --- |
| frame | ArFrame | Input frame. |
| subset | list[str] \| None | Columns to check. If None, checks all. |

Returns: ArFrame

arnio.fill_nulls(frame, value, *, subset=None)

Replace null/empty values with a given fill value.

| Parameter | Type | Description |
| --- | --- | --- |
| frame | ArFrame | Input frame. |
| value | Any | Fill value (scalar). |
| subset | list[str] \| None | Columns to fill. |

Returns: ArFrame

arnio.drop_duplicates(frame, *, subset=None, keep="first")

Remove duplicate rows.

| Parameter | Type | Description |
| --- | --- | --- |
| frame | ArFrame | Input frame. |
| subset | list[str] \| None | Columns to check for duplicates. |
| keep | str \| bool | Which duplicate to keep: "first", "last", "none", or False. |

Returns: ArFrame
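
The keep options are easiest to compare side by side. The sketch below models them on a plain list of dict rows. It assumes that "none" and False both mean "drop every row that has a duplicate", as in pandas; that reading, and the helper name drop_duplicate_rows, are assumptions and not confirmed arnio behavior.

```python
def drop_duplicate_rows(rows, subset=None, keep="first"):
    """Model of the keep options on a list of dict rows (not arnio's code)."""
    if keep not in ("first", "last", "none", False):
        raise ValueError(f"unsupported keep value: {keep!r}")

    def key(row):
        # Compare only the subset columns, or every column when subset is None.
        cols = subset if subset is not None else sorted(row)
        return tuple(row[c] for c in cols)

    # Group row positions by their duplicate key.
    positions = {}
    for index, row in enumerate(rows):
        positions.setdefault(key(row), []).append(index)

    kept = set()
    for indexes in positions.values():
        if keep == "first":
            kept.add(indexes[0])
        elif keep == "last":
            kept.add(indexes[-1])
        elif len(indexes) == 1:            # "none" / False: keep unique rows only
            kept.add(indexes[0])
    # Preserve the original row order.
    return [row for i, row in enumerate(rows) if i in kept]


rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 1, "name": "c"}]
first = drop_duplicate_rows(rows, subset=["id"], keep="first")
last = drop_duplicate_rows(rows, subset=["id"], keep="last")
none = drop_duplicate_rows(rows, subset=["id"], keep="none")
print(first, last, none, sep="\n")
```

Going by the signature above, the equivalent arnio call would be arnio.drop_duplicates(frame, subset=["id"], keep="last").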

arnio.strip_whitespace(frame, *, subset=None)

Trim leading and trailing whitespace from string columns.

Returns: ArFrame

arnio.normalize_case(frame, *, subset=None, case_type="lower")

Normalize string columns to "lower", "upper", or "title" case.

Returns: ArFrame

arnio.rename_columns(frame, mapping)

Rename columns using a {old_name: new_name} dictionary.

Returns: ArFrame

arnio.cast_types(frame, mapping)

Cast columns to specified types via a {column: type_str} dictionary.

Returns: ArFrame

arnio.clean(frame, *, strip_whitespace=True, drop_nulls=False, drop_duplicates=False)

Convenience function to apply common cleaning operations in order: strip whitespace → drop nulls → drop duplicates.

Returns: ArFrame
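
The order of operations matters: stripping first lets values that differ only by padding count as duplicates. The sketch below models that order on a list of dict rows. It is an illustration under assumptions, not arnio's implementation; in particular, treating a string that is empty after stripping as null is a guess based on the "null or empty" wording of drop_nulls.

```python
def clean_rows(rows, strip_whitespace=True, drop_nulls=False, drop_duplicates=False):
    """Apply strip -> drop nulls -> drop duplicates to a list of dict rows."""
    if strip_whitespace:
        rows = [
            {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
            for row in rows
        ]
    if drop_nulls:
        # Assumption: None and the empty string both count as null.
        rows = [r for r in rows if all(v is not None and v != "" for v in r.values())]
    if drop_duplicates:
        seen, unique = set(), []
        for row in rows:
            key = tuple(sorted(row.items()))
            if key not in seen:            # keep the first occurrence only
                seen.add(key)
                unique.append(row)
        rows = unique
    return rows


raw = [
    {"name": " Ada ", "city": "London"},
    {"name": "Ada", "city": "London"},
    {"name": "   ", "city": "Paris"},
]
cleaned = clean_rows(raw, drop_nulls=True, drop_duplicates=True)
unstripped = clean_rows(raw, strip_whitespace=False, drop_nulls=True, drop_duplicates=True)
print(cleaned)
print(len(unstripped))
```

With stripping enabled, the three input rows collapse to one. Without it, all three survive in this model, because the padded values are neither empty nor equal.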

Pipeline — arnio.pipeline

arnio.pipeline(frame, steps)

Apply a list of cleaning steps sequentially. Each step is a tuple: (step_name,) or (step_name, kwargs_dict).

| Parameter | Type | Description |
| --- | --- | --- |
| frame | ArFrame | Input frame. |
| steps | list[tuple] | Ordered list of (name,) or (name, kwargs) tuples. |

Returns: ArFrame

Raises: UnknownStepError if a step name is not found.

arnio.register_step(name, fn)

Register a custom Python pipeline step. The function fn should accept and return a pandas.DataFrame.

| Parameter | Type | Description |
| --- | --- | --- |
| name | str | Step name for use in pipeline(). |
| fn | Callable | Function: DataFrame → DataFrame. |
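
The step-tuple format and the registry can be sketched as a small dispatcher. This is a pure-Python model of the behavior described above, not arnio's code: it passes a list of strings between steps where arnio's custom steps receive a pandas.DataFrame, and whether custom steps accept keyword arguments in arnio is an assumption. The names run_pipeline and _REGISTRY are hypothetical.

```python
class ArnioError(Exception):
    """Stand-in for the documented base exception."""


class UnknownStepError(ArnioError):
    """Stand-in for the documented error raised on unregistered step names."""


_REGISTRY = {}  # hypothetical: step name -> callable


def register_step(name, fn):
    """Make fn available under name for later pipeline runs."""
    _REGISTRY[name] = fn


def run_pipeline(data, steps):
    """Apply each (name,) or (name, kwargs) step in order."""
    for step in steps:
        name = step[0]
        kwargs = step[1] if len(step) > 1 else {}
        if name not in _REGISTRY:
            available = ", ".join(sorted(_REGISTRY))
            raise UnknownStepError(f"unknown step {name!r}; available: {available}")
        data = _REGISTRY[name](data, **kwargs)
    return data


def truncate(values, width):
    """Example step that takes a keyword argument."""
    return [v[:width] for v in values]


register_step("strip", lambda values: [v.strip() for v in values])
register_step("truncate", truncate)

result = run_pipeline(["  alpha ", " beta"], [("strip",), ("truncate", {"width": 3})])
print(result)

try:
    run_pipeline([], [("missing",)])
except UnknownStepError as exc:
    error_message = str(exc)
    print(error_message)
```

Listing the registered names in the error message mirrors the documented behavior of UnknownStepError and makes typos in step names quick to spot.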

Conversion — arnio.convert

arnio.to_pandas(frame)

Convert an ArFrame to a pandas.DataFrame. Uses zero-copy NumPy buffer interfaces for numeric columns.

Returns: pandas.DataFrame
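
Zero-copy here means that two objects share one block of memory through Python's buffer protocol. The standard-library sketch below demonstrates that sharing with array and memoryview. It illustrates the general mechanism only; how arnio exposes its column buffers, and whether they are writable, is not stated in this reference.

```python
from array import array

# A contiguous buffer of C doubles, standing in for one numeric column.
column = array("d", [1.0, 2.0, 3.0])

# memoryview wraps the same memory through the buffer protocol; nothing is copied.
view = memoryview(column)

column[0] = 42.0      # write through the original owner
view[1] = -1.0        # write through the view

print(view.tolist())      # both writes are visible through the view
print(column.tolist())    # and through the original array
```

The practical consequence of any zero-copy hand-off is that in-place changes may be visible on both sides, so it can be worth copying the resulting DataFrame before mutating it if isolation matters.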

arnio.from_pandas(df)

Convert a pandas.DataFrame to an ArFrame. Handles pd.NA and np.nan conversion to null.

Returns: ArFrame

Raises: TypeError if columns contain nested/complex types.

Data Quality — arnio.quality

arnio.profile(frame, *, sample_size=5)

Profile an ArFrame for quality signals before analysis.

| Parameter | Type | Description |
| --- | --- | --- |
| frame | ArFrame | Input frame to inspect. |
| sample_size | int | Number of non-null sample values to keep per column. |

Returns: DataQualityReport

arnio.suggest_cleaning(frame_or_report)

Return pipeline-compatible cleaning steps based on detected quality signals.

Returns: list[tuple[str, dict[str, Any]]] — for example [("strip_whitespace", {"subset": ["name"]})]

arnio.auto_clean(frame, *, mode="safe", return_report=False)

Apply built-in automatic cleaning based on the quality report.

| Parameter | Type | Description |
| --- | --- | --- |
| mode | "safe" \| "strict" | "safe" trims whitespace only. "strict" also applies deterministic casts and exact duplicate removal. |
| return_report | bool | Return the pre-cleaning DataQualityReport with the cleaned frame. |

Returns: ArFrame or tuple[ArFrame, DataQualityReport]

class arnio.DataQualityReport

Whole-frame quality report returned by profile().

| Attribute/Method | Description |
| --- | --- |
| .row_count, .column_count | Frame dimensions. |
| .memory_usage | Frame memory usage in bytes. |
| .duplicate_rows, .duplicate_ratio | Exact duplicate-row diagnostics. |
| .columns | Mapping of column name to ColumnProfile. |
| .suggestions | Pipeline-compatible suggested cleaning steps. |
| .summary(), .to_dict(), .to_pandas() | Compact, JSON-friendly, or tabular output. |
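
To make the report fields concrete, the sketch below computes similar figures for a list of dict rows. The definitions are assumptions, not arnio's: duplicate_rows counts rows that repeat an earlier row exactly, duplicate_ratio divides that count by the row count, and both None and the empty string count as null. The name profile_rows and the per-column keys are hypothetical.

```python
from collections import Counter


def profile_rows(rows, sample_size=5):
    """Return a dict of quality figures modeled on the report fields above."""
    row_count = len(rows)
    columns = list(rows[0]) if rows else []
    # Count how often each exact row occurs.
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    duplicate_rows = sum(n - 1 for n in counts.values())
    per_column = {}
    for c in columns:
        values = [row[c] for row in rows]
        non_null = [v for v in values if v is not None and v != ""]
        per_column[c] = {
            "null_count": len(values) - len(non_null),
            "samples": non_null[:sample_size],   # first non-null values only
        }
    return {
        "row_count": row_count,
        "column_count": len(columns),
        "duplicate_rows": duplicate_rows,
        "duplicate_ratio": duplicate_rows / row_count if row_count else 0.0,
        "columns": per_column,
    }


rows = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": ""},
    {"id": 1, "name": "Ada"},
    {"id": 3, "name": None},
]
report = profile_rows(rows, sample_size=1)
print(report)
```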

Schema Validation — arnio.schema

class arnio.Schema(fields, strict=False)

Named validation contract for an ArFrame. Use strict=True to reject unexpected columns.

Method: schema.validate(frame) returns ValidationResult.

Field builders

| Builder | Rules |
| --- | --- |
| Int64(nullable=True, min=None, max=None, unique=False) | Integer dtype, nullability, range, uniqueness. |
| Float64(nullable=True, min=None, max=None, unique=False) | Float dtype, nullability, range, uniqueness. |
| String(nullable=True, pattern=None, allowed=None, unique=False, min_length=None, max_length=None) | String dtype, regex, allowed values, uniqueness, length bounds. |
| Bool(nullable=True) | Boolean dtype and nullability. |
| Email(nullable=True, unique=False) | Email semantic validation. |
| URL(nullable=True, unique=False) | URL semantic validation. |

arnio.validate(frame, schema)

Validate an ArFrame against a Schema or dict[str, Field].

Returns: ValidationResult

class arnio.ValidationResult

| Attribute/Method | Description |
| --- | --- |
| .passed | True when there are zero issues. |
| .issue_count | Total number of validation issues. |
| .issues | List of ValidationIssue objects with column, rule, message, row index, and value. |
| .bad_rows | Sorted row indexes with validation failures. |
| .summary(), .to_dict(), .to_pandas() | Compact, JSON-friendly, or tabular output. |
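
The relationship between issues, issue_count, and bad_rows can be sketched with a single integer rule set. This is a pure-Python model under assumptions, not arnio's validator: the Issue class, the function check_int_column, the rule names, and the min_value/max_value parameter names (used to avoid shadowing Python built-ins) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    column: str
    rule: str
    message: str
    row: int
    value: object


def check_int_column(rows, column, nullable=True, min_value=None, max_value=None, unique=False):
    """Collect one Issue per violated rule per row for an integer column."""
    issues, seen = [], set()
    for index, row in enumerate(rows):
        value = row.get(column)
        if value is None:
            if not nullable:
                issues.append(Issue(column, "nullable", "null is not allowed", index, value))
            continue
        if isinstance(value, bool) or not isinstance(value, int):
            issues.append(Issue(column, "dtype", "expected an integer", index, value))
            continue
        if min_value is not None and value < min_value:
            issues.append(Issue(column, "min", f"below {min_value}", index, value))
        if max_value is not None and value > max_value:
            issues.append(Issue(column, "max", f"above {max_value}", index, value))
        if unique and value in seen:
            issues.append(Issue(column, "unique", "duplicate value", index, value))
        seen.add(value)
    return issues


rows = [{"age": 30}, {"age": 200}, {"age": None}, {"age": 200}]
issues = check_int_column(rows, "age", nullable=False, max_value=120, unique=True)

passed = not issues                                    # True only with zero issues
issue_count = len(issues)
bad_rows = sorted({issue.row for issue in issues})     # each failing row listed once
print(passed, issue_count, bad_rows)
```

In this model the last row breaks two rules, so issue_count is larger than the number of bad rows; the two figures answer different questions.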

Core — arnio.frame

class arnio.ArFrame(cpp_frame)

Lightweight columnar data container backed by C++.

| Property/Method | Returns | Description |
| --- | --- | --- |
| .shape | tuple[int, int] | Row and column count. |
| .columns | list[str] | Column names. |
| .dtypes | dict[str, str] | Column name → inferred type. |
| .memory_usage() | int | Total bytes consumed. |
| len(frame) | int | Number of rows. |

Exceptions — arnio.exceptions

Exception Classes

| Exception | Description |
| --- | --- |
| ArnioError | Base exception for all Arnio errors. |
| UnknownStepError | Raised when a pipeline step name is not registered. Lists available steps. |
| CsvReadError | Raised when a CSV file cannot be read. |
| TypeCastError | Raised when cast_types encounters an incompatible type. |