
A simple, minimal website for RobustiPy!

View the Project on GitHub here!

View the Python Documentation here!

View the RobustiPy working paper here!

Getting Started with RobustiPy

A practical, end‑to‑end guide for fitting robust specifications, managing compute, and interpreting outputs.


At a Glance

  1. Prepare a clean pandas.DataFrame with your outcome(s), main predictor(s), and candidate controls.
  2. Fit a model across many specifications with draws and/or kfold.
  3. Review results.summary() for key robustness statistics.
  4. Generate the multi‑panel figure with results.plot().
  5. Interpret the panels using the guide linked at the end.

Installation

pip install robustipy

For the latest development version:

git clone https://github.com/RobustiPy/robustipy.git
cd robustipy
pip install .

Data Preparation

RobustiPy expects a tidy pandas.DataFrame where each column corresponds to a variable.

Recommendations:


Model Types

Both model classes, OLSRobust and LRobust, accept a single outcome or a list of outcomes (each run separately) and produce a unified results object.


Quick Start (OLS)

import pandas as pd
from robustipy.models import OLSRobust

data = pd.read_csv("your_data.csv")  # replace with the path to your own data

y = "your_outcome"
x = "your_key_predictor"
controls = ["control_1", "control_2", "control_3", "control_4"]

model = OLSRobust(y=y, x=x, data=data)
model.fit(
    controls=controls,
    draws=1000,
    kfold=10,
    oos_metric="pseudo-r2",
    seed=192735,
    n_cpu=4,
)

results = model.get_results()
results.summary(digits=3)
results.plot(ic="aic", project_name="union_example", figpath="./figures")

Quick Start (Logistic Regression)

from robustipy.models import LRobust

model = LRobust(y="outcome_binary", x="treatment", data=data)
model.fit(
    controls=controls,
    draws=500,
    kfold=5,
    oos_metric="pseudo-r2",
    seed=123,
)

results = model.get_results()
results.summary()
results.plot(ic="bic", oddsratio=True)

Specification Space Basics

RobustiPy builds many “reasonable” specifications by varying which controls are included. The control list you pass to fit() defines the candidate set. The full specification space can be large, so RobustiPy lets you sample or downscale.
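
To get a feel for how quickly the space grows, the sketch below enumerates every subset of a candidate control list using only the standard library. It assumes that each subset of controls, including the empty set, counts as one specification, so k controls give 2^k specifications. How RobustiPy enumerates specifications internally is not shown here, and enumerate_control_sets is a hypothetical helper, not part of the library.

```python
from itertools import chain, combinations


def enumerate_control_sets(controls):
    """Return every subset of `controls` as a tuple, smallest first.

    Hypothetical helper for illustration; not a RobustiPy function.
    """
    return list(
        chain.from_iterable(
            combinations(controls, size) for size in range(len(controls) + 1)
        )
    )


controls = ["control_1", "control_2", "control_3", "control_4"]
control_sets = enumerate_control_sets(controls)

# Under the all-subsets assumption, k controls give 2**k specifications.
print(len(control_sets))  # 16
print(2 ** 20)  # 1048576 specifications for 20 candidate controls
```

Four controls are cheap to enumerate, but twenty already exceed a million control sets under this assumption, which is where sampling becomes useful.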


Core Fit Parameters (Reference)

Parameter             Purpose                       Notes
controls              Candidate control variables   Required list of column names
draws                 Bootstrap resamples per spec  If None, bootstrapping is skipped
kfold                 Folds for cross‑validation    Requires oos_metric
oos_metric            OOS metric                    'pseudo-r2' or 'rmse'
n_cpu                 Parallel processes            Defaults to all available CPUs minus one if not provided
seed                  Reproducibility               Propagated to all random ops
group                 Fixed effects grouping        De‑means outcomes by group
z_specs_sample_size   Sample specs                  Randomly samples control‑set combinations
composite_sample      Composite bootstrap           Reduces compute by sampling before specs
rescale_y/x/z         Standardization               Rescales variables to mean 0, sd 1
threshold             Workload warning              Warns if specs × draws × folds is too large
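
The group row describes de-meaning outcomes by group. As a minimal pandas sketch of that within-group transformation (an illustration of the general technique, not RobustiPy's internal code; the column names firm and wage and the values are made up):

```python
import pandas as pd

# Toy data: two groups with different average outcomes (made-up values).
df = pd.DataFrame(
    {
        "firm": ["a", "a", "b", "b"],
        "wage": [1.0, 3.0, 10.0, 20.0],
    }
)

# Subtract each group's mean from its members' outcomes.
df["wage_demeaned"] = df["wage"] - df.groupby("firm")["wage"].transform("mean")

print(df["wage_demeaned"].tolist())  # [-1.0, 1.0, -5.0, 5.0]
```

After the transformation every group has mean zero, so only within-group variation in the outcome remains.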

Compute‑Friendly Run Example

model.fit(
    controls=controls,
    draws=200,
    kfold=5,
    z_specs_sample_size=300,
    threshold=200000,
    n_cpu=4,
)
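
Before launching a run, it can help to estimate the workload by hand. The sketch below multiplies specs × draws × folds, the product named in the threshold row of the reference table, and compares it with a threshold. It assumes the number of specifications equals z_specs_sample_size when sampling is used; estimated_workload is a hypothetical helper, and the exact rule RobustiPy applies internally may differ.

```python
def estimated_workload(n_specs, draws, kfold):
    """Rough model-fit count: specs x draws x folds.

    Hypothetical helper; missing draws or folds are treated as a factor of 1.
    """
    return n_specs * (draws or 1) * (kfold or 1)


# Settings from the compute-friendly run above.
workload = estimated_workload(n_specs=300, draws=200, kfold=5)
threshold = 200_000

print(workload)  # 300000
print(workload > threshold)  # True
```

With these settings the product is above the threshold, so under this assumption a workload warning would be expected. Lowering draws to 100 would bring the product down to 150,000.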

Results Object: What You Get

After get_results(), the results object includes:


Plotting Options You’ll Use Most


Highlighting Specific Specifications

specs = [
    ["age", "tenure", "hours"],
    ["age", "tenure", "hours", "collgrad"],
]

results.plot(specs=specs, ic="hqic")
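
A typo in a highlighted specification is easy to miss. The following sketch checks that every highlighted control list only uses names from the candidate controls passed to fit(). It assumes highlighted specs should be drawn from that candidate set; the controls list shown here and the unknown_controls helper are hypothetical.

```python
def unknown_controls(specs, controls):
    """Return, per highlighted spec, the names missing from `controls`.

    Hypothetical helper for illustration; not part of RobustiPy.
    """
    candidate = set(controls)
    return [sorted(set(spec) - candidate) for spec in specs]


# Hypothetical candidate set for the example specs above.
controls = ["age", "tenure", "hours", "collgrad", "south"]
specs = [
    ["age", "tenure", "hours"],
    ["age", "tenure", "hours", "collgrad"],
]

missing = unknown_controls(specs, controls)
assert not any(missing), f"Unknown controls in specs: {missing}"
```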

Output Files

results.plot(...) saves a combined multi‑panel figure and a set of individual panels. Use figpath and project_name to control the output directory and filename prefix.
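
A small standard-library sketch for working with the output directory: it creates the folder up front and lists files that start with the project prefix. Whether RobustiPy creates the directory itself, and the exact file names and extensions it writes, are not stated here, so the helper only matches on the prefix. list_outputs is a hypothetical name.

```python
from pathlib import Path


def list_outputs(figpath, project_name):
    """Return sorted file names in `figpath` that start with `project_name`.

    Hypothetical helper; it makes no assumption about file extensions.
    """
    directory = Path(figpath)
    directory.mkdir(parents=True, exist_ok=True)  # harmless if it already exists
    return sorted(
        p.name for p in directory.glob(f"{project_name}*") if p.is_file()
    )


# After results.plot(project_name="union_example", figpath="./figures"):
print(list_outputs("./figures", "union_example"))
```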


Common Pitfalls


Next Steps