From Invariant Representations to Invariant Data: Provable Robustness to Spurious Correlations via Noisy Counterfactual Matching

Abstract

Spurious correlations can cause model performance to degrade in new environments. Prior causality-inspired methods aim to learn invariant representations (e.g., IRM) but typically underperform empirical risk minimization (ERM). Recent alternatives improve robustness by leveraging test-time data, but such data may be unavailable in practice. To address these issues, we take a data-centric approach by leveraging invariant data pairs—training samples with equal true predictive distributions, such as counterfactuals that intervene only on non-ancestors of the target. We introduce noisy counterfactual matching (NCM), which adds a linear constraint to ERM based on counterfactual pairs and achieves provable robustness to spurious correlations—even when the counterfactuals are noisy. For linear causal models, we prove that the test-domain error can be upper bounded by the in-domain error plus a term that depends on the counterfactuals' diversity and quality. Empirically, we validate on a synthetic dataset that only a few counterfactual pairs are needed, and we demonstrate on real-world benchmarks (ColoredMNIST, Waterbirds, and PACS) that linear probing on a pretrained ViT-B/32 CLIP backbone improves robustness.
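To make the idea concrete, the following is a minimal illustrative sketch (not the paper's implementation): a linear model is trained with ERM on synthetic data where a spurious feature tracks the label, plus a quadratic penalty that matches predictions across counterfactual pairs differing only in the spurious (non-ancestor) feature. The hard linear constraint from the paper is relaxed here to a soft penalty for simplicity; all data, weights, and the `lam` strength are hypothetical choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic training domain: y is caused by x_causal, but x_spur
# absorbs the label noise, so it is spuriously predictive in-domain.
x_causal = rng.normal(size=n)
eps = rng.normal(size=n)
y = x_causal + 0.5 * eps
x_spur = x_causal + 0.5 * eps + 0.1 * rng.normal(size=n)
X = np.stack([x_causal, x_spur], axis=1)

# A few counterfactual pairs: same causal feature, but the spurious
# (non-ancestor) feature is re-sampled by an intervention.
k = 5
Xa = X[:k].copy()
Xb = Xa.copy()
Xb[:, 1] = rng.normal(size=k)

def fit(lam, lr=0.005, steps=5000):
    """ERM plus a soft counterfactual-matching penalty:
    lam * mean(((Xa - Xb) @ w) ** 2) forces equal predictions
    on each counterfactual pair."""
    w = np.zeros(2)
    D = Xa - Xb
    for _ in range(steps):
        grad_erm = 2 * X.T @ (X @ w - y) / n
        grad_ncm = 2 * lam * D.T @ (D @ w) / k
        w -= lr * (grad_erm + grad_ncm)
    return w

w_erm = fit(lam=0.0)    # leans on the spurious feature
w_ncm = fit(lam=10.0)   # recovers the causal weight
print("ERM weights:", w_erm)
print("NCM weights:", w_ncm)
```

With the penalty active, the weight on the spurious coordinate is driven toward zero (the pairs differ only there), so the model falls back on the causal feature; plain ERM instead exploits the in-domain shortcut.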

Publication
Preprint