Robustness

Building AI systems that behave reliably and safely even under adversarial conditions.

The Problem

Most AI models are surprisingly brittle. A small, intentional change to an input, called an "adversarial perturbation," can cause a state-of-the-art model to confidently produce the wrong answer. For safety-critical systems like autonomous vehicles or medical diagnostics, this brittleness is unacceptable.

Key issues:

  • Adversarial Attacks — Small, carefully crafted changes to images or text, often imperceptible to humans, that trick models into misclassification.
  • Distribution Shift — Accuracy dropping sharply when a model encounters data that differs even slightly from its training distribution.
  • Out-of-Distribution Detection — The difficulty models have in "knowing what they don't know."
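To make the first issue concrete, here is a minimal sketch of a one-step gradient-sign attack (in the style of FGSM) against a toy logistic-regression classifier. Everything here — the weights, the input, and the perturbation budget `eps` — is an illustrative assumption, not a description of any particular system; real attacks target deep networks via automatic differentiation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, eps):
    """One gradient-sign step: move x in the direction that increases the loss."""
    p = sigmoid(w @ x + b)
    # Gradient of the binary cross-entropy loss w.r.t. the input x is (p - y) * w.
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

# Toy classifier and a correctly classified point (all values hypothetical).
w = np.array([2.0, -1.0])
b = 0.0
x = np.array([0.5, 0.2])   # score = w @ x + b = 0.8 > 0  -> predicted class 1
y = 1.0                    # true label

x_adv = fgsm_perturb(x, y, w, b, eps=0.5)
# Each coordinate of x_adv differs from x by at most eps, yet the
# perturbed score w @ x_adv + b is now negative: the prediction flips.
```

The point of the sketch is that the attacker never needs a large change: a bounded per-coordinate nudge, chosen using the model's own gradient, is enough to cross the decision boundary.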

What We're Working On

  • Formal Verification — Mathematically proving that a model's output will remain within a "safe" range for any input within a certain bound.
  • Adversarial Training — Training models against an "adversary" that actively finds their weaknesses, forcing the model to learn more robust features.
  • Uncertainty Estimation — Developing methods for models to signal high uncertainty when they encounter novel data, allowing for safe human intervention.
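As a sketch of the uncertainty-estimation idea above, one common (though not the only) approach is to flag inputs whose predictive distribution has high entropy. The logit values and the threshold below are illustrative assumptions; in practice the threshold would be tuned on held-out data.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def predictive_entropy(logits):
    """Entropy (in nats) of the softmax distribution; higher = less certain."""
    p = softmax(logits)
    return -np.sum(p * np.log(p + 1e-12))

# Hypothetical model outputs for a 3-class problem.
in_dist_logits = np.array([6.0, 1.0, 0.5])  # one class dominates: confident
ood_logits = np.array([1.1, 1.0, 0.9])      # near-uniform: uncertain

threshold = 0.8  # nats; an assumed cutoff, tuned on validation data in practice

def decide(logits):
    """Act on confident predictions; defer uncertain ones to a human."""
    if predictive_entropy(logits) > threshold:
        return "defer to human"
    return "accept prediction"
```

A confident prediction yields entropy near zero, while a near-uniform distribution approaches log(3) ≈ 1.1 nats, so the threshold cleanly separates the two cases and provides the hook for safe human intervention.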

Related Publications

No publications in this area yet.