Research Areas
Exploring key subfields of AI alignment and safety research.
Alignment →
Ensuring AI systems understand and reliably act according to human values and intentions.
Constitutional AI →
Training AI systems to follow a set of explicit, human-defined principles and safety rules.
Generative AI →
Research on generative models, large language models, and creative AI systems.
Interpretability →
Understanding the internal mechanisms of neural networks to predict and verify model behavior.
Robustness →
Building AI systems that behave reliably and safely even under adversarial conditions.
Scalable Oversight →
Creating techniques for humans to effectively supervise AI systems on complex, high-stakes tasks.