Alexander Sax

Alexander (Sasha) Sax

Comp Vision Research Scientist

FAIR, AI @ Meta

About me:

I am currently a Research Scientist in the Fundamental AI Research (FAIR) division of AI at Meta. My research focuses on building foundation models to enable Embodied AI that can perceive, act in, and communicate about the physical world.

A lot of my work is around large-scale pretraining for robust Embodied AI -- developing 3D datasets, simulators, and techniques for efficient training and finetuning. Some relevant keywords are pretraining, transfer learning, RL, sim2real.

I received my PhD at UC Berkeley, where I had the great pleasure of working with and being advised by Jitendra Malik and Amir Zamir (at EPFL). Before that I received a bachelor's in Math and a master's in CS from Stanford, where I was fortunate to be advised by Silvio Savarese. My Erdős number is 3!

Selected Honors and Awards

Education

  • PhD in CS, 2023

    UC Berkeley

  • MSc in CS, 2018

    Stanford University

  • BSc in Math, 2018

    Stanford University


Selected Publications

Updated version on Google Scholar

OpenEQA: Embodied Question Answering in the Era of Foundation Models. In CVPR, 2024.  

Project Site Code PDF
A benchmark of 1800 human-annotated questions and associated RGBD pose trajectories, with registered 3D scans. For categories like spatial understanding, no VLMs (as of Mar 2024) do better than a blind LLM that can't see the scene.


Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans. In ICCV, 2021.  

Project Site Code PDF
A large diverse dataset of multiview posed RGBD images, 3D scans, and more -- of 1.9k scanned + artist-generated buildings + outdoor scenes.


Robustness via Cross-Domain Ensembles. In ICCV, 2021.  [Oral]

PDF Code Project Site
Enforce consistency constraints between different tasks to improve robustness to various domain shifts.


Robust Policies via Mid-Level Visual Representations. In CoRL, 2020.  

Project Site Code PDF
Using large pretrained visual representations improves performance and learning speed for various manipulation tasks with robotic arms, and improves sim2real generalization for navigation.


Robust Learning Through Cross-Task Consistency. In CVPR, 2020.  [Best Paper Nominee] [Oral]

Project Site Code PDF
A strong way to enforce consistency constraints across multiple tasks in multi-task learning -- better generalization and ability to detect prediction failure.


Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks. In CVPR, 2020.  [Oral]

Project Site Code PDF
Simple method to adapt and control output of a larger network with a smaller "side" network.


Mid-Level Visual Priors Improve Generalization and Sample Efficiency for Learning Visuomotor Policies. In CoRL and BayLearn, 2019.  [Oral]

Project Site Code PDF
Using large pretrained visual representations improves performance and learning speed for various navigation tasks.


Internships and Research Positions