Han Cai
hcai.hm [at] gmail (dot) com

I am a final-year Ph.D. student at MIT, advised by Prof. Song Han. Before coming to MIT, I received my Master's and Bachelor's degrees at Shanghai Jiao Tong University (SJTU), advised by Prof. Yong Yu. At SJTU, I also worked closely with Prof. Weinan Zhang and Prof. Jun Wang.

My research interests lie in machine learning, particularly efficient foundation models (diffusion models, LLMs, etc.), EdgeAI, and AutoML.

Email  /  Google Scholar  /  GitHub  /  Twitter  /  LinkedIn


Competition Awards


Selected Projects

Condition-Aware Neural Network for Controlled Image Generation
CAN is a new method for adding control to image generative models. Complementary to prior conditional control methods, CAN controls the image generation process by dynamically manipulating the weights of the neural network. CAN combined with EfficientViT (CaT) achieves 2.78 FID on ImageNet 512x512, surpassing DiT-XL/2 while requiring 52x fewer MACs per sampling step.
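The description above does not spell out how the weights depend on the condition. As a minimal sketch, assuming a linear layer whose weight is a static base plus a condition-dependent delta produced by a generator matrix, conditioning through the weights (rather than through the activations) can look like this. `ConditionAwareLinear`, `base`, and `generator` are hypothetical names for illustration, not CAN's actual parametrization or code.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class ConditionAwareLinear:
    """Hypothetical linear layer whose weight is a function of a condition vector.

    W(c) = base + reshape(generator @ c), where base has shape (out, in) and
    generator has shape (out * in, cond_dim). Illustrative only, not CAN's code.
    """

    def __init__(self, base: Matrix, generator: Matrix) -> None:
        self.base = base
        self.generator = generator
        self.out_dim = len(base)
        self.in_dim = len(base[0])

    def weight_for(self, cond: Vector) -> Matrix:
        # Generate a flat weight delta from the condition, then reshape it to (out, in).
        flat_delta = matvec(self.generator, cond)
        return [
            [self.base[i][j] + flat_delta[i * self.in_dim + j] for j in range(self.in_dim)]
            for i in range(self.out_dim)
        ]

    def __call__(self, x: Vector, cond: Vector) -> Vector:
        # The condition changes the weights; the input itself is processed as usual.
        return matvec(self.weight_for(cond), x)


if __name__ == "__main__":
    layer = ConditionAwareLinear(
        base=[[1.0, 0.0], [0.0, 1.0]],
        generator=[[1.0], [0.0], [0.0], [1.0]],  # the condition scales the diagonal
    )
    print(layer([1.0, 2.0], cond=[0.5]))  # [1.5, 3.0]
    print(layer([1.0, 2.0], cond=[0.0]))  # [1.0, 2.0]
```

The same input yields different outputs under different conditions only because the generated weights differ, which is the property the paragraph describes.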

EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
EfficientViT-SAM is a new family of accelerated segment anything models. We replace SAM's heavy image encoder with EfficientViT. Benefiting from EfficientViT's efficiency and capacity, EfficientViT-SAM delivers a 48.9x measured TensorRT speedup on an A100 GPU over SAM-ViT-H without sacrificing performance.

EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction
EfficientViT is a new family of vision models for high-resolution dense prediction. It achieves a global receptive field and multi-scale learning using only hardware-efficient operations. EfficientViT delivers significant performance gains over previous models, along with measured speedups on diverse hardware platforms. [Media: MIT home page, MIT News, Imaging and Machine Vision Europe] [Industry Integration: NVIDIA, HuggingFace] [Code: GitHub (0.9k stars)]
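The page does not detail the attention mechanism. Assuming the ReLU-based linear attention formulation described in the EfficientViT paper, a minimal single-head sketch follows: replacing softmax with a ReLU feature map lets the key-value product be aggregated once, so the cost grows linearly with the number of tokens N instead of quadratically. The multi-scale token aggregation is omitted, and `relu_linear_attention` and `eps` are illustrative names.

```python
from typing import List

Matrix = List[List[float]]


def relu(row: List[float]) -> List[float]:
    return [max(0.0, x) for x in row]


def relu_linear_attention(q: Matrix, k: Matrix, v: Matrix, eps: float = 1e-6) -> Matrix:
    """out_i = phi(q_i) (sum_j phi(k_j)^T v_j) / (phi(q_i) . sum_j phi(k_j)), phi = ReLU."""
    d = len(q[0])
    dv = len(v[0])
    # Aggregate keys and values once over all N tokens: kv is d x dv, k_sum has length d.
    kv = [[0.0] * dv for _ in range(d)]
    k_sum = [0.0] * d
    for k_row, v_row in zip(k, v):
        phi_k = relu(k_row)
        for a in range(d):
            k_sum[a] += phi_k[a]
            for b in range(dv):
                kv[a][b] += phi_k[a] * v_row[b]
    # Each query reuses the shared aggregates, so no N x N attention matrix is formed.
    out = []
    for q_row in q:
        phi_q = relu(q_row)
        denom = sum(x * y for x, y in zip(phi_q, k_sum)) + eps  # eps avoids division by zero
        out.append([sum(phi_q[a] * kv[a][b] for a in range(d)) / denom for b in range(dv)])
    return out
```

Because `kv` and `k_sum` are computed once, the total work is O(N·d·dv) rather than the O(N²) of softmax attention; keys with only negative entries contribute nothing after the ReLU.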

Once for All: Train One Network and Specialize it for Efficient Deployment
OFA is an efficient AutoML technique that decouples model training from architecture search. Train only once, then specialize for many hardware platforms, from CPU/GPU to hardware accelerators. OFA consistently outperforms SOTA NAS methods while reducing GPU hours and CO2 emissions by orders of magnitude. In particular, OFA achieves a new SOTA 80.0% ImageNet top-1 accuracy under the mobile setting (<600M FLOPs). OFA is the winning solution of the Low-Power Computer Vision Challenge at the CVPR 2020 Workshop (FPGA track), the 2019 IEEE Low-Power Image Recognition Challenge (classification and detection tracks), and the Low-Power Computer Vision Workshop at ICCV 2019 (DSP track). [Media: MIT News, Qualcomm, VentureBeat, SingularityHub] [Industry Integration: Meta, Sony, AMD] [Code: GitHub (1.8k stars), Colab]
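To illustrate the "train once, specialize for many platforms" workflow: once a single network is trained, specializing it for a device reduces to choosing a sub-network that fits that device's latency budget, with no retraining. The sketch below assumes a tiny hypothetical search space (depth, width multiplier, kernel size) and toy accuracy and latency functions standing in for a trained accuracy predictor and measured per-device latencies. Exhaustive enumeration only works because the toy space is small; a realistic space would need a smarter search.

```python
import itertools
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical sub-network encoding: (depth, width multiplier, kernel size).
Config = Tuple[int, float, int]


def specialize(
    accuracy_of: Callable[[Config], float],
    latency_of: Callable[[Config], float],
    budget_ms: float,
    space: Iterable[Config],
) -> Optional[Config]:
    """Return the most accurate sub-network whose latency fits the budget (no retraining)."""
    feasible = [cfg for cfg in space if latency_of(cfg) <= budget_ms]
    return max(feasible, key=accuracy_of, default=None)


SPACE = list(itertools.product([2, 3, 4], [0.75, 1.0], [3, 5, 7]))


def toy_accuracy(cfg: Config) -> float:
    # Toy stand-in for an accuracy predictor shared by all devices.
    depth, width, kernel = cfg
    return 0.70 + 0.02 * depth + 0.05 * width + 0.005 * kernel


def toy_cpu_latency(cfg: Config) -> float:
    # Toy CPU latency model in ms: sensitive to kernel size.
    depth, width, kernel = cfg
    return 2.0 * depth * width * kernel


def toy_gpu_latency(cfg: Config) -> float:
    # Toy GPU latency model in ms: much less sensitive to kernel size.
    depth, width, kernel = cfg
    return 1.0 * depth * width + 0.2 * kernel


if __name__ == "__main__":
    print(specialize(toy_accuracy, toy_cpu_latency, 20.0, SPACE))  # (4, 0.75, 3)
    print(specialize(toy_accuracy, toy_gpu_latency, 20.0, SPACE))  # (4, 1.0, 7)
```

The same accuracy function (standing in for the single trained network) yields different specialized sub-networks once the per-device latency model changes, which is the decoupling of training from search that the paragraph describes.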

ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware
ProxylessNAS is an efficient hardware-aware neural architecture search method that can search directly on large-scale datasets. It can design specialized neural network architectures for different hardware platforms. With >74.5% top-1 accuracy, ProxylessNAS is 1.8x faster than MobileNetV2. [Media: MIT News, IEEE Spectrum] [Industry Integration: Meta, Amazon, Microsoft] [Code: GitHub (1.4k stars)]
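One way to make the search hardware-aware, assuming the expected-latency formulation from the ProxylessNAS paper: each layer has candidate operations o_j with architecture logits α_j, path probabilities p = softmax(α), and per-operation latencies F(o_j) read from a device-specific lookup table. This gives a differentiable term E[latency] = Σ_layers Σ_j p_j F(o_j) that can be added to the training loss. The latency values in any table passed in are hypothetical; the gradient ∂E/∂α_k = p_k (F(o_k) - E_layer) follows from the softmax Jacobian.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_latency(arch_logits: List[List[float]], latency_table: List[List[float]]) -> float:
    """Sum over layers of sum_j p_j * F(o_j), with p = softmax(alpha) per layer."""
    return sum(
        sum(p * f for p, f in zip(softmax(alpha), lat))
        for alpha, lat in zip(arch_logits, latency_table)
    )


def expected_latency_grad(
    arch_logits: List[List[float]], latency_table: List[List[float]]
) -> List[List[float]]:
    """dE[latency]/dalpha_k = p_k * (F_k - E_layer), computed layer by layer."""
    grads = []
    for alpha, lat in zip(arch_logits, latency_table):
        probs = softmax(alpha)
        layer_mean = sum(p * f for p, f in zip(probs, lat))
        grads.append([p * (f - layer_mean) for p, f in zip(probs, lat)])
    return grads
```

In training, this term would be weighted and added to the task loss, e.g. loss = cross_entropy + λ · E[latency], where λ is a hypothetical trade-off hyperparameter. A gradient step lowers the logits of slower-than-average operations, and swapping in another device's lookup table steers the search toward a different architecture.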

Academic Services

  • Serve as a reviewer for ICML, NeurIPS, ICLR, CVPR, ICCV, ECCV, TPAMI, IJCV, ACL, EMNLP