Publications

(2024). What If We Recaption Billions of Web Images with LLaMA-3? arXiv preprint. * denotes equal contribution.

(2024). Revisiting Adversarial Training at Scale. CVPR 2024. * denotes equal contribution.

(2023). Rejuvenating image-GPT as Strong Visual Representation Learners. ICML 2024. * denotes equal contribution.

(2023). Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training. CVPR 2024.

(2023). FedConv: Enhancing Convolutional Neural Networks for Handling Data Heterogeneity in Federated Learning. TMLR 2024. * denotes equal contribution.

(2023). DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation. ICCV 2023. * denotes equal contribution.

(2023). An Inverse Scaling Law for CLIP Training. NeurIPS 2023. * denotes equal contribution.

(2023). On the Adversarial Robustness of Camera-based 3D Object Detection. TMLR 2024.

(2022). Masked Autoencoders Enable Efficient Knowledge Distillers. CVPR 2023.

(2022). Can CNNs Be More Robust Than Transformers? ICLR 2023.

(2019). SiamFC++: Towards Robust and Accurate Visual Tracking with Target Estimation Guidelines. AAAI 2020. * denotes equal contribution.