Zeyu Wang
Latest
VQ-VA World: Towards High-Quality Visual Question-Visual Answering
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought
LightFusion: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation
Emerging Properties in Unified Multimodal Pretraining
Double Visual Defense: Adversarial Pre-training and Instruction Tuning for Improving Vision-Language Model Robustness
What If We Recaption Billions of Web Images with LLaMA-3?
Revisiting Adversarial Training at Scale
Rejuvenating image-GPT as Strong Visual Representation Learners
Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training
FedConv: Enhancing Convolutional Neural Networks for Handling Data Heterogeneity in Federated Learning
DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation
An Inverse Scaling Law for CLIP Training
On the Adversarial Robustness of Camera-based 3D Object Detection
Masked Autoencoders Enable Efficient Knowledge Distillers
Can CNNs Be More Robust Than Transformers?
SiamFC++: Towards Robust and Accurate Visual Tracking with Target Estimation Guidelines