Pu Cao

Ph.D. Student in Artificial Intelligence

Beijing University of Posts and Telecommunications

Biography

Pu Cao is a fourth-year Ph.D. student at Beijing University of Posts and Telecommunications (BUPT) under the supervision of Prof. Qing Song and Dr. Lu Yang. His research interests lie in Computer Vision, and he is currently working on Visual Synthesis and Multimodal Large Language Models.

Interests
  • Visual Generation
  • Multimodal Large Language Models
  • Embodied AI
Education
  • PhD in Artificial Intelligence, 2022

    Beijing University of Posts and Telecommunications

  • BSc in Information and Computational Science, 2018

    University of Science and Technology Beijing

News

  • 2025.11 One paper accepted by AAAI 2026 (Oral).
  • 2025.01 One paper accepted by CVPR 2025.

Publications

Browse the highlights below or view the complete publications archive.

Image is All You Need to Empower Large-scale Diffusion Models for In-Domain Generation
CVPR 2025.
Pu Cao*, Feng Zhou*, Lu Yang, Tianrui Huang, Qing Song
Empowering large-scale diffusion models for in-domain generation.
Exploring Position Encoding in Diffusion U-Net for Training-free High-resolution Image Generation
AAAI 2025 (Oral).
Feng Zhou*, Pu Cao*, Yiyang Ma, Lu Yang, Jianqin Yin
Training-free positional encoding fix for high-resolution diffusion generation.
E4C: Enhance Editability for Text-Based Image Editing by Harnessing Efficient CLIP Guidance
IEEE TCSVT 2025.
Tianrui Huang*, Pu Cao*, Lu Yang, Chun Liu, Mengjie Hu, Zhiwei Liu, Qing Song
Improving editability in text-guided image editing.
Controllable Generation with Text-to-Image Diffusion Models: A Survey
arXiv 2024.
Pu Cao, Feng Zhou, Qing Song, Lu Yang
A survey on controllable generation with text-to-image diffusion models.
What Decreases Editing Capability? Domain-Specific Hybrid Refinement for Improved GAN Inversion
WACV 2024.
Pu Cao, Lu Yang, Dongxv Liu, Xiaoya Yang, Tianrui Huang, Qing Song
Editing capability inevitably decreases in previous refinement methods (e.g., PTI, HFGI, and SAM). In this work, we explore the idea of “divide and conquer” to address this problem: we combine two mainstream refinement mechanisms (i.e., weight and feature modulation) and achieve extraordinary inversion and editing results.

Projects

UniDiffusion
A diffusion model training toolbox based on diffusers, supporting existing SOTA methods including DreamBooth, Textual Inversion, LoRA, Custom Diffusion, XTI, ….
Awesome Controllable T2I Diffusion Models
A collection of resources on controllable generation with text-to-image diffusion models.
GAN Inverter
A GAN inversion toolbox based on the PyTorch library. We design a unified pipeline for inversion methods and conduct a comprehensive benchmark.

Service

Conference & Journal Service

Conferences

  • ICLR 2026
  • CVPR 2025/2026
  • ICCV 2025
  • ECCV 2024 (Outstanding Reviewer)
  • WACV 2024/2025

Journals

  • TPAMI
  • TIP
  • TCSVT
  • TMM
  • TNNLS

Contact