Pu Cao

Ph.D. student in Artificial Intelligence

Beijing University of Posts and Telecommunications

Biography

Pu Cao is a second-year Ph.D. student at Beijing University of Posts and Telecommunications (BUPT) under the supervision of Prof. Qing Song and Dr. Lu Yang. He is interested in Computer Vision and is currently working on Image Synthesis.

Interests

Image Synthesis
Multimodal Large Language Models
Visual Representation
Image Detection/Segmentation
Computer Vision

Education

PhD in Artificial Intelligence, 2022
Beijing University of Posts and Telecommunications
BSc in Information and Computational Science, 2018
University of Science and Technology Beijing

Publications

Controllable Generation with Text-to-Image Diffusion Models: A Survey

arXiv 2024.

A survey on controllable generation with text-to-image diffusion models.

E4C: Enhance Editability for Text-Based Image Editing by Harnessing Efficient CLIP Guidance

arXiv 2024.

Improving editability in text-guided image editing.

Concept-centric Personalization with Large-scale Diffusion Priors

arXiv 2023.

Customizing diffusion models for concept-centric generation with high controllability, fidelity, and diversity.

What Decreases Editing Capability? Domain-Specific Hybrid Refinement for Improved GAN Inversion

WACV 2024.

Editing capability inevitably decreases in previous refinement methods (e.g., PTI, HFGI, and SAM). In this work, we explore the idea of “divide and conquer” to address this problem. We combine two mainstream refinement mechanisms (i.e., weight and feature modulation) and achieve extraordinary inversion and editing results.

LSAP: Rethinking Inversion Fidelity, Perception and Editability in GAN Latent Space

arXiv 2022.

We analyse the sources of “Fidelity, Perception, and Editability” in the inversion task and point out that the key issue is the misalignment between inverted latent codes and the synthetic distribution. We then propose a simple but efficient and uniform solution for both optimization-based and encoder-based methods.