
CompleteMe: Revolutionary AI for Human Image Restoration and Editing

CompleteMe, a new AI system developed by UC Merced and Adobe, advances human image completion by using reference images and novel attention mechanisms to restore occluded regions with outstanding detail and semantic accuracy.

Advanced Human Image Completion with CompleteMe

A collaboration between the University of California, Merced and Adobe has led to the development of CompleteMe, a new reference-based human image completion system. The technology restores and edits occluded regions of human images, enabling applications such as virtual try-on, animation, and photo editing.

How CompleteMe Works

CompleteMe employs a dual U-Net architecture combined with a Region-Focused Attention (RFA) block, which concentrates the model’s resources on relevant image areas. It integrates multiple reference images that provide detailed spatial features of various body regions. These features are selectively masked and merged with global semantic information extracted using CLIP, allowing the system to recreate missing content with high visual fidelity and semantic coherence.
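While CompleteMe's implementation is not public, the core idea behind region-focused attention can be sketched in PyTorch: reference tokens outside the relevant body region are masked out of the attention computation, so the model attends only to useful evidence. Everything below (class name, shapes, masking convention) is an illustrative assumption, not the paper's actual code.

```python
import torch
import torch.nn as nn

class RegionFocusedAttention(nn.Module):
    """Minimal sketch of an RFA-style block (assumed implementation).

    Queries come from the masked source image; keys/values come from
    reference-image features that are masked so attention only sees
    tokens inside the relevant body regions.
    """

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, src_tokens, ref_tokens, ref_region_mask):
        # src_tokens:      (B, N_src, D) features of the masked input
        # ref_tokens:      (B, N_ref, D) spatial features of the references
        # ref_region_mask: (B, N_ref) bool, True = token lies OUTSIDE the
        #                  relevant region and must be ignored
        out, _ = self.attn(
            query=src_tokens,
            key=ref_tokens,
            value=ref_tokens,
            key_padding_mask=ref_region_mask,  # drops irrelevant tokens
        )
        return self.norm(src_tokens + out)     # residual connection
```

Masking at the key/value level, rather than cropping the reference images, keeps irrelevant tokens from diluting attention while still letting all references pass through a single shared encoder.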

Reference U-Net and Cohesive U-Net

The Reference U-Net, initialized from Stable Diffusion 1.5 (without the diffusion noise step), encodes multiple reference images covering different body parts into latent feature representations. Simultaneously, the Cohesive U-Net processes the masked input image and incorporates the reference features through the RFA block and a decoupled cross-attention mechanism inspired by the IP-Adapter framework. This design enables the model to preserve identity and fine details while reconstructing occluded human sections.
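The decoupled cross-attention idea borrowed from IP-Adapter keeps separate key/value projections for the text and image conditions and sums their outputs. A minimal sketch, with all module names and shapes assumed:

```python
import torch.nn as nn

class DecoupledCrossAttention(nn.Module):
    """Sketch of IP-Adapter-style decoupled cross-attention (assumed shapes).

    Text conditioning and reference-image conditioning get separate
    attention paths whose outputs are summed, so an image branch can be
    bolted on without disturbing the pretrained text pathway.
    """

    def __init__(self, dim: int, num_heads: int = 8, scale: float = 1.0):
        super().__init__()
        self.text_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.image_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.scale = scale  # weighting of the image branch

    def forward(self, hidden, text_ctx, image_ctx):
        # hidden:    (B, N, D) U-Net hidden states (queries)
        # text_ctx:  (B, T, D) text embeddings (e.g. CLIP text encoder)
        # image_ctx: (B, R, D) reference features from the Reference U-Net
        text_out, _ = self.text_attn(hidden, text_ctx, text_ctx)
        image_out, _ = self.image_attn(hidden, image_ctx, image_ctx)
        return text_out + self.scale * image_out
```

In the original IP-Adapter design, only the new image-branch projections are trained, leaving the pretrained text pathway untouched; the scale factor then controls how strongly the reference features steer generation.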

Benchmarking and Evaluation

To measure performance, the researchers created a novel benchmark dataset curated from the WPose dataset and Adobe Research’s UniHuman project. The dataset includes masked source images paired with multiple references and associated textual labels generated by the LLaVA large language model.
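For illustration, a single benchmark entry could be organized along these lines (a hypothetical layout; the field names are assumptions, not the dataset's actual schema):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class BenchmarkSample:
    """Hypothetical layout of one benchmark entry (field names assumed)."""
    source_image: str            # path to the masked source image
    occlusion_mask: str          # binary mask marking the region to complete
    reference_images: List[str]  # one or more references of the same person
    caption: str                 # textual label generated by LLaVA
```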

CompleteMe was trained on 40,000 image pairs using Stable Diffusion 1.5 models and evaluated against several state-of-the-art reference-based and non-reference-based methods. It achieved top scores on most perceptual and semantic metrics, including CLIP-I, DINO, DreamSim, and LPIPS, demonstrating superior visual quality and identity preservation, especially for complex poses and intricate clothing.
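The reported metrics are standard and straightforward to reproduce for one's own comparisons. The sketch below computes LPIPS with the lpips package and CLIP-I as cosine similarity between CLIP image embeddings via Hugging Face transformers; this mirrors common practice, though the paper's exact evaluation code is not available.

```python
import torch
import lpips  # pip install lpips
from transformers import CLIPModel, CLIPProcessor
from PIL import Image

# LPIPS: perceptual distance, lower = more similar
lpips_fn = lpips.LPIPS(net="alex")

def lpips_score(img_a: torch.Tensor, img_b: torch.Tensor) -> float:
    # expects (1, 3, H, W) tensors scaled to [-1, 1]
    return lpips_fn(img_a, img_b).item()

# CLIP-I: cosine similarity of CLIP image embeddings, higher = better
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_i_score(img_a: Image.Image, img_b: Image.Image) -> float:
    inputs = proc(images=[img_a, img_b], return_tensors="pt")
    with torch.no_grad():
        emb = clip.get_image_features(**inputs)
    emb = emb / emb.norm(dim=-1, keepdim=True)  # L2-normalize
    return (emb[0] @ emb[1]).item()
```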

Qualitative and User Study Results

Qualitative comparisons reveal that CompleteMe outperforms competitors by accurately reproducing unique details such as tattoos and clothing patterns, which other methods struggle to replicate without reference images. A user study involving 15 annotators and nearly 2,900 sample pairs confirmed CompleteMe’s superiority in both visual quality and identity preservation over competitors like Paint-by-Example, AnyDoor, LeftRefill, and MimicBrush.

Limitations and Availability

Despite its impressive capabilities, CompleteMe's code has not been publicly released, and the project appears to be proprietary. The system relies on the legacy Stable Diffusion 1.5 architecture, which researchers still favor for being less restrictive and easier to train than newer models.

Summary

CompleteMe represents a significant advance in human image restoration and editing AI, offering highly detailed, semantically coherent completions by leveraging multiple reference images and novel architectural components to tackle challenging occlusion scenarios in images of people.
