In-N-Out: Faithful 3D GAN Inversion with Volumetric Decomposition for Face Editing

University of Maryland, College Park · Adobe Research
CVPR 2024

Abstract

3D-aware GANs offer new capabilities for view synthesis while preserving the editing functionalities of their 2D counterparts. GAN inversion is a crucial step that seeks the latent code to reconstruct input images or videos, subsequently enabling diverse editing tasks through manipulation of this latent code. However, a model pre-trained on a particular dataset (e.g., FFHQ) often has difficulty reconstructing images with out-of-distribution (OOD) objects such as faces with heavy make-up or occluding objects. We address this issue by explicitly modeling OOD objects from the input in 3D-aware GANs. Our core idea is to represent the image using two individual neural radiance fields: one for the in-distribution content and the other for the out-of-distribution object. The final reconstruction is achieved by optimizing the composition of these two radiance fields with carefully designed regularization. We demonstrate that our explicit decomposition alleviates the inherent trade-off between reconstruction fidelity and editability. We evaluate reconstruction accuracy and editability of our method on challenging real face images and videos and showcase favorable results against other baselines.
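To make the decomposition concrete, below is a minimal sketch of how two radiance fields can be alpha-composited along a ray with a per-sample blending weight. The function name, the linear blending rule, and the tensor shapes are our illustrative assumptions, not the paper's exact implementation.

```python
import torch

def composite_two_fields(sigma_in, rgb_in, sigma_ood, rgb_ood, m, deltas):
    """Volume-render the blend of two radiance fields along one ray.

    sigma_*: (N,) densities and rgb_*: (N, 3) colors at N ray samples,
    m:       (N,) blending weights in [0, 1] (1 = OOD object, 0 = in-distribution),
    deltas:  (N,) distances between consecutive samples.
    All names and the linear blending rule are illustrative assumptions.
    """
    # Blend density and color pointwise with the learned weight m.
    sigma = (1.0 - m) * sigma_in + m * sigma_ood
    rgb = (1.0 - m).unsqueeze(-1) * rgb_in + m.unsqueeze(-1) * rgb_ood

    # Standard NeRF-style alpha compositing.
    alpha = 1.0 - torch.exp(-sigma * deltas)                     # (N,)
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:1]), 1.0 - alpha + 1e-10])[:-1],
        dim=0,
    )                                                            # transmittance
    weights = alpha * trans                                      # (N,)
    return (weights.unsqueeze(-1) * rgb).sum(dim=0)              # (3,) pixel color
```

Setting the blending weight to zero everywhere leaves only the in-distribution face, which is exactly what the OOD object removal section below exploits.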

Method Overview


Single image

Reconstruction on Internet Images

Our method can be applied to diverse images from the Internet. We show reconstruction results from single images below.



Comparison panels: Input · W · Ours.

Semantic editing on Internet Images

We can edit the reconstructions with off-the-shelf semantic editing methods, e.g., InterFaceGAN and StyleCLIP.
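Such editing amounts to a linear walk in latent space: offset the inverted latent code along an attribute direction (e.g., an InterFaceGAN boundary normal) and re-render. A minimal sketch, in which the direction file, the strength value, and the generator API are hypothetical placeholders:

```python
import torch

def edit_latent(w, direction, strength=2.0):
    """Shift an inverted latent code along a semantic direction.

    w:         (1, 512) inverted latent code (broadcast over W+ layers if used).
    direction: (1, 512) attribute direction, e.g. an InterFaceGAN boundary
               normal ("eyeglasses", "smile", ...). The loading and naming
               below are illustrative assumptions.
    """
    return w + strength * direction / direction.norm()

# Hypothetical usage; file name and generator interface are placeholders.
# direction = torch.load("directions/eyeglasses.pt")
# w_edit = edit_latent(w_inverted, direction, strength=2.5)
# image = generator.synthesis(w_edit, camera_params)
```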

Methods compared: W, W+, IDE-3D, PTI, GOAE, HFGI3D.
Editing directions: eyeglasses, Elsa, younger, smile, surprised.



View synthesis on Internet Images

Unlike HFGI3D, which warps the input and computes a visibility map, resulting in a time-consuming optimization, our method relies only on a depth map from MiDaS and can synthesize faithful novel views.
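For reference, a minimal sketch of obtaining such a depth map with the off-the-shelf MiDaS models via torch.hub; this follows the official MiDaS usage, while the choice of the small checkpoint and the input file name are ours.

```python
import cv2
import torch

# Load a small MiDaS model and its matching preprocessing transform.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = midas_transforms.small_transform

img = cv2.cvtColor(cv2.imread("face.jpg"), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    prediction = midas(transform(img))  # (1, H', W') relative inverse depth
    # Resize the prediction back to the input resolution.
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False,
    ).squeeze()
```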



Methods compared: W, W+, IDE-3D, PTI, GOAE, HFGI3D.


Videos

Reconstruction on Internet Videos

Our method can be applied to diverse videos from the Internet. We show reconstruction results below.

Methods compared: W, W+, IDE-3D, PTI, GOAE, HFGI3D, VIVE3D.


Semantic editing on Internet Videos

We can edit the video reconstructions with off-the-shelf semantic editing methods, e.g., InterFaceGAN and StyleCLIP.

Methods compared: W, W+, IDE-3D, PTI, GOAE, HFGI3D, VIVE3D.
Editing directions: eyeglasses, Elsa, younger, smile, surprised.



View synthesis on Internet videos

After reconstruction, we can synthesize novel views.

Methods compared: W, W+, IDE-3D, PTI, GOAE, HFGI3D, VIVE3D.




OOD object removal on Internet videos

By setting the blending weights of the OOD radiance field to 0, we can remove the OOD object.
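In terms of the illustrative compositing sketch near the top of the page, removal is a one-line change: render with the blending weights zeroed out so the OOD field contributes nothing.

```python
import torch

# Reusing the illustrative composite_two_fields sketch from above; sigma_*,
# rgb_*, m, and deltas are the per-ray samples from that example.
clean_pixel = composite_two_fields(
    sigma_in, rgb_in, sigma_ood, rgb_ood,
    m=torch.zeros_like(m),  # weight 0 everywhere -> OOD field is switched off
    deltas=deltas,
)
```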





Ablation study

w/o blending weight regularization

w/ blending weight regularization (Eqn. 6)
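We do not reproduce Eqn. 6 here; as a hypothetical stand-in with the same flavor, a binary-entropy penalty pushes every blending weight toward 0 or 1, discouraging semi-transparent blends that leak OOD content into the face.

```python
import torch

def blending_weight_reg(m, eps=1e-6):
    """Hypothetical stand-in for a blending-weight regularizer (cf. Eqn. 6,
    not reproduced here): a binary-entropy penalty that is minimized when
    each weight is exactly 0 or 1."""
    m = m.clamp(eps, 1.0 - eps)
    return (-m * torch.log(m) - (1.0 - m) * torch.log1p(-m)).mean()
```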



Failure cases

OOD dominates: When editing in the OOD region, e.g., adding eyeglasses over heavy makeup, the blending weights there are close to 1, so eyeglasses added in the in-distribution radiance field barely show through.

Double glasses: Since our OOD radiance field has no knowledge of the GAN's face prior, when the OOD object is itself a pair of glasses, adding eyeglasses introduces a duplicate object.



BibTeX

@article{xu2023video3deditgan,
  author    = {Xu, Yiran and Shu, Zhixin and Smith, Cameron and Oh, Seoung Wug and Huang, Jia-Bin},
  title     = {In-N-Out: Faithful 3D GAN Inversion with Volumetric Decomposition for Face Editing},
  journal   = {arXiv preprint arXiv:2302.04871},
  year      = {2023},
}