Portrait stylization takes a real portrait photo as input and converts it into a particular
style, e.g. cartoonization. However, training a separate generator for each style is very
time-consuming, often requiring a large amount of style-domain training data. The proposed AgileGAN
framework uses a hierarchical variational autoencoder that learns an inverse mapping into the
StyleGAN2 latent space, and is specifically designed to preserve the original latent code distribution
in the mapping. This regularization allows the generator to be rapidly fine-tuned on a small
style-domain dataset to achieve the desired stylization. This makes the framework "agile": many
generators can be quickly trained for different styles, e.g. cartoon, oil painting, charcoal, sculpture, etc.
Guoxian Song, Linjie Luo, Jing Liu, Wan-Chun Ma, Chunpong Lai, Chuanxia Zheng,
Tat-Jen Cham, AgileGAN: stylizing portraits by inversion-consistent transfer learning, SIGGRAPH,
Los Angeles, CA, USA, 2021.
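To make the "preserve the original code distribution" idea concrete, here is a minimal sketch, assuming a toy PyTorch setup (the VariationalEncoder, its layer sizes and the simple KL regularizer below are my own stand-ins, not AgileGAN's actual architecture): the encoder predicts a distribution over latent codes, and a KL term keeps that distribution close to the generator's Gaussian prior, so inverted codes stay in a region where the fine-tuned generator behaves well.

```python
import torch
import torch.nn as nn

class VariationalEncoder(nn.Module):
    """Toy image-to-latent encoder that outputs a Gaussian over codes."""
    def __init__(self, latent_dim=512):
        super().__init__()
        self.backbone = nn.Sequential(            # stand-in for a real image backbone
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.mu = nn.Linear(64, latent_dim)
        self.logvar = nn.Linear(64, latent_dim)

    def forward(self, x):
        h = self.backbone(x)
        return self.mu(h), self.logvar(h)

def kl_to_standard_normal(mu, logvar):
    # KL( N(mu, sigma^2) || N(0, I) ): keeps the distribution of inverted codes
    # close to the generator's own latent prior.
    return 0.5 * torch.mean(torch.sum(mu ** 2 + logvar.exp() - logvar - 1, dim=1))

encoder = VariationalEncoder()
portraits = torch.rand(4, 3, 256, 256)                  # dummy batch of portraits
mu, logvar = encoder(portraits)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()    # reparameterised sample
loss = kl_to_standard_normal(mu, logvar)                # + reconstruction/perceptual terms
```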
Ever had casual holiday photos turn out poorly due to bad lighting? Our deep learning framework
can relight portrait photos to match the target lighting seen in another reference photo. At the heart of
the method is an overcomplete latent representation that redundantly codes for the same lighting under
three 90° rotations. This redundancy helps the latent space self-organize better: not only is the
relighting improved, the lighting can also be manipulated interactively. This image-based
framework uses latent representations and neural rendering, without the need for 3D estimation or inverse
rendering.
Guoxian Song, Tat-Jen Cham, Jianfei Cai, Jianmin Zheng, Half-body portrait
relighting with overcomplete lighting representation, Computer Graphics Forum (CGF), 2021.
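A minimal sketch of one plausible reading of the overcomplete code, under my own assumptions (lighting_encoder, lighting_decoder, the 16×16 "environment map" and all dimensions are toy stand-ins, not the paper's architecture): the lighting is encoded once per 90° rotation and the codes are concatenated, so the same lighting is stored redundantly and each sub-code can decode it on its own.

```python
import torch
import torch.nn as nn

lighting_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 64))  # toy encoder
lighting_decoder = nn.Linear(64, 3 * 16 * 16)                               # toy decoder
env_map = torch.rand(1, 3, 16, 16)                                          # toy lighting image

# Encode the lighting under the identity rotation and three 90-degree rotations.
codes = [lighting_encoder(torch.rot90(env_map, k, dims=(2, 3))) for k in range(4)]

# The overcomplete latent: the same lighting, stored redundantly in rotated frames.
overcomplete_code = torch.cat(codes, dim=1)      # shape [1, 4 * 64]

# Redundancy check: each sub-code alone should be able to reconstruct the lighting
# (up to its own rotation), which is what a reconstruction loss would enforce.
reconstructions = [lighting_decoder(c).view(1, 3, 16, 16) for c in codes]
```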
Unpaired image-to-image translation is conceptually problematic. Why? Because how do we decide
what counts as a "good" translation for the network to learn? Unlike previous work that uses
hand-crafted loss functions to get good results in practice, this work rests on a fundamental
insight: the scene structure in two domains is the same when the patterns of self-similarity
within each scene are the same. A "good" translation preserves this scene structure, even when
the domains are very different. A learned spatially-correlative loss is proposed that capitalizes
on this principled insight.
Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai, The spatially-correlative loss for
various image translation tasks, Conference on Computer Vision and Pattern Recognition (CVPR),
Nashville, TN, USA, 2021.
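A minimal sketch of the core idea in my own simplified form (the paper's version learns the features; self_similarity, spatially_correlative_loss and the random feature tensors below are illustrative stand-ins): structure is measured by how each location's features relate to all the others within the same image, and the loss compares these self-similarity patterns between the input and its translation.

```python
import torch
import torch.nn.functional as F

def self_similarity(feat):
    # feat: [B, C, H, W] deep features of one image.
    f = F.normalize(feat.flatten(2), dim=1)      # unit-norm feature per spatial location
    return torch.bmm(f.transpose(1, 2), f)       # [B, H*W, H*W] cosine self-similarity

def spatially_correlative_loss(feat_src, feat_trans):
    # Same scene structure  <=>  same pattern of within-image self-similarities.
    return F.l1_loss(self_similarity(feat_src), self_similarity(feat_trans))

# Features would come from a (possibly learned) extractor applied to the source
# image and its translation; random tensors stand in here.
loss = spatially_correlative_loss(torch.randn(2, 64, 16, 16), torch.randn(2, 64, 16, 16))
```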
A deep learning framework for image completion / inpainting that is capable of synthesizing
a widely diverse range of plausible results for the erased portions of an image. This is unlike previous image
completion work that generates only one solution, or solutions with very limited variation, per masked input.
The paper provides a detailed mathematical analysis of why directly using conventional approaches
(e.g. CVAE, or simple hybrids thereof) does not work well.
Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai, Pluralistic image completion, Conference on Computer Vision
and Pattern Recognition (CVPR), Long Beach, CA, USA, 2019.
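For context, the conventional CVAE objective under discussion is the conditional bound below (notation mine, not necessarily the paper's). As a rough gloss of the flavour of the issue, and my own paraphrase rather than the paper's precise argument: each masked input x_m in the training set has only one ground-truth completion x, which tends to make the learned conditional prior p(z | x_m) narrow, so test-time samples vary little.

```latex
% Conditional evidence lower bound for completing x given the masked image x_m:
\log p(x \mid x_m) \;\ge\;
  \mathbb{E}_{q(z \mid x, x_m)}\!\left[ \log p(x \mid z, x_m) \right]
  - \mathrm{KL}\!\left( q(z \mid x, x_m) \,\|\, p(z \mid x_m) \right)
```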
T2Net: Synthetic-to-Realistic Translation for Solving Single-Image Depth Estimation Tasks
(ECCV 2018)
A deep learning framework for estimating 3D depth maps from single images
that utilizes synthetic models in training, without depending on real
3D depth maps, which are hard to acquire and typically very incomplete and noisy.
Part of the process involves learning to intensify the realism of
OpenGL-rendered images.
Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai, T2Net: synthetic-to-realistic
translation for solving single-image depth estimation tasks, European Conference on Computer
Vision (ECCV), Munich, Germany, 2018.
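A minimal sketch of the two-part idea, under my own assumptions (translator, depth_net and all sizes below are toy stand-ins, not the released T2Net code): a translation network makes a rendered image look more realistic, and a depth network is trained on the translated render using the exact ground-truth depth that comes for free from the renderer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

translator = nn.Sequential(                      # synthetic -> realistic (stand-in)
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)
depth_net = nn.Sequential(                       # image -> depth map (stand-in)
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)

synthetic_rgb = torch.rand(2, 3, 64, 64)         # OpenGL-style render
synthetic_depth = torch.rand(2, 1, 64, 64)       # exact depth from the renderer

translated = translator(synthetic_rgb)           # realism-intensified render
task_loss = F.l1_loss(depth_net(translated), synthetic_depth)
# The full framework additionally uses adversarial terms that push `translated`
# (and the depth network's behaviour on real photos) toward the real-image domain.
```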
Real-time 3D Face-Eye Performance Capture of a Person Wearing VR Headset
(ACM Multimedia 2018)
VR headsets are not directly suitable for bidirectional telepresence,
as the headset occludes the user's eyes and prevents eye contact between
users. The system presented here virtually "removes" the headset: by
tracking and modeling the user's partially visible face, together
with input from in-headset eye cameras, a de-occluded view of the user's face is
automatically generated.
Guoxian Song, Jianfei Cai, Tat-Jen Cham, Jianmin Zheng, Juyong Zhang,
Henry Fuchs, Real-time 3D face-eye performance capture of a person wearing VR headset,
ACM Multimedia Conference, Seoul, Korea, 2018.
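As a purely illustrative sketch of the final step implied above (my own toy example, not the paper's pipeline; the mask, resolutions and random arrays are made up, and the real system drives this from face tracking and the in-headset eye cameras): the synthesized face is composited back into the live frame wherever the headset occludes it.

```python
import numpy as np

frame = np.random.rand(480, 640, 3)              # live camera frame, headset visible
synthesized_face = np.random.rand(480, 640, 3)   # rendered from the tracked face model + eye images
headset_mask = np.zeros((480, 640, 1))           # 1 where the headset occludes the face
headset_mask[100:220, 250:390] = 1.0

deoccluded = headset_mask * synthesized_face + (1.0 - headset_mask) * frame
```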