“The most exciting phrase to hear in science,
the one that heralds the most discoveries,
is not 'Eureka!' but 'That's funny…'”
--Isaac Asimov
“To Make Machines Perceive Signals as Humans Do”
Signal quality evaluation plays a central role in shaping almost all signal processing algorithms and systems, as well as their implementation, optimization and testing. Since the human (with the senses of vision, hearing, touch, smell and taste) is the ultimate receiver of the majority of signals so far, whether naturally captured or computer generated, incorporating proper characteristics of human perception after acquisition, processing and transmission not only makes the built systems user-oriented but also enables resource savings (i.e., turning the imperfection of human perception into an advantage in design).
The resulting metrics are intended to replace the existing mathematical measures (e.g., MSE, SNR, PSNR, or one of their relatives) for defining and gauging the distortion of processed signals, since MSE, SNR and PSNR do not reflect human perception well. Perceptual metrics are expected to fill a gap in most existing signal-processing products and services: a non-perception-based criterion is used in engineering design, while the resultant devices/services are meant for human consumption.
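As a minimal illustration of why such pixel-wise measures fall short, the NumPy sketch below (a toy example of our own, not from any cited work) constructs two distortions with identical MSE and hence identical PSNR, even though a uniform luminance shift is typically far less visible to a human than random per-pixel noise:

```python
import numpy as np

def mse(ref, dist):
    """Mean squared error between reference and distorted signals."""
    return float(np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2))

def psnr(ref, dist, max_val=255.0):
    """Peak signal-to-noise ratio in dB (higher is 'better' by this metric)."""
    e = mse(ref, dist)
    return float("inf") if e == 0 else 10.0 * np.log10(max_val ** 2 / e)

rng = np.random.default_rng(0)
ref = rng.integers(50, 200, size=(64, 64)).astype(np.float64)

# Two distortions with the same per-pixel magnitude (hence the same MSE):
# a constant +5 offset (barely noticeable) vs. +/-5 random noise (often visible).
shifted = ref + 5.0
noisy = ref + rng.choice([-5.0, 5.0], size=ref.shape)

print(mse(ref, shifted), mse(ref, noisy))    # both 25.0
print(psnr(ref, shifted), psnr(ref, noisy))  # identical PSNR values
```

By MSE/PSNR the two distorted images are indistinguishable, which is exactly the gap a perceptual metric is meant to close.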
This is an exciting, interdisciplinary research area, since it enables user-oriented design and further improvement of system performance. We need to incorporate the latest relevant findings in neuroscience, brain theory, psychophysics, aesthetics, statistics, and user and cultural studies into computational models, and to verify such models against subjective viewing/hearing/sensing results; i.e., to make perception science truly quantitative.
Some examples of research work under this theme:
As can be seen in the Publications part, a comprehensive theoretical formulation of JND (Just-Noticeable Distortion) has been introduced in the spatial [2003], transform [2005], spatiotemporal [2006], boundary-texture separation [2010], and pattern-masking (with brain theory) [2013] aspects, with extensions to screen content [2016] and top-down mechanisms [2022]; we have also investigated JND-based protection of privacy/copyright, countering adversarial attacks and deepfakes [2021-2023], and explored beyond the visual into audio, haptics, olfaction and taste [2022]. For visual attention modeling, the first quantitative solutions have been proposed for the modulatory effect [2005], for compressed bitstreams directly [2011] (a leap, since all visual signals are compressed), for spatiotemporal uncertainty [2013], for stereoscopic views [2014], and with the use of deep learning [2019, 2020, 2022, 2023]. Besides several technological breakthroughs in signal-driven IQA (Image Quality Assessment) for natural and partially artificial images/videos [2003-2022], our effort has led to the emergence of machine learning as a new IQA category [2009-2013], extended by big foundation models [2022, 2023] and to generative AI [2023]; to the first attempt at fine-grained IQA [2019, 2022, 2023] (an overlooked but important field); and to the demonstration that foundation models possess IQA capability [2024]. For perceptual video coding, the built perceptual models enable effective resource allocation and signal optimization/reconstruction [2003, 2005, 2013-2017], with exploration into 3D visual signals [2022, 2023].
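To give a flavor of what a spatial JND model computes, here is a toy luminance-adaptation visibility threshold in Python. The piecewise shape (higher thresholds in dark regions, slowly rising thresholds in bright ones) follows the spirit of classic spatial-domain JND models, but the exact constants used here are illustrative assumptions, not values taken from the publications above:

```python
import numpy as np

def luminance_jnd(bg):
    """Toy luminance-adaptation JND threshold for 8-bit background luminance bg.

    Piecewise curve: thresholds are larger in dark regions (Weber-like
    behaviour breaks down at low luminance) and grow slowly in bright
    regions. Constants are illustrative, not from a specific paper.
    """
    bg = np.asarray(bg, dtype=np.float64)
    dark = 17.0 * (1.0 - np.sqrt(bg / 127.0)) + 3.0   # branch for bg <= 127
    bright = (3.0 / 128.0) * (bg - 127.0) + 3.0       # branch for bg > 127
    return np.where(bg <= 127.0, dark, bright)

# A distortion of amplitude below the threshold is (approximately) invisible,
# so a coder may spend fewer bits wherever the threshold is high.
for b in (0, 64, 127, 200, 255):
    print(b, round(float(luminance_jnd(b)), 2))
```

In a real JND profile this luminance term would be combined with texture/contrast masking and, for video, temporal masking; the point here is only the shape of the curve.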
“To Manage Megabits in Visual Signals”
Great achievements have been made in the area of video coding during the past three-plus decades through the joint effort of academia and industry (resulting in the current ITU/MPEG/JPEG standards and all the related products and services launched). Due to the huge data volume of visual signals, images and video still consume the dominant portion of storage/bandwidth in a typical multimedia system today, even after compression with state-of-the-art technology, in comparison with other media types (speech, audio, text and others). Therefore, there is a need for further, substantial enhancement of video coding technology, especially with the coming of the big-data era. However, after so many years of intensive effort in the field, the room for further improvement in coding efficiency within the current technical framework is diminishing, and new coding schemes are unlikely to arise merely from further rounds of reassembly and optimization of the existing techniques (as in the cases of H.264/HEVC/...). A more revolutionary approach is badly needed.
In our past research, we went back to the basics (e.g., how to define a data frame for coding, how to find or even synthesize a reference frame, and which transform to use in which situation) to perform a systematic investigation of video coders and to devise new video coding methods that address the major difficulties in further improving the compression ratio and meet the requirements of ubiquity and affordability for multimedia signal representation and communication. Pre-processing, post-processing, adaptation and reconstruction are also important for enhancing the received coding quality. It is also noted that videos increasingly contain computer-generated animations and scenes, whose characteristics differ from those of naturally captured content.
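On the question of which transform to use: the sketch below shows the classic motivation for the DCT in block-based coders, namely energy compaction on smooth content. It is a self-contained NumPy implementation of the orthonormal 2-D DCT-II; the smooth gradient block is a made-up example, not data from our work:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (the transform family used in JPEG/H.26x)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    m[0] *= 1.0 / np.sqrt(2.0)      # DC row scaling for orthonormality
    return m * np.sqrt(2.0 / n)

D = dct_matrix(8)
block = np.outer(np.arange(8), np.ones(8)) * 16.0  # smooth vertical gradient block
coeffs = D @ block @ D.T                           # separable 2-D DCT

# Energy compaction: for smooth blocks, a handful of low-frequency
# coefficients carry nearly all of the signal energy.
energy = coeffs ** 2
top = np.sort(energy.ravel())[::-1]
print(top[:3].sum() / energy.sum())  # fraction of energy in the top 3 coefficients
```

For highly textured or screen-content blocks this compaction breaks down, which is exactly why the choice of transform should depend on the situation.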
Another interesting dimension of work is to build a perceptual model (in conjunction with Theme 1) that accounts for the major relevant psychophysical findings, since the human is still the ultimate appreciator and consumer of coded signals in many situations. The phenomena to be modeled may include the spatiotemporal contrast sensitivity function, luminance adaptation, visual attention, object motion, eye movement, color/contrast/activity masking, emotional/aesthetic features, task-related clues, interaction with other media (such as audio), and friendliness toward a coder. The resultant model is to be used as the integrated control for the new video coding scheme. Compression of 3D point clouds is an emerging area of interest and importance.
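For the contrast sensitivity function mentioned above, one widely used analytic approximation is the Mannos-Sakrison model. The small sketch below evaluates it to show the band-pass nature of spatial-frequency sensitivity (the formula is from the literature; the peak-search code is our own illustration):

```python
import math

def csf_mannos_sakrison(f):
    """Mannos-Sakrison CSF approximation; f is spatial frequency in cycles/degree."""
    return 2.6 * (0.0192 + 0.114 * f) * math.exp(-((0.114 * f) ** 1.1))

# Band-pass behaviour: sensitivity peaks at mid spatial frequencies
# (around 8 cycles/degree) and drops for very low and very high frequencies,
# so distortions at the extremes of the band can be coded more coarsely.
peak_f = max(range(1, 61), key=csf_mannos_sakrison)
print(peak_f)
```

A coder-friendly perceptual model would use such a curve to weight quantization noise by frequency band rather than treating all coefficients equally.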
We have also been exploring the paradigm shift from whole-signal coding to feature coding, to enable collaborative intelligence in the AI era. The proposed new paradigm is designed to distribute the computing load flexibly among edge, cloud and clients, to transmit data for various analysis tasks, to extract more accurate features, and to preserve privacy. Research will also be extended to true multimedia signals, i.e., to include auditory, tactile, olfactory and gustatory ones.
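A minimal sketch of the edge-side half of such a split, with hypothetical names throughout (a fixed random projection stands in for a learned feature encoder), to show how transmitting compact features instead of pixels saves bandwidth and avoids shipping the raw signal off-device:

```python
import numpy as np

def edge_extract(image, dim=64, seed=42):
    """Hypothetical edge-side encoder: random projection + 8-bit quantization.

    A stand-in for a learned feature extractor in a collaborative-intelligence
    pipeline; the cloud side would run its analysis tasks on these features,
    never on the raw pixels (which also helps preserve privacy).
    """
    rng = np.random.default_rng(seed)               # projection shared with the cloud
    proj = rng.standard_normal((dim, image.size))
    feat = proj @ image.ravel().astype(np.float64)
    scale = float(feat.std()) or 1.0                # avoid division by zero
    return np.clip(np.round(feat / scale), -127, 127).astype(np.int8)

image = np.random.default_rng(0).integers(0, 256, size=(224, 224)).astype(np.uint8)
feat = edge_extract(image)
print(image.nbytes, feat.nbytes)  # bytes on the wire: 50176 pixels vs. 64 features
```

In a real system the encoder is trained jointly with the downstream tasks and the feature bitstream itself is entropy-coded; the sketch only illustrates the bandwidth argument.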
As machines increasingly become the ultimate users of visual signals, our research effort has also been extended to machine-task-oriented visual signal processing and evaluation.
Some work that we are proud of under this theme:
Please refer to the Publications part for more and updated development.