Tao Wang

Tao Wang is an associate research professor at Sichuan University, Chengdu, China. He received his Ph.D. degree from the National University of Singapore, generously supported by the Institute of Data Science scholarship and advised by Dr. Feng Jiashi and Dr. Wang Xinchao. He was externally supervised by Prof. Yan Shuicheng during 2020-2021. Before his Ph.D., he received his bachelor's degree from Yingcai Honors College (an elite school whose candidates are selected from the top 5% of all undergraduates), University of Electronic Science and Technology of China.

Email address: twangnh[dot]ai[at]gmail[dot]com

News

(2025/05) Congrats to my master's student Zhengqin Zang for obtaining a Ph.D. scholarship to Zhejiang University (ZJU)!
(2025/03) Received support from the Emei Talent Program (Tianfu Emei Plan, formerly the Sichuan Province Thousand Talents Plan)!
(2024/11) Congrats to my first master's student Chenyu Lin for obtaining a Ph.D. scholarship to Hong Kong Baptist University!
(2024/10) 3rd-tier award at the Jittor Competition, with 10000 RMB; congrats to Chenyu and the other team members!
(2024/05) One paper is accepted by IJCV!
(2024/02) One paper is accepted by TNNLS!
(2024/12) Received support from the Overseas Talent Program (Ministry of Education Special Program for Recruiting Overseas Talent)!
(2023/12) One paper is accepted by AAAI'24! It is our top-entry method submitted to the VisDrone2023 Zero-shot Aerial Object Detection challenge.
(2023/10) 3rd place at Visual Continual Learning Object Detection Challenge at ICCV 2023, congratulations to Chenyu.
(2023/08) One paper is accepted by TNNLS!
(2023/03) PnP-DETR is integrated into detrex, a new open-source codebase for detection transformers!
(2023/02) One paper is accepted by CVPR'23!
(2023/02) Our Object Detector Distillation technique is implemented in Yolov5!
(2022/11) CondHead is available on arxiv!
(2022/09) MvP is integrated into XRMoCap, a new open-source PyTorch-based codebase for the use of multi-view motion capture, from OpenXRLab
(2022/03) One paper is accepted by CVPR'22 Oral
(2022/03) T2T-ViT is included in Most Influential ICCV Papers by Paper Digest (ranked 3rd in ICCV 2021)
(2022/02) Offered Research Scientist Internship at Facebook AI Research (FAIR).
(2022/01) One paper is accepted by TIP'22.
(2021/09) One paper is accepted by NeurIPS'21.
(2021/07) Two papers accepted by ICCV'21.
(2020/03) Internship at Sea AI Lab
(2020/10) We are the best grand challenge winner at ACM MM 2020!
(2020/07) One paper accepted by ECCV'20.
(2020/06) We won 1st place at the ACM MM grand challenge Human in Events Track 4.
(2020/06) We won 2nd place at the ACM MM grand challenge Human in Events Track 2.
(2020/02) Three papers accepted by CVPR'20, two as Oral
(2020/01) Internship at Yitu Tech
(2019/11) Invited talk at ICCV 2019 to present our winning solution on LVIS, glad to meet Ross Girshick!
(2019/10) We won 1st place in the LVIS challenge!
(2019/08) Our object detection distillation technology is integrated into product development at Huawei SG.
(2019/04) Two papers accepted by CVPR'19

Students

Chenyu Lin 2022.09-now Master

Zhengqin Zang 2022.09-now Master

Yifan Wang 2023.09-now Master

Yusheng He 2024.09-now Ph.D.

Xingyu Wang 2024.09-now Master

Jieyu Liu 2024.09-now Master

Featured Works (Refer to Google Scholar for full list)

Open-world Data: Long-tailed Distribution

Learning Box Regression and Mask Segmentation under Long-tailed Distribution with Gradient Transfusing (CRAT)

Tao Wang, Li Yuan, Jiashi Feng and Xinchao Wang

IJCV 2024

We study how box regression and mask segmentation are affected by long-tailed distribution and propose CRAT, which is guided by Fisher to augment tail class training during back-propagation.

Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax

Yu Li, Tao Wang, Bingyi Kang, Sheng Tang, Chunfeng Wang, Jintao Li, Jiashi Feng

CVPR 2020 Oral, [Paper][Code][Video]

Widely adopted by LVIS challenge 2020 and 2021 top-entries
We propose a specifically re-designed softmax classification module which further improves over SimCal on long-tail object detection and instance segmentation.

The Devil is in Classification: A Simple Framework for Long-tail Instance Segmentation (SimCal)

Tao Wang, Yu Li, Bingyi Kang, Junnan Li, Junhao Liew, Sheng Tang, Steven Hoi, and Jiashi Feng

ECCV 2020, [Paper][Code][Video]

Based on our LVIS winner solution, we further extend it and improve the performance by discovering a better initialization strategy.

Joint COCO and Mapillary Workshop at ICCV 2019: LVIS Challenge Track: Classification Calibration for Long-tail Instance Segmentation

Tao Wang, Yu Li, Bingyi Kang, Junnan Li, Junhao Liew, Sheng Tang, Steven Hoi, Jiashi Feng

Winning solution for the 1st LVIS challenge at ICCV 2019 [Tech Report]

Efficient Learning Methods

Zero-Shot Aerial Object Detection with Visual Description Regularization (DescReg)
Zhengqing Zang, Chenyu Lin, Chenwei Tang, Tao Wang†(Corresponding Author), Jiancheng Lv
AAAI 2024, [Paper][Code]
We identify the weak semantic-visual correlation challenge in zero-shot aerial object detection domain and propose a visual description regularization method to improve zero-shot detection.

Learning to Detect and Segment for Open Vocabulary Object Detection (CondHead)

Tao Wang, Nan Li

CVPR 2023, [Paper][Code]

CondHead conditions the bounding box regression and mask segmentation on the text embeddings, to facilitate open vocabulary object detection.

PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision

Kehong Gong*, Bingbing Li*, Jianfeng Zhang*, Tao Wang*, Jing Huang, Michael Bi Mi, Jiashi Feng, Xinchao Wang (* equal contribution)

CVPR 2022 Oral, [Paper][Code][Video]

We construct an effective self-supervised framework for 3D human pose estimation that self-improves by generating physically plausible 2D-3D training pose data.

Revisiting Knowledge Distillation via Label Smoothing Regularization

Li Yuan, Francis EH Tay, Guilin Li, Tao Wang, Jiashi Feng

CVPR 2020 Oral, [Paper][Code][Video]

We reveal that knowledge distillation (KD) works as a learned label smoothing regularization, and further propose a novel Teacher-free Knowledge Distillation (Tf-KD) framework.

Distilling Object Detectors with Fine-grained Feature Imitation

Tao Wang, Li Yuan, Xiaopeng Zhang, Jiashi Feng

CVPR 2019, [Paper][Code]

Highly cited work on knowledge distillation for object detection models.
We develop a knowledge distillation (KD) framework for object detection, based on feature-level imitation of the estimated foreground object regions.

Few-shot Adaptive Faster R-CNN

Tao Wang, Xiaopeng Zhang, Li Yuan, Jiashi Feng

CVPR 2019, [Paper][Code]


Network Architecture Design

SODAR: Segmenting Objects by Dynamically Aggregating Neighboring Mask Representations

Tao Wang, Jun Hao Liew, Yu Li, Yunpeng Chen, Jiashi Feng

TIP 2022, [Paper][Code]

We reveal the usefulness of neighboring mask predictions and introduce a simple and efficient neighbor aggregation method to improve dense instance segmentation models.

Detect Multi-person with 3D Pose Directly from Multi-view images (Multi-view Pose Transformer, MvP)

Tao Wang, Jianfeng Zhang, Yujun Cai, Shuicheng Yan, Jiashi Feng

NeurIPS 2021, [Paper][Code][Industrial Recognition by XRMoCap][Video][Slides]

We develop a simple transformer algorithm that directly detects multiple people and predicts their 3D poses from multi-view images.

Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

Li Yuan*, Yunpeng Chen, Tao Wang*, Weihao Yu, Yujun Shi, Zihang Jiang, Francis E.H. Tay, Jiashi Feng, Shuicheng Yan (Work done during internship at Yitu)

ICCV 2021, [Paper][Code][Video][Most Influential ICCV Papers]

We introduce a Tokens-to-Token (T2T) transformation scheme to progressively structurize the image into tokens by recursively aggregating neighboring tokens.

PnP-DETR: Towards Efficient Visual Analysis with Transformers

Tao Wang, Li Yuan, Yunpeng Chen, Jiashi Feng, Shuicheng Yan

ICCV 2021, [Paper][Code][Video]

We propose Poll and Pool sampling to reduce the spatial redundancy of image features for efficient transformer processing.