Zheng Shou 0001
Person information
- affiliation: National University of Singapore
- affiliation (former): Columbia University, New York, NY, USA
Other persons with the same name
- Zheng Shou — disambiguation page
2020 – today
- 2025
- [j7]Weijia Wu, Yuzhong Zhao, Zhuang Li, Jiahong Li, Hong Zhou, Mike Zheng Shou, Xiang Bai:
A large cross-modal video retrieval dataset with reading comprehension. Pattern Recognit. 157: 110818 (2025)
- 2024
- [j6]Henry Hengyuan Zhao, Pichao Wang, Yuyang Zhao, Hao Luo, Fan Wang, Mike Zheng Shou:
SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels. Int. J. Comput. Vis. 132(3): 731-749 (2024)
- [j5]Alex Jinpeng Wang, Pan Zhou, Mike Zheng Shou, Shuicheng Yan:
Enhancing Visual Grounding in Vision-Language Pre-Training With Position-Guided Text Prompts. IEEE Trans. Pattern Anal. Mach. Intell. 46(5): 3406-3421 (2024)
- [j4]Weijia Wu, Yuzhong Zhao, Zhuang Li, Lianlei Shan, Hong Zhou, Mike Zheng Shou:
Continual Learning for Image Segmentation With Dynamic Query. IEEE Trans. Circuits Syst. Video Technol. 34(6): 4874-4886 (2024)
- [j3]Ming Li, Huazhu Fu, Shengfeng He, Hehe Fan, Jun Liu, Jussi Keppo, Mike Zheng Shou:
DR-FER: Discriminative and Robust Representation Learning for Facial Expression Recognition. IEEE Trans. Multim. 26: 6297-6309 (2024)
- [c82]Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew, Hanshu Yan, Jia-Wei Liu, Chenxu Zhang, Jiashi Feng, Mike Zheng Shou:
MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model. CVPR 2024: 1481-1490
- [c81]Yuchao Gu, Yipin Zhou, Bichen Wu, Licheng Yu, Jia-Wei Liu, Rui Zhao, Jay Zhangjie Wu, David Junhao Zhang, Mike Zheng Shou, Kevin Tang:
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence. CVPR 2024: 7621-7630
- [c80]Yuchao Gu, Xintao Wang, Yixiao Ge, Ying Shan, Mike Zheng Shou:
Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis. CVPR 2024: 7631-7640
- [c79]Jia-Wei Liu, Yan-Pei Cao, Jay Zhangjie Wu, Weijia Mao, Yuchao Gu, Rui Zhao, Jussi Keppo, Ying Shan, Mike Zheng Shou:
DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing. CVPR 2024: 7664-7674
- [c78]Lingmin Ran, Xiaodong Cun, Jia-Wei Liu, Rui Zhao, Song Zijie, Xintao Wang, Jussi Keppo, Mike Zheng Shou:
X-Adapter: Universal Compatibility of Plugins for Upgraded Diffusion Model. CVPR 2024: 8775-8784
- [c77]Difei Gao, Lei Ji, Zechen Bai, Mingyu Ouyang, Peiran Li, Dongxing Mao, Qinchen Wu, Weichen Zhang, Peiyi Wang, Xiangwu Guo, Hengxu Wang, Luowei Zhou, Mike Zheng Shou:
AssistGUI: Task-Oriented PC Graphical User Interface Automation. CVPR 2024: 13289-13298
- [c76]Jinheng Xie, Songhe Deng, Bing Li, Haozhe Liu, Yawen Huang, Yefeng Zheng, Jürgen Schmidhuber, Bernard Ghanem, Linlin Shen, Mike Zheng Shou:
Tune-an-Ellipse: CLIP Has Potential to Find what you Want. CVPR 2024: 13723-13732
- [c75]Ziteng Gao, Zhan Tong, Kevin Qinghong Lin, Joya Chen, Mike Zheng Shou:
Bootstrapping SparseFormers from Vision Foundation Models. CVPR 2024: 17710-17721
- [c74]Joya Chen, Zhaoyang Lv, Shiwei Wu, Kevin Qinghong Lin, Chenan Song, Difei Gao, Jia-Wei Liu, Ziteng Gao, Dongxing Mao, Mike Zheng Shou:
VideoLLM-online: Online Video Large Language Model for Streaming Video. CVPR 2024: 18407-18418
- [c73]Jingtao Sun, Yaonan Wang, Mingtao Feng, Yulan Guo, Ajmal Mian, Mike Zheng Shou:
L4D-Track: Language-to-4D Modeling Towards 6-DoF Tracking and Shape Reconstruction in 3D Point Cloud Stream. CVPR 2024: 21146-21156
- [c72]Weixian Lei, Yixiao Ge, Kun Yi, Jianfeng Zhang, Difei Gao, Dylan Sun, Yuying Ge, Ying Shan, Mike Zheng Shou:
VIT-LENS: Towards Omni-modal Representations. CVPR 2024: 26637-26647
- [c71]Henry Hengyuan Zhao, Pan Zhou, Mike Zheng Shou:
GENIXER: Empowering Multimodal Large Language Model as a Powerful Data Generator. ECCV (23) 2024: 129-147
- [c70]Rui Zhao, Yuchao Gu, Jay Zhangjie Wu, David Junhao Zhang, Jia-Wei Liu, Weijia Wu, Jussi Keppo, Mike Zheng Shou:
MotionDirector: Motion Customization of Text-to-Video Diffusion Models. ECCV (56) 2024: 273-290
- [c69]Weijia Wu, Zhuang Li, Yuchao Gu, Rui Zhao, Yefei He, David Junhao Zhang, Mike Zheng Shou, Yan Li, Tingting Gao, Di Zhang:
DragAnything: Motion Control for Anything Using Entity Representation. ECCV (22) 2024: 331-348
- [c68]Hai Ci, Pei Yang, Yiren Song, Mike Zheng Shou:
RingID: Rethinking Tree-Ring Watermarking for Enhanced Multi-key Identification. ECCV (28) 2024: 338-354
- [c67]Yiqi Lin, Conghui He, Alex Jinpeng Wang, Bin Wang, Weijia Li, Mike Zheng Shou:
Parrot Captions Teach CLIP to Spot Text. ECCV (42) 2024: 368-385
- [c66]Kevin Qinghong Lin, Pengchuan Zhang, Difei Gao, Xide Xia, Joya Chen, Ziteng Gao, Jinheng Xie, Xuhong Xiao, Mike Zheng Shou:
Learning Video Context as Interleaved Multimodal Sequences. ECCV (49) 2024: 375-396
- [c65]Zeyang Song, Jibin Wu, Malu Zhang, Mike Zheng Shou, Haizhou Li:
Spiking-Leaf: A Learnable Auditory Front-End for Spiking Neural Networks. ICASSP 2024: 226-230
- [c64]Ziteng Gao, Zhan Tong, Limin Wang, Mike Zheng Shou:
SparseFormer: Sparse Visual Recognition via Limited Latent Tokens. ICLR 2024
- [c63]Yang Wang, Haiyang Mei, Qirui Bao, Ziqi Wei, Mike Zheng Shou, Haizhou Li, Bo Dong, Xin Yang:
Apprenticeship-Inspired Elegance: Synergistic Knowledge Distillation Empowers Spiking Neural Networks for Efficient Single-Eye Emotion Recognition. IJCAI 2024: 3160-3168
- [c62]Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou:
Delocate: Detection and Localization for Deepfake Videos with Randomly-Located Tampered Traces. IJCAI 2024: 5862-5871
- [c61]Qi Mao, Lan Chen, Yuchao Gu, Zhen Fang, Mike Zheng Shou:
MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance. ACM Multimedia 2024: 6842-6850
- [c60]Difei Gao, Siyuan Hu, Zechen Bai, Qinghong Lin, Mike Zheng Shou:
AssistEditor: Multi-Agent Collaboration for GUI Workflow Automation in Video Creation. ACM Multimedia 2024: 11255-11257
- [i135]Alex Jinpeng Wang, Linjie Li, Kevin Qinghong Lin, Jianfeng Wang, Kevin Lin, Zhengyuan Yang, Lijuan Wang, Mike Zheng Shou:
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training. CoRR abs/2401.00849 (2024)
- [i134]David Junhao Zhang, Dongxu Li, Hung Le, Mike Zheng Shou, Caiming Xiong, Doyen Sahoo:
Moonshot: Towards Controllable Video Generation and Editing with Multimodal Conditions. CoRR abs/2401.01827 (2024)
- [i133]Jay Zhangjie Wu, Guian Fang, Haoning Wu, Xintao Wang, Yixiao Ge, Xiaodong Cun, David Junhao Zhang, Jia-Wei Liu, Yuchao Gu, Rui Zhao, Weisi Lin, Wynne Hsu, Ying Shan, Mike Zheng Shou:
Towards A Better Metric for Text-to-Video Generation. CoRR abs/2401.07781 (2024)
- [i132]Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou:
Delocate: Detection and Localization for Deepfake Videos with Randomly-Located Tampered Traces. CoRR abs/2401.13516 (2024)
- [i131]Zongbo Han, Zechen Bai, Haiyang Mei, Qianli Xu, Changqing Zhang, Mike Zheng Shou:
Skip \n: A Simple Method to Reduce Hallucination in Large Vision-Language Models. CoRR abs/2402.01345 (2024)
- [i130]Zechen Bai, Peng Chen, Xiaolan Peng, Lu Liu, Hui Chen, Mike Zheng Shou, Feng Tian:
Bring Your Own Character: A Holistic Solution for Automatic Facial Animation Generation of Customized Characters. CoRR abs/2402.13724 (2024)
- [i129]Weijia Wu, Zhuang Li, Yuchao Gu, Rui Zhao, Yefei He, David Junhao Zhang, Mike Zheng Shou, Yan Li, Tingting Gao, Di Zhang:
DragAnything: Motion Control for Anything using Entity Representation. CoRR abs/2403.07420 (2024)
- [i128]Jingtao Sun, Yaonan Wang, Mingtao Feng, Chao Ding, Mike Zheng Shou, Ajmal Saeed Mian:
Diffusion-Driven Self-Supervised Learning for Shape Reconstruction and Pose Estimation. CoRR abs/2403.12728 (2024)
- [i127]Wentian Zhang, Haozhe Liu, Jinheng Xie, Francesco Faccio, Mike Zheng Shou, Jürgen Schmidhuber:
Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models. CoRR abs/2404.02747 (2024)
- [i126]Hai Ci, Pei Yang, Yiren Song, Mike Zheng Shou:
RingID: Rethinking Tree-Ring Watermarking for Enhanced Multi-Key Identification. CoRR abs/2404.14055 (2024)
- [i125]Jinheng Xie, Jiajun Feng, Zhaoxu Tian, Kevin Qinghong Lin, Yawen Huang, Xi Xia, Nanxu Gong, Xu Zuo, Jiaqi Yang, Yefeng Zheng, Mike Zheng Shou:
Learning Long-form Video Prior via Generative Pre-Training. CoRR abs/2404.15909 (2024)
- [i124]Zechen Bai, Pichao Wang, Tianjun Xiao, Tong He, Zongbo Han, Zheng Zhang, Mike Zheng Shou:
Hallucination of Multimodal Large Language Models: A Survey. CoRR abs/2404.18930 (2024)
- [i123]Henry Hengyuan Zhao, Pan Zhou, Difei Gao, Mike Zheng Shou:
LOVA3: Learning to Visual Question Answering, Asking and Assessment. CoRR abs/2405.14974 (2024)
- [i122]Feipeng Ma, Hongwei Xue, Guangting Wang, Yizhou Zhou, Fengyun Rao, Shilin Yan, Yueyi Zhang, Siying Wu, Mike Zheng Shou, Xiaoyan Sun:
Multi-Modal Generative Embedding Model. CoRR abs/2405.19333 (2024)
- [i121]Feipeng Ma, Hongwei Xue, Guangting Wang, Yizhou Zhou, Fengyun Rao, Shilin Yan, Yueyi Zhang, Siying Wu, Mike Zheng Shou, Xiaoyan Sun:
Visual Perception by Large Language Model's Weights. CoRR abs/2405.20339 (2024)
- [i120]Alex Jinpeng Wang, Linjie Li, Yiqi Lin, Min Li, Lijuan Wang, Mike Zheng Shou:
Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning. CoRR abs/2406.02547 (2024)
- [i119]Yiren Song, Shijie Huang, Chen Yao, Xiaojun Ye, Hai Ci, Jiaming Liu, Yuxuan Zhang, Mike Zheng Shou:
ProcessPainter: Learn Painting Process from Sequence Data. CoRR abs/2406.06062 (2024)
- [i118]Hai Ci, Yiren Song, Pei Yang, Jinheng Xie, Mike Zheng Shou:
WMAdapter: Adding WaterMark Control to Latent Diffusion Models. CoRR abs/2406.08337 (2024)
- [i117]Pei Yang, Hai Ci, Yiren Song, Mike Zheng Shou:
Steganalysis on Digital Watermarking: Is Your Defense Truly Impervious? CoRR abs/2406.09026 (2024)
- [i116]Kevin Qinghong Lin, Linjie Li, Difei Gao, Qinchen Wu, Mingyi Yan, Zhengyuan Yang, Lijuan Wang, Mike Zheng Shou:
VideoGUI: A Benchmark for GUI Automation from Instructional Videos. CoRR abs/2406.10227 (2024)
- [i115]Joya Chen, Zhaoyang Lv, Shiwei Wu, Kevin Qinghong Lin, Chenan Song, Difei Gao, Jia-Wei Liu, Ziteng Gao, Dongxing Mao, Mike Zheng Shou:
VideoLLM-online: Online Video Large Language Model for Streaming Video. CoRR abs/2406.11816 (2024)
- [i114]Qinchen Wu, Difei Gao, Kevin Qinghong Lin, Zhuoyu Wu, Xiangwu Guo, Peiran Li, Weichen Zhang, Hengxu Wang, Mike Zheng Shou:
GUI Action Narrator: Where and When Did That Action Take Place? CoRR abs/2406.13719 (2024)
- [i113]Yang Wang, Haiyang Mei, Qirui Bao, Ziqi Wei, Mike Zheng Shou, Haizhou Li, Bo Dong, Xin Yang:
Apprenticeship-Inspired Elegance: Synergistic Knowledge Distillation Empowers Spiking Neural Networks for Efficient Single-Eye Emotion Recognition. CoRR abs/2407.09521 (2024)
- [i112]Kevin Qinghong Lin, Pengchuan Zhang, Difei Gao, Xide Xia, Joya Chen, Ziteng Gao, Jinheng Xie, Xuhong Xiao, Mike Zheng Shou:
Learning Video Context as Interleaved Multimodal Sequences. CoRR abs/2407.21757 (2024)
- [i111]Zechen Bai, Tianjun Xiao, Tong He, Pichao Wang, Zheng Zhang, Thomas Brox, Mike Zheng Shou:
GQE: Generalized Query Expansion for Enhanced Text-Video Retrieval. CoRR abs/2408.07249 (2024)
- [i110]Jinheng Xie, Weijia Mao, Zechen Bai, David Junhao Zhang, Weihao Wang, Kevin Qinghong Lin, Yuchao Gu, Zhijie Chen, Zhenheng Yang, Mike Zheng Shou:
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation. CoRR abs/2408.12528 (2024)
- [i109]Shiwei Wu, Joya Chen, Kevin Qinghong Lin, Qimeng Wang, Yan Gao, Qianli Xu, Tong Xu, Yao Hu, Enhong Chen, Mike Zheng Shou:
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation. CoRR abs/2408.16730 (2024)
- [i108]Zongbo Han, Jialong Yang, Junfan Li, Qinghua Hu, Qianli Xu, Mike Zheng Shou, Changqing Zhang:
DOTA: Distributional Test-Time Adaptation of Vision-Language Models. CoRR abs/2409.19375 (2024)
- [i107]Zhongcong Xu, Chaoyue Song, Guoxian Song, Jianfeng Zhang, Jun Hao Liew, Hongyi Xu, You Xie, Linjie Luo, Guosheng Lin, Jiashi Feng, Mike Zheng Shou:
High Quality Human Image Animation using Regional Supervision and Motion Blur Condition. CoRR abs/2409.19580 (2024)
- [i106]Zechen Bai, Tong He, Haiyang Mei, Pichao Wang, Ziteng Gao, Joya Chen, Lei Liu, Zheng Zhang, Mike Zheng Shou:
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos. CoRR abs/2409.19603 (2024)
- [i105]Ziyu Wang, Shuangpeng Han, Mike Zheng Shou, Mengmi Zhang:
Unsupervised Prior Learning: Discovering Categorical Pose Priors from Videos. CoRR abs/2410.03858 (2024)
- [i104]Yepeng Liu, Yiren Song, Hai Ci, Yu Zhang, Haofan Wang, Mike Zheng Shou, Yuheng Bu:
Image Watermarks are Removable Using Controllable Regeneration from Clean Noise. CoRR abs/2410.05470 (2024)
- [i103]Rui Zhao, Hangjie Yuan, Yujie Wei, Shiwei Zhang, Yuchao Gu, Lingmin Ran, Xiang Wang, Jay Zhangjie Wu, Junhao Zhang, Yingya Zhang, Mike Zheng Shou:
EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models. CoRR abs/2410.07133 (2024)
- 2023
- [j2]Wenqian Wang, Faliang Chang, Junhao Zhang, Rui Yan, Chunsheng Liu, Bin Wang, Mike Zheng Shou:
Magi-Net: Meta Negative Network for Early Activity Prediction. IEEE Trans. Image Process. 32: 3254-3265 (2023)
- [c59]Stan Weixian Lei, Difei Gao, Jay Zhangjie Wu, Yuxuan Wang, Wei Liu, Mengmi Zhang, Mike Zheng Shou:
Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task. AAAI 2023: 1250-1259
- [c58]Rui Yan, Mike Zheng Shou, Yixiao Ge, Jinpeng Wang, Xudong Lin, Guanyu Cai, Jinhui Tang:
Video-Text Pre-training with Learned Regions for Retrieval. AAAI 2023: 3100-3108
- [c57]Binjie Zhang, Shupeng Su, Yixiao Ge, Xuyuan Xu, Yexin Wang, Chun Yuan, Mike Zheng Shou, Ying Shan:
Darwinian Model Upgrades: Model Evolving with Selective Compatibility. AAAI 2023: 3393-3400
- [c56]Zhijian Hou, Wanjun Zhong, Lei Ji, Difei Gao, Kun Yan, Wing Kwong Chan, Chong-Wah Ngo, Mike Zheng Shou, Nan Duan:
CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding. ACL (1) 2023: 8013-8028
- [c55]Shuning Chang, Pichao Wang, Fan Wang, Jiashi Feng, Mike Zheng Shou:
DOAD: Decoupled One Stage Action Detection Network. CVPR Workshops 2023: 3123-3232
- [c54]Shuning Chang, Pichao Wang, Ming Lin, Fan Wang, David Junhao Zhang, Rong Jin, Mike Zheng Shou:
Making Vision Transformers Efficient from A Token Sparsification View. CVPR 2023: 6195-6205
- [c53]Jinpeng Wang, Yixiao Ge, Rui Yan, Yuying Ge, Kevin Qinghong Lin, Satoshi Tsutsui, Xudong Lin, Guanyu Cai, Jianping Wu, Ying Shan, Xiaohu Qie, Mike Zheng Shou:
All in One: Exploring Unified Video-Language Pre-Training. CVPR 2023: 6598-6608
- [c52]Joya Chen, Difei Gao, Kevin Qinghong Lin, Mike Zheng Shou:
Affordance Grounding from Demonstration Video to Target Image. CVPR 2023: 6799-6808
- [c51]Difei Gao, Luowei Zhou, Lei Ji, Linchao Zhu, Yi Yang, Mike Zheng Shou:
MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering. CVPR 2023: 14773-14783
- [c50]Xudong Lin, Simran Tiwari, Shiyuan Huang, Manling Li, Mike Zheng Shou, Heng Ji, Shih-Fu Chang:
Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval. CVPR 2023: 14846-14855
- [c49]Jinpeng Wang, Pan Zhou, Mike Zheng Shou, Shuicheng Yan:
Position-Guided Text Prompt for Vision-Language Pre-Training. CVPR 2023: 23242-23251
- [c48]Muhammet Ilaslan, Chenan Song, Joya Chen, Difei Gao, Weixian Lei, Qianli Xu, Joo Lim, Mike Zheng Shou:
GazeVQA: A Video Question Answering Dataset for Multiview Eye-Gaze Task-Oriented Collaborations. EMNLP 2023: 10462-10479
- [c47]Weijia Wu, Yuzhong Zhao, Mike Zheng Shou, Hong Zhou, Chunhua Shen:
DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models. ICCV 2023: 1206-1217
- [c46]Kevin Qinghong Lin, Pengchuan Zhang, Joya Chen, Shraman Pramanick, Difei Gao, Alex Jinpeng Wang, Rui Yan, Mike Zheng Shou:
UniVTG: Towards Unified Video-Language Temporal Grounding. ICCV 2023: 2782-2792
- [c45]Alex Jinpeng Wang, Kevin Qinghong Lin, David Junhao Zhang, Stan Weixian Lei, Mike Zheng Shou:
Too Large; Data Reduction for Vision-Language Pre-Training. ICCV 2023: 3124-3134
- [c44]Ming Li, Xiangyu Xu, Hehe Fan, Pan Zhou, Jun Liu, Jia-Wei Liu, Jiahe Li, Jussi Keppo, Mike Zheng Shou, Shuicheng Yan:
STPrivacy: Spatio-Temporal Privacy-Preserving Action Recognition. ICCV 2023: 5083-5092
- [c43]Shraman Pramanick, Yale Song, Sayan Nag, Kevin Qinghong Lin, Hardik Shah, Mike Zheng Shou, Rama Chellappa, Pengchuan Zhang:
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone. ICCV 2023: 5262-5274
- [c42]Jinheng Xie, Yuexiang Li, Yawen Huang, Haozhe Liu, Wentian Zhang, Yefeng Zheng, Mike Zheng Shou:
BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion. ICCV 2023: 7418-7427
- [c41]Jay Zhangjie Wu, Yixiao Ge, Xintao Wang, Stan Weixian Lei, Yuchao Gu, Yufei Shi, Wynne Hsu, Ying Shan, Xiaohu Qie, Mike Zheng Shou:
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation. ICCV 2023: 7589-7599
- [c40]Parantak Singh, You Li, Ankur Sikarwar, Weixian Lei, Difei Gao, Morgan B. Talbot, Ying Sun, Mike Zheng Shou, Gabriel Kreiman, Mengmi Zhang:
Learning to Learn: How to Continuously Teach Humans and Machines. ICCV 2023: 11674-11685
- [c39]Ke Fan, Zechen Bai, Tianjun Xiao, Dominik Zietlow, Max Horn, Zixu Zhao, Carl-Johann Simon-Gabriel, Mike Zheng Shou, Francesco Locatello, Bernt Schiele, Thomas Brox, Zheng Zhang, Yanwei Fu, Tong He:
Unsupervised Open-Vocabulary Object Localization in Videos. ICCV 2023: 13701-13709
- [c38]Jia-Wei Liu, Yan-Pei Cao, Tianyuan Yang, Zhongcong Xu, Jussi Keppo, Ying Shan, Xiaohu Qie, Mike Zheng Shou:
HOSNeRF: Dynamic Human-Object-Scene Neural Radiance Fields from a Single Video. ICCV 2023: 18437-18448
- [c37]Jay Zhangjie Wu, David Junhao Zhang, Wynne Hsu, Mengmi Zhang, Mike Zheng Shou:
Label-Efficient Online Continual Object Detection in Streaming Video. ICCV 2023: 19189-19198
- [c36]Shuning Chang, Pichao Wang, Hao Luo, Fan Wang, Mike Zheng Shou:
Revisiting Vision Transformer from the View of Path Ensemble. ICCV 2023: 19832-19842
- [c35]Weijia Wu, Yuzhong Zhao, Zhuang Li, Jiahong Li, Mike Zheng Shou, Umapada Pal, Dimosthenis Karatzas, Xiang Bai:
ICDAR 2023 Competition on Video Text Reading for Dense and Small Text. ICDAR (2) 2023: 405-419
- [c34]Beng Chin Ooi, Gang Chen, Mike Zheng Shou, Kian-Lee Tan, Anthony K. H. Tung, Xiaokui Xiao, James Wei Luen Yip, Bingxue Zhang, Meihui Zhang:
The Metaverse Data Deluge: What Can We Do About It? ICDE 2023: 3675-3687
- [c33]Eric Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew, Wenqing Zhang, Song Bai, Jiashi Feng, Mike Zheng Shou:
PV3D: A 3D Generative Model for Portrait Video Generation. ICLR 2023
- [c32]Mike Zheng Shou:
Large Generative Models Meet Multimodal Video Intelligence. LGM3A@MM 2023: 1
- [c31]Xizhe Xue, Dongdong Yu, Lingqiao Liu, Yu Liu, Satoshi Tsutsui, Ying Li, Zehuan Yuan, Ping Song, Mike Zheng Shou:
Transformer-based Open-world Instance Segmentation with Cross-task Consistency Regularization. ACM Multimedia 2023: 2507-2515
- [c30]Yuchao Gu, Xintao Wang, Jay Zhangjie Wu, Yujun Shi, Yunpeng Chen, Zihan Fan, Wuyou Xiao, Rui Zhao, Shuning Chang, Weijia Wu, Yixiao Ge, Ying Shan, Mike Zheng Shou:
Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models. NeurIPS 2023
- [c29]Ziyu Wang, Mike Zheng Shou, Mengmi Zhang:
Object-centric Learning with Cyclic Walks between Parts and Whole. NeurIPS 2023
- [c28]Weijia Wu, Yuzhong Zhao, Hao Chen, Yuchao Gu, Rui Zhao, Yefei He, Hong Zhou, Mike Zheng Shou, Chunhua Shen:
DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models. NeurIPS 2023
- [c27]Jinheng Xie, Kai Ye, Yudong Li, Yuexiang Li, Kevin Qinghong Lin, Yefeng Zheng, Linlin Shen, Mike Zheng Shou:
Learning Visual Prior via Generative Pre-Training. NeurIPS 2023
- [c26]Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew, Jiashi Feng, Mike Zheng Shou:
XAGen: 3D Expressive Human Avatars Generation. NeurIPS 2023
- [i102]Ming Li, Jun Liu, Hehe Fan, Jiawei Liu, Jiahe Li, Mike Zheng Shou, Jussi Keppo:
STPrivacy: Spatio-Temporal Tubelet Sparsification and Anonymization for Privacy-preserving Action Recognition. CoRR abs/2301.03046 (2023)
- [i101]Ziyu Wang, Mike Zheng Shou, Mengmi Zhang:
Object-centric Learning with Cyclic Walks between Parts and Whole. CoRR abs/2302.08023 (2023)
- [i100]Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Zheng Qin, Mike Zheng Shou:
DeepfakeMAE: Facial Part Consistency Aware Masked Autoencoder for Deepfake Video Detection. CoRR abs/2303.01740 (2023)
- [i99]Hengyuan Zhao, Hao Luo, Yuyang Zhao, Pichao Wang, Fan Wang, Mike Zheng Shou:
Revisit Parameter-Efficient Transfer Learning: A Two-Stage Paradigm. CoRR abs/2303.07910 (2023)
- [i98]Shuning Chang, Pichao Wang, Ming Lin, Fan Wang, David Junhao Zhang, Rong Jin, Mike Zheng Shou:
Making Vision Transformers Efficient from A Token Sparsification View. CoRR abs/2303.08685 (2023)
- [i97]Weijia Wu, Yuzhong Zhao, Mike Zheng Shou, Hong Zhou, Chunhua Shen:
DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models. CoRR abs/2303.11681 (2023)
- [i96]Joya Chen, Difei Gao, Kevin Qinghong Lin, Mike Zheng Shou:
Affordance Grounding from Demonstration Video to Target Image. CoRR abs/2303.14644 (2023)
- [i95]Shuning Chang, Pichao Wang, Fan Wang, Jiashi Feng, Mike Zheng Shou:
DOAD: Decoupled One Stage Action Detection Network. CoRR abs/2304.00254 (2023)
- [i94]Ziteng Gao, Zhan Tong, Limin Wang, Mike Zheng Shou:
SparseFormer: Sparse Visual Recognition via Limited Latent Tokens. CoRR abs/2304.03768 (2023)
- [i93]Binqian Xu, Xiangbo Shu, Rui Yan, Guo-Sen Xie, Yixiao Ge, Mike Zheng Shou:
Attack is Good Augmentation: Towards Skeleton-Contrastive Representation Learning. CoRR abs/2304.04023 (2023)
- [i92]Weijia Wu, Yuzhong Zhao, Zhuang Li, Jiahong Li, Mike Zheng Shou, Umapada Pal, Dimosthenis Karatzas, Xiang Bai:
ICDAR 2023 Video Text Reading Competition for Dense and Small Text. CoRR abs/2304.04376 (2023)
- [i91]Jinheng Xie, Zhaochuan Luo, Yuexiang Li, Haozhe Liu, Linlin Shen, Mike Zheng Shou:
Open-World Weakly-Supervised Object Localization. CoRR abs/2304.08271 (2023)
- [i90]Jiawei Liu, Yan-Pei Cao, Tianyuan Yang, Eric Zhongcong Xu, Jussi Keppo, Ying Shan, Xiaohu Qie, Mike Zheng Shou:
HOSNeRF: Dynamic Human-Object-Scene Neural Radiance Fields from a Single Video. CoRR abs/2304.12281 (2023)
- [i89]Weijia Wu, Yuzhong Zhao, Zhuang Li, Jiahong Li, Hong Zhou, Mike Zheng Shou, Xiang Bai:
A Large Cross-Modal Video Retrieval Dataset with Reading Comprehension. CoRR abs/2305.03347 (2023)
- [i88]Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou:
Mover: Mask and Recovery based Facial Part Consistency Aware Method for Deepfake Video Detection. CoRR abs/2305.05943 (2023)
- [i87]Jinheng Xie, Kai Ye, Yudong Li, Yuexiang Li, Kevin Qinghong Lin, Yefeng Zheng, Linlin Shen, Mike Zheng Shou:
VisorGPT: Learning Visual Prior via Generative Pre-Training. CoRR abs/2305.13777 (2023)
- [i86]Yuchao Gu, Xintao Wang, Jay Zhangjie Wu, Yujun Shi, Yunpeng Chen, Zihan Fan, Wuyou Xiao, Rui Zhao, Shuning Chang, Weijia Wu, Yixiao Ge, Ying Shan, Mike Zheng Shou:
Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models. CoRR abs/2305.18292 (2023)
- [i85]Alex Jinpeng Wang, Kevin Qinghong Lin, David Junhao Zhang, Stan Weixian Lei, Mike Zheng Shou:
Too Large; Data Reduction for Vision-Language Pre-Training. CoRR abs/2305.20087 (2023)
- [i84]Difei Gao, Lei Ji, Luowei Zhou, Kevin Qinghong Lin, Joya Chen, Zihan Fan, Mike Zheng Shou:
AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn. CoRR abs/2306.08640 (2023)
- [i83]Binjie Zhang, Yixiao Ge, Xuyuan Xu, Ying Shan, Mike Zheng Shou:
TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter. CoRR abs/2306.12642 (2023)
- [i82]Zhijian Hou, Lei Ji, Difei Gao, Wanjun Zhong, Kun Yan, Chao Li, Wing-Kwong Chan, Chong-Wah Ngo, Nan Duan, Mike Zheng Shou:
GroundNLQ @ Ego4D Natural Language Queries Challenge 2023. CoRR abs/2306.15255 (2023)
- [i81]Shraman Pramanick, Yale Song, Sayan Nag, Kevin Qinghong Lin, Hardik Shah, Mike Zheng Shou, Rama Chellappa, Pengchuan Zhang:
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone. CoRR abs/2307.05463 (2023)
- [i80]Jinheng Xie, Yuexiang Li, Yawen Huang, Haozhe Liu, Wentian Zhang, Yefeng Zheng, Mike Zheng Shou:
BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion. CoRR abs/2307.10816 (2023)
- [i79]Kevin Qinghong Lin, Pengchuan Zhang, Joya Chen, Shraman Pramanick, Difei Gao, Alex Jinpeng Wang, Rui Yan, Mike Zheng Shou:
UniVTG: Towards Unified Video-Language Temporal Grounding. CoRR abs/2307.16715 (2023)
- [i78]Weijia Wu, Yuzhong Zhao, Hao Chen, Yuchao Gu, Rui Zhao, Yefei He, Hong Zhou, Mike Zheng Shou, Chunhua Shen:
DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models. CoRR abs/2308.06160 (2023)
- [i77]Shuning Chang, Pichao Wang, Hao Luo, Fan Wang, Mike Zheng Shou:
Revisiting Vision Transformer from the View of Path Ensemble. CoRR abs/2308.06548 (2023)
- [i76]David Junhao Zhang, Mutian Xu, Chuhui Xue, Wenqing Zhang, Xiaoguang Han, Song Bai, Mike Zheng Shou:
Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks. CoRR abs/2308.06739 (2023)
- [i75]Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou:
Recap: Detecting Deepfake Video with Unpredictable Tampered Traces via Recovering Faces and Mapping Recovered Faces. CoRR abs/2308.09921 (2023)
- [i74]Weixian Lei, Yixiao Ge, Jianfeng Zhang, Dylan Sun, Kun Yi, Ying Shan, Mike Zheng Shou:
ViT-Lens: Towards Omni-modal Representations. CoRR abs/2308.10185 (2023)
- [i73]David Junhao Zhang, Heng Wang, Chuhui Xue, Rui Yan, Wenqing Zhang, Song Bai, Mike Zheng Shou:
Dataset Condensation via Generative Model. CoRR abs/2309.07698 (2023)
- [i72]Henry Hengyuan Zhao, Pichao Wang, Yuyang Zhao, Hao Luo, Fan Wang, Mike Zheng Shou:
SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels. CoRR abs/2309.08513 (2023)
- [i71]Zeyang Song, Jibin Wu, Malu Zhang, Mike Zheng Shou, Haizhou Li:
Spiking-LEAF: A Learnable Auditory front-end for Spiking Neural Networks. CoRR abs/2309.09469 (2023)
- [i70]Ke Fan, Zechen Bai, Tianjun Xiao, Dominik Zietlow, Max Horn, Zixu Zhao, Carl-Johann Simon-Gabriel, Mike Zheng Shou, Francesco Locatello, Bernt Schiele, Thomas Brox, Zheng Zhang, Yanwei Fu, Tong He:
Unsupervised Open-Vocabulary Object Localization in Videos. CoRR abs/2309.09858 (2023)
- [i69]Xizhe Xue, Haokui Zhang, Ying Li, Liuwei Wan, Zongwen Bai, Mike Zheng Shou:
Bridging Sensor Gaps via Single-Direction Tuning for Hyperspectral Image Classification. CoRR abs/2309.12865 (2023)
- [i68]David Junhao Zhang, Jay Zhangjie Wu, Jia-Wei Liu, Rui Zhao, Lingmin Ran, Yuchao Gu, Difei Gao, Mike Zheng Shou:
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation. CoRR abs/2309.15818 (2023)
- [i67]Rui Zhao, Yuchao Gu, Jay Zhangjie Wu, David Junhao Zhang, Jiawei Liu, Weijia Wu, Jussi Keppo, Mike Zheng Shou:
MotionDirector: Motion Customization of Text-to-Video Diffusion Models. CoRR abs/2310.08465 (2023)
- [i66]Jiawei Liu, Yan-Pei Cao, Jay Zhangjie Wu, Weijia Mao, Yuchao Gu, Rui Zhao, Jussi Keppo, Ying Shan, Mike Zheng Shou:
DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing. CoRR abs/2310.10624 (2023)
- [i65]Jinbin Bai, Zhen Dong, Aosong Feng, Xiao Zhang, Tian Ye, Kaicheng Zhou, Mike Zheng Shou:
Integrating View Conditions for Image Synthesis. CoRR abs/2310.16002 (2023)
- [i64]Jay Zhangjie Wu, Xiuyu Li, Difei Gao, Zhen Dong, Jinbin Bai, Aishani Singh, Xiaoyu Xiang, Youzeng Li, Zuwei Huang, Yuanxi Sun, Rui He, Feng Hu, Junhua Hu, Hai Huang, Hanyu Zhu, Xu Cheng, Jie Tang, Mike Zheng Shou, Kurt Keutzer, Forrest N. Iandola:
CVPR 2023 Text Guided Video Editing Competition. CoRR abs/2310.16003 (2023)
- [i63]Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew, Jiashi Feng, Mike Zheng Shou:
XAGen: 3D Expressive Human Avatars Generation. CoRR abs/2311.13574 (2023)
- [i62]Weijia Wu, Zhuang Li, Yefei He, Mike Zheng Shou, Chunhua Shen, Lele Cheng, Yan Li, Tingting Gao, Di Zhang, Zhongyuan Wang:
Paragraph-to-Image Generation with Information-Enriched Diffusion Model. CoRR abs/2311.14284 (2023)
- [i61]Weixian Lei, Yixiao Ge, Kun Yi, Jianfeng Zhang, Difei Gao, Dylan Sun, Yuying Ge, Ying Shan, Mike Zheng Shou:
ViT-Lens-2: Gateway to Omni-modal Intelligence. CoRR abs/2311.16081 (2023)
- [i60]Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew, Hanshu Yan, Jia-Wei Liu, Chenxu Zhang, Jiashi Feng, Mike Zheng Shou:
MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model. CoRR abs/2311.16498 (2023)
- [i59]Weijia Wu, Yuzhong Zhao, Zhuang Li, Lianlei Shan, Hong Zhou, Mike Zheng Shou:
Continual Learning for Image Segmentation with Dynamic Query. CoRR abs/2311.17450 (2023)
- [i58]Yanqing Liu, Kai Wang, Wenqi Shao, Ping Luo, Yu Qiao, Mike Zheng Shou, Kaipeng Zhang, Yang You:
MLLMs-Augmented Visual-Language Representation Learning. CoRR abs/2311.18765 (2023)
- [i57]Bardienus Pieter Duisterhof, Zhao Mandi, Yunchao Yao, Jia-Wei Liu, Mike Zheng Shou, Shuran Song, Jeffrey Ichnowski:
MD-Splatting: Learning Metric Deformation from 4D Gaussians in Highly Deformable Scenes. CoRR abs/2312.00583 (2023)
- [i56]Ziteng Gao, Zhan Tong, Kevin Qinghong Lin, Joya Chen, Mike Zheng Shou:
Bootstrapping SparseFormers from Vision Foundation Models. CoRR abs/2312.01987 (2023)
- [i55]Yufei Shi, Beijia Lu, Jia-Wei Liu, Ming Li, Mike Zheng Shou:
ColonNeRF: Neural Radiance Fields for High-Fidelity Long-Sequence Colonoscopy Reconstruction. CoRR abs/2312.02015 (2023)
- [i54]Yuchao Gu, Yipin Zhou, Bichen Wu, Licheng Yu, Jia-Wei Liu, Rui Zhao, Jay Zhangjie Wu, David Junhao Zhang, Mike Zheng Shou, Kevin Tang:
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence. CoRR abs/2312.02087 (2023)
- [i53]Lingmin Ran, Xiaodong Cun, Jia-Wei Liu, Rui Zhao, Song Zijie, Xintao Wang, Jussi Keppo, Mike Zheng Shou:
X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model. CoRR abs/2312.02238 (2023)
- [i52]Henry Hengyuan Zhao, Pan Zhou, Mike Zheng Shou:
Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator. CoRR abs/2312.06731 (2023)
- [i51]Qi Mao, Lan Chen, Yuchao Gu, Zhen Fang, Mike Zheng Shou:
MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance. CoRR abs/2312.11396 (2023)
- [i50]Difei Gao, Lei Ji, Zechen Bai, Mingyu Ouyang, Peiran Li, Dongxing Mao, Qinchen Wu, Weichen Zhang, Peiyi Wang, Xiangwu Guo, Hengxu Wang, Luowei Zhou, Mike Zheng Shou:
ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation. CoRR abs/2312.13108 (2023)
- [i49]Weijia Mao, Yan-Pei Cao, Jia-Wei Liu, Zhongcong Xu, Mike Zheng Shou:
ShowRoom3D: Text to High-Quality 3D Room Generation Using 3D Priors. CoRR abs/2312.13324 (2023)
- [i48]Yiqi Lin, Conghui He, Alex Jinpeng Wang, Bin Wang, Weijia Li, Mike Zheng Shou:
Parrot Captions Teach CLIP to Spot Text. CoRR abs/2312.14232 (2023)
- 2022
- [j1]Meng Cao, Can Zhang, Long Chen, Mike Zheng Shou, Yuexian Zou:
Deep Motion Prior for Weakly-Supervised Temporal Action Localization. IEEE Trans. Image Process. 31: 5203-5213 (2022) - [c25]Alex Jinpeng Wang, Yixiao Ge, Guanyu Cai, Rui Yan, Xudong Lin, Ying Shan, Xiaohu Qie, Mike Zheng Shou:
Object-aware Video-language Pre-training for Retrieval. CVPR 2022: 3303-3312 - [c24]Fan Ma, Mike Zheng Shou, Linchao Zhu, Haoqi Fan, Yilei Xu, Yi Yang, Zhicheng Yan:
Unified Transformer Tracker for Object Tracking. CVPR 2022: 8771-8780 - [c23]Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina González, James Hillis, Xuhua Huang, Yifei Huang, Wenqi Jia, Weslie Khoo, Jáchym Kolár, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbeláez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard A. Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik:
Ego4D: Around the World in 3,000 Hours of Egocentric Video. CVPR 2022: 18973-18990 - [c22]David Junhao Zhang, Kunchang Li, Yali Wang, Yunpeng Chen, Shashwat Chandra, Yu Qiao, Luoqi Liu, Mike Zheng Shou:
MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning. ECCV (35) 2022: 230-248 - [c21]Benita Wong, Joya Chen, You Wu, Stan Weixian Lei, Dongxing Mao, Difei Gao, Mike Zheng Shou:
AssistQ: Affordance-Centric Question-Driven Task Completion for Egocentric Assistant. ECCV (36) 2022: 485-501 - [c20]Yuxuan Wang, Difei Gao, Licheng Yu, Weixian Lei, Matt Feiszli, Mike Zheng Shou:
GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval. ECCV (35) 2022: 709-725 - [c19]Weixian Lei, Difei Gao, Yuxuan Wang, Dongxing Mao, Zihan Liang, Lingmin Ran, Mike Zheng Shou:
AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant. EMNLP (Findings) 2022: 319-338 - [c18]Shuning Chang, Pichao Wang, Fan Wang, Hao Li, Zheng Shou:
Augmented Transformer with Adaptive Graph for Temporal Action Proposal Generation. HCMA@MM 2022: 41-50 - [c17]Eric Zhongcong Xu, Zeyang Song, Satoshi Tsutsui, Chao Feng, Mang Ye, Mike Zheng Shou:
AVA-AVD: Audio-visual Speaker Diarization in the Wild. ACM Multimedia 2022: 3838-3847 - [c16]Zan-Xia Jin, Mike Zheng Shou, Fang Zhou, Satoshi Tsutsui, Jingyan Qin, Xu-Cheng Yin:
From Token to Word: OCR Token Evolution via Contrastive Learning and Semantic Matching for Text-VQA. ACM Multimedia 2022: 4564-4572 - [c15]Kevin Qinghong Lin, Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, Rong-Cheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu, Mike Zheng Shou:
Egocentric Video-Language Pretraining. NeurIPS 2022 - [c14]Jiawei Liu, Yan-Pei Cao, Weijia Mao, Wenqiao Zhang, David Junhao Zhang, Jussi Keppo, Ying Shan, Xiaohu Qie, Mike Zheng Shou:
DeVRF: Fast Deformable Voxel Radiance Fields for Dynamic Scenes. NeurIPS 2022 - [i47]Benita Wong, Joya Chen, You Wu, Stan Weixian Lei, Dongxing Mao, Difei Gao, Mike Zheng Shou:
AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant. CoRR abs/2203.04203 (2022) - [i46]Alex Jinpeng Wang, Yixiao Ge, Rui Yan, Yuying Ge, Xudong Lin, Guanyu Cai, Jianping Wu, Ying Shan, Xiaohu Qie, Mike Zheng Shou:
All in One: Exploring Unified Video-Language Pre-training. CoRR abs/2203.07303 (2022) - [i45]Guanyu Cai, Yixiao Ge, Alex Jinpeng Wang, Rui Yan, Xudong Lin, Ying Shan, Lianghua He, Xiaohu Qie, Jianping Wu, Mike Zheng Shou:
Revitalize Region Feature for Democratizing Video-Language Pre-training. CoRR abs/2203.07720 (2022) - [i44]Fan Ma, Mike Zheng Shou, Linchao Zhu, Haoqi Fan, Yilei Xu, Yi Yang, Zhicheng Yan:
Unified Transformer Tracker for Object Tracking. CoRR abs/2203.15175 (2022) - [i43]Yuxuan Wang, Difei Gao, Licheng Yu, Stan Weixian Lei, Matt Feiszli, Mike Zheng Shou:
GEB+: A benchmark for generic event boundary captioning, grounding and text-based retrieval. CoRR abs/2204.00486 (2022) - [i42]Satoshi Tsutsui, Weijia Mao, Sijing Lin, Yunyi Zhu, Murong Ma, Mike Zheng Shou:
Novel View Synthesis for High-fidelity Headshot Scenes. CoRR abs/2205.15595 (2022) - [i41]Jiawei Liu, Yan-Pei Cao, Weijia Mao, Wenqiao Zhang, David Junhao Zhang, Jussi Keppo, Ying Shan, Xiaohu Qie, Mike Zheng Shou:
DeVRF: Fast Deformable Voxel Radiance Fields for Dynamic Scenes. CoRR abs/2205.15723 (2022) - [i40]Jay Zhangjie Wu, David Junhao Zhang, Wynne Hsu, Mengmi Zhang, Mike Zheng Shou:
Label-Efficient Online Continual Object Detection in Streaming Video. CoRR abs/2206.00309 (2022) - [i39]Kevin Qinghong Lin, Alex Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, Rong-Cheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu, Mike Zheng Shou:
Egocentric Video-Language Pretraining. CoRR abs/2206.01670 (2022) - [i38]Xudong Lin, Simran Tiwari, Shiyuan Huang, Manling Li, Mike Zheng Shou, Heng Ji, Shih-Fu Chang:
Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval. CoRR abs/2206.02082 (2022) - [i37]Beng Chin Ooi, Kian-Lee Tan, Anthony K. H. Tung, Gang Chen, Mike Zheng Shou, Xiaokui Xiao, Meihui Zhang:
Sense The Physical, Walkthrough The Virtual, Manage The Metaverse: A Data-centric Perspective. CoRR abs/2206.10326 (2022) - [i36]Kevin Qinghong Lin, Alex Jinpeng Wang, Rui Yan, Eric Zhongcong Xu, Rong-Cheng Tu, Yanru Zhu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Wei Liu, Mike Zheng Shou:
Egocentric Video-Language Pretraining @ EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022. CoRR abs/2207.01334 (2022) - [i35]Kevin Qinghong Lin, Alex Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, Rong-Cheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu, Mike Zheng Shou:
Egocentric Video-Language Pretraining @ Ego4D Challenge 2022. CoRR abs/2207.01622 (2022) - [i34]Xizhe Xue, Dongdong Yu, Lingqiao Liu, Yu Liu, Ying Li, Zehuan Yuan, Ping Song, Mike Zheng Shou:
Single-Stage Open-world Instance Segmentation with Cross-task Consistency Regularization. CoRR abs/2208.09023 (2022) - [i33]Stan Weixian Lei, Difei Gao, Jay Zhangjie Wu, Yuxuan Wang, Wei Liu, Mengmi Zhang, Mike Zheng Shou:
Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task. CoRR abs/2208.12037 (2022) - [i32]Zhijian Hou, Wanjun Zhong, Lei Ji, Difei Gao, Kun Yan, Wing Kwong Chan, Chong-Wah Ngo, Zheng Shou, Nan Duan:
CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding. CoRR abs/2209.10918 (2022) - [i31]Binjie Zhang, Shupeng Su, Yixiao Ge, Xuyuan Xu, Yexin Wang, Chun Yuan, Mike Zheng Shou, Ying Shan:
Darwinian Model Upgrades: Model Evolving with Selective Compatibility. CoRR abs/2210.06954 (2022) - [i30]Zhijian Hou, Wanjun Zhong, Lei Ji, Difei Gao, Kun Yan, Wing Kwong Chan, Chong-Wah Ngo, Zheng Shou, Nan Duan:
An Efficient COarse-to-fiNE Alignment Framework @ Ego4D Natural Language Queries Challenge 2022. CoRR abs/2211.08776 (2022) - [i29]Parantak Singh, You Li, Ankur Sikarwar, Weixian Lei, Daniel Gao, Morgan Bruce Talbot, Ying Sun, Mike Zheng Shou, Gabriel Kreiman, Mengmi Zhang:
Learning to Learn: How to Continuously Teach Humans and Machines. CoRR abs/2211.15470 (2022) - [i28]Yuchao Gu, Xintao Wang, Yixiao Ge, Ying Shan, Xiaohu Qie, Mike Zheng Shou:
Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis. CoRR abs/2212.03185 (2022) - [i27]Eric Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew, Wenqing Zhang, Song Bai, Jiashi Feng, Mike Zheng Shou:
PV3D: A 3D Generative Model for Portrait Video Generation. CoRR abs/2212.06384 (2022) - [i26]Difei Gao, Luowei Zhou, Lei Ji, Linchao Zhu, Yi Yang, Mike Zheng Shou:
MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering. CoRR abs/2212.09522 (2022) - [i25]Alex Jinpeng Wang, Pan Zhou, Mike Zheng Shou, Shuicheng Yan:
Position-guided Text Prompt for Vision-Language Pre-training. CoRR abs/2212.09737 (2022) - [i24]Jay Zhangjie Wu, Yixiao Ge, Xintao Wang, Weixian Lei, Yuchao Gu, Wynne Hsu, Ying Shan, Xiaohu Qie, Mike Zheng Shou:
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation. CoRR abs/2212.11565 (2022) - 2021
- [c13]Junting Pan, Siyu Chen, Mike Zheng Shou, Yu Liu, Jing Shao, Hongsheng Li:
Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization. CVPR 2021: 464-474 - [c12]Meng Cao, Long Chen, Mike Zheng Shou, Can Zhang, Yuexian Zou:
On Pursuit of Designing Multi-modal Transformer for Video Grounding. EMNLP (1) 2021: 9810-9823 - [c11]Xinyu Gong, Heng Wang, Zheng Shou, Matt Feiszli, Zhangyang Wang, Zhicheng Yan:
Searching for Two-Stream Models in Multivariate Space for Video Recognition. ICCV 2021: 8013-8022 - [c10]Mike Zheng Shou, Stan Weixian Lei, Weiyao Wang, Deepti Ghadiyaram, Matt Feiszli:
Generic Event Boundary Detection: A Benchmark for Event Segmentation. ICCV 2021: 8055-8064 - [c9]Mang Ye, Weijian Ruan, Bo Du, Mike Zheng Shou:
Channel Augmented Joint Learning for Visible-Infrared Recognition. ICCV 2021: 13547-13556 - [c8]Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou, Haizhou Li:
Is Someone Speaking?: Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection. ACM Multimedia 2021: 3927-3935 - [i23]Mike Zheng Shou, Deepti Ghadiyaram, Weiyao Wang, Matt Feiszli:
Generic Event Boundary Detection: A Benchmark for Event Segmentation. CoRR abs/2101.10511 (2021) - [i22]Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou, Haizhou Li:
Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection. CoRR abs/2107.06592 (2021) - [i21]Meng Cao, Can Zhang, Long Chen, Mike Zheng Shou, Yuexian Zou:
Deep Motion Prior for Weakly-Supervised Temporal Action Localization. CoRR abs/2108.05607 (2021) - [i20]Xinyu Gong, Heng Wang, Zheng Shou, Matt Feiszli, Zhangyang Wang, Zhicheng Yan:
Searching for Two-Stream Models in Multivariate Space for Video Recognition. CoRR abs/2108.12957 (2021) - [i19]Meng Cao, Long Chen, Mike Zheng Shou, Can Zhang, Yuexian Zou:
On Pursuit of Designing Multi-modal Transformer for Video Grounding. CoRR abs/2109.06085 (2021) - [i18]Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Christian Fuegen, Abrham Gebreselasie, Cristina González, James Hillis, Xuhua Huang, Yifei Huang, Wenqi Jia, Weslie Khoo, Jáchym Kolár, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Yunyi Zhu, Pablo Arbeláez, David Crandall, Dima Damen, Giovanni Maria Farinella, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard A. Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik:
Ego4D: Around the World in 3,000 Hours of Egocentric Video. CoRR abs/2110.07058 (2021) - [i17]David Junhao Zhang, Kunchang Li, Yunpeng Chen, Yali Wang, Shashwat Chandra, Yu Qiao, Luoqi Liu, Mike Zheng Shou:
MorphMLP: A Self-Attention Free, MLP-Like Backbone for Image and Video. CoRR abs/2111.12527 (2021) - [i16]Eric Zhongcong Xu, Zeyang Song, Chao Feng, Mang Ye, Mike Zheng Shou:
AVA-AVD: Audio-visual Speaker Diarization in the Wild. CoRR abs/2111.14448 (2021) - [i15]Stan Weixian Lei, Yuxuan Wang, Dongxing Mao, Difei Gao, Mike Zheng Shou:
AssistSR: Affordance-centric Question-driven Video Segment Retrieval. CoRR abs/2111.15050 (2021) - [i14]Alex Jinpeng Wang, Yixiao Ge, Guanyu Cai, Rui Yan, Xudong Lin, Ying Shan, Xiaohu Qie, Mike Zheng Shou:
Object-aware Video-language Pre-training for Retrieval. CoRR abs/2112.00656 (2021) - [i13]Rui Yan, Mike Zheng Shou, Yixiao Ge, Alex Jinpeng Wang, Xudong Lin, Guanyu Cai, Jinhui Tang:
Video-Text Pre-training with Learned Regions. CoRR abs/2112.01194 (2021) - 2020
- [c7]Fan Ma, Linchao Zhu, Yi Yang, Shengxin Zha, Gourab Kundu, Matt Feiszli, Zheng Shou:
SF-Net: Single-Frame Supervision for Temporal Action Localization. ECCV (4) 2020: 420-437 - [i12]Fan Ma, Linchao Zhu, Yi Yang, Shengxin Zha, Gourab Kundu, Matt Feiszli, Zheng Shou:
SF-Net: Single-Frame Supervision for Temporal Action Localization. CoRR abs/2003.06845 (2020) - [i11]Junting Pan, Siyu Chen, Zheng Shou, Jing Shao, Hongsheng Li:
Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization. CoRR abs/2006.07976 (2020)
2010 – 2019
- 2019
- [b1]Zheng Shou:
Deep Learning for Action Understanding in Video. Columbia University, USA, 2019 - [c6]Zheng Shou, Xudong Lin, Yannis Kalantidis, Laura Sevilla-Lara, Marcus Rohrbach, Shih-Fu Chang, Zhicheng Yan:
DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition. CVPR 2019: 1268-1277 - [i10]Zheng Shou, Zhicheng Yan, Yannis Kalantidis, Laura Sevilla-Lara, Marcus Rohrbach, Xudong Lin, Shih-Fu Chang:
DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition. CoRR abs/1901.03460 (2019) - [i9]Jiawei Ma, Zheng Shou, Alireza Zareian, Hassan Mansour, Anthony Vetro, Shih-Fu Chang:
CDSA: Cross-Dimensional Self-Attention for Multivariate, Geo-tagged Time Series Imputation. CoRR abs/1905.09904 (2019) - [i8]Xudong Lin, Zheng Shou, Shih-Fu Chang:
LPAT: Learning to Predict Adaptive Threshold for Weakly-supervised Temporal Action Localization. CoRR abs/1910.11285 (2019) - 2018
- [c5]Zheng Shou, Hang Gao, Lei Zhang, Kazuyuki Miyazawa, Shih-Fu Chang:
AutoLoc: Weakly-Supervised Temporal Action Localization in Untrimmed Videos. ECCV (16) 2018: 162-179 - [c4]Zheng Shou, Junting Pan, Jonathan Chan, Kazuyuki Miyazawa, Hassan Mansour, Anthony Vetro, Xavier Giró-i-Nieto, Shih-Fu Chang:
Online Detection of Action Start in Untrimmed, Streaming Videos. ECCV (3) 2018: 551-568 - [c3]Hang Gao, Zheng Shou, Alireza Zareian, Hanwang Zhang, Shih-Fu Chang:
Low-shot Learning via Covariance-Preserving Adversarial Augmentation Networks. NeurIPS 2018: 983-993 - [i7]Zheng Shou, Junting Pan, Jonathan Chan, Kazuyuki Miyazawa, Hassan Mansour, Anthony Vetro, Xavier Giró-i-Nieto, Shih-Fu Chang:
Online Action Detection in Untrimmed, Streaming Videos - Modeling and Evaluation. CoRR abs/1802.06822 (2018) - [i6]Zheng Shou, Hang Gao, Lei Zhang, Kazuyuki Miyazawa, Shih-Fu Chang:
AutoLoc: Weakly-supervised Temporal Action Localization. CoRR abs/1807.08333 (2018) - [i5]Hang Gao, Zheng Shou, Alireza Zareian, Hanwang Zhang, Shih-Fu Chang:
Low-shot Learning via Covariance-Preserving Adversarial Augmentation Networks. CoRR abs/1810.11730 (2018) - 2017
- [c2]Zheng Shou, Jonathan Chan, Alireza Zareian, Kazuyuki Miyazawa, Shih-Fu Chang:
CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos. CVPR 2017: 1417-1426 - [i4]Zheng Shou, Jonathan Chan, Alireza Zareian, Kazuyuki Miyazawa, Shih-Fu Chang:
CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos. CoRR abs/1703.01515 (2017) - [i3]Du Tran, Jamie Ray, Zheng Shou, Shih-Fu Chang, Manohar Paluri:
ConvNet Architecture Search for Spatiotemporal Feature Learning. CoRR abs/1708.05038 (2017) - 2016
- [c1]Zheng Shou, Dongang Wang, Shih-Fu Chang:
Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs. CVPR 2016: 1049-1058 - [i2]Zheng Shou, Dongang Wang, Shih-Fu Chang:
Action Temporal Localization in Untrimmed Videos via Multi-stage CNNs. CoRR abs/1601.02129 (2016) - [i1]Dongang Wang, Zheng Shou, Hongyi Liu, Shih-Fu Chang:
EventNet Version 1.1 Technical Report. CoRR abs/1605.07289 (2016)
last updated on 2024-11-21 20:31 CET by the dblp team
all metadata released as open data under CC0 1.0 license