
Joint Workshop on Advances in AI
AIGC—From Beingless to Being
Given the tremendous development of AI content generation over the past year, this year's workshop focuses on dialogue generation models and multimedia content generation, under the theme AIGC: From Beingless to Being. We aim to bring together scientists, scholars and practitioners to discuss some of the most pressing issues in the field today.
Hosted by
Gaoling School of Artificial Intelligence, Renmin University of China
Sunday, March 12, 2023

Scan the QR code to sign up
AGENDA (tentative)
| 09:00-09:10 | Opening Remarks | Prof. Ji-Rong WEN, Executive Dean of Gaoling School of Artificial Intelligence (GSAI), Renmin University of China |
| Moderator | Prof. Hao SUN, Tenured Associate Professor, GSAI | |
| Session I | Chat and Language | |
| 09:15-09:45 | Title: A Coarse Analysis of ChatGPT Abstract: On December 1st last year, OpenAI launched ChatGPT, a new-generation conversational AI tool. ChatGPT has gained a lot of attention and buzz across the Internet for its amazing language understanding, generation and knowledge reasoning capabilities. The success of ChatGPT has shown a possible path to solving natural language processing, the core problem of cognitive intelligence. ChatGPT is considered a solid step towards Artificial General Intelligence (AGI). ChatGPT will pose a huge challenge to applications such as search engines, and may even replace many people's jobs and disrupt many fields and industries. So, what scientific problem does ChatGPT solve, how does it solve that problem, and what other pressing problems will be solved in the future? We hope that the coarse analysis in this talk will partially answer these questions. Prof. Wanxiang CHE, Professor, School of Computer Science and Technology, Harbin Institute of Technology | |
| 5 mins | Q&A | |
| 09:45-10:15 | Title: Discussion on the Generalization of Large Language Model Abstract: Large language models such as ChatGPT demonstrate powerful generality, as they can understand various user intents and complete multiple tasks. Their generality is mainly driven by the foundation model, instruction learning, and prompt learning. However, the generality of large models is still limited by the type, scale, and quality of training data, making it difficult to achieve more comprehensive generality. In this talk, we will first discuss the performance of large language models in terms of their generality, then introduce current enhancement techniques for expanding that generality, and finally briefly share our own attempts and explorations in this regard. Prof. Jiajun ZHANG, Professor, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences | |
| 5 mins | Q&A | |
| 10:20-10:50 | Title: Learning and Thinking on Conversational AI between Human and Computer Abstract: Along with the emergence of ChatGPT, conversational AI has attracted unprecedented attention from both academia and industry. The technology behind conversational AI has been gradually accumulating and is now resulting in fundamental changes. With deeper insights from researchers, we can expect conversational AI in reality, not just in sci-fi movies. This talk covers recent studies on topic modeling and data augmentation in dialogue systems, and also presents a brief introduction to ChatGPT as well as the new challenges facing conversational AI in the future. Prof. Rui YAN, Tenured Associate Professor, GSAI | |
| 5 mins | Q&A | |
| 10:55-11:25 | Title: The Changing Focus of Large Language Model Research Abstract: In this talk, we will introduce how the research focus in natural language processing and large language models has changed as the technology has developed. We divide this technological development into several stages, briefly introduce the research focus and main approaches of each stage, and then analyze the important research directions in the current period of large language models. Dr. Junlin ZHANG, Researcher, Sina Weibo | |
| 5 mins | Q&A | |
| 11:30-12:10 | Panel | |
| 12:30-13:30 | Luncheon | |
| Session II | Multimedia Content Generation | |
| Moderator | Prof. Di HU, Tenure-Track Assistant Professor, GSAI | |
| 13:30-14:00 | Title: Breaking the Curse of Dimensionality in Generative Modeling: A Homotopic Approach Abstract: Generative modeling for high-dimensional data, such as images and audio, is extremely challenging due to the curse of dimensionality. To overcome this difficulty, I introduce a homotopic approach inspired by numerical equation solving, which involves designing a homotopy of probability distributions that smoothly progresses from a simple noise distribution to a complex data distribution. I will present two families of approaches that rely on such homotopies: score-based diffusion models and consistency models. Both approaches use a differential equation to convert data to noise and learn to estimate the time reversal with deep neural networks. These models generate high-quality samples by following the homotopy from tractable distributions to the original data distribution, achieving state-of-the-art performance on many generative modeling benchmarks. They both allow for flexible neural network architectures and zero-shot image editing, and exhibit complementary trade-offs. Prof. Yang SONG, Tenure-Track Assistant Professor, Caltech; Member of Technical Staff, OpenAI | |
| 5 mins | Q&A | |
| 14:05-14:35 | Title: AIGC Technology and Application in XR Abstract: In the past year, both XR and AIGC have developed rapidly. The development of XR is mainly due to the exploration of VR devices by many large companies, such as Quest by Meta and Pico by Bytedance. Building on the popularity of these devices, XR has many application scenarios in fields such as live broadcasting, digital humans, and 3D content creation. The explosion of AIGC technology will greatly accelerate development in these areas. This talk mainly covers: (1) an introduction to application scenarios in XR; (2) AIGC technology in these scenarios. Shilei WEN, Researcher, Bytedance | |
| 5 mins | Q&A | |
| 14:40-15:10 | Title: Beyond Temporal Activity Localization (TAL) in Long-Form Video Abstract: Temporal activity localization (TAL) is an important task in video understanding, especially for long-form videos. While much research has advanced the methodology used in this task, I will give an overview of the remaining challenges that hinder further performance gains. Moreover, I will elaborate on a relatively new task for the long-form video domain that goes beyond TAL, namely video language grounding. Prof. Bernard Ghanem, Deputy Director, KAUST AI Initiative | |
| 5 mins | Q&A | |
| 15:15-15:45 | Title: AI for Music Abstract: In this talk, I will introduce two of our recent works on AIGC for music. 1) Given a video, we proposed a controllable music transformer model that generates music according to the user-specified music genre and instruments. 2) We further introduced music theory knowledge into the music generation model, collected the first video-music symbol dataset for training, and proposed a retrieval-based metric to evaluate how well the generated music matches the video. Prof. Si LIU, Professor, Beihang University | |
| 5 mins | Q&A | |
| 15:50-16:20 | Title: Imaginative AI on the EDGE: transforming species discovery, content creation, self-driving cars, emotional well-being, and more Abstract: Most existing AI learning methods can be categorized into supervised, semi-supervised, and unsupervised methods. These approaches rely on defining empirical risks or losses on the provided labeled and/or unlabeled data. Beyond extracting learning signals from labeled/unlabeled training data, we will reflect in this talk on a class of methods that can learn beyond the vocabulary they were trained on and can compose or create novel concepts. Specifically, we address the question of how these AI skills may assist species discovery, content creation, self-driving cars, emotional health, and more. We refer to this class of techniques as imaginative AI methods, and we will dive into how we developed several approaches to build machine learning methods that can See, Create, Drive, and Feel. See: recognize unseen visual concepts by imaginative learning signals, and how that may extend to a continual setting where seen and unseen classes change dynamically. Create: generate novel art and fashion by creativity losses. Drive: improve trajectory forecasting for autonomous driving by modeling hallucinative driving intents. Feel: generate emotional descriptions of visual art that are metaphoric and go beyond grounded descriptions, and how to build these AI systems to be more inclusive of multiple cultures. I will conclude by pointing out future directions where imaginative AI may help develop better assistive technology for a multicultural and more inclusive metaverse, emotional health, and drug discovery. Prof. Mohamed Elhoseiny, Assistant Professor, Electrical and Mathematical Science and Engineering Division, KAUST | |
| 5 mins | Q&A | |
| 16:25-16:55 | Title: Fast, Controllable and Multi-modal Diffusion Models Toward Generative AGI Abstract: Diffusion models have shown promise in various generation tasks including image generation, text-to-image generation, 3D scene generation and speech generation. Such models provide fundamental tools to build generative AGI, which is expected to understand and generate multi-modal data via a user-friendly interface like ChatGPT. Toward building such generative AI systems, this talk presents recent advances in fast and controllable inference algorithms, as well as a unified backbone and probabilistic framework for multi-modal diffusion models. Prof. Chongxuan LI, Tenure-Track Assistant Professor, GSAI | |
| 5 mins | Q&A | |
| 17:00-17:30 | Title: Modeling and Animating 2D Virtual Humans Abstract: In recent years, 2D virtual humans have provided services in various scenarios including news reporting, teaching and entertainment. This talk gives an introduction to our recent advances in modeling and animating 2D virtual humans. For changing the appearance of virtual humans, I will introduce StyleSwap (ECCV 2022), which swaps and blends faces with high fidelity. For face reenactment, I will introduce VPNQ (AAAI 2023), which drives virtual humans with videos. For audio-driven lip sync, I will introduce AV-CAT (SIGGRAPH Asia 2022) and StyleSync (CVPR 2023), which seamlessly drive virtual humans to speak with audio. Dr. Hang ZHOU, Researcher, Baidu | |
| 5 mins | Q&A | |
| 17:35-18:15 | Panel | |
Speakers
Speakers from Gaoling School of Artificial Intelligence, Renmin University of China
See https://gsai.ruc.edu.cn/addons/teacher/enhome.html

Prof. Wanxiang Che is a professor at the School of Computer Science and Technology, Harbin Institute of Technology. He is the vice director of the Research Center for Social Computing and Information Retrieval. He is a "Heilongjiang Scholar" young scholar and has been a visiting scholar at Stanford University. He is currently the vice director and secretary-general of the Computational Linguistics Professional Committee of the Chinese Information Processing Society of China, Officer and Secretary of the AACL Executive Board, and a senior member of the China Computer Federation (CCF).

Prof. Jiajun Zhang is a professor at the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences. His main research areas are machine translation and natural language processing. He has published more than 80 CCF-A/B papers, 2 academic monographs, and 1 translated work; he has received 6 best/excellent paper awards and has been recognized 3 times as an outstanding SPC member or reviewer for IJCAI, ACL and NAACL. He also won the first prize of the Qian Weichang Chinese Information Processing Science and Technology Award of the Chinese Information Processing Society of China, the first prize of the Youth Innovation Award, and the first prize of the 2020 Beijing Science and Technology Award. He serves as a (senior) area chair for ACL/EMNLP/COLING, and serves on the editorial boards of journals such as IEEE/ACM T-ASLP and ACM TALLIP.

Dr. Junlin Zhang is a researcher at Sina Weibo. He received his Ph.D. degree from the Institute of Software, Chinese Academy of Sciences. Over the past ten years, he has worked as a Senior Expert at Sina Weibo, Baidu, and Alibaba. His current research interests include natural language processing, deep learning, and recommender systems.

Prof. Yang Song is a tenure-track assistant professor at Caltech. He is also a researcher at OpenAI. As a researcher in machine learning, he focuses on developing scalable methods for modeling, analyzing and generating complex, high-dimensional data. His interests span multiple areas, including generative modeling, representation learning, probabilistic inference, AI safety, and AI for science. His ultimate goal is to address problems that have wide-ranging significance, develop methods that are both accessible and effective, and build intelligent systems that can improve human lives.

Shilei Wen is a researcher at Bytedance. He received his B.E. and M.E. degrees from the Huazhong University of Science and Technology, Wuhan, China, in 2007 and 2011, respectively. His research interests include image and video understanding, image and video retrieval, video surveillance, computer vision, and machine learning.

Prof. Bernard Ghanem is currently a Professor of ECE and CS, a theme leader at the Visual Computing Center (VCC), and the Deputy Director of the AI Initiative at King Abdullah University of Science and Technology (KAUST) in Thuwal, Saudi Arabia. His research interests lie in computer vision and machine learning, with emphasis on topics in video understanding, 3D recognition, and theoretical foundations of deep learning. He received his Bachelor’s degree from the American University of Beirut (AUB) in 2005 and his MS/PhD from the University of Illinois at Urbana-Champaign (UIUC) in 2010. His work has received several awards and honors, including several Best Paper Awards for workshops in CVPR/ECCV/ICCV, a Google Faculty Research Award in 2015 (1st in MENA for Machine Perception), and an Abdul Hameed Shoman Arab Researcher Award for Big Data and Machine Learning in 2020. He has co-authored more than 150 papers in his field. He serves as an Associate Editor for IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) and has served as Area Chair (AC) for the main computer vision and AI/ML conferences including CVPR, ICCV, ECCV, NeurIPS, ICLR, and AAAI. Visit ivul.kaust.edu.sa and www.bernardghanem.com for more details.

Prof. Si Liu, the leader of the CoLab, is a professor at Beihang University. She was a visiting professor at Microsoft Research Asia and previously worked with Prof. Shuicheng Yan at the National University of Singapore. She obtained her Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences (CASIA), under the supervision of Prof. Hanqing Lu. She obtained her Bachelor's degree from the Advanced Class of Beijing Institute of Technology (BIT).

Prof. Mohamed Elhoseiny is an assistant professor of Computer Science at KAUST. Previously, he was Visiting Faculty at the Stanford Computer Science Department (2019-2020), Visiting Faculty at Baidu Research (2019), and a postdoctoral researcher at Facebook AI Research (2016-2019). His primary research interest is in computer vision, especially efficient multimodal learning with limited data in zero/few-shot learning and Vision & Language. He is also interested in Affective AI, especially in understanding and generating novel art and fashion. He received an NSF Fellowship in 2014 and the Doctoral Consortium award at CVPR’16.

Dr. Hang Zhou is currently a senior researcher at the Department of Computer Vision Technology (VIS), Baidu Inc. He obtained his Ph.D. degree from the Multimedia Lab (MMLab), The Chinese University of Hong Kong in 2021, supervised by Prof. Xiaogang Wang and Prof. Ziwei Liu. He received his bachelor's degree in Acoustics from the School of Physics at Nanjing University (NJU) in 2017. His research interests include deep learning and its applications to audio-visual learning and face generation for virtual humans.