Keynote Talks
Keynote 1: Baosong Yang

Title: Qwen Multilingual: From Foundation Models to Translation Applications
Abstract: Multilingual and cross-lingual capabilities are not only core competitive advantages of large language models but also a key pathway toward Artificial General Intelligence (AGI) and “AI equity.” Using the Qwen series of large models as a practical vehicle, this talk examines several frontier topics in the research and optimization of multilingual large language models: mechanisms of cross-lingual knowledge transfer, interpretability, building multilingual “long reasoning” abilities, and cross-modal semantic alignment. Taking translation as a core application scenario, the talk then discusses how continued pretraining and reinforcement learning can improve translation quality, as well as extensions to audio–video multimodal translation.
Bio: Baosong Yang, Ph.D., is a scientist at Alibaba’s Tongyi Lab. As the algorithm lead for the multilingual track, he has developed internationally leading foundation models such as Qwen and Qwen-Omni and has led the team in building key real-world applications including Qwen-MT and Qwen-LiveTrans, driving large-scale industrial deployment and productization of multilingual AI technologies. His research focuses on multilingual AI, covering multilingual large models, cross-lingual knowledge transfer, multilingual multimodal understanding and generation, machine translation, and low-resource language modeling. He has published over 60 papers at top conferences and journals such as NeurIPS, ICLR, and ACL, with more than 5,000 citations, and has served multiple times as a reviewer, program committee member, and area chair.
Keynote 2: Liwei Wang

Title: Building a Dynamic Evaluation Framework for Multimodal Video–Language Understanding
Abstract: Video is a rich multimodal data form that encompasses visual, linguistic, and other dimensions of information, so video understanding requires integrated cross-modal comprehension. This talk introduces the latest research progress from the Language and Vision Lab (LaVi Lab) at The Chinese University of Hong Kong on video–language understanding: a dynamic evaluation framework that efficiently and automatically constructs question–answer pairs from video inputs, enabling automated assessment of video content understanding. On the language understanding side, the talk also extends the CLEVA Chinese LLM evaluation platform with CLEVA-Cantonese, a benchmark for Cantonese understanding.
Bio: Liwei Wang is an assistant professor and Ph.D. advisor in the Department of Computer Science and Engineering at The Chinese University of Hong Kong. He received his Ph.D. from the Department of Computer Science at the University of Illinois at Urbana–Champaign (UIUC). His LaVi Lab focuses on multimodal AI research. He has served multiple times as an area chair for top international AI conferences (e.g., CVPR, ACL, ICML, NeurIPS) and currently serves on the editorial board of the top-tier journal IJCV (a CCF-A journal). He has published over 50 papers at top conferences in natural language processing and computer vision, with more than 11,000 citations on Google Scholar.