
Speaker: Li Shen, Research Scientist, JD Explore Academy
Host: Hongteng Xu, Tenure-Track Associate Professor, Gaoling School of Artificial Intelligence, Renmin University of China
Time: Wednesday, October 18, 2023, 14:00-15:30
Venue: Multi-function Hall 1826, Lide Building
Title: On Efficient Training for Large-Scale Deep Learning Models
Abstract: The field of deep learning has witnessed significant developments in recent years. In particular, large-scale models trained on vast amounts of data hold immense promise for practical applications and for enhancing industrial productivity. However, training such models suffers from instability, stringent computational resource requirements, and underexplored convergence analysis. For example, Adam, one of the most influential adaptive stochastic algorithms for training deep neural networks, has been shown via a few simple counterexamples to diverge even in the simple convex setting. In this talk, we systematically investigate the convergence theory and application of efficient training algorithms for pretraining large-scale deep learning models from the perspective of optimization. Specifically, (i) we derive the first easy-to-check sufficient condition, which depends only on the base learning rate and combinations of historical second-order moments, that guarantees the global convergence of the Adam optimizer in the non-convex stochastic setting. This sufficient condition, in turn, gives a much deeper interpretation of the divergence of Adam. (ii) We theoretically show that distributed Adam can be linearly accelerated by using a larger number of nodes. (iii) We propose a communication-efficient variant of distributed Adam, dubbed Efficient-Adam, which adopts bi-directional compression to reduce the communication cost and error compensation to reduce the compression bias. (iv) We develop FedLADA, a novel momentum-based federated optimizer that combines global gradient descent with a locally adaptive amended optimizer to tackle the client drift exacerbated by local over-fitting under local adaptive optimizers in federated learning.
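For readers unfamiliar with the optimizer discussed in the abstract, the following is a minimal sketch of the standard Adam update, not the speaker's analysis or any of the proposed variants. It shows the base learning rate and the second-order moment estimate that the abstract's sufficient condition refers to. The function names `adam_step` and `minimize_quadratic`, the scalar toy objective, and the default hyperparameters are all assumptions chosen for illustration.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update on a scalar parameter; t counts steps from 1."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-order moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction for m
    v_hat = v / (1 - beta2 ** t)                # bias correction for v
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


def minimize_quadratic(steps=2000, lr=0.01):
    """Illustrative toy problem (an assumption): run Adam on f(x) = (x - 3)^2 from x = 0."""
    theta, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * (theta - 3.0)              # exact gradient of the toy objective
        theta, m, v = adam_step(theta, grad, m, v, t, lr=lr)
    return theta
```

Because of the bias correction, the very first step has magnitude close to `lr` regardless of the gradient's scale, which is one reason the base learning rate and the second-order moment terms appear together in convergence conditions of this kind.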
Speaker bio: Li Shen is currently a research scientist at JD Explore Academy, Beijing, China. Previously, he was a senior researcher at Tencent AI Lab. He received his bachelor's degree and Ph.D. from the School of Mathematics, South China University of Technology. His research interests include theory and algorithms for nonsmooth convex and nonconvex optimization, and their applications in statistical machine learning, deep learning, and reinforcement learning. He has published more than 60 papers in peer-reviewed top-tier journals (JMLR, IEEE TPAMI, IJCV, IEEE TSP, IEEE TIP, etc.) and conferences (ICML, NeurIPS, ICLR, CVPR, ICCV, etc.). He has also served as a senior program committee member for AAAI 2022 and AAAI 2024, and as an area chair for ICPR 2022, ICPR 2024, and ICLR 2024.