
新葡萄88805官网 "Boyue Academic Forum" - Dong Yan - No. 435

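Source: Prof. Jin-Jian Zhou  Author: Dr. Dong Yan (BaiChuan Intelligence)  Published: 2024-03-21

Host: Prof. Jin-Jian Zhou

Speaker: Dr. Dong Yan (BaiChuan Intelligence)

Time: 2024-03-21

Venue: Conference Room 229, Physics Experiment Center, Liangxiang Campus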

Speaker profile:

新葡萄88805官网 Boyue Academic Forum Lecture Series

No. 435

Title: From Scale to Interaction - The LLM Journey

Speaker: Dr. Dong Yan (BaiChuan Intelligence)

Time: 2024-03-21 (Thursday), 10:00 am

Venue: Conference Room 229, Physics Experiment Center, Liangxiang Campus

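Abstract:

Large language models (LLMs), represented by the GPT series, are profoundly changing how human society operates. This talk discusses the two training stages of LLMs, pretraining and alignment, from the two perspectives of Scale and Interaction. For pretraining, it starts from next-token prediction, the only first principle known to scale at the current stage of AGI, and traces the development of the technology. For alignment, it takes the Exploration & Exploitation perspective and introduces how human feedback is used to align models with human preferences.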

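Biography:

Dong Yan received his Ph.D. from the Department of Computer Science at Tsinghua University. He has served as a researcher at Intel China, a postdoctoral fellow in the Department of Computer Science at Tsinghua University, and head of the frontier decision-making direction in machine intelligence fundamentals at Qi Yuan Laboratory. His research focuses on decision-making algorithms and systems. On the algorithm side, he proposed a solution framework that connects model-free and model-based reinforcement learning through a reward distribution mechanism. On the systems side, he was the architect of the reinforcement learning programming framework Tianshou, which has received over 6.6k stars and 1k forks on GitHub, with the related article published in JMLR. Awards: ViZDoom challenge runner-up in 2017 and champion in 2018 (team leader); champion of Tencent's Kaiwu Honor of Kings challenge in 2022 and 2023 (advisor); 9th place in the beyond-visual-range category of the 2023 "Tian Xing Cup" intelligent air combat competition (out of 306 teams, team leader). He is currently head of reinforcement learning at BaiChuan Intelligence.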

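Contact: jjzhou@bit.edu.cn

Host: Prof. Jin-Jian Zhou

Website: http:/

Organizer: Key Laboratory of Advanced Optoelectronic Quantum Architecture and Measurement (Ministry of Education), School of Physics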

*Title: From Scale to Interaction - The LLM Journey

*Speaker: Dr. Dong Yan, Head of Reinforcement Learning at BaiChuan Intelligence

*Time: Mar. 21st, 2024 (Thursday), 10:00 am

*Place: Room 229, Physics Experiment Center, Liangxiang Campus

*Contact Person: Prof. Jin-Jian Zhou, jjzhou@bit.edu.cn

*Abstract:

Large language models (LLMs), represented by the GPT series, are profoundly changing the way human society operates. This talk discusses the two training phases of LLMs, pretraining and alignment, from the perspectives of Scale and Interaction. The pretraining part starts from next-token prediction, presented as the only first principle that can be scaled at the current stage of AGI, and traces how the technology has developed. The alignment part takes the perspective of Exploration & Exploitation and introduces how human feedback is used to align a model with human preferences.

*Profile:

Dong Yan graduated with a Ph.D. from the Department of Computer Science at Tsinghua University. He has held positions as a researcher at Intel China, a postdoctoral fellow in the Department of Computer Science at Tsinghua University, and head of the Advanced Decision-Making group at Qi Yuan Laboratory, focusing on machine intelligence. His research primarily involves decision-making algorithms and systems. On the algorithm side, he proposed a solution framework that connects model-free and model-based reinforcement learning algorithms through a reward distribution mechanism. On the systems side, he designed the reinforcement learning programming framework Tianshou, which has garnered over 6.6k stars and 1k forks on GitHub, with the related article published in JMLR. His awards include runner-up in the 2017 ViZDoom challenge and champion in 2018 (as team leader), champion of Tencent's "Enlightenment" Honor of Kings challenge in 2022 and 2023 (as advisor), and 9th place (out of 306 teams, as team leader) in the beyond-visual-range category of the 2023 "Tian Xing Cup" intelligent air combat competition. He is currently the head of Reinforcement Learning at BaiChuan Intelligence.