How AI Learns from Human Preferences

2024-8-26 05:10:43 Author: hackernoon.com

Authors:

(1) Rafael Rafailov, Stanford University (equal contribution; more junior authors listed earlier);

(2) Archit Sharma, Stanford University (equal contribution; more junior authors listed earlier);

(3) Eric Mitchell, Stanford University (equal contribution; more junior authors listed earlier);

(4) Stefano Ermon, CZ Biohub;

(5) Christopher D. Manning, Stanford University;

(6) Chelsea Finn, Stanford University.

We review the RLHF pipeline in Ziegler et al. (and later [38, 1, 26]). It usually includes three phases: 1) supervised fine-tuning (SFT); 2) preference sampling and reward learning; and 3) RL optimization.
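To make the three phases concrete, here is a minimal sketch of the per-example quantity each phase typically optimizes. The excerpt above does not give these formulas, so everything here is an assumption: the function names, the Bradley-Terry form of the preference loss, and the KL-penalty coefficient `beta` are commonly used choices, not necessarily the exact setup of the cited works.

```python
import math


def sft_loss(token_log_probs):
    """Phase 1 (SFT): mean negative log-likelihood of the demonstration tokens.

    `token_log_probs` is a hypothetical list of log-probabilities the model
    assigns to each token of a high-quality reference completion.
    """
    return -sum(token_log_probs) / len(token_log_probs)


def preference_loss(reward_chosen, reward_rejected):
    """Phase 2 (reward learning): -log sigmoid(r_chosen - r_rejected).

    Assumes a Bradley-Terry preference model, a common but not the only choice.
    Computed as softplus(-margin) in a numerically stable form.
    """
    margin = reward_chosen - reward_rejected
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def rl_objective(reward, logp_policy, logp_reference, beta=0.1):
    """Phase 3 (RL optimization): reward minus a KL-style penalty.

    Uses a single-sample estimate of the KL term, log pi(y|x) - log pi_ref(y|x).
    `beta` is an assumed penalty strength, not a value taken from the text.
    """
    return reward - beta * (logp_policy - logp_reference)
```

In this sketch, phases 1 and 2 are losses to minimize, while phase 3 is an objective to maximize; the penalty term discourages the policy from drifting far from the reference (SFT) model.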

Article source: https://hackernoon.com/how-ai-learns-from-human-preferences?source=rss