Deriving the Optimum of the KL-Constrained Reward Maximization Objective

2024-8-26 05:13:38 Author: hackernoon.com

Authors:

(1) Rafael Rafailov, Stanford University and Equal contribution; more junior authors listed earlier;

(2) Archit Sharma, Stanford University and Equal contribution; more junior authors listed earlier;

(3) Eric Mitchell, Stanford University and Equal contribution; more junior authors listed earlier;

(4) Stefano Ermon, CZ Biohub;

(5) Christopher D. Manning, Stanford University;

(6) Chelsea Finn, Stanford University.

In this appendix, we will derive Eq. 4. Analogously to Eq. 3, we optimize the following objective:
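
What follows is a sketch of the objective and of the steps toward its optimum, not the authors' exact typesetting. It assumes the standard KL-constrained setup: a reward function r(x, y), a reference policy π_ref, a prompt distribution D, and a coefficient β > 0. These symbols are assumptions, since this excerpt does not define them, and the final line is presumably the form that Eq. 4 refers to.

```latex
% Assumed setup: reward r, reference policy \pi_{ref}, prompt distribution D, \beta > 0.
% Each step preserves the optimizing policy, not necessarily the objective value.
\begin{align*}
&\max_{\pi}\; \mathbb{E}_{x\sim\mathcal{D},\, y\sim\pi(y\mid x)}\big[r(x,y)\big]
  - \beta\, \mathbb{D}_{\mathrm{KL}}\big[\pi(y\mid x)\,\|\,\pi_{\mathrm{ref}}(y\mid x)\big] \\
&\quad= \max_{\pi}\; \mathbb{E}_{x\sim\mathcal{D}}\,\mathbb{E}_{y\sim\pi(y\mid x)}
  \left[ r(x,y) - \beta \log\frac{\pi(y\mid x)}{\pi_{\mathrm{ref}}(y\mid x)} \right] \\
&\quad\Longleftrightarrow\; \min_{\pi}\; \mathbb{E}_{x\sim\mathcal{D}}\,\mathbb{E}_{y\sim\pi(y\mid x)}
  \left[ \log\frac{\pi(y\mid x)}{\pi_{\mathrm{ref}}(y\mid x)} - \frac{1}{\beta}\, r(x,y) \right] \\
&\quad= \min_{\pi}\; \mathbb{E}_{x\sim\mathcal{D}}\,\mathbb{E}_{y\sim\pi(y\mid x)}
  \left[ \log\frac{\pi(y\mid x)}
  {\frac{1}{Z(x)}\,\pi_{\mathrm{ref}}(y\mid x)\exp\!\big(\tfrac{1}{\beta} r(x,y)\big)}
  - \log Z(x) \right] \\
&\quad= \min_{\pi}\; \mathbb{E}_{x\sim\mathcal{D}}
  \Big[ \mathbb{D}_{\mathrm{KL}}\big[\pi(y\mid x)\,\|\,\pi^{*}(y\mid x)\big] - \log Z(x) \Big],
\end{align*}
% Partition function and the resulting candidate optimum:
\begin{align*}
Z(x) &= \sum_{y} \pi_{\mathrm{ref}}(y\mid x)\exp\!\Big(\tfrac{1}{\beta}\, r(x,y)\Big), \\
\pi^{*}(y\mid x) &= \frac{1}{Z(x)}\,\pi_{\mathrm{ref}}(y\mid x)\exp\!\Big(\tfrac{1}{\beta}\, r(x,y)\Big).
\end{align*}
```

Because Z(x) depends only on x and π_ref, not on π, the last line of the first block is minimized exactly when the KL term is zero. By Gibbs' inequality that happens only when π(y|x) = π*(y|x) for every x in the support of D, which under these assumptions identifies π* as the optimum.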

Source: https://hackernoon.com/deriving-the-optimum-of-the-kl-constrained-reward-maximization-objective?source=rss