Comparative Analysis of Prompt Optimization on BBH Tasks

Tables 8 and 9 show the instructions found by prompt optimization. Table 7 compares their accuracies with the baselines “Let’s think step by step.” (Kojima et al., 2022), “Let’s work this out in a step by step way to be sure we have the right answer.” (Zhou et al., 2022b), and the empty string; Figure 5 in Section 5.2 visualizes the comparison.

Table 8: BBH task-wise instructions found by prompt optimization with the PaLM 2-L scorer and the PaLM 2-L-IT optimizer. The optimization starts from the empty string.

Table 9: BBH task-wise instructions found by prompt optimization with the text-bison scorer and the PaLM 2-L-IT optimizer. The optimization starts from the empty string.

E.2 GPT-3.5-TURBO AS OPTIMIZER, OPTIMIZATION STARTING FROM THE EMPTY STRING

Tables 11, 12, and 13 show the instructions found by prompt optimization. Their accuracies are listed in Table 10. Figure 25 visualizes the difference between their accuracies and those of the baselines “Let’s think step by step.” and the empty string. The optimizations find instructions that are better than the empty starting point, and most of the found instructions are better than “Let’s think step by step”.
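The per-task comparison that Figure 25 plots can be sketched as a subtraction of baseline accuracy from found-instruction accuracy. This is a minimal illustration, not the authors' evaluation code: the function names, the task names, and every number below are hypothetical placeholders, not values from Table 10.

```python
from typing import Dict

Scores = Dict[str, float]  # task name -> accuracy in percent


def accuracy_differences(found: Scores, baselines: Dict[str, Scores]) -> Dict[str, Scores]:
    """For each baseline, return found accuracy minus baseline accuracy per task."""
    diffs: Dict[str, Scores] = {}
    for name, scores in baselines.items():
        # Only compare tasks that were scored under both instructions.
        diffs[name] = {
            task: round(found[task] - scores[task], 1)
            for task in found
            if task in scores
        }
    return diffs


def count_improved(diff_by_task: Scores) -> int:
    """Count the tasks where the found instruction beats the baseline."""
    return sum(1 for delta in diff_by_task.values() if delta > 0)


# Placeholder numbers for illustration only (NOT results from the paper).
found_scores = {"task_a": 70.0, "task_b": 55.0, "task_c": 40.0}
baseline_scores = {
    "empty string": {"task_a": 60.0, "task_b": 50.0, "task_c": 35.0},
    "Let's think step by step.": {"task_a": 72.0, "task_b": 50.0, "task_c": 38.0},
}

for baseline, deltas in accuracy_differences(found_scores, baseline_scores).items():
    print(baseline, deltas, "improved on", count_improved(deltas), "tasks")
```

A positive difference means the found instruction outperforms that baseline on the task, which is the sense in which the text says the instructions are "better than" the empty string or "Let's think step by step".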

One caveat in the A_begin instructions (Table 11) is that many of the found instructions are imperative or interrogative sentences that would fit better in “Q:” than in “A:”, such as “Solve the sequence by properly closing the parentheses.” for dyck_languages and “Which movie option from the given choices ...?” for movie_recommendation. Such styles appear more often here than in the PaLM 2-L-IT optimizer results (Table 8), which suggests that PaLM 2-L-IT understands the needed style better. In Section E.3, we show the A_begin optimization results with the non-empty starting point “Let’s solve the problem.”. Most results there are declarative sentences, which are more suitable for A_begin.
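To see why sentence style matters for the instruction position, the sketch below assembles a prompt for each of the three positions named in this appendix. The exact "Q:"/"A:" template, the helper name build_prompt, and the sample question are assumptions made for illustration; the scorer prompts actually used may be formatted differently.

```python
def build_prompt(question: str, instruction: str, position: str) -> str:
    """Place an instruction at Q_begin, Q_end, or A_begin (assumed template)."""
    if position == "Q_begin":
        # Instruction comes before the question, inside the "Q:" part.
        body = f"{instruction}\n{question}" if instruction else question
        return f"Q: {body}\nA:"
    if position == "Q_end":
        # Instruction comes after the question, still inside the "Q:" part.
        body = f"{question}\n{instruction}" if instruction else question
        return f"Q: {body}\nA:"
    if position == "A_begin":
        # Instruction becomes the first words of the answer the model continues.
        if instruction:
            return f"Q: {question}\nA: {instruction}"
        return f"Q: {question}\nA:"
    raise ValueError(f"unknown position: {position!r}")


# Hypothetical question text, used only to show the layout.
question = "Complete the rest of the sequence: ( [ ("

# A declarative sentence reads naturally as the start of an answer.
print(build_prompt(question, "Let's solve the problem.", "A_begin"))
print()
# An imperative sentence reads more naturally as part of the question.
print(build_prompt(
    question,
    "Solve the sequence by properly closing the parentheses.",
    "Q_begin",
))
```

Under this template an A_begin instruction is text the scorer continues as its own answer, so a command or a question placed there reads as if the model were instructing itself.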

Figure 25: On 23 BBH tasks, the accuracy differences among instructions found by prompt optimization (with the gpt-3.5-turbo optimizer), “Let’s think step by step.”, and the empty string (optimization starting point).

Table 10: Accuracies on BBH tasks with the gpt-3.5-turbo optimizer that starts from the empty string. The PaLM 2-L scores are from A_begin (left) instructions; the text-bison scores include Q_begin (left) and Q_end (right) instructions.

Table 11: BBH task-wise instructions found by prompt optimization with the PaLM 2-L scorer and the gpt-3.5-turbo optimizer. The optimizations start from the empty string.

Table 12: BBH task-wise Q_begin instructions found by prompt optimization with the text-bison scorer and the gpt-3.5-turbo optimizer. The optimizations start from the empty string.

Table 13: BBH task-wise Q_end instructions found by prompt optimization with the text-bison scorer and the gpt-3.5-turbo optimizer. The optimizations start from the empty string.

E.3 PALM 2-L AS SCORER, GPT-3.5-TURBO AS OPTIMIZER, OPTIMIZATION STARTING FROM “LET’S SOLVE THE PROBLEM.”

Figure 26 and Table 14 compare the accuracies of the found instructions with those of “Let’s solve the problem.”, “Let’s think step by step.”, and the instructions in Table 11. Table 15 details the found instructions.

The “Let’s” pattern appears more often in the found instructions because of the starting point, and the instructions are more often declarative sentences, which are more suitable for A_begin, even if some are semantically far from “Let’s solve the problem”. In fact, “Let’s” was adopted by Zhou et al. (2022b) as a fixed pattern in generated prompts, possibly for the same reason.

Table 14: Accuracies on BBH tasks with the PaLM 2-L scorer and the gpt-3.5-turbo optimizer that starts from “Let’s solve the problem”. The scores are from A_begin instructions.

Table 15: BBH task-wise A_begin instructions found by prompt optimization with the PaLM 2-L scorer and the gpt-3.5-turbo optimizer. The optimizations start from “Let’s solve the problem”.

Source: https://hackernoon.com/comparative-analysis-of-prompt-optimization-on-bbh-tasks?source=rss