In this section, we first evaluate our approach on ARC, comparing it to existing approaches in terms of success rate, efficiency, model complexity, and model naturalness. We then assess its generality beyond ARC by applying it to a different domain, spreadsheets, where inputs and outputs are rows of strings. All experiments were run with single-threaded implementations on Fedora 32, on an Intel Core i7x12 machine with 16 GB of memory. Because no randomness is involved, we performed a single run per task set.