The idea here is to investigate what distribution transformers learn when they are trained on concrete “lower-abstraction” sequences generated from a “higher-abstraction” grammar. The motivation comes from benchmarks like ARC AGI, which suggest that current LLMs are inherently limited in their ability to actually do “higher-abstraction thinking”.
In this experiment we:
- Define a high-abstraction grammar with actions “REPEAT”, “ALTERNATE”, and “REVERSE” over a small vocabulary of atoms “A”, “B”, “C”
- Enumerate all sequences the grammar generates, up to a certain length
- Train a transformer on the concrete sequences, and compare the distribution the transformer learns to the distribution over the original higher-level vocabulary (a minimal sketch of this comparison step follows the list)
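
To make the comparison step concrete, here is a minimal sketch of one way to measure the gap between the grammar-implied next-token distribution and the one the trained model predicts, using KL divergence. The function name, the example distributions, and the choice of KL divergence are all illustrative assumptions, not necessarily what the experiment below uses.

```python
import math
from typing import Dict

def kl_divergence(p: Dict[str, float], q: Dict[str, float], eps: float = 1e-12) -> float:
    """KL(p || q) between two next-token distributions over the same vocabulary.

    p: empirical next-token distribution derived from the grammar's sequences.
    q: next-token distribution predicted by the trained transformer.
    eps guards against zero probabilities in q.
    """
    return sum(
        pv * math.log(pv / max(q.get(tok, 0.0), eps))
        for tok, pv in p.items()
        if pv > 0.0
    )

# Hypothetical example: after some prefix the grammar only ever continues
# with "A" or "C", while the model also puts some mass on "B".
grammar_next = {"A": 0.5, "B": 0.0, "C": 0.5}
model_next = {"A": 0.4, "B": 0.2, "C": 0.4}
print(round(kl_divergence(grammar_next, model_next), 4))  # ~0.2231
```
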
1. High Abstraction Grammar
The grammar is defined as:
/* Non-terminals */
EXPR ::= REPEAT | ALTERNATE | REVERSE | ATOM
/* Basic atoms */
ATOM ::= "A" | "B" | "C"
/* Repeat operation with count */
REPEAT ::= "REPEAT" "(" NUMBER "," ATOM ")"
NUMBER ::= [1-9][0-9]*
/* Alternate between two atoms N times */
ALTERNATE ::= "ALTERNATE" "(" NUMBER "," ATOM "," ATOM ")"
/* Reverse a sequence of atoms */
REVERSE ::= "REVERSE" "(" SEQUENCE ")"
SEQUENCE ::= ATOM | ATOM "," SEQUENCE
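
To make the semantics concrete, below is a minimal Python sketch of how each action could expand into a surface sequence, along with a brute-force enumerator for the “enumerate all sequences” step. The reading of ALTERNATE (the pair X Y repeated N times) and the enumeration cut-offs are assumptions, and the helper names are illustrative rather than the experiment's actual code.

```python
from itertools import product
from typing import List

ATOMS = ["A", "B", "C"]

# Assumed expansion semantics (the grammar above leaves some of this open):
#   REPEAT(n, X)          -> X repeated n times
#   ALTERNATE(n, X, Y)    -> the pair X Y repeated n times
#   REVERSE(X1, ..., Xk)  -> the atoms in reverse order

def expand_repeat(n: int, atom: str) -> str:
    return atom * n

def expand_alternate(n: int, a: str, b: str) -> str:
    return (a + b) * n

def expand_reverse(seq: List[str]) -> str:
    return "".join(reversed(seq))

def enumerate_concrete(max_count: int = 3, max_rev_len: int = 3) -> List[str]:
    """Concrete strings for every expression with NUMBER <= max_count and a
    REVERSE argument of at most max_rev_len atoms (both cut-offs are assumptions)."""
    out = list(ATOMS)  # bare ATOM expressions
    for n in range(1, max_count + 1):
        for a in ATOMS:
            out.append(expand_repeat(n, a))
        for a, b in product(ATOMS, repeat=2):
            out.append(expand_alternate(n, a, b))
    for k in range(1, max_rev_len + 1):
        for seq in product(ATOMS, repeat=k):
            out.append(expand_reverse(list(seq)))
    return out

print(expand_repeat(3, "A"))            # AAA
print(expand_alternate(2, "A", "B"))    # ABAB
print(expand_reverse(["A", "B", "C"]))  # CBA
print(len(enumerate_concrete()))        # size of the enumerated corpus
```

Note that different high-level expressions can expand to the same concrete string (for example, REPEAT(2, A) and REVERSE(A, A) both yield “AA”); the enumerator above does not deduplicate these.
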