-
Jun 10, 2026
Tracing Eval-Awareness Emergence Through Training of OLMo 3 ↗
Tracing how eval-awareness emerges across the training stages of OLMo 3.
-
Jul 23, 2025
Scaling Laws for LLM-Based Data Compression ↗
Investigating how large language models compress text, image, and speech data according to universal power laws
-
Oct 27, 2024
Experiments with the Platonic Representation Hypothesis ↗
Investigating the validity of the PRH in out-of-distribution (OOD) settings
-
Aug 28, 2024
Understanding Hidden Computations in Chain-of-Thought Reasoning ↗
chain-of-thought is decryptable
-
Mar 24, 2023
Adversarial training against goal misgeneralization is ELK-hard ↗
can goal misgeneralization be formulated as an instance of ELK?
-
Oct 16, 2021
The AGI needs to be honest ↗
building truthful AI is hard