arbdwj

Jun 10, 2026 Tracing Eval-Awareness Emergence Through Training of OLMo 3 ↗
Tracing how eval-awareness emerges across the training stages of OLMo 3.
Jul 23, 2025 Scaling Laws for LLM-Based Data Compression ↗
Investigating how large language models compress text, images, and speech with universal power laws
Oct 27, 2024 Experiments with the Platonic Representation Hypothesis ↗
Investigating the validity of the PRH in out-of-distribution (OOD) settings
Aug 28, 2024 Understanding Hidden Computations in Chain-of-Thought Reasoning ↗
Chain-of-thought is decryptable
Mar 24, 2023 Adversarial training against goal misgeneralization is ELK-hard ↗
Can goal misgeneralization be formulated as an instance of ELK?
Oct 16, 2021 The AGI needs to be honest ↗
Building truthful AI is hard