A sparse attention variant that anchors long-context reasoning to a handful of learned beacon tokens, cutting memory by 40% while preserving recall on needle-in-haystack benchmarks.
Background
This work emerged from our ongoing effort to push open source language model capabilities forward. We share methods, negative results, and reproducible artifacts so the broader community can build on what we learn.
Method
We approached the problem empirically, running controlled ablations across model scales and reporting confidence intervals rather than cherry-picked single runs. Full configs and seeds are published alongside this post.
Results
The headline finding is encouraging but bounded: improvements hold within the regime we tested, and we are explicit about where they do not. See the appendix for the complete evaluation suite.
What’s next
We are releasing the weights and the evaluation harness. If you reproduce, extend, or refute these results, we want to hear about it.