Introducing our code-specialized open model, trained on a curated corpus of permissively licensed repositories with a focus on multi-file reasoning and tool use.
Background
This work grew out of our ongoing effort to advance the capabilities of open-source language models. We share methods, negative results, and reproducible artifacts so the broader community can build on what we learn.
Method
We approached the problem empirically, running controlled ablations across model scales and reporting confidence intervals rather than cherry-picked single runs. Full configs and seeds are published alongside this post.
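The post does not say how the confidence intervals are computed. One common way to turn a handful of seeded runs into an interval is a percentile bootstrap over per-seed scores. The sketch below assumes one scalar score per seed (for example a pass rate). The function name `bootstrap_ci`, its parameters, and the example scores are all hypothetical and are included only to illustrate the idea.

```python
import random
import statistics
from typing import Sequence, Tuple


def bootstrap_ci(
    scores: Sequence[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Hypothetical helper: (mean, lower, upper) percentile-bootstrap CI over per-seed scores."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate an interval")
    rng = random.Random(seed)  # fixed RNG seed so the interval itself is reproducible
    n = len(scores)
    # Resample the runs with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    # Read the alpha/2 and 1 - alpha/2 percentiles off the sorted resample means.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return statistics.fmean(scores), means[lo_idx], means[hi_idx]


# Illustrative (made-up) scores from five seeds of a single ablation arm.
mean, lo, hi = bootstrap_ci([0.61, 0.63, 0.62, 0.64, 0.60])
print(f"{mean:.3f} [{lo:.3f}, {hi:.3f}]")
```

With only a few seeds, a bootstrap interval tends to be optimistic. A t-based interval or simply running more seeds are reasonable alternatives. The key point is that reporting the interval makes run-to-run variance visible, whereas a single best run hides it.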
Results
The headline finding is encouraging but bounded: the improvements hold within the regime we tested, and we state explicitly where they do not. See the appendix for the complete evaluation suite.
What’s next
We are releasing the weights and the evaluation harness. If you reproduce, extend, or refute these results, we want to hear about it.