The latest Hermes release improves tool-call reliability, reduces refusal rates on benign prompts, and ships with a 256k context window out of the box.
Why we publish not just our models but the full evaluation harness, seeds, and raw outputs behind every benchmark number we report.
Introducing our code-specialized open model, trained on a curated corpus of permissively licensed repositories with a focus on multi-file reasoning and tool use.
We argue that intermediate reasoning should be stored, versioned, and trained on directly, and we show empirical gains from doing so.
Practical lessons from sharding a mixture-of-experts model across a churn-prone distributed cluster: routing collapse, the all-to-all bottleneck, and how we fixed both.
A retrospective on our first cross-continental training run coordinated entirely over commodity internet links using the Psyche protocol.
How we generate synthetic training data without baking in the biases of a single teacher model, using an ensemble-and-debate pipeline with provenance tracking.
By packing multiple weakly-correlated training signals into superposed token streams, we squeeze more gradient information per FLOP during the early phase of pretraining.
Psyche moves from testnet to a permissionless coordination layer, letting anyone contribute compute to distributed pretraining runs with verifiable gradient attestation.
A sparse attention variant that anchors long-context reasoning to a handful of learned beacon tokens, cutting memory use by 40% while preserving recall on needle-in-a-haystack benchmarks.
Atropos is our framework for defining, sandboxing, and scaling reinforcement learning environments for language model agents, now integrated with the Tinker training loop.
We trace how fine-tuning reshapes internal representations by probing activation manifolds before and after alignment, revealing that behavioral shifts often localize to a surprisingly small subspace.