Blog

The latest Hermes release improves tool-call reliability, reduces refusal rates on benign prompts, and ships with a 256k context window out of the box.

Why we publish not just our models but the full evaluation harness, seeds, and raw outputs behind every benchmark number we report.

NousCoder-14B by emozilla

Introducing our code-specialized open model, trained on a curated corpus of permissively licensed repositories with a focus on multi-file reasoning and tool use.

We argue that intermediate reasoning should be stored, versioned, and trained on directly, and we show empirical gains from doing so.

How we generate synthetic training data without baking in the biases of a single teacher model, using an ensemble-and-debate pipeline with provenance tracking.

Psyche moves from testnet to a permissionless coordination layer, letting anyone contribute compute to distributed pretraining runs with verifiable gradient attestation.

A sparse attention variant that anchors long-context reasoning to a handful of learned beacon tokens, cutting memory by 40% while preserving recall on needle-in-haystack benchmarks.

Atropos is our framework for defining, sandboxing, and scaling reinforcement learning environments for language model agents, now integrated with the Tinker training loop.

We trace how fine-tuning reshapes internal representations by probing activation manifolds before and after alignment, revealing that behavioral shifts often localize to a surprisingly small subspace.