Tinker from Thinking Machines: Low-Level API for Distributed LoRA Training
What Tinker does
Tinker is a Python API that lets researchers and engineers write explicit training loops locally while the platform executes them on managed distributed GPU clusters. The design goal is narrow and technical: keep full control over data, objectives, and per-step optimization, while outsourcing scheduling, fault tolerance, and multi-node orchestration to a managed service.
Low-level primitives, not wrappers
Instead of providing a high-level train() abstraction, Tinker exposes core primitives such as forward_backward, optim_step, save_state, and sample. These calls give you direct control over gradient computation, optimizer updates, checkpointing, and evaluation or inference inside custom loops. A typical flow looks like: instantiate a LoRA training client against a base model (for example, Llama-3.2-1B), loop over forward_backward and optim_step, persist state with save_state, and use a sampling client to evaluate or export weights.
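To make that flow concrete, here is a minimal sketch of such a loop. Only the primitives named above (forward_backward, optim_step, save_state, sample) and the Llama-3.2-1B example come from this section; the client constructors, batch and optimizer-config formats, futures-based result handling, and exact method signatures are illustrative assumptions, not confirmed SDK surface.

```python
# Minimal sketch of a Tinker-style training loop. Client/constructor names,
# argument shapes, and the futures-based .result() handling are assumptions.
import tinker  # assumed package name

service_client = tinker.ServiceClient()  # assumed entry point
training_client = service_client.create_lora_training_client(
    base_model="meta-llama/Llama-3.2-1B",  # swap the string to change models
)

dataset = [...]      # your training batches, in whatever format the SDK expects
adam_params = {...}  # assumed optimizer configuration (learning rate, betas, ...)

for step, batch in enumerate(dataset):
    # Queue gradient computation, then the optimizer update; calls are assumed
    # to return futures whose .result() surfaces per-step metrics and errors.
    fwd_bwd_future = training_client.forward_backward(batch, loss_fn="cross_entropy")
    optim_future = training_client.optim_step(adam_params)
    fwd_bwd_future.result()
    optim_future.result()

    if step % 100 == 0:
        training_client.save_state(name=f"checkpoint-{step}")  # resumable state

# Evaluate or export via a sampling client built from the trained weights
# (method name is an assumption consistent with the primitives above).
sampling_client = training_client.save_weights_and_get_sampling_client(name="final")
print(sampling_client.sample(prompt="2 + 2 =", max_tokens=8))
```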
LoRA-first approach and portability
Tinker focuses on Low-Rank Adaptation (LoRA) rather than full-parameter fine-tuning. The team argues in a technical note that, when configured properly, LoRA can match full fine-tuning for many practical workloads, particularly reinforcement learning setups. Trained adapter weights can be downloaded and used outside Tinker with your preferred inference stack or provider, so the artifacts you produce remain portable.
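Because the adapter leaves the platform as plain weight files, the consuming side needs no Tinker dependency. A minimal sketch of loading a downloaded adapter with Hugging Face transformers and peft follows; the local export path is hypothetical, while the loading pattern itself is standard PEFT usage.

```python
# Loading a Tinker-exported LoRA adapter outside Tinker, using standard
# transformers + peft APIs. The adapter directory path is hypothetical.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

# Attach the downloaded adapter weights (path is an assumption).
model = PeftModel.from_pretrained(base, "./tinker-export/my-adapter")
model = model.merge_and_unload()  # optional: fold LoRA deltas into base weights

inputs = tokenizer("The capital of France is", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=8)[0]))
```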
Model coverage and scaling
The service supports open-weights families such as Llama and Qwen, including very large mixture-of-experts variants like Qwen3-235B-A22B. Switching models is intentionally minimal: change a string identifier and rerun. Under the hood, runs are scheduled on Thinking Machines’ internal clusters; because LoRA updates only a small fraction of weights, many runs can share compute pools, yielding better utilization than full fine-tuning.
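Under the client API assumed in the earlier sketch, that switch really is a single-string change:

```python
# Swapping base models under the (assumed) client API from the earlier
# sketch; the rest of the loop is unchanged.
BASE_MODEL = "Qwen/Qwen3-235B-A22B"  # was: "meta-llama/Llama-3.2-1B"

training_client = service_client.create_lora_training_client(
    base_model=BASE_MODEL,
)
# ...rerun the same forward_backward / optim_step loop as before...
```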
Tinker Cookbook and reference workflows
To reduce boilerplate while keeping the core API lean, Thinking Machines published the Tinker Cookbook under Apache-2.0. The cookbook offers ready-to-use reference loops for supervised learning and reinforcement learning, plus worked examples for RLHF (three-stage SFT → reward modeling → policy RL), math-reasoning rewards, tool-use and retrieval-augmented tasks, prompt distillation, and multi-agent setups; a schematic of one such loop follows below. The repo also includes utilities for calculating LoRA hyperparameters and integrations with evaluation frameworks.
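To show the shape such a reference loop takes, here is a schematic RL sketch built on the same primitives and reusing the assumed clients from the first sketch. The verifiable math reward, the batch-building helper, the sampling signature, and the loss name are all hypothetical simplifications, not Cookbook code.

```python
# Schematic RL loop in the spirit of the Cookbook's recipes: sample, score,
# weight, update. Helpers and signatures below are hypothetical.
def math_reward(completion: str, answer: str) -> float:
    # Toy verifiable reward: 1.0 if the reference answer appears verbatim.
    return 1.0 if answer in completion else 0.0

math_problems = [...]  # assumed iterable of (prompt, reference_answer) pairs

for prompt, answer in math_problems:
    samples = sampling_client.sample(prompt=prompt, num_samples=8)  # assumed signature
    rewards = [math_reward(s.text, answer) for s in samples]

    # Mean-baseline advantages: reinforce completions that beat the batch average.
    baseline = sum(rewards) / len(rewards)
    advantages = [r - baseline for r in rewards]

    # Hypothetical helper that packs completions with per-sequence weights,
    # followed by one policy-gradient-style step (loss name is an assumption).
    batch = make_weighted_batch(prompt, samples, advantages)
    training_client.forward_backward(batch, loss_fn="policy_gradient").result()
    training_client.optim_step(adam_params).result()
```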
Early users, availability, and pricing
Early adopters include research groups at Princeton, Stanford, UC Berkeley, and Redwood Research. Tinker is in private beta with a waitlist; the service starts free and will move to usage-based pricing in the coming weeks. Organizations that require broad access are invited to contact the team for onboarding.
Practical considerations and my take
I appreciate that Tinker exposes low-level primitives: it keeps loss design, reward shaping, and evaluation under user control while offloading distributed execution. The LoRA-first posture is pragmatic for cost and turnaround, and the platform’s analysis suggests LoRA can be competitive with full fine-tuning when set up correctly. For production or reproducible research, I would still want transparent logs, deterministic seeds, and per-step telemetry. The Cookbook’s reference loops are a helpful starting point; ultimately the platform will be judged on throughput stability, checkpoint portability, and data governance features such as PII handling and audit trails.
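Until such features are exposed natively, a client-side stopgap is straightforward. A sketch, assuming the same hypothetical clients and objects as the first example, of fixed seeding plus per-step JSONL telemetry using only the standard library:

```python
# Client-side reproducibility stopgap: pinned seed plus per-step JSONL
# telemetry. Reuses the assumed clients/objects from the first sketch.
import json
import random
import time

random.seed(1234)  # pin any client-side shuffling or sampling decisions

with open("run_log.jsonl", "a") as log:
    for step, batch in enumerate(dataset):
        t0 = time.time()
        fwd_bwd_future = training_client.forward_backward(batch, loss_fn="cross_entropy")
        training_client.optim_step(adam_params).result()
        metrics = fwd_bwd_future.result()
        log.write(json.dumps({
            "step": step,
            "seconds": round(time.time() - t0, 3),
            "loss": getattr(metrics, "loss", None),  # field name is an assumption
        }) + "\n")
```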
If you want to try Tinker, sign up for the private beta waitlist on Thinking Machines’ site or contact tinker@thinkingmachines.ai for organizational access. For tutorials and examples, check the project’s GitHub and community channels.