
SkyRL tx v0.1.0 Lets Teams Run Tinker-Compatible RL Locally on GPU Clusters

SkyRL tx v0.1.0 brings a Tinker-compatible training and inference engine to local GPU clusters, adding end-to-end RL support, faster sampling, and Postgres support.

SkyRL tx v0.1.0 provides a way for AI teams to run Tinker-style reinforcement learning on their own GPU infrastructure with a single unified engine and the same minimal Tinker API used by managed services.

What SkyRL tx aims to do

SkyRL tx is presented as a unified training and inference engine that implements the Tinker API and can be deployed on local hardware. The project targets developers who want to avoid a hosted-only workflow while keeping the familiar low-level Tinker primitives for implementing supervised or reinforcement learning loops in plain Python.

Tinker API in brief

Tinker (from Thinking Machines) exposes four core primitives:

  • forward_backward: performs a forward pass and a backward pass and accumulates gradients.
  • optim_step: updates model weights using accumulated gradients.
  • sample: generates tokens for interaction, evaluation, or RL actions.
  • save_state: writes checkpoints to resume training.

Tinker intentionally exposes low-level primitives (rather than a full, task-specific fine-tuning abstraction) so users can implement custom loops while the service handles GPU scheduling and distributed execution.
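
To make the division of labor concrete, below is a minimal sketch of a fine-tuning loop written against these primitives. It assumes the Python tinker client with a ServiceClient entry point, as in the public Tinker examples; training_data is a placeholder for your tokenized batches, and argument names such as loss_fn and AdamParams should be checked against the Tinker documentation rather than taken as exact.

import tinker
from tinker import types

# Connect to a Tinker-compatible backend; with SkyRL tx this is the local server.
service_client = tinker.ServiceClient()
training_client = service_client.create_lora_training_client(base_model="Qwen/Qwen3-4B")

for step, batch in enumerate(training_data):  # training_data: placeholder for tokenized examples
    # forward_backward: forward + backward pass; gradients accumulate server-side
    fwd_bwd = training_client.forward_backward(batch, loss_fn="cross_entropy")
    # optim_step: apply the accumulated gradients to the LoRA weights
    opt = training_client.optim_step(types.AdamParams(learning_rate=1e-4))
    fwd_bwd.result()  # requests are asynchronous; block on the returned futures
    opt.result()

    if step % 100 == 0:
        training_client.save_state()  # save_state: checkpoint that training can resume from

Sampling sits outside this loop: in an RL setting, sample generates the rollouts whose rewards feed into the next forward_backward call.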

How SkyRL tx fits in the SkyRL stack

SkyRL is a full-stack reinforcement learning library for large language models, with components such as skyrl-agent (long-horizon agents), skyrl-train (training), and skyrl-gym (tool-use environments for math, coding, search, SQL). SkyRL tx is an experimental cross-platform component that exposes a local Tinker-like REST API for post-training workflows. It functions as the system layer that connects RL logic, environments, and training code to concrete GPU resources through the Tinker interface.

Architecture: an inference engine that also trains

SkyRL tx is described as an inference engine that supports backward passes as well. Its main components are:

  • REST API server: processes incoming requests from different users.
  • Database: tracks metadata about models, checkpoints, requests, and futures; acts as a job queue. The current implementation uses SQLite behind an interface that can support Postgres as well.
  • Engine: schedules and batches requests across users. Each engine instance serves a single base model and can attach many LoRA adapters.
  • Worker: executes forward and backward passes and holds model definitions and optimizer states. Multiple workers will enable more advanced multi-node sharding in future versions.
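
The flow between these components can be pictured as a queue-and-batch loop: the API server records each request in the database as a pending future, and the engine repeatedly drains the queue, batches compatible requests, and hands them to a worker. The sketch below is a conceptual outline only; the class and method names (Request, fetch_pending, resolve_future, worker.run) are invented for illustration and are not taken from the skyrl-tx source.

from dataclasses import dataclass

@dataclass
class Request:
    request_id: int
    adapter_id: int   # which LoRA adapter on the shared base model this request targets
    kind: str         # "forward_backward", "optim_step", or "sample"
    payload: dict

def engine_loop(db, worker, max_batch: int = 32) -> None:
    # One engine instance serves a single base model and many LoRA adapters.
    while True:
        pending = db.fetch_pending(limit=max_batch)   # the database doubles as the job queue
        if not pending:
            continue
        # Group requests of the same kind so different users share one pass over the base model.
        by_kind: dict[str, list[Request]] = {}
        for req in pending:
            by_kind.setdefault(req.kind, []).append(req)
        for kind, batch in by_kind.items():
            results = worker.run(kind, batch)         # forward/backward or sampling on the GPUs
            for req, result in zip(batch, results):
                db.resolve_future(req.request_id, result)  # clients poll the future over REST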

What v0.1.0 adds

The v0.1.0 release focuses on reinforcement learning support and performance improvements. Notable changes include:

  • Much faster sampling thanks to jitting, proper batching and sharding in the engine.
  • Support for per-request sampling parameters, per-request seeds, and stop tokens, which is useful when multiple experiments share a base model (see the sketch after this list).
  • Fixes so that an RL loop can run properly through the engine.
  • Gradient checkpointing support and micro-batching for sampling.
  • Postgres support added as an alternative to SQLite.
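
The per-request parameters matter because a single engine instance serves one base model to many adapters and experiments at once. The sketch below illustrates two sampling requests that share the base model but carry their own temperature, stop tokens, and seed; sampling_client and prompt are placeholders, and the field names follow the SamplingParams style used in the Tinker examples, so verify them against the actual schema.

from tinker import types

# Experiment A: deterministic evaluation samples with a custom stop token.
eval_future = sampling_client.sample(
    prompt=prompt,
    num_samples=1,
    sampling_params=types.SamplingParams(max_tokens=256, temperature=0.0, stop=["</answer>"]),
)

# Experiment B: exploratory RL rollouts with a per-request seed for reproducibility.
rollout_future = sampling_client.sample(
    prompt=prompt,
    num_samples=4,
    sampling_params=types.SamplingParams(max_tokens=256, temperature=1.0, seed=123),
)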

Running RL end to end on 8 H100 GPUs

The release includes a recipe for running reinforcement learning end to end on a cluster with 8 H100 GPUs. First, clone the SkyRL repository and start the engine from the skyrl-tx folder:

uv run --extra gpu --extra tinker -m tx.tinker.api \
  --base-model Qwen/Qwen3-4B \
  --max-lora-adapters 3 \
  --max-lora-rank 1 \
  --tensor-parallel-size 8 \
  --train-micro-batch-size 8 > out.log
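
Reading the flags by name: --tensor-parallel-size 8 shards the base model across the 8 GPUs, --max-lora-adapters 3 caps how many LoRA adapters the engine holds at once, --max-lora-rank 1 matches the lora_rank=1 used in the cookbook command below, and --train-micro-batch-size 8 controls how training work is split into micro-batches; server output is redirected to out.log.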

Then clone the Tinker Cookbook from the Thinking Machines team and run the RL loop from the tinker_cookbook/recipes folder:

export TINKER_API_KEY=dummy
export WANDB_API_KEY=<your key>
uv run --with wandb --with tinker rl_loop.py \
  base_url=http://localhost:8000 \
  model_name="Qwen/Qwen3-4B" \
  lora_rank=1 \
  max_length=1024 \
  save_every=100

That recipe produces a reward curve that confirms the RL loop runs correctly through the local SkyRL tx backend.

Key takeaways

SkyRL tx v0.1.0 implements a local, Tinker-compatible engine that unifies training and inference for LLM post-training workflows. It exposes the Tinker primitives (forward_backward, optim_step, sample, save_state) over REST while handling batching, LoRA adapters and device placement internally. The architecture is split across an API server, SQL database, scheduling engine and workers. The v0.1.0 release adds end-to-end RL support, faster jitted and sharded sampling, per-request sampling parameters, gradient checkpointing, micro-batching and Postgres support, turning Tinker compatibility into a practical local RL backend for LLMs.
