Efficiently Fine-Tune Qwen3-14B on Google Colab with Unsloth AI and LoRA Optimization

This guide explains how to fine-tune the Qwen3-14B model efficiently on Google Colab with Unsloth AI, leveraging 4-bit quantization and LoRA for memory-efficient training using mixed reasoning and instruction datasets.

Streamlining Fine-Tuning of Large Language Models with Unsloth AI

Fine-tuning large language models (LLMs) like Qwen3-14B usually demands significant computational resources, memory, and time. Unsloth AI addresses these challenges by enabling fast and memory-efficient fine-tuning using advanced methods such as 4-bit quantization and LoRA (Low-Rank Adaptation).

Installing Essential Libraries on Google Colab

The tutorial begins by installing the required libraries, branching on the environment: outside Colab a plain install suffices, while on Colab pinned versions are installed without dependency resolution to avoid breaking the preinstalled stack. Key components include bitsandbytes for 4-bit quantization, trl for supervised fine-tuning utilities, and unsloth_zoo with Unsloth's shared helpers.

%%capture
import os
# Outside Colab, a plain install resolves dependencies normally; on Colab,
# pinned versions are installed with --no-deps to avoid clobbering the
# preinstalled CUDA/PyTorch stack.
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl==0.15.2 triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf "datasets>=3.4.1" huggingface_hub hf_transfer
    !pip install --no-deps unsloth

Loading and Preparing the Qwen3-14B Model

The Qwen3-14B model is loaded in 4-bit precision with a maximum sequence length of 2048 tokens using Unsloth's FastLanguageModel utility. Full fine-tuning is disabled so that only the LoRA adapter parameters added in the next step are trained.

from unsloth import FastLanguageModel
import torch
 
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-14B",
    max_seq_length = 2048,    # context window used for training examples
    load_in_4bit = True,      # quantize weights to 4 bits via bitsandbytes
    load_in_8bit = False,
    full_finetuning = False,  # train LoRA adapters only, not the full model
)

Applying LoRA for Parameter-Efficient Fine-Tuning

LoRA adapters are injected into specific transformer modules to enable fine-tuning with minimal trainable parameters. Gradient checkpointing is enabled to further reduce memory usage.

model = FastLanguageModel.get_peft_model(
    model,
    r = 32,                    # LoRA rank: size of the low-rank update matrices
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 32,           # scaling factor; alpha / r = 1.0 here
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",  # Unsloth's memory-saving variant
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)
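
To confirm how little of the model is actually trainable, a quick parameter count helps. This optional sanity check uses plain PyTorch, so it works regardless of the PEFT wrapper's own reporting helpers:

# Count trainable vs. total parameters (pure PyTorch, no extra APIs).
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")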

Preparing Mixed Datasets for Fine-Tuning

Two datasets are loaded: a reasoning dataset (OpenMathReasoning-mini) containing chain-of-thought problems, and a non-reasoning instruction-following dataset (FineTome-100k). These datasets are transformed into chat-style conversations suitable for fine-tuning.

from datasets import load_dataset
 
reasoning_dataset = load_dataset("unsloth/OpenMathReasoning-mini", split="cot")
non_reasoning_dataset = load_dataset("mlabonne/FineTome-100k", split="train")
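
Before converting anything, it is worth peeking at the raw reasoning data; the conversion step below relies on its "problem" and "generated_solution" columns:

# Inspect the raw columns the conversion function expects.
print(reasoning_dataset.column_names)
print(reasoning_dataset[0]["problem"][:200])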

Formatting Data into Conversational Structure

A helper function converts problem-solution pairs into user-assistant message lists; it is applied to the reasoning dataset with map, and the tokenizer's chat template then renders each conversation as a single training string.

def generate_conversation(examples):
    # Turn each problem/solution pair into a two-turn chat.
    problems  = examples["problem"]
    solutions = examples["generated_solution"]
    conversations = []
    for problem, solution in zip(problems, solutions):
        conversations.append([
            {"role": "user", "content": problem},
            {"role": "assistant", "content": solution},
        ])
    return {"conversations": conversations}

# The raw dataset has no "conversations" column until the function is mapped.
reasoning_conversations = tokenizer.apply_chat_template(
    reasoning_dataset.map(generate_conversation, batched=True)["conversations"],
    tokenize=False,
)
 
from unsloth.chat_templates import standardize_sharegpt
# FineTome-100k ships in ShareGPT format; normalize it to role/content messages.
dataset = standardize_sharegpt(non_reasoning_dataset)
 
non_reasoning_conversations = tokenizer.apply_chat_template(
    dataset["conversations"],
    tokenize=False,
)
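
Both corpora are now lists of plain strings, each a fully templated conversation. Printing the first entry is a quick way to verify the template was applied as expected:

# Peek at one templated conversation to verify the chat formatting.
print(reasoning_conversations[0][:500])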

Combining and Sampling Datasets

The reasoning and instruction datasets are mixed so that roughly 75% of the combined data is reasoning and 25% is general chat: the non-reasoning set is subsampled to one third of the reasoning set's size before concatenation. The combined dataset is then shuffled for training.

import pandas as pd
 
chat_percentage = 0.25  # target share of non-reasoning (chat) data in the final mix
non_reasoning_subset = pd.Series(non_reasoning_conversations).sample(
    int(len(reasoning_conversations) * (chat_percentage / (1.0 - chat_percentage))),
    random_state=2407,
)
 
data = pd.concat([
    pd.Series(reasoning_conversations),
    pd.Series(non_reasoning_subset)
])
data.name = "text"
 
from datasets import Dataset
combined_dataset = Dataset.from_pandas(pd.DataFrame(data))
combined_dataset = combined_dataset.shuffle(seed=3407)
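
A quick check confirms the resulting proportions; exact counts depend on the dataset sizes at download time:

# Verify the mix: non-reasoning chat should be about 25% of the total.
n_reasoning = len(reasoning_conversations)
n_chat = len(non_reasoning_subset)
print(f"reasoning: {n_reasoning}, chat: {n_chat}, chat share: {n_chat / (n_reasoning + n_chat):.2%}")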

Configuring and Running the Fine-Tuning Trainer

Using trl's SFTTrainer and SFTConfig, training is configured with a small per-device batch size, gradient accumulation, a linear learning-rate schedule, and an 8-bit AdamW optimizer; the 30-step run is deliberately short so it finishes quickly on Colab.

from trl import SFTTrainer, SFTConfig
 
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=combined_dataset,
    eval_dataset=None,
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,  # effective batch size of 8
        warmup_steps=5,
        max_steps=30,                   # short demo run; raise for real training
        learning_rate=2e-4,
        logging_steps=1,
        optim="adamw_8bit",             # 8-bit optimizer states save memory
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        report_to="none",
    )
)
 
trainer.train()
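
After training, checking peak GPU memory shows the combined effect of 4-bit weights, LoRA, and gradient checkpointing. This uses only standard PyTorch CUDA statistics:

# Report peak reserved GPU memory for the run (standard PyTorch API).
gpu = torch.cuda.get_device_properties(0)
peak_gb = torch.cuda.max_memory_reserved() / 1024**3
print(f"{gpu.name}: peak reserved {peak_gb:.2f} GB of {gpu.total_memory / 1024**3:.2f} GB")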

Saving the Fine-Tuned Model

Because training used LoRA, save_pretrained writes only the adapter weights and tokenizer files, a small folder rather than a full 14B-parameter checkpoint. The adapter can later be reloaded on top of the base model or merged for deployment.

model.save_pretrained("qwen3-finetuned-colab")
tokenizer.save_pretrained("qwen3-finetuned-colab")
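
To use the result later, the saved directory can be reloaded much like the base model. The snippet below is a minimal sketch that assumes Unsloth resolves a local LoRA folder back onto its base model, as its published examples do; the prompt is illustrative only:

from unsloth import FastLanguageModel

# Sketch: reload the saved adapter folder for inference (assumes Unsloth
# restores the base model plus adapter from the local directory).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "qwen3-finetuned-colab",
    max_seq_length = 2048,
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference path

messages = [{"role": "user", "content": "What is 12 * 17?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
).to(model.device)
outputs = model.generate(input_ids = inputs, max_new_tokens = 256)
print(tokenizer.decode(outputs[0], skip_special_tokens = True))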

Unsloth AI's approach makes fine-tuning massive models like Qwen3-14B feasible on consumer hardware, combining efficient quantization, LoRA adaptation, and practical dataset mixing. This tutorial demonstrates a full pipeline from installation to training, empowering developers to create custom domain-specific models and assistants with limited resources.
