How to Fine-Tune LLMs with Kubeflow

Overview of the LLM fine-tuning API in the Training Operator

This page describes how to use the train API from the Training Python SDK, which simplifies fine-tuning LLMs with distributed PyTorchJob workers.

If you want to learn more about how the fine-tuning API fits in the Kubeflow ecosystem, head to the explanation guide.

Prerequisites

You need to install the Training Python SDK with fine-tuning support to use this API.
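
For example, you can install the SDK together with its HuggingFace dependencies from PyPI. The package and extra names below reflect recent SDK releases; check the SDK documentation for the exact name used by your version:

pip install -U "kubeflow-training[huggingface]"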

How to use the Fine-Tuning API?

You need to provide the following parameters to use the train API:

  • Pre-trained model parameters.
  • Dataset parameters.
  • Trainer parameters.
  • Number of PyTorch workers and resources per worker.

For example, you can use the train API to fine-tune the BERT model using the Yelp Review dataset from HuggingFace Hub with the code below:

import transformers
from peft import LoraConfig

from kubeflow.training import TrainingClient
from kubeflow.storage_initializer.hugging_face import (
    HuggingFaceModelParams,
    HuggingFaceTrainerParams,
    HuggingFaceDatasetParams,
)

TrainingClient().train(
    name="fine-tune-bert",
    # BERT model URI and type of Transformer to train it.
    model_provider_parameters=HuggingFaceModelParams(
        model_uri="hf://google-bert/bert-base-cased",
        transformer_type=transformers.AutoModelForSequenceClassification,
    ),
    # Use 3000 samples from Yelp dataset.
    dataset_provider_parameters=HuggingFaceDatasetParams(
        repo_id="yelp_review_full",
        split="train[:3000]",
    ),
    # Specify HuggingFace Trainer parameters. In this example, we will skip evaluation and model checkpoints.
    trainer_parameters=HuggingFaceTrainerParams(
        training_parameters=transformers.TrainingArguments(
            output_dir="test_trainer",
            save_strategy="no",
            evaluation_strategy="no",
            do_eval=False,
            disable_tqdm=True,
            log_level="info",
        ),
        # Set LoRA config to reduce number of trainable model parameters.
        lora_config=LoraConfig(
            r=8,
            lora_alpha=8,
            lora_dropout=0.1,
            bias="none",
        ),
    ),
    num_workers=4, # nnodes parameter for torchrun command.
    num_procs_per_worker=2, # nproc-per-node parameter for torchrun command.
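    # Resources requested by each PyTorch worker Pod.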
    resources_per_worker={
        "gpu": 2,
        "cpu": 5,
        "memory": "10G",
    },
)

After you execute train, the Training Operator will orchestrate the appropriate PyTorchJob resources to fine-tune the LLM.
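
Once the job is running, you can monitor and manage it with the same TrainingClient. The sketch below assumes the SDK's log and status helpers (get_job_logs, is_job_succeeded, and delete_job); method names and signatures may vary slightly between SDK releases:

from kubeflow.training import TrainingClient

client = TrainingClient()

# Stream training logs from the master worker of the PyTorchJob.
client.get_job_logs(name="fine-tune-bert", follow=True)

# Check whether the PyTorchJob completed successfully.
if client.is_job_succeeded(name="fine-tune-bert"):
    # Delete the PyTorchJob once fine-tuning is done.
    client.delete_job(name="fine-tune-bert")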
