User Guides

User guides for the Training Operator

How to Fine-Tune LLMs with Kubeflow

Overview of the LLM fine-tuning API in the Training Operator

TensorFlow Training (TFJob)

Using TFJob to train a model with TensorFlow

PyTorch Training (PyTorchJob)

Using PyTorchJob to train a model with PyTorch

PaddlePaddle Training (PaddleJob)

Using PaddleJob to train a model with PaddlePaddle

XGBoost Training (XGBoostJob)

Using XGBoostJob to train a model with XGBoost

JAX Training (JAXJob)

Using JAXJob to train a model with JAX

Job Scheduling

How to schedule a job with gang scheduling

MPI Training (MPIJob)

Instructions for using MPI for training

Prometheus Monitoring

Prometheus Metrics for the Training Operator
