Katib Architecture

How does Katib work?

This page describes Katib concepts and architecture.

Hyperparameter Tuning

Hyperparameters are the variables that control the model training process. They include:

  • The learning rate.
  • The number of layers in a neural network.
  • The number of training epochs.

Hyperparameter values are not learned: unlike the model weights and biases, they are not adjusted by the model training process.

Hyperparameter tuning is the process of optimizing the hyperparameter values to maximize the predictive accuracy of the model. If you don’t use Katib or a similar system for hyperparameter tuning, you need to run many training jobs yourself, manually adjusting the hyperparameters to find the optimal values.
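To make the manual workflow concrete, the following is a minimal sketch of tuning by hand with a grid of values. The `train_and_evaluate` function is a hypothetical stand-in for a real training job: its formula is invented for illustration and simply peaks at a learning rate of 0.01 and 3 layers.

```python
import itertools
import math


def train_and_evaluate(learning_rate: float, num_layers: int) -> float:
    """Hypothetical stand-in for a training job.

    Returns a toy "validation accuracy". The formula is made up for this
    example and is highest at learning_rate=0.01 and num_layers=3.
    """
    lr_penalty = (math.log10(learning_rate) + 2) ** 2   # zero when lr is 0.01
    layer_penalty = 0.05 * (num_layers - 3) ** 2         # zero when there are 3 layers
    return max(0.0, 0.95 - 0.1 * lr_penalty - layer_penalty)


def manual_grid_search(learning_rates, layer_counts):
    """Run one "training job" per combination and keep the best result."""
    best_params, best_accuracy = None, float("-inf")
    for lr, layers in itertools.product(learning_rates, layer_counts):
        accuracy = train_and_evaluate(lr, layers)
        if accuracy > best_accuracy:
            best_params = {"learning_rate": lr, "num_layers": layers}
            best_accuracy = accuracy
    return best_params, best_accuracy


best_params, best_accuracy = manual_grid_search(
    learning_rates=[0.001, 0.01, 0.1],
    layer_counts=[1, 2, 3, 4],
)
print(best_params, round(best_accuracy, 3))
```

Even this small grid needs 12 training runs, and the count grows multiplicatively with every hyperparameter added. That bookkeeping is what a tuning system automates.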

You can improve your hyperparameter tuning Experiments by using early stopping techniques. See the early stopping guide for details.

Katib Architecture for Hyperparameter Tuning

This diagram shows how Katib performs hyperparameter tuning:

[Figure: Katib Overview]

First, write the ML training code that Katib evaluates in every Trial with different hyperparameters. Then use the Katib Python SDK to set the objective, search space, search algorithm, and Trial resources, and to create the Katib Experiment.
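As a rough sketch of that workflow, the example below defines a toy training function and wraps the SDK calls in a helper. It assumes the `kubeflow-katib` package is installed and that you have access to a cluster running Katib; the Experiment name, namespace, parameter ranges, and CPU request are illustrative values, and the exact SDK arguments may differ between Katib versions, so check them against the SDK reference for your release.

```python
def objective(parameters):
    """Toy training function that Katib would run once per Trial.

    `parameters` holds the hyperparameter values assigned to the Trial.
    """
    result = 4 * int(parameters["a"]) - float(parameters["b"]) ** 2
    # Assumption: the default metrics collector reads `name=value` lines from stdout.
    print(f"result={result}")
    return result  # returned only so the function can be checked locally


def submit_experiment(name="tune-experiment", namespace="kubeflow"):
    """Create a Katib Experiment. Requires `kubeflow-katib` and cluster access."""
    import kubeflow.katib as katib  # imported lazily so this file loads without the SDK

    # Search space: one integer and one floating-point hyperparameter.
    search_space = {
        "a": katib.search.int(min=10, max=20),
        "b": katib.search.double(min=0.1, max=0.2),
    }

    client = katib.KatibClient(namespace=namespace)
    client.tune(
        name=name,
        objective=objective,               # training code evaluated in each Trial
        parameters=search_space,           # search space
        objective_metric_name="result",    # metric to optimize
        algorithm_name="random",           # search algorithm
        max_trial_count=12,
        resources_per_trial={"cpu": "2"},  # Trial resources
    )
    client.wait_for_experiment_condition(name=name)
    return client.get_optimal_hyperparameters(name)
```

Calling `objective({"a": "10", "b": "0.1"})` locally is a quick way to check the training function before submitting it with `submit_experiment()`.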

Katib implements the following Kubernetes Custom Resource Definitions (CRDs) to tune hyperparameters.

Experiment

An Experiment is a single tuning run, also called an optimization run.

You specify configuration settings to define the Experiment. The following are the main configurations:

  • Objective: What you want to optimize. This is the objective metric, also called the target variable. A common metric is the model’s accuracy in the validation pass of the training job (e.g. validation accuracy). You also specify whether you want the hyperparameter tuning job to maximize or minimize the metric.

  • Search space: The set of all possible hyperparameter values that the hyperparameter tuning job should consider for optimization, and the constraints for each hyperparameter. Other names for search space include feasible set and solution space. For example, you may provide the names of the hyperparameters that you want to optimize. For each hyperparameter, you may provide a minimum and maximum value or a list of allowable values.

  • Search algorithm: The algorithm to use when searching for the optimal hyperparameter values. For example, Bayesian Optimization or Random Search.
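The three settings above map onto fields of the Experiment resource. The sketch below builds such a resource as a Python dictionary and prints it as JSON. The field names follow the `kubeflow.org/v1beta1` Experiment API as commonly documented, but treat them as assumptions to verify against the Experiment guide; the metric name, parameter names, and value ranges are hypothetical, and the Trial template that a real Experiment also needs is omitted.

```python
import json

experiment = {
    "apiVersion": "kubeflow.org/v1beta1",
    "kind": "Experiment",
    "metadata": {"name": "example-experiment", "namespace": "kubeflow"},
    "spec": {
        # Objective: which metric to optimize, and in which direction.
        "objective": {
            "type": "maximize",
            "goal": 0.99,
            "objectiveMetricName": "validation-accuracy",
        },
        # Search algorithm.
        "algorithm": {"algorithmName": "random"},
        "maxTrialCount": 12,
        "parallelTrialCount": 3,
        # Search space: a min/max range or a list of allowable values.
        "parameters": [
            {
                "name": "lr",
                "parameterType": "double",
                "feasibleSpace": {"min": "0.01", "max": "0.05"},
            },
            {
                "name": "optimizer",
                "parameterType": "categorical",
                "feasibleSpace": {"list": ["sgd", "adam"]},
            },
        ],
        # A real Experiment also needs a trialTemplate; it is left out of this sketch.
    },
}

print(json.dumps(experiment, indent=2))
```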

For details on how to define your Experiment, follow this guide.

Suggestion

A Suggestion is a set of hyperparameter values that the hyperparameter tuning process has proposed. Katib creates a Trial to evaluate the suggested set of values.

Trial

A Trial is one iteration of the hyperparameter tuning process. A Trial corresponds to one worker job instance with a list of parameter assignments. The list of parameter assignments corresponds to a Suggestion.

Each Experiment runs several Trials. The Experiment runs the Trials until it reaches either the objective or the configured maximum number of Trials.
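The relationship between Experiment, Suggestion, and Trial can be sketched as a simple loop. This is a simplified simulation, not Katib's controller logic: Trials run one at a time here, the Suggestion comes from plain random sampling, and `run_worker` is a hypothetical stand-in that returns a made-up metric.

```python
import random


def suggest(search_space, rng):
    """Suggestion: propose one value per hyperparameter (random search)."""
    return {name: rng.uniform(low, high) for name, (low, high) in search_space.items()}


def run_worker(assignments):
    """Hypothetical Worker: returns a toy objective value for one Trial."""
    return 1.0 - 100 * (assignments["lr"] - 0.03) ** 2


def run_experiment(search_space, goal, max_trial_count, seed=0):
    """Run Trials until the goal is reached or max_trial_count is exhausted."""
    rng = random.Random(seed)
    trials = []
    for index in range(max_trial_count):
        assignments = suggest(search_space, rng)  # Suggestion
        metric = run_worker(assignments)          # Trial evaluated by a Worker
        trials.append({"name": f"trial-{index}", "assignments": assignments, "metric": metric})
        if metric >= goal:                        # objective reached: stop early
            break
    best = max(trials, key=lambda trial: trial["metric"])
    return trials, best


trials, best = run_experiment({"lr": (0.01, 0.05)}, goal=0.999, max_trial_count=12)
print(f"ran {len(trials)} trial(s); best: {best['name']} metric={best['metric']:.4f}")
```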

Worker

The Worker is the process that runs to evaluate a Trial and calculate its objective value.

The Worker can be any type of Kubernetes resource or Kubernetes CRD. Follow the Trial template guide to learn how to support your own Kubernetes resource in Katib.

Neural Architecture Search

In addition to hyperparameter tuning, Katib offers a neural architecture search (NAS) feature. You can use NAS to design your artificial neural network, with the goal of maximizing the predictive accuracy and performance of your model.

NAS is closely related to hyperparameter tuning. Both are subsets of AutoML. While hyperparameter tuning optimizes the model’s hyperparameters, a NAS system optimizes the model’s structure, node weights and hyperparameters.

NAS systems use a range of search techniques, such as reinforcement learning and gradient-based optimization, to find the optimal neural network design.
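One way to picture the difference is through the search space. The sketch below samples a candidate that includes both a network structure and ordinary hyperparameters. The operation names and ranges are illustrative assumptions, and random sampling stands in for a real NAS algorithm.

```python
import random

# Illustrative operation names; a real NAS search space defines its own.
OPERATIONS = ["conv3x3", "conv5x5", "max_pool", "skip"]


def sample_architecture(rng, max_layers=4):
    """Sample a structure: the number of layers and the operation in each layer."""
    num_layers = rng.randint(1, max_layers)
    return [rng.choice(OPERATIONS) for _ in range(num_layers)]


def sample_hyperparameters(rng):
    """Sample ordinary hyperparameters, as hyperparameter tuning does."""
    return {"learning_rate": rng.uniform(0.001, 0.1)}


rng = random.Random(0)
candidate = {
    "architecture": sample_architecture(rng),        # searched by NAS only
    "hyperparameters": sample_hyperparameters(rng),  # searched by both
}
print(candidate)
```

Hyperparameter tuning keeps the model structure fixed and varies only the second part; NAS varies the first part as well.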

Learn more about the NAS algorithms in the Differentiable Architecture Search (DARTS) and Efficient Neural Architecture Search (ENAS) guides.

Katib Control Plane Components

Katib has the following components on the control plane to run Experiments:

  • katib-controller - the controller that manages the Katib Kubernetes CRDs: Experiment, Suggestion, and Trial.

    • (Optional) If the certificate generator is enabled in the Katib Config, the Katib controller deployment creates a self-signed certificate for the Katib webhooks. Learn more about the cert generator in the developer guide.

  • katib-ui - the Katib user interface.

  • katib-db-manager - the gRPC API server that controls the Katib DB interface.

  • katib-mysql - the MySQL DB backend that stores Katib Experiment metrics.
