Google Summer of Code 2024

Kubeflow Community is excited to announce that we have been selected as organization to participate in Google Summer of Code 2024. This page aims to help you participate in the Kubeflow organization for GSoC 2024.

How can I participate in Kubeflow GSoC?

Please go to Google Summer of Code 2024 and sign up as a student. Next, look at the projects below to decide which ones you are interested in. Note, that you must submit your proposals through the GSoC website and your proposal must be selected to participate.

Contributor applications open March 18th, 2024 and close on April 2nd, 2024. For more information, see the GSoC website and/or reach out to the GSoC organizers. Please only contact mentors about projects, not the program itself.

Slack

Please Join the Kubeflow Slack.

Please do not reach out privately to mentors, instead, start a thread in the #gsoc-participants channel so others can see the response. Be kind to our mentors, please search to see if your question has already been answered.

Meetings

You may wish to attend the next community meeting for the group that is leading your chosen project, please see the calendar below for more information.

What if my proposal is not chosen?

Please understand that not everyone can be selected for Google Summer of Code (GSoC), there are many possible candidates for each project.

However, we still want to encourage you to participate in the Kubeflow project! Get started by attending working group meetings for components you want to help with, and reading our contributing guide.

Project Ideas for 2024 GSoC

Project 1: Kubeflow Notebooks 2.0

Kubeflow Notebook is a widely used component of Kubeflow that allows Data Scientists and ML Engineers to run web-based IDEs (JupyterLab, VSCode, RStudio) on Kubernetes clusters.

There is currently an effort to create the next major version of Kubeflow Notebooks.

The main idea is to change the Kubeflow Notebook CRD so that it is no longer just a wrapper around a Kubernetes PodSpec.

This foundational change enables users to:

Update existing notebooks after spawning, to change their “pod config” (CPU/GPU/RAM), “volumes” (storage), and “image” (what packages are installed) from options that are defined by their admin.
Make spawning notebooks less confusing for end-users. Pod configs stop being about specific parts of the PodSpec (e.g. tolerations, requests, limits), and become a drop-down list of user-friendly names (e.g. “Big GPU Notebook - A100 - 128GB”), similar to cloud “instance types”.
Give admins more control over how workspaces are spawned, and the lifecycle of the “options” which are available to users. For example, admins can now “redirect” existing image/pod configs to new ones, but delay the application of these updates until the next pod restart (during which, the interface will display a warning to users that a change is pending).
Support new web-based IDEs without needing to specifically integrate with them. Cluster admins can define a custom “kind” for their internal app, or even make “flavors” of existing apps (like Jupyter and VSCode) with the packages and pod-sizes required for specific teams in their organization.

You would be part of the larger effort, and involved in one or more code deliverables:

See Kubeflow Notebooks docs: https://www.kubeflow.org/docs/components/notebooks/overview/
See Kubeflow Notebooks 2.0 GitHub proposal: https://github.com/kubeflow/kubeflow/issues/7156
See Kubeflow Notebooks 2.0 design document: https://docs.google.com/document/d/1_zk06zebbaTBdJ8TdU07Ibky25hqHGARXjVcsp2qEnU/edit

Skills required: Kubernetes Controllers (Golang - Kubebuilder) AND/OR Web Development (JS - Angular, Python - Flask)

Difficulty: medium/high

Length: 350 hrs

Mentors: Mathew Wicks, Kimonas Sitorchos, Julius von Kohout

Component: Notebooks

Project 2: Rootless Kubeflow Container Images (Istio Ambient Mesh)

Kubeflow uses Istio as a service mesh, which by default requires “root level” network permissions for its init-containers. We want to reduce the number of privileged containers required to run Kubeflow, so are investigating using the Istio CNI, and eventually the Istio Ambient mesh.

You would be involved in testing and investigating the impacts of these changes, and helping push the integration forwards.

See the proposal for more information: https://github.com/kubeflow/manifests/blob/master/proposals/20200913-rootlessKubeflow.md

Skills required: Istio, Kubernetes, YAML

Difficulty: medium

Length: 175 hrs

Mentors: Kimonas Sitorchos, Julius von Kohout

Component: Notebooks

Project 3: Triage and Categorize Kubeflow GitHub Issues & PRs

The Kubeflow project needs help to triage, categorize, and highlight important Issues/PRs from the https://github.com/kubeflow/kubeflow GitHub repo. There are around 200 open Issues and 200 open PRs, in addition to many Issues/PRs that have been lost to time (closed automatically due to inactivity).

Specifically, your goal would be to:

Decide which Issues/PRs are still relevant
Categorize Issues/PRs by type
De-duplicate multiple Issues for the same request
Suggest which ones are the most important.
Help find “good first issues” for new members:
- https://github.com/kubeflow/manifests/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22
Review which PRs are likely safe to merge (especially dependabot ones)

Skills required: GitHub, Kubernetes, YAML, Python, GO, JS

Difficulty: medium

Length: 175 hrs

Mentors: Mathew Wicks, Kimonas Sitorchos, Julius von Kohout

Component: Notebooks/General

Project 4: Implement LLM Tuning API for Katib

Recently, we implemented a new train Python SDK API in Kubeflow Training Operator to easily fine-tune LLMs on multiple GPUs with predefined datasets provider, model provider, and HuggingFace trainer.

To continue our roadmap around LLMOps in Kubeflow, we want to give user functionality to tune HyperParameters of LLMs using simple Python SDK APIs. It requires making appropriate changes to the Katib Python SDK which allows users to set model, dataset, and HyperParameters that they want to optimize for LLM.

Skills required: Kubernetes, YAML, Python

Difficulty: medium

Length: 350 hrs

Mentors: Andrey Velichkevich, Johnu George, Yuan (Terry) Tang, Yuki Iwai

Component: Katib

Project 5: Support Distributed Jax for Training Operator

Open issue: https://github.com/kubeflow/training-operator/issues/1619

We want to integrate Jax in Training Operator to run distributed training and fine-tuning jobs on Kubernetes using the Jax ML framework. We need to create a new Kubernetes Custom Resource for Jax (e.g. JaxJob) and update the Training Operator controller to support it. Potentially, we can integrate Jax with the Training Operator Python SDK to give Data Scientists simple APIs to create JaxJob on Kubernetes.

Skills required: Kubernetes, Go, YAML, Python

Difficulty: medium

Length: 350 hrs

Mentors: Andrey Velichkevich, Johnu George, Yuan (Terry) Tang, Yuki Iwai

Component: Training Operator

Project 6: Push-based metrics collection for Katib

Open issue: https://github.com/kubeflow/katib/issues/577.

Katib implements Metrics Collector as a sidecar container to collect training metrics from the Trials once training is complete. This Metrics Collector waits until the training container is complete and parses training logs to get appropriate metrics like accuracy or loss to get evaluation results for the HyperParameter tuning algorithm.

Sometimes the container sidecar approach might not work for users. For example, if their Trial resources executor doesn’t support sidecar containers. For such use-cases, we want to implement a new API to the Katib Python SDK to allow users to push metrics directly from their training scripts to the Katib DB.

Skills required: Kubernetes, Go, YAML, Python

Difficulty: medium

Length: 175 hrs

Mentors: Andrey Velichkevich, Johnu George, Yuan (Terry) Tang, Yuki Iwai

Component: Katib

Project 7: Automate docs generation for Kubeflow Python SDKs

Open issue: https://github.com/kubeflow/katib/issues/2081

Training Operator and Katib SDKs have a valid docstring for each public API that users are running. We want to automatically generate documentation for Kubeflow users from these docstrings, so users don’t need to read source code to understand APIs parameters.

Skills required: Python

Difficulty: medium

Length: 90 hrs

Mentors: Andrey Velichkevich, Johnu George, Shivay Lamba, Yuan (Terry) Tang, Yuki Iwai

Component: Katib/Training Operator

Project 8: Support various parameter distributions like log-uniform in Katib

Open issue: https://github.com/kubeflow/katib/pull/2059

We need to enhance Katib Experiment APIs to support various parameter distributions like uniform, log-uniform, qlog-uniform to make Katib more native to other HyperParameter tuning frameworks like Hyperopt. Currently, Katib supports only uniform distribution of integer, float, and categorical HyperParameters.

Skills required: Kubernetes, Python, Go, YAML

Difficulty: medium

Length: 350 hrs

Mentors: Andrey Velichkevich, Johnu George, Yuan (Terry) Tang, Yuki Iwai

Component: Katib

Project 9: PostgreSQL integration in Kubeflow Pipelines

Open issue: https://github.com/kubeflow/pipelines/issues/9813

Kubeflow Pipelines must store information about pipelines, experiments, runs, and artifacts in a database. Currently, the only database it supports is MySQL/MariaDB.

We plan to support PostgreSQL as an alternative to MySQL/MariaDB so users will be able to reuse existing databases, and PostgreSQL will be a good use case for supporting multiple databases.

Skills required: Kubernetes, Python, Go, YAML

Difficulty: medium

Length: 175 hrs

Mentors: Ricardo Martinelli, Shivay Lamba

Component: Pipelines

Project 10: Enhancing KF Model Registry Python client for seamless ML imports from alternative registries

We aim to extend the capabilities of the KF Model Registry Python client by enabling smooth imports from various machine learning registries. While import from HuggingFace is already implemented (and can be used as a basis) we seek to integrate support for MLFlow, and other popular registry formats.

Skills required: Python, ML model serialization formats, YAML, Kubernetes/Kubeflow as a plus

Difficulty: medium

Length: 175 hrs

Mentors: Matteo Mortari, Andrea Lamparelli

Component: Model Registry

Feedback

Was this page helpful?

Thank you for your feedback!

We're sorry this page wasn't helpful. If you have a moment, please share your feedback so we can improve.

Last modified January 21, 2025: Restructure events section, and add GSoC 2025 page (#3962) (2bb1bc07)