Google Summer of Code 2025

The Kubeflow Community plans to participate in Google Summer of Code 2025. This page aims to help you participate in GSoC 2025 with Kubeflow.

What is GSoC?

Google Summer of Code (GSoC) is a global program that offers students stipends for working on open-source projects during the summer.

For more information, see the GSoC FAQ.

How can I participate?

Thank you for your interest in participating in GSoC with Kubeflow!

Please carefully read the following information to learn how to participate in GSoC with Kubeflow.

Key Dates

Here are the key dates for GSoC 2025; the full timeline is available on the GSoC website:

Event                          Date
Applications Open              March 24 @ 18:00 UTC
Applications Deadline          April 8 @ 18:00 UTC
Accepted Proposals Announced   May 8
Community Bonding              May 8 - June 1
Coding Begins                  June 2
Midterm Evaluations            July 14 - 18
Coding Ends                    September 1
Final Evaluations              September 1 - 8

Eligibility

To participate in GSoC with Kubeflow, you must meet the GSoC eligibility requirements:

  • Be at least 18 years old at the time of registration.
  • Be a student or an open source beginner.
  • Be eligible to work in your country of residence for the duration of the program.
  • Be a resident of a country not currently embargoed by the United States.

Steps

  1. Sign up as a student on the GSoC website.
  2. Join the Kubeflow Slack:
    • NOTE: please do not reach out privately to mentors; instead, start a thread in the #kubeflow-contributors channel so others can see the response.
  3. Learn about Kubeflow:
  4. Review the project ideas to decide which ones you are interested in:
    • You may wish to attend the next community meeting for the group that is leading your chosen project.
    • NOTE: while we recommend you submit a proposal based on the project ideas, you can also submit a proposal with your own idea.
  5. Submit a proposal through the GSoC website between March 24th and April 8th.
  6. Wait for the results to be announced on May 8th.

Project Ideas for 2025 GSoC

Project 1

Project Title: Multi-Tenancy/Security, Kubeflow Repository Migration, and CI/CD Enhancement

Component: Kubeflow

Detailed Description:

  • Productionize the SeaweedFS PoC (https://github.com/kubeflow/manifests/tree/master/contrib/seaweedfs) as a secure MinIO replacement, since we are stuck with a five-year-old MinIO release due to its AGPL license.
  • Add multi-tenancy to SeaweedFS, similar to the approach in feat(backend): isolate artifacts per namespace/profile/user using only one bucket pipelines#7725.
  • Help migrate kubeflow/kubeflow components to kubeflow/dashboard and kubeflow/notebooks. This might even start before GSoC.
  • Renovate kubeflow/dashboard dependencies, addressing CVEs.
  • Implement architectural changes as needed.
  • Adapt and improve scripts and CI/CD in kubeflow/manifests, including matrix jobs to test multiple Kubernetes versions simultaneously (see the sketch below).
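To make the matrix item above concrete, here is a minimal sketch of a GitHub Actions matrix that tests several Kubernetes versions in parallel with kind. The workflow name, action versions, and node-image tags are illustrative assumptions, not the actual kubeflow/manifests CI configuration.

```yaml
# Hypothetical workflow sketch; names, versions, and paths are illustrative.
name: test-manifests
on: [pull_request]
jobs:
  kustomize-apply:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        # One kind cluster per Kubernetes version, all running in parallel.
        kubernetes-version: ["v1.29.2", "v1.30.0", "v1.31.0"]
    steps:
      - uses: actions/checkout@v4
      - name: Create a kind cluster for this Kubernetes version
        uses: helm/kind-action@v1
        with:
          node_image: kindest/node:${{ matrix.kubernetes-version }}
      - name: Build and apply the example manifests
        # Assumes kustomize is available on the runner.
        run: kustomize build example | kubectl apply --server-side -f -
```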

Expected Outcomes:

  • Successful migration of Kubeflow components.
  • Hard multi-tenancy for the object storage.
  • Updated dependencies with resolved CVEs.
  • Enhanced CI/CD pipeline supporting multiple Kubernetes versions.

Skills Required/Preferred:

  • GitHub and GitHub Actions
  • Containerfile and Kubernetes knowledge
  • Experience with Python, Go, and JavaScript frameworks

Mentors:

@juliusvonkohout @thesuperzapper

Difficulty: Medium

Expected Size of the Project: 350 hours


Project 2

Project Title: Maintain the KServe Models Web Application so that Kubeflow has a proper UI for model serving.

Component: Kubeflow

Detailed Description:

  • Clean up and merge the open issues and PRs
  • Implement a better CI/CD pipeline.
  • Potentially migrate the application to kubeflow/kserve-model-ui.
  • Add features for editing, regression testing, and monitoring/metrics support.
  • Synchronize with KServe 0.14+ changes.

Expected Outcomes:

  • Resolved issues and merged PRs.
  • Fewer CVEs.
  • Long-term maintainability.
  • Improved CI/CD pipeline.
  • New editing and regression testing features.
  • Enhanced monitoring and metrics support.
  • Application synchronized with KServe 0.14+ changes.

Skills Required/Preferred:

  • GitHub Actions
  • Containerfiles
  • JavaScript frameworks

Possible Mentors:

Julius von Kohout, Valentina Rodriguez Sosa, Griffin Sullivan

Difficulty: Medium

Expected Size of the Project: 175+ hours


Project 3

Project Title: Make our service mesh rootless by default and provide secure model inference by default.

Component: Kubeflow

Detailed Description:

Secure our service mesh with istio-cni by default (kubeflow/manifests#2907) and provide an option for the Istio ambient mesh (kubeflow/manifests#2676) with zero-overhead namespaces (a single waypoint proxy deployment). Furthermore, we need to enable a secure KServe by default (kubeflow/manifests#2811). A general plan is also available at kubeflow/manifests#2528.

For example, one subtask for the ambient mesh would be:

  • Controllers to create HTTPRoute and AuthorizationPolicy resources that align with waypoint proxies
  • Manifests that also provide a flavour of HTTPRoute and updated AuthorizationPolicies (see the sketch below)
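As a rough illustration of this subtask, the sketch below shows how a Service-attached HTTPRoute and a waypoint-scoped AuthorizationPolicy might fit together in a profile namespace under the ambient mesh. The namespace, service name, and principal are placeholders, not the final Kubeflow manifests.

```yaml
# Illustrative sketch only; all names are placeholders.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: example-notebook
  namespace: kubeflow-user-example-com
spec:
  parentRefs:
    # Attaching the route to the Service lets the namespace's waypoint handle it.
    - kind: Service
      group: ""
      name: example-notebook
  rules:
    - backendRefs:
        - name: example-notebook
          port: 80
---
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-from-ingressgateway
  namespace: kubeflow-user-example-com
spec:
  # targetRefs scopes the policy to the waypoint Gateway instead of individual pods.
  targetRefs:
    - kind: Gateway
      group: gateway.networking.k8s.io
      name: waypoint
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              - cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account
```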

Expected Outcomes:

  • A secure CNI/ambient mesh with waypoint proxies only in the istio-system namespace, enabling zero-overhead namespaces, as described above.

Skills Required/Preferred:

  • GitHub and GitHub Actions
  • Kubernetes and networking
  • Istio, Kustomize

Possible Mentors:

@juliusvonkohout @kimwnasptd

Difficulty: small

Expected Size of the Project: 175 hours


Project 4

Title: Deploying Kubeflow with Helm

Component

Kubeflow

Description:

Over the past several KubeCons, the number one issue from users has been the difficulty of deploying Kubeflow and the heavy dependency on various distributions for a simple setup. Currently, distributions use a mix of:

  • Manifests (some written and maintained separately)
  • Operators (managed outside the Kubeflow project repositories)
  • Templating tools (Helm, Kustomize, etc.)

To improve accessibility and enable users to self-manage their own Kubeflow deployments, a community-driven Helm chart is being developed for Kubeflow v1.9, with an eventual upgrade path to v1.10. This project is designed to “cross the finish line” for the Kubeflow Helm Chart, ensuring it is a fully functional deployment method.

As part of this initiative, we aim to:

  • Complete a working Helm Chart for Kubeflow 1.9
  • Ensure KServe and Knative integrations are included
  • Document the chart’s structure and configuration
  • Create a deployment walkthrough tutorial for a Kubernetes distribution of choice

By working on this project, contributors will improve the ability for end users to deploy Kubeflow in their own environments without relying on third-party distributions, making it easier to evaluate, customize, and scale.

Why This Project Matters

For many enterprises and researchers, deploying Kubeflow remains a major challenge. By completing the official Kubeflow Helm Chart, this project will:

  • Reduce dependency on custom distributions
  • Lower the barrier to entry for new users
  • Enable self-managed deployments for evaluation and production
  • Foster stronger adoption of Kubeflow in the open-source community

🚀 Be part of the effort to make Kubeflow easier and more accessible!

Key Objectives:

1. Validate and Deploy a Kubeflow Helm Chart
Define the Chart Structure
  • Organize Helm files (templates/, values.yaml, Chart.yaml, crds/, etc.).
  • Follow Helm best practices for maintainability and clarity.
Define Core Kubeflow Components
  • Model deployment for essential Kubeflow components (e.g., Pipelines, Katib, Notebooks, Istio).
  • Define CRDs, if applicable, under crds/.
  • Write Deployment, Service, Ingress, and ConfigMap templates.
  • Configure Persistence (PVCs, VolumeClaims) for storage-based components.
  • Implement RBAC and authentication for security.
Parameterize Configuration
  • Use values.yaml to enable modular configurations.
  • Implement Helm templating ({{ .Values.foo }}) for flexibility.
  • Provide default configurations and allow for overrides (see the values.yaml sketch after this list).
Handle Dependencies (Optional)
  • Define external dependencies (e.g., Istio, cert-manager) as chart dependencies (Chart.yaml in Helm 3, formerly requirements.yaml).
  • Evaluate subcharts for modular deployments.
2. Document Helm Chart and Configuration Values
  • Write detailed documentation for values.yaml settings.
  • Describe deployment requirements (e.g., Kubernetes version, resource limits).
  • Include troubleshooting guidance for common issues.
3. Provide a Tutorial for Deploying the Helm Chart
  • Select a Kubernetes distribution (e.g., Kind, Minikube, EKS, GKE, AKS).
  • Develop a step-by-step deployment guide with verification steps.
  • Include sample configurations for different use cases (e.g., minimal vs. full deployment).
  • Document upgrade paths (helm upgrade and rollback strategies).
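To illustrate the parameterization item above, here is a minimal sketch of per-component toggles in values.yaml and the corresponding templating; the keys, component names, and file paths are hypothetical rather than a final chart schema.

```yaml
# values.yaml (hypothetical excerpt; keys and defaults are illustrative)
pipelines:
  enabled: true
  persistence:
    storageClass: ""   # leave empty to omit storageClassName and use the cluster default
    size: 20Gi
katib:
  enabled: false       # each component can be toggled independently

# templates/pipelines/pvc.yaml (sketch of the corresponding template)
{{- if .Values.pipelines.enabled }}
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pipelines-storage
spec:
  accessModes: ["ReadWriteOnce"]
  {{- with .Values.pipelines.persistence.storageClass }}
  storageClassName: {{ . }}
  {{- end }}
  resources:
    requests:
      storage: {{ .Values.pipelines.persistence.size }}
{{- end }}
```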

Expected Outcomes:

✅ A fully functional Kubeflow Helm chart for 1.9 that integrates KServe and Knative
✅ Published Helm chart documentation with clear configuration options
✅ A step-by-step tutorial that simplifies Kubeflow deployment for users
✅ Contribution to the Kubeflow community effort for Helm-based installation

Skills Required:

✔️ Experience with Kubernetes & Helm
✔️ Understanding of Kubeflow architecture
✔️ Familiarity with Helm templating (values.yaml, Chart.yaml)
✔️ YAML & Kubernetes manifests expertise

Nice to Have:

➕ Experience with KServe & Knative
➕ Knowledge of RBAC, authentication & security policies
➕ Understanding of Kubernetes networking & ingress controllers

Mentors:

  • Chase Cristensen
  • Valentina Rodriguez Sosa
  • Julius von Kohout

Resources & References:

Estimated Time Commitment:

350 hours


Project 5

Title: Enhancing Kubeflow Data Science Experience with Jupyter-Centric Tools

Component

Kubeflow IDE / Kubeflow Jupyter Extension (IDE Working Group)

Description & Outcomes

This project is dedicated to elevating the data science workflow within Kubeflow by creating a suite of tools and extensions that deliver a seamless, Jupyter-native user experience. The work begins with reviving and modernizing the existing Kale tool – ensuring compatibility with the latest Jupyter and Kubeflow Pipelines (KFP v2) standards – and then, optionally and depending on candidate progress, extends those capabilities by introducing new functionality. These enhancements include interactive pipeline visualizations and additional integrations, all designed to simplify and streamline the development user experience within Kubeflow.

Phase 1: Reviving and Modernizing Kale
  • Audit & Dependency Update:
    • Create a detailed map of the current Kale features and capabilities.
    • Update dependencies to resolve known CVEs and deprecated modules.
    • Ensure full compatibility with the latest versions of Jupyter (e.g., JupyterLab 4.x) and Kubeflow Pipelines (KFP v2).
  • API Refactoring & Integration:
    • Refactor internal APIs to align with KFP v2 and modern Jupyter environments.
    • Develop integration tests and demo workflows that validate the end-to-end execution of pipelines from within a Jupyter notebook.
  • Migration
    • Migrate the updated Kale codebase into a new Jupyter-centric tools repository under the Kubeflow GitHub organization.
Phase 2: (optional) Extending Capabilities with Jupyter-Centric Enhancements
  • Interactive Pipeline Visualization Editor:
    • Develop an interactive pipeline visualization editor that mirrors the Kubeflow Pipelines dashboard.
    • Integrate the visualization tool into the Jupyter environment, enabling users to monitor, modify, and interact with their pipelines in real time.
  • Enhanced User Experience:
    • Revamp the user interface to follow Jupyter’s native design, improving usability for data scientists.
  • Unified Access via a Python SDK:
    • Build a lightweight Python SDK that simplifies the configuration and access to multiple Kubeflow components (e.g., Pipelines, Training Operator & Katib, Model Registry, Model Serving, Feast).
    • Ensure that the SDK offers a “one-stop-shop” experience, reducing dependency and configuration overhead for data scientists.
Phase 3: Consolidation and Final Documentation
  • Comprehensive Documentation & Tutorials:
    • Prepare detailed user guides, developer documentation, and tutorials covering the entire enhanced workflow.
    • Prepare and deliver a live demo to the Kubeflow community that illustrates the new data science experience within Kubeflow.
  • Community Engagement & Feedback:
    • Engage with the Kubeflow community for feedback during the development process
    • Iterate on features based on user input and testing to ensure a smooth, production-ready experience.
Additional Notes:
  • The project has high exposure within both the Kubeflow community and the Jupyter community, offering opportunities for broad collaboration and visibility.
  • This project is anticipated to significantly enhance the user-friendliness and overall adoption of Kubeflow. Its outputs are highly anticipated by Kubeflow practitioners.
  • Despite the project encompassing a wide range of deliverables, it will be considered successful even if only a portion of the work is completed. The student will not be working in isolation – other Kubeflow contributors will actively assist and collaborate throughout the project.

Skills Required

  • Proficiency in Python for backend development and API integration.
  • Experience with JavaScript/TypeScript for frontend development.
  • Experience with modern UI frameworks (e.g., React, Jupyter widgets) is a plus.
  • Familiarity with Jupyter Notebook, JupyterLab, and extension development.
  • Experience using Git/GitHub and contributing to community-driven projects.

Mentors

@ederign @StefanoFioravanzo

Difficulty

Medium

GitHub Issues References

Expected Size of the Project

350 Hours


Project 6

Title: Enhancing Kubeflow with Batch Processing Gateway for Scalable and Efficient Multi-Cluster Spark Job Management

Component:

  • Kubeflow
  • Batch Processing Gateway
  • Apache Spark Operator
  • Kubeflow SDK proposal

Description:

Managing Apache Spark jobs in a multi-cluster Kubernetes environment presents several challenges, including inefficient workload distribution, performance bottlenecks, and debugging complexities. Kubeflow users working with large-scale data processing and machine learning workflows require a streamlined, cloud-native experience that optimizes Spark job execution, monitoring, and debugging. This project proposes integrating the Batch Processing Gateway (BPG) with Kubeflow to enable an efficient, scalable, and centralized approach for submitting, monitoring, and managing Spark applications across multiple clusters. By enhancing the API and user experience, this integration will provide seamless job routing, performance tracking, and debugging capabilities while reducing the computational burden on Kubeflow notebooks.

By undertaking this project, contributors will play a crucial role in enhancing the performance, efficiency, and usability of Spark applications within the Kubeflow ecosystem, thereby addressing current challenges in multi-cluster job management.

Why This Project?

By integrating Batch Processing Gateway with Kubeflow Notebooks, this project provides a cloud-native, scalable, and user-friendly solution for Spark job execution and debugging. It will significantly improve performance, efficiency, and developer experience, enabling ML practitioners and data engineers to focus on experimentation and optimization without struggling with job management complexities.

Batch Processing Gateway can also help in enterprise environments, where managing Apache Spark jobs across multiple Kubernetes clusters presents challenges such as inefficient job distribution, performance bottlenecks, and complex workload balancing. Integrating the Batch Processing Gateway (BPG) into Kubeflow aims to address these issues by providing a centralized mechanism for submitting, monitoring, and managing Spark applications across various clusters.

Key Objectives:

  • Analyse, Design, Plan, and Execute Spark Job Execution Strategies:
    • Evaluate the trade-offs between running a Spark kernel directly within a Kubeflow Notebook versus leveraging the Batch Processing Gateway for job submission.
    • Assess the cloud-native design of Kubeflow SDK and Notebook environments to determine the optimal approach for Spark integration that maximizes efficiency, scalability, and usability.
    • Make a well-informed decision on whether to support Spark kernels within notebooks, use BPG, or implement a hybrid approach for an enhanced user experience.
  • Automated Job Routing and Scalable Execution:
    • Implement dynamic workload routing using BPG to automatically distribute Spark jobs based on cluster load, resource availability, and workload priority.
    • Integrate with the Spark Operator to optimize resource allocation, minimize execution delays, and ensure efficient scaling for petabyte-scale machine learning and data processing workloads (a minimal SparkApplication sketch follows this list).
  • Enhanced User API and Notebook Integration:
    • Develop a Python SDK for Kubeflow notebooks, enabling users to submit, manage, and monitor Spark jobs via BPG REST APIs for a lightweight, scalable solution.
    • Ensure a seamless user experience by providing intuitive APIs that abstract complex job management operations, making it easier for data scientists and ML engineers to experiment and iterate on workflows within Kubeflow.
  • Comprehensive Debugging and Performance Monitoring:
    • Enable full debugging capabilities by integrating Spark UI, logging, and monitoring tools into Kubeflow, allowing users to visualize Spark DAGs, tasks, and execution stages.
    • Implement centralized logging and Prometheus-based monitoring to provide real-time insights into Spark job performance across clusters.
    • Ensure users can efficiently analyze job execution, detect bottlenecks, and optimize data processing and ML workflows within Kubeflow.
    • Note: most of the logging APIs should be leveraged out of the box from either BPG or Spark, but we need to document them and showcase examples for users.
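To ground the Spark Operator integration described above, here is a minimal SparkApplication of the kind that BPG would ultimately route to a member cluster's Spark Operator. The namespace, image tag, and resource sizes are illustrative assumptions.

```yaml
# Hypothetical example; namespace, image, and sizes are illustrative.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: spark-jobs
spec:
  type: Scala
  mode: cluster
  image: spark:3.5.1
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.1.jar
  sparkVersion: "3.5.1"
  driver:
    cores: 1
    memory: 1g
    serviceAccount: spark   # assumes a 'spark' ServiceAccount with the usual RBAC
  executor:
    instances: 2
    cores: 1
    memory: 1g
```

In this model, a notebook user would hand an equivalent job definition to BPG's REST API, BPG would pick the target cluster, and the Spark Operator on that cluster would turn the definition into driver and executor pods.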

Mentors expect a GSoC project execution plan, organized by phase, with the weekly milestones the student wants to achieve.

Expected Outcomes:

  • A robust integration of BPG into the Kubeflow ecosystem, enabling efficient Spark job management across multiple clusters.
  • A Python SDK compatible with Kubeflow notebooks to facilitate Spark job submission through BPG, offering a scalable solution that reduces notebook load and enhances performance.
  • Comprehensive documentation and user guides to assist users in leveraging the new features effectively.

Skills Required/Preferred:

Curiosity to learn, design, and develop is key; having the skills below will be a plus:

  • Proficiency in Python, Java and familiarity with developing SDKs.
  • Experience with Kubernetes and managing containerized applications.
  • Understanding of Apache Spark and its deployment on Kubernetes clusters.
  • Familiarity with RESTful API development and integration.
  • Experience with monitoring tools and logging frameworks is a plus.

Mentors:

  • [Shekhar Prasad Rajak], [Software Development Engineer, Apple Inc], [Github: @Shekharrajak, Mail: shekharrajak@live.com]
  • [Chaoran Yu], [Mentor Title/Position], [Github: @yuchaoran2011]
  • [Andrey Velichkevich], [Mentor Title/Position], [Github: @andreyvelich]

Difficulty:

Medium to Hard

GitHub Issue:

Enhancing Kubeflow with Batch Processing Gateway for Efficient Multi-Cluster Spark Job Management spark-operator#2422

Expected Project Size:

350 hours


Project 7

Title: Enable GPU Testing for LLM Blueprints in Kubeflow Trainer

Component: Kubeflow Trainer

Tracking issue: kubeflow/trainer#2432

Skills required:

  • GitHub Actions
  • Kubernetes
  • PyTorch
  • Python

Difficulty: hard

Mentors:

  • Andrey Velichkevich
  • Valentina Rodriguez Sosa

Length: 350 hrs


Project 8

Title: Support JAX and/or TensorFlow Training Runtimes in Kubeflow Trainer

Component: Kubeflow Trainer

Tracking issue: TBD

Skills required:

  • Go
  • Kubernetes
  • JAX
  • TensorFlow

Difficulty: medium

Mentors:

  • Andrey Velichkevich
  • TBD

Length: 350 hrs


Project 9

Title: Support Kubernetes Sidecars for Katib Metrics Collectors

Component: Kubeflow Katib

Description

Open issue: kubeflow/katib#2181

Katib implements a pull-based Metrics Collector as a sidecar container to collect training metrics from the Trials once training is complete. However, the pull-based Metrics Collector has some problems. For example, the Trial will fail if the training container finishes before the Metrics Collector has started. Starting with v1.28, Kubernetes added native support for sidecar containers as part of KEP-753, which allows us to launch sidecar containers before the training container. To address the problems with the pull-based Metrics Collector, we need to support Kubernetes sidecar containers in Katib.
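For context, the sketch below shows the KEP-753 pattern referred to above: the sidecar is declared as an init container with restartPolicy: Always, so it starts before the training container and no longer blocks Pod completion. The container names and images are placeholders, not Katib's actual injected manifests.

```yaml
# Native sidecar sketch (requires Kubernetes v1.28+ with SidecarContainers enabled,
# on by default from v1.29). Names and images are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: trial-worker
spec:
  restartPolicy: Never
  initContainers:
    - name: metrics-collector
      image: ghcr.io/example/metrics-collector:latest   # hypothetical image
      restartPolicy: Always   # this marks the init container as a sidecar
  containers:
    - name: training-container
      image: ghcr.io/example/trainer:latest             # hypothetical image
      command: ["python", "/opt/train.py"]
```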

Expected Outcomes

  • Successfully integrate Katib Metrics Collectors with Kubernetes sidecar containers.
  • Provide unit tests and e2e tests.

Skills Required/Preferred:

  • Kubernetes
  • Go
  • YAML
  • Python

Mentors: @Electronic-Waste, Andrey Velichkevich, TBD

Difficulty: Medium

Expected Size of the Project: 350 hours


Project 10

Title: Export Fine-Tuned LLM to Model Registry

Component: Kubeflow Trainer / Model Registry

Tracking issue: TBD

Description:

Trainer has implemented initializers for the model and dataset, and will support a model exporter in the future. By supporting the model registry as one of the exporter's destinations, Trainer will integrate more deeply with the Kubeflow ecosystem.

Skills required:

  • Kubernetes
  • Go
  • YAML
  • Python

Difficulty: hard

Length: 350 hrs

Mentors:

  • Shao Wang, TBD

Project 11

Title: Add Volcano scheduler support in Trainer

Component: Kubeflow Trainer

Tracking issue: TBD

Description

Currently, Trainer does not support Volcano for scheduling. Since Volcano is a widely adopted scheduler for AI workloads, integrating it into Trainer could provide more AI-specific scheduling capabilities.
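As a rough sketch of what such an integration could build on: Volcano gang-schedules pods through a PodGroup plus the volcano scheduler name. The resource names, queue, and image below are illustrative assumptions, not an actual Trainer design.

```yaml
# Illustrative only; names, queue, and image are placeholders.
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: trainjob-example
  namespace: default
spec:
  minMember: 4      # gang-schedule all four training pods or none of them
  queue: default
---
apiVersion: v1
kind: Pod
metadata:
  name: trainjob-example-node-0
  namespace: default
  annotations:
    scheduling.k8s.io/group-name: trainjob-example   # binds the pod to the PodGroup
spec:
  schedulerName: volcano    # opt the pod in to the Volcano scheduler
  containers:
    - name: trainer
      image: ghcr.io/example/trainer:latest          # hypothetical image
```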

Skills required:

  • Kubernetes
  • Go
  • Volcano

Difficulty: hard

Length: 350 hrs

Mentors:

  • Shao Wang, TBD

Project 12

Title: PostgreSQL Integration in Kubeflow Pipelines

Component: Pipelines

Open issue: https://github.com/kubeflow/pipelines/issues/9813

Kubeflow Pipelines must store information about pipelines, experiments, runs, and artifacts in a database. Currently, the only database it supports is MySQL/MariaDB.

We plan to support PostgreSQL as an alternative to MySQL/MariaDB so that users can reuse existing databases; PostgreSQL will also be a good first case for supporting multiple database backends.
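As a purely hypothetical illustration of the end goal, a deployment might eventually point the API server at an existing PostgreSQL instance with configuration roughly like the sketch below; none of these keys reflect the current KFP schema.

```yaml
# Hypothetical database configuration sketch; keys are illustrative,
# not the actual Kubeflow Pipelines configuration format.
dbConfig:
  driver: postgres                            # instead of the existing mysql driver
  host: postgres.kubeflow.svc.cluster.local
  port: 5432
  dbName: mlpipeline
  existingSecret: kfp-postgres-credentials    # username/password come from a Secret
```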

Skills required:

  • Kubernetes
  • Python
  • Go
  • YAML

Difficulty: medium

Mentors: Ricardo Martinelli, Shivay Lamba

Length: 175 hrs


More project ideas are being discussed here:

https://github.com/kubeflow/community/issues/809
