Google Summer of Code 2025
The Kubeflow Community plans to participate in Google Summer of Code 2025. This page aims to help you participate in GSoC 2025 with Kubeflow.
Note
While Kubeflow participated in GSoC 2024, we are currently awaiting final confirmation of our participation in GSoC 2025. Google will announce the final list of accepted organizations on February 27, 2025.
What is GSoC?
Google Summer of Code (GSoC) is a global program that offers stipends to students and open-source beginners for working on open-source projects during the summer.
For more information, see the GSoC FAQ.
How can I participate?
Thank you for your interest in participating in GSoC with Kubeflow!
Please carefully read the following information to learn how to participate in GSoC with Kubeflow.
Key Dates
Here are the key dates for GSoC 2025; the full timeline is available on the GSoC website:
| Event | Date |
| --- | --- |
| Applications Open | March 24 @ 18:00 UTC |
| Applications Deadline | April 8 @ 18:00 UTC |
| Accepted Proposals Announced | May 8 |
| Community Bonding | May 8 - June 1 |
| Coding Begins | June 2 |
| Midterm Evaluations | July 14 - 18 |
| Coding Ends | September 1 |
| Final Evaluations | September 1 - 8 |
Eligibility
To participate in GSoC with Kubeflow, you must meet the GSoC eligibility requirements:
- Be at least 18 years old at the time of registration.
- Be a student or an open-source beginner.
- Be eligible to work in your country of residence for the duration of the program.
- Be a resident of a country not currently embargoed by the United States.
Steps
- Sign up as a student on the GSoC website.
- Join the Kubeflow Slack:
  - NOTE: please do not reach out privately to mentors; instead, start a thread in the `#kubeflow-contributors` channel so others can see the response.
- Learn about Kubeflow:
- Read the Introduction to Kubeflow
- Review the Architecture Overview
- Consider trying out Kubeflow (not required, can be challenging)
- Review the project ideas to decide which ones you are interested in:
- You may wish to attend the next community meeting for the group that is leading your chosen project.
- NOTE: while we recommend you submit a proposal based on the project ideas, you can also submit a proposal with your own idea.
- Submit a proposal through the GSoC website between March 24th and April 8th.
- Wait for the results to be announced on May 8th.
Project Ideas for 2025 GSoC
Project 1
Project Title: Multi-Tenancy/Security, Kubeflow Repository Migration, and CI/CD Enhancement
Component: Kubeflow
Detailed Description:
- Productionize the SeaweedFS PoC (https://github.com/kubeflow/manifests/tree/master/contrib/seaweedfs) as a secure MinIO replacement, since we are stuck with a five-year-old MinIO version due to its AGPL license.
- Add multi-tenancy to SeaweedFS, similar to the approach in "feat(backend): isolate artifacts per namespace/profile/user using only one bucket" (pipelines#7725).
- Help migrate kubeflow/kubeflow components to kubeflow/dashboard and kubeflow/notebooks. This might even start before GSoC.
- Renovate kubeflow/dashboard dependencies, addressing CVEs.
- Implement architectural changes as needed.
- Adapt and improve scripts and CI/CD in kubeflow/manifests, including matrix calls to test multiple Kubernetes versions simultaneously (see the sketch below).
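To illustrate the CI/CD item, here is a minimal sketch of a Kubernetes version matrix in GitHub Actions; the workflow name, versions, and test entry point are assumptions for illustration, not the actual kubeflow/manifests setup:

```yaml
# Hypothetical workflow fragment: run manifest tests against several
# Kubernetes versions in parallel. Versions and script paths are examples only.
name: test-manifests
on: [pull_request]
jobs:
  e2e:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        kubernetes-version: ["v1.29.2", "v1.30.0", "v1.31.0"]
    steps:
      - uses: actions/checkout@v4
      - name: Create KinD cluster for this matrix entry
        uses: helm/kind-action@v1
        with:
          node_image: kindest/node:${{ matrix.kubernetes-version }}
      - name: Run tests
        run: ./tests/e2e.sh   # illustrative entry point
```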
Expected Outcomes:
- Successful migration of Kubeflow components.
- Hard multi-tenancy for the object storage.
- Updated dependencies with resolved CVEs.
- Enhanced CI/CD pipeline supporting multiple Kubernetes versions.
Skills Required/Preferred:
- GitHub and GitHub Actions
- Containerfile and Kubernetes knowledge
- Experience with Python, Go, and JavaScript frameworks
Mentors:
@juliusvonkohout @thesuperzapper
Difficulty: Medium
Expected Size of the Project: 350 hours
Project 2
Project Title: Maintain the KServe Models Web Application so that we have a proper UI for model serving
Component: Kubeflow
Detailed Description:
- Clean up and merge the open issues and PRs
- Implement a better CI/CD pipeline.
- Potentially migrate the application to /kubeflow/kserve-model-ui.
- Add features for editing, regression testing, and monitoring/metrics support.
- Synchronize with KServe 0.14+ changes.
Expected Outcomes:
- Resolved issues and merged PRs.
- Fewer CVEs.
- Long-term maintainability.
- Improved CI/CD pipeline.
- New editing and regression-testing features.
- Enhanced monitoring and metrics support.
- Application synchronized with KServe 0.14+ changes.
Skills Required/Preferred:
- GitHub Actions
- Containerfiles
- JavaScript frameworks
Possible Mentors:
- Julius von Kohout
- Valentina Rodriguez Sosa
- Griffin Sullivan
Difficulty: Medium
Expected Size of the Project: 175+ hours
Project 3
Project Title: Make our service mesh rootless by default and provide secure model inference by default.
Component: Kubeflow
Detailed Description:
Secure our service mesh with istio-cni by default (kubeflow/manifests#2907) and provide an option for the Istio ambient mesh (kubeflow/manifests#2676) with zero-overhead namespaces (a single waypoint proxy deployment). Furthermore, we need to enable a secure KServe by default (kubeflow/manifests#2811). A general plan is also available in kubeflow/manifests#2528.
For example, one subtask for the ambient mesh would be:
- Controllers to create HTTPRoute and AuthorizationPolicy resources that align with waypoint proxies
- Manifests that also offer a flavour of HTTPRoute and updated AuthorizationPolicies
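As a loose illustration of that subtask, the sketch below shows an HTTPRoute attached to a waypoint Gateway plus an AuthorizationPolicy bound to it. All names and namespaces are assumptions, the cross-namespace attachment would additionally need a ReferenceGrant, and the final design may differ:

```yaml
# Hypothetical sketch: route a profile namespace's traffic through a central
# waypoint and restrict access to traffic coming from the ingress gateway.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: notebook-route
  namespace: kubeflow-user-example-com   # example profile namespace
spec:
  parentRefs:
    - name: waypoint                     # assumed central waypoint Gateway
      namespace: istio-system
      kind: Gateway
      group: gateway.networking.k8s.io
  rules:
    - backendRefs:
        - name: notebook-service         # illustrative backend
          port: 80
---
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-from-ingress
  namespace: kubeflow-user-example-com
spec:
  targetRefs:                            # bind the policy to the waypoint
    - kind: Gateway
      group: gateway.networking.k8s.io
      name: waypoint
  action: ALLOW
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account"]
```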
Expected Outcomes:
- A secure CNI/ambient mesh with waypoint proxies only in the istio-system namespace, enabling zero-overhead namespaces, as outlined in the detailed description.
Skills Required/Preferred:
- GitHub and GitHub Actions
- Kubernetes and networking
- Istio, Kustomize
Possible Mentors:
@juliusvonkohout @kimwnasptd
Difficulty: Small
Expected Size of the Project: 175 hours
Project 4
Title: Deploying Kubeflow with Helm
Component
Kubeflow
Description:
Over the past several KubeCons, the number one issue from users has been the difficulty of deploying Kubeflow and the heavy dependency on various distributions for a simple setup. Currently, distributions use a mix of:
- Manifests (some written and maintained separately)
- Operators (managed outside the Kubeflow project repositories)
- Templating tools (Helm, Kustomize, etc.)
To improve accessibility and enable users to self-manage their own Kubeflow deployments, a community-driven Helm chart is being developed for Kubeflow v1.9, with an eventual upgrade path to v1.10. This project is designed to “cross the finish line” for the Kubeflow Helm Chart, ensuring it is a fully functional deployment method.
As part of this initiative, we aim to:
- Complete a working Helm Chart for Kubeflow 1.9
- Ensure KServe and Knative integrations are included
- Document the chart’s structure and configuration
- Create a deployment walkthrough tutorial for a Kubernetes distribution of choice
By working on this project, contributors will improve the ability for end users to deploy Kubeflow in their own environments without relying on third-party distributions, making it easier to evaluate, customize, and scale.
Why This Project Matters
For many enterprises and researchers, deploying Kubeflow remains a major challenge. By completing the official Kubeflow Helm Chart, this project will:
- Reduce dependency on custom distributions
- Lower the barrier to entry for new users
- Enable self-managed deployments for evaluation and production
- Foster stronger adoption of Kubeflow in the open-source community
🚀 Be part of the effort to make Kubeflow easier and more accessible!
Key Objectives:
1. Validate and Deploy a Kubeflow Helm Chart
Define the Chart Structure
- Organize Helm files (`templates/`, `values.yaml`, `Chart.yaml`, `crds/`, etc.).
- Follow Helm best practices for maintainability and clarity.
Define Core Kubeflow Components
- Model the deployment of essential Kubeflow components (e.g., Pipelines, Katib, Notebooks, Istio).
- Define CRDs, if applicable, under `crds/`.
- Write Deployment, Service, Ingress, and ConfigMap templates.
- Configure persistence (PVCs, VolumeClaims) for storage-based components.
- Implement RBAC and authentication for security.
Parameterize Configuration
- Use `values.yaml` to enable modular configurations.
- Implement Helm templating (`{{ .Values.foo }}`) for flexibility (see the sketch after this list).
- Provide default configurations and allow for overrides.
Handle Dependencies (Optional)
- Define external dependencies (e.g., Istio, cert-manager) as chart dependencies (`requirements.yaml` in Helm 2, the `dependencies` field of `Chart.yaml` in Helm 3).
- Evaluate subcharts for modular deployments.
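To make the templating pattern concrete, here is a minimal sketch assuming a per-component values layout; the value names, helper template, and image are invented for illustration and are not the chart's actual schema:

```yaml
# Hypothetical excerpt of values.yaml: a per-component toggle and image override.
katib:
  enabled: true
  image:
    repository: docker.io/kubeflowkatib/katib-controller  # example value only
    tag: v0.17.0
---
# Hypothetical fragment of templates/katib/deployment.yaml: rendered only when
# the component is enabled, with the image fully driven by values.
{{- if .Values.katib.enabled }}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "kubeflow.fullname" . }}-katib-controller  # assumed helper
spec:
  template:
    spec:
      containers:
        - name: katib-controller
          image: "{{ .Values.katib.image.repository }}:{{ .Values.katib.image.tag }}"
{{- end }}
```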
2. Document Helm Chart and Configuration Values
- Write detailed documentation for `values.yaml` settings.
- Describe deployment requirements (e.g., Kubernetes version, resource limits).
- Include troubleshooting guidance for common issues.
3. Provide a Tutorial for Deploying the Helm Chart
- Select a Kubernetes distribution (e.g., Kind, Minikube, EKS, GKE, AKS).
- Develop a step-by-step deployment guide with verification steps.
- Include sample configurations for different use cases (e.g., minimal vs. full deployment).
- Document upgrade paths (`helm upgrade` and rollback strategies).
Expected Outcomes:
✅ A fully functional Kubeflow Helm chart for 1.9 that integrates KServe and Knative
✅ Published Helm chart documentation with clear configuration options
✅ A step-by-step tutorial that simplifies Kubeflow deployment for users
✅ Contribution to the Kubeflow community effort for Helm-based installation
Skills Required:
✔️ Experience with Kubernetes & Helm
✔️ Understanding of Kubeflow architecture
✔️ Familiarity with Helm templating (`values.yaml`, `Chart.yaml`)
✔️ YAML & Kubernetes manifests expertise
Nice to Have:
➕ Experience with KServe & Knative
➕ Knowledge of RBAC, authentication & security policies
➕ Understanding of Kubernetes networking & ingress controllers
Mentors:
- Chase Christensen
- Valentina Rodriguez Sosa
- Julius von Kohout
Resources & References:
- Kubeflow Helm Chart Effort: GitHub Repo
- Kubeflow Documentation: Kubeflow.org
- Helm Documentation: Helm.sh
Estimated Time Commitment:
350 hours
Project 5
Title: Enhancing Kubeflow Data Science Experience with Jupyter-Centric Tools
Component
Kubeflow IDE / Kubeflow Jupyter Extension (IDE Working Group)
Description & Outcomes
This project is dedicated to elevating the data science workflow within Kubeflow by creating a suite of tools and extensions that deliver a seamless, Jupyter-native user experience. The work begins with reviving and modernizing the existing Kale tool – ensuring compatibility with the latest Jupyter and Kubeflow Pipelines (KFP v2) standards – and then, optionally and depending on candidate progress, extends those capabilities by introducing new functionality. These enhancements include interactive pipeline visualizations and additional integrations, all designed to simplify and streamline the development user experience within Kubeflow.
Phase 1: Reviving and Modernizing Kale
- Audit & Dependency Update:
- Create a detailed map of the current Kale features and capabilities.
- Update dependencies to resolve known CVEs and deprecated modules.
- Ensure full compatibility with the latest versions of Jupyter (e.g., JupyterLab 4.x) and Kubeflow Pipelines (KFP v2).
- API Refactoring & Integration:
- Refactor internal APIs to align with KFP v2 and modern Jupyter environments.
- Develop integration tests and demo workflows that validate the end-to-end execution of pipelines from within a Jupyter notebook.
- Migration
- Migrate the updated Kale codebase into a new Jupyter-centric tools repository under the Kubeflow GitHub organization.
Phase 2: (optional) Extending Capabilities with Jupyter-Centric Enhancements
- Interactive Pipeline Visualization Editor:
- Develop an interactive pipeline visualization editor that mirrors the Kubeflow Pipelines dashboard.
- Integrate the visualization tool into the Jupyter environment, enabling users to monitor, modify, and interact with their pipelines in real time.
- Enhanced User Experience:
- Revamp the user interface to follow Jupyter’s native design, improving usability for data scientists.
- Unified Access via a Python SDK:
- Build a lightweight Python SDK that simplifies the configuration and access to multiple Kubeflow components (e.g., Pipelines, Training Operator & Katib, Model Registry, Model Serving, Feast).
- Ensure that the SDK offers a “one-stop-shop” experience, reducing dependency and configuration overhead for data scientists.
Phase 3: Consolidation and Final Documentation
- Comprehensive Documentation & Tutorials:
- Prepare detailed user guides, developer documentation, and tutorials covering the entire enhanced workflow.
- Prepare and deliver a live demo to the Kubeflow community that illustrates the new data science experience within Kubeflow.
- Community Engagement & Feedback:
- Engage with the Kubeflow community for feedback during the development process
- Iterate on features based on user input and testing to ensure a smooth, production-ready experience.
Additional Notes:
- The project has high exposure within both the Kubeflow community and the Jupyter community, offering opportunities for broad collaboration and visibility.
- This project is anticipated to significantly enhance the user-friendliness and overall adoption of Kubeflow. Its outputs are highly anticipated by Kubeflow practitioners.
- Despite the project encompassing a wide range of deliverables, it will be considered successful even if only a portion of the work is completed. The student will not be working in isolation – other Kubeflow contributors will actively assist and collaborate throughout the project.
Skills Required
- Proficiency in Python for backend development and API integration.
- Experience with JavaScript/TypeScript for frontend development.
- Experience with modern UI frameworks (e.g., React, Jupyter widgets) is a plus.
- Familiarity with Jupyter Notebook, JupyterLab, and extension development.
- Experience using Git/GitHub and contributing to community-driven projects.
Mentors
@ederign @StefanoFioravanzo
Difficulty
Medium
GitHub Issues References
Expected Size of the Project
350 Hours
Project 6
Title: Enhancing Kubeflow with Batch Processing Gateway for Scalable and Efficient Multi-Cluster Spark Job Management
Component:
- Kubeflow
- Batch Processing Gateway
- Apache Spark Operator
- Kubeflow SDK proposal
Description:
Managing Apache Spark jobs in a multi-cluster Kubernetes environment presents several challenges, including inefficient workload distribution, performance bottlenecks, and debugging complexities. Kubeflow users working with large-scale data processing and machine learning workflows require a streamlined, cloud-native experience that optimizes Spark job execution, monitoring, and debugging. This project proposes integrating the Batch Processing Gateway (BPG) with Kubeflow to enable an efficient, scalable, and centralized approach for submitting, monitoring, and managing Spark applications across multiple clusters. By enhancing the API and user experience, this integration will provide seamless job routing, performance tracking, and debugging capabilities while reducing the computational burden on Kubeflow notebooks.
By undertaking this project, contributors will play a crucial role in enhancing the performance, efficiency, and usability of Spark applications within the Kubeflow ecosystem, thereby addressing current challenges in multi-cluster job management.
Why This Project?
By integrating Batch Processing Gateway with Kubeflow Notebooks, this project provides a cloud-native, scalable, and user-friendly solution for Spark job execution and debugging. It will significantly improve performance, efficiency, and developer experience, enabling ML practitioners and data engineers to focus on experimentation and optimization without struggling with job management complexities.
Batch Processing Gateway can also help enterprise environments, where managing Apache Spark jobs across multiple Kubernetes clusters presents challenges such as inefficient job distribution, performance bottlenecks, and complex workload balancing. Integrating BPG into Kubeflow aims to address these issues by providing a centralized mechanism for submitting, monitoring, and managing Spark applications across various clusters.
Key Objectives:
- Analyse, Design, Plan, and Execute Spark Job Execution Strategies:
- Evaluate the trade-offs between running a Spark kernel directly within a Kubeflow Notebook versus leveraging the Batch Processing Gateway for job submission.
- Assess the cloud-native design of Kubeflow SDK and Notebook environments to determine the optimal approach for Spark integration that maximizes efficiency, scalability, and usability.
- Make a well-informed decision on whether to support Spark kernels within notebooks, use BPG, or implement a hybrid approach for an enhanced user experience.
- Automated Job Routing and Scalable Execution:
- Implement dynamic workload routing using BPG to automatically distribute Spark jobs based on cluster load, resource availability, and workload priority.
- Integrate with the Spark Operator to optimize resource allocation, minimize execution delays, and ensure efficient scaling for petabyte-scale machine learning and data processing workloads (a representative manifest is sketched after this list).
- Enhanced User API and Notebook Integration:
- Develop a Python SDK for Kubeflow notebooks, enabling users to submit, manage, and monitor Spark jobs via BPG REST APIs for a lightweight, scalable solution.
- Ensure a seamless user experience by providing intuitive APIs that abstract complex job management operations, making it easier for data scientists and ML engineers to experiment and iterate on workflows within Kubeflow.
- Comprehensive Debugging and Performance Monitoring:
- Enable full debugging capabilities by integrating Spark UI, logging, and monitoring tools into Kubeflow, allowing users to visualize Spark DAGs, tasks, and execution stages.
- Implement centralized logging and Prometheus-based monitoring to provide real-time insights into Spark job performance across clusters.
- Ensure users can efficiently analyze job execution, detect bottlenecks, and optimize data processing and ML workflows within Kubeflow.
- Note: most of the logging APIs should be available out of the box from either BPG or Spark, but we need to document them and showcase examples for users.
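For context on the Spark Operator hand-off mentioned above, below is a representative SparkApplication resource of the kind BPG would route to a target cluster for the operator to reconcile; the namespace, image, and versions are illustrative:

```yaml
# A representative SparkApplication, as understood by the Spark Operator.
# In the proposed flow, the SDK submits a job to BPG, which routes it to a
# cluster where the operator reconciles a resource like this one.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: spark-jobs            # example namespace
spec:
  type: Scala
  mode: cluster
  image: spark:3.5.3               # example image/version
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.3.jar
  sparkVersion: "3.5.3"
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark-operator-spark
  executor:
    instances: 2
    cores: 1
    memory: 512m
```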
Mentors expect a GSoC project execution plan, organized phase by phase with the weekly milestones the student wants to achieve.
Expected Outcomes:
- A robust integration of BPG into the Kubeflow ecosystem, enabling efficient Spark job management across multiple clusters.
- A Python SDK compatible with Kubeflow notebooks to facilitate Spark job submission through BPG, offering a scalable solution that reduces notebook load and enhances performance.
- Comprehensive documentation and user guides to assist users in leveraging the new features effectively.
Skills Required/Preferred:
**Curiosity to learn, design, and develop is expected; having the skill sets below will be a plus:**
- Proficiency in Python, Java and familiarity with developing SDKs.
- Experience with Kubernetes and managing containerized applications.
- Understanding of Apache Spark and its deployment on Kubernetes clusters.
- Familiarity with RESTful API development and integration.
- Experience with monitoring tools and logging frameworks is a plus.
Mentors:
- Shekhar Prasad Rajak, Software Development Engineer, Apple Inc (GitHub: @Shekharrajak, Mail: shekharrajak@live.com)
- Chaoran Yu, [Mentor Title/Position] (GitHub: @yuchaoran2011)
- Andrey Velichkevich, [Mentor Title/Position] (GitHub: @andreyvelich)
Difficulty:
Medium to Hard
GitHub Issue:
Enhancing Kubeflow with Batch Processing Gateway for Efficient Multi-Cluster Spark Job Management spark-operator#2422
Expected Project Size:
350 hours
Project 7
Title: Enable GPU Testing for LLM Blueprints in Kubeflow Trainer
Component: Kubeflow Trainer
Tracking issue: kubeflow/trainer#2432
Skills required:
- GitHub Actions
- Kubernetes
- PyTorch
- Python
Difficulty: Hard
Mentors:
- Andrey Velichkevich
- Valentina Rodriguez Sosa
Length: 350 hrs
Project 8
Title: Support JAX and/or TensorFlow Training Runtimes in Kubeflow Trainer
Component: Kubeflow Trainer
Tracking issue: TBD
Skills required:
- Go
- Kubernetes
- JAX
- TensorFlow
Difficulty: Medium
Mentors:
- Andrey Velichkevich
- TBD
Length: 350 hrs
Project 9
Title: Support Kubernetes Sidecars for Katib Metrics Collectors
Component: Kubeflow Katib
Description
Open issue: kubeflow/katib#2181
Katib implements a pull-based Metrics Collector as a sidecar container that collects training metrics from Trials once training is complete. However, the pull-based Metrics Collector has some problems. For example, a Trial will fail if the training container finishes before the Metrics Collector has started. Since v1.28, Kubernetes has native support for sidecar containers as part of KEP-753, which allows us to launch sidecar containers before the training container. To address the problems with the pull-based Metrics Collector, we need to support Kubernetes sidecar containers in Katib.
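A minimal sketch of the target pattern, assuming the Metrics Collector keeps its current role; the container names and images are illustrative, not Katib's actual injected spec:

```yaml
# Hypothetical Trial pod using a KEP-753 native sidecar: an init container with
# restartPolicy: Always starts before the training container and is shut down
# automatically after it completes, so the Trial cannot fail because the
# collector started too late.
apiVersion: v1
kind: Pod
metadata:
  name: trial-example
spec:
  restartPolicy: Never
  initContainers:
    - name: metrics-collector            # illustrative name and image
      image: docker.io/kubeflowkatib/file-metrics-collector:latest
      restartPolicy: Always              # marks this init container as a sidecar
  containers:
    - name: training-container
      image: docker.io/kubeflowkatib/pytorch-mnist-cpu:latest
```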
Expected Outcomes
- Successfully integrate Katib Metrics Collectors with Kubernetes sidecars
- Provide unit tests and e2e tests
Skills Required/Preferred:
- Kubernetes
- Go
- YAML
- Python
Mentors: @Electronic-Waste, Andrey Velichkevich, TBD
Difficulty: Medium
Expected Size of the Project: 350 hours
Project 10
Title: Export Fine-Tuned LLM to Model Registry
Component: Kubeflow Trainer / Model Registry
Tracking issue: TBD
Description:
Trainer has implemented initializers for models and datasets, and will support a model exporter in the future. By supporting the Model Registry as one of the exporter's destinations, Trainer will integrate more deeply with the Kubeflow ecosystem.
Skills required:
- Kubernetes
- Go
- YAML
- Python
Difficulty: Hard
Length: 350 hrs
Mentors:
- Shao Wang, TBD
Project 11
Title: Add Volcano scheduler support in Trainer
Component: Kubeflow Trainer
Tracking issue: TBD
Description
Currently, Trainer does not support Volcano for scheduling. Since Volcano is a widely adopted scheduler for AI workloads, integrating it into Trainer could provide Trainer with more AI-specific scheduling capabilities.
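For reference, the sketch below shows how Volcano gang scheduling is typically wired up for a plain training pod today; how Trainer would surface this is exactly what the project must design, and every name here is illustrative:

```yaml
# Illustrative only: a Volcano PodGroup gang-schedules the job's pods, and each
# pod opts into the Volcano scheduler and references the group.
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: trainjob-example
spec:
  minMember: 4                     # schedule all 4 workers together, or none
  queue: default
---
apiVersion: v1
kind: Pod
metadata:
  name: trainjob-example-worker-0
  annotations:
    scheduling.k8s.io/group-name: trainjob-example
spec:
  schedulerName: volcano
  containers:
    - name: trainer
      image: docker.io/pytorch/pytorch:latest   # example training image
```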
Skills required:
- Kubernetes
- Go
- Volcano
Difficulty: Hard
Length: 350 hrs
Mentors:
- Shao Wang, TBD
Project 12: PostgreSQL integration in Kubeflow Pipelines
Component: Pipelines
Open issue: https://github.com/kubeflow/pipelines/issues/9813
Kubeflow Pipelines must store information about pipelines, experiments, runs, and artifacts in a database. Currently, the only database it supports is MySQL/MariaDB.
We plan to support PostgreSQL as an alternative to MySQL/MariaDB so users can reuse existing databases; PostgreSQL will also serve as a good test case for supporting multiple database backends.
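As a purely hypothetical illustration of the goal, deployment-time selection of a database backend might look like the ConfigMap below; every key name is invented for this sketch, and the real configuration schema will be settled in the linked issue:

```yaml
# Hypothetical ConfigMap sketch: choosing the pipelines database backend at
# deployment time. All key names here are assumptions, not the actual config.
apiVersion: v1
kind: ConfigMap
metadata:
  name: pipeline-db-config        # illustrative name
  namespace: kubeflow
data:
  dbType: postgres                # instead of the current MySQL/MariaDB-only default
  dbHost: postgres.kubeflow.svc.cluster.local
  dbPort: "5432"
  dbName: mlpipeline
```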
Skills required:
- Kubernetes
- Python
- Go
- YAML
Difficulty: Medium
Mentors: Ricardo Martinelli, Shivay Lamba
Length: 175 hrs
More projects are being ideated here:
https://github.com/kubeflow/community/issues/809