Cloud Native AI + Kubernetes Day
Tuesday, November 12
 

9:10am MST

SkyRay: Seamlessly Extending KubeRay to Multi-Cluster Multi-Cloud Operation - Anne Holler, Elotl
Tuesday November 12, 2024 9:10am - 9:35am MST
Ray is a unified framework for scaling AI applications from a laptop to a cluster. KubeRay supports the creation, deletion, and scaling of Ray clusters on K8s, along with managing Ray jobs and services on the Ray clusters. This talk introduces SkyRay, in which KubeRay is extended towards the Sky computing model via interoperation with a multi-cluster fleet manager. With SkyRay, each Ray cluster is seamlessly scheduled onto a cloud K8s cluster suited to the Ray cluster's resource needs and policy requirements. The policies can capture a variety of cluster characteristics, e.g., desired cloud provider, region, K8s version, service quality, and GPU type availability. Fleet manager policy updates can be used to trigger automatic migration of Ray clusters between K8s workload clusters. The talk presents several example use cases for SkyRay, including cluster selection for resource needs, service availability, development vs production cluster configuration, and K8s version upgrade.
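The abstract describes policy-driven placement of Ray clusters onto K8s workload clusters. As a conceptual sketch only (illustrative Python; the cluster names, fields, and `select_cluster` function are hypothetical, not the SkyRay API):

```python
# Toy sketch of policy-based placement: a Ray cluster's policy filters a
# fleet of candidate K8s clusters. Illustrative only, not SkyRay's logic.
from dataclasses import dataclass, field

@dataclass
class K8sCluster:
    name: str
    cloud: str
    region: str
    k8s_version: str
    gpu_types: set = field(default_factory=set)

def select_cluster(fleet, policy):
    """Return the first fleet cluster satisfying every policy constraint."""
    for c in fleet:
        if policy.get("cloud") not in (None, c.cloud):
            continue
        if policy.get("region") not in (None, c.region):
            continue
        # String comparison is illustrative; real version checks parse semver.
        if policy.get("min_k8s") and c.k8s_version < policy["min_k8s"]:
            continue
        if policy.get("gpu") and policy["gpu"] not in c.gpu_types:
            continue
        return c
    return None  # no placement possible; caller may queue or relax the policy

fleet = [
    K8sCluster("gke-dev", "gcp", "us-west1", "1.29"),
    K8sCluster("eks-prod", "aws", "us-east-1", "1.30", {"a100"}),
]
chosen = select_cluster(fleet, {"gpu": "a100", "min_k8s": "1.30"})
print(chosen.name)  # eks-prod
```

A fleet-manager policy update would amount to re-running selection and migrating any Ray cluster whose current placement no longer matches.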
Speakers

Anne Holler

Chief Scientist, Elotl
Anne is Chief Scientist at Elotl. She is interested in resource efficiency. She worked on Uber's Michelangelo Machine Learning platform, on Velocloud's SD-WAN management, on VMware's Distributed Resource Schedulers for servers and storage, on performance analysis for VMware, on Transmeta's...
Salt Palace | Level 1 | Grand Ballroom AC

10:40am MST

Multitenancy and Fairness at Scale with Kueue: A Case Study - Aldo Culquicondor, Google & Rajat Phull, Apple
Tuesday November 12, 2024 10:40am - 11:05am MST
Developed by the Kubernetes community in collaboration with the ecosystem, Kueue augments Kubernetes and the Cluster Autoscaler to provide an end-to-end batch system. Kueue implements job queueing, deciding when jobs should wait and when they should start or be preempted, based on quotas and a hierarchy for sharing resources among teams. An exciting addition in the v0.7 release is fair sharing, designed to support large ML platforms serving multiple teams. Kueue allows platforms to model their teams and achieve high utilization of resources, while sharing cost and providing equitable access to unused resources. Teams can always reclaim their guaranteed quotas via preemption. The Kueue v0.7 and Kubernetes v1.31 releases also include performance optimizations to achieve high throughput. In this talk, you will learn about the challenges faced during the design and implementation of fair sharing and preemption, about this system running in production, and about the plans to support complex hierarchies.
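The borrow-and-reclaim behavior the abstract describes can be illustrated with a toy model (a concept sketch with made-up numbers and names, not Kueue's implementation): a team may borrow idle capacity beyond its quota, and the owner reclaims its guaranteed quota by preempting borrowed usage.

```python
# Toy model of quota borrowing and reclaim-via-preemption (concept only).
class Team:
    def __init__(self, name, quota):
        self.name, self.quota, self.usage = name, quota, 0

CAPACITY = 10  # total cluster resource, arbitrary units

def try_admit(teams, team, demand):
    """Admit within free capacity; else reclaim guaranteed quota by preemption."""
    free = CAPACITY - sum(t.usage for t in teams)
    if demand <= free:
        team.usage += demand
        return True
    # Within its guaranteed quota, a team may preempt others' borrowed usage.
    if team.usage + demand <= team.quota:
        needed = demand - free
        for other in teams:
            if other is team or needed <= 0:
                continue
            take = min(max(0, other.usage - other.quota), needed)
            other.usage -= take  # preempt borrowed units
            needed -= take
        if needed <= 0:
            team.usage += demand
            return True
    return False  # toy simplification: no rollback of partial preemption

a, b = Team("a", 6), Team("b", 4)
teams = [a, b]
try_admit(teams, b, 8)         # b borrows 4 units beyond its quota of 4
print(try_admit(teams, a, 6))  # True: the 4 borrowed units are preempted
print(b.usage)                 # 4: b is back at its guaranteed quota
```

Real Kueue operates on workloads in ClusterQueues with cohort hierarchies; this flat two-team model only shows why reclaim makes guarantees safe to share.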
Speakers

Aldo Culquicondor

Sr. Software Engineer, Google
Aldo is a Senior Software Engineer at Google. He works on Kubernetes and Google Kubernetes Engine, where he contributes to kube-scheduler, the Job API and other features to support batch, AI/ML and HPC workloads. He is currently a TL at SIG Scheduling and an active member of WG Batch...

Rajat Phull

Engineering Manager, Apple
Rajat Phull is an Engineering Manager at Apple. He works in Machine Learning Platform team with a focus on GPU resource management, and ML training orchestration at scale using Kubernetes.
Salt Palace | Level 1 | Grand Ballroom AC

11:15am MST

LLM Powered Agents with Kubernetes - Hema Veeradhi & Shrey Anand, Red Hat
Tuesday November 12, 2024 11:15am - 11:40am MST
How would you build an LLM system to modify a Kubernetes deployment based on its live telemetry data stream? A vanilla LLM is not enough to solve this problem, as it is limited to outdated training data and is prone to hallucinations. In this talk, we will explore the concept of agents: a powerful framework for solving complex multi-level tasks using an LLM as the reasoning engine, supported by a suite of tools. These tools can be advanced calculators, real-time web scrapers, domain knowledge extractors, etc. They include executable functions, RAG pipelines, APIs, or other services that allow the agents to complete their tasks effectively. We will walk through a demo that leverages Kubernetes services and Podman containerization techniques to enable the agent workflow. Attendees will learn how a Kubernetes-based agent framework enhances the performance capabilities of LLMs, offering a scalable and autonomous solution for next-generation intelligent systems.
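The tool-using agent pattern the abstract outlines can be reduced to a small loop: the reasoning engine either requests a tool call or emits a final answer. Here is a minimal sketch with a stubbed "LLM" and a single hypothetical telemetry tool (all names are invented for illustration; it is not any specific framework's API):

```python
# Minimal agent loop: a stubbed "LLM" alternates tool calls and a final answer.
def get_pod_cpu(pod):
    """Stand-in for a live telemetry tool (e.g., a metrics API query)."""
    return {"web-1": 0.92}.get(pod, 0.1)

TOOLS = {"get_pod_cpu": get_pod_cpu}

def stub_llm(observations):
    """Toy reasoning engine: first fetch telemetry, then act on it."""
    if "get_pod_cpu" not in observations:
        return {"action": "tool", "tool": "get_pod_cpu", "arg": "web-1"}
    if observations["get_pod_cpu"] > 0.8:
        return {"action": "final", "answer": "scale up deployment web"}
    return {"action": "final", "answer": "no change needed"}

def run_agent(llm, tools, max_steps=5):
    observations = {}
    for _ in range(max_steps):
        step = llm(observations)
        if step["action"] == "final":
            return step["answer"]
        # Execute the requested tool and feed the result back as context.
        observations[step["tool"]] = tools[step["tool"]](step["arg"])
    return "step budget exhausted"

print(run_agent(stub_llm, TOOLS))  # scale up deployment web
```

In a real deployment the stub would be a model call, the tools would be services (metrics endpoints, RAG pipelines, the Kubernetes API), and the loop would run in a container alongside them.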
Speakers

Shrey Anand

Data Scientist, Red Hat
Shrey Anand is a data scientist with over five years of experience in the field of AI / ML. He collaborates with the emerging technologies at Red Hat where he develops cutting-edge data science solutions to solve open source and business problems. As a strong advocate of open source...

Hema Veeradhi

Principal Data Scientist, Red Hat
Hema Veeradhi is a Principal Data Scientist working in the Emerging Technologies team part of the office of the CTO at Red Hat. Her work primarily focuses on implementing innovative open AI and machine learning solutions to help solve business and engineering problems. Hema is a staunch...
Salt Palace | Level 1 | Grand Ballroom AC

11:50am MST

From Supercomputing to Serving: A Case Study Delivering Cloud Native Foundation Models - Autumn Moulder, Cohere
Tuesday November 12, 2024 11:50am - 12:15pm MST
Cloud native takes on new meaning in the AI and HPC domains. What does cloud native mean when your software is tightly coupled to hardware? When capacity is fixed, which assumptions start to break down? How can you flex GPUs between batch training workloads and inference? Join us for a case study demonstrating how a small team scaled ML infrastructure from a single cloud to multiple clusters across four cloud providers, in under six months. We'll share unique multi-cloud challenges we uncovered around supercomputing infrastructure, cross-cloud networking, capacity and quota management, batch workloads, FinOps, and observability. We will particularly highlight our experience using Kueue to manage fixed capacity across clouds, and where Kubernetes still falls short for HPC workloads. Leave with a solid understanding of what it takes for an infrastructure team to support the lifecycle of a cloud native foundation model.
Speakers

Autumn Moulder

Director of Infrastructure & Security, Cohere
Autumn is the Director of Infrastructure & Security at Cohere. She's been with the company since September 2022 scaling teams & tools. Prior to buying into the startup life, she spent 3 years in financial services and 14 years at a large non-profit. Her passion is helping innovative...
Salt Palace | Level 1 | Grand Ballroom AC

5:00pm MST

⚡ Lightning Talk: Transform Your Kubernetes Cluster Into a GenAI Platform: Get Ready-to-Use LLM APIs Today! - Kenji Kaneda, CloudNatix Inc.
Tuesday November 12, 2024 5:00pm - 5:10pm MST
Are you eager to fine-tune LLMs and run inference directly within your Kubernetes clusters? Do you want an API compatible with OpenAI to leverage the extensive GenAI ecosystem? If so, LLM Operator (https://llm-operator.readthedocs.io) is what you need. It instantly builds a software stack that provides an OpenAI-compatible API for inference, fine-tuning, and model management. In this talk, we'll provide an overview of the LLM Operator and showcase its capabilities through practical use cases. Join us to learn how you can leverage LLM Operator to enhance Kubernetes for your Generative AI workflows.
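An OpenAI-compatible API means existing GenAI clients work against the in-cluster endpoint unchanged. As a rough sketch (the base URL and model name here are placeholders, and the request shape follows the standard OpenAI chat-completions format rather than anything specific to LLM Operator):

```python
# Sketch of calling an OpenAI-compatible in-cluster endpoint.
# BASE_URL and the model name are assumed placeholders.
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # e.g., a port-forwarded gateway service

def chat_request(model, prompt):
    """Build a standard OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def send(body, base_url=BASE_URL):
    """POST the request to the OpenAI-compatible chat completions route."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

body = chat_request("my-fine-tuned-model", "Summarize today's pod restarts.")
print(body["messages"][0]["role"])  # user
# send(body)  # uncomment once the in-cluster endpoint is reachable
```

Because the wire format is the standard one, the same payload works with off-the-shelf OpenAI SDKs pointed at the cluster's base URL.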
Speakers

Kenji Kaneda

Chief Architect, CloudNatix Inc.
Kenji is a chief architect at CloudNatix and has been working on large-scale distributed systems - especially cluster management systems - for over ten years. Most recently, he was a Principal Engineer at Nvidia, responsible for developing their deep learning training platform and...
Salt Palace | Level 1 | Grand Ballroom AC

5:15pm MST

⚡ Lightning Talk: Cost Saving Strategies for Interactive AI Development - Shravan Achar, Apple
Tuesday November 12, 2024 5:15pm - 5:25pm MST
The interactive nature of Jupyter notebooks has made them indispensable tools for data scientists and AI researchers, facilitating exploratory data analysis, prototyping, and model development. However, managing the cost of resource-intensive computations at different stages of the AI/ML lifecycle presents significant challenges. We leveraged Apache YuniKorn to design a resource management system tailored for notebook workloads, which incorporates fair sharing, user-specific policies, and budget constraints to allocate computational resources efficiently while adapting to both data preparation and model training stages. Thanks to the extensibility of JupyterLab, we also offer rich displays next to the notebook, enabling data scientists to introspect resource usage in real time. This session presents cost-saving strategies for interactive development on Jupyter, using Kubeflow for model training and Spark for data preparation with the YuniKorn scheduler.
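The combination of per-user budgets and queue quotas the abstract mentions can be pictured as a two-gate admission check (a toy sketch with invented numbers; not YuniKorn's actual scheduling logic):

```python
# Toy budget-constrained admission for notebook workloads (concept sketch).
def admit_notebook(request_cost, spent, budget, queue_used, queue_quota):
    """Admit only if the user stays within budget and the queue within quota."""
    if spent + request_cost > budget:
        return False, "over user budget"
    if queue_used + request_cost > queue_quota:
        return False, "queue quota exhausted"
    return True, "admitted"

# A near-exhausted budget blocks the request; a fresh one passes both gates.
print(admit_notebook(2.0, spent=9.0, budget=10.0, queue_used=5, queue_quota=8))
print(admit_notebook(2.0, spent=1.0, budget=10.0, queue_used=5, queue_quota=8))
```

Fair sharing then decides ordering among the requests that pass both gates, which is where a scheduler like YuniKorn does the real work.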
Speakers

Shravan Achar

Sr. Software Engineer, Apple
Shravan is a senior software engineer at Apple with a passion for open source technologies. With a background in Mathematics and Computer Science, they are currently interested in MLOps, scheduling in AI, and Jupyter Notebooks.
Salt Palace | Level 1 | Grand Ballroom AC
 
