Tuesday, November 12

9:10am MST

SkyRay: Seamlessly Extending KubeRay to Multi-Cluster Multi-Cloud Operation - Anne Holler, Elotl
Tuesday November 12, 2024 9:10am - 9:35am MST
Ray is a unified framework for scaling AI applications from a laptop to a cluster. KubeRay supports the creation, deletion, and scaling of Ray clusters on K8s, along with managing Ray jobs and services on the Ray clusters. This talk introduces SkyRay, in which KubeRay is extended towards the Sky computing model via interoperation with a multi-cluster fleet manager. With SkyRay, each Ray cluster is seamlessly scheduled onto a cloud K8s cluster suited to the Ray cluster's resource needs and policy requirements. The policies can capture a variety of cluster characteristics, e.g., desired cloud provider, region, K8s version, service quality, and GPU type availability. Fleet manager policy updates can be used to trigger automatic migration of Ray clusters between K8s workload clusters. The talk presents several example use cases for SkyRay, including cluster selection for resource needs, service availability, development vs production cluster configuration, and K8s version upgrade.
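The core of the SkyRay idea is matching a Ray cluster's placement policy against a fleet of candidate K8s clusters. The sketch below is purely illustrative (it is not Elotl's actual API; the field names, fleet entries, and policy shape are invented for exposition):

```python
# Illustrative sketch of fleet-manager cluster selection: pick the first
# K8s cluster in the fleet that satisfies every policy constraint.

def pick_cluster(policy, fleet):
    """Return the name of the first fleet cluster matching the policy."""
    for cluster in fleet:
        if (cluster["provider"] in policy["providers"]
                and cluster["region"] in policy["regions"]
                and cluster["gpu_type"] == policy["gpu_type"]):
            return cluster["name"]
    return None  # no suitable cluster; a fleet manager could provision one

fleet = [
    {"name": "gke-us", "provider": "gcp", "region": "us-west1", "gpu_type": "a100"},
    {"name": "eks-eu", "provider": "aws", "region": "eu-west-1", "gpu_type": "t4"},
]
policy = {"providers": {"aws"}, "regions": {"eu-west-1"}, "gpu_type": "t4"}
print(pick_cluster(policy, fleet))  # -> eks-eu
```

A policy update (say, changing the allowed regions) would make a different cluster match, which is what triggers the automatic migration the talk describes.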
Speakers
Anne Holler, Chief Scientist, Elotl
Anne is Chief Scientist at Elotl. She is interested in resource efficiency. She worked on Uber's Michelangelo Machine Learning platform, on Velocloud's SD-WAN management, on VMware's Distributed Resource Schedulers for servers and storage, on performance analysis for VMware, on Transmeta's…
Salt Palace | Level 1 | Grand Ballroom AC

9:45am MST

Attack, Defense & Danger in the Age of AI - Shane Lawrence, Shopify
Tuesday November 12, 2024 9:45am - 10:10am MST
The transformers revolution has spurred a race between hackers looking for an edge and security teams looking for leverage, while organizations of all kinds rush to make use of this new technology with little awareness of how it works and how it could be used against them. In this talk, Shane will describe some of the ways that AI is being used by attackers, countermeasures for AI attacks, opportunities for AI to mitigate conventional attacks, and how AI-powered services might be used against their owners. He’ll show a live demo combining these concepts. Attendees will learn about the AI-related risks they face and techniques for managing those risks.
Speakers
avatar for Shane Lawrence

Shane Lawrence

Senior Staff Infrastructure Security Engineer, Shopify
Shane is a Senior Staff Infrastructure Security Engineer at Shopify, where he's working on a multi-tenant platform that allows developers to securely build scalable apps and services for crafters, entrepreneurs, and businesses of all sizes.
Salt Palace | Level 1 | Grand Ballroom AC

10:15am MST

Sponsored Keynote: Advancing Cloud Native AI Innovation Through Open Collaboration - Yuan Tang, Red Hat
Tuesday November 12, 2024 10:15am - 10:20am MST
In the rapidly evolving field of AI, innovation flourishes through the open exchange of ideas, resources, and knowledge. In this keynote, we will delve into Red Hat’s journey in cloud native and AI, showcasing our community-driven efforts and initiatives that promote a culture of open collaboration within the cloud native AI ecosystem. We invite you to join us in this collaborative effort and explore opportunities to contribute to and benefit from a vibrant community.
Speakers
Yuan Tang, Principal Software Engineer, Red Hat
Yuan is a principal software engineer at Red Hat, working on OpenShift AI. Previously, he led AI infrastructure and platform teams at various companies. He holds leadership positions in open source projects, including Argo, Kubeflow, and Kubernetes. He's also a maintainer and…
Salt Palace | Level 1 | Grand Ballroom AC

10:20am MST

AM Break 2
Tuesday November 12, 2024 10:20am - 10:40am MST

10:30am MST

AM Break 4
Tuesday November 12, 2024 10:30am - 10:40am MST

10:40am MST

Multitenancy and Fairness at Scale with Kueue: A Case Study - Aldo Culquicondor, Google & Rajat Phull, Apple
Tuesday November 12, 2024 10:40am - 11:05am MST
Developed by the Kubernetes community in collaboration with the ecosystem, Kueue augments k8s and ClusterAutoscaler to provide an E2E batch system. Kueue implements job queueing, deciding when jobs should wait and when they should start or be preempted, based on quotas and a hierarchy for sharing resources among teams. An exciting addition in the v0.7 release is fair sharing, designed to support large ML platforms serving multiple teams. Kueue allows platforms to model their teams and achieve high utilization of resources, while sharing cost and providing equitable access to unused resources. Teams can always reclaim their guaranteed quotas via preemption. The Kueue v0.7 and Kubernetes v1.31 releases also include performance optimizations to achieve high throughput. In this talk, you will learn about the challenges faced during the design and implementation of fair sharing and preemption, about this system running in production, and the plans to support complex hierarchies.
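To make the team-modeling idea concrete, the sketch below builds two ClusterQueue manifests (as Python dicts) for teams sharing a cohort, with fair-sharing weights. The field names follow the Kueue v0.7 API shape, but treat this as an illustrative sketch rather than a verified spec; the queue and flavor names are placeholders:

```python
# Two teams in one cohort: queues in a cohort can borrow each other's
# unused quota, and fairSharing.weight controls each team's share of it.

def cluster_queue(name, cohort, weight, gpu_quota):
    return {
        "apiVersion": "kueue.x-k8s.io/v1beta1",
        "kind": "ClusterQueue",
        "metadata": {"name": name},
        "spec": {
            "cohort": cohort,                      # shared borrowing pool
            "fairSharing": {"weight": weight},     # relative share of spare capacity
            "resourceGroups": [{
                "coveredResources": ["nvidia.com/gpu"],
                "flavors": [{
                    "name": "default-flavor",      # placeholder flavor name
                    "resources": [{"name": "nvidia.com/gpu",
                                   "nominalQuota": gpu_quota}],
                }],
            }],
        },
    }

team_a = cluster_queue("team-a", "ml-platform", "2", 16)  # 2x weight
team_b = cluster_queue("team-b", "ml-platform", "1", 8)
```

Under this setup, team-a's guaranteed 16 GPUs are always reclaimable via preemption, and idle GPUs are divided between the teams roughly 2:1.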
Speakers
Aldo Culquicondor, Sr. Software Engineer, Google
Aldo is a Senior Software Engineer at Google. He works on Kubernetes and Google Kubernetes Engine, where he contributes to kube-scheduler, the Job API and other features to support batch, AI/ML and HPC workloads. He is currently a TL at SIG Scheduling and an active member of WG Batch…
Rajat Phull, Engineering Manager, Apple
Rajat Phull is an Engineering Manager at Apple. He works on the Machine Learning Platform team with a focus on GPU resource management and ML training orchestration at scale using Kubernetes.
Salt Palace | Level 1 | Grand Ballroom AC

11:15am MST

LLM Powered Agents with Kubernetes - Hema Veeradhi & Shrey Anand, Red Hat
Tuesday November 12, 2024 11:15am - 11:40am MST
How would you build an LLM system to modify a Kubernetes deployment based on its live telemetry data stream? A vanilla LLM is not enough to solve this problem, as it is limited to outdated training data and is prone to hallucinations. In this talk, we will explore the concept of Agents—a powerful framework for solving complex multi-level tasks using an LLM as its reasoning engine, supported by a suite of tools. These tools can be advanced calculators, real-time web scrapers, domain knowledge extractors, etc. They include executable functions, RAG pipelines, APIs or other services that allow the agents to complete their tasks effectively. We will walk through a demo that leverages Kubernetes services and Podman containerization techniques to enable the agent workflow. Attendees will learn how a Kubernetes-based agent framework enhances the performance capabilities of LLMs, offering a scalable and autonomous solution for next-generation intelligent systems.
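The agent pattern the abstract describes (a reasoning engine that picks a tool, observes the result, and iterates) can be sketched without any framework at all. The "LLM" below is a hard-coded stub standing in for a real model call; the task, tool, and outputs are invented for illustration:

```python
# Minimal agent loop: reason -> act (call a tool) -> observe -> repeat,
# until the reasoning engine decides to finish.

def calculator(expr):
    # Toy tool: arithmetic only, with builtins disabled for safety.
    return str(eval(expr, {"__builtins__": {}}))

def fake_llm(context):
    # Stub reasoning engine: on the first pass, do the math; once an
    # observation is present, produce the final answer.
    if "replicas" in context and "Observation" not in context:
        return {"action": "calculator", "input": "3 * 2"}
    return {"action": "finish", "input": "Scale the deployment to 6 replicas."}

def run_agent(task, tools):
    context = task
    for _ in range(5):  # bounded loop guards against runaway reasoning
        step = fake_llm(context)
        if step["action"] == "finish":
            return step["input"]
        observation = tools[step["action"]](step["input"])
        context += f"\nObservation: {observation}"
    return "gave up"

answer = run_agent("Double the deployment's 3 replicas.",
                   {"calculator": calculator})
print(answer)  # -> Scale the deployment to 6 replicas.
```

In the talk's setting, the tool suite would include live telemetry queries and Kubernetes API calls instead of a calculator, but the control loop is the same.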
Speakers
Shrey Anand, Data Scientist, Red Hat
Shrey Anand is a data scientist with over five years of experience in the field of AI/ML. He collaborates with the emerging technologies team at Red Hat, where he develops cutting-edge data science solutions to solve open source and business problems. As a strong advocate of open source…
Hema Veeradhi, Principal Data Scientist, Red Hat
Hema Veeradhi is a Principal Data Scientist on the Emerging Technologies team in the office of the CTO at Red Hat. Her work primarily focuses on implementing innovative open AI and machine learning solutions to help solve business and engineering problems. Hema is a staunch…
Salt Palace | Level 1 | Grand Ballroom AC

11:50am MST

From Supercomputing to Serving: A Case Study Delivering Cloud Native Foundation Models - Autumn Moulder, Cohere
Tuesday November 12, 2024 11:50am - 12:15pm MST
Cloud native takes on new meaning in the AI and HPC domains. What does cloud native mean when your software is tightly coupled to hardware? When capacity is fixed, which assumptions start to break down? How can you flex GPUs between batch training workloads and inference? Join us for a case study demonstrating how a small team scaled ML infrastructure from a single cloud to multiple clusters across 4 cloud providers, in under 6 months. We’ll share unique multi-cloud challenges we uncovered around supercomputing infrastructure, cross-cloud networking, capacity & quota management, batch workloads, FinOps, and observability. We will particularly highlight our experience using Kueue to manage fixed capacity across clouds, and where Kubernetes still falls short for HPC workloads. Leave with a solid understanding of what it takes for an infrastructure team to support the lifecycle of a cloud native foundation model.
Speakers
Autumn Moulder, Director of Infrastructure & Security, Cohere
Autumn is the Director of Infrastructure & Security at Cohere. She’s been with the company since September 2022 scaling teams & tools. Prior to buying into the startup life, she spent 3 years in financial services and 14 years at a large non-profit. Her passion is helping innovative…
Salt Palace | Level 1 | Grand Ballroom AC

12:20pm MST

⚡ Lightning Talk: Charm++ on Kubernetes Cloud - Kavitha Chandrasekar, University of Illinois at Urbana-Champaign
Tuesday November 12, 2024 12:20pm - 12:30pm MST
In this talk, we will detail the use of Kubernetes operators to run HPC applications using the Charm++ runtime system on a Kubernetes cluster in the cloud. Charm++ is an adaptive intelligent runtime system that provides capabilities such as dynamic load balancing, energy optimizations, and communication optimizations, in addition to support for resource elasticity. It is a well-established system in the HPC world, and supports highly scalable applications such as NAMD for biomolecular simulations. We will talk about capabilities added to the setup, such as job malleability utilizing the shrink and expand feature in Charm++ jobs by changing the number of pods assigned to a job at run time. We will demonstrate the effectiveness of shrink/expand operations for different scheduling policies and quantify the associated overhead. Charm++ has recently added support for a Python-based framework, Charm4py, for HPC codes written in Python. We will also talk about running Charm4py applications on Kubernetes.
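Shrink/expand malleability boils down to letting the pod count of a running job change while the runtime rebalances work over whatever pods it currently has. The sketch below is purely illustrative arithmetic; it models no Charm++ API:

```python
# Toy model of job malleability: tasks are redistributed round-robin when
# the number of pods assigned to the job changes at run time.

def rebalance(tasks, pods):
    """Assign tasks round-robin across the currently allocated pods."""
    assignment = {p: [] for p in range(pods)}
    for i, task in enumerate(tasks):
        assignment[i % pods].append(task)
    return assignment

tasks = list(range(8))
expanded = rebalance(tasks, 4)  # 4 pods -> 2 tasks per pod
shrunk = rebalance(tasks, 2)    # operator shrinks to 2 pods -> 4 tasks each
```

In the real system, the operator changes the pod count and Charm++'s load balancer migrates objects between the remaining pods; the overhead of that migration is what the talk quantifies.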
Speakers
Kavitha Chandrasekar, Graduate Student, University of Illinois at Urbana-Champaign
Kavitha Chandrasekar is a PhD student with the Parallel Programming Lab in the CS department at the University of Illinois at Urbana-Champaign. Her research interests are in implementing adaptive mechanisms within the HPC runtime system, Charm++, with a focus on performance on modern…
Salt Palace | Level 1 | Grand Ballroom AC

12:35pm MST

Lunch Break 3
Tuesday November 12, 2024 12:35pm - 1:30pm MST

1:30pm MST

Dressing-up Your Cluster for AI in Minutes with a Portable Network CR - Sunyanan Choochotkaew & Tatsuhiro Chiba, IBM Research
Tuesday November 12, 2024 1:30pm - 1:55pm MST
Kubernetes network overhead and complexity is one of the impediments to cloud adoption for AI, especially when considering using multiple networks to boost bandwidth for distributed tasks. Statically defining a network configuration for secondary interfaces is no trivial task for platform engineers, given the distinctive demands of heterogeneity and scale within a virtual-private-cloud cluster. In this talk, we show how deploying a single portable custom resource can play a significant role in transforming a VPC cluster into a supercomputer tailored for AI workloads. We share our journey on the Multi-NIC CNI project and demonstrate the benefit of seamlessly enabling dynamicity in network attachment definitions via practical use cases, along with outlining future directions towards related open source projects like Multus, Node Resource Interface (NRI), Dynamic Resource Allocation (DRA), and Kubernetes Networking Interface (KNI).
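For context, the "network attachment definitions" being generated dynamically are the standard Multus NetworkAttachmentDefinition objects. The dict below shows that shape; the plugin type, interface name, and subnet are placeholders, not values from the Multi-NIC CNI project:

```python
# Standard Multus NetworkAttachmentDefinition, expressed as a Python dict.
# Tooling like Multi-NIC CNI produces objects of this shape per secondary
# interface, instead of a platform engineer writing them statically.

import json

nad = {
    "apiVersion": "k8s.cni.cncf.io/v1",
    "kind": "NetworkAttachmentDefinition",
    "metadata": {"name": "secondary-net"},
    "spec": {
        # The CNI config is embedded as a JSON string.
        "config": json.dumps({
            "cniVersion": "0.3.1",
            "type": "ipvlan",        # placeholder CNI plugin
            "master": "eth1",        # placeholder secondary NIC on the node
            "ipam": {"type": "host-local", "subnet": "192.168.1.0/24"},
        })
    },
}
```

A pod then opts in via the `k8s.v1.cni.cncf.io/networks` annotation referencing the definition's name; generating these per node heterogeneity is the tedium the portable CR removes.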
Speakers
Tatsuhiro Chiba, Senior Technical Staff Member, IBM Research
Tatsuhiro Chiba is an STSM and Manager at IBM Research, specializing in performance optimization and acceleration of large-scale AI and HPC workloads on hybrid cloud. He is leading a project to enhance OpenShift performance and sustainability for AI and HPC by exploiting various cloud…
Sunyanan Choochotkaew, Staff Research Scientist, IBM Research
Sunyanan Choochotkaew works at IBM Research - Tokyo, specializing in cloud platform optimization. She actively contributes to various open-source projects, including Kepler, Multi-NIC CNI, and CPE operator, where she holds the role of maintainer. She has also made contributions…
Salt Palace | Level 1 | Grand Ballroom AC

2:05pm MST

Boosting Training and Inference Performance via Topology-Aware Scheduling of Heterogeneous Resources - He Cao, ByteDance
Tuesday November 12, 2024 2:05pm - 2:30pm MST
As LLMs rapidly evolve, K8s’ topology management cannot meet the performance demands in several aspects: 1. For new-generation high-density processors, NUMA affinity is insufficient to ensure inference performance. 2. The performance bottleneck has shifted from computation to networking. However, K8s does not consider the topology of heterogeneous resources like GPU and RDMA.

In this talk, He will introduce how ByteDance significantly improves LLM workload performance by enhancing topology-aware scheduling: 1. For nodes with high-density processors, achieve die-level affinity and implement anti-affinity between memory bandwidth-intensive pods. 2. For pods within a training job, achieve inter-RDMA affinity at the ToR level to avoid switch congestion. 3. For inference workloads, achieve GPU-RDMA affinity at PCIe switch level to enable GPUDirect RDMA for accelerated communication. 4. How we achieve job-level topology affinity based on K8s scheduler which operates at the pod level.
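Point 2 above (co-locating a training job's pods under the same top-of-rack switch) can be expressed with standard Kubernetes pod affinity, given a node label that encodes switch membership. The affinity fields below are standard K8s API fields, but the label key `network.example.com/tor` and job name are hypothetical; ByteDance's actual mechanism is scheduler-level and job-aware, which plain pod affinity is not:

```python
# Pod spec fragment (as a Python dict): require all pods of a training
# job to land on nodes sharing the same (hypothetical) ToR-switch label.

tor_affinity = {
    "affinity": {
        "podAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [{
                "labelSelector": {"matchLabels": {"job-name": "llm-train"}},
                # topologyKey groups nodes by this label's value; pods
                # matching the selector must share one group.
                "topologyKey": "network.example.com/tor",
            }]
        }
    }
}
```

The limitation the talk addresses is visible even here: affinity is evaluated pod by pod, so achieving job-level placement (all-or-nothing under one switch) needs scheduler enhancements beyond this fragment.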
Speakers
He Cao, Senior Software Engineer, ByteDance
He Cao is a senior software engineer on the Cloud Native team at ByteDance, a maintainer of Katalyst and KubeZoo, and a member of Istio. He has 5+ years of experience in the cloud native area. Since joining ByteDance, he has designed and implemented several critical systems for VKE…
Salt Palace | Level 1 | Grand Ballroom AC

2:40pm MST

Brag Your RAG with the MLOPS Swag - Madhav Sathe, Google & Jitender Kumar, Publicis Sapient
Tuesday November 12, 2024 2:40pm - 3:05pm MST
Organizations are beginning to unlock significant value by integrating Large Language Models (LLMs) & Retrieval-Augmented Generation (RAG) into their business-critical processes. However, enterprises often face challenges in meeting the high expectations of GenAI-driven business outcomes. Bridging this gap requires meticulous planning in governance, continuous evaluation, seamless scaling, operational costs, and time-to-market. In this session, attendees will witness a live demonstration of a RAG application stack built with LangChain, Canopy, and a PostgreSQL Vector database, all deployed on Kubernetes. Additionally, we will discuss leveraging GPU and TPU accelerators to enhance computational efficiency. The audience will also gain insights into MLOps strategies for data splitting, embeddings, retrieval, and prompt engineering. Join us to explore how to effectively leverage MLOps with Kubernetes to achieve scalable and impactful GenAI solutions.
Speakers
Jitender Kumar, Director of Technology - DevOps and Cloud, Publicis Sapient
Jitender has 20+ years of successful IT and delivery management experience leading mission-critical infrastructure, software development, and implementation projects involving strategic business and technology change, providing measurable financial results for the organization. He has worked with financial…
Madhav Sathe, Principal Architect, Google
Madhav helps major enterprises drive innovation using modern application architectures, containers and DevOps. Madhav has been a speaker at conferences such as SpringOne, Cloud Foundry Summit and Oracle OpenWorld. He has co-authored a white paper on container security. Madhav currently…
Salt Palace | Level 1 | Grand Ballroom AC

3:05pm MST

PM Break 1
Tuesday November 12, 2024 3:05pm - 3:20pm MST

3:20pm MST

Reproducible AI with Kubeflow, Delta Lake and Langchain - Oz Katz, Treeverse
Tuesday November 12, 2024 3:20pm - 3:45pm MST
Langchain has become one of the most popular frameworks for anyone building custom, generative AI-driven apps powered by LLMs that leverage RAG (Retrieval-Augmented Generation) for better results. But like all data products, these applications are really only as good as the organizational data fed into them, and we’ve all learned the hard way that the data is oftentimes far from perfect. In this hands-on tutorial you’ll learn how to build a reproducible AI application pipeline with Kubeflow, Langchain and Delta Lake, widely adopted OSS tools in the ML & GenAI stack. By learning how to build a RAG chatbot, while iteratively tuning it for best results leveraging Delta Lake’s temporal versions, you’ll come away with improved methods for data reproducibility for your custom AI apps that provide better data quality, alongside an improved user experience for your application users.
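The reproducibility idea at the heart of this tutorial is that retrieval should be pinned to a data version, as Delta Lake's time travel allows. The dependency-free sketch below uses word overlap as a stand-in for real embeddings; the corpora and versions are invented for illustration:

```python
# Pinning the retrieval corpus to a version makes the same question
# retrieve the same context later, even after the data changes.

corpus_versions = {
    0: ["kubeflow runs pipelines", "delta lake stores tables"],
    1: ["kubeflow runs pipelines", "delta lake stores versioned tables"],
}

def retrieve(question, version):
    """Return the version-pinned document with the most word overlap."""
    docs = corpus_versions[version]
    words = set(question.lower().split())
    return max(docs, key=lambda d: len(words & set(d.split())))

# Version 0 reproduces the original retrieval; version 1 reflects the
# updated corpus. Recording the version with each run is what makes the
# pipeline reproducible.
pinned = retrieve("what does delta lake store?", 0)
current = retrieve("what does delta lake store?", 1)
```

In the real stack, the "version" is a Delta Lake table version queried from a Kubeflow pipeline step, and the retriever is a vector search over embeddings.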
Speakers
Oz Katz, Co-Creator, lakeFS Open Source; CTO & Co-Founder, Treeverse
Oz Katz is the CTO and Co-Creator of the open source lakeFS project, a platform that delivers resilience and manageability to object-storage-based data lakes. Oz engineered and maintained petabyte-scale data infrastructure at analytics giant SimilarWeb, which he joined…
Salt Palace | Level 1 | Grand Ballroom AC

3:55pm MST

Inference on Streaming Data at Scale at Intuit - Sri Harsha Yayi & Vigith Maurice, Intuit
Tuesday November 12, 2024 3:55pm - 4:20pm MST
At Intuit, ML teams faced challenges with processing and running inference on high-throughput streaming data. Connecting to various messaging systems like Kafka, Pulsar, and SQS proved to be a time-consuming and intricate process. Moreover, our ML teams required the ability to perform intermediate processing and execute inference as part of their workflows. To complicate matters further, scaling the processing and inference based on the volume of events introduced additional challenges. Based on these challenges, we created Numaflow, a K8s-native open-source platform for scalable event processing. It simplifies connecting to event sources, enables teams to do event processing and inference on streaming data without a learning curve, and integrates seamlessly with existing systems. This talk is for ML engineers, data scientists, and those interested in asynchronous inference on streaming data. We'll show how Numaflow overcomes obstacles and streamlines inference on streaming data.
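The per-event transform at the core of such a pipeline is a small function the platform scales for you. The sketch below is plain Python standing in for a Numaflow "map" vertex (the real pynumaflow SDK wraps a handler like this in a gRPC server); the message shape and the inference rule are invented for illustration:

```python
# Per-event handler: decode the event, run inference, attach the result.
# A streaming platform would call this once per message and handle
# scaling, retries, and connections to Kafka/Pulsar/SQS around it.

import json

def infer(features):
    # Stand-in for a model call: flag high-value events.
    return {"fraud": features.get("amount", 0) > 1000}

def map_handler(raw_event: bytes) -> bytes:
    event = json.loads(raw_event)
    event["prediction"] = infer(event)
    return json.dumps(event).encode()

out = json.loads(map_handler(b'{"amount": 2500}'))
```

The appeal the abstract claims is exactly this separation: ML teams write `map_handler`-sized logic, and the platform owns sources, sinks, and event-volume-based autoscaling.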
Speakers
Vigith Maurice, Principal Engineer, Intuit
Vigith is a co-creator of Numaproj and a Principal Software Engineer on the Intuit Core Platform team in Mountain View, California. One of Vigith's current day-to-day focus areas is the various challenges in building scalable data and AIOps solutions for both batch and high-throughput…
Sri Harsha Yayi, Product Manager, Intuit
Sri Harsha Yayi is a Product Manager at Intuit, where he primarily focuses on the company's Modern SaaS Kubernetes platform, specifically within the event-driven systems domain. He is the PM for Numaflow, an open-source, Kubernetes-native platform designed for the development of event-driven…
Salt Palace | Level 1 | Grand Ballroom AC

4:30pm MST

Incremental GPU Slicing in Action - Abhishek Malvankar & Olivier Tardieu, IBM Research
Tuesday November 12, 2024 4:30pm - 4:55pm MST
Large language models are often released as families of models with varying parameter counts and quantization. To reduce cost, inference services increasingly rely on dynamic model selection, preferring smaller models when possible. GPU vendors are on a journey to enable dynamic GPU slicing, making it possible for a workload to request a fraction of the compute and memory units in a GPU, and for the slices to be created and destroyed on demand without disrupting existing workloads. The onus is now on Kubernetes. The Device Management Working Group is hard at work to expose these capabilities. While vendor-agnostic slicing APIs do not exist yet, this talk demonstrates that incremental GPU slicing is possible today. We replace the Multi-Instance GPU manager, which only permits partitioning GPUs in bulk, with an open-source incremental-slicing controller without needing new APIs or changes to the device plugin. Come learn how to achieve incremental slicing in your GPU clusters.
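The contrast between bulk partitioning and incremental slicing is easy to see in a toy allocator. The unit counts below mirror an A100's 7 compute / 8 memory MIG units, but the logic is purely illustrative and models no vendor API:

```python
# Toy GPU slicer: slices are created and destroyed on demand, and each
# operation touches only the free pool, never existing slices. This is
# the behavior bulk MIG partitioning lacks.

class GpuSlicer:
    def __init__(self, compute=7, memory=8):
        self.free = {"compute": compute, "memory": memory}
        self.slices = {}

    def allocate(self, name, compute, memory):
        if compute > self.free["compute"] or memory > self.free["memory"]:
            return False  # request fails; existing slices are undisturbed
        self.free["compute"] -= compute
        self.free["memory"] -= memory
        self.slices[name] = (compute, memory)
        return True

    def release(self, name):
        compute, memory = self.slices.pop(name)
        self.free["compute"] += compute
        self.free["memory"] += memory

gpu = GpuSlicer()
gpu.allocate("small-llm", 1, 1)   # small model gets a thin slice
gpu.allocate("medium-llm", 3, 4)  # added later, without any repartition
gpu.release("small-llm")          # freed units return; medium-llm untouched
```

A bulk MIG manager would instead drain the whole GPU to change its partition layout, which is exactly the disruption the open-source incremental-slicing controller avoids.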
Speakers
Abhishek Malvankar, Senior Software Engineer, IBM Research
Abhishek is a Senior Software Engineer and Master Inventor at IBM Research. He works closely with Red Hat as a Partner Engineer. He focuses on resource management, performance, and distributed computing for AI workloads in the cloud. He enjoys designing easy-to-use solutions for the cloud…
Olivier Tardieu, Principal Research Scientist, Manager, IBM
Dr. Olivier Tardieu is a Principal Research Scientist and Manager at IBM T.J. Watson, NY, USA. He joined IBM Research in 2007. His current research focuses on cloud-related technologies, including Serverless Computing and Kubernetes, as well as their application to Machine Learning…
Salt Palace | Level 1 | Grand Ballroom AC

5:00pm MST

⚡ Lightning Talk: Transform Your Kubernetes Cluster Into a GenAI Platform: Get Ready-to-Use LLM APIs Today! - Kenji Kaneda, CloudNatix Inc.
Tuesday November 12, 2024 5:00pm - 5:10pm MST
Are you eager to fine-tune LLMs and run inference directly within your Kubernetes clusters? Do you want an API compatible with OpenAI to leverage the extensive GenAI ecosystem? If so, LLM Operator (https://llm-operator.readthedocs.io) is what you need. It instantly builds a software stack that provides an OpenAI-compatible API for inference, fine-tuning, and model management. In this talk, we'll provide an overview of the LLM Operator and showcase its capabilities through practical use cases. Join us to learn how you can leverage LLM Operator to enhance Kubernetes for your Generative AI workflows.
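"OpenAI-compatible" means existing clients work once pointed at the in-cluster endpoint. The sketch below builds the request such a client would POST; the in-cluster URL and model name are placeholders, not values from the LLM Operator documentation:

```python
# Any OpenAI-style client can target an OpenAI-compatible server by
# overriding the base URL. This builds the standard chat-completions
# request without any dependency, to show the wire format.

import json

BASE_URL = "http://llm-operator.example.svc.cluster.local/v1"  # placeholder

def chat_request(prompt, model="my-fine-tuned-model"):  # placeholder model
    return {
        "url": f"{BASE_URL}/chat/completions",  # standard OpenAI route
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

req = chat_request("Summarize today's cluster events.")
```

In practice one would set the official `openai` client's `base_url` to the operator's service endpoint and call it normally, which is what unlocks the existing GenAI ecosystem mentioned above.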
Speakers
Kenji Kaneda, Chief Architect, CloudNatix Inc.
Kenji is a chief architect at CloudNatix and has been working on large-scale distributed systems, especially cluster management systems, for over ten years. Most recently, he was a Principal Engineer at Nvidia, responsible for developing their deep learning training platform and…
Salt Palace | Level 1 | Grand Ballroom AC

5:15pm MST

⚡ Lightning Talk: Cost Saving Strategies for Interactive AI Development - Shravan Achar, Apple
Tuesday November 12, 2024 5:15pm - 5:25pm MST
The interactive nature of Jupyter notebooks has made them indispensable tools for data scientists and AI researchers, facilitating exploratory data analysis, prototyping, and model development. However, managing the cost of resource-intensive computations at different stages of AI/ML lifecycle presents significant challenges. We leveraged Apache YuniKorn to design a resource management system tailored for notebook workloads, which incorporates fair sharing, user-specific policies and budget constraints to allocate computational resources efficiently while adapting for both data preparation and model training stages. And thanks to the extensibility of JupyterLab, we offer rich displays next to the Notebook enabling data scientists to introspect resource usage in real time. This session presents cost saving strategies for interactive development on Jupyter using Kubeflow for model training and Spark for data preparation with YuniKorn scheduler.
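Steering notebook pods into per-team YuniKorn queues is done through pod metadata. The label keys below follow YuniKorn's documented conventions, but verify them against the YuniKorn docs; the application ID and queue name are placeholders:

```python
# Pod metadata (as a Python dict) placing a notebook into a YuniKorn
# queue, so fair sharing and the queue's resource limits apply to it.

notebook_pod_metadata = {
    "labels": {
        "app": "jupyter-notebook",
        "applicationId": "notebook-user42",  # groups the user's pods as one app
        "queue": "root.data-science.prep",   # placeholder per-stage queue
    }
}
```

Budget enforcement then lives in the queue configuration (guaranteed and maximum resources per queue), so data-preparation and model-training stages can be given different limits simply by targeting different queues.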
Speakers
Shravan Achar, Sr. Software Engineer, Apple
Shravan is a senior software engineer at Apple with a passion for open source technologies. With a background in Mathematics and Computer Science, their current interests include MLOps, scheduling in AI, and Jupyter Notebooks.
Salt Palace | Level 1 | Grand Ballroom AC

5:25pm MST

Cloud Native + Kubernetes AI Day - Closing Remarks
Tuesday November 12, 2024 5:25pm - 5:30pm MST
Salt Palace | Level 1 | Grand Ballroom AC

5:30pm MST

Evening Reception
Tuesday November 12, 2024 5:30pm - 7:00pm MST
Join us onsite for drinks and appetizers with fellow co-located attendees from Tuesday's CNCF-hosted Co-located Events.

Attendees from all CNCF Co-located Events are welcome.