Staff Software Engineer at Broadcom | Tech Lead, CNCF TAG Runtime
Rajas is a staff software engineer at Broadcom and a tech lead of the CNCF Technical Advisory Group, Runtime. He is actively involved in the CNCF AI working group. He is a Kubernetes contributor and has been a maintainer of the Kube Proxy Next Gen project. He has also served...
Ricardo leads the Platform Infrastructure team at CERN with a strong focus on cloud native deployments and machine learning. For several years he has led the internal effort to transition services and workloads to cloud native technologies, as well as dissemination and training...
I’m a seasoned professional with a rich history in open source communities: Ubuntu, Linaro, Open Compute Project Foundation, Zeek, Kubeflow, and more. I’m known for my leadership skills and commitment to inclusivity. I served as an all-source intelligence analyst in the US Army...
Ray is a unified framework for scaling AI applications from a laptop to a cluster. KubeRay supports the creation, deletion, and scaling of Ray clusters on K8s, along with managing Ray jobs and services on the Ray clusters. This talk introduces SkyRay, in which KubeRay is extended towards the Sky computing model via interoperation with a multi-cluster fleet manager. With SkyRay, each Ray cluster is seamlessly scheduled onto a cloud K8s cluster suited to the Ray cluster's resource needs and policy requirements. The policies can capture a variety of cluster characteristics, e.g., desired cloud provider, region, K8s version, service quality, and GPU type availability. Fleet manager policy updates can be used to trigger automatic migration of Ray clusters between K8s workload clusters. The talk presents several example use cases for SkyRay, including cluster selection for resource needs, service availability, development vs production cluster configuration, and K8s version upgrade.
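For a concrete feel of the model, here is a minimal sketch of submitting a policy-tagged RayCluster with the Kubernetes Python client. The skyray.example.com annotation names are hypothetical illustrations of placement policy, not SkyRay's actual API; the RayCluster fields follow the public KubeRay CRD.

```python
# Minimal sketch: submit a RayCluster tagged with placement policies for a
# fleet manager to match against candidate K8s clusters. All
# "skyray.example.com/*" annotations are hypothetical illustrations.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

ray_cluster = {
    "apiVersion": "ray.io/v1",
    "kind": "RayCluster",
    "metadata": {
        "name": "train-cluster",
        "annotations": {
            "skyray.example.com/cloud": "gcp",            # hypothetical policy
            "skyray.example.com/region": "us-central1",   # hypothetical policy
            "skyray.example.com/gpu-type": "nvidia-a100", # hypothetical policy
        },
    },
    "spec": {
        "headGroupSpec": {
            "rayStartParams": {},
            "template": {"spec": {"containers": [
                {"name": "ray-head", "image": "rayproject/ray:2.9.0"},
            ]}},
        },
        "workerGroupSpecs": [{
            "groupName": "gpu-workers",
            "replicas": 2, "minReplicas": 0, "maxReplicas": 8,
            "rayStartParams": {},
            "template": {"spec": {"containers": [
                {"name": "ray-worker", "image": "rayproject/ray:2.9.0",
                 "resources": {"limits": {"nvidia.com/gpu": "1"}}},
            ]}},
        }],
    },
}

api.create_namespaced_custom_object(
    group="ray.io", version="v1", namespace="default",
    plural="rayclusters", body=ray_cluster,
)
```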
Anne is Chief Scientist at Elotl. She is interested in resource efficiency. She worked on Uber's Michelangelo Machine Learning platform, on Velocloud's SD-WAN management, on VMware's Distributed Resource Schedulers for servers and storage, on performance analysis for VMware, on Transmeta's...
The transformers revolution has spurred a race between hackers looking for an edge and security teams looking for leverage, while organizations of all kinds rush to make use of this new technology with little awareness of how it works and how it could be used against them. In this talk, Shane will describe some of the ways that AI is being used by attackers, countermeasures for AI attacks, opportunities for AI to mitigate conventional attacks, and how AI-powered services might be used against their owners. He’ll show a live demo combining these concepts. Attendees will learn about the AI-related risks they face and techniques for managing those risks.
Shane is a Senior Staff Infrastructure Security Engineer at Shopify, where he's working on a multi-tenant platform that allows developers to securely build scalable apps and services for crafters, entrepreneurs, and businesses of all sizes.
In the rapidly evolving field of AI, innovation flourishes through the open exchange of ideas, resources, and knowledge. In this keynote, we will delve into Red Hat’s journey in cloud native and AI, showcasing our community-driven efforts and initiatives that promote a culture of open collaboration within the cloud native AI ecosystem. We invite you to join us in this collaborative effort and explore opportunities to contribute to and benefit from a vibrant community.
Yuan is a principal software engineer at Red Hat, working on OpenShift AI. Previously, he has led AI infrastructure and platform teams at various companies. He holds leadership positions in open source projects, including Argo, Kubeflow, and Kubernetes. He's also a maintainer and...
The need to evolve from DevOps to MLOps arises from the unique challenges that machine learning (ML) systems bring, which traditional DevOps processes aren’t equipped to handle. While DevOps focuses on software development and operations, MLOps is necessary because ML models introduce complexities related to data, model lifecycle, and experimentation that go beyond typical software management.
Developed by the Kubernetes community in collaboration with the ecosystem, Kueue augments Kubernetes and the Cluster Autoscaler to provide an end-to-end batch system. Kueue implements job queueing, deciding when jobs should wait and when they should start or be preempted, based on quotas and a hierarchy for sharing resources among teams. An exciting addition in the v0.7 release is fair sharing, designed to support large ML platforms serving multiple teams. Kueue allows platforms to model their teams and achieve high utilization of resources, while sharing cost and providing equitable access to unused resources. Teams can always reclaim their guaranteed quotas via preemption. The Kueue v0.7 and Kubernetes v1.31 releases also include performance optimizations to achieve high throughput. In this talk, you will learn about the challenges faced during the design and implementation of fair sharing and preemption, about this system running in production, and about the plans to support complex hierarchies.
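To make the sharing model concrete, here is a minimal sketch of two team ClusterQueues in one cohort, created with the Kubernetes Python client. The cohort name, weights, and quotas are illustrative; field names follow Kueue's documented v1beta1 API.

```python
# Minimal sketch: two team queues in one cohort. Fair-sharing weights bias
# how unused quota is distributed; preemption lets a team reclaim its
# guaranteed (nominal) quota. All values here are illustrative.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

def team_queue(name: str, weight: str, gpus: int) -> dict:
    return {
        "apiVersion": "kueue.x-k8s.io/v1beta1",
        "kind": "ClusterQueue",
        "metadata": {"name": name},
        "spec": {
            "cohort": "ml-platform",            # queues in a cohort share unused quota
            "fairSharing": {"weight": weight},  # fair sharing, added in v0.7
            "preemption": {"reclaimWithinCohort": "Any"},
            "namespaceSelector": {},
            "resourceGroups": [{
                "coveredResources": ["nvidia.com/gpu"],
                "flavors": [{
                    "name": "default-flavor",
                    "resources": [{"name": "nvidia.com/gpu",
                                   "nominalQuota": gpus}],
                }],
            }],
        },
    }

for q in (team_queue("team-a", "2", 16), team_queue("team-b", "1", 8)):
    api.create_cluster_custom_object(
        group="kueue.x-k8s.io", version="v1beta1",
        plural="clusterqueues", body=q,
    )
```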
Aldo is a Senior Software Engineer at Google. He works on Kubernetes and Google Kubernetes Engine, where he contributes to kube-scheduler, the Job API and other features to support batch, AI/ML and HPC workloads. He is currently a tech lead at SIG Scheduling and an active member of WG Batch...
Rajat Phull is an Engineering Manager at Apple. He works on the Machine Learning Platform team with a focus on GPU resource management and ML training orchestration at scale using Kubernetes.
How would you build an LLM system to modify a Kubernetes deployment based on its live telemetry data stream? A vanilla LLM is not enough to solve this problem, as it is limited to outdated training data and is prone to hallucinations. In this talk, we will explore the concept of agents: a powerful framework for solving complex multi-level tasks using an LLM as the reasoning engine, supported by a suite of tools. These tools can be advanced calculators, real-time web scrapers, domain knowledge extractors, etc. They include executable functions, RAG pipelines, APIs, or other services that allow the agents to complete their tasks effectively. We will walk through a demo that leverages Kubernetes services and Podman containerization techniques to enable the agent workflow. Attendees will learn how a Kubernetes-based agent framework enhances the performance capabilities of LLMs, offering a scalable and autonomous solution for next-generation intelligent systems.
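As a framework-agnostic sketch of this pattern (not the speakers' demo), an agent loop can expose Kubernetes operations as tools to a pluggable LLM reasoning engine; the telemetry access and decision format below are assumptions.

```python
# Minimal sketch of the agent pattern: the LLM reasons, then calls tools
# that read cluster state and act on a Deployment. The decision format and
# the llm_decide engine are illustrative assumptions.
import json
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

def get_replicas(namespace: str, name: str) -> str:
    """Tool 1: observe the Deployment's current and ready replica counts."""
    dep = apps.read_namespaced_deployment(name, namespace)
    return json.dumps({"replicas": dep.spec.replicas,
                       "ready": dep.status.ready_replicas})

def scale(namespace: str, name: str, replicas: int) -> str:
    """Tool 2: act on the cluster by scaling the Deployment."""
    apps.patch_namespaced_deployment_scale(
        name, namespace, {"spec": {"replicas": replicas}})
    return f"scaled {name} to {replicas}"

TOOLS = {"get_replicas": get_replicas, "scale": scale}

def run_agent(llm_decide, namespace: str, name: str) -> None:
    """llm_decide is any LLM call that maps the latest observation to
    {"tool": ..., "args": {...}} or {"done": true}; the engine is pluggable."""
    observation = "start"
    for _ in range(5):  # cap the reasoning loop to avoid runaway actions
        step = llm_decide(observation)
        if step.get("done"):
            break
        observation = TOOLS[step["tool"]](namespace=namespace,
                                          name=name, **step["args"])
```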
Shrey Anand is a data scientist with over five years of experience in the field of AI/ML. He works with the Emerging Technologies group at Red Hat, where he develops cutting-edge data science solutions to solve open source and business problems. As a strong advocate of open source...
Hema Veeradhi is a Principal Data Scientist working on the Emerging Technologies team, part of the Office of the CTO at Red Hat. Her work primarily focuses on implementing innovative open AI and machine learning solutions to help solve business and engineering problems. Hema is a staunch...
Cloud native takes on new meaning in the AI and HPC domains. What does cloud native mean when your software is tightly coupled to hardware? When capacity is fixed, which assumptions start to break down? How can you flex GPUs between batch training workloads and inference? Join us for a case study demonstrating how a small team scaled ML infrastructure from a single cloud to multiple clusters across 4 cloud providers, in under 6 months. We’ll share unique multi-cloud challenges we uncovered around supercomputing infrastructure, cross-cloud networking, capacity & quota management, batch workloads, FinOps, and observability. We will particularly highlight our experience using Kueue to manage fixed capacity across clouds, and where Kubernetes still falls short for HPC workloads. Leave with a solid understanding of what it takes for an infrastructure team to support the lifecycle of a cloud native foundation model.
Autumn is the Director of Infrastructure & Security at Cohere. She’s been with the company since September 2022 scaling teams & tools. Prior to buying into the startup life, she spent 3 years in financial services and 14 years at a large non-profit. Her passion is helping innovative...
In this talk, we will detail the use of Kubernetes operators to run HPC applications using the Charm++ runtime system on Kubernetes clusters in the cloud. Charm++ is an adaptive, intelligent runtime system that provides capabilities such as dynamic load balancing, energy optimizations, and communication optimizations, in addition to support for resource elasticity. It is a well-established system in the HPC world, and supports highly scalable applications such as NAMD for biomolecular simulations. We will discuss capabilities added to this setup, such as job malleability, achieved using the shrink/expand feature of Charm++ jobs and by changing the number of pods assigned to a job at runtime. We will demonstrate the effectiveness of shrink/expand operations for different scheduling policies and quantify the associated overhead. Charm++ has recently added support for Charm4py, a Python-based framework for HPC codes. We will also talk about running Charm4py applications on Kubernetes.
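To illustrate the malleability idea, here is a minimal sketch that resizes a job by patching a hypothetical CharmJob custom resource; the CRD group, plural, and field names are illustrative, with the operator assumed to signal the Charm++ runtime to shrink or expand.

```python
# Minimal sketch: shrink/expand a Charm++ job by patching the desired pod
# count on a hypothetical CharmJob custom resource. The operator is assumed
# to tell the Charm++ runtime to release or absorb ranks accordingly.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

def resize_job(name: str, pods: int, namespace: str = "default") -> None:
    api.patch_namespaced_custom_object(
        group="charm.example.com",      # hypothetical CRD group
        version="v1alpha1",
        namespace=namespace,
        plural="charmjobs",             # hypothetical resource name
        name=name,
        body={"spec": {"pods": pods}},  # hypothetical field
    )

resize_job("namd-sim", pods=4)  # shrink a running job to 4 pods at runtime
```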
Kubernetes network overhead and complexity is one of the impediments to cloud adoption for AI, especially when considering using multiple networks to boost bandwidth for distributed tasks. Statically defining network configurations for secondary interfaces is not trivial for platform engineers, who must meet the distinctive demands of heterogeneity and scale within a virtual-private-cloud cluster. In this talk, we show how deploying a single portable custom resource can play a significant role in transforming a VPC cluster into a supercomputer tailored for AI workloads. We share our journey of the Multi-NIC CNI project and demonstrate the benefit of seamlessly enabling dynamicity in network attachment definitions via practical use cases, along with outlining future directions towards related open source projects like Multus, Node Resource Interface (NRI), Dynamic Resource Allocation (DRA), and Kubernetes Networking Interface (KNI).
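For context, once a network definition exists, a workload attaches to it through the standard Multus network-selection annotation; a minimal sketch follows, in which the network name multi-nic-sample is an assumption.

```python
# Minimal sketch: a pod requesting secondary interfaces via the standard
# Multus network-selection annotation. "multi-nic-sample" stands in for a
# network definition managed by the Multi-NIC CNI controller.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "allreduce-worker",
        "annotations": {
            # Standard Multus annotation; with Multi-NIC CNI the referenced
            # attachment definitions are generated dynamically, not hand-written.
            "k8s.v1.cni.cncf.io/networks": "multi-nic-sample",
        },
    },
    "spec": {"containers": [{"name": "worker",
                             "image": "registry.example.com/allreduce:latest"}]},
}
core.create_namespaced_pod(namespace="default", body=pod)
```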
Tatsuhiro Chiba is a Senior Technical Staff Member (STSM) and manager at IBM Research, specializing in performance optimization and acceleration of large scale AI and HPC workloads on hybrid cloud. He is leading a project to enhance OpenShift performance and sustainability for AI and HPC by exploiting various cloud...
Sunyanan Choochotkaew works at IBM Research - Tokyo, specializing in cloud platform optimization. She actively contributes to various open-source projects, including Kepler, Multi-NIC CNI, and the CPE operator, where she is a maintainer. She has also made contributions...
As LLMs rapidly evolve, K8s’ topology management cannot meet the performance demands in several respects: 1. For new-generation high-density processors, NUMA affinity is insufficient to ensure inference performance. 2. The performance bottleneck has shifted from computation to networking; however, K8s does not consider the topology of heterogeneous resources like GPU and RDMA.
In this talk, He will introduce how ByteDance significantly improves LLM workload performance by enhancing topology-aware scheduling: 1. For nodes with high-density processors, achieving die-level affinity and implementing anti-affinity between memory bandwidth-intensive pods. 2. For pods within a training job, achieving inter-RDMA affinity at the ToR level to avoid switch congestion. 3. For inference workloads, achieving GPU-RDMA affinity at the PCIe switch level to enable GPUDirect RDMA for accelerated communication. 4. Achieving job-level topology affinity on top of the K8s scheduler, which operates at the pod level.
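As an illustrative sketch only (the production logic lives inside the scheduler, and these annotation names are hypothetical rather than Katalyst's actual API), topology intent could be expressed on a pod like this:

```python
# Illustrative sketch: expressing topology intent through pod annotations
# that a custom scheduler plugin could honor. Every "topology.example.com/*"
# name is hypothetical; resource names are likewise illustrative.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "llm-train-worker-0",
        "annotations": {
            "topology.example.com/numa-policy": "die-affinity",      # die-level binding
            "topology.example.com/memory-bw-class": "intensive",     # spread bw-heavy pods
            "topology.example.com/rdma-affinity": "same-tor",        # avoid switch congestion
            "topology.example.com/gpu-rdma-affinity": "same-pcie-switch",  # GPUDirect RDMA
        },
    },
    "spec": {"containers": [{
        "name": "trainer",
        "image": "registry.example.com/train:latest",
        "resources": {"limits": {"nvidia.com/gpu": "8"}},
    }]},
}
core.create_namespaced_pod(namespace="default", body=pod)
```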
He Cao is a senior software engineer on the Cloud Native team at ByteDance, a maintainer of Katalyst and KubeZoo, and a member of Istio. He has 5+ years of experience in the cloud native area. Since joining ByteDance, he has designed and implemented several critical systems for VKE...
Organizations are beginning to unlock significant value by integrating Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) into their business-critical processes. However, enterprises often face challenges in meeting the high expectations of GenAI-driven business outcomes. Bridging this gap requires meticulous planning around governance, continuous evaluation, seamless scaling, operational cost, and time-to-market. In this session, attendees will witness a live demonstration of a RAG application stack built with LangChain, Canopy, and a PostgreSQL vector database, all deployed on Kubernetes. Additionally, we will discuss leveraging GPU and TPU accelerators to enhance computational efficiency. The audience will also gain insights into MLOps strategies for data splitting, embeddings, retrieval, and prompt engineering. Join us to explore how to effectively leverage MLOps with Kubernetes to achieve scalable and impactful GenAI solutions.
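For orientation, here is a minimal sketch of the retrieve-then-generate core of such a stack, assuming LangChain's community PGVector store and an OpenAI-compatible client; connection details, collection, and model names are placeholders.

```python
# Minimal sketch of the RAG core: embed a query, retrieve from a PostgreSQL
# vector store, then ground the LLM's answer in the retrieved chunks.
# Connection string, collection, and model names are placeholders.
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import PGVector
from openai import OpenAI

store = PGVector(
    connection_string="postgresql+psycopg2://rag:rag@pgvector:5432/rag",
    collection_name="docs",
    embedding_function=OpenAIEmbeddings(),
)

def answer(question: str) -> str:
    chunks = store.similarity_search(question, k=4)  # retrieval step
    context = "\n\n".join(c.page_content for c in chunks)
    llm = OpenAI()  # any OpenAI-compatible endpoint
    resp = llm.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system",
             "content": "Answer using only the provided context."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```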
Director of Technology - DevOps and Cloud, Publicis Sapient
20+ years of successful IT and delivery management experience leading mission-critical infrastructure, software development, and implementation projects involving strategic business and technology change, delivering measurable financial results for the organization. Worked with Financial...
Madhav helps major enterprises drive innovation using modern application architectures, containers and DevOps. Madhav has been a speaker at conferences such as SpringOne, Cloud Foundry Summit and Oracle OpenWorld. He has co-authored a white paper on container security. Madhav currently...
LangChain has become one of the most popular frameworks for anyone building custom, generative AI-driven apps powered by LLMs that leverage RAG (Retrieval-Augmented Generation) for enhanced results. But like all data products, these applications are only as good as the organizational data fed into them, and we've all learned the hard way that the data is often far from perfect. In this hands-on tutorial you’ll learn how to build a reproducible AI application pipeline with Kubeflow, LangChain and lakeFS, widely adopted OSS tools in the ML & GenAI stack. By learning how to build a RAG chatbot, while iteratively tuning it for best results leveraging lakeFS’s temporal versions, you’ll come away with improved methods for data reproducibility for your custom AI apps that provide better data quality, alongside an improved user experience for your application users.
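As a minimal sketch of the reproducibility piece, a pipeline can read its source documents through lakeFS's S3-compatible gateway pinned to a specific ref; the endpoint, repository, and ref names below are placeholders.

```python
# Minimal sketch: read RAG source documents from a lakeFS repository through
# its S3-compatible gateway, pinning the pipeline to a specific ref so each
# chatbot build is reproducible. Endpoint, repo, and ref are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://lakefs.example.com",  # lakeFS S3 gateway
    aws_access_key_id="placeholder-key-id",     # lakeFS credentials
    aws_secret_access_key="placeholder-secret",
)

REPO = "rag-data"  # lakeFS repository, exposed as a bucket
REF = "main"       # branch or commit ID: pin a commit ID for reproducibility

objects = s3.list_objects_v2(Bucket=REPO, Prefix=f"{REF}/docs/")
corpus = [
    s3.get_object(Bucket=REPO, Key=obj["Key"])["Body"].read().decode()
    for obj in objects.get("Contents", [])
]
```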
Oz Katz is the CTO and co-creator of lakeFS, an open source platform that delivers resilience and manageability to object-storage-based data lakes. Oz engineered and maintained petabyte-scale data infrastructure at analytics giant SimilarWeb, which he joined...
At Intuit, ML teams faced challenges with processing and running inference on high-throughput streaming data. Connecting to various messaging systems like Kafka, Pulsar, and SQS proved to be a time-consuming and intricate process. Moreover, our ML teams required the ability to perform intermediate processing and execute inference as part of their workflows. To further complicate matters, scaling the processing and inference based on the volume of events introduced additional challenges. Based on these challenges, we created Numaflow, a K8s-native open-source platform for scalable event processing. It simplifies connecting to event sources, enables teams to do event processing and inference on streaming data without a steep learning curve, and integrates seamlessly with existing systems. This talk is for ML engineers, data scientists, and anyone interested in asynchronous inference on streaming data. We'll show how Numaflow overcomes these obstacles and streamlines inference on streaming data.
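To give a feel for the model, here is a minimal sketch of a Numaflow Pipeline wiring a Kafka source through an inference UDF to a sink, created with the Kubernetes Python client; broker, topic, and image names are placeholders, and fields follow the published v1alpha1 Pipeline spec.

```python
# Minimal sketch of a Numaflow Pipeline: Kafka source -> inference UDF ->
# log sink. Broker, topic, and image names are placeholders; field names
# follow the published numaflow.numaproj.io/v1alpha1 Pipeline spec.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

pipeline = {
    "apiVersion": "numaflow.numaproj.io/v1alpha1",
    "kind": "Pipeline",
    "metadata": {"name": "async-inference"},
    "spec": {
        "vertices": [
            {"name": "in",
             "source": {"kafka": {"brokers": ["kafka:9092"],
                                  "topic": "events"}}},
            {"name": "infer",  # user-defined function running the model
             "udf": {"container": {"image": "registry.example.com/infer:latest"}}},
            {"name": "out", "sink": {"log": {}}},
        ],
        "edges": [
            {"from": "in", "to": "infer"},
            {"from": "infer", "to": "out"},
        ],
    },
}

api.create_namespaced_custom_object(
    group="numaflow.numaproj.io", version="v1alpha1",
    namespace="default", plural="pipelines", body=pipeline,
)
```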
Vigith is a co-creator of Numaproj and Principal Software Engineer for the Intuit Core Platform team in Mountain View, California. One of Vigith's current day-to-day focus areas is the various challenges in building scalable data and AIOps solutions for both batch and high-throughput...
Sri Harsha Yayi is a Product Manager at Intuit, where he primarily focuses on the company's Modern SaaS Kubernetes platform, specifically within the event-driven systems domain. He is the PM for Numaflow, an open-source, Kubernetes native platform designed for the development of event-driven...
Large language models are often released as families of models with varying parameter counts and quantization. To reduce cost, inference services increasingly rely on dynamic model selection, preferring smaller models when possible. GPU vendors are on a journey to enable dynamic GPU slicing, making it possible for a workload to request a fraction of the compute and memory units in a GPU, and for the slices to be created and destroyed on demand without disrupting existing workloads. The onus is now on Kubernetes. The Device Management Working Group is hard at work to expose these capabilities. While vendor-agnostic slicing APIs do not exist yet, this talk demonstrates that incremental GPU slicing is possible today. We replace the Multi-Instance GPU manager, which only permits partitioning GPUs in bulk, with an open-source incremental-slicing controller without needing new APIs or changes to the device plugin. Come learn how to achieve incremental slicing in your GPU clusters.
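For orientation, a workload consumes a MIG slice today by requesting the slice's extended resource name (under NVIDIA's mixed MIG strategy); a minimal sketch, with the profile and image as examples:

```python
# Minimal sketch: a pod requesting a single 1g.5gb MIG slice instead of a
# whole GPU. The resource name follows NVIDIA's mixed MIG strategy; an
# incremental-slicing controller would carve the slice out on demand.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "small-llm-server"},
    "spec": {"containers": [{
        "name": "server",
        "image": "registry.example.com/llm-server:latest",  # placeholder image
        "resources": {"limits": {"nvidia.com/mig-1g.5gb": "1"}},
    }]},
}
core.create_namespaced_pod(namespace="default", body=pod)
```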
Abhishek is a Senior Software Engineer and Master Inventor at IBM Research and co-chairs the CNCF Batch System Initiative. He focuses on resource management, performance, and distributed computing for AI workloads in the cloud. Abhishek enjoys designing easy-to-use solutions for the cloud...
Dr. Olivier Tardieu is a Principal Research Scientist and Manager at IBM T.J. Watson, NY, USA. He joined IBM Research in 2007. His current research focuses on cloud-related technologies, including Serverless Computing and Kubernetes, as well as their application to Machine Learning...
Are you eager to fine-tune LLMs and run inference directly within your Kubernetes clusters? Do you want an API compatible with OpenAI to leverage the extensive GenAI ecosystem? If so, LLMariner (https://llmariner.ai) is what you need. It instantly builds a software stack that provides an OpenAI-compatible API for inference, fine-tuning, and model management. In this talk, we'll provide an overview of LLMariner and showcase its capabilities through practical use cases. Join us to learn how you can leverage LLMariner to enhance Kubernetes for your Generative AI workflows.
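Because the API is OpenAI-compatible, existing clients work by switching the base URL; a minimal sketch, where the endpoint, key, and model name are placeholders:

```python
# Minimal sketch: point the standard OpenAI client at an LLMariner endpoint.
# Base URL, API key, and model name are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://llmariner.example.com/v1",  # in-cluster inference API
    api_key="placeholder-key",
)

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # a model served in-cluster
    messages=[{"role": "user", "content": "Summarize what Kueue does."}],
)
print(resp.choices[0].message.content)
```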
Kenji is a chief architect at CloudNatix and has been working on large-scale distributed systems - especially cluster management systems - for over ten years. Most recently, he was a Principal Engineer at Nvidia, responsible for developing their deep learning training platform and...
The interactive nature of Jupyter notebooks has made them indispensable tools for data scientists and AI researchers, facilitating exploratory data analysis, prototyping, and model development. However, managing the cost of resource-intensive computations at different stages of the AI/ML lifecycle presents significant challenges. We leveraged Apache YuniKorn to design a resource management system tailored for notebook workloads, which incorporates fair sharing, user-specific policies, and budget constraints to allocate computational resources efficiently while adapting to both data preparation and model training stages. And thanks to the extensibility of JupyterLab, we offer rich displays next to the notebook, enabling data scientists to introspect resource usage in real time. This session presents cost-saving strategies for interactive development in Jupyter, using Kubeflow for model training and Spark for data preparation with the YuniKorn scheduler.
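For illustration, a notebook pod opts into YuniKorn scheduling and a team queue via the scheduler name and the labels described in YuniKorn's docs; the queue hierarchy and resource requests below are assumptions.

```python
# Minimal sketch: a Jupyter notebook pod scheduled by YuniKorn into a team
# queue, so fair sharing and quotas apply. The queue path and request sizes
# are illustrative; "queue"/"applicationId" labels follow YuniKorn's docs.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

notebook = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "ds-notebook",
        "labels": {
            "app": "jupyter",
            "queue": "root.data-science.notebooks",  # illustrative hierarchy
            "applicationId": "notebook-alice-001",
        },
    },
    "spec": {
        "schedulerName": "yunikorn",  # hand placement to YuniKorn
        "containers": [{
            "name": "notebook",
            "image": "jupyter/scipy-notebook:latest",
            "resources": {"requests": {"cpu": "2", "memory": "8Gi"}},
        }],
    },
}
core.create_namespaced_pod(namespace="default", body=notebook)
```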
Shravan is a senior software engineer at Apple with a passion for open source technologies. With a background in Mathematics and Computer Science, their current interests include MLOps, Scheduling in AI and Jupyter Notebooks.