Tuesday November 12, 2024 4:30pm - 4:55pm MST
Large language models are often released as families of models with varying parameter counts and quantization levels. To reduce cost, inference services increasingly rely on dynamic model selection, preferring smaller models when possible. GPU vendors are working toward dynamic GPU slicing, which lets a workload request a fraction of the compute and memory units in a GPU and lets slices be created and destroyed on demand without disrupting existing workloads. The onus is now on Kubernetes, and the Device Management Working Group is hard at work to expose these capabilities. While vendor-agnostic slicing APIs do not exist yet, this talk demonstrates that incremental GPU slicing is possible today. We replace the Multi-Instance GPU (MIG) manager, which only permits partitioning GPUs in bulk, with an open-source incremental-slicing controller, without requiring new APIs or changes to the device plugin. Come learn how to achieve incremental slicing in your GPU clusters.
Speakers
Abhishek Malvankar

Senior Software Engineer, IBM Research
Abhishek is a Senior Software Engineer and Master Inventor at IBM Research. He works closely with Red Hat as a Partner Engineer. He focuses on resource management, performance, and distributed computing for AI workloads in the cloud. He enjoys designing easy-to-use solutions for the cloud...
Olivier Tardieu

Principal Research Scientist, Manager, IBM
Dr. Olivier Tardieu is a Principal Research Scientist and Manager at IBM T.J. Watson, NY, USA. He joined IBM Research in 2007. His current research focuses on cloud-related technologies, including Serverless Computing and Kubernetes, as well as their application to Machine Learning...
Salt Palace | Level 1 | Grand Ballroom AC