Tuesday November 12, 2024 2:05pm - 2:30pm MST
As LLMs rapidly evolve, K8s' topology management cannot meet their performance demands in several respects: 1. On new-generation high-density processors, NUMA-level affinity is no longer sufficient to guarantee inference performance. 2. The performance bottleneck has shifted from computation to networking, yet K8s does not consider the topology of heterogeneous resources such as GPUs and RDMA NICs.

In this talk, He will introduce how ByteDance significantly improves LLM workload performance by enhancing topology-aware scheduling: 1. For nodes with high-density processors, enforce die-level affinity and anti-affinity between memory-bandwidth-intensive pods (sketched below). 2. For pods within a training job, enforce inter-RDMA affinity at the ToR level to avoid switch congestion. 3. For inference workloads, enforce GPU-RDMA affinity at the PCIe-switch level to enable GPUDirect RDMA for accelerated communication. 4. Achieve job-level topology affinity on top of the K8s scheduler, which operates at the pod level.
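As a rough illustration of item 1 only (the abstract does not include code, and this is not Katalyst's or ByteDance's actual implementation), the following self-contained Go sketch picks a die for a pod: it requires enough free CPUs on the die, steers memory-bandwidth-intensive pods away from dies that already host other bandwidth-heavy pods, and otherwise bin-packs. The Die type and pickDie helper are hypothetical names invented for this sketch.

```go
// Toy sketch of die-level placement with anti-affinity between
// memory-bandwidth-intensive pods. Illustrative only.
package main

import "fmt"

// Die models one die on a high-density processor (hypothetical type).
type Die struct {
	ID            int
	FreeCPUs      int
	BandwidthHogs int // bandwidth-intensive pods already placed on this die
}

// pickDie returns the index of a die with enough free CPUs. For
// bandwidth-intensive pods it prefers the die hosting the fewest other
// bandwidth-intensive pods (anti-affinity); for other pods it prefers the
// fullest feasible die, keeping whole dies free for later.
func pickDie(dies []Die, cpusNeeded int, bandwidthIntensive bool) (int, bool) {
	best, found := -1, false
	for i, d := range dies {
		if d.FreeCPUs < cpusNeeded {
			continue
		}
		if !found {
			best, found = i, true
			continue
		}
		if bandwidthIntensive {
			if d.BandwidthHogs < dies[best].BandwidthHogs {
				best = i
			}
		} else if d.FreeCPUs < dies[best].FreeCPUs {
			best = i
		}
	}
	return best, found
}

func main() {
	dies := []Die{
		{ID: 0, FreeCPUs: 8, BandwidthHogs: 2},
		{ID: 1, FreeCPUs: 16, BandwidthHogs: 0},
	}
	if i, ok := pickDie(dies, 4, true); ok {
		fmt.Printf("placing bandwidth-intensive pod on die %d\n", dies[i].ID)
	}
}
```

In a real system this decision would sit in a node-level resource manager or scheduler plugin and would also weigh NUMA distance and measured memory-bandwidth headroom; the sketch only shows the affinity/anti-affinity shape of the policy.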
Speakers
He Cao

Senior Software Engineer, ByteDance
He Cao is a senior software engineer on the Cloud Native team at ByteDance, a maintainer of Katalyst and KubeZoo, and a member of Istio. He has 5+ years of experience in the cloud native area. Since joining ByteDance, he has designed and implemented several critical systems for VKE...
Salt Palace | Level 1 | Grand Ballroom A