Managing large-scale batch workflows efficiently is critical for AI/ML workloads. Data preparation for training or fine-tuning models can involve a large number of steps, which makes these pipelines an excellent fit for Argo Workflows. However, Argo runs up against etcd's 1.5 MB object size limit, which restricts its ability to run truly large-scale workflows.

This talk will delve into the details of this limitation and its impact on AI/ML workflows, illustrating with examples how it has been a non-deterministic and frustrating bottleneck for users. To address this challenge, Argo introduced a feature that works around the etcd object size restriction: by offloading the bulk of the workflow status to an RDBMS and storing only a reference in etcd, Argo preserves its scaling capabilities while still adhering to Kubernetes' limitations.

Finally, the talk will provide a comprehensive guide to configuring and using the Argo offloading feature on AWS with Aurora PostgreSQL on RDS and EKS.
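As a preview of the configuration walkthrough, the sketch below shows roughly what enabling node status offload looks like in the Argo workflow-controller ConfigMap. It is a minimal, illustrative example only: the Aurora endpoint, database name, and Secret names are placeholders you would replace with your own values, and the exact ConfigMap layout can vary between Argo Workflows versions.

```yaml
# Excerpt of the workflow-controller-configmap (illustrative sketch).
# Host, database, and Secret names are placeholders, not real resources.
apiVersion: v1
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: argo
data:
  persistence: |
    # Offload large node status to the database; etcd keeps only a reference.
    nodeStatusOffLoad: true
    postgresql:
      host: my-aurora-cluster.cluster-example.us-east-1.rds.amazonaws.com  # placeholder Aurora PostgreSQL endpoint
      port: 5432
      database: argo
      tableName: argo_workflows
      userNameSecret:
        name: argo-postgres-config   # placeholder Secret holding DB credentials
        key: username
      passwordSecret:
        name: argo-postgres-config
        key: password
```

The session will cover how to provision the Aurora PostgreSQL database and credentials on EKS, and how to verify that workflow status is actually being offloaded rather than stored in etcd.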