A paper co-authored by Prof. Alex Lew has been selected as one of four "Outstanding Papers" at this year's Conference on Language Modeling (COLM 2025), held in Montreal in October.
Serving large language models (LLMs) at scale is complex. The largest modern LLMs exceed the memory and compute capacity of a single GPU, or even of a single multi-GPU node. As a result, inference workloads for ...