Meta Info

Homepage: https://www.usenix.org/conference/osdi25

Papers

LLM Inference

[2412.17246] Fast and Live Model Auto Scaling with O(1) Host Caching

Huawei & SJTU IPADS

Serverless Computing, Model Autoscaling

[2502.04563] WaferLLM: A Wafer-Scale LLM Inference System

Edinburgh (Luo Mai) & Microsoft

Wafer-scale LLM Inference

GPU Sharing

Preemptive Scheduling for Diverse XPUs using Multi-level Hardware Model

SJTU IPADS

Resource Allocation

[2412.11447] Zeal: Rethinking Large-Scale Resource Allocation with "Decouple and Decompose"
