Prefill-as-a-Service: KVCache of Next-Generation Models Could Go Cross-Datacenter Paper • 2604.15039 • Published Apr 22 • 3
π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows Paper • 2605.14678 • Published 5 days ago • 90
KVServe: Service-Aware KV Cache Compression for Communication-Efficient Disaggregated LLM Serving Paper • 2605.13734 • Published 11 days ago • 10
ElasticMM: Efficient Multimodal LLMs Serving with Elastic Multimodal Parallelism Paper • 2507.10069 • Published Nov 11, 2025 • 1