Posts
- FlashInfer-Bench: Building the Virtuous Cycle for AI-driven LLM Systems
- Sorting-Free GPU Kernels for LLM Sampling
- FlashInfer 0.2 - Efficient and Customizable Kernels for LLM Inference Serving
- Cascade Inference: Memory Bandwidth Efficient Shared Prefix Batch Decoding
- Accelerating Self-Attentions for LLM Serving with FlashInfer
