vAttention System Design: Dynamic KV-Cache with Contiguous Virtual Memory
Post date: June 12, 2025 · Post author: Text Generation · Post categories: contiguous-virtual-memory, dynamic-memory-allocation, gpu-memory, kv-cache-management, llm-inference, system-architecture, system-design, vattention

Self-Speculative Decoding Speeds for Multi-Token LLMs
Post date: June 6, 2025 · Post author: Large Models (dot tech) · Post categories: ai-efficiency, code-generation, inference-optimization, llm-decoding-speed, llm-inference, multi-token-models, multi-token-prediction, self-speculative-decoding

AI Innovations and Insights 22: LLM Inference, SubgraphRAG, and FastRAG
Post date: January 27, 2025 · Post author: Florian June · Post categories: ai, graphrag, large-language-models, llm-inference, retrieval-augmented-gen