Recent
inference-aware-ai

Speculative Decoding and the Model Choice: Lessons
Lessons on how the choice of model affects speculative decoding.
Inference-Aware AI, AI Engineering, Engineering Best Practices

Standing Up vLLM on a Single A10G: From First Boot to Dual-Model Deployment
Deploying vLLM with Docker on AWS using Terraform.
AI Engineering, Inference-Aware AI
More Articles

Inference-Aware AI: Working Definitions
A glossary of the terms behind inference-aware agents, covering the core ideas, agent types, awareness dimensions, and platform components of cost-efficient AI systems.
AI Engineering, Inference-Aware AI, Software Philosophy

A Hypothesis: Inference-Aware Agents Could Be the Next Big Leap in AI Efficiency
An introduction to the hypothesis that an inference-aware platform can make AI agents faster, cheaper, and more effective by optimizing how they decide, act, and use resources.
AI Engineering, Inference-Aware AI, Software Philosophy