Recent
ai-engineering

Speculative Decoding and the Model Choice: Lessons
Speculative decoding model differences.
Inference-Aware AI, AI Engineering, Engineering Best Practices

Standing Up vLLM on a Single A10G: From First Boot to Dual-Model Deployment
Deploying vLLM with Docker on AWS using Terraform.
AI Engineering, Inference-Aware AI
More Articles

Inference-Aware AI: Working Definitions
A glossary of terms that define the concept of inference-aware agents, breaking down the core ideas, agent types, awareness dimensions, and platform components behind cost-efficient AI systems.
AI Engineering, Inference-Aware AI, Software Philosophy

A Hypothesis: Inference-Aware Agents Could Be the Next Big Leap in AI Efficiency
An introduction to the hypothesis that AI agents can be made faster, cheaper, and more effective through an inference-aware platform that optimizes how they decide, act, and use resources.
AI Engineering, Inference-Aware AI, Software Philosophy

Scaling Engineering with AI from 0 to 50
What it really takes to scale an engineering team from 0 to 50 inside a 100+ person company in today’s AI-native world.
AI Engineering, Engineering Team Scaling, Leadership