Nicholas Bannister

Recent

ai-engineering

Speculative Decoding and the Model Choice: Lessons

How model choice affects speculative decoding.

Inference-Aware AI, AI Engineering, Engineering Best Practices

Standing Up vLLM on a Single A10G: From First Boot to Dual-Model Deployment

Deploying vLLM with Docker on AWS using Terraform.

AI Engineering, Inference-Aware AI

More Articles

Inference-Aware AI: Working Definitions

A glossary of terms that define the concept of inference-aware agents, breaking down the core ideas, agent types, awareness dimensions, and platform components behind cost-efficient AI systems.

AI Engineering, Inference-Aware AI, Software Philosophy

A Hypothesis: Inference-Aware Agents Could Be the Next Big Leap in AI Efficiency

An introduction to the hypothesis that AI agents can be made faster, cheaper, and more effective through an inference-aware platform that optimizes how they decide, act, and use resources.

AI Engineering, Inference-Aware AI, Software Philosophy

Scaling Engineering with AI from 0 to 50

What it really takes to scale an engineering team from 0 to 50 inside a 100+ person company in today’s AI-native world.

AI Engineering, Engineering Team Scaling, Leadership