Context engineering for production LLMs

"Prompt engineering" undersells the real work. The model sees only its context window, so deciding what goes into it, in what order, and within what token budget is most of the job.

Treat the window as a cache with a cost function: every token competes for attention and money.
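The money side of that cost function can be sketched as below. This is a minimal illustration under two labeled assumptions that are not from the original: `pricePerMillionTokens` is a hypothetical placeholder parameter (check your provider's actual rates), and `estimateTokens` uses the rough "about 4 characters per token" heuristic for English text. A real tokenizer will give different counts.

```typescript
// Rough token estimate: ~4 characters per token for English text.
// This is a heuristic assumption; use the model's real tokenizer in production.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// Hypothetical cost function: total estimated input tokens times a
// per-million-token price. pricePerMillionTokens is a placeholder,
// not any provider's actual rate.
function windowCost(
  chunks: string[],
  pricePerMillionTokens: number,
): { tokens: number; cost: number } {
  const tokens = chunks.reduce((sum, chunk) => sum + estimateTokens(chunk), 0)
  return { tokens, cost: (tokens / 1_000_000) * pricePerMillionTokens }
}
```

Every chunk you add raises both numbers, which is why a budget has to be enforced before the request is sent rather than discovered from the bill.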

budget.ts

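// Assumed shape of a context chunk; the post does not define Part.
interface Part {
  text: string
  tokens: number   // pre-computed token count for this chunk
  priority: number // higher = more important
}

// Greedily keep the highest-priority parts that fit in maxTokens.
// A part that would overflow is skipped whole, never truncated, and
// smaller lower-priority parts can still use the remaining space.
// The result is in priority order, not input order.
function fitToBudget(parts: Part[], maxTokens: number): Part[] {
  const sorted = [...parts].sort((a, b) => b.priority - a.priority)
  const out: Part[] = []
  let used = 0
  for (const part of sorted) {
    if (used + part.tokens > maxTokens) continue
    out.push(part)
    used += part.tokens
  }
  return out
}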

Drop low-priority context before you hit the limit — a truncated mid-sentence chunk is worse than no chunk at all.
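`fitToBudget` returns parts sorted by priority, which can scramble a prompt whose sections are meant to be read in sequence. One possible variant, sketched here under the assumption that each part's position in the input array is its intended reading order: select greedily by priority as before, then emit the survivors in their original order. Oversized parts are still dropped whole. The name `fitToBudgetInOrder` and the `text` field on `Part` are hypothetical, not from the original.

```typescript
// Assumed shape of a context chunk (not defined in the original post).
interface Part {
  text: string
  tokens: number
  priority: number // higher = more important
}

// Hypothetical variant: select greedily by priority, return kept parts in input order.
function fitToBudgetInOrder(parts: Part[], maxTokens: number): Part[] {
  // Pair each part with its original index so reading order can be restored.
  const indexed = parts.map((part, index) => ({ part, index }))
  // Array.prototype.sort is stable (ES2019+), so equal priorities keep input order.
  indexed.sort((a, b) => b.part.priority - a.part.priority)

  const kept: { part: Part; index: number }[] = []
  let used = 0
  for (const entry of indexed) {
    // Skip a whole part that would overflow; never truncate it mid-chunk.
    if (used + entry.part.tokens > maxTokens) continue
    kept.push(entry)
    used += entry.part.tokens
  }

  // Restore the original reading order before assembling the prompt.
  kept.sort((a, b) => a.index - b.index)
  return kept.map((entry) => entry.part)
}

const parts: Part[] = [
  { text: "instructions", tokens: 40, priority: 3 },
  { text: "retrieved doc", tokens: 90, priority: 1 },
  { text: "examples", tokens: 30, priority: 2 },
  { text: "user question", tokens: 20, priority: 4 },
]
const picked = fitToBudgetInOrder(parts, 100)
```

With a 100-token budget this keeps `instructions`, `examples`, and `user question`, in that order, and drops the 90-token `retrieved doc` whole. Plain `fitToBudget` would keep the same three parts but put `user question` first.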