"Prompt engineering" undersells the real work. The model only sees the context window — engineering what goes into it, in what order, under a token budget, is most of the job.
Treat the window as a cache with a cost function: every token competes for attention and money.
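
    // Assumed shape of one piece of context.
    interface Part {
      tokens: number   // precomputed token count
      priority: number // higher means more important
    }

    // Keep the highest-priority parts that fit within maxTokens.
    function fitToBudget(parts: Part[], maxTokens: number): Part[] {
      // Copy before sorting so the caller's array is not mutated.
      const sorted = [...parts].sort((a, b) => b.priority - a.priority)
      const out: Part[] = []
      let used = 0
      for (const part of sorted) {
        // Skip a part that would overflow the budget; a smaller one may still fit.
        if (used + part.tokens > maxTokens) continue
        out.push(part)
        used += part.tokens
      }
      return out
    }

Drop low-priority context before you hit the limit: a chunk truncated mid-sentence is worse than no chunk at all.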
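
Since the order of the window matters as much as its contents, one possible variation is to select by priority but emit the survivors in their original order. The sketch below is only an illustration under assumptions: the `id` field, the `fitToBudgetInOrder` name, and the sample values are hypothetical, and token counts are taken as already computed.

```typescript
// Hypothetical part shape; `id` is added here only to make the output readable.
interface Part {
  id: string
  tokens: number   // precomputed token count
  priority: number // higher means more important
}

// Select parts by priority, then return them in their original order.
function fitToBudgetInOrder(parts: Part[], maxTokens: number): Part[] {
  // Remember each part's original position before sorting by priority.
  const ranked = parts
    .map((part, index) => ({ part, index }))
    .sort((a, b) => b.part.priority - a.part.priority || a.index - b.index)

  const kept: { part: Part; index: number }[] = []
  let used = 0
  for (const entry of ranked) {
    // Skip whole parts that do not fit; never truncate one.
    if (used + entry.part.tokens > maxTokens) continue
    kept.push(entry)
    used += entry.part.tokens
  }

  // Restore the original order so the prompt reads as it was assembled.
  return kept.sort((a, b) => a.index - b.index).map((entry) => entry.part)
}

// Illustrative data, not taken from any real workload.
const example: Part[] = [
  { id: "system", tokens: 40, priority: 10 },
  { id: "old-chat", tokens: 70, priority: 1 },
  { id: "retrieved-doc", tokens: 50, priority: 5 },
  { id: "question", tokens: 20, priority: 9 },
]

console.log(fitToBudgetInOrder(example, 120).map((p) => p.id))
// → [ 'system', 'retrieved-doc', 'question' ]
```

With a budget of 120 tokens the low-priority `old-chat` part is dropped whole, and the question still comes last, where it was placed. Whether original order is the right order is a separate design choice; this sketch only keeps selection from reshuffling it.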