Writing · Muhammad Farooq

Writing

Essays on retrieval, language models, and the engineering around them. Most pair with a video on the channel.

jun 2026 · Harness engineering: why agent performance now lives outside the model
Same model, same benchmark, a sixfold difference in performance. Two March 2026 papers show that the code around the model now matters more than the model itself. Here is what they found.

jun 2026 · What is an agent harness? The nine components of a great one
A harness is the fixed architecture that turns a model into an agent. What it is, how it differs from a framework, and the nine components every modern harness needs.

jun 2026 · How agent harnesses manage context: cap, slice, search, store
What occupies an agent's context window and the four moves harnesses use when content does not fit: cap it, slice it, search it, or store it elsewhere.

jun 2026 · Compaction is the hardest problem in agent engineering
Why agent harnesses summarize old history, what a careless summary destroys, the failure modes that follow, and the patterns that make compaction safe.

jun 2026 · DeepSeek visual primitives: teaching models to reason with a cursor
Notes on DeepSeek's briefly public paper, Thinking with Visual Primitives: boxes, points, and paths placed inside the reasoning trace, and an honest look at its limits.

jun 2026 · DiffusionGemma: what Google's open text diffusion model actually changes
Notes on DiffusionGemma, Google's first open-weight text diffusion model: how block diffusion refines a 256-token canvas in parallel, the official speed and benchmark numbers, and what it takes to run locally.

jun 2026 · DwarfStar 4: how a 284B model runs on a MacBook
A 284B-parameter model needs 568 GB when stored at 16 bits per parameter. DwarfStar runs it on 128 GB machines at usable speeds. The quantization recipe, SSD streaming, and the numbers.

jun 2026 · How to evaluate an agent harness
Harness configurations cluster at a 74-76% resolve rate while cost varies by a factor of fourteen. A five-step method for judging harnesses on accuracy and cost.

jun 2026 · Loop engineering: what it is, when to use it, and when to stay away
Loop engineering means designing systems that prompt your agents instead of prompting them yourself. What a loop is, what a serious one needs, and the caveats that matter.

jun 2026 · RAG beyond similarity search: how a modern retrieval pipeline works
Traditional RAG embeds chunks and hopes similarity search finds the right ones. What replaced it: hybrid retrieval, reranking, enrichment, and verification, with localGPT as a working example.

jun 2026 · Sub-agents: when one context window is not enough
Why single-context agents hit a wall, how harnesses isolate work in child agents with the spawn-restrict-collect pattern, and when delegation backfires.

jun 2026 · Tools vs skills vs MCP: how agents acquire capabilities
Tools are primitives. Skills are knowledge. MCP is neither: it is a protocol that connects external tool servers to any harness. How the three fit together.