<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator><link href="https://derekji.github.io/blog/feed.xml" rel="self" type="application/atom+xml" /><link href="https://derekji.github.io/blog/" rel="alternate" type="text/html" /><updated>2026-04-15T02:08:04+00:00</updated><id>https://derekji.github.io/blog/feed.xml</id><title type="html">ZhigangJi</title><subtitle>Senior Software Engineer | Expert in .NET Ecosystem, AI Agents &amp; AI-Augmented Engineering</subtitle><author><name>Zhigang Ji</name></author><entry><title type="html">From MCP to CLI - A Paradigm Shift in Enterprise-Grade AI Agent Architecture</title><link href="https://derekji.github.io/blog/2026/04/12/mcp-vs-cli-paradigm-shift-en.html" rel="alternate" type="text/html" title="From MCP to CLI - A Paradigm Shift in Enterprise-Grade AI Agent Architecture" /><published>2026-04-12T00:00:00+00:00</published><updated>2026-04-12T00:00:00+00:00</updated><id>https://derekji.github.io/blog/2026/04/12/mcp-vs-cli-paradigm-shift-en</id><content type="html" xml:base="https://derekji.github.io/blog/2026/04/12/mcp-vs-cli-paradigm-shift-en.html"><![CDATA[<p>When designing AI Agent systems, one overlooked detail is silently reshaping the entire industry’s technical foundation.</p>

<p>This detail is called <strong>Context Budgeting</strong>.</p>

<h2 id="the-root-problem-the-elegance-of-mcp-and-its-hidden-cost">The Root Problem: The Elegance of MCP and Its Hidden Cost</h2>

<p>Model Context Protocol (MCP), as a relatively young standard, introduced a unified paradigm for integrating AI with application systems. The intent is sound—by defining clear JSON Schemas, it enables models to understand and use arbitrary tools. However, in high-concurrency, high-complexity enterprise environments, this “complete declaration” approach is revealing fundamental limitations.</p>

<p>Picture this scenario: You have 15 microservices, each exposing 7-8 API endpoints on average. Add integrations with GitHub, Azure DevOps, Atlassian, and other third-party platforms. The result: 100+ available “tools” in your system.</p>

<p>Each tool requires a detailed JSON Schema definition—parameters, types, validation rules, return value structures. MCP’s elegance lies in its completeness. But this completeness carries a hidden cost.</p>

<h3 id="cost-one-the-economics-of-tokens">Cost One: The Economics of Tokens</h3>

<p>Let’s do a simple calculation. A moderately complex API specification averages 150-300 tokens. For 100 tools, that’s 15,000-30,000 tokens of standing overhead, billed on every request—even with models like Claude or GPT-4, where input tokens typically cost a third of output tokens. In practice:</p>

<ul>
  <li>The “startup cost” of each request has already consumed a significant portion of the context window</li>
  <li>Within 200K or 100K context limits, the actual space available for problem-domain reasoning is severely eroded</li>
  <li>In production environments handling long conversations or complex multi-step reasoning, this cost rapidly becomes unacceptable</li>
</ul>
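<p>A quick check of that standing overhead, using the tool count and per-schema sizes assumed above (these figures are the article’s assumptions, not measurements):</p>

```shell
#!/usr/bin/env bash
# Standing schema overhead for an eagerly-loaded tool catalog.
tools=100
per_schema_low=150
per_schema_high=300
context=200000     # a 200K context window

echo "overhead (low):  $(( tools * per_schema_low )) tokens"
echo "overhead (high): $(( tools * per_schema_high )) tokens"
# Share of the window consumed before any problem-domain reasoning starts:
echo "window consumed up front: up to $(( tools * per_schema_high * 100 / context ))%"
```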

<p>This is not merely an economic problem—it’s fundamentally an <strong>information density</strong> problem. The model’s attention mechanism experiences capability degradation when processing extremely long inputs. More tokens don’t mean better understanding; they risk burying critical information under mountains of Schema definitions.</p>

<h3 id="cost-two-cognitive-load-and-attention-fragmentation">Cost Two: Cognitive Load and Attention Fragmentation</h3>

<p>From an architectural perspective, the Transformer attention mechanism faces a fundamental trade-off when processing long sequences: <strong>signal-to-noise ratio</strong>.</p>

<p>MCP forces the model to browse all 100+ tool definitions with nearly identical probability weights before executing any operation. This means:</p>

<ol>
  <li><strong>Information Competition</strong>: The model must distinguish “which tools truly matter for this task” from a sea of parameter definitions—a process that itself consumes cognitive resources.</li>
  <li><strong>Instruction Following Degradation</strong>: Research shows that when input length exceeds 60-70% of the context window, the model’s “Instruction Following” capability drops significantly.</li>
  <li><strong>Reasoning Trajectory Pollution</strong>: When generating reasoning steps, the model is easily distracted by irrelevant tool definitions, reducing reasoning path efficiency.</li>
</ol>

<p>For enterprise applications requiring high reliability and consistency, this “cognitive pollution” is unacceptable.</p>

<h2 id="the-quiet-renaissance-of-cli">The Quiet Renaissance of CLI</h2>

<p>Against this backdrop, we’re witnessing an interesting counter-trend: mature AI Agent frameworks (like OpenClaw) increasingly turn toward <strong>CLI (Command Line Interface)</strong> rather than MCP when handling large-scale tool problems.</p>

<p>This may look like “technical regression.” In fact, it reflects a deeper insight: <strong>for any complex system, constraints themselves are features</strong>.</p>

<h3 id="core-advantage-one-zero-entropy-intrinsic-knowledge-base">Core Advantage One: Zero-Entropy Intrinsic Knowledge Base</h3>

<p>Consider this fact: Most modern LLMs have encountered millions of lines of Bash scripts, Git commands, Kubernetes YAML, and Azure CLI directives in their training sets. These tools form the “lingua franca” of internet infrastructure operations.</p>

<p>For these <strong>public CLIs</strong> (git, kubectl, az, gh, etc.), models possess prior knowledge. Moreover, these command-line tools follow consistent design patterns:</p>

<ul>
  <li>The standard <code class="language-plaintext highlighter-rouge">&lt;command&gt; &lt;subcommand&gt; --flag value</code> structure</li>
  <li>Self-documenting capability through <code class="language-plaintext highlighter-rouge">--help</code></li>
  <li>Immediate learning ability via <code class="language-plaintext highlighter-rouge">man</code> or <code class="language-plaintext highlighter-rouge">--help</code></li>
</ul>

<p>In other words, the model doesn’t need us to re-explain “what is git” in the prompt—it’s already a resident in the model’s knowledge base. We simply tell the Agent “use git commands to complete this task,” and the model can automatically reason about potential command combinations.</p>

<p>This <strong>“Pull Mode” rather than “Push Mode”</strong> approach to tool discovery yields a qualitative advantage in token cost.</p>

<h3 id="core-advantage-two-pipelinesthe-breakthrough-in-execution-density">Core Advantage Two: Pipelines—The Breakthrough in Execution Density</h3>

<p>Now let’s discuss why pipelines are CLI’s greatest advantage over MCP.</p>

<p>Consider a real workflow:</p>

<p><strong>Scenario</strong>: Find error logs in a microservice, extract related request IDs, query the trace system for complete call chains of these IDs, then restart the affected service instances.</p>

<p>How would this unfold in MCP mode?</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Agent: I need to view logs from microservice A
[Model calls Tool: GetLogs(service='A', filter='error')]
Result: Returns 100 log records with request IDs and timestamps

Agent: I need to extract request IDs from these logs
[Model calls Tool: ParseLogs(logs=[...], pattern='request_id')]
Result: Returns 20 unique request IDs

Agent: I need to query the trace system for these IDs' call chains
[Model calls Tool: QueryTraces(request_ids=[...])]
Result: Returns call chain data

Agent: Based on the call chains, I need to restart these services
[Model calls Tool: RestartServices(services=[...])]
Result: Services restarted
</code></pre></div></div>

<p><strong>This requires 4 model roundtrips</strong>. Each roundtrip not only consumes tokens but introduces <strong>intermediate decision points</strong>. The model must confirm intent, evaluate results, and decide the next step at each stage. Any uncertainty or misunderstanding in this process can derail the trajectory.</p>

<p>In CLI mode:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ms logs <span class="nt">--service</span> A <span class="nt">--filter</span> error | <span class="se">\</span>
  <span class="nb">grep</span> <span class="nt">-oP</span> <span class="s1">'request_id=\K[^,]*'</span> | <span class="se">\</span>
  <span class="nb">sort</span> <span class="nt">-u</span> | <span class="se">\</span>
  xargs <span class="nt">-I</span> <span class="o">{}</span> ms traces <span class="nt">--request-id</span> <span class="o">{}</span> | <span class="se">\</span>
  jq <span class="nt">-r</span> <span class="s1">'.services[]'</span> | <span class="se">\</span>
  <span class="nb">sort</span> <span class="nt">-u</span> | <span class="se">\</span>
  xargs <span class="nt">-I</span> <span class="o">{}</span> ms restart <span class="nt">--service</span> <span class="o">{}</span>
</code></pre></div></div>

<p>The Agent generates this command once. The system then <strong>executes deterministically</strong> through the pipeline. No intermediate decision points, no roundtrip delays or uncertainty.</p>

<p>This is the difference in <strong>Execution Density</strong>. CLI achieves “logic fusion” through pipelines, compressing steps scattered across multiple reasoning phases into a single execution instruction.</p>

<p>From a cost perspective:</p>
<ul>
  <li>MCP approach: 4 roundtrips × ~500 tokens/roundtrip = 2000+ tokens consumed</li>
  <li>CLI approach: 1 generation + standard command execution = just the pipeline command’s token cost</li>
</ul>

<p>This is no longer in the realm of ordinary optimization—from 2,000+ tokens across multiple roundtrips down to a single pipeline instruction of a few hundred tokens, roughly a 10x difference. And when we reach tool definition costs later, where the differences hit 444x or even 800x, the magnitude of change becomes truly striking.</p>

<p>Beyond tokens, the deeper advantage is <strong>deterministic failure recovery</strong>. In MCP mode, any step’s failure requires the model to re-evaluate and adjust. In CLI mode, any command failure is natively captured by bash (via <code class="language-plaintext highlighter-rouge">set -e</code> or error handling), providing immediate feedback. This clear fault boundary makes reliable error handling easier to build.</p>
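<p>The failure boundary can be made concrete. With bash’s <code class="language-plaintext highlighter-rouge">pipefail</code> option, a failing middle stage surfaces as the whole pipeline’s exit code—no model reasoning required to notice it (here <code class="language-plaintext highlighter-rouge">false</code> stands in for any broken command):</p>

```shell
#!/usr/bin/env bash
# Deterministic failure capture: with pipefail, a failing middle stage
# becomes the pipeline's exit code instead of being silently swallowed.
set -o pipefail

# 'false' stands in for any broken command in the middle of a pipeline.
if echo "log line" | false | sort; then
  echo "pipeline succeeded"
else
  echo "pipeline failed with exit code $?"
fi
```

Without `pipefail`, the pipeline’s status would be `sort`’s successful exit, and the failure would be invisible to the caller.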

<h3 id="execution-density-comparison-diagram">Execution Density Comparison Diagram</h3>

<p>Let’s visualize the fundamental difference between these two modes:</p>

<p><img src="https://kroki.io/plantuml/svg/eNp9kd1OwzAMhe_7FH6BCa296wUCFfEjrdIEXE5CIfW6iMypEkcwId4dp9lGEWNXdY5PTuyvV4GV57i1hUfNinqL0DbLl9Z1CJ8FgOqRGFSA61TMRckFzGaXwM7ZeQ2NshaepYbUH8WxnZ01PCJHT_IJ0TKcyCinGeU-ozyTUf7NqKYZ1T6jOpORPOQYwZt-w-DWR18F3kXq2JshrEhOqIIjQz3onbYo2r1ckQfekEC7wCtqJdEMAm-tjI0eYXCGOBRfxQRss3j4D-xxobx12KC1NdwhoVcy4tIMaA2huMbWaNLWyLB62yWi6XBQBecgF7JaHtRqqmYwLvIQud4TkU4WfqBJ0lPUGkO4uJXFThMT0xz6PKpxtKIbZPRbQyaw0YAfqGNuLNz7L2qNReUFWfodrwm58jtB9g3zD9VN" alt="Diagram 1" /></p>

<h2 id="enterprise-reality-the-challenge-of-private-tools">Enterprise Reality: The Challenge of Private Tools</h2>

<p>So far, we’ve discussed advantages of “public CLIs.” But within enterprises, the situation is more complex.</p>

<p>Most enterprises maintain proprietary microservice orchestration toolsets. These tools aren’t in the model’s training set and can’t rely on “intrinsic knowledge.” If we simply switch to CLI without proper discovery and self-description mechanisms, the problem becomes: <strong>How can the Agent discover and understand these private commands?</strong></p>

<p>If we don’t explicitly tell the LLM how each command works and what parameters it accepts, the model can’t judge how to call it correctly or which command to choose. But if we write detailed documentation for each command, that documentation’s volume and complexity often rivals what MCP’s Descriptions (JSON Schema) would require.</p>

<p>Merely replacing MCP with CLI without accompanying capability discovery and self-description mechanisms doesn’t fundamentally solve the integration challenge.</p>

<p>This is where design thinking needs to shift. My answer is: <strong>Router CLI architecture combined with dynamic self-description</strong>.</p>

<h3 id="solution-unified-router-cli--dynamic-self-description">Solution: Unified Router CLI + Dynamic Self-Description</h3>

<p>Suppose we design a unified command entry point:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ms &lt;service-name&gt; &lt;action&gt; <span class="o">[</span><span class="nt">--option</span> value]
</code></pre></div></div>

<p>This <code class="language-plaintext highlighter-rouge">ms</code> (Microservices) command acts as a <strong>central router</strong> with clear responsibilities:</p>

<ol>
  <li><strong>Intent Routing</strong>: Determine the target microservice based on service-name</li>
  <li><strong>Action Dispatch</strong>: Call the specific microservice implementation based on action</li>
  <li><strong>Parameter Passthrough</strong>: Forward options and values to downstream services</li>
</ol>
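<p>A minimal sketch of such a router, with the service handlers stubbed out locally (the <code class="language-plaintext highlighter-rouge">ms</code> name and service set follow the article’s running example; a real router would forward each call to the microservice behind the name):</p>

```shell
#!/usr/bin/env bash
# Minimal 'ms' router sketch: route on service name, dispatch on action,
# pass remaining options through. All handlers here are stubs.
ms() {
  local service="$1" action="$2"
  shift 2
  case "$service" in
    user)
      case "$action" in
        list) echo '{"users":["alice","bob"]}' ;;   # stub response
        *)    echo "unknown action: $action" >&2; return 2 ;;
      esac ;;
    payment)
      case "$action" in
        card) echo '{"card":"created"}' ;;          # stub response
        *)    echo "unknown action: $action" >&2; return 2 ;;
      esac ;;
    *)
      echo "unknown service: $service" >&2; return 2 ;;
  esac
}

ms user list   # prints the stubbed user-service response
```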

<p><strong>Key Insight: From “Full Pre-Loading” to “On-Demand Hot Loading”</strong></p>

<p>Let’s frame the comparison in familiar engineering terms—eager versus lazy loading:</p>

<p><strong>MCP’s “Eager Loading” Mode:</strong></p>

<p>The moment the Agent starts working, all 100 tools’ complete JSON Schemas must enter Context. This resembles requiring someone to memorize every page of a 2000-page reference manual before taking an exam—regardless of whether the exam covers those sections. The result:</p>

<ul>
  <li><strong>Constant Context Consumption</strong>: Tens of thousands of tokens (80,000 in the 400-operation catalog quantified later) occupied unconditionally, impossible to release</li>
  <li><strong>Intensified Information Competition</strong>: The model’s every decision involves weighing among 100 complete definitions, fragmenting attention</li>
  <li><strong>Cold Tool Pollution</strong>: Even if the session only uses 5 tools, the remaining 95 definitions’ mere presence constitutes reasoning interference</li>
</ul>

<p>This is why MCP exhibits “token cost rigidity” in high-complexity systems.</p>

<p><strong>Router CLI’s “Lazy Loading” Mode:</strong></p>

<p>The system initially just says “I have an <code class="language-plaintext highlighter-rouge">ms</code> command.” Let’s trace a real enterprise session workflow:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>【Step 0】Agent initialization
  Context contains only the `ms` entry point (~10 tokens)

【Step 1】User request: List all active users
  Agent executes: ms user --help
  → Context loads: user-service documentation (~50 tokens)
  → Agent reasoning: OK, I understand. Execute ms user list --filter "status=active"
  
【Step 2】Continued user operations
  Agent executes: ms user get --user-id 123
  → Note: No need to reload help; pattern is known

【Step 3】New requirement emerged: Initialize payment flow
  User request: Create a virtual card for the user
  Agent executes: ms payment --help
  → Context loads: payment-service documentation (~50 tokens)
  → Context pruning: Remove user-service help definition
     * In MCP: Would require client re-initialization at huge cost
     * In CLI: Implemented via system prompt (e.g., "Discard help for services not used in last 2 steps")
     * Framework auto-manages context lifecycle; developers don't intervene
  → Note: user-service patterns remain in Agent's execution memory; no reload needed

【Step 4】Execute payment operation
  Agent executes: ms payment card create --user-id 123 --card-type virtual
  → Success

【Session Summary】
  - Involved business domains: 2 (user + payment)
  - Actual context consumption: 10 + 50 + 50 = 110 tokens
  - MCP mode for equivalent session: Constant 80,000 tokens
  - Savings ratio: ~99.86%
</code></pre></div></div>

<p>This comparison illustrates the <strong>fundamental difference</strong>: In any real enterprise session, the Agent typically operates within 2-3 logical business domains. MCP forces permanent retention of all 100 domain definitions in Context. Router CLI lets Context dynamically adjust according to <strong>temporal/task flow</strong>, preserving only “hot” information.</p>
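<p>The “load help once per service” behavior in the trace above can be sketched agent-side with a small cache. All names here are illustrative, and <code class="language-plaintext highlighter-rouge">fetch_help</code> is a stub for running <code class="language-plaintext highlighter-rouge">ms &lt;service&gt; --help</code>:</p>

```shell
#!/usr/bin/env bash
# Sketch of once-per-service lazy help loading (names are illustrative).
declare -A help_cache          # service -> help text already in context

fetch_help() {                 # stub: a real agent would run `ms "$1" --help`
  echo "usage: ms $1 <action> [--option value]"
}

ensure_help() {
  local service="$1"
  if [ -n "${help_cache[$service]:-}" ]; then
    echo "cache hit: $service"             # help already in context
  else
    help_cache[$service]=$(fetch_help "$service")
    echo "loaded help: $service"           # ~50 tokens enter context once
  fi
}

ensure_help user       # first touch loads help
ensure_help user       # repeat touch adds nothing
ensure_help payment    # new domain loads on demand
```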

<p>Let’s visualize how this 99.86% savings is achieved:</p>

<p><img src="https://kroki.io/plantuml/svg/eNp9k89q3DAQxu9-iiFQ6NJ1au_mz9qQEFhyKCQ0kIZefFHsWUeNPBLWaNN9kj5QX6xjOwnO1uzBHqHxfN9vRvKVZ9VyaEzEmg3C2hLjb4YHr2qE71ts4R6915bgRm-QdYM53K7vYOthffMtipyU61I7RdzvKw9HXby1FR59yMrnfbaLQzZSgS2F5hHbqKuJLzuJHO4ZHST5O8wFrJJ5kiTA9hnJF_RZGQNpv2GNB2NVhdUs6pRFQ8KURjqq_xU8AxK3O3BWE8-i_wHSeHEIIXiZjXXYKpbp-An3PYGzUbGwfIHTBJ7QuCnvZXxyyNupXSP0I_s5PAYuqIcahuJZy5BelESqwTtV4gTjnlGaTkBaUw0LwheBJcsIra6fxH8zMP-wrMybTNwLu9ZWoeS8oFXyDH__wInILxejRmIv34nPT0GUS3WezVfZhxRId9Ct9j17_IOeA_vZ-0v6msFXYVDbuqALWCzPj09fO4VGyQ2QB6spnztsexwplWHXQjo6yOvNRq43UrnLIcuOs-UnaLFjkDMpSBOUoQlGjmiLUL5OubSeoyukSv66fxW_HBE=" alt="Diagram 2" /></p>

<p>This reveals a critical insight: <strong>MCP’s cost isn’t just constant—it’s cumulative</strong>. Each operation occurs within an 80,000-token quagmire. CLI maintains context at extremely low levels throughout the session, genuinely liberating reasoning space.</p>

<p><strong>Second Layer: Leveraging “Form Semantics” to Reduce Description Burden</strong></p>

<p>You might ask: Aren’t help text and JSON Schema essentially the same? Why does help save so many tokens?</p>

<p>The answer: <strong>Models possess innate heuristic understanding of CLI form</strong>.</p>

<p>When encountering <code class="language-plaintext highlighter-rouge">ms user list --filter "status=active" --limit 100</code>, the model doesn’t need detailed explanation:</p>
<ul>
  <li>It knows <code class="language-plaintext highlighter-rouge">list</code> typically means “query a list”</li>
  <li>It understands <code class="language-plaintext highlighter-rouge">--filter</code> is a filter parameter</li>
  <li>It recognizes <code class="language-plaintext highlighter-rouge">"status=active"</code> follows Key=Value convention</li>
  <li>It infers <code class="language-plaintext highlighter-rouge">--limit 100</code> restricts result count</li>
</ul>

<p>These “form intuitions” derive from billions of lines of actual CLI code in training data.</p>

<p>By contrast, JSON Schema is purely machine language—the model depends 100% on explicit documentation:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"parameters"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"filter"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"string"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"description"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Query filter expression following the pattern..."</span><span class="p">,</span><span class="w">
      </span><span class="nl">"pattern"</span><span class="p">:</span><span class="w"> </span><span class="s2">"^(status|role|department)=(active|inactive|...)$"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"examples"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="err">...</span><span class="p">],</span><span class="w">
      </span><span class="nl">"constraints"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="err">...</span><span class="p">]</span><span class="w">
    </span><span class="p">},</span><span class="w">
    </span><span class="nl">"limit"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nl">"type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"integer"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"minimum"</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w">
      </span><span class="nl">"maximum"</span><span class="p">:</span><span class="w"> </span><span class="mi">1000</span><span class="p">,</span><span class="w">
      </span><span class="nl">"default"</span><span class="p">:</span><span class="w"> </span><span class="mi">50</span><span class="w">
    </span><span class="p">}</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>This creates a stark difference: CLI help can be radically concise:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ms user list
  Lists all <span class="nb">users
  
  </span>Options:
    <span class="nt">--filter</span> &lt;query&gt;   Query condition <span class="o">(</span><span class="nv">status</span><span class="o">=</span>active, <span class="nv">role</span><span class="o">=</span>admin, etc.<span class="o">)</span>
    <span class="nt">--limit</span> &lt;num&gt;      Result count <span class="o">(</span>default: 50<span class="o">)</span>
    <span class="nt">--sort</span> &lt;field&gt;     Sort field <span class="o">(</span>name, created_at, etc.<span class="o">)</span>
</code></pre></div></div>

<p>MCP, by contrast, spends 5-10x more tokens explaining type systems, constraints, and validation rules. The model’s intrinsic knowledge of POSIX command-line conventions becomes a built-in knowledge library, saving vast amounts of explicit documentation. This isn’t cutting corners—<strong>it’s leveraging model priors to optimize context utilization</strong>.</p>

<p><strong>Third Layer: Defensive Allocation of Context Window</strong></p>

<p>This raises a strategic consideration: every token is a scarce reasoning resource.</p>

<p>In MCP mode, context allocation is “passive”: 80,000 tokens are dead weight, leaving 120,000 for user problems, intermediate reasoning, and tool results. Models are forced to work in an extremely crowded environment, with attention inevitably scattered.</p>

<p>In Router CLI mode, we achieve <strong>active context defense</strong>:</p>
<ul>
  <li>Hot tools (needed this session) occupy prominent context positions with highest attention weight</li>
  <li>Cold tools (possibly useful but unused) are lazy-loaded via <code class="language-plaintext highlighter-rouge">--help</code>, activated on demand</li>
  <li>Unrelated tools never enter Context, completely isolated</li>
</ul>

<p>From an enterprise architecture FURPS+ perspective:</p>

<ul>
  <li><strong>Functionality</strong>: No regression; all tools remain accessible</li>
  <li><strong>Usability</strong>: Improved; model isn’t drowned in massive definitions</li>
  <li><strong>Reliability</strong>: Improved; bigger reasoning space means lower error rates</li>
  <li><strong>Performance</strong>: Dramatic improvement; 100x+ token reduction</li>
  <li><strong>Maintainability</strong>: Improved; new tools don’t impact existing sessions</li>
  <li><strong>Scalability</strong>: Qualitative leap; from “context explodes at ~100 tools” to “handles 1000+ tools effortlessly”</li>
</ul>

<h3 id="router-cli-design-best-practices">Router CLI Design Best Practices</h3>

<p>For this approach to work effectively, Router CLI must follow several design principles:</p>

<p><strong>1. Minimize Interface Set</strong></p>

<p>Group related operations under the same service. For instance, all user authentication operations belong under <code class="language-plaintext highlighter-rouge">ms auth-service</code>, not scattered across independent commands. Benefits include:</p>

<ul>
  <li>Reduce top-level command count (100 → 20-30 services)</li>
  <li>Increase command “information density”—each top-level command represents a clear conceptual domain</li>
  <li>Help models build domain concepts—”authentication,” “payment,” “orders” as discrete business objects</li>
</ul>

<p><strong>2. Consistent Flags and Output Format</strong></p>

<p>All commands follow identical flag naming conventions and output formats (e.g., JSON). After learning a few commands, the model can project these patterns to new commands.</p>

<p><strong>3. Composability</strong></p>

<p>Design commands for pipeline composability. For example, <code class="language-plaintext highlighter-rouge">ms order list</code> output should be structured JSON filterable by <code class="language-plaintext highlighter-rouge">jq</code>; <code class="language-plaintext highlighter-rouge">ms order fetch --order-id &lt;id&gt;</code> should return the same structure. Agents can naturally combine such commands.</p>
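<p>To make composability concrete, here is a sketch with a stubbed <code class="language-plaintext highlighter-rouge">ms</code> emitting the structured JSON described above, filtered with <code class="language-plaintext highlighter-rouge">jq</code> (the order data and field names are invented for illustration):</p>

```shell
#!/usr/bin/env bash
# Composability sketch: a stubbed 'ms order list' emits structured JSON,
# so generic tools like jq slot straight into the pipeline.
ms() {  # stub standing in for the real router
  echo '[{"order_id":"o-1","status":"open"},{"order_id":"o-2","status":"closed"}]'
}

# Select open orders only—exactly the kind of composition an Agent writes:
ms order list | jq -r '.[] | select(.status == "open") | .order_id'
```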

<p><strong>4. Error Handling Clarity</strong></p>

<p>Command failures should return non-zero exit codes and structured error messages, allowing bash to immediately capture errors without relying on model inference.</p>
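<p>A sketch of this convention, using a hypothetical <code class="language-plaintext highlighter-rouge">ms_restart</code> stub: the structured error goes to stderr, and the exit code alone tells bash the step failed:</p>

```shell
#!/usr/bin/env bash
# Error-handling sketch: non-zero exit code plus structured stderr message,
# so failure detection needs no model inference. 'ms_restart' is a stub.
ms_restart() {
  local svc="$1"
  if [ "$svc" = "ghost-service" ]; then
    echo '{"error":"service_not_found","service":"'"$svc"'"}' >&2
    return 4                     # illustrative "not found" exit code
  fi
  echo '{"status":"restarted","service":"'"$svc"'"}'
}

rc=0
ms_restart ghost-service 2>/dev/null || rc=$?
echo "restart failed with exit code $rc"   # bash caught it; no LLM involved
```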

<h3 id="practical-example-token-cost-comparison">Practical Example: Token Cost Comparison</h3>

<p>Let’s put numbers on a larger but realistic catalog: 80 microservices, ~5 actions each, ~400 total operations.</p>

<p><strong>MCP Mode</strong>: Complete JSON Schema for all 400 operations</p>
<ul>
  <li>Average 200 tokens per Schema</li>
  <li>Total overhead: 80,000 tokens (constant overhead per request)</li>
  <li>Usable context with 200K limit: only 120K</li>
</ul>

<p><strong>CLI + Self-Description Mode</strong>: Unified entry + dynamic help</p>
<ul>
  <li>Top-level <code class="language-plaintext highlighter-rouge">ms --help</code>: Lists 80 services, ~100 tokens</li>
  <li>First service invocation <code class="language-plaintext highlighter-rouge">ms &lt;service&gt; --help</code>: ~50 tokens</li>
  <li>Specific action <code class="language-plaintext highlighter-rouge">ms &lt;service&gt; &lt;action&gt; --help</code>: ~30 tokens</li>
  <li>Single request cost: 100 + 50 + 30 = 180 tokens (only on first interaction with a service)</li>
  <li>Subsequent same-service requests: just 100 tokens</li>
</ul>

<p><strong>Cost Differential</strong> — This transcends “optimization”:</p>

<ul>
  <li>First complex operation: MCP’s 80,000 tokens vs CLI’s 180 tokens, <strong>~444x difference</strong></li>
  <li>Subsequent operations: MCP’s 80,000 tokens vs CLI’s 100 tokens, <strong>~800x difference</strong></li>
  <li>Real numbers: For a mid-size enterprise (100 daily requests), MCP consumes 8M tokens daily on tool definitions; CLI consumes just 18K tokens for dynamic help. <strong>This is collapsing from millions to tens of thousands</strong>.</li>
</ul>
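<p>These ratios follow directly from the assumed figures above; a quick integer-arithmetic check:</p>

```shell
#!/usr/bin/env bash
# Sanity-check the quoted ratios (inputs are the article's assumptions).
mcp=80000        # 400 operations x ~200 tokens of schema each
cli_first=180    # top-level + service + action help on first contact
cli_repeat=100   # top-level help only on repeat contact

echo "first-op ratio:  $(( mcp / cli_first ))x"     # ~444x
echo "repeat ratio:    $(( mcp / cli_repeat ))x"    # 800x
echo "MCP daily (100 requests): $(( 100 * mcp )) tokens"
echo "CLI daily (100 requests): $(( 100 * cli_first )) tokens"
```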

<p>These numbers aren’t a contrived edge case—enterprise workloads are typically repetitive: the same queries and operations execute over and over. CLI’s advantage is that token cost diminishes over time (as Agent understanding deepens), while MCP’s cost remains a constant monolith.</p>

<h3 id="context-utilization-comparison-diagram">Context Utilization Comparison Diagram</h3>

<p>Let’s visualize how both modes allocate the context window:</p>

<p><img src="https://kroki.io/plantuml/svg/eNqNlN1uGjEQhe_3KUap-hMFpF1CUOAq6f4UVJJsS1GqihuzDNSq10a2oaAqeY0-UF-sY7MgoEIpV2vPnjlnvh0R3BjLtF2UIrDcCoRYSYsrC49cTtRPuBVCFcxyJTtwF-ewNBD3e0Hw1p_u1AQDjYVlckbas-0d1Hd9GmH4Eaz6gdKcwassS1tpC34FAHuyL0oJSHDKJXdOZiTf5ahLJlHac69q08-rAKSyJLkOa2EYVo1HMvLPShj48xuudwVgy9lIZkrjEjU5GtRLnJwBM3AfUbenwxxDqkOu1VhgCYM5K9CbJ82keWAeNQ7dp0rDvJJpZEZJLsn3OzdW6XXNGS-EdWMNPg3T9FuanG8yNHyGJ4eTqG7Q7fPcXZ7imbayyyw75tlVFhxT5xgvtCaMMEBjiK3D2W4nzUq0nej56vXWYSSvwnq0N52ncgE5W5eukWPICzSbCS7_k2LSpKhHnu1906jtmV4cQJ0gzveJcovaL2MNcMXKuUB6kckJnbBYuAJU0B3r_DbuPQwH8Aayfvq1976fVtibW-w-yFhZq0pQ0xf3l2Q9l6v0GWhjpeF23YH-wyOVBnwmmahbVb9X3CB8dm91IFdKu2rBBBtz4QVdbg0USCc5A2bhOSLmje0OB0gTuWj_BnxhIU4F7PY-dE8nTFcFCkHf9jhmzCQYukDqD1H1dQ4C3tAT_Xf8BdnNXdo=" alt="Diagram 3" /></p>

<p>This reveals a fundamental truth: <strong>context window is inherently a scarce reasoning resource</strong>. MCP’s philosophy is “pre-reserve all possibilities”; Router CLI’s is “activate on-demand” and “optimize dynamically.”</p>

<p>Within a 200K context budget, MCP spends hundreds of times more tokens on cold tool definitions than Router CLI does. Router CLI redirects those tokens to the “problem space”—where deep chain-of-thought reasoning, historical accumulation, and error recovery happen.</p>

<h3 id="deep-dive-why-this-difference-matters-in-practice">Deep Dive: Why This Difference Matters in Practice</h3>

<p><strong>1. Long-Chain Reasoning Capability</strong></p>

<p>When diagnosing complex production issues, Agents might need to:</p>
<ul>
  <li>Query logs (engaging multiple time-series)</li>
  <li>Extract information (preserving intermediate state)</li>
  <li>Cross-verify (building on results from earlier steps)</li>
  <li>Make decisions (requiring clear reasoning space)</li>
</ul>

<p>In MCP mode, each step of this chain executes within a crowded 100-tool Context. Model attention constantly fragments, causing “Chain Forgetting”—later steps lose early constraints.</p>

<p>In Router CLI mode, the reasoning chain unfolds in a “clear” environment, with only current-session tools occupying attention.</p>

<p><strong>2. Model Architecture-Level Performance Degradation</strong></p>

<p>Research shows Transformer attention degrades significantly when input length exceeds 60-70% of the context window (sometimes called “context limit degradation”). Under MCP, tool definitions alone occupy 40% of the window, pushing the reasoning space toward this efficiency inflection point. By releasing cold tools promptly, Router CLI keeps 80%+ of the window free for reasoning.</p>

<h2 id="security-considerations-prompt-injection-and-command-execution-risks">Security Considerations: Prompt Injection and Command Execution Risks</h2>

<p>So far we’ve focused on efficiency advantages. But any technical decision involves tradeoffs. Router CLI introduces a risk requiring careful handling: <strong>command injection from prompt injection</strong>.</p>

<h3 id="risk-scenario">Risk Scenario</h3>

<p>Consider:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>User input: I want to query users named "admin'; DROP DATABASE;"
Agent-generated pipeline:
ms user list --filter "name=admin'; DROP DATABASE;"
</code></pre></div></div>

<p>While this specific example probably won’t directly execute SQL due to proper quoting, more complex scenarios could cause unintended command execution if bash parameter escaping is mishandled.</p>

<h3 id="defense-strategy">Defense Strategy</h3>

<p>Production Router CLI systems require layered defense:</p>

<p><strong>Layer One: Parameter Whitelisting and Type Validation</strong></p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># In Router CLI implementation</span>
ms user list <span class="nt">--filter</span> &lt;STRING_WITH_STRICT_PATTERN&gt;
<span class="c"># Rather than accepting arbitrary strings</span>
</code></pre></div></div>
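<p>One way the router itself can enforce that whitelist before anything reaches a shell or a downstream service—the allowed fields and value pattern here are illustrative, not a real grammar:</p>

```shell
#!/usr/bin/env bash
# Whitelist validation sketch for '--filter' values. The permitted fields
# and the value pattern are illustrative placeholders.
validate_filter() {
  local value="$1"
  if [[ "$value" =~ ^(status|role|department)=[A-Za-z0-9_-]+$ ]]; then
    return 0
  fi
  echo "rejected filter: $value" >&2
  return 1
}

validate_filter "status=active" && echo "accepted: status=active"
validate_filter "name=admin'; DROP DATABASE;" 2>/dev/null || echo "blocked injection attempt"
```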

<p><strong>Layer Two: Sandboxed Execution Environment</strong>
Execute Agent-generated commands within containers or sandboxes, restricting accessible resources and command scope. This is an industry best practice reflecting “don’t trust model-generated code.”</p>

<p><strong>Layer Three: Audit and Rollback Mechanisms</strong>
Log all command executions and establish rapid rollback mechanisms. Misexecuted commands can be quickly recovered.</p>

<p><strong>Layer Four: Agent-Side Defense Prompting</strong>
Via prompt engineering, explicitly guide Agents to avoid special characters in parameters. Example:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>When constructing CLI commands, ALWAYS escape user inputs using proper shell quoting.
For example, use 'admin'"'"'s users' instead of 'admin's users'.
</code></pre></div></div>

<p>Critically, <strong>these risks exist in MCP too</strong>—they’re not CLI-specific. Any tool-calling system needs such defenses. Router CLI doesn’t reduce security; it demands clearer understanding of “tool-calling trust boundaries.”</p>

<h2 id="public-cli-out-of-the-box-advantages">Public CLI: Out-of-the-Box Advantages</h2>

<p>Another crucial mindset shift in enterprise systems: <strong>leverage existing public CLI tools aggressively; minimize custom tools</strong>.</p>

<p>Most major services provide comprehensive CLIs:</p>

<ul>
  <li><strong>AWS CLI / Azure CLI</strong>: Cover 90%+ of cloud operations</li>
  <li><strong>kubectl</strong>: The standard Kubernetes interface</li>
  <li><strong>gh</strong> (GitHub CLI) + <strong>curl</strong> + <strong>jq</strong>: Almost any API is accessible</li>
  <li><strong>Atlassian CLI</strong> (or direct curl): JIRA, Confluence, etc.</li>
  <li><strong>git</strong>: Version control and base operations</li>
</ul>

<p>The decisive advantage of these tools: <strong>models already know them</strong>. In a prompt, simply saying “use the AWS CLI” lets the model reason its way to commands like <code class="language-plaintext highlighter-rouge">aws ec2 describe-instances</code> or <code class="language-plaintext highlighter-rouge">aws s3 ls</code>. No schema needed.</p>

<p>The recommended strategy in practice:</p>

<ol>
  <li><strong>Prioritize public CLIs</strong>: During architecture design, ask “can I use existing CLI tools?”</li>
  <li><strong>Enhance via pipelines and scripts</strong>: Combine multiple public CLI tools into higher-level operations</li>
  <li><strong>Design CLI only for core private logic</strong>: Create custom CLI commands only for truly proprietary operations</li>
</ol>

<p>The result: custom CLI count drops significantly. What once required 30 custom tools may now need just 10, with a proportional reduction in token cost and cognitive load.</p>

<h2 id="architecture-decision-framework-when-to-use-mcp-when-to-use-cli">Architecture Decision Framework: When to Use MCP, When to Use CLI</h2>

<p>Based on the analysis, here’s a practical decision framework:</p>

<p><img src="https://kroki.io/plantuml/svg/eNptk01v2zAMhu_-FbwMSDAULrpbMiAN2m4Llg5F12HYUZFpm6hMeRKVNPv1k2yrDdpeZJkfrx--sopLL8pJ6EwxbIrFPWqkPQLjAcRaA8SCjVNClsHh30AOO2RZFouNB2lxrFLQKWIvDlUHfdgZ0kNiBbOvJOX3sEMtplz_Cw7L9e-fJYqeLwuqYXb7fuM8iTPM_qCfFwCLXx5z_mq7WabQhmvrOlg3kQf2pKB3tusl5ZCrAk1smf2wQ3uCTaAVeW336I5gGc-EOgTrwNjDWZ2mQ9bHVRJIZFt7gOdometfgU1ot1d3y_Htzig9uVJhTUyDdcQndCNfXAfCb9S0J99xGLzaGZzEry1O6NqGOCc-acQKLs5Xo1ICvRliPgbfwE149zYIuuQcKKdbkngc8SiWueYaPTUMgammqB4NjQ59hOrIqouOt2j659otyuS5QeUYquCImwiGOqRZc-E4Y57yCx4igLSKE2bWutkrE5Qkvx4jt7Zecnsa7MGKMmPOw2f4dP4BbB2r4j_5JHnWtdbYy4tlb8_klCbzZM-i5qu2F6_e7eaK6uJ0Nz7H1Yvti8u4j1fqP984CYk=" alt="Diagram 4" /></p>

<p>This framework’s principle: prioritize token efficiency and cognitive load to select the right integration mode.</p>

<h2 id="conclusion-the-paradigm-shift">Conclusion: The Paradigm Shift</h2>

<p>AI Agents have evolved from simple chatbots into enterprise-grade automation systems. This shift demands more than better LLMs; it requires <strong>deep understanding of complex systems</strong>.</p>

<p>The MCP-to-CLI transition fundamentally reflects maturity:</p>

<p><strong>1. CLI is a more efficient abstraction</strong>. It replaces verbose Schema definitions with a compact symbolic interface. Models process known symbol systems far more efficiently than they can learn new data structures from scratch.</p>

<p><strong>2. Pipelines represent a paradigm shift</strong>. Moving from single Tool Calls to multi-step Pipeline chaining buys not just token savings but execution determinism and system reliability, both critical for production applications.</p>

<p><strong>3. Router CLI enables scalability</strong>. For enterprise microservice arrays, Router CLI with dynamic self-description offers a scalable “tool discovery” path. Agents learn during execution without pre-declaring all possibilities.</p>

<p><strong>4. Leverage existing tools</strong>. Fully leverage models’ intrinsic knowledge of public CLI tools; minimize custom tool count. This may be the fastest path to improved token efficiency.</p>

<p>Behind this shift lies a deeper grasp of <strong>information theory</strong> and <strong>system design</strong>: when facing complex systems, constraints (like strict command formats) aren’t burdens; they’re features. They shrink the model’s search space, raise the determinism of its reasoning, and ultimately improve system efficiency.</p>

<p>This evolution moves from “more information enables better reasoning” to “carefully structured information enables optimal reasoning.” Just as mature architects choose well-designed APIs over vast documentation, future AI Agent systems will find true efficiency within clear CLI interfaces and orderly pipeline modes.</p>

<hr />

<h2 id="extended-reading-and-practical-recommendations">Extended Reading and Practical Recommendations</h2>

<p>If you’re designing a new AI Agent system or improving an existing one, consider:</p>

<ol>
  <li>
    <p><strong>Token Audit</strong>: Calculate current tool definition tokens and compare actual usage frequency. You’ll likely find many “cold tools” rarely used.</p>
  </li>
  <li>
    <p><strong>Minimal Router CLI Prototype</strong>: Select 3-5 most-used services from your system. Design a Router CLI. Compare pre/post efficiency and token consumption.</p>
  </li>
  <li>
<p><strong>Cognitive Load Metrics</strong>: Track the agent’s “correction rate”: how often the model misunderstands and makes an incorrect decision. You will likely see a significant improvement after migrating to CLI.</p>
  </li>
  <li>
    <p><strong>Leverage Public CLI Knowledge</strong>: Explicitly list available public tools in your prompts. Let models self-select, rather than passively awaiting MCP declarations.</p>
  </li>
</ol>
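<p>The first item, a token audit, can start as a back-of-the-envelope script. A hedged Python sketch (the schemas below are made up, and the four-characters-per-token ratio is a rough heuristic, not any real tokenizer):</p>

```python
import json

# Hypothetical tool schemas, standing in for your MCP declarations.
tools = [
    {"name": "user_list", "description": "List users in a service",
     "parameters": {"filter": {"type": "string"}, "limit": {"type": "integer"}}},
    {"name": "order_cancel", "description": "Cancel an order by id",
     "parameters": {"order_id": {"type": "string"}}},
]

def estimate_tokens(schema: dict) -> int:
    # Crude heuristic: roughly 4 characters per token of serialized JSON.
    return len(json.dumps(schema)) // 4

per_tool = {t["name"]: estimate_tokens(t) for t in tools}
total = sum(per_tool.values())
print(per_tool)
print(f"~{total} tokens of tool definitions sent on every request")
```

<p>Cross-reference the per-tool estimates with call logs: a tool that costs hundreds of tokens per request but is invoked once a week is a prime candidate for demotion to a Router CLI subcommand.</p>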

<p>Enterprise AI Agent system efficiency ultimately depends on whether we organize tools and information in the way models find most natural and efficient. CLI and Router architectures prove that, sometimes, computing’s most ancient paradigm (the command line) aligns most deeply with cutting-edge AI technology.</p>

<h2 id="1-it-starts-with-an-innocent-looking-request">1. It Starts with an Innocent-Looking Request</h2>

<p>Imagine you’re building a multi-step online form. Each time the user clicks “Next”, the frontend only needs to send the fields filled in on that step — not the entire form all over again.</p>

<p>That’s a perfectly reasonable requirement. To solve it, we chose <strong>JSON Patch</strong> (RFC 6902, <a href="https://datatracker.ietf.org/doc/html/rfc6902">official spec</a>) — a standard format specifically designed to describe partial modifications to a JSON document.</p>

<p>A typical JSON Patch request looks like this:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="w">
  </span><span class="p">{</span><span class="w"> </span><span class="nl">"op"</span><span class="p">:</span><span class="w"> </span><span class="s2">"replace"</span><span class="p">,</span><span class="w"> </span><span class="nl">"path"</span><span class="p">:</span><span class="w"> </span><span class="s2">"/firstName"</span><span class="p">,</span><span class="w"> </span><span class="nl">"value"</span><span class="p">:</span><span class="w"> </span><span class="s2">"Alice"</span><span class="w"> </span><span class="p">},</span><span class="w">
  </span><span class="p">{</span><span class="w"> </span><span class="nl">"op"</span><span class="p">:</span><span class="w"> </span><span class="s2">"replace"</span><span class="p">,</span><span class="w"> </span><span class="nl">"path"</span><span class="p">:</span><span class="w"> </span><span class="s2">"/dateOfBirth"</span><span class="p">,</span><span class="w"> </span><span class="nl">"value"</span><span class="p">:</span><span class="w"> </span><span class="s2">"1990-01-15"</span><span class="w"> </span><span class="p">}</span><span class="w">
</span><span class="p">]</span><span class="w">
</span></code></pre></div></div>

<p>Straightforward enough: two <code class="language-plaintext highlighter-rouge">replace</code> operations, updating <code class="language-plaintext highlighter-rouge">/firstName</code> to <code class="language-plaintext highlighter-rouge">"Alice"</code> and <code class="language-plaintext highlighter-rouge">/dateOfBirth</code> to <code class="language-plaintext highlighter-rouge">"1990-01-15"</code>.</p>
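<p>The mechanics are easy to sketch. Below is a toy applier for <code class="language-plaintext highlighter-rouge">replace</code> operations in Python, hypothetical and happy-path only (real implementations such as ASP.NET Core’s <code class="language-plaintext highlighter-rouge">JsonPatchDocument</code> cover all six RFC 6902 operations, JSON Pointer escaping, and error handling):</p>

```python
def apply_replace_ops(document: dict, operations: list) -> dict:
    """Apply RFC 6902 'replace' operations to a nested dict (toy version)."""
    for op in operations:
        assert op["op"] == "replace"
        # "/profile/address" → parents=["profile"], leaf="address"
        *parents, leaf = op["path"].lstrip("/").split("/")
        target = document
        for segment in parents:
            target = target[segment]
        target[leaf] = op["value"]
    return document

doc = {"firstName": "Bob", "dateOfBirth": "1985-06-01"}
patch = [
    {"op": "replace", "path": "/firstName", "value": "Alice"},
    {"op": "replace", "path": "/dateOfBirth", "value": "1990-01-15"},
]
print(apply_replace_ops(doc, patch))
# → {'firstName': 'Alice', 'dateOfBirth': '1990-01-15'}
```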

<p>So far, so good.</p>

<hr />

<h2 id="2-the-problem">2. The Problem</h2>

<p>Now look at the server-side data model. In a multi-step application, the backend maintains a single “state object” that spans the entire session. It contains two kinds of fields:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>State Object
├── Fields the user fills in (client may modify)
│   ├── firstName
│   ├── dateOfBirth
│   ├── occupation
│   └── ...
│
└── Fields managed by the server (client must NEVER modify)
    ├── pricingToken        ← token generated by the pricing engine
    ├── policyNumber        ← assigned policy number
    ├── externalReferenceId ← correlation ID from an external system
    ├── redirectionUrl      ← server-injected redirect target
    ├── completedStages     ← stages the user has completed (used by flow guards)
    └── ...
</code></pre></div></div>

<p>The problem: <strong>the <code class="language-plaintext highlighter-rouge">path</code> in a JSON Patch request can point to any field.</strong></p>

<p>A malicious user — or a security researcher — can craft a request like this:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">[</span><span class="w">
  </span><span class="p">{</span><span class="w"> </span><span class="nl">"op"</span><span class="p">:</span><span class="w"> </span><span class="s2">"replace"</span><span class="p">,</span><span class="w"> </span><span class="nl">"path"</span><span class="p">:</span><span class="w"> </span><span class="s2">"/pricingToken"</span><span class="p">,</span><span class="w"> </span><span class="nl">"value"</span><span class="p">:</span><span class="w"> </span><span class="s2">"my-forged-token"</span><span class="w"> </span><span class="p">},</span><span class="w">
  </span><span class="p">{</span><span class="w"> </span><span class="nl">"op"</span><span class="p">:</span><span class="w"> </span><span class="s2">"replace"</span><span class="p">,</span><span class="w"> </span><span class="nl">"path"</span><span class="p">:</span><span class="w"> </span><span class="s2">"/redirectionUrl"</span><span class="p">,</span><span class="w"> </span><span class="nl">"value"</span><span class="p">:</span><span class="w"> </span><span class="s2">"https://evil.example.com"</span><span class="w"> </span><span class="p">},</span><span class="w">
  </span><span class="p">{</span><span class="w"> </span><span class="nl">"op"</span><span class="p">:</span><span class="w"> </span><span class="s2">"replace"</span><span class="p">,</span><span class="w"> </span><span class="nl">"path"</span><span class="p">:</span><span class="w"> </span><span class="s2">"/completedStages"</span><span class="p">,</span><span class="w"> </span><span class="nl">"value"</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">"step3"</span><span class="p">,</span><span class="w"> </span><span class="s2">"step4"</span><span class="p">]</span><span class="w"> </span><span class="p">}</span><span class="w">
</span><span class="p">]</span><span class="w">
</span></code></pre></div></div>

<p>Without any server-side defence, these operations are executed as-is:</p>
<ul>
  <li>The pricing token is tampered with — subsequent underwriting calculations are based on forged data</li>
  <li>The redirect URL is overwritten — every dynamically generated link on the page now points to a malicious site</li>
  <li>Completed stages are fabricated — the user can skip flow validation and jump straight to checkout</li>
</ul>

<p>This is an <strong>Overposting attack</strong>, also known as <strong>Mass Assignment</strong> — a classic vulnerability listed in the OWASP API Security Top 10 (<a href="https://owasp.org/API-Security/editions/2023/en/0xa3-broken-object-property-level-authorization/">OWASP API3: Broken Object Property Level Authorization</a>).</p>

<hr />

<h2 id="3-why-json-patch-makes-this-harder-to-see-coming">3. Why JSON Patch Makes This Harder to See Coming</h2>

<p>Mass Assignment exists in traditional REST APIs too, but there’s usually a natural layer of protection: developers define a dedicated DTO (request model) that only exposes the fields a client is allowed to set. The server then maps from the DTO to the domain object.</p>

<p>In a JSON Patch workflow, that protection simply doesn’t exist:</p>

<pre><code class="language-mermaid">graph LR
    A[Traditional PUT/POST] --&gt; B[DTO filter layer&lt;br/&gt;only exposes allowed fields]
    B --&gt; C[Domain model]

    D[JSON Patch PATCH] --&gt; E[No filter layer&lt;br/&gt;path can point to any field]
    E --&gt; F[Domain model]
</code></pre>

<p>JSON Patch was designed to let you say precisely what you want to change — but that expressiveness is exactly what widens the attack surface. The target is no longer a constrained DTO; it’s the entire state tree.</p>

<p>The framework’s default JSON Patch handler (such as ASP.NET Core’s <code class="language-plaintext highlighter-rouge">JsonPatchDocument</code>) has no opinion on whether a given field <em>should</em> be writable by a client. It simply executes whatever <code class="language-plaintext highlighter-rouge">path</code> it’s given.</p>

<hr />

<h2 id="4-the-obvious-fix--and-why-it-falls-short">4. The Obvious Fix — and Why It Falls Short</h2>

<p>The first instinct is to check each <code class="language-plaintext highlighter-rouge">path</code> before processing and reject anything on a blocklist:</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Blocklist approach</span>
<span class="k">foreach</span> <span class="p">(</span><span class="kt">var</span> <span class="n">operation</span> <span class="k">in</span> <span class="n">patchOperations</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">Blocklist</span><span class="p">.</span><span class="nf">Contains</span><span class="p">(</span><span class="n">operation</span><span class="p">.</span><span class="n">Path</span><span class="p">))</span>
        <span class="k">return</span> <span class="nf">BadRequest</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>

<p><strong>The problem: a blocklist requires manual maintenance and is inherently reactive.</strong></p>

<p>Every time a new server-managed field is added to the model, someone has to remember to update the blocklist. That’s easy to forget — especially in a fast-moving codebase with multiple contributors.</p>

<p>More fundamentally, this violates the principle that security should be structural, not dependent on anyone’s memory.</p>

<hr />

<h2 id="5-the-right-approach-let-fields-declare-their-own-access-level">5. The Right Approach: Let Fields Declare Their Own Access Level</h2>

<p>A better model is to move the ownership of the protection rule to the field itself — <strong>declare at definition time whether a field can be modified by the client</strong>, rather than maintaining a separate list somewhere else.</p>

<p>This is the core mechanism we introduced: the <strong><code class="language-plaintext highlighter-rouge">[ClientReadOnly]</code> annotation</strong>.</p>

<p>In the model, every server-managed field gets the annotation:</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">class</span> <span class="nc">ApplicationStateModel</span>
<span class="p">{</span>
    <span class="c1">// User-provided fields — no annotation, client may modify</span>
    <span class="k">public</span> <span class="kt">string</span> <span class="n">FirstName</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
    <span class="k">public</span> <span class="n">DateTime</span> <span class="n">DateOfBirth</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>

    <span class="c1">// Server-managed fields — annotated, client must not modify</span>
    <span class="p">[</span><span class="n">ClientReadOnly</span><span class="p">]</span>
    <span class="k">public</span> <span class="kt">string</span> <span class="n">PricingToken</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>

    <span class="p">[</span><span class="n">ClientReadOnly</span><span class="p">]</span>
    <span class="k">public</span> <span class="kt">string</span> <span class="n">PolicyNumber</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>

    <span class="p">[</span><span class="n">ClientReadOnly</span><span class="p">]</span>
    <span class="k">public</span> <span class="n">List</span><span class="p">&lt;</span><span class="kt">string</span><span class="p">&gt;</span> <span class="n">CompletedStages</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The benefit: <strong>the protection rule lives next to the field definition</strong>. When a developer adds a new field, they’re naturally prompted to decide: “Should this be <code class="language-plaintext highlighter-rouge">[ClientReadOnly]</code>?” The question is asked at the right moment — during design, not during a post-incident review.</p>

<hr />

<h2 id="6-implementation-intercept-at-the-model-binding-stage">6. Implementation: Intercept at the Model Binding Stage</h2>

<p>The annotation alone doesn’t do anything. You need a mechanism to <strong>automatically strip out</strong> any Patch operations targeting annotated fields before they reach business logic.</p>

<p>We do this filtering at the <strong>Model Binding stage</strong>, the earliest extension point in the web framework’s request pipeline: the patch operations are inspected as they are deserialised, before any of them can reach an action method or business logic:</p>

<pre><code class="language-mermaid">sequenceDiagram
    participant Client
    participant Custom ModelBinder
    participant Business Logic

    Client-&gt;&gt;Custom ModelBinder: JSON Patch request (any operations)
    Note over Custom ModelBinder: Filter stage:&lt;br/&gt;Scan each operation's path&lt;br/&gt;Resolve the corresponding field&lt;br/&gt;If [ClientReadOnly] — silently discard
    Custom ModelBinder-&gt;&gt;Business Logic: Sanitised operation list (only legitimate ops)
    Business Logic-&gt;&gt;Client: Normal response
</code></pre>

<p>The core filtering algorithm:</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">private</span> <span class="n">IList</span><span class="p">&lt;</span><span class="n">Operation</span><span class="p">&gt;</span> <span class="nf">FilterClientReadOnlyOperations</span><span class="p">(</span>
    <span class="n">IList</span><span class="p">&lt;</span><span class="n">Operation</span><span class="p">&gt;</span> <span class="n">operations</span><span class="p">,</span> <span class="n">Type</span> <span class="n">modelType</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">var</span> <span class="n">result</span> <span class="p">=</span> <span class="k">new</span> <span class="n">List</span><span class="p">&lt;</span><span class="n">Operation</span><span class="p">&gt;();</span>

    <span class="k">foreach</span> <span class="p">(</span><span class="kt">var</span> <span class="n">operation</span> <span class="k">in</span> <span class="n">operations</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="c1">// JSON Patch path format: "/fieldName" or "/nested/fieldName"</span>
        <span class="kt">var</span> <span class="n">propertyPath</span> <span class="p">=</span> <span class="nf">ParsePath</span><span class="p">(</span><span class="n">operation</span><span class="p">.</span><span class="n">Path</span><span class="p">);</span>

        <span class="c1">// Use reflection to resolve the field at the end of the path</span>
        <span class="kt">var</span> <span class="n">property</span> <span class="p">=</span> <span class="nf">ResolveProperty</span><span class="p">(</span><span class="n">modelType</span><span class="p">,</span> <span class="n">propertyPath</span><span class="p">);</span>

        <span class="k">if</span> <span class="p">(</span><span class="n">property</span> <span class="p">==</span> <span class="k">null</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="c1">// Path points to a non-existent field — discard</span>
            <span class="k">continue</span><span class="p">;</span>
        <span class="p">}</span>

        <span class="k">if</span> <span class="p">(</span><span class="n">property</span><span class="p">.</span><span class="n">GetCustomAttribute</span><span class="p">&lt;</span><span class="n">ClientReadOnlyAttribute</span><span class="p">&gt;()</span> <span class="p">!=</span> <span class="k">null</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="c1">// Annotation hit — silently discard, write to log</span>
            <span class="n">_logger</span><span class="p">.</span><span class="nf">LogWarning</span><span class="p">(</span><span class="s">"Blocked ClientReadOnly field: {Path}"</span><span class="p">,</span> <span class="n">operation</span><span class="p">.</span><span class="n">Path</span><span class="p">);</span>
            <span class="k">continue</span><span class="p">;</span>
        <span class="p">}</span>

        <span class="n">result</span><span class="p">.</span><span class="nf">Add</span><span class="p">(</span><span class="n">operation</span><span class="p">);</span>
    <span class="p">}</span>

    <span class="k">return</span> <span class="n">result</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Two implementation details worth calling out:</p>

<p><strong>1. Path resolution must support nesting</strong></p>

<p>A JSON Patch <code class="language-plaintext highlighter-rouge">path</code> can be deeply nested — for example, <code class="language-plaintext highlighter-rouge">/profile/address</code>. The algorithm needs to walk the path segment by segment (<code class="language-plaintext highlighter-rouge">profile</code> → <code class="language-plaintext highlighter-rouge">address</code>) until it reaches the terminal field, then check for the <code class="language-plaintext highlighter-rouge">[ClientReadOnly]</code> annotation.</p>
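<p>The same walk can be sketched without reflection. In this hypothetical Python version a metadata map plays the role of the annotations, and, to stay short, unknown paths are kept rather than discarded, unlike the C# implementation above:</p>

```python
# Metadata map standing in for [ClientReadOnly] annotations;
# nested dicts model nested objects.
CLIENT_READ_ONLY = {
    "pricingToken": True,
    "profile": {"internalScore": True},  # nested protected field
}

def is_client_read_only(path: str) -> bool:
    """Walk the path segment by segment and check the terminal field."""
    node = CLIENT_READ_ONLY
    for segment in path.lstrip("/").split("/"):
        if not isinstance(node, dict) or segment not in node:
            return False  # not marked protected anywhere along the path
        node = node[segment]
    return node is True

def filter_operations(operations: list) -> list:
    return [op for op in operations if not is_client_read_only(op["path"])]

ops = [
    {"op": "replace", "path": "/firstName", "value": "Alice"},
    {"op": "replace", "path": "/pricingToken", "value": "forged"},
    {"op": "replace", "path": "/profile/internalScore", "value": 999},
]
print(filter_operations(ops))
# → [{'op': 'replace', 'path': '/firstName', 'value': 'Alice'}]
```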

<p><strong>2. Silently discard, don’t error</strong></p>

<p>The filter drops illegal operations silently rather than returning a 400. The reasoning: an attacker should receive no useful feedback (denying reconnaissance). If the operation came from a framework bug or misconfigured tool, discarding is more resilient than breaking the request — and the log entry ensures the issue is still surfaced.</p>

<hr />

<h2 id="7-plugging-into-the-framework">7. Plugging Into the Framework</h2>

<p>In ASP.NET Core, Model Binding behaviour is extended via <code class="language-plaintext highlighter-rouge">IModelBinderProvider</code>. We implemented a custom <code class="language-plaintext highlighter-rouge">FlowModelBinderProvider</code> that performs the filtering above before any request reaches an action method, and registered it in the DI container to replace the default binder.</p>

<p>Every JSON Patch endpoint gets this protection automatically — no per-controller or per-action boilerplate required.</p>

<p>The full flow:</p>

<pre><code class="language-mermaid">graph TD
    A[HTTP Request\nJSON Patch Body] --&gt; B[FlowModelBinderProvider\nCustom Model Binder]
    B --&gt; C{Scan each operation}
    C --&gt; D[Resolve field via reflection\nalong the path]
    D --&gt; E{Annotated with\nClientReadOnly?}
    E --&gt;|Yes| F[Silently discard\nWrite to log]
    E --&gt;|No| G[Keep operation]
    F --&gt; H[Next operation]
    G --&gt; H
    H --&gt; I[Sanitised operation list]
    I --&gt; J[Business logic layer\nController → Task → Service]
</code></pre>

<hr />

<h2 id="8-a-second-defence-layer-xss-content-scanning">8. A Second Defence Layer: XSS Content Scanning</h2>

<p>While building the Overposting defence, we added <strong>XSS content scanning</strong> in the same pipeline at no extra cost: each operation’s <code class="language-plaintext highlighter-rouge">value</code> is checked for dangerous patterns — <code class="language-plaintext highlighter-rouge">&lt;script&gt;</code> tags, inline event handlers (<code class="language-plaintext highlighter-rouge">onXxx=</code>), <code class="language-plaintext highlighter-rouge">javascript:</code> URIs, and similar. If any match, the entire request is rejected before it touches the application.</p>
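<p>The scan itself can be as simple as a handful of case-insensitive patterns. An illustrative Python sketch (a production scanner would be considerably more thorough, and regexes alone are not a complete XSS defence):</p>

```python
import re

# Illustrative danger patterns only, mirroring the checks described above.
XSS_PATTERNS = [
    re.compile(r"<\s*script", re.IGNORECASE),     # <script> tags
    re.compile(r"\bon\w+\s*=", re.IGNORECASE),    # inline handlers: onclick=, onerror=...
    re.compile(r"javascript\s*:", re.IGNORECASE), # javascript: URIs
]

def contains_xss(value) -> bool:
    """Return True if a string operation value matches any danger pattern."""
    if not isinstance(value, str):
        return False  # non-string values (numbers, lists) pass through
    return any(p.search(value) for p in XSS_PATTERNS)

print(contains_xss("<script>alert(1)</script>"))    # → True
print(contains_xss("<img src=x onerror=alert(1)>")) # → True
print(contains_xss("Alice"))                        # → False
```

<p>On any hit, the whole request is rejected with a 400 rather than silently repaired, matching the behaviour described above.</p>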

<p>Together, the two layers form a combined defence:</p>

<table>
  <thead>
    <tr>
      <th>Defence layer</th>
      <th>Protects against</th>
      <th>Behaviour on violation</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">[ClientReadOnly]</code> filtering</td>
      <td>Server fields being overwritten</td>
      <td>Silently discard the operation</td>
    </tr>
    <tr>
      <td>XSS content scanning</td>
      <td>Injected scripts entering the system</td>
      <td>Reject the entire request (400)</td>
    </tr>
  </tbody>
</table>

<hr />

<h2 id="9-proving-it-works-the-tests">9. Proving It Works: The Tests</h2>

<p>Any security mechanism without test coverage is a security mechanism you can’t trust.</p>

<p>We wrote two test suites for the filtering logic:</p>

<p><strong><code class="language-plaintext highlighter-rouge">ModelBinderHelperRemoveOperationsTests</code></strong> (validates filtering behaviour):</p>
<ul>
  <li>A <code class="language-plaintext highlighter-rouge">replace</code> operation targeting a <code class="language-plaintext highlighter-rouge">[ClientReadOnly]</code> field → filtered out</li>
  <li>A <code class="language-plaintext highlighter-rouge">replace</code> operation targeting a non-annotated field → passes through</li>
  <li>A nested path (e.g. <code class="language-plaintext highlighter-rouge">/profile/address</code>) → correctly resolved and filtered</li>
  <li>A path pointing to a non-existent field → filtered out</li>
  <li>An empty operation list → returns empty normally</li>
</ul>

<p><strong><code class="language-plaintext highlighter-rouge">ModelBinderHelperBindValueTests</code></strong> (validates binding behaviour):</p>
<ul>
  <li>Legitimate operations after filtering are correctly bound to model fields</li>
  <li>After filtering, the server-side value of a protected field remains unchanged</li>
</ul>

<p>Together, these two suites cover both directions: “filtered what should be filtered” and “kept what should be kept.”</p>

<hr />

<h2 id="10-three-principles-of-security-design">10. Three Principles of Security Design</h2>

<p>This implementation prompted us to think more carefully about what good security design actually looks like.</p>

<p><strong>1. Allowlist over blocklist; declaration over inspection</strong></p>

<p>A blocklist asks developers to remember “this field is dangerous, so add it.” The <code class="language-plaintext highlighter-rouge">[ClientReadOnly]</code> annotation asks developers to consider “can the client modify this field?” The former is reactive; the latter is built into the design. Their failure modes differ accordingly: a blocklist fails when someone forgets to update a distant list, while the annotation fails only when a developer, looking directly at the field definition, labels it wrongly, which is a far rarer event.</p>

<p><strong>2. Enforce protection as early in the pipeline as possible</strong></p>

<p>The filtering happens at the Model Binding stage. By the time business logic runs, the illegal operations don’t exist — they were removed at the gate. This mirrors how firewalls work: defence closest to the entry point is hardest to bypass.</p>

<p><strong>3. Security mechanisms must have tests</strong></p>

<p>Security code that passes today can silently break during a future refactor. Tests don’t just validate “it works now” — they’re a regression guard that protects the defence from being accidentally removed later.</p>]]></content><author><name>Zhigang Ji</name></author><category term="ASP.NET Core" /><category term="JSON Patch" /><category term="Security" /><category term="Overposting" /><category term="Mass Assignment" /><category term="Model Binder" /><summary type="html"><![CDATA[A custom Model Binder that enforces field-level access control using annotations, preventing overposting/mass assignment vulnerabilities in JSON Patch APIs.]]></summary></entry><entry><title type="html">Why We Removed SignalR — Elegant Architecture Doesn’t Always Mean the Right Fit</title><link href="https://derekji.github.io/blog/2026/03/24/reason-why-removed-signalr-en.html" rel="alternate" type="text/html" title="Why We Removed SignalR — Elegant Architecture Doesn’t Always Mean the Right Fit" /><published>2026-03-24T00:00:00+00:00</published><updated>2026-03-24T00:00:00+00:00</updated><id>https://derekji.github.io/blog/2026/03/24/reason-why-removed-signalr-en</id><content type="html" xml:base="https://derekji.github.io/blog/2026/03/24/reason-why-removed-signalr-en.html"><![CDATA[<h2 id="1-what-were-building">1. What We’re Building</h2>

<p>Our system is an online insurance quoting platform. A user opens a web page, answers a series of questions — vehicle details, driving history, personal information — and the system calculates a premium in the background, presenting a final price for the user to accept or decline.</p>

<p>The flow spans more than a dozen steps. At each step forward, the backend sends the newly collected data to a downstream underwriting engine, retrieves the latest quote result, and displays it to the user.</p>

<p><strong>The core interaction model is simple: user fills in → backend calculates → frontend shows result → user continues.</strong></p>

<hr />

<h2 id="2-how-signalr-got-in">2. How SignalR Got In</h2>

<p>At project kick-off, the team faced a design question: the underwriting engine’s calculation is asynchronous — it can’t return a result immediately. How do you push the result back to the frontend?</p>

<p>At the time, the underwriting engine and our application were part of the same deployment unit, and the team had full control over it. The technology choice was <strong>SignalR</strong> — Microsoft’s standard solution for real-time bidirectional communication in the .NET ecosystem, built on WebSockets with fallback support for Server-Sent Events and Long Polling.</p>

<p>On the architecture diagram, the solution looked elegant:</p>

<pre><code class="language-mermaid">sequenceDiagram
    participant Browser
    participant BFF
    participant Underwriting Engine

    Browser-&gt;&gt;BFF: Submit form data
    BFF-&gt;&gt;Underwriting Engine: Initiate calculation
    BFF--&gt;&gt;Browser: 202 Accepted (return immediately)
    Underwriting Engine--&gt;&gt;BFF: Calculation complete
    BFF-&gt;&gt;Browser: Push result via WebSocket Hub ✅
</code></pre>

<p>The user doesn’t have to wait — the backend pushes at the exact moment the calculation finishes, with minimal latency.</p>

<p>Was this the right choice? <strong>In certain scenarios, absolutely.</strong></p>

<p>Over time, however, the underwriting engine was split out into an independent microservice, maintained by a separate team. <strong>Our application became a pure consumer — able only to call the interfaces it exposed, with no ability to modify its behaviour.</strong> What had once been an “internal implementation detail” was now a coupling point spanning a service boundary. By the time we started re-examining this design, the architectural reality had changed completely from when the original decision was made.</p>

<hr />

<h2 id="3-what-signalr-is-actually-good-for">3. What SignalR Is Actually Good For</h2>

<p>Before diving into our problems, let’s establish a baseline: SignalR is a solid technology. It solves a real class of problems.</p>

<p>The scenarios where SignalR genuinely shines share these characteristics:</p>

<pre><code class="language-mermaid">graph LR
    A[Server-initiated events] --&gt; B[Unpredictable timing]
    B --&gt; C[Multiple clients need to receive simultaneously]
    C --&gt; D[Clients cannot or should not poll]
</code></pre>

<p>Canonical examples:</p>
<ul>
  <li><strong>Collaborative documents</strong> (like Google Docs): any user’s edit must be synchronised in real time to everyone else with the document open</li>
  <li><strong>Stock price dashboards</strong>: prices change every second; push is far more efficient than polling</li>
  <li><strong>Chat rooms</strong>: message arrival is completely unpredictable; frequent polling wastes resources</li>
  <li><strong>High-concurrency notification systems</strong>: tens of thousands of connections simultaneously waiting for server-side pushes</li>
</ul>

<p>What these scenarios have in common: <strong>the server needs to broadcast to multiple connections, or the timing of events is truly unpredictable — the wait could be seconds or hours.</strong></p>

<hr />

<h2 id="4-looking-back-at-our-actual-situation">4. Looking Back at Our Actual Situation</h2>

<p>Now apply that standard to our scenario.</p>

<p><strong>How slow is our underwriting calculation?</strong></p>

<p>From monitoring data: the vast majority of calculations complete in <strong>1–2 seconds</strong>, with the slowest cases under 5 seconds. This isn’t “unpredictable, potentially long-duration waiting” — it’s a known, bounded window.</p>

<p><strong>How many concurrent users are waiting?</strong></p>

<p>Our platform sees tens of thousands to low hundreds of thousands of visits per day — a normal scale for a consumer insurance business. Even at peak, the number of users concurrently waiting for a quote calculation is quite limited. Insurance quoting is a <strong>deliberate, step-by-step process</strong>, not a flash sale.</p>

<p><strong>Does the result need to be broadcast to multiple clients?</strong></p>

<p>Not at all. Each calculation result belongs only to the one user who initiated that request. There is zero broadcast requirement.</p>

<p>A direct comparison:</p>

<table>
  <thead>
    <tr>
      <th>Characteristic</th>
      <th>Where SignalR fits</th>
      <th>Our actual situation</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Wait duration</td>
      <td>Unpredictable (seconds to hours)</td>
      <td>Known (1–5 seconds)</td>
    </tr>
    <tr>
      <td>Number of receivers</td>
      <td>Broadcast to multiple clients</td>
      <td>Only the requesting user</td>
    </tr>
    <tr>
      <td>Concurrent connection scale</td>
      <td>Tens of thousands of persistent connections</td>
      <td>Small number of brief waits</td>
    </tr>
    <tr>
      <td>Who triggers the event</td>
      <td>Server-initiated</td>
      <td>In response to user action</td>
    </tr>
  </tbody>
</table>

<p><strong>Conclusion:</strong> Our scenario simply does not need SignalR. We brought a sledgehammer to drive a thumbtack.</p>

<hr />

<h2 id="5-the-real-cost-of-maintaining-signalr">5. The Real Cost of Maintaining SignalR</h2>

<p>An ill-fitting architecture is a design problem, but at the engineering level it gradually becomes a <strong>maintenance burden</strong>.</p>

<p><strong>Frontend complexity</strong></p>

<p>To integrate with the SignalR Hub, the frontend had to maintain an entire infrastructure:</p>

<pre><code class="language-mermaid">graph TD
    A[SignalR Module] --&gt; B[Connection management&lt;br/&gt;establish / disconnect / reconnect]
    A --&gt; C[HTTP Interceptor&lt;br/&gt;intercept requests, suspend until Hub callback]
    A --&gt; D[Response Handler Service&lt;br/&gt;process messages pushed from Hub]
    A --&gt; E[Awaitable Command model&lt;br/&gt;convert SignalR callbacks to Promises]
    A --&gt; F[Test helper components&lt;br/&gt;Mock Hub / Mock Handler]
</code></pre>

<p>More than 10 files, each requiring development, testing, and maintenance. Every framework upgrade needed verification that this mechanism still worked. Every new engineer needed an explanation of “why does sending an HTTP request require waiting for a WebSocket callback before execution continues?”</p>

<p><strong>Backend dependencies</strong></p>

<p>The backend needed:</p>
<ul>
  <li>A dedicated Azure SignalR Service (a managed service with its own cost)</li>
  <li>Hub class and connection management logic</li>
  <li>A full set of configuration options and validators</li>
</ul>

<p><strong>Test friction</strong></p>

<p>Unit tests required mocking Hub behaviour; integration tests had to ensure WebSocket connections were stable in the test environment. These extra test infrastructure pieces created friction on every iteration.</p>

<hr />

<h2 id="6-the-alternatives-we-evaluated">6. The Alternatives We Evaluated</h2>

<p>Before deciding to remove SignalR, we assessed several alternatives — worth going through one by one to explain <strong>why we didn’t choose them</strong>.</p>

<h3 id="option-a-bff-internal-synchronous-wait-making-http-look-synchronous">Option A: BFF-internal synchronous wait (making HTTP “look synchronous”)</h3>

<p>The BFF receives the frontend request, polls the downstream calculation service internally, and only returns to the frontend once it has the result. From the frontend’s perspective, there is a single request/response.</p>

<pre><code class="language-mermaid">sequenceDiagram
    participant Frontend
    participant BFF
    participant Underwriting Engine

    Frontend-&gt;&gt;BFF: POST calculation request
    BFF-&gt;&gt;Underwriting Engine: Initiate calculation
    Underwriting Engine--&gt;&gt;BFF: 202 Accepted
    loop BFF internal polling
        BFF-&gt;&gt;Underwriting Engine: GET status
        Underwriting Engine--&gt;&gt;BFF: Not ready
    end
    Underwriting Engine--&gt;&gt;BFF: Calculation complete
    BFF--&gt;&gt;Frontend: 200 + result
</code></pre>

<p><strong>Why we didn’t choose it:</strong></p>
<ul>
  <li>BFF HTTP connections have timeout limits (typically 30–60 seconds); if a calculation occasionally exceeds this, the frontend receives a 504 rather than a normal result</li>
  <li>BFF threads are blocked during the wait, increasing thread pool pressure under load</li>
  <li>Most importantly: the underwriting engine was by now an independent microservice — <strong>it only exposed a GET query interface, with no way for us to make it support “synchronous wait” semantics</strong> — we simply couldn’t change how it responded</li>
</ul>

<h3 id="option-b-server-sent-events-sse">Option B: Server-Sent Events (SSE)</h3>

<p>SSE is HTTP’s one-way “streaming” mechanism: the server can continuously push data to the frontend, which receives via the <code class="language-plaintext highlighter-rouge">EventSource</code> API. Lighter than SignalR, no WebSocket required.</p>

<p>Why we didn’t choose it:</p>
<ul>
  <li>Still requires a persistent connection; maintenance cost compared to SignalR is reduced but not eliminated</li>
  <li>Given that our known wait time is only 1–5 seconds, the overhead of keeping a connection open is not worth it</li>
  <li>Interaction with the downstream service still boils down to HTTP + polling; the only benefit of SSE would be in the frontend layer</li>
</ul>

<h3 id="option-c-polling-what-we-ultimately-chose">Option C: Polling (what we ultimately chose)</h3>

<p>The frontend issues an operation request. The BFF internally sends status queries to the downstream service until it confirms completion, then returns the final result to the frontend.</p>

<p>This is precisely the pattern that SignalR was originally designed to “replace.” In our scenario, <strong>it turned out to be the most appropriate choice.</strong> Unlike Option A, the wait is bounded by a fixed backoff schedule that stays well inside typical HTTP timeout limits, and it asks nothing of the downstream service beyond the GET interface it already exposes.</p>

<hr />

<h2 id="7-our-final-solution-bff-internal-polling">7. Our Final Solution: BFF-Internal Polling</h2>

<p>The architectural change looks like this:</p>

<pre><code class="language-mermaid">sequenceDiagram
    participant Frontend
    participant BFF
    participant Downstream Service

    Note over Frontend,Downstream Service: Original approach (SignalR)
    Frontend-&gt;&gt;BFF: POST/PATCH operation request
    BFF-&gt;&gt;Downstream Service: Issue operation
    BFF--&gt;&gt;Frontend: 202 (return immediately)
    Downstream Service--&gt;&gt;BFF: Operation complete notification (WebSocket)
    BFF-&gt;&gt;Frontend: Push result via SignalR Hub

    Note over Frontend,Downstream Service: New approach (BFF-internal polling)
    Frontend-&gt;&gt;BFF: POST/PATCH operation request
    BFF-&gt;&gt;Downstream Service: Issue operation
    Downstream Service--&gt;&gt;BFF: 202 (accepted)
    loop BFF internal polling
        BFF-&gt;&gt;Downstream Service: GET query result
        Downstream Service--&gt;&gt;BFF: Result ready
    end
    BFF--&gt;&gt;Frontend: 200 + final result
</code></pre>

<p>From the user’s perspective, the experience is identical: click “Continue,” a spinner appears, and 1–2 seconds later the next step loads.</p>

<p><strong>Polling strategy design</strong></p>

<p>Polling is not an infinite loop. We use a <strong>linear incremental backoff</strong> strategy:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Poll 1: wait 1 second
Poll 2: wait 2 seconds
Poll 3: wait 3 seconds
...
Poll 7: wait 7 seconds
Maximum total wait: 28 seconds (1+2+3+4+5+6+7)
</code></pre></div></div>
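<p>The schedule above can be expressed as a small helper. This is a simplified sketch rather than our production code: the name <code class="language-plaintext highlighter-rouge">PollUntilAsync</code> and its parameters are invented for illustration. The per-retry delay is a parameter for exactly the reason covered in pitfall 2 of the next section: tests set it to 1 millisecond.</p>

```csharp
using System;
using System.Threading.Tasks;

// Illustrative linear-backoff poller: wait N × delayMsPerRetry before attempt N,
// give up after maxRetries attempts. Task.Delay is non-blocking, so the thread
// returns to the pool while waiting.
static async Task<T> PollUntilAsync<T>(
    Func<Task<(bool Done, T Result)>> probe,
    int maxRetries = 7,
    int delayMsPerRetry = 1000)
{
    for (var attempt = 1; attempt <= maxRetries; attempt++)
    {
        await Task.Delay(attempt * delayMsPerRetry); // 1s, 2s, ... 7s in production
        var (done, result) = await probe();
        if (done) return result;
    }
    // Maximum total wait reached (28s with the defaults): fail loudly
    // rather than leaving the caller spinning forever.
    throw new TimeoutException($"No result after {maxRetries} polls.");
}
```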

<p><strong>Why does the first poll usually give us the result?</strong></p>

<p>This is the most critical engineering detail in the article: the downstream service caches GET responses, and every data mutation invalidates and immediately updates the cache. The window from when a mutation request (POST/PATCH) is sent to when the cache is updated is typically under 1 second. So:</p>

<ul>
  <li>Poll 1 (after a 1-second delay): hits the already-updated cache — almost always returns the result</li>
  <li>Only in rare cases (when the downstream service is under higher load) are polls 2 or 3 needed</li>
</ul>

<p>In practice, 99% of cases complete within the first two polls, and the wait time perceived by users is indistinguishable from the original SignalR solution.</p>

<p><strong>Where does polling apply?</strong></p>

<p>Our polling is used for two categories of operation:</p>

<ol>
  <li><strong>Create/delete operations</strong>: after issuing a POST or DELETE, confirm via GET that the resource has been created or deleted (GET returning 200 confirms creation; returning 404 confirms deletion)</li>
  <li><strong>Update operations</strong>: after issuing a PATCH, read the field values back via GET and compare against the expected values rather than relying solely on the status code</li>
</ol>
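<p>The two verification styles can be captured as small predicates that decide when a poll is “done.” The helper names below are invented for this sketch; in the real code the checks run inside the polling loop.</p>

```csharp
using System;
using System.Collections.Generic;

// Illustrative confirmation predicates. Creation and deletion are confirmed
// by the status code of a follow-up GET; updates are confirmed by reading
// the field back and comparing values, not by trusting the PATCH status code.
static bool CreationConfirmed(int getStatusCode) => getStatusCode == 200; // resource now exists
static bool DeletionConfirmed(int getStatusCode) => getStatusCode == 404; // resource now gone
static bool UpdateConfirmed<T>(T readBack, T expected) =>
    EqualityComparer<T>.Default.Equals(readBack, expected);
```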

<hr />

<h2 id="8-pitfalls-of-polling--and-how-we-handled-them">8. Pitfalls of Polling — and How We Handled Them</h2>

<p>Polling looks simple, but there are a few details worth taking seriously in a production environment.</p>

<h3 id="pitfall-1-what-happens-when-we-time-out">Pitfall 1: What happens when we time out?</h3>

<p>28 seconds is already a long maximum wait. If we’ve polled 7 times and still haven’t received the expected result, something is seriously wrong with the downstream service.</p>

<p>Our approach: once the maximum retry count is exceeded, throw an exception, log detailed error information (including the actual state returned by the last GET), and return a 500 to the frontend, directing the user to contact support.</p>

<p>This is far better than leaving the user staring at a spinner indefinitely.</p>

<h3 id="pitfall-2-test-suites-become-unbearably-slow">Pitfall 2: Test suites become unbearably slow</h3>

<p>If 28 seconds of total wait time ends up inside a unit test, each test case takes tens of seconds, and the entire test suite becomes unusable.</p>

<p>Our approach: design the polling delay as an <strong>injectable property</strong> (set to 1 millisecond in the test environment). Unit tests complete in milliseconds; production behaviour and test behaviour are perfectly consistent.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Production: DelayMillisecondsPerRetry = 1000 (each wait = N × 1 second)
Test:       DelayMillisecondsPerRetry = 1     (each wait = N × 1 millisecond)
</code></pre></div></div>

<p>A small design detail, but one that saves a significant amount of test time in practice.</p>

<h3 id="pitfall-3-bff-threads-blocked-during-polling">Pitfall 3: BFF threads blocked during polling</h3>

<p>The BFF’s internal polling is implemented with <code class="language-plaintext highlighter-rouge">async/await</code>. During the wait, the thread is released back to the thread pool (<code class="language-plaintext highlighter-rouge">Task.Delay</code> is non-blocking), so there is no thread starvation. This is standard usage of the modern .NET async model — no special handling required.</p>

<h3 id="pitfall-4-dont-confuse-polly-retries-with-business-level-polling">Pitfall 4: Don’t confuse Polly retries with business-level polling</h3>

<p>Our HTTP client is configured with both a Polly retry policy (for transient failures such as network errors and 5xx responses) and business-level polling (for waiting on downstream data synchronisation). The two are easy to conflate, and it’s important to keep them clearly separated:</p>

<table>
  <thead>
    <tr>
      <th> </th>
      <th>Polly retry</th>
      <th>Business polling</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Trigger</td>
      <td>HTTP error (5xx, network exception)</td>
      <td>Business state not yet ready (data not synced)</td>
    </tr>
    <tr>
      <td>Layer</td>
      <td>HTTP client layer</td>
      <td>Business service layer</td>
    </tr>
    <tr>
      <td>Expected outcome</td>
      <td>Retry yields a normal response</td>
      <td>Retry yields data matching expected value</td>
    </tr>
    <tr>
      <td>On final failure</td>
      <td>Wrapped as HTTP exception</td>
      <td>Wrapped as business exception</td>
    </tr>
  </tbody>
</table>

<p>In plain terms: Polly handles “the request failed — retry it”; business polling handles “the request succeeded, but the data hasn’t updated yet — check again.” These are two completely different problems.</p>
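<p>The separation can be made concrete without any Polly code. In the sketch below (all names invented for illustration), the transient layer retries a call that <em>failed</em>, while the business layer re-queries a call that <em>succeeded</em> until the data is ready; in our real stack, Polly provides the first layer at the HTTP client level.</p>

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Layer 1 — transient retry: the request itself failed; try the same call again.
static async Task<T> WithTransientRetryAsync<T>(Func<Task<T>> call, int attempts = 3)
{
    for (var i = 1; ; i++)
    {
        try { return await call(); }
        catch (HttpRequestException) when (i < attempts)
        {
            await Task.Delay(100 * i); // brief pause, then retry the failed request
        }
    }
}

// Layer 2 — business polling: the request succeeded, but the data isn't ready yet.
static async Task<T> PollBusinessStateAsync<T>(
    Func<Task<(bool Ready, T Value)>> query, int maxPolls = 7, int delayMs = 1000)
{
    for (var n = 1; n <= maxPolls; n++)
    {
        await Task.Delay(n * delayMs); // linear backoff, as in section 7
        var (ready, value) = await query();
        if (ready) return value;
    }
    throw new TimeoutException("Business state never became ready.");
}
```

<p>In real code each individual query inside layer 2 is itself protected by layer 1, which is why conflating the two makes failures so hard to reason about.</p>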

<hr />

<h2 id="9-the-result">9. The Result</h2>

<p>A single PR to remove SignalR (<code class="language-plaintext highlighter-rouge">a6e569dd</code>):</p>

<ul>
  <li><strong>77 files</strong> changed</li>
  <li>Over <strong>1,200 lines</strong> of code deleted</li>
  <li>Net reduction of <strong>548 lines</strong></li>
  <li><strong>253 lines</strong> of new unit tests added in the same PR</li>
</ul>

<p>From that point on, “the frontend issues an operation and waits for the result” no longer requires a WebSocket connection, a Hub class, an Interceptor, a ResponseHandler, or an AwaitableCommand. It’s a plain HTTP request, plus a polling loop quietly doing its job inside the BFF.</p>

<hr />

<h2 id="10-closing-thoughts-elegant-architecture-doesnt-always-mean-the-right-fit">10. Closing Thoughts: Elegant Architecture Doesn’t Always Mean the Right Fit</h2>

<p>SignalR isn’t wrong. It’s excellent technology with scenarios where it genuinely shines.</p>

<p>The real lesson here is: <strong>when choosing a technical solution, fit matters more than sophistication.</strong> Over-investing at the architecture level often isn’t a sign of poor engineers — it’s a sign that the intuition of “this technology looks correct” has obscured the question of “is this technology actually necessary for us?”</p>

<p>A few questions worth asking yourself at the next technology selection:</p>

<ol>
  <li>Is our data update truly “unpredictable” — or just “a little slow”?</li>
  <li>Does this result need to be pushed to “multiple receivers,” or only to the one person who made the request?</li>
  <li>If we used the simplest possible polling implementation, what’s the worst-case wait time for the user? Is that acceptable?</li>
  <li>What is the maintenance cost for the team three months after introducing this technology?</li>
</ol>

<p>Sometimes, one fewer dependency is the best architecture you can have.</p>]]></content><author><name>Zhigang Ji</name></author><category term="SignalR" /><category term="Architecture" /><category term="Real-time" /><category term="Microservices" /><category term="Polling" /><category term=".NET" /><category term="BFF" /><summary type="html"><![CDATA[Why SignalR was removed from a real-world insurance quoting platform, and how a simpler polling-based architecture proved to be a better fit.]]></summary></entry></feed>