<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>Cemrehan Çavdar</title><description>Thoughts and notes</description><link>https://cemrehancavdar.com</link><atom:link href="https://cemrehancavdar.com/feed.xml" rel="self" type="application/rss+xml" /><managingEditor>Cemrehan Çavdar</managingEditor><item><title>The Optimization Ladder</title><link>https://cemrehancavdar.com/2026/03/10/optimization-ladder/</link><guid isPermaLink="true">https://cemrehancavdar.com/2026/03/10/optimization-ladder/</guid><pubDate>Tue, 10 Mar 2026 20:00:00 +0000</pubDate><description>&lt;p&gt;Every year, someone posts a benchmark showing Python is 100x slower than C. The same argument plays out: one side says &amp;quot;benchmarks don't matter, real apps are I/O bound,&amp;quot; the other says &amp;quot;just use a real language.&amp;quot; Both are wrong.&lt;/p&gt;
&lt;p&gt;I took two of the most-cited &lt;a href="https://benchmarksgame-team.pages.debian.net/benchmarksgame/" target="_blank"&gt;Benchmarks Game&lt;/a&gt; problems -- &lt;strong&gt;n-body&lt;/strong&gt; and &lt;strong&gt;spectral-norm&lt;/strong&gt; -- reproduced them on my machine, and ran every optimization tool I could find. Then I added a third benchmark -- a JSON event pipeline -- to test something closer to real-world code.&lt;/p&gt;
&lt;p&gt;Same problems, same Apple M4 Pro, real numbers. This is one developer's journey up the ladder -- not a definitive ranking. A dedicated expert could squeeze more out of any of these tools. The full code is at &lt;a href="https://github.com/cemrehancavdar/faster-python-bench" target="_blank"&gt;faster-python-bench&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here's the starting point -- CPython 3.13 on the official Benchmarks Game run:&lt;/p&gt;
&lt;div class="bench-table"&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;C gcc&lt;/th&gt;
&lt;th&gt;CPython 3.13&lt;/th&gt;
&lt;th&gt;Ratio&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;n-body (50M)&lt;/td&gt;
&lt;td&gt;2.1s&lt;/td&gt;
&lt;td&gt;372s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;177x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;spectral-norm (5500)&lt;/td&gt;
&lt;td&gt;0.4s&lt;/td&gt;
&lt;td&gt;350s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;875x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;fannkuch-redux (12)&lt;/td&gt;
&lt;td&gt;2.1s&lt;/td&gt;
&lt;td&gt;311s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;145x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;mandelbrot (16000)&lt;/td&gt;
&lt;td&gt;1.3s&lt;/td&gt;
&lt;td&gt;183s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;142x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;binary-trees (21)&lt;/td&gt;
&lt;td&gt;1.6s&lt;/td&gt;
&lt;td&gt;33s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;21x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p&gt;The question isn't whether Python is slow at computation. It is. The question is how much effort each fix costs and how far it gets you. That's the ladder.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Why Python Is Slow&lt;/h2&gt;
&lt;p&gt;The usual suspects are the GIL, interpretation, and dynamic typing. All three matter, but none of them is the real story. The real story is that Python is designed to be &lt;em&gt;maximally dynamic&lt;/em&gt; -- you can monkey-patch methods at runtime, replace builtins, change a class's inheritance chain while instances exist -- and that design makes it &lt;strong&gt;fundamentally hard to optimize&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;A C compiler sees &lt;code&gt;a + b&lt;/code&gt; between two integers and emits one CPU instruction. The Python VM sees &lt;code&gt;a + b&lt;/code&gt; and has to ask: what is &lt;code&gt;a&lt;/code&gt;? What is &lt;code&gt;b&lt;/code&gt;? Does &lt;code&gt;a.__add__&lt;/code&gt; exist? Has it been replaced since the last call? Is &lt;code&gt;a&lt;/code&gt; actually a subclass of &lt;code&gt;int&lt;/code&gt; that overrides &lt;code&gt;__add__&lt;/code&gt;? Every operation goes through this dispatch because the language &lt;em&gt;guarantees&lt;/em&gt; you can change anything at any time.&lt;/p&gt;
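&lt;p&gt;You can watch that guarantee in action. A toy subclass (the &lt;code&gt;LoudInt&lt;/code&gt; name is mine) makes two visually identical additions dispatch to different code:&lt;/p&gt;

```python
# Why the VM can't assume int + int is machine addition: a subclass
# may override __add__, and the expression looks exactly the same.
class LoudInt(int):
    def __add__(self, other):
        return int(self) + int(other) + 1000  # arbitrary override

a, b = LoudInt(1), 2
print(1 + 2)  # plain ints -> 3
print(a + b)  # same-looking expression dispatches to LoudInt.__add__ -> 1003
```

&lt;p&gt;Every &lt;code&gt;+&lt;/code&gt; pays for this possibility, even when nothing was overridden.&lt;/p&gt;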
&lt;p&gt;The object overhead is where this shows up concretely. In C, an integer is 4 bytes on the stack. In Python:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-"&gt;&lt;span class="nx"&gt;C&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;bytes&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nx"&gt;Python&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ob_refcnt&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="nx"&gt;B&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;reference&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt;
&lt;span class="w"&gt;              &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ob_type&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="nx"&gt;B&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;pointer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;object&lt;/span&gt;
&lt;span class="w"&gt;              &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ob_size&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="nx"&gt;B&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;number&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;digits&lt;/span&gt;
&lt;span class="w"&gt;              &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ob_digit&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="nx"&gt;B&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;actual&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;
&lt;span class="w"&gt;              &lt;/span&gt;&lt;span class="err"&gt;─────────────────&lt;/span&gt;
&lt;span class="w"&gt;              &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;bytes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;minimum&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;(Simplified -- CPython 3.12+ replaced &lt;code&gt;ob_size&lt;/code&gt; with &lt;code&gt;lv_tag&lt;/code&gt; in a restructured int layout. Total is still 28 bytes. See &lt;a href="https://github.com/python/cpython/blob/main/Include/cpython/longintrepr.h" target="_blank"&gt;longintrepr.h&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;4 bytes of number, 24 bytes of machinery to support dynamism. &lt;code&gt;a + b&lt;/code&gt; means: dereference two heap pointers, look up type slots, dispatch to &lt;code&gt;int.__add__&lt;/code&gt;, allocate a new &lt;code&gt;PyObject&lt;/code&gt; for the result (unless it hits the small-integer cache), update reference counts. CPython 3.11+ mitigates this with &lt;a href="https://docs.python.org/3/whatsnew/3.11.html#faster-cpython" target="_blank"&gt;adaptive specialization&lt;/a&gt; -- hot bytecodes like &lt;code&gt;BINARY_OP_ADD_INT&lt;/code&gt; skip the dispatch for known types -- but the overhead is still there for the general case. One number isn't slow. Millions in a loop are.&lt;/p&gt;
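&lt;p&gt;You don't need to read the C headers to see this -- &lt;code&gt;sys.getsizeof&lt;/code&gt; reports the overhead directly:&lt;/p&gt;

```python
import sys

# Per-object machinery, measured from Python itself (64-bit CPython).
print(sys.getsizeof(1))         # 28 bytes for a one-digit int
print(sys.getsizeof(2 ** 100))  # bigger ints grow digit by digit
```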
&lt;p&gt;The GIL (Global Interpreter Lock) gets blamed a lot, but it has &lt;strong&gt;no impact on single-threaded performance&lt;/strong&gt; -- it only matters when multiple CPU-bound threads compete for the interpreter. For the benchmarks in this post, the GIL is irrelevant. CPython 3.13 shipped experimental free-threaded mode (&lt;code&gt;--disable-gil&lt;/code&gt;) -- still experimental in 3.14 -- but as we'll see, it actually makes single-threaded code &lt;em&gt;slower&lt;/em&gt; because removing the GIL adds overhead to every reference count operation.&lt;/p&gt;
&lt;p&gt;The interpretation overhead is real but is being actively addressed. CPython 3.11's &lt;a href="https://docs.python.org/3/whatsnew/3.11.html#faster-cpython" target="_blank"&gt;Faster CPython&lt;/a&gt; project added adaptive specialization -- the VM detects &amp;quot;hot&amp;quot; bytecodes and replaces them with type-specialized versions, skipping some of the dispatch. It helped (~1.4x). CPython 3.13 went further with an experimental &lt;a href="https://docs.python.org/3/whatsnew/3.13.html#an-experimental-jit-compiler" target="_blank"&gt;copy-and-patch JIT compiler&lt;/a&gt; -- a lightweight JIT that stitches together pre-compiled machine code templates instead of generating code from scratch. It's not a full optimizing JIT like V8's TurboFan or a tracing JIT like PyPy's; it's designed to be small and fast to start, avoiding the heavyweight JIT startup cost that has historically kept CPython from going this route. Early results in 3.13 show no improvement on most benchmarks, but the infrastructure is now in place for more aggressive optimizations in future releases. JavaScript's V8 achieves much better JIT results, but V8 also had a large dedicated team and a single-threaded JavaScript execution model that makes speculative optimization easier. (For more on the &amp;quot;why doesn't CPython JIT&amp;quot; question, see Anthony Shaw's &lt;a href="https://tonybaloney.github.io/posts/why-is-python-so-slow.html#so-why-doesnt-cpython-use-a-jit" target="_blank"&gt;&amp;quot;Why is Python so slow?&amp;quot;&lt;/a&gt;.)&lt;/p&gt;
&lt;p&gt;So the picture is: &lt;strong&gt;Python is slow because its dynamic design requires runtime dispatch on every operation.&lt;/strong&gt; The GIL, the interpreter, the object model -- these are all consequences of that design choice. Each rung of the ladder removes some of this dispatch. The higher you climb, the more you bypass -- and the more effort it costs.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Rung 0: Upgrade CPython&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Cost: changing your base image. Reward: up to 1.4x.&lt;/strong&gt;&lt;/p&gt;
&lt;div class="bench-table"&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;th&gt;N-body&lt;/th&gt;
&lt;th&gt;vs 3.14&lt;/th&gt;
&lt;th&gt;Spectral-norm&lt;/th&gt;
&lt;th&gt;vs 3.14&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CPython 3.10&lt;/td&gt;
&lt;td&gt;1,663ms&lt;/td&gt;
&lt;td&gt;0.75x&lt;/td&gt;
&lt;td&gt;16,826ms&lt;/td&gt;
&lt;td&gt;0.83x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPython 3.11&lt;/td&gt;
&lt;td&gt;1,200ms&lt;/td&gt;
&lt;td&gt;1.04x&lt;/td&gt;
&lt;td&gt;13,430ms&lt;/td&gt;
&lt;td&gt;1.05x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPython 3.13&lt;/td&gt;
&lt;td&gt;1,134ms&lt;/td&gt;
&lt;td&gt;1.10x&lt;/td&gt;
&lt;td&gt;13,637ms&lt;/td&gt;
&lt;td&gt;1.03x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPython 3.14&lt;/td&gt;
&lt;td&gt;1,242ms&lt;/td&gt;
&lt;td&gt;1.0x&lt;/td&gt;
&lt;td&gt;14,046ms&lt;/td&gt;
&lt;td&gt;1.0x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPython 3.14t (free-threaded)&lt;/td&gt;
&lt;td&gt;1,513ms&lt;/td&gt;
&lt;td&gt;0.82x&lt;/td&gt;
&lt;td&gt;14,551ms&lt;/td&gt;
&lt;td&gt;0.97x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p&gt;The story is &lt;strong&gt;3.10 to 3.11&lt;/strong&gt;: a 1.39x speedup on n-body, for free. That's the &lt;a href="https://docs.python.org/3/whatsnew/3.11.html#faster-cpython" target="_blank"&gt;Faster CPython&lt;/a&gt; project -- adaptive specialization of bytecodes, inline caching, zero-cost exceptions. 3.13 squeezed out a bit more. 3.14 gave some of it back -- a minor regression on these benchmarks.&lt;/p&gt;
&lt;p&gt;Free-threaded Python (3.14t) is &lt;strong&gt;slower&lt;/strong&gt; on single-threaded code. The GIL removal adds overhead to every reference count operation. Worth it only if you have genuinely parallel CPU-bound threads. (&lt;a href="https://github.com/cemrehancavdar/faster-python-bench/blob/main/docs/cpython-versions.md" target="_blank"&gt;Full version comparison&lt;/a&gt;)&lt;/p&gt;
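&lt;p&gt;If you want to check what you're actually running, both the build flag and the runtime state are queryable -- a small sketch; note that &lt;code&gt;sys._is_gil_enabled&lt;/code&gt; only exists on 3.13+:&lt;/p&gt;

```python
import sys
import sysconfig

# Was this interpreter built free-threaded (the 3.13+ --disable-gil build)?
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# Is the GIL actually enabled right now? The attribute exists only on
# 3.13+, so fall back to "enabled" on older versions.
gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()

print(free_threaded_build, gil_enabled)
```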
&lt;p&gt;This rung costs nothing. If you're still on 3.10, upgrade.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Rung 1: Alternative Runtimes (PyPy, GraalPy)&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Cost: switching interpreters. Reward: 6-66x.&lt;/strong&gt;&lt;/p&gt;
&lt;div class="bench-table"&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;N-body&lt;/th&gt;
&lt;th&gt;Spectral-norm&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CPython 3.14&lt;/td&gt;
&lt;td&gt;1,242ms&lt;/td&gt;
&lt;td&gt;14,046ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GraalPy&lt;/td&gt;
&lt;td&gt;211ms (&lt;strong&gt;5.9x&lt;/strong&gt;)&lt;/td&gt;
&lt;td&gt;212ms (&lt;strong&gt;66x&lt;/strong&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PyPy&lt;/td&gt;
&lt;td&gt;98ms (&lt;strong&gt;13x&lt;/strong&gt;)&lt;/td&gt;
&lt;td&gt;1,065ms (&lt;strong&gt;13x&lt;/strong&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p&gt;Both are JIT-compiled runtimes that generate native machine code from your unmodified Python. Zero code changes. Just a different interpreter.&lt;/p&gt;
&lt;p&gt;PyPy uses a tracing JIT -- it records hot loops and compiles them. GraalPy runs on GraalVM's Truffle framework with a method-based JIT. PyPy wins on n-body (13x vs 5.9x), but GraalPy dominates spectral-norm (66x vs 13x) -- the matrix-heavy inner loop plays to GraalVM's strengths. GraalPy also offers Java interop and is actively developed by Oracle.&lt;/p&gt;
&lt;p&gt;The catch: ecosystem compatibility. Both support major packages, but C extensions run through compatibility layers that can be slower than on CPython. GraalPy is on Python 3.12 (no 3.14 yet) and has slow startup -- it's JVM-based, so the JIT needs warmup before reaching peak performance. For pure Python code with long-running hot loops, though, both runtimes are essentially free speed.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Rung 2: Mypyc&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Cost: type annotations you probably already have. Reward: 2.4-14x.&lt;/strong&gt;&lt;/p&gt;
&lt;div class="bench-table"&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;N-body&lt;/th&gt;
&lt;th&gt;Spectral-norm&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CPython 3.14&lt;/td&gt;
&lt;td&gt;1,242ms&lt;/td&gt;
&lt;td&gt;14,046ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mypyc&lt;/td&gt;
&lt;td&gt;518ms (&lt;strong&gt;2.4x&lt;/strong&gt;)&lt;/td&gt;
&lt;td&gt;990ms (&lt;strong&gt;14x&lt;/strong&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p&gt;Mypyc compiles type-annotated Python to C extensions using the same type analysis as mypy. No new syntax, no new language -- just your existing typed Python, compiled ahead of time.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;&lt;span class="c1"&gt;# Already valid typed Python -- mypyc compiles this to C&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;advance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bodies&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;pairs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;BodyPair&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;dx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;dy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;dz&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;dist_sq&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;dist&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;mag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;pairs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;dx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r1&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;r2&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;dy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r1&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;r2&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;dz&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r1&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;r2&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;dist_sq&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dx&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;dx&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;dy&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;dy&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;dz&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;dz&lt;/span&gt;
            &lt;span class="n"&gt;dist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dist_sq&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;mag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dist_sq&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;dist&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="c1"&gt;# ...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The difference from the baseline: explicit type declarations on every local variable so mypyc can use C primitives instead of Python objects, and decomposing &lt;code&gt;** (-1.5)&lt;/code&gt; into &lt;code&gt;sqrt()&lt;/code&gt; + arithmetic to avoid slow power dispatch. That's it -- no special decorators, no new build system beyond &lt;code&gt;mypycify()&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The mypy project itself -- over 100k lines of Python -- achieved a &lt;a href="https://github.com/mypyc/mypyc" target="_blank"&gt;4x end-to-end speedup&lt;/a&gt; by compiling with mypyc. The official docs say &amp;quot;1.5x to 5x&amp;quot; for existing annotated code, &amp;quot;5x to 10x&amp;quot; for code tuned for compilation. The spectral-norm result (14x) lands above that range because the inner loop is pure arithmetic that mypyc compiles directly to C. On our dict-heavy JSON pipeline, mypyc hit 2.3x on pre-parsed dicts -- closer to the expected floor.&lt;/p&gt;
&lt;p&gt;The constraint: mypyc supports a subset of Python. Dynamic patterns like &lt;code&gt;**kwargs&lt;/code&gt;, &lt;code&gt;getattr&lt;/code&gt; tricks, and heavily duck-typed code will compile but won't be optimized -- they fall back to slow generic paths. But if your code already passes mypy strict mode, mypyc is the lowest-effort compilation rung on the ladder.&lt;/p&gt;
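&lt;p&gt;For reference, the whole build step is a few lines -- a minimal sketch, with a hypothetical module name:&lt;/p&gt;

```python
# setup.py -- compile an annotated module to a C extension with mypyc
from setuptools import setup
from mypyc.build import mypycify

setup(
    name="nbody-compiled",               # hypothetical package name
    ext_modules=mypycify(["nbody.py"]),  # same type analysis as mypy
)
```

&lt;p&gt;Then &lt;code&gt;pip install .&lt;/code&gt; builds the extension, and importing the module picks up the compiled version.&lt;/p&gt;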
&lt;hr /&gt;
&lt;h2&gt;Rung 3: NumPy&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Cost: knowing NumPy. Reward: up to 520x.&lt;/strong&gt;&lt;/p&gt;
&lt;div class="bench-table"&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Spectral-norm&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CPython 3.14&lt;/td&gt;
&lt;td&gt;14,046ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NumPy&lt;/td&gt;
&lt;td&gt;27ms (&lt;strong&gt;520x&lt;/strong&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p&gt;520x. Faster than our single-threaded Rust at 154x on the same problem -- though NumPy delegates to BLAS, which uses multiple cores.&lt;/p&gt;
&lt;p&gt;Spectral-norm is matrix-vector multiplication. NumPy pre-computes the matrix once and delegates to BLAS (Apple Accelerate on macOS):&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;build_matrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Each &lt;code&gt;@&lt;/code&gt; is a single call to hand-optimized BLAS with SIMD and multithreading. NumPy trades O(N) memory for O(N^2) memory -- it stores the full 2000x2000 matrix (30MB) -- but the computation is done in compiled C/C++ (Apple Accelerate on macOS, OpenBLAS or MKL on Linux), not Python.&lt;/p&gt;
&lt;p&gt;This is the lesson people miss when they say &amp;quot;Python is slow.&amp;quot; Python the loop runner is slow. Python the orchestrator of compiled libraries is as fast as anything.&lt;/p&gt;
&lt;p&gt;The constraint: your problem must fit vectorized operations. Element-wise math, matrix algebra, reductions, conditionals (&lt;code&gt;np.where&lt;/code&gt; computes both branches and masks the result -- redundant work, but still faster than a Python loop on large arrays) -- NumPy handles all of these. What it can't help with: sequential dependencies where each step feeds the next, recursive structures, and small arrays where NumPy's per-call overhead costs more than the computation itself.&lt;/p&gt;
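&lt;p&gt;A tiny example of the &lt;code&gt;np.where&lt;/code&gt; pattern mentioned above -- both branches are computed over the whole array, then masked, but everything stays in C:&lt;/p&gt;

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

# Vectorized conditional: evaluates both x and 0.0 everywhere,
# then selects per element based on the mask.
clipped = np.where(x > 0, x, 0.0)
print(clipped)  # [0. 0. 0. 1. 2.]

# The Python-loop equivalent it replaces:
# [xi if xi > 0 else 0.0 for xi in x]
```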
&lt;hr /&gt;
&lt;h2&gt;Interlude: JAX&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Cost: rewriting loops as &lt;code&gt;jax.lax.fori_loop&lt;/code&gt; + array operations. Reward: 12-1,633x.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A Reddit commenter (&lt;a href="https://www.reddit.com/r/Python/comments/1rpqugj/comment/o9qvpg4/" target="_blank"&gt;justneurostuff&lt;/a&gt;) suggested testing &lt;a href="https://github.com/jax-ml/jax" target="_blank"&gt;JAX&lt;/a&gt; -- an array computing library that uses XLA JIT compilation. I expected it to land somewhere near NumPy. I was wrong.&lt;/p&gt;
&lt;div class="bench-table"&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;N-body&lt;/th&gt;
&lt;th&gt;Spectral-norm&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CPython 3.14&lt;/td&gt;
&lt;td&gt;1,242ms&lt;/td&gt;
&lt;td&gt;14,046ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NumPy&lt;/td&gt;
&lt;td&gt;--&lt;/td&gt;
&lt;td&gt;27ms (&lt;strong&gt;520x&lt;/strong&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JAX JIT&lt;/td&gt;
&lt;td&gt;100ms (&lt;strong&gt;12.2x&lt;/strong&gt;)&lt;/td&gt;
&lt;td&gt;8.6ms (&lt;strong&gt;1,633x&lt;/strong&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p&gt;8.6ms on spectral-norm. That's 3x faster than NumPy and the fastest result in this entire post. On n-body, 12.2x -- between Mypyc and Numba. Both results match the CPython baseline to 9 decimal places. This is single-threaded -- forcing one thread gave 9.1ms vs 8.6ms on spectral-norm.&lt;/p&gt;
&lt;p&gt;I don't know JAX well enough to explain exactly why it's 3x faster than NumPy on the same matrix multiplications. Both call BLAS under the hood. My best guess is that JAX's &lt;code&gt;@jit&lt;/code&gt; compiles the entire function -- matrix build, loop, dot products -- so Python is never involved between operations, while NumPy returns to Python between each &lt;code&gt;@&lt;/code&gt; call. But I haven't verified that in detail. Might be time to learn.&lt;/p&gt;
&lt;p&gt;The catch: JAX is a different programming model. Python loops become &lt;code&gt;lax.fori_loop&lt;/code&gt;. Conditionals become &lt;code&gt;lax.cond&lt;/code&gt;. You're writing functional array programs that happen to use Python syntax -- closer to a domain-specific language than a drop-in optimizer. But if your problem fits, the numbers speak for themselves. JAX isn't the only library that compiles array code -- PyTorch has &lt;a href="https://pytorch.org/docs/stable/user_guide/torch_compiler/torch.compiler.html" target="_blank"&gt;&lt;code&gt;torch.compile&lt;/code&gt;&lt;/a&gt;, for example. I only tested JAX, so I can't say whether others would produce similar results on these benchmarks.&lt;/p&gt;
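&lt;p&gt;To make that model concrete, here's a toy power iteration in the JAX style -- my own sketch, not the benchmark code:&lt;/p&gt;

```python
import jax
import jax.numpy as jnp
from jax import lax

# The Python for-loop becomes lax.fori_loop, and @jax.jit hands the whole
# function -- loop included -- to XLA as one compiled program.
@jax.jit
def power_iterate(a, u, steps):
    def body(_, u):
        v = a.T @ (a @ u)          # same A^T (A u) shape as spectral-norm
        return v / jnp.linalg.norm(v)
    return lax.fori_loop(0, steps, body, u)

a = jnp.diag(jnp.arange(1.0, 5.0))  # diag(1, 2, 3, 4)
u = jnp.ones(4) / 2.0
u = power_iterate(a, u, 50)
# u converges toward the dominant eigenvector, (0, 0, 0, 1)
```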
&lt;hr /&gt;
&lt;h2&gt;Rung 4: Numba&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Cost: &lt;code&gt;@njit&lt;/code&gt; + restructuring data into NumPy arrays. Reward: 56-135x.&lt;/strong&gt;&lt;/p&gt;
&lt;div class="bench-table"&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;N-body&lt;/th&gt;
&lt;th&gt;Spectral-norm&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CPython 3.14&lt;/td&gt;
&lt;td&gt;1,242ms&lt;/td&gt;
&lt;td&gt;14,046ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Numba @njit&lt;/td&gt;
&lt;td&gt;22ms (&lt;strong&gt;56x&lt;/strong&gt;)&lt;/td&gt;
&lt;td&gt;104ms (&lt;strong&gt;135x&lt;/strong&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p&gt;Numba JIT-compiles decorated functions to machine code via LLVM:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;&lt;span class="nd"&gt;@njit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cache&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;advance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mass&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;dx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;dy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;dz&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;pos&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;dist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dx&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;dx&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;dy&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;dy&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;dz&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;dz&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;mag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dt&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dist&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;dist&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;dist&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;vel&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="n"&gt;dx&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;mag&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;mass&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="c1"&gt;# ...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;One decorator. Restructure data into NumPy arrays. The constraint: Numba works best with NumPy arrays and numeric types. It has limited support for typed dicts, typed lists, and &lt;code&gt;@jitclass&lt;/code&gt;, but strings and general Python objects are largely out of reach. It's a scalpel, not a saw.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Rung 5: Cython&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Cost: learning C's mental model, expressed in Python syntax. Reward: 99-124x.&lt;/strong&gt;&lt;/p&gt;
&lt;div class="bench-table"&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;N-body&lt;/th&gt;
&lt;th&gt;Spectral-norm&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CPython 3.14&lt;/td&gt;
&lt;td&gt;1,242ms&lt;/td&gt;
&lt;td&gt;14,046ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cython&lt;/td&gt;
&lt;td&gt;10ms (&lt;strong&gt;124x&lt;/strong&gt;)&lt;/td&gt;
&lt;td&gt;142ms (&lt;strong&gt;99x&lt;/strong&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p&gt;124x on n-body. Within 10% of Rust. But here's the thing about this rung:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;My first Cython n-body got 10.5x.&lt;/strong&gt; Same Cython, same compiler. The final version got 124x. The difference was three landmines, none of which produced warnings:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Cython's &lt;code&gt;**&lt;/code&gt; operator with float exponents. Even with typed doubles and &lt;code&gt;-ffast-math&lt;/code&gt;, &lt;code&gt;x ** 0.5&lt;/code&gt; is 40x slower than &lt;code&gt;sqrt(x)&lt;/code&gt; in Cython -- the operator goes through a slow dispatch path instead of compiling to C's &lt;code&gt;sqrt()&lt;/code&gt;. The n-body baseline uses &lt;code&gt;** (-1.5)&lt;/code&gt;, which can't be replaced with a single &lt;code&gt;sqrt()&lt;/code&gt; call -- it required decomposing the formula into &lt;code&gt;sqrt()&lt;/code&gt; + arithmetic. &lt;strong&gt;7x penalty on the overall benchmark.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Precomputed pair index arrays prevent the C compiler from unrolling the nested loop. &lt;strong&gt;2x penalty.&lt;/strong&gt; The &amp;quot;clever&amp;quot; version is slower.&lt;/li&gt;
&lt;li&gt;Missing &lt;code&gt;@cython.cdivision(True)&lt;/code&gt; inserts a zero-division check before every floating-point divide in the inner loop. Millions of branches that are never taken.&lt;/li&gt;
&lt;/ul&gt;
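The first landmine has a purely algebraic fix: `x ** -1.5` equals `1 / (x * sqrt(x))`, so the decomposed form computes the same magnitudes while compiling to C's `sqrt()` instead of the slow `pow()` dispatch path. A quick sanity check in plain Python (the values are illustrative):

```python
from math import sqrt

def mag_pow(dt, d2):
    # Literal translation of the benchmark formula: d2 ** -1.5
    # goes through Cython's slow pow() dispatch path when compiled.
    return dt * d2 ** -1.5

def mag_sqrt(dt, d2):
    # Decomposed form: d2 ** -1.5 == 1 / (d2 * sqrt(d2)),
    # which compiles to C's sqrt() plus a multiply and a divide.
    return dt / (d2 * sqrt(d2))

# Both forms agree to floating-point precision across scales
for d2 in (0.25, 1.0, 7.3, 1e6):
    assert abs(mag_pow(0.01, d2) - mag_sqrt(0.01, d2)) <= 1e-12 * mag_sqrt(0.01, d2)
```

The decomposition is exact in real arithmetic; the two code paths differ only in which C function the compiler emits.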
&lt;p&gt;Cython's promise is that it &amp;quot;makes writing C extensions for Python as easy as Python itself.&amp;quot; In practice that means: learn C's mental model, express it in Python syntax, and use the annotation report (&lt;code&gt;cython -a&lt;/code&gt;) to verify the compiler did what you think. The full story is in &lt;a href="https://github.com/cemrehancavdar/faster-python-bench/blob/main/docs/cython-minefield.md" target="_blank"&gt;The Cython Minefield&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The reward is real -- 99-124x, matching compiled languages. But the failure mode is silent: all three landmines cost you without a single warning, and the annotation report is the only way to catch them.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Rung 6: The New Wave&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Cost: new toolchains, rough edges, ecosystem gaps. Reward: 26-198x.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Three tools promise to compile Python (or Python-like code) to native machine code. I tested all three.&lt;/p&gt;
&lt;div class="bench-table"&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;N-body&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;th&gt;Spectral-norm&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;th&gt;The catch&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Codon 0.19&lt;/td&gt;
&lt;td&gt;47ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;26x&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;99ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;142x&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Own runtime, limited stdlib, limited CPython interop&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mojo nightly&lt;/td&gt;
&lt;td&gt;11ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;113x&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;118ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;119x&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;New language (pre-1.0), full rewrite required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Taichi 1.7&lt;/td&gt;
&lt;td&gt;16ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;78x&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;71ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;198x&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Python 3.13 only (no 3.14 wheels)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p&gt;The numbers are real. The developer experience is rough. Codon can't import your existing code. Mojo is a new language wearing Python's clothes -- but with SIMD vectorization and compile-time loop unrolling, it reaches Rust/Cython territory on n-body (113x). Taichi has the best spectral-norm result (198x) but &lt;strong&gt;doesn't ship wheels for Python 3.14&lt;/strong&gt; -- its numbers above were benchmarked on a separate Python 3.13 environment. That's the compromise with these tools: if your runtime doesn't keep up with CPython releases, you're stuck on an old version or juggling multiple environments. (&lt;a href="https://github.com/cemrehancavdar/faster-python-bench/blob/main/docs/new-wave-compilers.md" target="_blank"&gt;Full deep dive with code and DX verdicts&lt;/a&gt;)&lt;/p&gt;
&lt;p&gt;None are drop-in. All are worth watching.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Rung 7: Rust via PyO3&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Cost: learning Rust. Reward: 113-154x.&lt;/strong&gt;&lt;/p&gt;
&lt;div class="bench-table"&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;N-body&lt;/th&gt;
&lt;th&gt;Spectral-norm&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CPython 3.14&lt;/td&gt;
&lt;td&gt;1,242ms&lt;/td&gt;
&lt;td&gt;14,046ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rust (PyO3)&lt;/td&gt;
&lt;td&gt;11ms (&lt;strong&gt;113x&lt;/strong&gt;)&lt;/td&gt;
&lt;td&gt;91ms (&lt;strong&gt;154x&lt;/strong&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p&gt;The top of the ladder. But notice: on n-body, Cython at 10ms, Mojo at 11ms, Rust at 11ms -- they're essentially tied. All compiled to native machine code. The remaining difference is noise, not a fundamental language gap.&lt;/p&gt;
&lt;p&gt;The real Rust advantage isn't raw speed -- it's &lt;strong&gt;pipeline ownership&lt;/strong&gt;. When Rust parses JSON directly with serde into typed structs, it never creates a Python dict. It bypasses the Python object system entirely. That matters more on the next benchmark.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;The Ceiling&lt;/h2&gt;
&lt;p&gt;The Benchmarks Game problems are pure compute: tight loops, no I/O, no data structures beyond arrays. Most Python code looks nothing like that. So I built a third benchmark: load 100K JSON events, filter, transform, aggregate per user. Dicts, strings, datetime parsing -- the kind of code that makes Numba useless and makes Cython fight the Python object system.&lt;/p&gt;
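The shape of that pipeline, sketched in plain Python -- the field names here are illustrative, not the exact benchmark schema, but the dict-per-event pattern is the point:

```python
import json
from collections import defaultdict

def pipeline(lines):
    # load -> filter -> aggregate per user, all through Python dicts:
    # exactly the object traffic the benchmark stresses
    totals = defaultdict(lambda: {"events": 0, "value": 0.0})
    for line in lines:
        event = json.loads(line)           # one Python dict per event
        if event["type"] != "purchase":    # filter
            continue
        user = totals[event["user_id"]]    # aggregate
        user["events"] += 1
        user["value"] += float(event["amount"])
    return dict(totals)

events = [
    '{"type": "purchase", "user_id": "u1", "amount": "9.99"}',
    '{"type": "view", "user_id": "u1", "amount": "0"}',
    '{"type": "purchase", "user_id": "u2", "amount": "3.50"}',
]
result = pipeline(events)
```

Every hot-path operation here -- `json.loads`, string-keyed dict lookups, boxed floats -- goes through the Python object system, which is what caps the speedups below.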
&lt;p&gt;First, every tool starts from pre-parsed Python dicts -- same input, same work:&lt;/p&gt;
&lt;div class="bench-table"&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;th&gt;What it costs you&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CPython 3.14&lt;/td&gt;
&lt;td&gt;48ms&lt;/td&gt;
&lt;td&gt;1.0x&lt;/td&gt;
&lt;td&gt;Nothing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mypyc&lt;/td&gt;
&lt;td&gt;21ms&lt;/td&gt;
&lt;td&gt;2.3x&lt;/td&gt;
&lt;td&gt;Type annotations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cython (dict optimized)&lt;/td&gt;
&lt;td&gt;12ms&lt;/td&gt;
&lt;td&gt;4.1x&lt;/td&gt;
&lt;td&gt;Days of annotation work&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p&gt;4.1x. Not 50x. The bottleneck is &lt;strong&gt;Python dict access&lt;/strong&gt;. Even Cython's fully optimized version -- &lt;code&gt;@cython.cclass&lt;/code&gt;, C arrays for counters, direct CPython C-API calls (&lt;code&gt;PyList_GET_ITEM&lt;/code&gt;, &lt;code&gt;PyDict_GetItem&lt;/code&gt; with borrowed refs) -- still reads input dicts through the Python C API.&lt;/p&gt;
&lt;p&gt;But wait -- why are we feeding Cython Python dicts at all? &lt;code&gt;json.loads()&lt;/code&gt; takes ~57ms to create those dicts. That's more than the entire baseline pipeline. What if Cython reads the raw bytes itself?&lt;/p&gt;
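The parse cost is easy to see with a crude stopwatch. A sketch with a smaller synthetic batch and an illustrative schema (absolute times will differ from the benchmark's 100K events):

```python
import json
import time

# Build a small batch of synthetic event lines (schema is illustrative)
raw = [json.dumps({"type": "purchase", "user_id": f"u{i % 100}", "amount": i * 0.01})
       for i in range(10_000)]

start = time.perf_counter()
events = [json.loads(line) for line in raw]   # every event becomes a Python dict
parse_time = time.perf_counter() - start

start = time.perf_counter()
total = sum(e["amount"] for e in events)      # the downstream "pipeline" work
work_time = time.perf_counter() - start
```

On the benchmark itself, the parse side of this split came out larger than the whole dict-based pipeline -- which is what motivates bypassing `json.loads()` entirely.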
&lt;p&gt;I wrote a second Cython pipeline that calls &lt;a href="https://github.com/ibireme/yyjson" target="_blank"&gt;yyjson&lt;/a&gt; -- a general-purpose C JSON parser, comparable to Rust's serde_json. Both are schema-agnostic: they parse any valid JSON, not just our event format. Cython walks the parsed tree with C pointers, filters and aggregates into C structs, and builds Python dicts only for the final output. For Rust, idiomatic serde with zero-copy deserialization. Both own the data end-to-end:&lt;/p&gt;
&lt;div class="bench-table"&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;th&gt;What it costs you&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CPython 3.14 (json.loads + pipeline)&lt;/td&gt;
&lt;td&gt;105ms&lt;/td&gt;
&lt;td&gt;1.0x&lt;/td&gt;
&lt;td&gt;Nothing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mypyc (json.loads + pipeline)&lt;/td&gt;
&lt;td&gt;77ms&lt;/td&gt;
&lt;td&gt;1.4x&lt;/td&gt;
&lt;td&gt;Type annotations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cython (json.loads + pipeline)&lt;/td&gt;
&lt;td&gt;67ms&lt;/td&gt;
&lt;td&gt;1.6x&lt;/td&gt;
&lt;td&gt;C-API dict access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rust (serde, from bytes)&lt;/td&gt;
&lt;td&gt;21ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;5.0x&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;New language + bindings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cython (yyjson, from bytes)&lt;/td&gt;
&lt;td&gt;17ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;6.3x&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;C library + Cython declarations&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;6.3x for Cython, 5.0x for Rust.&lt;/strong&gt; The ceiling was never the pipeline code -- it was &lt;code&gt;json.loads()&lt;/code&gt;. Both approaches use general-purpose JSON parsers -- yyjson on the Cython side, serde on the Rust side -- and both avoid Python objects entirely in the hot loop: Cython walks yyjson's C tree into C structs, Rust deserializes into native structs via serde.&lt;/p&gt;
&lt;p&gt;I'm not claiming Cython is faster than Rust or vice versa. A sufficiently motivated person could make either one faster -- swap parsers, tune allocators, restructure the pipeline. The point isn't which tool wins this specific benchmark. The point is &lt;em&gt;how many rungs you're willing to climb&lt;/em&gt;. Both land in the same neighborhood once you bypass &lt;code&gt;json.loads()&lt;/code&gt;. The code is at &lt;a href="https://github.com/cemrehancavdar/faster-python-bench" target="_blank"&gt;faster-python-bench&lt;/a&gt;.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;The Full Report Card&lt;/h2&gt;
&lt;h3&gt;N-body (500K iterations, tight floating-point loops)&lt;/h3&gt;
&lt;div class="bench-table"&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;th&gt;What it costs you&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CPython 3.10&lt;/td&gt;
&lt;td&gt;1,663ms&lt;/td&gt;
&lt;td&gt;0.75x&lt;/td&gt;
&lt;td&gt;Old version&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPython 3.14&lt;/td&gt;
&lt;td&gt;1,242ms&lt;/td&gt;
&lt;td&gt;1.0x&lt;/td&gt;
&lt;td&gt;Nothing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPython 3.14t&lt;/td&gt;
&lt;td&gt;1,513ms&lt;/td&gt;
&lt;td&gt;0.82x&lt;/td&gt;
&lt;td&gt;GIL-free but slower single-thread&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mypyc&lt;/td&gt;
&lt;td&gt;518ms&lt;/td&gt;
&lt;td&gt;2.4x&lt;/td&gt;
&lt;td&gt;Type annotations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GraalPy&lt;/td&gt;
&lt;td&gt;211ms&lt;/td&gt;
&lt;td&gt;5.9x&lt;/td&gt;
&lt;td&gt;Python 3.12 only, ecosystem compatibility&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JAX JIT&lt;/td&gt;
&lt;td&gt;100ms&lt;/td&gt;
&lt;td&gt;12.2x&lt;/td&gt;
&lt;td&gt;Rewrite loops as &lt;code&gt;lax.fori_loop&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PyPy&lt;/td&gt;
&lt;td&gt;98ms&lt;/td&gt;
&lt;td&gt;13x&lt;/td&gt;
&lt;td&gt;Ecosystem compatibility&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codon&lt;/td&gt;
&lt;td&gt;47ms&lt;/td&gt;
&lt;td&gt;26x&lt;/td&gt;
&lt;td&gt;Separate runtime, limited stdlib&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Numba&lt;/td&gt;
&lt;td&gt;22ms&lt;/td&gt;
&lt;td&gt;56x&lt;/td&gt;
&lt;td&gt;&lt;code&gt;@njit&lt;/code&gt; + NumPy arrays&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Taichi&lt;/td&gt;
&lt;td&gt;16ms&lt;/td&gt;
&lt;td&gt;78x&lt;/td&gt;
&lt;td&gt;Python 3.13 only (no 3.14 wheels)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mojo&lt;/td&gt;
&lt;td&gt;11ms&lt;/td&gt;
&lt;td&gt;113x&lt;/td&gt;
&lt;td&gt;New language + toolchain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cython&lt;/td&gt;
&lt;td&gt;10ms&lt;/td&gt;
&lt;td&gt;124x&lt;/td&gt;
&lt;td&gt;C knowledge + landmines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rust (PyO3)&lt;/td&gt;
&lt;td&gt;11ms&lt;/td&gt;
&lt;td&gt;113x&lt;/td&gt;
&lt;td&gt;Learning Rust&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;h3&gt;Spectral-norm (N=2000, matrix-vector multiply)&lt;/h3&gt;
&lt;div class="bench-table"&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;th&gt;What it costs you&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CPython 3.10&lt;/td&gt;
&lt;td&gt;16,826ms&lt;/td&gt;
&lt;td&gt;0.83x&lt;/td&gt;
&lt;td&gt;Old version&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPython 3.14&lt;/td&gt;
&lt;td&gt;14,046ms&lt;/td&gt;
&lt;td&gt;1.0x&lt;/td&gt;
&lt;td&gt;Nothing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPython 3.14t&lt;/td&gt;
&lt;td&gt;14,551ms&lt;/td&gt;
&lt;td&gt;0.97x&lt;/td&gt;
&lt;td&gt;GIL-free but slower single-thread&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mypyc&lt;/td&gt;
&lt;td&gt;990ms&lt;/td&gt;
&lt;td&gt;14x&lt;/td&gt;
&lt;td&gt;Type annotations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GraalPy&lt;/td&gt;
&lt;td&gt;212ms&lt;/td&gt;
&lt;td&gt;66x&lt;/td&gt;
&lt;td&gt;Python 3.12 only, ecosystem compatibility&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PyPy&lt;/td&gt;
&lt;td&gt;1,065ms&lt;/td&gt;
&lt;td&gt;13x&lt;/td&gt;
&lt;td&gt;Ecosystem compatibility&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codon&lt;/td&gt;
&lt;td&gt;99ms&lt;/td&gt;
&lt;td&gt;142x&lt;/td&gt;
&lt;td&gt;Separate runtime, limited stdlib&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Numba&lt;/td&gt;
&lt;td&gt;104ms&lt;/td&gt;
&lt;td&gt;135x&lt;/td&gt;
&lt;td&gt;&lt;code&gt;@njit&lt;/code&gt; + NumPy arrays&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mojo&lt;/td&gt;
&lt;td&gt;118ms&lt;/td&gt;
&lt;td&gt;119x&lt;/td&gt;
&lt;td&gt;New language + toolchain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rust (PyO3)&lt;/td&gt;
&lt;td&gt;91ms&lt;/td&gt;
&lt;td&gt;154x&lt;/td&gt;
&lt;td&gt;Learning Rust&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cython&lt;/td&gt;
&lt;td&gt;142ms&lt;/td&gt;
&lt;td&gt;99x&lt;/td&gt;
&lt;td&gt;C knowledge + landmines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Taichi&lt;/td&gt;
&lt;td&gt;71ms&lt;/td&gt;
&lt;td&gt;198x&lt;/td&gt;
&lt;td&gt;Python 3.13 only (no 3.14 wheels)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NumPy&lt;/td&gt;
&lt;td&gt;27ms&lt;/td&gt;
&lt;td&gt;520x&lt;/td&gt;
&lt;td&gt;Knowing NumPy + O(N^2) memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JAX JIT&lt;/td&gt;
&lt;td&gt;8.6ms&lt;/td&gt;
&lt;td&gt;1,633x&lt;/td&gt;
&lt;td&gt;Rewrite loops as &lt;code&gt;lax.fori_loop&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;h3&gt;JSON pipeline (100K events, end-to-end from raw bytes)&lt;/h3&gt;
&lt;div class="bench-table"&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;th&gt;What it costs you&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CPython 3.14 (json.loads + pipeline)&lt;/td&gt;
&lt;td&gt;105ms&lt;/td&gt;
&lt;td&gt;1.0x&lt;/td&gt;
&lt;td&gt;Nothing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mypyc (json.loads + pipeline)&lt;/td&gt;
&lt;td&gt;77ms&lt;/td&gt;
&lt;td&gt;1.4x&lt;/td&gt;
&lt;td&gt;Type annotations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cython (json.loads + pipeline)&lt;/td&gt;
&lt;td&gt;67ms&lt;/td&gt;
&lt;td&gt;1.6x&lt;/td&gt;
&lt;td&gt;C-API dict access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rust (serde, from bytes)&lt;/td&gt;
&lt;td&gt;21ms&lt;/td&gt;
&lt;td&gt;5.0x&lt;/td&gt;
&lt;td&gt;New language + bindings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cython (yyjson, from bytes)&lt;/td&gt;
&lt;td&gt;17ms&lt;/td&gt;
&lt;td&gt;6.3x&lt;/td&gt;
&lt;td&gt;C library + Cython declarations&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;hr /&gt;
&lt;h2&gt;When to Stop Climbing&lt;/h2&gt;
&lt;p&gt;The effort curve is exponential. Mypyc (2.4-14x) costs type annotations. PyPy/GraalPy (6-66x) costs a binary swap. Numba (56-135x) costs a decorator and data restructuring. JAX (12-1,633x) costs rewriting your code functionally. Cython (99-124x) costs days and C knowledge. Rust (113-154x) costs learning a new language.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Upgrade first.&lt;/strong&gt; 3.10 to 3.11 gives you 1.4x for free.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Mypyc for typed codebases.&lt;/strong&gt; If your code already passes mypy strict, compile it. 2.4x on n-body, 14x on spectral-norm, for almost no work.&lt;/p&gt;
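The entry cost really is just annotations. An illustrative function (not the benchmark source) that mypyc can compile as-is with `mypyc module.py`:

```python
def dot(xs: list[float], ys: list[float]) -> float:
    # With these annotations, mypyc can emit native float arithmetic
    # for the loop instead of boxed PyFloat operations.
    total: float = 0.0
    for i in range(len(xs)):
        total += xs[i] * ys[i]
    return total
```

The same file runs unchanged under plain CPython, which is the appeal: the compiled extension is an optimization, not a fork of the code.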
&lt;p&gt;&lt;strong&gt;NumPy for vectorizable math.&lt;/strong&gt; If your problem is matrix algebra or element-wise operations, NumPy gets you 520x with code you already know.&lt;/p&gt;
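As a sketch of what "vectorizable" means here: the Benchmarks Game spectral-norm matrix, A[i,j] = 1/((i+j)(i+j+1)/2 + i + 1), becomes a few broadcast expressions. Shown at a tiny N (the post benchmarks N=2000):

```python
import numpy as np

def spectral_av(v: np.ndarray) -> np.ndarray:
    # Broadcasting builds the full N x N matrix A[i, j] = 1 / ((i+j)(i+j+1)/2 + i + 1),
    # then one matrix-vector product replaces the nested Python loops.
    # That full matrix is the O(N^2) memory cost the table mentions.
    n = len(v)
    i = np.arange(n).reshape(-1, 1)   # column vector of row indices
    j = np.arange(n)                  # row vector of column indices
    a = 1.0 / ((i + j) * (i + j + 1) / 2.0 + i + 1.0)
    return a @ v

result = spectral_av(np.ones(4))
```

In a real implementation you would build `a` once and reuse it across the power-iteration steps rather than rebuilding it per multiply.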
&lt;p&gt;&lt;strong&gt;JAX if you can express it functionally.&lt;/strong&gt; Same array paradigm as NumPy, but XLA whole-graph compilation took spectral-norm to 1,633x -- 3x faster than NumPy. The cost is rewriting loops as &lt;code&gt;lax.fori_loop&lt;/code&gt; and conditionals as &lt;code&gt;lax.cond&lt;/code&gt;. On problems that don't vectorize well (n-body with 5 bodies), JAX is 12x -- good but not exceptional.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Numba for numeric loops.&lt;/strong&gt; &lt;code&gt;@njit&lt;/code&gt; gives you 56-135x with one decorator and honest error messages.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Cython if you know C.&lt;/strong&gt; 99-124x is real, but the failure mode is silent slowness.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Rust for pipeline ownership.&lt;/strong&gt; On pure compute, Cython and Rust are neck and neck. The real advantage is when Rust owns the data flow end-to-end.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PyPy or GraalPy for pure Python.&lt;/strong&gt; 6-66x for zero code changes is remarkable, if your dependencies support it. GraalPy's spectral-norm result (66x) rivals compiled solutions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Most code doesn't need any of this.&lt;/strong&gt; The pipeline benchmark -- the most realistic of the three -- topped out at 4.1x when starting from Python dicts. 6.3x when Cython called yyjson and owned the bytes. If your hot path is &lt;code&gt;dict[str, Any]&lt;/code&gt;, the answer might be &amp;quot;stop creating dicts,&amp;quot; not &amp;quot;change the language.&amp;quot; And if your code is I/O bound, none of this matters at all.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/cemrehancavdar/faster-python-bench/blob/main/docs/profiling.md" target="_blank"&gt;Profile before you optimize.&lt;/a&gt; &lt;code&gt;cProfile&lt;/code&gt; to find the function. &lt;code&gt;line_profiler&lt;/code&gt; to find the line. Then pick the right rung.&lt;/p&gt;
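The first step needs nothing beyond the stdlib. A minimal cProfile sketch (the function names are illustrative):

```python
import cProfile
import io
import pstats

def hot(n: int) -> float:
    # Stand-in for the expensive inner function you're hunting for
    return sum(i * 0.5 for i in range(n))

def main() -> float:
    return sum(hot(10_000) for _ in range(50))

profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# The top entries by cumulative time point at the function worth optimizing
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
```

Once cProfile names the function, `line_profiler` (a third-party package) narrows it to the line.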
&lt;p&gt;&lt;strong&gt;Not covered:&lt;/strong&gt; &lt;a href="https://nuitka.net/" target="_blank"&gt;Nuitka&lt;/a&gt; (Python-to-C compiler, mostly used for packaging -- speedups are in the Mypyc range), &lt;a href="https://pythran.readthedocs.io/" target="_blank"&gt;Pythran&lt;/a&gt; (NumPy-focused AOT compiler, niche), &lt;a href="https://github.com/spylang/spy" target="_blank"&gt;SPy&lt;/a&gt; (Antonio Cuni's static Python dialect -- not ready yet but worth watching), and &lt;a href="https://github.com/facebookincubator/cinderx" target="_blank"&gt;CinderX&lt;/a&gt; (Meta's performance-oriented CPython fork -- not ready yet).&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Found an error? &lt;a href="https://github.com/cemrehancavdar/faster-python-bench/pulls" target="_blank"&gt;Open a PR.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Edits&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;2026-03-10:&lt;/strong&gt; Rewrote the NumPy constraints paragraph. The original listed &lt;em&gt;&amp;quot;irregular access patterns, conditionals per element, recursive structures&amp;quot;&lt;/em&gt; as things NumPy can't handle. Two of those were wrong: NumPy fancy indexing handles irregular access fine (22x faster than Python on random gather), and &lt;code&gt;np.where&lt;/code&gt; handles conditionals (2.8-15.5x faster on 1M elements, even though it computes both branches). Replaced with things NumPy actually can't help with: sequential dependencies (n-body with 5 bodies is 2.3x slower with NumPy), recursive structures, and small arrays (NumPy loses below ~50 elements due to per-call overhead).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2026-03-10:&lt;/strong&gt; The original text said &lt;em&gt;&amp;quot;Early results are modest (single-digit percent improvements)&amp;quot;&lt;/em&gt; -- implying the 3.13 JIT was already delivering gains. Changed to &lt;em&gt;&amp;quot;Early results in 3.13 show no improvement on most benchmarks.&amp;quot;&lt;/em&gt; Bad wording on my part -- 3.13 JIT shows no speedup (and can be slightly slower). The speedups are coming in 3.15: &lt;a href="https://www.linkedin.com/posts/savannahostrowski_pyperformancepyperformancedata-filesbenchmarks-activity-7427027722201186305-ySkY" target="_blank"&gt;Savannah Ostrowski's preliminary FastAPI benchmarks&lt;/a&gt; show ~8% improvement on 3.15 (see also &lt;a href="https://doesjitgobrrr.com/" target="_blank"&gt;doesjitgobrrr.com&lt;/a&gt;). Thanks to &lt;a href="https://github.com/Fidget-Spinner" target="_blank"&gt;Fidget-Spinner&lt;/a&gt; (CPython core developer working on the JIT) for the &lt;a href="https://github.com/cemrehancavdar/blog/pull/4" target="_blank"&gt;correction&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2026-03-11:&lt;/strong&gt; Added JAX JIT benchmarks after &lt;a href="https://www.reddit.com/r/Python/comments/1rpqugj/comment/o9qvpg4/" target="_blank"&gt;a Reddit comment&lt;/a&gt; from justneurostuff suggested testing it. Results: 1,633x on spectral-norm (fastest in the post -- 3x faster than NumPy), 12.2x on n-body. Both match baseline to 9 decimal places. Added as an interlude between NumPy and Numba sections, and to both report card tables.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2026-03-19:&lt;/strong&gt; Updated Mojo n-body result from 16ms (78x) to 11ms (113x) after &lt;a href="https://github.com/cemrehancavdar/faster-python-bench/pull/2" target="_blank"&gt;PR #2&lt;/a&gt; by jgsimard applied SIMD vectorization and compile-time loop unrolling. Mojo now ties Rust/Cython on n-body.&lt;/p&gt;
</description><dc:creator>Cemrehan Çavdar</dc:creator><category>python</category><category>performance</category><category>benchmark</category><category>cython</category><category>rust</category><category>numba</category><category>numpy</category><category>mypyc</category><category>mojo</category><category>codon</category><category>taichi</category><category>graalpy</category><category>pypy</category><category>jax</category></item><item><title>pip install ziglang</title><link>https://cemrehancavdar.com/2026/03/05/zig-cc-cython/</link><guid isPermaLink="true">https://cemrehancavdar.com/2026/03/05/zig-cc-cython/</guid><pubDate>Thu, 05 Mar 2026 20:00:00 +0000</pubDate><description>&lt;p&gt;I built &lt;a href="https://github.com/cemrehancavdar/marimo-cython" target="_blank"&gt;marimo-cython&lt;/a&gt;, Cython inside &lt;a href="https://marimo.io" target="_blank"&gt;marimo&lt;/a&gt; notebooks. A few days later, &lt;a href="https://github.com/koaning" target="_blank"&gt;Vincent Warmerdam&lt;/a&gt; (one of my favorite YouTubers, he runs &lt;a href="https://www.youtube.com/@calmcode-io" target="_blank"&gt;calmcode&lt;/a&gt;) opened &lt;a href="https://github.com/cemrehancavdar/marimo-cython/pull/1" target="_blank"&gt;a PR&lt;/a&gt; to add a &amp;quot;Open in molab&amp;quot; badge. molab is marimo's &lt;a href="https://molab.marimo.io" target="_blank"&gt;cloud notebook platform&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Then he closed his own PR:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Ah wait, nevermind, it seems we don't have gcc on molab containers by default.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Right. Cython compiles Python to C, then you need a C compiler to turn that C into a &lt;code&gt;.so&lt;/code&gt; file. No gcc, no Cython. The whole point of marimo-cython (write Cython in a notebook and run it) doesn't work if the environment can't compile C.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;The idea&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://ziglang.org/" target="_blank"&gt;Zig&lt;/a&gt; is a systems programming language, but the important part for this story is that it ships with a full C/C++ compiler toolchain built on Clang/LLVM. And the &lt;a href="https://pypi.org/project/ziglang/" target="_blank"&gt;&lt;code&gt;ziglang&lt;/code&gt;&lt;/a&gt; PyPI package bundles the entire Zig binary distribution.&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-"&gt;uv add ziglang
uv run python-zig cc --version
# clang version 20.1.2
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;A C compiler. As a Python dependency. Installed with &lt;code&gt;uv add&lt;/code&gt;. Lives in the venv. No system packages, no Xcode, no &lt;code&gt;apt install build-essential&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;So the plan was simple: add &lt;code&gt;ziglang&lt;/code&gt; as a dependency, set &lt;code&gt;CC=&amp;quot;python-zig cc&amp;quot;&lt;/code&gt;, and the molab notebook compiles Cython extensions without gcc. Should take about 20 minutes.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;Fixing the lightbulb&lt;/h2&gt;
&lt;p&gt;There's &lt;a href="https://www.youtube.com/watch?v=5W4NFcamRhM" target="_blank"&gt;a scene in Malcolm in the Middle&lt;/a&gt; where Hal goes to fix a lightbulb, but the shelf is in the way, so he has to fix the shelf, but the screw is stripped, so he needs to get a new one, but the drawer is broken, and so on, each fix revealing the next problem. That's what happened.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Step 1: just use &lt;code&gt;zig cc&lt;/code&gt; directly&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-"&gt;CC=&amp;quot;python-zig cc&amp;quot; uv run python setup.py build_ext --inplace
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Crash. On macOS, Python's build system passes &lt;code&gt;-bundle&lt;/code&gt; to the linker. &lt;code&gt;zig cc&lt;/code&gt; silently ignores it and produces an executable instead of a shared library. It also passes &lt;code&gt;-LModules/_hacl&lt;/code&gt;, a relative path baked into &lt;code&gt;sysconfig&lt;/code&gt; from CPython's own build. Apple's clang ignores the missing directory. &lt;code&gt;zig cc&lt;/code&gt; does not. And &lt;code&gt;-Wl,-headerpad,0x40&lt;/code&gt; crashes the zig 0.15.x linker outright.&lt;/p&gt;
&lt;p&gt;OK, so raw &lt;code&gt;zig cc&lt;/code&gt; doesn't work. Surely someone's solved this.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Step 2: find &lt;code&gt;zigcc&lt;/code&gt; on PyPI&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;There's a package called &lt;a href="https://pypi.org/project/zigcc/" target="_blank"&gt;&lt;code&gt;zigcc&lt;/code&gt;&lt;/a&gt;, a wrapper that filters out problematic flags. Exactly what I need.&lt;/p&gt;
&lt;p&gt;Except it's archived. And it has a bug: it drops any argument &lt;em&gt;containing&lt;/em&gt; the substring &lt;code&gt;-x&lt;/code&gt;. On Linux x86_64, the output path often includes &lt;code&gt;linux-x86_64&lt;/code&gt;, which matches. So &lt;code&gt;zigcc&lt;/code&gt; drops the output file argument and the build silently produces nothing.&lt;/p&gt;
&lt;p&gt;OK, I'll write my own.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Step 3: write a wrapper, fix macOS&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Started with the &lt;code&gt;-bundle&lt;/code&gt; problem: rewrite it to &lt;code&gt;-shared&lt;/code&gt;, same output format Python expects. Build runs further. Now &lt;code&gt;-LModules/_hacl&lt;/code&gt; crashes it, drop it. Now &lt;code&gt;-Wl,-headerpad,0x40&lt;/code&gt; crashes it.&lt;/p&gt;
&lt;p&gt;Is &lt;code&gt;-headerpad&lt;/code&gt; safe to drop? I built the same extension with Apple ld (which honors the flag) and with zig ld (which doesn't). Compared the Mach-O headers with &lt;code&gt;otool -l&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-"&gt;# Apple ld
sizeofcmds 1576

# zig ld
sizeofcmds 1576
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Identical. The flag does nothing in practice for Python extensions. Drop it.&lt;/p&gt;
&lt;p&gt;macOS works.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Step 4: try Linux, try OpenMP&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Linux had its own set of flags (&lt;code&gt;-Wl,--exclude-libs&lt;/code&gt;, &lt;code&gt;-Wl,-Bsymbolic-functions&lt;/code&gt;), none of which zig's linker supports. More drops, more checking whether the drops are safe. They are, for normal extension builds. Linux works too.&lt;/p&gt;
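Put together, the wrapper is essentially an argv filter in front of `zig cc`. A condensed sketch of the idea -- the real package handles more flags and platform quirks than this:

```python
import subprocess
import sys

# Linker flags zig's ld crashes on or doesn't support; checked safe to drop
DROP_PREFIXES = ("-Wl,-headerpad", "-Wl,--exclude-libs", "-Wl,-Bsymbolic-functions")

def rewrite_args(args):
    out = []
    for arg in args:
        if arg == "-bundle":
            # zig cc silently ignores -bundle; -shared produces the .so Python expects
            out.append("-shared")
        elif arg == "-LModules/_hacl":
            # stale relative path baked into sysconfig from CPython's own build
            continue
        elif arg.startswith(DROP_PREFIXES):
            continue
        else:
            out.append(arg)
    return out

def main():
    # Invoked via CC: forward the filtered argv to the bundled zig toolchain
    cmd = [sys.executable, "-m", "ziglang", "cc", *rewrite_args(sys.argv[1:])]
    raise SystemExit(subprocess.call(cmd))
```

`rewrite_args` is the whole trick; `main()` is just what the build system calls when `CC` points at the wrapper.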
&lt;p&gt;I packaged the whole thing as &lt;a href="https://pypi.org/project/zig-cc-python/" target="_blank"&gt;&lt;code&gt;zig-cc-python&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Then I tried OpenMP. Worked on Linux. On macOS, &lt;code&gt;prange&lt;/code&gt; loops silently returned wrong results. zig cc compiled the code without actually emitting the parallel runtime calls. No fix. Moved on.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;The payoff&lt;/h2&gt;
&lt;p&gt;Eight flags. The entire wrapper is ~80 lines of Python. Finding those flags is what took days. It's harder when you're not sure you know what you're doing.&lt;/p&gt;
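For a sense of scale, here's a stripped-down sketch of that rewrite logic: the `-bundle` to `-shared` swap plus the drop list from above. The published `zig-cc-python` handles more cases than this.

```python
# Minimal sketch of a `zig cc` wrapper for Python extension builds.
# Rewrites the flags discussed above, passes everything else through.
# The real zig-cc-python covers more edge cases.
import subprocess

# Linker flags zig rejects but that are safe to drop for extension builds.
DROP_PREFIXES = (
    "-Wl,-headerpad",            # macOS; verified a no-op with otool -l
    "-Wl,--exclude-libs",        # Linux
    "-Wl,-Bsymbolic-functions",  # Linux
)

def rewrite(args: list[str]) -> list[str]:
    out = []
    for arg in args:
        if arg == "-bundle":
            out.append("-shared")  # macOS bundles -> plain shared objects
        elif arg.startswith(DROP_PREFIXES):
            continue               # drop flags zig's linker can't handle
        else:
            out.append(arg)
    return out

def main(argv: list[str]) -> int:
    # Invoke `zig cc` with the rewritten argument list.
    return subprocess.call(["zig", "cc", *rewrite(argv)])
```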
&lt;p&gt;The &lt;a href="https://github.com/cemrehancavdar/marimo-cython/pull/1" target="_blank"&gt;PR&lt;/a&gt; that started this works now. Here's &lt;a href="https://molab.marimo.io/notebooks/nb_4AMe9xM8Pxp5sLnxHk6mwo" target="_blank"&gt;a marimo notebook compiling Cython on molab&lt;/a&gt;, no gcc, no system compiler, just Python packages.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;What I learned&lt;/h2&gt;
&lt;p&gt;A C compiler is a huge piece of machinery. Swapping one in isn't like swapping a JSON library. The build system, the platform linker, and decades of accumulated flags all assume a specific compiler. I got basic Cython extensions working, but that's a narrow slice. I couldn't get OpenMP to work on macOS. I haven't tested C++ heavily, or cross-compilation, or extensions that link against system libraries in unusual ways. There are probably flags I haven't hit yet.&lt;/p&gt;
&lt;p&gt;The wrapper is &lt;a href="https://pypi.org/project/zig-cc-python/" target="_blank"&gt;on PyPI&lt;/a&gt; and &lt;a href="https://github.com/cemrehancavdar/zig-cc-python" target="_blank"&gt;on GitHub&lt;/a&gt;. Some findings and a thin wrapper to save the next person from the same debugging. If you hit something it doesn't handle, &lt;a href="https://github.com/cemrehancavdar/zig-cc-python/issues" target="_blank"&gt;open an issue&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Oh, and while I was deep in linker flags and &lt;code&gt;otool&lt;/code&gt; output, Vincent asked the marimo engineers to just &lt;a href="https://github.com/cemrehancavdar/marimo-cython/pull/1#issuecomment-4003847281" target="_blank"&gt;add gcc to the molab containers&lt;/a&gt;. The original notebook already works now.&lt;/p&gt;
&lt;p&gt;I know &lt;a href="https://youtu.be/5W4NFcamRhM?si=WnGo_XEfuLJ_bWAU&amp;t=38" target="_blank"&gt;what this looks like&lt;/a&gt;.&lt;/p&gt;
</description><dc:creator>Cemrehan Çavdar</dc:creator><category>python</category><category>cython</category><category>zig</category><category>c</category><category>compiler</category><category>build</category></item><item><title>Your Framework Doesn't Matter</title><link>https://cemrehancavdar.com/2026/02/19/your-framework-may-not-matter/</link><guid isPermaLink="true">https://cemrehancavdar.com/2026/02/19/your-framework-may-not-matter/</guid><pubDate>Thu, 19 Feb 2026 20:00:00 +0000</pubDate><description>&lt;p&gt;Last week I &lt;a href="/2026/02/10/framework-benchmark/"&gt;benchmarked four web frameworks&lt;/a&gt; and found that BlackSheep is 2x faster than FastAPI. A Rust-based server and JSON serializer pushed Python within striking distance of Go. Impressive numbers.&lt;/p&gt;
&lt;p&gt;But I kept thinking: does any of this matter? Those benchmarks measured localhost throughput with no database and no network. That's not what users experience. A real API request crosses the internet, hits a framework, queries a database through an ORM, serializes the result, and travels back. How much of that time is actually the framework?&lt;/p&gt;
&lt;p&gt;So I built a real app, deployed it, and measured every phase.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;The App&lt;/h2&gt;
&lt;p&gt;A book catalog API. FastAPI + SQLAlchemy 2.0 (async) + asyncpg + Uvicorn. The standard Python stack that a developer following the FastAPI docs would use. No exotic dependencies, no optimization tricks.&lt;/p&gt;
&lt;p&gt;Three tables: &lt;strong&gt;Publisher&lt;/strong&gt; -&amp;gt; &lt;strong&gt;Author&lt;/strong&gt; -&amp;gt; &lt;strong&gt;Book&lt;/strong&gt;. Seeded with 4,215 real books from the Open Library API: Agatha Christie, Dostoevsky, Penguin Books, real data with real-world cardinality.&lt;/p&gt;
&lt;p&gt;Deployed to &lt;a href="https://fly.io" target="_blank"&gt;Fly.io&lt;/a&gt; on a shared-cpu-1x machine with 512MB RAM and Postgres 17, both in Amsterdam. The cheapest setup you'd use for a side project.&lt;/p&gt;
&lt;p&gt;Four endpoints:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;GET /api/health&lt;/code&gt;&lt;/strong&gt; returns &lt;code&gt;{&amp;quot;status&amp;quot;: &amp;quot;ok&amp;quot;}&lt;/code&gt;. No database, no ORM, no serialization. Pure framework overhead.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;GET /api/books/{id}&lt;/code&gt;&lt;/strong&gt; single book with author details. 4 SQL queries via &lt;code&gt;selectinload&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;GET /api/books?page=1&amp;amp;per_page=100&lt;/code&gt;&lt;/strong&gt; 100 books with full details. 5 queries, &lt;code&gt;selectinload&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;GET /api/books/n-plus-one?page=1&amp;amp;per_page=100&lt;/code&gt;&lt;/strong&gt; same data as #3, but with the classic N+1 bug. &lt;strong&gt;302 queries&lt;/strong&gt; (2 + 100 x 3 individual SELECTs).&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Endpoint #4 is the &amp;quot;what not to do&amp;quot; scenario. Same response, same data, but instead of letting SQLAlchemy batch the loads, each book triggers separate queries for its author, publisher, and sibling books.&lt;/p&gt;
&lt;h2&gt;How I Measured&lt;/h2&gt;
&lt;p&gt;Every response carries timing headers measured with &lt;code&gt;time.perf_counter()&lt;/code&gt;. The database layer uses SQLAlchemy's &lt;code&gt;before_cursor_execute&lt;/code&gt; / &lt;code&gt;after_cursor_execute&lt;/code&gt; events to split ORM overhead from raw driver time. A &lt;code&gt;contextvars.ContextVar&lt;/code&gt; stores per-request timings so nothing leaks between concurrent requests.&lt;/p&gt;
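A sketch of that instrumentation, with SQLite standing in for Postgres (names are illustrative, and the real hooks also subtract driver time from ORM time):

```python
# Sketch of the per-request DB timing described above; SQLite stands in
# for Postgres and names are illustrative.
import time
from contextvars import ContextVar

from sqlalchemy import create_engine, event, text

# A ContextVar keeps the accumulator per-request under concurrency.
db_time_ms: ContextVar[float] = ContextVar("db_time_ms", default=0.0)

def install_db_timing(engine) -> None:
    @event.listens_for(engine, "before_cursor_execute")
    def _before(conn, cursor, statement, params, context, executemany):
        context._ts = time.perf_counter()

    @event.listens_for(engine, "after_cursor_execute")
    def _after(conn, cursor, statement, params, context, executemany):
        elapsed = (time.perf_counter() - context._ts) * 1000
        db_time_ms.set(db_time_ms.get() + elapsed)

engine = create_engine("sqlite://")
install_db_timing(engine)
with engine.connect() as conn:
    conn.execute(text("select 1"))
print(f"driver time this request: {db_time_ms.get():.3f}ms")
```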
&lt;p&gt;The client measures total round-trip time. Network = client total - server total.&lt;/p&gt;
&lt;p&gt;I ran 200 requests per endpoint from Turkey to Amsterdam (~57ms baseline RTT), with 30 warmup requests discarded. All numbers below are medians.&lt;/p&gt;
&lt;h2&gt;Where Does Server Time Go?&lt;/h2&gt;
&lt;p&gt;Let's start with what happens inside the server. No network, just the work Python does.&lt;/p&gt;
&lt;style&gt;
.lc-chart { margin: 24px 0; }
.lc-row { margin: 12px 0; }
.lc-label {
  font-size: 0.85rem;
  color: var(--text-dim);
  margin-bottom: 4px;
}
.lc-bar {
  display: flex;
  height: 28px;
  border-radius: 3px;
  background: var(--border);
}
.lc-bar span {
  display: flex;
  align-items: center;
  justify-content: center;
  font-size: 0.7rem;
  font-weight: 600;
  white-space: nowrap;
  color: #fff;
  min-width: 0;
  position: relative;
  cursor: default;
}
.lc-bar span:first-child { border-radius: 3px 0 0 3px; }
.lc-bar span:last-child { border-radius: 0 3px 3px 0; }
.lc-bar span.lc-narrow { font-size: 0; }
.lc-bar span::after {
  content: attr(data-tip);
  position: absolute;
  bottom: calc(100% + 6px);
  left: 50%;
  transform: translateX(-50%);
  background: var(--text);
  color: var(--bg);
  padding: 3px 8px;
  border-radius: 3px;
  font-size: 0.7rem;
  font-weight: 500;
  white-space: nowrap;
  pointer-events: none;
  opacity: 0;
  transition: opacity 0.15s;
  z-index: 10;
}
.lc-bar span:hover::after {
  opacity: 1;
}
.lc-legend {
  display: flex;
  flex-wrap: wrap;
  gap: 12px;
  margin: 16px 0 8px;
  font-size: 0.8rem;
  color: var(--text-dim);
}
.lc-legend-item {
  display: flex;
  align-items: center;
  gap: 4px;
}
.lc-legend-dot {
  width: 10px;
  height: 10px;
  border-radius: 2px;
  flex-shrink: 0;
}
.lc-meta {
  font-size: 0.75rem;
  color: var(--text-dim);
  margin-top: 2px;
}
.lc-network { background: #94b0cc; }
.lc-db { background: #b8a09b; }
.lc-orm { background: #c5bb9e; color: #555 !important; }
.lc-serialize { background: #a3bca8; color: #555 !important; }
.lc-framework { background: #c26356; }
.lc-encode { background: #b5bfb0; color: #555 !important; }
.lc-bar span { transition: width 0.3s ease; }
.lc-fw-callout {
  margin: 14px 0 0;
  padding: 6px 12px;
  font-size: 0.85rem;
  color: #c26356;
  border-left: 3px solid #c26356;
}

.lc-presets {
  display: flex;
  flex-wrap: wrap;
  gap: 6px;
  margin: 16px 0 12px;
}
.lc-presets button {
  font-family: inherit;
  font-size: 0.8rem;
  padding: 4px 12px;
  border: 1px solid var(--border);
  border-radius: 3px;
  background: var(--bg);
  color: var(--text-dim);
  cursor: pointer;
  transition: border-color 0.15s, color 0.15s;
}
.lc-presets button:hover {
  color: var(--text);
  border-color: var(--text-dim);
}
.lc-presets button.active {
  color: var(--text);
  border-color: var(--text);
  font-weight: 600;
}
&lt;/style&gt;
&lt;div class="lc-chart"&gt;
&lt;div class="lc-legend"&gt;
  &lt;span class="lc-legend-item"&gt;&lt;span class="lc-legend-dot lc-db"&gt;&lt;/span&gt;DB Driver&lt;/span&gt;
  &lt;span class="lc-legend-item"&gt;&lt;span class="lc-legend-dot lc-orm"&gt;&lt;/span&gt;ORM&lt;/span&gt;
  &lt;span class="lc-legend-item"&gt;&lt;span class="lc-legend-dot lc-serialize"&gt;&lt;/span&gt;Serialize&lt;/span&gt;
  &lt;span class="lc-legend-item"&gt;&lt;span class="lc-legend-dot lc-encode"&gt;&lt;/span&gt;Encode&lt;/span&gt;
  &lt;span class="lc-legend-item"&gt;&lt;span class="lc-legend-dot lc-framework"&gt;&lt;/span&gt;Framework&lt;/span&gt;
&lt;/div&gt;
&lt;div class="lc-row"&gt;
  &lt;div class="lc-label"&gt;Health check (no DB, no work) &lt;span class="lc-meta"&gt;0.3ms server, 0 queries&lt;/span&gt;&lt;/div&gt;
  &lt;div class="lc-bar"&gt;
    &lt;span class="lc-db lc-narrow" style="width:0%" data-tip="DB Driver: 0ms (0%)"&gt;&lt;/span&gt;
    &lt;span class="lc-orm lc-narrow" style="width:0%" data-tip="ORM: 0ms (0%)"&gt;&lt;/span&gt;
    &lt;span class="lc-serialize lc-narrow" style="width:5.6%" data-tip="Serialize: 0.02ms (6%)"&gt;&lt;/span&gt;
    &lt;span class="lc-encode" style="width:11.9%" data-tip="Encode: 12%"&gt;0.04ms&lt;/span&gt;
    &lt;span class="lc-framework" style="width:81.8%" data-tip="Framework: 82%"&gt;0.25ms&lt;/span&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="lc-row"&gt;
  &lt;div class="lc-label"&gt;Single book + author &lt;span class="lc-meta"&gt;11.5ms server, 4 queries&lt;/span&gt;&lt;/div&gt;
  &lt;div class="lc-bar"&gt;
    &lt;span class="lc-db" style="width:52.1%" data-tip="DB Driver: 52%"&gt;6.0ms&lt;/span&gt;
    &lt;span class="lc-orm" style="width:39.7%" data-tip="ORM: 40%"&gt;4.6ms&lt;/span&gt;
    &lt;span class="lc-serialize lc-narrow" style="width:1.3%" data-tip="Serialize: 0.1ms (1%)"&gt;&lt;/span&gt;
    &lt;span class="lc-encode lc-narrow" style="width:0.6%" data-tip="Encode: 0.1ms (1%)"&gt;&lt;/span&gt;
    &lt;span class="lc-framework lc-narrow" style="width:4.3%" data-tip="Framework: 0.5ms (4%)"&gt;&lt;/span&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="lc-row"&gt;
  &lt;div class="lc-label"&gt;100 books, optimized &lt;span class="lc-meta"&gt;30.2ms server, 5 queries&lt;/span&gt;&lt;/div&gt;
  &lt;div class="lc-bar"&gt;
    &lt;span class="lc-db" style="width:35.1%" data-tip="DB Driver: 35%"&gt;10.6ms&lt;/span&gt;
    &lt;span class="lc-orm" style="width:46.9%" data-tip="ORM: 47%"&gt;14.2ms&lt;/span&gt;
    &lt;span class="lc-serialize" style="width:10.1%" data-tip="Serialize: 10%"&gt;3.1ms&lt;/span&gt;
    &lt;span class="lc-encode lc-narrow" style="width:2.4%" data-tip="Encode: 0.7ms (2%)"&gt;&lt;/span&gt;
    &lt;span class="lc-framework lc-narrow" style="width:2.5%" data-tip="Framework: 0.7ms (3%)"&gt;&lt;/span&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="lc-row"&gt;
  &lt;div class="lc-label"&gt;100 books, N+1 queries &lt;span class="lc-meta"&gt;491.9ms server, 302 queries&lt;/span&gt;&lt;/div&gt;
  &lt;div class="lc-bar"&gt;
    &lt;span class="lc-db" style="width:67.2%" data-tip="DB Driver: 67%"&gt;330.4ms&lt;/span&gt;
    &lt;span class="lc-orm" style="width:30.9%" data-tip="ORM: 31%"&gt;152.1ms&lt;/span&gt;
    &lt;span class="lc-serialize lc-narrow" style="width:0.5%" data-tip="Serialize: 2.6ms (1%)"&gt;&lt;/span&gt;
    &lt;span class="lc-encode lc-narrow" style="width:0.1%" data-tip="Encode: 0.7ms (0%)"&gt;&lt;/span&gt;
    &lt;span class="lc-framework lc-narrow" style="width:0.3%" data-tip="Framework: 1.3ms (0%)"&gt;&lt;/span&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Hover over any segment for percentages. Bars don't sum to exactly 100%. A small residual (1-3%) falls between the timed sections.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The health check tells the story immediately. When there's no database, the framework &lt;em&gt;is&lt;/em&gt; the server, 82% of 0.3ms. But the moment you add real work, it disappears. For the optimized 100-book query, the DB driver and ORM together account for 82% of server time. Serialization is 10%. The framework (FastAPI's routing, middleware, dependency injection) is 2-3%. For the single book endpoint, it's 4%.&lt;/p&gt;
&lt;p&gt;The N+1 scenario is brutal. Same data, same response, but 302 queries instead of 5. Server time goes from 30ms to 492ms, a &lt;strong&gt;16x increase&lt;/strong&gt;, because each of those 302 queries pays a round-trip to Postgres and an ORM hydration cost.&lt;/p&gt;
&lt;p&gt;But this is still only the server's perspective. What does the user actually experience?&lt;/p&gt;
&lt;h2&gt;Now Zoom Out&lt;/h2&gt;
&lt;p&gt;Same four endpoints, but now we include what happens before and after the server: DNS, TCP, TLS, request travel, response travel, all lumped together as &amp;quot;Network.&amp;quot;&lt;/p&gt;
&lt;p&gt;Pick a distance to see how it changes the picture:&lt;/p&gt;
&lt;div class="lc-chart" id="act2-chart"&gt;
&lt;div class="lc-presets"&gt;
  &lt;button data-rtt="5"&gt;Same building&lt;/button&gt;
  &lt;button data-rtt="15"&gt;Same city&lt;/button&gt;
  &lt;button data-rtt="40"&gt;Across Europe&lt;/button&gt;
  &lt;button data-rtt="57" class="active"&gt;Ankara → Amsterdam&lt;/button&gt;
  &lt;button data-rtt="150"&gt;Other continent&lt;/button&gt;
&lt;/div&gt;
&lt;div class="lc-legend"&gt;
  &lt;span class="lc-legend-item"&gt;&lt;span class="lc-legend-dot lc-network"&gt;&lt;/span&gt;Network&lt;/span&gt;
  &lt;span class="lc-legend-item"&gt;&lt;span class="lc-legend-dot lc-db"&gt;&lt;/span&gt;DB Driver&lt;/span&gt;
  &lt;span class="lc-legend-item"&gt;&lt;span class="lc-legend-dot lc-orm"&gt;&lt;/span&gt;ORM&lt;/span&gt;
  &lt;span class="lc-legend-item"&gt;&lt;span class="lc-legend-dot lc-serialize"&gt;&lt;/span&gt;Serialize&lt;/span&gt;
  &lt;span class="lc-legend-item"&gt;&lt;span class="lc-legend-dot lc-encode"&gt;&lt;/span&gt;Encode&lt;/span&gt;
  &lt;span class="lc-legend-item"&gt;&lt;span class="lc-legend-dot lc-framework"&gt;&lt;/span&gt;Framework&lt;/span&gt;
&lt;/div&gt;
&lt;div class="lc-row"&gt;
  &lt;div class="lc-label"&gt;Health check (no DB, no work) &lt;span class="lc-meta" id="a2-s0-meta"&gt;69.6ms total&lt;/span&gt;&lt;/div&gt;
  &lt;div class="lc-bar"&gt;
    &lt;span class="lc-network" id="a2-s0-net" style="width:99.4%" data-tip="Network: 99%"&gt;69.2ms&lt;/span&gt;
    &lt;span class="lc-db lc-narrow" id="a2-s0-db" style="width:0%" data-tip="DB Driver: 0ms (0%)"&gt;&lt;/span&gt;
    &lt;span class="lc-orm lc-narrow" id="a2-s0-orm" style="width:0%" data-tip="ORM: 0ms (0%)"&gt;&lt;/span&gt;
    &lt;span class="lc-serialize lc-narrow" id="a2-s0-ser" style="width:0%" data-tip="Serialize: 0ms (0%)"&gt;&lt;/span&gt;
    &lt;span class="lc-encode lc-narrow" id="a2-s0-enc" style="width:0.1%" data-tip="Encode: 0ms (0.1%)"&gt;&lt;/span&gt;
    &lt;span class="lc-framework lc-narrow" id="a2-s0-fw" style="width:0.4%" data-tip="Framework: 0.2ms (0.4%)"&gt;&lt;/span&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="lc-row"&gt;
  &lt;div class="lc-label"&gt;Single book + author &lt;span class="lc-meta" id="a2-s1-meta"&gt;68.8ms total&lt;/span&gt;&lt;/div&gt;
  &lt;div class="lc-bar"&gt;
    &lt;span class="lc-network" id="a2-s1-net" style="width:82.6%" data-tip="Network: 83%"&gt;56.9ms&lt;/span&gt;
    &lt;span class="lc-db" id="a2-s1-db" style="width:8.7%" data-tip="DB Driver: 9%"&gt;6.0ms&lt;/span&gt;
    &lt;span class="lc-orm" id="a2-s1-orm" style="width:6.6%" data-tip="ORM: 7%"&gt;4.6ms&lt;/span&gt;
    &lt;span class="lc-serialize lc-narrow" id="a2-s1-ser" style="width:0.2%" data-tip="Serialize: 0.1ms (0.2%)"&gt;&lt;/span&gt;
    &lt;span class="lc-encode lc-narrow" id="a2-s1-enc" style="width:0.1%" data-tip="Encode: 0.1ms (0.1%)"&gt;&lt;/span&gt;
    &lt;span class="lc-framework lc-narrow" id="a2-s1-fw" style="width:0.7%" data-tip="Framework: 0.5ms (0.7%)"&gt;&lt;/span&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="lc-row"&gt;
  &lt;div class="lc-label"&gt;100 books, optimized &lt;span class="lc-meta" id="a2-s2-meta"&gt;97.0ms total&lt;/span&gt;&lt;/div&gt;
  &lt;div class="lc-bar"&gt;
    &lt;span class="lc-network" id="a2-s2-net" style="width:68.6%" data-tip="Network: 69%"&gt;66.6ms&lt;/span&gt;
    &lt;span class="lc-db" id="a2-s2-db" style="width:10.9%" data-tip="DB Driver: 11%"&gt;10.6ms&lt;/span&gt;
    &lt;span class="lc-orm" id="a2-s2-orm" style="width:14.6%" data-tip="ORM: 15%"&gt;14.2ms&lt;/span&gt;
    &lt;span class="lc-serialize" id="a2-s2-ser" style="width:3.1%" data-tip="Serialize: 3%"&gt;3.1ms&lt;/span&gt;
    &lt;span class="lc-encode lc-narrow" id="a2-s2-enc" style="width:0.7%" data-tip="Encode: 0.7ms (0.7%)"&gt;&lt;/span&gt;
    &lt;span class="lc-framework lc-narrow" id="a2-s2-fw" style="width:0.8%" data-tip="Framework: 0.7ms (0.8%)"&gt;&lt;/span&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="lc-row"&gt;
  &lt;div class="lc-label"&gt;100 books, N+1 queries &lt;span class="lc-meta" id="a2-s3-meta"&gt;613.2ms total&lt;/span&gt;&lt;/div&gt;
  &lt;div class="lc-bar"&gt;
    &lt;span class="lc-network" id="a2-s3-net" style="width:13.4%" data-tip="Network: 13%"&gt;82.2ms&lt;/span&gt;
    &lt;span class="lc-db" id="a2-s3-db" style="width:53.9%" data-tip="DB Driver: 54%"&gt;330.4ms&lt;/span&gt;
    &lt;span class="lc-orm" id="a2-s3-orm" style="width:24.8%" data-tip="ORM: 25%"&gt;152.1ms&lt;/span&gt;
    &lt;span class="lc-serialize lc-narrow" id="a2-s3-ser" style="width:0.4%" data-tip="Serialize: 2.6ms (0.4%)"&gt;&lt;/span&gt;
    &lt;span class="lc-encode lc-narrow" id="a2-s3-enc" style="width:0.1%" data-tip="Encode: 0.7ms (0.1%)"&gt;&lt;/span&gt;
    &lt;span class="lc-framework lc-narrow" id="a2-s3-fw" style="width:0.2%" data-tip="Framework: 1.3ms (0.2%)"&gt;&lt;/span&gt;
  &lt;/div&gt;
&lt;/div&gt;
&lt;div class="lc-fw-callout" id="a2-fw-callout"&gt;Framework is &lt;strong id="a2-fw-pct"&gt;0.2–0.9%&lt;/strong&gt; of total response time&lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;em&gt;Hover over any segment for percentages. Server timings are constant, only network changes.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;There it is. The health check, where the framework has nothing to do except route and respond, is &lt;strong&gt;99% network&lt;/strong&gt;. The server finishes in 0.3ms. The user waits 70ms.&lt;/p&gt;
&lt;p&gt;For a single book lookup, &lt;strong&gt;83% of what the user waits for is the network&lt;/strong&gt;. The entire server (framework, ORM, database, serialization, JSON encoding) is the remaining 17%. The framework specifically is 0.7%.&lt;/p&gt;
&lt;p&gt;For 100 books with proper queries, network is 69%. The server does more work (30ms vs 12ms), but the user still spends most of their time waiting for packets to cross the internet.&lt;/p&gt;
&lt;p&gt;These numbers default to my setup. I live in Ankara, Turkey, and my closest Fly.io region is Amsterdam. Try the presets above to see how distance changes the picture. Even in the best case (same building, 5ms) network is still 30% of a single book lookup. And most SaaS products aren't running multi-region deployments with edge nodes. They have one server in one region.&lt;/p&gt;
&lt;p&gt;The N+1 scenario flips everything. Network drops to 13%, not because the network got faster, but because the server got so slow (492ms) that it dwarfs the network time. This is the only scenario where server-side code meaningfully impacts user experience. And the cause isn't the framework, it's 302 queries instead of 5.&lt;/p&gt;
&lt;h2&gt;Framework Overhead Across All Scenarios&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Total&lt;/th&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Framework %&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Health check (no DB)&lt;/td&gt;
&lt;td&gt;69.6ms&lt;/td&gt;
&lt;td&gt;0.2ms&lt;/td&gt;
&lt;td&gt;0.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Single book&lt;/td&gt;
&lt;td&gt;68.8ms&lt;/td&gt;
&lt;td&gt;0.5ms&lt;/td&gt;
&lt;td&gt;0.7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100 books (optimized)&lt;/td&gt;
&lt;td&gt;97.0ms&lt;/td&gt;
&lt;td&gt;0.7ms&lt;/td&gt;
&lt;td&gt;0.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100 books (N+1)&lt;/td&gt;
&lt;td&gt;613.2ms&lt;/td&gt;
&lt;td&gt;1.3ms&lt;/td&gt;
&lt;td&gt;0.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The health check is the best case for the framework: no database, no ORM, no serialization. The server does almost nothing. And still, framework overhead is 0.2ms out of a 70ms request. FastAPI's routing, middleware, dependency injection, and ASGI handling cost 0.2-1.3ms across all scenarios. That's the thing benchmarks compare when they say &amp;quot;FastAPI vs BlackSheep&amp;quot; or &amp;quot;Python vs Go.&amp;quot; The thing that accounts for less than 1% of what users experience.&lt;/p&gt;
&lt;p&gt;In my &lt;a href="/2026/02/10/framework-benchmark/"&gt;previous benchmark&lt;/a&gt;, BlackSheep was 2x faster than FastAPI. That 2x difference applies to 0.7% of the total response time. Switching frameworks would save roughly 0.25ms on a 69ms request.&lt;/p&gt;
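The arithmetic behind that estimate, using the single-book numbers from the table:

```python
# Back-of-the-envelope for the framework-swap claim: a 2x faster
# framework halves the framework slice of the single-book request.
total_ms = 68.8      # total round trip (table above)
framework_ms = 0.5   # framework share of that request

saved_ms = framework_ms / 2          # 2x faster -> half the cost
speedup_pct = 100 * saved_ms / total_ms
print(f"saved {saved_ms:.2f}ms of {total_ms}ms ({speedup_pct:.2f}%)")
```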
&lt;h2&gt;Putting Traffic in Perspective&lt;/h2&gt;
&lt;p&gt;Let's say your API gets 1 million requests per day. That sounds like a lot. It averages out to about 12 requests per second.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Daily Requests&lt;/th&gt;
&lt;th&gt;Avg req/s&lt;/th&gt;
&lt;th&gt;Peak req/s (3x avg)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;100,000&lt;/td&gt;
&lt;td&gt;1.2&lt;/td&gt;
&lt;td&gt;3.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1,000,000&lt;/td&gt;
&lt;td&gt;11.6&lt;/td&gt;
&lt;td&gt;35&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10,000,000&lt;/td&gt;
&lt;td&gt;115.7&lt;/td&gt;
&lt;td&gt;347&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
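The table is just division by the 86,400 seconds in a day, plus an assumed 3x peak factor:

```python
# Reproduce the table above: requests/day -> average req/s and an
# assumed 3x-average peak.
SECONDS_PER_DAY = 86_400

for daily in (100_000, 1_000_000, 10_000_000):
    avg = daily / SECONDS_PER_DAY
    print(f"{daily:>10,}/day -> {avg:5.1f} avg req/s, {3 * avg:5.1f} peak")
```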
&lt;p&gt;Levels.fyi, a site with 1-2 million monthly uniques and over $1M ARR, runs one of its most trafficked services on &lt;a href="https://www.levels.fyi/blog/scaling-to-millions-with-google-sheets.html" target="_blank"&gt;a single Node.js instance serving 60K requests per hour&lt;/a&gt;. That's 17 req/s. FastAPI handles 46,000 req/s on a single worker in my benchmarks. You have roughly 2,700x headroom.&lt;/p&gt;
&lt;p&gt;In 2016, Stack Overflow served &lt;a href="https://nickcraver.com/blog/2016/02/17/stack-overflow-the-architecture-2016-edition/" target="_blank"&gt;209 million HTTP requests per day&lt;/a&gt; (about 2,400 req/s average) on 9 web servers. Nick Craver said they'd unintentionally tested running on a single server, and it worked.&lt;/p&gt;
&lt;p&gt;Framework throughput differences don't matter when your actual traffic is three orders of magnitude below capacity.&lt;/p&gt;
&lt;h2&gt;What I Didn't Measure&lt;/h2&gt;
&lt;p&gt;This is a sequential measurement from a single client, no concurrent load. Under concurrency, connection pooling, async scheduling, and GIL contention could change the server-side breakdown. The &amp;quot;Framework&amp;quot; bucket lumps together Uvicorn, Starlette, and FastAPI. I didn't separate them. &amp;quot;Network&amp;quot; lumps DNS, TLS, TCP, and raw packet travel. Response sizes are pre-compression (the real responses would be smaller over gzip).&lt;/p&gt;
&lt;p&gt;At scale, a faster framework means fewer servers, that's real cost savings. But &amp;quot;at scale&amp;quot; means hundreds of thousands of requests per second, not millions per day. And long before you get there, you'll have optimized your queries, added caching, moved to handwritten SQL, and maybe even forked your runtime. &lt;a href="https://github.com/facebookincubator/cinder" target="_blank"&gt;Facebook built their own Python&lt;/a&gt; before they worried about framework overhead.&lt;/p&gt;
&lt;p&gt;All measurements: 200 samples each, medians, from Turkey to Amsterdam. The raw data is in the repository.&lt;/p&gt;
&lt;h2&gt;What I Learned&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Deploy closer to your users.&lt;/strong&gt; For well-written queries, 69-83% of response time is packets crossing the internet. No framework optimization changes this. If your server is in Amsterdam and your users are in Ankara, they're waiting 57ms before your code even runs. Move the server, or put a cache at the edge.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fix your queries, not your framework.&lt;/strong&gt; The N+1 bug turned a 97ms response into a 613ms one, 6.3x slower, and framework overhead was still only 0.2%. Switching from FastAPI to BlackSheep would save 0.25ms. Fixing the N+1 bug saves 516ms. Profile your queries. Add &lt;code&gt;selectinload&lt;/code&gt;. Use &lt;code&gt;EXPLAIN ANALYZE&lt;/code&gt;. That's where the seconds are.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pick your framework for everything except speed.&lt;/strong&gt; Framework benchmarks compare the one component that doesn't matter (0.2-0.8% of total time) under conditions that don't exist (localhost, no database, no network). Pick for developer experience, documentation, ecosystem, and hiring. The framework that lets you ship faster is the fast framework.&lt;/p&gt;
&lt;p&gt;If you want to see what actually makes a website fast in practice, Wes Bos has &lt;a href="https://www.youtube.com/watch?v=-Ln-8QM8KhQ" target="_blank"&gt;a great breakdown&lt;/a&gt;. Hint: it's not the framework.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;Benchmarking is hard. I'm sure I got something wrong, missed an important variable, or made an assumption that doesn't hold. All the code, measurement scripts, and raw timing data are in the &lt;a href="https://github.com/cemrehancavdar/api-lifecycle" target="_blank"&gt;repository&lt;/a&gt;. Please try to break it. If you find a flaw in the methodology, a timing error, or a scenario that would change the conclusions, I genuinely want to hear about it.&lt;/p&gt;
&lt;script&gt;
(function () {
  var scenarios = [
    { key: "s0", server: 0.303, db: 0, orm: 0, ser: 0.017, enc: 0.036, fw: 0.248 },
    { key: "s1", server: 11.523, db: 6.001, orm: 4.575, ser: 0.145, enc: 0.065, fw: 0.500 },
    { key: "s2", server: 30.231, db: 10.599, orm: 14.174, ser: 3.053, enc: 0.720, fw: 0.742 },
    { key: "s3", server: 491.918, db: 330.353, orm: 152.146, ser: 2.587, enc: 0.690, fw: 1.296 }
  ];

  var parts = ["net", "db", "orm", "ser", "enc", "fw"];
  var partLabels = ["Network", "DB Driver", "ORM", "Serialize", "Encode", "Framework"];
  var narrowThreshold = 4;

  function fmt(v) { return v &lt; 10 ? v.toFixed(1) : Math.round(v).toString(); }

  function updateChart(rtt) {
    var fwPcts = [];
    scenarios.forEach(function (s) {
      var total = rtt + s.server;
      var values = { net: rtt, db: s.db, orm: s.orm, ser: s.ser, enc: s.enc, fw: s.fw };

      parts.forEach(function (p, i) {
        var el = document.getElementById("a2-" + s.key + "-" + p);
        if (!el) return;
        var ms = values[p];
        var pct = (ms / total) * 100;
        el.style.width = pct + "%";

        if (pct &gt;= narrowThreshold) {
          el.textContent = fmt(ms) + "ms";
          el.classList.remove("lc-narrow");
        } else {
          el.textContent = "";
          el.classList.add("lc-narrow");
        }

        if (pct &gt;= narrowThreshold) {
          el.setAttribute("data-tip", partLabels[i] + ": " + Math.round(pct) + "%");
        } else {
          el.setAttribute("data-tip", partLabels[i] + ": " + fmt(ms) + "ms (" + pct.toFixed(1) + "%)");
        }
      });

      fwPcts.push((s.fw / total) * 100);

      var meta = document.getElementById("a2-" + s.key + "-meta");
      if (meta) meta.textContent = fmt(total) + "ms total";
    });

    var fwMin = Math.min.apply(null, fwPcts);
    var fwMax = Math.max.apply(null, fwPcts);
    var callout = document.getElementById("a2-fw-pct");
    if (callout) callout.textContent = fwMin.toFixed(1) + "–" + fwMax.toFixed(1) + "%";
  }

  var buttons = document.querySelectorAll(".lc-presets button");
  buttons.forEach(function (btn) {
    btn.addEventListener("click", function () {
      buttons.forEach(function (b) { b.classList.remove("active"); });
      btn.classList.add("active");
      updateChart(parseFloat(btn.getAttribute("data-rtt")));
    });
  });

  updateChart(57);
})();
&lt;/script&gt;</description><dc:creator>Cemrehan Çavdar</dc:creator><category>python</category><category>benchmark</category><category>web</category><category>performance</category></item><item><title>Benchmarking Gin, Elysia, BlackSheep, and FastAPI</title><link>https://cemrehancavdar.com/2026/02/10/framework-benchmark/</link><guid isPermaLink="true">https://cemrehancavdar.com/2026/02/10/framework-benchmark/</guid><pubDate>Tue, 10 Feb 2026 20:00:00 +0000</pubDate><description>&lt;p&gt;I always felt like JavaScript and Go are the alternative languages for Python. I wouldn't compare Python to Rust or Zig. So when I keep seeing Gin vs Elysia benchmarks, I wanted to throw Python into the mix. FastAPI says it's fast right in the name. Let's find out.&lt;/p&gt;
&lt;p&gt;Four frameworks, three languages, same Docker constraints, same endpoints.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://github.com/cemrehancavdar/framework-benchmark"&gt;Source code on GitHub&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;TLDR:&lt;/strong&gt; Python's ecosystem (Granian, orjson) makes it fast enough to beat Bun's Elysia on validation and routing. Gin (Go) still wins overall. BlackSheep &amp;gt; FastAPI by 2x. Full numbers and code below.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2&gt;The Setup&lt;/h2&gt;
&lt;p&gt;Every framework runs in a Docker container with identical constraints:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Server&lt;/strong&gt;: 2 CPUs, 512MB RAM, 2 workers&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Client&lt;/strong&gt;: &lt;a href="https://github.com/wg/wrk"&gt;wrk&lt;/a&gt; with 2 threads, 128 connections, 10 seconds per endpoint&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Machine&lt;/strong&gt;: Apple M4 Pro&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Four endpoints&lt;/strong&gt;: plaintext, JSON, URL params, POST validation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &amp;quot;2 workers&amp;quot; part is important. Go uses &lt;code&gt;GOMAXPROCS=2&lt;/code&gt;, Python uses 2 uvicorn workers, and Bun uses cluster mode with 2 processes. Everyone gets two CPU cores and two parallel execution contexts.&lt;/p&gt;
&lt;p&gt;Each endpoint does progressively more work:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;/plaintext&lt;/code&gt; returns &lt;code&gt;&amp;quot;Hello, World!&amp;quot;&lt;/code&gt; (raw I/O)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;/json&lt;/code&gt; returns &lt;code&gt;{&amp;quot;message&amp;quot;: &amp;quot;Hello, World!&amp;quot;}&lt;/code&gt; (serialization)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;/user/42&lt;/code&gt; parses a URL param and returns &lt;code&gt;{&amp;quot;id&amp;quot;: &amp;quot;42&amp;quot;, &amp;quot;name&amp;quot;: &amp;quot;User 42&amp;quot;}&lt;/code&gt; (routing)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;POST /validate&lt;/code&gt; parses a JSON body, validates fields, and returns the result (real-world work)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;The Contenders&lt;/h2&gt;
&lt;h3&gt;FastAPI (Python)&lt;/h3&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;fastapi&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;fastapi.responses&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PlainTextResponse&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;pydantic&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Field&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;HELLO&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Hello, World!&amp;quot;&lt;/span&gt;


&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;UserInput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;min_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;age&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Field&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ge&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;le&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;150&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="nd"&gt;@app&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;/plaintext&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;plaintext&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;PlainTextResponse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;PlainTextResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HELLO&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="nd"&gt;@app&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;/json&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;json_endpoint&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;message&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;HELLO&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;


&lt;span class="nd"&gt;@app&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;/user/&lt;/span&gt;&lt;span class="si"&gt;{user_id}&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;get_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;id&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;User &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;


&lt;span class="nd"&gt;@app&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;/validate&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;validate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;UserInput&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;age&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;age&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;valid&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The crowd favorite. Pydantic models give you validation, serialization, and OpenAPI docs in one shot. It's the most productive framework here, but that productivity has a cost at runtime, which we'll see in the numbers.&lt;/p&gt;
&lt;h3&gt;Gin (Go)&lt;/h3&gt;
&lt;pre&gt;&lt;code class="language-go"&gt;&lt;span class="kn"&gt;package&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;fmt&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;net/http&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;os&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;runtime&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;strconv&amp;quot;&lt;/span&gt;

&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;github.com/gin-gonic/gin&amp;quot;&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;hello&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;Hello, World!&amp;quot;&lt;/span&gt;

&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ValidateInput&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;struct&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;Name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;`json:&amp;quot;name&amp;quot; binding:&amp;quot;required,min=1&amp;quot;`&lt;/span&gt;
&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;Age&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s"&gt;`json:&amp;quot;age&amp;quot; binding:&amp;quot;required,gte=0,lte=150&amp;quot;`&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;workers&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;
&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;strconv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Atoi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;WORKERS&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;workers&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt;
&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;GOMAXPROCS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;workers&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;gin&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;SetMode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;gin&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ReleaseMode&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;gin&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;New&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;GET&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;/plaintext&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;gin&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;StatusOK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;hello&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;GET&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;/json&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;gin&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;StatusOK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;gin&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;H&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;message&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;hello&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;GET&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;/user/:id&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;gin&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Param&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;id&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;StatusOK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;gin&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;H&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;id&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;User %s&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;POST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;/validate&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kd"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nx"&gt;gin&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;ValidateInput&lt;/span&gt;
&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;:=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ShouldBindJSON&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;!=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;StatusBadRequest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;gin&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;H&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;error&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;()})&lt;/span&gt;
&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;
&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;http&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;StatusOK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;gin&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;H&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;age&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Age&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;			&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;valid&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;		&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="w"&gt;	&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;:3000&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Gin is terse. Struct tags handle validation. The &lt;code&gt;gin.H{}&lt;/code&gt; shorthand for map literals keeps handlers compact. Go's goroutine scheduler makes concurrency almost invisible. You just set &lt;code&gt;GOMAXPROCS&lt;/code&gt; and everything scales.&lt;/p&gt;
&lt;h3&gt;Elysia (Bun)&lt;/h3&gt;
&lt;pre&gt;&lt;code class="language-typescript"&gt;&lt;span class="k"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Elysia&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kr"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;elysia&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;HELLO&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Hello, World!&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;PORT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;3000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="ow"&gt;new&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Elysia&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;/plaintext&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;HELLO&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;/json&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;HELLO&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;/user/:id&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;params.id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="sb"&gt;`User &lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;/validate&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;body.name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nx"&gt;age&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;body.age&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nx"&gt;valid&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;t.Object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;t.String&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;minLength&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nx"&gt;age&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;t.Integer&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;minimum&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;maximum&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kt"&gt;150&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;PORT&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;The most elegant of the bunch. Elysia's API is beautifully minimal. Return an object and it becomes JSON. The TypeBox schema validation (&lt;code&gt;t.Object&lt;/code&gt;, &lt;code&gt;t.String&lt;/code&gt;) is declarative and type-safe. Bun's runtime makes it fast.&lt;/p&gt;
&lt;h3&gt;BlackSheep (Python)&lt;/h3&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;blacksheep&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Application&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;blacksheep.server.responses&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;json_resp&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;blacksheep.server.responses&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;text_resp&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Application&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;HELLO&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Hello, World!&amp;quot;&lt;/span&gt;


&lt;span class="nd"&gt;@app&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;router&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;/plaintext&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;plaintext&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;text_resp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HELLO&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="nd"&gt;@app&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;router&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;/json&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;json_endpoint&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json_resp&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;message&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;HELLO&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;


&lt;span class="nd"&gt;@app&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;router&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;/user/&lt;/span&gt;&lt;span class="si"&gt;{user_id}&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;get_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json_resp&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;id&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;User &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;


&lt;span class="nd"&gt;@app&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;router&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;/validate&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;validate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;age&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;age&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nb"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json_resp&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;error&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;name must be a non-empty string&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nb"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;age&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;age&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;age&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;150&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json_resp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;error&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;age must be an integer between 0 and 150&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json_resp&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;age&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;age&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;valid&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Most Python developers haven't heard of BlackSheep. No magic, no heavy abstractions, manual validation. You'll see why it's here in a moment.&lt;/p&gt;
&lt;h2&gt;Results&lt;/h2&gt;
&lt;p&gt;All numbers are requests per second; higher is better. Each framework ran with 2 workers on 2 CPUs.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Plaintext&lt;/th&gt;
&lt;th&gt;JSON&lt;/th&gt;
&lt;th&gt;Params&lt;/th&gt;
&lt;th&gt;Validate (POST)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gin&lt;/strong&gt; (Go)&lt;/td&gt;
&lt;td&gt;299,632&lt;/td&gt;
&lt;td&gt;288,408&lt;/td&gt;
&lt;td&gt;266,471&lt;/td&gt;
&lt;td&gt;195,275&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Elysia&lt;/strong&gt; (Bun)&lt;/td&gt;
&lt;td&gt;246,068&lt;/td&gt;
&lt;td&gt;219,089&lt;/td&gt;
&lt;td&gt;185,984&lt;/td&gt;
&lt;td&gt;102,488&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;BlackSheep&lt;/strong&gt; (Python)&lt;/td&gt;
&lt;td&gt;152,005&lt;/td&gt;
&lt;td&gt;129,958&lt;/td&gt;
&lt;td&gt;128,939&lt;/td&gt;
&lt;td&gt;98,829&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;FastAPI&lt;/strong&gt; (Python)&lt;/td&gt;
&lt;td&gt;79,785&lt;/td&gt;
&lt;td&gt;66,114&lt;/td&gt;
&lt;td&gt;51,560&lt;/td&gt;
&lt;td&gt;45,963&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;A few things jump out.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Gin wins everything&lt;/strong&gt;, which isn't surprising. Go's goroutine scheduler and compiled performance are hard to beat. But it's not a blowout against Elysia on plaintext (300k vs 246k, only 22% ahead).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Elysia drops hard under load.&lt;/strong&gt; From plaintext (246k) to validate (102k), it loses 58% of its throughput. Bun is fast at raw I/O, but TypeBox validation in JavaScript is expensive relative to the baseline.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;BlackSheep is shockingly fast for Python.&lt;/strong&gt; 152k req/s on plaintext, and it holds up under load, dropping only 35% on validate (99k). That puts it nearly level with Elysia (102k vs 99k): a Python framework running at 96% of Bun's speed on a real workload.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;FastAPI is about half of BlackSheep&lt;/strong&gt; across the board; the Pydantic validation layer and middleware stack roughly double the per-request cost. Still, 46k req/s on validate is respectable.&lt;/p&gt;
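&lt;p&gt;For contrast: the FastAPI app's source isn't shown here, but its &lt;code&gt;/validate&lt;/code&gt; handler presumably replaces the manual &lt;code&gt;isinstance&lt;/code&gt; checks with a Pydantic model along these lines (a sketch; the field names and bounds are assumed from the endpoint above):&lt;/p&gt;

```python
# Hypothetical Pydantic model mirroring the /validate rules above:
# a non-empty name and an integer age between 0 and 150.
from pydantic import BaseModel, Field


class Person(BaseModel):
    name: str = Field(min_length=1)
    age: int = Field(ge=0, le=150)
```

&lt;p&gt;FastAPI parses the request body into this model and generates the error response automatically; that convenience is part of what the 2x gap pays for.&lt;/p&gt;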
&lt;h2&gt;So Can BlackSheep Get Even Faster?&lt;/h2&gt;
&lt;p&gt;Two swaps. &lt;a href="https://github.com/emmett-framework/granian"&gt;Granian&lt;/a&gt; instead of uvicorn, a Rust-based ASGI server. &lt;a href="https://github.com/ijl/orjson"&gt;orjson&lt;/a&gt; instead of stdlib &lt;code&gt;json&lt;/code&gt;, a Rust-based JSON serializer. Same application code, different plumbing:&lt;/p&gt;
&lt;pre&gt;&lt;code class="language-python"&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;orjson&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;blacksheep&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Application&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Response&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;blacksheep.server.responses&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;text_resp&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Application&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;show_error_details&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;HELLO&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Hello, World!&amp;quot;&lt;/span&gt;
&lt;span class="n"&gt;CT_JSON&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;b&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;application/json&amp;quot;&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;json_bytes_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="sd"&gt;&amp;quot;&amp;quot;&amp;quot;Build a Response from orjson-serialized bytes.&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CT_JSON&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;orjson&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;


&lt;span class="nd"&gt;@app&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;router&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;/plaintext&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;plaintext&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;text_resp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HELLO&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="nd"&gt;@app&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;router&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;/json&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;json_endpoint&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json_bytes_response&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;message&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;HELLO&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;


&lt;span class="nd"&gt;@app&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;router&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;/user/&lt;/span&gt;&lt;span class="si"&gt;{user_id}&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;get_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json_bytes_response&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;id&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;User &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;


&lt;span class="nd"&gt;@app&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;router&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;/validate&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nf"&gt;validate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;orjson&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;age&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;age&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nb"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json_bytes_response&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;error&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;name must be a non-empty string&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nb"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;age&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;age&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;age&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;150&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json_bytes_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;error&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;age must be an integer between 0 and 150&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json_bytes_response&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;age&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;age&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;valid&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Plaintext&lt;/th&gt;
&lt;th&gt;JSON&lt;/th&gt;
&lt;th&gt;Params&lt;/th&gt;
&lt;th&gt;Validate (POST)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gin&lt;/strong&gt; (Go)&lt;/td&gt;
&lt;td&gt;299,632&lt;/td&gt;
&lt;td&gt;288,408&lt;/td&gt;
&lt;td&gt;266,471&lt;/td&gt;
&lt;td&gt;195,275&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Elysia&lt;/strong&gt; (Bun)&lt;/td&gt;
&lt;td&gt;246,068&lt;/td&gt;
&lt;td&gt;219,089&lt;/td&gt;
&lt;td&gt;185,984&lt;/td&gt;
&lt;td&gt;102,488&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;BlackSheep+Granian+orjson&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;204,575&lt;/td&gt;
&lt;td&gt;202,394&lt;/td&gt;
&lt;td&gt;189,881&lt;/td&gt;
&lt;td&gt;119,527&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;BlackSheep&lt;/strong&gt; (uvicorn)&lt;/td&gt;
&lt;td&gt;152,005&lt;/td&gt;
&lt;td&gt;129,958&lt;/td&gt;
&lt;td&gt;128,939&lt;/td&gt;
&lt;td&gt;98,829&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;FastAPI&lt;/strong&gt; (Python)&lt;/td&gt;
&lt;td&gt;79,785&lt;/td&gt;
&lt;td&gt;66,114&lt;/td&gt;
&lt;td&gt;51,560&lt;/td&gt;
&lt;td&gt;45,963&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Yes, it can. BlackSheep+Granian+orjson beats Elysia on validate (120k vs 102k) and params (190k vs 186k). The JSON endpoint improved 56% over the uvicorn baseline. That's what swapping &lt;code&gt;json.dumps()&lt;/code&gt; for a Rust serializer does.&lt;/p&gt;
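&lt;p&gt;The serializer gap is easy to measure in isolation. A minimal sketch (the payload is made up, and orjson has to be installed separately):&lt;/p&gt;

```python
import json
import timeit

# Hypothetical payload, similar in shape to the /user responses above.
payload = {"id": "42", "name": "User 42", "valid": True}

N = 100_000
stdlib_s = timeit.timeit(lambda: json.dumps(payload), number=N)
print(f"json.dumps:   {stdlib_s:.3f}s for {N} calls")

try:
    import orjson  # pip install orjson
    orjson_s = timeit.timeit(lambda: orjson.dumps(payload), number=N)
    print(f"orjson.dumps: {orjson_s:.3f}s for {N} calls")
except ImportError:
    print("orjson not installed -- run `pip install orjson` to compare")
```

&lt;p&gt;Note that &lt;code&gt;orjson.dumps()&lt;/code&gt; returns &lt;code&gt;bytes&lt;/code&gt;, not &lt;code&gt;str&lt;/code&gt;, which is why the &lt;code&gt;json_bytes_response&lt;/code&gt; helper above can hand the result straight to &lt;code&gt;Content&lt;/code&gt; with no encode step.&lt;/p&gt;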
&lt;h2&gt;What I Learned&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Python may be slower, but the ecosystem makes it fast.&lt;/strong&gt; The language itself isn't winning any speed contests. But Granian (Rust HTTP server), orjson (Rust JSON), and uvloop (Cython event loop) let Python compete with Bun and get within striking distance of Go. The ecosystem does the heavy lifting.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The server matters as much as the framework.&lt;/strong&gt; Swapping uvicorn for Granian gave a 35% boost on plaintext without changing application code. HTTP parsing and connection management aren't free.&lt;/p&gt;
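&lt;p&gt;For reference, that swap is a launch-command change, not an application change (flags mirror the 2-worker setup above; uvloop is opted into explicitly here):&lt;/p&gt;

```shell
# Baseline: uvicorn, with the uvloop event loop
uvicorn app:app --workers 2 --port 8000 --loop uvloop

# Swap: Granian's Rust HTTP server, speaking ASGI to the same app
granian --interface asgi --workers 2 --port 8000 app:app
```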
&lt;p&gt;&lt;strong&gt;Validation is the great equalizer.&lt;/strong&gt; Raw I/O benchmarks favor compiled languages. But once every framework has to parse JSON, validate fields, and return structured errors, the gaps shrink. BlackSheep on uvicorn holds steady at 51% of Gin on both plaintext and validate. Elysia goes from 82% to 52%.&lt;/p&gt;
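&lt;p&gt;Those ratios can be recomputed straight from the two results tables (uvicorn numbers for BlackSheep):&lt;/p&gt;

```python
# Requests/sec copied from the results tables above.
gin = {"plaintext": 299_632, "validate": 195_275}
others = {
    "Elysia": {"plaintext": 246_068, "validate": 102_488},
    "BlackSheep": {"plaintext": 152_005, "validate": 98_829},
}

for name, rps in others.items():
    for workload in ("plaintext", "validate"):
        share = rps[workload] / gin[workload]
        print(f"{name:10s} {workload}: {share:.0%} of Gin")
```

&lt;p&gt;Elysia's share of Gin drops 30 points once validation enters the picture; BlackSheep's barely moves.&lt;/p&gt;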
&lt;hr /&gt;
&lt;p&gt;All the code, Dockerfiles, and raw results are in the &lt;a href="https://github.com/cemrehancavdar/framework-benchmark"&gt;repository&lt;/a&gt;. Benchmarking is hard. If you spot something unfair or wrong, please tell me.&lt;/p&gt;
</description><dc:creator>Cemrehan Çavdar</dc:creator><category>python</category><category>go</category><category>javascript</category><category>benchmark</category><category>web</category></item></channel></rss>