Branchless UTF-8 Decoding

Charles Eckman:

In a Recurse chat, Nathan Goldbaum asked:

I know how to decode UTF-8 using bitmath and some LUTs (see https://github.com/skeeto/branchless-utf8), but if I want to go from a codepoint to UTF-8, is there a way to do it without branches?

To start with, is there a way to write this C function, which returns the number of bytes needed to store the UTF-8 bytes for the codepoint without any branches? Or would I need a huge look-up-table?

A neat problem to explore.

No if statements, loops, or other conditionals. So, branchless, right?

…well, no. If we peek at the (optimized) code in Compiler Explorer, we can see the x86_64 assembly has two different kinds of branches.

I tinkered with this initial implementation a little. If you compile it with target-cpu=x86-64-v3 (circa 2013 CPUs), the test and je from the leading_zeros call are eliminated, leaving just the bounds-check ja.
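As a sketch of the length computation in question (my own take, not the code from the linked repo): derive the codepoint's bit length from leading_zeros, then add up three comparisons. Compilers typically lower these comparisons to setcc/add instructions rather than jumps, which is what makes the approach branchless in practice.

```rust
// Branchless sketch: bytes needed to encode a codepoint as UTF-8.
// Assumes a valid Unicode scalar value (<= 0x10FFFF).
fn utf8_len(cp: u32) -> u32 {
    // Bit length of cp; the `| 1` makes cp = 0 yield 1 instead of 0.
    let bits = 32 - (cp | 1).leading_zeros();
    // 1 byte holds up to 7 bits, 2 bytes up to 11, 3 up to 16, 4 up to 21.
    1 + (bits > 7) as u32 + (bits > 11) as u32 + (bits > 16) as u32
}

fn main() {
    assert_eq!(utf8_len('A' as u32), 1); // U+0041, 7 bits
    assert_eq!(utf8_len('é' as u32), 2); // U+00E9, 8 bits
    assert_eq!(utf8_len('€' as u32), 3); // U+20AC, 14 bits
    assert_eq!(utf8_len('🦀' as u32), 4); // U+1F980, 17 bits
    println!("ok");
}
```

Whether the emitted assembly is truly branch-free still depends on the target and optimisation level, as the Compiler Explorer experiment above shows.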

Using the Rust Type System to Prevent Bugs in the Fuchsia Network Stack

Daroc Alden at LWN covering Joshua Liebow-Feeser at RustConf:

Netstack3 encompasses 63 crates and 60 developer-years of code. It contains more code than the top ten crates on crates.io combined. Over the last year, the code has finally become ready to deploy. But deploying it to production requires care — networking code can be hard to test, and the developers have to assume it has bugs. For the past eleven months, they have been running the new networking stack on 60 devices, full time. In that time, Liebow-Feeser said, most code would have been expected to show “mountains of bugs”. Netstack3 had only three; he attributed that low number to the team’s approach of encoding as many important invariants in the type system as possible.

A remarkable result. Great to see signs that Fuchsia is still alive too.
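The invariant-encoding approach described here is broadly the typestate pattern. A minimal sketch of the idea (the names and API are hypothetical, not Netstack3's actual code): make invalid state transitions unrepresentable, so "receive before bind" is a compile error rather than a runtime bug.

```rust
use std::marker::PhantomData;

// Hypothetical example: a socket's lifecycle encoded in the type system.
// Socket<Unbound> has no recv method, so misuse cannot compile.
struct Unbound;
struct Bound;

struct Socket<State> {
    port: u16,
    _state: PhantomData<State>,
}

impl Socket<Unbound> {
    fn new() -> Self {
        Socket { port: 0, _state: PhantomData }
    }
    // Binding consumes the unbound socket and returns a bound one,
    // so the old handle can't be used again.
    fn bind(self, port: u16) -> Socket<Bound> {
        Socket { port, _state: PhantomData }
    }
}

impl Socket<Bound> {
    // Only reachable on a bound socket; the invariant holds at compile time.
    fn recv(&self) -> &'static str {
        "packet"
    }
}

fn main() {
    let sock = Socket::new().bind(8080);
    println!("{} on port {}", sock.recv(), sock.port);
    // Socket::new().recv(); // would not compile: no recv on Socket<Unbound>
}
```

Zero-sized state markers mean this costs nothing at runtime; the checking happens entirely in the compiler.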

Debugging an arm64 Segfault in the PostgreSQL JIT Compiler

Anthonin Bonnefoy, Mitch Ward, and Gillian McGarvey on the Datadog blog:

Ultimately, by successfully isolating and debugging the crashes down to the assembly level, we were able to identify an issue in Postgres JIT compilation, and more specifically a bug in LLVM for Arm64 architectures. This post describes our investigation into the root cause of these crashes, which took us on a deep dive into JIT compilation and led us to an upstream fix resolution.

A crashing database must be a pretty stressful scenario at the scale Datadog operates. Great detective work getting to the bottom of the issue. It's also interesting to learn that they're running the database on Arm servers.

With JIT disabled, the query ran without triggering a segfault. As an immediate mitigation, we disabled JIT for the entire cluster, which stopped the crashes completely and without any noticeable impact on query latencies. It was time to relax and grab a cup of coffee.

If turning the JIT off had no noticeable impact on query latencies, you have to wonder whether it's worth the trouble of using it at all.
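For reference, the mitigation they describe maps to PostgreSQL's `jit` setting (available since PostgreSQL 11). A sketch of disabling it, which is what I'd reach for in the same situation, though the post doesn't show their exact commands:

```sql
-- Disable JIT for the whole cluster (requires superuser):
ALTER SYSTEM SET jit = off;
SELECT pg_reload_conf();

-- Or just for the current session, when testing a suspect query:
SET jit = off;
```

Because `jit` only affects plan execution, it can be toggled without a restart, which makes it a low-risk first mitigation.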

NixOS Paper Cuts

Jono Finger:

This is my perspective on using Nix (the OS, the package manager, and the language) as a main driver for the past 2 years. I have gone to conferences, engaged the community, donated, submitted bug reports, converted my home servers, and probably spent hundreds of hours in Nix configs. I consider myself well versed, but certainly no expert.

TLDR: In its current state (2025), I don’t generally recommend desktop use of Nix(OS), even for seasoned Linux users.

That’s a pretty strong summary, but I’m not sure I’d take this post as general advice. It’s detailed, and documents Jono’s experience, but with any niche system like NixOS you’re going to run into paper cuts. The specific paper cuts will vary by person based on what they do with their computer, as will their threshold for tolerating them.

To be clear, I love Nix and have learned a lot from it. I am not giving up on it, but it's time for me to take a break and scale back my all-in attitude.

In this case it seems Jono reached their paper cut threshold, which is totally reasonable. Some people will push through because they want the benefits despite the friction; others will drop off earlier. If you're thinking of trying NixOS, this post is worth a read, but I wouldn't let it stop you.

On the Utility of JSON Feed

JSON Feed is a specification by Manton Reece and Brent Simmons for a syndication format like RSS and Atom that uses JSON instead of XML. Robb Knight posed the question:

Ignoring podcasts, which has to be RSS, is there an argument for using RSS/atom over JSON feeds?

Ruben Schade responded with a thoughtful post citing several theories as to why RSS/Atom feeds remain the default:

  • JSON Feed suffers the XKCD Standards Effect. Every blog has RSS support, even those that implement JSON Feed. I’ve yet to encounter the reverse. If JSON feed were offering anything more than an alternative serialisation format, maybe it’d be more compelling. But RSS does the job, and adding another standard didn’t offer much.
  • JSON Feed was mostly a redundant format. As I wrote in 2017, if the issue with RSS was XML, you could directly serialise RSS with JSON. JSON Feed was entirely new, which requires you to reimplement everything to do… what you can already do with an existing tool. Meh.
  • JSON Feed solved a problem for developers, not users. People who write material that end up in feeds, and the people who read those feeds, couldn’t give a toss what the format is written in. Those of us in the industry forget this at our peril, every single time.

These are great points and it seems unlikely that JSON Feed will displace RSS and Atom as the default syndication formats in use. However, I want to discuss that last point a little more.

Ruben is right that the typical visitor who wants to subscribe to your website doesn't care what the format is, as long as it works in their feed reader. I'd argue, though, that developers who want to make use of content in a machine-readable form are users too, and for that purpose JSON is far easier to handle than XML. Here are a couple of examples of that in practice:

  • The unreleased tool that posts each new post on this site to Mastodon uses the JSON Feed as the source of posts.
  • This tutorial used the JSON Feed from Read Rust (an earlier project of mine) to build a CLI tool for the site.

It’s also easy to consume JSON on the command line with curl and jq/jaq. For example, this command lists all the posts in the Linked List JSON Feed that are tagged with ‘retro-computing’:

$ curl -s https://linkedlist.org/feed.json | \
    jaq '.items | map(select(.tags | contains(["retro-computing"]))) | .[].title'
"More Than You Ever Wanted to Know About the Windows 3.1 Graphics Stack"

All that is to say I think JSON Feed has utility even if it doesn’t replace XML-based feeds. I think of JSON Feed as an additional representation of my website, not a replacement for RSS and Atom feeds. As a result I offer both, so visitors can use whichever format makes sense for their use-case.