Linked List: Branchless UTF-8 Decoding

Charles Eckman:

In a Recurse chat, Nathan Goldbaum asked:

I know how to decode UTF-8 using bitmath and some LUTs (see https://github.com/skeeto/branchless-utf8), but if I want to to go from a codepoint to UTF-8, is there a way to do it without branches?

To start with, is there a way to write this C function, which returns the number of bytes needed to store the UTF-8 bytes for the codepoint without any branches? Or would I need a huge look-up-table?

A neat problem to explore.

No if statements, loops, or other conditionals. So, branchless, right?

…well, no. If we peek at the (optimized) code in Compiler Explorer, we can see the x86_64 assembly has two different kinds of branches.

I tinkered with this initial implementation a little. If you compile it with target-cpu=x86-64-v3 (circa 2013 CPUs) the test and je from leading_zeros call are eliminated, leaving just the bounds check ja.