A Programmer’s Introduction to Unicode – Nathan Reed’s coding blog
march 2017 by jm
Fascinating Unicode details, many of which were new to me. Love the heat map of code point usage in Wikipedia and Twitter text:
unicode
coding
character-sets
wikipedia
bmp
emoji
twitter
languages
characters
heat-maps
dataviz
One more interesting way to visualize the codespace is to look at the distribution of usage—in other words, how often each code point is actually used in real-world texts. Below is a heat map of planes 0–2 based on a large sample of text from Wikipedia and Twitter (all languages). Frequency increases from black (never seen) through red and yellow to white.
You can see that the vast majority of this text sample lies in the BMP, with only scattered usage of code points from planes 1–2. The biggest exception is emoji, which show up here as the several bright squares in the bottom row of plane 1.
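A rough sketch of the counting behind a per-plane usage view like the one described above, assuming only that a code point's plane is its value divided by 0x10000 (shifted right by 16 bits). The helper names `plane_of` and `plane_histogram` and the sample string are made up for illustration; this is not the code used to produce the original heat map.

```python
from collections import Counter


def plane_of(ch: str) -> int:
    """Return the Unicode plane (0-16) of a single character."""
    # Each plane spans 0x10000 code points, so the plane index
    # is the code point shifted right by 16 bits.
    return ord(ch) >> 16


def plane_histogram(text: str) -> Counter:
    """Count how many code points of `text` fall in each plane."""
    return Counter(plane_of(ch) for ch in text)


# Hypothetical sample: ASCII, two CJK ideographs (BMP), one emoji (plane 1).
sample = "Hello, \u4e16\u754c! \U0001F600"
hist = plane_histogram(sample)

for plane in sorted(hist):
    print(f"plane {plane}: {hist[plane]} code point(s)")
```

For a real heat map one would count individual code points rather than whole planes (for example `Counter(ord(ch) for ch in text)`) over a much larger corpus, then map the counts to colors.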