jm + characters   6

Turla’s watering hole campaign: An updated Firefox extension abusing Instagram
Pretty crazy.
The extension will look at each photo’s comment and will compute a custom hash value. If the hash matches 183, it will then run this regular expression on the comment in order to obtain the path of the bit.ly URL:
(?:\\u200d(?:#|@)(\\w)

Looking at the photo’s comments, there was only one for which the hash matches 183. This comment was posted on February 6, while the original photo was posted in early January. Taking the comment and running it through the regex, you get the following bit.ly URL: bit.ly/2kdhuHX

Looking a bit more closely at the regular expression, we see it is looking for either @|# or the Unicode character \u200d. This character is actually a non-printable character called ‘Zero Width Joiner’, normally used to separate emojis. Pasting the actual comment or looking at its source, you can see that this character precedes each character that makes the path of the bit.ly URL.
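A rough sketch of the extraction step described in the excerpt, in Python. The pattern as printed above has unbalanced parentheses, so this uses an assumed reconstruction, `(?:\u200d|#|@)(\w)`, which matches the prose description ("either @|# or \u200d"). The function name and the sample comment are made up for illustration; this is not the extension's actual code, and the custom hash check is left out because the excerpt doesn't describe it.

```python
import re

# Assumed reconstruction of the pattern; the excerpt's version is garbled.
# It captures the single word character that follows a ZWJ, '#', or '@'.
MARKER_RE = re.compile(r"(?:\u200d|#|@)(\w)")


def extract_path(comment: str) -> str:
    """Join every captured character into the hidden short-URL path."""
    return "".join(MARKER_RE.findall(comment))


# Hypothetical comment (not the real one): ZWJs are invisible when rendered.
sample_comment = "nice #2 sh\u200dot, l\u200dike it @Zed"
hidden_path = extract_path(sample_comment)
print("bit.ly/" + hidden_path)
```

When rendered, the sample comment looks like ordinary text with a hashtag and a mention, which is presumably why the trick works as a covert channel.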
security  malware  russia  turla  zwj  unicode  characters  social-media  instagram  command-and-control 
june 2017 by jm
A Programmer’s Introduction to Unicode – Nathan Reed’s coding blog
Fascinating Unicode details -- a lot of which were new to me. Love the heat map of usage in Wikipedia:
One more interesting way to visualize the codespace is to look at the distribution of usage—in other words, how often each code point is actually used in real-world texts. Below is a heat map of planes 0–2 based on a large sample of text from Wikipedia and Twitter (all languages). Frequency increases from black (never seen) through red and yellow to white.

You can see that the vast majority of this text sample lies in the BMP, with only scattered usage of code points from planes 1–2. The biggest exception is emoji, which show up here as the several bright squares in the bottom row of plane 1.
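For a feel of what "planes" means here: each plane is a block of 65,536 code points, so the plane number is just the code point shifted right by 16 bits. A minimal sketch in Python (the helper names and the sample string are mine, not from the article):

```python
from collections import Counter


def plane_of(ch: str) -> int:
    """Return the Unicode plane (0 = BMP, 1 = SMP, 2 = SIP, ...) of one character."""
    return ord(ch) >> 16


def plane_histogram(text: str) -> Counter:
    """Count how many characters of `text` fall in each plane."""
    return Counter(plane_of(ch) for ch in text)


# Mostly BMP text with one emoji (plane 1) and one rare CJK ideograph (plane 2).
sample = "héllo wörld \U0001F4A9 \U00020000"
print(plane_histogram(sample))
```

Run over a big enough corpus, a histogram like this is the one-dimensional version of the heat map: nearly everything lands in plane 0, with emoji accounting for most of plane 1.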
unicode  coding  character-sets  wikipedia  bmp  emoji  twitter  languages  characters  heat-maps  dataviz 
march 2017 by jm
Dark corners of Unicode
I’m assuming, if you are on the Internet and reading kind of a nerdy blog, that you know what Unicode is. At the very least, you have a very general understanding of it — maybe “it’s what gives us emoji”.

That’s about as far as most people’s understanding extends, in my experience, even among programmers. And that’s a tragedy, because Unicode has a lot of… ah, depth to it. Not to say that Unicode is a terrible disaster — more that human language is a terrible disaster, and anything with the lofty goals of representing all of it is going to have some wrinkles.

So here is a collection of curiosities I’ve encountered in dealing with Unicode that you generally only find out about through experience. Enjoy.
unicode  characters  encoding  emoji  utf-8  utf-16  utf  mysql  text 
september 2015 by jm
Shapecatcher: Draw the Unicode character you want!
'This is a tool to help you find Unicode characters. Finding a specific character whose name you don't know is cumbersome. On shapecatcher.com, all you need to know is the shape of the character!' Handy.
shapes  drawing  unicode  characters  language  recognition  web 
may 2014 by jm
Accentuate.us
'We are proud to announce the free and open-source Accentuate.us, a new method of input for over 100 languages that uses statistical reasoning so that users can type effortlessly in plain ASCII while ultimately producing accurate text. This allows Vietnamese users, for example, to simply type “Moi nguoi deu co quyen tu do ngon luan va bay to quan diem,” which will be automatically corrected to “Mọi người đều có quyền tự do ngôn luận và bầy tỏ quan điểm” after Accentuation. To date, we support four clients: Mozilla Firefox, Perl, Python, and Vim, with more to be added shortly.' cool
accents  language  web-services  typing  text-entry  ascii  unicode  characters  from delicious
december 2010 by jm
Unicode 6.0 released
including PILE OF POO, at codepoint U+1F4A9.
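Since U+1F4A9 sits outside the BMP, it makes a handy test character: it needs four bytes in UTF-8 and a surrogate pair in UTF-16. A quick Python check (variable names are just for illustration, and the name lookup assumes a Python whose Unicode database includes 6.0 or later, which any Python 3.3+ does):

```python
import unicodedata

POO = chr(0x1F4A9)

# Official character name from the Unicode Character Database.
name = unicodedata.name(POO)

# Hex dumps of the encoded forms; big-endian UTF-16 avoids a BOM.
utf8_hex = POO.encode("utf-8").hex()
utf16_hex = POO.encode("utf-16-be").hex()

print(name, utf8_hex, utf16_hex)
```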
pile-of-poo  poo  unicode  funny  emoji  characters  from delicious
october 2010 by jm
