My latest article for Wired is about how a new dataset on the popularity of emoji reveals a problem with Unicode's approval process, along with a way to fix it. I also made this graph showing which categories of emoji are more and less popular, which I’m very excited about.
Use the middle finger emoji (🖕) whenever necessary, without any fear. Otherwise, if they notice it is rarely used, the people at the Unicode Consortium will take it away.
The takeout box and the fortune cookie are perceived as emblems of Chinese culture, when they’re actually central to the American experience of it.
“I never saw any fortune cookie in my life until I was a teenager,” said Yiying Lu, a San Francisco-based artist who was born in Shanghai. Lu encountered her first fortune cookie when she left China and moved to Sydney, Australia.
Now, the fortune cookie she designed for the Unicode Consortium will be one of dozens of new emoji that are part of a June update. Lu also created the new emoji depicting a takeout box, chopsticks, and a dumpling.
The irony, she says, is that two of the four new Chinese-themed emoji—the fortune cookie and the takeout box—are not Chinese Chinese, but instead reflect Westernized elements of Chinese culture. “It’s kind of like Häagen-Dazs,” Lu told me. “People think it’s Scandinavian just because of the two dots in the name, but it’s American. It’s the same thing with the takeout box. The Chinese takeout box is completely invented in the West. And the fortune cookie was invented by a Japanese person, but it was popularized in America.”
[...]
“The people who fight the hardest for certain emoji are usually trying to fight for representation for themselves in some way,” Lee told me. “Most linguists say emoji are not currently a language—they’re paralinguistic, the equivalent of hand gestures or voice tone. But for people who use them, it’s almost like fighting for a word that [shows] you exist. When you come up with a word to describe your population, it’s a very powerful thing.”
An interesting article about Japanese and Unicode. Excerpt:
In 1978 Japan's Ministry of Economy, Trade and Industry established the encoding that would later be known as JIS X 0208, which still serves as an important reference for all Japanese encodings. However, after the JIS standard was released people noticed something strange - several of the added characters had no obvious sources, and nobody could tell what they meant or how they should be pronounced. Nobody was sure where they came from. These are what came to be known as the ghost characters (幽霊文字). [...]
By interviewing the catalogers involved in the creation of the standard, the investigators established that some characters were inadvertently invented as mistakes in the cataloging process. For example, 妛 was an error introduced while trying to record "山 over 女". "山 over 女" occurs in the name of a particular place and was thus suitable for inclusion in the JIS standard, but because they couldn't print it as one character yet, 山 and 女 were printed separately, cut out, and pasted onto a sheet of paper, and then copied. When reading the copy, the line where the two little pieces of paper met looked like a stroke and was added to the character by mistake. The original character (𡚴) was not added to JIS or Unicode until much later and doesn't display on most sites for me.
Mark Davis, the president of the Unicode Consortium and chair of the Emoji Subcommittee, has found the perfect April 1st tweet. (And yes, Egyptian hieroglyphs are truly in Unicode.)
The standards authority for character encoding is the Unicode Consortium. Character encoding is necessary because computers only deal with numbers. Letters and other characters are stored by assigning a unique number for each of them. This includes emojis.
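To make the "unique number" idea concrete, here is a small Python sketch that looks up the code point and official name of a character. The helper name `describe` is my own invention for illustration, and the names returned depend on the Unicode version bundled with your Python, so treat this as a rough demonstration rather than anything official.

```python
import unicodedata


def describe(text: str) -> list[tuple[str, str]]:
    """Return (code point, official name) for each character in text.

    `describe` is a hypothetical helper for illustration only.
    Characters unknown to this Python's Unicode database get "<unnamed>".
    """
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# A plain letter and an emoji are both just numbers underneath.
for code_point, name in describe("A👑"):
    print(code_point, name)

# The mapping works in both directions: number -> character.
print(chr(0x1F451))
```

Note that some emoji are built from several code points in a row, so a single visible emoji can produce more than one entry here.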
Unicode standards updated (too?) often
Unicode keeps removing or replacing emojis! The most recent version of the standard is Unicode 16.0.0 (September 2024). To get a sense of how often the universe of available emojis can change, Unicode 9.0 was released in June 2016. Seven major updates to the standard in roughly eight years is a lot.
Unicode could keep emojis unchanged even while updating other parts of the standard. The Unicode Consortium does a lot more than maintain emojis. A major source of annoyance to Unicode Consortium members is that their work is often assumed to be only about emojis.
Loss of my favorite emojis
Many that I really liked are gone now! I miss the nuclear energy cooling tower and the princess crown and lots of others.
In this post, I'm trying to preserve some of my still extant favs. Even after Unicode deprecates an emoji, an existing use of it can be copied and pasted, and will continue to display correctly.
Many emoji don't render well on tumblr
These came out mostly okay although small. And some did lose their color.
🎼🏭🏗👑☢☣📠📟💳🏺⚗🚜
Can't say the same about these two!
🛞🕍
I'm not sure why some resolve as rectangles here but not on other websites or apps, given that I am using the same browser and operating system.
🏜🕴️♀️🌮🍮🌪💨🐧🐷🐏🐑🐖🐝🦨🦡🐗🦓🐞🐙
💔💕💞💓💗💖💘💝❤️🔥❤️🩹
👀👁🦷🤮🤢🥵😢😵💫😬
🫂
🗣🤤😔🥰
Also not sure what happened here!
Why did they grow so big?!
🫡
🥱🥺🙄
🫤
....then return to the normal size?
🤫🤭😤🌠📡💌📐📏✡🔯🕎⚛‼⁉❗❕❓🌀♾🎵🎶💬💭
These poor guys, the four suits of a deck of cards, lost all their color and shrank down to an even tinier size.
♠♣♥♦
EDIT
Looks like the size resolved itself to some extent. Maybe it was due to the tumblr editor? idk