SixFoisNeuf

Totally irregular blog on computers and security


Text encoding playground

This small application lets you see what text encoding and decoding look like, and how they produce mojibake (garbled text) when the two parties don’t agree on an encoding.
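As a concrete illustration of such a disagreement, here is a minimal sketch of the kind of round trip the playground performs. It is not the app's own code: the example string and variable names are made up. It assumes a browser or Node.js runtime whose TextDecoder supports the windows-1252 label (official Node.js builds ship the full ICU data this needs).

```javascript
// Hypothetical example, not the app's source: the sender encodes as UTF-8,
// the receiver decodes with whatever encoding it assumes.
const original = "caf\u00e9"; // "café", with é as the single code point U+00E9

// Sender side: TextEncoder always produces UTF-8 bytes.
const bytes = new TextEncoder().encode(original);
const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");
console.log(hex); // 63 61 66 c3 a9

// Receiver agrees on UTF-8: the text round-trips unchanged.
const agreed = new TextDecoder("utf-8").decode(bytes);
console.log(agreed); // café

// Receiver assumes Windows-1252: 0xC3 and 0xA9 are read as two separate characters.
const mojibake = new TextDecoder("windows-1252").decode(bytes);
console.log(mojibake); // cafÃ©

// Repeating the mistake compounds it: each extra round trip adds more garbage.
const twice = new TextDecoder("windows-1252").decode(new TextEncoder().encode(mojibake));
console.log(twice); // cafÃƒÂ©
```

The two-byte UTF-8 sequence for "é" is the whole story here: a single-byte decoder has no way to know those bytes belong together, so it shows one character per byte.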

Technical details on the app

This web app uses the JavaScript TextEncoder and TextDecoder APIs to encode text as UTF-8 and to decode bytes using a variety of encodings. Encoding text in any other encoding is not possible, because TextEncoder only supports UTF-8: this is a limitation of the API itself.
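A short sketch of that asymmetry, assuming a runtime implementing the WHATWG Encoding API with windows-1252 available: TextEncoder accepts no encoding argument and its `encoding` property is always "utf-8", while TextDecoder takes an encoding label and throws a RangeError for a label it does not recognize. The choice of labels is arbitrary, and "not-a-real-encoding" is a deliberately invalid, hypothetical label.

```javascript
// TextEncoder takes no encoding argument: its output is always UTF-8.
const encoder = new TextEncoder();
console.log(encoder.encoding); // utf-8

// TextDecoder accepts an encoding label, so the same bytes can be read several ways.
const bytes = encoder.encode("\u00e9"); // "é" becomes the two bytes 0xC3 0xA9
const decoded = {};
for (const label of ["utf-8", "windows-1252", "utf-16le"]) {
  decoded[label] = new TextDecoder(label).decode(bytes);
}
console.log(decoded);

// An unknown label is rejected when the decoder is constructed.
let unknownLabelRejected = false;
try {
  new TextDecoder("not-a-real-encoding"); // hypothetical, deliberately invalid label
} catch (err) {
  unknownLabelRejected = err instanceof RangeError;
}
console.log(unknownLabelRejected); // true
```

Read as UTF-16LE, the same two bytes form a single 16-bit code unit (0xA9C3), which is yet another way of misreading them.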

The JavaScript function IsTextUnicode reimplements the Win32 API function of the same name, including, most interestingly, its well-known "Bush hid the facts" bug, in which some plain ASCII text is misdetected as UTF-16. The function was reimplemented by reverse-engineering the original function's binary code. No Windows source code was used.
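The sketch below does not reimplement IsTextUnicode or its heuristics. Assuming a runtime with TextEncoder and TextDecoder, it only shows what happens after the heuristic has wrongly classified the ASCII string "Bush hid the facts" as UTF-16LE. That is what Notepad on older Windows versions then displayed: CJK characters, or boxes when no suitable font was installed.

```javascript
// The famous test string: 18 bytes of plain ASCII.
const ascii = new TextEncoder().encode("Bush hid the facts");

// Decoding as UTF-16LE pairs the bytes into 16-bit code units, low byte first:
// "B" (0x42) + "u" (0x75) -> U+7542, "s" (0x73) + "h" (0x68) -> U+6873, ...
const misread = new TextDecoder("utf-16le").decode(ascii);

const codePoints = Array.from(misread, (c) => c.codePointAt(0));
console.log(ascii.length, misread.length); // 18 9
console.log(codePoints.map((cp) => cp.toString(16)).join(" "));
// 7542 6873 6820 6469 7420 6568 6620 6361 7374

// Every resulting code point falls in the CJK Unified Ideographs block (U+4E00 to U+9FFF),
// which is why the text shows up as Chinese characters.
const allCjk = codePoints.every((cp) => cp >= 0x4e00 && cp <= 0x9fff);
console.log(allCjk); // true
```

An even byte count is what makes the UTF-16 reading possible at all here: 18 ASCII bytes pair up exactly into 9 code units, with no leftover byte.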