UTF-16 Format for JavaScript

Converting Hexadecimal to UTF-16 Format

Let's imagine you're trying to insert the 🖤 symbol.

The code point for this symbol is 128420 (decimal) or 1F5A4 (hexadecimal), usually written U+1F5A4. In HTML, the hexadecimal character reference &#x1F5A4; would display the symbol, but you need something that will work in JavaScript.
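To confirm these numbers in JavaScript itself, you can ask a string for its code point. This is a small sketch using the standard codePointAt, toString and String.fromCodePoint methods (ES2015 and later); the variable names are just for illustration.

```javascript
// The black heart symbol, typed directly into the source
const heart = "🖤";

// codePointAt(0) reads the full code point at index 0, even for non-BMP characters
const codePoint = heart.codePointAt(0);

console.log(codePoint);                            // 128420 (decimal)
console.log(codePoint.toString(16).toUpperCase()); // 1F5A4 (hexadecimal)

// Going the other way: build the character from its hexadecimal code point
const fromHex = String.fromCodePoint(0x1F5A4);
console.log(fromHex === heart);                    // true
```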

The key point is that the code point for 🖤 is higher than 0xFFFF (i.e., it is 0x10000 or above). This means you cannot use the normal JavaScript escape \uXXXX (i.e., with exactly four hexadecimal digits) because 1F5A4 has five digits.
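A quick way to see the four-digit limit in action is to try the five-digit escape anyway. In this sketch, JavaScript reads only the first four digits after \u and treats the trailing 4 as an ordinary character, so you get two unrelated characters instead of the heart.

```javascript
// Attempting a five-digit escape: JavaScript consumes exactly four hex digits
const attempt = "\u1F5A4";

// It is parsed as the escape \u1F5A followed by the plain character "4"
console.log(attempt === "\u1F5A" + "4"); // true
console.log(attempt.length);             // 2
console.log(attempt === "🖤");           // false
```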

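To use this character with the \uXXXX escape, you must convert it to a pair of escapes that looks like this: \uXXXX\uXXXX. Let's call it the UTF-16 format. (Since ES2015, JavaScript also accepts the code point escape \u{1F5A4}, but the pair of four-digit escapes works in older engines too.)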
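The sketch below compares the ways of writing 🖤 in a JavaScript string literal. It assumes an ES2015 or later engine, since the \u{...} code point escape is not available in older JavaScript.

```javascript
// Surrogate-pair escape: works in every JavaScript engine
const pairEscape = "\uD83D\uDDA4";

// Code point escape: ES2015+ only, hex code point inside braces (up to 10FFFF)
const codePointEscape = "\u{1F5A4}";

// The literal character, typed directly
const literal = "🖤";

console.log(pairEscape === codePointEscape); // true
console.log(pairEscape === literal);         // true

// Either way, the string is stored as two UTF-16 code units
console.log(literal.length);                 // 2
```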

Characters in the Basic Multilingual Plane (BMP)

The most common symbols we use are usually numbered between 0x0000 and 0xFFFF, i.e., they're below 0x10000. (NB: The "0x" prefix tells JavaScript it's a hexadecimal number.)

The characters in this range are described as being in the "Basic Multilingual Plane" (BMP). Characters with a code point of 0x10000 or above are called "non-BMP characters." And, we want one of those!
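If you want to check programmatically whether a character is in the BMP, comparing its code point with 0xFFFF is enough. isBmp below is a hypothetical helper written for this page, not a built-in.

```javascript
// Hypothetical helper: true if the code point fits in one UTF-16 code unit
function isBmp(codePoint) {
  return codePoint <= 0xFFFF;
}

// "A" (U+0041) and "€" (U+20AC) are BMP characters; "🖤" (U+1F5A4) is not
for (const ch of ["A", "€", "🖤"]) {
  const cp = ch.codePointAt(0);
  console.log(ch, cp.toString(16).toUpperCase(), isBmp(cp), ch.length);
}
// A 41 true 1
// € 20AC true 1
// 🖤 1F5A4 false 2
```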

How To Convert to the UTF-16 Format

The first thing to know is that, in the \uXXXX\uXXXX format, the first four-digit hexadecimal number is called the "high surrogate," and the second is called the "low surrogate." Together, they are called a "surrogate pair," and they represent one character.
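You can see the surrogate pair directly by reading the string one UTF-16 code unit at a time with charCodeAt. The range checks below use the standard surrogate ranges (D800 to DBFF for high surrogates, DC00 to DFFF for low surrogates); the helper names are hypothetical.

```javascript
// Hypothetical helpers using the standard UTF-16 surrogate ranges
const isHighSurrogate = (unit) => unit >= 0xD800 && unit <= 0xDBFF;
const isLowSurrogate = (unit) => unit >= 0xDC00 && unit <= 0xDFFF;

const heart = "🖤";

// charCodeAt reads single 16-bit code units, not whole characters
const high = heart.charCodeAt(0);
const low = heart.charCodeAt(1);

console.log(high.toString(16).toUpperCase(), isHighSurrogate(high)); // D83D true
console.log(low.toString(16).toUpperCase(), isLowSurrogate(low));    // DDA4 true
```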

Here's the formula for converting a code point of 0x10000 or above for JavaScript:
Conversion Method
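// Code point for 🖤 (U+1F5A4)
let codePoint = 0x1F5A4;
// Subtract 0x10000, then split the result into its top and bottom 10 bits
let highSurrogate = Math.floor((codePoint - 0x10000) / 0x400) + 0xD800;
let lowSurrogate = (codePoint - 0x10000) % 0x400 + 0xDC00;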
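JavaScript accepts hexadecimal literals in the format 0x****, but numbers are not stored with a base: if you print highSurrogate and lowSurrogate, you get the decimal values 55357 and 56740. To get the hexadecimal digits, convert them to strings with toString(16):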
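Plugging in the code point of 🖤 (0x1F5A4) shows where the values in the final surrogate pair come from. This is just the formula above worked by hand, using 0x400 = 1024 and showing each result in decimal and hexadecimal.

```latex
\begin{aligned}
\text{offset} &= \mathtt{0x1F5A4} - \mathtt{0x10000} = \mathtt{0xF5A4} = 62884 \\
\text{high} &= \left\lfloor 62884 / 1024 \right\rfloor + \mathtt{0xD800} = 61 + 55296 = 55357 = \mathtt{0xD83D} \\
\text{low} &= (62884 \bmod 1024) + \mathtt{0xDC00} = 420 + 56320 = 56740 = \mathtt{0xDDA4}
\end{aligned}
```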
Get Hexadecimal String
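// toString(16) returns lowercase digits ("d83d"); toUpperCase() gives "D83D"
let stringOne = highSurrogate.toString(16).toUpperCase();
let stringTwo = lowSurrogate.toString(16).toUpperCase();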
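Putting the steps together, here is one way to wrap the conversion in a reusable function. toSurrogatePair is a hypothetical name; it uses the bit operations >> 10 and & 0x3FF, which give the same results as dividing by 0x400 and taking the remainder for these values, and it rejects code points outside the non-BMP range.

```javascript
// Hypothetical helper: split a non-BMP code point into its UTF-16 surrogate pair
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("Expected a code point from 0x10000 to 0x10FFFF");
  }
  const offset = codePoint - 0x10000;     // a 20-bit value
  const high = (offset >> 10) + 0xD800;   // top 10 bits, same as Math.floor(offset / 0x400)
  const low = (offset & 0x3FF) + 0xDC00;  // bottom 10 bits, same as offset % 0x400
  return {
    high: high.toString(16).toUpperCase(),
    low: low.toString(16).toUpperCase(),
  };
}

const pair = toSurrogatePair(0x1F5A4);
console.log(pair.high, pair.low);                  // D83D DDA4
console.log("\\u" + pair.high + "\\u" + pair.low); // \uD83D\uDDA4
```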
To build the escape text, add a \u prefix in front of each string:
Creating the Surrogate Pair
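// "\\u" is a literal backslash followed by u, so surrogatePair holds escape text, not the symbol itself
let surrogatePair = "\\u" + stringOne + "\\u" + stringTwo;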
So, to conclude, the non-BMP character U+1F5A4 is represented by the surrogate pair \uD83D\uDDA4, which you can use inside a JavaScript string literal. For example:
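To check a surrogate pair, you can run the formula in reverse and recover the original code point. fromSurrogatePair is a hypothetical helper; its arithmetic is simply the inverse of the conversion method above.

```javascript
// Hypothetical helper: combine a high and low surrogate back into one code point
function fromSurrogatePair(high, low) {
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

const codePoint = fromSurrogatePair(0xD83D, 0xDDA4);
console.log(codePoint);                            // 128420
console.log(codePoint.toString(16).toUpperCase()); // 1F5A4
```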
JavaScript:
let str = "\uD83D\uDDA4";
document.write("My symbol: " + str);
Output:
My symbol: 🖤
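If you have the surrogate values as numbers rather than typed-in escapes, String.fromCharCode can build the same string at runtime. This sketch uses console.log instead of document.write so it also runs outside a browser, and it shows that length counts the two code units while Array.from splits the string by code points.

```javascript
// Build the character from its two surrogate code units
const str = String.fromCharCode(0xD83D, 0xDDA4);
console.log("My symbol: " + str); // My symbol: 🖤

// The same string built from the code point (ES2015+)
console.log(str === String.fromCodePoint(0x1F5A4)); // true

// length counts UTF-16 code units, so one heart has length 2
console.log(str.length);             // 2
// Array.from iterates by code point, so it sees one character
console.log(Array.from(str).length); // 1
```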

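To get the UTF-16 format for any character without doing the arithmetic by hand, you could use a small helper like the one below. toUtf16Escapes is a hypothetical name; it walks the string one UTF-16 code unit at a time, so non-BMP characters automatically come out as surrogate pairs.

```javascript
// Hypothetical helper: turn every UTF-16 code unit of a string into a \uXXXX escape
function toUtf16Escapes(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    // charCodeAt returns one 16-bit code unit; pad it to four hex digits
    const hex = text.charCodeAt(i).toString(16).toUpperCase().padStart(4, "0");
    out += "\\u" + hex;
  }
  return out;
}

console.log(toUtf16Escapes("🖤"));  // \uD83D\uDDA4
console.log(toUtf16Escapes("A🖤")); // \u0041\uD83D\uDDA4
```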

For Completeness

And, just for completeness, if you need to convert a hexadecimal string back to a number:
Get Numbers from the Hexadecimal String (if needed)
let myFirstNumber = parseInt(stringOne, 16);
let mySecondNumber = parseInt(stringTwo, 16);
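Once you have the numbers back, String.fromCharCode turns them into the actual character. This self-contained sketch hard-codes the hex strings for 0x1F5A4 (in lowercase, as toString(16) returns them) so it can run on its own; parseInt accepts either case.

```javascript
// Hex strings as produced by highSurrogate.toString(16) and lowSurrogate.toString(16)
const stringOne = "d83d";
const stringTwo = "dda4";

// parseInt with radix 16 reads a hexadecimal string
const myFirstNumber = parseInt(stringOne, 16);  // 55357
const mySecondNumber = parseInt(stringTwo, 16); // 56740

// Combine the two code units into the character itself
const symbol = String.fromCharCode(myFirstNumber, mySecondNumber);
console.log(symbol); // 🖤
```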

Help Us Improve Cyber Definitions

  • Do you disagree with something on this page?
  • Did you spot a typo?
  • Do you know a slang term that we've missed?
Please tell us using this form.