This content originally appeared on Stefan Judis Web Development and was authored by Stefan Judis
Unicode is such an interesting topic, and it feels like I discover something new every day. Today was one of those days: I was reading a blog post and came across the u flag, which was new to me. I ended up reading Axel's chapter on the topic in "Exploring ES6", which, as usual, covers everything.
So what's this u flag?
In JavaScript we've got the "problem" that strings are represented in UTF-16, which means that not every character can be represented with a single code unit. This leads to surprising length properties for certain strings, and it gets tricky when you deal with surrogate pairs. It also raises a question: should the dot (.) in a regular expression match a code point that needs two code units?
This is exactly where the u flag comes into play.
Let's have a look at an example:
const emoji = '\u{1F60A}'; // "smiling face with smiling eyes"
emoji.length // 2 -> it's a surrogate pair
/^.$/.test(emoji) // false
/^.$/u.test(emoji) // true
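To make the difference between code units and code points more tangible, here is a small sketch that looks at the same emoji from both angles. The variable names are made up for illustration; only standard String and RegExp features are used.

```javascript
// Illustrative variable names, not from the original post.
const smiley = '\u{1F60A}';

// The two UTF-16 code units (high and low surrogate) behind the emoji.
const codeUnits = [smiley.charCodeAt(0), smiley.charCodeAt(1)]
  .map((unit) => unit.toString(16));

// String iteration walks code points, so spreading yields one element.
const codePointCount = [...smiley].length;

// Without the u flag, the dot matches each surrogate on its own.
const matchesWithoutU = smiley.match(/./g).length;

// With the u flag, the dot matches the whole code point at once.
const matchesWithU = smiley.match(/./gu).length;

console.log(codeUnits);        // [ 'd83d', 'de0a' ]
console.log(codePointCount);   // 1
console.log(matchesWithoutU);  // 2
console.log(matchesWithU);     // 1
```

This is why /^.$/ fails on the emoji: without the u flag the pattern sees two "characters" between the anchors.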
Unicode mode also lets you use code point escape sequences in regular expressions, which comes in really handy because you don't have to deal with surrogate pairs yourself.
const emoji = '\u{1F42A}'; // "camel"
/\u{1F42A}/.test(emoji); // false
/\uD83D\uDC2A/.test(emoji); // true
/\u{1F42A}/u.test(emoji); // true
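Code point escapes also work inside character classes when the u flag is set. The following is a rough sketch; the range U+1F400 to U+1F43F and the variable names are picked purely for illustration.

```javascript
// Illustrative names and range, not from the original post.
const dromedary = '\u{1F42A}';

// A character class over a range of code points beyond U+FFFF.
// This only behaves as a code point range because of the u flag.
const illustrativeRange = /^[\u{1F400}-\u{1F43F}]$/u;

const inRange = illustrativeRange.test(dromedary);   // true
const asciiInRange = illustrativeRange.test('a');    // false

// codePointAt reads the full code point, charCodeAt only one code unit.
const fullCodePoint = dromedary.codePointAt(0).toString(16);  // '1f42a'
const firstCodeUnit = dromedary.charCodeAt(0).toString(16);   // 'd83d'

// String.fromCodePoint builds the string without spelling out surrogates.
const rebuilt = String.fromCodePoint(0x1F42A) === dromedary;  // true

console.log(inRange, asciiInRange, fullCodePoint, firstCodeUnit, rebuilt);
```

Together with codePointAt and String.fromCodePoint, this means the surrogate pair \uD83D\uDC2A never has to be written by hand.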
The u flag can definitely help when dealing with Unicode in regular expressions. I highly recommend reading Axel's chapter on the topic, and of course Mathias Bynens has also written an article about it. Have fun!

Stefan Judis | Sciencx (2017-07-22T22:00:00+00:00) There is a Unicode mode in JavaScript regular expressions (#tilPost). Retrieved from https://www.scien.cx/2017/07/22/there-is-a-unicode-mode-in-javascript-regular-expressions-tilpost/