There is a Unicode mode in JavaScript regular expressions (#tilPost)

This content originally appeared on Stefan Judis Web Development and was authored by Stefan Judis

Unicode is such an interesting topic that it feels like I can discover something new every day. Today was one of those days. I was reading a blog post and came across the u flag, which was new to me. In the end I found myself reading Axel's chapter on that topic in "Exploring ES6", which, as usual, has everything covered.

So what's this u flag?

In JavaScript we've got the "problem" that strings are represented in UTF-16, which means that not every character can be represented with a single code unit. Characters outside the Basic Multilingual Plane need two code units, a so-called surrogate pair. This leads to surprising length values for certain strings and makes such characters tricky to deal with. It also raises a question: should . match a code point that needs two code units?

This is exactly where the u flag comes into play.

Let's have a look at an example:

const emoji = '\u{1F60A}'; // "smiling face with smiling eyes"
emoji.length               // 2 -> it's a surrogate pair
/^.$/.test(emoji)          // false
/^.$/u.test(emoji)         // true
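
To make the difference between code units and code points a bit more tangible, here is a small sketch that goes slightly beyond the example above. The variable names are only illustrative; the string iterator and `String.prototype.match` are standard ES2015 features.

```javascript
const emoji = '\u{1F60A}'; // "smiling face with smiling eyes"

// length counts UTF-16 code units, the string iterator walks code points.
const codeUnits = emoji.length;        // 2
const codePoints = [...emoji].length;  // 1

// Without the u flag, . matches one code unit, so the emoji needs two dots.
const matchesTwoDots = /^..$/.test(emoji);   // true
// With the u flag, . matches the whole code point.
const matchesOneDot = /^.$/u.test(emoji);    // true

// A global match shows the same thing: two halves vs. one character.
const halves = emoji.match(/./g).length;     // 2
const whole = emoji.match(/./gu).length;     // 1

console.log({ codeUnits, codePoints, matchesTwoDots, matchesOneDot, halves, whole });
```

Without the u flag the regular expression engine happily splits the surrogate pair in half, which is usually not what you want.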

This mode also lets you use code point escape sequences in regular expressions, which can come in really handy because you don't have to deal with surrogate pairs yourself.

const emoji = '\u{1F42A}';  // "camel"
/\u{1F42A}/.test(emoji);    // false
/\uD83D\uDC2A/.test(emoji); // true
/\u{1F42A}/u.test(emoji);   // true
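
As a rough sketch of why this is handy, the snippet below first shows where the two surrogate halves come from and then uses code point escapes in a character class range. The range U+1F600 to U+1F64F is the Unicode "Emoticons" block; picking it here is just my assumption for the sake of an example, and the variable names are illustrative.

```javascript
const camel = '\u{1F42A}'; // "camel"

// The two UTF-16 code units that make up the camel...
const high = camel.charCodeAt(0).toString(16);       // "d83d"
const low = camel.charCodeAt(1).toString(16);        // "dc2a"
// ...and the single code point they encode.
const codePoint = camel.codePointAt(0).toString(16); // "1f42a"

// With the u flag, code point escapes also work in character class ranges.
const emoticons = /^[\u{1F600}-\u{1F64F}]$/u;
const smileyInRange = emoticons.test('\u{1F60A}');   // true
const camelInRange = emoticons.test(camel);          // false

console.log({ high, low, codePoint, smileyInRange, camelInRange });
```

Writing the same range with surrogate pairs and without the u flag would be a lot harder to read and easy to get wrong.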

The u mode can definitely help when dealing with Unicode in regular expressions. I can highly recommend reading Axel's chapter on this topic, and of course Mathias Bynens also wrote an article about it. Have fun!

