This content originally appeared on Stefan Judis Web Development and was authored by Stefan Judis
Regular expressions are a challenge by themselves. It always takes me a few minutes to understand what a particular regular expression does, but there is no question about their usefulness.
Today, I just had my Sunday morning coffee and worked myself through the slide deck "What's new in ES2018" by Benedikt Meurer and Mathias Bynens.
There is so much useful information in these slides. Besides new language features like async iteration, object spread properties and named capture groups in regular expressions, they also cover lookaheads (and the upcoming lookbehinds) in regular expressions.
Now and then lookaheads in JavaScript regular expressions cross my way, and I have to admit that I never had to use them. But now that their counterpart, lookbehinds, is going to be in the language too, I decided to read some documentation and finally learn what these lookaheads are.
Lookaheads in JavaScript
With lookaheads, you can define patterns that only match when they're followed or not followed by another pattern.
The MDN article about regular expressions describes two different types of lookaheads in regular expressions.
Positive and negative lookaheads:
x(?=y) – positive lookahead (matches 'x' when it's followed by 'y')
x(?!y) – negative lookahead (matches 'x' when it's not followed by 'y')
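A minimal sketch of both types side by side. The sample strings 'xy' and 'xz' are made up for illustration and do not come from the MDN article.

```javascript
// Positive lookahead: "x" only matches when "y" comes right after it.
const positive = /x(?=y)/;
// Negative lookahead: "x" only matches when "y" does NOT come right after it.
const negative = /x(?!y)/;

const results = {
  positiveOnXy: positive.test('xy'), // true
  positiveOnXz: positive.test('xz'), // false
  negativeOnXy: negative.test('xy'), // false
  negativeOnXz: negative.test('xz'), // true
};

console.log(results);
```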
Captured groups in JavaScript – the similar looking companions
Oh well... x(?=y) is a tricky syntax if you ask me. The thing that confused me initially is that I usually use () for captured groups in JavaScript regular expressions.
Let's look at an example for a captured group:
const regex = /\w+\s(\w+)\s\w+/;
regex.exec('eins zwei drei');
// ['eins zwei drei', 'zwei']
//                      /\
//                      ||
//                captured group
//                 defined with
//                    (\w+)
What you see above is a regular expression that captures a word (zwei in this case) that is surrounded on each side by a whitespace character and another word.
Lookaheads are not like captured groups
So let's look at a typical example that you'll find when you read about lookaheads in JavaScript regular expressions.
const regex = /Max(?= Mustermann)/
regex.exec('Max Mustermann')
// ['Max']
regex.exec('Max Müller')
// null
This example matches Max whenever it is followed by a space and Mustermann; otherwise it doesn't match and returns null. The interesting part for me is that it only matches Max and not the pattern that is defined in the lookahead. That seems weird after working with regular expressions for a while, but when you think about it, that's the point of lookaheads.
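To make the difference to a captured group concrete, here is a small sketch that runs the lookahead next to a capturing variant of the same pattern. The capturing variant /Max( Mustermann)/ is my own addition for comparison.

```javascript
// The lookahead only checks what follows; it is not part of the match.
const withLookahead = /Max(?= Mustermann)/.exec('Max Mustermann');
// A capturing group consumes the text and reports it as a group.
const withGroup = /Max( Mustermann)/.exec('Max Mustermann');

console.log(withLookahead[0]);     // 'Max'
console.log(withLookahead.length); // 1 (no captured group in the result)
console.log(withGroup[0]);         // 'Max Mustermann'
console.log(withGroup[1]);         // ' Mustermann'
```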
The "Max Mustermann" example is not very useful in my opinion, so let's dive into positive and negative lookaheads with a real-world use case.
Positive lookahead
Let's assume you have a long string of Markdown that includes a list of people and their food preferences. How would you figure out which people are vegan when everything's just a long string?
const people = `
- Bob (vegetarian)
- Billa (vegan)
- Francis
- Elli (vegetarian)
- Fred (vegan)
`;
const regex = /-\s(\w+?)\s(?=\(vegan\))/g;
//                |----|  |-----------|
//               /              \
//   one or more                 \
//   word characters      positive lookahead
//   but as few as        => followed by "(vegan)"
//   possible
let result = regex.exec(people);
while (result) {
  console.log(result[1]);
  result = regex.exec(people);
}
// Result:
// Billa
// Fred
Let's have a quick look at the regular expression and try to phrase it in words.
const regex = /-\s(\w+?)\s(?=\(vegan\))/g;
Alright... let's do this!
Match any dash followed by one whitespace character, followed by one or more (but as few as possible) word characters (A-Za-z0-9_), followed by a whitespace character that is itself followed by the pattern "(vegan)".
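The same lookup can also be sketched without the exec loop. This version assumes a runtime that ships String.prototype.matchAll, which was added to the language in ES2020, and builds the list with join instead of a template literal. The names veganRegex and vegans are only for illustration.

```javascript
const people = [
  '- Bob (vegetarian)',
  '- Billa (vegan)',
  '- Francis',
  '- Elli (vegetarian)',
  '- Fred (vegan)',
].join('\n');

// Same pattern as above; matchAll requires the g flag.
const veganRegex = /-\s(\w+?)\s(?=\(vegan\))/g;

// Collect the first captured group (the name) of every match.
const vegans = Array.from(people.matchAll(veganRegex), (match) => match[1]);

console.log(vegans); // ['Billa', 'Fred']
```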
Negative/negating lookaheads
On the other hand, how would you figure out who is not vegan?
const people = `
- Bob (vegetarian)
- Billa (vegan)
- Francis
- Elli (vegetarian)
- Fred (vegan)
`;
const regex = /-\s(\w+)\s(?!\(vegan\))/g
//                |---|  |-----------|
//               /             \
//   one or more                \
//   word characters     negative lookahead
//                       => not followed by "(vegan)"
let result = regex.exec(people);
while (result) {
  console.log(result[1]);
  result = regex.exec(people);
}
// Result:
// Bob
// Francis
// Elli
Let's have a quick look at the regular expression and try to phrase it in words, too.
const regex = /-\s(\w+)\s(?!\(vegan\))/g
Match any dash followed by one whitespace character, followed by one or more word characters (A-Za-z0-9_), followed by a whitespace character (which includes line breaks) that is not followed by the pattern "(vegan)".
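One thing to watch out for: the pattern requires a whitespace character after the name, so a last entry without a trailing line break would be skipped. The sketch below shows this with a shortened list and one possible variant that moves the whitespace into the lookahead. The variant and the helper collectNames are my own assumptions, not part of the original example.

```javascript
// Shortened list that ends right after "Francis" (no trailing line break).
const people = [
  '- Bob (vegetarian)',
  '- Billa (vegan)',
  '- Francis',
].join('\n');

// Hypothetical helper: returns the first captured group of every match.
const collectNames = (regex, text) =>
  Array.from(text.matchAll(regex), (match) => match[1]);

// Original pattern: needs a whitespace character after the name.
const original = collectNames(/-\s(\w+)\s(?!\(vegan\))/g, people);

// Variant: the name ends at a word boundary, and the whitespace is only
// checked inside the negative lookahead.
const variant = collectNames(/-\s(\w+)\b(?!\s\(vegan\))/g, people);

console.log(original); // ['Bob']
console.log(variant);  // ['Bob', 'Francis']
```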
Lookaheads will have company from lookbehinds soon
Lookbehinds will work the same way but for patterns before the matching pattern (lookaheads consider the patterns after the matching part) and are already supported in Chrome today. They will also be available as positive lookbehind (?<=y)x and negative lookbehind (?<!y)x.
When we flip the strings in the example around, it still works the same way using lookbehinds. :)
const people = `
- (vegetarian) Bob
- (vegan) Billa
- Francis
- (vegetarian) Elli
- (vegan) Fred
`;
const regex = /(?<=\(vegan\))\s(\w+)/g
//             |------------|  |---|
//                  /              \__
//      positive lookbehind           \
//      => following "(vegan)"     one or more
//                                 word characters
let result = regex.exec(people);
while (result) {
  console.log(result[1]);
  result = regex.exec(people);
}
// Result:
// Billa
// Fred
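The negative lookbehind can be sketched on the same flipped list to find everybody who is not vegan. This assumes an engine with lookbehind support; the names nonVeganRegex and nonVegans are only for illustration.

```javascript
const people = [
  '- (vegetarian) Bob',
  '- (vegan) Billa',
  '- Francis',
  '- (vegetarian) Elli',
  '- (vegan) Fred',
].join('\n');

// A whitespace character and a word that are NOT preceded by "(vegan)".
const nonVeganRegex = /(?<!\(vegan\))\s(\w+)/g;

const nonVegans = [];
let result = nonVeganRegex.exec(people);
while (result) {
  nonVegans.push(result[1]);
  result = nonVeganRegex.exec(people);
}

console.log(nonVegans); // ['Bob', 'Francis', 'Elli']
```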
Side note: I usually recommend RegExr for fiddling with regular expressions, but it doesn't support lookbehinds yet.
If you're interested in more cutting-edge features, have a look at Mathias' and Benedikt's slides on new features coming to JavaScript. There is way more exciting stuff to come.
Another side note: If you're developing for the browser, make sure to check the support of lookbehinds first. At the time of writing, they're not supported in Firefox.
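One way to check support at runtime is to build a lookbehind pattern with the RegExp constructor and catch the error. This is a sketch that assumes engines without lookbehind support throw a SyntaxError for the unknown syntax; the function name supportsLookbehind is hypothetical.

```javascript
function supportsLookbehind() {
  try {
    // Using the constructor with a string keeps the unsupported syntax out
    // of the source code, so the script itself still parses everywhere.
    new RegExp('(?<=a)b');
    return true;
  } catch (error) {
    return false;
  }
}

console.log(supportsLookbehind());
```

A regular expression literal with a lookbehind would fail at parse time in an unsupported engine, before any try/catch could run, which is why the sketch goes through the constructor.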
To remember the syntax for lookahead and lookbehinds I created a quick cheat sheet about it.
Stefan Judis | Sciencx (2018-03-17T23:00:00+00:00) lookaheads (and lookbehinds) in JavaScript regular expressions (#tilPost). Retrieved from https://www.scien.cx/2018/03/17/lookaheads-and-lookbehinds-in-javascript-regular-expressions-tilpost/