
Why is my regex to match words in PHP not working as expected? Ranged quantifiers on negated lookaheads are disqualifying unintended input strings

2024-03-12 07:00:05

I am using PHP 7.4 as well as PHP 8.2, and I have a regex that I use to match words (names). To be completely honest, I barely recognize this regex monster I created, hence this question asking for help figuring it out. It is basically this:

$is_word = preg_match('/^(?![aeiou]{3,})(?:\D(?![^aeiou]{4,}[aeiou]*)(?![aeiou]{4,})){3,}$/i', $name);

I’ve been using it for about 6+ years to match names in a script I have created. It will basically return a boolean of TRUE or FALSE depending on whether the input matches a word pattern.

But today it messed up on matching two names:

  • Li
  • Drantch

To test this out, you can run the following batch of test names (pseudo names, for example's sake):

  • Nartinez
  • Drantch
  • Dratch
  • Xtmnprwq
  • Yelendez
  • Boldberg
  • Yelenovich
  • Allash
  • Mohamed
  • Li

I attempted to adjust the regex to set the second {x,x} to {5,}

$is_word = preg_match('/^(?![aeiou]{3,})(?:\D(?![^aeiou]{5,}[aeiou]*)(?![aeiou]{4,})){3,}$/i', $name);

It helps in cases like that, matching names like “Drantch,” but it still completely misses two-letter names like “Li.”

How can this regex be tweaked to properly match all names? If not all names, how can it be adjusted to properly match “Drantch” and other obvious names other than “Li”?

Note that “Xtmnprwq” is a fake test name so I can test negatives as well as positives.

Solution:

Your regexp has the following constraints on words:

  • ^(?![aeiou]{3,}) - Can't begin with 3 or more consecutive vowels
  • (?![^aeiou]{4,}[aeiou]*) - Can't have 4 or more consecutive consonants (strictly, non-vowels) anywhere after the first character
  • (?![aeiou]{4,}) - Can't have 4 or more consecutive vowels anywhere after the first character
  • \D - Every character must be a non-digit
  • {3,} - Must be at least 3 characters long

Li violates the 3 characters requirement.

Drantch violates the 4 consecutive consonants restriction: the "a" is followed by "ntch".
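To see which restriction trips a particular name, each constraint can be checked on its own. The following is only a rough sketch for plain ASCII names: explain_rejection() is a hypothetical helper made up for illustration, and its separate patterns approximate the lookaheads of the original regex rather than reproduce it exactly.

```php
<?php
// Hypothetical helper, not part of the original regex: lists which
// constraint(s) of the pattern a name appears to violate.
function explain_rejection(string $name): array
{
    $reasons = [];
    if (preg_match('/^[aeiou]{3,}/i', $name)) {
        $reasons[] = 'begins with 3+ vowels';
    }
    // The lookaheads sit after \D, so they are only checked once at
    // least one character has been consumed (hence the leading dot).
    if (preg_match('/.[^aeiou]{4,}/is', $name)) {
        $reasons[] = '4+ consecutive non-vowels';
    }
    if (preg_match('/.[aeiou]{4,}/is', $name)) {
        $reasons[] = '4+ consecutive vowels';
    }
    if (preg_match('/\d/', $name)) {
        $reasons[] = 'contains a digit';
    }
    if (strlen($name) < 3) {
        $reasons[] = 'shorter than 3 characters';
    }
    return $reasons;
}

foreach (['Li', 'Drantch', 'Dratch'] as $name) {
    $reasons = explain_rejection($name);
    echo $name, ': ', $reasons ? implode(', ', $reasons) : 'passes', "\n";
}
```

With the names from the question, this should report the length rule for "Li", the non-vowel rule for "Drantch", and nothing for "Dratch".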

Tweak or remove these bits of the regexp to change the restrictions and allow these names.
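As one possible tweak (an assumption about what you want, not the only option), raising the consonant limit from {4,} to {5,} and lowering the length requirement from {3,} to {2,} lets both "Drantch" and "Li" through while still rejecting "Xtmnprwq". The constant names and the is_word() wrapper below are made up for this sketch.

```php
<?php
// Pattern from the question, unchanged.
const ORIGINAL_PATTERN = '/^(?![aeiou]{3,})(?:\D(?![^aeiou]{4,}[aeiou]*)(?![aeiou]{4,})){3,}$/i';
// Assumed relaxation: up to 4 consecutive consonants, minimum length 2.
const RELAXED_PATTERN  = '/^(?![aeiou]{3,})(?:\D(?![^aeiou]{5,}[aeiou]*)(?![aeiou]{4,})){2,}$/i';

// Hypothetical wrapper: preg_match() returns 1, 0, or false,
// so compare against 1 to get a real boolean.
function is_word(string $name, string $pattern = RELAXED_PATTERN): bool
{
    return preg_match($pattern, $name) === 1;
}

$names = ['Nartinez', 'Drantch', 'Dratch', 'Xtmnprwq', 'Yelendez',
          'Boldberg', 'Yelenovich', 'Allash', 'Mohamed', 'Li'];

foreach ($names as $name) {
    printf(
        "%-10s original=%-5s relaxed=%s\n",
        $name,
        var_export(is_word($name, ORIGINAL_PATTERN), true),
        var_export(is_word($name, RELAXED_PATTERN), true)
    );
}
```

Keep in mind that loosening the limits also admits more non-names: any two-letter string without digits now passes, so whether this trade-off is acceptable depends on your data.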
