Python: bug in re (regex)? Or please help me understand what I'm missing?

2024-03-16 11:00:04

Python 3.9.18. If the basic example below isn't a bug in re, then why am I getting different results from code that is supposed to be equivalent? (NOTE: I am not looking for alternative ways to achieve the expected result; I already have plenty of those.)

import re
s = '{"merge":"true","from_cache":"true","html":"true","links":"false"}'
re.sub(r'"(true|false)"', r'\1', s, re.I)
'{"merge":true,"from_cache":true,"html":"true","links":"false"}'

^^^ note how only the 1st and 2nd "true" were replaced, but the 3rd and 4th values still have quotes " around them.

Whereas the following, which is supposed to be equivalent ((?i) instead of re.I), works as expected:

import re
s = '{"merge":"true","from_cache":"true","html":"true","links":"false"}'
re.sub(r'(?i)"(true|false)"', r'\1', s)
'{"merge":true,"from_cache":true,"html":true,"links":false}'

^^^ all instances of "true" and "false" were replaced.

Solution:

The function re.sub() has the following signature:

Signature: re.sub(pattern, repl, string, count=0, flags=0)

If you give re.I as the fourth argument, it will interpret it as the count argument.

When converted to an integer, re.I is equal to 2.

>>> print(int(re.I))
2

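So passing the flag positionally makes re.sub() perform at most 2 replacements, and the flags argument stays at its default of 0, meaning the match is not case-insensitive at all.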
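
To check this explanation, here is a small sketch that compares the positional call from the question with an explicit count=2. The variable names are only for illustration, and the warnings filter is an assumption on my part: newer Python releases may emit a DeprecationWarning when count is passed positionally.

```python
import re
import warnings

s = '{"merge":"true","from_cache":"true","html":"true","links":"false"}'
pattern = r'"(true|false)"'

with warnings.catch_warnings():
    # Newer Python releases may warn about a positional count; ignore that here.
    warnings.simplefilter("ignore", DeprecationWarning)
    # re.I lands in the count slot, so this call means count=2, flags=0.
    positional = re.sub(pattern, r'\1', s, re.I)

# The same call with the count spelled out.
explicit_count = re.sub(pattern, r'\1', s, count=2)

print(int(re.I))                     # 2
print(positional == explicit_count)  # True
```

Both calls stop after the first two matches, which is exactly the output shown in the question.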

Instead, I suggest passing the flag as a keyword argument:

re.sub(r'"(true|false)"', r'\1', s, flags=re.I)
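
As a quick sanity check, the sketch below runs the keyword form next to the inline (?i) form from the question. The input string here is a hypothetical variant with mixed-case values, chosen only so that the case-insensitive flag actually matters.

```python
import re

# Hypothetical input: same shape as the question's string, but with mixed case.
s = '{"merge":"true","from_cache":"TRUE","html":"true","links":"False"}'

# Flag passed by keyword, so count keeps its default of 0 (replace all matches).
keyword = re.sub(r'"(true|false)"', r'\1', s, flags=re.I)

# Inline flag, as in the working example from the question.
inline = re.sub(r'(?i)"(true|false)"', r'\1', s)

print(keyword)            # {"merge":true,"from_cache":TRUE,"html":true,"links":False}
print(keyword == inline)  # True
```

With the flag in the right slot, every quoted value is replaced regardless of case, and the two spellings give the same result.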