re findall() with overlapping == true
Here’s a question that asks for 0+ between 1
https://leetcode.com/problems/binary-gap/
If you use re.findall()
, you’ll get an error
binary = 11111111010011110100
result = re.findall(r"1(0+)1", binary)
the result is ['0', '0']
If you use your own function, you’ll get it right
def findall_overlapped(r, s):
res = [] # Resulting list
reg = r'^{}$'.format(r) # Regex must match full string
for q in range(len(s)): # Iterate over all chars in a string
for w in range(q,len(s)): # Iterate over the rest of the chars to the right
cur = s[q:w+1] # Currently tested slice
if re.match(reg, cur): # If there is a full slice match
res.append(cur) # Append it to the resulting list
return res
rex = r'1(0+)1'
print(findall_overlapped(rex, binary))
the result is ['101', '1001', '101']
If you use c++ with symbols of ?=
, you can also get it right
string input_seq = "11111111010011110100";
std::regex re("1(?=(0+)1)"); // <-- working
//std::regex re("1(0+)1"); // <-- not work
std::sregex_iterator next(input_seq.begin(), input_seq.end(), re);
std::sregex_iterator end;
while (next != end)
{
std::smatch match = *next;
std::cout << match.str(1) << "\t"
<< "\t" << match.position() << "\t"
<< "\n"; // <-- SEE HERE
next++;
}
the result is ['0', '00', '00']
The explanation
According to the python document, is:(?=...)
Matches if ...
matches next, but doesn’t consume any of the string. This is called a lookahead assertion
.
For example, Isaac (?=Asimov)
will match 'Isaac '
only if it’s followed by 'Asimov'
.