Regex for multiples of an integer
This article provides a method to construct a regular expression that matches multiples of 3. Here we generalize it and give a method to construct a regular expression that matches any integer \(n\) in radix \(r\) such that \(n≡k\,mod\,m\).
The outline of this method is the same as the original one. First we construct a DFA, then we convert this DFA into a regular expression.
-
Construct a DFA with states \(0,1, \cdots ,m−1\).
-
Calculate transitions out of state \(s\) as follows:
-
Originally we are in state \(s\)
-
We read a digit \(d\)
-
Now we become \((s∗r+d)\)
-
Therefore, there should be a transition from \(s\) to \((s∗r+d)\) with symbol \(d\)
-
-
After all such transitions have been completed, we convert this DFA to a regular expression using state elimination.
The sample code is as follows:
#!/usr/bin/python2
'''Print a regular expression which matches numbers in radix `radix` such that, when modulo
`modulo`, give remainder `remainder`.'''
import re, sys
################################################################
# Global configuration.
################################################################
# (...)_{radix} % {modulo} = {remainder}
radix = 10
modulo = 3
remainder = 0
if len(sys.argv) >= 2:
radix = int(sys.argv[1])
if len(sys.argv) >= 3:
modulo = int(sys.argv[2])
if len(sys.argv) >= 4:
remainder = int(sys.argv[3])
################################################################
# Don't modify code below!
################################################################
# Constants.
digit = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F']
value = range(radix)
# Regex of transitions.
transition = [[None for i in range(modulo)] for i in range(modulo)]
# State elimination function.
def eliminate(i):
for p in range(modulo):
if p == i or transition[p][i] is None:
continue
for n in range(modulo):
if n == i or transition[i][n] is None:
continue
transition[p][n] = '({e_pi}{e_ii}{e_in}{e_orig})'.format(
e_pi = transition[p][i],
e_ii = '' if transition[i][i] is None else transition[i][i] + '*',
e_in = transition[i][n],
e_orig = '' if transition[p][n] is None else '|' + transition[p][n])
# Compute transitions.
for s in range(modulo):
for e in range(radix):
d = (s * radix + e) % modulo
if transition[s][d] is None:
transition[s][d] = digit[e]
else:
transition[s][d] += digit[e]
for s in range(modulo):
for d in range(modulo):
transition[s][d] = '[' + transition[s][d] + ']'
# Eliminate states.
start = 0
accept = remainder
result = ''
if start == remainder:
for i in range(1, modulo):
eliminate(i)
r = transition[start][start]
result = '' if r is None else '({R}*)'.format(R = r)
else:
for i in range(1, modulo):
if i == accept:
continue
eliminate(i)
r = transition[start][start]
s = transition[start][accept]
t = transition[accept][start]
u = transition[accept][accept]
if s is None:
result = ''
elif t is None:
result = '({R}{S}{U})'.format(
R = '' if r is None else r + '*',
S = s,
U = '' if u is None else u + '*')
else:
result = '(({R}{S}{U}{T})*{S}{U})'.format(
R = '' if r is None else r + '|',
S = s,
T = t,
U = '' if u is None else u + '*')
print result