Playing with Python Magic Methods to make a nicer Regex API
A co-worker of mine mentioned that he missed Ruby's syntactic sugar for regular expressions. I haven't used Ruby's regular expressions, but I'm familiar enough with Python's to know that the API is a bit wanting in syntactic sweetness.
First, retrieving capture groups from a regular expression requires several
steps. First you need to call either re.match or re.search
and assign the result to a variable. Then, you need to check whether the
result is not None (None indicates that no match was found). Finally, if a match
exists, you can safely extract the captured groups. Here is an example:
>>> import re
>>> match_obj = re.match('([0-9]+)', '123foo')
>>> match_obj  # What is `match_obj`?
<_sre.SRE_Match object at 0x7fd1bb000828>
>>> match_obj.groups()
('123',)
>>> match_obj = re.match('([0-9]+)', 'abc')
>>> print(match_obj)
None
It would be nicer, in my opinion, to have something like:
>>> re.get_matches('([0-9]+)', '123foo')
('123',)
>>> re.get_matches('([0-9]+)', 'abc')
None
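A wrapper along those lines takes only a few lines. The sketch below is a hypothetical standalone get_matches function (it is not part of the re module), and it assumes re.match semantics, meaning the pattern must match at the start of the string:

```python
import re


def get_matches(pattern, string):
    """Return the captured groups of a match at the start of `string`, or None."""
    match_obj = re.match(pattern, string)
    if match_obj is None:
        # No match: mirror re.match and hand back None.
        return None
    return match_obj.groups()


print(get_matches('([0-9]+)', '123foo'))  # ('123',)
print(get_matches('([0-9]+)', 'abc'))     # None
```

Swapping re.match for re.search inside the helper would make it find a match anywhere in the string instead.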
The other thing I frequently run into is mixing up the parameters for
re.sub, which performs find-and-replace. The required parameters, in order, are
pattern, replacement, and search_string. For whatever reason, it seems more intuitive to me to
have search_string come before replacement.
Unfortunately, mangling these parameters can lead to "correct-looking" results.
Here is an example. The goal here will be to replace the word
foo with the word bar:
>>> re.sub('foo', 'replace foo with bar', 'bar')
'bar'
>>> re.sub('foo', 'bar', 'replace foo with bar')
'replace bar with bar'
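In the first example, we might presume that the input string was just foo, so the output looks like a successful replacement even though the arguments were swapped and nothing was actually substituted.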
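One way to guard against this with the standard library alone is to pass the arguments by keyword. The real parameter names of re.sub are pattern, repl, and string; the rest of this sketch is only an illustration:

```python
import re

# Keyword arguments make the role of each string explicit.
result = re.sub(pattern='foo', repl='bar', string='replace foo with bar')
print(result)  # replace bar with bar

# With names attached, a swapped call reads as obviously wrong:
# the string being searched is just 'bar', so nothing is replaced.
swapped = re.sub(pattern='foo', repl='replace foo with bar', string='bar')
print(swapped)  # bar
```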
For fun, I put together a little helper class that adds some syntactic sweetness to Python's regular expression library. I don't really suggest that anyone should use this, but it was fun to make and maybe it will give you some ideas on how you might improve the syntax of other libraries.
Before I show you the implementation, here are some examples of the API I devised.
Searching for matches is a single-step operation:
>>> def has_lower(s):
...     return bool(R/'[a-z]+'/s)
...
>>> has_lower('This contains lower-case')
True
>>> has_lower('NO LOWER-CASE HERE!')
False
Retrieving capture-groups is also easy:
>>> list(R/'([0-9]+)'/'extract 12 the 456 numbers')
['12', '456']
Finally, you can use the division operator one more time to perform replacements:
>>> R/'(foo|bar)'/'replace foo and bar'/'Huey!'
'replace Huey! and Huey!'
What do you think? More fun?
The implementation is pretty straightforward and relies on Python's magic methods to provide the API. If there's a neat trick, it is the use of metaclasses to implement what is essentially a classmethod operator overload.
import re


class _R(type):
    def __div__(self, regex):
        return R(regex)


class R(object):
    __metaclass__ = _R

    def __init__(self, regex):
        self._regex = re.compile(regex)

    def __div__(self, s):
        return RegexOperation(self._regex, s)


class RegexOperation(object):
    def __init__(self, regex, search):
        self._regex = regex
        self._search = search

    def search(self):
        match = self._regex.search(self._search)
        if match is not None:
            return match.groups()

    def __len__(self):
        return self._regex.search(self._search) is not None

    def __div__(self, replacement):
        return self._regex.sub(replacement, self._search)

    def __iter__(self):
        return iter(self._regex.findall(self._search))
Stepping through the operations one by one should clarify what is going on behind the scenes.
R / <something> will invoke the
__div__ method on the _R metaclass,
which is basically a factory method for creating R instances:
>>> R/'foo'
<rx.R at 0x7f77c00831d0>
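The reason a metaclass is needed is that an operator applied to a class object is looked up on the class's own type, not in the class body. Here is a minimal sketch of that rule, written for Python 3 (where the method is spelled __truediv__ and the metaclass is passed as a keyword) and using the hypothetical names _TagMeta, Tag, and Plain:

```python
class _TagMeta(type):
    # Operators applied to a class are looked up on the class's type,
    # i.e. its metaclass, so this method handles `Tag / 'name'`.
    def __truediv__(cls, name):
        return cls(name)


class Tag(metaclass=_TagMeta):
    def __init__(self, name):
        self.name = name


class Plain:
    # Defined in the class body, so it only applies to *instances* of Plain.
    def __truediv__(self, name):
        return name


tag = Tag / 'foo'
print(type(tag).__name__, tag.name)  # Tag foo

try:
    Plain / 'foo'
except TypeError:
    print('Plain / "foo" raises TypeError')
```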
When we call __div__ on the newly-created
R object, we get a RegexOperation instance, so
R.__div__ is another factory method.
>>> r_obj = R/'foo'
>>> r_obj / 'bar'
<rx.RegexOperation at 0x7f77c00837d0>
The final object,
RegexOperation, implements several magic methods which allow
us to retrieve matches, perform substitutions, and test for the existence of a match.
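The listing in this post is written for Python 2, where the / operator maps to __div__ and the metaclass is set with __metaclass__. For anyone who wants to try it on Python 3, here is a rough port. It is only a sketch: it assumes __truediv__ as the replacement for __div__, and it swaps __len__ for __bool__ for the truth test, which is my own variation rather than part of the original design.

```python
import re


class _R(type):
    # `R / 'pattern'`: the operator is resolved on the metaclass.
    def __truediv__(cls, regex):
        return cls(regex)


class R(metaclass=_R):
    def __init__(self, regex):
        self._regex = re.compile(regex)

    # `R('pattern') / 'text'` builds the operation object.
    def __truediv__(self, s):
        return RegexOperation(self._regex, s)


class RegexOperation:
    def __init__(self, regex, search):
        self._regex = regex
        self._search = search

    def search(self):
        match = self._regex.search(self._search)
        if match is not None:
            return match.groups()

    # Truthiness: True when the pattern matches anywhere in the text.
    def __bool__(self):
        return self._regex.search(self._search) is not None

    # A third `/` performs the substitution.
    def __truediv__(self, replacement):
        return self._regex.sub(replacement, self._search)

    # Iterating yields every match, as returned by findall.
    def __iter__(self):
        return iter(self._regex.findall(self._search))


print(bool(R / '[a-z]+' / 'NO LOWER-CASE HERE!'))          # False
print(list(R / '([0-9]+)' / 'extract 12 the 456 numbers'))  # ['12', '456']
print(R / '(foo|bar)' / 'replace foo and bar' / 'Huey!')    # replace Huey! and Huey!
```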
Thanks for reading
Thanks for taking the time to read this post. I hope you found it interesting! Feel free to leave a comment below.