Pages

Roman Numbers

Write a function that checks if a given string represents a Roman number.

We can keep simple the solution to this problem using Regular Expressions. Using the python re library we just have to wrap a call to its search function, passing to it the input string and an appropriate pattern. If it does not find it, it returns None.
re.search(pattern, data)
The problem is identify the right pattern. An hint, some unit test would help. Another hint, you could have a look at chapter 5 section 3 of Dive into Python 3 that uses this example as introduction to pattern matching in Python.

Assuming that the candidate Roman number should start at the beginning of the passed string, and end with it, here is a solution:
pattern = '^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$'
First character, a caret ('^'), acts as an anchor. The matching starts at the beginning of the passed string.
Then we have a M, saying that we expect this character in the passed string, just at its beginning (as stated by the caret before). How many of them? It is specified by the following curly brackets. Zero or more, up to three.
The following stuff is in round brackets. We have three of them. All of them are optional. And then we have a dollar sign, saying that the input string should end there. Nothing not specified in the pattern should follow, otherwise there would be no recognition.

The three optional blocks are very similar. Let's see the first one.
(CM|CD|D?C{0,3})
It says that we are expecting one of three possible alternatives. The first two are just two-character string, 'CM' or 'CD'. The third one is more interesting.

It says we are expecting a 'D'. But, wait, it is followed by a question mark, meaning that it is optional. We can have 0 or 1 of them. We could get the same result writing
D{0,1}  # same as D?
Usually the more compact 'D?' is preferred.

And finally we have zero or more, up to three, 'C'.

I pushed the python unit test I have written when playing with this regular expression on GitHub.

No comments:

Post a Comment