Regular expressions are extremely powerful. This is something I read at least once or twice every day while reading articles and blogs on the Web.
While browsing today, I found this page which thoroughly describes the use of the regular expression /^1?$|^(11+?)\1+$/ in Perl to check if a number is prime or not!!!
To be frank, I was skeptical. The regular expression looks like magic! And I wanted to understand it better. I rewrote it in Ruby using irb and I tested it:
Great! It also works in Ruby! This means that there is no (Perl) magic going on. The regular expression really works. But how? Let’s try to follow the logic behind it.
irb(main):001:0> def is_prime(n)
irb(main):002:1> ("1" * n) !~ /^1?$|^(11+?)\1+$/
To know this, the function first generates “1111111″ (from “1″ * 7) and tries to see if that string does not match /^1?$|^(11+?)\1+$/. If there is no match, then the number is prime.
Notice that the regular expression has two parts (separated with the vertical bar |).
The first part is /^1?$/ is trivial and matches with beginning of line (^), an optional 1 (1?)
and end of line ($) which implies that it matches either the empty string or “1″. This simply indicates that calling that function with n==0 or n==1 will correctly return false (as the “1″ * n will match with the first part of the regular expression)
The second part is where the magic occurs…
/^(11+?)\1+$/ matches with beginning of line (^) then by (11+?) then by \1+ and finally by end of line ($). I guess that you know that \1 is a variable which is bound to whatever was matched previously (in our case by (11+?)).
Let’s proceed slowly…
(11+?) does two things
- It matches with “1″ followed by one or more ones minimally. This means that it matches with “11″ initially (notice that if there was no ? (i.e. (11+) was used instead, the whole string would have matched)
- The string obtained (“11″ initially) is bound to the variable \1.
If you are following, you’ll realise that this eliminates all even numbers except 2 (for example, 8 is “11111111″ and therefore (11+?) will match with “11″ and \1+ will match with “111111″).
As for odd numbers (in our case 7), the (11+?) matches with “11″ initially but \1+$ cannot be true (notice the $) as there are 5 remaining ones. The regular expression engine will backtrack and will make (11+?) match “111″ and here also \1+$ won’t be true because there will be 4 remaining ones (and \1+$ will only match with a number of ones which is a multiple of 3 followed by end of line) etc. hence “1111111″ will not match the regular expression which implies that 7 will be considered as being prime
When I showed this to Christina this morning (true), she told me that this only checked for a number being odd or not. This is also what I felt at the beginning. But it really works. For instance, let’s try to apply it to 9 (which is obviously not even), “1″ * 9 should match the regular expression…
“1″ * 9 = “111111111″. (11+?) matches “11″ initially. \1+$ cannot match because there are 7 remaining ones. Backtracking occurs. (11+?) now matches “111″. And here \1+$ matches the remaining 6 remaining ones! Hence, 9 is not prime.
Easy… and beautiful at the same time