Friday, July 23, 2010

A regular expression to check for prime numbers

Regular expressions are extremely powerful. This is something I read at least once or twice every day while reading articles and blogs on the Web.
While browsing today, I found this page which thoroughly describes the use of the regular expression /^1?$|^(11+?)\1+$/ in Perl to check if a number is prime or not!!!
To be frank, I was skeptical. The regular expression looks like magic! And I wanted to understand it better. I rewrote it in Ruby using irb and I tested it:
Avinash@noulakaz:~$ irb
irb(main):001:0> def is_prime(n)
irb(main):002:1> ("1" * n) !~ /^1?$|^(11+?)\1+$/
irb(main):003:1> end
=> nil
irb(main):004:0> is_prime(10)
=> false
irb(main):005:0> is_prime(11)
=> true
irb(main):006:0> is_prime(12)
=> false
irb(main):007:0> is_prime(13)
=> true
irb(main):008:0> is_prime(99)
=> false
irb(main):009:0> is_prime(100)
=> false
irb(main):010:0> is_prime(101)
=> true
Great! It also works in Ruby! This means that there is no (Perl) magic going on. The regular expression really works. But how? Let’s try to follow the logic behind it.
Is 7 prime?
To know this, the function first generates “1111111″ (from “1″ * 7) and tries to see if that string does not match /^1?$|^(11+?)\1+$/. If there is no match, then the number is prime.
Notice that the regular expression has two parts (separated with the vertical bar |).
The first part is /^1?$/ is trivial and matches with beginning of line (^), an optional 1 (1?)
and end of line ($) which implies that it matches either the empty string or “1″. This simply indicates that calling that function with n==0 or n==1 will correctly return false (as the “1″ * n will match with the first part of the regular expression)
The second part is where the magic occurs…
/^(11+?)\1+$/ matches with beginning of line (^) then by (11+?) then by \1+ and finally by end of line ($). I guess that you know that \1 is a variable which is bound to whatever was matched previously (in our case by (11+?)).
Let’s proceed slowly…
(11+?) does two things
  1. It matches with “1″ followed by one or more ones minimally. This means that it matches with “11″ initially (notice that if there was no ? (i.e. (11+) was used instead, the whole string would have matched)
  2. The string obtained (“11″ initially) is bound to the variable \1.
\1+ then matches with whatever has been matched above (“11″ initially) repeated one or more times. If this match succeeds then the number is not prime.
If you are following, you’ll realise that this eliminates all even numbers except 2 (for example, 8 is “11111111″ and therefore (11+?) will match with “11″ and \1+ will match with “111111″).
As for odd numbers (in our case 7), the (11+?) matches with “11″ initially but \1+$ cannot be true (notice the $) as there are 5 remaining ones. The regular expression engine will backtrack and will make (11+?) match “111″ and here also \1+$ won’t be true because there will be 4 remaining ones (and \1+$ will only match with a number of ones which is a multiple of 3 followed by end of line) etc. hence “1111111″ will not match the regular expression which implies that 7 will be considered as being prime :-)
When I showed this to Christina this morning (true), she told me that this only checked for a number being odd or not. This is also what I felt at the beginning. But it really works. For instance, let’s try to apply it to 9 (which is obviously not even), “1″ * 9 should match the regular expression…
“1″ * 9 = “111111111″. (11+?) matches “11″ initially. \1+$ cannot match because there are 7 remaining ones. Backtracking occurs. (11+?) now matches “111″. And here \1+$ matches the remaining 6 remaining ones! Hence, 9 is not prime.
Easy… and beautiful at the same time ;-)

No comments: