That’s not to detract from the excellent post, just that this isn’t a mathematical trick that exploits some structure of primes but rather an incredibly clever way to write a computer program.
Also, strictly speaking, it's not a "regular expression" but a "regex", as backreferences make the language more powerful than regular.
https://en.wikipedia.org/wiki/Regular_expression#Patterns_fo...
And regex is short for regular expression, so you've essentially said nothing.
AUTHOR
Larry Wall <[email protected]>
Maintainer: Larry Wall <[email protected]>
Date: 4 Jun 2002
Last Modified: 18 May 2006
Number: 5
Version: 7
This is the Apocalypse on Pattern Matching, generally having to do with what we call "regular expressions", which are only marginally related to real regular expressions. Nevertheless, the term has grown with the capabilities of our pattern matching engines, so I'm not going to try to fight linguistic necessity here. I will, however, generally call them "regexes" (or "regexen", when I'm in an Anglo-Saxon mood).
Extending finite automata to efficiently match Perl-compatible regular expressions - http://dx.doi.org/10.1145/1544012.1544037
And somehow, some people like me are in computer science mode when reading such sentences, such reminders wakes us up: "Oh, ok, not actually regular, not such a big deal"
It would be literally incredible, because the pumping lemma shows that it's false.
Ah yes, I even used to teach this actually. Sorry for the understatement xD.
No worries! Mainly I was just pleased with myself for remembering the pumping lemma, and kind of wished that I knew how to contact my professor (from back in the mid-1990s, when an intro CS course was taught out of Sipser and involved almost no actual programming) and tell him that he did such a good job that it stuck with me some 30 years later!
The answer is, you don't, this regex doesn't describe a regular language
This specification describes an interoperable regular expression (abbreviated as "regexp") flavor, I-Regexp.
I-Regexp does not provide advanced regular expression features such as capture groups, lookahead, or backreferences. It supports only a Boolean matching capability, i.e., testing whether a given regular expression matches a given piece of text.
It wasn't until '97 when PCRE was released to mimic Perl's handling of regex and some time after that that GNU grep added -P as an option (BSD doesn't appear to support PCRE).While PCRE is a defacto standard (I been heard uttering "ugh, that only handles posix regex"), for most of the history of regex they were only as powerful as a NDFA.
Backref support was added to grep between 6th edition and 7th edition unix
6th edition grep manual: http://man.cat-v.org/unix-6th/1/grep
7th edition grep manual: http://man.cat-v.org/unix_7th/1/grep
Both of those refer to ed(1) for the syntax of regular expressions
6th edition ed manual: http://man.cat-v.org/unix-6th/1/ed
7th edition ed manual: http://man.cat-v.org/unix_7th/1/ed
POSIX EREs do not support backrefs. This goes back to the 1970s because egrep used a different regex matching algorithm to grep — egrep compiled the regex to a DFA which could not match backrefs, unlike grep’s nondeterministic algorithm — and egrep also had different syntax.
I think that this is definitely not true. There are lots of things where an "aha!" moment makes things appear conceptually much simpler after you've internalized the framework, but there are plenty of things in what I consider my area of expertise that are still hard even though I know very well how.
It was an important point in the design of Perl 6, now named Raku. Perl already had first class support for a powerful regex variant, in Raku, they went a step further to consider it for what they really are.
> In Raku, regexes are written in a domain-specific language, i.e. a sublanguage or slang. This page describes this language, and explains how regexes can be used to search for text patterns in strings in a process called pattern matching.
> Fundamentally, Raku regexes are very much like subroutines...
And if they're part of a grammar, they're essentially methods on a class. And a class that can be sub-classed. Or have roles mixed in, consisting of regexes.
No, I think the story is that it's an incredible thing to implement a prime test in a regex. It was a pretty neat thing 20+ years ago when I first saw it and I reckon it's still pretty neat.
The "JAPH" thing was a pretty cool thing too.
perl -e '$a = q 94a75737420616e6f74686572205065726c204861636b65720a9 and ${qq$\x5F$} = q 97265646f9 and s g..g; qq e\x63\x68\x72\x20\x30\x78$&eggee; {eval if $a =~ s e..eqq qprint chr 0x$& and \x71\x20\x71\x71qeexcess}'
It is kind of surprising to hear that regex can do that, but once you see the regex its kind of obvious. Its just checking if the number is 1, or if the number can be represented by 2 or more 1's repeated at least 2 times. Which is literally the definition of a prime (is the number divisible by a number >= 2)
/^(..+?)(?=\1+$)/
Edit: of course, silly me, +? is non-greedy.The regex is not totally trivial, but it's not super sophisticated either: conceptually 'just' a Sieve of Eratosthenes.
(?=(x(x*)).*(?=\1*$)\2+$)(?=(x\1)+(x?(x*)))(?=\4(x(x*?))\1+$)(?=.*(?=(?=\4*$)\4\5+$)(x*?)(?=\3*$)(x?(x*?))(\1+$|$\9))(?=.*(?=(?=\4*$)(?=\6*$)(?=\4\7+$)\6\5+$|$\4)(x*?)(?=\3*$)(x?(x*?))(\1+$|$\13))(?=.*(?=\12\12\9$)(x*?)(?=\3*$)(x?(x*?))(\1+$|$\17))(?*.*?(?=((?=\3*(x?(x*)))\21(x(x*?))\1+$)))(?=.*(?=\23*$)(\23\24+$))(?=.*(?=(?=\21*$)\21\22+$)(x*?)(?=\3*$)(x?(x*?))(\1+$|$\27))(?=.*(?=(?=\21*$)(?=\23*$)(?=\21\24+$)\23\22+$|$\21)(x*?)(?=\3*$)(x?(x*?))(\1+$|$\31))(?=.*(?=\30\30\27$)(x*?)(?=\3*$)(x?(x*?))(\1+$|$\35))(?=.*(?=\26\26)(?=\3*(x*))(\1(x)|))(?=.*(?=\34\34\40)(?=\3*(x*))(\1(x)|))(?=(?=(.*)\13\13\17(?=\6*$)\6\7+$)\44(x+|(?=.*(?!\16)\41|(?!.*(?!\38)\8).*(?=\16$)\41$))(\25\31\31\35){2}\43$)\20|xx?\B|
Source: https://codegolf.stackexchange.com/a/198428> In CS theory, regular languages are a strict subset of context-free languages, but regular expression implementations in mainstream programming languages are more powerful. As noulakaz.net/weblog/2007/03/18/… describes, so-called "regular expressions" can check for prime numbers in unary, which is certainly something that a regular expression from CS theory can't accomplish. – Adam Mihalcin Commented Mar 19, 2012 at 23:50
I saw that by Abigail on comp.lang.perl.misc many moons ago. Here is an article about it: http://test.neilk.net/blog/2000/06/01/abigails-regex-to-test...
As far as I know, she was the genesis of this whole thing.