Richard Eigenmann, 10 Nov 2015
Revised 21 Nov 2015, 27 Mar, 31 Mar, 13 Apr 2016
Regular Expressions don't work in all compilers!
They worked fine in Microsoft Visual Studio but failed in gcc.
Before gcc 4.9.0, regular expressions were only "partially implemented" in gcc. This was fixed in gcc 4.9.0. See Bug 53631
See also Stack Overflow: Options for using C++11 <regex> with a circa 2013 compiler
13-digit ISBN (Wikipedia)
Example: 978-0-321-99278-9
Pattern: digits- digits- digits- digits- digit
\d+-\d+-\d+-\d+-\d
Example: +41 44 2429788
Pattern: + 2 digits 1 space 2 digits 1 space 7 digits
\+\d{2}\s\d{2}\s\d{7}
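The phone number pattern above can be tried out with a small helper. This is a minimal sketch, assuming a compiler with a complete std <regex> (for example gcc 4.9.0 or later); the function name looksLikePhoneNumber is made up for illustration. The pattern is written as a raw string literal R"(...)" so the backslashes do not need doubling.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `text` has the shape "+41 44 2429788",
// i.e. a plus, 2 digits, one whitespace, 2 digits, one whitespace, 7 digits.
bool looksLikePhoneNumber(const std::string& text) {
    // regex_match only succeeds if the whole string matches,
    // so no ^ or $ anchors are needed here.
    static const std::regex phonePattern{ R"(\+\d{2}\s\d{2}\s\d{7})" };
    return std::regex_match(text, phonePattern);
}
```

looksLikePhoneNumber("+41 44 2429788") should return true, while "41 44 2429788" (no plus) and "+41-44-2429788" (dashes instead of spaces) should return false. Note that \s also accepts a tab, not only a space.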
Example: 192.168.0.1
Pattern: 1-3 digits . 1-3 digits . 1-3 digits . 1-3 digits
\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
\d | digit |
\D | everything not a digit |
\s | whitespace |
\S | any char but a whitespace |
\w | word character: a-z, A-Z, 0-9, and the _ (underscore) |
\W | everything but a \w |
[abc] | a single char a, b, or c |
[^abc] | any single char except a, b, or c |
[a-z] | a single lowercase char a to z |
[^a-z] | any single char that is not a lowercase a to z |
. | any single char |
\. | a period (note the escaping!) |
\\ | a backslash |
\+ | a plus |
- | a dash (no escaping needed outside [ ]) |
* | Zero or more of the preceding element |
+ | One or more of the preceding element |
? | Zero or one of the preceding element |
{n} | Exactly n of the preceding element |
{n,} | n or more of the preceding element |
{n,m} | Between n and m of the preceding element |
^ | Beginning of line |
$ | End of line |
\b | Word boundary ( \bdone\b doesn't match abandoned) |
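A few rows of the tables above can be checked with a tiny search helper. This is only a sketch under the assumption that std <regex> is available (gcc 4.9.0 or later); the helper name found is hypothetical. With Boost the same calls exist in namespace boost.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` is found anywhere inside `text`.
// regex_search looks for a match in any substring, unlike regex_match,
// which requires the whole string to match.
bool found(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex{ pattern });
}
```

For example, found("all done now", R"(\bdone\b)") should be true, while found("abandoned", R"(\bdone\b)") should be false because the letters before and after "done" are word characters. Likewise found("abc123", R"(\d{3})") should be true and found("abc12", R"(\d{3})") false.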
#include <boost/regex.hpp>
#include <iostream>
#include <string>
using namespace boost;
using namespace std;

int main() {
    // regex_match succeeds only if the whole string matches the pattern
    regex ISBNPattern{ R"(^\d+-\d+-\d+-\d+-\d$)" };
    string isbn1 = "978-0-321-99278-9";  // five groups: matches
    string isbn2 = "978-0-321-99278";    // only four groups: no match
    cout << isbn1 << " regex_match "
         << regex_match( isbn1, ISBNPattern ) << '\n';
    cout << isbn2 << " regex_match "
         << regex_match( isbn2, ISBNPattern ) << '\n';
    return 0;
}
978-0-321-99278-9 regex_match 1
978-0-321-99278 regex_match 0
Boost.Regex is used here so that the example works under gcc < 4.9.0. With a newer compiler, use #include <regex> and namespace std instead.
regex ISBNPattern{ R"(^\d+-\d+-\d+-\d+-\d$)" };
R"(...)" is a raw string literal.
Bjarne writes: To get a double quote into a string literal we have to precede it with a backslash. This can quickly become unmanageable. In fact, in real use this “special character problem” gets so annoying that C++ and other languages have introduced the notion of raw string literals to be able to cope with realistic regular expression patterns. In a raw string literal a backslash is simply a backslash character (rather than an escape character) and a double quote is simply a double quote character (rather than an end of string).
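To make the difference concrete, here is the IPv4 pattern from the earlier slide written both ways. This is a small illustration, not part of the original examples; the names escapedPattern, rawPattern, quoted, rawQuoted and sameText are made up.

```cpp
#include <string>

// Ordinary string literal: every backslash has to be doubled.
const std::string escapedPattern = "\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}";

// Raw string literal: backslashes are written exactly once.
const std::string rawPattern = R"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})";

// Double quotes work the same way: escaped in an ordinary literal,
// written as-is in a raw one.
const std::string quoted    = "say \"hi\"";
const std::string rawQuoted = R"(say "hi")";

// Hypothetical helper: both spellings produce the same characters.
bool sameText() {
    return escapedPattern == rawPattern && quoted == rawQuoted;
}
```

sameText() returns true: the two literals in each pair hold identical characters, and only the source spelling differs.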
#include <boost/regex.hpp>
#include <iostream>
#include <string>
using namespace boost;
using namespace std;

int main() {
    // Four groups of 1-3 digits separated by literal dots
    regex IPv4Pattern { R"(^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$)" };
    string ip1 = "192.168.0.1";   // valid shape
    string ip2 = "7000.168.0.1";  // first group has 4 digits
    string ip3 = "192.168.0";     // only three groups
    cout << ip1 << " regex_match "
         << regex_match( ip1, IPv4Pattern ) << '\n';
    cout << ip2 << " regex_match "
         << regex_match( ip2, IPv4Pattern ) << '\n';
    cout << ip3 << " regex_match "
         << regex_match( ip3, IPv4Pattern ) << '\n';
}
192.168.0.1 regex_match 1
7000.168.0.1 regex_match 0
192.168.0 regex_match 0
What if we want the 4 numbers?
#include <boost/regex.hpp>
#include <iostream>
#include <string>
using namespace boost;
using namespace std;

int main() {
    // Parentheses mark the four groups we want to extract
    regex IPv4ExtractPattern
        { R"(^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$)" };
    smatch matches;
    string ip1 = "192.168.0.1";
    if ( regex_search( ip1, matches, IPv4ExtractPattern ) ) {
        // matches[0] is the whole match, matches[1..4] are the groups
        cout << matches.size() << " matches\n";
        for (size_t i = 0; i < matches.size(); ++i)
            cout << "matches[" << i << "] = " << matches[i] << '\n';
    }
    return 0;
}
5 matches
matches[0] = 192.168.0.1
matches[1] = 192
matches[2] = 168
matches[3] = 0
matches[4] = 1
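Once the four groups are extracted they can be used as numbers. The pattern only checks the shape, so something like 999.168.0.1 would still match. The sketch below adds a range check on top of the extraction; it assumes std <regex> (gcc 4.9.0 or later), and the function name isValidIPv4 is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `text` is four dot-separated numbers,
// each in the range 0..255.
bool isValidIPv4(const std::string& text) {
    static const std::regex pattern{
        R"(^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$)" };
    std::smatch matches;
    if (!std::regex_match(text, matches, pattern))
        return false;  // wrong shape, e.g. "192.168.0"
    // matches[0] is the whole match; the groups are matches[1]..matches[4].
    for (std::size_t i = 1; i < matches.size(); ++i) {
        // At most 3 digits per group, so std::stoi cannot overflow here.
        if (std::stoi(matches[i].str()) > 255)
            return false;  // right shape, but the number is too large
    }
    return true;
}
```

isValidIPv4("192.168.0.1") should return true, while "999.168.0.1" and "192.168.0" should both return false, the first because of the range check and the second because the pattern does not match.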
Given: | <TD>Cute Kittens</TD><TD>Funny Cats</TD> |
Pattern: | <TD>(.*)<\/TD> |
Match: | <TD>Cute Kittens</TD><TD>Funny Cats</TD> (group 1: Cute Kittens</TD><TD>Funny Cats) |
Pattern: | <TD>(.*?)<\/TD> |
Match: | <TD>Cute Kittens</TD> (group 1: Cute Kittens) |
A ? after a Repetition requests non-greedy repetition
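The difference between the two patterns can be reproduced in code. This is a minimal sketch assuming std <regex> with its default ECMAScript grammar (gcc 4.9.0 or later); the helper name firstCapture is made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text captured by the first group of
// `pattern` in `text`, or an empty string if the pattern is not found.
std::string firstCapture(const std::string& text, const std::string& pattern) {
    const std::regex re{ pattern };
    std::smatch matches;
    if (std::regex_search(text, matches, re))
        return matches[1].str();
    return "";
}
```

With the input <TD>Cute Kittens</TD><TD>Funny Cats</TD>, the greedy pattern R"(<TD>(.*)<\/TD>)" should capture Cute Kittens</TD><TD>Funny Cats, because .* runs on to the last </TD>. The non-greedy pattern R"(<TD>(.*?)<\/TD>)" should capture only Cute Kittens, because .*? stops at the first </TD>.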
Source: xkcd.com