The Original Web-Wise-Wizard
Web Authors, Web Developers and Webmasters Internet Toolbox

Perl Regular Expressions


Programming Power ...

Regular expressions are rapidly gaining in popularity as web authors discover the programming power they provide. Historically, regular expressions have been associated with the UNIX platform and with full programming languages like Perl (Practical Extraction and Report Language), but web developers are discovering that they are also supported in other web languages like PHP and JavaScript. The syntax surrounding their usage is peculiar to each language, but the regular expression syntax itself is largely the same across the supporting languages, with only minor differences between them. This makes the Perl regular expression examples on this page relevant to many web programming languages, so we have provided links to related pages that you might find useful to read in conjunction with this page.


Page Organization ...

This is quite a long page, split into the sections listed here. Perl Regular Expressions were previously included on the Web-Wise-Wizard Perl Scripting Examples page, but because of the wider interest in regular expressions we decided to separate them from the Perl scripting examples and develop the two pages independently.

  1. Perl Regular Expressions - String Manipulation
  2. Perl Regular Expressions - Understanding Page Parsing
  3. Perl Regular Expressions - Page Parsing Examples
  4. Perl Regular Expressions - Regular Expression Operators
  5. Perl Regular Expressions - Link Directly To This Page
  6. Perl Regular Expressions - Featured Links

String Manipulation ...

This first section provides examples of how you can operate on and manipulate strings using Perl Regexp. In the context of these regular expression examples the term 'white space' means any character matched by \s, that is, characters that take up space but cannot be seen. Examples include blank spaces, tab characters, carriage returns, line feeds and form feeds.

Strip White Space From Start

$string =~ s/^\s+//;

Strip White Space From End

$string =~ s/\s+$//;

Strip White Space Both Ends

$string =~ s/^\s*(.*?)\s*$/$1/;
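As a hedged sketch, the three trims above can be wrapped in small subroutines and tried on a sample string. The names trim_start, trim_end and trim_both are hypothetical, chosen only for this illustration. Note that trim_both here uses the alternation form s/^\s+|\s+$//g rather than the capturing form shown above, so it also copes with strings that still contain line ends in the middle.

```perl
use strict;
use warnings;

# Hypothetical helper names, used only for this illustration.
sub trim_start { my ($s) = @_; $s =~ s/^\s+//;        return $s; }
sub trim_end   { my ($s) = @_; $s =~ s/\s+$//;        return $s; }
sub trim_both  { my ($s) = @_; $s =~ s/^\s+|\s+$//g;  return $s; }

# Sample string with a leading tab and spaces, and a trailing CR/LF pair.
my $sample = "\t  Web Wise Wizard \r\n";

print '[', trim_both($sample), "]\n";   # [Web Wise Wizard]
```

Each subroutine works on a copy of its argument, so the caller's string is left untouched.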

Convert To Lower Case

$string = 'This IS a Wizard TEST string';
$string =~ tr/A-Z/a-z/;
# RESULT >>> this is a wizard test string

Convert To Upper Case

$string = 'This IS a Wizard TEST string';
$string =~ tr/a-z/A-Z/;

Capitalize Words

$string = 'This IS a Wizard TEST string';
$string =~ s/(\w+)/\u\L$1/g;
# RESULT >>> This Is A Wizard Test String
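The tr/// conversions above only change the 26 unaccented letters A-Z and a-z. As an alternative sketch, Perl's built-in lc, uc and ucfirst functions do the same jobs without a character list. The variable names $lower, $upper and $capitalized are assumptions made for this example.

```perl
use strict;
use warnings;

my $string = 'This IS a Wizard TEST string';

my $lower = lc $string;    # this is a wizard test string
my $upper = uc $string;    # THIS IS A WIZARD TEST STRING

# Capitalize each word: lower-case the word, then upper-case its first letter.
# The /e modifier evaluates the replacement as Perl code.
(my $capitalized = $string) =~ s/(\w+)/ucfirst(lc($1))/ge;

print "$lower\n$upper\n$capitalized\n";
```

The last line of output is the same as the \u\L version above: This Is A Wizard Test String.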

Strip Trailing Forward Slash

$string = '';
$string =~ s/\/$//;

Back Slashes To Forward Slashes

# the final back slash is doubled, otherwise \' would escape the closing quote
$string = 'c:\htdocs\mysite\public_html\\';
$string =~ s/\\/\//g;
# RESULT >>> c:/htdocs/mysite/public_html/

Strip Protocol From URL

$string = '';
$string =~ s/^http\:\/\/|^https\:\/\/|^ftp\:\/\///;

Delete String From Forward Slash On

$string = '';
$string =~ s/\/(.*)//;
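To see the three URL expressions above working together, here is a minimal sketch that applies them in turn. The URL is a hypothetical one made up for this illustration, and the protocol pattern is written with s{...}{} delimiters so the forward slashes need no escaping; it matches the same three protocols as the expression above.

```perl
use strict;
use warnings;

# Hypothetical URL, used only for illustration.
my $url = 'http://www.example.com/perl/regexp/';

# Strip the trailing forward slash.
(my $noSlash = $url) =~ s/\/$//;
# http://www.example.com/perl/regexp

# Strip the protocol (http, https or ftp).
(my $noProtocol = $noSlash) =~ s{^(?:https?|ftp)://}{};
# www.example.com/perl/regexp

# Delete everything from the first forward slash on, leaving the host name.
(my $hostOnly = $noProtocol) =~ s/\/.*//;
# www.example.com

print "$noSlash\n$noProtocol\n$hostOnly\n";
```

The order matters: if the protocol were still present, the last expression would cut the string at the first slash of "://".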

URI Encode A Search String

# SEARCH STRING >>> +"perl regular expression" +examples
$string = "+\"perl regular expression\" +examples";
$string =~ s/[\x00-\x1F\x21-\x2F\x3A-\x40\x5B-\x60\x7B-\xFF]/&hexCodeFromChar($&)/eg;
# RESULT >>> %2B%22perl regular expression%22 %2Bexamples
$string =~ s/ /+/g;
# RESULT >>> %2B%22perl+regular+expression%22+%2Bexamples
sub hexCodeFromChar {
  my ($char) = @_;
  my $asc = ord($char);
  return(sprintf("%c%02X", 37, $asc));   # 37 is the character code for '%'
}

Decode A URI Encoded String

$string = "%2B%22perl+regular+expression%22+%2Bexamples";
$string =~ tr/+/ /;
# RESULT >>> %2B%22perl regular expression%22 %2Bexamples
$string =~ s/%([0-9A-Fa-f]{2})/pack("C",hex($1))/eg;   # only decode % followed by two hex digits
# RESULT >>> +"perl regular expression" +examples
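As a hedged sketch, the encode and decode steps can be packaged as a pair of subroutines and checked as a round trip. The names encodeSearchString and decodeSearchString are hypothetical, and the sketch assumes a string of single-byte characters, as the examples above do. For production work a CPAN module such as URI::Escape is the more usual choice.

```perl
use strict;
use warnings;

# Hypothetical helper: encode everything except letters, digits and spaces,
# then turn the spaces into plus signs.
sub encodeSearchString {
  my ($text) = @_;
  $text =~ s/([^A-Za-z0-9 ])/sprintf("%%%02X", ord($1))/eg;
  $text =~ tr/ /+/;
  return $text;
}

# Hypothetical helper: reverse the two steps in the opposite order.
sub decodeSearchString {
  my ($text) = @_;
  $text =~ tr/+/ /;
  $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
  return $text;
}

my $search  = '+"perl regular expression" +examples';
my $encoded = encodeSearchString($search);
my $decoded = decodeSearchString($encoded);

print "$encoded\n";   # %2B%22perl+regular+expression%22+%2Bexamples
print "$decoded\n";   # +"perl regular expression" +examples
```

A literal plus sign in the search string is encoded as %2B before the spaces become plus signs, which is why the round trip returns the original string.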


Understanding Page Parsing ...

The first problem you have to address if you want to parse pages effectively is how to load pages in a format that is useful for page parsing. A web page is a text file made up of many lines of text Markup, and each line is terminated by a 'line end'. The problem is that line ends vary, depending on the type of system the page was created on. For example, Windows/DOS based systems use a carriage return/line feed combination, older Apple Macs (before Mac OS X) use a single carriage return, and UNIX/Linux based systems, including current Macs, use a single line feed.

The second problem is that HTML Tags are not necessarily contained on one line. You only have to look at the source Markup of web pages created by many of the HTML 'generator' programs to see Tags broken off and then continued on a new line or on several new lines.

If you want to parse a web page the only effective way to deal with all these problems is to eliminate line ends altogether and load the page as one long string without any line ends. There are various ways to accomplish this but these are beyond the scope of this regular expressions tutorial.
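Although a full treatment is outside this tutorial, one possible approach can be sketched in a few lines. The subroutine names flattenLineEnds and loadPageAsString are hypothetical. The sketch replaces each line end with a single space rather than deleting it, which is an assumption on my part: deleting it outright would join a Tag name to an attribute that had been continued on the next line.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every line end (CR/LF, CR or LF) with one space.
sub flattenLineEnds {
  my ($text) = @_;
  $text =~ s/\r\n?|\n/ /g;
  return $text;
}

# Hypothetical helper: read a whole file into one string, then flatten it.
sub loadPageAsString {
  my ($path) = @_;
  open(my $fh, '<', $path) or die "Cannot open $path: $!";
  binmode($fh);        # read the bytes as they are, without line end translation
  local $/;            # undefined record separator: read the whole file at once
  my $contents = <$fh>;
  close($fh);
  return flattenLineEnds($contents);
}

# A sample containing all three kinds of line end.
my $sample = "<title>Split\r\nTitle</title>\r<p\nclass=\"x\">";
print flattenLineEnds($sample), "\n";
# <title>Split Title</title> <p class="x">
```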

Page Parsing Examples ...

The following examples all presume that the page being parsed is stored as one long string and that all line ends have been eliminated from the string.

HTML Title Tag

<title>Parsing The HTML Title Tag</title>
my $pageTitle = '';
if ($pageContents =~ m/<title>(.*?)<\/title>/i ) {
  $pageTitle = $1;
}

Meta Description Tag - Simple

<meta name="description" content="Example showing how to parse a META Description Tag">
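my $metaDescription = '';
# (.*?) is non-greedy so the match stops at the first closing quote, not the last one on the page
if ($pageContents =~ m/<meta name=\"description\" content=\"(.*?)\">/i ) {
  $metaDescription = $1;
}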

Meta Description Tag - Complex

<meta name="description" content="Example one showing how to parse META Description Tags">
<meta name='description' content='Example two showing how to parse META Description Tags'>
<meta name="description" content="Example three showing how to parse META Description Tags" />
<meta name='description' content='Example four showing how to parse META Description Tags' />
my $metaDescription = '';
# \1 matches the same kind of quote that opened the content; \s*\/?> accepts both > and />
if ($pageContents =~ m/<meta name=["']description["'] content=(["'])(.*?)\1\s*\/?>/i ) {
  $metaDescription = $2;
}

Meta Keywords Tag - Simple

<meta name="keywords" content="regular, expression, parse, html, xhtml">
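my $metaKeywords = '';
# (.*?) is non-greedy so the match stops at the first closing quote, not the last one on the page
if ($pageContents =~ m/<meta name=\"keywords\" content=\"(.*?)\">/i ) {
  $metaKeywords = $1;
}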

Meta Keywords Tag - Complex

<meta name="keywords" content="perl, php, java, script, javascript, regexp">
<meta name='keywords' content='page, parse, parsing, operate, strings'>
<meta name="keywords" content="anchors, comments, images, include, learn" />
<meta name='keywords' content='strip, unix, easy, harness, techniques' />
my $metaKeywords = '';
# \1 matches the same kind of quote that opened the content; \s*\/?> accepts both > and />
if ($pageContents =~ m/<meta name=["']keywords["'] content=(["'])(.*?)\1\s*\/?>/i ) {
  $metaKeywords = $2;
}
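The description and keywords examples differ only in the name being searched for, so as a hedged sketch they can be folded into one subroutine. The name metaContent is hypothetical, and the sketch assumes that the name attribute comes before the content attribute, as it does in all the examples above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the content of the named META Tag, or '' if absent.
sub metaContent {
  my ($page, $name) = @_;
  # \Q...\E quotes any special characters in $name.
  # \1 and \2 match whichever kind of quote opened each attribute value.
  if ($page =~ m/<meta\s+name=(["'])\Q$name\E\1\s+content=(["'])(.*?)\2\s*\/?>/i) {
    return $3;
  }
  return '';
}

my $pageContents = q{<meta name="description" content="Don't panic">}
                 . q{<meta name='keywords' content='strip, unix, easy' />};

print metaContent($pageContents, 'description'), "\n";   # Don't panic
print metaContent($pageContents, 'keywords'), "\n";      # strip, unix, easy
```

Because the closing quote must match the opening one, an apostrophe inside a double-quoted description does not end the match early.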

Strip HTML Tags

a simple minded approach ...

$htmlPage =~ s/<.*?>//g;

a more successful approach ...

$htmlPage =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;
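The difference between the two approaches shows up when an attribute value contains a greater-than sign. The following sketch runs both expressions on the same made-up fragment; the variable names $simple and $robust are assumptions for this example.

```perl
use strict;
use warnings;

# Made-up fragment: the ALT attribute contains a '>' character.
my $html = '<p>See <img alt="a > b" src="x.png">the picture</p>';

# Simple approach: stops at the first '>' it finds, even inside quotes.
(my $simple = $html) =~ s/<.*?>//g;

# Second approach: treats quoted attribute values as a unit.
(my $robust = $html) =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gs;

print "simple: $simple\n";   # part of the IMG Tag is left behind
print "robust: $robust\n";   # robust: See the picture
```

Neither expression is a full HTML parser. Comments and script blocks, for example, can still confuse both.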


Regular Expression Operators ...

There are three types of operator in Perl regular expressions: the MATCHING, SUBSTITUTION and TRANSLATION operators. Strictly speaking, the TRANSLATION operator works on lists of characters rather than on patterns, but it is applied to a string in the same way as the other two. By default, all three operators work on the Perl system variable $_, but many newcomers to regular expressions find this confusing, so all examples on this page apply regular expressions to descriptively named scalar variables. In Perl, you apply regular expressions to scalar variables using the Perl 'binding operators' =~ (matches pattern) or !~ (does not match pattern).

example using binding operator =~

my $message = 'This is an URGENT message';
if ($message =~ m/urgent/i) {
  # the pattern was found: handle the urgent message here
}

example using binding operator !~

my $message = 'This is a NORMAL message';
if ($message !~ m/urgent/i) {
  # the pattern was not found: handle the normal message here
}
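For comparison, here is a minimal sketch of the default behaviour mentioned above, where the matching operator is used with no binding operator and therefore tests $_. The array names @messages and @urgent are assumptions for this example.

```perl
use strict;
use warnings;

my @messages = ('This is an URGENT message', 'This is a NORMAL message');
my @urgent;

for (@messages) {                     # each element in turn is aliased to $_
  push @urgent, $_ if m/urgent/i;     # same as writing: $_ =~ m/urgent/i
}

print scalar(@urgent), " urgent message(s) found\n";   # 1 urgent message(s) found
```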

Matching Operator m// or //

The MATCHING operator is the most widely used of the Perl regular expression operators. The following examples show it being used as a boolean true/false test of a condition and to extract multiple instances of data from a web page. In Perl, you can omit the 'm' at the start of the expression although this is not a practice I would recommend if you are new to regular expressions.

a simple boolean test ...

my $string = 'This is an URGENT message';
if ($string =~ m/urgent/i) {
  # the test is true: the string contains 'urgent' in any mix of case
}

extract text from all ALT Attributes on page and place in an array ...

my @altAttributeArray = ();
# [^"]* stops at the closing quote; a greedy .* would run on to the last quote on the page
while ($pageContents =~ m/alt=\"([^\"]*)\"/ig) {
  push @altAttributeArray, $1;
}
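The same extraction can also be written without a loop. As a hedged sketch, the matching operator with the g modifier returns every captured value at once when it is used in list context. The sample page and the array name @altText are made up for this example.

```perl
use strict;
use warnings;

# Made-up page fragment, stored as one long string.
my $pageContents = '<img src="a.gif" alt="First image"><img src="b.gif" ALT="Second image">';

# In list context, m//g returns all the captured values in one step.
my @altText = $pageContents =~ m/alt="([^"]*)"/ig;

print join(' | ', @altText), "\n";   # First image | Second image
```

The while loop is still the better fit when you need to do extra work on each match as it is found.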

Substitution Operator s///

The SUBSTITUTION operator finds text that matches a pattern and replaces it with other text. It can work on words, numbers and phrases.
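A few typical uses are sketched below on a made-up sentence. The variable names are assumptions for this example. The g modifier replaces every match instead of just the first, and the e modifier evaluates the replacement as Perl code.

```perl
use strict;
use warnings;

my $string = 'Order 12 copies of the red book and the red pen';

# Replace the first match only.
(my $firstOnly = $string) =~ s/red/blue/;
# Order 12 copies of the blue book and the red pen

# Replace every match.
(my $everyOne = $string) =~ s/red/blue/g;
# Order 12 copies of the blue book and the blue pen

# Compute the replacement: double the number.
(my $doubled = $string) =~ s/(\d+)/$1 * 2/e;
# Order 24 copies of the red book and the red pen

# The operator returns the number of substitutions it made.
my $replacements = ($string =~ s/\bthe\b/a/g);

print "$firstOnly\n$everyOne\n$doubled\n$replacements replacements\n";
```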

Translation Operator tr///

The TRANSLATION operator works on characters only and makes character-by-character translations.

count the number of asterisks in a string ...

my $count = ($string =~ tr/*/*/);

convert lower case characters to upper case ...

$string =~ tr/a-z/A-Z/;
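The translation operator also has modifiers of its own. The following sketch, on a made-up string with assumed variable names, shows counting with an empty replacement list, deleting with the d modifier and squeezing repeated characters with the s modifier.

```perl
use strict;
use warnings;

my $string = 'a**b   c***d';

# An empty replacement list leaves the string unchanged and just counts.
my $stars = ($string =~ tr/*//);
# 5

# The d modifier deletes the listed characters.
(my $noStars = $string) =~ tr/*//d;
# 'ab   cd'

# The s modifier squeezes each run of the listed character down to one.
(my $squeezed = $string) =~ tr/ //s;
# 'a**b c***d'

print "$stars\n$noStars\n$squeezed\n";
```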


Link Directly To This Page ...

help support free information on the Internet ...

Many users prefer to link directly to individual content pages on Web-Wise-Wizard. If you would like to do this then we have provided the following HTML/CSS link script which you can copy and paste directly into your HTML editor. Alternatively, you might like to use our New Dynamic Link Generator to create a link that more fully meets your own particular requirements.

the link displayed ...

Web-Wise-Wizard - Perl Regular Expressions Perl regular expression examples. Learn regular expressions the easy way. Examples include methods to manipulate strings and for HTML page parsing. Use these examples to harness the power of Regex.


Regular Expression Featured Links
Copyright © 1998,2014, Gilbert Hadley, Liverpool, England