Regular Expression PHP Tutorial

The main PHP function for working with regular expressions is preg_match. It takes the form:

preg_match('/regularexpression/', $textstring)

Note the forward slash at the start and end of the regular expression. These slashes are delimiters: they mark where the regular expression begins and ends. Other PHP functions used with regexes are preg_split, preg_replace and preg_match_all. You can find out more about these in the official PHP manual.
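Here is a rough sketch of one call to each of those other functions, assuming PHP 7 or later. The sample string and variable names are made up for illustration:

```php
<?php
// Made-up sample text for illustration.
$text = 'tips, tutorials, and more tips';

// preg_split: break the string apart wherever the pattern matches.
$parts = preg_split('/, /', $text);
// $parts is ['tips', 'tutorials', 'and more tips']

// preg_replace: replace every match with new text.
$replaced = preg_replace('/tips/', 'hints', $text);
// $replaced is 'hints, tutorials, and more hints'

// preg_match_all: find every match, not just the first; returns the count.
$count = preg_match_all('/tips/', $text, $allMatches);
// $count is 2 and $allMatches[0] is ['tips', 'tips']

// The delimiter doesn't have to be a slash; '#' works too.
$found = preg_match('#tutorials#', $text); // 1
```

Choosing a delimiter like # is handy when the pattern itself contains forward slashes, such as in URLs.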

Searching for an exact text phrase

If you want to check whether an exact text string occurs within another text string, no special regex characters are required: you just use the exact text phrase as the regular expression. For example:

if (preg_match('/tutorial/', 'tips and tutorials are here for webmasters'))
    echo "word 'tutorial' found!";

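Note that the match is case sensitive; adding the i modifier after the closing slash ('/tutorial/i') makes it case-insensitive. For a plain substring check like this, the PHP function strpos is actually more efficient, but I'm just kicking things off with an easy example.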
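A quick sketch comparing the three approaches on the same made-up sentence. The variable names are hypothetical:

```php
<?php
$text = 'tips and tutorials are here for webmasters';

// Case sensitive by default: a capital T does not match.
$exact = preg_match('/Tutorial/', $text);     // 0

// The i modifier after the closing slash ignores case.
$caseless = preg_match('/Tutorial/i', $text); // 1

// For a plain substring test, strpos avoids the regex engine.
// It returns the offset of the first occurrence, or false if there is none.
$pos = strpos($text, 'tutorial');             // 9
if ($pos !== false) {
    echo "word 'tutorial' found at offset $pos\n";
}
```

The strict !== false comparison matters: a match at offset 0 would otherwise be treated as "not found".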

Start and end of text

If you're searching for text at the start or end of a string, use the symbols ^ (start) and $ (end):

“^The”: matches any string that starts with “The”;

“of despair$”: matches a string that ends in the substring “of despair”;
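Here is a small sketch of both anchors in action, using invented sample strings:

```php
<?php
// ^ anchors the pattern to the start of the string.
$a = preg_match('/^The/', 'The best of times');        // 1
$b = preg_match('/^The/', 'It was The best of times'); // 0: "The" is not at the start

// $ anchors the pattern to the end of the string.
$c = preg_match('/of despair$/', 'the winter of despair');        // 1
$d = preg_match('/of despair$/', 'of despair, and then of hope'); // 0
```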

Multiple characters

The symbols '*', '+', and '?' denote the number of times a character or a sequence of characters may occur: "zero or more", "one or more", and "zero or one", respectively. For example:

“tu*”: matches a string that has the letter t followed by zero or more u’s (“t”, “tu”, “tuuuuu”, “tutorial”, etc).

“tu+”: similar but at least one u (“tu”, “tuuu”, “tut”, etc).

“tu?”: there may or may not be a u.

“t?u+$”: a possible t followed by one or more u's ending the string.
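To see exactly which strings each quantifier accepts, the sketch below anchors the patterns with ^ and $ so the whole string has to match, and uses PHP's preg_grep to keep the matching entries of an array. The subject list is invented for the demo:

```php
<?php
// Invented test strings.
$subjects = ['t', 'tu', 'tuuu', 'u', 'ut'];

// preg_grep keeps the entries that match; array_values renumbers the keys.
$zeroOrMore = array_values(preg_grep('/^tu*$/', $subjects)); // ['t', 'tu', 'tuuu']
$oneOrMore  = array_values(preg_grep('/^tu+$/', $subjects)); // ['tu', 'tuuu']
$zeroOrOne  = array_values(preg_grep('/^tu?$/', $subjects)); // ['t', 'tu']

// Not anchored at the start: an optional t, then one or more u's at the end.
$endsInU = array_values(preg_grep('/t?u+$/', $subjects));    // ['tu', 'tuuu', 'u']
```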

If you want to be more specific about the number of repetitions, you can specify a range within braces {}.

“o{3}h”: matches a string that has exactly three o’s followed by h (“oooh”).

“o{3,}h”: there are at least three o’s (“oooh”, “ooooh”, “ooooooooooh”, etc).

“o{3,5}h”: from three to five o’s (“oooh”, “ooooh”, or “oooooh”).

You must always specify the first number of a range, but the second number is optional (eg {3,5} or {3,}). To allow at most five repetitions, write {0,5}.
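A sketch of the brace ranges against strings with two to six o's, again using ^ and $ so the whole string must match (the subject strings are made up):

```php
<?php
// Two, three, four, five and six o's, each followed by h.
$subjects = ['ooh', 'oooh', 'ooooh', 'oooooh', 'ooooooh'];

$exactlyThree = array_values(preg_grep('/^o{3}h$/', $subjects));   // only 'oooh'
$threeOrMore  = array_values(preg_grep('/^o{3,}h$/', $subjects));  // the 3- to 6-o strings
$threeToFive  = array_values(preg_grep('/^o{3,5}h$/', $subjects)); // the 3- to 5-o strings
$atMostFive   = array_values(preg_grep('/^o{0,5}h$/', $subjects)); // the 2- to 5-o strings

// Without anchors, o{3}h also matches inside a longer run of o's.
$unanchored = preg_match('/o{3}h/', 'ooooh'); // 1
```

That last line is worth remembering: an unanchored “o{3}h” only requires the string to contain three o's followed by h somewhere.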

If you want to quantify a sequence of characters rather than just a single character, put them inside parentheses:

“t(ut)*”: matches a string that has a t followed by zero or more copies of the sequence “ut” (eg “t”, “tut”, “tutututut”, etc).

“t(ut){1,3}”: between one and three copies of “ut” (“tut”, “tutut”, “tututut”).
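A sketch showing what the parentheses change, with made-up subject strings and anchored patterns:

```php
<?php
$subjects = ['t', 'tu', 'tut', 'tutut', 'tututut', 'tutututut'];

// (ut)* repeats the whole group "ut" zero or more times.
$group = array_values(preg_grep('/^t(ut)*$/', $subjects));
// every entry except 'tu'

// (ut){1,3} allows one to three copies of "ut".
$limited = array_values(preg_grep('/^t(ut){1,3}$/', $subjects));
// ['tut', 'tutut', 'tututut']

// Without parentheses, * applies only to the final t.
$noGroup = array_values(preg_grep('/^tut*$/', $subjects));
// ['tu', 'tut']
```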

OR operator

The '|' symbol works as an OR operator:

“tips|tutorials”: matches a string that has either “tips” or “tutorials” in it.

“(b|cd)ef”: a string that has either “bef” or “cdef”.

“(a|b)*c”: a string that has any sequence of a's and b's (in any order, possibly none) followed by a c.
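A sketch of the three OR examples, using invented subject strings (the last two patterns are anchored so the whole string must match):

```php
<?php
$either = preg_match('/tips|tutorials/', 'free tutorials for webmasters'); // 1

$subjects = ['bef', 'cdef', 'cef', 'ef'];
$prefixed = array_values(preg_grep('/^(b|cd)ef$/', $subjects)); // ['bef', 'cdef']

// Any mix of a's and b's (including none) followed by c.
$mixes = ['c', 'abc', 'aabbc', 'bbac', 'abd'];
$abThenC = array_values(preg_grep('/^(a|b)*c$/', $mixes)); // ['c', 'abc', 'aabbc', 'bbac']
```

Note that “aabbc” and “bbac” match even though the a's and b's don't alternate.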

Wildcard character

A period ('.') is a wildcard character: it can stand for any single character except, by default, a newline:

“t.*p”: matches a string that has a t followed by any number of characters followed by a p (“tip”, “tp”, “tdfsadfsadsfp”, etc).

“^.{5}$”: a string with exactly 5 characters (“bingo”, “blind”, “rainy”, “asdfe”, etc).
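A sketch of the wildcard with made-up subject strings, including the newline exception and the s modifier that removes it:

```php
<?php
$subjects = ['tip', 'tp', 'tdfsadfsadsfp', 'pit'];
// t, then any characters (possibly none), then p.
$tThenP = array_values(preg_grep('/t.*p/', $subjects)); // ['tip', 'tp', 'tdfsadfsadsfp']

$words = ['bingo', 'blind', 'rainy', 'hi', 'toolong'];
$fiveChars = array_values(preg_grep('/^.{5}$/', $words)); // ['bingo', 'blind', 'rainy']

// By default . does not match a newline; the s modifier changes that.
$withNewline   = preg_match('/^.{5}$/', "ab\ncd");  // 0
$dotMatchesAll = preg_match('/^.{5}$/s', "ab\ncd"); // 1
```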

Bracket expressions

Bracket expressions let you match any one of a whole range of characters at a single position in a string:

“[tu]”: matches a string that has either a ‘t’ or a ‘u’ (that’s the same as “t|u”);

“[a-d]”: a string that has a lowercase letter from 'a' through 'd' (equivalent to “a|b|c|d” and to “[abcd]”);

“^[a-zA-Z]”: a string that starts with a letter;

“[0-9]%”: a string that has a single digit before a percent sign;

“,[a-zA-Z0-9]$”: a string that ends in a comma followed by an alphanumeric character.

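Note that inside brackets, most regex special characters (such as . * + ? $ | and parentheses) are just ordinary characters: they don't do any of their usual regular expression functions. The exceptions are ^ as the first character, - between two characters (which forms a range), ] (which closes the bracket expression) and the backslash.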
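A sketch running the bracket examples above against invented subject strings, plus a check that a period inside brackets is just a literal dot:

```php
<?php
$hasTOrU   = preg_match('/[tu]/', 'cat');             // 1
$startsLtr = preg_match('/^[a-zA-Z]/', '9 lives');    // 0
$percent   = preg_match('/[0-9]%/', 'save 5% today'); // 1
$commaEnd  = preg_match('/,[a-zA-Z0-9]$/', 'a,b,c');  // 1

// Inside brackets, . is an ordinary dot rather than a wildcard.
$decimal   = preg_match('/^[0-9.]+$/', '3.14');       // 1
$notDigits = preg_match('/^[0-9.]+$/', '3x14');       // 0
```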

Excluding characters

You can also exclude characters by using a ‘^’ as the first symbol in a bracket expression:

“%[^a-zA-Z]%”: matches a string with a single character that is not a letter between two percent signs.

Note the difference between this use of ^ inside brackets and ^ at the start of a regular expression, where it anchors the match to the first character of the string.
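A sketch contrasting the two meanings of ^, with made-up subject strings:

```php
<?php
// One non-letter between two percent signs.
$a = preg_match('/%[^a-zA-Z]%/', 'rate: 100%5%'); // 1 (matches "%5%")
$b = preg_match('/%[^a-zA-Z]%/', '%a%');          // 0
$c = preg_match('/%[^a-zA-Z]%/', '%%');           // 0: exactly one character is required

// ^ outside brackets anchors; ^ inside brackets negates.
$startsWithLetter = preg_match('/^[a-zA-Z]/', '9abc'); // 0
$hasNonLetter     = preg_match('/[^a-zA-Z]/', '9abc'); // 1
```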

Escaping regular expression characters

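What do you do if you want to check for one of the regular expression special characters “^.[$()|*+?{\” in your text string? You have to escape these characters with a backslash ('\'): for example, “\$” matches a literal dollar sign and “\.” a literal period. Because the pattern is wrapped in forward slashes, a literal forward slash inside the pattern must be escaped too (“\/”).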
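A sketch of manual escaping, plus PHP's preg_quote function, which escapes special characters in a string for you. The sample strings are invented:

```php
<?php
// \$ matches a literal dollar sign instead of the end of the string.
$price = preg_match('/\$[0-9]+/', 'it costs $25'); // 1

// An unescaped . is a wildcard, so it also matches "abc".
$wild    = preg_match('/a.c/', 'abc');  // 1
$literal = preg_match('/a\.c/', 'abc'); // 0
$dotted  = preg_match('/a\.c/', 'a.c'); // 1

// preg_quote escapes special characters for you; pass the delimiter
// as the second argument so that it gets escaped as well.
$needle = preg_quote('1+1=2?', '/');
$quoted = preg_match('/' . $needle . '/', 'is 1+1=2? yes'); // 1
```

preg_quote is the safer choice whenever part of the pattern comes from user input or other text you don't control.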

Retrieving text using preg_match

If you want to extract a phrase out of a text string, you use the PHP function preg_match in the following format:

preg_match('/regular expression/', $textstring, $matchesarray)

It returns 1 if there is a match to your regular expression, 0 if there is no match, and false if an error occurs. For example,

echo preg_match('/test/', "a test of preg_match");

outputs 1 whereas

echo preg_match('/tutorial/', "a test of preg_match");

outputs 0.

The preg_match function is really useful for extracting phrases out of a text string. To do this, you pass an array as the third argument ($matchesarray is what I use in the example) and use parentheses in your regular expression to mark the sections you want to retrieve. If there's a successful match, $matchesarray is filled with the results of the search: $matchesarray[0] contains the text that matched the entire pattern (not necessarily the whole subject string), $matchesarray[1] contains the text that matched the first parenthesized subpattern, and so on.

For example, the following regex divides a URL into two sections. The first section is the scheme, “http://” or “https://” (the s? makes the s optional; note the backslashes escaping the forward slashes, which would otherwise end the pattern). The second section is whatever comes after:

preg_match('/(https?:\/\/)(.*)/', "https://www.webmastersun.com/", $matchesarray);

This fills $matchesarray with the following values:

$matchesarray[0] = "https://www.webmastersun.com/"

$matchesarray[1] = "https://"

$matchesarray[2] = "www.webmastersun.com/"
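Here is one more sketch of capturing groups, this time pulling the parts out of a date. The subject string is hypothetical, and the YYYY-MM-DD format is an assumption made for the example:

```php
<?php
// Hypothetical subject string containing a date.
$text = 'Published on 2021-03-15 by the editor';

if (preg_match('/([0-9]{4})-([0-9]{2})-([0-9]{2})/', $text, $matchesarray)) {
    echo $matchesarray[0], "\n"; // 2021-03-15 (the matched text, not all of $text)
    echo $matchesarray[1], "\n"; // 2021
    echo $matchesarray[2], "\n"; // 03
    echo $matchesarray[3], "\n"; // 15
}
```

Because the pattern isn't anchored, $matchesarray[0] holds only the date, not the surrounding sentence.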
