Regular Expressions Part 3 – Replacing Patterns

Replacing Patterns

In the examples in Part 2, we searched for patterns in a string and left the string untouched. The preg_replace() function looks for substrings that match a pattern and then replaces them with new text. preg_replace() takes three required parameters and an optional fourth. These parameters are, in order, a regular expression, the text with which to replace each match, the string to modify, and an optional limit on how many matches will be replaced (by default, all of them).
preg_replace( pattern, replacement, subject [, limit ])

The function returns the modified string if a match was found, or an unchanged copy of the original string otherwise. In the following example we search for the copyright phrase and replace the year with the current year (2007 at the time of writing):
<?php
echo preg_replace('/([Cc]opyright) 200(3|4|5|6)/', '$1 2007', 'Copyright 2005'); // Copyright 2007
?>

In the above example we use back references in the replacement string. Back references make it possible to reuse part of a matched pattern in the replacement string. To use this feature, wrap in parentheses each element of your regular expression that you want to reuse; each parenthesized element is called a subpattern. You can then refer to the text matched by a subpattern with a dollar sign ($) followed by the number of the subpattern: $0 refers to the whole match, and $1, $2 and so on refer to the text matched by the first, second and later subpatterns, counted by their opening parentheses from left to right.
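A small sketch of the back-reference syntax, assuming PHP 7 or later (for the type declarations). The helper swap_words() and the sample strings are hypothetical, made up for illustration. It also shows the ${1} form, which PHP accepts when a back reference is immediately followed by a literal digit; without the braces, $10 would be read as a reference to subpattern 10.

```php
<?php
// Hypothetical helper: swap the first two words of a string using $2 and $1.
function swap_words(string $text): string
{
    return preg_replace('/(\w+) (\w+)/', '$2 $1', $text, 1);
}

// $0 refers to the whole match: wrap every number in square brackets.
$bracketed = preg_replace('/\d+/', '[$0]', 'Order 66 shipped in 3 boxes');

// ${1} separates the subpattern number from a literal "0" that follows it;
// '$10' would instead be read as a reference to subpattern 10.
$padded = preg_replace('/(\d)x/', '${1}0', 'Version 5x');

echo swap_words('hello world'), "\n"; // world hello
echo $bracketed, "\n";                // Order [66] shipped in [3] boxes
echo $padded, "\n";                   // Version 50
```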

In the following example we change the date format from “yyyy-mm-dd” to “mm/dd/yyyy”:
<?php
echo preg_replace('/(\d+)-(\d+)-(\d+)/', '$2/$3/$1', '2007-01-25'); // 01/25/2007
?>

We can also pass an array of strings as the subject to make the substitution on each of them. To perform several substitutions on the same string (or array of strings) with one call to preg_replace(), pass an array of patterns and an array of replacements; each pattern is replaced by the replacement at the same position in the replacements array. Have a look at the example:
<?php
// Note: the /e modifier only works in PHP versions before 7.0.
$search = array('/(\w{6}\s\w{2})\s(\w+)/e',
                '/(\d{4})-(\d{2})-(\d{2})\s(\d{2}:\d{2}:\d{2})/');

$replace = array('"$1 ".strtoupper("$2")',
                 '$3/$2/$1 $4');

$string = 'Posted by John | 2007-02-15 02:43:41';

echo preg_replace($search, $replace, $string);
?>

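In the above example we use another interesting feature: you can tell PHP to evaluate the replacement text as PHP code once the back references have been substituted. Because we have appended an “e” to the end of the first regular expression, PHP executes the replacement it builds. That is, the replacement becomes "Posted by ".strtoupper("John"), which evaluates to “Posted by JOHN”. The second pattern then rewrites the date, so the output is “Posted by JOHN | 15/02/2007 02:43:41”. Note that the e modifier was deprecated in PHP 5.5 and removed in PHP 7.0; on current versions, use preg_replace_callback() instead.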
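A sketch of the same two substitutions for PHP versions where the e modifier is no longer available, using preg_replace_callback(), which calls a PHP function for each match instead of evaluating a string. It assumes PHP 5.3 or later for the anonymous function; the variable names are illustrative.

```php
<?php
$string = 'Posted by John | 2007-02-15 02:43:41';

// Upper-case the author's name: the callback receives the match array
// ($m[0] is the whole match, $m[1] and $m[2] are the subpatterns).
$result = preg_replace_callback(
    '/(\w{6}\s\w{2})\s(\w+)/',
    function (array $m) {
        return $m[1] . ' ' . strtoupper($m[2]);
    },
    $string
);

// The date needs no PHP code, so plain preg_replace() is enough.
$result = preg_replace(
    '/(\d{4})-(\d{2})-(\d{2})\s(\d{2}:\d{2}:\d{2})/',
    '$3/$2/$1 $4',
    $result
);

echo $result, "\n"; // Posted by JOHN | 15/02/2007 02:43:41
```

Keeping the date rewrite in preg_replace() makes it clear that only the name substitution needs PHP code.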

Regular Expressions Part 2 – Matching Patterns

Matching Patterns

The preg_match() function performs Perl-style pattern matching on a string. preg_match() takes two required and three optional parameters. These parameters are, in order, a regular expression, the string to search (the subject), an array variable that receives the matches, a flags argument, and an offset that specifies the position in the subject at which to start the search:
preg_match ( pattern, subject [, matches [, flags [, offset]]])

The preg_match() function returns 1 if a match is found, 0 if not, and false if an error occurs. Let’s search the string “Hello World!” for the letters “ell”:

<?php
if (preg_match('/ell/', 'Hello World!', $matches)) {
    echo 'Match was found <br />';
    echo $matches[0];
}
?>

The letters “ell” exist in “Hello”, so preg_match() returns 1 and the first element of the $matches array is filled with the text that matched the pattern, “ell”. The regular expression in the next example looks for the letters “ll” followed by any characters, so $matches[0] contains “lloween”:

<?php
if (preg_match('/ll.*/', 'The History of Halloween', $matches)) {
    echo 'Match was found <br />';
    echo $matches[0];
}
?>
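The two examples above use only the first three parameters. A short sketch of the optional flags and offset parameters, with illustrative variable names: the PREG_OFFSET_CAPTURE flag makes each entry of $matches an array holding the matched text and its byte position, and the offset tells preg_match() where in the subject to start looking.

```php
<?php
$subject = 'The History of Halloween';

// PREG_OFFSET_CAPTURE: $matches[0] becomes [matched text, byte offset].
preg_match('/ll/', $subject, $matches, PREG_OFFSET_CAPTURE);
list($text, $position) = $matches[0];
echo "Found '$text' at offset $position\n"; // Found 'll' at offset 17

// Offset: begin the search at byte 5, which skips the "H" of "History".
preg_match('/H\w+/', $subject, $later, 0, 5);
echo $later[0], "\n"; // Halloween
```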

Now let’s consider a more complicated example. One of the most common uses of regular expressions is validation. The example below checks whether a password is “strong”, i.e. at least 8 characters long and containing at least one lower-case letter, one upper-case letter and one digit:

<?php
$password = 'Fyfjk34sdfjfsjq7';

if (preg_match('/^.*(?=.{8,})(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*$/', $password)) {
    echo 'Your password is strong.';
} else {
    echo 'Your password is weak.';
}
?>

The ^ and $ anchors match the start and the end of the string. The “.*” combination is used at both the start and the end. As mentioned in Part 1, the . (dot) metacharacter matches any character except a newline, and the * metacharacter means “zero or more”. Between them are groupings in parentheses. The “?=” combination creates a lookahead, which means “the text that follows must look like this”. A lookahead checks the text without consuming it, so every check starts from the same position. In this example, instead of specifying the order in which things should appear, the pattern says that each of them must appear somewhere, in any order.

The first grouping is (?=.{8,}). This checks that there are at least 8 characters in the string. The next grouping, (?=.*\d), means “any characters, zero or more times, followed by a digit”. So this checks that there is at least one digit in the string, and because the lookahead doesn’t consume any characters, that digit can appear anywhere. The next groupings, (?=.*[a-z]) and (?=.*[A-Z]), look for a lower-case and an upper-case letter respectively, anywhere in the string.
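Because lookaheads consume nothing, they can also be placed straight after the ^ anchor, with the length rule written as .{8,}$ at the end. A minimal sketch of that variant, assuming PHP 7 or later and single-line input; the helper is_strong_password() and the sample passwords are hypothetical.

```php
<?php
// Hypothetical helper: true when $password has at least 8 characters,
// a digit, a lower-case letter and an upper-case letter (in any order).
function is_strong_password(string $password): bool
{
    // Each (?=...) lookahead checks one rule from the start of the string
    // without consuming characters; .{8,}$ then enforces the length.
    return preg_match('/^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}$/', $password) === 1;
}

foreach (['Fyfjk34sdfjfsjq7', 'Passw0rd', 'Pa55w0r', 'password1', 'PASSWORD1'] as $candidate) {
    echo $candidate, ': ', is_strong_password($candidate) ? 'strong' : 'weak', "\n";
}
```

“Pa55w0r” fails on length, “password1” has no upper-case letter and “PASSWORD1” has no lower-case letter.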

Finally, let’s consider a regular expression that validates an email address:

<?php
$email = 'firstname.lastname@aaa.bbb.com';
// [A-Za-z] rather than [A-z]: the range A-z also includes [ \ ] ^ _ and `
$regexp = '/^[^0-9][A-Za-z0-9_]+([.][A-Za-z0-9_]+)*[@][A-Za-z0-9_]+([.][A-Za-z0-9_]+)*[.][A-Za-z]{2,4}$/';

if (preg_match($regexp, $email)) {
    echo 'Email address is valid.';
} else {
    echo 'Email address is <u>not</u> valid.';
}
?>

This regular expression rejects an address that begins with a digit, and it allows several dot-separated parts in both the user name and the domain name. The final part of the domain (such as “com”) must be 2 to 4 letters long.

For speed, the preg_match() function stops after the first match it finds in a string. This makes it very quick for checking whether a pattern exists in a string. An alternative function, preg_match_all(), matches a pattern against a string as many times as the pattern allows, and returns the number of times it matched.
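A short sketch of the difference, using a made-up subject string: preg_match() reports only the first year, while preg_match_all() finds all of them. With the default ordering (PREG_PATTERN_ORDER), $all[0] holds every full match and $all[1] every match of the first subpattern.

```php
<?php
$text = 'Copyright 2003, revised 2005 and 2007';

// preg_match() stops at the first match.
$first = preg_match('/\d{4}/', $text, $one);

// preg_match_all() keeps going and returns the number of matches.
$count = preg_match_all('/\d{2}(\d{2})/', $text, $all);

echo $first, ' ', $one[0], "\n";  // 1 2003
echo $count, "\n";                // 3
echo implode(',', $all[0]), "\n"; // 2003,2005,2007
echo implode(',', $all[1]), "\n"; // 03,05,07
```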

A guide to selling digital downloads online

Digital goods (also called electronic goods or digital products) are anything you can sell in a digital format. Ebooks, software, website templates, music, videos, licence codes, ringtones and apps are just a few examples. The upfront cost of creating digital goods is tiny, so selling them can be a great way for an entrepreneur or small business to add a side stream of income. Here are just a few of the perks of selling digital downloads:

• No inventory – you have no stock levels to maintain so you have a smaller initial outlay.

• Costs the same to sell one or thousands

• No shipping

• Little to no startup cost

• Consumers instantly receive the product

• Transaction is quickly completed so you have the money in your account straight away

Some online services specialize in selling digital goods, handling invoicing, payment and delivery of the digital copy for you.

Picking a service can be challenging, so here are some guidelines to help you get started:

1) Fees

Transaction and hosting fees vary: some services charge a monthly subscription, others take a percentage of each sale, and some do both. Find out which fees each service charges and what you can afford. Selling through PayPal is often convenient, though PayPal also charges a transaction fee.

2) Integration with your website

Make sure the service can integrate with your blog or website; it’s best to use a service whose shopping cart blends in with your website. Some providers even give you code that you can simply cut and paste into your web page!

3) Features

Here are a few features that might be helpful:
• Automatic product download
• The option of no shipping
• The ability to sell tangible goods and e-goods simultaneously

4) Security

It’s critical that the client’s purchase is secure. After purchasing your product, your customer should receive a secure link to download the purchased item immediately. The link should expire after a certain amount of time. Some systems offer “digital stamping” with a unique ID so you can track who is sharing your digital download. Some services let you choose how many clicks or how many days the link remains valid.

5)  Easy to Use

Make sure the transaction is smooth and easy for your customers. You don’t want them to jump through a bunch of hoops, and a simple process makes it easy for them to keep coming back and making purchases.

Here’s a great ecommerce guide from my site at http://withinweb.com/global/hints_tips_and_tricks/digital_files.pdf on further concepts for digital downloads.

Regular Expressions Part 1 – Introduction

In programming you quite often want to test a string for the occurrence of a character or group of characters.  This is particularly true for such things as data validation on form input boxes.  Regular expressions provide a pattern matching facility.

This tutorial gives a brief overview of basic regular expression syntax and then considers the functions that PHP provides for working with regular expressions.

The Basics
Matching Patterns
Replacing Patterns
Array Processing

PHP supports two different types of regular expressions: POSIX-extended and Perl-Compatible Regular Expressions (PCRE). The PCRE functions are more powerful and faster than the POSIX ones, so we will concentrate on them. (The POSIX functions, such as ereg(), were deprecated in PHP 5.3 and removed in PHP 7.0.)

The Basics

In a regular expression, most characters match only themselves. For instance, if you search for the regular expression “ball” in the string “John plays football,” you get a match because “ball” occurs in that string. Some characters have special meanings in regular expressions. For instance, a dollar sign ($) at the end of a regular expression matches strings that end with the given pattern. Similarly, a caret (^) character at the beginning of a regular expression indicates that it must match the beginning of the string. The characters that match themselves are called literals. The characters that have special meanings are called metacharacters.

The dot (.) metacharacter matches any single character except a newline (\n). So, the pattern h.t matches hat, hot, hit, hut, h7t, etc. The vertical pipe (|) metacharacter is used for alternatives in a regular expression. It behaves much like a logical OR operator, and you should use it if you want to construct a pattern that matches more than one set of characters. For instance, the pattern Utah|Idaho|Nevada matches strings that contain “Utah”, “Idaho” or “Nevada”. Parentheses give us a way to group sequences.

For example :

(Nant|b)ucket matches “Nantucket” or “bucket”. Using parentheses to group together characters for alternation is called grouping.
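A small sketch of alternation and grouping with made-up sample strings. Without parentheses the alternation covers the whole pattern, so Nant|bucket means “Nant” or “bucket”; with them, it is limited to the group, and the group’s text is also captured.

```php
<?php
// Alternation: any one of three state names.
$state = preg_match('/Utah|Idaho|Nevada/', 'I grew up in Idaho', $m1);

// Grouping: the alternation applies only inside the parentheses,
// and the text the group matched is captured in $m2[1].
preg_match('/(Nant|b)ucket/', 'a trip to Nantucket', $m2);

// Without the group, the pattern means "Nant" OR "bucket".
preg_match('/Nant|bucket/', 'a trip to Nantucket', $m3);

echo $m1[0], "\n";              // Idaho
echo $m2[0], ' ', $m2[1], "\n"; // Nantucket Nant
echo $m3[0], "\n";              // Nant
```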

If you want to match a literal metacharacter in a pattern, you have to escape it with a backslash.
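For example, to find a literal price such as “$9.99”, both the dollar sign and the dot need a backslash. When the text to match comes from a variable, PHP’s preg_quote() function adds the backslashes for you; passing the delimiter as its second argument escapes that character too. A sketch with made-up sample strings:

```php
<?php
$subject = 'Total: $9.99 (incl. VAT)';

// Escaped by hand: \$ and \. match a literal dollar sign and dot.
$manual = preg_match('/\$9\.99/', $subject);

// Without escaping, "." matches any character, so this also matches "9x99".
$loose = preg_match('/9.99/', '9x99');

// preg_quote() escapes every metacharacter in the text;
// the second argument also escapes the "/" delimiter.
$needle = '(incl. VAT)';
$quoted = preg_quote($needle, '/');
$found = preg_match('/' . $quoted . '/', $subject);

echo $quoted, "\n"; // \(incl\. VAT\)
```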

To specify a set of acceptable characters in your pattern, you can either build a character class yourself or use a predefined one.

A character class lets you represent a bunch of characters as a single item in a regular expression. You can build your own character class by enclosing the acceptable characters in square brackets. A character class matches any one of the characters in the class. For example, the character class [abc] matches a, b or c. To define a range of characters, put the first and last characters in, separated by a hyphen. For example, to match all alphanumeric characters: [a-zA-Z0-9]. You can also create a negated character class, which matches any character that is not in the class. To create a negated character class, begin the character class with ^: [^0-9].
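A sketch of the three kinds of character class with made-up sample strings, assuming PHP 5.4 or later (where the matches argument of preg_match_all() became optional). Note that a class always matches exactly one character.

```php
<?php
// [abc] matches exactly one character, which must be a, b or c.
$inSet = preg_match('/^[abc]$/', 'b');
$notInSet = preg_match('/^[abc]$/', 'd');

// A range: count the lower-case letters (e, l, l, o, o, r, l, d).
$lowercase = preg_match_all('/[a-z]/', 'Hello World');

// A negated class: [^0-9] matches anything that is not a digit,
// so replacing it with '' leaves only the digits.
$digits = preg_replace('/[^0-9]/', '', '(555) 123-4567');

echo $inSet, $notInSet, "\n"; // 10
echo $lowercase, "\n";        // 8
echo $digits, "\n";           // 5551234567
```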

The metacharacters +, *, ? and {} affect the number of times a pattern should be matched. + means “match one or more of the preceding expression”, * means “match zero or more of the preceding expression”, and ? means “match zero or one of the preceding expression”. Curly braces {} can be used in three ways. With a single integer, {n} means “match exactly n occurrences of the preceding expression”; with one integer and a comma, {n,} means “match n or more occurrences of the preceding expression”; and with two comma-separated integers, {n,m} means “match at least n, but no more than m, occurrences of the preceding expression”.
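A sketch that tests each quantifier against a whole string, assuming PHP 7 or later. The helper matches_all() is hypothetical: it wraps the pattern in (?:...) and anchors it with ^ and $, so a pattern containing | is still anchored as a whole.

```php
<?php
// Hypothetical helper: does the whole $subject match $pattern?
function matches_all(string $pattern, string $subject): bool
{
    return preg_match('/^(?:' . $pattern . ')$/', $subject) === 1;
}

var_dump(matches_all('ab+c', 'ac'));       // false: + needs at least one b
var_dump(matches_all('ab*c', 'ac'));       // true: * allows zero b's
var_dump(matches_all('colou?r', 'color')); // true: ? makes the u optional
var_dump(matches_all('\d{3}', '123'));     // true: exactly three digits
var_dump(matches_all('\d{3,}', '12'));     // false: three or more needed
var_dump(matches_all('\d{2,4}', '12345')); // false: at most four digits
```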

Now, have a look at the examples:

Regular Expression          Will match…
foo                         The string “foo”
^foo                        “foo” at the start of a string
foo$                        “foo” at the end of a string
^foo$                       “foo” when it is the whole string
[abc]                       a, b, or c
[a-z]                       Any lowercase letter
[^A-Z]                      Any character that is not an uppercase letter
(gif|jpg)                   Either “gif” or “jpg”
[a-z]+                      One or more lowercase letters
[0-9\.\-]                   Any digit, dot, or minus sign
^[a-zA-Z0-9_]{1,}$          A string of at least one letter, digit or _
([wx])([yz])                wy, wz, xy, or xz
[^A-Za-z0-9]                Any symbol (not a digit or a letter)
([A-Z]{3}|[0-9]{4})         Three uppercase letters or four digits

Perl-Compatible Regular Expressions emulate the Perl syntax for patterns, which means that each pattern must be enclosed in a pair of delimiters. Usually, the slash (/) character is used. For instance, /pattern/.
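The delimiter does not have to be a slash. A sketch with a made-up path: when the pattern itself contains slashes, choosing another delimiter such as # avoids escaping each one. Modifiers such as i (case-insensitive) go after the closing delimiter.

```php
<?php
$path = '/usr/local/bin/php';

// With "/" as the delimiter, every slash in the pattern must be escaped.
$slashes = preg_match('/^\/usr\/local\//', $path);

// With "#" as the delimiter, the slashes can be written as-is.
$hashes = preg_match('#^/usr/local/#', $path);

// Modifiers follow the closing delimiter: "i" ignores case.
$caseless = preg_match('/^HELLO/i', 'hello world');

echo $slashes, $hashes, $caseless, "\n"; // 111
```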

The PCRE functions can be divided into four classes: matching, replacing, splitting and filtering.
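A sketch with one function from each class, using made-up data: preg_match() for matching, preg_replace() for replacing, preg_split() for splitting a string wherever a pattern matches, and preg_grep() for filtering an array down to the elements that match.

```php
<?php
$csv = 'red, green ,blue,  yellow';

// Matching: find the first word starting with "g".
$hasG = preg_match('/\bg\w*/', $csv, $m);

// Replacing: collapse runs of whitespace into a single space.
$tidy = preg_replace('/\s+/', ' ', $csv);

// Splitting: break on commas together with any surrounding whitespace.
$colours = preg_split('/\s*,\s*/', $csv);

// Filtering: keep the colours starting with "b" or "g" (keys are preserved).
$bOrG = preg_grep('/^[bg]/', $colours);

print_r($colours);
print_r($bOrG);
```

preg_grep() keeps the original array keys, so the filtered array has keys 1 and 2 rather than 0 and 1.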

Basic syntax of PDO Objects for PHP 5.1 and above

Here is some of the basic syntax for using PDO objects.

The advantage of PDO objects is that you pass your variables into SQL statements using prepared statements.

Prepared statements are what are termed parameterised queries in frameworks such as Microsoft .NET, and they provide a way to prevent SQL injection into databases.

SQL FETCH
<?php
$dbh = new PDO('mysql:host=localhost;dbname=test', $user, $pass);
$stmt = $dbh->prepare('SELECT * FROM REGISTRY where name = ?');
if ($stmt->execute(array($_GET['name']))) {
    while ($row = $stmt->fetch()) {
        print_r($row);
    }
}
?>

SQL MODIFY
<?php
$dbh = new PDO('mysql:host=localhost;dbname=test', $user, $pass);
$stmt = $dbh->prepare('INSERT INTO REGISTRY (name, value) VALUES (?, ?)');
$stmt->bindParam(1, $name);
$stmt->bindParam(2, $value);

// insert one row
$name = 'one';
$value = 1;
$stmt->execute();

// insert another row with different values
$name = 'two';
$value = 2;
$stmt->execute();
?>

It is advisable to wrap PDO calls in try / catch statements and print out friendly error messages; otherwise an error may display internal details that you don’t want users to see.

<?php
try {
    $dbh = new PDO('mysql:host=localhost;dbname=test', $user, $pass);
    foreach ($dbh->query('SELECT * from table') as $row) {
        print_r($row);
    }
    $dbh = null;
} catch (PDOException $e) {
    print "Error!: " . $e->getMessage() . "<br/>";
    die();
}
?>
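One detail worth knowing: before PHP 8.0, PDO’s default error mode was PDO::ERRMODE_SILENT, so only a failed connection threw a PDOException and a failing query did not. Setting PDO::ATTR_ERRMODE to PDO::ERRMODE_EXCEPTION makes query errors throw as well. A minimal sketch, assuming the pdo_sqlite extension is available: it uses an in-memory SQLite database instead of MySQL so it runs without a server, and named placeholders (:name, :value, :min) instead of ?. The REGISTRY table mirrors the examples above; the data and the error message are made up.

```php
<?php
try {
    // In-memory SQLite database; swap in your MySQL DSN, user and password.
    $dbh = new PDO('sqlite::memory:');
    // Make every failing query throw a PDOException, not just the connection.
    $dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $dbh->exec('CREATE TABLE REGISTRY (name TEXT, value INTEGER)');

    // Named placeholders are bound by name from the array passed to execute().
    $insert = $dbh->prepare('INSERT INTO REGISTRY (name, value) VALUES (:name, :value)');
    $insert->execute([':name' => 'one', ':value' => 1]);
    $insert->execute([':name' => 'two', ':value' => 2]);

    $select = $dbh->prepare('SELECT name, value FROM REGISTRY WHERE value >= :min ORDER BY value');
    $select->execute([':min' => 2]);
    $rows = $select->fetchAll(PDO::FETCH_ASSOC);

    print_r($rows);
} catch (PDOException $e) {
    // Log the details privately and show the user a generic message.
    error_log($e->getMessage());
    echo "Sorry, something went wrong.\n";
}
```

Depending on the PHP version, the value column may come back as the integer 2 or the string "2".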

The robots.txt file

When it comes to SEO, most people understand that a Web site must have content, “search engine friendly” site architecture/HTML, and meta data such as title tags, image alt attributes and so on.

However, some Web sites totally disregard the robots.txt file. When optimizing a Web site, don’t disregard the power of this little text file.

What is a Robots.txt File?

Simply put, if you go to www.domain.com/robots.txt, you should see a list of directories of the Web site that the site owner is asking the search engines to “skip” (or “disallow”). However, if you’re not careful when editing a robots.txt file, you could be putting information in your robots.txt file that could really hurt your business.

There’s tons of information about the robots.txt file available at the Web Robots Pages, including the proper usage of the disallow feature, and blocking “bad bots” from indexing your Web site.

The general rule of thumb is to make sure a robots.txt file exists at the root of your domain (e.g., www.domain.com/robots.txt). To exclude all robots from indexing part of your Web site, your robots.txt file would look something like this:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/

The above syntax would tell all robots not to index the /cgi-bin/, the /tmp/, and the /junk/ directories on your Web site.

A mistake in the robots.txt file can also cause problems for your site optimisation. For instance, if you include User-agent: * followed by Disallow: / in your robots.txt file, you are telling the search engines not to crawl any part of the Web site, which gives you no search presence at all. That is not what you want.

Another point to watch out for: rather than modifying your robots.txt file to disallow old legacy pages and directories, you should really set up a 301 (permanent) redirect to pass the value from the old Web pages to the new ones.

Robots.txt Dos and Don’ts

There are many good reasons to stop the search engines from indexing certain directories on a Web site and allowing others for SEO purposes.

Here’s what you should do with robots.txt:

* Take a look at all of the directories in your Web site. Most likely, there are directories that you’d want to disallow the search engines from indexing, including directories like /cgi-bin/, /wp-admin/, /cart/, /scripts/ and others that might include sensitive data.
* Stop the search engines from indexing certain directories of your site that might include duplicate content. For example, some Web sites have “print versions” of Web pages and articles that allow visitors to print them easily. You should only allow the search engines to index one version of your content.
* Make sure that nothing stops the search engines from indexing the main content of your Web site.
* Look for certain files on your site that you might want to disallow the search engines from indexing, such as certain scripts, or files that might contain e-mail addresses, phone numbers, or other sensitive data.

Here’s what you should not do with robots.txt:

* Don’t rely on comments in your robots.txt file: lines starting with # are allowed, but crawlers ignore them.
* Don’t list all your files in the robots.txt file. Listing the files allows people to find files that you don’t want them to find.
* The original robots.txt standard has no “Allow” directive. Major search engines such as Google do support one, but you only need it to make an exception within a disallowed directory.

By taking a good look at your Web site’s robots.txt file and making sure that the syntax is set up correctly, you’ll avoid search engine ranking problems. By disallowing the search engines from indexing duplicate content on your Web site, you can potentially overcome duplicate content issues that might hurt your search engine rankings.

Test a robots.txt file

Google provides a facility as part of its Webmaster Tools system that lets you test a robots.txt file.

Test a site’s robots.txt file:

On the Webmaster Tools Home page, click the site you want.
Under Health, click Blocked URLs.
If it’s not already selected, click the Test robots.txt tab.
Copy the content of your robots.txt file, and paste it into the first box.
In the URLs box, list the site to test against.
In the User-agents list, select the user-agents you want.

Any changes you make in this tool will not be saved. To save any changes, you’ll need to copy the contents and paste them into your robots.txt file.

Posted in SEO

Single and double quotes in PHP

There is a difference in the way that PHP handles strings in single and double quote marks, which is easy to see with the echo statement.

For example :

$var = 'test';

The statements echo('$var'); and echo("$var"); will generate different results.

echo "\$var is equal to $var";

will display $var is equal to test

While :

echo '\$var is equal to $var';

will display

\$var is equal to $var

In the case of single quotes, the variable name is displayed as is, and so is the backslash, because \$ is not an escape sequence inside a single-quoted string.
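A few more cases as a sketch, with illustrative variable names: escape sequences such as \n are only interpreted inside double quotes, and curly braces mark where a variable name ends inside a double-quoted string.

```php
<?php
$var = 'test';

$single = 'Line\n$var'; // no interpolation and no escape sequences
$double = "Line\n$var"; // \n becomes a newline and $var becomes test
$braced = "{$var}ing";  // braces end the name: "testing", not $vering

echo strlen($single), ' ', strlen($double), "\n"; // 10 9
echo $braced, "\n";                               // testing
```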

filter_var and validate an email address in PHP 5.2.0 onwards

From PHP 5.2.0 onwards, PHP has the filter_var() function, which can be used to validate many different kinds of input.

To validate an email address :

<?php
//Validate an email address in PHP 5.2.0 onwards

$email_address = 'me@example.com';
if (filter_var($email_address, FILTER_VALIDATE_EMAIL)) {
// The email address is valid
} else {
// The email address is not valid
}
?>
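filter_var() returns the filtered value on success and false on failure, so compare the result with false explicitly when a valid value could itself be falsy (the integer 0, for example). A sketch with made-up inputs; FILTER_VALIDATE_INT and its min_range/max_range options belong to the same filter extension.

```php
<?php
// Email: returns the address itself when valid, false otherwise.
$valid   = filter_var('me@example.com', FILTER_VALIDATE_EMAIL);
$invalid = filter_var('not-an-email', FILTER_VALIDATE_EMAIL);

// Integer with a range: returns an int when valid, false otherwise.
$range   = ['options' => ['min_range' => 1, 'max_range' => 100]];
$inRange = filter_var('42', FILTER_VALIDATE_INT, $range);
$tooBig  = filter_var('150', FILTER_VALIDATE_INT, $range);

// '0' is a valid integer, but the result (int 0) is falsy,
// so "if (filter_var(...))" would wrongly treat it as invalid.
$zero = filter_var('0', FILTER_VALIDATE_INT);
if ($zero !== false) {
    echo "0 is a valid integer\n";
}
```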