Clean user data using PHP and Regular Expressions

DEPRECATED: This post may no longer be relevant or contain industry best-practices.

These functions use regular expressions to check the data. What is a RegEx? It is basically a pattern-matching language. You can test a string against a regex to see whether it is valid, or you can use one to strip invalid characters.

Here is a RegEx function to make sure a string only contains letters, numbers, and the underscore character:

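<?php
// Strip every character that is not a letter, digit, or underscore.
function nukeAlphaNum($value) {
    return preg_replace('/[^a-zA-Z0-9_]/', '', $value);
}

The function takes one argument, a string, and returns a copy that contains only lowercase letters (a-z), uppercase letters (A-Z), digits (0-9), and the underscore (_). preg_replace takes three arguments: the pattern wrapped in delimiters (here /), the text to replace each match with (in this case an empty string), and the string that it is sifting through (the one we send to the function). The older ereg_replace did the same job without delimiters, but it was deprecated in PHP 5.3 and removed in PHP 7.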
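
To make the stripping behaviour concrete, here is a minimal sketch that runs the whitelist on two sample strings. The inputs are made up for illustration, and the function is repeated so the snippet runs on its own.

```php
<?php
// Same whitelist as above: keep letters, digits, and underscores only.
function nukeAlphaNum($value) {
    return preg_replace('/[^a-zA-Z0-9_]/', '', $value);
}

// Hypothetical inputs, chosen to show what gets removed.
echo nukeAlphaNum("user_name-42!"), "\n";         // user_name42
echo nukeAlphaNum("<b>Robert'); DROP</b>"), "\n"; // bRobertDROPb
```

Note that the function silently drops characters rather than rejecting the input, so the cleaned value can differ from what the user typed.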

Here are some others that we use:

<?php
// Keep letters only.
function nukeAlpha($value) {
    return preg_replace('/[^a-zA-Z]/', '', $value);
}

// Keep hexadecimal digits only (0-9, a-f, A-F).
function nukeHex($value) {
    return preg_replace('/[^0-9a-fA-F]/', '', $value);
}

// Keep decimal digits only.
function nukeNum($value) {
    return preg_replace('/[^0-9]/', '', $value);
}

These are all pretty self-explanatory. There is one drawback: regular expressions aren't the most processor-friendly tool. This normally isn't a problem if you execute a single regex per page render; however, if you start using several regexes, you may want to consider consolidating them.
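
One possible way to consolidate, sketched here as an assumption rather than the approach the site actually used: preg_replace accepts an array as its subject, so several values that share a whitelist can be cleaned in a single call. The field names below are hypothetical. Whether this is measurably faster depends on the workload, since the pattern is still applied to each value.

```php
<?php
// Hypothetical form input; the field names are made up for illustration.
$input = array(
    'phone' => '(555) 867-5309',
    'zip'   => '90210-1234',
    'age'   => '42 years',
);

// One call cleans every value; keys of the associative array are preserved.
$clean = preg_replace('/[^0-9]/', '', $input);

print_r($clean);
```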

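There is one more type of RegEx function that we use. Instead of replacing characters, these use preg_match to test a string against a pattern. The trailing i flag makes the match case-insensitive, which is what the older eregi function (removed in PHP 7) did. Here is an example of what I use to validate an email address:

<?php
// Return true when $value looks like an email address.
// The dots are escaped (\.) so they match a literal "." rather than any character.
// Note: {2,3} only accepts top-level domains of two or three letters.
function nukeValidEmail($value) {
    return preg_match('/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$/i', $value) === 1;
}

Instead of removing bad characters, this function returns true if the email is valid and false if it is not.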
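
In a regex, an unescaped . matches any character, so the dots in an email pattern need to be written as \. to mean a literal dot. The sketch below compares the two spellings on a sample address. Both helper names are hypothetical and exist only for this comparison.

```php
<?php
// Hypothetical helper: dots left unescaped, so "." matches any character.
function emailLooksValidUnescaped($value) {
    return preg_match('/^[_a-z0-9-]+(.[_a-z0-9-]+)*@[a-z0-9-]+(.[a-z0-9-]+)*(.[a-z]{2,3})$/i', $value) === 1;
}

// Hypothetical helper: dots escaped, so "\." matches only a literal dot.
function emailLooksValidEscaped($value) {
    return preg_match('/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$/i', $value) === 1;
}

var_dump(emailLooksValidUnescaped('user@examplexcom')); // bool(true): the "x" stands in for a dot
var_dump(emailLooksValidEscaped('user@examplexcom'));   // bool(false)
var_dump(emailLooksValidEscaped('user@example.com'));   // bool(true)
```

PHP also ships a built-in check, filter_var($value, FILTER_VALIDATE_EMAIL), which may be worth considering instead of a hand-written pattern.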

This last function is what we use to validate a website address. It allows blank strings, http://, and a full website. The first two options are allowed because not everyone who creates an account on a particular site has a website:

<?php
function nukeValidWebsite($value) {
    // A blank value or a bare "http://" means the user has no website.
    if (empty($value) || $value == "http://") {
        return true;
    }
    // Only the start of the string is checked; the pattern has no end anchor,
    // so characters after the matched prefix are not validated.
    return preg_match('#^(http|ftp|https)://[-A-Za-z0-9._/]+#i', $value) === 1;
}
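
A short usage sketch for the website check, with made-up sample values. The function is restated with preg_match so the snippet runs on its own. The last call shows that, because the pattern is anchored only at the start, anything following a valid prefix is accepted too.

```php
<?php
// Same checks as the website validator above, written with preg_match.
function nukeValidWebsite($value) {
    if (empty($value) || $value == "http://") {
        return true;
    }
    return preg_match('#^(http|ftp|https)://[-A-Za-z0-9._/]+#i', $value) === 1;
}

var_dump(nukeValidWebsite(''));                         // bool(true)
var_dump(nukeValidWebsite('http://'));                  // bool(true)
var_dump(nukeValidWebsite('https://example.com/blog')); // bool(true)
var_dump(nukeValidWebsite('example.com'));              // bool(false): no scheme
var_dump(nukeValidWebsite('http://a.b <script>'));      // bool(true): no end anchor
```
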
Tags: #php #security
Thomas Hunter II

Thomas has contributed to dozens of enterprise Node.js services and has worked for a company dedicated to securing Node.js. He has spoken at several conferences on Node.js and JavaScript and is an O'Reilly published author.