Clean user data using PHP and Regular Expressions

DEPRECATED: This post has been marked as deprecated and may no longer contain industry best-practices.

These functions use regular expressions to check the data. What is a regex? It is basically a pattern-matching language. You can compare a string to a regex to see whether it is valid, or you can use a regex to strip invalid characters.

Here is a regex function that strips a string down to letters, numbers, and the underscore character:

<?php
function nukeAlphaNum($value) {
    // Remove every character that is not a letter, digit, or underscore.
    return preg_replace('/[^a-zA-Z0-9_]/', '', $value);
}

The function takes one argument, a string, and returns another string that contains only lowercase letters (a-z), uppercase letters (A-Z), digits (0-9), and the underscore (_). preg_replace takes three arguments: the regex, the replacement string (in this case an empty string), and the string being sifted through (the one we pass to the function). The older ereg_replace function, which was deprecated in PHP 5.3 and removed in PHP 7, took the same three arguments but without the pattern delimiters.
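
As a quick illustration of the stripping behaviour described above, the sketch below runs the function on a few made-up inputs. It is written against preg_replace, on the assumption that the code runs on PHP 7 or later, where the ereg family is no longer available.

```php
<?php
// Self-contained copy of the function, written against preg_replace
// (assumption: PHP 7 or later).
function nukeAlphaNum($value) {
    return preg_replace('/[^a-zA-Z0-9_]/', '', $value);
}

// Hypothetical inputs, chosen only to show what gets stripped.
echo nukeAlphaNum("hello world!"), "\n";   // helloworld
echo nukeAlphaNum("user_name-42"), "\n";   // user_name42
echo nukeAlphaNum("<b>bold</b>"), "\n";    // bboldb
```

Note that the function silently rewrites its input rather than rejecting it, so "<b>bold</b>" becomes "bboldb" instead of raising an error.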

Here are some others that we use:

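<?php
// preg_replace (PCRE) is used here because the ereg_* functions were removed in PHP 7.

// Keep letters only.
function nukeAlpha($value) {
    return preg_replace('/[^a-zA-Z]/', '', $value);
}

// Keep hexadecimal digits only.
function nukeHex($value) {
    return preg_replace('/[^0-9a-fA-F]/', '', $value);
}

// Keep decimal digits only.
function nukeNum($value) {
    return preg_replace('/[^0-9]/', '', $value);
}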

These are all pretty self-explanatory. There is one drawback: regular expressions are not the most processor-friendly tool. This normally isn't a problem if you run a single regex per page render, but if you start using several regexes, you may want to consider consolidating them.
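
One possible reading of "consolidating" is to fold the stripping functions into a single helper driven by a table of character classes. The sketch below is only an illustration: the name nukeChars and the type keys are hypothetical, it assumes preg_replace on PHP 7 or later, and whether it saves any processor time would need to be measured.

```php
<?php
// Hypothetical consolidated helper: one function, one table of character classes.
// The name nukeChars and the $type keys are illustrative, not from the original post.
function nukeChars($value, $type) {
    static $classes = [
        'alphanum' => 'a-zA-Z0-9_',
        'alpha'    => 'a-zA-Z',
        'hex'      => '0-9a-fA-F',
        'num'      => '0-9',
    ];
    if (!isset($classes[$type])) {
        throw new InvalidArgumentException("Unknown type: $type");
    }
    // Build the negated character class and strip everything outside it.
    return preg_replace('/[^' . $classes[$type] . ']/', '', $value);
}

echo nukeChars("ff-00-AZ", 'hex'), "\n";        // ff00A
echo nukeChars("(555) 123-4567", 'num'), "\n";  // 5551234567
```

Each call still runs exactly one regex, so the gain is mainly in having a single place to maintain the patterns.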

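There is one more type of regex function that we use. Rather than replacing characters, these test a string against a pattern with preg_match and the case-insensitive i modifier, which is the role the older eregi function played before it was removed in PHP 7. Here is an example of what I use to validate an email address:

<?php
function nukeValidEmail($value) {
    // Dots are escaped so they match a literal "." rather than any character.
    // The D modifier makes $ match only at the very end of the string.
    return preg_match('/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$/iD', $value) === 1;
}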

Instead of removing bad characters, this function returns true if the email address is valid and false if it is not.
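
To make the true/false behaviour concrete, here is a small sketch that runs the validator on a few made-up addresses. It assumes preg_match on PHP 7 or later and assumes the dots in the pattern are meant as literal dots, so they are escaped here.

```php
<?php
// Self-contained copy of the validator (assumptions: preg_match, escaped dots).
function nukeValidEmail($value) {
    $pattern = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$/iD';
    return preg_match($pattern, $value) === 1;
}

// Hypothetical addresses, chosen only to show what passes and what fails.
var_dump(nukeValidEmail("user@example.com"));       // bool(true)
var_dump(nukeValidEmail("First.Last@Example.CO"));  // bool(true), matching is case-insensitive
var_dump(nukeValidEmail("user@example"));           // bool(false), no top-level domain
var_dump(nukeValidEmail("user@example.info"));      // bool(false), "info" exceeds {2,3}
```

The last case shows a limit of the pattern: the final group accepts only two or three letters, so longer top-level domains are rejected.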

This last function is what we use to validate a website address. It accepts a blank string, a bare http://, or a full website address. The first two options are allowed because not everyone who creates an account on a particular site has a website:

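<?php
function nukeValidWebsite($value) {
    // Accept an http, https, or ftp address (case-insensitive, as eregi was).
    // "#" is the pattern delimiter because the pattern itself contains "/".
    if (preg_match('#^(http|ftp|https)://[-A-Za-z0-9._/]+#i', $value))
        return true;
    // Also accept a blank value or the bare "http://" prefix.
    else if (empty($value) || $value == "http://")
        return true;
    else
        return false;
}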
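
The three accepted cases described above can be checked with a short sketch. It assumes preg_match on PHP 7 or later, and the sample values are made up for illustration.

```php
<?php
// Self-contained copy of the validator (assumption: preg_match with "#" delimiters).
function nukeValidWebsite($value) {
    if (preg_match('#^(http|ftp|https)://[-A-Za-z0-9._/]+#i', $value)) {
        return true;
    }
    // Blank values and the bare "http://" prefix are also accepted.
    return empty($value) || $value == "http://";
}

var_dump(nukeValidWebsite(""));                          // bool(true)
var_dump(nukeValidWebsite("http://"));                   // bool(true)
var_dump(nukeValidWebsite("https://example.com/blog"));  // bool(true)
var_dump(nukeValidWebsite("example.com"));               // bool(false), no scheme
var_dump(nukeValidWebsite("http://example.com/<b>"));    // bool(true), only the prefix is checked
```

Because the pattern is anchored only at the start, anything may follow a valid prefix, so the value should still be escaped before it is printed back into a page.
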
Tags: #php #security
Thomas Hunter II

Thomas is the author of Advanced Microservices and is a prolific public speaker with a passion for reducing complex problems into simple language and diagrams. His career includes working at Fortune 50's in the Midwest, co-founding a successful startup, and everything in between.