PHP Navigation System using Single Entry Point

5 min read Jul 3, 2006

DEPRECATED: This post may no longer be relevant or contain industry best-practices.

The traditional PHP application uses a really simple method for rendering pages. Let's say that you visit the contact page of a website. In your address bar, you go to sitename.com/contact.php. This PHP script first renders the header (that is, includes it), performs the actions in the body of the page, then renders the footer. So, every time you create a new page, you have to do something like this:

<?php
include("header.php");
do_stuff();
include("footer.php");
?>

Look familiar? Chances are, this was how one of your first websites looked, or quite possibly ones you are still developing. Now, why is this bad? Let's say that you want to change the name of your header file. You would need to modify every single page of your website, not to mention that redundant code in every file is bad anyway. Also, say that you want to run an extra function on all of your pages which you can't stick in the footer file. This gives us the same problem of having to modify every page.

The solution to this problem is to use a PHP system with a single entry point. How this works is that any page a person visits is routed to a main PHP file (e.g. index.php) which loads the header, then includes our script, and loads a footer.

Here is the classical approach adopted by a lot of people doing this:

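<?php
// Read the page id from the query string instead of relying on register_globals.
$id = isset($_GET['id']) ? $_GET['id'] : '';
switch ($id) {
    case "1": include('blah1.php'); break;
    case "2": include('blah2.php'); break;
    case "3": include('blah3.php'); break;
    case "4": include('blah4.php'); break;
    default:  include('blah.php');
}
?>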

You would put your header before the code and your footer after. If you want to link to a specific page you would use the link <a href="blah.php?id=1">Blah1</a>.

This is simple, it works, and it is hard to abuse: only the files listed in the switch can ever be included, no matter what the user types. But can you see the problem? What if you have 1000 pages? That is a lot of code to write. Every time you want to add another page, you have to modify your code again.

Here is our solution. First, the data cleaning function, which strips every character other than letters:

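// Strip every character that is not a letter.
// preg_replace replaces ereg_replace, which was removed in PHP 7.
function nukeAlpha($value) {
    return preg_replace('/[^a-zA-Z]/', '', $value);
}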

All data that you work with coming from a user should be cleaned, and this regular expression approach has worked better for us than anything else we've come across. If you pass this function banana, it will return banana. If you pass it &)(*)(&#)a*p3p@l3e, it will return apple.
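To make the behaviour concrete, here is a small standalone sketch that runs the two inputs mentioned above through the cleaning function. It assumes a current PHP version, so it uses preg_replace (ereg_replace no longer exists in PHP 7 and later).

```php
<?php
// Strip every character that is not a letter.
function nukeAlpha($value) {
    return preg_replace('/[^a-zA-Z]/', '', $value);
}

echo nukeAlpha('banana'), "\n";              // prints "banana"
echo nukeAlpha('&)(*)(&#)a*p3p@l3e'), "\n";  // prints "apple"
```

Digits, punctuation, dots, and slashes are all dropped, which is what keeps path tricks such as ../ out of the file name built later.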

Say that all the files you want to include are in a directory called includes. We use this line of code to store that path:

$inc_dir = './includes/';

Now we get the info about the page that the user wants to look at (if not specified the page defaults to news):

$s = isset($_GET['s']) ? nukeAlpha($_GET['s']) : "news";

The data is cleaned by our previous function and the page the user is looking for is set to $s. Now, we attach the directory that we include from with the page name followed by .php:

$inclusion = $inc_dir . $s . '.php';

However, what if the file doesn't exist (like the user types in some erroneous value)? Well, we would get a PHP level error. What I do is first see if the file exists and then try including it:

if (file_exists($inclusion)) {
    include_once($inclusion);
} else {
    // Fall back to the custom 404 page ($inc_dir already ends in a slash).
    include_once($inc_dir . 'error.php');
}
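Putting the pieces together, the whole entry point might look roughly like the sketch below. The resolvePage helper, the is_string check, the use of __DIR__, and the PHP_SAPI guard are my own additions for illustration, and header.php and footer.php are assumed to sit next to index.php.

```php
<?php
// index.php: a sketch of the single entry point assembled from the steps above.

// Strip every character that is not a letter.
function nukeAlpha($value) {
    return preg_replace('/[^a-zA-Z]/', '', $value);
}

// Hypothetical helper: map the requested page name to a file inside $incDir,
// falling back to news.php for an empty request and error.php for a missing page.
function resolvePage($requested, $incDir) {
    // ?s[]=x would arrive as an array, so treat anything but a string as empty.
    $page = is_string($requested) ? nukeAlpha($requested) : '';
    if ($page === '') {
        $page = 'news';
    }
    $path = $incDir . $page . '.php';
    return file_exists($path) ? $path : $incDir . 'error.php';
}

// Render only when served over the web, so the helper above can also be
// exercised from the command line.
if (PHP_SAPI !== 'cli') {
    $inc_dir = __DIR__ . '/includes/';
    $s = isset($_GET['s']) ? $_GET['s'] : 'news';

    include 'header.php';                    // assumed to sit next to index.php
    include_once resolvePage($s, $inc_dir);  // the requested page, or error.php
    include 'footer.php';
}
```

Adding a page then means dropping a new file into the includes directory; index.php itself never changes.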

Linking, using this system, works like this: <a href="main.php?s=example">Example</a>. That link will try to pull example.php from the includes directory. If the file does not exist, error.php is included instead (which would be your custom 404 page).

Quick tip: If you use this system and your base page is index.php, you can use the link structure of <a href="?s=blah">Blah</a>. Notice how the base PHP file name is unnecessary.

Why this is better than other methods:

Custom 404 pages
Just upload the file into your include directory and make a link
You only have to write the header/footer data in one location
Overall less code writing and redundancy

If you want other files in the include directory that end in .php but should not be reachable through a link (such as a db_connection.php file), give them a .ssi.php extension instead. This way, even if a user knows the filenames on your server, the . is stripped from the value passed through the URL by the cleaning function, so the resulting name never matches and only the files that you want can be included.
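A quick way to convince yourself of this is to build the include path for a hostile request and look at the result. The buildInclusion helper below is a hypothetical name that simply mirrors the path-building line from earlier.

```php
<?php
// Strip every character that is not a letter.
function nukeAlpha($value) {
    return preg_replace('/[^a-zA-Z]/', '', $value);
}

// Hypothetical helper mirroring the path-building step of the entry point.
function buildInclusion($requested, $incDir = './includes/') {
    return $incDir . nukeAlpha($requested) . '.php';
}

echo buildInclusion('contact'), "\n";            // ./includes/contact.php
echo buildInclusion('db_connection.ssi'), "\n";  // ./includes/dbconnectionssi.php
```

The real file is db_connection.ssi.php, so the second path does not exist and the visitor gets error.php instead.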

Now, let's take this a step further and make the links a little bit prettier (in general, ?'s should be avoided whenever possible).

My old website had content loaded from PHP and MySQL. In order to load different section names, the section to be loaded was passed by a GET variable. The structure of the url was this: index.php?section=section_name, which was further truncated to ?s=section_name.

The ?'s are unappealing and not friendly with many forums, bulletin boards, and search engines. So, for version 13 we used Apache's mod_rewrite. It takes one URL and changes it into another URL without the user knowing. The system that we are using takes xxx.htm and converts it to index.php?s=xxx. The .htm files do not physically exist on the server.

This is the simple version of my .htaccess file:

RewriteEngine On
RewriteRule ^([a-zA-Z0-9_]+)\.htm$ index.php?s=$1 [QSA]

The first line is needed to enable the rewrite engine and must appear before all other rewrite rules. The second line is a little more tricky. Its first part is the RewriteRule directive itself, which every rule starts with.

The second part is a regular expression that describes the URL the user is trying to reach. ^ represents the start of the URL and $ represents the end. The part in parentheses is a group that can match a variable number of characters. a-z represents all lowercase letters, A-Z all uppercase letters, and 0-9 all digits. The _ character matches underscores. Everything in the square brackets matches exactly one character, and the plus sign means one or more characters of that type (any run of letters, numbers, and underscores). Everything in the parentheses is remembered and made available as $1. If you had another set of parentheses, that would be $2. The \. matches a literal dot (a bare . would match any character). Overall, this matches a name of variable length ending in .htm.

The third chunk is what the URL gets rewritten to, in our case index.php?s=$1. The $1 mentioned before is now put to use: it is tacked onto the URL and passed as the GET variable s.

The fourth chunk is optional. QSA is a mod_rewrite flag that stands for Query String Append. It means that any other GET variables on the .htm URL are also passed along to index.php.
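If the rule is easier to follow as code, the sketch below imitates it in PHP. This is only an illustration of what the pattern, the $1 substitution, and QSA do (with the dot before htm escaped so it matches literally); the real work happens inside Apache, and rewriteHtm is a made-up name.

```php
<?php
// Hypothetical stand-in for the rewrite rule: returns the rewritten URL,
// or null when the requested path does not match the pattern.
function rewriteHtm($path, $query = '') {
    // Same pattern as the rule: letters, digits, underscores, then ".htm".
    if (!preg_match('/^([a-zA-Z0-9_]+)\.htm$/', $path, $m)) {
        return null;
    }
    $target = 'index.php?s=' . $m[1];  // $m[1] plays the role of $1
    if ($query !== '') {
        $target .= '&' . $query;       // what QSA does with the original query
    }
    return $target;
}

echo rewriteHtm('contact.htm'), "\n";         // index.php?s=contact
echo rewriteHtm('news.htm', 'page=2'), "\n";  // index.php?s=news&page=2
var_dump(rewriteHtm('some/path.htm'));        // NULL: slashes are not in the character class
```

Without QSA, the page=2 in the second request would be dropped, because the rule's own query string replaces the original one.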

Now, you can always take these rules a step further and model your code after some of the popular PHP frameworks:

RewriteEngine On
RewriteRule ^([a-zA-Z0-9_]+)$ index.php?x_controller=$1 [QSA]
RewriteRule ^([a-zA-Z0-9_]+)/([a-zA-Z0-9_]+)$ index.php?x_controller=$1&x_method=$2 [QSA]
RewriteRule ^([a-zA-Z0-9_]+)/([a-zA-Z0-9_]+)/([a-zA-Z0-9_]+)$ index.php?x_controller=$1&x_method=$2&x_id=$3 [QSA]
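On the PHP side, index.php then has to turn x_controller, x_method, and x_id into a call. The sketch below is one possible way to do that and is not taken from any particular framework; NewsController, dispatch, and the controller map are all hypothetical names, and $params stands in for $_GET.

```php
<?php
// Hypothetical controller used only for illustration.
class NewsController {
    public function index($id = null) { return 'news index'; }
    public function view($id = null)  { return 'news item ' . (int) $id; }
}

// Hypothetical dispatcher: $params stands in for $_GET after the rewrite.
function dispatch(array $params, array $controllers) {
    $name   = isset($params['x_controller']) ? $params['x_controller'] : 'news';
    $method = isset($params['x_method']) ? $params['x_method'] : 'index';
    $id     = isset($params['x_id']) ? $params['x_id'] : null;

    // Only controllers listed in the map can be reached.
    if (!is_string($name) || !isset($controllers[$name])) {
        return '404';
    }
    $class = $controllers[$name];
    $controller = new $class();

    // Only public methods that exist on the controller can be called.
    if (!is_string($method) || !is_callable(array($controller, $method))) {
        return '404';
    }
    return $controller->$method($id);
}

$controllers = array('news' => 'NewsController');

// A request for /news/view/7 arrives as these three GET variables.
echo dispatch(array('x_controller' => 'news', 'x_method' => 'view', 'x_id' => '7'), $controllers), "\n";
// prints "news item 7"
```

The explicit map plays the same role as the switch statement from the start of the article: user input selects from a fixed list and never names a file or class directly.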


Thomas has contributed to dozens of enterprise Node.js services and has worked for a company dedicated to securing Node.js. He has spoken at several conferences on Node.js and JavaScript and is an O'Reilly published author.