Recursively download website files using WGET

Posted by & filed under Linux, OS X.

I use the following command to recursively download a bunch of files from a website to my local machine. It is great for working with open directories of files, e.g. those made available from the Apache web server. The following can be added to your .bash_profile or .bashrc script, depending on which your OS/distro recommends:… Read more »
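The excerpt above truncates the actual command; as a sketch (the function name and exact flags here are my own illustration, not necessarily the post's), a recursive-download helper for .bashrc might look like:

```shell
# Hypothetical helper (name and flags are illustrative, not the post's own):
# recursively mirror an open directory listing into the current folder.
wgetod() {
    # -r  recurse into links on each page
    # -np never ascend to the parent directory
    # -nH don't create a top-level hostname directory
    # -R  skip the Apache-generated directory index pages
    wget -r -np -nH -R "index.html*" "$1"
}
```

Invoke it as `wgetod http://example.com/files/` and wget recreates the remote directory tree locally.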

PHP Web Scraping Book

Posted by & filed under PHP.

I just referred someone to this book on building PHP web scrapers after they emailed me asking about my personal PHP web scraper project. Unfortunately I don’t have time for a lot of freelance work these days, but I’m always willing to suggest a good book or point someone to a pro.

Open Sourcing EVE Crawler

Posted by & filed under Open Source.

PHP Eve Crawler on GitHub This is a project I started working on and abandoned in 2009. It is a spider built specifically for crawling websites that contained EVE kill mails. In the game of EVE, every time you kill a player, an in-game ‘mail’ is sent to you containing information. Players would… Read more »

Open Sourcing my PHP Web Scraper

Posted by & filed under Open Source, PHP.

PHP Web Scraping Engine I started this project in January 2011. It was going to be an easy-to-use web scraper that anyone could configure. It has an attractive GUI built with jQuery UI elements. The selectors can be entered using three different methods; the first is the tried… Read more »

Requests for PHP

Posted by & filed under PHP.

Requests for PHP Here’s a pretty cool PHP library for making HTTP requests. It handles all of the nasty cURL stuff behind the scenes and just leaves you with a clean “API” for making requests.

$headers = array('Accept' => 'application/json');
$options = array('auth' => array('user', 'pass'));
$request = Requests::get('https://api.github.com/gists', $headers, $options);
var_dump($request->status_code); // int(200)
var_dump($request->headers['content-type']);… Read more »

Installing PECL_HTTP on Debian

Posted by & filed under Linux, PHP, Web Server.

I was recently tasked with getting the pecl_http package installed on a server. I already had PECL all set up (which can be its own nightmare), and I had cURL installed. But there was a mystery package which needed to be installed first.

tlhunter@amalthea:~ $ sudo pecl install pecl_http
downloading pecl_http-1.7.4.tgz …
Starting to download pecl_http-1.7.4.tgz… Read more »

PHP cURL Replacement

Posted by & filed under PHP.

If you are a PHP developer who writes a lot of software which needs to be executed on many different shared hosts, it can often be frustrating when certain hosts don’t offer all of the functionality your applications require, specifically the cURL libraries. I’ve seen these missing on several hosts, either for security reasons or… Read more »
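A minimal sketch of the idea (my own illustration, not the post’s actual code): when cURL is missing, PHP’s file_get_contents() with a stream context can still make simple GET requests, assuming the host has allow_url_fopen enabled:

```php
<?php
// Illustrative fallback, not the post's code: a plain GET without cURL.
// Requires allow_url_fopen, which some shared hosts also disable.
function http_get($url, $timeout = 10) {
    $context = stream_context_create(array(
        'http' => array(
            'method'  => 'GET',
            'timeout' => $timeout,
        ),
    ));
    // Returns the response body, or false on failure.
    return file_get_contents($url, false, $context);
}
```

This covers basic GETs; anything needing cookies, redir. handling, or POST bodies takes more context options, which is where cURL usually earns its keep.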

Web Spidering

Posted by & filed under PHP, Security.


Spidering, in its simplest form, is the act of transferring data from one database to another. Spidering requires the use of regular expressions, the cURL library (if POST data or cookies are used), and cron (if we need to download information on a schedule).
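As a small illustration of the regular-expression step (my own example, not the post’s code), extracting link URLs and titles from fetched HTML can be done with preg_match_all():

```php
<?php
// Illustrative only: pull link URLs and titles out of an HTML snippet,
// the core extraction step of a simple spider.
$html = '<a href="/post/1">First</a> <a href="/post/2">Second</a>';

// PREG_SET_ORDER groups each match's captures together: [full, url, title]
preg_match_all('#<a href="([^"]+)">([^<]+)</a>#', $html, $matches, PREG_SET_ORDER);

$links = array();
foreach ($matches as $m) {
    $links[] = array('url' => $m[1], 'title' => $m[2]);
}
```

For anything beyond simple, predictable markup, a DOM parser tends to be more robust than regular expressions, but regexes are often good enough for a targeted spider.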