
If you're interested in website scraping, here's an excerpt from a post I made this weekend:

Website Scraping Platforms

Web Harvest is an open source, Java-based platform geared towards website data extraction. As they put it, Web Harvest "offers a way to collect desired Web pages and extract useful data from them. In order to do that, it leverages well established techniques and technologies for text/xml manipulation such as XSLT, XQuery and Regular Expressions. Web Harvest mainly focuses on HTML/XML based web sites which still make vast majority of the Web content."
Web Harvest looks to be extremely powerful and flexible, and it's free, which is always nice. If you're able to write code in Java, you may want to look at it pretty closely.
The Twit88 blog has two excellent tutorials on using Java and Web Harvest to extract data from websites: Web Scraping using Web Harvest, and Java - Writing a Web Page Scraper or Web Data Extraction Tool.
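
To give a feel for the kind of text/XML techniques mentioned above, here is a minimal sketch in plain Java. It does not use Web Harvest or its configuration format at all; it only uses the JDK's built-in XPath and regular expression classes. The sample page, the class name ScrapeSketch, and the method names are all made up for illustration, and the sketch assumes the markup is well-formed XML, which real-world HTML often is not.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ScrapeSketch {
    // Hypothetical, well-formed sample page (an assumption for this sketch).
    public static final String PAGE =
        "<html><body><ul>"
        + "<li class=\"book\"><span class=\"title\">Dune</span> <span class=\"price\">$9.99</span></li>"
        + "<li class=\"book\"><span class=\"title\">Emma</span> <span class=\"price\">$4.50</span></li>"
        + "</ul></body></html>";

    // Pull the text of every title span out of the markup with an XPath query.
    public static List<String> titles(String xml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            "//li[@class='book']/span[@class='title']",
            new InputSource(new StringReader(xml)),
            XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent());
        }
        return out;
    }

    // Pull dollar amounts out of raw text with a regular expression.
    public static List<String> prices(String text) {
        Matcher m = Pattern.compile("\\$(\\d+\\.\\d{2})").matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1)); // the amount without the leading dollar sign
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titles(PAGE));
        System.out.println(prices(PAGE));
    }
}
```

The XPath query depends on the page's structure, while the regular expression only cares about the text pattern, so the two tend to complement each other. A dedicated platform takes care of the fetching and the cleanup of messy HTML that this sketch skips.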
Thanks to MIT's SIMILE Project, you can use two of their programs - Piggy Bank and Solvent - to turn your copy of Mozilla Firefox into a data scraping platform. Both plugins are free under the BSD License, and come with sample scrapers to help you get started.

Read the rest of the post at the BookMarkMoney.com Blog.