Friday, February 16, 2007

How to use Python and Mechanize to grab a web page

In a previous post, I looked at how to grab a web page using urllib2. Mechanize looks to be the best Python package for automated web browsing or spidering the web. Its author also wrote ClientForm and ClientCookie, both of which are included in Mechanize. I also looked at Mechanoid, a fork of Mechanize, but Mechanize looks like the way to go: it is still under active development. The author has given examples of how to access a password-protected page using Mechanize. This first example just shows how to grab a page without password protection.

How to:
1. Install Easy Install
2. Open "cmd.exe"
3. Run "cd c:\python24\Scripts"
4. Run "easy_install mechanize"
5. Run the following Python code:
from mechanize import Browser

# Create a browser object and fetch the page
br = Browser()
br.open("http://www.yahoo.com")
# Print the raw HTML of the response
print(br.response().read())
That's it!
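
If you want to grab more than one page, the same three calls can be wrapped in a small helper. This is only a minimal sketch: the function name fetch_page and its optional browser argument are my own, not part of Mechanize. It assumes nothing beyond the Browser(), open() and response().read() calls used above.

```python
def fetch_page(url, browser=None):
    """Open url and return the raw body of the response.

    fetch_page and the browser parameter are hypothetical names used
    for illustration; they are not part of the mechanize package.
    """
    if browser is None:
        # Imported lazily so the helper can also be called with a
        # stand-in object on a machine without mechanize installed.
        from mechanize import Browser
        browser = Browser()
    browser.open(url)
    return browser.response().read()
```

Calling fetch_page("http://www.yahoo.com") should return the same HTML that the script above prints. Passing in your own Browser object lets several requests share one browser.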
