
UnixReview.com
June 2007

Regular Expressions: Python's Mechanization

by Cameron Laird and Kathryn Soraiz

Much of what we do for our daytime employer, Phaseit, is automation: automation of operations on network devices, automation of test sequences, customized emailings, automation of workflow in large governmental agencies, and so on. We recently "scraped the Web" using Python in a way that's likely to interest readers.

Complications

Anything repetitive you do with a computer should trigger the reaction, "How can I mechanize this, so I don't have to type/click/select/... the same thing over and over?" Retrieval of Web pages is a challenge that has yielded a wealth of automations: bookmarks, command-line tools like lynx and wget, proprietary scripting applications, capture-and-playback tricks, browser plugins, and much more.

Part of the reason for this diversity is that the Web is successful enough to have become complicated. The end-user experience that pages target is affected by cookies, JavaScript interpretation, Referer checks, several forms of authentication, robots.txt directives, proxying, redirects, browser history, and more. How do tools balance the simplicity that makes common cases easy with the flexibility to control all the details of cookies, proxies, and the rest? One successful organizational principle is object orientation (OO).
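To make the OO principle concrete, here is a minimal sketch that uses only Python's standard library (`urllib.request`, `http.cookiejar`, `urllib.robotparser`), not the mechanize module itself. The class name `PoliteFetcher`, its methods, the default User-Agent string, and the proxy URL are illustrative assumptions. The point is that one object owns the cookie jar, proxy setting, default headers, redirect handling, and robots.txt rules: a simple request needs none of those details, yet each stays reachable when a site demands it.

```python
import http.cookiejar
import urllib.request
import urllib.robotparser


class PoliteFetcher:
    """Hypothetical helper (not mechanize's API): one object that bundles
    cookies, proxying, default headers, redirects, and robots.txt rules."""

    def __init__(self, user_agent="example-scraper/0.1", proxy_url=None):
        self.user_agent = user_agent
        # The jar keeps cookies the server sets and replays them later.
        self.cookies = http.cookiejar.CookieJar()
        handlers = [urllib.request.HTTPCookieProcessor(self.cookies)]
        if proxy_url is not None:
            # An explicit ProxyHandler replaces the environment-derived default.
            handlers.append(urllib.request.ProxyHandler(
                {"http": proxy_url, "https": proxy_url}))
        # build_opener() also installs default handlers, including
        # HTTPRedirectHandler, so redirects are followed automatically.
        self.opener = urllib.request.build_opener(*handlers)
        self.opener.addheaders = [("User-Agent", user_agent)]
        # Start with an empty rule set, which permits every URL.
        self.load_robots([])

    def load_robots(self, lines):
        """Replace the robots.txt rules with ones parsed from text lines."""
        self.robots = urllib.robotparser.RobotFileParser()
        self.robots.parse(lines)

    def allowed(self, url):
        """True if the current robots.txt rules let our agent fetch url."""
        return self.robots.can_fetch(self.user_agent, url)

    def get(self, url, referer=None):
        """Fetch url (if robots.txt permits), optionally sending a Referer."""
        if not self.allowed(url):
            raise PermissionError(f"robots.txt disallows {url}")
        request = urllib.request.Request(url)
        if referer is not None:
            request.add_header("Referer", referer)
        with self.opener.open(request) as response:
            return response.read()
```

A call such as `PoliteFetcher().get("http://example.com/", referer="http://example.com/start")` would send the Referer header and keep any cookies the server sets in the object's `cookies` jar for subsequent requests. Nothing here interprets JavaScript; like most such tools, it speaks only HTTP.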
