© copyright 01.Jan.2002 by Paul Bradley filed under Perl
When it comes to working with XML in Perl, there are hundreds of CPAN modules to choose from, each supporting different aspects of Web Services. However, XML::Parser, which is bundled with many Perl distributions, is enough for simple jobs. This short tutorial demonstrates using Perl and XML::Parser to grab news headlines from The Register's RDF feed.
The first three lines of code start all my CGI Perl scripts. The first is the path to Perl, with warnings switched on via -w. The second disables buffering on STDOUT, which is a good thing for CGI scripts, and the third turns on the strict pragma to restrict unsafe constructs. The next three lines load the additional modules this script requires, and then we print the Content-type header to the browser.
#!/usr/bin/perl -w
$|++;
use strict;
use XML::Parser;
use LWP::UserAgent;
use HTTP::Status;
print "Content-type: text/html\n\n";
The script sets some initial values for the variables which will be used.
my $tag = "";
my $html = "";
my $link = "";
my $title = "";
my $xmlcontent = "";
my $description = "";
Next, we define a user agent to fetch the contents of the RDF feed. We do this with the LWP::UserAgent module, setting a valid agent string (Mozilla/4.0) and a timeout of 20 seconds.
my $ua = new LWP::UserAgent;
$ua->agent('Mozilla/4.0');
$ua->timeout(20);
my $req = new HTTP::Request(
    GET => 'http://www.theregister.co.uk/tonys/slashdot.rdf');
my $rdffile = $ua->request($req);
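The same set-up can also be written with direct method calls rather than the indirect `new Class` form. The following is a minimal sketch, assuming the `agent` and `timeout` constructor options of LWP::UserAgent; it builds the agent and the request but deliberately stops short of sending anything.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $url = 'http://www.theregister.co.uk/tonys/slashdot.rdf';

# Agent string and timeout are passed straight to the constructor.
my $ua = LWP::UserAgent->new(
    agent   => 'Mozilla/4.0',
    timeout => 20,
);

# Build the GET request; nothing is sent until $ua->request($req) is called.
my $req = HTTP::Request->new(GET => $url);

print $req->method, ' ', $req->uri, "\n";
```

Passing the options to the constructor keeps the configuration in one place, which can be easier to read once more settings are added.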
We need to validate that the request was successful (i.e. we got a server 200 response from the site). If the request was unsuccessful for any reason, we display an appropriate message.
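The success check can be tried offline by building HTTP::Response objects by hand. This is only a sketch: the two responses below are hypothetical stand-ins for what the user agent would return, and the `status_line` method is used here to produce a more specific failure message than the script itself prints.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hand-built responses standing in for the result of $ua->request.
my @responses = (
    HTTP::Response->new(200, 'OK', undef, '<rdf/>'),
    HTTP::Response->new(404, 'Not Found'),
);

my @report;
for my $res (@responses) {
    if ($res->is_success) {
        # Any 2xx status counts as success.
        push @report, 'fetched ' . length($res->content) . ' bytes';
    } else {
        # status_line combines the numeric code and the reason phrase.
        push @report, 'request failed: ' . $res->status_line;
    }
}

print "$_\n" for @report;
```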
If the request was successful, we store the actual page content in $xmlcontent. We then create an instance of the parser object in $parser and set up some handlers that point to our user-defined subroutines. XML::Parser is an event-based parser, meaning that certain conditions trigger handler functions. For example, below I have defined a handler for the Start event that runs the subroutine startElement; every time a start tag is parsed within the file, startElement will be executed.
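if ($rdffile->is_success) {
    $xmlcontent = $rdffile->content;
    # Mask ampersands before parsing; characterData reverses this.
    $xmlcontent =~ s/&/{amp}/g;
    my $parser = new XML::Parser(Style => 'Stream');
    # Handlers are passed as references to the subroutines.
    $parser->setHandlers(
        Start   => \&startElement,
        End     => \&endElement,
        Char    => \&characterData,
        Default => \&default);  # requires a sub named default, not shown here
    $parser->parsestring($xmlcontent);
} else {
    print "News site is not responding.";
}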
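To see the event model in isolation, the following sketch parses a small hand-written document and records each event in the order the parser fires it. The sample XML and the `@events` array are assumptions made for illustration; the handlers are passed through the `Handlers` constructor option instead of setHandlers.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical sample document, standing in for the real feed.
my $xml = '<rdf><item><title>Hello</title>'
        . '<link>http://example.com/</link></item></rdf>';

my @events;   # one entry per event, in firing order

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub { my ($p, $el)   = @_; push @events, "start:$el" },
        End   => sub { my ($p, $el)   = @_; push @events, "end:$el" },
        Char  => sub { my ($p, $text) = @_; push @events, "char:$text" },
    },
);

$parser->parse($xml);
print "$_\n" for @events;
```

Each handler receives the parser instance as its first argument, followed by the element name or the text, which is the same calling convention the subroutines in this tutorial rely on.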
All that's left is to define what happens in the case of each type of event.
When the parser encounters a start tag, e.g. <title>, it will execute the startElement subroutine because it was registered in the setHandlers call above. I am simply setting the $tag variable to the current element so I can keep track of which tag the parser is currently working on.
sub startElement {
    my($parseinst, $element, %attrs) = @_;
    $tag = "";
    SWITCH: {
        if ($element eq "title") {
            $tag = "title";
            last SWITCH;
        }
        if ($element eq "link") {
            $tag = "link";
            last SWITCH;
        }
        if ($element eq "description") {
            $tag = "description";
            last SWITCH;
        }
    }
}
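Since every branch simply copies the element name into $tag, the same bookkeeping can be sketched with a lookup hash. This is an alternative formulation, not the original code; the `%tracked` hash is a hypothetical name introduced here, and the handler is called by hand so the sketch runs without a parser.

```perl
use strict;
use warnings;

# Elements whose text we care about (hypothetical helper hash).
my %tracked = map { $_ => 1 } qw(title link description);

my $tag = "";

sub startElement {
    my ($parseinst, $element, %attrs) = @_;
    # Remember the element only if it is one we track; otherwise clear $tag.
    $tag = $tracked{$element} ? $element : "";
}

# Call the handler directly, without a parser, to show the effect.
startElement(undef, "title");
print "after <title>: '$tag'\n";
startElement(undef, "item");
print "after <item>: '$tag'\n";
```

Adding another element to track then means adding one word to the list rather than another if block.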
The endElement subroutine is executed every time an end tag is encountered in the XML file e.g. </title>. If the element is equal to an end item tag (</item>) then I print a link to the news article displaying its title and description.
sub endElement {
    my($parseinst, $element, %attrs) = @_;
    $tag = "";
    SWITCH: {
        if ($element eq "item") {
            # Emit one linked headline per <item>.
            $html = "<p><a href=\"$link\">$title\n</a><br>$description</p>";
            print $html;
            $html = "";
            last SWITCH;
        }
    }
}
The characterData subroutine simply assigns the actual data to the correct variable ready for processing in the endElement subroutine.
sub characterData {
    my($parseinst, $data) = @_;
    # Restore every ampersand that was masked before parsing.
    $data =~ s/\{amp\}/&/g;
    if ($tag eq "title") {
        $title = $data;
    }
    if ($tag eq "link") {
        $link = $data;
    }
    if ($tag eq "description") {
        $description = $data;
    }
}
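One thing to be aware of: the XML::Parser documentation notes that a single run of text may be delivered through several calls to the Char handler, in which case a plain assignment keeps only the last piece. The sketch below is a variation under that assumption, not the original code. It appends to the variable instead and resets it when the element opens; the sample XML and the `$calls` counter are hypothetical and exist only for illustration.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical sample; the entity makes split delivery more likely.
my $xml = '<item><title>Fish &amp; Chips</title></item>';

my $tag   = "";
my $title = "";
my $calls = 0;   # how many times the Char handler fired inside <title>

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($p, $el) = @_;
            $tag   = $el;
            $title = "" if $el eq "title";   # start each title afresh
        },
        End  => sub { $tag = "" },
        Char => sub {
            my ($p, $data) = @_;
            return unless $tag eq "title";
            $calls++;
            $title .= $data;                 # append rather than overwrite
        },
    },
);

$parser->parse($xml);
print "title: $title (delivered in $calls call(s))\n";
```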
About the Author
Paul Bradley is a VB.NET software developer living and working in Cumbria. He provides PHP & MySQL bespoke development services via his software development company, Carlisle Software Limited.
He has over 20 years of programming experience.