Friday, May 21, 2010

How to parse XML file using Perl?

Please explain it to me step by step. I am new to Perl.

E.g., using XML::Parser or XML::DOM

You're looking in the right place if you found those modules on CPAN (you do know about CPAN, right?). Parsing XML is a tricky thing, but there are basically two approaches. I'll describe both, and I hope that by the end you'll have a better idea of exactly what you want to do.





Approach number one is called SAX parsing. This involves setting up a series of functions (or subs in Perl) that "handle" certain "events." Let's say that you have an XML document like this:





<people>
<person><name>Joe</name><age>25</age><...
<person><name>Sally</name><age>20</age...
</people>





If you are parsing it using SAX you'd set up some code like this:





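my ($inNameTag, $charArray) = (0, "");

# Called by the parser at the start of each tag (XML::Parser-style arguments).
sub start_element {
  my ($parser, $tagName, %attributes) = @_;
  if ( $tagName eq "name" ) {
    $inNameTag = 1;
  }
}

# Called by the parser for the text between tags.
sub characters {
  my ($parser, $char) = @_;
  if ( $inNameTag ) {
    $charArray = $charArray . $char;
  }
}

# Called at each end tag; without this the flag would never be reset.
sub end_element {
  my ($parser, $tagName) = @_;
  $inNameTag = 0 if $tagName eq "name";
}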





So we are collecting names, but notice how it is done. The parser calls the sub "start_element" each time it hits the start of a tag. We just check which sort of tag it was, and then set a variable so that we start collecting the characters in between the start and end tags. It is stream-oriented: the parser hands you the document piece by piece as it reads it, rather than loading the whole thing first.
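
To make the event idea concrete, here is a minimal sketch of a complete script using XML::Parser (installed from CPAN). The sample document is a completed version of the one above, and its closing tags are my assumption since the original is cut off. The variable names are only illustrative.

```perl
use strict;
use warnings;
use XML::Parser;

# Assumed sample input: a completed version of the document shown earlier.
my $xml = <<'XML';
<people>
  <person><name>Joe</name><age>25</age></person>
  <person><name>Sally</name><age>20</age></person>
</people>
XML

my @names;            # one entry per <name> element
my $in_name_tag = 0;  # true while the parser is inside <name>...</name>

my $parser = XML::Parser->new(
    Handlers => {
        # Start of a tag: open a fresh, empty name when we enter <name>.
        Start => sub {
            my ($expat, $tag_name, %attributes) = @_;
            if ($tag_name eq 'name') {
                $in_name_tag = 1;
                push @names, '';
            }
        },
        # Text can arrive in several pieces, so append instead of assigning.
        Char => sub {
            my ($expat, $chars) = @_;
            $names[-1] .= $chars if $in_name_tag;
        },
        # End of a tag: stop collecting once we leave </name>.
        End => sub {
            my ($expat, $tag_name) = @_;
            $in_name_tag = 0 if $tag_name eq 'name';
        },
    },
);

$parser->parse($xml);
print "$_\n" for @names;
```

For a file on disk you would call parsefile with the file name (for example a hypothetical people.xml) instead of parse.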





DOM is the other major way of going about parsing XML. It views a document as a model having a certain structure, a Document Object Model (ahem). Here is what some of this style of code looks like:





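# Method names as in XML::DOM; $dom_doc is the already-parsed document.
my $doc_root = $dom_doc->getDocumentElement();   # the <people> element

foreach my $person ( $doc_root->getElementsByTagName("person") ) {
  foreach my $child ( $person->getChildNodes() ) {
    # The text of <name> lives in a text node underneath the element.
    if ( $child->getNodeName() eq "name" ) {
      push(@peopleNames, $child->getFirstChild()->getData());
    }
  }
}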





This method searches based on the idea that nodes have children that can be returned as lists. This is very close to the idea of trees and graphs in computer science.
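
Here is a rough sketch of the same name collection as a complete script, assuming the XML::DOM module from CPAN. Again, the sample document is my completed version of the truncated one above, and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use XML::DOM;

# Assumed sample input: a completed version of the document shown earlier.
my $xml = <<'XML';
<people>
  <person><name>Joe</name><age>25</age></person>
  <person><name>Sally</name><age>20</age></person>
</people>
XML

my $dom_parser = XML::DOM::Parser->new;
my $dom_doc    = $dom_parser->parse($xml);   # builds the whole tree in memory

my @people_names;
# In list context getElementsByTagName returns a plain Perl list of nodes.
foreach my $name_node ( $dom_doc->getElementsByTagName('name') ) {
    # The text inside <name> is a child text node of the element.
    push @people_names, $name_node->getFirstChild->getData;
}

print "$_\n" for @people_names;

$dom_doc->dispose;   # break circular references so the tree can be freed
```

The trade-off against the SAX style is that the entire document sits in memory at once, which is convenient for small files but can get expensive for very large ones.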





I hope this very brief intro gets you on your way. You can look at the XML modules on CPAN for some hints. You may also want to look at XML::Simple, which is the first item I've linked to below.
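
For a small document like the one above, XML::Simple can be the shortest route. This is only a sketch, assuming XML::Simple is installed from CPAN and using my completed version of the sample document. The two options are there because the module's defaults would otherwise reshape the data in surprising ways.

```perl
use strict;
use warnings;
use XML::Simple;

# Assumed sample input: a completed version of the document shown earlier.
my $xml = <<'XML';
<people>
  <person><name>Joe</name><age>25</age></person>
  <person><name>Sally</name><age>20</age></person>
</people>
XML

# ForceArray keeps <person> as a list even when there is only one of them.
# KeyAttr => [] turns off the default folding of lists into hashes keyed on "name".
my $data = XMLin($xml, ForceArray => ['person'], KeyAttr => []);

# $data is now a plain hash reference: { person => [ { name => ..., age => ... }, ... ] }
my @names = map { $_->{name} } @{ $data->{person} };
print "$_\n" for @names;
```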

