2012.02.04

PHP 5 DOM and XMLReader: Reading XML with Namespace (Part 1)

By alif - Posted on 07 April 2009

PHP 5's DOM extension and XMLReader make it easy to read XML files. The good thing about PHP 5's DOM (mainly DomDocument, DomNodeList and DomNode) is that it implements the standard DOM interfaces specified by the W3C. The W3C's DOM reference can be viewed here. So if you have used the DOM before (say, in JavaScript), PHP 5's DOM will be easy to grasp.

The following are the PHP 5 DOM methods and properties I use most often:
  1. getElementsByTagName (method)
  2. getAttribute (method)
  3. childNodes (property)
  4. nodeName (property)
  5. nodeValue (property)
  6. getElementsByTagNameNS (method)
Here's a simple XML file called test.xml:
  <?xml version="1.0" encoding="ISO-8859-1"?>
  <library>
    <book isbn="781">
      <name>SCJP 1.5</name>
      <info><![CDATA[Sun Certified Java Programmer book]]></info>
    </book>
    <book isbn="194">
      <name>jQuery is Awesome!</name>
      <info><![CDATA[jQuery Reference Book]]></info>
    </book>
  </library>

Below I explain how to read this XML. First, load the file into a DomDocument:

  $dom = new DomDocument();
  $dom->load('test.xml');
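DomDocument::load() returns false when the file cannot be read or is not well-formed, and by default libxml reports the problems as PHP warnings. Below is a minimal sketch of a more defensive load, for the case where you would rather collect the errors yourself. Some assumptions: it uses loadXML() on an inline string so that it runs without test.xml; the helper name loadXmlOrNull is hypothetical; and the type declarations require PHP 7.1 or later.

```php
<?php
// Hypothetical helper: parse an XML string and return null (plus the
// libxml error messages) instead of emitting PHP warnings.
function loadXmlOrNull(string $xml, array &$errors): ?DOMDocument
{
    $previous = libxml_use_internal_errors(true); // buffer libxml errors
    libxml_clear_errors();

    $dom = new DOMDocument();
    $ok = $dom->loadXML($xml);

    $errors = [];
    foreach (libxml_get_errors() as $error) {
        $errors[] = trim($error->message);
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting

    return $ok ? $dom : null;
}

$errors = [];
$good = loadXmlOrNull('<library><book isbn="781"/></library>', $errors);
$bad  = loadXmlOrNull('<library><book></library>', $errors); // mismatched tags
```

After the second call, $bad is null and $errors holds libxml's description of the tag mismatch.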

$dom now holds the loaded XML. Next, I use getElementsByTagName to get the list of elements named 'book':

  $bookElemList = $dom->getElementsByTagName('book');

$bookElemList is a DomNodeList that contains one DomNode (more precisely, a DomElement) for each 'book' element. Its length property holds the number of nodes in the list, and its item($index) method returns the node at the given index. Below, I loop through $bookElemList and store the contents of each 'book' in an associative array. To read an attribute, I use the getAttribute method, as shown below:

  $bookList = array();
  // loop over every index of $bookElemList
  for ($i = 0; $i < $bookElemList->length; $i++) {
      $bookList[$i] = array(
          // get the 'isbn' attribute of the book element and store it as book_isbn
          'book_isbn' => $bookElemList->item($i)->getAttribute('isbn'),
          // get the first 'name' element inside the book at index $i
          'name' => $bookElemList->item($i)->getElementsByTagName('name')->item(0)->nodeValue,
          'info' => $bookElemList->item($i)->getElementsByTagName('info')->item(0)->nodeValue
      );
  }
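The same loop can be written with foreach, because a DOMNodeList is traversable. It is also worth guarding against missing elements: item(0) returns null when no matching element exists, and reading ->nodeValue from null raises a notice or warning (depending on the PHP version) instead of giving you a clean default. Below is a minimal sketch with three assumptions. The helper childText is hypothetical. The inline data is a trimmed copy of test.xml in which the second book's <info> was deliberately removed. The type declarations need PHP 7.1 or later.

```php
<?php
// Hypothetical helper: text of the first <$tag> descendant of $parent,
// or $default when there is no such element.
function childText(DOMElement $parent, string $tag, ?string $default = null): ?string
{
    $node = $parent->getElementsByTagName($tag)->item(0);
    return $node === null ? $default : $node->nodeValue;
}

$dom = new DOMDocument();
$dom->loadXML(
    '<library>'
    . '<book isbn="781"><name>SCJP 1.5</name>'
    . '<info><![CDATA[Sun Certified Java Programmer book]]></info></book>'
    . '<book isbn="194"><name>jQuery is Awesome!</name></book>' // no <info> here
    . '</library>'
);

$bookList = [];
// DOMNodeList is Traversable, so foreach replaces the index-based loop.
foreach ($dom->getElementsByTagName('book') as $book) {
    $bookList[] = [
        'book_isbn' => $book->getAttribute('isbn'),
        'name'      => childText($book, 'name'),
        'info'      => childText($book, 'info', ''), // '' when <info> is absent
    ];
}
```

The CDATA content comes back as ordinary text through nodeValue. The book without <info> gets an empty string rather than a warning.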

Instead of getting name and info separately, I could have used the childNodes property to access the child elements, as below. Note that I had to check nodeType to see whether each node is an element. This is required because the DOM represents the whitespace between the elements of the XML as text nodes. If you want to avoid the nodeType check, remove that whitespace before reading the XML, for example by setting $dom->preserveWhiteSpace = false before calling load(). The possible nodeType values are listed on the W3C's page; 1 means an element node (PHP's constant XML_ELEMENT_NODE).

  $bookList = array();
  for ($i = 0; $i < $bookElemList->length; $i++) {
      $bookList[$i]['book_isbn'] = $bookElemList->item($i)->getAttribute('isbn');

      foreach ($bookElemList->item($i)->childNodes as $eachChild) {
          // nodeType 1 (XML_ELEMENT_NODE): skip the whitespace text nodes
          if ($eachChild->nodeType == XML_ELEMENT_NODE) {
              $bookList[$i][$eachChild->nodeName] = $eachChild->nodeValue;
          }
      }
  }
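You can see the whitespace text nodes described above directly. Below is a minimal sketch, assuming an inline copy of the first book from test.xml (PHP 7 or later for the type declarations; the function name childTypes is illustrative). With default settings, the book element's childNodes include whitespace text nodes around <name> and <info>. Setting preserveWhiteSpace = false before loading drops those text nodes, so only the two elements remain.

```php
<?php
$xml = <<<XML
<library>
  <book isbn="781">
    <name>SCJP 1.5</name>
    <info><![CDATA[Sun Certified Java Programmer book]]></info>
  </book>
</library>
XML;

// Classify each child of the first <book> as 'element' or 'other'.
function childTypes(DOMDocument $dom): array
{
    $types = [];
    foreach ($dom->getElementsByTagName('book')->item(0)->childNodes as $child) {
        $types[] = $child->nodeType === XML_ELEMENT_NODE ? 'element' : 'other';
    }
    return $types;
}

$withBlanks = new DOMDocument();            // preserveWhiteSpace defaults to true
$withBlanks->loadXML($xml);

$withoutBlanks = new DOMDocument();
$withoutBlanks->preserveWhiteSpace = false; // must be set before loading
$withoutBlanks->loadXML($xml);

print_r(childTypes($withBlanks));    // whitespace text nodes around the elements
print_r(childTypes($withoutBlanks)); // only the two elements
```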

Still, I prefer to fetch the elements I need explicitly, because in most cases I only need the values of a few elements. Copying every child element through childNodes stores data I never use, and for large XML files with many elements that extra memory adds up.

Here's the print_r output of $bookList:
  Array
  (
      [0] => Array
          (
              [book_isbn] => 781
              [name] => SCJP 1.5
              [info] => Sun Certified Java Programmer book
          )

      [1] => Array
          (
              [book_isbn] => 194
              [name] => jQuery is Awesome!
              [info] => jQuery Reference Book
          )

  )

The XML above was very simple. Now let's parse a slightly more complex XML document that uses namespaces. An XML namespace avoids conflicts between element names by associating each name with a namespace URI, which the document refers to through a prefix. Brief info on XML namespaces can be viewed here.
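To see how a namespace resolves a name conflict, here is a small sketch with two vocabularies that both define a title element. The URNs urn:example:books and urn:example:movies and the prefixes b and m are made up for illustration. getElementsByTagNameNS() keeps the two kinds of title apart because it matches on the namespace URI together with the local name.

```php
<?php
$xml = <<<XML
<catalog xmlns:b="urn:example:books" xmlns:m="urn:example:movies">
  <b:title>SCJP 1.5</b:title>
  <m:title>Big Buck Bunny</m:title>
</catalog>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

// Same local name 'title', but different namespace URIs.
$bookTitles  = $dom->getElementsByTagNameNS('urn:example:books', 'title');
$movieTitles = $dom->getElementsByTagNameNS('urn:example:movies', 'title');

echo $bookTitles->item(0)->nodeValue, "\n";  // SCJP 1.5
echo $movieTitles->item(0)->nodeValue, "\n"; // Big Buck Bunny
```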

I chose to read an XML file featured in JW Player's setup wizard. It can be viewed here: JW Player's RSS XML.

Here's the XML:

  <rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
      <title>Example media RSS playlist for the JW Player</title>
      <link>http://www.longtailvideo.com</link>

      <item>
        <title>Big Buck Bunny - FLV Video</title>
        <link>http://www.bigbuckbunny.org/</link>
        <description>Big Buck Bunny is a short animated film by the Blender Institute, part of the Blender Foundation. Like the foundation's previous film Elephants Dream, the film is made using free and open source software.</description>
        <media:credit role="author">the Peach Open Movie Project</media:credit>
        <media:content url="http://www.longtailvideo.com/jw/upload/bunny.flv" type="video/x-flv" duration="33" />
      </item>

      <item>
        <title>Big Buck Bunny - MP3 Audio with thumb</title>
        <link>http://www.bigbuckbunny.org/</link>
        <description>Big Buck Bunny is a short animated film by the Blender Institute, part of the Blender Foundation. Like the foundation's previous film Elephants Dream, the film is made using free and open source software.</description>
        <media:credit role="author">the Peach Open Movie Project</media:credit>
        <media:content url="http://www.longtailvideo.com/jw/upload/bunny.mp3" type="audio/mpeg" duration="33" />
        <media:thumbnail url="http://www.longtailvideo.com/jw/upload/bunny.jpg" />
      </item>

      <item>
        <title>Big Buck Bunny - PNG Image with start</title>
        <link>http://www.bigbuckbunny.org/</link>
        <description>Big Buck Bunny is a short animated film by the Blender Institute, part of the Blender Foundation. Like the foundation's previous film Elephants Dream, the film is made using free and open source software.</description>
        <media:group>
          <media:credit role="author">the Peach Open Movie Project</media:credit>
          <media:content url="http://www.longtailvideo.com/jw/upload/bunny.png" type="image/png" duration="20" start="10" />
        </media:group>
      </item>
    </channel>
  </rss>

Here's the root tag of the file, which declares the XML namespace:

  <rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">

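The namespace is declared on this first line by xmlns:media. Here 'media' is the prefix this XML file uses for the namespace, and 'http://search.yahoo.com/mrss/' is the namespace URI. So in an element such as media:credit, 'media' is the prefix and 'credit' is the element's local name.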
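Each DOMElement exposes the parts of its name separately: the prefix, the local name, the namespace URI, and the qualified nodeName. The minimal sketch below shows them on a one-element excerpt of the JW Player feed, built as an inline string with no whitespace, so that firstChild is the media:credit element itself.

```php
<?php
$xml = '<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">'
     . '<media:credit role="author">the Peach Open Movie Project</media:credit>'
     . '</rss>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$credit = $dom->documentElement->firstChild; // the <media:credit> element

echo $credit->nodeName, "\n";     // media:credit (qualified name)
echo $credit->prefix, "\n";       // media
echo $credit->localName, "\n";    // credit
echo $credit->namespaceURI, "\n"; // http://search.yahoo.com/mrss/
```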

To read elements that belong to a namespace, use getElementsByTagNameNS. It takes the namespace URI (not the prefix) and the element's local name:

  $dom->getElementsByTagNameNS('namespaceURI', 'localName');
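Because the lookup uses the namespace URI, it does not matter which prefix a document chooses. Below is a minimal sketch, assuming a hypothetical feed that binds the same Media RSS URI to the prefix m instead of media. The sketch also uses '*' as the namespace argument, which the PHP manual documents as matching all namespaces. The constant name MRSS_NS is illustrative.

```php
<?php
const MRSS_NS = 'http://search.yahoo.com/mrss/';

// Same namespace URI as the JW Player feed, but bound to the prefix "m".
$xml = '<rss version="2.0" xmlns:m="http://search.yahoo.com/mrss/">'
     . '<channel><item>'
     . '<m:credit role="author">the Peach Open Movie Project</m:credit>'
     . '</item></channel></rss>';

$dom = new DOMDocument();
$dom->loadXML($xml);

// Matched by URI + local name, so the "m" prefix is irrelevant.
$byUri = $dom->getElementsByTagNameNS(MRSS_NS, 'credit');
// '*' as the namespace argument matches 'credit' in any namespace.
$anyNs = $dom->getElementsByTagNameNS('*', 'credit');

echo $byUri->item(0)->nodeValue, "\n"; // the Peach Open Movie Project
```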

The code below reads the XML above:

  // load the file into the DOM
  $dom = new DomDocument();
  $dom->load('http://www.longtailvideo.com/jw/upload/mrss.xml');

  $itemList = array();

  // get the list of <item> elements
  $itemElemList = $dom->getElementsByTagName('item');
  for ($i = 0; $i < $itemElemList->length; $i++) {
      $item = $itemElemList->item($i);
      $itemList[$i] = array(
          'title' => $item->getElementsByTagName('title')->item(0)->nodeValue,
          'link' => $item->getElementsByTagName('link')->item(0)->nodeValue,
          'description' => $item->getElementsByTagName('description')->item(0)->nodeValue,
          // namespaced elements are looked up by namespace URI + local name; the
          // search covers all descendants, so elements inside <media:group> are found too
          'credit' => $item->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'credit')->item(0)->nodeValue,
          'content_url' => $item->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'content')->item(0)->getAttribute('url'),
      );
  }

Here's the print_r output of $itemList:

  Array
  (
      [0] => Array
          (
              [title] => Big Buck Bunny - FLV Video
              [link] => http://www.bigbuckbunny.org/
              [description] => Big Buck Bunny is a short animated film by the Blender Institute, part of the Blender Foundation. Like the foundation's previous film Elephants Dream, the film is made using free and open source software.
              [credit] => the Peach Open Movie Project
              [content_url] => http://www.longtailvideo.com/jw/upload/bunny.flv
          )

      [1] => Array
          (
              [title] => Big Buck Bunny - MP3 Audio with thumb
              [link] => http://www.bigbuckbunny.org/
              [description] => Big Buck Bunny is a short animated film by the Blender Institute, part of the Blender Foundation. Like the foundation's previous film Elephants Dream, the film is made using free and open source software.
              [credit] => the Peach Open Movie Project
              [content_url] => http://www.longtailvideo.com/jw/upload/bunny.mp3
          )

      [2] => Array
          (
              [title] => Big Buck Bunny - PNG Image with start
              [link] => http://www.bigbuckbunny.org/
              [description] => Big Buck Bunny is a short animated film by the Blender Institute, part of the Blender Foundation. Like the foundation's previous film Elephants Dream, the film is made using free and open source software.
              [credit] => the Peach Open Movie Project
              [content_url] => http://www.longtailvideo.com/jw/upload/bunny.png
          )

  )
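As an alternative to the getElementsByTagNameNS() approach, the same fields can be pulled out with DOMXPath. This is a different technique: you register a prefix for the namespace URI, then use that prefix in XPath expressions. Below is a minimal sketch, assuming a trimmed inline copy of the third item (the one whose media:content sits inside media:group). The prefix 'mrss' is an arbitrary choice and does not have to match the feed's own 'media' prefix.

```php
<?php
$xml = <<<XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Big Buck Bunny - PNG Image with start</title>
      <media:group>
        <media:credit role="author">the Peach Open Movie Project</media:credit>
        <media:content url="http://www.longtailvideo.com/jw/upload/bunny.png" type="image/png" duration="20" start="10" />
      </media:group>
    </item>
  </channel>
</rss>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

$xpath = new DOMXPath($dom);
// Any prefix works; what matters is the namespace URI it is mapped to.
$xpath->registerNamespace('mrss', 'http://search.yahoo.com/mrss/');

$itemList = [];
foreach ($xpath->query('/rss/channel/item') as $item) {
    // evaluate() on string(...) returns a plain PHP string ('' if nothing matches)
    $itemList[] = [
        'title'       => $xpath->evaluate('string(title)', $item),
        'credit'      => $xpath->evaluate('string(.//mrss:credit)', $item),
        'content_url' => $xpath->evaluate('string(.//mrss:content/@url)', $item),
    ];
}
```

A side benefit of this approach: when an element is missing, the string() expressions yield '' instead of the null item(0) result that a chained ->nodeValue would trip over.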

So far I have explained how to read XML by loading it into a DomDocument. An important thing to realize is that DomDocument parses the entire XML into an in-memory tree, which is what lets you navigate freely through every node.

But if the XML is very large, loading it into a DomDocument is unwise, because the whole file has to be held in memory. For such cases PHP 5 provides the XMLReader class, which reads the document as a stream, one node at a time. In part 2 of this article, I explain how to use XMLReader.
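As a brief preview of the streaming style, here is a minimal XMLReader sketch. It assumes a trimmed copy of test.xml supplied inline through XMLReader::XML(); XMLReader::open() would read from a file or URL instead. The reader only moves forward, and the code reacts to each opening <book> tag as it passes, so no full document tree is built.

```php
<?php
$xml = <<<XML
<library>
  <book isbn="781">
    <name>SCJP 1.5</name>
  </book>
  <book isbn="194">
    <name>jQuery is Awesome!</name>
  </book>
</library>
XML;

$reader = new XMLReader();
$reader->XML($xml);

$isbns = [];
while ($reader->read()) {
    // React only to opening <book> tags; every other node is skipped.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'book') {
        $isbns[] = $reader->getAttribute('isbn');
    }
}
$reader->close();

print_r($isbns); // the two ISBNs, in document order
```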

Attachments:
  • test.xml (284 bytes)
  • test.php.txt (867 bytes)
  • jwplayer_rss.php.txt (906 bytes)