How-To: Fix SimpleXML CDATA problem in PHP

If you’ve used the SimpleXML functions in PHP, you may have noticed some strange things happening with CDATA values in your XML file or string. All I needed to do was extract the values of my CDATA fields, but they always came back blank in the structure that simplexml_load_file() returns.

Finally, after hours of trawling Google, I’ve come up with the following solution:

$xml = simplexml_load_file($this->filename,
'SimpleXMLElement', LIBXML_NOCDATA);

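Use this line of code when you load the XML file into the SimpleXML object. The key is the LIBXML_NOCDATA option passed as the third parameter (the second parameter, 'SimpleXMLElement', is just the default class name). LIBXML_NOCDATA tells libxml to merge CDATA sections into ordinary text nodes, so the returned object holds all the CDATA content as plain strings. You can read about this in the PHP manual.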
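To see what the flag actually changes, here is a small self-contained sketch. It uses simplexml_load_string() on an inline document instead of a file; the sample XML and the variable names are made up for illustration. It compares json_encode() output with and without LIBXML_NOCDATA, and also shows that casting the element to a string returns the CDATA text either way, so the "blank" values mainly show up when you dump or convert the whole structure.

```php
<?php
// Hypothetical sample document: the <title> text lives inside a CDATA section.
$xmlString = '<root><title><![CDATA[Hello]]></title></root>';

// Default loading keeps the CDATA section as a separate (non-text) node.
$plain = simplexml_load_string($xmlString);

// LIBXML_NOCDATA merges CDATA sections into ordinary text nodes.
$nocdata = simplexml_load_string($xmlString, 'SimpleXMLElement', LIBXML_NOCDATA);

// Dumping the structure: without the flag, <title> is encoded as an empty object.
$plainJson = json_encode($plain);     // {"title":{}}
$nocdataJson = json_encode($nocdata); // {"title":"Hello"}

// Casting the element to a string returns the CDATA text in both cases.
$plainTitle = (string) $plain->title;
$nocdataTitle = (string) $nocdata->title;

echo $plainJson, "\n", $nocdataJson, "\n";
echo $plainTitle, ' ', $nocdataTitle, "\n";
```

If you only need a handful of values, the string cast may be enough on its own; the flag matters most when you print_r(), var_dump() or json_encode() the whole object.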

This solved all the problems I was having getting CDATA values out of SimpleXML in PHP. Hope it helps someone.

40 Replies to “How-To: Fix SimpleXML CDATA problem in PHP”

  1. How about assigning inner text to a node when the new text contains CDATA? The entities get transformed:

    $xml->Image[$index]->Comment = "<![CDATA[" . html_entity_decode($_POST['imgcaption']) . "]]>";

    This example code adds an HTML caption to an image definition in an XML document. The entities are still HTML-encoded in the output XML.

    Thanks!

  2. I’m having the same problem, but I don’t use SimpleXML. I have PHP4 installed on GoDaddy and need to solve the problem.
    My description string is CDATA and changes each time I refresh the browser.

    class xItem {
        var $xAddress;
        var $xLink;
        var $xDescription;
        var $xPrice;
        var $xCity;
        var $xPicLinks;
    }

    // Array to convert XML entities back to plain text.
    $XmlEntities = array(
        '&amp;'  => '&',
        '&lt;'   => '<',
        '&gt;'   => '>',
        '&apos;' => '\'',
        '&quot;' => '"',
    );

    // general vars
    $curTag = "";
    $sTitle = "";
    $sLink = "";
    $sDescription = "";
    $arItems = array();
    $itemCount = 0;

    // ********* Start User-Defined Vars ************
    // rss url goes here
    $uFile = "http://mountsnowpalmiter.idxco.com/idx/feeds/3036/advancedFeed.xml";
    // descriptions (true or false) goes here
    $bDesc = true;
    // ********* End User-Defined Vars **************

    // Push the opening element's name onto the "^"-separated tag path.
    function startElement($parser, $name, $attrs) {
        global $curTag;
        $curTag .= "^$name";
    }

    // Pop the closing element off the tag path. A listing is complete
    // once its PICTURES element closes, so the item counter moves on here.
    function endElement($parser, $name) {
        global $curTag, $itemCount;
        if ($curTag == "^FEATURED^LISTING^PICTURES") {
            $itemCount++;
        }
        $caret_pos = strrpos($curTag, '^');
        $curTag = substr($curTag, 0, $caret_pos);
    }

    // The parser may call this handler several times for a single text or
    // CDATA node (for example when it straddles a 4096-byte fread() chunk),
    // so every field is appended with .= instead of overwritten with =.
    function characterData($parser, $data) {
        global $curTag;

        // get the Channel information first
        global $sTitle, $sLink, $sDescription;
        $titleKey = "^FEATURED^BROKER_INFO^TITLE";
        $linkKey = "^FEATURED^BROKER_INFO^LINK";
        $descKey = "^FEATURED^BROKER_INFO^DESCRIPTION";
        if ($curTag == $titleKey) {
            $sTitle .= $data;
        }
        elseif ($curTag == $linkKey) {
            $sLink .= $data;
        }
        elseif ($curTag == $descKey) {
            $sDescription .= $data;
        }

        // now get the items
        global $arItems, $itemCount;
        $itemLinkKey = "^FEATURED^LISTING^LINK";
        $itemDescKey = "^FEATURED^LISTING^DESCRIPTION";
        $itemPriceKey = "^FEATURED^LISTING^PRICE";
        $itemAddressKey = "^FEATURED^LISTING^STREET-ADDRESS";
        $itemCityKey = "^FEATURED^LISTING^CITY-NAME";
        $itemPicLinksKey = "^FEATURED^LISTING^PICTURES";

        if ($curTag == $itemLinkKey) {
            // make a new xItem only on the first chunk of the link
            if (!isset($arItems[$itemCount])) {
                $arItems[$itemCount] = new xItem();
            }
            // set new item object's properties
            $arItems[$itemCount]->xLink .= $data;
        }
        elseif ($curTag == $itemDescKey) {
            $arItems[$itemCount]->xDescription .= $data;
        }
        elseif ($curTag == $itemPriceKey) {
            $arItems[$itemCount]->xPrice .= $data;
        }
        elseif ($curTag == $itemAddressKey) {
            $arItems[$itemCount]->xAddress .= $data;
        }
        elseif ($curTag == $itemCityKey) {
            $arItems[$itemCount]->xCity .= $data;
        }
        elseif ($curTag == $itemPicLinksKey) {
            $arItems[$itemCount]->xPicLinks .= $data;
        }
    }

    // main loop
    $xml_parser = xml_parser_create();
    xml_set_element_handler($xml_parser, "startElement", "endElement");
    xml_set_character_data_handler($xml_parser, "characterData");
    if (!($fp = fopen($uFile, "r"))) {
        die("could not open featured for input");
    }
    while ($data = fread($fp, 4096)) {
        if (!xml_parse($xml_parser, $data, feof($fp))) {
            die(sprintf("RSS error: %s at line %d",
                xml_error_string(xml_get_error_code($xml_parser)),
                xml_get_current_line_number($xml_parser)));
        }
    }
    xml_parser_free($xml_parser);

    // write out the items
    ?>

  3. Thanks for saving me god knows how many hours of trawling. I was expecting it to be very painful.

    Cheers!!!!!

  4. This might come late (years after the post date), but THANK YOU! It saved my day. 🙂

  5. I realize this is a relatively old post (2008), but it just goes to show the power of Google.

    I was handling it like this:
    $title = (string)$xml->title[0];

    With this tweak, I can now do:
    $title = (string)$xml->title;

    Although the processing speed doesn’t really change, the code is cleaner. I was actually under the impression that my DTD was wrong (since the XML never contained multiple titles). It turned out that multiple titles were never possible; it was just how the CDATA was being processed.

    Thanks for the tip!

  6. Actually, I’m not sure whether the problem has since been fixed, but I’ve discovered a far better way: cast to a string, i.e. (string)$xml->element or (string)$xml->{'element'}

  7. Thank you so much! This indeed solved this unexpected behaviour of PHP and saved me time!

    Thank you

  8. Top tip. I found this quickly with a “php json encode decode simplexml cdata” search, and it saved me what could have been a long time getting the cast to an associative array to include all the data.
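Reply 1 asks about the opposite direction (writing CDATA), and replies 5, 6 and 8 are about reading values back and converting them to an array. The sketch below covers both, under stated assumptions: the <gallery>/<Image>/<Comment> layout and the caption text are made up for illustration, and it borrows the DOM extension (dom_import_simplexml() and DOMDocument::createCDATASection()), because SimpleXMLElement has no CDATA-writing method of its own. Assigning "<![CDATA[...]]>" as a plain string, as in reply 1, makes SimpleXML treat the whole value as text and escape its angle brackets.

```php
<?php
// Hypothetical document and caption, for illustration only.
$xml = new SimpleXMLElement('<gallery><Image><Comment/></Image></gallery>');
$caption = '<b>Sunset</b> & friends';

// Get a DOM view of the <Comment> node and append a real CDATA section,
// instead of assigning a "<![CDATA[...]]>" string that would be escaped.
$commentNode = dom_import_simplexml($xml->Image[0]->Comment);
$commentNode->appendChild(
    $commentNode->ownerDocument->createCDATASection($caption)
);

// Serialize: <Comment> now contains <![CDATA[<b>Sunset</b> & friends]]>.
$out = $xml->asXML();

// Reading it back: a string cast returns the raw caption text.
$readBack = (string) $xml->Image[0]->Comment;

// Reply 8's use case: reload with LIBXML_NOCDATA so a JSON round trip
// produces an associative array that includes the CDATA text.
$reloaded = simplexml_load_string($out, 'SimpleXMLElement', LIBXML_NOCDATA);
$asArray = json_decode(json_encode($reloaded), true);

echo $out;
print_r($asArray);
```

Because dom_import_simplexml() returns a DOM view of the same underlying libxml tree, the CDATA node added through DOM appears directly in $xml->asXML() with no conversion step.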
