Friday, October 03, 2008

Get Started With SimpleXML in PHP

If you're starting out in web development, you're probably going to have to transfer data between applications. These days, most applications do this using XML. XML is a markup language for structuring data so it can be shared, especially via public APIs.

Hacking XML code can get pretty messy and unmanageable if you don't know what you're doing. Mercifully, PHP 5 has a library called SimpleXML that lives up to its name. The hard work of turning XML into a usable format for programming is done for you. All that's left is working with the object SimpleXML creates.

In this tutorial, I'll show you how to make sense of the SimpleXMLElement object and the many ways it can be used. Come to the head of the class, because it's time for SimpleXML School.

What you'll need
  • PHP 5, a programming language available on many web hosts.
  • SimpleXML, a library usually installed along with PHP 5.
  • Some knowledge of PHP and XML.

Example XML File

You can download the example XML we'll use in this section, or copy the text below into your own file named school.xml. Make sure you store this file in the same directory as your PHP file.
<?xml version="1.0"?>
<school>
<grades>
<grade>
<level>K</level>
<student_count>
<boys>49</boys>
<girls>41</girls>
</student_count>
</grade>
<grade>
<level>1</level>
<student_count>
<boys>29</boys>
<girls>32</girls>
</student_count>
</grade>
<grade>
<level>2</level>
<student_count>
<boys>26</boys>
<girls>31</girls>
</student_count>
</grade>
</grades>
<principal>
<name>Dr. Hamilton</name>
<experience timein="years">
<time>14</time>
</experience>
</principal>
<classrooms />
</school>

Read in XML

The most common way to use XML in your PHP programs is "reading in" someone else's XML. Many APIs output their data as XML. To use these APIs, you need to be able to read in, parse, and make sense of the XML they send.

Read XML From a File

To read the XML from school.xml, use the simplexml_load_file function:
$xmlobj = simplexml_load_file("school.xml");
// header() sends the header itself and returns nothing, so there is nothing to print
header("Content-type: text/plain");
print_r($xmlobj);

The print_r call prints out the way PHP interprets the XML: as a SimpleXMLElement object.

SimpleXMLElement Object
(
[grades] => SimpleXMLElement Object
(
[grade] => Array
(
[0] => SimpleXMLElement Object
(
[level] => K
[student_count] => SimpleXMLElement Object
(
[boys] => 49
[girls] => 41
)
)

[1] => SimpleXMLElement Object
(
[level] => 1
[student_count] => SimpleXMLElement Object
(
[boys] => 29
[girls] => 32
)
)

[2] => SimpleXMLElement Object
(
[level] => 2
[student_count] => SimpleXMLElement Object
(
[boys] => 26
[girls] => 31
)
)
)
)

[principal] => SimpleXMLElement Object
(
[name] => Dr. Hamilton
[experience] => SimpleXMLElement Object
(
[@attributes] => Array
(
[timein] => years
)

[time] => 14
)
)
)
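If the file is missing or isn't well-formed XML, simplexml_load_file returns false instead of an object. Here's a minimal sketch of guarding against that. The load_xml_file helper is my own hypothetical wrapper, not part of SimpleXML, and the script writes a tiny stand-in file so it can run on its own:

```php
<?php
// Hypothetical helper (not part of SimpleXML): returns the parsed document,
// or null when the file is unreadable or is not well-formed XML.
function load_xml_file(string $path): ?SimpleXMLElement
{
    if (!is_readable($path)) {
        return null;
    }
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_file($path);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $xml === false ? null : $xml;
}

// Write a small stand-in file so the sketch is self-contained.
$path = tempnam(sys_get_temp_dir(), "school");
file_put_contents($path, "<school><principal><name>Dr. Hamilton</name></principal></school>");

$xmlobj = load_xml_file($path);
$missing = load_xml_file($path . ".does-not-exist");
unlink($path);

print $xmlobj === null ? "Could not load XML\n" : $xmlobj->principal->name . "\n";
print $missing === null ? "Second file could not be loaded\n" : "Loaded\n";
```

Returning null keeps the calling code to a single check before it touches the object.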

Read XML From a String

To create a SimpleXMLElement object from a string of text, use the simplexml_load_string function:
// Read the whole file into a single string
$xmltext = file_get_contents("school.xml");
$xmlobj = simplexml_load_string($xmltext);
header("Content-type: text/plain");
print_r($xmlobj);

Here I loaded the school.xml text into a string variable ($xmltext), before passing it to the simplexml_load_string function. If this looks like an extra step, it is in this case. But if you received a string from an API, you'd want to be able to use this function.
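Strings that arrive from an API aren't always valid XML, and simplexml_load_string returns false when parsing fails. As a rough sketch, with a deliberately broken string I made up for illustration, you can ask libxml what went wrong:

```php
<?php
// Deliberately malformed XML: the <name> tag is never closed.
$xmltext = "<school><principal><name>Dr. Hamilton</principal></school>";

// Collect parser errors instead of letting PHP print warnings.
libxml_use_internal_errors(true);
$xmlobj = simplexml_load_string($xmltext);

$messages = [];
if ($xmlobj === false) {
    foreach (libxml_get_errors() as $error) {
        // Each entry is a LibXMLError object carrying a line number and a message.
        $messages[] = "Line {$error->line}: " . trim($error->message);
    }
    libxml_clear_errors();
}

print implode("\n", $messages) . "\n";
```

The exact wording of the messages comes from libxml and may differ between versions, so treat them as something to log rather than something to match against.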

Now that we have a SimpleXMLElement object, let's talk about how to use it.

Understand the SimpleXMLElement Object

The SimpleXMLElement object is how SimpleXML converts textual XML into a format that PHP can understand. The object is essentially a collection of tag names with the values inside the tag. As is common in XML, sometimes a tag contains other tags. In that case, the value of a tag is actually another SimpleXMLElement object.

Take a look at the strange indented text that my code above has produced. It begins with a SimpleXMLElement object. The next thing is the <grades> tag, which is itself another SimpleXMLElement object. It contains a <grade> tag, which has an array of—you guessed it—SimpleXMLElement objects.

There's not much use to printing out the entire object, other than to understand it. So, now let's try accessing some individual pieces of data.

Want to print out the name of the principal? Try this:
print $xmlobj->principal->name;

Which produces the name of the principal from the XML file:
Dr. Hamilton

How about the number of boys in Kindergarten? That gets a little more complicated:
print $xmlobj->grades->grade[0]->student_count->boys;

As expected, it outputs the number of boys from the "zeroth" grade element:
49
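Two things are worth knowing once you start pulling values out. First, each value is still a SimpleXMLElement, so cast it to a string or integer before doing math or comparisons. Second, attributes (the @attributes entry in the print_r output) are read with array-style brackets. Here's a small sketch, with the school XML inlined so it runs by itself:

```php
<?php
$xmltext = <<<XML
<school>
  <grades>
    <grade><level>K</level><student_count><boys>49</boys><girls>41</girls></student_count></grade>
    <grade><level>1</level><student_count><boys>29</boys><girls>32</girls></student_count></grade>
    <grade><level>2</level><student_count><boys>26</boys><girls>31</girls></student_count></grade>
  </grades>
  <principal><name>Dr. Hamilton</name><experience timein="years"><time>14</time></experience></principal>
</school>
XML;
$xmlobj = simplexml_load_string($xmltext);

// Loop over every <grade> and add up its students.
$totals = [];
foreach ($xmlobj->grades->grade as $grade) {
    $level = (string) $grade->level;
    $totals[$level] = (int) $grade->student_count->boys
                    + (int) $grade->student_count->girls;
}
print_r($totals);   // K => 90, 1 => 61, 2 => 57

// Attributes use array syntax on the element that carries them.
$unit = (string) $xmlobj->principal->experience["timein"];
print $xmlobj->principal->experience->time . " " . $unit . "\n";   // 14 years
```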

Luckily, SimpleXML has another even easier way to access pieces of the results. It's called XPath, and you can find out more in the next section.

Use XPath for Even Simpler XML Access

XPath is a web standard, just like XML. It is a way to query XML for particular elements, and there's a function within SimpleXML that makes XPath a cinch.

All you need to do is call the xpath function on the SimpleXMLElement object with the special "path" you want to find. For example, to get the principal tag and its children, just use this code:
print_r($xmlobj->xpath("/school/principal"));

Notice this example describes the entire path to the <principal> tag, including the root (<school>). Each level is preceded by a slash, including the first one.

Output:
Array
(
[0] => SimpleXMLElement Object
(
[name] => Dr. Hamilton
[experience] => SimpleXMLElement Object
(
[@attributes] => Array
(
[timein] => years
)

[time] => 14
)
)
)

You can also search for all instances of a specific tag, regardless of its place in the hierarchy. The same example above is now shorter:
print_r($xmlobj->xpath("//principal"));

Notice the double-slash at the beginning of the XPath call this time? It tells XPath to look for any <principal> tag. The output is the same as the previous example.

The same double-slash syntax can be used to find multiple results. Here, we'll retrieve an array of all the grade levels in the school:
print_r($xmlobj->xpath("//level"));

Output:
Array
(
[0] => SimpleXMLElement Object
(
[0] => K
)

[1] => SimpleXMLElement Object
(
[0] => 1
)

[2] => SimpleXMLElement Object
(
[0] => 2
)
)
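As the output shows, xpath hands back SimpleXMLElement objects rather than plain text. If all you want is the values, one option is to cast each result to a string. A minimal sketch, using a trimmed-down copy of the school XML:

```php
<?php
$xmltext = "<school><grades>"
    . "<grade><level>K</level></grade>"
    . "<grade><level>1</level></grade>"
    . "<grade><level>2</level></grade>"
    . "</grades></school>";
$xmlobj = simplexml_load_string($xmltext);

// Turn each matched element into its text content.
$levels = array_map(
    function ($node) {
        return (string) $node;
    },
    $xmlobj->xpath("//level")
);

print implode(", ", $levels) . "\n";   // K, 1, 2
```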

You can even call up certain tags based on their content. Here we'll grab any tag that contains a <level> tag with a K (for Kindergarten). Notice the asterisk, which tells XPath to match any tag:
print_r($xmlobj->xpath("//*[level='K']"));

Output:
Array
(
[0] => SimpleXMLElement Object
(
[level] => K
[student_count] => SimpleXMLElement Object
(
[boys] => 49
[girls] => 41
)
)
)

In addition to checking for a specific value, XPath can also look for numbers that are greater than or less than. Here, we get all <student_count> tags with more than 31 girls:
print_r($xmlobj->xpath("//student_count[girls>31]"));

Output:
Array
(
[0] => SimpleXMLElement Object
(
[boys] => 49
[girls] => 41
)

[1] => SimpleXMLElement Object
(
[boys] => 29
[girls] => 32
)
)

Notice that the results also include the <boys> tags, regardless of their values (one is above 31 and one is below). That is because we're only checking against the <girls> tag, but then grabbing its parent, which is also the parent of the boys.

We can include the boys in the count to check the overall count by using XPath's addition operator:
print_r($xmlobj->xpath("//student_count[boys+girls>60]"));

Output:
Array
(
[0] => SimpleXMLElement Object
(
[boys] => 49
[girls] => 41
)

[1] => SimpleXMLElement Object
(
[boys] => 29
[girls] => 32
)
)

Again, both boys and girls are returned, but this time a <student_count> only matches if the total of the two counts is above 60.
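The bracketed condition and the part of the document you get back don't have to be the same tag. As a sketch, the first query below filters on the girls count but returns the matching grade's <level>, and the second filters on an attribute using XPath's @ syntax. The XML is the school example, inlined so the snippet runs by itself:

```php
<?php
$xmltext = <<<XML
<school>
  <grades>
    <grade><level>K</level><student_count><boys>49</boys><girls>41</girls></student_count></grade>
    <grade><level>1</level><student_count><boys>29</boys><girls>32</girls></student_count></grade>
    <grade><level>2</level><student_count><boys>26</boys><girls>31</girls></student_count></grade>
  </grades>
  <principal><name>Dr. Hamilton</name><experience timein="years"><time>14</time></experience></principal>
</school>
XML;
$xmlobj = simplexml_load_string($xmltext);

// Filter on the girls count, but return the <level> of each matching <grade>.
$levels = [];
foreach ($xmlobj->xpath("//grade[student_count/girls>31]/level") as $level) {
    $levels[] = (string) $level;
}
print implode(", ", $levels) . "\n";   // K, 1

// @timein tests the attribute on <experience>; the path then steps into <time>.
$matches = $xmlobj->xpath("//experience[@timein='years']/time");
$years = (int) $matches[0];
print $years . "\n";   // 14
```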

These many examples are just a small sampling of what XPath can do. Hopefully it helps you find out how to read in and query XML for the data you need. In the next section, we'll look at writing your own XML using SimpleXML.

Write out XML

As you've seen in the previous section, the SimpleXMLElement object is central to using SimpleXML. This object holds the structure of the XML in a way that makes it easy for PHP to access it. However, sometimes we want to output it back as raw XML. For example, if you're creating an API, you would probably be sending XML out as output.

Output SimpleXMLElement Object as XML

If you already have a SimpleXMLElement object, writing out the XML is as easy as calling the correct function: asXML.

Assuming you have a SimpleXMLElement object called $xmlobj, here's the code you need to print out as XML:
print $xmlobj->asXML();

The print command does the printing, because asXML just returns the content. If you don't want to immediately output the XML, you can store the content into a variable:
$xmltext = $xmlobj->asXML();
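asXML can also write straight to a file if you pass it a filename, in which case it returns true or false instead of the XML text. Here's a small sketch; the temporary path is just a placeholder I picked so the example cleans up after itself:

```php
<?php
$xmlobj = simplexml_load_string(
    "<school><principal><name>Dr. Hamilton</name></principal></school>"
);

// Placeholder output location; any writable path would do.
$path = tempnam(sys_get_temp_dir(), "school");

// With a filename argument, asXML() saves the document and reports success.
$saved = $xmlobj->asXML($path);

// Read the file back in to confirm the round trip, then clean up.
$roundtrip = simplexml_load_file($path);
unlink($path);

print $roundtrip->principal->name . "\n";   // Dr. Hamilton
```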

Now that you know how to print out the XML, let's create a SimpleXMLElement object from scratch.

Create SimpleXMLElement from Scratch

When you want to create XML, using the SimpleXMLElement object can help avoid errors in syntax, so you know your XML is as good as your object.

Before we create a SimpleXMLElement from scratch programmatically, let's look at the XML we want to achieve:
<?xml version="1.0"?>
<classroom>
<teacher>Mr. Deckelmann</teacher>
<students>
<student>Sammy</student>
<student gender="F">Daisy</student>
</students>
</classroom>

Start with XML and Root Tag

To create an empty SimpleXMLElement object, you can pass an XML structure essentially void of content. Here we'll give it just the XML declaration (the <?xml ... ?> line), followed by a root tag. In this case, I've chosen <classroom> as my root element, under which I'll add all other XML tags.

Here's the code to give us an empty SimpleXMLElement:
$xmltext = "<?xml version=\"1.0\"?>\n<classroom></classroom>";
$xmlobj = simplexml_load_string($xmltext);

The first line holds the blank XML. The second line uses that XML to create a SimpleXMLElement object.
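If you'd rather skip the intermediate string variable, the SimpleXMLElement class can be constructed directly from the same kind of text. This is just an alternative sketch, not a replacement for the approach above; both give you an object with a root tag and no children:

```php
<?php
// Build the empty document with the class constructor instead of simplexml_load_string().
$xmlobj = new SimpleXMLElement("<classroom></classroom>");

print $xmlobj->getName() . "\n";   // classroom
print $xmlobj->count() . "\n";     // 0, because no child tags have been added yet
```

One difference to keep in mind: the constructor throws an exception on invalid XML, whereas simplexml_load_string returns false.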

Add Elements to SimpleXMLElement

The SimpleXMLElement object doesn't do us much good unless we start adding elements to it. Glancing up at the XML we're hoping to achieve, I see we need a <teacher> tag next. To add this, we need to call the addChild function on our SimpleXMLElement object. This function takes two arguments: the name of the tag and the value. Here's the code:
$xmlobj->addChild("teacher", "Mr. Deckelmann");

header("Content-type: text/plain");
print $xmlobj->asXML();

The $xmlobj variable from the previous section holds the empty SimpleXMLElement. To call a function on an object, we use the -> operator, followed by the name of the function.

The asXML line might look familiar from the above. It prints out the XML code from the SimpleXMLElement object. In future sections, you'll need to put new code above that line to avoid printing before all your XML is in place.

Save your PHP file and load it up in your browser. You should now see our XML. Blown away? Probably not, but let's add some more elements and see if we can change that.

Add Sub-Elements to SimpleXMLElement

Now that we've added a normal element, let's get more advanced and create an element that contains other elements. To do this, we need to create a new element, then get access to the new object.

Luckily, the addChild function we used above returns the new object that it creates. Here is the code to create a sub-element:
$studentsobj = $xmlobj->addChild("students");
$studentsobj->addChild("student", "Sammy");

(Remember to place the code above the asXML line).

The first line creates an empty <students> tag. In addition to creating the tag, it also sends the output of the addChild function to the $studentsobj variable. The output is a new SimpleXMLElement object holding just the new tag.

Calling the addChild function on the new object, as we did in the second line, creates a <student> tag. This second line is similar to the <teacher> tag we added, but here it will go inside the <students> tag instead of under the root tag.

Reload the file and see for yourself. We're now most of the way toward re-creating some XML by using SimpleXML. Just one tiny step remaining.

Add Attributes to SimpleXMLElements

Attributes go inside XML tags. They often hold meta-data, which supports the main data but is not as important. Sometimes an attribute's value is extremely important, as with the anchor tag in HTML, which stores the URL inside the href attribute.

Here we are going to add a new student named Daisy and set a gender attribute of "F" for her. Here's the code:
$daisyobj = $studentsobj->addChild("student", "Daisy");
$daisyobj->addAttribute("gender", "F");

(Remember to place that code above the asXML line).

The first line may look familiar. As with adding the first child (Sammy) above, we need to call the addChild function on the $studentsobj variable. The difference is that we set the output of that call to yet another SimpleXMLElement object. This one holds only the data for the most recent tag, Daisy's <student> tag.

We use the newest object to add an attribute for Daisy. The name of the attribute is "gender" and the value is "F." So, when you reload the PHP file again, you should now see the complete XML that we were trying to emulate.

I hope you're at least a little bit blown away now, because you just created an entire XML file programmatically.
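For reference, here are the steps from this section gathered into one script. One thing you may notice is that asXML puts all the tags on a single line. The last few lines are an optional extra of mine, not part of the steps above: they hand the object to PHP's DOM extension (assuming it is installed) to get indented output.

```php
<?php
// Start with an empty root tag.
$xmlobj = new SimpleXMLElement("<classroom></classroom>");

// Add a simple element, then a parent element with two children.
$xmlobj->addChild("teacher", "Mr. Deckelmann");
$studentsobj = $xmlobj->addChild("students");
$studentsobj->addChild("student", "Sammy");
$daisyobj = $studentsobj->addChild("student", "Daisy");
$daisyobj->addAttribute("gender", "F");

// Optional: use the DOM extension to indent the output for human readers.
$dom = dom_import_simplexml($xmlobj)->ownerDocument;
$dom->formatOutput = true;
$pretty = $dom->saveXML();

print $pretty;
```

The indentation is purely cosmetic. Any XML parser will read the single-line version from asXML just as well.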

Where to Use SimpleXML

Now that you know how to read and write XML with SimpleXML, you're probably looking for ways to use it. As I've mentioned above, APIs often output XML, so that's a good place to start.


Credits: Webmonkey
