About the tutorial
This short tutorial presents the main commands of the XML Suite through simple examples performed on a real file. We shall use the periodic chart of the chemical elements, a list of all the known atoms with their main physical properties. The various XML files are viewable in your browser.
You can simply read the tutorial below, and at any moment you can copy the scripts to your machine and try them for real.
-- step 1: opening the XML document
For the examples on this page and the following ones we shall use the periodic chart of the chemical elements. To view the XML file (in a new browser window), click here: allelements.txt.
At this step all we can do is open the document and get a reference to the root node. Note that the XMLOpen instruction may take a few seconds, since it has to read a remote file of 112 KB and then build the internal tables which ensure fast responses to your requests.
set the_URL to "http://www.satimage.fr/software/samples/allelements.xml"
set the_doc to XMLOpen the_URL
-- «datan XMLR0000000100000000»
set the_root to XMLRoot the_doc
-- «datan XMLR0000000104739D80»
The rest of the tutorial uses the variables defined above.
If you have viewed the XML file, you may have noticed that it does not declare a DTD: indeed, declaring a DTD is optional. You can confirm this by running:
XMLValidate the_doc
which will display an error alert.
-- step 2: browsing the XML tree
Let us rapidly explore a few elements in the tree.
XMLCount the_root
-- 112
-- the table contains 112 atoms
set the_child to XMLChild the_root index 1
-- «datan XMLR0000000104739DC0»
-- the_child is an opaque reference
-- to know what kind of thing it is we use XMLNodeInfo
XMLNodeInfo the_child
-- {kind:"ELEMENT_NODE", name:"ATOM"}
-- the atom itself has children:
set the_child_2 to XMLChild the_child index 1
XMLNodeInfo the_child_2
-- {kind:"ELEMENT_NODE", name:"NAME"}
-- the 1st atom's 1st child is the atom's name
XMLGetText the_child_2
-- "Actinium"
set the_child_2 to XMLNextSibling the_child_2
XMLNodeInfo the_child_2
-- {kind:"ELEMENT_NODE", name:"ATOMIC_WEIGHT"}
-- the 1st atom's 2nd child is the atomic weight
XMLGetText the_child_2
-- "227"
-- Actinium is a heavy atom!
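The same three commands can be combined to list the names of all the children of a node. The following is only a minimal sketch: the handler name ListChildNames is hypothetical, and it assumes that XMLCount and XMLChild behave on an ATOM node as they do on the root, and that the record returned by XMLNodeInfo can be read with "name of".

```applescript
-- ListChildNames is a hypothetical helper, not part of the XML Suite.
-- It assumes the_node is a valid reference to an element node.
on ListChildNames(the_node)
	set the_names to {}
	repeat with i from 1 to (XMLCount the_node)
		-- read the info record of the i-th child and keep its tag name
		set the_info to XMLNodeInfo (XMLChild the_node index i)
		set end of the_names to name of the_info
	end repeat
	return the_names
end ListChildNames

ListChildNames(the_child)
```

Given the results above, the returned list should begin with "NAME" and "ATOMIC_WEIGHT".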
-- step 3: searching Hydrogen in a tree
Here we suppose we would like to retrieve the atomic weight of Hydrogen.
Since there is no DTD for our XML file, we have to view the file in order to know how the atomic weight is stored: we can either view it directly, since it is just a text file, or we can use XMLDisplayXML:
XMLDisplayXML the_child
-- Result:
<ATOM>
<NAME>Actinium</NAME>
<ATOMIC_WEIGHT>227</ATOMIC_WEIGHT>
<ATOMIC_NUMBER>89</ATOMIC_NUMBER>
etc.
</ATOM>
As we see, ATOM does not include attributes: you might expect each atom to have a NAME attribute, but such is not the case (the ATOM element has a NAME element instead). Thus we have to use an XPATH to retrieve Hydrogen. (Otherwise, using XMLFind would have been simpler.) The XPATH below requests an element of the ATOM kind which has a NAME element whose value is "Hydrogen": ATOM[NAME="Hydrogen"]. Then we shall use XMLFind to read the atomic weight.
set {the_atom} to XMLXpath the_root with "ATOM[NAME=\"Hydrogen\"]"
set the_weight to XMLFind the_atom name "ATOMIC_WEIGHT"
XMLGetText the_weight
-- "1.00794"
A slightly more complicated XPATH could request the ATOMIC_WEIGHT of an ATOM directly (ATOM/ATOMIC_WEIGHT), by specifying a condition on its parent (../): ATOM/ATOMIC_WEIGHT[../NAME="Hydrogen"]. Since XMLXpath returns a list, we again extract its single item.
set {the_weight} to XMLXpath the_root with "ATOM/ATOMIC_WEIGHT[../NAME=\"Hydrogen\"]"
XMLGetText the_weight
-- "1.00794"
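The XPATH request can be wrapped in a small handler that reads any property of any atom. This is a sketch under stated assumptions: AtomProperty is a hypothetical name, the atom name must not contain a double quote, and XMLXpath is assumed to return an empty list when nothing matches (the tutorial does not say what happens in that case).

```applescript
-- AtomProperty is a hypothetical helper built only from commands used above.
on AtomProperty(the_root, the_atom_name, the_property)
	-- builds e.g. ATOM[NAME="Hydrogen"]/ATOMIC_WEIGHT
	set the_xpath to "ATOM[NAME=\"" & the_atom_name & "\"]/" & the_property
	set the_nodes to XMLXpath the_root with the_xpath
	-- assumption: no match yields an empty list rather than an error
	if the_nodes is {} then return missing value
	return XMLGetText (item 1 of the_nodes)
end AtomProperty

AtomProperty(the_root, "Hydrogen", "ATOMIC_WEIGHT")
```

The call should return the same text as the two scripts above.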
Actinium did not have any attribute, but Hydrogen has one:
XMLNodeInfo the_atom
-- {kind:"ELEMENT_NODE", name:"ATOM", attribute:{"STATE", "GAS"}}
We can use XMLFind to get the list of all the gases in the table (more precisely, all the atoms with a "STATE" attribute set to "GAS").
set the_gases to XMLFind the_root key "STATE" value "GAS" with all occurrences
-- {«datan XMLR0000000102BDA210», «datan XMLR00000001058360D0», «datan XMLR0000000105837530», «datan XMLR0000000105880080»}
repeat with the_gas in the_gases
msg(XMLGetText (XMLChild the_gas index 1))
end repeat
-- Result:
Argon
Hydrogen
Helium
Xenon
-- step 4: displaying nodes
We already used XMLNodeInfo, XMLDisplayXML and XMLGetText in the previous pages. Let us stress the difference between XMLNodeInfo and XMLDisplayXML: XMLNodeInfo returns information on the node, as a record where you can find the name of the node (its tag), its kind and its attributes if it has any. XMLDisplayXML displays the whole node as text.
XMLNodeInfo the_atom
-- {kind:"ELEMENT_NODE", name:"ATOM", attribute:{"STATE", "GAS"}}
XMLDisplayXML the_atom
-- Result:
<ATOM STATE="GAS">
<NAME>Hydrogen</NAME>
<ATOMIC_WEIGHT>1.00794</ATOMIC_WEIGHT>
<ATOMIC_NUMBER>1</ATOMIC_NUMBER>
etc.
</ATOM>
XMLGetText is intended for an elementary node:
XMLGetText the_weight
-- "1.00794"
You can use it on a complex node as well, but the output may be unreadable:
XMLGetText the_atom
-- Result:
Hydrogen1.007941120.2813.81H
0.0899
1s1 0.322.12.08
etc.
XMLGetText (as well as XMLNodeInfo and XMLDisplayXML) supports lists. Suppose we want the list of the atomic weights of all the atoms: we would first get the list of all the atomic weight nodes:
set the_nodes to XMLXpath the_root with "ATOM/ATOMIC_WEIGHT"
-- {«datan XMLR000000010471D940», «datan XMLR0000000104746810», «datan XMLR0000000104726350», etc.}
then we would get the values of the nodes as text with XMLGetText:
set the_weights to XMLGetText (the_nodes)
-- {"227", "26.98154", "243", "121.757", etc.}
-- step 5: creating a new element, deleting another
Now we are ready to create a new element! Suppose you discover an unknown atom, and you name it Smilium. The most urgent thing would be to find a symbol for it. Let us build the list of the symbols used so far.
set the_symbols to XMLGetText (XMLXpath the_root with "ATOM/SYMBOL")
-- {"Ac", "Al", "Am",[...],"Sc", "Se", "Sg", "Si", "Sm", "Sr", "Ta",[...]}
"Sm" is already used, let us choose Sl.
Since the list is (approximately) in alphabetical order, it will be a good idea to insert Smilium after "Si" (Silicon).
set {the_si} to XMLXpath the_root with "ATOM[SYMBOL=\"Si\"]"
XMLGetText (XMLChild the_si index 1)
-- "Silicon"
Now that we know where to insert the new element, we need to decide what to insert!
The minimum we want is certainly the name and the symbol. Let us build the data "manually":
set the_data to "<ATOM>
<NAME>Smilium</NAME>
<SYMBOL>Sl</SYMBOL>"
set the_smilium to XMLNewSibling the_data after the_si
Ouch! The script triggers an error! Indeed, our new element is not well formed: the ATOM tag is not closed.
set the_data to "<ATOM>
<NAME>Smilium</NAME>
<SYMBOL>Sl</SYMBOL>
</ATOM>"
set the_smilium to XMLNewSibling the_data after the_si
-- «datan XMLR00000001032708E0»
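When the data comes from elsewhere, such an error can be trapped rather than left to stop the script. The sketch below reuses the malformed data from above, so nothing is inserted in the tree. It assumes that the error raised by XMLNewSibling is an ordinary AppleScript error that a try block can catch; the variable names the_bad_data and the_message are illustrative.

```applescript
-- A sketch of trapping the error raised by data that is not well formed.
set the_bad_data to "<ATOM>
<NAME>Smilium</NAME>
<SYMBOL>Sl</SYMBOL>"
try
	XMLNewSibling the_bad_data after the_si
on error the_message
	-- the exact wording of the_message depends on XMLLib
	display dialog "The new element was rejected: " & the_message
end try
```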
Now let us fill more information in.
A new element should be bigger than all the previous ones. Let us build a small routine to compute the maximum value (so far) of an arbitrary numerical element. Given the name of an element, XMLXpath returns the matching nodes and XMLGetText returns the list of their values as text. To get the maximum of those values we have to make the list into a list of numbers. There is no standard coercion from a list of strings to a list of numbers; however, the scientific facets of Smile will help here. Applying addlist 0 to the list of strings will make it into a list of numbers. Then we can use statlist to get the stats of the list.
on GetMaxValue(the_root, the_element)
set the_xpath to "ATOM/" & the_element
set the_nodes to XMLXpath the_root with the_xpath
set the_values to XMLGetText the_nodes
set the_numbers to addlist 0 with the_values
return maximum of (statlist the_numbers)
end GetMaxValue
GetMaxValue(the_root, "ATOMIC_NUMBER")
-- 112.0
GetMaxValue(the_root, "ATOMIC_WEIGHT")
-- 277.0
GetMaxValue(the_root, "ATOMIC_RADIUS")
-- 2.700000047684
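For comparison, the maximum can also be computed in plain AppleScript, without addlist and statlist, by coercing each string in a loop. This is a minimal sketch: MaxOfTexts is a hypothetical name, and it assumes that every item is text which coerces to a number (the text-to-number coercion follows the system's decimal separator, so values written with a period may fail under some regional settings).

```applescript
-- MaxOfTexts is a hypothetical helper written in plain AppleScript.
-- It assumes every item of the_values is text that coerces to a number.
on MaxOfTexts(the_values)
	if the_values is {} then return missing value
	set the_max to (item 1 of the_values) as real
	repeat with a_value in the_values
		-- a_value is a reference to the item; take its contents before coercing
		set the_number to (contents of a_value) as real
		if the_number > the_max then set the_max to the_number
	end repeat
	return the_max
end MaxOfTexts

-- sample values taken from the list of weights shown in step 4
MaxOfTexts({"227", "26.98154", "243", "121.757"})
-- the largest of the four, i.e. 243
```

The loop is slower than addlist on long lists, but it shows where a non-numeric value would raise an error.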
Now we can enter more information about Smilium without fear:
XMLNewChild "<ATOMIC_NUMBER>144</ATOMIC_NUMBER>" at the_smilium
XMLNewChild "<ATOMIC_WEIGHT>321</ATOMIC_WEIGHT>" at the_smilium
XMLNewChild "<ATOMIC_RADIUS>3.14</ATOMIC_RADIUS>" at the_smilium
... and finally display our element:
XMLDisplayXML the_smilium
-- Result:
<ATOM>
<NAME>Smilium</NAME>
<SYMBOL>Sl</SYMBOL>
<ATOMIC_NUMBER>144</ATOMIC_NUMBER>
<ATOMIC_WEIGHT>321</ATOMIC_WEIGHT>
<ATOMIC_RADIUS>3.14</ATOMIC_RADIUS>
</ATOM>
As an exercise, let us remove the entry for Silicon.
XMLRemove the_si
set the_si to 0
It is good practice to reset the value of a variable containing an XMLRef after you have deleted the entity it refers to, in order to be sure that your program will not use an invalid reference. Using such an invalid reference may crash XMLLib and Smile.
We can now save the new table as a local file (on the desktop).
set the_path to ((path to desktop) as text) & "future_elements.xml"
XMLSave the_doc in file the_path
set the_file to the_path as alias
-- step 6: handling local and remote XML's
In the previous steps of the tutorial, we opened the URL known as the_URL and Smile returned an XMLRef that we stored as the_doc. Then we added an element to the_root (the root node of the_doc) and we saved the_doc as a local file, the_path. However, the XML document stored at the_path is not open: only the_URL is open. To confirm, run:
XMLDocument the_file
The script triggers an error: there is no valid XMLRef referring to that document, while the following returns the reference stored in the_doc:
XMLDocument the_URL
-- «datan XMLR0000000100000000»
It may be convenient when you change a database to work on a local copy. To do so, let us close the connection to the_URL, and open the local copy instead.
XMLClose the_doc
set the_doc to XMLOpen the_file
-- «datan XMLR0000000200000000»
We can confirm:
XMLURL the_doc
-- alias "Macintosh HD:Users:<login>:Desktop:future_elements.xml"
Now we can work locally. To save changes, run:
XMLSave the_doc
-- step 7: displaying the periodic table in a web page
Here we shall use style sheets (the XSLT standard) to make our XML document into an html document suitable for publishing on the web.
For a simple example, we shall build a list, displaying each element in a summarized form:
- Ac
- Actinium, Z=89
- Al
- Aluminum, Z=13
- Am
- Americium, Z=95
- etc.
The corresponding xsl file is fairly simple. To view the xsl file (in a new browser window), click here: elements_xsl.html.
Let us generate the html file on the desktop and open it in Safari.
The style sheet file is itself an XML file, let us open it.
set the_saso_URL to "http://www.satimage.fr/software/samples/elements.xsl"
set the_saso to XMLOpen the_saso_URL
Now let us apply the style sheet.
set the_html to XMLTransform the_doc with the_saso
the_html is an XMLRef: we must use XMLSave to make it into a file. Note that we could alternatively use XMLDisplayXML to generate the text representation of the_html, print it into a Unicode window and save it to disk by the usual means.
set the_html_path to ((path to desktop) as text) & "elements.html"
XMLSave the_html in file the_html_path
Let us now check how the result will display.
set the_html_file to the_html_path as alias
tell application "Safari" to open the_html_file
To view the resulting html's source (in a new browser window) click here: elements_html.html.
To view the resulting html as a web page (in a new browser window) click here: elements.html.
-- step 8: releasing memory
XML trees may use a lot of memory. It is better to close those you do not use.
XMLClose the_doc
XMLClose the_saso
XMLClose the_html