Overview
Extensible Numerical File Format, abbreviated XNF, is a public, extensible format for storing numerical data.
An XNF file consists of one or more raw data files, possibly in different formats, plus metadata that describes the contents of the data files and is stored in a dedicated XML file.
On a Mac, an XNF file is a bundle, a special kind of folder that looks like a file (and that Finder calls a "package"). Double-clicking an XNF file opens a user interface in Smile to browse, view, extract, and plot the data that it contains. On other platforms, an XNF file appears as a regular directory containing the data files and the XML metadata file.
SmileLab 3.2 provides commands to write numerical data to XNF files and to read it back.
Goals
Data file formats already exist that can store an arbitrary number of data sets together with the metadata required to retrieve the data, such as axis information.
Unfortunately, the metadata are usually stored in a format that requires a specific library to read. A scientist from outside the field, who is unfamiliar with that particular format and lacks the dedicated software, will be unable to extract the metadata, and thus the data. For instance, it would make little sense to try to work with HDF or FITS files without the appropriate library.
Hence the choice of storing the metadata as XML, cleanly separated from the data. Data file formats with an XML table of contents already exist. Yet our working group could not identify one that is sufficiently general and simple to be used by scientists. The existing formats involve complex XML, often specific to a given professional field or application, which in the end requires specific libraries as well.
XNF was designed with the following requirements in mind.
- XNF should accept the file formats that scientists routinely generate in programs (FORTRAN or C binary files),
- generating the related metadata should be fairly easy, even without dedicated software, and even without an XML library,
- the metadata should be reasonably legible for a user who has no dedicated software.
Anatomy of an XNF file
The bundle (or, on other platforms, the directory) contains two items at the root level: a folder named Contents, which stores the data files, and the index.xml XML file, which describes the contents of those files. The metadata in the XML file can be parsed and used by dedicated software to provide any desired interface to the numerical data.
The syntax of the index.xml file is defined in the DTD for the XNF format. The DTD is quite basic; here is a summary of its specification.
The root of an XNF document is a tableofcontents element. The root can contain any number of dataset elements.
The dataset element
A dataset element has a unique xml:id attribute used to access it, and a dimension attribute specifying the number of dimensions of the dataset.
<dataset xml:id='pressure' dimension="3">...</dataset>
A dataset element must contain as many axis elements as it has dimensions, followed by any number of data elements.
The axis elements define the data length along each dimension, and the data elements give a reference to the data.
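As an illustration of the structure described above, here is a minimal Python sketch that reads the dataset and axis elements of an index.xml with the standard xml.etree.ElementTree module. The sample XML, the axis sizes and the function name list_datasets are made up for this example and are not part of the XNF specification.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined "xml:" prefix to this namespace URI.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# Hypothetical index.xml content, made up for illustration.
INDEX_XML = """<tableofcontents>
  <dataset xml:id="pressure" dimension="3">
    <axis size="4"/>
    <axis size="5"/>
    <axis size="6"/>
    <data href="file1" offset="64" type="real32" byte_order="little"/>
  </dataset>
</tableofcontents>"""


def list_datasets(xml_text):
    """Return {dataset id: list of axis sizes} for every dataset element."""
    root = ET.fromstring(xml_text)
    shapes = {}
    for dataset in root.findall("dataset"):
        sizes = [int(axis.get("size")) for axis in dataset.findall("axis")]
        # A dataset must contain one axis element per dimension.
        if len(sizes) != int(dataset.get("dimension")):
            raise ValueError("axis count does not match the dimension attribute")
        shapes[dataset.get(XML_NS + "id")] = sizes
    return shapes


print(list_datasets(INDEX_XML))
```

The product of the axis sizes gives the number of values to expect in each data element of the dataset.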
The data element
Generally, the data element is a reference to some data contained in a binary file. The binary file is specified by the href attribute.
- The href attribute must contain an absolute URL, or a URL relative to the dataset base (xml:base). By default, this base is "Contents/".
- The offset attribute gives the position, in bytes, of the beginning of the data in the file.
- The type attribute specifies the format of the data to extract. It must be one of the following values: real32 | real64 | uint8 | uint16 | uint32 | sint8 | sint16 | sint32 | complex64 | complex32.
- The byte_order attribute (big or little) specifies whether the data is written in little endian (Intel) or in big endian (PPC).
<data href="file1" offset="64" type="real32" byte_order="little"/>
Alternatively, the data element may explicitly contain the data as its textual content: numbers written as ASCII, separated by the space character, for instance <data>1.04 0.93103 etc. </data>.
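The attributes above are enough to decode the data with basic binary I/O. The following Python sketch shows one possible way to do it with the standard struct module. It makes several assumptions: only relative href values are handled, a missing offset is treated as 0, the complex types are left out because their byte layout is not described here, and the function name read_data and the demo file are hypothetical.

```python
import os
import struct
import tempfile
import xml.etree.ElementTree as ET

# XNF type name -> struct format character.
# The complex types are omitted: their layout is not described in the text.
STRUCT_CODES = {
    "real32": "f", "real64": "d",
    "uint8": "B", "uint16": "H", "uint32": "I",
    "sint8": "b", "sint16": "h", "sint32": "i",
}


def read_data(data_elem, count, base_dir="Contents"):
    """Return `count` numbers described by a data element, as a flat list."""
    href = data_elem.get("href")
    if href is None:
        # Inline form: ASCII numbers separated by spaces.
        return [float(tok) for tok in data_elem.text.split()][:count]
    prefix = "<" if data_elem.get("byte_order") == "little" else ">"
    fmt = prefix + str(count) + STRUCT_CODES[data_elem.get("type")]
    with open(os.path.join(base_dir, href), "rb") as f:
        # Assumption: a missing offset attribute means the data starts at byte 0.
        f.seek(int(data_elem.get("offset", "0")))
        return list(struct.unpack(fmt, f.read(struct.calcsize(fmt))))


# Demo with a throwaway file: a 64-byte header followed by three real32 values.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "file1"), "wb") as f:
        f.write(b"\0" * 64 + struct.pack("<3f", 1.0, 2.5, -4.0))
    elem = ET.fromstring(
        '<data href="file1" offset="64" type="real32" byte_order="little"/>'
    )
    binary_values = read_data(elem, 3, base_dir=tmp)

inline_values = read_data(ET.fromstring("<data>1.04 0.93103</data>"), 2)
print(binary_values, inline_values)
```

The function returns a flat list; how the values map onto a multidimensional dataset depends on the storage order, which this sketch does not address.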
The axis element
The axis element must contain a size attribute defining the number of values along the dimension that the axis describes.
The axis scale can optionally be defined in one of the following ways:
- by defining a linear scale with the start (first value) and step (increment) attributes,
<axis size='100' start='0' step='0.1'/>
- by inserting a data element in the axis element,
- by referring to a one-dimensional dataset in an idref attribute.
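For the linear case, the coordinates follow directly from the three attributes: value number i is start plus i times step. A short Python sketch, using the axis element shown above (the function name linear_axis_values is hypothetical):

```python
import xml.etree.ElementTree as ET


def linear_axis_values(axis_elem):
    """Return the coordinates of an axis defined by size, start and step."""
    size = int(axis_elem.get("size"))
    start = float(axis_elem.get("start"))
    step = float(axis_elem.get("step"))
    # Multiply rather than accumulate so rounding errors do not add up.
    return [start + i * step for i in range(size)]


axis = ET.fromstring("<axis size='100' start='0' step='0.1'/>")
values = linear_axis_values(axis)
print(values[0], len(values))
```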
Using an XNF in SmileLab
SmileLab 3.2 opens XNF files. When you double-click an XNF file with SmileLab installed, or when you drop the XNF file on Smile's icon, Smile opens a user interface. In the interface you can browse the data contained in the XNF file and get information about each atom, and you can plot the data. (The interface adapts to the type of the data that the user selects.)
SmileLab provides commands to handle XNF files. Those are high-level commands, in that you do not have to care about the exact format of the file containing the data: to work with an array you have stored previously, you just refer to it with the identifier you have defined when storing the data. The commands to read and write XNF files are in the Dictionary of XMLLib.osax.
To automate data processing, you may need to run a specific search on the metadata. For this, you need XML software. SmileLab provides XML commands in an AppleScript implementation of libxml2, the XML C parser and toolkit developed for the GNOME project. Most often, you will use XPath, which is implemented as the XMLXPath command.
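To give an idea of what such a metadata search looks like outside SmileLab, here is a Python sketch that selects all three-dimensional datasets with an XPath-style query. It does not use the XMLXPath command: it relies on Python's xml.etree.ElementTree, which supports only a subset of XPath. The dataset identifiers and sizes are made up for the example.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined "xml:" prefix to this namespace URI.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# Hypothetical index.xml content, made up for illustration.
INDEX_XML = """<tableofcontents>
  <dataset xml:id="time" dimension="1"><axis size="10"/></dataset>
  <dataset xml:id="pressure" dimension="3">
    <axis size="4"/><axis size="5"/><axis size="6"/>
  </dataset>
  <dataset xml:id="density" dimension="3">
    <axis size="4"/><axis size="5"/><axis size="6"/>
  </dataset>
</tableofcontents>"""

root = ET.fromstring(INDEX_XML)
# Attribute predicate: keep only the dataset elements whose dimension is 3.
three_d = [d.get(XML_NS + "id") for d in root.findall("./dataset[@dimension='3']")]
print(three_d)
```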
Handling XNF files with C/C++
The design of the XNF format is simple. It is easy to fill in a new entry manually, and later to view the index file to get the metadata required to extract the data: this is the most basic way to use the format.
For programmatic use, you can implement an interface to XNF on any platform, as SmileLab does on a Mac, using an XML library and basic binary data I/O calls. A C library, based on libxml2, implements the main functions required to handle XNF files. The library is available in the download section.
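Writing an XNF file is possible even without an XML library. The Python sketch below is not the C library mentioned above: it is a minimal illustration that writes a one-dimensional dataset as raw binary and produces index.xml with plain string formatting. The function name write_xnf, the file name file1 and the sample values are assumptions, and the output has not been checked against the XNF DTD.

```python
import os
import struct
import tempfile


def write_xnf(path, dataset_id, values):
    """Write a 1-D dataset as a minimal XNF directory; return the index path."""
    os.makedirs(os.path.join(path, "Contents"), exist_ok=True)
    # Raw little-endian 32-bit floats, no header, hence offset 0.
    with open(os.path.join(path, "Contents", "file1"), "wb") as f:
        f.write(struct.pack("<%df" % len(values), *values))
    # dataset_id is inserted as-is, so it must already be a valid XML name.
    index = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<tableofcontents>\n'
        '  <dataset xml:id="%s" dimension="1">\n'
        '    <axis size="%d"/>\n'
        '    <data href="file1" offset="0" type="real32" byte_order="little"/>\n'
        '  </dataset>\n'
        '</tableofcontents>\n'
    ) % (dataset_id, len(values))
    index_path = os.path.join(path, "index.xml")
    with open(index_path, "w", encoding="utf-8") as f:
        f.write(index)
    return index_path


# Demo in a temporary directory that is removed afterwards.
with tempfile.TemporaryDirectory() as tmp:
    index_path = write_xnf(os.path.join(tmp, "demo.xnf"), "pressure", [1.0, 2.0, 3.0])
    with open(index_path, encoding="utf-8") as f:
        index_text = f.read()
    data_size = os.path.getsize(os.path.join(tmp, "demo.xnf", "Contents", "file1"))

print(index_text)
```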
Tips for filling the index.xml file
- If you want to be able to check conformance to the XNF DTD, you have to declare the DTD in the header of the file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE tableofcontents SYSTEM "http://www.satimage.fr/software/dtds/XNFv2.dtd">
- When you edit index.xml in Smile, pressing ^⌘R checks the XML syntax, and pressing ⌥⌘R checks conformance to the DTD. Then, use the information in the XNF 2.0 DTD to fill in index.xml.
- Smile's folder contains some sample .xnf files: use the examples provided to get started. Another example of an index.xml file is also available.