XML development with Axl Library

Index

This manual contains the following sections:

Section 1: Basic elements to understand XML and Axl

Section 2: Manipulating and producing XML documents

Section 3: Doing validation on your documents

Section 4: Advanced topics

Appendix

Introduction

The XML 1.0 definition allows you to build documents that can represent textual information, remote procedure invocations or dynamic user interfaces. It is based on very simple principles that developers can compose to create bigger abstractions, which are found in nearly every area of modern software design.

It is a "quite" human-readable format, so it is not the best choice if you are looking for space efficiency. What XML 1.0 provides, on the other hand, is the ability to quickly prototype and produce working formats that encapsulate your data and that can evolve along with your system.

Among other things, XML 1.0 provides ways to validate your documents, ensuring that your code reads XML documents in the expected format. This reduces the development time and cost of writing additional checks by hand.

Before continuing, we will explain some concepts that are required to understand XML 1.0 and why the Axl API was built this way.

Some concepts before starting to use Axl Library

Here is a simple example of an XML 1.0 document:

<?xml version="1.0" ?>
<!-- This is a comment -->
<complex>
  <data>
    <simple>10</simple>
    <empty attr1="value1" />
  </data>
</complex>

The previous XML document represents a structure with a top-level node called complex, which has a single child called data, which in turn has two children. The first child, called simple, has content; the other, called empty, is what is usually called an empty XML node.

The previous document can be represented as follows:

image01.png
Document representation

Several issues must be considered while interpreting the previous diagram and how Axl Library parses and exposes those elements to the client application through the API:

MIXED and CHILDREN API: How to use them

XML 1.0 is used for a variety of purposes; some of them call for the CHILDREN API and the rest for the MIXED API. By "call for" we mean that the API fits better: you will get better results, your application will behave properly and you will have less work to do.

The reason for having two APIs is simple. The XML 1.0 definition allows content to be mixed with nodes, comments and many other elements as children of a particular node.

This definition, found in the standard, has led many XML implementations to offer only one API that supports all these features: an interface that is complicated and overloaded, and that gives you power you don't need, making your development less efficient.

Often, however, a developer only requires a common form of XML, called CHILDREN, in which nodes have either child nodes or content, but not both at the same time. This kind of XML is really useful: it is easy to parse, easy to describe with a DTD, more compact and extensible.

Let's see an example of both formats to clarify:

<?xml version='1.0' ?>
<document>
<!-- Children XML format example: as you can see -->
<!-- nodes only contains either nodes or node content -->
<!-- but nothing mixed at the same level -->
<node1>
This is node1 content
</node1>
<node2>
<node3>
This is node3 content
</node3>
<node4 />
</node2>
</document>

While a MIXED XML document could be:

<?xml version='1.0' ?>
<document>
<!-- Children XML format example: as you can see -->
<!-- nodes only contains either nodes or node content -->
<!-- but nothing mixed at the same level -->
<node1>
This is node1 content
</node1>
Content mixed with xml nodes at the same level.
<node2>
More content....
<node3>
This is node3 content
</node3>
<node4 />
</node2>
</document>

Both approaches, which are valid under the XML 1.0 standard, are appropriate for particular situations:

Having introduced the context of the problem: Axl Library takes no position. It provides one API suited to XML content that follows the CHILDREN description and another for the MIXED description.

In this context, the API you choose only affects the way you traverse the document. The CHILDREN API is mainly provided by the Axl Node interface and the MIXED API is mainly provided by the Axl Item interface.

You don't need to do anything special to activate either API; both are available at the same time. Let's see an example:

Taking the previous mixed example, the following code gets a reference to <node2>:

// supposing "doc" reference contains the document loaded
axlNode * node;
// get the document root, that is <document>
node = axl_doc_get_root (doc);
// get the first child for the document root (<node1>)
node = axl_node_get_first_child (node);
// get the next child (brother of <node1>, that is <node2>)
node = axl_node_get_next (node);

However, with the MIXED API you can get every detail, every item found for a particular node. This is how:

// supposing "doc" reference contains the document loaded
axlNode * node;
axlItem * item;
// get the document root, that is <document>
node = axl_doc_get_root (doc);
// get the first item child for the document root that is the comment:
// "Children XML format example: as you can see".
item = axl_item_get_first_child (node);
// now skip the following two comments
item = axl_item_get_next (item);
item = axl_item_get_next (item);
// now the next item is holding the <node1>
item = axl_item_get_next (item);
node = axl_item_get_data (item);
// now get the content between the <node1> and <node2>
item = axl_item_get_next (item);
// and finally, get the next child (brother of <node1>, that is
// <node2>)
item = axl_item_get_next (item);
node = axl_item_get_data (item);

Obviously, the mixed example requires more code and is more fragile when the XML document changes. The problem is that the MIXED API is more general than the CHILDREN API, which leads XML libraries to provide only that API.
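The relation between both views can be illustrated with a small, self-contained sketch. It does not use Axl Library: ToyItem, ToyKind and the toy_* functions are hypothetical names invented for this illustration. It only shows that a CHILDREN-style "next node" step can be understood as a MIXED-style "next item" step that skips anything that is not a node.

```c
#include <stddef.h>

/* Toy item kinds for illustration; these are NOT the Axl Library types. */
typedef enum { TOY_NODE, TOY_CONTENT, TOY_COMMENT } ToyKind;

typedef struct ToyItem {
    ToyKind         kind;
    const char     *text;  /* node name, text content or comment body */
    struct ToyItem *next;  /* next sibling, whatever its kind */
} ToyItem;

/* MIXED-style step: the very next sibling item. */
static const ToyItem *toy_item_next (const ToyItem *item)
{
    return (item != NULL) ? item->next : NULL;
}

/* CHILDREN-style step: the next sibling that is a node,
 * skipping comments and text content on the way. */
static const ToyItem *toy_node_next (const ToyItem *item)
{
    for (item = toy_item_next (item); item != NULL; item = item->next) {
        if (item->kind == TOY_NODE)
            return item;
    }
    return NULL;
}

/* CHILDREN-style entry point: first node in a sibling list. */
static const ToyItem *toy_node_first (const ToyItem *first)
{
    if (first == NULL || first->kind == TOY_NODE)
        return first;
    return toy_node_next (first);
}
```

With the children of <document> from the mixed example stored as such a list (three comments, <node1>, text, <node2>), toy_node_first lands on <node1> and toy_node_next on <node2>, while toy_item_next visits every entry in between.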

As a consequence:

Parsing XML documents

We have seen what an XML document looks like. Now we are going to see how to parse such documents into data structures that can be used to inspect their content. All parsing functions are available in the Axl Doc interface.

Let's start with a very simple example:

#include <axl.h>
#include <stdio.h>
int main (int argc, char ** argv)
{
    // error reference, filled by the library on failure
    axlError * error = NULL;
    // top level definitions
    axlDoc * doc = NULL;
    // initialize axl library
    if (! axl_init ()) {
        printf ("Unable to initialize Axl library\n");
        return -1;
    }
    // get current doc reference
    doc = axl_doc_parse_from_file ("large.xml", &error);
    if (doc == NULL) {
        // report the failure and release the error reference
        printf ("Unable to parse large.xml: %s\n", axl_error_get (error));
        axl_error_free (error);
        axl_end ();
        return -1;
    }
    // DO SOME WORK WITH THE DOCUMENT HERE
    // release the document
    axl_doc_free (doc);
    // cleanup axl library
    axl_end ();
    return 0;
}

Traversing an XML document

Once the document is loaded you can use several functions to traverse it.

First you must use axl_doc_get_root to get the document root (axlNode), which contains all the information. Then, according to the interface you are using, call either axl_node_get_first_child or axl_item_get_first_child.

Once you have access to the first element, you can use the following set of functions to get more references to other nodes or items:

There are alternative APIs that allow you to iterate over the document by providing a callback: axl_doc_iterate.

Another approach is to use axl_doc_get and axl_doc_get_content_at to get fast access to a particular node using a really limited XPath syntax.
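To give an idea of what resolving such a path involves, here is a minimal sketch that does not use Axl Library. ToyNode, toy_child_called and toy_get are hypothetical names, and the path form "/complex/data/simple" is an assumption made for this illustration, not a statement of the exact syntax the library accepts. Each step of the path selects a child by name, starting from the root.

```c
#include <stddef.h>
#include <string.h>

/* Toy tree node for illustration; NOT the Axl Library axlNode type. */
typedef struct ToyNode {
    const char     *name;
    struct ToyNode *first_child;
    struct ToyNode *next;        /* next sibling */
} ToyNode;

/* Return the first child of parent whose name equals the first
 * len bytes of name, or NULL when there is none. */
static ToyNode *toy_child_called (ToyNode *parent, const char *name, size_t len)
{
    ToyNode *child;
    for (child = parent->first_child; child != NULL; child = child->next) {
        if (strlen (child->name) == len && strncmp (child->name, name, len) == 0)
            return child;
    }
    return NULL;
}

/* Resolve an absolute path such as "/complex/data/simple": the first
 * step must name the root, each later step selects a child by name. */
static ToyNode *toy_get (ToyNode *root, const char *path)
{
    ToyNode    *node = NULL;
    const char *step;
    size_t      len;

    if (root == NULL || path == NULL || path[0] != '/')
        return NULL;

    step = path + 1;
    while (*step != '\0') {
        len = strcspn (step, "/");
        if (len == 0)
            return NULL;              /* empty step, such as "//" */
        if (node == NULL) {
            /* first step: it must match the root name */
            if (strlen (root->name) != len || strncmp (root->name, step, len) != 0)
                return NULL;
            node = root;
        } else {
            node = toy_child_called (node, step, len);
            if (node == NULL)
                return NULL;
        }
        step += len;
        if (*step == '/')
            step++;
    }
    return node;
}
```

For the first document in this manual, toy_get with "/complex/data/simple" walks complex, then data, then simple, and returns NULL as soon as one step finds no matching child.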

Modifying a loaded XML document

One feature that comes with Axl Library is the ability to modify loaded content: replacing it with other content and moving a node to another place.

Check the following functions while operating with axlNode elements:

Check the following functions while operating with axlItem elements:

Producing xml documents from memory

Axl Library comes with several functions to perform XML memory dump operations, allowing you to translate an XML representation (axlDoc or axlNode) into a string:

If you want to produce XML content taking a particular node as the reference, use:

Validating XML documents

Once you are familiar with the Axl API, or any other XML toolkit, it turns out that writing lots of source code to check which node names are expected or how they are nested is not good practice. It makes your program fragile in the face of changes and forces you to write code that does no actual work but simply checks the environment.

You may also need to check that a received XML document follows a defined XML structure, but it is too complex to be done by hand.

For this purpose, XML 1.0 defines the DTD (Document Type Definition), which allows you to specify the document grammar: how nodes are nested, which attributes they may contain, and whether they are allowed to be empty nodes or must have child nodes.

Let's start with the DTD syntax used to configure restrictions on node structure:

<!-- sequence specification -->
<!ELEMENT testA (test1, test2, test3)>
<!-- choice specification -->
<!ELEMENT testB (test1 | test2 | test3)>

The DTD <!ELEMENT declaration is modeled on top of two concepts, which are later extended with repetition patterns. We will explain them later. For now, these two top-level concepts are: sequence and choice.

A sequence specification (elements separated by a comma, ","), the one applied to the node testA, denotes that testA has as children test1, followed by test2 and ending with test3. The specified order must be followed and all instances must appear. This can be tweaked using repetition patterns.

On the other hand, a choice specification (elements separated by a pipe, "|") specifies that the content of a node is built from the nodes in the choice list. So, in this case, the testB node can have exactly one instance of either test1, test2 or test3.
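The difference between both rules can be sketched with two small checks over a list of child names. This is not how Axl Library implements validation; toy_match_sequence and toy_match_choice are hypothetical helpers that only cover the plain case with no repetition pattern.

```c
#include <stddef.h>
#include <string.h>

/* Sequence (a, b, c): children must be exactly a, then b, then c. */
static int toy_match_sequence (const char **children, size_t n_children,
                               const char **spec, size_t n_spec)
{
    size_t i;
    if (n_children != n_spec)
        return 0;
    for (i = 0; i < n_spec; i++) {
        if (strcmp (children[i], spec[i]) != 0)
            return 0;
    }
    return 1;
}

/* Choice (a | b | c): exactly one child, named like one of the options. */
static int toy_match_choice (const char **children, size_t n_children,
                             const char **spec, size_t n_spec)
{
    size_t i;
    if (n_children != 1)
        return 0;
    for (i = 0; i < n_spec; i++) {
        if (strcmp (children[0], spec[i]) == 0)
            return 1;
    }
    return 0;
}
```

With the specification list test1, test2, test3, the children test1, test2, test3 satisfy the sequence rule used for testA but not the choice rule used for testB, while a single test2 child satisfies the choice rule only.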

Now that you know these two basic elements for modeling how the children of a node are organized, all that is needed is to keep adding <!ELEMENT directives until all nodes are specified. Your DTD document will end with final nodes that are either empty or contain PCDATA. At this moment MIXED nodes are not supported.

Suppose that all nodes that are inside testA and testB are final ones. Then this could be its DTD specification:

<!-- test1 is a node that only have content -->
<!ELEMENT test1 (#PCDATA)>
<!-- test2 is a node that is always empty -->
<!ELEMENT test2 EMPTY>
<!-- test3 is a node that could have either test1 or test2 -->
<!ELEMENT test3 (test1 | test2)>

Sequences and choices can be composed to create richer DTD expressions that combine sequences of choices and so on.

At this point, all the elements required to model choices, sequences and final nodes have been explained, but we still have to talk about repetition patterns. They are symbols appended to elements inside choices (or sequences), or to the list specifications themselves.

Patterns available are: +, ? and *. By default, if no pattern is applied to the element, it means that the match should be produced one and only one time.

The + pattern is used to model elements that should be matched one or more times.

The * pattern is used to model elements that should be matched zero or more times.

The ? pattern is used to model elements that should be matched zero or one times.
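The four cases can be summarized as a small occurrence-count check. The function name toy_repetition_allows is hypothetical and is not part of Axl Library; it is only a compact restatement of the rules above.

```c
/* Return 1 when `count` occurrences satisfy the repetition pattern.
 * pattern is '+', '*', '?' or 0 (no pattern: exactly one match). */
static int toy_repetition_allows (char pattern, unsigned count)
{
    switch (pattern) {
    case '+': return count >= 1;   /* one or more */
    case '*': return 1;            /* zero or more */
    case '?': return count <= 1;   /* zero or one */
    case 0:   return count == 1;   /* default: one and only one */
    default:  return 0;            /* unknown pattern */
    }
}
```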

For the example explained initially, let's suppose we want the content inside testA to be a sequence repeated at least one time, that sequence being test1, test2 and test3. We only need to add a + repetition pattern as follows:

<!-- sequence specification -->
<!ELEMENT testA (test1, test2, test3)+>

So, we are telling our validation engine that the sequence inside testA can be found one or many times, but the entire sequence must be found every time.

Here is a simple example that loads an XML document, then loads a DTD file, and then validates the XML document:
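As a rough sketch of what "the entire sequence every time" means, the following check accepts a list of child names only when it consists of one or more complete, in-order repetitions of the sequence. toy_match_sequence_plus is a hypothetical helper, not an Axl Library function, and it assumes the elements inside the sequence carry no repetition pattern of their own.

```c
#include <stddef.h>
#include <string.h>

/* Check children against (spec[0], ..., spec[n_spec-1])+ :
 * one or more complete, in-order repetitions of the sequence. */
static int toy_match_sequence_plus (const char **children, size_t n_children,
                                    const char **spec, size_t n_spec)
{
    size_t i;
    /* at least one repetition, and no partial repetition at the end */
    if (n_spec == 0 || n_children == 0 || n_children % n_spec != 0)
        return 0;
    for (i = 0; i < n_children; i++) {
        if (strcmp (children[i], spec[i % n_spec]) != 0)
            return 0;
    }
    return 1;
}
```

Under this rule test1, test2, test3, test1, test2, test3 is accepted, while test1, test2, test3, test1 is rejected because the second repetition is incomplete.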

axl_bool test_12 (axlError ** error)
{
    axlDoc * doc = NULL;
    axlDtd * dtd = NULL;
    axl_bool result;
    // parse gmovil file (an af-arch xml chunk)
    doc = axl_doc_parse_from_file ("channel.xml", error);
    if (doc == NULL)
        return axl_false;
    // parse af-arch DTD
    dtd = axl_dtd_parse_from_file ("channel.dtd", error);
    if (dtd == NULL) {
        // release the document before reporting the failure
        axl_doc_free (doc);
        return axl_false;
    }
    // perform DTD validation
    result = axl_dtd_validate (doc, dtd, error);
    // free doc reference
    axl_doc_free (doc);
    // free dtd reference
    axl_dtd_free (dtd);
    return result;
}

Until now, we have seen how to check the XML structure. But this does not cover XML node attributes. These are checked using the <!ATTLIST> declaration.

If we have a node testA with two attributes, attr1 and attr2, the first one optional and the second one mandatory, we can declare something like:

<!-- attribute validation for node testA -->
<!ATTLIST testA
attr1 CDATA #IMPLIED
attr2 CDATA #REQUIRED>
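The effect of #IMPLIED and #REQUIRED can be sketched as a check over the attribute names found on a node. ToyAttrRule, toy_attr_present and toy_attrs_valid are hypothetical names for this illustration and are not part of Axl Library. Rejecting undeclared attributes follows the XML 1.0 validity rules; whether a given validator enforces it is not covered here.

```c
#include <stddef.h>
#include <string.h>

/* Toy attribute rule for illustration; NOT an Axl Library type. */
typedef struct {
    const char *name;
    int         required;  /* 1 for #REQUIRED, 0 for #IMPLIED */
} ToyAttrRule;

/* Return 1 when `name` appears in the list of attributes found on the node. */
static int toy_attr_present (const char **present, size_t n_present,
                             const char *name)
{
    size_t i;
    for (i = 0; i < n_present; i++) {
        if (strcmp (present[i], name) == 0)
            return 1;
    }
    return 0;
}

/* Return 1 when every #REQUIRED attribute is present and every
 * present attribute has been declared; 0 otherwise. */
static int toy_attrs_valid (const ToyAttrRule *rules, size_t n_rules,
                            const char **present, size_t n_present)
{
    size_t i, j;

    /* every mandatory attribute must be there */
    for (i = 0; i < n_rules; i++) {
        if (rules[i].required &&
            ! toy_attr_present (present, n_present, rules[i].name))
            return 0;
    }
    /* no attribute outside the declared list is accepted */
    for (i = 0; i < n_present; i++) {
        for (j = 0; j < n_rules; j++) {
            if (strcmp (present[i], rules[j].name) == 0)
                break;
        }
        if (j == n_rules)
            return 0;
    }
    return 1;
}
```

With the rules for testA above, a node carrying only attr2 passes, a node carrying only attr1 fails because attr2 is mandatory, and a node carrying attr2 plus an undeclared attr3 fails as well.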

Enabling your software with XML Namespaces

The initial XML 1.0 design did not consider situations where several software vendors introduce content inside the same XML document. Doing so has several benefits, but leaves one problem to solve: how to keep XML node names (tags) from clashing with each other.

Think about using <table> as a tag for your document. Many XML applications use <table> as a valid tag in their XML language set. However, each of them gives it a different meaning, and it must be handled by the proper XML software.

While developing applications with XML, and supposing such XML documents will be used by more applications than yours, you are likely to be interested in using XML Namespaces. Many of the new XML standards use XML Namespaces to define their own XML node names, while allowing users and developers to use their own set of XML tags, under their own XML Namespaces, so that all of them can be used in the same document.
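The idea behind namespace-aware comparison can be sketched without Axl Library: a tag is identified by its namespace URI plus its local name, not by the text of the tag itself. ToyBinding, toy_resolve and toy_tag_is are hypothetical names, and the prefix table stands in for the xmlns declarations that would be in scope for a node.

```c
#include <stddef.h>
#include <string.h>

/* Toy prefix binding for illustration; NOT an Axl Library type. */
typedef struct {
    const char *prefix;  /* "" for the default namespace */
    const char *uri;
} ToyBinding;

/* Look up the namespace URI bound to the prefix of `tag`
 * ("prefix:local" or just "local"). Stores the local part in *local.
 * Returns NULL when the prefix has no binding. */
static const char *toy_resolve (const char *tag, const ToyBinding *b, size_t n,
                                const char **local)
{
    const char *colon = strchr (tag, ':');
    size_t      plen  = colon ? (size_t) (colon - tag) : 0;
    size_t      i;

    *local = colon ? colon + 1 : tag;
    for (i = 0; i < n; i++) {
        if (strlen (b[i].prefix) == plen && strncmp (b[i].prefix, tag, plen) == 0)
            return b[i].uri;
    }
    return NULL;
}

/* A tag matches when both namespace URI and local name match;
 * the prefix text itself is irrelevant. */
static int toy_tag_is (const char *tag, const ToyBinding *b, size_t n,
                       const char *uri, const char *local_name)
{
    const char *local;
    const char *found = toy_resolve (tag, b, n, &local);
    return found != NULL && strcmp (found, uri) == 0
        && strcmp (local, local_name) == 0;
}
```

With one prefix bound to an XHTML namespace and another bound to a made-up furniture vocabulary, two <table> tags with different prefixes compare as different elements even though their local names are identical.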

XML Namespaces support inside Axl Library is handled through a separate library, which requires the base library to function. Here are some instructions to get Axl Library Namespace installed.

This library provides functions that replace some of the functions used by XML applications that don't require XML Namespaces. In particular, some of them are:

See also API documentation for all functions that are provided to enable your application with XML Namespaces:

Making your software support encodings other than UTF-8

Default axl library implementation (libaxl) assumes it will receive and produce UTF-8 content.

The characters needed to parse XML markup are all located in the ASCII range. They are valid UTF-8, but they are also valid in other encodings such as ISO 646, some parts of ISO 8859, Shift-JIS, EUC, or any other 7-bit, 8-bit, or mixed-width encoding which ensures that the characters of ASCII have their normal positions, width, and values (see section F, Autodetection of Character Encodings, at http://www.w3.org/TR/REC-xml/). As a result, the library parses such content properly even if it is not UTF-8.

In many cases this is not important for you, since your application does not care about content encoding (as with configuration files) or the documents are in UTF-8.

However, this could cause problems if you are handling different documents with several encoding types. The idea is to have a unified way to handle such differently encoded documents, with a single run-time encoding: UTF-8.

libaxl-babel provides support to read content in supported encodings and translate it into UTF-8 at run-time (checking that the result is valid UTF-8):

axl_babel_reading.png
Reading documents and handling them as if they were in UTF-8
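To make the kind of translation involved concrete, here is a minimal sketch that converts ISO-8859-1 bytes into UTF-8. It is not the libaxl-babel implementation; toy_latin1_to_utf8 is a hypothetical helper, and ISO-8859-1 is chosen only because its mapping is simple: bytes below 0x80 are plain ASCII and stay unchanged, and every other byte becomes a two-byte UTF-8 sequence.

```c
#include <stddef.h>

/* Convert `len` bytes of ISO-8859-1 from `in` into UTF-8 at `out`.
 * `out` must have room for 2 * len bytes. Returns the bytes written. */
static size_t toy_latin1_to_utf8 (const unsigned char *in, size_t len,
                                  unsigned char *out)
{
    size_t i, o = 0;
    for (i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            /* ASCII range: same value in both encodings */
            out[o++] = in[i];
        } else {
            /* 0x80..0xFF: lead byte with the top 2 bits,
             * continuation byte with the low 6 bits */
            out[o++] = (unsigned char) (0xC0 | (in[i] >> 6));
            out[o++] = (unsigned char) (0x80 | (in[i] & 0x3F));
        }
    }
    return o;
}
```

Note that the markup characters, being ASCII, are copied unchanged, which is why the structure of the document can be parsed before and after the conversion.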

The library works as an extension that configures a set of handlers, making the library open XML documents and translate them into UTF-8 if required.

To activate the library, you must call axl_babel_init at the beginning of your application or library. Here is an example:

// optional axlError declaration
axlError * error;
// init axl babel
if (! axl_babel_init (&error)) {
printf ("Failed to start axl babel: %s...\n",
axl_error_get (error));
axl_error_free (error);
return axl_false;
}

Once done, every call to the base API (such as axl_doc_parse or axl_doc_parse_from_file) will open the document as usual. No additional special operation is required.

You are not required to call axl_babel_finish on application exit. However, if you want to deactivate libaxl-babel but keep using the axl base library, you can call axl_babel_finish.

See axl_babel_init for currently supported formats.

How to reduce the library footprint

Axl Library is implemented in a modular way to ensure you only link against the software elements that you really require. Additionally, the library offers the following to reduce its footprint to the minimum:

The previous information applies to the Axl base library (libaxl.so/.dll); however, the same is true for the rest of the software components bundled with Axl.

Further reading: where to go for more information

You can also check API documentation for a complete detailed explanation about the library.

Please, if you find that something isn't properly documented or you think that something could be improved, contact us on the mailing list. We are building Axl Library with the aim of producing a high quality, commercial grade, open source XML development kit, so any help will be welcome.

Remember you can always contact us on the mailing list with any question not properly answered by this documentation. See the Axl Library website documentation for more information about the mailing list.