Oracle® XML Developer's Kit Programmer's Guide 10g Release 2 (10.2) Part Number B14252-01
This chapter contains these topics:
Note: Use the new unified C++ API in xml.hpp for new XDK applications. The old C++ API in oraxml.hpp is deprecated and supported only for backward compatibility.
Oracle XML parser for C++ determines whether an XML document is well-formed and optionally validates it against a DTD or XML schema. The parser constructs an object tree which can be accessed through one of the following two XML APIs:
DOM: Tree-based APIs. A tree-based API compiles an XML document into an internal tree structure, then allows an application to navigate that tree using the Document Object Model (DOM), a standard tree-based API for XML and HTML documents.
SAX: Event-based APIs. An event-based API, on the other hand, reports parsing events (such as the start and end of elements) directly to the application through a user-defined SAX event handler, and does not usually build an internal tree. The application implements handlers to deal with the different events, much like handling events in a graphical user interface.
Tree-based APIs are useful for a wide range of applications, but they often put a great strain on system resources, especially if the document is large (under very controlled circumstances, it is possible to construct the tree in a lazy fashion to avoid some of this problem). Furthermore, some applications need to build their own, different data trees, and it is very inefficient to build a tree of parse nodes, only to map it onto a new tree.
This is the namespace for DOM-related types and interfaces.
DOM interfaces are represented as generic references to different implementations of the DOM specification. They are parameterized by Node, which supports various specializations and instantiations. Of them, the most important is xmlnode, which corresponds to the current C implementation.
These generic references do not have a NULL-like value. An implementation must never create a reference with no state (like NULL). If there is a need to signal that something has no state, an exception should be thrown.
Many methods might throw the SYNTAX_ERR exception if the DOM tree is incorrectly formed, or UNDEFINED_ERR in the case of wrong parameters or unexpected NULL pointers. If these are the only errors that a particular method might throw, they are not reflected in the method signature.
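The rule that a reference never holds a NULL-like state, and that errors surface as exceptions instead, can be sketched as follows. This is an illustration only: SketchNode, SketchNodeRef, SketchDomException, SketchErrorCode, and throwsUndefined are hypothetical names introduced here, not declarations from xml.hpp.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for this sketch; these are NOT the XDK declarations.
enum class SketchErrorCode { UNDEFINED_ERR, SYNTAX_ERR };

class SketchDomException : public std::runtime_error {
public:
    SketchDomException(SketchErrorCode code, const std::string& what)
        : std::runtime_error(what), code_(code) {}
    SketchErrorCode code() const { return code_; }
private:
    SketchErrorCode code_;
};

// Plays the role of the underlying node owned by the implementation.
struct SketchNode {
    std::string name;
};

// A reference that can never be in a "no state" condition: the only way to
// build one is from a valid node; otherwise construction throws.
class SketchNodeRef {
public:
    explicit SketchNodeRef(SketchNode* node) : node_(node) {
        if (node_ == nullptr) {
            throw SketchDomException(SketchErrorCode::UNDEFINED_ERR,
                                     "unexpected NULL node pointer");
        }
    }
    const std::string& getNodeName() const {
        assert(node_ != nullptr);  // invariant established by the constructor
        return node_->name;
    }
private:
    SketchNode* node_;
};

// Returns true if building a reference from `node` throws UNDEFINED_ERR.
bool throwsUndefined(SketchNode* node) {
    try {
        SketchNodeRef ref(node);
        (void)ref;
        return false;
    } catch (const SketchDomException& e) {
        return e.code() == SketchErrorCode::UNDEFINED_ERR;
    }
}
```

Because the constructor is the only entry point, callers of getNodeName never need a NULL check; the failure case is handled once, at construction.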
Actual DOM trees do not depend on the context, TCtx. However, manipulations on DOM trees in the current, xmlctx-based implementation require access to the current context, TCtx. This is accomplished by passing the context pointer to the constructor of DOMImplRef. In a multithreaded environment, DOMImplRef is always created in the thread context and so has the pointer to the right context.
DOMImplRef provides a way to create DOM trees. DOMImplRef is a reference to the actual DOMImplementation object that is created when a regular, non-copy constructor of DOMImplRef is invoked. This works well in a multithreaded environment where DOM trees need to be shared and each thread has a separate TCtx associated with it. It works equally well in a single-threaded environment.
DOMString is only one of the encodings supported by Oracle implementations. The support of other encodings is an Oracle extension. The oratext* data type is used for all encodings.
Interfaces represent DOM level 2 Core interfaces according to http://www.w3.org/TR/DOM-Level-2-Core/core.html. These C++ interfaces support the DOM specification as closely as possible. However, Oracle cannot guarantee that the specification is fully supported by our implementation because the W3C specification does not cover C++ binding.
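The relationship described earlier in this section between DOMImplRef, the hidden DOMImplementation object, and the per-thread context can be pictured with the following sketch. It assumes that a copy-style constructor takes the new thread's context and shares the existing implementation object; that signature, and the names SketchContext, SketchImplementation, and SketchImplRef, are assumptions made for illustration and are not taken from xml.hpp.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Stands in for a per-thread TCtx (hypothetical type).
struct SketchContext {
    std::string threadName;
};

// Stands in for the DOMImplementation object that users never see directly.
struct SketchImplementation {
    int documentsCreated = 0;  // not synchronized; a real one would need locking
};

class SketchImplRef {
public:
    // Regular constructor: creates the implementation object.
    explicit SketchImplRef(SketchContext* ctx)
        : ctx_(ctx), impl_(std::make_shared<SketchImplementation>()) {
        assert(ctx_ != nullptr);
    }
    // Copy-style constructor (assumed shape): shares the existing
    // implementation object but binds the new reference to another context.
    SketchImplRef(const SketchImplRef& other, SketchContext* ctx)
        : ctx_(ctx), impl_(other.impl_) {
        assert(ctx_ != nullptr);
    }
    void createDocument() { ++impl_->documentsCreated; }
    int documentsCreated() const { return impl_->documentsCreated; }
    const SketchContext* context() const { return ctx_; }
    bool sharesImplementationWith(const SketchImplRef& other) const {
        return impl_ == other.impl_;
    }
private:
    SketchContext* ctx_;                          // context of the owning thread
    std::shared_ptr<SketchImplementation> impl_;  // shared across references
};
```

In this model each thread holds its own reference with its own context, while all references see the same set of DOM trees through the shared implementation object.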
DATATYPE DomNodeType - Defines types of DOM nodes.
DATATYPE DomExceptionCode - Defines exception codes returned by the DOM API.
DOMException Interface - See exception DOMException in the W3C DOM documentation. DOM operations only raise exceptions in "exceptional" circumstances: when an operation is impossible to perform (either for logical reasons, because data is lost, or because the implementation has become unstable). The functionality of XMLException can be used for a wider range of exceptions.
NodeRef Interface - See interface Node in the W3C documentation.
DocumentRef Interface - See interface Document in the W3C documentation.
DocumentFragmentRef Interface - See interface DocumentFragment in the W3C documentation.
ElementRef Interface - See interface Element in the W3C documentation.
AttrRef Interface - See interface Attr in the W3C documentation.
CharacterDataRef Interface - See interface CharacterData in the W3C documentation.
TextRef Interface - See Text nodes in the W3C documentation.
CDATASectionRef Interface - See CDATASection nodes in the W3C documentation.
CommentRef Interface - See Comment nodes in the W3C documentation.
ProcessingInstructionRef Interface - See PI nodes in the W3C documentation.
EntityRef Interface - See Entity nodes in the W3C documentation.
EntityReferenceRef Interface - See EntityReference nodes in the W3C documentation.
NotationRef Interface - See Notation nodes in the W3C documentation.
DocumentTypeRef Interface - See DTD nodes in the W3C documentation.
DOMImplRef Interface - See interface DOMImplementation in the W3C DOM documentation. DOMImplementation is fundamental for manipulating DOM trees. Every DOM tree is attached to a particular DOM implementation object. Several DOM trees can be attached to the same DOM implementation object. Each DOM tree can be deleted and deallocated by deleting the document object. All DOM trees attached to a particular DOM implementation object are deleted when this object is deleted. The DOMImplementation object is not visible to the user directly. It is visible through class DOMImplRef. This is needed because of requirements in multithreaded environments.
NodeListRef Interface - Abstract implementation of a node list. See interface NodeList in the W3C documentation.
NamedNodeMapRef Interface - Abstract implementation of a node map. See interface NamedNodeMap in the W3C documentation.
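The ownership rule stated in the DOMImplRef entry above (each tree can be deleted on its own, and all trees attached to an implementation object go away with it) can be sketched as follows. SketchDocument, SketchImplementation, and liveDocuments are hypothetical names used only for this illustration, and the live-object counter exists purely to make the lifetimes observable.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

static int g_liveDocuments = 0;  // instrumentation for this sketch only

// Stands in for a DOM tree rooted at a document object (hypothetical type).
struct SketchDocument {
    explicit SketchDocument(std::string root) : rootName(std::move(root)) {
        ++g_liveDocuments;
    }
    ~SketchDocument() { --g_liveDocuments; }
    SketchDocument(const SketchDocument&) = delete;
    SketchDocument& operator=(const SketchDocument&) = delete;
    std::string rootName;
};

// Stands in for the implementation object that every tree is attached to.
class SketchImplementation {
public:
    SketchDocument* createDocument(const std::string& rootName) {
        docs_.push_back(
            std::unique_ptr<SketchDocument>(new SketchDocument(rootName)));
        return docs_.back().get();
    }
    // Deleting one document frees that tree only.
    bool deleteDocument(SketchDocument* doc) {
        for (auto it = docs_.begin(); it != docs_.end(); ++it) {
            if (it->get() == doc) {
                docs_.erase(it);
                return true;
            }
        }
        return false;
    }
    std::size_t documentCount() const { return docs_.size(); }
    // The implicit destructor releases every document still attached.
private:
    std::vector<std::unique_ptr<SketchDocument>> docs_;
};

int liveDocuments() { return g_liveDocuments; }
```

Creating three documents, deleting one, and then letting the implementation object go out of scope leaves liveDocuments() at zero, which mirrors the behavior described in the entry.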
DATATYPE AcceptNodeCode - Defines values returned by node filters provided by the user and passed to iterators and tree walkers.
DATATYPE WhatToShowCode - Specifies codes to filter certain types of nodes.
DATATYPE RangeExceptionCode - Specifies exception kinds that can be thrown by the Range interface.
DATATYPE CompareHowCode - Specifies kinds of comparisons that can be done on two ranges.
NodeFilter Interface - DOM 2 Node Filter.
NodeIterator Interface - DOM 2 Node Iterator.
TreeWalker Interface - DOM 2 TreeWalker.
DocumentTraversal Interface - DOM 2 interface.
RangeException Interface - Exceptions for DOM 2 Range operations.
Range Interface - DOM 2 Range.
DocumentRange Interface - DOM 2 interface.
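How a what-to-show mask and a user-provided node filter combine during iteration can be sketched as follows. The numeric values follow the W3C DOM Level 2 Traversal constants (FILTER_ACCEPT, FILTER_REJECT, FILTER_SKIP, and the SHOW_ bits); the C++ names, the flat node list, and the iterate function are hypothetical simplifications and are not the XDK AcceptNodeCode, WhatToShowCode, or NodeIterator declarations.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Values mirror the W3C DOM Level 2 Traversal constants; names are hypothetical.
enum SketchAcceptCode { kFilterAccept = 1, kFilterReject = 2, kFilterSkip = 3 };

constexpr unsigned long kShowElement = 0x1;
constexpr unsigned long kShowText = 0x4;
constexpr unsigned long kShowComment = 0x80;
constexpr unsigned long kShowAll = 0xFFFFFFFF;

// Node type codes as defined by DOM Core.
enum SketchNodeType { ELEMENT_NODE = 1, TEXT_NODE = 3, COMMENT_NODE = 8 };

struct SketchNode {
    SketchNodeType type;
    std::string value;
};

using SketchFilter = std::function<SketchAcceptCode(const SketchNode&)>;

// Visits nodes in document order (modelled here as a flat list). A node is
// returned only if its type bit is set in whatToShow and the filter, when
// present, accepts it. For an iterator over a flat sequence, reject and skip
// have the same effect.
std::vector<std::string> iterate(const std::vector<SketchNode>& nodes,
                                 unsigned long whatToShow,
                                 const SketchFilter& filter) {
    std::vector<std::string> out;
    for (const SketchNode& n : nodes) {
        const unsigned long bit = 1UL << (n.type - 1);
        if ((whatToShow & bit) == 0) continue;               // masked out by type
        if (filter && filter(n) != kFilterAccept) continue;  // user filter
        out.push_back(n.value);
    }
    return out;
}
```

The mask is checked before the filter is called, so the user filter only ever sees node types the caller asked for.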
DOMParser Interface - DOM parser root class.
GParser Interface - Root class for XML parsers.
ParserException Interface - Exception class for parser and validator.
SAXHandler Interface - Root class for current SAX handler implementations.
SAXHandlerRoot Interface - Root class for all SAX handlers.
SAXParser Interface - Root class for all SAX parsers.
SchemaValidator Interface - XML schema-aware validator.
GParser Interface - Root class for all XML parser interfaces and implementations. It is not an abstract class, that is, it is not an interface. It is a real class that allows users to set and check parser parameters.
DOMParser Interface - DOM parser root abstract class or interface. In addition to parsing and checking that a document is well formed, DOMParser provides means to validate the document against a DTD or XML schema.
If threads are forked off somewhere in the midst of the init-parse-term sequence of calls, you will get unpredictable behavior and results.
A call to Tools::Factory to create a parser initializes the parsing process.
The XML input can be any of the InputSource kinds (see IO namespace).
DOMParser invocation results in the DOM tree.
SAXParser invocation results in SAX events.
A call to the parser destructor terminates the process.
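The init-parse-term sequence above maps naturally onto C++ object lifetime. The following sketch shows that shape with a factory function that initializes, a parse call, and a destructor that terminates. SketchParser, createParser, and runOnce are hypothetical names; the sketch does not reproduce the signatures of Tools::Factory or of the XDK parser classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Records lifecycle steps so that their order can be inspected.
using SketchLog = std::vector<std::string>;

class SketchParser {
public:
    explicit SketchParser(SketchLog& log) : log_(log) { log_.push_back("init"); }
    ~SketchParser() { log_.push_back("term"); }  // termination on destruction
    SketchParser(const SketchParser&) = delete;
    SketchParser& operator=(const SketchParser&) = delete;
    void parse(const std::string& input) { log_.push_back("parse:" + input); }
private:
    SketchLog& log_;
};

// Hypothetical factory standing in for the role that Tools::Factory plays.
std::unique_ptr<SketchParser> createParser(SketchLog& log) {
    return std::unique_ptr<SketchParser>(new SketchParser(log));
}

// Runs the whole sequence in one scope, and therefore on one thread.
SketchLog runOnce(const std::string& input) {
    SketchLog log;
    {
        std::unique_ptr<SketchParser> parser = createParser(log);  // init
        parser->parse(input);                                      // parse
    }                                                              // term
    return log;
}
```

Keeping the parser in a single scope, as runOnce does, is one way to make sure that no thread is started between initialization and termination.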
The following is the XML parser for C++ default behavior:
Character set encoding is UTF-8. If all your documents are ASCII, you are encouraged to set the encoding to US-ASCII for better performance.
Messages are printed to stderr unless msghdlr is specified.
XML parser for C++ determines whether an XML document is well-formed and optionally validates it against a DTD. The parser constructs an object tree that can be accessed through a DOM interface or operates serially through a SAX interface.
A parse tree that can be accessed by DOM APIs is built unless saxcb is set to use the SAX callback APIs. Note that any of the SAX callback functions can be set to NULL if not needed.
The default behavior for the parser is to check that the input is well-formed but not to check whether it is valid. The flag XML_FLAG_VALIDATE can be set to validate the input. The default behavior for whitespace processing is to be fully conformant to the XML 1.0 spec, that is, all whitespace is reported back to the application but it is indicated which whitespace is ignorable. However, some applications may prefer to set the XML_FLAG_DISCARD_WHITESPACE flag, which discards all whitespace between an end-element tag and the following start-element tag.
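The effect of the whitespace-discarding option can be illustrated on raw markup. The sketch below is a toy text filter, not the parser: kSketchDiscardWhitespace is a hypothetical constant whose value is chosen arbitrarily and is not the value of XML_FLAG_DISCARD_WHITESPACE, and the scanning logic ignores attributes, comments, and CDATA sections.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical flag bit for this sketch; not the XDK constant or its value.
constexpr unsigned kSketchDiscardWhitespace = 0x2;

// Returns true if the tag whose '>' is at index `gt` is an end tag ("</...>").
static bool isEndTagClosingAt(const std::string& s, std::size_t gt) {
    const std::size_t lt = s.rfind('<', gt);
    return lt != std::string::npos && lt + 1 < s.size() && s[lt + 1] == '/';
}

// With the flag set, drops whitespace runs that sit between an end tag and
// the following start tag. Without the flag, the input is returned unchanged.
std::string applyWhitespaceFlag(const std::string& xml, unsigned flags) {
    if ((flags & kSketchDiscardWhitespace) == 0) return xml;
    std::string out;
    std::size_t i = 0;
    while (i < xml.size()) {
        out.push_back(xml[i]);
        if (xml[i] == '>' && isEndTagClosingAt(xml, i)) {
            std::size_t j = i + 1;
            while (j < xml.size() &&
                   std::isspace(static_cast<unsigned char>(xml[j]))) {
                ++j;
            }
            // Only a start tag qualifies; "</", "<!" and "<?" do not.
            const bool startTagFollows =
                j + 1 < xml.size() && xml[j] == '<' && xml[j + 1] != '/' &&
                xml[j + 1] != '!' && xml[j + 1] != '?';
            if (startTagFollows) {
                i = j;  // skip the whitespace run
                continue;
            }
        }
        ++i;
    }
    return out;
}
```

Note that whitespace before a closing tag is left alone in this sketch, because the description covers only the gap between an end-element tag and the next start-element tag.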
Note: If you use only single-byte character sets (such as US-ASCII or any of the ISO-8859 character sets), set the default encoding explicitly. Performance can be up to 25% faster than with multibyte character sets, such as UTF-8.
In both of these cases, an event-based API provides a simpler, lower-level access to an XML document: you can parse documents much larger than your available system memory, and you can construct your own data structures using your callback event handlers.
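The idea of building your own data structure from callback events, without ever holding a parse tree, can be sketched as follows. SketchHandler, reportEvents, and ElementCounter are hypothetical names; the event source is a toy scanner that understands only plain start and end tags without attributes, and it stands in for the real parser only to drive the callbacks.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical handler interface; not the XDK SAXHandler class.
class SketchHandler {
public:
    virtual ~SketchHandler() {}
    virtual void startElement(const std::string& name) = 0;
    virtual void endElement(const std::string& name) = 0;
};

// Toy event source: reports <name> and </name> tags in document order.
void reportEvents(const std::string& xml, SketchHandler& handler) {
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != std::string::npos) {
        const std::size_t end = xml.find('>', pos);
        if (end == std::string::npos) break;
        const std::string tag = xml.substr(pos + 1, end - pos - 1);
        if (!tag.empty() && tag[0] == '/') {
            handler.endElement(tag.substr(1));
        } else if (!tag.empty()) {
            handler.startElement(tag);
        }
        pos = end + 1;
    }
}

// Application-specific structure built directly from events: a count per
// element name plus the maximum nesting depth. No tree is kept in memory.
class ElementCounter : public SketchHandler {
public:
    void startElement(const std::string& name) override {
        ++counts_[name];
        ++depth_;
        if (depth_ > maxDepth_) maxDepth_ = depth_;
    }
    void endElement(const std::string&) override { --depth_; }
    int count(const std::string& name) const {
        const auto it = counts_.find(name);
        return it == counts_.end() ? 0 : it->second;
    }
    int maxDepth() const { return maxDepth_; }
private:
    std::map<std::string, int> counts_;
    int depth_ = 0;
    int maxDepth_ = 0;
};
```

The memory used by ElementCounter grows with the number of distinct element names rather than with the size of the document, which is the property that makes event-based processing attractive for very large inputs.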
The xdk/demo/cpp/parser/ directory contains several XML applications that illustrate how to use the XML parser for C++ with the DOM and SAX interfaces.
Change directories to the sample directory ($ORACLE_HOME/xdk/demo/cpp on Solaris, for example) and read the README file, which explains how to build the sample programs.
Table 21-1 lists the sample files in the directory. Each file *Main.cpp has a corresponding *Gen.cpp and *Gen.hpp.
Table 21-1 XML Parser for C++ Sample Files
Sample File Name | Description |
---|---|
DOMSampleMain.cpp | Sample usage of C++ interfaces of XML parser and DOM. |
FullDOMSampleMain.cpp | Manually build DOM and then exercise. |
SAXSampleMain.cpp | Source for SAXSample program. |