A Brief Introduction to XML DOM in MSXML
copyright Bun Yue 3/30/2000
Introduction
-
MSXML, the Microsoft XML parser included in Internet Explorer 5.0, includes
full support for the programming interfaces described in the W3C Document
Object Model (DOM) Core Level 1 recommendation.
-
MSXML also adds Microsoft-specific functionality, such as the text property and the selectNodes method.
-
MSXML parses an XML document into a tree. The DOM defines classes (data
types) and methods to access and manipulate that tree.
-
To invoke MSXML and parse a file:
<%
Set objXML = Server.CreateObject("Microsoft.XMLDOM")
objXML.async = False            ' optional
objXML.validateOnParse = False  ' optional
objXML.Load(Server.MapPath("scriptingNews.xml"))
%>
Notes:
-
Server.CreateObject("Microsoft.XMLDOM") returns
an XMLDOMDocument object, representing
the entire XML document.
-
The async property may be set to False so that loading is synchronous:
the Load call does not return until the document has been completely loaded.
With the default (True), Load can return before the tree is ready.
-
The validateOnParse property may be set to False
to disable validation against the DTD. The document is still checked for
well-formedness. This can speed up parsing.
-
Server.MapPath maps a virtual path on the web site (here, a file name
relative to the current page) to the corresponding physical file path on the server.
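When loading fails (for example, because the file is not well-formed), MSXML reports the cause through the document's parseError object. The sketch below shows that check; it is written as a standalone Windows Script Host file (run with cscript) instead of an ASP page, which is an assumption made so it runs on its own. Inside ASP, use Server.CreateObject and Response.Write. It parses an inline string with loadXML instead of a file.

```vbscript
' Sketch: standalone Windows Script Host file, e.g. "cscript load_check.vbs".
Option Explicit
Dim objXML, blnOK

Set objXML = CreateObject("Microsoft.XMLDOM")
objXML.async = False            ' same settings as the ASP example
objXML.validateOnParse = False  ' no DTD validation; well-formedness is still checked

' A bare "&" is not allowed in XML, so this string is not well-formed.
blnOK = objXML.loadXML("<news><article id='_1'>A & B</article></news>")

If blnOK Then
    WScript.Echo "Loaded root element: " & objXML.documentElement.nodeName
Else
    ' parseError explains why loading failed
    WScript.Echo "Parse error " & objXML.parseError.errorCode & _
                 " at line " & objXML.parseError.line & ": " & objXML.parseError.reason
End If
```

For a file, the same check applies to the Boolean returned by objXML.Load.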
There are four major classes:
XMLDOMDocument Object
-
XMLDOMDocument represents the top level of the XML
source.
-
Contains properties and methods to navigate, query,
and modify the content and structure of an XML document.
-
Since an XMLDOMDocument object is also an XMLDOMNode,
all properties and methods of XMLDOMNode are available.
-
Important properties and methods:
-
documentElement: contains the root element of the
document.
-
load: Loads an XML document from the specified location,
a file path or a URL.
-
loadXML: Loads an XML document using the supplied
string.
-
validateOnParse: Indicates whether the parser should
validate this document.
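A small sketch of the XMLDOMDocument members listed above: loadXML parses a string, documentElement gives the root element, and the document itself answers node properties such as nodeName. It assumes a standalone Windows Script Host script (CreateObject and WScript.Echo instead of the ASP Server and Response objects).

```vbscript
' Sketch: standalone Windows Script Host file, e.g. "cscript docdemo.vbs".
Option Explicit
Dim objXML, objRoot

Set objXML = CreateObject("Microsoft.XMLDOM")
objXML.async = False

' loadXML parses a string instead of a file or URL
If Not objXML.loadXML("<news><article id='_1'/><article id='_2'/></news>") Then
    WScript.Echo "Parse failed: " & objXML.parseError.reason
    WScript.Quit 1
End If

Set objRoot = objXML.documentElement                         ' the <news> element
WScript.Echo "Document node name: " & objXML.nodeName        ' the document is itself a node
WScript.Echo "Root element: " & objRoot.nodeName
WScript.Echo "Children of root: " & objRoot.childNodes.length
```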
XMLDOMNode Object
-
The XMLDOMNode object extends the core XML DOM node
interface to provide support for data types, namespaces, DTDs, and schemas.
-
Important properties and methods:
-
attributes: the list of attributes for this node,
an XMLDOMNamedNodeMap object.
-
childNodes: a node list containing the children,
an XMLDOMNodeList object.
-
dataType: specifies the data type of the node's content (for example
"int" or "string") when typed data is used; an MSXML extension. This is not the node type; see nodeType.
-
firstChild: contains the first child of this node.
-
getElementsByTagName: returns a collection of elements
that have the specified name, an XMLDOMNodeList object.
-
hasChildNodes: Returns true if this node has children.
-
lastChild: Returns the last child node.
-
nextSibling: Contains the next sibling of this node
in the parent's child list.
-
nodeName: Contains the qualified name of the element,
attribute, or entity reference, or a fixed string for other node types.
-
nodeType: Specifies the XML DOM node type (NODE_ELEMENT, NODE_ATTRIBUTE,
NODE_TEXT, etc.), which determines valid values and whether the node can have child nodes.
-
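nodeValue: Contains the value associated with the node: the text of a
text node or the value of an attribute. For an element node it is Null; read its child text node or use text instead.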
-
parentNode: Contains the parent node (for nodes that
can have parents).
-
selectNodes: Applies the specified pattern-matching
operation to this node's context and returns the list of matching nodes, an XMLDOMNodeList object (an MSXML extension).
-
text: Contains the text content of the node and all of its
descendants, concatenated (an MSXML extension).
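The navigation properties above can be combined to walk a node's children without indexes. This sketch assumes a standalone Windows Script Host script and a small hypothetical <article> string; it uses hasChildNodes, firstChild, nextSibling, parentNode, lastChild and text.

```vbscript
' Sketch: standalone Windows Script Host file, e.g. "cscript walk.vbs".
Option Explicit
Dim objXML, objArticle, objChild

Set objXML = CreateObject("Microsoft.XMLDOM")
objXML.async = False
objXML.loadXML "<article id='_1'><headline_text>Hello</headline_text>" & _
               "<source>Gomez</source></article>"

Set objArticle = objXML.documentElement

If objArticle.hasChildNodes() Then
    ' Walk the children with firstChild/nextSibling instead of childNodes(i)
    Set objChild = objArticle.firstChild
    Do While Not objChild Is Nothing
        WScript.Echo objChild.nodeName & " = " & objChild.text & _
                     " (parent: " & objChild.parentNode.nodeName & ")"
        Set objChild = objChild.nextSibling   ' Nothing after the last child
    Loop
End If

WScript.Echo "Last child: " & objArticle.lastChild.nodeName
```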
-
There are many node types. The most important
are:
-
NODE_ELEMENT: may contain attributes and children.
-
NODE_ATTRIBUTE: represents an attribute of an element
node.
-
NODE_TEXT: represents the text content of an element; its nodeValue holds the text.
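VBScript does not see MSXML's node-type enumeration, so scripts usually declare the numeric values they compare against. The sketch below declares NODE_ELEMENT, NODE_ATTRIBUTE and NODE_TEXT itself (values 1, 2 and 3 from the W3C DOM), assumes a standalone Windows Script Host script, and also prints the MSXML nodeTypeString extension.

```vbscript
' Sketch: standalone Windows Script Host file, e.g. "cscript nodetypes.vbs".
Option Explicit
' Declared here because VBScript has no built-in names for these values
Const NODE_ELEMENT = 1
Const NODE_ATTRIBUTE = 2
Const NODE_TEXT = 3

Dim objXML, objElem, objHead, objAttr, objText

Set objXML = CreateObject("Microsoft.XMLDOM")
objXML.async = False
objXML.loadXML "<article id='_6337363'><headline_text>CDnow</headline_text></article>"

Set objElem = objXML.documentElement
Set objAttr = objElem.attributes.getNamedItem("id")
Set objHead = objElem.firstChild          ' <headline_text> element
Set objText = objHead.firstChild          ' its text node

WScript.Echo objElem.nodeType & " " & objElem.nodeTypeString
WScript.Echo objAttr.nodeType & " " & objAttr.nodeTypeString & " " & objAttr.nodeValue
WScript.Echo objText.nodeType & " " & objText.nodeTypeString & " " & objText.nodeValue
' An element has no nodeValue of its own
WScript.Echo "Element nodeValue is Null: " & IsNull(objHead.nodeValue)
```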
Example:
Consider the following parsed XML document, whose root element is
<news>:
<news>
<article id="_6337363">
<headline_text>CDnow: Harbinger of E-Commerce Doom?</headline_text>
<source>Industry Standard</source>
<media_type>text</media_type>
<cluster>E-commerce news</cluster>
<tagline> </tagline>
<document_url>http://www.thestandard.com/news/grok/</document_url>
<harvest_time>Mar 30 2000 10:23PM</harvest_time>
<access_registration> </access_registration>
<access_status> </access_status>
</article>
<article id="_6336605">
<headline_text>Tommy Hilfiger And Bluefly Settle Suit</headline_text>
<source>Gomez</source>
<media_type>text</media_type>
<cluster>E-commerce news</cluster>
<tagline> </tagline>
<document_url>http://www.gomez.com/features/gomezwire.cfm?topcat_id=0&amp;section=ALL</document_url>
<harvest_time>Mar 30 2000 9:58PM</harvest_time>
<access_registration> </access_registration>
<access_status> </access_status>
</article>
</news>
To list all headline texts:
Method #1:
Set objLst = objXML.getElementsByTagName("article")
For i = 0 to (objLst.length - 1)
strHeadline = objLst.item(i).getElementsByTagName("headline_text").item(0).childNodes(0).nodeValue
Response.Write "News item #" & (i+1) & ": " & strHeadline & "<BR>" & vbCrLf
Next
Method #2 (not so good: it assumes <headline_text> is always the first child of <article>):
Set objLst = objXML.getElementsByTagName("article")
For i = 0 to (objLst.length - 1)
strHeadline = objLst.item(i).childNodes(0).text
Response.Write "News item #" & (i+1) & ": " & strHeadline & "<BR>" & vbCrLf
Next
Method #3:
Set objLst = objXML.getElementsByTagName("headline_text")
i = 0
For each objNode in objLst
strHeadline = objNode.text
i = i + 1
Response.Write "News #" & i & ": " & strHeadline & "<BR>" & vbCrLf
Next
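Another variant, not one of the three methods above, uses the MSXML selectNodes extension so that one pattern picks out every headline. The sketch assumes a standalone Windows Script Host script and an abbreviated inline copy of the sample document; the pattern "news/article/headline_text" is written from the root of that sample.

```vbscript
' Sketch: standalone Windows Script Host file, e.g. "cscript headlines.vbs".
Option Explicit
Dim objXML, objLst, objNode, i

Set objXML = CreateObject("Microsoft.XMLDOM")
objXML.async = False
objXML.loadXML "<news>" & _
    "<article id='_6337363'><headline_text>CDnow: Harbinger of E-Commerce Doom?</headline_text></article>" & _
    "<article id='_6336605'><headline_text>Tommy Hilfiger And Bluefly Settle Suit</headline_text></article>" & _
    "</news>"

' Every <headline_text> child of an <article> under the root <news>
Set objLst = objXML.selectNodes("news/article/headline_text")
i = 0
For Each objNode In objLst
    i = i + 1
    WScript.Echo "News item #" & i & ": " & objNode.text
Next
```

Unlike Method #2, this does not depend on the position of <headline_text> inside <article>.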
XMLDOMNodeList Object
-
The XMLDOMNodeList object supports iteration through
the live collection, in addition to indexed access.
-
The childNodes property and the getElementsByTagName and selectNodes
methods return an XMLDOMNodeList object.
-
A live collection means that changes to the document are reflected
in the collection immediately, without querying again.
-
Like other collections, it can be iterated with For Each, as in Method #3 above.
-
Properties and methods:
-
item: allows random access to individual nodes within
the collection.
-
length: indicates the number of items in the collection.
-
nextNode: used as an iterator to return the next
node in the collection; returns Nothing after the last node.
-
reset: Resets the iterator.
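The iterator members and the live behavior can be seen together in a short sketch. It assumes a standalone Windows Script Host script and uses the childNodes list of a small hypothetical document: nextNode walks the list until it returns Nothing, reset rewinds it, and an element appended to the document then shows up in the same list object.

```vbscript
' Sketch: standalone Windows Script Host file, e.g. "cscript nodelist.vbs".
Option Explicit
Dim objXML, objLst, objNode

Set objXML = CreateObject("Microsoft.XMLDOM")
objXML.async = False
objXML.loadXML "<news><article id='_1'/><article id='_2'/></news>"

Set objLst = objXML.documentElement.childNodes

' nextNode returns Nothing once the iterator has passed the last node
Set objNode = objLst.nextNode()
Do While Not objNode Is Nothing
    WScript.Echo "id = " & objNode.getAttribute("id")
    Set objNode = objLst.nextNode()
Loop
objLst.reset   ' rewind so the list can be walked again

' childNodes is live: the appended element appears in objLst without re-querying
objXML.documentElement.appendChild objXML.createElement("article")
WScript.Echo "length after append: " & objLst.length
```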
XMLDOMNamedNodeMap Object
-
An XMLDOMNamedNodeMap object is returned by the attributes
property.
-
Unlike an XMLDOMNodeList object, an XMLDOMNamedNodeMap
can also be accessed by name.
-
This is because attributes are uniquely identified
by their names.
-
In addition to the properties and methods for an
XMLDOMNodeList object, an XMLDOMNamedNodeMap object also has the following
properties and methods, among others:
-
getNamedItem: Retrieves the attribute with the specified
name.
-
removeNamedItem: Removes an attribute from the collection.
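The two access styles and removeNamedItem are sketched below, assuming a standalone Windows Script Host script and a hypothetical <article> element with two attributes. Index access (item and length) works as for an XMLDOMNodeList; getNamedItem looks an attribute up by name and returns Nothing when it is absent.

```vbscript
' Sketch: standalone Windows Script Host file, e.g. "cscript attrs.vbs".
Option Explicit
Dim objXML, objAttrs, objRemoved, i

Set objXML = CreateObject("Microsoft.XMLDOM")
objXML.async = False
objXML.loadXML "<article id='_6336605' lang='en'/>"

Set objAttrs = objXML.documentElement.attributes

' Index access, as with an XMLDOMNodeList
For i = 0 To objAttrs.length - 1
    WScript.Echo objAttrs.item(i).nodeName & " = " & objAttrs.item(i).nodeValue
Next

' Name access: attribute names are unique within one element
WScript.Echo "id = " & objAttrs.getNamedItem("id").nodeValue

' removeNamedItem returns the attribute node it removed
Set objRemoved = objAttrs.removeNamedItem("lang")
WScript.Echo "removed " & objRemoved.nodeName & "; remaining: " & objAttrs.length
WScript.Echo "lang gone: " & (objAttrs.getNamedItem("lang") Is Nothing)
```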
Example:
To print the id attribute of every <article> element:
Set objLst = objXML.getElementsByTagName("article")
For i = 0 to (objLst.length - 1)
strId = objLst.item(i).attributes.getNamedItem("id").nodeValue
Response.Write "News #" & (i+1) & " id: " & strId & "<BR>" & vbCrLf
Next