TITLE: | Committee Draft for DSDL part 4: Selection of Validation Candidates |
SOURCE: | SC34/WG1 |
PROJECT: | |
PROJECT EDITOR: | MURATA Makoto (FAMILY Given), Japan |
STATUS: | Committee Draft |
ACTION: | |
DATE: | 2002-12-11 |
DISTRIBUTION: | SC34 and Liaisons |
REFER TO: | |
REPLY TO: |
Different parts of an XML document may require different schema languages. Typical examples are narrative documents containing metadata. Narrative documents may be written in DocBook, TEI, or XHTML. General-purpose schema languages such as RELAX NG and DTD are appropriate for describing schemas for such narrative documents. On the other hand, embedded metadata may be topic maps or RDF metadata. Special-purpose schema languages such as Topic Map Constraint Language or RDF Schema are appropriate for describing schemas for such metadata.
DSDL brings together multiple schema languages into a single framework that allows them to work together. In particular, this part of DSDL allows selection of validation candidates. That is, DSDL allows specific parts of an XML document to be extracted and then validated. Different schema languages and validators may be applied to different candidates.
This part of the International Standard introduces an XML-based language for controlling selection of validation candidates. This language is called DSDL VCSL (DSDL Validation Candidate Selection Language).
RELAX NG (Part 2), Schematron (Part 3), Path-based integrity constraint language (Part 6), and even non-DSDL schema languages (e.g., RDF Schema [1] and Topic Map Constraint Language [2]) may be used to validate extracted validation candidates. However, it is outside the scope of this part to specify which schemas and schema languages are used for extracted validation candidates.
Descriptions in DSDL VCSL may be independent XML documents or they may be embedded in other XML documents. Specifically, when a DSDL framework is represented by an XML document, it may reference or contain descriptions in DSDL VCSL.
The following documents contain provisions which, through reference in this text, constitute provisions of this part of the International Standard.
IETF RFC 2396, Uniform Resource Identifiers (URI): Generic Syntax, Internet Standards Track Specification, August 1998, http://www.ietf.org/rfc/rfc2396.txt
W3C XML, Extensible Markup Language (XML) 1.0 (Second Edition), W3C Recommendation, 6 October 2000, http://www.w3.org/TR/2000/REC-xml-20001006
W3C XML-Infoset, XML Information Set, W3C Recommendation, 24 October 2001, http://www.w3.org/TR/2001/REC-xml-infoset-20011024/
W3C XML-Names, Namespaces in XML, W3C Recommendation, 14 January 1999, http://www.w3.org/TR/1999/REC-xml-names-19990114/
port
a port is an NCName
A VCSL description specifies a list of port-namespace pairs.
If an element e belongs to one of the specified namespaces and the parent element of e (say e') belongs to a different namespace, then e is detached from e'. Instead of e, a dummy node is introduced as a child element of e'. Such dummy elements belong to the namespace http://www.xml.gr.jp/xmlns/dummy. The attribute namespaceName of a dummy node for e indicates the namespace of e.
Validation candidates are created by repeatedly detaching elements and introducing dummy elements. Each of these candidates is associated with a port. More than one candidate may be associated with a single port.
Note: Port names may be used by DSDL framework descriptions for specifying schemas and schema languages for validating extracted candidates.
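The detachment procedure described above can be sketched informally in Python with the standard xml.etree.ElementTree module. This is a non-normative illustration under stated assumptions: the function names, the sample document, and the port assigned to a root element whose namespace is not listed (None) are hypothetical, and the local name "dummy" for dummy elements is assumed, since this draft fixes only their namespace and the namespaceName attribute.

```python
import copy
import xml.etree.ElementTree as ET

DUMMY_NS = "http://www.xml.gr.jp/xmlns/dummy"


def namespace_of(elem):
    """Return the namespace URI of an element ('' if it has none)."""
    tag = elem.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def _detach(parent, pairs, pending):
    """Replace qualifying children of parent by dummy elements, in document order."""
    for i, child in enumerate(list(parent)):
        ns = namespace_of(child)
        if ns in pairs and ns != namespace_of(parent):
            # Assumption: the dummy element's local name is "dummy".
            dummy = ET.Element("{%s}dummy" % DUMMY_NS, {"namespaceName": ns})
            dummy.tail, child.tail = child.tail, None
            parent[i] = dummy
            pending.append(child)  # the detached subtree becomes a candidate
        else:
            _detach(child, pairs, pending)


def select_by_namespace(root, pairs):
    """Return a list of (port, candidate) tuples.

    pairs maps a namespace URI to a port name. The input tree is copied,
    so the caller's document is left unchanged.
    """
    pending = [copy.deepcopy(root)]
    candidates = []
    while pending:
        top = pending.pop(0)
        # Assumption: a root in an unlisted namespace gets the port None.
        candidates.append((pairs.get(namespace_of(top)), top))
        _detach(top, pairs, pending)
    return candidates


# Hypothetical sample document mixing two namespaces.
DOC = "http://www.example.com/doc"
TABLE = "http://www.example.com/table"
source = ET.fromstring(
    '<d:doc xmlns:d="%s" xmlns:t="%s">'
    "<d:para>intro</d:para>"
    "<t:table><t:cell><d:para>in cell</d:para></t:cell></t:table>"
    "</d:doc>" % (DOC, TABLE)
)
result = select_by_namespace(source, {DOC: "portDoc", TABLE: "portTable"})
ports = [port for port, _ in result]
print(ports)  # ['portDoc', 'portTable', 'portDoc']
```

In this sketch the sample document yields three candidates: the top-level doc element, the table detached from it, and the paragraph detached from the table cell. Each detached element is replaced in its parent by a dummy element carrying namespaceName.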
A VCSL description specifies a list of tuples, each of which consists of a port, a namespace URI, a local name, and an optional attribute value.
An attribute is said to match such a tuple if its namespace URI, local name, and attribute value are specified by this tuple.
If an attribute of an element e matches one of the tuples, then e is detached from the parent element of e (say e'). Instead of e, a dummy node is introduced as a child element of e'. Such dummy elements belong to the namespace http://www.xml.gr.jp/xmlns/dummy. A dummy node for e inherits the matching attribute of e.
It is an error if more than one attribute of an element matches specified tuples.
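Attribute-based selection can be sketched in the same informal way. The following Python fragment is a non-normative illustration, not a definitive implementation. The Rule type, the function names, and the sample document are hypothetical. It also assumes that a tuple without an attribute value matches any value, that the dummy element's local name is "dummy", and that the root element is always a candidate whose port is taken from a matching tuple if there is one (None otherwise); this draft does not settle these points.

```python
import copy
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional

DUMMY_NS = "http://www.xml.gr.jp/xmlns/dummy"


class Rule(NamedTuple):
    """One tuple of a description: port, namespace URI, local name, value."""
    port: str
    namespace: str                # '' for an attribute in no namespace
    local_name: str
    value: Optional[str] = None   # assumption: None matches any value


def attribute_key(rule):
    """Build the ElementTree attribute key for a rule."""
    if rule.namespace:
        return "{%s}%s" % (rule.namespace, rule.local_name)
    return rule.local_name


def find_match(elem, rules):
    """Return (rule, key) for the single matching attribute, or None."""
    matches = []
    for key, value in elem.attrib.items():
        for rule in rules:
            if key == attribute_key(rule) and rule.value in (None, value):
                matches.append((rule, key))
                break  # count each attribute at most once
    if len(matches) > 1:
        # More than one attribute of the element matches: an error.
        raise ValueError("more than one attribute of <%s> matches" % elem.tag)
    return matches[0] if matches else None


def _detach(parent, rules, pending):
    """Replace matching children of parent by dummy elements."""
    for i, child in enumerate(list(parent)):
        match = find_match(child, rules)
        if match is None:
            _detach(child, rules, pending)
            continue
        rule, key = match
        # The dummy element inherits the matching attribute.
        dummy = ET.Element("{%s}dummy" % DUMMY_NS, {key: child.get(key)})
        dummy.tail, child.tail = child.tail, None
        parent[i] = dummy
        pending.append((rule.port, child))


def select_by_attribute(root, rules):
    """Return a list of (port, candidate) tuples; the input is not modified."""
    root = copy.deepcopy(root)
    first = find_match(root, rules)
    pending = [(first[0].port if first else None, root)]
    candidates = []
    while pending:
        port, top = pending.pop(0)
        candidates.append((port, top))
        _detach(top, rules, pending)
    return candidates


# Hypothetical sample: elements with role="metadata" go to portMeta.
rules = [Rule("portMeta", "", "role", "metadata")]
source = ET.fromstring(
    '<doc><p role="body">text</p>'
    '<info role="metadata"><title>T</title></info></doc>'
)
result = select_by_attribute(source, rules)
```

Here the info element is detached and associated with portMeta, and the dummy element left in its place carries role="metadata". An element with two matching attributes causes the sketch to raise ValueError, reflecting the error condition stated above.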
To be supplied.
To be supplied.
To be supplied.
This annex provides a RELAX NG schema formally representing the syntax of DSDL VCSL. The URI for this schema is : ?????
|
Consider a VCSL description as below.
|
This specifies two port-namespace pairs, namely (portDoc, "http://www.example.com/doc") and (portTable, "http://www.example.com/table").
Consider an XML document as below. This document has two namespaces, "http://www.example.com/doc" and "http://www.example.com/table".
|
Figure 1: An XML document containing two namespaces
This XML document is decomposed into seven validation candidates.
First, there is one top-level candidate. It is associated with the port portDoc.
|
Figure 2: A top-level validation candidate
Second, there are two second-level candidates. They are associated with the port portTable.
|
|
Figure 3: Two second-level validation candidates
Third, there are four third-level candidates. They are associated with the port portDoc.
|
|
|
|
Figure 4: Four third-level validation candidates
Consider a VCSL description as below.
|
This specifies two port-namespace pairs, namely (portXhtml, "http://www.w3.org/1999/xhtml") and (portRdf, "http://www.w3.org/1999/02/22-rdf-syntax-ns#").
An XHTML document containing embedded RDF is shown below.
|
This XML document is decomposed into two validation candidates. The first and second validation candidates are associated with portXhtml and portRdf, respectively. The former can be validated against a RELAX NG schema, while the latter can be validated against an RDF schema [1].
|
|
To be supplied.
To be supplied.
[1] RDF Vocabulary Description Language 1.0: RDF Schema, http://www.w3.org/TR/rdf-schema
[2] Topic Map Constraint Language, ...
Issue 1: Should we drop attribute-based selection (4.2)? Use cases are required.
Issue 2: Is the design of attribute-based selection powerful enough? Use cases are required.
Issue 3: Should we unify namespace-based selection (4.1) and attribute-based selection (4.2)?
Issue 4: Should we allow attributes as validation candidates? In other words, should we detach attributes from elements? This might be useful for allowing common attributes.
Issue 5: Should we introduce an additional constraint that applies to the validation candidate representing the document root? In the example shown in Appendix B, we cannot eliminate XML documents consisting of paragraphs only. If we introduce a constraint on the root candidate, such documents can be eliminated.
Issue 6: Should we introduce an additional attribute of dummy to represent the tag name of the original element?
Issue 7: Should we use foreign elements rather than dummy elements?
Issue 8: Should we duplicate subtrees rather than extracting them? In other words, should we keep original elements rather than replacing them with dummy elements?
Issue 9: Should extracted fragments inherit XML version numbers?
Issue 10: Should extracted fragments inherit namespace declarations?
Issue 11: Should extracted fragments inherit xml:lang, xml:base, and other such attributes?
Issue 12: Should we allow foreign-namespace elements and attributes (annotations) as does RELAX NG?
Issue 13: Should we allow the VCSL root element to have a version attribute indicating the version of DSDL VCSL?
Issue 14: Which namespace URI is appropriate for DSDL VCSL?
Issue 15: Which namespace URI is appropriate for dummy nodes? At present, http://www.xml.gr.jp/xmlns/dummy is used.