ISO/IEC JTC 1/SC34 N363 (Draft)

ISO/IEC JTC 1/SC34/WG1

Information Technology --
Document Description and Processing Languages
-- Information Description

TITLE: Committee Draft for DSDL part 4: Selection of Validation Candidates
SOURCE: SC34/WG1
PROJECT:
PROJECT EDITOR: MURATA Makoto (FAMILY Given), Japan
STATUS: Committee Draft
ACTION:
DATE: 2002-12-11
DISTRIBUTION: SC34 and Liaisons
REFER TO:
REPLY TO:


Committee Draft         ISO/IEC 19757-4

Document Schema Definition Languages (DSDL) -- Part 4: Selection of Validation Candidates

Introduction

Different parts of an XML document may require different schema languages. Typical examples are narrative documents containing metadata. Narrative documents may be written in DocBook, TEI, or XHTML. General-purpose schema languages such as RELAX NG and DTD are appropriate for describing schemas for such narrative documents. On the other hand, embedded metadata may be topic maps or RDF metadata. Special-purpose schema languages such as Topic Map Constraint Language or RDF Schema are appropriate for describing schemas for such metadata.

DSDL brings together multiple schema languages into a single framework that allows them to work together. In particular, this part of DSDL allows selection of validation candidates. That is, DSDL allows specific parts of an XML document to be extracted and then validated. Different schema languages and validators may be applied to different candidates.


1. Scope

This part of the International Standard introduces an XML-based language for controlling selection of validation candidates. This language is called DSDL VCSL (DSDL Validation Candidate Selection Language).

RELAX NG (Part 2), Schematron (Part 3), Path-based integrity constraint language (Part 6), and even non-DSDL schema languages (e.g., RDF Schema [1] and Topic Map Constraint Language [2]) may be used to validate extracted validation candidates. However, it is outside the scope of this part to specify which schema and schema language are used for extracted validation candidates.

Descriptions in DSDL VCSL may be independent XML documents or they may be embedded in other XML documents. Specifically, when a DSDL framework is represented by an XML document, it may reference or contain descriptions in DSDL VCSL.

2. References

The following documents contain provisions which, through reference in this text, constitute provisions of this part of the International Standard.

IETF RFC 2396, Uniform Resource Identifiers (URI): Generic Syntax, Internet Standards Track Specification, August 1998, http://www.ietf.org/rfc/rfc2396.txt

W3C XML, Extensible Markup Language (XML) 1.0 (Second Edition), W3C Recommendation, 6 October 2000, http://www.w3.org/TR/2000/REC-xml-20001006

W3C XML-Infoset, XML Information Set, W3C Recommendation, 24 October 2001, http://www.w3.org/TR/2001/REC-xml-infoset-20011024/

W3C XML-Names, Namespaces in XML, W3C Recommendation, 14 January 1999, http://www.w3.org/TR/1999/REC-xml-names-19990114/

3. Terms and definitions

port
A port is an NCName, as defined in W3C XML-Names.

4. Basic Concepts

4.1 Namespace-based selection

A VCSL description specifies a list of port-namespace pairs.

If an element e belongs to one of the specified namespaces and the parent element of e (say e') belongs to a different namespace, then e is detached from e'. In place of e, a dummy element is introduced as a child element of e'. Such dummy elements belong to the namespace http://www.xml.gr.jp/xmlns/dummy. The attribute namespaceName of the dummy element for e indicates the namespace of e.

Validation candidates are created by repeatedly detaching elements and introducing dummy elements. Each of these candidates is associated with a port. More than one candidate may be associated with a single port.

Note: Port names may be used by DSDL framework descriptions for specifying schemas and schema languages for validating extracted candidates.
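
The detaching procedure described in 4.1 can be illustrated with the following minimal Python sketch. It is not part of this draft. The function names, the use of the xml.etree.ElementTree module, and the choice to report the document root as a candidate (with port None when its namespace is not listed) are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace of dummy elements, as given in 4.1.
DUMMY_NS = "http://www.xml.gr.jp/xmlns/dummy"


def namespace_of(element):
    """Return the namespace URI of an element ('' if it has none)."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def select_by_namespace(root, ports):
    """Split the tree under `root` into (port, element) candidates.

    `ports` maps a namespace URI to a port name. The tree is modified
    in place: each detached element is replaced by a dummy element.
    """
    candidates = []

    def visit(element):
        parent_ns = namespace_of(element)
        for index, child in enumerate(list(element)):
            child_ns = namespace_of(child)
            if child_ns in ports and child_ns != parent_ns:
                # Replace the child with a dummy that records its namespace.
                dummy = ET.Element("{%s}dummy" % DUMMY_NS,
                                   {"namespaceName": child_ns})
                dummy.tail, child.tail = child.tail, None
                element[index] = dummy
                candidates.append((ports[child_ns], child))
            # Keep descending, also inside an element just detached.
            visit(child)

    # Assumption: the document root is itself reported as a candidate.
    candidates.append((ports.get(namespace_of(root)), root))
    visit(root)
    return candidates


# Hypothetical input, modelled on the namespaces used in Annex B.
doc = ET.fromstring(
    '<d:doc xmlns:d="http://www.example.com/doc" '
    'xmlns:t="http://www.example.com/table">'
    '<d:para>text</d:para><t:table><d:para>cell</d:para></t:table>'
    '</d:doc>')
ports = {"http://www.example.com/doc": "portDoc",
         "http://www.example.com/table": "portTable"}
result = select_by_namespace(doc, ports)
for port, element in result:
    print(port, ET.tostring(element, encoding="unicode"))
```

The sketch visits a detached element after replacing it, so nested changes of namespace yield further candidates, which corresponds to "repeatedly detaching elements" above.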

4.2 Attribute-based selection

A VCSL description specifies a list of tuples, each of which consists of a port, a namespace URI, a local name, and an optional attribute value.

An attribute is said to match such a tuple if its namespace URI and local name are those specified by the tuple and, when the tuple specifies an attribute value, its value is that value.

If an attribute of an element e matches one of the tuples, then e is detached from the parent element of e (say e'). In place of e, a dummy element is introduced as a child element of e'. Such dummy elements belong to the namespace http://www.xml.gr.jp/xmlns/dummy. The dummy element for e inherits the matching attribute of e.

It is an error if more than one attribute of an element matches the specified tuples.
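
The matching rule and the error condition of 4.2 can be illustrated with the following minimal Python sketch. It is not part of this draft. The names AttributeRule and matching_rule, the port portMeta, and the namespace http://www.example.com/meta are hypothetical. Treating an omitted attribute value as "any value matches" is an assumption, since the draft does not state it explicitly.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional


class AttributeRule(NamedTuple):
    """One tuple of 4.2: port, namespace URI, local name, optional value."""
    port: str
    namespace: str
    local_name: str
    value: Optional[str] = None  # None: any value matches (assumption)


def split_name(qualified):
    """Split ElementTree's '{uri}local' form into (uri, local)."""
    if qualified.startswith("{"):
        uri, local = qualified[1:].split("}", 1)
        return uri, local
    return "", qualified


def matches(rule, name, value):
    """Tell whether the attribute (name, value) matches `rule`."""
    uri, local = split_name(name)
    return (uri == rule.namespace
            and local == rule.local_name
            and (rule.value is None or rule.value == value))


def matching_rule(element, rules):
    """Return (rule, attribute name) for the matching attribute, or None.

    Raises ValueError if more than one attribute of `element` matches,
    which 4.2 declares to be an error.
    """
    hits = []
    for name, value in element.attrib.items():
        for rule in rules:
            if matches(rule, name, value):
                hits.append((rule, name))
                break
    if len(hits) > 1:
        raise ValueError("more than one attribute matches the tuples")
    return hits[0] if hits else None


# Hypothetical rule and element, for illustration only.
rules = [AttributeRule("portMeta", "http://www.example.com/meta",
                       "kind", "rdf")]
element = ET.fromstring(
    '<x xmlns:m="http://www.example.com/meta" m:kind="rdf" other="1"/>')
hit = matching_rule(element, rules)
print(hit[0].port, hit[1])
```

An element for which matching_rule returns a result would then be detached and replaced by a dummy element carrying the matching attribute, in the same way as in 4.1.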

5. Syntax

To be supplied.

6. Reference Model

To be supplied.

7. Conformance

To be supplied.

Annex A (normative)
RELAX NG schema for DSDL VCSL

This annex provides a RELAX NG schema formally representing the syntax of DSDL VCSL. The URI for this schema is: ?????

<?xml version="1.0"?>
<element
    name="selection"
    ns="?????"
    xmlns="http://relaxng.org/ns/structure/1.0"
    datatypeLibrary=
      "http://www.w3.org/2001/XMLSchema-datatypes">
  <choice>
    <oneOrMore>
      <element name="pair">
        <attribute name="port">
          <data type="NCName"/>
        </attribute>
        <attribute name="namespace">
          <data type="anyURI"/>
        </attribute>
      </element>
    </oneOrMore>
    <oneOrMore>
      <element name="pair">
        <attribute name="port">
          <data type="NCName"/>
        </attribute>
        <attribute name="namespace">
          <data type="anyURI"/>
        </attribute>
        <attribute name="localName">
          <data type="NCName"/>
        </attribute>
        <optional>
          <attribute name="value"/>
        </optional>
      </element>
    </oneOrMore>
  </choice>
</element>

Annex B (informative)

B.1 (namespace-based selection)

Example 1

Consider the VCSL description below.

<?xml version="1.0"?>
<selection xmlns="?????">
  <pair
    port="portDoc"
    namespace="http://www.example.com/doc"/>
  <pair
    port="portTable"
    namespace="http://www.example.com/table"/>
</selection>

This specifies two port-namespace pairs, namely (portDoc, http://www.example.com/doc) and (portTable, http://www.example.com/table).

Consider the XML document below. This document uses two namespaces, "http://www.example.com/doc" and "http://www.example.com/table".

<doc:doc
  xmlns:doc="http://www.example.com/doc"
  xmlns:table="http://www.example.com/table">
  <doc:para>this is a para</doc:para>
  <table:table number="1">
    <table:row>
      <table:cell>
        <doc:para>1st para</doc:para>
        <doc:para>2nd para</doc:para>
      </table:cell>
    </table:row>
  </table:table>
  <table:table number="2">
    <table:row>
      <table:cell>
        <doc:para>3rd para</doc:para>
        <doc:para>4th para</doc:para>
      </table:cell>
    </table:row>
  </table:table>
</doc:doc>

Figure 1: An XML document containing two namespaces

This XML document is decomposed into seven validation candidates. First, there is one top-level candidate. It is associated with the port portDoc.

<doc:doc xmlns:doc="http://www.example.com/doc">
  <doc:para>this is a para</doc:para>
  <dummy namespaceName="http://www.example.com/table"
         xmlns="http://www.xml.gr.jp/xmlns/dummy"/>
  <dummy namespaceName="http://www.example.com/table"
         xmlns="http://www.xml.gr.jp/xmlns/dummy"/>
</doc:doc>

Figure 2: A top-level validation candidate

Second, there are two second-level candidates. They are associated with the port portTable.

<table:table
  xmlns:table="http://www.example.com/table"
  number="1">
  <table:row>
    <table:cell>
      <dummy
         namespaceName="http://www.example.com/doc" 
         xmlns="http://www.xml.gr.jp/xmlns/dummy"/>
      <dummy
         namespaceName="http://www.example.com/doc" 
         xmlns="http://www.xml.gr.jp/xmlns/dummy"/>
    </table:cell>
  </table:row>
</table:table>
<table:table
  xmlns:table="http://www.example.com/table"
  number="2">
  <table:row>
    <table:cell>
      <dummy
         namespaceName="http://www.example.com/doc"
         xmlns="http://www.xml.gr.jp/xmlns/dummy"/>
      <dummy
         namespaceName="http://www.example.com/doc" 
         xmlns="http://www.xml.gr.jp/xmlns/dummy"/>
    </table:cell>
  </table:row>
</table:table>

Figure 3: Two second-level validation candidates

Third, there are four third-level candidates. They are associated with the port portDoc.

<doc:para
  xmlns:doc="http://www.example.com/doc">1st para</doc:para>
<doc:para
  xmlns:doc="http://www.example.com/doc">2nd para</doc:para>
<doc:para
  xmlns:doc="http://www.example.com/doc">3rd para</doc:para>
<doc:para
  xmlns:doc="http://www.example.com/doc">4th para</doc:para>

Figure 4: Four third-level validation candidates
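
The count of seven candidates in Example 1 can be checked with the following minimal Python sketch. It is not part of this draft. It only counts candidates per port and does not build them, and it assumes that the namespace of the document root is one of the listed namespaces, as is the case in Figure 1. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The document of Figure 1.
FIGURE_1 = """
<doc:doc
  xmlns:doc="http://www.example.com/doc"
  xmlns:table="http://www.example.com/table">
  <doc:para>this is a para</doc:para>
  <table:table number="1">
    <table:row>
      <table:cell>
        <doc:para>1st para</doc:para>
        <doc:para>2nd para</doc:para>
      </table:cell>
    </table:row>
  </table:table>
  <table:table number="2">
    <table:row>
      <table:cell>
        <doc:para>3rd para</doc:para>
        <doc:para>4th para</doc:para>
      </table:cell>
    </table:row>
  </table:table>
</doc:doc>
"""

# The two port-namespace pairs of the VCSL description in Example 1.
PORTS = {"http://www.example.com/doc": "portDoc",
         "http://www.example.com/table": "portTable"}


def namespace_of(element):
    """Return the namespace URI of an element ('' if it has none)."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def count_candidates(root, ports):
    """Count validation candidates per port without building them."""
    counts = Counter()
    # Assumption: the root is a candidate and its namespace is listed.
    counts[ports[namespace_of(root)]] += 1
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in parent:
            ns = namespace_of(child)
            # A candidate starts wherever a listed namespace differs
            # from the namespace of the parent element.
            if ns in ports and ns != namespace_of(parent):
                counts[ports[ns]] += 1
            stack.append(child)
    return counts


counts = count_candidates(ET.fromstring(FIGURE_1), PORTS)
print(sum(counts.values()), dict(counts))
# prints: 7 {'portDoc': 5, 'portTable': 2}
```

The five candidates of portDoc are the top-level candidate of Figure 2 and the four paragraphs of Figure 4. The two candidates of portTable are the tables of Figure 3.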

Example 2 (RDF embedded in XHTML)

Consider the VCSL description below.

<?xml version="1.0"?>
<selection xmlns="?????">
  <pair
     port="portXhtml"
     namespace="http://www.w3.org/1999/xhtml"/>
  <pair
     port="portRdf"
     namespace="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>
</selection>

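This specifies two port-namespace pairs, namely (portXhtml, http://www.w3.org/1999/xhtml) and (portRdf, http://www.w3.org/1999/02/22-rdf-syntax-ns#).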

An XHTML document containing embedded RDF metadata is shown below.

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<rdf:RDF
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://doc"
      dc:creator="Joe Smith"
      dc:title="My document"
      dc:description="Joe's ramblings about his summer vacation."
      dc:date="1999-09-10" />
</rdf:RDF>
</head>
<body>
</body>
</html>

This XML document is decomposed into two validation candidates. The first and second validation candidates are associated with portXhtml and portRdf, respectively. The former can be validated against a RELAX NG schema, while the latter can be validated against an RDF schema [1].

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<dummy
   namespaceName="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns="http://www.xml.gr.jp/xmlns/dummy"/>
</head>
<body>
</body>
</html>
<rdf:RDF
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://doc"
      dc:creator="Joe Smith"
      dc:title="My document"
      dc:description="Joe's ramblings about his summer vacation."
      dc:date="1999-09-10" />
</rdf:RDF>

Example 3 (XML Topic Map embedded in XHTML)

To be supplied.

B.2 (attribute-based selection)

Example 4

To be supplied.

Bibliography

[1] RDF Vocabulary Description Language 1.0: RDF Schema, http://www.w3.org/TR/rdf-schema

[2] Topic Map Constraint Language, ...

Issues

Issue 1: Should we drop attribute-based selection (4.2)? Use cases are required.

Issue 2: Is the design of attribute-based selection powerful enough? Use cases are required.

Issue 3: Should we unify namespace-based selection (4.1) and attribute-based selection (4.2)?

Issue 4: Should we allow attributes as validation candidates? In other words, should we detach attributes from elements? This might be useful for allowing common attributes.

Issue 5: Should we introduce an additional constraint that applies to the validation candidate representing the document root? In the example shown in Annex B, we cannot eliminate XML documents consisting of paragraphs only. If we introduce a constraint on the root candidate, such documents can be eliminated.

Issue 6: Should we introduce an additional attribute of dummy to represent the tag name of the original element?

Issue 7: Should we use foreign elements rather than dummy elements?

Issue 8: Should we duplicate subtrees rather than extracting them? In other words, should we keep original elements rather than replacing them with dummy elements?

Issue 9: Should extracted fragments inherit XML version numbers?

Issue 10: Should extracted fragments inherit namespace declarations?

Issue 11: Should extracted fragments inherit xml:lang, xml:base, and other such attributes?

Issue 12: Should we allow foreign-namespace elements and attributes (annotations) as does RELAX NG?

Issue 13: Should we allow the VCSL root element to have a version attribute indicating the version of DSDL VCSL?

Issue 14: Which namespace URI is appropriate for DSDL VCSL?

Issue 15: Which namespace URI is appropriate for dummy nodes? At present, http://www.xml.gr.jp/xmlns/dummy is used.