JSRs: Java Specification Requests
JSR 5: XML Parsing Specification

Original Java Specification Request (JSR)

Section 1: Identification

Submitted by:

David Brownell and Nancy K. Lee,
Java Software, Sun Microsystems, Inc.

voice: +1 408-343-1439

Section 2: Request

A. Background

The intended specification will address the need for a complete set of implementation-independent portable APIs supporting XML 1.0. The XML specification is available on-line at:

XML is a platform-independent data representation, which may be viewed as a simplified web-aware version of SGML. It is serving as a foundation for a new generation of web technologies. Today, it is used in web application servers as part of dynamic content generation systems and as part of messaging systems (e.g. for business-to-business web commerce and workflow) uniting system components written in many programming languages.

Existing specifications for Java™ APIs for XML do not address the full set of requirements for complete applications (see below for more information). In brief, the accepted portable XML APIs (SAX and DOM) have portability limitations in basic functionality, such as validation, constructing DOM trees from input documents, writing out well-formed XML, and working with XML namespaces.

B. Scope and Content

We propose to develop a set of modular library APIs, a 100% Pure Java™ Reference Implementation (RI), and a Compatibility Test Suite (CTS), addressing at least the issues noted below.

This targets the desktop and enterprise versions of the Java Platform, based on the JDK 1.1 API set. The APIs can also support the Personal Java platform with little or no extra effort.

It is recognized that there is much work going on in the area of XML at this time. This proposal will provide a set of core features that will form the building blocks for fully-functional XML-based applications.

We propose that this core should include:

(1) Event-based parsing of XML.

We expect that this API would be based on the "SAX 1.0" API, which is widely accepted for this purpose. Extensions may be necessary, exposing information which is not exposed by those APIs but is required elsewhere in the core feature set.

Event-based parsing is an absolute requirement for a number of applications that cannot accept the overhead of producing an in-memory representation of XML data.

The name of this package is "org.xml.sax", and there is also "org.xml.sax.helpers". Extensions would probably not be in those packages.
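As an illustration of the event-based style, the following minimal sketch collects element names as a parser reports them, without building any in-memory tree. It uses the SAX 1.0 types HandlerBase and AttributeList from "org.xml.sax". Obtaining the parser through javax.xml.parsers.SAXParserFactory is an assumption borrowed from current Java platforms and is not something this request defines; the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;

// Hypothetical sketch: class and method names are illustrative only.
class SaxEventSketch {

    /** Returns element names in the order their start tags are reported. */
    @SuppressWarnings("deprecation") // HandlerBase and AttributeList are SAX 1.0 types
    static List<String> elementNames(String xml) throws Exception {
        final List<String> names = new ArrayList<>();

        // The handler receives one callback per start tag; nothing is retained
        // beyond what the application chooses to keep.
        HandlerBase handler = new HandlerBase() {
            @Override
            public void startElement(String name, AttributeList attributes) {
                names.add(name);
            }
        };

        // Assumption: the parser is obtained through the javax.xml.parsers
        // factory available in current JDKs.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNames("<order><item/><item/></order>"));
    }
}
```

Because the handler sees each event once and then discards it, memory use depends on what the application stores rather than on document size.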

(2) An optional basic in-memory representation of XML data.

We expect that this API would be based on the W3C "DOM Level 1 Core" API. We know that extensions to this package are needed, because there is no way to acquire an implementation or populate one according to a given XML source document. Similarly, there is no defined way to extend a "core" implementation to support additional feature sets such as the "HTML DOM", or ID-based node access (as required for XSL and XLink support).

The name of this package is "org.w3c.dom", but extensions would not be in that package unless they were adopted by the W3C.
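The missing step described above, acquiring an implementation and populating a tree from a source document, might look like the following sketch. It assumes the javax.xml.parsers.DocumentBuilderFactory bootstrap found in current Java platforms, which this request does not itself define; once the tree exists, only DOM Level 1 Core calls are used. The class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch: shows one possible bootstrap, not the proposed API.
class DomBootstrapSketch {

    /** Parses XML text into an org.w3c.dom.Document. */
    static Document parse(String xml) throws Exception {
        // Assumption: the factory locates some DOM implementation so that
        // application code never names a vendor class.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<order><item/><item/></order>");
        // From here on, only DOM Level 1 Core methods are needed.
        System.out.println(doc.getDocumentElement().getTagName());
        System.out.println(doc.getElementsByTagName("item").getLength());
    }
}
```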

(3) Basic APIs to print the XML data.

Applications of XML involve either reading XML that has been provided by some other source, or writing XML text so that it may be processed by some other application. Some API is needed to support writing XML text, at least "well formed" if not also "valid".
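A minimal sketch of what writing well-formed XML involves is given below. It walks a DOM tree and escapes markup characters in text and attribute values. All names are hypothetical, the parser bootstrap through javax.xml.parsers is an assumption, and the sketch deliberately ignores comments, processing instructions, CDATA sections and the document type, so it illustrates the problem rather than a complete writer.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch: handles elements, attributes and text only.
class XmlWriterSketch {

    /** Escapes characters that would otherwise be read as markup. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    /** Appends the XML text for a node and its descendants to out. */
    static void write(Node node, StringBuilder out) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE: {
                out.append('<').append(node.getNodeName());
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node a = attrs.item(i);
                    out.append(' ').append(a.getNodeName())
                       .append("=\"").append(escape(a.getNodeValue())).append('"');
                }
                out.append('>');
                writeChildren(node, out);
                out.append("</").append(node.getNodeName()).append('>');
                break;
            }
            case Node.TEXT_NODE:
                out.append(escape(node.getNodeValue()));
                break;
            case Node.DOCUMENT_NODE:
                writeChildren(node, out);
                break;
            default:
                break; // other node types are skipped in this sketch
        }
    }

    static void writeChildren(Node parent, StringBuilder out) {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            write(c, out);
        }
    }

    /** Parses XML text and writes it back out, for demonstration. */
    static String roundTrip(String xml) throws Exception {
        Node doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        write(doc, out);
        return out.toString();
    }
}
```

This sketch produces output that is well formed but makes no attempt at validity; checking output against a DTD would require information the DOM Level 1 Core does not expose.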

(4) Support for XML Namespaces.

XML namespaces are a convention for associating a URI with elements and attributes, creating an extensible naming framework in place of the original flat namespace. APIs are needed to support accessing such XML nodes according to namespaces, and otherwise take advantage of this structuring tool.
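To make namespace-based access concrete, the sketch below looks up elements by namespace URI and local name rather than by prefixed tag name. It relies on namespace-aware methods (getNamespaceURI, getLocalName, getElementsByTagNameNS) that are found in later DOM levels rather than in DOM Level 1 Core, and on the javax.xml.parsers bootstrap from current Java platforms; both are assumptions relative to this request. The class name and the "urn:example:po" URI are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical sketch of namespace-based access to elements.
class NamespaceSketch {

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Namespace processing is off by default in this factory.
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Formats an element name as {namespace-uri}local-name. */
    static String expandedName(Element e) {
        return "{" + e.getNamespaceURI() + "}" + e.getLocalName();
    }

    /** Counts elements matching a namespace URI and local name. */
    static int countInNamespace(String xml, String uri, String localName) throws Exception {
        return parse(xml).getElementsByTagNameNS(uri, localName).getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<po:order xmlns:po='urn:example:po'><po:item/><item/></po:order>";
        System.out.println(expandedName(parse(xml).getDocumentElement()));
        // Only the prefixed item is in the namespace; the bare one is not.
        System.out.println(countInNamespace(xml, "urn:example:po", "item"));
    }
}
```

The lookup depends on the URI, not the prefix, so a document that binds the same URI to a different prefix would match in the same way.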

We anticipate that extensions not already defined by an external organization would be in a

package (or sub-package).

C. Implications

This technology has no direct security implications. However, it may be used in security-sensitive contexts, such as web commerce messages.

XML technology is targeted at internationalized systems. It has been defined in terms of the Unicode character set, supports a wide (and extensible) variety of character encodings, and has direct support for representing text in multiple languages within the same XML document. For example, a common use of XML in web-based systems requires support for multiple languages concurrently in the same Java Virtual Machine* (as in the case where one client uses English while another uses Japanese).

In terms of localization, the reference implementation of this level of XML technology will require diagnostic messages and documentation to be translatable into local languages.

The risk of not providing a specification as outlined in this JSR is that fragmentation will exist in core XML APIs. Also, there will be wide variations in compatibility between different implementations. There is no particular difficulty in providing an RI or CTS, although XML conformance work (as noted below) will be required.

D. Existing Specifications

The W3C DOM working group has produced a "Level 1 Core" specification, and is developing an enhanced "Level 2 Core". This is specified in OMG-IDL, and has custom language bindings to Java and to JavaScript.

David Megginson, coordinating input from members of the "XML-DEV" mailing list, has produced the SAX 1.0 specification. The reference is specified in Java, but the intent is that this API not be specific to Java; for example, Python bindings exist.

In terms of XML conformance, there is an Oasis/NIST working group which is working to produce a set of accepted XML conformance tests and associated infrastructure.

In all of these cases, it is the desire of this process not to preempt the work being done there, but rather to collaborate as appropriate to achieve the intended results. In particular, this process will provide a focus on Java Platform integration issues, which are not the primary goal of any of those existing efforts.

Section 3: Contributions

Sun Microsystems has a highly conformant implementation of the basic standards identified above. This is accessible from at this time. This is one of several 100% Pure Java implementations of those standards, and is well advanced in conformance testing and performance tuning.

In conjunction with the above, Sun Microsystems has developed a set of SAX and XML conformance tests. These tests build on top of well-accepted tests that are freely available from James Clark, called XMLTEST, adding validation and more complete coverage of all the testable statements in the XML 1.0 specification.

*As used on this web site, the terms "Java virtual machine" or "JVM" mean a virtual machine for the Java platform.