Status: Final

JCP version: 2.1

JSPA version: 1.0

Proposal

Updates to the Original Java Specification Request (JSR)

2011.11.21:
The Maintenance Lead changed from Conversation Computing to EverSpeech:

Specification Lead: Charles Hemphill
EverSpeech

E-Mail Address: charles@everspeech.com

Telephone Number: -

Fax Number: -

Updated Summary: This JSR extends the work of the 1.0 Java™ Speech API which allows developers to incorporate speech technology into user interfaces for their Java programming language applets and applications. This API specifies a cross-platform interface to support speech recognizers and synthesizers.

Original Java Specification Request (JSR)

Original Summary: This JSR extends the work of the 1.0 Java™ Speech API which allows developers to incorporate speech technology into user interfaces for their Java programming language applets and applications. This API specifies a cross-platform interface to support command and control recognizers, dictation systems, and speech synthesizers.

Section 1: Identification

Submitting Member: Conversay

Name of Contact Person: Charles Hemphill

E-Mail Address: Hemphill@conversay.com

Telephone Number: +1 425 830 3611

Fax Number: +1 775 898 7116

Specification Lead: Charles Hemphill & Steve Rondel

E-Mail Address: Hemphill@conversay.com & srondel@conversay.com

Telephone Number: +1 425 830 3611 & +1 425 636 0606

Fax Number: +1 775 898 7116 & +1 425 636 0600

Initial Expert Group Membership:

Sun Speech Group

Conversay

Section 2: Request

2.1 Please describe the proposed Specification:

The Java™ Speech API allows developers to incorporate speech technology into user interfaces for their Java programming language applets and applications. This API specifies a cross-platform interface to support command and control recognizers and speech synthesizers, with considerations for future incorporation of dictation and other features.

Version 2.0 will extend Sun's pre-JCP work on JSAPI 1.0. Wherever possible, the new API will stress compatibility with both the existing API and the emerging W3C Speech Interface Framework.

2.2 What is the target Java platform? (i.e., desktop, server, personal, embedded, card, etc.)

We will target the embedded (J2ME) platform first and consider additional functionality on J2SE and J2EE platforms. Target platforms should have access to sound resources and adequate computing resources.

2.3 What need of the Java community will be addressed by the proposed specification?

Applications, especially on Java embedded platforms (such as communications devices, set-top boxes, and telematics systems), will require speech as part of their preferred profile. A modern, scalable speech interface will allow these applications to use speech functions from multiple vendors, including speech recognition and text-to-speech, while maintaining portability.

2.4 Why isn't this need met by existing specifications?

While JSAPI 1.0 is a great start, there are still some issues to be resolved. Candidate targets for JSAPI 2.0 include:

A service provider interface (SPI) will act as middleware between the JSAPI Layer and the vendor-provided speech engine. This may be based on existing standards such as the SAPI 5.0 service provider API.
The JSGF and JSML specs must track the W3C Voice Browser group's grammar format and synthesis markup languages.
Built-in support for redirection of audio.
Various threading issues (e.g., synchronization via event queues) and reentrancy issues require tighter specification.
A modular architecture that supports a very small minimum configuration and future growth for the resources of a variety of platforms (e.g., desktop, server).
Various other specification details require clarification.
Future consideration for compatibility with telephony APIs (e.g., JTAPI or JAIN).
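The threading item in the list above calls for synchronization via event queues. The following is a minimal sketch of that pattern in plain Java (16 or later, for records), not a proposed JSAPI 2.0 API: `SpeechEvent` and `SpeechEventQueue` are hypothetical names. Engine threads only post events; a single dispatcher thread delivers them to listeners in posting order, so listener code never runs concurrently or re-entrantly from inside an engine call.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Usage: one listener, two events posted from the main thread.
class EventQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        List<String> delivered = new CopyOnWriteArrayList<>();
        try (SpeechEventQueue events = new SpeechEventQueue()) {
            events.addListener(ev -> delivered.add(ev.type() + "#" + ev.sequence()));
            events.post(new SpeechEvent("RESULT", 1));
            events.post(new SpeechEvent("PAUSED", 2));
        } // close() waits until both events have been delivered
        System.out.println(delivered);
    }
}

/** Hypothetical engine event (not a JSAPI type). */
record SpeechEvent(String type, long sequence) {}

/** Hypothetical queue: any thread may post; one dispatcher thread delivers. */
final class SpeechEventQueue implements AutoCloseable {
    // Sentinel that tells the dispatcher to stop; compared by reference.
    private static final SpeechEvent SHUTDOWN = new SpeechEvent("SHUTDOWN", -1);

    private final BlockingQueue<SpeechEvent> queue = new LinkedBlockingQueue<>();
    private final List<Consumer<SpeechEvent>> listeners = new CopyOnWriteArrayList<>();
    private final Thread dispatcher = new Thread(this::dispatchLoop, "speech-events");

    SpeechEventQueue() {
        dispatcher.start();
    }

    void addListener(Consumer<SpeechEvent> listener) {
        listeners.add(listener);
    }

    /** Safe to call from any engine thread; never calls listeners directly. */
    void post(SpeechEvent event) {
        queue.add(event);
    }

    private void dispatchLoop() {
        try {
            for (SpeechEvent event = queue.take(); event != SHUTDOWN; event = queue.take()) {
                // Listeners run one at a time, always on this thread.
                for (Consumer<SpeechEvent> listener : listeners) {
                    listener.accept(event);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag and exit
        }
    }

    /** Delivers every event posted so far, then stops the dispatcher. */
    @Override
    public void close() throws InterruptedException {
        queue.add(SHUTDOWN);
        dispatcher.join();
    }
}
```

Using one dispatcher thread means listeners need no locking of their own; the cost is that a slow listener delays every later event.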

2.5 Please give a short description of the underlying technology or technologies:

JSAPI does not provide any speech functionality itself; instead, through a set of APIs and event interfaces, it gives the application access to speech functionality provided by supporting speech vendors.

A related service provider interface (JSAPI SPI) will provide a speech engine abstraction layer that sits between JSAPI and the vendor-provided speech engine.
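To illustrate this layering, here is a minimal sketch assuming a hypothetical one-method SPI (`EngineSpi`) and a hypothetical application-facing class (`SpeechRecognizerFacade`); neither name comes from JSAPI 1.0 or this proposal. The application codes only against the facade and its listener interface, while the vendor supplies only the SPI implementation, here stubbed with a lambda that simply treats the audio bytes as UTF-8 text.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Usage: the application sees only the facade; the "vendor" is a stub.
class SpiDemo {
    public static void main(String[] args) {
        // Stand-in vendor engine: treats the audio bytes as UTF-8 text.
        EngineSpi vendorEngine = audio ->
                audio.length == 0 ? null : new String(audio, StandardCharsets.UTF_8);

        SpeechRecognizerFacade recognizer = new SpeechRecognizerFacade(vendorEngine);
        recognizer.addResultListener(text -> System.out.println("Result: " + text));
        recognizer.process("call home".getBytes(StandardCharsets.UTF_8));
        recognizer.process(new byte[0]); // no hypothesis, so no event
    }
}

/** Hypothetical SPI: the only contract a vendor engine must implement. */
interface EngineSpi {
    /** Returns the best hypothesis for the audio, or null if nothing was recognized. */
    String decode(byte[] audio);
}

/** Hypothetical API layer: owns listeners and event delivery, hides the vendor. */
final class SpeechRecognizerFacade {
    private final EngineSpi engine;
    private final List<Consumer<String>> resultListeners = new ArrayList<>();

    SpeechRecognizerFacade(EngineSpi engine) {
        this.engine = engine;
    }

    void addResultListener(Consumer<String> listener) {
        resultListeners.add(listener);
    }

    /** Passes audio to the vendor engine and turns its answer into result events. */
    void process(byte[] audio) {
        String hypothesis = engine.decode(audio);
        if (hypothesis != null) {
            for (Consumer<String> listener : resultListeners) {
                listener.accept(hypothesis);
            }
        }
    }
}
```

Because event delivery lives in the API layer in this sketch, the same listener semantics apply no matter which vendor engine sits behind the SPI.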

2.6 Is there a proposed package name for the API Specification? (i.e., `javapi.something`, `org.something`, etc.)

The proposed package is javax.speech.*, possibly with javax.speech.embedded and, later, javax.speech.desktop and javax.speech.server. An alternative is a component-based package approach; example components include dynamic grammars and spelling-to-pronunciation modules.

2.7 Does the proposed specification have any dependencies on specific operating systems, CPUs, or I/O devices that you know of?

The device must have local or remote access to sound resources, adequate computing resources, and a need for speech services.

A vendor must also support a JSAPI-SPI-compliant speech engine for the platform.

2.8 Are there any security issues that cannot be addressed by the current security model?

Recording and transmission of audio could become a privacy issue in some applications.

2.9 Are there any internationalization or localization issues?

The API must support multiple languages. Applications must specify localized speech-related resources.

JSAPI will support selecting speech recognizers based on the java.util.Locale values they support. The grammar specification will extend JSAPI 1.0's JSGF format, which supports Unicode descriptions of grammars for multiple locales. JSAPI will likewise support selecting speech synthesizers based on the java.util.Locale values they support. The speech synthesis markup specification will extend JSAPI 1.0's JSML format, an XML application that supports Unicode descriptions of synthesis markup documents for multiple locales.
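Locale-based engine selection could work roughly as follows. This sketch uses hypothetical types (`EngineKind`, `EngineDescriptor`, `EngineSelector`) rather than any JSAPI class, and assumes one plausible matching policy: an engine that lists the exact requested `java.util.Locale` wins; otherwise any engine supporting the same language is accepted.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Usage: no engine lists en-AU exactly, so the language fallback applies.
class LocaleSelectionDemo {
    public static void main(String[] args) {
        List<EngineDescriptor> installed = List.of(
                new EngineDescriptor("vendorA-asr", EngineKind.RECOGNIZER, List.of(Locale.US, Locale.UK)),
                new EngineDescriptor("vendorB-asr", EngineKind.RECOGNIZER, List.of(Locale.GERMANY)),
                new EngineDescriptor("vendorA-tts", EngineKind.SYNTHESIZER, List.of(Locale.US)));

        Locale wanted = Locale.forLanguageTag("en-AU");
        String chosen = EngineSelector.select(installed, EngineKind.RECOGNIZER, wanted)
                .map(EngineDescriptor::name)
                .orElse("none");
        System.out.println(chosen); // prints vendorA-asr
    }
}

/** Hypothetical engine categories. */
enum EngineKind { RECOGNIZER, SYNTHESIZER }

/** Hypothetical description of an installed engine and the locales it supports. */
record EngineDescriptor(String name, EngineKind kind, List<Locale> locales) {}

final class EngineSelector {
    private EngineSelector() {}

    static Optional<EngineDescriptor> select(List<EngineDescriptor> engines,
                                             EngineKind kind, Locale wanted) {
        // 1. Exact match on the full locale (language and country).
        Optional<EngineDescriptor> exact = engines.stream()
                .filter(e -> e.kind() == kind && e.locales().contains(wanted))
                .findFirst();
        if (exact.isPresent()) {
            return exact;
        }
        // 2. Fallback: same language, any country.
        return engines.stream()
                .filter(e -> e.kind() == kind)
                .filter(e -> e.locales().stream()
                        .anyMatch(l -> l.getLanguage().equals(wanted.getLanguage())))
                .findFirst();
    }
}
```

The language-only fallback is a policy choice; an application that needs a specific regional variant might prefer to fail rather than fall back.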

Both the grammar and speech synthesis specifications will track the specifications being developed by the W3C Voice Browser Working Group (http://www.w3c.org/voice), which are based on JSGF and JSML. Whether to provide multilingual grammar formats (i.e., a single grammar file that contains multiple languages) and multilingual synthesis markup formats (i.e., a single synthesis document that contains multiple languages) is still to be determined.

2.10 Are there any existing specifications that might be rendered obsolete, deprecated, or in need of revision as a result of this work?

The status of the current javax.speech.* packages can be discussed as part of the process. This effort will strive for backward compatibility with the existing specification, and additional subpackages (e.g., javax.speech.embedded) will support new functionality.

2.11 Please describe the anticipated schedule for the development of this specification.

We anticipate delivering the final specification and reference implementation of core JSAPI 2.0 in approximately one year, with several drafts provided earlier for feedback. The SPI layer for various platform types may be scheduled independently.

Section 3: Contributions

3.1 Please list any existing documents, specifications, or implementations that describe the technology. Please include links to the documents if they are publicly available.

JSAPI 1.0 Specification: http://java.sun.com/products/java-media/speech/

3.2 Explanation of how these items might be used as a starting point for the work.

Functionality of JSAPI 1.0 may be wholly or partially maintained, depending on ease of porting and the ability to maintain semantics with the new SPI layer, maintenance of platform generality, etc. Whenever possible, JSAPI 1.0 or a documented 1.0-subset will be supported for backward compatibility in 2.0.

Section 4: Additional Information (Optional)