Description
Please direct comments on this JSR to the Spec Lead(s).
Stage timeline
| Stage | Access | Start | Finish |
|---|---|---|---|
| Withdrawn | 06 Sep, 2005 | ||
| Expert Group Formation | 10 Oct, 2000 | ||
| JSR Review Ballot | View results | 26 Sep, 2000 | 09 Oct, 2000 |
Team
Specification Leads
- Jose E. MoreiraIBM
Expert Group
- Dongarra, Jack
- IBM
- Sun Microsystems, Inc.
Proposal
This JSR has been Withdrawn.
Reason: JSR-83 has shown slow progress for several years and has not generated significant industry interest or participation. With no outlook for a timely completion of this JSR, the Spec Lead decided to withdraw it.
Identification |
Request |
Contributions |
Additional Information
Title: Java Multiarray package Summary: This JSR proposes a package implementing true rectangular
multidimensional arrays for Java. Section 1. Identification
Submitting Member: International Business Machines Corporation Name of Contact Person: Jose E. Moreira E-Mail Address: jmoreira@us.ibm.com Telephone Number: 1-914-945-3987 Fax Number: 1-914-945-4425 Specification Lead: Jose E. Moreira, IBM Corporation E-Mail Address: jmoreira@us.ibm.com Telephone Number: 1-914-945-3987 Fax Number: 1-914-945-4425 Initial Expert Group Membership:
Additional endorsers of this JSR include individuals from:
- University of North Carolina
- National Institute of Standards and Technology
- NAG
- The Mathworks, Inc.
- University of Tennessee
- UCAR
Section 2: Request
2.1 Please describe the proposed Specification:
TheMultiarray package provides support for
true multidimensional arrays (henceforth called multiarrays) in Java.
Multiarrays are n-dimensional rectangular collections
of elements.
A multiarray is characterized by its rank (number of dimensions
or axes), its elemental data type (all elements of a multiarray
are of the same type), and its shape (the extents along
its axes).
Elements of a multiarray are identified by their indices along each axis.
Let a d-dimensional array A of elemental type
T have extent nj along its j-th axis,
j = 0,...,d-1.
Then, a valid index ij along the j-th axis must
be greater than or equal to zero and less than nj.
An attempt to reference an element
A[i0,i1,...,id-1]
with any invalid index ij causes an
ArrayIndexOutOfBoundsException to be thrown.
Elements of a multiarray are logically ordered with respect to each other according to the following definition. An element A[i0,i1,...,id-1] of a d-dimensional multiarray A follows an element A[j0,j1,...,jd-1] of the same multiarray if and only if there exists a k greater than or equal to zero and less than d such that il=jl for all l<k and ik>jk. In usual nomenclature, this corresponds to row-major (C-style) ordering of the elements. We call this the intrinsic ordering of the multiarray elements. (In a column-major, i.e., Fortran-style, ordering of the elements, an element A[i0,i1,...,id-1] of a d-dimensional multiarray A follows an element A[j0,j1,...,jd-1] of the same multiarray if and only if there exists a k greater than or equal to zero and less than d such that il=jl for all l>k and ik>jk. This concept will be useful when we discuss native interfaces and mapping to conventional Java arrays below.)
We propose the development of standard Java classes which
implement multiarrays.
The rank and type of a multiarray are defined by its class.
That is, for each rank and type there is a different concrete class.
Supported types must include all of Java primitive types
(boolean, byte, char, short,
int, long, float,
and double), one or more complex types (at least a
Complex class that implements double precision complex numbers),
and Object.
Supported ranks must include 0- (scalar), 1-, 2-, 3-, 4-, 5-, 6-
and 7-dimensional arrays. (Rank 7 is the current standard limit for Fortran.)
The class for a d-dimensional array of type would
be typeMultiarraydD.
As an example, the class for a two-dimensional multiarray of doubles would
be named doubleMultiarray2D.
Multiarray classes are final classes.
All concrete classes of the same type (e.g., doubleMultiarray1D,
doubleMultiarray2D, ..., doubleMultiarray7D) are
derived from an abstract class with the same type (e.g., doubleMultiarray).
All type-specific abstract classes (e.g., floatMultiarray,
ComplexMultiarray, charMultiarray) are derived from the
abstract class Multiarray. This class hierarchy supports
type- and rank-generic programming (through the abstract classes), as well as
type- and rank-specific programming (as would normally be done in C and Fortran).
We expect that programming with the concrete final classes will be
important for performance. Programs that use the type- and rank-specific
concrete final classes can be optimized using mature compiler techniques
developed for Fortran, C, and C++. The optimization of type- and rank-generic
programs is still a research topic.
The Multiarray package must fully support the concept of
regular multiarray sections.
A regular multiarray section corresponds to a subarray of another
multiarray, which we call the master multiarray.
Each element of a multiarray section corresponds to exactly one
element of its master multiarray.
The correspondence is one-to-one in the section to master direction.
Referencing one element of a multiarray section (for reading or writing)
has the effect of referencing the corresponding element of
the master multiarray.
Regular multiarray sections have the same type as, and rank less than
or equal to, their master multiarrays.
Regular array sections behave exactly like regular arrays for all operations,
including sectioning.
(In fact, there are no separate classes for multiarray sections.)
A dB-dimensional multiarray section B of a dA-dimensional master multiarray A is defined by specifying, for each axis k of A, either (i) a specific element index along that axis or (ii) a range of element indices. A range of indices consists of an arithmetic progression of indices, defined by its first element, the incremental step, and the number of elements. The single index or the range of indices serve as a selector of the elements of A that are present in section B. The number of axes of A for which a single index is specified is equal to the rank difference between A and B, dA - dB. The shape of B is defined by the length (number of elements) in the range selectors.
In addition to regular multiarray sections, the Multiarray
package must provide some support for irregular multiarray sections.
In specifying an irregular multiarray section, the selector for
an axis of a master multiarray A can be an arbitrary list of
indices, as opposed to just an arithmetic progression.
Irregular multiarray sections are never instantiated as objects,
and there is no explicit representation for them.
The concept is only present during certain operations on
multiarrays.
The multiarray classes provide methods that implement Fortran-like functionality for arrays. In particular, the following operations must be provided:
- Get and set the values of a multiarray element or regular multiarray section.
- Get and set the values of the elements in an irregular multiarray section.
- Operations to query the rank and shape of a multiarray.
- Operations to reshape and transpose a multiarray.
- Elemental conversion functions (e.g., the
equivalent of Fortran
REALandAIMAG, that convert complex arrays intodoublearrays). - Elemental transcendental functions (e.g.,
sinandcosapplied to each element of a multiarray). - Elemental relational functions (<, >, <=, >=, ==, !=), applied to corresponding elements of conforming (i.e., same shape) multiarrays.
- Fortran-like array reduction functions (sum, minval, etc.).
- Fortran-like array construction functions (merge, pack, spread, unpack).
- Fortran-like array manipulation functions (shift, spread).
- Fortran-like array location functions (maxloc, minloc).
- Operations that correspond to Fortran-like array expressions (addition, multiplication, etc.)
doubleArray2D and double[][].
The conversion operations create new copies of the data to prevent
exposing the internal structure of the multiarray classes.
The multiarray package also needs to provide ravel and
unravel methods that copy data between multiarray objects
and Java one-dimensional arrays of the same type. For the
purpose of copying data to and from one-dimensional arrays,
the logical ordering of elements of a multiarray (defined above)
is used. It may make sense to provide additional variants of
ravel/unravel, in which the ordering of the elements is
explicitly specified (e.g., supporting column major
operation). Finally, it remains to be investigated how
ravel/unravel operations could be optimized to avoid copying
data.
Quite often, Java code using the Array package will have to interface
with native code. Although native code can use the JNI procedure
for accessing elements of a multiarray through the set
and get accessor methods, we envision that most invocations
will be to methods that expect a Java single-dimensional array as
a parameter. For example, if a Java programmer wants to invoke
a native method native static void foo(double[]),
passing as parameter a doubleMultiarray2D,
it is first necessary to extract the elements into a Java
single-dimensional array (using an unravel operation), then
invoke method foo, and then propagate the effect
back to the original doubleMultiarray2D
(using a ravel operation):
doubleMultiarray2D A = new doubleMuliarray2D(m,n);
...
double[] B = A.unravel();
foo(B);
A.ravel(B);
Access to individual elements of multiarrays is performed
only through get and set accessor methods.
As an example, consider the following simple implementation of a matrix
multiply code, where each matrix is represented by a
doubleArray2D class:
void matmul(doubleArray2D a, doubleArray2D b, doubleArray2D c) {
/*
* For simplicity, let us consider only the simple case
* in which the matrices are conforming, that is,
* a is m x n, b is n x p, and c is m x p.
*/
int m = a.size(0);
int n = a.size(1);
int p = b.size(1);
for (int i=0; i<m; i++) {
for (int j=0; j<p; j++) {
c.set(i,j,0);
for (int k=0; k<n; k++)
c.set(i,j,c.get(i,j)+a.get(i,k)*b.get(k,j));
}
}
}
The multiarray classes can be implemented with no changes to Java, the JVM, or to the JNI.
However, it is essential that the get and set accessor
methods be implemented as efficiently as
array indexing operations are in Fortran or in C.
Multidimensional arrays are extremely common in numerical
computing, and hence we expect that efficient multidimensional arrays classes
will be heavily used.
2.2 What is the target Java platform? (i.e., desktop, server, personal, embedded, card, etc.)
The Multiarray package is targeted at both the desktop and server platforms.
2.3 What need of the Java community will be addressed by the proposed specification?
Multidimensional arrays are the most common data
structures in scientific and engineering computing.
The Java Multiarray package provides the ability to represent multidimensional arrays in
Java programs.
It supports a set of array operations that lead to concise representation and
efficient optimization of scientific and engineering codes.
2.4 Why isn't this need met by existing specifications?
Native Java arrays are strictly one-dimensional. Multidimensional arrays
are simulated as "arrays of arrays". That means, for example,
that each element of a double[][] array is a
double[] array.
Arrays of arrays are very general and therefore more difficult to optimize.
For instance, for a double[][] a, rows
a[i] and a[j] may have different lengths.
Bounds checking optimization is an example of an optimization that can be better
performed when arrays are known to be rectangular (as in true multidimensional
array).
Furthermore, the "array of arrays" approach opens up more possibility for aliasing.
For instance, for a double[][] a and
double[][] b, it is possible that a[i] and
b[j] refer to the same row.
In fact, it is possible that a[i] and a[j] (i != j)
refer to the same row.
Many advanced compiler optimizations rely on accurate aliasing disambiguation.
The java.vecmath package includes two classes that are relevant to this
discussion: GVector and GMatrix. They implement one- and two-dimensional arrays of
doubles. The purpose of these classes is similar in spirit to the Multiarray package
but they offer much more restricted functionality.
GVector and GMatrix only support one- and two-dimensional arrays of doubles and
they only offer very limited aggregate operations.
2.5 Please give a short description of the underlying technology or technologies:
Implementing multidimensional arrays as classes has been done before in the context of A++/P++ and POOMA. The mechanisms for mapping a multidimensional cartesian space into a single-dimensional address space are well understood. The main challenge is performance.
A good implementation of the Multiarray package will deliver high performance in
the execution of aggregate methods. That is, methods that operate on many
elements, such as adding two array sections or multiplying two matrices.
The techniques for performing efficient aggregate operations on matrices,
in particular for linear algebra operations, are well described in the
literature.
To deliver good performance on user codes that perform elementary operations on multiarray elements, two compiler technologies are of utmost importance:
- Bounds checking optimization through versioning: This technique creates safe regions of code where all array accesses are guaranteed to be valid. Aggressive optimizations, that reorder code, can be applied in these regions.
- Semantic expansion: With the
Multiarraypackage, each elemental data access (setorget) operation is a method call. The method verifies that all the indices are valid (along each of the multiarray axes) before proceeding to the actual data access. Exposing the semantics of these methods to the compiler is a necessary step for optimization. This is accomplished through the technique of semantic expansion.
2.6 Is there a proposed package name for the API Specification? (i.e., javapi.something, org.something, etc.)
javax.math.multiarray
2.7 Does the proposed specification have any dependencies on specific operating systems, CPUs, or I/O devices that you know of?
No.
A strawman implementation of the Multiarray package was
implemented entirely in Java.
2.8 Are there any security issues that cannot be addressed by the current security model?
No.
2.9 Are there any internationalization or localization issues?
No.
2.10 Are there any existing specifications that might be rendered obsolete, deprecated, or in need of revision as a result of this work?
No other existing specifications are affected by this specification.
The functionality of classes GVector and GMatrix of
javax.vecmath will be largely
superseded by the functionality of the equivalent classes
(doubleArray1D and doubleArray2D, respectively) in the multiarray package.
However, the interfaces are not compatible and for the benefit of programs
using the Java 3D API both GVector and GMatrix should continue to be maintained.
The recent proposal to extend generic types to Java could provide a more general solution for building multidimensional arrays of user-defined types.
2.11 Please describe the anticipated schedule for the development of this specification.
- 12/2000: First specification of
Multiarraypackage ready. - 03/2001: Implementation of a reference implementation of the
Multiarraypackage ready. Tests begin. - 04/2001 - 06/2001: Open review of the
Multiarraypackage. - 06/2001: Report on tests with the reference implementation of the
Multiarraypackage. Redesign of the package. - 09/2001: Final specification of the
Multiarraypackage ready. - 12/2001: Reference implementation and test suite for final specification ready.
Section 3: Contributions
3.1 Please list any existing documents, specifications, or implementations that describe the technology. Please include links to the documents if they are publicly available.
A strawman for the Array package is available as package
com.ibm.math.array. It
can be downloaded from
www.alphaWorks.ibm.com/tech/ninja.
A prototype research compiler that performs the optimizations of semantic expansion and bounds checking optimization has been implemented at IBM Research.
Two technical reports describing the design of the strawman Multiarray package are
available:
- IBM Research Division RC21369. A Standard Java Array Package for Technical Computing. Jose Moreira, Sam Midkiff, Manish Gupta.
- IBM Research Division RC21482. High Performance Numerical Computing in Java: Language and Compiler Issues. Pedro Artigas, Manish Gupta, Sam Midkiff, Jose Moreira.
3.2 Explanation of how these items might be used as a starting point for the work.
The strawman implementation for the Multiarray package includes most of the
functionality desired in the final product.
The main features missing are reduction operations (sum and product of all elements
of an array) and elementary math functions (sin, cos, exp, etc applied to all
elements of an array.) Availability of the strawman implementation will allow
us to proceed with the accelerated schedule described in Section 2.11.