Find JSRs
Submit this Search


Ad Banner
 
 
 
 

JSRs: Java Specification Requests
JSR 83: Multiarray package

This JSR has been Withdrawn
Reason: JSR-83 has shown slow progress for several years and has not generated significant industry interest or participation. With no outlook for a timely completion of this JSR, the Spec Lead decided to withdraw it.

Original Java Specification Request (JSR)

Identification | Request | Contributions | Additional Information

Title:

Java Multiarray package

Summary:

This JSR proposes a package implementing true rectangular multidimensional arrays for Java.

Section 1. Identification

Submitting Member: International Business Machines Corporation

Name of Contact Person: Jose E. Moreira

E-Mail Address: jmoreira@us.ibm.com

Telephone Number: 1-914-945-3987

Fax Number: 1-914-945-4425


Specification Lead: Jose E. Moreira, IBM Corporation

E-Mail Address: jmoreira@us.ibm.com

Telephone Number: 1-914-945-3987

Fax Number: 1-914-945-4425


Initial Expert Group Membership:

  • International Business Machines Corporation
  • Sun Microsystems, Inc

Additional endorsers of this JSR include individuals from:

  • University of North Carolina
  • National Institute of Standards and Technology
  • NAG
  • The Mathworks, Inc.
  • University of Tennessee
  • UCAR



Section 2: Request

2.1 Please describe the proposed Specification:

The Multiarray package provides support for true multidimensional arrays (henceforth called multiarrays) in Java. Multiarrays are n-dimensional rectangular collections of elements. A multiarray is characterized by its rank (number of dimensions or axes), its elemental data type (all elements of a multiarray are of the same type), and its shape (the extents along its axes).

Elements of a multiarray are identified by their indices along each axis. Let a d-dimensional array A of elemental type T have extent nj along its j-th axis, j = 0,...,d-1. Then, a valid index ij along the j-th axis must be greater than or equal to zero and less than nj. An attempt to reference an element A[i0,i1,...,id-1] with any invalid index ij causes an ArrayIndexOutOfBoundsException to be thrown.

Elements of a multiarray are logically ordered with respect to each other according to the following definition. An element A[i0,i1,...,id-1] of a d-dimensional multiarray A follows an element A[j0,j1,...,jd-1] of the same multiarray if and only if there exists a k greater than or equal to zero and less than d such that il=jl for all l<k and ik>jk. In usual nomenclature, this corresponds to row-major (C-style) ordering of the elements. We call this the intrinsic ordering of the multiarray elements. (In a column-major, i.e., Fortran-style, ordering of the elements, an element A[i0,i1,...,id-1] of a d-dimensional multiarray A follows an element A[j0,j1,...,jd-1] of the same multiarray if and only if there exists a k greater than or equal to zero and less than d such that il=jl for all l>k and ik>jk. This concept will be useful when we discuss native interfaces and mapping to conventional Java arrays below.)

We propose the development of standard Java classes which implement multiarrays. The rank and type of a multiarray are defined by its class. That is, for each rank and type there is a different concrete class. Supported types must include all of Java primitive types (boolean, byte, char, short, int, long, float, and double), one or more complex types (at least a Complex class that implements double precision complex numbers), and Object. Supported ranks must include 0- (scalar), 1-, 2-, 3-, 4-, 5-, 6- and 7-dimensional arrays. (Rank 7 is the current standard limit for Fortran.) The class for a d-dimensional array of type would be typeMultiarraydD. As an example, the class for a two-dimensional multiarray of doubles would be named doubleMultiarray2D. Multiarray classes are final classes.

All concrete classes of the same type (e.g., doubleMultiarray1D, doubleMultiarray2D, ..., doubleMultiarray7D) are derived from an abstract class with the same type (e.g., doubleMultiarray). All type-specific abstract classes (e.g., floatMultiarray, ComplexMultiarray, charMultiarray) are derived from the abstract class Multiarray. This class hierarchy supports type- and rank-generic programming (through the abstract classes), as well as type- and rank-specific programming (as would normally be done in C and Fortran). We expect that programming with the concrete final classes will be important for performance. Programs that use the type- and rank-specific concrete final classes can be optimized using mature compiler techniques developed for Fortran, C, and C++. The optimization of type- and rank-generic programs is still a research topic.

The Multiarray package must fully support the concept of regular multiarray sections. A regular multiarray section corresponds to a subarray of another multiarray, which we call the master multiarray. Each element of a multiarray section corresponds to exactly one element of its master multiarray. The correspondence is one-to-one in the section to master direction. Referencing one element of a multiarray section (for reading or writing) has the effect of referencing the corresponding element of the master multiarray. Regular multiarray sections have the same type as, and rank less than or equal to, their master multiarrays. Regular array sections behave exactly like regular arrays for all operations, including sectioning. (In fact, there are no separate classes for multiarray sections.)

A dB-dimensional multiarray section B of a dA-dimensional master multiarray A is defined by specifying, for each axis k of A, either (i) a specific element index along that axis or (ii) a range of element indices. A range of indices consists of an arithmetic progression of indices, defined by its first element, the incremental step, and the number of elements. The single index or the range of indices serve as a selector of the elements of A that are present in section B. The number of axes of A for which a single index is specified is equal to the rank difference between A and B, dA - dB. The shape of B is defined by the length (number of elements) in the range selectors.

In addition to regular multiarray sections, the Multiarray package must provide some support for irregular multiarray sections. In specifying an irregular multiarray section, the selector for an axis of a master multiarray A can be an arbitrary list of indices, as opposed to just an arithmetic progression. Irregular multiarray sections are never instantiated as objects, and there is no explicit representation for them. The concept is only present during certain operations on multiarrays.

The multiarray classes provide methods that implement Fortran-like functionality for arrays. In particular, the following operations must be provided:

  • Get and set the values of a multiarray element or regular multiarray section.
  • Get and set the values of the elements in an irregular multiarray section.
  • Operations to query the rank and shape of a multiarray.
  • Operations to reshape and transpose a multiarray.
  • Elemental conversion functions (e.g., the equivalent of Fortran REAL and AIMAG, that convert complex arrays into double arrays).
  • Elemental transcendental functions (e.g., sin and cos applied to each element of a multiarray).
  • Elemental relational functions (<, >, <=, >=, ==, !=), applied to corresponding elements of conforming (i.e., same shape) multiarrays.
  • Fortran-like array reduction functions (sum, minval, etc.).
  • Fortran-like array construction functions (merge, pack, spread, unpack).
  • Fortran-like array manipulation functions (shift, spread).
  • Fortran-like array location functions (maxloc, minloc).
  • Operations that correspond to Fortran-like array expressions (addition, multiplication, etc.)
It must be possible to convert (copy) Java arrays into multiarray objects of the same rank and vice-versa. As an example, it must be possible to convert back and forth between doubleArray2D and double[][]. The conversion operations create new copies of the data to prevent exposing the internal structure of the multiarray classes. The multiarray package also needs to provide ravel and unravel methods that copy data between multiarray objects and Java one-dimensional arrays of the same type. For the purpose of copying data to and from one-dimensional arrays, the logical ordering of elements of a multiarray (defined above) is used. It may make sense to provide additional variants of ravel/unravel, in which the ordering of the elements is explicitly specified (e.g., supporting column major operation). Finally, it remains to be investigated how ravel/unravel operations could be optimized to avoid copying data.

Quite often, Java code using the Array package will have to interface with native code. Although native code can use the JNI procedure for accessing elements of a multiarray through the set and get accessor methods, we envision that most invocations will be to methods that expect a Java single-dimensional array as a parameter. For example, if a Java programmer wants to invoke a native method native static void foo(double[]), passing as parameter a doubleMultiarray2D, it is first necessary to extract the elements into a Java single-dimensional array (using an unravel operation), then invoke method foo, and then propagate the effect back to the original doubleMultiarray2D (using a ravel operation):

        doubleMultiarray2D A = new doubleMuliarray2D(m,n);
        ...
        double[] B = A.unravel();
        foo(B);
        A.ravel(B);
        

Access to individual elements of multiarrays is performed only through get and set accessor methods. As an example, consider the following simple implementation of a matrix multiply code, where each matrix is represented by a doubleArray2D class:

void matmul(doubleArray2D a, doubleArray2D b, doubleArray2D c) {

    /*
     * For simplicity, let us consider only the simple case
     * in which the matrices are conforming, that is,
     * a is m x n, b is n x p, and c is m x p.
     */
    int m = a.size(0);
    int n = a.size(1);
    int p = b.size(1);

    for (int i=0; i<m; i++) {
        for (int j=0; j<p; j++) {
            c.set(i,j,0);
            for (int k=0; k<n; k++) 
                c.set(i,j,c.get(i,j)+a.get(i,k)*b.get(k,j));
        }
    }
}
        

The multiarray classes can be implemented with no changes to Java, the JVM, or to the JNI. However, it is essential that the get and set accessor methods be implemented as efficiently as array indexing operations are in Fortran or in C. Multidimensional arrays are extremely common in numerical computing, and hence we expect that efficient multidimensional arrays classes will be heavily used.

2.2 What is the target Java platform? (i.e., desktop, server, personal, embedded, card, etc.)

The Multiarray package is targeted at both the desktop and server platforms.

2.3 What need of the Java community will be addressed by the proposed specification?

Multidimensional arrays are the most common data structures in scientific and engineering computing. The Java Multiarray package provides the ability to represent multidimensional arrays in Java programs. It supports a set of array operations that lead to concise representation and efficient optimization of scientific and engineering codes.

2.4 Why isn't this need met by existing specifications?

Native Java arrays are strictly one-dimensional. Multidimensional arrays are simulated as "arrays of arrays". That means, for example, that each element of a double[][] array is a double[] array. Arrays of arrays are very general and therefore more difficult to optimize. For instance, for a double[][] a, rows a[i] and a[j] may have different lengths. Bounds checking optimization is an example of an optimization that can be better performed when arrays are known to be rectangular (as in true multidimensional array). Furthermore, the "array of arrays" approach opens up more possibility for aliasing. For instance, for a double[][] a and double[][] b, it is possible that a[i] and b[j] refer to the same row. In fact, it is possible that a[i] and a[j] (i != j) refer to the same row. Many advanced compiler optimizations rely on accurate aliasing disambiguation.

The java.vecmath package includes two classes that are relevant to this discussion: GVector and GMatrix. They implement one- and two-dimensional arrays of doubles. The purpose of these classes is similar in spirit to the Multiarray package but they offer much more restricted functionality. GVector and GMatrix only support one- and two-dimensional arrays of doubles and they only offer very limited aggregate operations.

2.5 Please give a short description of the underlying technology or technologies:

Implementing multidimensional arrays as classes has been done before in the context of A++/P++ and POOMA. The mechanisms for mapping a multidimensional cartesian space into a single-dimensional address space are well understood. The main challenge is performance.

A good implementation of the Multiarray package will deliver high performance in the execution of aggregate methods. That is, methods that operate on many elements, such as adding two array sections or multiplying two matrices. The techniques for performing efficient aggregate operations on matrices, in particular for linear algebra operations, are well described in the literature.

To deliver good performance on user codes that perform elementary operations on multiarray elements, two compiler technologies are of utmost importance:

  1. Bounds checking optimization through versioning: This technique creates safe regions of code where all array accesses are guaranteed to be valid. Aggressive optimizations, that reorder code, can be applied in these regions.
  2. Semantic expansion: With the Multiarray package, each elemental data access (set or get) operation is a method call. The method verifies that all the indices are valid (along each of the multiarray axes) before proceeding to the actual data access. Exposing the semantics of these methods to the compiler is a necessary step for optimization. This is accomplished through the technique of semantic expansion.

2.6 Is there a proposed package name for the API Specification? (i.e., javapi.something, org.something, etc.)

javax.math.multiarray

2.7 Does the proposed specification have any dependencies on specific operating systems, CPUs, or I/O devices that you know of?

No. A strawman implementation of the Multiarray package was implemented entirely in Java.

2.8 Are there any security issues that cannot be addressed by the current security model?

No.

2.9 Are there any internationalization or localization issues?

No.

2.10 Are there any existing specifications that might be rendered obsolete, deprecated, or in need of revision as a result of this work?

No other existing specifications are affected by this specification. The functionality of classes GVector and GMatrix of javax.vecmath will be largely superseded by the functionality of the equivalent classes (doubleArray1D and doubleArray2D, respectively) in the multiarray package. However, the interfaces are not compatible and for the benefit of programs using the Java 3D API both GVector and GMatrix should continue to be maintained.

The recent proposal to extend generic types to Java could provide a more general solution for building multidimensional arrays of user-defined types.

2.11 Please describe the anticipated schedule for the development of this specification.

  • 12/2000: First specification of Multiarray package ready.
  • 03/2001: Implementation of a reference implementation of the Multiarray package ready. Tests begin.
  • 04/2001 - 06/2001: Open review of the Multiarray package.
  • 06/2001: Report on tests with the reference implementation of the Multiarray package. Redesign of the package.
  • 09/2001: Final specification of the Multiarray package ready.
  • 12/2001: Reference implementation and test suite for final specification ready.





Section 3: Contributions

3.1 Please list any existing documents, specifications, or implementations that describe the technology. Please include links to the documents if they are publicly available.

A strawman for the Array package is available as package com.ibm.math.array. It can be downloaded from www.alphaWorks.ibm.com/tech/ninja.

A prototype research compiler that performs the optimizations of semantic expansion and bounds checking optimization has been implemented at IBM Research.

Two technical reports describing the design of the strawman Multiarray package are available:

  1. IBM Research Division RC21369. A Standard Java Array Package for Technical Computing. Jose Moreira, Sam Midkiff, Manish Gupta.
  2. IBM Research Division RC21482. High Performance Numerical Computing in Java: Language and Compiler Issues. Pedro Artigas, Manish Gupta, Sam Midkiff, Jose Moreira.
Both these reports are available at www.research.ibm.com/ninja.

3.2 Explanation of how these items might be used as a starting point for the work.

The strawman implementation for the Multiarray package includes most of the functionality desired in the final product. The main features missing are reduction operations (sum and product of all elements of an array) and elementary math functions (sin, cos, exp, etc applied to all elements of an array.) Availability of the strawman implementation will allow us to proceed with the accelerated schedule described in Section 2.11.