TASKING provides an implementation of the LAPACK interface (including BLAS) [1] for AURIX and AURIX Next Generation in the form of a highly optimized, thoroughly tested binary library for use with the TASKING TriCore Compiler.
If you develop and maintain portable yet high-performance applications for the TriCore and other architectures supporting LAPACK, you should take a closer look.
Using C/C++, MATLAB/Simulink and other high-level programming interfaces that reference the LAPACK interface makes it easy to develop and maintain high-performance applications. LAPACK has been proven in use for many years in many computing projects that place a strong emphasis on exactness, reliability and performance.
Applications can remain portable with minimal impact on performance by using the LAPACK interface for the most time-critical computations, leveraging the highly optimized LAPACK implementations available on the various platforms. An overview of the functionality of LAPACK is given in [3].
Appendix A
Below is a list of the BLAS/LAPACK functions targeted for optimization in the first release of this library. These are the functions that have been shown to be the most performance-relevant for common user applications.
The optimizations are performed by dedicated experts in numerics, the TriCore architecture and functional safety. Thorough testing and formal methods are applied to ensure the safety and correctness of the library. Full test reports for specific TASKING compiler versions and settings will be available as an option to the product.
Please be aware that this is a product preview: the exact set of optimized functions, the exact speedups and the planned certification are subject to change at any time until TASKING announces a final library specification detailing the guaranteed properties of the product.
Note that most BLAS/LAPACK functions not listed here are also intended to be supported by the library, but they will receive less attention in terms of optimization in the first release.
BLAS
SGEMM
Performs one of the matrix-matrix operations
C := alpha*op(A)*op(B) + beta*C,
where op(X) is one of
op(X) = X or op(X) = X**T,
alpha and beta are scalars, and A, B and C are matrices, with op(A) an m by k matrix, op(B) a k by n matrix and C an m by n matrix.
SDOT
Forms the dot product of two vectors.
SGEMV
Performs one of the matrix-vector operations
y := alpha*A*x + beta*y, or
y := alpha*A**T*x + beta*y,
where alpha and beta are scalars, x and y are vectors and A is an m by n matrix.
SSYMM
Performs one of the matrix-matrix operations
C := alpha*A*B + beta*C,
or
C := alpha*B*A + beta*C,
where alpha and beta are scalars, A is a symmetric matrix and B and C are m by n matrices.
Additional functions may be optimized and additional features accommodated at the customer’s request; please contact TASKING product engineering at alexander.herz@altium.com.
LAPACK
SGEJSV
Computes the singular value decomposition (SVD) of a real M-by-N matrix [A], where M >= N. The SVD of [A] is written as
[A] = [U] * [SIGMA] * [V]^t,
where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its N diagonal elements, [U] is an M-by-N (or M-by-M) orthonormal matrix, and [V] is an N-by-N orthogonal matrix. The diagonal elements of [SIGMA] are the singular values of [A]. The columns of [U] and [V] are the left and the right singular vectors of [A], respectively. The matrices [U] and [V] are computed and stored in the arrays U and V, respectively. The diagonal of [SIGMA] is computed and stored in the array SVA.
SPSTF2
Computes the Cholesky factorization with complete pivoting of a real symmetric positive semidefinite matrix A.
The factorization has the form:
P**T * A * P = U**T * U, if UPLO = 'U',
P**T * A * P = L * L**T, if UPLO = 'L',
where U is an upper triangular matrix and L is lower triangular, and P is stored as vector PIV.
This algorithm does not attempt to check that A is positive semidefinite. This version of the algorithm calls level 2 BLAS.
SGEQRT3
Recursively computes a QR factorization of a real M-by-N matrix A, using the compact WY representation of Q.
Based on the algorithm of Elmroth and Gustavson, IBM J. Res. Develop. Vol 44 No. 4 July 2000.
SGETRF2
Computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.
The factorization has the form:
A = P * L * U,
where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the recursive version of the algorithm. It divides the matrix into four submatrices:

        [ A11 | A12 ]  where A11 is n1 by n1 and A22 is n2 by n2,
    A = [ -----|----- ] with n1 = min(m,n)/2 and
        [ A21 | A22 ]       n2 = n - n1.

                                      [ A11 ]
The subroutine calls itself to factor [ --- ],
                                      [ A21 ]

                [ A12 ]
do the swaps on [ --- ], solve A12, update A22,
                [ A22 ]

then calls itself to factor A22 and do the swaps on A21.
SPOTRF2
Computes the Cholesky factorization of a real symmetric positive definite matrix A using the recursive algorithm.
The factorization has the form:
A = U**T * U, if UPLO = 'U', or
A = L * L**T, if UPLO = 'L',
where U is an upper triangular matrix and L is lower triangular.
This is the recursive version of the algorithm. It divides the matrix into four submatrices:

        [ A11 | A12 ]  where A11 is n1 by n1 and A22 is n2 by n2,
    A = [ -----|----- ] with n1 = n/2 and
        [ A21 | A22 ]       n2 = n - n1.

The subroutine calls itself to factor A11, updates and scales A21 or A12, updates A22, and then calls itself to factor A22.
SPSTRF
Computes the Cholesky factorization with complete pivoting of a real symmetric positive semidefinite matrix A.
The factorization has the form:
P**T * A * P = U**T * U, if UPLO = 'U',
P**T * A * P = L * L**T, if UPLO = 'L',
where U is an upper triangular matrix and L is lower triangular, and P is stored as vector PIV.
This algorithm does not attempt to check that A is positive semidefinite. This version of the algorithm calls level 3 BLAS.
Additional functions may be optimized at the customer’s request; please contact TASKING at sales@tasking.com.
Bibliography
[1] Anderson, E. et al. LAPACK: A Portable Linear Algebra Library for High-performance Computers.
In Proceedings of the 1990 ACM/IEEE Conference on Supercomputing, 2–11 (IEEE Computer Society Press, 1990).
http://www.netlib.org/lapack/, http://www.netlib.org/blas/
[2] https://ti.arc.nasa.gov/tech/asr/intelligent-robotics/nasa-vision-workbe...
[3] http://www.hpcavf.uclan.ac.uk/softwaredoc/sgi_scsl_html/sgi_html/ch03.html
[4] Anderson, E. et al. LAPACK: A Portable Linear Algebra Library for High-performance Computers.
In Intelligent Vehicles Symposium, 2008 IEEE 7–12 (IEEE, 2008).
http://arxiv.org/pdf/1411.7113.pdf
[5] http://de.mathworks.com/company/newsletters/articles/matlab-incorporates...