TASKING provides an implementation of the LAPACK interface (including BLAS) [1] for AURIX and AURIX Next Generation in the form of a highly optimized, highly tested binary library for use with the TASKING TriCore Compiler.

If you develop and maintain portable yet high-performance applications for the TriCore and other architectures supporting LAPACK, you should take a closer look.

C/C++, MATLAB/Simulink, and other high-level programming interfaces that reference the LAPACK interface make it easy to develop and maintain high-performance applications. LAPACK has been proven in use for many years in computing projects that place a strong emphasis on exactness, reliability, and performance.

Portability of applications with minimal impact on performance can be achieved by using the LAPACK interface for the most time-critical computations and leveraging the performance of the highly optimized LAPACK implementations available on various platforms. An overview of the functionality of LAPACK is given in [3].

Applications

Applications which typically rely on and benefit from LAPACK include:

ADAS applications, e.g. deterministic implementations of lane detection [4]:

  • Solving systems of linear equations efficiently (Cholesky Decomposition, …)
  • Coordinate transformations (Camera space to Cartesian space)
  • Splines (display recognized lanes)
  • Radar Applications

MATLAB/Simulink-based applications [5] with code that calls into LAPACK/BLAS for:

  • Matrix Multiplication
  • Vector Matrix Products

Advantages of the TASKING LAPACK Performance Libraries

  • Directly supported by C/C++, MATLAB/Simulink, BASELABS and other model based packages
  • Optimized for TriCore
  • Bit-exact TriCore result simulation on PC available from TASKING
  • Easily port your existing BLAS/LAPACK-based applications from ARM/Intel to TriCore
  • The functionality provided by LAPACK usually determines the speed of the overall application, so a speedup in LAPACK usually translates directly into a speedup of the application
  • Very low porting costs and highly cost-effective development (no manual optimization, which would require hardware and numerics experts)
  • Support for FFT in software and hardware
  • Highly tested and proven in use

Appendix A

Below is a list of the BLAS/LAPACK functions targeted for optimization in the first release of this library. These are the functions that have been shown to be the most performance-relevant for common user applications.

The optimizations are to be performed by dedicated experts in numerics, the TriCore architecture, and functional safety. Thorough testing and formal methods are applied to ensure the safety and correctness of the library. Full test reports for specific TASKING compiler versions and settings will be available as an option to the product.


Please be aware that this is a product preview: the exact set of optimized functions, the exact speedups, and the planned certification are subject to change at any time until TASKING announces a final library specification that details the guaranteed properties of the product.

Note that most BLAS/LAPACK functions not listed here are intended to be supported by the library as well, but they will receive less attention in terms of optimization in the first release.

BLAS

SGEMM

Performs one of the matrix-matrix operations

C := alpha*op(A)*op(B) + beta*C,

where op( X ) is one of

op(X) = X or op(X) = X**T,

alpha and beta are scalars, and A, B and C are matrices, with op(A) an m by k matrix, op(B) a k by n matrix and C an m by n matrix.
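
A minimal sketch of how SGEMM is typically called from C, assuming the library exposes the de facto standard CBLAS binding (the cblas.h header name and cblas_sgemm entry point are assumptions; the exact TASKING binding may differ):

  #include <stdio.h>
  #include <cblas.h>   /* assumed header; the TASKING binding may differ */

  int main(void)
  {
      /* C := alpha*A*B + beta*C for 2x2 matrices in row-major order */
      float A[] = { 1.0f, 2.0f,
                    3.0f, 4.0f };
      float B[] = { 5.0f, 6.0f,
                    7.0f, 8.0f };
      float C[] = { 0.0f, 0.0f,
                    0.0f, 0.0f };

      cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                  2, 2, 2,       /* m, n, k */
                  1.0f, A, 2,    /* alpha, A, lda */
                  B, 2,          /* B, ldb */
                  0.0f, C, 2);   /* beta, C, ldc */

      printf("%g %g\n%g %g\n", C[0], C[1], C[2], C[3]);  /* 19 22 / 43 50 */
      return 0;
  }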

SDOT

Forms the dot product of two vectors.
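
A minimal sketch, again assuming the conventional CBLAS binding (cblas_sdot):

  #include <stdio.h>
  #include <cblas.h>   /* assumed header; the TASKING binding may differ */

  int main(void)
  {
      float x[] = { 1.0f, 2.0f, 3.0f };
      float y[] = { 4.0f, 5.0f, 6.0f };

      /* dot product over 3 elements with unit strides */
      float dot = cblas_sdot(3, x, 1, y, 1);

      printf("%g\n", dot);   /* 1*4 + 2*5 + 3*6 = 32 */
      return 0;
  }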


SGEMV

Performs one of the matrix-vector operations

y := alpha*A*x + beta*y, or
y := alpha*A**T*x + beta*y,

where alpha and beta are scalars, x and y are vectors and A is an m by n matrix.
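
A minimal sketch for the non-transposed case, assuming the CBLAS binding (cblas_sgemv):

  #include <stdio.h>
  #include <cblas.h>   /* assumed header; the TASKING binding may differ */

  int main(void)
  {
      /* y := alpha*A*x + beta*y with a 2x3 row-major matrix A */
      float A[] = { 1.0f, 2.0f, 3.0f,
                    4.0f, 5.0f, 6.0f };
      float x[] = { 1.0f, 1.0f, 1.0f };
      float y[] = { 0.0f, 0.0f };

      cblas_sgemv(CblasRowMajor, CblasNoTrans,
                  2, 3,          /* m, n */
                  1.0f, A, 3,    /* alpha, A, lda */
                  x, 1,          /* x, incx */
                  0.0f, y, 1);   /* beta, y, incy */

      printf("%g %g\n", y[0], y[1]);   /* row sums: 6 and 15 */
      return 0;
  }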

SSYMM

Performs one of the matrix-matrix operations

C := alpha*A*B + beta*C,

or

C := alpha*B*A + beta*C,

where alpha and beta are scalars,  A is a symmetric matrix and  B and C are  m by n matrices.
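
A minimal sketch, assuming the CBLAS binding (cblas_ssymm); only one triangle of the symmetric matrix A needs to be supplied:

  #include <stdio.h>
  #include <cblas.h>   /* assumed header; the TASKING binding may differ */

  int main(void)
  {
      /* C := alpha*A*B + beta*C, A symmetric, upper triangle supplied */
      float A[] = { 2.0f, 1.0f,
                    0.0f, 2.0f };   /* strictly lower part is ignored */
      float B[] = { 1.0f, 0.0f,    /* identity, so C reproduces A */
                    0.0f, 1.0f };
      float C[] = { 0.0f, 0.0f,
                    0.0f, 0.0f };

      cblas_ssymm(CblasRowMajor, CblasLeft, CblasUpper,
                  2, 2,          /* m, n */
                  1.0f, A, 2,    /* alpha, A, lda */
                  B, 2,          /* B, ldb */
                  0.0f, C, 2);   /* beta, C, ldc */

      printf("%g %g\n%g %g\n", C[0], C[1], C[2], C[3]);  /* 2 1 / 1 2 */
      return 0;
  }

With B the identity and beta = 0, the result C is the full symmetric A, which makes the triangle convention easy to verify.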

Additional functions may be optimized and additional features accommodated at the customer's request;
please contact TASKING product engineering at alexander.herz@altium.com

LAPACK

SGEJSV

Computes the singular value decomposition (SVD) of a real M-by-N matrix [A], where M >= N. The SVD of [A] is written as

[A] = [U] * [SIGMA] * [V]^t,

where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its N diagonal elements, [U] is an M-by-N (or M-by-M) orthonormal matrix, and [V] is an N-by-N orthogonal matrix. The diagonal elements of [SIGMA] are the singular values of [A]. The columns of [U] and [V] are the left and the right singular vectors of [A], respectively. The matrices [U] and [V] are computed and stored in the arrays U and V, respectively. The diagonal of [SIGMA] is computed and stored in the array SVA.
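
A minimal sketch, assuming the standard LAPACKE C binding (the lapacke.h header and LAPACKE_sgejsv entry point are assumptions; the TASKING binding may differ). The job flags shown request the basic accuracy level with both sets of singular vectors:

  #include <stdio.h>
  #include <lapacke.h>   /* assumed header; the TASKING binding may differ */

  int main(void)
  {
      /* SVD of a 3x2 matrix (M >= N), row-major */
      float a[] = { 1.0f, 0.0f,
                    0.0f, 2.0f,
                    0.0f, 0.0f };
      float sva[2], u[3*2], v[2*2], stat[7];
      lapack_int istat[3];

      lapack_int info = LAPACKE_sgejsv(LAPACK_ROW_MAJOR,
                                       'C', 'U', 'V', 'R', 'N', 'N',
                                       3, 2, a, 2,
                                       sva, u, 2, v, 2,
                                       stat, istat);

      if (info == 0)
          printf("singular values: %g %g\n", sva[0], sva[1]);  /* 2 and 1 */
      return (int)info;
  }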


SPSTF2

Computes the Cholesky factorization with complete pivoting of a real symmetric positive semidefinite matrix A.
The factorization has the form:

P**T * A * P = U**T * U,  if UPLO = 'U',
P**T * A * P = L * L**T,  if UPLO = 'L',

where U is an upper triangular matrix and L is lower triangular, and P is stored as vector PIV.
This algorithm does not attempt to check that A is positive semidefinite. This version of the algorithm calls level 2 BLAS.

SGEQRT3

Recursively computes a QR factorization of a real M-by-N matrix A, using the compact WY representation of Q.

Based on the algorithm of Elmroth and Gustavson, IBM J. Res. Develop. Vol 44 No. 4 July 2000.
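
A minimal sketch, assuming the standard LAPACKE binding (LAPACKE_sgeqrt3; header and binding names are assumptions). Note that SGEQRT3 requires M >= N:

  #include <stdio.h>
  #include <lapacke.h>   /* assumed header; the TASKING binding may differ */

  int main(void)
  {
      /* QR factorization of a 3x2 matrix (M >= N), row-major */
      float a[] = { 1.0f, 2.0f,
                    3.0f, 4.0f,
                    5.0f, 6.0f };
      float t[2*2];   /* N-by-N upper triangular block reflector factor T */

      lapack_int info = LAPACKE_sgeqrt3(LAPACK_ROW_MAJOR, 3, 2, a, 2, t, 2);

      /* on success, R is in the upper triangle of a; the Householder
         vectors of the compact WY form are stored below the diagonal */
      printf("info = %d, R11 = %g\n", (int)info, a[0]);
      return 0;
  }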


SGETRF2 

Computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.  
The factorization has the form:

A = P * L * U,

where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the recursive version of the algorithm. It divides the matrix into four submatrices:

        [ A11 | A12 ]  where A11 is n1 by n1 and A22 is n2 by n2,
    A = [ ----+---- ]  with n1 = min(m,n)/2 and n2 = n-n1.
        [ A21 | A22 ]

                                      [ A11 ]
The subroutine calls itself to factor [ --- ],
                                      [ A21 ]

                [ A12 ]
do the swaps on [ --- ], solve A12, update A22,
                [ A22 ]

then calls itself to factor A22 and do the swaps on A21.
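
In reference LAPACK, SGETRF2 is the recursive kernel behind the blocked SGETRF driver, so a portable way to exercise it from C is via the standard LAPACKE binding of that driver (LAPACKE_sgetrf; header and binding names are assumptions):

  #include <stdio.h>
  #include <lapacke.h>   /* assumed header; the TASKING binding may differ */

  int main(void)
  {
      /* LU factorization with partial pivoting of a 2x2 matrix, row-major */
      float a[] = { 0.0f, 1.0f,
                    2.0f, 3.0f };
      lapack_int ipiv[2];

      lapack_int info = LAPACKE_sgetrf(LAPACK_ROW_MAJOR, 2, 2, a, 2, ipiv);

      /* on success, a holds L (unit diagonal, strictly below) and U (on and
         above the diagonal); ipiv records the row interchanges */
      printf("info = %d, U = [%g %g; 0 %g], ipiv = [%d %d]\n",
             (int)info, a[0], a[1], a[3], (int)ipiv[0], (int)ipiv[1]);
      return 0;
  }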

SPOTRF2

Computes the Cholesky factorization of a real symmetric positive definite matrix A using the recursive algorithm.
The factorization has the form:

A = U**T * U,  if UPLO = 'U', or
A = L * L**T,  if UPLO = 'L',

where U is an upper triangular matrix and L is lower triangular.
This is the recursive version of the algorithm. It divides the matrix into four submatrices:

        [ A11 | A12 ]  where A11 is n1 by n1 and A22 is n2 by n2,
    A = [ ----+---- ]  with n1 = n/2 and n2 = n-n1.
        [ A21 | A22 ]

The subroutine calls itself to factor A11, updates and scales A21 or A12, updates A22, and then calls itself to factor A22.
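
In reference LAPACK, SPOTRF2 sits behind the SPOTRF driver, so a minimal sketch via the standard LAPACKE binding of that driver (names are assumptions) looks like this:

  #include <stdio.h>
  #include <lapacke.h>   /* assumed header; the TASKING binding may differ */

  int main(void)
  {
      /* Cholesky factorization A = U**T * U of a 2x2 SPD matrix, row-major */
      float a[] = { 4.0f, 2.0f,
                    2.0f, 5.0f };

      lapack_int info = LAPACKE_spotrf(LAPACK_ROW_MAJOR, 'U', 2, a, 2);

      /* on success, the upper triangle of a holds U = [2 1; 0 2] */
      printf("info = %d, U = [%g %g; 0 %g]\n", (int)info, a[0], a[1], a[3]);
      return 0;
  }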


SPSTRF

Computes the Cholesky factorization with complete pivoting of a real symmetric positive semidefinite matrix A.
The factorization has the form:

P**T * A * P = U**T * U,  if UPLO = 'U',
P**T * A * P = L * L**T,  if UPLO = 'L',

where U is an upper triangular matrix and L is lower triangular, and P is stored as vector PIV.
This algorithm does not attempt to check that A is positive semidefinite. This version of the algorithm calls level 3 BLAS.
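
A minimal sketch, assuming the standard LAPACKE binding (LAPACKE_spstrf; header and binding names are assumptions). A negative tol selects the default tolerance:

  #include <stdio.h>
  #include <lapacke.h>   /* assumed header; the TASKING binding may differ */

  int main(void)
  {
      /* pivoted Cholesky of a rank-1 positive semidefinite 2x2 matrix */
      float a[] = { 1.0f, 2.0f,
                    2.0f, 4.0f };
      lapack_int piv[2], rank;

      lapack_int info = LAPACKE_spstrf(LAPACK_ROW_MAJOR, 'U', 2, a, 2,
                                       piv, &rank, -1.0f /* default tol */);

      /* rank reports the number of factorization steps completed;
         piv holds the permutation P as a pivot vector */
      printf("info = %d, rank = %d, piv = [%d %d]\n",
             (int)info, (int)rank, (int)piv[0], (int)piv[1]);
      return 0;
  }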

Additional functions may be optimized at the customer's request; please contact TASKING at sales@tasking.com

Bibliography

[1] Anderson, E. et al. LAPACK: A Portable Linear Algebra Library for High-performance Computers.
In Proceedings of the 1990 ACM/IEEE Conference on Supercomputing, 2–11 (IEEE Computer Society Press, 1990).

http://www.netlib.org/lapack/ ,   http://www.netlib.org/blas/

 
[2] https://ti.arc.nasa.gov/tech/asr/intelligent-robotics/nasa-vision-workbe...

 
[3] http://www.hpcavf.uclan.ac.uk/softwaredoc/sgi_scsl_html/sgi_html/ch03.html

 
[4] Anderson, E. et al. LAPACK: A Portable Linear Algebra Library for High-performance Computers.
In Intelligent Vehicles Symposium, 2008 IEEE, 7–12 (IEEE, 2008).

http://arxiv.org/pdf/1411.7113.pdf

 
[5] http://de.mathworks.com/company/newsletters/articles/matlab-incorporates...

 
[6]  http://www.netlib.org/clapack/