Overview

TASKING® provides an implementation of the LAPACK interface (including BLAS) [1] as a highly optimized, extensively tested binary library for AURIX and AURIX 2nd Generation, for use with the TASKING TriCore Compiler (included in the TriCore VX Toolset); implementations are available for other architectures as well.

If you develop and maintain portable yet high-performance applications for the TriCore and other architectures supporting LAPACK, it is worth a closer look.

Using C/C++, MATLAB/Simulink and other high-level programming languages that reference the LAPACK libraries makes it easy to develop and maintain high-performance applications. LAPACK has been proven in use for many years in many computing projects that place a strong emphasis on correctness, reliability and performance.

Portability with minimal impact on performance can be achieved by using the LAPACK interface for the most time-critical computations and leveraging the highly optimized LAPACK implementations available on the various target platforms. An overview of the functionality of LAPACK is given in [3]. A summary of LAPACK and BLAS functions appears below.

Applications

Applications that typically rely on, and benefit from, LAPACK include:

ADAS Applications, e.g. some deterministic implementations of lane detection [4]:

  • Solving systems of linear equations efficiently (Cholesky Decomposition, …)
  • Coordinate transformations (camera space to Cartesian space)
  • Splines (e.g. for displaying recognized lanes)
  • Radar Applications

MATLAB/Simulink based applications [4] with code that calls into LAPACK/BLAS for:

  • Matrix Multiplication
  • Vector Matrix Products

Benefits

  • Directly supported by C/C++, MATLAB/Simulink, BASELABS and other model-based packages.
  • Optimized for TriCore.
  • Bit-exact TriCore result simulation on PCs available from TASKING.
  • Easily port your existing BLAS/LAPACK based applications from ARM/Intel to TriCore.
  • The functionality provided by LAPACK usually determines the speed of the overall application, so a speedup in LAPACK usually translates directly into a speedup of the application.
  • Single precision complex BLAS & LAPACK functions included for even faster performance.
  • Highly tested and proven in use.
  • Very low porting costs and highly cost-effective development (no manual optimization, which would require hardware and numerics experts).

SGEMM

Performs one of the matrix-matrix operations in pseudo code:

C := alpha*op(A)*op(B) + beta*C,

where op(X) is one of

op(X) = X or op(X) = X**T,

alpha and beta are scalars, and A, B and C are matrices, with op(A) an m by k matrix, op(B) a k by n matrix and C an m by n matrix.
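The operation can be sketched with a naive C reference implementation (illustrative only; the function name `sgemm_ref` and the row-major storage are assumptions of this sketch — LAPACK/BLAS itself uses column-major storage and a richer argument list):

```c
#include <stddef.h>

/* Illustrative sketch of the SGEMM operation
 * C := alpha*op(A)*op(B) + beta*C, with op(X) = X or X**T.
 * Row-major storage; transa/transb select op() ('N' or 'T').
 * This mirrors the semantics only, not the optimized library. */
void sgemm_ref(char transa, char transb, int m, int n, int k,
               float alpha, const float *A, const float *B,
               float beta, float *C)
{
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++) {
                /* op(A) is m by k, op(B) is k by n */
                float a = (transa == 'N') ? A[i * k + p] : A[p * m + i];
                float b = (transb == 'N') ? B[p * n + j] : B[j * k + p];
                acc += a * b;
            }
            C[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
    }
}
```

The optimized library performs the same computation with blocking and TriCore-specific tuning.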

SDOT

Forms the dot product of two vectors.
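A minimal C sketch of the operation (the name `sdot_ref` is an assumption; `incx`/`incy` are the element strides, as in the BLAS interface):

```c
/* Illustrative sketch of SDOT: returns sum over i of x[i]*y[i],
 * with incx/incy the strides between consecutive elements.
 * Semantics only, not the optimized routine. */
float sdot_ref(int n, const float *x, int incx, const float *y, int incy)
{
    float acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc += x[i * incx] * y[i * incy];
    return acc;
}
```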


SGEMV

Performs one of the matrix-vector operations

y := alpha*A*x + beta*y, or
y := alpha*A**T*x + beta*y,

where alpha and beta are scalars, x and y are vectors and A is an m by n matrix.
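The two cases can be sketched in C as follows (the name `sgemv_ref` and the row-major layout are assumptions of this sketch; the BLAS routine is column-major and also takes strides):

```c
/* Illustrative sketch of the SGEMV operation
 * y := alpha*A*x + beta*y       (trans == 'N'), or
 * y := alpha*A**T*x + beta*y    (trans == 'T'),
 * with A stored row-major as an m by n matrix. Semantics only. */
void sgemv_ref(char trans, int m, int n, float alpha,
               const float *A, const float *x, float beta, float *y)
{
    int rows = (trans == 'N') ? m : n;   /* length of y */
    int cols = (trans == 'N') ? n : m;   /* length of x */
    for (int i = 0; i < rows; i++) {
        float acc = 0.0f;
        for (int j = 0; j < cols; j++)
            acc += (trans == 'N') ? A[i * n + j] * x[j]
                                  : A[j * n + i] * x[j];
        y[i] = alpha * acc + beta * y[i];
    }
}
```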

SSYMM

Performs one of the matrix-matrix operations

C := alpha*A*B + beta*C,

or

C := alpha*B*A + beta*C,

where alpha and beta are scalars, A is a symmetric matrix and B and C are m by n matrices.
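The point of SSYMM is that only one triangle of the symmetric A is referenced. A C sketch of the left-side case (the name `ssymm_ref`, the row-major layout, and fixing UPLO = 'L' are assumptions of this sketch):

```c
/* Illustrative sketch of SSYMM's left-side case
 * C := alpha*A*B + beta*C, where A is symmetric m by m with only
 * its lower triangle read (UPLO == 'L'), and B, C are m by n,
 * all row-major. Shows how one stored triangle suffices. */
void ssymm_ref(int m, int n, float alpha, const float *A,
               const float *B, float beta, float *C)
{
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++) {
            float acc = 0.0f;
            for (int p = 0; p < m; p++) {
                /* A[i][p] fetched from the lower triangle by symmetry */
                float a = (i >= p) ? A[i * m + p] : A[p * m + i];
                acc += a * B[p * n + j];
            }
            C[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
}
```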

SGEJSV

Computes the singular value decomposition (SVD) of a real M-by-N matrix [A], where M >= N. The SVD of [A] is written in pseudo code as:

[A] = [U] * [SIGMA] * [V]^t,

where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its N diagonal elements, [U] is an M-by-N (or M-by-M) orthonormal matrix, and [V] is an N-by-N orthogonal matrix. The diagonal elements of [SIGMA] are the singular values of [A]. The columns of [U] and [V] are the left and the right singular vectors of [A], respectively. The matrices [U] and [V] are computed and stored in the arrays U and V, respectively. The diagonal of [SIGMA] is computed and stored in the array SVA.


SPSTF2

Computes the Cholesky factorization with complete pivoting of a real symmetric positive semidefinite matrix A.
The factorization has the form:

P**T * A * P = U**T * U,  if UPLO = 'U',
P**T * A * P = L * L**T,  if UPLO = 'L',

where U is an upper triangular matrix and L is lower triangular, and P is stored as vector PIV.
This algorithm does not attempt to check that A is positive semidefinite. This version of the algorithm calls level 2 BLAS.

SGEQRT3

Recursively computes a QR factorization of a real M-by-N matrix A, using the compact WY representation of Q.

Based on the algorithm of Elmroth and Gustavson, IBM J. Res. Develop. Vol 44 No. 4 July 2000.


SGETRF2 

Computes an LU factorization of a general M-by-N matrix A using partial pivoting with row interchanges.  
The factorization has the form:

A = P * L * U,

where P is a permutation matrix, L is lower triangular with unit diagonal elements (lower trapezoidal if m > n), and U is upper triangular (upper trapezoidal if m < n).
This is the recursive version of the algorithm. It divides the matrix into four submatrices:

    [ A11 | A12 ]
A = [ -----|----- ]
    [ A21 | A22 ]

where A11 is n1 by n1 and A22 is n2 by n2,
with n1 = min(m,n)/2 and n2 = n-n1.

The subroutine calls itself to factor

    [ A11 ]
    [ --- ]
    [ A21 ]

then does the swaps on

    [ A12 ]
    [ --- ]
    [ A22 ]

solves A12, updates A22, and then calls itself to factor A22 and do the swaps on A21.
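The factorization the recursion produces can be sketched with the classic unblocked loop in C (the name `getrf_ref`, the row-major layout, and restricting to square matrices are assumptions of this sketch):

```c
#include <math.h>

/* Illustrative LU sketch with partial pivoting: computes A = P*L*U
 * in place on a row-major n by n matrix. L's unit diagonal is not
 * stored, and ipiv[k] records the row interchanged with row k, as
 * in LAPACK. SGETRF2 computes the same factorization recursively.
 * Returns 0 on success, k+1 if column k has no nonzero pivot. */
int getrf_ref(int n, float *A, int *ipiv)
{
    for (int k = 0; k < n; k++) {
        int p = k;  /* partial pivoting: largest entry in column k */
        for (int i = k + 1; i < n; i++)
            if (fabsf(A[i * n + k]) > fabsf(A[p * n + k])) p = i;
        ipiv[k] = p;
        if (A[p * n + k] == 0.0f) return k + 1;  /* singular */
        if (p != k)
            for (int j = 0; j < n; j++) {        /* row interchange */
                float t = A[k * n + j]; A[k * n + j] = A[p * n + j]; A[p * n + j] = t;
            }
        for (int i = k + 1; i < n; i++) {
            A[i * n + k] /= A[k * n + k];        /* multiplier -> L */
            for (int j = k + 1; j < n; j++)      /* update trailing block */
                A[i * n + j] -= A[i * n + k] * A[k * n + j];
        }
    }
    return 0;
}
```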

SPOTRF2

Computes the Cholesky factorization of a real symmetric positive definite matrix A using the recursive algorithm.
The factorization has the form:

A = U**T * U,  if UPLO = 'U', or
A = L * L**T,  if UPLO = 'L',

where U is an upper triangular matrix and L is lower triangular.
This is the recursive version of the algorithm. It divides the matrix into four submatrices:

    [ A11 | A12 ]
A = [ -----|----- ]
    [ A21 | A22 ]

where A11 is n1 by n1 and A22 is n2 by n2, with n1 = n/2 and n2 = n-n1.

The subroutine calls itself to factor A11, updates and scales A21 or A12 (depending on UPLO), updates A22, and then calls itself to factor A22.


SPSTRF

Computes the Cholesky factorization with complete pivoting of a real symmetric positive semidefinite matrix A.
The factorization has the form:

P**T * A * P = U**T * U,  if UPLO = 'U',
P**T * A * P = L * L**T,  if UPLO = 'L',

where U is an upper triangular matrix and L is lower triangular, and P is stored as vector PIV.
This algorithm does not attempt to check that A is positive semidefinite. This version of the algorithm calls level 3 BLAS.

Additional functions may be optimized at customer’s request.

To get a free trial, please fill in our evaluation form.

 

References

[1] Anderson, E. et al. LAPACK: A Portable Linear Algebra Library for High-performance Computers.
In Proceedings of the 1990 ACM/IEEE Conference on Supercomputing, 2–11 (IEEE Computer Society Press, 1990).

http://www.netlib.org/lapack/ ,   http://www.netlib.org/blas/

 
[2] https://ti.arc.nasa.gov/tech/asr/intelligent-robotics/nasa-vision-workbe...

 
[3] http://www.hpcavf.uclan.ac.uk/softwaredoc/sgi_scsl_html/sgi_html/ch03.html

 
[4] Anderson, E. et al. LAPACK: A Portable Linear Algebra Library for High-performance Computers. 
In Intelligent Vehicles Symposium, 2008 IEEE 7–12 (IEEE, 2008).

http://arxiv.org/pdf/1411.7113.pdf

 
[5] http://de.mathworks.com/company/newsletters/articles/matlab-incorporates...

 
[6]  http://www.netlib.org/clapack/