This chapter contains the following sections:
Startup Code
Register Usage
Calling Conventions
Section Usage
Compiler Hardware Environment
Operating Mode Register
Status Register
Other Registers
Stack
Stack Extension
Heap
Floating Point
Software Floating Point Implementation
Characteristics of Floating Point Types
Floating Point Constants
Usual Arithmetic Conversions
Single Precision Floating Point Format
Single Precision Floating Point Number Range
Comparison to IEEE-754 Standard for Binary Floating Point Arithmetic
Single Precision Floating Point Memory Usage
Software Floating Point Interfacing
The Basic Floating Point Operations
The Floating Point Accumulators
Storage 2-Complement Format Values
Internal Register Usage
Floating Point Code Generation
When linking your C modules with the C library, you can automatically link the object module containing the C startup code. The name of this module depends on the selected execution environment (target selection in EDE, or command line option -T of the control program). A large set of predefined target modules is already supplied. See below for information on how to create support files for a user-designed target board. All of these modules define a list of values used to initialize the hardware correctly, and include the file cstart.inc, which contains the actual code.
Because these modules specify the run-time environment of your DSP56xxx C application, you might want to edit them to match your needs. Therefore, these modules and the file cstart.inc are delivered as assembly source in the src subdirectory of the lib directory. Typically, you copy the startup code file start.asm (renaming it to, for example, mystart.asm) and the file cstart.inc to your own directory and edit them. The file cstart.inc contains macro preprocessor symbols to tune the code. Include this file in your own startup code. You can check the predefined target startup code files target_name.asm to see what information you can put in your own startup code file.
To use the changed file, you must add it to the file list (EDE) or makefile (command line). You must also select "User supplied, no library startup code" (EDE) or -T on the command line to avoid linking with a library startup file. If you linked your own startup file together with a library startup file, you would get conflicts on all symbols defined in the startup files.
The invocation (using the control program) is:
cc56 -c mystart.asm -DNODSTACK
for the static model (DSP5600x only), or
cc563 -c mystart.asm
for all other models.
In the C startup code an absolute code section is defined for setting up the power-on vector and the DSP56xxx C environment. The power-on vector contains a jump to the F_START label. This global label must not be removed, since it is referred to by the C compiler. It is also used as the default start address of the application (see the start keyword in the locator description language DELFEE). The code space for all unused interrupt vectors may be occupied by small user code sections. When this is not desirable, you should include default versions of the interrupt vectors by including the startup code in your project file list and removing the comment markers from the corresponding lines in the cstart.inc file. The default interrupt vectors jump to the abort() function when they are inadvertently called.
The stack is defined in the locator description file (.dsc in directory etc) with the keyword stack, which results in a section called stack. Except in the static model of the DSP5600x, a user stack is always required.
See the section Stack for detailed information on the stack.
The stack pointer can be switched between r6 and r7. To initialize the correct address register with the base of the stack, the linker retrieves a module from the library that defines the symbol R_STKINIT, which contains the opcode for the register move. This avoids the need for two versions of every startup definition file.
The heap is defined in the description file with the keyword heap, which results in a section called heap.
See the section Heap for detailed information on heap management.
An important task of the startup code is to configure the interface of the DSP to external memory. The following registers are used for this:
Family Member | Registers |
DSP5600x | BCR |
DSP563xx | SR, OMR, BCR, AAR0, AAR1, AAR2, AAR3, DCR |
DSP566xx | SR, OMR, BCR |
A download of a program that uses external memory can only take place after the external memory interface has been initialized correctly. CrossView Pro uses the values of symbols with coded names R_xxxxVALUE (e.g., R_AAR0VALUE) for the above registers to do this. The startup code assembly file defines and exports these symbols; the code in the cstart.inc file sets them during program start, so the code can also run without CrossView Pro (e.g., from an EPROM).
The startup code also takes care of initialized C variables, residing in the different RAM areas. Each memory type has a unique name for both the ROM and the RAM section. The startup code copies the initial values of initialized C variables from ROM to RAM, using these special sections and some run-time library functions. When initialization of C variables is not needed, you can translate the startup file with -DNOCOPY. See also the table keyword in the locator description language DELFEE.
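The ROM-to-RAM initialization described above can be pictured with the following host-side C sketch. It is not the actual cstart.inc code, which is written in assembly and uses run-time library routines; the `copy_entry` structure, its fields and `init_sections()` are hypothetical names chosen for illustration. An entry with a null ROM pointer models a BSS section that is cleared, any other entry a DATA section whose initial values are copied from ROM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copy-table entry: one per initialized (DATA) or cleared
 * (BSS) RAM section. The real table layout is produced by the locator
 * and is not reproduced by this sketch. */
struct copy_entry {
    const int32_t *rom;   /* initial values in ROM; NULL for a BSS section */
    int32_t       *ram;   /* destination section in RAM                    */
    size_t         words; /* section length in words                      */
};

/* Walk the table once at startup: copy DATA sections, clear BSS sections. */
static void init_sections(const struct copy_entry *table, size_t entries)
{
    assert(table != NULL || entries == 0);
    for (size_t i = 0; i < entries; i++) {
        const struct copy_entry *e = &table[i];
        for (size_t w = 0; w < e->words; w++)
            e->ram[w] = (e->rom != NULL) ? e->rom[w] : 0;
    }
}
```

Translating the startup file with -DNOCOPY corresponds to leaving out such a pass entirely, so neither DATA nor BSS sections are set up by the startup code.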
When everything described above has been executed, your C application is called, using the global label Fmain, which has been generated by c563 for the C function main().
When the C application 'returns', which is not likely to happen in an embedded environment, the program ends with a DEBUG instruction, using the assembly label F_exit. When using a debugger, it can be useful to set a breakpoint on this label, indicating that the program has reached the end, or that the library function exit() has been called.
The following macros can be used to control the functionality of cstart.inc:
Macro | Description |
NODSTACK | Do NOT initialize the dynamic stack pointer (only in static model for the DSP5600x). |
NOESTACK | Do NOT initialize stack extension hardware (DSP563xx only). |
NOCACHE | Do NOT enable cache (DSP563xx only). |
NOCOPY | Do NOT produce code to clear BSS sections and initialize DATA sections. |
NOARGV | Do NOT produce code to provide a dummy argument vector to function main(). |
PLLVALUE=val | Set clock phase locked loop register to val. If not defined, the PLL is not initialized. |
BCRVALUE=val | Set external memory wait states to val. If not defined, zero wait states are selected for all memory (not on DSP563xx). |
LOW_DYNAMIC | Select 16-bit arithmetic mode (DSP563xx only). |
NARROW_BUS | Select 16-bit address bus mode (DSP563xx only). |
USP=R6 | Select register R6 for user stack pointer. |
Table 7-1: Macros used in cstart.inc
The cstart.inc file declares some labels external; these are resolved by the linker. The label 'start' links the object from the file start.asm, which contains a jump to the label F_START and generates the start vector in the interrupt vector table. Likewise, labels named irqxx, with xx in the range 1 to 63, are (or can be) declared external and create default interrupt vectors in the interrupt vector table. These default vectors always jump to abort(), because they indicate that either a system error has occurred (e.g. hardware stack overflow) or that an unexpected interrupt has occurred. By default most interrupt vectors are left uninstalled in order to keep as much internal program memory available as possible. An interrupt function defined in the C program overrules the default vector by defining the label irqxx; if you write interrupt handlers in assembly, you can suppress the default vector in the same way. The symbols null_x and null_y are declared external in order to link a one-word absolute section at X:$0 and another one at Y:$0. This allows safe checking of NULL pointers: because these addresses are occupied, a pointer that equals NULL always indicates an error condition. The _at() modifier can still be used to force variables at address 0.
The compiler uses all available registers that are suitable for storing variables and intermediate results. Registers used are A, B, X0, Y0, X1, Y1, R0-R7 and N0-N7. The modifier registers M0-M7 are only used in conjunction with the associated R0-R7 register to act as a circular pointer. When not in use as a circular pointer they must remain in the reset state (-1, linear addressing). Of course, registers PC, SR and CCR are used implicitly, as is the hardware stack.
The compiler will try to use the available registers as efficiently as possible. The compiler uses a flexible register allocation scheme, which implies that any change to the C code may result in a different register usage.
For passing parameters to functions the compiler uses the following scheme:
- Arithmetic-type arguments: Float arguments are passed via AB, X and Y; long arguments are passed via A, B, X and Y; integers are passed via A, B, X0, Y0, X1 and Y1. Arguments are assigned from left to right in the order A, B, X0, Y0, X1, Y1. Arguments requiring the whole X or Y register (e.g., a long) are allocated in the A, B, X and Y registers before other arguments, which only need X0, X1, Y0 or Y1.
- Structure-type arguments: Single-word structures are passed and returned in the same registers as integers, double-word structures in the same registers as floats. Longer structures are passed on the stack. The processor status flags are undefined upon return.
- Pointer-type arguments: Pointers are passed in registers R0, R4, R1, R5, R2, (R6) , R3, (R7) (and the corresponding modifier registers in case of circular pointers). Except for the static model, R6 or R7 is not used because it is reserved as user stack pointer.
- Variable argument lists are passed via the stack. The argument that immediately precedes the variable argument list is also passed on the stack; all other arguments use the default register parameter passing. For example:
extern int f(int, ...);
extern int g(int, int, ...);
extern int h(int, int, int, ...);

int a0;
int a1;
int a2;

void foo(void)
{
    f(a0, "a0 on stack as is this string");
    g(a0, a1, "a0 via accumulator - remainder on stack");
    h(a0, a1, a2, "a0 via A, a1 via B - remainder on stack");
}
- When there are too many arguments to be passed in the registers the remaining arguments are passed on the stack. If the _asmfunc qualifier is used, the compiler will issue an error message.
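The integer part of these rules, including the special case for variable argument lists, can be modeled with the host-side C sketch below. It deliberately covers only single-word integer arguments; longs, floats, pointers and structures, which the compiler allocates differently, are left out. The names `assign_int_args()`, `int_arg_location()` and `ON_STACK` are hypothetical, and the sketch is not the compiler's actual register allocator.

```c
#include <assert.h>
#include <stddef.h>

/* Registers for single-word integer arguments, in left-to-right order. */
static const char *const int_arg_regs[] = { "A", "B", "X0", "Y0", "X1", "Y1" };
#define N_INT_ARG_REGS ((int)(sizeof int_arg_regs / sizeof int_arg_regs[0]))
#define ON_STACK (-1)

/* For a call with n fixed integer arguments, store in reg[i] the index
 * (into int_arg_regs) of the register receiving argument i, or ON_STACK.
 * If variadic is nonzero, the last fixed argument (the one immediately
 * before "...") is passed on the stack, like the variable arguments. */
static void assign_int_args(size_t n, int variadic, int *reg)
{
    assert(reg != NULL || n == 0);
    int next = 0;
    for (size_t i = 0; i < n; i++) {
        if (variadic && i + 1 == n)
            reg[i] = ON_STACK;          /* precedes the variable list */
        else if (next < N_INT_ARG_REGS)
            reg[i] = next++;            /* next free register */
        else
            reg[i] = ON_STACK;          /* registers exhausted */
    }
}

/* Printable name for a value produced by assign_int_args(). */
static const char *int_arg_location(int r)
{
    return r == ON_STACK ? "stack" : int_arg_regs[r];
}
```

For the calls in foo() above, this model yields stack for the a0 of f; A and stack for g; and A, B and stack for h, which matches the strings in the example.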
For C function return types, the following registers are used:
Return type | Register | Description |
char | A | accumulator |
short/int | A | accumulator |
long | A | accumulator |
_fract | A | accumulator |
long _fract | A | accumulator |
float | AB | floating point stack (see section 7.8 for floating point information) |
pointer | R0 | scratch address registers |
spointer | R0 | address registers |
circular pointer | R0/M0 | address registers |
Table 7-2: Function return types
The address register R7 (or R6) is used as user stack pointer (not for the DSP5600x in the static memory model). R6 is used if the corresponding compatibility switch (-Cr) is selected.
During the execution of the called function ('callee') all registers can be used. The caller saves all registers that must be preserved over the function call ('save-by-caller' calling convention) with the exception of the modifier registers. Upon return, the callee must reset all modifier registers to linear addressing (except M0 if it is used to return a circular pointer). Of course the stacks must remain balanced between call and return.
In the compatible calling convention, entered with either the keyword _compatible or with the global option -Cc, the parameter passing changes to the convention used by the Motorola C compiler. The first two parameters are passed in registers A and B and the return value is always in register A. Upon return of a value the processor status flags are set by the callee according to the return value. The registers Y0, Y1, R2, N2, R3, N3, R7 and N7 are preserved by the callee. To be completely compatible with the Motorola compiler the stack pointer must be chosen to be R6 (-Cr) and the right default memory space must be selected.
The compiler uses a convention to pass parameters in registers and on the stack, to save certain registers over a function call and to return values in certain registers or on the stack. In the _compatible calling convention several registers must be preserved in a function. The set of registers is different for the members of the DSP56xxx family, as shown in the table below.
Family Member | Parameter Registers | Registers to be preserved by callee | Return Register |
DSP5600x | None (all on stack). When returning a struct/union, A points to the address to store it. | B1, B0, X1, X0, Y1, Y0, R0..R5, R7 (R7 is replaced by R6 if R7 is used for user stack pointer) | A, except when returning a struct/union; in that case A points to the address where it is stored |
DSP563xx, DSP566xx | A, B (rest on stack). When returning a struct/union, R7 points to the address to store it. | Y1, Y0, R2/N2, R3/N3, R7/N7 (R7/N7 are replaced by R6/N6 if R7 is used for user stack pointer) | A, except when returning a struct/union; in that case R7 points to the address where it is stored |
Table 7-3: Compatible calling convention
In the save-by-callee calling convention, selected with the _callee_save qualifier, all registers are preserved by the callee. The compiler tries to limit the number of registers used in the function when this has been selected. The parameter passing convention is the same as the default. Although this can have advantages in some cases, the default calling convention in combination with the flexible register allocation guarantees the best performance.
An example of the _callee_save qualifier is listed below.
#ifndef _CALLEE_SAVE
#define _who_saves
#else
#define _who_saves _callee_save
#endif

int i;

_who_saves void __inc(void)
{
    i++;
}
By default, the function is compiled with the standard save-by-caller calling convention and the assembly looks as follows:
F__inc:
        move    x:Fi,r3
        move    (r3)+
        move    r3,x:Fi
        rts
But when you compile with the define _CALLEE_SAVE, the callee itself becomes responsible for preserving R3 and the assembly will change to:
F__inc:
        move    (r7)+
        move    r3,x:(r7)
        move    x:Fi,r3
        move    (r3)+
        move    r3,x:Fi
        move    x:(r7)-,r3
        rts
c563 uses a number of sections. For a section the compiler generates an ORG directive in the assembler output file. The following list gives an overview of sections that may be generated by c563:
Section Name | Possible Attributes | Comment |
.p[n][i|e]text | near, internal, external | code |
.fptext | | code from the floating point library |
.l[n][i|e]const | near, internal, external | constant initialized L data, not copied from copy table |
.p[n][i|e]const | near, internal, external | constant initialized P data, not copied from copy table |
.x[n][i|e]const | near, internal, external | constant initialized X data, not copied from copy table |
.y[n][i|e]const | near, internal, external | constant initialized Y data, not copied from copy table |
.lovl | | overlay area in L memory for non-reentrant (static) functions |
.povl | | overlay area in P memory for non-reentrant (static) functions |
.xovl | | overlay area in X memory for non-reentrant (static) functions |
.yovl | | overlay area in Y memory for non-reentrant (static) functions |
.l[n][i|e]data | near, internal, external | initialized L data, copied from copy table in P |
.p[n][i|e]data | near, internal, external | initialized P data, copied from copy table in P |
.x[n][i|e]data | near, internal, external | initialized X data, copied from copy table in P |
.y[n][i|e]data | near, internal, external | initialized Y data, copied from copy table in P |
.l[n][i|e]bss | near, internal, external | cleared L data |
.p[n][i|e]bss | near, internal, external | cleared P data |
.x[n][i|e]bss | near, internal, external | cleared X data |
.y[n][i|e]bss | near, internal, external | cleared Y data |
Table 7-4: Section names
The [n][i|e] part of the section names corresponds to the possible attributes listed in the table: n stands for near, i for internal and e for external.
In the compiled code it is assumed that all processor mode registers like OMR, SR and SP are in the state they have after reset, except when they are set up differently in the startup code. Although other settings of these registers may work just as well, care is needed. In any case, different settings must be compatible with the hardware connected and the memory layout selected. For some changes the startup code and/or the locator description files must be updated as well (e.g. if you want to place the stack extension in Y memory).
OMR bit 0-3: chip operating mode.
Normally the compiled code runs in expanded mode (default), but different settings may work. The interrupt vectors and startup vector are placed from address 0 by the compiler, so some modes will need special precautions.
OMR bit 16-20: stack extension settings (DSP563xx only).
Do not change these bits after the startup code has set them up.
The compiler assumes all status register bits are in the reset state, unless they have been set up otherwise in the startup code (lib/src/cstart.inc). This does not mean that the compiled code does not work with other settings, but care is needed. In summary (some of these bits are not present on all DSP56xxx family members):
SR bit 0-7: condition code register.
These bits are changed by just about any instruction. The L-bit (limiting occurred) is also used by the floating point library to indicate an overflow/underflow condition.
SR bit 8/9: interrupt mask.
Can be set to any value, but if the system timer is used (for delay(), for instance) its function may be affected (see lib/src/clock.c). No other library functions use interrupts.
SR bit 10/11: scaling mode.
Never used in compiled code. These bits can be changed locally in assembly code, but they must remain zero during execution of compiled code. These bits are cleared during interrupts and restored afterwards.
SR bit 12: Reserved.
SR bit 13: Sixteen bit compatibility mode (DSP563xx only).
Set up in 16-bit compilation model. This bit can be changed locally in assembly code, but must be restored to its startup value during execution of compiled code.
SR bit 14: Double precision multiply mode.
Never used in compiled code. This bit can be changed locally in assembly code, but must remain zero during execution of compiled code.
SR bit 15: DO-loop flag.
Never used in compiled code and of little use there; do not touch it in compiled code.
SR bit 16: DO-forever flag.
Never used in compiled code and of little use there; do not touch it in compiled code.
SR bit 17: Sixteen bit arithmetic mode.
Used in 16-bit compiled code. This bit can be changed locally in assembly code, but must be restored to its startup value during execution of compiled code. This bit is cleared during interrupts and restored afterwards; in the 16-bit models the compiler will force it on again in interrupt routines.
SR bit 18: Reserved.
SR bit 19: Cache Enable.
Normally switched on in the startup code. Can be changed in the application, but the cache intrinsic functions cannot be called when the cache is switched off. Care must be taken to avoid using the cache area for code when switching the cache mode at run-time.
SR bit 20: Arithmetic saturation mode.
Never used in compiled code. This bit can be changed locally in assembly code, but must remain zero during execution of compiled code. The overflow behavior of integers is affected by this bit, so compiled code may behave strangely when this bit is set. Long division might not work anymore either. As register-register and register-memory moves of (long) _fracts result in saturation anyway, it is not very useful in compiled code. This bit is not changed during interrupt routines; if it is set anywhere in the code, it must be turned off in all C compiled interrupt routines using inline assembly.
SR bit 21: Rounding mode.
Never used in compiled code. This bit can be changed in compiled code to get the effect described in the processor manual. This bit is cleared during interrupts and restored afterwards.
SR bit 22-23: Core priority.
Never used in compiled code. These bits can be changed in compiled code to get the effect described in the processor manual.
Where the summary above states that a bit must keep its value (zero or its startup value) during execution of compiled code, compiled interrupt routines must not be executed while that bit is changed, either. So, interrupts may have to be switched off locally as well.
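The status register constraints listed above can be expressed as a mask check. The host-side C sketch below tests whether an SR value is acceptable while compiled code runs: the scaling bits (10/11), the double precision multiply bit (14) and the arithmetic saturation bit (20) must be zero, and the two sixteen-bit mode bits (13 and 17) must equal their startup values. The macro and function names are hypothetical; only the bit positions come from the list above, and not every bit exists on every family member.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* SR bit positions from the summary above (macro names are illustrative). */
#define SR_SCALING_MASK   ((UINT32_C(1) << 10) | (UINT32_C(1) << 11))
#define SR_SIXTEEN_COMPAT (UINT32_C(1) << 13)
#define SR_DOUBLE_MPY     (UINT32_C(1) << 14)
#define SR_SIXTEEN_ARITH  (UINT32_C(1) << 17)
#define SR_SATURATION     (UINT32_C(1) << 20)

/* Bits that must be zero whenever compiled code executes. */
#define SR_MUST_BE_ZERO (SR_SCALING_MASK | SR_DOUBLE_MPY | SR_SATURATION)
/* Bits that must keep the value they had after the startup code. */
#define SR_MUST_MATCH_STARTUP (SR_SIXTEEN_COMPAT | SR_SIXTEEN_ARITH)

static bool sr_ok_for_compiled_code(uint32_t sr, uint32_t startup_sr)
{
    if (sr & SR_MUST_BE_ZERO)
        return false;
    /* XOR exposes every bit that differs from the startup value. */
    return ((sr ^ startup_sr) & SR_MUST_MATCH_STARTUP) == 0;
}
```

Bits not covered by either mask, such as the interrupt mask (8/9) or the rounding mode (21), are accepted with any value, in line with the summary.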
LA, LC, SP: do not change their value to avoid disrupting the program flow.
SSL, SSH: the hardware stack is used to store return addresses and hardware loop information. If required, return addresses are popped from the hardware stack to avoid overflows (DSP5600x). The hardware stack can be used as long as it is not exhausted and stack balance is preserved. In most cases using the user stack is a faster and easier method.
EP, SZ, SC (DSP563xx only): the first two registers are set up in the startup code if the hardware stack extension is enabled and must not be changed during program execution. The stack count register could be used by a task switching kernel, but is of little use otherwise and should be left untouched.
The DSP563xx/DSP566xx processors have a system stack (hardware stack) with 16 locations (15 for the DSP5600x), divided into a system stack high word (SSH) and a system stack low word (SSL). This system stack is used by the DSP for function calls, long interrupts and program looping (DO and REP loops). For a C program the system stack size is not sufficient. A deeply nested C program that also uses DO and REP loops, and where long interrupts may occur, would soon generate a system stack overflow, because the system stack is also used by the DSP for return addresses of function calls. Therefore, all memory models except the static model (c56 only) use a user stack to store C variables, common subexpressions and temporary results, and to pass parameters when all registers are occupied. The mixed model, for functions that are not explicitly declared _reentrant, and the static model use overlayable sections for these purposes.
When hardware stack extension is not available (DSP5600x) or not enabled, the hardware stack size is very limited. Therefore, the changed hardware stack contents are transferred to the user stack during function execution. The compiler generates code that, on function entry, pops the return address from the system stack and pushes it on the user stack (reentrant function) or saves it in a static area (static function). Before returning from the function, it reverses this operation. A leaf function does not move the return address from the system stack.
The following diagrams show the structure of the stack. The first diagram reflects the system stack. The second diagram shows the user stack when using reentrant functions.
Figure 7-1: Stack diagrams
The user stack is defined in the locator description file (.dsc in directory etc) with the keyword stack, which results in a section called stack. The description file tells the locator where to allocate the user stack.
The user stack size can be controlled with the keyword length=size in the description file. If you do not specify the stack size, the locator will allocate the rest of the available RAM for the stack, as done in the startup code. You can use the locator defined labels F_lc_bs and F_lc_es in your application to retrieve the begin and end address of the stack. Please note that the locator will only allocate a stack section if the application refers to one of the locator defined symbols F_lc_bs or F_lc_es. (This is usually done in the startup code.) Remember that there must be enough space allocated for the stack, which grows upwards.
For non-reentrant functions, (non-register) automatics and (non-register) parameters are allocated in a static area and therefore do not use any stack space.
For reentrant functions, a user stack is used in memory. Automatics and parameters are all accessed using a user stack pointer register, allocated as a 16-bit pointer (USP). The stack pointer USP points to the last occupied location on the stack. If the compatibility option -Cs is used, the stack pointer USP points to the first free location on the stack. The stack frame also contains a so-called user frame pointer (UFP). The saved registers are also accessed using a user stack pointer. The user stack pointer (USP) is maintained in register R7 or R6 if the compatibility option is used. The stack must be placed in default memory, or at least contain default memory, so L memory can be used for it when default memory is X or Y.
The UFP is always relative to the USP. To save registers the UFP is not maintained in a register, but is calculated from the USP with an offset.
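The difference between the default convention (USP points to the last occupied location) and the -Cs convention (USP points to the first free location) shows up in how a push and a pop are sequenced on the upward-growing user stack. The host-side C sketch below models this with an array, and an index standing in for R7 (or R6); the default push mirrors the `move (r7)+` / `move r3,x:(r7)` pair in the _callee_save example earlier in this chapter. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64

/* Hypothetical model of the user stack; sp plays the role of the USP. */
struct user_stack {
    int32_t mem[STACK_WORDS];
    int     sp;
    int     first_free; /* 0: sp -> last occupied (default), 1: sp -> first free (-Cs) */
};

static void us_init(struct user_stack *s, int first_free)
{
    s->first_free = first_free;
    /* Empty stack: "last occupied" is just below the base, "first free" is the base. */
    s->sp = first_free ? 0 : -1;
}

static void us_push(struct user_stack *s, int32_t v)
{
    if (s->first_free) {
        assert(s->sp < STACK_WORDS);
        s->mem[s->sp++] = v;       /* store, then post-increment */
    } else {
        assert(s->sp + 1 < STACK_WORDS);
        s->mem[++s->sp] = v;       /* pre-increment, then store */
    }
}

static int32_t us_pop(struct user_stack *s)
{
    if (s->first_free) {
        assert(s->sp > 0);
        return s->mem[--s->sp];    /* pre-decrement, then load */
    }
    assert(s->sp >= 0);
    return s->mem[s->sp--];        /* load, then post-decrement (move x:(r7)-,r3) */
}
```

In both variants the index only grows on a push, reflecting a stack that grows upwards; only the moment at which the pointer is adjusted differs.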
Stack extension is a mechanism that allows larger stack sizes than supported by the internal hardware mechanism. When stack extension is enabled and the internal hardware stack has reached its maximum capacity, the Least Recently Used (LRU) internal hardware stack location is copied to data memory to create a new stack entry on the internal stack.
The toolchain works from the assumption that stack extension is required and initializes EP and OMR in the startup code. Locator label F_lc_ub_se is loaded into EP and afterwards stack extension is enabled from the OMR.
You can set the size of the stack extension as follows:
Select the Project | Project Options... menu item. Expand the Linker/Locator entry and select Control File. Type the size of the stack extension in the Stack extension size field.
Or with the locator command line option:
-emSESIZE=size
The locator uses this macro to preprocess the following line in the description file (which results in the locator label F_lc_ub_se):
reserved label=se length=SESIZE;
If you do not want to use stack extension, you must add the macro NOESTACK to the assembler preprocessor options and make sure the startup code is added to your project. In this case you must be sure that nested function calls and do-loops never need more than the 16 entries available in the internal hardware stack.
To avoid having to make this calculation, you can also choose not to use the stack extension:
Select the Project | Project Options... menu item. Expand the C Compiler entry and select Code Generation. Disable the Use hardware stack extension check box.
Or with the compiler command line option:
-Mn
This 'ignoring' of the stack extension does not set a control register bit; only the function return address is saved to and restored from the user stack (see section 7.6, Stack).
The effect is that the internal hardware stack is bypassed for function calls and as such cannot overflow because of them. Do-loops cannot be bypassed, but you can limit them:
Select the Project | Project Options... menu item. Expand the C Compiler entry and select Code Generation. Disable the Use hardware stack extension check box and type a number in the Max. hardware stack use outside interrupt functions field.
Or with the compiler option:
-Lnumber
Ignoring the hardware stack extension decreases execution speed, because the return addresses of function calls must be saved to and restored from the user stack as well. Furthermore, nested hardware do-loops become restricted. Since return addresses are kept on the user stack, the required user stack size naturally increases, as does code size. All this argues for using the hardware stack extension.
The heap is only needed when dynamic memory management library functions are used: malloc(), calloc(), free() and realloc(). The heap is a reserved area in default memory; it cannot be placed in a different memory type because the library functions handling it rely on the memory type. If you use one of the memory allocation functions listed above, the locator automatically allocates a heap, as specified in the locator description file with the keyword heap.
A special section called heap is used for the allocation of the heap area. You can place the heap section anywhere in default memory, using the locator description file. You can specify the size of the heap using the keyword length=size in the locator description file. If you do not specify the heap size and yet refer to it (e.g. call malloc()), the locator will allocate the rest of the available X memory for the heap. The locator defined labels F_lc_bh and F_lc_eh (begin and end of heap) are used by the library function sbrk(), which is called by malloc() when memory is needed from the heap.
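The following host-side C sketch shows the idea behind an sbrk()-style routine that hands out memory between a heap begin and end address, the role played by F_lc_bh and F_lc_eh. It is not the library's sbrk() source: a static array stands in for the located heap section, the names `heap_begin`, `heap_end`, `heap_brk` and `sketch_sbrk()` are hypothetical, and returning NULL on failure is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the located heap section; on the target the bounds would
 * come from the locator labels rather than from this array. */
static char heap_area[256];
static char *const heap_begin = heap_area;                    /* ~ F_lc_bh */
static char *const heap_end   = heap_area + sizeof heap_area; /* ~ F_lc_eh */
static char *heap_brk = heap_area;                            /* current break */

/* Grow the heap by incr units and return the old break, or NULL if the
 * request would run past the end of the heap section. */
static void *sketch_sbrk(size_t incr)
{
    assert(heap_brk >= heap_begin && heap_brk <= heap_end);
    if (incr > (size_t)(heap_end - heap_brk))
        return NULL;
    char *old = heap_brk;
    heap_brk += incr;
    return old;
}
```

An allocator such as malloc() would call a routine like this only when its own free lists cannot satisfy a request, which is why the heap bounds matter only to programs that use the dynamic memory functions.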
This section describes the definition and implementation of the TASKING Software Floating Point Library for the Motorola DSP56xxx Family of Digital Signal Processors.
There are three floating types defined by the ANSI C standard document, designated as float, double and long double. The characteristics for the double and long double types are equal to the float type, as described in the standard definition include file <float.h>.
Floating point constants conform to the ANSI C standard, except that an unsuffixed floating point constant has type float. If suffixed by the letter f or F, it has type float. If suffixed by the letter l or L, it also has type float, because the characteristics of the double and long double types are chosen to be equal to those of the float type. Floating point constants in the range <-1,1> are interpreted as a fractional type (switchable with the compiler -AF option). Semantically, the type of a floating point constant is the first of _fract and float in which its value can be represented. This allows fixed point arithmetic with fractional constants without suffixes. See also section 3.3.1 The Fractional Type.
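The typing rule for unsuffixed floating point constants fits in a few lines of C. This is an illustrative host-side model, not compiler source: `unsuffixed_constant_type()` and its `fract_constants` flag are hypothetical, the flag standing for the behavior that the -AF option switches (which setting of -AF corresponds to which flag value is not specified here). The open range <-1,1> is taken from the text above.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

enum const_type { CONST_FRACT, CONST_FLOAT };

/* Type of an unsuffixed floating point constant: the first of _fract and
 * float that can represent its value. */
static enum const_type unsuffixed_constant_type(double value, bool fract_constants)
{
    assert(!isnan(value));                 /* a source constant is never NaN */
    if (fract_constants && value > -1.0 && value < 1.0)
        return CONST_FRACT;                /* inside the open range <-1,1> */
    return CONST_FLOAT;
}
```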
Promotions conform to the ANSI C pattern of usual arithmetic conversions. This pattern is extended for _fract and long _fract.
Remember that floating point constants in <-1,1> are interpreted as a fractional type. See previous section Floating Point Constants.
Floating point number (m,e), including mantissa sign:
Field | Mantissa word (m) | Exponent word (e) |
Bit number | 23 ... 0 | 23 ... 0 |
Binary encoding | s.mmm.mmmm.mmmm.mmmm.mmmm.mmmm | 0000.0000.0000.0000.eeee.eeee |
Bit weight (2^n) | n = 0, -1 ... -23 | n = 7 ... 0 |
s = sign bit, m = mantissa bit, e = exponent bit |
Table 7-5: 2-Complement Format for 24-bit data models
Field | Mantissa word (m) | Exponent word (e) |
Bit number | 15 ... 0 | 15 ... 0 |
Binary encoding | s.mmm.mmmm.mmmm.mmmm | 0000.0000.eeee.eeee |
Bit weight (2^n) | n = 0, -1 ... -15 | n = 7 ... 0 |
s = sign bit, m = mantissa bit, e = exponent bit |
Table 7-6: 2-Complement Format for 16-bit data models
m = 24-bit mantissa (24-bit models) or 16-bit mantissa (16-bit models), stored as a two's complement, normalized fraction. A mantissa precision of 23 bits (15 bits in the 16-bit models) plus a 1-bit mantissa sign gives a precision of approximately 7 (4 in the 16-bit models) decimal digits. A minimum of 6 decimal digits is prescribed by the ANSI C standard for single precision floating point. The 24-bit mantissa (24-bit models) or 16-bit mantissa (16-bit models) was chosen to maximize precision with efficient use of the MPY and MAC instructions. A hidden leading 1 is not implemented in this format.
e = 8-bit exponent (unsigned integer, biased by fbias = +127) stored as a 24-bit or 16-bit (16-bit models) unsigned integer with 16 or 8 (16-bit models) leading zeros.
Mantissa | 24-bit data models | 16-bit data models |
Largest positive | $7FFFFF = +0.99999988079071044921875 | $7FFF = +0.999969482421875 |
Smallest positive | $400000 = +0.5 | $4000 = +0.5 |
Floating point zero | $000000 = 0.0 | $0000 = 0.0 |
Smallest negative | $BFFFFF = -0.50000011920928955078125 | $BFFF = -0.500030517578125 |
Largest negative | $800000 = -1.0 | $8000 = -1.0 |
Reserved | $000001 through $3FFFFF and $C00000 through $FFFFFF | $0001 through $3FFF and $C000 through $FFFF |
Table 7-7: Supported mantissas
All reserved mantissas are illegal since they represent denormalized mantissas. Denormalized numbers are not supported.
Exponent | 24-bit data models | 16-bit data models | Value |
Assumed fixed point exponent | $00007F | $007F | 2^+0 = +1.0 |
Smallest exponent | $000000 | $0000 | 2^-127 |
Largest exponent | $0000FF | $00FF | 2^+128 |
Reserved exponents | $000100 through $FFFFFF | $0100 through $FFFF | |
Table 7-8: Supported exponents
If bit weight 2^8 of the exponent word is set, exponent overflow has occurred.
If bit weight 2^9 of the exponent word is set, exponent underflow has occurred.
No distinct exponents are reserved for plus infinity, minus infinity, Not-a-Number (IEEE NaN), minus zero or denormalized numbers.
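Software that receives a raw exponent word can test these two bits before using the value. In the C sketch below, tc_check_exponent and its enum are hypothetical names. Testing bit 2^9 before bit 2^8 is an assumption: it is based on the idea that an underflowed (negative) exponent held in two's complement would have both bits set. The text above does not say how the two flags combine.

```c
#include <stdint.h>

/* Hypothetical status codes for a raw exponent word (not library names). */
enum tc_exp_status { TC_EXP_OK, TC_EXP_OVERFLOW, TC_EXP_UNDERFLOW };

static enum tc_exp_status tc_check_exponent(uint32_t exp_word)
{
    if (exp_word & (1u << 9))   /* bit weight 2^9: exponent underflow */
        return TC_EXP_UNDERFLOW;
    if (exp_word & (1u << 8))   /* bit weight 2^8: exponent overflow  */
        return TC_EXP_OVERFLOW;
    return TC_EXP_OK;           /* biased exponent within $00..$FF    */
}
```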
Floating point number | Mantissa | Exponent | Decimal Value |
Largest positive 24-bit data model 16-bit data model | $7FFFFF $7FFF | $0000FF $00FF | +3.402823E+38 +3.403E+38 |
Smallest positive 24-bit data model 16-bit data model | $400000 $4000 | $000000 $0000 | +2.938736E-39 +2.939E-39 |
Floating point zero 24-bit data model 16-bit data model | $000000 $0000 | $000000 $0000 | +0.0 +0.0 |
Smallest negative 24-bit data model 16-bit data model | $BFFFFF $BFFF | $000000 $0000 | -2.938736E-39 -2.939E-39 |
Largest negative 24-bit data model 16-bit data model | $800000 $8000 | $0000FF $00FF | -3.402823E+38 -3.403E+38 |
Table 7-9: Floating point number range
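The entries in Table 7-9 follow from value = (M / 2^(w-1)) * 2^(E-127), where M is the mantissa word read as a w-bit two's complement integer (w = 24 or 16) and E is the unsigned exponent word. For the 24-bit models, the worked values are:

```latex
\begin{aligned}
v &= \frac{M}{2^{w-1}} \cdot 2^{E-127} \\
v_{\text{largest positive}} &= \frac{2^{23}-1}{2^{23}} \cdot 2^{255-127} = (1 - 2^{-23})\,2^{128} \approx 3.402823 \times 10^{38} \\
v_{\text{smallest positive}} &= \frac{2^{22}}{2^{23}} \cdot 2^{0-127} = 2^{-128} \approx 2.938736 \times 10^{-39} \\
v_{\text{largest negative}} &= \frac{-2^{23}}{2^{23}} \cdot 2^{255-127} = -2^{128}
\end{aligned}
```

The smallest positive value is also 2^-128 for the 16-bit models, because $4000 likewise represents 0.5.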
Note that the two's complement mantissa does not have equal positive and negative ranges. Only sign-magnitude formats possess this property. These ranges should be checked after most arithmetic operations.
Because the IEEE Standard for Binary Floating Point Arithmetic (IEEE 754) is widely known, it is useful to compare it with this format. The 2-complement format differs from the IEEE single precision format primarily in its handling of floating point exceptions. Other differences are listed in the table below. Conversion between the IEEE standard format and this format is straightforward.
Characteristic | 2-Complement Format | IEEE Format |
Mantissa Precision 24-bit models 16-bit models | 23 bits 15 bits | 24 bits |
Hidden Leading One | No | Yes |
Mantissa Format 24-bit models 16-bit models | 24-bit Two's Complement Fraction 16-bit Two's Complement Fraction | 23-bit Unsigned Magnitude Fraction |
Exponent Width | 8 bits | 8 bits (single) |
Maximum Exponent | +128 | +127 (single) |
Minimum Exponent | -127 | -126 (single) |
Exponent Bias | +127 | +127 (single) |
Format Width 24-bit models 16-bit models | 48 bits 32 bits | 32 bits (single) |
Rounding | Round to Nearest | Round to Nearest (default) Round toward +/-Infinity Round toward Zero |
Infinity Arithmetic | Saturation Limiting | Affine Operations |
Denormalized Numbers | No (Forced to Zero) | Yes (With Minimum Exponent) |
Exceptions | Divide by Zero Overflow Underflow | Divide by Zero Overflow Underflow Invalid Operations Inexact Arithmetic |
Table 7-10: IEEE 754 Comparison
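As shown in the table, the mantissa precision of the 2-complement format is one bit (24-bit models) or nine bits (16-bit models) less than that of the IEEE single precision format. In the 24-bit models, the one-bit difference results from using a two's complement mantissa without a hidden leading one. In the 16-bit models, the shorter 16-bit mantissa accounts for the additional eight bits.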
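A common rule of thumb is that p significant bits give about p * log10(2) decimal digits. Applied to these precisions, it yields the following values, which are consistent with the figures of approximately 7 and 4 digits given in the mantissa definition:

```latex
\begin{aligned}
d &\approx p \log_{10} 2 \\
\text{24-bit models: } p = 23 &\;\Rightarrow\; d \approx 6.92 \\
\text{16-bit models: } p = 15 &\;\Rightarrow\; d \approx 4.52 \\
\text{IEEE single: } p = 24 &\;\Rightarrow\; d \approx 7.22
\end{aligned}
```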
If exponent overflow occurs, the result is limited to the maximum representable floating point number of the correct sign. If exponent underflow occurs, the result is limited to the minimum representable floating point number, which is zero. Although this format does not provide the arithmetic safety offered by the IEEE standard, it avoids extensive error checking and exceptions in favor of real-time execution speed and efficient implementation.
All exception conditions are handled "in-line" according to predefined rules. This reflects the fact that a real-time system has no choice but to produce an output, with some amount of error, when an exception occurs. Such a system cannot stop execution until the application program has determined and fixed the cause of the problem.
One major difference is the use of affine arithmetic in the IEEE standard versus the use of saturation arithmetic in this 2-complement floating point format. Affine arithmetic gives separate identity to plus infinity, minus infinity, plus zero and minus zero. In operations involving these values, finite quantities remain finite and infinite quantities remain infinite. In contrast, this format gives special identity only to unsigned zero.
This format performs saturation arithmetic: any result outside the representable floating point range is replaced with the nearest representable floating point value. Overflow therefore behaves like the output of an analog op-amp clamping at the power supply rails.
The IEEE floating point standard provides the extensive error handling required by affine arithmetic, denormalized numbers, signaling Not-a-Numbers (NaNs) and quiet NaNs. It postpones introducing computation errors by using internal signaling and user traps to process each exception condition. Errors are introduced only if the application program completes the calculation instead of aborting. This format, in contrast, introduces computation errors when an exception occurs in order to maintain real-time execution. An error flag (the L bit in the CCR) is set to inform the application program that an exception has occurred. This bit remains set until the application program resets it.
The floating point mantissa and exponent may be stored in any locations in any memory space. The input and output register values are organized so that the long (L:) addressing mode may be used to load/store both the mantissa and exponent with one instruction. If the long addressing mode is used, the mantissa is in X memory and the exponent is in Y memory at the same address.
This section describes how to perform floating point operations using the DSP56xxx floating point library functions. It covers the basic floating point operations, the floating point accumulator format and the floating point interface functions.
This section does not describe the algorithms used or the implementation considerations, nor does it explain the floating point routines themselves in detail.
The basic operations of the floating point library are specified below and consist of arithmetic operations and conversion operations. These operations are implemented to support the 2-Complement Format.
For each floating point operation, the single precision function to call is listed. The tables also specify whether a function takes one, two or three floating point operands, an integer operand or a fractional operand as input, and whether it returns a floating point, integer or fractional value.
Floating point operations are performed on so-called floating point accumulators. These accumulators are located in predefined registers and hold the floating point values passed to the floating point operation. Section 7.8.2.2 The Floating Point Accumulators describes their format. Load the first floating point operand into accumulator fac and, if the operation needs them, load the second and third operands into accumulators ftm1 and ftm2 respectively. A floating point result is always returned in accumulator fac. An integer, long or fractional operand or result is passed via accumulator register A.
The following tables list all supported functions and their function names.
The actual function names are prefixed with the letters Rfp to meet the calling convention of the compiler run-time library. For example, addf2 is called as Rfpaddf2.
Operation | Function | Input operand(s) | Result |
Add | addf2 | fac, ftm1 | fac |
Subtract | subf2 | fac, ftm1 | fac |
Multiply | mulf2 | fac, ftm1 | fac |
Divide | divf2 | fac, ftm1 | fac |
Multiply-Accumulate + | macpf2 | fac, ftm1, ftm2 | fac |
Multiply-Accumulate - | macnf2 | fac, ftm1, ftm2 | fac |
Compare | cmpf2 | fac, ftm1 | CCR |
Negate | negf2 | fac | fac |
Table 7-11: Floating point arithmetic operations
Operation | Function | Input operand(s) | Result |
Signed Integer to Float | cif12 | A | fac |
Signed Long to Float | cif22 | A | fac |
Unsigned Integer to Float | cuf12 | A | fac |
Unsigned Long to Float | cuf22 | A | fac |
Fract to Float | crf12 | A | fac |
Long Fract to Float | crf22 | A | fac |
Float to Signed Integer | cfi21 | fac | A |
Float to Signed Long | cfi22 | fac | A |
Float to Unsigned Integer | cfu21 | fac | A |
Float to Unsigned Long | cfu22 | fac | A |
Float to Fract | cfr21 | fac | A |
Float to Long Fract | cfr22 | fac | A |
Table 7-12: Floating point conversion operations
The software floating point libraries for the DSP56xxx are based on the 2-Complement Format, which is fully optimized for fast and efficient floating point operations. The floating point values are stored in so-called floating point accumulators.
Three accumulators, called fac, ftm1 and ftm2, are needed to perform all floating point operations. Accumulator fac is used for passing operand 1, for result values and intermediate results, and for internal calculations. Accumulators ftm1 and ftm2 are used for passing operands and for internal calculations. All three accumulators are located in registers, as listed in Table 7-13.
Accumulator | Mantissa | Exponent |
FAC | A2 - sign extension of A1 (unused) A1 - mantissa A0 - zero | B2 - sign extension of B1 (unused) B1 - exponent B0 - zero |
FTM1 | X1 | X0 |
FTM2 | Y1 | Y0 |
Table 7-13: Accumulator formats
The following table shows the memory storage implementation used by the software floating point libraries for single precision floating point values.
Address | +0000 | +0001 |
Binary encoding | 0000.0000.0000.0000.eeee.eeee | s.mmm.mmmm.mmmm.mmmm.mmmm.mmmm |
s = sign bit, m = mantissa bit, e = exponent bit |
Table 7-14: Memory Layout for 24-bit data models
Address | +0000 | +0001 |
Binary encoding | 0000.0000.eeee.eeee | s.mmm.mmmm.mmmm.mmmm |
s = sign bit, m = mantissa bit, e = exponent bit |
Table 7-15: Memory Layout for 16-bit data models
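The following C sketch packs and unpacks a value according to this layout, with the exponent at offset +0 and the mantissa at offset +1. This order matches the assembly examples later in this chapter, where x:Fa holds the exponent and x:Fa+1 the mantissa. The helpers tc_store and tc_load are hypothetical. Each memory word is modeled as the low 24 or 16 bits of a uint32_t.

```c
#include <stdint.h>

/* Store: word +0 = exponent, word +1 = two's complement mantissa. */
static void tc_store(uint32_t mem[2], int32_t mant, uint32_t exp, int bits)
{
    uint32_t mask = (1u << bits) - 1u;
    mem[0] = exp & 0xFFu;             /* upper exponent bits are zero     */
    mem[1] = (uint32_t)mant & mask;   /* truncate to the data word width  */
}

/* Load: read both words back and sign-extend the mantissa. */
static void tc_load(const uint32_t mem[2], int32_t *mant, uint32_t *exp, int bits)
{
    uint32_t mask = (1u << bits) - 1u;
    uint32_t sign = 1u << (bits - 1);
    uint32_t raw  = mem[1] & mask;

    *exp  = mem[0];
    *mant = (raw & sign) ? (int32_t)raw - (int32_t)(1u << bits)
                         : (int32_t)raw;
}
```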
The software floating point arithmetic and conversion functions use a set of registers. Some are used for parameter passing and others are used internally. If your own assembly function uses any of these registers, you have to save them before calling a floating point function.
The registers that are modified by each function are described in the following tables.
Function | Modified register(s) |
addf2 | A, B, X, R0, N0 |
subf2 | A, B, X, Y, R0, N0 |
mulf2 | A, B, X0, R0, N0 (DSP563xx) A, B, X0, R0 (other) |
divf2 | A, B, X, R0, N0 (DSP563xx) A, B, X, R0 (other) |
macpf2 | A, B, X, Y, R0, N0 (DSP563xx) A, B, X, Y, R0 (other) |
macnf2 | A, B, X, Y, R0, N0 (DSP563xx) A, B, X, Y, R0 (other) |
cmpf2 | none |
negf2 | A, B, R0, N0 (DSP563xx) A, B, R0 (other) |
Table 7-16: Floating point arithmetic operations register usage
Function | Modified register(s) |
cif12 | A, B, R0, N0 (DSP563xx) A, B, R0 (other) |
cif22 | A, B, R0, N0 (DSP563xx) A, B, R0 (other) |
cuf12 | A, B, X, R0, N0 (DSP563xx) A, B, X, R0 (other) |
cuf22 | A, B, X, R0, N0 (DSP563xx) A, B, X, R0 (other) |
crf12 | A, B, R0, N0 (DSP563xx) A, B, R0 (other) |
crf22 | A, B, R0, N0 (DSP563xx) A, B, R0 (other) |
cfi21 | A, B, Y1 (DSP563xx) A, B, Y, R0, N0 (other) |
cfi22 | A, B, Y1 (DSP563xx) A, B, Y, R0, N0 (other) |
cfu21 | A, B, Y, R0, N0 |
cfu22 | A, B, Y1 |
cfr21 | A, B, Y, R0, N0 |
cfr22 | A, B, Y, R0, N0 |
Table 7-17: Floating point conversion operations register usage
This section uses some examples to describe the basics of floating point code generation. It is impossible to describe every possible combination of floating point operations here, because the number of possible floating point expressions is practically unlimited. So, if you want to write your own floating point expression in assembly, it is helpful to write it in C first and then use the code generated by the C compiler in your own assembly function.
The following example illustrates a floating point expression that combines two floating point values and returns a floating point value.
c = a + b;

        move    x:Fa,b          ; pass floating point a in fac
        move    x:Fa+1,a
        move    x:Fb,x0         ; pass floating point b in ftm1
        move    x:Fb+1,x1
        jsr     Rfpaddf2        ; perform add
        move    a,x:Fc+1        ; store result from fac in c
        move    b,x:Fc
Integer and long operands and integer and long results are passed via accumulator register A. The following example illustrates a conversion from long to float.
float a;
long b;

a = b;

        move    x:Fb+1,a        ; pass long b in A
        move    x:Fb,a0
        jsr     Rfpcif22        ; convert from long to float
        move    a,x:Fa+1        ; store result from fac in a
        move    b,x:Fa
For more complex floating point expressions, it is not necessary to store the floating point result of one operation and load it again for the next operation. The result of the previous floating point operation remains in accumulator fac and is used as the first operand of the next operation. This is called intermediate result optimization. Only the second operand has to be loaded into accumulator ftm1. The following example illustrates this.
d = a + b - c;

        move    x:Fa,b          ; pass floating point a in fac
        move    x:Fa+1,a
        move    x:Fb,x0         ; pass floating point b in ftm1
        move    x:Fb+1,x1
        jsr     Rfpaddf2        ; perform add
                                ; the intermediate floating point
                                ; result stays in fac!
        move    x:Fc,x0         ; pass floating point c in ftm1
        move    x:Fc+1,x1
        jsr     Rfpsubf2        ; perform subtract
        move    a,x:Fd+1        ; store result from fac in d
        move    b,x:Fd
The floating point mechanism relies on the following rule: when fac holds a result that is still needed and a new operand must be loaded into fac, the current contents of fac are first saved on the user stack.
The next example is for the DSP5600x. It illustrates the use of the user stack in a floating point expression that subtracts two intermediate results. The first intermediate floating point result is stored on the user stack, and the second can be held in accumulator fac. To perform the subtraction, the second result is moved from fac to accumulator ftm1, and the first result is popped from the user stack into fac. This example shows the generation of reentrant code.
e = (a + b) - (c * d);

        move    (r7)+           ; reserve user stack space
        move    (r7)+
        move    x:Fa,b          ; pass floating point a in fac
        move    x:Fa+1,a
        move    x:Fb,x0         ; pass floating point b in ftm1
        move    x:Fb+1,x1
        jsr     Rfpaddf2        ; perform add
        move    #-2,n7          ; user stack offset
        move    (r7)+           ; first stack element
        move    a,x:(r7+n7)     ; push result mantissa from
                                ; fac on stack
        move    (r7)-           ; second stack element
        move    b,x:(r7+n7)     ; push result exponent from
                                ; fac on stack
        move    x:Fc,b          ; pass floating point c in fac
        move    x:Fc+1,a
        move    x:Fd,x0         ; pass floating point d in ftm1
        move    x:Fd+1,x1
        jsr     Rfpmulf2        ; perform multiply
        move    b,x0            ; store result from fac in ftm1
        move    a,x1
        move    (r7)+           ; first stack element
        move    x:(r7+n7),a     ; pop mantissa result from
                                ; stack to fac
        move    (r7)-           ; second stack element
        move    x:(r7+n7),b     ; pop exponent result from
                                ; stack to fac
        jsr     Rfpsubf2        ; perform subtract
        move    a,x:Fe+1        ; store result from fac in e
        move    b,x:Fe
        move    (r7)+n7         ; free reserved user stack space
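The accumulator-and-stack discipline used in the last two listings can be modeled in C to check the order of operations. This sketch is hypothetical. It uses host doubles instead of the 2-complement format. The fp_machine type and the fp_* functions only mimic the roles of fac, ftm1, Rfpaddf2, Rfpsubf2 and Rfpmulf2, and the stack array stands in for the user stack addressed through r7.

```c
#include <stddef.h>

/* Hypothetical model: fac holds operand 1 and every result, ftm1 holds
 * operand 2, and stack[] models the user stack.                       */
typedef struct {
    double fac;
    double ftm1;
    double stack[8];
    size_t sp;      /* number of values currently pushed */
} fp_machine;

static void fp_add(fp_machine *m) { m->fac = m->fac + m->ftm1; } /* like Rfpaddf2 */
static void fp_sub(fp_machine *m) { m->fac = m->fac - m->ftm1; } /* like Rfpsubf2 */
static void fp_mul(fp_machine *m) { m->fac = m->fac * m->ftm1; } /* like Rfpmulf2 */

static void fp_push_fac(fp_machine *m) { m->stack[m->sp++] = m->fac; }

/* d = a + b - c: the intermediate result stays in fac. */
static double eval_chain(double a, double b, double c)
{
    fp_machine m = { 0 };
    m.fac = a; m.ftm1 = b; fp_add(&m);  /* a + b, result in fac       */
    m.ftm1 = c; fp_sub(&m);             /* only c has to be loaded    */
    return m.fac;
}

/* e = (a + b) - (c * d): fac must be saved before c is loaded. */
static double eval_example(double a, double b, double c, double d)
{
    fp_machine m = { 0 };
    m.fac = a; m.ftm1 = b; fp_add(&m);  /* a + b, result in fac       */
    fp_push_fac(&m);                    /* save it on the user stack  */
    m.fac = c; m.ftm1 = d; fp_mul(&m);  /* c * d, result in fac       */
    m.ftm1 = m.fac;                     /* move product to ftm1       */
    m.fac = m.stack[--m.sp];            /* pop (a + b) back into fac  */
    fp_sub(&m);                         /* (a + b) - (c * d)          */
    return m.fac;
}
```

With a = 1, b = 2, c = 3 and d = 4, eval_example returns -9.0, the same result as evaluating the expression directly.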