This chapter contains the following sections:
Input Specification
Assembler Significant Characters
Sections
Section Memory Types
Section Attributes
Absolute Sections
Relocatable Section Names
Grouped Sections
Registers
Special Function Registers
An assembly program consists of zero or more statements, one statement per line. A statement may optionally be followed by a comment, which is introduced by a semicolon character (;) and terminated by the end of the input line. Any source statement can be extended over one or more lines by including the line continuation character (\) as the last character on the line to be continued. The length of a source statement (first line and any continuation lines) is only limited by the amount of available memory. Upper and lower case letters are considered equivalent for assembler mnemonics and directives, but are considered distinct for labels, symbols, directive arguments, and literal strings.
A statement can be defined as:
[label:] [instruction | directive | macro_call] [;comment]
where,
label is an identifier or number. A label does not have to start on the first position of a line, but a label must always be followed by a colon.
LAB1:                ;This is a label
1:      jmp 1p       ;This is an endless loop
                     ;using numeric labels
instruction is any valid XA assembly language instruction consisting of a mnemonic and one, two, three or no operands. Operands are described in the chapter Operands and Expressions. The instructions are described separately in the chapter Instruction Set.
RETI                 ; No operand
CPL  R0              ; One operand
AND  R3H, [R0]       ; Two operands
CJNE [R1],#01,LAB1   ; Three operands
directive is any one of the assembler directives, described separately in the chapter Assembler Directives.
macro_call is a call to a previously defined macro. See the chapter Macro Operations.
A statement may be empty.
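There are several single-character sequences that are significant to the assembler. Some have multiple meanings depending on the context in which they are used. Special characters associated with expression evaluation are described in Chapter 4, Operands and Expressions. Other assembler-significant characters are:
; | Comment delimiter |
\ | Line continuation character or macro dummy argument concatenation operator |
? | Macro value substitution operator |
% | Macro hex value substitution operator |
^ | Macro local label operator |
" | Macro string delimiter or quoted string DEFINE expansion character |
@ | Function delimiter |
$ | Location counter substitution |
[] | Location addressing mode operator |
# | Immediate addressing mode operator |
Individual descriptions of each of the assembler special characters follow. They include usage guidelines, functional descriptions, and examples.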
Any number of characters preceded by a semicolon (;), but not part of a literal string, is considered a comment. Comments are not significant to the assembler, but they can be used to document the source program. Comments will be reproduced in the assembler output listing. Comments are preserved in macro definitions.
Comments can occupy an entire line, or can be placed after the last assembler-significant field in a source statement. The comment is literally reproduced in the listing file.
; This comment begins in column 1 of the source file
Loop:   CALL COMPUTE    ; This is a trailing comment
        ; These two comments are preceded
        ; by a tab in the source file
The backslash character (\), if used as the last character on a line, indicates to the assembler that the source statement is continued on the following line. The continuation line will be concatenated to the previous line of the source statement, and the result will be processed by the assembler as if it were a single line source statement. The maximum source statement length (the first line and any continuation lines) is 512 characters.
; THIS COMMENT \
EXTENDS OVER \
THREE LINES
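The joining of continuation lines described above can be modelled outside the assembler. The following Python sketch is only an illustration of the rule (a trailing backslash means the next physical line belongs to the same statement); the function name join_continuations is hypothetical and the sketch does not enforce any maximum statement length.

```python
def join_continuations(lines):
    """Join physical lines that end in a backslash into logical statements."""
    statements = []
    pending = ""
    for line in lines:
        if line.endswith("\\"):
            # Drop the continuation character and keep accumulating.
            pending += line[:-1]
        else:
            statements.append(pending + line)
            pending = ""
    if pending:
        # Input ended on a continued line; keep what was collected.
        statements.append(pending)
    return statements


source = ["; THIS COMMENT \\", "EXTENDS OVER \\", "THREE LINES"]
print(join_continuations(source))
```

Applied to the three-line comment above, the sketch yields a single logical statement.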
The backslash (\) is also used to cause the concatenation of a macro dummy argument with other adjacent alphanumeric characters. For the macro processor to recognize dummy arguments, they must normally be separated from other alphanumeric characters by a non-symbol character. However, sometimes it is desirable to concatenate the argument characters with other characters. If an argument is to be concatenated in front of or behind some other symbol characters, then it must be followed by or preceded by the backslash, respectively.
See also section 5.5.1.
Suppose the source input file contained the following macro definition:
SWAP_MEM MACRO REG1,REG2 ;swap memory contents
         XCH \REG1, \REG2
         ENDM
The concatenation operator (\) indicates to the macro processor where the substitution characters for the dummy arguments REG1 and REG2 are to be inserted. If this macro were called with the following statement,
SWAP_MEM R0,R1
the resulting expansion would be:
XCH R0, R1
The ?symbol sequence, when used in macro definitions, will be replaced by an ASCII string representing the value of symbol. This operator may be used in association with the backslash (\) operator. The value of symbol must be an integer.
See also section 5.5.2.
Consider the following macro definition:
SWAP_MEM MACRO REG1,REG2 ;swap memory contents
         XCH R\?REG1, R\?REG2
         ENDM
If the source file contained the following SET statements and macro call,
AREG     SET 1
BREG     SET 2
         SWAP_MEM AREG,BREG
the resulting expansion as it would appear on the source listing would be:
XCH R1, R2
The %symbol sequence, when used in macro definitions, will be replaced by an ASCII string representing the hexadecimal value of symbol. This operator may be used in association with the backslash (\) operator. The value of symbol must be an integer.
See also section 5.5.3.
Consider the following macro definition:
GEN_LAB MACRO LAB,VAL,STMT
LAB\%VAL: STMT
        ENDM
If this macro were called as follows,
NUM     SET 10
        GEN_LAB HEX,NUM,'NOP'
the resulting expansion as it would appear in the listing file would be:
HEXA: NOP
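The difference between the ? and % operators comes down to how the integer value of the symbol is turned into text. The Python sketch below illustrates that conversion only. It assumes that ? produces a decimal string, which the source text does not state explicitly, and uses upper-case hexadecimal digits for % to match the HEXA expansion above. The function name substitute_value is hypothetical.

```python
def substitute_value(value, operator):
    """Return the text a macro value operator would substitute for an integer."""
    if operator == "?":
        # Assumption: the plain value operator yields decimal digits.
        return str(value)
    if operator == "%":
        # Hexadecimal digits, upper case as in the HEXA example.
        return format(value, "X")
    raise ValueError("unknown operator: " + operator)


# NUM SET 10 combined with LAB\%VAL gives the label HEXA.
print("HEX" + substitute_value(10, "%") + ":")
# AREG SET 1 combined with R\?REG1 gives the register name R1.
print("R" + substitute_value(1, "?"))
```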
The circumflex (^), when used as a unary operator in a macro expansion, will cause name mangling of any associated local label. Normally, the macro preprocessor leaves any label inside a macro expansion as a normal label in the current module. By using the local label character (^), the label is made unique. This is done by removing the leading underscore and appending a unique string "__M_Lxxxxxx" where "xxxxxx" is a unique sequence number. The ^-operator has no effect outside of a macro expansion. The ^-operator is useful for passing label names as macro arguments to be used as local label names in the macro. Note that the circumflex is also used as the binary exclusive OR operator.
See also section 5.5.5.
Consider the following macro definition:
LOAD    MACRO ADDR
ADDR:   MOV R0, ADDR
^ADDR:  MOV R0, ^ADDR
        ENDM
If this macro were called as follows,
LOAD _LOCAL
the resulting expansion as it would appear in the listing file would be:
_LOCAL:             MOV R0, _LOCAL
_LOCAL__M_L000001:  MOV R0, _LOCAL__M_L000001
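The shape of a mangled local label can be reproduced with a few lines of Python. This is a sketch based only on the expansion shown above: it keeps the label text unchanged and appends "__M_L" followed by a six-digit sequence number. The function name mangle_local_label is hypothetical, and passing the sequence number in per expansion is an assumption about how the macro preprocessor counts.

```python
def mangle_local_label(label, sequence_number):
    """Build a unique label name in the style of the ^ operator expansion."""
    # Six decimal digits, zero padded, as in _LOCAL__M_L000001.
    return "{}__M_L{:06d}".format(label, sequence_number)


# Both uses of ^ADDR within one expansion share the same sequence number.
print(mangle_local_label("_LOCAL", 1))
```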
The double quote ("), when used in macro definitions, is transformed by the macro processor into the string delimiter, the single quote ('). The macro processor examines the characters between the double quotes for any macro arguments. This mechanism allows the use of macro arguments as literal strings.
See also section 5.5.4.
Using the following macro definition,
CSTR    MACRO STRING
        ASCII "STRING"
        ENDM
and a macro call,
CSTR ABCD
the resulting macro expansion would be:
ASCII 'ABCD'
A sequence of characters which matches a symbol created with a DEFINE directive will not be expanded if the character sequence is contained within a quoted string. Assembler strings generally are enclosed in single quotes ('). If the string is enclosed in double quotes (") then DEFINE symbols will be expanded within the string. In all other respects usage of double quotes is equivalent to that of single quotes.
Consider the source fragment below:
        DEFINE LONG 'short'
STR_MAC MACRO STRING
        MSG 'This is a LONG STRING'
        MSG "This is a LONG STRING"
        ENDM
If this macro were invoked as follows,
STR_MAC sentence
then the resulting expansion would be:
        MSG 'This is a LONG STRING'
        MSG 'This is a short sentence'
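The different treatment of single-quoted and double-quoted strings can be modelled as follows. The Python sketch is an illustration of the behaviour described above, not the assembler's actual algorithm: a single-quoted string is passed through untouched, while a double-quoted string has DEFINE symbols and macro arguments replaced and is then emitted with single quotes. The function name expand_string and the substitutions dictionary are hypothetical.

```python
import re


def expand_string(literal, substitutions):
    """Expand symbols inside a double-quoted string; leave single-quoted ones alone."""
    quote, body = literal[0], literal[1:-1]
    if quote == "'":
        # Single-quoted strings are never expanded.
        return literal

    def replace(match):
        word = match.group(0)
        return substitutions.get(word, word)

    # Replace whole identifiers only, so partial matches are not touched.
    expanded = re.sub(r"[A-Za-z_][A-Za-z0-9_]*", replace, body)
    return "'" + expanded + "'"


# DEFINE LONG 'short' plus the macro argument STRING = sentence.
symbols = {"LONG": "short", "STRING": "sentence"}
print(expand_string("'This is a LONG STRING'", symbols))
print(expand_string('"This is a LONG STRING"', symbols))
```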
All assembler built-in functions start with the @ symbol. See section 4.5 for a full discussion of these functions.
SVAL EQU @ABS(VAL) ; Obtain absolute value
When used as an operand in an expression, the dollar sign ($) represents the current integer value of the run-time location counter.
        CSEG AT 100H
XBASE   EQU $+20H       ; XBASE = 120H
Square brackets are used to indicate to the assembler to use a location addressing mode.
MOV R0, [_Value]
The pound sign (#) is used to indicate to the assembler to use the immediate addressing mode.
CNST    EQU 5H
        MOV R0, #CNST   ;Load R0 with the value 5H
Sections group logical pieces of code or data. Each section has a memory type and optionally some properties. There are two types of sections: relocatable sections and absolute sections. The next paragraphs explain the use of sections in more detail.
The section memory type specifies the address space where the section will reside. For relocatable sections the memory type is specified with a type specifier following the SEGMENT directive. For absolute sections the memory type is implied by the directive that initiates the absolute section. Valid section memory types are:
Type specifier (relocatable section) | Directive (absolute section) | Description |
BIT | BSEG | bit data space |
CODE | CSEG | code space |
DATA | DSEG | direct addressable data space |
IDATA | ISEG | indirect addressable data space |
XDATA | XSEG | external data space |
BITADDR | DBSEG | bitaddressable DATA (same as DATA BITADDRESSABLE) |
HCODE | HCSEG | huge code space |
EDATA | ESEG | SmartXA EEPROM Data Memory for SmartXA only |
HDATA | HSEG | huge indirect addressable data space |
XSHORT | XSSEG | first page of external (movx) data memory (same as XDATA SHORT) |
Table 3-1: Section memory types
The first group of memory types listed above are the TASKING 8051 compatible memory spaces, the second group lists the extended XA memory spaces.
The optional section attributes define the properties of the section. Depending on the memory type of the section an attribute is or is not allowed. Possible attributes are:
Section attribute | Description | Allowed on |
BITADDRESSABLE | Specifies a section to be relocated within the bit space on a byte boundary. The section size is limited to 32 bytes. DATA BITADDRESSABLE is equivalent to BITADDR. | DATA |
SHORT | With this attribute, the locator allocates the section in the first 64K of external data memory. | XDATA |
PAGE | Ignored, TASKING C51 compatibility | CODE, HCODE, XDATA, HDATA |
INPAGE | Ignored, TASKING C51 compatibility | CODE, XDATA |
INBLOCK | Ignored, TASKING C51 compatibility | CODE |
INSEGMENT | Specifies a section which must be contained in a 64K-byte page (a segment). | HCODE, HDATA |
UNIT | A default attribute: the section will not be aligned. | all |
NOCLEAR | Specifies that the section is not to be cleared at program startup. This is a default attribute. | all |
CLEAR | Specifies that the section is to be cleared at program startup. | DATA, IDATA, XDATA, XSHORT, HDATA, BIT, BITADDR |
INIT | Specifies that the section contains initialized data. The initial data is copied from ROM to RAM at program startup. | DATA, IDATA, XDATA, XSHORT, HDATA, BIT, BITADDR |
OVERLAY | Specifies that the section may be overlaid by another overlayable section. An overlayable section implicitly gets the NOCLEAR attribute. No overlaying will be done if the OVERLAY attribute is omitted. | DATA, EDATA, IDATA, XDATA, XSHORT, HDATA, BIT, BITADDR |
ROMDATA | Specifies that the section contains initialized data. | CODE, HCODE, DATA, IDATA, HDATA |
JOIN | Group sections. | all |
Table 3-2: Section attributes
For absolute sections only the attributes NOCLEAR, CLEAR, INIT and ROMDATA are allowed.
The section attributes can be divided into the following two groups:
From each group you can specify one attribute at the most. The attributes UNIT and NOCLEAR are the default attributes. An OVERLAY attribute cannot be combined with a PAGE, INPAGE, INBLOCK or INSEGMENT attribute. A section with an OVERLAY attribute implicitly also has a NOCLEAR attribute. A section with a ROMDATA attribute implicitly also has an INIT attribute.
An absolute section directive switches to an absolute section. Of the section attributes mentioned in the previous paragraph only NOCLEAR, CLEAR, INIT, INSEGMENT and ROMDATA are allowed on an absolute section. An absolute section can be declared with or without a name.
For all absolute sections also the AT attribute is allowed. The expression following 'AT' defines the start address of the absolute section. If no attributes are specified then the absolute section will continue the last absolute section with the same memory type. If the AT attribute is not specified and the other attributes do not match the attributes of the last absolute section with the same memory type, then a new section is created starting at the first free address following that section. When the absolute section without AT attribute is the first absolute section with that memory type then the section will start at the first valid address for that memory type, that is zero for all sections.
The assembler generates object files in relocatable IEEE-695 object format. The assembler groups units of code and data in the object file using sections. All relocatable information is related to the start address of a relocatable section. The locator assigns absolute addresses to sections. A section is the smallest unit of code or data that can be moved to a specific address in memory after assembling a source file.
A relocatable section must be declared before it can be used. The SEGMENT pseudo declares a section with its attributes. A section name can be any identifier. The '@' character is not allowed in regular section names. The assembler and linker use this character to create overlayable or joined sections. This is explained below.
You can group sections together with the JOIN attribute. For example, when several sections have to be located within the same data page, you can use this attribute.
A section becomes overlayable by specifying the OVERLAY attribute. Because it is useless to initialize overlaid sections at program startup time (code using overlaid data cannot assume that the data is in the defined state upon first use), the NOCLEAR attribute is defined implicitly when OVERLAY is specified. Overlayable section names are composed as follows:
DA@nfunc SEGMENT DATA OVERLAY
^  ^
|  |_ function name
pool name
The linker overlays sections with the same pool name. To decide whether sections can be overlaid, the linker builds a call graph. Data in sections belonging to functions that call each other cannot be overlaid. The compiler generates pseudo instructions (CALLS) with information for the linker to build this call graph. The CALLS pseudo has the following syntax:
If the function main() has overlayable data allocations in the zero page and calls nfunc(), the following sections and call information will be generated:
DA@nfunc SEGMENT DATA OVERLAY
DA@main  SEGMENT DATA OVERLAY
         CALLS 'main', 'nfunc'
If a section declaration contains the OVERLAY attribute and the section name does not contain exactly one '@' character, the assembler will report an error.
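The naming rule for overlayable sections can be illustrated with a small check. The Python sketch below splits a section name into its pool name and function name and rejects names that do not contain exactly one '@' character, mirroring the error condition described above. The function name split_overlay_name is hypothetical, and the sketch covers only sections declared with the OVERLAY attribute alone.

```python
def split_overlay_name(section_name):
    """Split an overlayable section name into (pool name, function name)."""
    if section_name.count("@") != 1:
        # Mirrors the assembler error for OVERLAY sections without exactly one '@'.
        raise ValueError("overlay section name needs exactly one '@': " + section_name)
    pool, function = section_name.split("@")
    return pool, function


# Sections with the same pool name are candidates for overlaying.
for name in ("DA@nfunc", "DA@main"):
    print(split_overlay_name(name))
```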
When you have to group sections together, you can use the JOIN section attribute. For example, when two data sections have to be located within the same range, you can write this as follows:
DATA1@DS SEGMENT DATA JOIN
and for the second section:
DATA2@DS SEGMENT DATA JOIN
Note that sections are grouped by the extension used in the section name. So, the definition is:
sect@group SEGMENT DATA JOIN
^    ^
|    |_ joined group name
section name
Combining the JOIN and OVERLAY attributes gives the following result:
DA@DS@func SEGMENT DATA JOIN OVERLAY
The XA architecture permits bit, byte, word and, in a few cases, double word access to the registers.
A register name convention is introduced to enable the assembler to use generic instructions. The assembler deduces a hardware instruction from the generic mnemonic by interpreting the size of the operands. So if register addressing is used the register name should indicate the register's size.
A word register is composed of two byte registers. The low order byte of a word register is identified with an 'L' postfix, e.g., R0L. The high order byte of a word register is identified with an 'H' postfix, e.g., R0H. Registers R0..R7 are byte addressable.
Word registers are named R0..R15. The assembler, just like the baseline XA core, only supports registers R0..R7; registers R8..R15 are not implemented.
Double word registers are composed of two adjacent registers. Valid combinations are R1:R0, R3:R2, R5:R4 or R7:R6. A double word register is referenced by adding the postfix 'D' to the low order register, e.g., R0D.
Examples:
DIVU R0L,R1H        DIVU.B R0L,R1H
DIVU R0,R1L         DIVU.W R0,R1L
DIVU R0D,R2         DIVU.D R0,R2
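The register naming convention lets the operand size be read directly from the register name. The Python sketch below illustrates that deduction under the rules given above: an 'L' or 'H' postfix denotes a byte, no postfix a word, and a 'D' postfix on an even-numbered register a double word. The function name register_size is hypothetical, and the sketch accepts upper-case names only, which is an assumption about case handling.

```python
import re

# R0..R7 with an optional L, H or D postfix.
_REGISTER = re.compile(r"^R([0-7])([LHD]?)$")


def register_size(name):
    """Return the operand size in bits implied by a register name."""
    match = _REGISTER.match(name)
    if match is None:
        raise ValueError("not a supported register name: " + name)
    number, postfix = int(match.group(1)), match.group(2)
    if postfix in ("L", "H"):
        return 8
    if postfix == "D":
        # A double word register is named after its low order (even) register.
        if number % 2 != 0:
            raise ValueError("double word register must use R0, R2, R4 or R6: " + name)
        return 32
    return 16


for reg in ("R0L", "R1H", "R0", "R0D"):
    print(reg, register_size(reg))
```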
There are four different instances of registers R0 through R3. Of these four banks only one bank can be active at any given time, referenced as R0 through R3. The contents of the other banks are inaccessible. PSW bits RS1 and RS0 select the active register bank:
RS1 | RS0 | register bank |
0 | 0 | bank 0 |
0 | 1 | bank 1 |
1 | 0 | bank 2 |
1 | 1 | bank 3 |
Table 3-3: Register bank selection
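Table 3-3 amounts to reading RS1 and RS0 as a two-bit number. A minimal Python sketch of that mapping follows; the function name active_register_bank is hypothetical and the bits are passed in as plain integers rather than read from the PSW.

```python
def active_register_bank(rs1, rs0):
    """Return the active register bank (0..3) for the given PSW bits RS1 and RS0."""
    if rs1 not in (0, 1) or rs0 not in (0, 1):
        raise ValueError("RS1 and RS0 must each be 0 or 1")
    # RS1 is the high order bit, RS0 the low order bit.
    return (rs1 << 1) | rs0


for rs1 in (0, 1):
    for rs0 in (0, 1):
        print(rs1, rs0, "bank", active_register_bank(rs1, rs0))
```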
The XA SFRs are not mapped in the address space as with data memory. SFRs have control functions associated with them. For example, an SFR could control and/or provide status information of an on-chip peripheral. All SFRs reside in a 64 byte region starting at location 400H. The SFR space is both byteaddressable and bitaddressable.