The YGREC8's Assembly Language Manual
Created Mon. 16 Oct. 2023 05:50:46 CEST by whygee@f-cpu.org
Version Sat. 16 Dec. 2023 03:43:00 CET
PRELIMINARY / WORK IN PROGRESS
It's not yet a definitive reference, check again soon.
Check the latest news and updates at ygrec8.com
and download the latest version from src.ygrec8.com
or if you prefer git : gitlab.com/fhwd/ygrec8
©2023 Yann Guidon
All documentation is published under the CC BY-NC-SA 4.0 license.
Introduction
This manual is limited to details and specifications for assemblers and disassemblers:
programs that transform text into binary instructions or vice versa.
- Look at the YGREC8's Main manual for the
general overview of the project, the ISA and the programming model.
- Look at the YGREC8's Programming
manual for example code sequences, program samples and coding idioms.
General syntax, conventions and other rules
This specification tries to keep the syntax of YGREC8 programs source code as simple and easy
to understand and parse as possible. The assembler can be easily coded with sophisticated software tools (such as
JavaScript or
bison
and flex) but low-level languages (even assembly languages) should
work as well. Eventually (and even though this is very far-fetched) the YGREC8
should be able to assemble its own programs, meaning that frivolous features are avoided
(due to obvious space and resource constraints).
- The assembler may have one, two or more passes depending on the implementation.
- Invocation of the assembly program is not specified. It can be a command-line
interface, a graphical interface or a module in another program.
- The recommended file name extension for assembly language input is .y8
and the syntax is described by this document.
- No file inclusion is provided. Just concatenate all the files before assembly.
- No aliasing or macro is provided. If needed, preprocess using an existing program such as
m4 or cpp.
- The file name extension for text-encoded binary code is .hyx (see
https://gitlab.com/fhwd/HYX/).
Raw binary code (.byn) is also possible but less flexible and lacks metadata
such as addresses or purpose (data or instructions? overlay number?).
- Instructions are output Lower Byte first (Little Endian). All addresses must be in
increasing order.
- By default, code is assembled starting at address 0. Don't forget to .ORG
if you need otherwise.
- All uninitialised and unused instruction locations must be set to all-ones
0xFFFF (like a blank Flash or EEPROM cell). Thus the program correctly traps
with the INV opcode when an
unexpected condition occurs, which is clearly distinct from NOP.
- Each instruction has its own line. This also applies to the
PF prefix and the
labels (which are equivalent to .EQU).
- Lines cannot be merged with a trailing "\" (unlike C and derived languages).
- Comments start at the semicolon ";" and extend to the end of the
current line.
- During parsing, the line is converted to upper case to normalise everything (even
symbols). Commas "," are treated as spaces (in case you have this
old habit). The following lines give the same result:
OR D1, D2 IFC
or d1 d2,ifc
Or D1 d2 IfC
oR d1, D2, iFc
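The normalisation described above can be sketched in a few lines of Python. This is only an illustration, not part of the specification: the name normalise_line is hypothetical, and the sketch deliberately ignores quoted characters and strings, which a real assembler would have to protect from the comment, comma and upper-case handling.

```python
def normalise_line(line):
    """Return the upper-case words of one source line (hypothetical helper).

    Assumption: quoted characters and strings are not handled here.
    """
    code = line.split(";", 1)[0]   # drop the comment, if any
    code = code.upper()            # case is not significant
    code = code.replace(",", " ")  # commas count as spaces
    return code.split()            # split on runs of whitespace


if __name__ == "__main__":
    for text in ("OR D1, D2 IFC", "or d1 d2,ifc", "Or D1 d2 IfC", "oR d1, D2, iFc"):
        print(normalise_line(text))  # each line prints ['OR', 'D1', 'D2', 'IFC']
```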
Numbers are recognised by their prefix: a minus sign or a decimal digit.
They are written in decimal by default but can have an optional suffix
to specify the base:
- d (decimal)
- h (hexadecimal)
- o (octal, optional)
- b (binary)
The equivalent regular expression would be something like:
(([-][0-9a-fA-F])|([0-9]))[0-9a-fA-F]*[bBdDhHoO]?
Each number must fit in the desired field. All numbers are considered signed
but can also be written using unsigned numbers for convenience (as long as the
binary representation fits in the field).
- Imm9 range: -256 to 511
- Imm8 range: -128 to 255
- Imm4 range: -8 to 7 (8 to 15 are not recommended because this field
is sign-extended, so it may output a warning)
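As a rough illustration of the number syntax and field ranges above, here is a minimal Python sketch. The names parse_number and fits_field are hypothetical. One assumption is made explicit in the code: since b and d are also hexadecimal digits, a trailing B, D, H or O on a token of more than one character is always taken as the base suffix.

```python
import re

# Pattern taken from the manual, applied to the whole token.
NUMBER_RE = re.compile(r"(([-][0-9a-fA-F])|([0-9]))[0-9a-fA-F]*[bBdDhHoO]?")
BASES = {"B": 2, "O": 8, "D": 10, "H": 16}


def parse_number(token):
    """Convert a numeric token to an int (hypothetical helper)."""
    if not NUMBER_RE.fullmatch(token):
        raise ValueError("not a number: " + token)
    text = token.upper()
    negative = text.startswith("-")
    if negative:
        text = text[1:]
    base = 10
    # Assumption: the last character, if B/D/H/O, is a suffix and not a digit.
    if len(text) > 1 and text[-1] in BASES:
        base = BASES[text[-1]]
        text = text[:-1]
    value = int(text, base)  # raises ValueError on a digit invalid for the base
    return -value if negative else value


def fits_field(value, bits):
    """True if value fits a field of the given width, signed or unsigned."""
    return -(1 << (bits - 1)) <= value <= (1 << bits) - 1
```

With this sketch, fits_field(value, 4) accepts -8 to 15; an assembler following the text above would additionally warn for 8 to 15 because the Imm4 field is sign-extended.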
User symbols and intermediate results in expressions are stored as signed numbers
coded in 32 bits (int32_t in C). The assembler must check the range for every
instruction field. Any arithmetic overflow throws an error,
or at least a warning if the corresponding option is enabled.
When expecting a number, the assembler can accept a literal number (as in the
previous point), a character ('a'), a symbol (previously defined and
valued), or an expression (always between parentheses) that accepts all of the
4 previous data types.
Some basic arithmetic may be provided to compute immediate values (+, -, *,
>>, >>>, <<, smin, smax, etc.) inside a binary "expression" (see above). There is
no precedence because every operation uses its own pair of parentheses.
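The manual does not fix the exact layout of an expression, so the following Python sketch relies on assumptions: infix notation "(left operator right)", tokens separated by spaces, decimal literals only, and a small subset of operators. The function name evaluate is hypothetical. It shows how the absence of precedence and the 32-bit overflow check could be handled.

```python
import operator
import re

# Subset of operators, for illustration only.
OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
             "&": operator.and_, "|": operator.or_, "^": operator.xor}
INT32_MIN, INT32_MAX = -(2 ** 31), 2 ** 31 - 1


def evaluate(text, symbols=None):
    """Evaluate a fully parenthesised expression (hypothetical helper)."""
    symbols = symbols or {}  # keys are expected in upper case
    tokens = re.findall(r"[()]|[^\s()]+", text.upper())
    position = 0

    def next_token():
        nonlocal position
        if position >= len(tokens):
            raise SyntaxError("unexpected end of expression")
        token = tokens[position]
        position += 1
        return token

    def operand():
        token = next_token()
        if token == "(":
            left = operand()
            name = next_token()
            right = operand()
            if next_token() != ")":
                raise SyntaxError("one operation per pair of parentheses")
            if name not in OPERATORS:
                raise SyntaxError("unknown operator: " + name)
            value = OPERATORS[name](left, right)
        elif token in symbols:
            value = symbols[token]
        else:
            value = int(token)  # assumption: decimal literal only
        if not INT32_MIN <= value <= INT32_MAX:
            raise OverflowError("result does not fit in int32_t")
        return value

    result = operand()
    if position != len(tokens):
        raise SyntaxError("trailing text after expression")
    return result
```

For example, evaluate("((2 + 3) * 4)") returns 20, while "(2 + 3 * 4)" is rejected because the second operation lacks its own parentheses.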
UTF-8/Unicode is not supported, only plain ASCII. Symbol names can only contain letters
(A-Z), digits (0-9) and underscores (_) but may not start with a
digit (otherwise this is considered as a number).
The default maximum length of a symbol (or label) is 24 ASCII characters. Depending
on the implementation, symbol names that exceed this length may either be supported, or
throw an error, or issue a warning or use only the 24 leading characters. Check and
configure your source code and tools carefully, in case you need longer names.
The symbols and reserved words cannot be redefined (by .EQU,
.OVL, label or others). Any attempt to do so throws an error.
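The rules on symbol names can be summarised by a small Python sketch. The names define_symbol and RESERVED are hypothetical, the reserved set is only a partial sample, and rejecting names longer than 24 characters is just one of the policies the text allows.

```python
import re

SYMBOL_RE = re.compile(r"[A-Z_][A-Z0-9_]*")  # no leading digit
MAX_SYMBOL_LENGTH = 24                       # default limit from the manual
# Partial sample of reserved words, for illustration only.
RESERVED = {"D1", "A1", "D2", "A2", "R1", "R2", "R3", "PC", "ALWS", "NEVR"}


def define_symbol(table, name, value):
    """Add a symbol to the table, enforcing the naming rules (hypothetical helper)."""
    name = name.upper()
    if not SYMBOL_RE.fullmatch(name):
        raise ValueError("invalid symbol name: " + name)
    if len(name) > MAX_SYMBOL_LENGTH:
        # Assumption: this implementation chooses to throw an error.
        raise ValueError("symbol name too long: " + name)
    if name in RESERVED or name in table:
        raise ValueError("cannot redefine: " + name)
    table[name] = value
```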
The order of the operands immediately follows the structure of the instruction, so
you usually have the following order: OPCODE SRI SND (SRI being an
immediate number or register name). However the condition is always at the end.
A typical instruction would be:
ADD -6 PC IFNZ ; subtract 6 from PC if Z flag is 0
The above instruction loops back 6 instructions until the result of the preceding
computation is zero.
Pseudo-instructions
Before passing a line to the actual instruction assembler, the line can be
pre-parsed and examined, looking for pseudo-instructions: these are commands that do
not turn into actual instructions but do help manage the overall program's structures
and definitions. The first word is parsed and tested:
• If the word starts with a dot "." then a pseudo-instruction is expected and
looked up. It can be one of these:
- .END (essential) Ends parsing of the file. The source
file can contain any garbage (or the dump of the symbol table) below this line.
- .ORG (essential) sets the address where the next instruction
is stored. The value may be a number, symbol or expression, but cannot be defined later in the source.
Depending on the assembler, it can also be an error to have a new address
smaller than the current address ($). For example the line
.ORG 42
means that the next instruction will be stored at address 42.
.EQU (essential) creates a new symbol with a known value.
.EQU plop 42
defines the symbol "PLOP" and assigns the value 42.
The symbol "PLOP" can be used later in the source file.
The accepted range is a signed 32-bit integer (int32_t).
This value must be fully defined.
.DW (essential) assembles a sequence of 16-bit immediate values.
.DW 42
will output the value 42 to the instruction memory space at the current address
($), as if it were an instruction.
.DW accepts more than one argument:
.DW 1,1,2,3,5,8,13,21,34
Depending on the assembler, the first value may be "fixed" by a second pass
if one symbol is undefined. It would be too complex to fix the following ones.
.DW forward,1,2,3
.EQU forward 21
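To illustrate how words placed by .ORG and .DW could end up in a raw output image, here is a minimal Python sketch that combines them with the general rules stated earlier (low byte first, unused locations set to 0xFFFF). The name build_image is hypothetical, and the sketch assumes that addresses count 16-bit words, not bytes.

```python
BLANK_WORD = 0xFFFF  # unused locations read as all-ones


def build_image(words, size):
    """Return the raw bytes for `size` 16-bit locations (hypothetical helper).

    `words` maps a word address to its 16-bit value; missing addresses
    are filled with BLANK_WORD.
    """
    image = bytearray()
    for address in range(size):
        word = words.get(address, BLANK_WORD) & 0xFFFF
        image.append(word & 0xFF)  # lower byte first (little endian)
        image.append(word >> 8)    # then the upper byte
    return bytes(image)
```

For instance, build_image({0: 0x1234, 2: 42}, 4) leaves locations 1 and 3 blank and stores 0x34 before 0x12 for the first word.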
.BL (preliminary, optional) is like .DW but "byte low" to be
accessed by LDCL. Accepts a sequence of bytes and/or ASCII characters as strings (such as
"abc"). The upper byte is not affected.
.BL 11, "hello world"
.BH (preliminary, optional) is like .BL but "byte high" to be
accessed by LDCH. Does not affect the lower byte so you can use .BL and
.BH sequentially
on the same addresses.
.BH 26, "Bonjour le monde du haut !\n"
.OVL (preliminary, optional) declares a new overlay. It
.ORGs back to 0, expects a number (0 to 255) and can declare an optional
symbol name (a sort of enhanced .EQU with more side effects).
.OVL 23 MyCustomLibrary
.{ and .} are reserved for possible, optional
nested contexts (local symbol tables) later. Throw an error if not supported.
• Otherwise, if the word ends in ':' (and is not a number, a symbol or
a keyword) then it is a label and its definition is added to the table of symbols.
There is no separator before ':' and the label must be alone on the line.
The line
plop:
is shorthand for the line
.EQU plop $
The following line must fail:
plop : ; MUST_FAIL: don't allow spaces before ":"
• Lastly, if none of the above applies, the line is passed to the actual assembler.
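The three-way test on the first word can be sketched as follows in Python. The name classify_line and the returned tags are hypothetical, and the sketch skips the checks against numbers, existing symbols and keywords as well as the handling of quoted strings.

```python
def classify_line(line):
    """Return (kind, word) for one source line (hypothetical helper)."""
    words = line.split(";", 1)[0].upper().replace(",", " ").split()
    if not words:
        return ("EMPTY", None)
    first = words[0]
    if first.startswith("."):
        return ("PSEUDO", first)       # looked up in the pseudo-instruction list
    if first.endswith(":"):
        if len(words) > 1:
            raise SyntaxError("a label must be alone on its line")
        return ("LABEL", first[:-1])   # equivalent to .EQU name $
    if len(words) > 1 and words[1].startswith(":"):
        raise SyntaxError("no separator allowed before ':'")
    return ("INSTRUCTION", first)      # passed to the actual assembler
```

With this sketch, "plop:" yields ("LABEL", "PLOP") while "plop :" raises an error, as required above.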
Basic Syntax
- End of line is given by CR (ASCII 13d) or LF (ASCII 10d)
- Separators are
- space ' ' (ASCII 32d)
- comma ',' (ASCII 44d)
- horizontal tab (ASCII 9d)
- non-breaking space (code 160d, as in HTML).
- The dollar sign '$' (ASCII 36d) represents/returns
the value of the current address where the result is stored (often equivalent to
the PC's value).
- Character bytes are written between apostrophes. This means only raw ASCII,
no Unicode code points.
- "Expressions" allow binary computations. When implemented, each operation
is always between parentheses, to avoid precedence. The assembler could support
the following arithmetic and boolean operations:
- + (addition)
- - (subtraction)
- * (multiplication)
- / (division)
- % (modulo/remainder)
- ^ (boolean XOR)
- | (boolean OR)
- & (boolean AND)
- >> (logical shift right)
- >>> (sign-preserving shift right)
- << (shift left)
- <@ (rotate left)
- @< (rotate right)
- ?> (return the lowest unsigned value)
- <? (return the highest unsigned value)
- $> (return the lowest signed value)
- <$ (return the highest signed value)
Let's also consider comparisons:
- < (less than)
- > (greater than)
- = (equal)
- != (not equal)
- <= (less than or equal)
- >= (greater than or equal)
- ()
- ()
All those symbols are reserved.
Keywords
In addition to the previous keywords, the assembler pre-defines the following keywords
as elements of an instruction. The list may expand in the future but these are essential
definitions:
Opcodes
The opcodes take 2 to 4 characters.
Aliases
Some instructions are encoded as a special case of other instructions (see
aliases).
Registers
The register names take only 2 characters.
- "D1" (000)
- "A1" (001)
- "D2" (010)
- "A2" (011)
- "R1" (100)
- "R2" (101)
- "R3" (110)
- "PC" (111)
Conditions
The conditions fit in 3 or 4 characters only, often shortened.
- 0000 NEVR Never (instruction is not executed or committed, like a NOP)
- 0001 IFN0 B0=0
- 0010 IFNC Carry=0
- 0011 IFN1 B1=0
- 0100 IFNS Sign=0 (positive signed number, aliased as IFP?)
- 0101 IFN2 B2=0
- 0110 IFNZ Zero=0 (last result had at least one set bit)
- 0111 IFN3 B3=0
- 1000 ALWS Always (the default condition so writing it is not required)
- 1001 IF0 B0=1
- 1010 IFC Carry=1
- 1011 IF1 B1=1
- 1100 IFS Sign=1 (negative signed number)
- 1101 IF2 B2=1
- 1110 IFZ Zero=1 (last result had all bits cleared)
- 1111 IF3 B3=1
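The register and condition lists above map directly onto lookup tables. The Python sketch below copies the codes from this manual; the helper name split_condition is hypothetical and only shows how a trailing condition word could be detached, with ALWS as the default.

```python
REGISTERS = {"D1": 0b000, "A1": 0b001, "D2": 0b010, "A2": 0b011,
             "R1": 0b100, "R2": 0b101, "R3": 0b110, "PC": 0b111}

CONDITIONS = {"NEVR": 0b0000, "IFN0": 0b0001, "IFNC": 0b0010, "IFN1": 0b0011,
              "IFNS": 0b0100, "IFN2": 0b0101, "IFNZ": 0b0110, "IFN3": 0b0111,
              "ALWS": 0b1000, "IF0": 0b1001, "IFC": 0b1010, "IF1": 0b1011,
              "IFS": 0b1100, "IF2": 0b1101, "IFZ": 0b1110, "IF3": 0b1111}


def split_condition(words):
    """Return (words without the condition, condition code) (hypothetical helper).

    `words` is the upper-case word list of one instruction line.
    """
    if words and words[-1] in CONDITIONS:
        return words[:-1], CONDITIONS[words[-1]]
    return words, CONDITIONS["ALWS"]  # default condition
```

In this table the upper bit of the code selects the sense of the test: clearing it turns IFC into IFNC, IFZ into IFNZ and ALWS into NEVR.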
Forms
The instruction's textual format follows the binary format (except the condition
which is a suffix). "Forms" are valid sequences of zero, one or two arguments:
- No operand (often instruction aliases):
Examples:
NOP ; do nothing
HLT ; properly terminate a program
INV ; trigger a trap/reboot/panic
One immediate number (0 to 255: Imm8):
Example:
OVL 42 ; Load and execute overlay #42 (special case for CALL 42 PC)
One immediate number (0 to 511: Imm9) and a (source
or destination) register:
Examples:
IN 67, R3 ; get value from IOspace at address 67 and write register R3
OUT 45 A1 ; Put the value of A1 into IOspace at address 45
One short immediate (-8 to 7) or a source register,
and a source/destination register:
- "LDCH"
- "LDCL"
- "RC"
- "RO"
- "SA"
- "SH"
Examples:
RC 1 D1 ; Rotate register D1 through carry by 1 position (left)
RO -3 D2 ; Rotate register D2 by 3 positions (right)
SA R1 A1 ; Arithmetic Shift of A1 by R1 positions
SH R2 A2 ; Logic shift of A2 by R2 positions
One source/destination register and one optional condition:
PF R3 ; put the next result in R3
; default condition is ALWS so Carry-In is set to 1
PF R1 IFN2 ; put the next result in R1
; and copy the negated condition bit #2 in the Carry-in flag
One immediate byte (Imm8) and a source/destination register
(same list as below)
One short immediate (Imm4) or a source register,
then a source/destination register and one condition:
- "ADD"
- "AND"
- "ANDN"
- "CALL"
- "CMPS"
- "CMPU"
- "OR"
- "SET"
- "SUB"
- "XOR"
If no condition is specified, it is considered as ALWS (1000)
Examples:
AND 123 R1 ; bit-mask R1 with byte 123
ADD -76 R2 ; subtract byte 76 from register R2
SUB 95 R3 ; Subtract R3 from byte 95 and put the result in R3
ADD 1 A1 IFC ; increment A1 by short 1 if the carry bit is set
ADD D1 A1 IFS ; Add D1 to A1 (result in A1) if the Sign/Negative flag is set.
There is no explicit requirement or syntax for marking a number as short or long
(Imm4 or Imm8). The assembler first tries to fit values in a short immediate, allowing the
optional condition to be accepted. If an immediate exceeds the Imm4 range, then any
explicit condition suffix is an error.
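The selection rule above could look like the following Python sketch. The name choose_immediate_form and the returned tags are hypothetical. As an assumption, values 8 to 15 are not squeezed into the sign-extended Imm4 field and are treated as long immediates.

```python
def choose_immediate_form(value, has_condition):
    """Pick the short or long immediate form (hypothetical helper)."""
    if -8 <= value <= 7:
        return "IMM4"  # short form, an optional condition is allowed
    if has_condition:
        raise ValueError("a condition requires a short immediate (-8 to 7)")
    if -128 <= value <= 255:
        return "IMM8"  # long form, no condition field
    raise ValueError("immediate does not fit in 8 bits")
```

Applied to the examples above, ADD 1 A1 IFC selects the short form and AND 123 R1 the long one, while a hypothetical AND 123 R1 IFC would be rejected.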
To be continued...
Now have a look at the YGREC8's
Integration manual,
the Programming
manual or the Main YGREC8 Manual.