Created Sat. 30 Sept. 2023 21:52:48 CEST by whygee@f-cpu.org
Version Sun. 12 Nov. 2023 03:31:36 CET
PRELIMINARY / WORK IN PROGRESS: it's not yet a reliable and definitive reference!
Check the latest news and updates at ygrec8.com and download the latest version from src.ygrec8.com or, if you prefer git, gitlab.com/fhwd/ygrec8
©2023 Yann Guidon, all documentation published under CC BY-NC-SA 4.0 license.
Before reading further, make sure you have also read:
There are several layers of programming tools with increasing syntax complexity. This part only addresses the most basic layer, where a single instruction at a time is considered by a basic interactive assembler. It follows some old traditions while breaking away from others. Be sure to read the syntax rules in the YGREC8 assembly language manual!
Programs are terminated "gracefully" with the HLT pseudo-opcode (to differentiate from the INV opcode).
Unlike some architectures (like x86), you subtract a constant from a register by adding its negative. The SUB opcode negates the SND field, so if you write
SUB 4 R1
you will get the value 4-R1 in R1. If you really can't reorder the operands, use the PF opcode:
    PF R2
    SUB R2 R1 ; R2=R2-R1
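The operand order is easy to get wrong, so here is a minimal Python sketch of the arithmetic described above. It is only a behavioural model, not part of the YGREC8 tools: the function names `op_add` and `op_sub` are hypothetical, and the 8-bit wrap-around is an assumption.

```python
MASK = 0xFF  # assumption: registers are 8 bits wide


def op_add(src, snd):
    """Model of 'ADD src snd': the new value of snd is snd + src."""
    return (snd + src) & MASK


def op_sub(src, snd):
    """Model of 'SUB src snd': the SND operand is negated, giving src - snd."""
    return (src - snd) & MASK


r1 = 10
sub_result = op_sub(4, r1)   # SUB 4 R1  -> 4 - R1, wrapped to 8 bits
add_result = op_add(-4, r1)  # ADD -4 R1 -> R1 - 4
```

With R1=10, the SUB form leaves 4-10 (250 after the assumed 8-bit wrap-around), while adding -4 leaves 6, which is usually what was intended.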
When a new overlay is loaded, PC=0 and that's all. Don't forget to .ORG 0 just in case.
Don't expect any value or meaning in any other register or flag.
A dedicated I/O register might indicate the reason(s) for reaching address 0: RESET, IRQ, Trap, OVL instruction, WatchDog Timer expiration, ...
There are 3 Flags that are updated by the boolean, shift and arithmetic instructions:
Unlike other architectures, and to keep the ISA minimalist, there is no "Add with Carry" or "Subtract with Borrow" opcode. The arithmetic instructions receive a default carry input signal that depends on the opcode. This input can be changed by the PF prefix opcode, placed before an arithmetic instruction. The following code adds the 16-bit value in D1:D2 to R1:R2.
    ; 16-bit add:
    ADD D1 R1
    PF R2 IFC
    ADD D2 R2
The prefix forces the carry input of the 3rd instruction to the value of the carry output of the first instruction. The destination field is a bit redundant (since it copies the SND field from the 3rd instruction) but can be helpful in other situations to reduce copies and register pressure.
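The carry chaining can be sketched in Python as follows. This is a behavioural model under assumptions, not the actual hardware description: `add8` and `add16` are hypothetical names, registers are assumed to be 8 bits wide, and the first ADD is assumed to handle the low byte since its carry output feeds the second one.

```python
def add8(a, b, carry_in=0):
    """8-bit addition returning (result, carry_out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def add16(d1, d2, r1, r2):
    """Model of: ADD D1 R1 / PF R2 IFC / ADD D2 R2."""
    r1, carry = add8(d1, r1)     # ADD D1 R1: low bytes, default carry input of 0
    r2, _ = add8(d2, r2, carry)  # PF ... + ADD D2 R2: carry input forced to the previous carry output
    return r1, r2


# 0x01FF + 0x0001 = 0x0200: the carry ripples from the low byte to the high byte.
low, high = add16(0xFF, 0x01, 0x01, 0x00)
```

Here `low` ends up as 0x00 and `high` as 0x02.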
Like the YASEP and unlike most other architectures, the YGREC8 has no LOAD/STORE instruction.
There are two data memory areas, addressed respectively by two pairs of registers: A1/D1 and A2/D2. The two areas are usually independent by default but some banking trick could remap them in future versions.
Let's say you need to add 42 to the value stored at address 123. With x86 you would write (in AT&T syntax):
    movw $42, %ax
    addw %ax, 123
A classic RISC processor can't perform read-modify-write operations, which are unwrapped at the assembly level:
    load [123] r1
    add 42 r1 r1
    store r1 [123]
Using register-mapped memory, the A register sets the address where the D register loads and stores its value as a side-effect of the operations:
    set 123 A1
    add 42 D1
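To make the side effects explicit, here is a small Python model of one A/D register pair. It is a sketch under assumptions, not the real implementation: the class name `Port`, the 256-byte memory and the 8-bit data width are all hypothetical.

```python
class Port:
    """One address/data register pair in front of a data memory (hypothetical model)."""

    def __init__(self, memory):
        self.memory = memory
        self._a = 0
        self._d = memory[0]

    @property
    def a(self):
        return self._a

    @a.setter
    def a(self, address):
        self._a = address % len(self.memory)
        self._d = self.memory[self._a]  # writing A loads D from memory

    @property
    def d(self):
        return self._d

    @d.setter
    def d(self, value):
        self._d = value & 0xFF          # assumption: 8-bit data
        self.memory[self._a] = self._d  # writing D stores to memory


memory = [0] * 256      # assumption: 256 bytes of data memory
memory[123] = 100
port = Port(memory)
port.a = 123            # set 123 A1
port.d = port.d + 42    # add 42 D1: read-modify-write through D
```

After these two steps `memory[123]` holds 142, without any explicit load or store.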
The memory areas are not committed to a stack, which you have to manage yourself, and you can locate it anywhere you like. You can also have 2 stacks if you're a FORTHer. In any case, you need to increment and decrement the A register, and the TOS (Top Of Stack) is the corresponding D register.
    ; PUSH:
    set R1 D1 ; Write R1 to the ToS
    add 1 A1
    ; POP:
    add -1 A1
    set D1 R2 ; read the ToS into R2
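The PUSH and POP sequences above can be modelled in a few lines of Python. This is only an illustrative sketch: the class name `DataStack`, the memory size and the base address are assumptions, and the D register is represented directly by the memory cell that A1 points to.

```python
class DataStack:
    """Software stack built on one A/D pair (hypothetical model)."""

    def __init__(self, size=256, base=0):
        self.memory = [0] * size  # assumption: 256 bytes of data memory
        self.a1 = base            # A1 is used as the stack pointer

    def push(self, value):
        self.memory[self.a1] = value & 0xFF         # set R1 D1
        self.a1 = (self.a1 + 1) % len(self.memory)  # add 1 A1

    def pop(self):
        self.a1 = (self.a1 - 1) % len(self.memory)  # add -1 A1
        return self.memory[self.a1]                 # set D1 R2


stack = DataStack()
for value in (1, 2, 3):
    stack.push(value)
popped = [stack.pop(), stack.pop(), stack.pop()]
```

The values come back in reverse order and A1 returns to its base address, which is what the call/return convention relies on.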
This is particularly pertinent for function calls and returns: the CALL opcode writes PC+1 to the SND, and you can define A1/D1 as your working call stack to implement the following example:
    ; The main program:
      set 42 R2 ; argument for myfunction
      call myfunction D1
      add 24 R3 ; do something
      ;...

    myfunction:
      add 1 A1 ; update the stack pointer
      ;... ; do something useful here
      call leaf_function D1
      ;...
      ; return:
      add -1 A1 ; restore the stack pointer
      set D1 PC ; jump back to the caller

    leaf_function:
      ; no other call here so no need to
      ; fiddle with the stack pointer
      ;... ; do something
      set D1 PC ; return
    .END
If you need extra registers, you can set the stack top to a free index (using the add 1 A1 function prolog) and use D1 as a short-term temporary value for some computations for example. So for a short while (as long as you don't call a function) you can have the equivalent of 4 general purpose registers. And if you call a function, just pre-increment the stack pointer before calling.
    somefunction:
      add 1 A1 ; prolog: update the stack pointer
      ; you can use D1 as a temporary register here
      ; until the next function call:
      call another_function D1
      ; D1 is available for temporary use here again.
      ; return / epilog:
      add -1 A1 ; restore the stack pointer
      set D1 PC ; jump back to the caller
Similarly, if the A/D registers are not committed to a stack, the A register can point to a "safe address" (such as -1?) and the D register can be freely used.
The D1 and D2 registers act as a cache of the last value written to memory at the given address, and will not change until another value is written by this data register, or the corresponding address register (A1/A2) is changed.
There is no alias detection if A1=A2 and the memory value is changed by D1 or D2: the other data register will not be updated. The value may be refreshed by reloading the address register.
    set 42 a1
    set 21 d1
    set a1 a2 ; -> d2=21
    set 77 d2 ; -> d2=77 but d1 is still 21
    set a1 a1 ; refresh => d1=77
Beware, though, because the refresh could be forced by a context switch, which does not save the data registers (saving them would be redundant since the address register is known, and skipping them saves a pair of bytes in the context buffer). After processing an interrupt, for example, the address registers are restored and will read a possibly new value from memory.
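The aliasing sequence above can be replayed with a small Python model. It is a sketch under assumptions: the class name `Port` is hypothetical, and both pairs are assumed to share a single 256-byte memory, which is the only situation where aliasing matters.

```python
class Port:
    """A/D register pair whose D register caches one memory cell (hypothetical model)."""

    def __init__(self, memory):
        self.memory = memory
        self.a = 0
        self.d = memory[0]

    def set_a(self, address):
        self.a = address
        self.d = self.memory[address]  # writing A (re)loads D

    def set_d(self, value):
        self.d = value & 0xFF
        self.memory[self.a] = self.d   # writing D updates memory only


memory = [0] * 256                     # assumption: one memory shared by both pairs
p1, p2 = Port(memory), Port(memory)

p1.set_a(42)           # set 42 a1
p1.set_d(21)           # set 21 d1
p2.set_a(p1.a)         # set a1 a2 ; d2 is loaded with 21
d2_after_alias = p2.d
p2.set_d(77)           # set 77 d2 ; memory changes, d1 is not updated
d1_stale = p1.d
p1.set_a(p1.a)         # set a1 a1 ; refresh
d1_refreshed = p1.d
```

In this model `d1_stale` is still 21 after the write through D2, and only the refresh brings D1 to 77. A context switch that restores A1 would perform the same reload.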
Of course, if the data banks are split or A1 and A2 point to different banks, none of this is a concern. But to summarise,
The most used opcodes are typically SET and ADD, supplemented by the conditional execution and the ability to write to PC which triggers a branch. With combinations of these, most common control structures are possible. A typical conditional branch looks like this:
    ; if R1==42 then...
      cmpu 42 R1
      add endofif-$ PC NZ
      ; do something useful here
      ; (only 6 instructions at most)
    endofif:
The forward branch can jump over 7-1=6 instructions at most. A longer jump would need the whole absolute destination address somewhere, in a different register, which could be any of the Rx or even the TOS like in this example:
    ; if R1==42 then...
      set endofif D1
      cmpu 42 R1
      set D1 PC NZ
      ; do something useful here that
      ; takes more than 6 instructions
    endofif:
As with early ARM cores, if only a few instructions are skipped and these are "core" instructions, you can simply predicate them all with the desired condition, thus saving jumps and address computations. Just make sure your predicated instructions don't affect the flags tested by the next ones!
Since YGREC8 can jump forward or backwards, it can loop and test any condition:
      set 42 R1 ; some loop counter
      set dumbloop R2
    dumbloop:
      ; do something long and hopefully useful here
      add -1 R1
      set R2 PC NZ ; loop back if R1 has not reached zero
Of course you can reduce the register pressure by setting the register at the last moment, if the longer/slower loop is allowed:
      set 42 R1 ; some loop counter
    loopentry:
      ; do something long and useful here
      set loopentry R2
      add -1 R1
      set R2 PC NZ ; end of loop
The YGREC8 offers a compact method to store constant data in the instruction space.
Each overlay contains 256 instructions or 512 bytes, split into two halves that are accessed by these instructions: LDCL (lower byte) and LDCH (higher byte). A couple of associated pseudo-operations create the data: .BL and .BH. The halves are currently managed separately to keep things simple.
    ConstantStrings:
      .BL "Hello world"
    EndOfFirstString:
      .ORG ConstantStrings ; go back to the address
        ; of the first byte created by BL and write
        ; to the upper byte.
      .BH "This is another test string"
    EndOfSecondString:
      ; Now go to the very end of the longest string:
      .ORG (EndOfFirstString UMAX EndOfSecondString)
      ; More constants or instructions here.

      ; Streams the first string to IO register #123:
      set ConstantStrings R1
      ldcl R1 R2 ; load the constant from program memory
      out 123 R2 ; send the constant to I/O
      add 1 R1
      cmpu EndOfFirstString R1
      add -4 PC IFNZ
      HLT
It is also possible to make the constant arrays more compact if the data are correctly interleaved and appropriate. The following hypothetical example shows pairs of bytes stored in instruction words:
    ConstantData:
      .DW 1234, 4567, 9012, 3456
    EndOfData:
      set ConstantData R1
      ldcl R1 R2
      out 123 R2
      ldch R1 R2
      out 123 R2
      add 1 R1
      cmpu EndOfData R1
      add -6 PC IFNZ
      HLT
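The layout of the two byte lanes can be sketched in Python. This is only a model under assumptions: `pack`, `ldcl` and `ldch` are hypothetical helper names mimicking the .BL/.BH directives and the two load instructions, each instruction word is taken as 16 bits (512 bytes for 256 instructions), and unused bytes are assumed to be padded with zero.

```python
def pack(low_bytes, high_bytes):
    """Interleave two byte strings into 16-bit words (.BL fills the low lane, .BH the high lane)."""
    length = max(len(low_bytes), len(high_bytes))
    low = low_bytes.ljust(length, b"\0")    # assumption: unused bytes are zero
    high = high_bytes.ljust(length, b"\0")
    return [(h << 8) | l for l, h in zip(low, high)]


def ldcl(words, address):
    """Model of LDCL: read the lower byte of the word at 'address'."""
    return words[address] & 0xFF


def ldch(words, address):
    """Model of LDCH: read the higher byte of the word at 'address'."""
    return words[address] >> 8


first_string = b"Hello world"
second_string = b"This is another test string"
words = pack(first_string, second_string)

# Stream each string back out, one byte per word address.
first = bytes(ldcl(words, i) for i in range(len(first_string)))
second = bytes(ldch(words, i) for i in range(len(second_string)))
```

Both strings occupy the same word addresses, so the constant area is only as long as the longest string, which matches the UMAX expression used with .ORG.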
LDCL and LDCH are currently limited to the current overlay, so each overlay needs its own constant section and code section, and the program must switch back and forth between overlays to transmit data that reside in a different overlay. This might evolve later.
TODO: LDCC selects the high/low byte with the carry flag for example
To be continued... Stay tuned!