The YGREC8's Programming Manual

Created Sat. 30 Sept. 2023 21:52:48 CEST by whygee@f-cpu.org
Version Sun. 12 Nov. 2023 03:31:36 CET




PRELIMINARY / WORK IN PROGRESS
It's not yet a reliable and definitive reference!

Check the latest news and updates at ygrec8.com
and download the latest version from src.ygrec8.com
or if you prefer git : gitlab.com/fhwd/ygrec8

 ©2023 Yann Guidon
all documentation published under CC BY-NC-SA 4.0 license.

Introduction

Before reading further, make sure you have also read:

The YGREC8's Main manual for the general overview of the project, the ISA and the programming model.
The YGREC8's Assembly Language manual for details and specifications for assemblers and disassemblers: programs that transform text into binary instructions or vice versa.

Basic programming

There are several layers of programming tools, with increasing syntax complexity. This part only addresses the most basic layer, where a basic interactive assembler considers one instruction at a time. It follows some old traditions while breaking away from others. Be sure to read the syntax rules in the YGREC8 Assembly Language manual!

The basics

Programs are terminated "gracefully" with the HLT pseudo-opcode (to differentiate from the INV opcode).

Unlike on some architectures (such as x86), you subtract a number by adding its negative. The SUB opcode negates the SND field, so if you write

SUB 4 R1

you will get the value 4-R1 in R1. If you really can't reorder the operands, use the PF opcode:

PF R2
SUB R2 R1 ; R2=R2-R1
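The operand order can be checked with a small Python model of 8-bit wrap-around arithmetic. This is only an illustrative sketch: the helper names add8 and sub8 are hypothetical, the flags are ignored, and the model assumes plain modulo-256 results.

```python
def add8(src, snd):
    """Model of ADD: SND plus the source operand, truncated to 8 bits."""
    return (snd + src) & 0xFF

def sub8(src, snd):
    """Model of SUB as described above: source operand minus SND, 8 bits."""
    return (src - snd) & 0xFF

r1 = 10
reversed_difference = sub8(4, r1)   # SUB 4 R1  computes 4 - R1
usual_difference = add8(-4, r1)     # ADD -4 R1 computes R1 - 4
print(reversed_difference, usual_difference)
```

Adding the negated constant gives the result most programmers expect from "subtract 4 from R1", while SUB with the same operands gives the reversed difference.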

Initial state

When a new overlay is loaded, PC=0 and that's all. Don't forget to .ORG 0 just in case.

Don't expect any value or meaning in any other register or flag.

A dedicated I/O register might indicate the reason(s) for reaching address 0: RESET, IRQ, Trap, OVL instruction, WatchDog Timer expiration, ...

Using the Flags (Condition Codes)

There are 3 Flags that are updated by the boolean, shift and arithmetic instructions:

C is the Carry Out flag, generated by the adder of the ALU when executing and committing these opcodes: ADD, SUB, CMPU, CMPS, RC. The prefix PF unconditionally writes to C, copying the selected condition. This flag controls the conditions IFC and IFNC.
S is the Sign flag, set by the boolean, shift and arithmetic instructions. It is a copy of the Most Significant Bit of the result (bit 7, or sign bit) so it is set to 1 when the result is negative (1xxxxxxx). This flag controls the conditions IFS and IFNS.
Z is the Zero flag, also set by the boolean, shift and arithmetic instructions. It is set to 1 when the result is 00000000. This flag controls the conditions IFZ and IFNZ.
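The three definitions above can be summarised by a minimal Python model of an 8-bit addition. It is a sketch under stated assumptions: the function name alu_add is hypothetical, only ADD is modelled (the carry convention of SUB and the compare opcodes is not covered), and the carry input defaults to 0.

```python
def alu_add(a, b, carry_in=0):
    """Add two 8-bit values and return (result, C, S, Z)."""
    total = (a & 0xFF) + (b & 0xFF) + carry_in
    result = total & 0xFF
    c = total >> 8          # carry out of bit 7
    s = result >> 7         # copy of the most significant bit of the result
    z = int(result == 0)    # 1 only when the result is 00000000
    return result, c, s, z

print(alu_add(0x7F, 0x01))  # result 0x80: S is set, C and Z are clear
print(alu_add(0x80, 0x80))  # result 0x00: C and Z are set, S is clear
```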

Unlike other architectures, and to keep the ISA minimalist, there is no "Add with Carry" or "Subtract with Borrow" opcode. The arithmetic instructions receive a default carry input signal that depends on the opcode. This input can be changed by the prefix PF opcode, placed before an arithmetic instruction. The following code adds the 16 bits in D1:D2 into R1:R2.

; 16-bit add:
ADD D1 R1
PF R2 IFC
ADD D2 R2

The prefix forces the carry input of the 3rd instruction to the value of the carry output of the first instruction. The destination field is a bit redundant (since it copies the SND field from the 3rd instruction) but can be helpful in other situations to reduce copies and register pressure.
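The carry chaining can be modelled in Python as two 8-bit additions. This is a sketch, not a description of the hardware: the function name is hypothetical, the first ADD is assumed to handle the low bytes (since its carry output feeds the second one), and the default carry input of a plain ADD is assumed to be 0.

```python
def add16_via_bytes(r_lo, r_hi, d_lo, d_hi):
    """Model of: ADD D1 R1 / PF R2 IFC / ADD D2 R2."""
    # ADD D1 R1: low bytes, default carry input assumed to be 0
    total = r_lo + d_lo
    r_lo, carry = total & 0xFF, total >> 8
    # PF R2 IFC + ADD D2 R2: high bytes, carry input taken from the C flag
    total = r_hi + d_hi + carry
    r_hi, carry = total & 0xFF, total >> 8
    return r_lo, r_hi, carry

# 0x01FF + 0x0001 = 0x0200: the carry ripples from the low to the high byte
print(add16_via_bytes(0xFF, 0x01, 0x01, 0x00))
```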

Using the Data Memory

Like the YASEP and unlike most other architectures, the YGREC8 has no LOAD/STORE instruction.

There are two data memory areas, addressed respectively by two pairs of registers: A1/D1 and A2/D2. The two areas are usually independent by default but some banking trick could remap them in future versions.

Let's say you need to add 42 to the value stored at address 123. With x86 you would write (in AT&T syntax):

movw $42, %ax
addw %ax, 123

A classic RISC processor can't perform read-modify-write operations, so they are split into separate instructions at the assembly level:

load [123] r1
add 42 r1 r1
store r1 [123]

Using register-mapped memory, the A register sets the address where the D register loads and stores its value as a side-effect of the operations:

set 123 A1
add 42 D1

Neither memory area is reserved for a stack: you manage the stack yourself and you can locate it anywhere you like. You can also have 2 stacks if you're a FORTHer. In any case, you need to increment and decrement the A register, and the TOS (Top Of Stack) is the corresponding D register.

; PUSH:
set R1 D1 ; Write R1 to the ToS
add 1 A1

; POP:
add -1 A1
set D1 R2 ; read the ToS into R2

This is particularly pertinent for function calls and returns: the CALL opcode writes PC+1 to the SND, and you can define A1/D1 as your working call stack to implement the following example:

; The main program:
set 42 R2   ; argument for myfunction
call myfunction D1
add 24 R3  ; do something
;...

myfunction:
add 1 A1  ; update the stack pointer

;...
; do something useful here
call leaf_function D1
;...

; return:
add -1 A1  ; restore the stack pointer
set D1 PC  ; jump back to the caller


leaf_function:
; no other call here so no need to
; fiddle with the stack pointer
;...
; do something
set D1 PC ; return

.END

If you need extra registers, you can set the stack top to a free index (using the add 1 A1 function prolog) and use D1 as a short-term temporary value for some computations for example. So for a short while (as long as you don't call a function) you can have the equivalent of 4 general purpose registers. And if you call a function, just pre-increment the stack pointer before calling.

somefunction:
add 1 A1  ; prolog: update the stack pointer

; you can use D1 as a temporary register here

; until the next function call:
call another_function D1

; D1 is available for temporary use here again.

; return / epilog:
add -1 A1  ; restore the stack pointer
set D1 PC  ; jump back to the caller

Similarly, if the A/D registers are not committed to a stack, the A register can point to a "safe address" (such as -1?) and the D register can be freely used.

Memory aliasing and data caching

The D1 and D2 registers act as a cache of the last value written to memory at the given address, and will not change until another value is written by this data register, or the corresponding address register (A1/A2) is changed.

There is no alias detection if A1=A2 and the memory value is changed by D1 or D2: the other data register will not be updated. The value may be refreshed by reloading the address register.

set 42 a1
set 21 d1

set a1 a2
; -> d2=21
set 77 d2
; -> d2=77 but d1 is still 21

set a1 a1
; refresh => d1=77

Beware, though: the refresh can also be forced by a context switch, which does not save the data registers (they are redundant once the address registers are known, and omitting them saves a pair of bytes in the context buffer). After processing an interrupt, for example, the address registers are restored and the data registers will read a possibly new value from memory.

Of course, if the data banks are split or A1 and A2 point to different banks, none of this is a concern. To summarise:

no alias detection mechanism is implemented,
changing the data when both A registers hold the same address does not automatically update the aliased D register,
D registers are volatile anyway (unless IRQs are disabled),
the value of a D register could change at any time (if there is a context switch and another value is written at that address),
data must be moved to a register to ensure it will not be changed spuriously.
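The aliasing behaviour can be reproduced with a short Python model of two A/D pairs in front of one memory. It is a sketch under assumptions: the class name Port, the method names and the single shared 256-byte bank are hypothetical choices for the illustration.

```python
class Port:
    """A/D pair that caches the last value seen at its address."""

    def __init__(self, memory):
        self.memory = memory
        self.addr = 0
        self.data = memory[0]

    def set_a(self, addr):
        self.addr = addr & 0xFF
        self.data = self.memory[self.addr]   # (re)load the cached value

    def set_d(self, value):
        self.data = value & 0xFF
        self.memory[self.addr] = self.data   # write-through, no alias check

memory = [0] * 256                 # assumed: one bank shared by both ports
p1, p2 = Port(memory), Port(memory)

p1.set_a(42)          # set 42 a1
p1.set_d(21)          # set 21 d1
p2.set_a(p1.addr)     # set a1 a2 -> d2 = 21
p2.set_d(77)          # set 77 d2 -> memory is updated, d1 is not
stale = p1.data       # d1 still holds the old value
p1.set_a(p1.addr)     # set a1 a1 -> refresh
fresh = p1.data       # d1 now matches memory
print(stale, fresh)
```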

Conditional branches and loops

The most used opcodes are typically SET and ADD, supplemented by the conditional execution and the ability to write to PC which triggers a branch. With combinations of these, most common control structures are possible. A typical conditional branch looks like this:

; if R1==42 then...
cmpu 42 R1
add endofif-$ PC IFNZ

; do something useful here
; (only 6 instructions at most)

endofif:

The forward branch can jump over 7-1=6 instructions at most. A longer jump would need the whole absolute destination address somewhere, in a different register, which could be any of the Rx or even the TOS like in this example:

; if R1==42 then...
set endofif D1
cmpu 42 R1
set D1 PC IFNZ

; do something useful here that
; takes more than 6 instructions

endofif:
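Choosing between the short relative form and the long absolute form comes down to a range check, sketched here in Python. The helper names are hypothetical, the limit of 7 is taken from the "7-1=6" remark above, and the model assumes that $ is the address of the branch instruction itself.

```python
MAX_FORWARD = 7   # largest forward offset implied by the text (7 - 1 = 6 skipped)

def skipped_instructions(here, target):
    """Number of instructions between the branch at `here` and `target`."""
    return target - here - 1

def fits_short_forward_branch(here, target):
    """True if `add target-$ PC` can encode the forward jump."""
    offset = target - here            # the value of the expression endofif-$
    return 0 < offset <= MAX_FORWARD

print(fits_short_forward_branch(10, 17), skipped_instructions(10, 17))
print(fits_short_forward_branch(10, 18), skipped_instructions(10, 18))
```

When the check fails, the absolute form with the destination held in a register is the fallback.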

As with early ARM cores, if only a few instructions are skipped and these instructions are "core" ones, you can simply predicate them all using the desired condition, thus saving jumps and address computations. Just make sure your predicated instructions don't affect the flags tested by the next ones!

Since YGREC8 can jump forward or backwards, it can loop and test any condition:

set 42 R1 ; some loop counter

set dumbloop R2
dumbloop:
 ; do something long and hopefully useful here

  add -1 R1
  set R2 PC IFNZ ; loop back if R1 has not reached zero

Of course you can reduce the register pressure by setting the register at the last moment, if the longer/slower loop is allowed:

set 42 R1 ; some loop counter

loopentry:
 ; do something long and useful here

  set loopentry R2
  add -1 R1
  set R2 PC IFNZ
; end of loop

Constant arrays in the code space

The YGREC8 offers a compact method to store constant data in the instruction space.

Each overlay contains 256 instructions, or 512 bytes, split into two halves that are accessed by two instructions: LDCL (lower byte) and LDCH (higher byte). A pair of associated pseudo-operations creates the data: .BL and .BH. The halves are currently managed separately to keep things simple.

ConstantStrings:
.BL "Hello world"
EndOfFirstString:

.ORG ConstantStrings ; go back to the address
  ; of the first byte created by .BL and write
  ; to the upper bytes.
.BH "This is another test string"
EndOfSecondString:

; Now go to the very end of the longest string:
.ORG (EndOfFirstString UMAX EndOfSecondString)

; More constants or instructions here.

; Streams the first string to IO register #123:

set ConstantStrings R1
  ldcl R1 R2 ; load the constant from program memory
  out 123 R2 ; send the constant to I/O
  add 1 R1
  cmpu EndOfFirstString R1
  add -4 PC IFNZ

HLT
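The layout produced by .BL, .BH and the UMAX expression can be pictured with a Python model of the instruction words. This is a sketch under assumptions: the function names are hypothetical, characters are taken as ASCII, and no terminator byte is appended to the strings.

```python
def pack_halves(low_text, high_text, base=0, size=256):
    """Store one string in the lower bytes and one in the upper bytes."""
    words = [0] * size                       # one overlay: 256 16-bit words
    for i, byte in enumerate(low_text.encode("ascii")):
        words[base + i] |= byte              # .BL fills the lower byte
    for i, byte in enumerate(high_text.encode("ascii")):
        words[base + i] |= byte << 8         # .BH fills the upper byte
    return words, base + len(low_text), base + len(high_text)

def ldcl(words, addr):
    return words[addr] & 0xFF                # lower byte of the word

def ldch(words, addr):
    return words[addr] >> 8                  # upper byte of the word

words, end_first, end_second = pack_halves(
    "Hello world", "This is another test string")
next_free = max(end_first, end_second)       # what the UMAX expression selects
first = bytes(ldcl(words, a) for a in range(end_first)).decode("ascii")
second = bytes(ldch(words, a) for a in range(end_second)).decode("ascii")
print(first, "|", second, "|", next_free)
```

Both strings start at the same address and occupy different halves of the same words, which is why the code must resume after the end of the longer one.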

The constant arrays can be made more compact when the data lend themselves to being interleaved across both halves. The following hypothetical example shows pairs of bytes stored in instruction words:

ConstantData:
.DW 1234, 4567, 9012, 3456
EndOfData:

set ConstantData R1
  ldcl R1 R2
  out 123 R2
  ldch R1 R2
  out 123 R2
  add 1 R1
  cmpu EndOfData R1
  add -6 PC IFNZ

HLT

LDCL and LDCH are currently limited to the current overlay, so each overlay needs its own constant section alongside its code section. Data that reside in a different overlay can only be transmitted by switching back and forth between overlays. This might evolve later.

TODO: LDCC selects the high/low byte with the carry flag for example

To be continued... Stay tuned!