The YGREC8's Manual

Created Sat. 30 Sept. 2023 21:52:48 CEST by whygee@f-cpu.org
Version Sat. 16 Dec. 2023 15:19:49 CET


PRELIMINARY / WORK IN PROGRESS
Things evolve and could be changed without notice.


Check the latest news and updates at ygrec8.com
and download the latest version from src.ygrec8.com
or if you prefer git : gitlab.com/fhwd/ygrec8

 ©2023 Yann Guidon
all documentation published under CC BY-NC-SA 4.0 license.

Dedicated to all those who RTFM.

Introduction: motivation, scope and purposes

Along with this document, you should also read:

The YGREC8's Integration manual describes the structure, scheduling, architecture and lower-level circuits of the first integrated circuit implementation.
Look at the YGREC8's Programming manual for example code sequences, program samples and coding idioms.
The YGREC8's Assembly Language manual gives details and specifications for assemblers and disassemblers: programs that transform text into binary instructions or vice versa.

This manual is a first glance at the YGREC8, its architecture, its programming model, its design, presenting the overall principles and goals, without going too deep into the details because they might change with later revisions and generations. It summarises six years of exploration, enhancements, adjustments, failed and successful attempts, and the choices and the compromises that have stood up so far while keeping the goals and constraints in mind. At the time of writing, there is no place for bikeshedding, feature creep or considering alternatives. The YGREC8's definition must be frozen and implemented, at least in a first working version.

The YGREC8 is a minimalist 8-bit microprocessor core. It is meant to be the smallest possible baseline for a microcontroller with the minimum number of gates, while still being useful and convenient to write code for. Despite the tiny code space, it does not try to squeeze all the code density in the world, since the development time and complexity would increase (for example, the "ADD Imm4 carry trick" has been deprecated). And too many quirks would make the YGREC8 a pain to use in practice. Unlike some more classical and recent microcontrollers and microprocessors, the YGREC8's ISA is not driven by a deep analysis of compiler-generated listings, but by the good and bad experiences of programming these. Sometimes you don't understand or need a feature, another time you wish you had a very simple and trivial one, and a balance had to be found under strict size constraints. YGREC8 is far from perfect but it is meant to work well enough and be convenient to implement.

The YGREC8 reuses features as much as possible to remain tiny: exploiting corner cases (such as with the CALL opcode becoming OVL) enhances the system while adding the fewest gates. The core could do the work of a PIC16F for example, without being as contrived or awkward (hopefully). It is still purposefully limited: if you need more (performance, room, speed or features), use a better and larger system such as the YASEP microcontroller.

Indeed, the YGREC8's architecture is mostly inspired by the YASEP, shaven to its bare minimum with only 8 registers, including PC. With its orthogonality and in the absence of LOAD, STORE, BRANCH, or even multi-byte words, the instruction set contains only 20 instructions and the bare core uses about 600 3-input gates (not counting memory blocks). This quickly increases as you add features and peripherals.

The core is simple enough that focus can switch to:

development tools, validation and methodologies,
actually coding something and using the result in practice;
implementation in any possible boolean-capable technology (relays, transistors, SSI TTL, ECL, LVI, FPGA, ASIC, you name it),
reliability/safety/yield, debug and design-for-test,
I/O and other miscellaneous features...

Let's emphasize this aspect again. Though performance is not forgotten, it is not designed to be fast or powerful, but small, useful, efficient (low resources, energy-conscious) and easy to implement. It's at least "good enough" to work, so you can implement simple games like Tetris, Snake or Pong without fearing a mental breakdown. It is an ideal companion core for a more complex processor: it would handle hardware events, manage interrupts, implement a convenient bootloader, detect button presses, handle crude sound effects or check the environment's health (temperature, power...).

The unspoken assumption here is that the whole project, its development, the files, the team, follow the traditions and ethos of the Free Software and Open Source Software communities. The YGREC is a Free HardWare Design, using Copyleft licences and free from patents to allow anyone to use it and participate. Unlike the RISC-V project, you are encouraged to "give back" and publish all your enhancements, so others can benefit from them just as you benefitted in the first place. While some proprietary software might be required here and there, it is avoided as much as possible, so the source code can really be freely used. This also means a strong adherence to industry standards and practices (YGREC8 uses VHDL extensively).

Assembly language is required to pack enough useful data and programs in the very tight space available to the core. Programming the Y8 is a bit different and no compiler is available yet but this manual will show it's not hard.

Happy reading !

Architecture overview and ISA

The general layout

The YGREC8 is the result of the reduction of several experimental architectures, such as the YASEP and the YGREC16. It is inspired by RISC ideas supplemented by some Processor Design Principles. There might even be a trace of CDP1802 somewhere.

The YGREC8 implements several independent addressing spaces:

Program memory : 256 instructions pointed to only by the 8-bit PC register or LDCx instructions. When required, this 512-byte space may be extended by crude "overlays" (explained later). In fact: each block of 256 instructions is (by extension) called an overlay. This is not a "page" as in other 8-bitters (such as 6800 or 6502) since executing a different overlay might require loading the whole tiny program space from outside, like in the early minicomputers of the 50's or 60's.
Data memory : two windows of 256 bytes onto local data, addressed respectively and directly by the 8-bit A1 and A2 registers. Both windows could point to the same area but are typically disjoint, to provide a separate stack and more space (and it is a bit easier to implement at first). Thus the default configuration has a total space of 512 bytes.
I/O registers : accesses 512 bytes (using linear immediate addresses only) implemented as registers for GPIO, peripheral, scratch registers, coprocessors, mailboxes, core configuration bits... This is where you put everything that breaks orthogonality or might evolve, without touching the core design.

YGREC8 uses a Harvard organisation for several reasons:

It increases onchip bandwidth (hence performance) while reducing the arbitration costs and delays,
The instruction and data spaces can be implemented in different technologies (ROM, Flash, EPROM, SRAM, MRAM, DRAM, core memory, rope memory, diode arrays, you name it) to fit special needs,
With 8-bit pointers, there would not be enough space to store both data and instructions,
It adapts the granularity of the elements and saves LSB in the addresses and pointers,
While the code can read its own addressing space, it can not write it, enhancing safety and security (preventing accidental or malevolent behaviour).

The Instruction Set

Among other features, the instruction set:

is pretty orthogonal,
uses register-mapped memory access (not a load-store architecture),
includes PC as a standard register to handle jumps and branches,
and 10 of the 13 "core opcodes" have predication bits to make their execution conditional.

To accommodate this, the register set has 8 registers of 4 types, with the following map:

000    D1   \__  Data Memory window 1
001    A1   /
010    D2   \__  Data Memory window 2
011    A2   /

100    R1   \
101    R2    >--- 3 "normal" registers
110    R3   /

111    PC

The D registers are where you read from or write Data to memory. Usually, writing to D1 will also write the same byte to the 1st data memory port at the address given by A1. These registers can be considered as "write-through cache" bytes. The value will also change if you write to the corresponding A register.
The A registers are the Address registers. Changing them will trigger a read cycle from the corresponding bank and update the corresponding D register (after a possible stall/delay).
The R registers have no side effect.
The PC is the Program Counter and is a read-write register. Reading it will provide the address of the current instruction, useful for PC-relative branches. Writing to it will cause a "jump". The CALL instruction accesses a special version of the register that is pre-incremented, in order to later return to the next instruction.
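
The side effects of the D and A registers can be sketched as a small behavioural model. This is only an illustration under stated assumptions: the class and method names are hypothetical, the two windows are taken as disjoint (the default configuration), stalls are ignored and the special behaviour of PC is not modelled.

```python
# Hypothetical behavioural model of the register map (not the real RTL).
D1, A1, D2, A2, R1, R2, R3, PC = range(8)

class RegisterFile:
    def __init__(self):
        self.regs = [0] * 8
        # Two 256-byte data windows, assumed disjoint here.
        self.mem = {A1: bytearray(256), A2: bytearray(256)}

    def read(self, r):
        return self.regs[r]

    def write(self, r, value):
        value &= 0xFF
        self.regs[r] = value
        if r in (A1, A2):
            # New address: the paired D register is refreshed from memory.
            self.regs[r - 1] = self.mem[r][value]
        elif r in (D1, D2):
            # Write-through: the byte also goes to memory at the paired A address.
            a = r + 1
            self.mem[a][self.regs[a]] = value
        # R1..R3 have no side effect; a write to PC (a jump) is not modelled.

rf = RegisterFile()
rf.write(A1, 0x10)   # point window 1 at address 10h
rf.write(D1, 0x55)   # store 55h there
rf.write(A1, 0x11)   # D1 now reflects address 11h
rf.write(A1, 0x10)   # back to 10h: D1 reads the stored byte again
```

With this model, moving a byte from one window to the other is a plain register-to-register copy between D1 and D2, which is why no LOAD or STORE opcode is needed.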

The instruction word fits in 16 bits by using only two 3-bit register addresses. To remove one field and save coding space, the YGREC8 is initially a 1R1W machine (like x86), unlike its more RISCy big brothers. This is extended, when needed, by using the prefix PF to override the destination address of the following instruction. The two default register address fields are:

SND means "Source - Negated - Destination" : This field is used as a source, which can be negated (for ANDN, SUB and CMPx). It is the default destination address (unless overridden by PF).
SRI is either a source register address or an immediate value, depending on the I/R flag. The other format flag I/R2 selects the size of the immediate field : 8 bits or 4 bits with conditions.

The instruction format is designed to keep all the fields as aligned as possible, to reduce the decoding logic and routing multiplexers. For example, all the Least Significant Bits of the constant fields fall on the same place (bit 3). SND is the default register field when no other value is needed and doesn't move, just like the main opcode field (bits 15:12). Some non-core opcodes have their functions encoded in place of the condition field when it is not required, to save some space. Thus the encoding can't be totally orthogonal because of the size constraints: only the most common opcodes have the full set of options, while infrequent operations lack the 8-bit immediate mode or have no condition at all.

Here are more explanations about the fields :

OPCODE: (bits 15:12) Defines the operation to execute, the units to enable, the routing of data and the scheduling of the core. Opcodes 101x are extended by FUNC2 at the cost of fewer options.
Imm9: (bits 11:3) is the address for the IN and OUT opcodes.
I/R: (bit 11) for the first 10 core opcodes, defines the value of the SRI operand. When 0: look up the I/R2 flag, otherwise copy Imm8.
Imm8: (bits 10:3) Immediate 8-bit value for SRI operand.
I/R2: (bit 10) For the opcodes 0000 to 1010 this defines if the SRI operand is Imm4 or the register given by the SRI field.
N: (bit 9) For the first 10 core opcodes (and also PF), this flag inverts the result of the condition given by the CND field.
CND: (bits 8:6) Selects the condition for the predicated operations (and the source of the carry flag for the PF opcode). This signal is then negated by the N flag. When I/R2 is 1 (Imm4 mode) then bit 6 (CND[0]) is cleared. See later for more details.
FUNC2: (bits 9:7) Used by the extended opcodes (OPCODE=101x) and replaces the condition field with more opcodes. I/R2 is still used but I/R is disabled (no 8-bit mode) so that's potentially 32 extended opcodes with all fields combined.
SRI: (bits 5:3) (see above) Address of the source register.
SND: (bits 2:0) (see above) Address of the source and/or destination register.
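
The field positions listed above translate directly into shifts and masks. The following Python sketch extracts all of them at once; the function name decode_fields is hypothetical, and overlapping fields are all returned because their meaning depends on the opcode and on the I/R and I/R2 flags.

```python
def decode_fields(word):
    """Split a 16-bit instruction word into the fields listed above."""
    return {
        "OPCODE": (word >> 12) & 0xF,    # bits 15:12
        "Imm9":   (word >> 3) & 0x1FF,   # bits 11:3
        "I/R":    (word >> 11) & 1,      # bit 11
        "Imm8":   (word >> 3) & 0xFF,    # bits 10:3
        "I/R2":   (word >> 10) & 1,      # bit 10
        "N":      (word >> 9) & 1,       # bit 9
        "CND":    (word >> 6) & 0x7,     # bits 8:6
        "FUNC2":  (word >> 7) & 0x7,     # bits 9:7
        "SRI":    (word >> 3) & 0x7,     # bits 5:3
        "SND":    word & 0x7,            # bits 2:0
    }

# A word built by hand: opcode 1000, I/R=1, Imm8=42h, SND=100
word = (0b1000 << 12) | (1 << 11) | (0x42 << 3) | 0b100
fields = decode_fields(word)
```

Note how every constant field has its LSB at bit 3, so a single shift serves Imm9, Imm8 and SRI.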

As of this version, there are 19 main opcodes, plus INV, and some aliases like OVL, HLT and NOP. Most share two instruction forms : either an IMM8 field, or a source & condition field. The source field can also be a register or a short sign-extended immediate field that is 4 bits wide only, but it is essential for conditional short jumps or increments/decrements.
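
As a rough illustration of the three operand forms for the predicated core opcodes, the sketch below selects the SRI operand from the I/R and I/R2 flags. The function name sri_operand and the regs list are hypothetical; the sign extension follows the -8 to 7 range given for Imm4.

```python
def sri_operand(word, regs):
    """Return the 8-bit SRI operand of a core instruction (sketch only)."""
    if (word >> 11) & 1:                 # I/R = 1: 8-bit immediate
        return (word >> 3) & 0xFF
    if (word >> 10) & 1:                 # I/R2 = 1: 4-bit immediate, sign-extended
        imm4 = (word >> 3) & 0xF
        return (imm4 - 16 if imm4 & 8 else imm4) & 0xFF
    return regs[(word >> 3) & 0x7] & 0xFF   # register addressed by SRI

regs = [0] * 8
regs[5] = 0x99                           # R2 holds 99h
minus_one = sri_operand((1 << 10) | (0xF << 3), regs)   # Imm4 = 1111
```

An Imm4 of 1111 yields FFh (that is, -1), which is what makes short decrements and backward jumps fit in the 4-bit form.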

The main opcode field has 4 bits and the following values:

Boolean group

These core opcodes are predicated and update the Zero and MSB flags. The SRI operand can be Imm8, Imm4 or Register:

0000    OR      SND =  SND | SRI
0001    XOR     SND =  SND ^ SRI
0010    AND     SND =  SND & SRI
0011    ANDN    SND = ~SND & SRI

Arithmetic group:

These core opcodes are also predicated. They update the Carry, Zero and MSB flags. The SRI operand can be Imm8, Imm4 or Register:

0100    CMPU          -SND + SRI  (unsigned)
0101    CMPS          -SND + SRI  (signed)
0110    SUB     SND = -SND + SRI
0111    ADD     SND =  SND + SRI

Taken with the boolean group, the SND operand is XORed with ~b15 & ( b14 ^ (b13 & b12) ) of the instruction word (this explains why ADD is last in the list).
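
The expression can be checked against the two opcode tables with a few lines of Python. The function name snd_negated is hypothetical; the bit names b15 to b12 are those of the opcode field in the instruction word.

```python
def snd_negated(opcode):
    """Return 1 when the SND operand is inverted, given the 4-bit opcode."""
    b15, b14, b13, b12 = [(opcode >> i) & 1 for i in (3, 2, 1, 0)]
    return (b15 ^ 1) & (b14 ^ (b13 & b12))

NAMES = ["OR", "XOR", "AND", "ANDN", "CMPU", "CMPS", "SUB", "ADD"]
negating = [NAMES[op] for op in range(8) if snd_negated(op)]
print(negating)
```

The list contains exactly ANDN, CMPU, CMPS and SUB: placing ADD at 0111 is what lets the inner term b13 & b12 cancel b14 for that opcode only.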

Control group:

These last core opcodes complete the list of necessary instructions. Only SET and CALL are predicated and can use the Imm8, Imm4 or Register operands. IN and OUT can only use the IMM9 immediate field.

1000    SET      SND = SRI              (formerly known as MOV)
1001    CALL     SND = PC+1 ; PC = SRI  (also maps to OVL and HLT when SND=PC)

1100    IN       SND = IO[imm9]
1101    OUT      IO[imm9] = SND
1110    PF       Prefix : change default behaviour for the next instruction.

Optional opcodes

The following opcodes are not essential but would be very useful in certain situations. They share the same 101x opcode prefix and don't support the Imm8 or condition fields. I/R2 is supported so SRI can be a register (bit 6 is ignored!) or a 4-bit immediate value.

Shift/Rotate group

These opcodes update the Zero and MSB flags. Shift direction is given by the sign of shift: 0001 to 0111 shift left (multiply by power of two) and 1001 to 1111 shift right (divide by a power of two). 0000 and 1000 do nothing. The higher half of the SRI register operand is ignored.

10100x000 SH : SND = SND shifted by SRI bits
10100x001 SA : SND = SND arithmetic-shifted by SRI bits
10100x010 RO : SND = SND rotated by SRI bits
10100x011 RC : SND = SND rotated by SRI bits through the Carry flag (updates it).

Program lookup group

These opcodes read into the program memory, where constant tables could be read in-place or stored before being moved to data memory for example. This is a bit more elegant and 2× denser than the PIC16's RETLW hacks. This increases the complexity of the core's scheduling but it's a smaller price to pay when it is really needed.

10100x100  LDCL  SND = lower byte PM[SRI]
10100x101  LDCH  SND = higher byte PM[SRI]

TODO: LDCC selects the high/low byte with the carry flag for example
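
A minimal sketch of how a byte table could be packed into program words and read back with LDCL and LDCH. The helper pack_table and the low-byte-first order are assumptions made for illustration only; the program memory is modelled as a list of 16-bit words.

```python
def pack_table(data):
    """Pack bytes two per 16-bit program word, low byte first (assumed order)."""
    if len(data) % 2:
        data = data + b"\x00"            # pad to a whole word
    return [data[i] | (data[i + 1] << 8) for i in range(0, len(data), 2)]

def ldcl(pm, addr):
    """LDCL: lower byte of PM[addr]."""
    return pm[addr & 0xFF] & 0xFF

def ldch(pm, addr):
    """LDCH: higher byte of PM[addr]."""
    return (pm[addr & 0xFF] >> 8) & 0xFF

pm = pack_table(b"\x34\x12\x78\x56")     # two words: 1234h, 5678h
```

Each word carries two table bytes, which is where the 2× density over one-constant-per-instruction tables comes from.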

More opcodes

The YGREC8 has room for more 1R1W opcodes in the extended range 101x, without the Imm8 or condition features.

1010 will get new non-core functions in the future.
1011 can be a sandbox for experimentation.

However, new opcodes should not affect the core's scheduling or pipeline. A single-bit multiplication or division step passing through the ALU would fit, but complex operations like full-word multiplication or division are better suited for the I/O register space. Don't clutter the core, Keep It Simple.

Opcode aliases

As mentioned in other places, some opcodes can have corner cases, which get their own opcodes:

NOP is encoded as 0000h. This is a chosen special case of OR D1 D1 NEVR (see below).
OVL is a special case of CALL xx PC. See the end of the manual.
HLT is a special case of OVL FFh (CALL FFh PC or any CALL PC with SRI=FFh). This is how you "properly" terminate a program.
INV is not really an alias but it is usually coded FFFFh. Other opcodes in the Fxxxh range could be implemented in the future but so far they all map to INV.

More might appear soon.
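
A disassembler could recognise these aliases statically, as in the following sketch. The function name alias_of is hypothetical. It assumes that HLT is reported only when the FFh operand is visible in the instruction word (Imm8 = FFh, or Imm4 = 1111 which sign-extends to FFh); a register operand is only known at run time, so that form is reported as OVL.

```python
PC = 0b111

def alias_of(word):
    """Return the alias name for an instruction word, or None (sketch only)."""
    opcode = (word >> 12) & 0xF
    if word == 0x0000:
        return "NOP"
    if opcode == 0b1111:                      # whole Fxxxh range maps to INV
        return "INV"
    if opcode == 0b1001 and (word & 0x7) == PC:
        i_r, i_r2 = (word >> 11) & 1, (word >> 10) & 1
        if i_r and ((word >> 3) & 0xFF) == 0xFF:
            return "HLT"                      # CALL FFh PC
        if not i_r and i_r2 and ((word >> 3) & 0xF) == 0xF:
            return "HLT"                      # Imm4 = -1 (assumed equivalent)
        return "OVL"                          # any other CALL xx PC
    return None

halt = (0b1001 << 12) | (1 << 11) | (0xFF << 3) | PC   # encodes as 9FFFh
```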

Conditions

For the first 10 opcodes, the COND field has either 3 bits (when using a 4-bit immediate operand) or 4 bits (register operand); the latter adds a bit to directly test binary input signals. These 4 extra signals (called B0, B1, B2 and B3) may come from outside pins or internal peripherals, depending on user-defined configuration. The source could even be selected on the fly with a big multiplexer (configured by a set of IO registers).

All conditions can be negated by the N field. There are 4 basic and common conditions and 4 extended ones:

 - Always
 - C (Carry flag)
 - S (Sign, MSB of the last computation)
 - Z (Zero, all bits cleared)
 - B0, B1, B2, B3  (only in Register-Register form)

(notice the mnemonic trick: the condition names ACSZ are in alphabetical order)

The opcode map is organised along these two rules:

Instruction code 0000h should map to NOP (alias for OR D1 D1 NEVR), and the NEVER condition, hence ALWAYS is coded as 1 and N is negated.
Instruction code FFFFh should map to INV, which traps or reboots the core (through the overlay mechanism): condition is implicitly ALWAYS because it was an Imm8 format initially.

This creates the following table, when using the Imm4 operand:

000 Never    (instruction is not executed)
001 Carry=0
010 Sign=0   (positive signed number)
011 Zero=0   (last result had at least one set bit)
100 Always   (default: instruction is executed)
101 Carry=1
110 Sign=1   (negative signed number)
111 Zero=1   (last result had all bits cleared)

When using a register operand, the LSB of the condition selects either the internal condition flags or one of the four external sources. The bit 6 of the instruction (COND field's LSB) is ANDed with the negated I/R2 flag at bit 10, so the extended conditions are disabled in Imm4 mode:

0000  NEVR   Never (instruction is not executed or committed, like a NOP)
0001  IFN0   B0=0
0010  IFNC   Carry=0
0011  IFN1   B1=0
0100  IFNS   Sign=0 (positive signed number, aliased as IFP?)
0101  IFN2   B2=0
0110  IFNZ   Zero=0 (last result had at least one set bit)
0111  IFN3   B3=0
1000  ALWS   Always (the default condition so writing it is not required)
1001  IF0    B0=1
1010  IFC    Carry=1
1011  IF1    B1=1
1100  IFS    Sign=1 (negative signed number)
1101  IF2    B2=1
1110  IFZ    Zero=1 (last result had all bits cleared)
1111  IF3    B3=1

The keywords in the second column are the corresponding assembly language keywords, chosen to fit in 4 characters (which is a nice touch for electromechanical input or display).
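
The two condition tables can be condensed into one evaluation function. This is a sketch under assumptions: the name condition_met and its argument layout are hypothetical, and the Imm8 form is treated as always executed since it has no condition field.

```python
def condition_met(word, carry, sign, zero, b=(0, 0, 0, 0)):
    """Evaluate the predicate of a core instruction word (sketch only)."""
    if (word >> 11) & 1:              # Imm8 form: no condition field, always runs
        return True
    i_r2 = (word >> 10) & 1
    n = (word >> 9) & 1
    cnd = (word >> 6) & 0x7
    ext = (cnd & 1) & (i_r2 ^ 1)      # bit 6 is masked in Imm4 mode
    sel = cnd >> 1                    # CND[2:1] picks the flag or the Bx input
    flag = b[sel] if ext else (1, carry, sign, zero)[sel]
    return bool(flag) == bool(n)      # N=1 tests for 1, N=0 tests for 0

def cond_word(n, cnd, i_r2=0):
    """Build a word holding only the condition bits, for experiments."""
    return (i_r2 << 10) | (n << 9) | (cnd << 6)

nevr = condition_met(cond_word(0, 0b000), carry=0, sign=0, zero=0)
alws = condition_met(cond_word(1, 0b000), carry=0, sign=0, zero=0)
```

Code 0000 selects the constant 1 and tests it for 0, which can never succeed: this is how the all-zero word becomes NOP.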

Instruction set coverage, corner cases, redundancies and invalid codes

The YGREC8 prioritises simplicity over coding density. It is a delicate balance: in pure RISC tradition, keeping the core simple makes it fast and manageable. Orthogonality lets many instructions share the same circuits and reduces the coding complexity. Increasing density (to improve program size) would complicate the decoder and the programming experience. Typically, RAM cells are smaller than logic cells so it's a somewhat sensible argument.

The orthogonality has its downsides though, in particular with the degenerate combinations as well as absurd ones. They are handled either as "don't care" or errors, as potential future extensions or as invalid conditions.

Don't care and degenerate codes

There are cases where one assembly instruction can be translated into two different encodings: 10 core instructions with the I/R flag overlap 16 codes each, creating 160 degenerate instructions. This is not a significant loss and no correction or exploitation is considered. For the range -8 to 7, the Imm4 form is prioritised, but there is no need or benefit in handling this Imm8 range differently.
In practice, the NEVR condition is only useful for the NOP alias. This is used by 10 core opcodes, that are thus inhibited, resulting in a hole of 10*192 (-1) = 1919 unused opcodes, or 2.9% of the opcode map. Like with previous architectures, these conditions could be used later to add features (to be determined). For now, stay away from these.
The CALL opcode makes no sense when the destination is PC so this case is allocated to the OVL pseudo-code. It inherits the same forms: Imm8 or Imm4+CND or SRI+CND.
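
The size of the NEVR hole can be re-derived by brute force over the whole opcode map. The sketch below assumes the field layout given earlier (Imm4 form: N and CND[2:1] at bits 9:7; register form: N and CND at bits 9:6); the function name is_nevr is hypothetical.

```python
def is_nevr(word):
    """True when a predicated core opcode carries the NEVR condition."""
    opcode = word >> 12
    if opcode > 0b1001:               # only the first 10 opcodes are predicated
        return False
    if (word >> 11) & 1:              # Imm8 form: no condition field
        return False
    if (word >> 10) & 1:              # Imm4 form: N and CND[2:1] all cleared
        return ((word >> 7) & 0x7) == 0
    return ((word >> 6) & 0xF) == 0   # register form: N and CND[2:0] all cleared

nevr_codes = [w for w in range(1 << 16) if is_nevr(w)]
unused = len(nevr_codes) - 1          # NOP (0000h) is the one useful case
share = 100 * unused / (1 << 16)      # percentage of the opcode map
```

Each opcode contributes 128 codes in Imm4 form and 64 in register form, hence the 192 per opcode.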

Reserved or Invalid codes

The INV opcode is obviously invalid.
The 1011 opcode range is reserved and will trap like INV.
The primary extended opcode range 1010 has unallocated opcodes and does not use the bit 6 (used by bit 3 of Imm4) in the SRI form.
The PF opcode has 5 unallocated bits that must remain cleared. Out of 4096 codes, only 128 are used.

Total : 4096 + 4096 + (4096-128) + (4096-(6*192)) = 15104 invalid opcodes. Many of them might be used in the future.

Decoding invalid instructions

The above census provides the data required to write the boolean equation that flags an instruction as invalid. Here it is in VHDL:

--TRAP if
    OPCODE=Op_RSVD
or  OPCODE=Op_INV
or (OPCODE=Op_PF and   -- PF opcode with IR or IR2 or SRI not cleared
      instruction_word(11 downto 10) & instruction_word( 5 downto 3) /= "00000")
or (OPCODE=Op_EXT1 and
     (( instruction_word(11)='1' or instruction_word(9 downto 8)="11" )   -- 10 EXT1 reserved opcodes
  or  ( instruction_word(10)='0' and instruction_word(6)='1'))) -- unused bit when SRI

Overall, about one fourth of the opcode space can be reassigned in the future, which is a fair proportion for a future-proof design. Funky features should be relegated to the IO register space but there is some space left for regular circuits that fit the scheduling and simplicity constraints of the core.
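
To cross-check the census against the trap equation, here is a Python transcription of the VHDL condition that counts the trapping codes over all 65536 instruction words. The numeric values of the opcode constants are taken from the opcode tables of this manual (1010 for the primary extended range, 1011 reserved, 1110 for PF, 1111 for INV); the Python names are hypothetical.

```python
OP_EXT1, OP_RSVD, OP_PF, OP_INV = 0b1010, 0b1011, 0b1110, 0b1111

def bit(word, n):
    return (word >> n) & 1

def traps(word):
    """Python port of the TRAP equation (sketch, not the reference RTL)."""
    opcode = (word >> 12) & 0xF
    if opcode in (OP_RSVD, OP_INV):
        return True
    if opcode == OP_PF:
        # I/R, I/R2 and the SRI field must all be cleared
        return ((word >> 10) & 0b11) != 0 or ((word >> 3) & 0b111) != 0
    if opcode == OP_EXT1:
        reserved = bit(word, 11) == 1 or ((word >> 8) & 0b11) == 0b11
        unused_bit = bit(word, 10) == 0 and bit(word, 6) == 1
        return reserved or unused_bit
    return False

invalid = sum(traps(w) for w in range(1 << 16))
print(invalid, round(100 * invalid / (1 << 16), 1))
```

The count matches the total of the census (15104, about 23% of the map); adding the 1919 inhibited NEVR codes brings the reassignable share to roughly one fourth.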

To be continued...

Now have a look at the YGREC8's Integration manual, the Programming manual or the Assembly Language manual.