# OpenTitan Big Number Accelerator (OTBN) Instruction Set Architecture

This document describes the instruction set for OTBN. For more details about the processor itself, see the OTBN Technical Specification. In particular, this document assumes knowledge of the Processor State section from that guide.

The instruction set is split into base and big number subsets. The base subset (described first) is similar to RISC-V’s RV32I instruction set. It also includes a hardware call stack and hardware loop instructions. The big number subset operates on the 256b Wide Data Registers (WDRs). It includes no control flow instructions and supports only load/store, logical, and arithmetic operations.

# Base Instruction Subset

The base instruction set of OTBN is a limited 32b instruction set. It is used together with the 32b wide General Purpose Register file. The primary use of the base instruction set is the control flow in applications.

The base instruction set is an extended subset of RISC-V’s RV32I_Zicsr. Refer to the RISC-V Unprivileged Specification for a detailed instruction specification. Not all RV32 instructions are implemented. The implemented subset is shown below.

## ADD

ADD <grd>, <grs1>, <grs2>


This instruction is defined in the RV32I instruction set.

| | 31:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|---|
| ADD | 0000000 | grs2 | grs1 | 000 | grd | 0110011 |

## ADDI

ADDI <grd>, <grs1>, <imm>


This instruction is defined in the RV32I instruction set.

| | 31:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|
| ADDI | imm | grs1 | 000 | grd | 0010011 |

## LUI

LUI <grd>, <imm>


This instruction is defined in the RV32I instruction set.

| | 31:12 | 11:7 | 6:0 |
|---|---|---|---|
| LUI | imm | grd | 0110111 |

## SUB

Subtract.

SUB <grd>, <grs1>, <grs2>


This instruction is defined in the RV32I instruction set.

| | 31:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|---|
| SUB | 0100000 | grs2 | grs1 | 000 | grd | 0110011 |

## SLL

Logical left shift.

SLL <grd>, <grs1>, <grs2>


This instruction is defined in the RV32I instruction set.

| | 31:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|---|
| SLL | 0000000 | grs2 | grs1 | 001 | grd | 0110011 |

## SLLI

Logical left shift with Immediate.

SLLI <grd>, <grs1>, <shamt>


This instruction is defined in the RV32I instruction set.

| | 31:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|---|
| SLLI | 0000000 | shamt | grs1 | 001 | grd | 0010011 |

## SRL

Logical right shift.

SRL <grd>, <grs1>, <grs2>


This instruction is defined in the RV32I instruction set.

| | 31:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|---|
| SRL | 0000000 | grs2 | grs1 | 101 | grd | 0110011 |

## SRLI

Logical right shift with Immediate.

SRLI <grd>, <grs1>, <shamt>


This instruction is defined in the RV32I instruction set.

| | 31:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|---|
| SRLI | 0000000 | shamt | grs1 | 101 | grd | 0010011 |

## SRA

Arithmetic right shift.

SRA <grd>, <grs1>, <grs2>


This instruction is defined in the RV32I instruction set.

| | 31:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|---|
| SRA | 0100000 | grs2 | grs1 | 101 | grd | 0110011 |

## SRAI

Arithmetic right shift with Immediate.

SRAI <grd>, <grs1>, <shamt>


This instruction is defined in the RV32I instruction set.

| | 31:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|---|
| SRAI | 0100000 | shamt | grs1 | 101 | grd | 0010011 |

## AND

Bitwise AND.

AND <grd>, <grs1>, <grs2>


This instruction is defined in the RV32I instruction set.

| | 31:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|---|
| AND | 0000000 | grs2 | grs1 | 111 | grd | 0110011 |

## ANDI

Bitwise AND with Immediate.

ANDI <grd>, <grs1>, <imm>


This instruction is defined in the RV32I instruction set.

| | 31:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|
| ANDI | imm | grs1 | 111 | grd | 0010011 |

## OR

Bitwise OR.

OR <grd>, <grs1>, <grs2>


This instruction is defined in the RV32I instruction set.

| | 31:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|---|
| OR | 0000000 | grs2 | grs1 | 110 | grd | 0110011 |

## ORI

Bitwise OR with Immediate.

ORI <grd>, <grs1>, <imm>


This instruction is defined in the RV32I instruction set.

| | 31:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|
| ORI | imm | grs1 | 110 | grd | 0010011 |

## XOR

Bitwise XOR.

XOR <grd>, <grs1>, <grs2>


This instruction is defined in the RV32I instruction set.

| | 31:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|---|
| XOR | 0000000 | grs2 | grs1 | 100 | grd | 0110011 |

## XORI

Bitwise XOR with Immediate.

XORI <grd>, <grs1>, <imm>


This instruction is defined in the RV32I instruction set.

| | 31:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|
| XORI | imm | grs1 | 100 | grd | 0010011 |

## LW

Load Word. Loads a 32b word from address offset + grs1 in data memory, writing the result to grd. Unaligned loads are not supported. Any address that is unaligned or is above the top of memory will result in an error (with error code ErrCodeBadDataAddr).

LW <grd>, <offset>(<grs1>)


This instruction is defined in the RV32I instruction set.

This instruction takes 2 cycles.

| | 31:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|
| LW | offset | grs1 | 010 | grd | 0000011 |

## SW

Store Word. Stores a 32b word in grs2 to address offset + grs1 in data memory. Unaligned stores are not supported. Any address that is unaligned or is above the top of memory will result in an error (with error code ErrCodeBadDataAddr).

SW <grs2>, <offset>(<grs1>)


This instruction is defined in the RV32I instruction set.

| | 31:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|---|
| SW | offset[11:5] | grs2 | grs1 | 010 | offset[4:0] | 0100011 |

## BEQ

Branch Equal.

BEQ <grs1>, <grs2>, <offset>


This instruction is defined in the RV32I instruction set.

| | 31 | 30:25 | 24:20 | 19:15 | 14:12 | 11:8 | 7 | 6:0 |
|---|---|---|---|---|---|---|---|---|
| BEQ | offset[12] | offset[10:5] | grs2 | grs1 | 000 | offset[4:1] | offset[11] | 1100011 |

## BNE

Branch Not Equal.

BNE <grs1>, <grs2>, <offset>


This instruction is defined in the RV32I instruction set.

| | 31 | 30:25 | 24:20 | 19:15 | 14:12 | 11:8 | 7 | 6:0 |
|---|---|---|---|---|---|---|---|---|
| BNE | offset[12] | offset[10:5] | grs2 | grs1 | 001 | offset[4:1] | offset[11] | 1100011 |

## JAL

JAL <grd>, <offset>


This instruction is defined in the RV32I instruction set.

The JAL instruction has the same behavior as in RV32I, jumping by the given offset and writing PC+4 as a link address to the destination register. OTBN has a hardware managed call stack, accessed through x1, which should be used when calling subroutines. Do so by using x1 as the link register: jal x1, <offset>.

| | 31 | 30:21 | 20 | 19:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|---|
| JAL | offset[20] | offset[10:1] | offset[11] | offset[19:12] | grd | 1101111 |

## JALR

JALR <grd>, <grs1>, <offset>


This instruction is defined in the RV32I instruction set.

The JALR instruction has the same behavior as in RV32I, jumping to the address <grs1> + <offset> and writing PC+4 as a link address to the destination register. OTBN has a hardware managed call stack, accessed through x1, which should be used when calling and returning from subroutines. To return from a subroutine, use jalr x0, x1, 0. This pops a link address from the call stack and branches to it. To call a subroutine through a function pointer, use jalr x1, <grs1>, 0. This jumps to the address in <grs1> and pushes the link address onto the call stack.

| | 31:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|
| JALR | offset | grs1 | 000 | grd | 1100111 |
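
The call and return conventions described for JAL and JALR can be modelled in a few lines of Python. This is only a sketch of the behaviour stated above (writing x1 pushes a link address, reading x1 pops one). The `CallStack` class, its method names, and the unbounded depth are assumptions made for illustration and do not describe the hardware implementation.

```python
class CallStack:
    """Toy model of a hardware call stack reached through register x1."""

    def __init__(self):
        self._entries = []  # assumption: unbounded, unlike real hardware

    def write_x1(self, value):
        # Writing x1 pushes a link address.
        self._entries.append(value)

    def read_x1(self):
        # Reading x1 pops the most recently pushed link address.
        return self._entries.pop()


def jal_x1(pc, offset, stack):
    """Model `jal x1, <offset>`: push PC+4 and return the new PC."""
    stack.write_x1(pc + 4)
    return pc + offset


def ret(stack):
    """Model `jalr x0, x1, 0`: pop the link address and jump to it."""
    return stack.read_x1() + 0


stack = CallStack()
pc = jal_x1(0x100, 0x40, stack)   # call from 0x100 to 0x140
inner = jal_x1(pc, 0x20, stack)   # nested call from 0x140 to 0x160
back_to_outer = ret(stack)        # inner routine returns to 0x144
back_to_caller = ret(stack)       # outer routine returns to 0x104
```

Nested calls unwind in last-in, first-out order, which is why every `jal x1` needs exactly one matching return in this model.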

## CSRRS

Atomic Read and Set bits in CSR. Reads the value of the CSR csr, and writes it to the destination GPR grd. The initial value in grs1 is treated as a bit mask that specifies bits to be set in the CSR. Any bit that is high in grs1 will cause the corresponding bit to be set in the CSR, if that CSR bit is writable. Other bits in the CSR are unaffected (though CSRs might have side effects when written).

CSRRS <grd>, <csr>, <grs1>


This instruction is defined in the RV32I instruction set.

| | 31:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|
| CSRRS | csr | grs1 | 010 | grd | 1110011 |

### Decode

csr_num = UInt(csr)
d = UInt(grd)
s = UInt(grs1)



### Operation

gpr_s_val = GPR[s]
GPR[d] = CSR[csr_num]
CSR[csr_num] |= gpr_s_val



## CSRRW

Atomic Read/Write CSR. Atomically swaps values in the CSR csr with the value in the GPR grs1. Reads the old value of the CSR, and writes it to the GPR grd. Writes the initial value in grs1 to the CSR csr. If grd == x0 the instruction does not read the CSR or cause any read-related side-effects.

CSRRW <grd>, <csr>, <grs1>


This instruction is defined in the RV32I instruction set.

| | 31:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|
| CSRRW | csr | grs1 | 001 | grd | 1110011 |

### Decode

csr_num = UInt(csr)
d = UInt(grd)
s = UInt(grs1)



### Operation

gpr_s_val = GPR[s]

if d != 0:
    GPR[d] = CSR[csr_num]

CSR[csr_num] = gpr_s_val
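
The CSRRS and CSRRW pseudocode can be turned into a small executable model. The sketch below makes several simplifying assumptions that are not part of the specification: CSRs are held in a plain dictionary, every CSR bit is writable, there are no read or write side effects, writes to x0 are discarded, and the CSR number `0x7C0` is a placeholder.

```python
def csrrs(gpr, csr, d, csr_num, s):
    """CSRRS: copy the CSR into GPR[d], then set the bits given by GPR[s]."""
    mask = gpr[s]                 # capture the source first, in case d == s
    old = csr[csr_num]
    if d != 0:                    # assumption: x0 ignores writes
        gpr[d] = old
    csr[csr_num] = old | mask


def csrrw(gpr, csr, d, csr_num, s):
    """CSRRW: swap GPR[s] into the CSR; skip the CSR read when d is x0."""
    new_value = gpr[s]
    if d != 0:
        gpr[d] = csr[csr_num]
    csr[csr_num] = new_value


gpr = [0] * 32
csr = {0x7C0: 0b0101}             # hypothetical CSR number and contents

gpr[5] = 0b0011
csrrs(gpr, csr, 6, 0x7C0, 5)      # x6 gets 0b0101, CSR becomes 0b0111
after_set = csr[0x7C0]

gpr[7] = 0xAB
csrrw(gpr, csr, 8, 0x7C0, 7)      # x8 gets 0b0111, CSR becomes 0xAB
```

Capturing the source register before writing the destination mirrors the `gpr_s_val = GPR[s]` line in the pseudocode and keeps the result correct when both operands name the same register.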



## ECALL

Environment Call. Triggers the done interrupt to indicate the completion of the operation.

ECALL


This instruction is defined in the RV32I instruction set.

| | 31:7 | 6:0 |
|---|---|---|
| ECALL | 0000000000000000000000000 | 1110011 |

## LOOP

##### Note

The LOOP and LOOPI instructions are under-specified, and improvements to them are being discussed. See https://github.com/lowRISC/opentitan/issues/2496 for up-to-date information.

Loop (indirect). Repeats a sequence of code multiple times. The number of iterations is read from grs, treated as an unsigned value. The number of instructions in the loop is given in the bodysize immediate.

LOOP <grs>, <bodysize>

| Assembly symbol | Description |
|---|---|
| <grs> | Name of the GPR containing the number of iterations |
| <bodysize> | Number of instructions in the loop body. Valid range: 0..4095 |

 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 LOOP bodysize grs 0 0 0 1 1 1 1 0 1 1

## LOOPI

##### Note

The LOOP and LOOPI instructions are under-specified, and improvements to them are being discussed. See https://github.com/lowRISC/opentitan/issues/2496 for up-to-date information.

Loop Immediate. Repeats a sequence of code multiple times. The iterations unsigned immediate operand gives the number of iterations and the bodysize unsigned immediate operand gives the number of instructions in the body.

LOOPI <iterations>, <bodysize>

| Assembly symbol | Description |
|---|---|
| <iterations> | Number of iterations. Valid range: 0..1023 |
| <bodysize> | Number of instructions in the loop body. Valid range: 0..4095 |

| | 31:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|
| LOOPI | bodysize | iterations[9:5] | 001 | iterations[4:0] | 1111011 |
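
The split `iterations` field is easy to get wrong by hand, so a short encoder may help. The sketch below packs the LOOPI operands following the field order in the encoding above (bodysize in bits 31:20, iterations[9:5] in bits 19:15, iterations[4:0] in bits 11:7). The function names are illustrative, and it assumes the operand values are stored in the fields unmodified.

```python
OPCODE_LOOPI = 0b1111011  # bits 6:0
FUNCT3_LOOPI = 0b001      # bits 14:12


def encode_loopi(iterations, bodysize):
    """Pack LOOPI operands into a 32-bit instruction word."""
    if not 0 <= iterations <= 1023:
        raise ValueError("iterations must be in 0..1023")
    if not 0 <= bodysize <= 4095:
        raise ValueError("bodysize must be in 0..4095")
    return (
        (bodysize << 20)                 # bodysize in bits 31:20
        | ((iterations >> 5) << 15)      # iterations[9:5] in bits 19:15
        | (FUNCT3_LOOPI << 12)
        | ((iterations & 0x1F) << 7)     # iterations[4:0] in bits 11:7
        | OPCODE_LOOPI
    )


def decode_loopi(word):
    """Recover (iterations, bodysize) from a LOOPI instruction word."""
    bodysize = (word >> 20) & 0xFFF
    iterations = (((word >> 15) & 0x1F) << 5) | ((word >> 7) & 0x1F)
    return iterations, bodysize


word = encode_loopi(8, 3)  # LOOPI 8, 3
```

Decoding the word with `decode_loopi` returns the original operand pair, which gives a quick round-trip check on the field positions.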

## NOP

No Operation. A pseudo-operation that has no effect.

NOP


This instruction is defined in the RV32I instruction set.

This instruction is a pseudo-operation and expands to the following instruction sequence:

ADDI x0, x0, 0


## LI

Load Immediate. Loads a 32b signed immediate value into a GPR. This uses ADDI and LUI, expanding to one or two instructions, depending on the immediate (small non-negative immediates or immediates with all lower bits zero can be loaded with just ADDI or LUI, respectively; general immediates need a LUI followed by an ADDI).

LI <grd>, <imm>


This instruction is defined in the RV32I instruction set.
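
The LUI/ADDI split for LI can be sketched as follows. This is an illustration of the rule stated above rather than the assembler's actual code: the function names and the tuple representation of instructions are assumptions. The one subtle step is that ADDI sign-extends its 12-bit immediate, so the upper 20 bits must be adjusted whenever the low 12 bits have their top bit set.

```python
def expand_li(imm):
    """Return a list of ("lui" | "addi", immediate) steps that load imm."""
    if not -(1 << 31) <= imm < (1 << 31):
        raise ValueError("immediate does not fit in 32 signed bits")
    if 0 <= imm < 2048:
        return [("addi", imm)]            # small non-negative: ADDI from x0
    value = imm & 0xFFFFFFFF
    lo = value & 0xFFF
    if lo >= 0x800:
        lo -= 0x1000                      # ADDI will sign-extend this part
    hi = ((value - lo) >> 12) & 0xFFFFF   # compensate for the signed low part
    if lo == 0:
        return [("lui", hi)]              # all lower bits zero: LUI alone
    return [("lui", hi), ("addi", lo)]


def evaluate(steps):
    """Run the steps on a register that starts at zero, wrapping to 32 bits."""
    reg = 0
    for op, imm in steps:
        if op == "lui":
            reg = (imm << 12) & 0xFFFFFFFF
        else:
            reg = (reg + imm) & 0xFFFFFFFF
    return reg


general = expand_li(0x12345678)   # needs both LUI and ADDI
upper_only = expand_li(0x12345000)
small = expand_li(5)
```

Running `evaluate` on the expansion reproduces the requested value modulo 2^32, including for negative immediates such as -1.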

## RET

Return from subroutine.

RET


This instruction is defined in the RV32I instruction set.

This instruction is a pseudo-operation and expands to the following instruction sequence:

JALR x0, x1, 0


# Big Number Instruction Subset

All Big Number (BN) instructions operate on the Wide Data Registers (WDRs).

## BN.ADD

Add. Adds two WDR values, writes the result to the destination WDR and updates flags. The content of the second source WDR can be shifted by an unsigned immediate before it is consumed by the operation.

BN.ADD <wrd>, <wrs1>, <wrs2>[ <shift_type> <shift_bytes>B][, FG<flag_group>]

| Assembly symbol | Description |
|---|---|
| <wrd> | Name of the destination WDR |
| <wrs1> | Name of the first source WDR |
| <wrs2> | Name of the second source WDR |
| <shift_type> | The direction of an optional shift applied to <wrs2>. Syntax: << (immediate value 0) or >> (immediate value 1) |
| <shift_bytes> | Number of bytes by which to shift <wrs2>. Defaults to 0. Valid range: 0..31 |
| <flag_group> | Flag group to use. Defaults to 0. Valid range: 0..1 |

| | 31 | 30 | 29:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|---|---|---|
| BN.ADD | flag_group | shift_type | shift_bytes | wrs2 | wrs1 | 000 | wrd | 0101011 |

### Decode

d = UInt(wrd)
a = UInt(wrs1)
b = UInt(wrs2)

fg = DecodeFlagGroup(flag_group)
sb = UInt(shift_bytes)
st = DecodeShiftType(shift_type)



### Operation

b_shifted = ShiftReg(b, st, sb)
(result, flags_out) = AddWithCarry(a, b_shifted, "0")

WDR[d] = result
FLAGS[fg] = flags_out



## BN.ADDC

Add with Carry. Adds two WDR values and the Carry flag value, writes the result to the destination WDR, and updates the flags. The content of the second source WDR can be shifted by an unsigned immediate before it is consumed by the operation.

BN.ADDC <wrd>, <wrs1>, <wrs2>[ <shift_type> <shift_bytes>B][, FG<flag_group>]

| Assembly symbol | Description |
|---|---|
| <wrd> | Name of the destination WDR |
| <wrs1> | Name of the first source WDR |
| <wrs2> | Name of the second source WDR |
| <shift_type> | The direction of an optional shift applied to <wrs2>. Syntax: << (immediate value 0) or >> (immediate value 1) |
| <shift_bytes> | Number of bytes by which to shift <wrs2>. Defaults to 0. Valid range: 0..31 |
| <flag_group> | Flag group to use. Defaults to 0. Valid range: 0..1 |

| | 31 | 30 | 29:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|---|---|---|
| BN.ADDC | flag_group | shift_type | shift_bytes | wrs2 | wrs1 | 010 | wrd | 0101011 |

### Decode

d = UInt(wrd)
a = UInt(wrs1)
b = UInt(wrs2)

fg = DecodeFlagGroup(flag_group)
sb = UInt(shift_bytes)
st = DecodeShiftType(shift_type)



### Operation

b_shifted = ShiftReg(b, st, sb)
(result, flags_out) = AddWithCarry(a, b_shifted, FLAGS[fg].C)

WDR[d] = result
FLAGS[fg] = flags_out
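
BN.ADD and BN.ADDC are typically paired to add numbers wider than one WDR: the first instruction adds the low words and the second folds the carry into the high words. The Python sketch below models this for 512-bit operands. The helper names are illustrative, and the flag definitions (C as carry out, M as most significant bit, L as least significant bit, Z as zero) and the zero-filling byte shift are assumptions based on the pseudocode in this document, not a copy of the reference model.

```python
WLEN = 256
MASK = (1 << WLEN) - 1


def shift_reg(value, shift_type, shift_bytes):
    """Shift a WLEN-bit value left ("<<") or right (">>") by whole bytes."""
    bits = 8 * shift_bytes
    if shift_type == "<<":
        return (value << bits) & MASK   # bits shifted past WLEN are dropped
    return value >> bits


def add_with_carry(a, b, carry_in):
    """Add two WLEN-bit values plus a carry; return (result, flags)."""
    total = a + b + carry_in
    result = total & MASK
    flags = {
        "C": total >> WLEN,          # carry out of the top bit
        "M": result >> (WLEN - 1),   # most significant bit
        "L": result & 1,             # least significant bit
        "Z": int(result == 0),
    }
    return result, flags


def add_512(a, b):
    """Add two 512-bit values as a BN.ADD followed by a BN.ADDC."""
    lo, flags = add_with_carry(a & MASK, b & MASK, 0)               # BN.ADD
    hi, flags = add_with_carry(a >> WLEN, b >> WLEN, flags["C"])    # BN.ADDC
    return (hi << WLEN) | lo, flags


total, flags = add_512(MASK, 1)   # the carry ripples into the high word
```

Note that the flags left behind describe only the last (high) word, so in this model a Z flag of 1 after `add_512` does not by itself mean the full 512-bit result is zero.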



## BN.ADDI

Add Immediate. Adds a zero-extended unsigned immediate to the value of a WDR, writes the result to the destination WDR, and updates the flags.

BN.ADDI <wrd>, <wrs>, <imm>[, FG<flag_group>]

| Assembly symbol | Description |
|---|---|
| <wrd> | Name of the destination WDR |
| <wrs> | Name of the source WDR |
| <imm> | Immediate value. Valid range: 0..1023 |
| <flag_group> | Flag group to use. Defaults to 0. Valid range: 0..1 |

| | 31 | 30 | 29:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|---|---|
| BN.ADDI | flag_group | 0 | imm | wrs | 100 | wrd | 0101011 |

### Decode

d = UInt(wrd)
a = UInt(wrs)

fg = DecodeFlagGroup(flag_group)
i = ZeroExtend(imm, WLEN)



### Operation

(result, flags_out) = AddWithCarry(a, i, "0")

WDR[d] = result
FLAGS[fg] = flags_out



## BN.ADDM

Pseudo-modulo addition. Adds <wrs1> and <wrs2>, modulo the MOD WSR.

The values in <wrs1> and <wrs2> are summed to get an intermediate result (of width WLEN + 1). If this result is greater than or equal to MOD then MOD is subtracted from it. The result is then truncated to 256 bits and stored in <wrd>.

This operation correctly implements addition modulo MOD, provided that the intermediate result is less than 2 * MOD. The intermediate result is small enough if both inputs are less than MOD.

Flags are not used or saved.

BN.ADDM <wrd>, <wrs1>, <wrs2>

 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 BN.ADDM 0 wrs2 wrs1 1 0 1 wrd 0 1 0 1 0 1 1

### Decode

d = UInt(wrd)
a = UInt(wrs1)
b = UInt(wrs2)



### Operation

result = a + b

if result >= MOD:
    result = result - MOD

WDR[d] = result & ((1 << 256) - 1)
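
The "pseudo" in pseudo-modulo matters: a single conditional subtraction only reduces values below 2 * MOD. The following sketch transcribes the operation into Python and contrasts an in-range case with an out-of-range one. The modulus `p` is an arbitrary example chosen for illustration.

```python
WLEN = 256


def bn_addm(a, b, mod):
    """Add a and b, subtracting mod at most once, then truncate to WLEN bits."""
    result = a + b
    if result >= mod:
        result -= mod
    return result & ((1 << WLEN) - 1)


p = (1 << 255) - 19                     # example modulus (an assumption)

in_range = bn_addm(p - 1, p - 1, p)     # both inputs below p
out_of_range = bn_addm(2 * p, 5, p)     # first input is not below p
```

With both inputs below `p` the result equals `(a + b) % p`. In the second call the intermediate sum is at least `2 * p`, so one subtraction leaves `p + 5` rather than the fully reduced value 5.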



## BN.MULQACC

Quarter-word Multiply and Accumulate. Multiplies two WLEN/4-bit quarter-words, one selected from each source WDR, shifts the product left by acc_shift_imm bits, and adds the result to the accumulator.

For versions of the instruction with writeback, see BN.MULQACC.WO and BN.MULQACC.SO.

BN.MULQACC[<zero_acc>] <wrs1>.<wrs1_qwsel>, <wrs2>.<wrs2_qwsel>, <acc_shift_imm>

| Assembly symbol | Description |
|---|---|
| <zero_acc> | Zero the accumulator before accumulating the multiply result. To specify, use the literal syntax .z |
| <wrs1> | First source WDR |
| <wrs1_qwsel> | Quarter-word select for <wrs1>. Valid values: 0 selects wrs1[WLEN/4-1:0] (least significant quarter-word); 1 selects wrs1[WLEN/2-1:WLEN/4]; 2 selects wrs1[WLEN/4*3-1:WLEN/2]; 3 selects wrs1[WLEN-1:WLEN/4*3] (most significant quarter-word). Valid range: 0..3 |
| <wrs2> | Second source WDR |
| <wrs2_qwsel> | Quarter-word select for <wrs2>. Valid values: 0 selects wrs2[WLEN/4-1:0] (least significant quarter-word); 1 selects wrs2[WLEN/2-1:WLEN/4]; 2 selects wrs2[WLEN/4*3-1:WLEN/2]; 3 selects wrs2[WLEN-1:WLEN/4*3] (most significant quarter-word). Valid range: 0..3 |
| <acc_shift_imm> | The number of bits to shift the WLEN/2-bit multiply result before accumulating. Valid range: 0..192 in steps of 64 |

 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 BN.MULQACC 0 0 wrs2_qwsel wrs1_qwsel wrs2 wrs1 acc_shift_imm[7:6] zero_acc 0 1 1 1 0 1 1

### Decode

zero_accumulator = DecodeMulqaccZeroacc(zero_acc)

d = None
a = UInt(wrs1)
b = UInt(wrs2)

d_hwsel = None
a_qwsel = DecodeQuarterWordSelect(wrs1_qwsel)
b_qwsel = DecodeQuarterWordSelect(wrs2_qwsel)



### Operation

a_qw = GetQuarterWord(a, a_qwsel)
b_qw = GetQuarterWord(b, b_qwsel)

mul_res = a_qw * b_qw

if zero_accumulator:
    ACC = 0

ACC = ACC + (mul_res << acc_shift_imm)
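
A common use of BN.MULQACC is to build a wide product out of 64-bit partial products. The sketch below models the instruction in Python and uses four invocations to multiply two 128-bit values held in the low halves of two WDRs. The function names are illustrative, and truncating the accumulator to WLEN bits is an assumption of this model.

```python
WLEN = 256
QW = WLEN // 4                 # quarter-word width in bits
MASK = (1 << WLEN) - 1


def get_quarter_word(value, qwsel):
    """Extract quarter-word qwsel (0 is least significant) from a WDR value."""
    return (value >> (qwsel * QW)) & ((1 << QW) - 1)


def mulqacc(acc, a, a_qwsel, b, b_qwsel, acc_shift_imm, zero_acc=False):
    """Return the accumulator after one BN.MULQACC step."""
    if acc_shift_imm not in (0, 64, 128, 192):
        raise ValueError("acc_shift_imm must be 0, 64, 128 or 192")
    product = get_quarter_word(a, a_qwsel) * get_quarter_word(b, b_qwsel)
    if zero_acc:
        acc = 0
    return (acc + (product << acc_shift_imm)) & MASK


def mul_128x128(a, b):
    """Multiply the low 128 bits of a and b using four partial products."""
    acc = mulqacc(0, a, 0, b, 0, 0, zero_acc=True)   # a0 * b0
    acc = mulqacc(acc, a, 0, b, 1, 64)               # a0 * b1, weight 2^64
    acc = mulqacc(acc, a, 1, b, 0, 64)               # a1 * b0, weight 2^64
    acc = mulqacc(acc, a, 1, b, 1, 128)              # a1 * b1, weight 2^128
    return acc


a = (1 << 128) - 1
b = (1 << 128) - 3
product = mul_128x128(a, b)
```

The shift amounts follow schoolbook multiplication: a partial product of quarter-words i and j carries weight 2^(64 * (i + j)).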



## BN.MULQACC.WO

Quarter-word Multiply and Accumulate with full-word writeback. Multiplies two WLEN/4-bit quarter-words, one selected from each source WDR, shifts the product left by acc_shift_imm bits, and adds the result to the accumulator. Writes the resulting accumulator to wrd. Updates the M, L and Z flags of flag_group.

BN.MULQACC.WO[<zero_acc>] <wrd>, <wrs1>.<wrs1_qwsel>, <wrs2>.<wrs2_qwsel>, <acc_shift_imm>[, FG<flag_group>]

| Assembly symbol | Description |
|---|---|
| <zero_acc> | Zero the accumulator before accumulating the multiply result. To specify, use the literal syntax .z |
| <wrd> | Destination WDR |
| <wrs1> | First source WDR |
| <wrs1_qwsel> | Quarter-word select for <wrs1>. Valid values: 0 selects wrs1[WLEN/4-1:0] (least significant quarter-word); 1 selects wrs1[WLEN/2-1:WLEN/4]; 2 selects wrs1[WLEN/4*3-1:WLEN/2]; 3 selects wrs1[WLEN-1:WLEN/4*3] (most significant quarter-word). Valid range: 0..3 |
| <wrs2> | Second source WDR |
| <wrs2_qwsel> | Quarter-word select for <wrs2>. Valid values: 0 selects wrs2[WLEN/4-1:0] (least significant quarter-word); 1 selects wrs2[WLEN/2-1:WLEN/4]; 2 selects wrs2[WLEN/4*3-1:WLEN/2]; 3 selects wrs2[WLEN-1:WLEN/4*3] (most significant quarter-word). Valid range: 0..3 |
| <acc_shift_imm> | The number of bits to shift the WLEN/2-bit multiply result before accumulating. Valid range: 0..192 in steps of 64 |
| <flag_group> | Flag group to use. Defaults to 0. Valid range: 0..1 |

| | 31 | 30:29 | 28:27 | 26:25 | 24:20 | 19:15 | 14:13 | 12 | 11:7 | 6:0 |
|---|---|---|---|---|---|---|---|---|---|---|
| BN.MULQACC.WO | flag_group | 01 | wrs2_qwsel | wrs1_qwsel | wrs2 | wrs1 | acc_shift_imm[7:6] | zero_acc | wrd | 0111011 |

### Decode

zero_accumulator = DecodeMulqaccZeroacc(zero_acc)

d = UInt(wrd)
a = UInt(wrs1)
b = UInt(wrs2)

d_hwsel = None
a_qwsel = DecodeQuarterWordSelect(wrs1_qwsel)
b_qwsel = DecodeQuarterWordSelect(wrs2_qwsel)

fg = DecodeFlagGroup(flag_group)



### Operation

a_qw = GetQuarterWord(a, a_qwsel)
b_qw = GetQuarterWord(b, b_qwsel)

mul_res = a_qw * b_qw

if zero_accumulator:
    ACC = 0

ACC = ACC + (mul_res << acc_shift_imm)

WDR[d] = ACC
FLAGS[fg].M = ACC[WLEN-1]
FLAGS[fg].L = ACC[0]
FLAGS[fg].Z = (ACC == 0)



## BN.MULQACC.SO

Quarter-word Multiply and Accumulate with half-word writeback. Multiplies two WLEN/4-bit quarter-words, one selected from each source WDR, shifts the product left by acc_shift_imm bits, and adds the result to the accumulator. Next, shifts the resulting accumulator right by half a word (128 bits). The bits that are shifted out are written to a half-word of wrd, selected with wrd_hwsel.

This instruction never changes the C flag. If wrd_hwsel is zero (so the instruction is updating the lower half-word of wrd), it updates the L and Z flags and leaves M unchanged. The L flag is set iff the bottom bit of the shifted-out result is one. The Z flag is set iff the shifted-out result is zero.

If wrd_hwsel is one (so the instruction is updating the upper half-word of wrd), it updates the M and Z flags and leaves L unchanged. The M flag is set iff the top bit of the shifted-out result is one. The Z flag is left unchanged if the shifted-out result is zero and cleared if not.

BN.MULQACC.SO[<zero_acc>] <wrd>.<wrd_hwsel>, <wrs1>.<wrs1_qwsel>, <wrs2>.<wrs2_qwsel>, <acc_shift_imm>[, FG<flag_group>]

| Assembly symbol | Description |
|---|---|
| <zero_acc> | Zero the accumulator before accumulating the multiply result. To specify, use the literal syntax .z |
| <wrd> | Destination WDR |
| <wrd_hwsel> | Half-word select for <wrd>. A value of L means the less significant half-word; U means the more significant half-word. Syntax: l (immediate value 0) or u (immediate value 1) |
| <wrs1> | First source WDR |
| <wrs1_qwsel> | Quarter-word select for <wrs1>. Valid values: 0 selects wrs1[WLEN/4-1:0] (least significant quarter-word); 1 selects wrs1[WLEN/2-1:WLEN/4]; 2 selects wrs1[WLEN/4*3-1:WLEN/2]; 3 selects wrs1[WLEN-1:WLEN/4*3] (most significant quarter-word). Valid range: 0..3 |
| <wrs2> | Second source WDR |
| <wrs2_qwsel> | Quarter-word select for <wrs2>. Valid values: 0 selects wrs2[WLEN/4-1:0] (least significant quarter-word); 1 selects wrs2[WLEN/2-1:WLEN/4]; 2 selects wrs2[WLEN/4*3-1:WLEN/2]; 3 selects wrs2[WLEN-1:WLEN/4*3] (most significant quarter-word). Valid range: 0..3 |
| <acc_shift_imm> | The number of bits to shift the WLEN/2-bit multiply result before accumulating. Valid range: 0..192 in steps of 64 |
| <flag_group> | Flag group to use. Defaults to 0. Valid range: 0..1 |

| | 31 | 30 | 29 | 28:27 | 26:25 | 24:20 | 19:15 | 14:13 | 12 | 11:7 | 6:0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BN.MULQACC.SO | flag_group | 1 | wrd_hwsel | wrs2_qwsel | wrs1_qwsel | wrs2 | wrs1 | acc_shift_imm[7:6] | zero_acc | wrd | 0111011 |

### Decode

zero_accumulator = DecodeMulqaccZeroacc(zero_acc)

d = UInt(wrd)
a = UInt(wrs1)
b = UInt(wrs2)

d_hwsel = DecodeHalfWordSelect(wrd_hwsel)
a_qwsel = DecodeQuarterWordSelect(wrs1_qwsel)
b_qwsel = DecodeQuarterWordSelect(wrs2_qwsel)

fg = DecodeFlagGroup(flag_group)



### Operation

a_qw = GetQuarterWord(a, a_qwsel)
b_qw = GetQuarterWord(b, b_qwsel)

mul_res = a_qw * b_qw

if zero_accumulator:
    ACC = 0

ACC = ACC + (mul_res << acc_shift_imm)

shifted = ACC[WLEN/2-1:0]
ACC = ACC >> (WLEN/2)

if d_hwsel == 'L':
    WDR[d][WLEN/2-1:0] = shifted
    FLAGS[fg].L = shifted[0]
    FLAGS[fg].Z = (shifted == 0)
elif d_hwsel == 'U':
    WDR[d][WLEN-1:WLEN/2] = shifted
    FLAGS[fg].M = shifted[WLEN/2-1]
    if (shifted != 0):
        FLAGS[fg].Z = 0
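
The half-word writeback and its flag rules are easier to follow when executed step by step. The sketch below is a Python transcription of the operation above. To keep it short it takes the two selected quarter-words directly instead of WDR values and select indices, and the function name, the flag dictionary, and the WLEN-bit accumulator truncation are assumptions of this model.

```python
WLEN = 256
HALF = WLEN // 2
HALF_MASK = (1 << HALF) - 1


def mulqacc_so(acc, wdr, flags, hwsel, a_qw, b_qw, acc_shift_imm):
    """One BN.MULQACC.SO step; returns the new (acc, wdr, flags)."""
    acc = (acc + ((a_qw * b_qw) << acc_shift_imm)) & ((1 << WLEN) - 1)
    shifted = acc & HALF_MASK        # the bits that are shifted out
    acc >>= HALF
    flags = dict(flags)              # C is copied through unchanged
    if hwsel == "L":
        wdr = (wdr & ~HALF_MASK) | shifted
        flags["L"] = shifted & 1
        flags["Z"] = int(shifted == 0)
    else:
        wdr = (wdr & HALF_MASK) | (shifted << HALF)
        flags["M"] = shifted >> (HALF - 1)
        if shifted != 0:
            flags["Z"] = 0
    return acc, wdr, flags


flags = {"C": 0, "M": 0, "L": 0, "Z": 0}
acc, wdr, flags = mulqacc_so(0, 0, flags, "L", 3, 5, 0)      # low half = 15
acc, wdr, flags = mulqacc_so(acc, wdr, flags, "U", 1, 1, 0)  # high half = 1
```

Because the upper-half step only ever clears Z, running a lower-half step followed by an upper-half step leaves Z set exactly when both written half-words are zero.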



## BN.SUB

Subtraction. Subtracts the second WDR value from the first one, writes the result to the destination WDR and updates flags. The content of the second source WDR can be shifted by an unsigned immediate before it is consumed by the operation.

BN.SUB <wrd>, <wrs1>, <wrs2>[ <shift_type> <shift_bytes>B][, FG<flag_group>]

| Assembly symbol | Description |
|---|---|
| <wrd> | Name of the destination WDR |
| <wrs1> | Name of the first source WDR |
| <wrs2> | Name of the second source WDR |
| <shift_type> | The direction of an optional shift applied to <wrs2>. Syntax: << (immediate value 0) or >> (immediate value 1) |
| <shift_bytes> | Number of bytes by which to shift <wrs2>. Defaults to 0. Valid range: 0..31 |
| <flag_group> | Flag group to use. Defaults to 0. Valid range: 0..1 |

| | 31 | 30 | 29:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|---|---|---|
| BN.SUB | flag_group | shift_type | shift_bytes | wrs2 | wrs1 | 001 | wrd | 0101011 |

### Decode

d = UInt(wrd)
a = UInt(wrs1)
b = UInt(wrs2)

fg = DecodeFlagGroup(flag_group)
sb = UInt(shift_bytes)
st = DecodeShiftType(shift_type)



### Operation

b_shifted = ShiftReg(b, st, sb)
(result, flags_out) = SubtractWithBorrow(a, b_shifted, 0)

WDR[d] = result
FLAGS[fg] = flags_out



## BN.SUBB

Subtract with borrow. Subtracts the second WDR value and the Carry from the first one, writes the result to the destination WDR, and updates the flags. The content of the second source WDR can be shifted by an unsigned immediate before it is consumed by the operation.

BN.SUBB <wrd>, <wrs1>, <wrs2>[ <shift_type> <shift_bytes>B][, FG<flag_group>]

| Assembly symbol | Description |
|---|---|
| <wrd> | Name of the destination WDR |
| <wrs1> | Name of the first source WDR |
| <wrs2> | Name of the second source WDR |
| <shift_type> | The direction of an optional shift applied to <wrs2>. Syntax: << (immediate value 0) or >> (immediate value 1) |
| <shift_bytes> | Number of bytes by which to shift <wrs2>. Defaults to 0. Valid range: 0..31 |
| <flag_group> | Flag group to use. Defaults to 0. Valid range: 0..1 |

| | 31 | 30 | 29:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|---|---|---|
| BN.SUBB | flag_group | shift_type | shift_bytes | wrs2 | wrs1 | 011 | wrd | 0101011 |

### Decode

d = UInt(wrd)
a = UInt(wrs1)
b = UInt(wrs2)

fg = DecodeFlagGroup(flag_group)
sb = UInt(shift_bytes)
st = DecodeShiftType(shift_type)



### Operation

b_shifted = ShiftReg(b, st, sb)
(result, flags_out) = SubtractWithBorrow(a, b_shifted, FLAGS[fg].C)

WDR[d] = result
FLAGS[fg] = flags_out
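
BN.SUB and BN.SUBB chain in the same way as the add instructions, with the C flag carrying the borrow from the low word into the high word. The sketch below models a 512-bit subtraction in Python. The helper names are illustrative, and treating C as "a borrow occurred" together with the M, L and Z definitions are assumptions of this model.

```python
WLEN = 256
MASK = (1 << WLEN) - 1


def subtract_with_borrow(a, b, borrow_in):
    """Compute a - b - borrow_in on WLEN bits; return (result, flags)."""
    diff = a - b - borrow_in
    result = diff & MASK                 # wrap negative values to WLEN bits
    flags = {
        "C": int(diff < 0),              # assumption: C records a borrow
        "M": result >> (WLEN - 1),
        "L": result & 1,
        "Z": int(result == 0),
    }
    return result, flags


def sub_512(a, b):
    """Subtract two 512-bit values as a BN.SUB followed by a BN.SUBB."""
    lo, flags = subtract_with_borrow(a & MASK, b & MASK, 0)             # BN.SUB
    hi, flags = subtract_with_borrow(a >> WLEN, b >> WLEN, flags["C"])  # BN.SUBB
    return (hi << WLEN) | lo, flags


difference, flags = sub_512(1 << 256, 1)   # the low word borrows from the high word
```

A final C of 1 after `sub_512` indicates that the subtrahend was larger than the minuend, in which case the result has wrapped modulo 2^512.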



## BN.SUBI

Subtract Immediate. Subtracts a zero-extended unsigned immediate from the value of a WDR, writes the result to the destination WDR, and updates the flags.

BN.SUBI <wrd>, <wrs>, <imm>[, FG<flag_group>]

| Assembly symbol | Description |
|---|---|
| <wrd> | Name of the destination WDR |
| <wrs> | Name of the source WDR |
| <imm> | Immediate value. Valid range: 0..1023 |
| <flag_group> | Flag group to use. Defaults to 0. Valid range: 0..1 |

| | 31 | 30 | 29:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|---|---|
| BN.SUBI | flag_group | 1 | imm | wrs | 100 | wrd | 0101011 |

### Decode

d = UInt(wrd)
a = UInt(wrs)

fg = DecodeFlagGroup(flag_group)
i = ZeroExtend(imm, WLEN)



### Operation

(result, flags_out) = SubtractWithBorrow(a, i, 0)

WDR[d] = result
FLAGS[fg] = flags_out



## BN.SUBM

Pseudo-modulo subtraction. Subtract <wrs2> from <wrs1>, modulo the MOD WSR.

The intermediate result is treated as a signed number (of width WLEN + 1). If it is negative, MOD is added to it. The 2’s-complement result is then truncated to 256 bits and stored in <wrd>.

This operation correctly implements subtraction modulo MOD, provided that the intermediate result is at least -MOD and at most MOD - 1. This is guaranteed if both inputs are less than MOD.

Flags are not used or saved.

BN.SUBM <wrd>, <wrs1>, <wrs2>

 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 BN.SUBM 1 wrs2 wrs1 1 0 1 wrd 0 1 0 1 0 1 1

### Decode

d = UInt(wrd)
a = UInt(wrs1)
b = UInt(wrs2)



### Operation

result = a - b

if result < 0:
    result = MOD + result

WDR[d] = result & ((1 << 256) - 1)
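
As a quick check of the range condition, the operation can be transcribed into Python and compared against a true modular subtraction. This is a sketch only: the modulus `p` is an arbitrary example, and the function name is illustrative.

```python
WLEN = 256


def bn_subm(a, b, mod):
    """Subtract b from a, adding mod at most once, then truncate to WLEN bits."""
    result = a - b
    if result < 0:
        result += mod
    return result & ((1 << WLEN) - 1)


p = (1 << 255) - 19                  # example modulus (an assumption)

in_range = bn_subm(3, 5, p)          # intermediate result is -2
out_of_range = bn_subm(0, 2 * p, p)  # intermediate result is below -p
```

In the first call a single addition of `p` yields the reduced value `p - 2`. In the second call the intermediate result is still negative after adding `p`, so the truncated two's-complement value written back is `2^256 - p` rather than the fully reduced value 0.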



## BN.AND

Bitwise AND. Takes the values stored in the WDRs referenced by wrs1 and wrs2, performs a bitwise AND, and stores the result in the WDR referenced by wrd. The content of the second source WDR can be shifted by an immediate before it is consumed by the operation. The M, L and Z flags in the selected flag group are updated with the result of the operation.

BN.AND <wrd>, <wrs1>, <wrs2>[ <shift_type> <shift_bytes>B][, FG<flag_group>]

| Assembly symbol | Description |
|---|---|
| <wrd> | Name of the destination WDR |
| <wrs1> | Name of the first source WDR |
| <wrs2> | Name of the second source WDR |
| <shift_type> | The direction of an optional shift applied to <wrs2>. Syntax: << (immediate value 0) or >> (immediate value 1) |
| <shift_bytes> | Number of bytes by which to shift <wrs2>. Defaults to 0. Valid range: 0..31 |
| <flag_group> | Flag group to use. Defaults to 0. Valid range: 0..1 |

| | 31 | 30 | 29:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|---|---|---|
| BN.AND | flag_group | shift_type | shift_bytes | wrs2 | wrs1 | 010 | wrd | 1111011 |

### Decode

d = UInt(wrd)
a = UInt(wrs1)
b = UInt(wrs2)

sb = UInt(shift_bytes)
st = DecodeShiftType(shift_type)
fg = DecodeFlagGroup(flag_group)



### Operation

b_shifted = ShiftReg(b, st, sb)
result = a & b_shifted

WDR[d] = result
flags_out = FlagsForResult(result)
FLAGS[fg] = {M: flags_out.M, L: flags_out.L, Z: flags_out.Z, C: FLAGS[fg].C}
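
The flag update for the bitwise instructions differs from the arithmetic ones in that C is carried over unchanged. The Python sketch below shows this for BN.AND with a left byte shift on the second operand, used here to mask out one byte of a value. The function names, the left-shift-only parameter, and the flag dictionary are assumptions made for this illustration.

```python
WLEN = 256
MASK = (1 << WLEN) - 1


def flags_for_result(result):
    """Derive the M, L and Z flags from a WLEN-bit result."""
    return {
        "M": result >> (WLEN - 1),
        "L": result & 1,
        "Z": int(result == 0),
    }


def bn_and(a, b, old_flags, shift_left_bytes=0):
    """AND a with b shifted left by whole bytes; C is left untouched."""
    b_shifted = (b << (8 * shift_left_bytes)) & MASK
    result = a & b_shifted
    new_flags = flags_for_result(result)
    new_flags["C"] = old_flags["C"]       # carry flag is preserved
    return result, new_flags


old_flags = {"C": 1, "M": 0, "L": 1, "Z": 1}
# Keep only byte 1 of the value, roughly `bn.and w0, w1, w2 << 1B`.
result, new_flags = bn_and(0x1234, 0xFF, old_flags, shift_left_bytes=1)
```

Preserving C means a logical operation can sit between the two halves of a carry chain without disturbing it, at least in this model.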



## BN.OR

Bitwise OR. Takes the values stored in the WDRs referenced by wrs1 and wrs2, performs a bitwise OR, and stores the result in the WDR referenced by wrd. The content of the second source WDR can be shifted by an immediate before it is consumed by the operation. The M, L and Z flags in the selected flag group are updated with the result of the operation.

BN.OR <wrd>, <wrs1>, <wrs2>[ <shift_type> <shift_bytes>B][, FG<flag_group>]

| Assembly symbol | Description |
|---|---|
| <wrd> | Name of the destination WDR |
| <wrs1> | Name of the first source WDR |
| <wrs2> | Name of the second source WDR |
| <shift_type> | The direction of an optional shift applied to <wrs2>. Syntax: << (immediate value 0) or >> (immediate value 1) |
| <shift_bytes> | Number of bytes by which to shift <wrs2>. Defaults to 0. Valid range: 0..31 |
| <flag_group> | Flag group to use. Defaults to 0. Valid range: 0..1 |

| | 31 | 30 | 29:25 | 24:20 | 19:15 | 14:12 | 11:7 | 6:0 |
|---|---|---|---|---|---|---|---|---|
| BN.OR | flag_group | shift_type | shift_bytes | wrs2 | wrs1 | 100 | wrd | 1111011 |

### Decode

d = UInt(wrd)
a = UInt(wrs1)
b = UInt(wrs2)

sb = UInt(shift_bytes)
st = DecodeShiftType(shift_type)
fg = DecodeFlagGroup(flag_group)



### Operation

b_shifted = ShiftReg(b, st, sb)
result = a | b_shifted

WDR[d] = result
flags_out = FlagsForResult(result)
FLAGS[fg] = {M: flags_out.M, L: flags_out.L, Z: flags_out.Z, C: FLAGS[fg].C}



## BN.NOT

Bitwise NOT. Inverts every bit of the value in the WDR referenced by wrs and stores the result in the WDR referenced by wrd. The source value can be shifted by an immediate before it is consumed by the operation. The M, L and Z flags in the selected flag group are updated with the result of the operation.

BN.NOT <wrd>, <wrs>[ <shift_type> <shift_bytes>B][, FG<flag_group>]

| Assembly symbol | Description |
|---|---|
| <wrd> | Name of the destination WDR |
| <wrs> | Name of the source WDR |
| <shift_type> | The direction of an optional shift applied to <wrs>. Syntax: << (immediate value 0) or >> (immediate value 1) |
| <shift_bytes> | Number of bytes by which to shift <wrs>. Defaults to 0. Valid range: 0..31 |
| <flag_group> | Flag group to use. Defaults to 0. Valid range: 0..1 |

 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 BN.NOT flag_group shift_type shift_bytes wrs 1 0 1 wrd 1 1 1 1 0 1 1

### Decode

d = UInt(wrd)
a = UInt(wrs)

sb = UInt(shift_bytes)
st = DecodeShiftType(shift_type)
fg = DecodeFlagGroup(flag_group)



### Operation

a_shifted = ShiftReg(a, st, sb)
result = ~a_shifted

WDR[d] = result
flags_out = FlagsForResult(result)
FLAGS[fg] = {M: flags_out.M, L: flags_out.L, Z: flags_out.Z, C: FLAGS[fg].C}



## BN.XOR

Bitwise XOR. Takes the values stored in the WDRs referenced by wrs1 and wrs2, performs a bitwise XOR, and stores the result in the WDR referenced by wrd. The content of the second source WDR can be shifted by an immediate before it is consumed by the operation. The M, L and Z flags in the selected flag group are updated with the result of the operation.

BN.XOR <wrd>, <wrs1>, <wrs2>[ <shift_type> <shift_bytes>B][, FG<flag_group>]

| Assembly symbol | Description |
|---|---|
| <wrd> | Name of the destination WDR |
| <wrs1> | Name of the first source WDR |
| <wrs2> | Name of the second source WDR |
| <shift_type> | The direction of an optional shift applied to <wrs2>. Syntax: << (immediate value 0) or >> (immediate value 1) |
| <shift_bytes> | Number of bytes by which to shift <wrs2>. Defaults to 0. Valid range: 0..31 |
| <flag_group> | Flag group to use. Defaults to 0. Valid range: 0..1 |

 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 BN.XOR flag_group shift_type shift_bytes wrs2 wrs1 1 1 0 wrd 1 1 1 1 0 1 1

### Decode

d = UInt(wrd)
a = UInt(wrs1)
b = UInt(wrs2)

sb = UInt(shift_bytes)
st = DecodeShiftType(shift_type)
fg = DecodeFlagGroup(flag_group)



### Operation

b_shifted = ShiftReg(b, st, sb)
result = WDR[a] ^ b_shifted

WDR[d] = result
flags_out = FlagsForResult(result)
FLAGS[fg] = {M: flags_out.M, L: flags_out.L, Z: flags_out.Z, C: FLAGS[fg].C}



## BN.RSHI

Concatenate and right shift immediate. Concatenates the content of the WDRs referenced by wrs1 and wrs2 (wrs1 forms the upper part), shifts the concatenated value right by an immediate value and truncates it to WLEN bits. The result is stored in the WDR referenced by wrd.

BN.RSHI <wrd>, <wrs1>, <wrs2> >> <imm>

Assembly symbol | Description

<wrd>

Name of the destination WDR

<wrs1>

Name of the first source WDR

<wrs2>

Name of the second source WDR

<imm>

Number of bits by which to shift the concatenated value to the right.

Valid range: 0..255 (0..WLEN-1)

 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 BN.RSHI imm[7:1] wrs2 wrs1 imm[0] 1 1 wrd 1 1 1 1 0 1 1

### Decode

d = UInt(wrd)
a = UInt(wrs1)
b = UInt(wrs2)
shift_bit = UInt(imm)



### Operation

WDR[d] = (((WDR[a] << WLEN) | WDR[b]) >> shift_bit)[WLEN-1:0]
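
The concatenate-and-shift behaviour can be checked with ordinary Python integers. The sketch below is an illustration only; `bn_rshi`, `WLEN` and `MASK` are names assumed for this example.

```python
WLEN = 256
MASK = (1 << WLEN) - 1

def bn_rshi(wrs1_val, wrs2_val, imm):
    """Concatenate {wrs1, wrs2}, shift right by imm bits, keep the low WLEN bits."""
    assert 0 <= imm < WLEN
    return (((wrs1_val << WLEN) | wrs2_val) >> imm) & MASK

# Shifting by 8 moves the low byte of wrs1 into the top byte of the result.
out = bn_rshi(0xAB, 0x1234, 8)
# With both sources holding the same value, the result is a rotate right.
rot = bn_rshi(1, 1, 1)
```

Passing the same register as both sources is a convenient way to express a rotation, because the bits shifted out at the bottom reappear from the upper copy.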



## BN.SEL

Flag Select. Returns in the destination WDR the value of the first source WDR if the flag in the chosen flag group is set, otherwise returns the value of the second source WDR.

BN.SEL <wrd>, <wrs1>, <wrs2>, [FG<flag_group>.]<flag>

Assembly symbol | Description

<wrd>

Name of the destination WDR

<wrs1>

Name of the first source WDR

<wrs2>

Name of the second source WDR

<flag_group>

Flag group to use. Defaults to 0.

Valid range: 0..1

<flag>

Flag to check. Valid values:

• C: Carry flag
• M: MSB flag
• L: LSB flag
• Z: Zero flag

Syntax table:

Syntax Value of immediate
c 0
m 1
l 2
z 3

 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 BN.SEL flag_group flag wrs2 wrs1 0 0 0 wrd 0 0 0 1 0 1 1

### Decode

d = UInt(wrd)
a = UInt(wrs1)
b = UInt(wrs2)
fg = DecodeFlagGroup(flag_group)
flag = DecodeFlag(flag)



### Operation

flag_is_set = FLAGS[fg].get(flag)

WDR[d] = WDR[a] if flag_is_set else WDR[b]
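
A minimal sketch of the selection, assuming 32 WDRs and two flag groups modeled as Python dictionaries. The function `bn_sel` and the data layout are hypothetical and serve only to illustrate the operation above.

```python
def bn_sel(wdr, flags, d, a, b, flag, fg=0):
    """Copy WDR[a] to WDR[d] if `flag` is set in flag group fg, else copy WDR[b]."""
    wdr[d] = wdr[a] if flags[fg][flag] else wdr[b]

wdr = [0] * 32                                # register file size assumed
wdr[1], wdr[2] = 111, 222
flags = [{"C": 1, "M": 0, "L": 0, "Z": 0},    # flag group 0
         {"C": 0, "M": 0, "L": 0, "Z": 0}]    # flag group 1

bn_sel(wdr, flags, 3, 1, 2, "C")              # C set in FG0: takes wdr[1]
bn_sel(wdr, flags, 4, 1, 2, "C", fg=1)        # C clear in FG1: takes wdr[2]
```

Because the choice is made by a data move rather than a branch, the same instruction sequence runs whichever way the flag is set.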



## BN.CMP

Compare. Subtracts the second WDR value from the first one and updates flags. This instruction is identical to BN.SUB, except that no result register is written.

BN.CMP <wrs1>, <wrs2>[ <shift_type> <shift_bytes>B][, FG<flag_group>]

Assembly symbol | Description

<wrs1>

Name of the first source WDR

<wrs2>

Name of the second source WDR

<shift_type>

The direction of an optional shift applied to <wrs2>.

Syntax table:

Syntax Value of immediate
<< 0
>> 1

<shift_bytes>

Number of bytes by which to shift <wrs2>. Defaults to 0.

Valid range: 0..31

<flag_group>

Flag group to use. Defaults to 0.

Valid range: 0..1

 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 BN.CMP flag_group shift_type shift_bytes wrs2 wrs1 0 0 1 0 0 0 1 0 1 1

### Decode

a = UInt(wrs1)
b = UInt(wrs2)

fg = DecodeFlagGroup(flag_group)
sb = UInt(shift_bytes)
st = DecodeShiftType(shift_type)



### Operation

b_shifted = ShiftReg(b, st, sb)
(_, flags_out) = SubtractWithBorrow(WDR[a], b_shifted, 0)

FLAGS[fg] = flags_out
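
To show how the flags can be read after a comparison, here is a small integer model. It assumes that the WLEN+1 bit result in SubtractWithBorrow wraps in two's complement, so that C is set exactly when a borrow occurs; the helper names are chosen for this example only.

```python
WLEN = 256
MASK = (1 << WLEN) - 1

def subtract_with_borrow(a, b, borrow_in):
    """Integer model of SubtractWithBorrow: returns (result, flags)."""
    full = (a - b - borrow_in) & ((1 << (WLEN + 1)) - 1)  # WLEN+1 bits, wrapped
    result = full & MASK
    flags = {"C": full >> WLEN, "M": result >> (WLEN - 1),
             "L": result & 1, "Z": int(result == 0)}
    return result, flags

def bn_cmp(a, b):
    """Only the flags are kept; the difference itself is discarded."""
    _, flags = subtract_with_borrow(a, b, 0)
    return flags

equal = bn_cmp(5, 5)      # Z set, C clear
less = bn_cmp(3, 5)       # C set: a borrow occurred, so a < b (unsigned)
greater = bn_cmp(5, 3)    # Z and C both clear
```

Under this model, Z indicates equality and C indicates that the first operand is smaller when both are treated as unsigned values.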



## BN.CMPB

Compare with Borrow. Subtracts the second WDR value and the Carry flag (used as borrow) from the first WDR value and updates flags. This instruction is identical to BN.SUBB, except that no result register is written.

BN.CMPB <wrs1>, <wrs2>[ <shift_type> <shift_bytes>B][, FG<flag_group>]

Assembly symbol | Description

<wrs1>

Name of the first source WDR

<wrs2>

Name of the second source WDR

<shift_type>

The direction of an optional shift applied to <wrs2>.

Syntax table:

Syntax Value of immediate
<< 0
>> 1

<shift_bytes>

Number of bytes by which to shift <wrs2>. Defaults to 0.

Valid range: 0..31

<flag_group>

Flag group to use. Defaults to 0.

Valid range: 0..1

 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 BN.CMPB flag_group shift_type shift_bytes wrs2 wrs1 0 1 1 0 0 0 1 0 1 1

### Decode

a = UInt(wrs1)
b = UInt(wrs2)

fg = DecodeFlagGroup(flag_group)
sb = UInt(shift_bytes)
st = DecodeShiftType(shift_type)



### Operation

b_shifted = ShiftReg(b, st, sb)
(_, flags_out) = SubtractWithBorrow(WDR[a], b_shifted, FLAGS[fg].C)

FLAGS[fg] = flags_out
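
One plausible use of the borrow input is comparing numbers wider than WLEN: the lowest words are compared without a borrow (as BN.CMP does) and each higher word consumes the borrow of the word below it (as BN.CMPB does). The sketch below models this with Python integers; `borrow_out` and `less_than` are hypothetical helpers, and the borrow is assumed to equal the C flag described above.

```python
WLEN = 256
MASK = (1 << WLEN) - 1

def borrow_out(a, b, borrow_in):
    """Carry flag produced by a - b - borrow_in on WLEN-bit operands."""
    return int(a - b - borrow_in < 0)

def less_than(x, y, limbs):
    """Unsigned x < y for values spanning `limbs` WLEN-bit words."""
    borrow = 0  # the first comparison behaves like BN.CMP (no borrow in)
    for i in range(limbs):
        a = (x >> (i * WLEN)) & MASK
        b = (y >> (i * WLEN)) & MASK
        borrow = borrow_out(a, b, borrow)  # later words behave like BN.CMPB
    return bool(borrow)

x = (1 << WLEN) + 5      # high word 1, low word 5
y = (1 << WLEN) + 7      # high word 1, low word 7
```

Here `less_than(x, y, 2)` is true even though the high words are equal, because the borrow from the low words propagates into the second comparison.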



## BN.LID

Load Word (indirect source, indirect destination). Load a WLEN-bit little-endian value from data memory.

The load address is offset plus the value in the GPR grs1. The loaded value is stored into the WDR given by the bottom 5 bits of the GPR grd.

After the operation, either the value in the GPR grs1, or the value in grd can be optionally incremented.

• If grs1_inc is set, the value in grs1 is incremented by the value WLEN/8 (one word).
• If grd_inc is set, grd is updated to be (*grd + 1) & 0x1f.

The memory address must be aligned to WLEN bits. Any address that is unaligned or is above the top of memory will result in an error (with error code ErrCodeBadDataAddr).

BN.LID <grd>[<grd_inc>], <offset>(<grs1>[<grs1_inc>])


This instruction takes 2 cycles.

Assembly symbol | Description

<grd>

Name of the GPR referencing the destination WDR

<grs1>

Name of the GPR containing the memory byte address. The value contained in the referenced GPR must be WLEN-aligned.

<offset>

Offset value. Must be WLEN-aligned.

Valid range: -16384..16352 in steps of 32

<grs1_inc>

Increment the value in <grs1> by WLEN/8 (one word). Cannot be specified together with grd_inc.

To specify, use the literal syntax ++

<grd_inc>

Increment the value in <grd> by one. Cannot be specified together with grs1_inc.

To specify, use the literal syntax ++

 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 BN.LID offset[11:5] grd grs1 1 0 0 offset[14:12] grs1_inc grd_inc 0 0 0 1 0 1 1

### Decode

rd = UInt(grd)
rs1 = UInt(grs1)
offset = UInt(offset)



### Operation

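mem_addr = GPR[rs1] + offset
wdr_dest = GPR[rd]

assert not (grs1_inc and grd_inc)  # prevented in encoding
if mem_addr % (WLEN / 8) or mem_addr + WLEN > DMEM_SIZE:
    raise BadDataAddr()  # exception name assumed; error code ErrCodeBadDataAddr

mem_index = mem_addr // (WLEN / 8)

# Helper name assumed here, mirroring StoreWlenWordToMemory in BN.SID.
WDR[wdr_dest] = LoadWlenWordFromMemory(mem_index)

if grs1_inc:
    GPR[rs1] = GPR[rs1] + (WLEN / 8)
if grd_inc:
    GPR[rd] = (GPR[rd] + 1) & 0x1f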
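
The indirect load and its optional post-increment can be modeled in a few lines of runnable Python. This is a sketch under stated assumptions: data memory is a `bytearray`, the register files are plain lists, and `BadDataAddr` is a hypothetical exception standing in for the ErrCodeBadDataAddr error.

```python
WLEN = 256
WORD_BYTES = WLEN // 8  # 32 bytes per wide word

class BadDataAddr(Exception):
    """Stands in for the ErrCodeBadDataAddr error (hypothetical name)."""

def bn_lid(gpr, wdr, dmem, rd, rs1, offset, grs1_inc=False, grd_inc=False):
    assert not (grs1_inc and grd_inc)  # the two increments are mutually exclusive
    addr = gpr[rs1] + offset
    if addr % WORD_BYTES or addr < 0 or addr + WORD_BYTES > len(dmem):
        raise BadDataAddr(addr)
    # Little-endian load into the WDR selected by the low 5 bits of GPR[rd].
    wdr[gpr[rd] & 0x1F] = int.from_bytes(dmem[addr:addr + WORD_BYTES], "little")
    if grs1_inc:
        gpr[rs1] += WORD_BYTES
    if grd_inc:
        gpr[rd] = (gpr[rd] + 1) & 0x1F

dmem = bytearray(128)              # four wide words of data memory (size assumed)
dmem[32] = 0x2A                    # lowest byte of the word at address 32
gpr, wdr = [0] * 32, [0] * 32
gpr[8], gpr[9] = 4, 0              # x8 selects w4, x9 holds the base address
bn_lid(gpr, wdr, dmem, rd=8, rs1=9, offset=32, grd_inc=True)
```

After the call, `wdr[4]` holds the loaded word and `gpr[8]` has advanced to 5, so a following load through the same GPR would target the next WDR.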



## BN.SID

Store Word (indirect source, indirect destination). Store a WDR to memory as a WLEN-bit little-endian value.

The store address is offset plus the value in the GPR grs1. The value to store is taken from the WDR given by the bottom 5 bits of the GPR grs2.

After the operation, either the value in the GPR grs1, or the value in grs2 can be optionally incremented.

• If grs1_inc is set, the value in grs1 is incremented by the value WLEN/8 (one word).
• If grs2_inc is set, the value in grs2 is updated to be (*grs2 + 1) & 0x1f.

The memory address must be aligned to WLEN bits. Any address that is unaligned or is above the top of memory will result in an error (with error code ErrCodeBadDataAddr).

BN.SID <grs2>[<grs2_inc>], <offset>(<grs1>[<grs1_inc>])

Assembly symbol | Description

<grs1>

Name of the GPR containing the memory byte address. The value contained in the referenced GPR must be WLEN-aligned.

<grs2>

Name of the GPR referencing the source WDR.

<offset>

Offset value. Must be WLEN-aligned.

Valid range: -16384..16352 in steps of 32

<grs1_inc>

Increment the value in <grs1> by WLEN/8 (one word). Cannot be specified together with grs2_inc.

To specify, use the literal syntax ++

<grs2_inc>

Increment the value in <grs2> by one. Cannot be specified together with grs1_inc.

To specify, use the literal syntax ++

 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 BN.SID offset[11:5] grs2 grs1 1 0 1 offset[14:12] grs1_inc grs2_inc 0 0 0 1 0 1 1

### Decode

rs1 = UInt(grs1)
rs2 = UInt(grs2)
offset = UInt(offset)



### Operation

mem_addr = GPR[rs1] + offset
wdr_src = GPR[rs2]

assert not (grs1_inc and grs2_inc)  # prevented in encoding
if mem_addr % (WLEN / 8) or mem_addr + WLEN > DMEM_SIZE:
    raise BadDataAddr()  # exception name assumed; error code ErrCodeBadDataAddr

mem_index = mem_addr // (WLEN / 8)

StoreWlenWordToMemory(mem_index, WDR[wdr_src])

if grs1_inc:
    GPR[rs1] = GPR[rs1] + (WLEN / 8)
if grs2_inc:
    GPR[rs2] = (GPR[rs2] + 1) & 0x1f
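
The following sketch illustrates the little-endian byte order of the stored word and the wrap-around of the WDR index at 32. As before, memory is modeled as a `bytearray` and the register files as lists; `bn_sid` is a name assumed for this example, and a `ValueError` stands in for the ErrCodeBadDataAddr error.

```python
WLEN = 256
WORD_BYTES = WLEN // 8

def bn_sid(gpr, wdr, dmem, rs2, rs1, offset, grs1_inc=False, grs2_inc=False):
    assert not (grs1_inc and grs2_inc)  # the two increments are mutually exclusive
    addr = gpr[rs1] + offset
    if addr % WORD_BYTES or addr < 0 or addr + WORD_BYTES > len(dmem):
        raise ValueError("bad data address")  # stands in for ErrCodeBadDataAddr
    word = wdr[gpr[rs2] & 0x1F]               # WDR selected by the low 5 bits
    dmem[addr:addr + WORD_BYTES] = word.to_bytes(WORD_BYTES, "little")
    if grs1_inc:
        gpr[rs1] += WORD_BYTES
    if grs2_inc:
        gpr[rs2] = (gpr[rs2] + 1) & 0x1F

dmem = bytearray(64)               # two wide words of data memory (size assumed)
gpr, wdr = [0] * 32, [0] * 32
wdr[31] = 0x0102
gpr[5], gpr[6] = 31, 32            # x5 selects w31, x6 holds the store address
bn_sid(gpr, wdr, dmem, rs2=5, rs1=6, offset=0, grs2_inc=True)
```

The least significant byte (0x02) lands at the lowest address, and the incremented index in `gpr[5]` wraps from 31 back to 0.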



## BN.MOV

Copy content between WDRs (direct addressing).

BN.MOV <wrd>, <wrs>

 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 BN.MOV 0 wrs 1 1 0 wrd 0 0 0 1 0 1 1

### Decode

s = UInt(wrs)
d = UInt(wrd)



### Operation

WDR[d] = WDR[s]


## BN.MOVR

Copy content between WDRs (register-indirect addressing). Copies the content of the WDR referenced by the GPR grs to the WDR referenced by the GPR grd. Optionally, either the source or the destination register index can be incremented by 1.

BN.MOVR <grd>[<grd_inc>], <grs>[<grs_inc>]

Assembly symbol | Description

<grd>

Name of the GPR referencing the destination WDR.

<grs>

Name of the GPR referencing the source WDR.

<grd_inc>

Increment the value in <grd> by one. Cannot be specified together with grs_inc.

To specify, use the literal syntax ++

<grs_inc>

Increment the value in <grs> by one. Cannot be specified together with grd_inc.

To specify, use the literal syntax ++

 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 BN.MOVR 1 grd grs 1 1 0 grs_inc grd_inc 0 0 0 1 0 1 1

### Decode

s = UInt(grs)
d = UInt(grd)



### Operation

WDR[GPR[d]] = WDR[GPR[s]]

if grs_inc:
    GPR[s] = GPR[s] + 1
if grd_inc:
    GPR[d] = GPR[d] + 1
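
The post-increment makes it possible to walk over a range of WDRs in a loop. The sketch below copies one source register into three consecutive destinations; `bn_movr` and the list-based register files are assumptions made for illustration.

```python
def bn_movr(gpr, wdr, d, s, grd_inc=False, grs_inc=False):
    assert not (grd_inc and grs_inc)  # only one index may be incremented
    wdr[gpr[d]] = wdr[gpr[s]]
    if grs_inc:
        gpr[s] += 1
    if grd_inc:
        gpr[d] += 1

gpr, wdr = [0] * 32, [0] * 32
wdr[7] = 0xC0FFEE
gpr[10], gpr[11] = 0, 7            # x10 points at w0, x11 points at w7
for _ in range(3):                 # copy w7 into w0, w1 and w2
    bn_movr(gpr, wdr, d=10, s=11, grd_inc=True)
```

Only the destination index advances here, so `gpr[11]` still points at w7 after the loop while `gpr[10]` ends at 3.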



## BN.WSRR

Read WSR to register. Reads a WSR to a WDR. If wsr isn’t the index of a valid WSR, this halts with an error (TODO: Specify error code).

BN.WSRR <wrd>, <wsr>

Assembly symbol | Description

<wrd>

Destination WDR

<wsr>

 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 BN.WSRR 0 wsr 1 1 1 wrd 0 0 0 1 0 1 1

### Operation

WDR[wrd] = WSR[wsr]



## BN.WSRW

Write WSR from register. Writes a WDR to a WSR. If wsr isn’t the index of a valid WSR, this halts with an error (TODO: Specify error code).

BN.WSRW <wsr>, <wrs>

Assembly symbol | Description

<wsr>

<wrs>

Source WDR

 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 BN.WSRW 1 wsr wrs 1 1 1 0 0 0 1 0 1 1

### Operation

WSR[wsr] = WDR[wrs]



# Pseudo-Code Functions for BN Instructions

The instruction descriptions use Python-based pseudocode. Commonly used functions are defined once below.

##### Note

This “pseudo-code” is intended to be Python 3, and contains known inconsistencies at the moment. It will be further refined as we make progress in the implementation of a simulator using this syntax.
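
Because the pseudocode relies on bit-vector types such as Bits[N] that plain Python does not provide, it cannot be run as written. As a hedged illustration, the sketch below shows how the AddWithCarry helper could be modeled with ordinary Python integers; the lowercase function name, the dictionary used for the flags and the `MASK` constant are assumptions of this example.

```python
WLEN = 256
MASK = (1 << WLEN) - 1

def add_with_carry(a, b, carry_in):
    """Integer model of AddWithCarry: returns (WLEN-bit result, flags)."""
    full = a + b + carry_in                    # fits in WLEN+1 bits
    result = full & MASK                       # models result[WLEN-1:0]
    flags = {"C": full >> WLEN,                # models result[WLEN]
             "L": result & 1,
             "M": result >> (WLEN - 1),
             "Z": int(result == 0)}
    return result, flags

# All-ones plus one wraps to zero and sets both C and Z.
result, flags = add_with_carry(MASK, 1, 0)
```

Bit slices become shifts and masks in this style, which keeps the model close to the pseudocode while remaining executable.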

class Flag(Enum):
    C = 0  # carry flag
    M = 1  # MSB flag
    L = 2  # LSB flag
    Z = 3  # zero flag

class FlagGroup:
    C: Bits[1]
    M: Bits[1]
    L: Bits[1]
    Z: Bits[1]

    def set(self, flag: Flag, value: Bits[1]):
        assert flag in Flag

        if flag == Flag.C:
            self.C = value
        elif flag == Flag.M:
            self.M = value
        elif flag == Flag.L:
            self.L = value
        elif flag == Flag.Z:
            self.Z = value

    def get(self, flag: Flag):
        assert flag in Flag

        if flag == Flag.C:
            return self.C
        elif flag == Flag.M:
            return self.M
        elif flag == Flag.L:
            return self.L
        elif flag == Flag.Z:
            return self.Z

class ShiftType(Enum):
    LSL = 0  # logical shift left
    LSR = 1  # logical shift right

class HalfWord(Enum):
    LOWER = 0  # lower or less significant half-word
    UPPER = 1  # upper or more significant half-word

def DecodeShiftType(st: Bits[1]) -> ShiftType:
    if st == 0:
        return ShiftType.LSL
    elif st == 1:
        return ShiftType.LSR
    else:
        raise UndefinedException()

def DecodeFlagGroup(flag_group: Bits[1]) -> UInt:
    if flag_group > 1:
        raise UndefinedException()
    return UInt(flag_group)

def DecodeFlag(flag: Bits[2]) -> Flag:
    if flag == 0:
        return Flag.C
    elif flag == 1:
        return Flag.M
    elif flag == 2:
        return Flag.L
    elif flag == 3:
        return Flag.Z
    else:
        raise UndefinedException()

def ShiftReg(reg, shift_type, shift_bytes) -> Bits[WLEN]:
    # The shift amount is given in bytes; (shift_bytes << 3) converts it to bits.
    if shift_type == ShiftType.LSL:
        return WDR[reg] << (shift_bytes << 3)
    elif shift_type == ShiftType.LSR:
        return WDR[reg] >> (shift_bytes << 3)

def AddWithCarry(a: Bits[WLEN], b: Bits[WLEN], carry_in: Bits[1]) -> (Bits[WLEN], FlagGroup):
    result: Bits[WLEN+1] = a + b + carry_in

    flags_out = FlagGroup()
    flags_out.C = result[WLEN]
    flags_out.L = result[0]
    flags_out.M = result[WLEN-1]
    flags_out.Z = (result[WLEN-1:0] == 0)

    return (result[WLEN-1:0], flags_out)

def SubtractWithBorrow(a: Bits[WLEN], b: Bits[WLEN], borrow_in: Bits[1]) -> (Bits[WLEN], FlagGroup):
    result: Bits[WLEN+1] = a - b - borrow_in

    flags_out = FlagGroup()
    flags_out.C = result[WLEN]
    flags_out.L = result[0]
    flags_out.M = result[WLEN-1]
    flags_out.Z = (result[WLEN-1:0] == 0)

    return (result[WLEN-1:0], flags_out)

def DecodeHalfWordSelect(hwsel: Bits[1]) -> HalfWord:
    if hwsel == 0:
        return HalfWord.LOWER
    elif hwsel == 1:
        return HalfWord.UPPER
    else:
        raise UndefinedException()

def GetHalfWord(reg: int, hwsel: HalfWord) -> Bits[WLEN/2]:
    if hwsel == HalfWord.LOWER:
        return WDR[reg][WLEN/2-1:0]
    elif hwsel == HalfWord.UPPER:
        return WDR[reg][WLEN-1:WLEN/2]