OpenTitan Big Number Accelerator (OTBN) Technical Specification

Note on the status of this document

This specification is work in progress and will see significant changes before it can be considered final. We invite input of all kinds through the standard means of the OpenTitan project; a good starting point is filing an issue in our GitHub issue tracker.

Overview

This document specifies the functionality of the OpenTitan Big Number Accelerator, or OTBN. OTBN is a coprocessor for asymmetric cryptographic operations like RSA or Elliptic Curve Cryptography (ECC).

This module conforms to the Comportable guideline for peripheral functionality. See that document for integration overview within the broader top level system.

Features

  • Processor optimized for wide integer arithmetic
  • 32b wide control path with 32 32b wide registers
  • 256b wide data path with 32 256b wide registers
  • Full control-flow support with conditional branch and unconditional jump instructions, hardware loops, and hardware-managed call/return stacks.
  • Reduced, security-focused instruction set architecture for easier verification and the prevention of data leaks.
  • Built-in access to random numbers. Note: The (quality) properties of the provided random numbers are not currently specified; this gap in the specification will be addressed in a future revision.

Description

OTBN is a processor specialized for executing security-sensitive asymmetric (public-key) cryptography code, such as RSA or ECC. Such algorithms are dominated by wide integer arithmetic, which is supported by OTBN’s 256b wide data path and registers, and by instructions that operate on these wide data words. The control flow, on the other hand, is clearly separated from the data and reduced to a minimum to avoid data leakage.

The data OTBN processes is security-sensitive, and the processor design centers around that. The design is kept as simple as possible to reduce the attack surface and aid verification and testing. For example, no interrupts or exceptions are included in the design, and all instructions are designed to be executable within a single cycle.

OTBN is designed as a self-contained co-processor with its own instruction and data memory, and is accessible as a bus device.

Compatibility

OTBN is not designed to be compatible with other cryptographic accelerators. It received some inspiration from assembly code available from the Chromium EC project, which has been formally verified within the Fiat Crypto project.

Instruction Set

OTBN is a processor with a custom instruction set. The full ISA description can be found in our ISA manual. The instruction set is split into two groups:

  • The base instruction subset operates on the 32b General Purpose Registers (GPRs). Its instructions are used for the control flow of an OTBN application. The base instructions are inspired by RISC-V’s RV32I instruction set, but not compatible with it.
  • The big number instruction subset operates on 256b Wide Data Registers (WDRs). Its instructions are used for data processing.

Processor State

General Purpose Registers (GPRs)

OTBN has 32 General Purpose Registers (GPRs), each of which is 32b wide. The GPRs are defined in line with RV32I and are mainly used for control flow. They are accessed through the base instruction subset. GPRs are not used by the main data path, which instead operates on the wide data registers, a separate register file controlled by the big number instructions.

Register    Description
x0          Zero register. Reads as 0; writes are ignored.
x1          Access to the call stack
x2 ... x31  General purpose registers

Note: Currently, OTBN has no “standard calling convention,” and GPRs other than x0 and x1 can be used for any purpose. If a calling convention is needed at some point, it is expected to be aligned with the RISC-V standard calling conventions and the roles assigned to registers in that convention. Even without an agreed-on calling convention, software authors are encouraged to follow the RISC-V calling convention where it makes sense. For example, good choices for temporary registers are x6, x7, x28, x29, x30, and x31.

Call Stack

OTBN has an in-built call stack which is accessed through the x1 GPR. This is intended to be used as a return address stack, containing return addresses for the current stack of function calls. See the documentation for JAL and JALR for a description of how to use it for this purpose.

The call stack has a maximum depth of 8 elements. Each instruction that reads from x1 pops a single element from the stack. Each instruction that writes to x1 pushes a single element onto the stack. An instruction that reads from an empty stack or writes to a full stack causes OTBN to stop, raising an alert and setting the ERR_CODE register to ErrCodeCallStack.

A single instruction can both read from and write to the stack. In this case, the read is ordered before the write. Provided the stack has at least one element, this is allowed even if the stack is full.
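The pop-before-push ordering and the two error cases can be captured in a small software model. This is a hypothetical sketch for illustration only: `call_stack_t` and `call_stack_step` are invented names, and returning `false` stands in for OTBN stopping with ErrCodeCallStack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CALL_STACK_DEPTH 8  /* maximum depth given in the text */

/* Hypothetical software model of the x1 call stack. */
typedef struct {
    uint32_t entry[CALL_STACK_DEPTH];
    unsigned size;  /* number of valid entries */
} call_stack_t;

/* Applies one instruction's accesses to x1: the read (pop) happens
   before the write (push). Returns false where OTBN would stop and
   set ERR_CODE to ErrCodeCallStack. */
bool call_stack_step(call_stack_t *s, bool reads_x1, bool writes_x1,
                     uint32_t wdata, uint32_t *rdata) {
    assert(s->size <= CALL_STACK_DEPTH);
    if (reads_x1) {
        if (s->size == 0) return false;                /* read from an empty stack */
        s->size--;
        if (rdata != NULL) *rdata = s->entry[s->size];
    }
    if (writes_x1) {
        if (s->size == CALL_STACK_DEPTH) return false; /* write to a full stack */
        s->entry[s->size++] = wdata;
    }
    return true;
}
```

Because the pop happens first, an instruction that both reads and writes x1 on a full stack frees a slot before pushing, which is why that case is allowed.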

Control and Status Registers (CSRs)

Control and Status Registers (CSRs) are 32b wide registers used for “special” purposes, as detailed in their description; they are not related to the GPRs. CSRs can be accessed through dedicated instructions, CSRRS and CSRRW.

Number  Privilege  Description
0x7C0   RW         FG0. Wide arithmetic flag group 0. This CSR provides access to flag group 0 used by wide integer arithmetic. FLAGS, FG0 and FG1 provide different views on the same underlying bits.
                   Bit  Description
                   0    Carry of Flag Group 0
                   1    MSb of Flag Group 0
                   2    LSb of Flag Group 0
                   3    Zero of Flag Group 0
0x7C1   RW         FG1. Wide arithmetic flag group 1. This CSR provides access to flag group 1 used by wide integer arithmetic. FLAGS, FG0 and FG1 provide different views on the same underlying bits.
                   Bit  Description
                   0    Carry of Flag Group 1
                   1    MSb of Flag Group 1
                   2    LSb of Flag Group 1
                   3    Zero of Flag Group 1
0x7C8   RW         FLAGS. Wide arithmetic flag groups. This CSR provides access to both flag groups used by wide integer arithmetic. FLAGS, FG0 and FG1 provide different views on the same underlying bits.
                   Bit  Description
                   0    Carry of Flag Group 0
                   1    MSb of Flag Group 0
                   2    LSb of Flag Group 0
                   3    Zero of Flag Group 0
                   4    Carry of Flag Group 1
                   5    MSb of Flag Group 1
                   6    LSb of Flag Group 1
                   7    Zero of Flag Group 1
0x7D0   RW         MOD0. Bits [31:0] of the modulus operand, used in the BN.ADDM/BN.SUBM instructions. This CSR is mapped to the MOD WSR.
0x7D1   RW         MOD1. Bits [63:32] of the modulus operand, used in the BN.ADDM/BN.SUBM instructions. This CSR is mapped to the MOD WSR.
0x7D2   RW         MOD2. Bits [95:64] of the modulus operand, used in the BN.ADDM/BN.SUBM instructions. This CSR is mapped to the MOD WSR.
0x7D3   RW         MOD3. Bits [127:96] of the modulus operand, used in the BN.ADDM/BN.SUBM instructions. This CSR is mapped to the MOD WSR.
0x7D4   RW         MOD4. Bits [159:128] of the modulus operand, used in the BN.ADDM/BN.SUBM instructions. This CSR is mapped to the MOD WSR.
0x7D5   RW         MOD5. Bits [191:160] of the modulus operand, used in the BN.ADDM/BN.SUBM instructions. This CSR is mapped to the MOD WSR.
0x7D6   RW         MOD6. Bits [223:192] of the modulus operand, used in the BN.ADDM/BN.SUBM instructions. This CSR is mapped to the MOD WSR.
0x7D7   RW         MOD7. Bits [255:224] of the modulus operand, used in the BN.ADDM/BN.SUBM instructions. This CSR is mapped to the MOD WSR.
0xFC0   R          RND. A random number.
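The bit tables above imply a simple relationship between the three flag views (FG0 is FLAGS[3:0], FG1 is FLAGS[7:4]) and between the MODi CSRs and the MOD WSR. The helpers below are a hypothetical sketch of that mapping; the function and enum names are invented for illustration, and the write helper assumes that writing one group's view leaves the other group's bits unchanged, as the "same underlying bits" wording suggests.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions within a 4-bit flag group, per the FG0/FG1 tables. */
enum { FLAG_C = 0, FLAG_M = 1, FLAG_L = 2, FLAG_Z = 3 };

/* FG0 (0x7C0) is FLAGS[3:0]; FG1 (0x7C1) is FLAGS[7:4]. */
static inline uint32_t fg_from_flags(uint32_t flags, unsigned group) {
    assert(group < 2u);
    return (flags >> (4u * group)) & 0xFu;
}

/* Writing a group's view replaces only that group's nibble of FLAGS. */
static inline uint32_t flags_with_fg(uint32_t flags, unsigned group, uint32_t fg) {
    assert(group < 2u);
    unsigned sh = 4u * group;
    return (flags & ~(0xFu << sh)) | ((fg & 0xFu) << sh);
}

/* CSR MODi (0x7D0 + i) holds bits [32i+31:32i] of the 256-bit MOD WSR. */
static inline uint32_t mod_csr_addr(unsigned word) {
    assert(word < 8u);
    return 0x7D0u + word;
}
```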

Wide Data Registers (WDRs)

In addition to the 32b wide GPRs, OTBN has a second “wide” register file, which is used by the big number instruction subset. This register file consists of NWDR = 32 Wide Data Registers (WDRs). Each WDR is WLEN = 256b wide.

Wide Data Registers (WDRs) and the 32b General Purpose Registers (GPRs) are separate register files. They are only accessible through their respective instruction subset: GPRs are accessible from the base instruction subset, and WDRs are accessible from the big number instruction subset (BN instructions).

Register
w0
w1
...
w31

Wide Special Purpose Registers (WSRs)

OTBN has 256b Wide Special Purpose Registers (WSRs). These are analogous to the 32b CSRs, but are used by big number instructions. They can be accessed with the BN.WSRRS and BN.WSRRW instructions.

Number  Name  R/W  Description
0x0     MOD   RW   The modulus used by the BN.ADDM and BN.SUBM instructions. This WSR is also visible as CSRs MOD0 through to MOD7.
0x1     RND   R    A random number.
0x2     ACC   RW   The accumulator register used by the BN.MULQACC instruction.

Flags

In addition to the wide register file, OTBN maintains global state in two groups of flags for use by wide integer operations. The flag groups are named Flag Group 0 (FG0) and Flag Group 1 (FG1). Each group consists of four flags. Each flag is a single bit.

  • C (Carry flag). Set to 1 if an overflow occurred in the last arithmetic instruction.

  • L (LSb flag). The least significant bit of the result of the last arithmetic or shift instruction.

  • M (MSb flag). The most significant bit of the result of the last arithmetic or shift instruction.

  • Z (Zero flag). Set to 1 if the result of the last operation was zero; otherwise 0.

The L, M, and Z flags are determined based on the result of the operation as it is written back into the result register, without considering the overflow bit.
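As an illustration of how the four flags relate to a 256b result, the sketch below models a plain WLEN-bit addition over eight 32b limbs. It is a hypothetical reference model, not the behaviour of any specific OTBN instruction; `flags_t` and `add256` are invented names, and the choice of which flag group an instruction updates is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one flag group. */
typedef struct { bool c, m, l, z; } flags_t;

/* r = a + b modulo 2^256, using 8 x 32-bit limbs (limb 0 least significant). */
flags_t add256(const uint32_t a[8], const uint32_t b[8], uint32_t r[8]) {
    uint64_t carry = 0;
    for (int i = 0; i < 8; i++) {
        uint64_t s = (uint64_t)a[i] + b[i] + carry;
        r[i] = (uint32_t)s;
        carry = s >> 32;
    }
    flags_t f;
    f.c = carry != 0;           /* overflow out of bit 255 */
    f.l = (r[0] & 1u) != 0;     /* bit 0 of the written-back result */
    f.m = (r[7] >> 31) != 0;    /* bit 255 of the written-back result */
    bool zero = true;
    for (int i = 0; i < 8; i++) zero = zero && r[i] == 0;
    f.z = zero;                 /* based on r only; the carry is ignored */
    return f;
}
```

For example, adding 1 to the all-ones value wraps the result to 0, so C and Z are both set even though the mathematical sum is non-zero.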

Loop Stack

The LOOP instruction allows for nested loops; the active loops are stored on the loop stack. Each loop stack entry is a tuple of loop count, start address, and end address. The number of entries in the loop stack is implementation-dependent.

Theory of Operations

Block Diagram

OTBN architecture block diagram

Hardware Interfaces

Referring to the Comportable guideline for peripheral device functionality, the module otbn has the following hardware interfaces defined.

Primary Clock: clk_i

Other Clocks: none

Bus Device Interface: tlul

Bus Host Interface: none

Peripheral Pins for Chip IO: none

Interrupts:

Interrupt Name  Description
done            OTBN has completed the operation
err             An error occurred. Read the ERR_CODE register for error details.

Security Alerts:

Alert Name          Description
imem_uncorrectable  Uncorrectable error in the instruction memory detected.
dmem_uncorrectable  Uncorrectable error in the data memory detected.
reg_uncorrectable   Uncorrectable error in one of the register files detected.

Design Details

Note

To be filled in as we create the implementation.

By design, OTBN is a simple processor and has essentially no error handling support. When anything goes wrong (an out-of-bounds memory operation, an invalid instruction encoding, etc.), OTBN will stop fetching instructions, and set the ERR_CODE register and the err bit of the INTR_STATE register.

Programmer's Guide

Note

This section will be written as we move on in the design and implementation process.

Memories

The OTBN processor core has access to two dedicated memories: an instruction memory (IMEM), and a data memory (DMEM). Each memory is 4 KiB in size.

The memory layout follows the Harvard architecture. Both memories are byte-addressed, with addresses starting at 0.

The instruction memory (IMEM) is 32b wide and provides the instruction stream to the OTBN processor; it cannot be read or written from user code through load or store instructions.

The data memory (DMEM) is 256b wide and read-write accessible from the base and big number instruction subsets of the OTBN processor core. When accessed from the base instruction subset through the LW or SW instructions, accesses must read or write 32b-aligned 32b words. When accessed from the big number instruction subset through the BN.LID or BN.SID instructions, accesses must read or write 256b-aligned 256b words.
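The alignment rules above reduce to simple checks on the byte address: 32b accesses need a multiple of 4, 256b accesses a multiple of 32. The sketch below is hypothetical; the function names are invented, and it additionally assumes that an access must lie entirely within the 4 KiB DMEM to be valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DMEM_SIZE_BYTES 4096u  /* 4 KiB */

/* LW/SW: 32b words, so the byte address must be a multiple of 4. */
bool dmem_word_access_ok(uint32_t addr) {
    return (addr % 4u) == 0 && addr <= DMEM_SIZE_BYTES - 4u;
}

/* BN.LID/BN.SID: 256b words, so the byte address must be a multiple of 32. */
bool dmem_wide_access_ok(uint32_t addr) {
    return (addr % 32u) == 0 && addr <= DMEM_SIZE_BYTES - 32u;
}
```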

Both memories can be accessed through OTBN’s register interface (DMEM and IMEM) only when OTBN is idle, as indicated by the STATUS.busy flag. All memory accesses through the register interface must be word-aligned 32b word accesses.
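From the host side, the register table places the IMEM window at offset 0x4000 and the DMEM window at offset 0x8000, each with 1024 32b words, and STATUS.busy is bit 0 of STATUS. The helpers below are a hypothetical sketch that turns a memory byte address into a register-interface offset and checks the idle condition; the names are invented and no real MMIO access is performed.

```c
#include <assert.h>
#include <stdint.h>

#define OTBN_IMEM_OFFSET 0x4000u  /* IMEM window base */
#define OTBN_DMEM_OFFSET 0x8000u  /* DMEM window base */
#define OTBN_MEM_WORDS   1024u    /* 32b words per window */
#define OTBN_STATUS_BUSY 0x1u     /* STATUS.busy is bit 0 */

/* Register-interface offset of the 32b word at byte address addr inside
   the window starting at base, or -1 if misaligned or out of range. */
int64_t otbn_mem_window_offset(uint32_t base, uint32_t addr) {
    if (addr % 4u != 0 || addr / 4u >= OTBN_MEM_WORDS) return -1;
    return (int64_t)base + addr;
}

/* The windows may only be used while OTBN is idle. */
int otbn_mem_accessible(uint32_t status) {
    return (status & OTBN_STATUS_BUSY) == 0;
}
```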

Operation

Note

The exact sequence of operations is not yet finalized.

Rough expected process:

Error conditions

Note

To be filled in as we create the implementation.

Device Interface Functions (DIFs)

To use this DIF, include the following C header:

#include "sw/device/lib/dif/dif_otbn.h"

This header provides the following device interface functions:

Register Table

otbn.INTR_STATE @ + 0x0
Interrupt State Register
Reset default = 0x0, mask 0x3
Bits  Type  Reset  Name  Description
0     rw1c  0x0    done  OTBN has completed the operation
1     rw1c  0x0    err   An error occurred. Read the ERR_CODE register for error details.


otbn.INTR_ENABLE @ + 0x4
Interrupt Enable Register
Reset default = 0x0, mask 0x3
Bits  Type  Reset  Name  Description
0     rw    0x0    done  Enable interrupt when INTR_STATE.done is set
1     rw    0x0    err   Enable interrupt when INTR_STATE.err is set


otbn.INTR_TEST @ + 0x8
Interrupt Test Register
Reset default = 0x0, mask 0x3
Bits  Type  Reset  Name  Description
0     wo    0x0    done  Write 1 to force INTR_STATE.done to 1
1     wo    0x0    err   Write 1 to force INTR_STATE.err to 1


otbn.ALERT_TEST @ + 0xc
Alert Test Register
Reset default = 0x0, mask 0x7
Bits  Type  Reset  Name                Description
0     wo    0x0    imem_uncorrectable  Write 1 to trigger one alert event of this kind.
1     wo    0x0    dmem_uncorrectable  Write 1 to trigger one alert event of this kind.
2     wo    0x0    reg_uncorrectable   Write 1 to trigger one alert event of this kind.


otbn.CMD @ + 0x10
command register
Reset default = 0x0, mask 0x3
Bits  Type   Reset  Name   Description
0     r0w1c  0x0    start  Start the operation. The completion is signalled by the done interrupt.
1     r0w1c  0x0    dummy  Reggen doesn't generate sub-fields with only a single field specified; instead, the whole register is taken as a field, leading to signals like `hw2reg.status.d` instead of `hw2reg.status.start.d`. Since we expect to add more commands later, we force the generation of fields with this dummy field for now.


otbn.STATUS @ + 0x14
Status
Reset default = 0x0, mask 0x3
Bits  Type  Reset  Name   Description
0     ro    0x0    busy   OTBN is performing an operation.
1     ro    0x0    dummy  See CMD.dummy for details.


otbn.ERR_CODE @ + 0x18
Error Code
Reset default = 0x0, mask 0xffffffff
Bits  Type  Reset  Name      Description
31:0  ro    0x0    err_code  The error cause if an error occurred. Software should read this register before clearing the err interrupt to avoid race conditions. Possible values:
                               - 0x0 (ErrCodeNoError): No error occurred.
                               - 0x1 (ErrCodeBadDataAddr): Load or store to invalid address.
                               - 0x2 (ErrCodeCallStack): Call stack underflow/overflow.
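Software handling the err interrupt can decode ERR_CODE with a lookup like the one below. This is a hypothetical sketch; `otbn_err_code_name` and the macro names are invented and are not taken from the DIF header. Per the description above, ERR_CODE should be read before INTR_STATE.err is cleared; since err is rw1c, writing OTBN_INTR_ERR_BIT to INTR_STATE clears only that bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* INTR_STATE bit masks (both fields are rw1c). */
#define OTBN_INTR_DONE_BIT (1u << 0)
#define OTBN_INTR_ERR_BIT  (1u << 1)

/* Maps an ERR_CODE value to the name used in the register description,
   or NULL if the value is not listed there. */
const char *otbn_err_code_name(uint32_t err_code) {
    switch (err_code) {
        case 0x0: return "ErrCodeNoError";
        case 0x1: return "ErrCodeBadDataAddr";
        case 0x2: return "ErrCodeCallStack";
        default:  return NULL;
    }
}
```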


otbn.START_ADDR @ + 0x1c
Start byte address in the instruction memory
Reset default = 0x0, mask 0xffffffff
Bits  Type  Reset  Name        Description
31:0  wo    0x0    start_addr  Byte address in the instruction memory from which OTBN starts executing when instructed to do so with CMD.start.


otbn.IMEM @ + 0x4000
1024-item rw window
Byte writes are not supported
Word offsets (bits 31:0): +0x4000, +0x4004, ..., +0x4ff8, +0x4ffc
Instruction Memory. This register should only be accessed while OTBN is not busy, as indicated by the STATUS.busy flag. Accesses while OTBN is busy are blocking. TODO: The exact behavior is yet to be determined, see https://github.com/lowRISC/opentitan/issues/2696 for details.


otbn.DMEM @ + 0x8000
1024-item rw window
Byte writes are not supported
Word offsets (bits 31:0): +0x8000, +0x8004, ..., +0x8ff8, +0x8ffc
Data Memory. This register should only be accessed while OTBN is not busy, as indicated by the STATUS.busy flag. Accesses while OTBN is busy are blocking. TODO: The exact behavior is yet to be determined, see https://github.com/lowRISC/opentitan/issues/2696 for details.


Algorithmic Examples: Multiplication with BN.MULQACC

The big number instruction subset of OTBN generally operates on WLEN-bit numbers. BN.MULQACC operates on WLEN/4-bit operands (with a full WLEN-bit accumulator). This section outlines two techniques to perform larger multiplications by composing multiple BN.MULQACC instructions.

Multiplying two WLEN/2 numbers with BN.MULQACC

This instruction sequence multiplies the lower half of w0 by the upper half of w0, placing the result in w1.

BN.MULQACC.Z      w0.0, w0.2, 0
BN.MULQACC        w0.0, w0.3, 64
BN.MULQACC        w0.1, w0.2, 64
BN.MULQACC.WO w1, w0.1, w0.3, 128
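The sequence follows from schoolbook multiplication on 64-bit (WLEN/4) quarter-words. Writing the lower half of w0 as $x = x_0 + x_1 2^{64}$ and the upper half as $y = y_0 + y_1 2^{64}$, with $x_0 = \texttt{w0.0}$, $x_1 = \texttt{w0.1}$, $y_0 = \texttt{w0.2}$ and $y_1 = \texttt{w0.3}$ (the symbols $x_i, y_i$ are introduced here only for this derivation):

```latex
x \cdot y = (x_0 + x_1 2^{64})(y_0 + y_1 2^{64})
          = \underbrace{x_0 y_0}_{\text{shift } 0}
          + \underbrace{x_0 y_1}_{\text{shift } 64}\, 2^{64}
          + \underbrace{x_1 y_0}_{\text{shift } 64}\, 2^{64}
          + \underbrace{x_1 y_1}_{\text{shift } 128}\, 2^{128}
```

Each term corresponds to one instruction, in order, and its power of two is the instruction's shift operand. Since $x, y < 2^{128}$, the product is below $2^{256}$ and fits in the WLEN-bit accumulator, so BN.MULQACC.WO can write the complete result to w1.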

Multiplying two WLEN numbers with BN.MULQACC

The shift-out functionality can be used to perform larger multiplications without extra additions. The table below shows how two registers w0 and w1 can be multiplied together to give a result in w2 and w3. The entries on the right show how the result is built up, where a0:a3 = w0.0:w0.3 and b0:b3 = w1.0:w1.3. The sum of a column represents WLEN/4 bits of a destination register, where c0:c3 = w2.0:w2.3 and d0:d3 = w3.0:w3.3. Each multiply spans two WLEN/4-bit columns, because the product of two WLEN/4-bit operands is a WLEN/2-bit result. At each instruction, the accumulator value is the sum of that instruction's product and all products listed above it, excluding the portions already written out by earlier .SO instructions.

The outlined technique can be extended to arbitrary bit widths but requires unrolled code with all operands in registers.

Instruction                         Product  Result columns
BN.MULQACC.Z w0.0, w1.0, 0          a0 * b0  c1:c0
BN.MULQACC w0.1, w1.0, 64           a1 * b0  c2:c1
BN.MULQACC.SO w2.l, w0.0, w1.1, 64  a0 * b1  c2:c1
BN.MULQACC w0.2, w1.0, 0            a2 * b0  c3:c2
BN.MULQACC w0.1, w1.1, 0            a1 * b1  c3:c2
BN.MULQACC w0.0, w1.2, 0            a0 * b2  c3:c2
BN.MULQACC w0.3, w1.0, 64           a3 * b0  d0:c3
BN.MULQACC w0.2, w1.1, 64           a2 * b1  d0:c3
BN.MULQACC w0.1, w1.2, 64           a1 * b2  d0:c3
BN.MULQACC.SO w2.u, w0.0, w1.3, 64  a0 * b3  d0:c3
BN.MULQACC w0.3, w1.1, 0            a3 * b1  d1:d0
BN.MULQACC w0.2, w1.2, 0            a2 * b2  d1:d0
BN.MULQACC w0.1, w1.3, 0            a1 * b3  d1:d0
BN.MULQACC w0.3, w1.2, 64           a3 * b2  d2:d1
BN.MULQACC.SO w3.l, w0.2, w1.3, 64  a2 * b3  d2:d1
BN.MULQACC.SO w3.u, w0.3, w1.3, 0   a3 * b3  d3:d2
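To check the schedule in the table, the sketch below replays it in C with a 256-bit accumulator held as four 64-bit limbs. It is a hypothetical behavioural model, not OTBN's hardware: `mulqacc`, `shift_out` and `mul256_model` are invented names, `shift_out` stands in for the .SO suffix (write the low 128 bits, then shift the accumulator right by 128), and `unsigned __int128` is a GCC/Clang extension on 64-bit targets.

```c
#include <assert.h>
#include <stdint.h>

/* 256-bit accumulator as four 64-bit limbs, limb[0] least significant. */
typedef struct { uint64_t limb[4]; } acc256_t;

/* acc += (x * y) << shift, where shift is 0, 64, 128 or 192.
   Bits carried out of bit 255 are dropped (this schedule produces none). */
static void mulqacc(acc256_t *acc, uint64_t x, uint64_t y, unsigned shift) {
    assert(shift % 64u == 0 && shift < 256u);
    unsigned __int128 p = (unsigned __int128)x * y;  /* 128-bit product */
    uint64_t add[4] = {0, 0, 0, 0};
    unsigned q = shift / 64u;
    add[q] = (uint64_t)p;
    if (q + 1u < 4u) add[q + 1u] = (uint64_t)(p >> 64);
    uint64_t carry = 0;
    for (unsigned i = 0; i < 4u; i++) {
        unsigned __int128 s = (unsigned __int128)acc->limb[i] + add[i] + carry;
        acc->limb[i] = (uint64_t)s;
        carry = (uint64_t)(s >> 64);
    }
}

/* .SO: copy the low 128 bits of acc to out[0..1], then shift acc right by 128. */
static void shift_out(acc256_t *acc, uint64_t out[2]) {
    out[0] = acc->limb[0];
    out[1] = acc->limb[1];
    acc->limb[0] = acc->limb[2];
    acc->limb[1] = acc->limb[3];
    acc->limb[2] = 0;
    acc->limb[3] = 0;
}

/* a[i] = w0.i and b[i] = w1.i; c (w2) and d (w3) receive the 512-bit product. */
void mul256_model(const uint64_t a[4], const uint64_t b[4], uint64_t c[4], uint64_t d[4]) {
    acc256_t acc = {{0, 0, 0, 0}};  /* .Z: start from a zero accumulator */
    mulqacc(&acc, a[0], b[0], 0);
    mulqacc(&acc, a[1], b[0], 64);
    mulqacc(&acc, a[0], b[1], 64);
    shift_out(&acc, &c[0]);         /* .SO w2.l */
    mulqacc(&acc, a[2], b[0], 0);
    mulqacc(&acc, a[1], b[1], 0);
    mulqacc(&acc, a[0], b[2], 0);
    mulqacc(&acc, a[3], b[0], 64);
    mulqacc(&acc, a[2], b[1], 64);
    mulqacc(&acc, a[1], b[2], 64);
    mulqacc(&acc, a[0], b[3], 64);
    shift_out(&acc, &c[2]);         /* .SO w2.u */
    mulqacc(&acc, a[3], b[1], 0);
    mulqacc(&acc, a[2], b[2], 0);
    mulqacc(&acc, a[1], b[3], 0);
    mulqacc(&acc, a[3], b[2], 64);
    mulqacc(&acc, a[2], b[3], 64);
    shift_out(&acc, &d[0]);         /* .SO w3.l */
    mulqacc(&acc, a[3], b[3], 0);
    shift_out(&acc, &d[2]);         /* .SO w3.u */
}
```

Squaring the all-ones 256-bit value is a useful check: the exact result is 2^512 - 2^257 + 1, so w2 should hold 1 and w3 should hold all ones except bit 0.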

Code snippets giving examples of 256x256 and 384x384 multiplies can be found in sw/otbn/code-snippets/mul256.s and sw/otbn/code-snippets/mul384.s.