ALERT_HANDLER DV document

Goals

  • DV
    • Verify all ALERT_HANDLER IP features by running dynamic simulations with a SV/UVM based testbench
    • Develop and run all tests based on the testplan below towards closing code and functional coverage on the IP and all of its sub-modules
  • FPV
    • Verify TileLink device protocol compliance with an SVA based testbench
    • Verify transmitter and receiver pairs for alert and escalator
    • Partially verify ping_timer

Current status

Design features

For detailed information on ALERT_HANDLER design features, please see the ALERT_HANDLER HWIP technical specification.

Testbench architecture

ALERT_HANDLER testbench has been constructed based on the CIP testbench architecture.

Block diagram

Top level testbench

Top level testbench is located at hw/ip/alert_handler/dv/tb/tb.sv. It instantiates the ALERT_HANDLER DUT module hw/ip/alert_handler/rtl/alert_handler.sv. In addition, it instantiates the following interfaces, connects them to the DUT and sets their handle into uvm_config_db:

The alert_handler testbench environment can be reused in chip level testing.

Common DV utility components

The following utilities provide generic helper tasks and functions to perform activities that are common across the project:

Global types & methods

All common types and methods defined at the package level can be found in alert_handler_env_pkg. Some of them in use are:

  parameter uint NUM_MAX_ESC_SEV = 8;

TL_agent

The ALERT_HANDLER testbench instantiates tl_agent (already handled in the CIP base env), which provides the ability to drive and independently monitor random traffic via the TL host interface into the ALERT_HANDLER device.

ALERT_HANDLER Agent

The ALERT_HANDLER agent (link WIP) is used to drive and monitor transmitter and receiver pairs for the alerts and escalators.

UVM RAL Model

The ALERT_HANDLER RAL model is created with the ralgen fusesoc generator script automatically when the simulation is at the build stage.

It can be created manually by invoking regtool:

Stimulus strategy

Test sequences

All test sequences reside in hw/ip/alert_handler/dv/env/seq_lib. The alert_handler_base_vseq virtual sequence is extended from cip_base_vseq and serves as a starting point. All test sequences are extended from alert_handler_base_vseq. It provides commonly used handles, variables, functions and tasks that the test sequences can simply use or call. Some of the most commonly used tasks / functions are as follows:

  • drive_alert: Drive alert_tx signal pairs through alert_esc_if interface
  • read_ecs_status: Read out the registers that reflect escalation status, including classa/b/c/d_accum_cnt, classa/b/c/d_esc_cnt, and classa/b/c/d_state

Functional coverage

To ensure high quality constrained random stimulus, it is necessary to develop a functional coverage model. The following covergroups have been developed to prove that the test intent has been adequately met:

  • accum_cnt_cg: Cover number of alerts triggered under the same class
  • esc_sig_length_cg: Cover signal length of each escalation pair

Self-checking strategy

Scoreboard

The alert_handler_scoreboard is primarily used for end to end checking. It creates the following analysis ports to retrieve the data monitored by corresponding interface agents:

  • tl_a_chan_fifo: tl address channel
  • tl_d_chan_fifo: tl data channel
  • alert_fifo: An array of alert_fifo that connects to corresponding alert_monitors
  • esc_fifo: An array of esc_fifo that connects to corresponding esc_monitors

The alert_handler scoreboard monitors all valid CSR registers, alert handshakes, and escalation handshakes. To ensure that alert, interrupt, and escalation signals are triggered at the expected time, the scoreboard implements a few counters:

  • intr_cnter_per_class[NUM_ALERT_HANDLER_CLASSES]: Count number of clock cycles that the interrupt bit stays high. If the stored number is larger than the timeout_cyc registers, the corresponding escalation is expected to be triggered
  • accum_cnter_per_class[NUM_ALERT_HANDLER_CLASSES]: Count number of alerts triggered under the same class. If the stored number is larger than the accum_threshold registers, the corresponding escalation is expected to be triggered
  • esc_cnter_per_signal[NUM_ESC_SIGNALS]: Count number of clock cycles that each escalation signal stays high. Compare the counter against phase_cyc registers
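
The counter-based prediction described above can be illustrated with a small behavioral model. This is only a sketch in Python, not the scoreboard's SystemVerilog code: the class name, the method names, the four-class default, and the choice to restart the interrupt counter when the interrupt goes low are all illustrative assumptions. Only the "larger than" comparisons against accum_threshold and timeout_cyc come from the description above.

```python
from dataclasses import dataclass, field
from typing import List

NUM_ALERT_HANDLER_CLASSES = 4  # assumption: classes a/b/c/d


def _zeros() -> List[int]:
    return [0] * NUM_ALERT_HANDLER_CLASSES


@dataclass
class EscalationPredictor:
    """Toy model of the per-class scoreboard counters (hypothetical names)."""

    accum_threshold: List[int]
    timeout_cyc: List[int]
    accum_cnter_per_class: List[int] = field(default_factory=_zeros)
    intr_cnter_per_class: List[int] = field(default_factory=_zeros)

    def on_alert(self, cls: int) -> bool:
        """Count one alert; return True if escalation is now expected."""
        self.accum_cnter_per_class[cls] += 1
        return self.accum_cnter_per_class[cls] > self.accum_threshold[cls]

    def on_clock(self, cls: int, intr_high: bool) -> bool:
        """Advance one clock cycle; return True if a timeout escalation is expected."""
        if intr_high:
            self.intr_cnter_per_class[cls] += 1
        else:
            # Assumption: the count restarts once the interrupt bit is cleared.
            self.intr_cnter_per_class[cls] = 0
        return self.intr_cnter_per_class[cls] > self.timeout_cyc[cls]
```

With accum_threshold set to 2 for a class, the third call to on_alert for that class is the first one to predict an escalation, which mirrors the "larger than" wording of the counter description.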

The alert_handler scoreboard is parameterized to support different numbers of classes, alert pairs, and escalation pairs.

Assertions

  • TLUL assertions: The tb/alert_handler_bind.sv binds the tlul_assert assertions to the IP to ensure TileLink interface protocol compliance.
  • Unknown checks on DUT outputs: The RTL has assertions to ensure all outputs are initialized to known values after coming out of reset.

Building and running tests

We are using our in-house developed regression tool for building and running our tests and regressions. Please take a look at the link for detailed information on the usage, capabilities, features and known issues. Here’s how to run a smoke test:

$ $REPO_TOP/util/dvsim/dvsim.py $REPO_TOP/hw/$CHIP/ip/alert_handler/dv/alert_handler_sim_cfg.hjson -i alert_handler_smoke

In this run command, $CHIP can be top_earlgrey, etc.

Testplan

Testpoints

Milestone Name Tests Description
V1 smoke alert_handler_smoke
  • Alert_handler smoke test with one class configured that escalates through all phases after one alert has been triggered
  • Check interrupt pins, alert cause CSR values, escalation pings, and crashdump_o output values
  • Support both synchronous and asynchronous settings
V1 csr_hw_reset alert_handler_csr_hw_reset

Verify the reset values as indicated in the RAL specification.

  • Write all CSRs with a random value.
  • Apply reset to the DUT as well as the RAL model.
  • Read each CSR and compare it against the reset value. It is mandatory to replicate this test for each reset that affects all or a subset of the CSRs.
  • It is mandatory to run this test for all available interfaces the CSRs are accessible from.
  • Shuffle the list of CSRs first to remove the effect of ordering.
V1 csr_rw alert_handler_csr_rw

Verify accessibility of CSRs as indicated in the RAL specification.

  • Loop through each CSR to write it with a random value.
  • Read the CSR back and check for correctness while adhering to its access policies.
  • It is mandatory to run this test for all available interfaces the CSRs are accessible from.
  • Shuffle the list of CSRs first to remove the effect of ordering.
V1 csr_bit_bash alert_handler_csr_bit_bash

Verify no aliasing within individual bits of a CSR.

  • Walk a 1 through each CSR by flipping 1 bit at a time.
  • Read the CSR back and check for correctness while adhering to its access policies.
  • This verifies that writing a specific bit within the CSR did not affect any of the other bits.
  • It is mandatory to run this test for all available interfaces the CSRs are accessible from.
  • Shuffle the list of CSRs first to remove the effect of ordering.
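
The bit-walking idea in the steps above can be sketched against a toy register model. This is an illustration in Python, not the CIP bit-bash sequence: Csr, LeakyCsr, and walk_ones are hypothetical names, and the access policy is reduced to a simple mask of read/write bits. LeakyCsr contains a deliberate bug so that the check has something to find.

```python
class Csr:
    """Minimal CSR model: bits in rw_mask are read/write, all others read-only."""

    def __init__(self, width: int, rw_mask: int, reset: int = 0):
        self.width = width
        self.rw_mask = rw_mask
        self.reset = reset
        self.value = reset

    def write(self, data: int) -> None:
        # Only RW bits take the written value; RO bits keep their state.
        self.value = (self.value & ~self.rw_mask) | (data & self.rw_mask)

    def read(self) -> int:
        return self.value


class LeakyCsr(Csr):
    """Deliberately broken CSR: writing bit 3 also sets bit 5."""

    def write(self, data: int) -> None:
        super().write(data)
        if data & (1 << 3):
            self.value |= 1 << 5


def walk_ones(csr: Csr) -> list:
    """Return the bit positions whose one-hot write read back incorrectly."""
    failures = []
    for bit in range(csr.width):
        one_hot = 1 << bit
        csr.write(one_hot)
        # RW bits should mirror the write; RO bits should still hold reset.
        expected = (one_hot & csr.rw_mask) | (csr.reset & ~csr.rw_mask)
        if csr.read() != expected:
            failures.append(bit)
    return failures
```

Calling walk_ones on a LeakyCsr reports the bit whose write disturbed another bit, while a well-behaved Csr yields an empty list.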
V1 csr_aliasing alert_handler_csr_aliasing

Verify no aliasing within the CSR address space.

  • Loop through each CSR to write it with a random value
  • Shuffle and read ALL CSRs back.
  • All CSRs except for the one that was written in this iteration should read back the previous value.
  • The CSR that was written in this iteration is checked for correctness while adhering to its access policies.
  • It is mandatory to run this test for all available interfaces the CSRs are accessible from.
  • Shuffle the list of CSRs first to remove the effect of ordering.
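
A minimal sketch of the address-space aliasing check, assuming a toy register file in Python rather than the RAL-based sequence. RegFile, decode_mask, and find_aliases are hypothetical names; decode_mask exists only to emulate an address decoder that ignores some address bits.

```python
import random


class RegFile:
    """Toy register file; decode_mask models which address bits are decoded."""

    def __init__(self, addrs, decode_mask: int = 0xFFFFFFFF):
        self.addrs = list(addrs)
        self.decode_mask = decode_mask
        self.mem = {a & decode_mask: 0 for a in self.addrs}

    def write(self, addr: int, data: int) -> None:
        self.mem[addr & self.decode_mask] = data

    def read(self, addr: int) -> int:
        return self.mem[addr & self.decode_mask]


def find_aliases(regs: RegFile, seed: int = 0) -> set:
    """Write each CSR once and return (written, disturbed) address pairs."""
    rng = random.Random(seed)
    expected = {a: regs.read(a) for a in regs.addrs}
    aliased = set()
    order = regs.addrs[:]
    rng.shuffle(order)  # remove the effect of ordering
    for target in order:
        # XOR with a non-zero value guarantees the data differs from the old value.
        data = expected[target] ^ rng.randrange(1, 1 << 32)
        regs.write(target, data)
        expected[target] = data
        readback = regs.addrs[:]
        rng.shuffle(readback)
        for addr in readback:
            actual = regs.read(addr)
            if actual != expected[addr]:
                aliased.add((target, addr))
                expected[addr] = actual  # resync so one alias is not reported twice per write
    return aliased
```

With decode_mask=0x7 the decoder drops address bit 3, so addresses 0x0 and 0x8 (and 0x4 and 0xC) share storage and are reported as aliased pairs.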
V1 csr_mem_rw_with_rand_reset alert_handler_csr_mem_rw_with_rand_reset

Verify random reset during CSR/memory access.

  • Run csr_rw sequence to randomly access CSRs
  • If memory exists, run mem_partial_access in parallel with csr_rw
  • Randomly issue reset and then use hw_reset sequence to check all CSRs are reset to default value
  • It is mandatory to run this test for all available interfaces the CSRs are accessible from.
V2 esc_accum alert_handler_esc_alert_accum

Based on the smoke test, this test will focus on testing the escalation accumulation feature. So all the escalations in the test will be triggered by alert accumulation.

V2 esc_timeout alert_handler_esc_intr_timeout

Based on the smoke test, this test will focus on testing the escalation timeout feature. So all the escalations in the test will be triggered by interrupt timeout.

V2 entropy alert_handler_entropy

Based on the smoke test, this test enables ping testing and checks that the ping feature correctly pings all devices within a certain period of time

V2 sig_int_fail alert_handler_sig_int_fail

This test will randomly inject differential pair failures on the alert tx/rx pairs and the escalator tx/rx pairs. It then checks that the integrity failure alert is triggered and escalated

V2 ping_corner_cases alert_handler_ping_corner_cases

Based on the entropy test, this test will randomly inject ping timeout errors and ping signal integrity errors on alert tx/rx or escalator tx/rx pairs. Once a ping request is detected, the sequence will randomly execute one of the three tasks:

  • Interrupt the ping by a reset
  • Inject alerts without any delay. This task attempts to hit the corner case where esc ping is interrupted by real esc signal
  • Let the ping response finish without any interruption. This task attempts to hit the ping timeout and ping signal integrity fail corner cases

Then the sequence will read and check all the alert and esc status registers, and then clear the interrupts. Because the alert_handler module's default ping interval is relatively long for simulation, this test shortens the interval in order to hit all the corner cases in a reasonable amount of run time. This test also disables ping related functional coverage to ensure the original LFSR design is able to reach all the alerts and escalators
V2 clk_skew alert_handler_smoke

This test will randomly inject clock skew within the differential pairs. It then checks that no alert is raised

V2 random_alerts alert_handler_random_alerts

Input random alerts and randomly write phase cycles

V2 random_classes alert_handler_random_classes

Based on random_alerts test, this test will also randomly enable interrupt classes

V2 stress_all alert_handler_stress_all

Combine above sequences in one test to run sequentially with the following exclusions:

  • CSR sequences: scoreboard disabled
  • Ping_corner_cases sequence: included reset in the sequence
V2 intr_test alert_handler_intr_test

Verify common intr_test CSRs that allow SW to mock-inject interrupts.

  • Enable a random set of interrupts by writing random value(s) to intr_enable CSR(s).
  • Randomly "turn on" interrupts by writing random value(s) to intr_test CSR(s).
  • Read all intr_state CSR(s) back to verify that it reflects the same value as what was written to the corresponding intr_test CSR.
  • Check the cfg.intr_vif pins to verify that only the interrupts that were enabled and turned on are set.
  • Clear a random set of interrupts by writing a random value to intr_state CSR(s).
  • Repeat the above steps a bunch of times.
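
The relationship between intr_enable, intr_test, intr_state and the interrupt pins in the steps above can be summarized with a small model. This is a sketch in Python under stated assumptions: IntrModel and its methods are hypothetical names, and intr_state is assumed to be write-1-to-clear, which the steps above imply but do not state.

```python
class IntrModel:
    """Toy model of the common interrupt CSRs (illustrative only)."""

    def __init__(self, num_intr: int):
        self.mask = (1 << num_intr) - 1
        self.intr_enable = 0
        self.intr_state = 0

    def write_intr_enable(self, value: int) -> None:
        self.intr_enable = value & self.mask

    def write_intr_test(self, value: int) -> None:
        # Each 1 written to intr_test sets the matching intr_state bit.
        self.intr_state |= value & self.mask

    def write_intr_state(self, value: int) -> None:
        # Assumption: intr_state is write-1-to-clear.
        self.intr_state &= ~value & self.mask

    def pins(self) -> int:
        # Only interrupts that are both pending and enabled reach the pins.
        return self.intr_state & self.intr_enable
```

For example, enabling bits 0 and 2 and then writing 0b0011 to intr_test leaves intr_state at 0b0011 while only pin 0 is set, because bit 1 is pending but not enabled.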
V2 enable_reg alert_handler_csr_rw
alert_handler_csr_bit_bash
alert_handler_csr_aliasing

The CSR test sequences will read and write accessible CSRs including the enable registers and their locked registers. The RAL model supports predicting the correct value of the locked registers based on their enable registers.

V2 stress_all_with_rand_reset alert_handler_stress_all_with_rand_reset

This test runs 3 parallel threads - stress_all, tl_errors and random reset. After reset is asserted, the test will read and check all valid CSR registers.

V2 tl_d_oob_addr_access alert_handler_tl_errors

Access out of bounds address and verify correctness of response / behavior

V2 tl_d_illegal_access alert_handler_tl_errors

Drive unsupported requests via TL interface and verify correctness of response / behavior. The error cases below are tested based on the TLUL spec (hw/ip/tlul/doc/_index.md#explicit-error-cases):

  • TL-UL protocol error cases
    • invalid opcode
    • some mask bits not set when opcode is PutFullData
    • mask does not match the transfer size, e.g. a_address = 0x00, a_size = 0, a_mask = 'b0010
    • mask and address misaligned, e.g. a_address = 0x01, a_mask = 'b0001
    • address and size aren't aligned, e.g. a_address = 0x01, a_size != 0
    • size is greater than 2
  • OpenTitan defined error cases
    • access unmapped address, expect d_error = 1 when devmode_i == 1
    • write a CSR with unaligned address, e.g. a_address[1:0] != 0
    • write a CSR less than its width, e.g. when CSR is 2 bytes wide, only write 1 byte
    • write a memory with a_mask != '1 when it doesn't support partial accesses
    • read a WO (write-only) memory
    • write a RO (read-only) memory
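
The TL-UL protocol error cases listed above (the first group only, not the OpenTitan defined cases) can be expressed as a small checker. This is a simplified sketch in Python, not the tlul_assert implementation: the function name and the error labels are hypothetical, a 32-bit data bus with a 4-bit a_mask is assumed, and the two mask rules are folded into a single "mask outside active lanes" check.

```python
# TileLink A-channel opcodes (standard TileLink encoding).
PUT_FULL_DATA = 0
PUT_PARTIAL_DATA = 1
GET = 4
VALID_OPCODES = {PUT_FULL_DATA, PUT_PARTIAL_DATA, GET}

DATA_BYTES = 4  # assumption: 32-bit TL-UL data bus, so a_mask has 4 bits


def tlul_a_chan_errors(opcode: int, address: int, size: int, mask: int) -> list:
    """Return the (hypothetical) names of the rules violated by one request."""
    errors = []
    if opcode not in VALID_OPCODES:
        errors.append("invalid_opcode")
    if size > 2:
        # 2**size bytes would exceed the 4-byte bus; the lane checks are moot.
        errors.append("size_too_large")
        return errors
    num_bytes = 1 << size
    if address % num_bytes != 0:
        errors.append("address_size_misaligned")
    # Byte lanes the transfer may touch, given the address offset and size.
    offset = address % DATA_BYTES
    active = (((1 << num_bytes) - 1) << offset) & ((1 << DATA_BYTES) - 1)
    if mask & ~active:
        errors.append("mask_outside_active_lanes")
    if opcode == PUT_FULL_DATA and (mask & active) != active:
        errors.append("put_full_mask_incomplete")
    return errors
```

Both mask examples from the list, a_address = 0x00 with a_size = 0 and a_mask = 'b0010, and a_address = 0x01 with a_mask = 'b0001, are flagged by the same lane check, since in each case the mask selects a byte lane that the address and size do not cover.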
V2 tl_d_outstanding_access alert_handler_csr_hw_reset
alert_handler_csr_rw
alert_handler_csr_aliasing
alert_handler_same_csr_outstanding

Drive back-to-back requests without waiting for response to ensure there is one transaction outstanding within the TL device. Also, verify one outstanding when back-to-back accesses are made to the same address.

V2 tl_d_partial_access alert_handler_csr_hw_reset
alert_handler_csr_rw
alert_handler_csr_aliasing
alert_handler_same_csr_outstanding

Access CSR with one or more bytes of data. For reads, expect the full word value of the CSR to be returned. For writes, the enabled bytes should cover all valid fields of the CSR.

V3 tl_intg_err alert_handler_tl_intg_err

Verify that the data integrity check violation generates an alert.

Randomly inject errors on the control, data, or the ECC bits during CSR accesses. Verify that this triggers the correct fatal alert.

Covergroups

Name Description
tl_errors_cg

Cover the following error cases on TL-UL bus:

  • TL-UL protocol error cases.
  • OpenTitan defined error cases, refer to testpoint tl_d_illegal_access.
tl_intg_err_cg

Cover all kinds of integrity errors (command, data or both) and cover number of error bits on each integrity check.

tl_intg_err_mem_subword_cg

Cover the kinds of integrity errors with byte enabled write on memory.

Some memories store the integrity values. When there is a subword write, the design re-calculates the integrity with the full word data and updates the integrity in the memory. This coverage ensures that a memory byte write has been issued and the related design logic has been verified.