ALERT_HANDLER DV Plan

Goals

  • DV
    • Verify all ALERT_HANDLER IP features by running dynamic simulations with a SV/UVM based testbench
    • Develop and run all tests based on the testplan below towards closing code and functional coverage on the IP and all of its sub-modules
  • FPV
    • Verify TileLink device protocol compliance with an SVA based testbench
    • Verify transmitter and receiver pairs for alert and escalator
    • Partially verify ping_timer

Current status

Design features

For detailed information on ALERT_HANDLER design features, please see the ALERT_HANDLER HWIP technical specification.

Testbench architecture

The ALERT_HANDLER testbench has been constructed based on the CIP testbench architecture.

Block diagram

[Figure: testbench block diagram]

Top level testbench

The top level testbench is located at hw/ip/alert_handler/dv/tb/tb.sv. It instantiates the ALERT_HANDLER DUT module hw/ip/alert_handler/rtl/alert_handler.sv. In addition, it instantiates the following interfaces, connects them to the DUT and sets their handles into uvm_config_db:

In chip level testing, the alert_handler testbench environment can be reused with a chip-level parameter package located at hw/$CHIP/ip/alert_handler/dv/alert_handler_env_pkg__params.sv

Common DV utility components

The following utilities provide generic helper tasks and functions to perform activities that are common across the project:

Global types & methods

All common types and methods defined at the package level can be found in alert_handler_env_pkg. Some of them in use are:

  parameter uint NUM_MAX_ESC_SEV = 8;

TL_agent

The ALERT_HANDLER testbench instantiates tl_agent (already handled in the CIP base env), which provides the ability to drive and independently monitor random traffic via the TL host interface into the ALERT_HANDLER device.

ALERT_HANDLER Agent

The ALERT_HANDLER agent (link WIP) is used to drive and monitor transmitter and receiver pairs for the alerts and escalators.

UVM RAL Model

The ALERT_HANDLER RAL model is created with the ralgen fusesoc generator script automatically when the simulation is at the build stage.

It can be created manually by invoking regtool:

Stimulus strategy

Test sequences

All test sequences reside in hw/ip/alert_handler/dv/env/seq_lib. The alert_handler_base_vseq virtual sequence is extended from cip_base_vseq and serves as a starting point. All test sequences are extended from alert_handler_base_vseq. It provides commonly used handles, variables, functions and tasks that the test sequences can simply use or call. Some of the most commonly used tasks / functions are as follows:

  • drive_alert: Drive alert_tx signal pairs through alert_esc_if interface
  • read_ecs_status: Read out the registers that reflect escalation status, including classa/b/c/d_accum_cnt, classa/b/c/d_esc_cnt, and classa/b/c/d_state

Functional coverage

To ensure high quality constrained random stimulus, it is necessary to develop a functional coverage model. The following covergroups have been developed to prove that the test intent has been adequately met:

  • accum_cnt_cg: Cover number of alerts triggered under the same class
  • esc_sig_length_cg: Cover signal length of each escalation pair

Self-checking strategy

Scoreboard

The alert_handler_scoreboard is primarily used for end to end checking. It creates the following analysis ports to retrieve the data monitored by corresponding interface agents:

  • tl_a_chan_fifo: TL address channel
  • tl_d_chan_fifo: TL data channel
  • alert_fifo: An array of alert_fifo that connects to corresponding alert_monitors
  • esc_fifo: An array of esc_fifo that connects to corresponding esc_monitors

The alert_handler scoreboard monitors all valid CSR registers, alert handshakes, and escalation handshakes. To ensure certain alert, interrupt, or escalation signals are triggered at the expected time, the alert_handler scoreboard implements a few counters:

  • intr_cnter_per_class[NUM_ALERT_HANDLER_CLASSES]: Count number of clock cycles that the interrupt bit stays high. If the stored number is larger than the timeout_cyc registers, the corresponding escalation is expected to be triggered
  • accum_cnter_per_class[NUM_ALERT_HANDLER_CLASSES]: Count number of alerts triggered under the same class. If the stored number is larger than the accum_threshold registers, the corresponding escalation is expected to be triggered
  • esc_cnter_per_signal[NUM_ESC_SIGNALS]: Count number of clock cycles that each escalation signal stays high. Compare the counter against phase_cyc registers

The alert_handler scoreboard is parameterized to support different numbers of classes, alert pairs, and escalation pairs.
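The per-class counters described above can be illustrated with a small behavioral model. This is a minimal Python sketch rather than the scoreboard's SystemVerilog: the names ClassCounters and make_counters are hypothetical, the counter is assumed to restart when the interrupt bit drops, and "larger than" is taken as a strict comparison.

```python
from dataclasses import dataclass


@dataclass
class ClassCounters:
    """Expectation counters for one alert class (illustrative model only)."""

    timeout_cyc: int       # assumed mirror of the class's timeout_cyc register
    accum_threshold: int   # assumed mirror of the class's accum_threshold register
    intr_cnt: int = 0      # cycles the interrupt bit has stayed high
    accum_cnt: int = 0     # alerts seen for this class

    def tick(self, intr_high: bool) -> None:
        # Assumption: the cycle count restarts once the interrupt bit drops.
        self.intr_cnt = self.intr_cnt + 1 if intr_high else 0

    def alert(self) -> None:
        # One more alert accumulated under this class.
        self.accum_cnt += 1

    def escalation_expected(self) -> bool:
        # "Larger than" in the text is modelled as a strict comparison.
        return (self.intr_cnt > self.timeout_cyc
                or self.accum_cnt > self.accum_threshold)


def make_counters(num_classes: int, timeout_cyc: int, accum_threshold: int):
    # One independent counter set per class, mirroring the per-class arrays.
    return [ClassCounters(timeout_cyc, accum_threshold)
            for _ in range(num_classes)]
```

Calling tick once per clock cycle and alert once per observed alert handshake gives the point at which the model would start expecting an escalation for that class.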

Assertions

  • TLUL assertions: The tb/alert_handler_bind.sv binds the tlul_assert assertions to the IP to ensure TileLink interface protocol compliance.
  • Unknown checks on DUT outputs: The RTL has assertions to ensure all outputs are initialized to known values after coming out of reset.

Building and running tests

We use our in-house regression tool, dvsim, for building and running our tests and regressions. Please refer to its documentation for detailed information on the usage, capabilities, features and known issues. Here's how to run a smoke test:

$ $REPO_TOP/util/dvsim/dvsim.py $REPO_TOP/hw/$CHIP/ip/alert_handler/dv/alert_handler_sim_cfg.hjson -i alert_handler_smoke

In this run command, $CHIP can be top_earlgrey, etc.

Testplan

Milestone Name Description Tests
V1 smoke
  • Alert_handler smoke test with one class configured that escalates through all phases after one alert has been triggered
  • Check interrupt pins, alert cause CSR values, escalation pings, and crashdump_o output values
  • Support both synchronous and asynchronous settings
alert_handler_smoke
V1 csr_hw_reset

Verify the reset values as indicated in the RAL specification.

  • Write all CSRs with a random value.
  • Apply reset to the DUT as well as the RAL model.
  • Read each CSR and compare it against the reset value. It is mandatory to replicate this test for each reset that affects all or a subset of the CSRs.
  • It is mandatory to run this test for all available interfaces the CSRs are accessible from.
  • Shuffle the list of CSRs first to remove the effect of ordering.
alert_handler_csr_hw_reset
V1 csr_rw

Verify accessibility of CSRs as indicated in the RAL specification.

  • Loop through each CSR to write it with a random value.
  • Read the CSR back and check for correctness while adhering to its access policies.
  • It is mandatory to run this test for all available interfaces the CSRs are accessible from.
  • Shuffle the list of CSRs first to remove the effect of ordering.
alert_handler_csr_rw
V1 csr_bit_bash

Verify no aliasing within individual bits of a CSR.

  • Walk a 1 through each CSR by flipping 1 bit at a time.
  • Read the CSR back and check for correctness while adhering to its access policies.
  • This verifies that writing a specific bit within the CSR did not affect any of the other bits.
  • It is mandatory to run this test for all available interfaces the CSRs are accessible from.
  • Shuffle the list of CSRs first to remove the effect of ordering.
alert_handler_csr_bit_bash
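As an illustration of the csr_bit_bash steps, here is a minimal Python sketch run against a toy register model. ModelReg, its alias argument (used only to inject a bug) and bit_bash are hypothetical names; the only access policy modelled is a read/write mask, which is a simplification of what the RAL model handles.

```python
class ModelReg:
    """Toy read/write register; `alias` injects a bug for demonstration."""

    def __init__(self, width, rw_mask, alias=None):
        self.width = width
        self.rw_mask = rw_mask
        self.alias = alias or {}   # {src_bit: dst_bit}: dst wrongly follows src
        self.value = 0

    def write(self, data):
        # Only writable bits take the new data.
        new = (self.value & ~self.rw_mask) | (data & self.rw_mask)
        for src, dst in self.alias.items():
            # Injected bug: bit `dst` is forced to mirror bit `src`.
            if (new >> src) & 1:
                new |= 1 << dst
            else:
                new &= ~(1 << dst)
        self.value = new & ((1 << self.width) - 1)

    def read(self):
        return self.value


def bit_bash(reg):
    """Flip one bit per write and report bit indices whose readback is wrong."""
    failing_bits = []
    expected = reg.read()
    for bit in range(reg.width):
        data = expected ^ (1 << bit)
        reg.write(data)
        # Prediction: only writable bits take the written value.
        expected = (expected & ~reg.rw_mask) | (data & reg.rw_mask)
        got = reg.read()
        if got != expected:
            failing_bits.append(bit)
            expected = got   # resync so later bits are judged independently
    return failing_bits
```

A register with no aliasing returns an empty list, while a register built with alias={0: 3} is flagged when bit 0 and bit 3 are written.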
V1 csr_aliasing

Verify no aliasing within the CSR address space.

  • Loop through each CSR to write it with a random value.
  • Shuffle and read ALL CSRs back.
  • All CSRs except for the one that was written in this iteration should read back the previous value.
  • The CSR that was written in this iteration is checked for correctness while adhering to its access policies.
  • It is mandatory to run this test for all available interfaces the CSRs are accessible from.
  • Shuffle the list of CSRs first to remove the effect of ordering.
alert_handler_csr_aliasing
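The csr_aliasing procedure can be sketched in Python against a toy CSR bank. CsrBank, its decode_mask argument (used to inject an address decode bug) and check_aliasing are hypothetical names, and every CSR is assumed to be a plain 32-bit read/write register, so access policies are not modelled.

```python
import random


class CsrBank:
    """Toy CSR bank; a decode_mask that drops address bits creates aliases."""

    def __init__(self, addrs, decode_mask=0xFFFFFFFF):
        self.decode_mask = decode_mask
        self.regs = {a & decode_mask: 0 for a in addrs}

    def write(self, addr, data):
        self.regs[addr & self.decode_mask] = data

    def read(self, addr):
        return self.regs[addr & self.decode_mask]


def check_aliasing(bank, addrs, rng):
    """Write each CSR in shuffled order, then read ALL CSRs back.

    Returns (written_addr, mismatching_addr) pairs.
    """
    mirror = {a: bank.read(a) for a in addrs}
    failures = []
    order = list(addrs)
    rng.shuffle(order)                      # remove the effect of ordering
    for target in order:
        data = rng.getrandbits(32) | 1      # nonzero, so it differs from reset
        bank.write(target, data)
        mirror[target] = data
        readback = list(addrs)
        rng.shuffle(readback)
        for addr in readback:
            got = bank.read(addr)
            if got != mirror[addr]:
                failures.append((target, addr))
                mirror[addr] = got          # resync before the next iteration
    return failures
```

With a full decode mask the check reports nothing. With address bit 2 ignored in the decode, each write shows up at the aliased address as well.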
V1 csr_mem_rw_with_rand_reset

Verify random reset during CSR/memory access.

  • Run csr_rw sequence to randomly access CSRs
  • If memory exists, run mem_partial_access in parallel with csr_rw
  • Randomly issue reset and then use hw_reset sequence to check all CSRs are reset to default value
  • It is mandatory to run this test for all available interfaces the CSRs are accessible from.
alert_handler_csr_mem_rw_with_rand_reset
V2 esc_accum

Based on the smoke test, this test will focus on testing the escalation accumulation feature. So all the escalations in the test will be triggered by alert accumulation.

alert_handler_esc_alert_accum
V2 esc_timeout

Based on the smoke test, this test will focus on testing the escalation timeout feature. So all the escalations in the test will be triggered by interrupt timeout.

alert_handler_esc_intr_timeout
V2 entropy

Based on the smoke test, this test enables ping testing, and checks whether the ping feature correctly pings all devices within a certain period of time

alert_handler_entropy
V2 sig_int_fail

This test will randomly inject differential pair failures on alert tx/rx pairs and the escalator tx/rx pairs. It then checks that an integrity failure alert is triggered and escalated

alert_handler_sig_int_fail
V2 ping_corner_cases

Based on the entropy test, this test will randomly inject ping timeout errors and ping signal integrity errors on alert tx/rx or escalator tx/rx pairs. Once a ping request is detected, the sequence will randomly execute one of the three tasks:

  • Interrupt the ping by a reset
  • Inject alerts without any delay. This task attempts to hit the corner case where esc ping is interrupted by real esc signal
  • Let the ping response finish without any interruption. This task attempts to hit the ping timeout and ping signal integrity fail corner case

Then the sequence will read and check all the alert and esc status registers, and clear the interrupts. Because the alert_handler module's default ping interval is relatively long for simulation, this test shortens the interval in order to hit all the corner cases in a reasonable amount of run time. This test also disables ping related functional coverage to ensure the original LFSR design is able to reach all the alerts and escalators
alert_handler_ping_corner_cases
V2 clk_skew

This test will randomly inject clock skew within the differential pairs. It then checks that no alert is raised

alert_handler_smoke
V2 random_alerts

Input random alerts and randomly write phase cycles

alert_handler_random_alerts
V2 random_classes

Based on the random_alerts test, this test will also randomly enable interrupt classes

alert_handler_random_classes
V2 stress_all

Combine above sequences in one test to run sequentially with the following exclusions:

  • CSR sequences: scoreboard disabled
  • Ping_corner_cases sequence: includes reset in the sequence
alert_handler_stress_all
V2 intr_test

Verify common intr_test CSRs that allow SW to mock-inject interrupts.

  • Enable a random set of interrupts by writing random value(s) to intr_enable CSR(s).
  • Randomly "turn on" interrupts by writing random value(s) to intr_test CSR(s).
  • Read all intr_state CSR(s) back to verify that it reflects the same value as what was written to the corresponding intr_test CSR.
  • Check the cfg.intr_vif pins to verify that only the interrupts that were enabled and turned on are set.
  • Clear a random set of interrupts by writing a random value to intr_state CSR(s).
  • Repeat the above steps multiple times.
alert_handler_intr_test
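The intr_test flow can be modelled in a few lines of Python. This sketch assumes the usual comportable-IP semantics, which the text above implies but does not spell out: writing 1 to an intr_test bit sets the matching intr_state bit, intr_state is write-1-to-clear, and each interrupt pin is intr_state AND intr_enable. IntrModel and its method names are hypothetical.

```python
class IntrModel:
    """Illustrative model of intr_enable / intr_test / intr_state behaviour."""

    def __init__(self, num_intr):
        self.mask = (1 << num_intr) - 1
        self.state = 0
        self.enable = 0

    def write_enable(self, value):
        self.enable = value & self.mask

    def write_test(self, value):
        # Assumption: intr_test sets the corresponding intr_state bits.
        self.state |= value & self.mask

    def write_state(self, value):
        # Assumption: intr_state is write-1-to-clear.
        self.state &= ~value & self.mask

    def pins(self):
        # Only interrupts that are both pending and enabled reach the pins.
        return self.state & self.enable
```

For example, with four interrupts, enabling 0b0110 and testing 0b0011 leaves intr_state at 0b0011 while only pin 1 is asserted.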
V2 enable_reg

The CSR test sequences will read and write accessible CSRs including the enable registers and their locked registers. The RAL model supports predicting the correct value of the locked registers based on their enable registers.

alert_handler_csr_rw
alert_handler_csr_bit_bash
alert_handler_csr_aliasing
V2 stress_all_with_rand_reset

This test runs 3 parallel threads - stress_all, tl_errors and random reset. After reset is asserted, the test will read and check all valid CSR registers.

alert_handler_stress_all_with_rand_reset
V2 tl_d_oob_addr_access

Access an out-of-bounds address and verify correctness of response / behavior

alert_handler_tl_errors
V2 tl_d_illegal_access

Drive unsupported requests via TL interface and verify correctness of response / behavior. The following error cases are tested:

  • TL-UL protocol error cases
    • Unsupported opcode, e.g. a_opcode isn't Get, PutPartialData or PutFullData
    • Mask isn't all active if opcode = PutFullData
    • Mask isn't in enabled lanes, e.g. a_address = 0x00, a_size = 0, a_mask = 'b0010
    • Mask doesn't align with address, e.g. a_address = 0x01, a_mask = 'b0001
    • Address and size aren't aligned, e.g. a_address = 0x01, a_size != 0
    • Size is over 2.
  • OpenTitan defined error cases
    • Access unmapped address, return d_error = 1 when devmode_i == 1
    • Write CSR with unaligned address, e.g. a_address[1:0] != 0
    • Write CSR less than its width, e.g. when CSR is 2 bytes wide, only write 1 byte
    • Write a memory without enabling all lanes (a_mask = '1) if memory doesn't support byte enabled write
    • Read a WO (write-only) memory
alert_handler_tl_errors
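The TL-UL protocol error cases listed under tl_d_illegal_access can be expressed as a small checker. This is a minimal Python sketch assuming a 4-byte data bus (consistent with the 4-bit masks in the examples) and the standard TileLink A-channel opcode encodings. The function name and error labels are hypothetical, and the two mask cases from the list are folded into a single "mask outside active lanes" check.

```python
# TileLink A-channel opcodes supported by TL-UL.
PUT_FULL, PUT_PARTIAL, GET = 0, 1, 4
BUS_BYTES = 4  # assumption: 32-bit data bus, so a_mask has 4 bits


def tl_protocol_errors(opcode, address, size, mask):
    """Return labels for the TL-UL protocol rules a request violates."""
    errors = []
    if opcode not in (GET, PUT_FULL, PUT_PARTIAL):
        errors.append("unsupported_opcode")
    if size > 2:
        errors.append("size_over_2")
        return errors            # lane checks are meaningless past bus width
    num_bytes = 1 << size
    if address % num_bytes != 0:
        errors.append("addr_size_misaligned")
        return errors            # active lanes are undefined when misaligned
    # Byte lanes the transfer is allowed to touch within the bus word.
    active = ((1 << num_bytes) - 1) << (address % BUS_BYTES)
    if mask & ~active:
        errors.append("mask_outside_active_lanes")
    elif opcode == PUT_FULL and mask != active:
        errors.append("put_full_mask_not_all_active")
    return errors
```

The examples from the list map onto it directly: a_address = 0x00, a_size = 0, a_mask = 'b0010 and a_address = 0x01, a_mask = 'b0001 both report a mask outside the active lanes.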
V2 tl_d_outstanding_access

Drive back-to-back requests without waiting for response to ensure there is one transaction outstanding within the TL device. Also, verify one outstanding when back-to-back accesses are made to the same address.

alert_handler_csr_hw_reset
alert_handler_csr_rw
alert_handler_csr_aliasing
alert_handler_same_csr_outstanding
V2 tl_d_partial_access

Access CSR with one or more bytes of data. For read, expect to return all word value of the CSR. For write, enabling bytes should cover all CSR valid fields.

alert_handler_csr_hw_reset
alert_handler_csr_rw
alert_handler_csr_aliasing
alert_handler_same_csr_outstanding