CPU Stress Testing using Random Disturbances
Introduction
Modern processors have substantially improved their performance by exploiting techniques such as out-of-order execution, on-chip caching, speculative execution, prefetching, and thread switching. These techniques have also significantly increased the complexity of the processors and of their verification. As a result, verifying a processor requires more engineering effort and time, and a more efficient and practical approach is needed to verify such complex designs.
Testing a CPU under stress is crucial for ensuring its performance and stability under heavy workloads. There are several ways to stress-test a CPU. Here, we explore stress testing using Random Disturbances. A Random Disturbance creates an interesting stress case in the system by manipulating a critical signal in the design.
For instance, instead of actually filling the instruction or load buffer, we can mimic the buffer-full condition by forcing the buffer-full signal to 1. After analysing the micro-architecture details, we can identify multiple Random Disturbance scenarios and implement them in the testbench.
Challenge in Processor Verification
The more complex processor bugs are triggered only by a combination of rare events, for example, an ECC error reported on the cache in exactly the same cycle as an interrupt. In a typical random testbench, all of these conditions rarely occur at the same time, so such a bug can remain hidden. Reaching these events from a standard testbench by adjusting test parameters or fine-tuning the tests is difficult and time-consuming. Random Disturbances are more efficient in terms of both controllability and testing quality: we can target specific scenarios and be confident that those scenarios are actually hit.
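A short calculation shows why such coincidences are so rare. Assume, purely for illustration, that the ECC error and the interrupt are independent and occur in any given cycle with probabilities p_ECC and p_IRQ; the values 10^-4 and 10^-3 below are hypothetical, not measurements. The expected wait for the first coincidence follows the geometric distribution (1/p cycles), and a disturbance that forces the ECC error whenever the interrupt fires reduces the wait to that of the interrupt alone.

```latex
\begin{aligned}
P(\mathrm{ECC} \wedge \mathrm{IRQ}) &= p_{\mathrm{ECC}}\, p_{\mathrm{IRQ}} = 10^{-4} \cdot 10^{-3} = 10^{-7},\\
\mathbb{E}[\text{cycles to first coincidence}] &= \frac{1}{p_{\mathrm{ECC}}\, p_{\mathrm{IRQ}}} = 10^{7},\\
\mathbb{E}[\text{cycles, ECC error forced on IRQ trigger}] &= \frac{1}{p_{\mathrm{IRQ}}} = 10^{3}.
\end{aligned}
```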
Implementation Details
The first and most important step in the Random Disturbance approach is identifying the scenarios that must be generated to target the bug. The next step is creating those conditions from the testbench and bringing out the bug. A reconfigurable and reusable UVM verification environment can be used for the simulation. The main advantage of this approach is that the existing testbench and scoreboard are reused, and no additional tools or licenses are required. The additional components required for the Random Disturbance approach are listed below.
- Trigger Signal: Each Random Disturbance in the testbench needs a trigger signal that starts the forcing at the appropriate time. The testbench waits for certain conditions to be true before the actual injection, so that disturbances are injected only in interesting time windows. The trigger signal for each disturbance can either be probed from the DUT through a hierarchical path or driven from the testbench.
- Force Signal: Instead of forcing the actual RTL signal directly, we can add a dedicated force signal to the RTL and force it from the testbench. This signal is declared in the RTL and assigned a default value of 0. The RTL edits made for Random Disturbances can be wrapped in an `ifdef compiler directive so that they are compiled only for Random Disturbance simulations. A simple Random Disturbance implemented in the RTL file is shown below.
For example:
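`ifdef RD_SIM
  // Random Disturbance hook: tied to 0 here, forced to 1 from the testbench
  logic tb_force_fifo_full;
  assign tb_force_fifo_full = 1'b0;
  assign fifo_full = (fifo_ptr[PTR-1:0] >= full_cnt[PTR-1:0]) |
                     tb_force_fifo_full;
`else
  // Normal build: fifo_full comes only from the pointer comparison
  assign fifo_full = (fifo_ptr[PTR-1:0] >= full_cnt[PTR-1:0]);
`endif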
As shown above, the testbench forces tb_force_fifo_full instead of forcing fifo_full directly. While debugging, the value of tb_force_fifo_full shows whether a fifo_full condition was raised by a testbench force or by the design itself.
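The testbench side of this hook can be sketched as follows. This is a minimal, self-contained SystemVerilog sketch, not the author's implementation: the toy module toy_fifo_ctrl, the testbench rd_force_tb, the testbench-driven trigger and the cycle counts are all hypothetical. It waits for the trigger, forces tb_force_fifo_full for four cycles, releases it so the RTL default of 1'b0 takes over again, and uses tb_force_fifo_full to tell an injected fifo_full apart from a real one.

```systemverilog
// Normally passed on the command line (e.g. +define+RD_SIM); defined here so the sketch is standalone.
`define RD_SIM

// Hypothetical toy DUT carrying the Random Disturbance hook from the RTL snippet.
module toy_fifo_ctrl #(parameter int PTR = 4) (
  input  logic [PTR-1:0] fifo_ptr,
  input  logic [PTR-1:0] full_cnt,
  output logic           fifo_full
);
`ifdef RD_SIM
  logic tb_force_fifo_full;
  assign tb_force_fifo_full = 1'b0;  // default: no disturbance
  assign fifo_full = (fifo_ptr[PTR-1:0] >= full_cnt[PTR-1:0]) | tb_force_fifo_full;
`else
  assign fifo_full = (fifo_ptr[PTR-1:0] >= full_cnt[PTR-1:0]);
`endif
endmodule

// Hypothetical testbench: injects one fifo_full disturbance after a trigger.
module rd_force_tb;
  logic clk = 1'b0;
  always #5 clk = ~clk;

  logic [3:0]  fifo_ptr = 4'd2;   // well below full_cnt: the FIFO is never really full
  logic [3:0]  full_cnt = 4'd12;
  logic        fifo_full;
  logic        trigger  = 1'b0;   // trigger driven from the TB in this sketch
  int unsigned forced_full_cycles = 0;

  toy_fifo_ctrl #(.PTR(4)) dut (.fifo_ptr, .full_cnt, .fifo_full);

  // Raise the trigger on the third rising edge (stand-in for an "interesting window").
  initial begin
    repeat (3) @(posedge clk);
    trigger = 1'b1;
  end

  // Injection: wait for the trigger, force on a falling edge, hold 4 cycles, release.
  initial begin
    wait (trigger);
    @(negedge clk);
    force dut.tb_force_fifo_full = 1'b1;
    repeat (4) @(negedge clk);
    release dut.tb_force_fifo_full;  // the continuous assign drives 1'b0 again
    repeat (2) @(posedge clk);
    $finish;
  end

  // Debug aid: a full flag seen while the hook is high came from the testbench.
  always @(posedge clk)
    if (fifo_full && dut.tb_force_fifo_full) begin
      forced_full_cycles++;
      $display("%0t: fifo_full injected by Random Disturbance", $time);
    end
endmodule
```

Forcing and releasing on falling edges keeps the injection window aligned to whole clock cycles, so the rising-edge monitor counts exactly the four forced cycles without a same-edge race.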
- Random Delay Variables: Randomized delay variables control the ON time and OFF time of each Random Disturbance.
- Enable/Disable Switches: Based on EN/DIS arguments, each Random Disturbance can be turned ON or OFF during the simulation.
- Occurrence Counter: A counter tracks and limits the number of times each disturbance is injected, without changing the length of the simulation.
- Exclusions: Enabling certain disturbances together is sometimes not advisable; exclusions prevent such combinations.
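The components above can be combined in a small disturbance controller. The sketch below illustrates one possible arrangement under stated assumptions, not a reference implementation: the module rd_ctrl_tb, the second hook tb_force_ecc_err, the plusarg names (+RD_FIFO_FULL_DIS, +RD_ECC_ERR_DIS, +RD_MAX_OCC=<n>) and the delay ranges are hypothetical, and the hook signals are local stand-ins for the DUT signals a real testbench would force. OFF and ON times come from $urandom_range, each disturbance stops after max_occ injections, and a single-key semaphore implements the exclusion so the two disturbances are never ON together.

```systemverilog
// Hypothetical controller for two Random Disturbances. The hooks are local
// stand-ins for RTL signals such as dut.tb_force_fifo_full.
module rd_ctrl_tb;
  logic clk = 1'b0;
  always #5 clk = ~clk;

  // Hook signals: tied to 0, forced to 1 while a disturbance is ON.
  logic tb_force_fifo_full;
  logic tb_force_ecc_err;
  assign tb_force_fifo_full = 1'b0;
  assign tb_force_ecc_err   = 1'b0;

  semaphore    excl = new(1);      // exclusion: one key shared by both disturbances
  int unsigned max_occ  = 3;       // occurrence limit, overridable with +RD_MAX_OCC=<n>
  int unsigned fifo_occ = 0;       // occurrence counters
  int unsigned ecc_occ  = 0;
  bit          overlap_seen = 1'b0;

  task rd_fifo_full();
    while (fifo_occ < max_occ) begin
      repeat ($urandom_range(20, 5)) @(negedge clk);  // random OFF time: 5..20 cycles
      excl.get(1);                                    // wait until no other disturbance is ON
      force tb_force_fifo_full = 1'b1;
      fifo_occ++;
      repeat ($urandom_range(4, 1)) @(negedge clk);   // random ON time: 1..4 cycles
      release tb_force_fifo_full;
      excl.put(1);
    end
  endtask

  task rd_ecc_err();
    while (ecc_occ < max_occ) begin
      repeat ($urandom_range(30, 10)) @(negedge clk); // random OFF time: 10..30 cycles
      excl.get(1);
      force tb_force_ecc_err = 1'b1;
      ecc_occ++;
      repeat ($urandom_range(2, 1)) @(negedge clk);   // random ON time: 1..2 cycles
      release tb_force_ecc_err;
      excl.put(1);
    end
  endtask

  // Checker: the exclusion should keep both hooks from being high together.
  always @(posedge clk)
    if (tb_force_fifo_full && tb_force_ecc_err) overlap_seen = 1'b1;

  initial begin
    void'($value$plusargs("RD_MAX_OCC=%d", max_occ));
    fork
      // Enable/disable switches: each disturbance runs unless its DIS plusarg is given.
      if (!$test$plusargs("RD_FIFO_FULL_DIS")) rd_fifo_full();
      if (!$test$plusargs("RD_ECC_ERR_DIS"))   rd_ecc_err();
    join
    // Both counters reached their limit; a real test would keep running its normal stimulus.
    repeat (5) @(posedge clk);
    $finish;
  end
endmodule
```

For example, running with +RD_ECC_ERR_DIS +RD_MAX_OCC=10 would inject only the fifo_full disturbance, up to ten times. The semaphore is just one way to express exclusions; a table of disallowed disturbance pairs checked when the switches are read would be another.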
Conclusion
Combining Random Disturbances with our processor verification methodology increases the bug complexity score that the testbench can reach. Because bug complexity grows with the number of events or conditions required to trigger a bug, this score can be used as a measure of testbench quality in CPU verification: the higher the complexity score, the more stressful the testbench. In addition, the simulation time needed to run test sequences and reach satisfactory verification coverage is highly significant, particularly when verifying pipelined, multi-threaded, multi-core processors. This methodology helps cover combinations of events or conditions that are practically unreachable in normal execution.