Pseudo SRAM Access from lm32
Introduction
The W968D6DA provides 256 Mbit (32 MByte) of Pseudo SRAM (datasheet). It has a 32-bit address width and 16 data lines. At present, it is only used in the Exploder5 form factor.
Some measurements were performed in early 2019 and are presented here.
Measurements using SignalTap
As the PSRAM is rarely used, the present Exploder5 VHDL connects it to the so-called device crossbar. Thus, it is not directly connected to the lm32; access goes via Wishbone with two crossbars ('top-crossbar', 'device-crossbar') in between. Since measurements via the CPU are not useful here, the measurements were done using Quartus SignalTap.
Time Values
Table: Access time values for different modes. Numbers are per 32bit word [ns].
Bandwidth
| Mode | Type | Connection | Bandwidth [Mbit/s] | Comment |
| random access (w) | register --> PSRAM | WB with 2 x-bars @ 62.5 MHz | 108 | measured |
| random access (r/w) | copy RAM <-> PSRAM | WB with 2 x-bars @ 62.5 MHz | 87 | measured |
| random access (r/w) | copy RAM <-> PSRAM | WB direct @ 62.5 MHz | 175 | extrapolated |
| burst (r/w) | N/A | DMA controller @ 62.5 MHz | 290 | extrapolated |
| burst (r/w) | N/A | DMA controller @ 125 MHz | 800 | theoretical maximum for direct connection DMA/PSRAM |
Table: Bandwidth for different modes and connection modes. Measured values for the present Exploder5 are given in the first two rows.
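The bandwidth figures can be related to access times per 32-bit word and to Wishbone clock cycles. The following is a minimal sketch of that arithmetic only; the function names are illustrative, and the only inputs are the two measured bandwidths and the 62.5 MHz clock from the table.

```python
WORD_BITS = 32        # all numbers on this page are per 32-bit word
WB_CLOCK_MHZ = 62.5   # Wishbone clock as given in the table


def ns_per_word(mbit_per_s: float) -> float:
    """Access time per 32-bit word in ns for a bandwidth in Mbit/s."""
    # bits / (Mbit/s) gives microseconds; multiply by 1000 for ns
    return WORD_BITS / mbit_per_s * 1000.0


def clock_cycles(ns: float, clock_mhz: float = WB_CLOCK_MHZ) -> float:
    """Number of clock cycles that fit into the given time span."""
    return ns * clock_mhz / 1000.0


# The two measured rows of the table
for label, bw in [("register --> PSRAM (w)", 108),
                  ("copy RAM <-> PSRAM (r/w)", 87)]:
    t = ns_per_word(bw)
    print(f"{label}: {bw} Mbit/s = {t:.0f} ns per word "
          f"= {clock_cycles(t):.1f} cycles @ {WB_CLOCK_MHZ} MHz")
```

With these inputs, the 87 Mbit/s copy corresponds to roughly 368 ns per word, which is about 23 cycles of the 62.5 MHz clock.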
Figures
Measurements from lm32 CPU
As a complement to the measurements using SignalTap, some numbers have been determined using a simple program running in the lm32 on an Exploder5.
| matrix | register (w) | shared RAM (w) | PSRAM (w) |
| register (r) | N/A | 795 (40) | 99 (320) |
| shared RAM (r) | 443 (72) | 322 (96) | N/A |
| PSRAM (r) | 95 (336) | N/A | 51 (624) |
Table: Pseudo SRAM access measured using an lm32 program. Numbers are per 32-bit word and given as bandwidth in MBit/s (in brackets: access time in nanoseconds). Rows give the source that is read (r), columns the destination that is written (w). See the text for an explanation.
The numbers slightly underestimate the bandwidth (= overestimate the access time), as the overhead for timestamping and handling of 'for loops' is included. The measurement protocol is attached.
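How a timestamped loop yields the table entries, and why the included overhead lowers the reported bandwidth, can be sketched as follows. This is not the lm32 program itself, only the evaluation arithmetic. The timestamps, the word count and the `overhead_ns_per_word` parameter are hypothetical; the word count is chosen so that the result reproduces the 'PSRAM (r) to register (w)' entry of 95 MBit/s (336 ns).

```python
WORD_BITS = 32  # numbers in the table are per 32-bit word


def per_word_stats(t_start_ns, t_end_ns, n_words, overhead_ns_per_word=0.0):
    """Return (access time in ns, bandwidth in MBit/s) per 32-bit word.

    overhead_ns_per_word is a hypothetical correction for timestamping
    and loop handling; 0.0 means the overhead stays included, as in the
    numbers of the table.
    """
    ns = (t_end_ns - t_start_ns) / n_words - overhead_ns_per_word
    mbit_per_s = WORD_BITS / ns * 1000.0
    return ns, mbit_per_s


# Hypothetical run: 1000 words transferred between two timestamps
ns, bw = per_word_stats(t_start_ns=0, t_end_ns=336_000, n_words=1000)
print(f"{bw:.0f} MBit/s ({ns:.0f} ns)")

# Subtracting any positive overhead shortens the access time and
# therefore raises the bandwidth; the 16 ns used here is made up.
ns_corr, bw_corr = per_word_stats(0, 336_000, 1000, overhead_ns_per_word=16.0)
print(f"{bw_corr:.0f} MBit/s ({ns_corr:.0f} ns)")
```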
Conclusion
- non-optimized: PSRAM provides random access with about 100MBit/s (368ns per 32bit word).
- optimized:
- using direct connection and a DMA controller will allow random access up to 500 MBit/s (~ 72ns per 32bit word) or 800 MBit/s (~40ns per 32bit word).
- PSRAM will then be as fast as 'direct access' of the lm32 to its own shared memory, providing a bandwidth of up to 800 MBit/s (~40ns).
Verdict
Direct connection and a DMA controller would provide about the same performance as lm32 access to its own RAM today.
A hardware implementation of true (faster) SRAM does not make much sense.
True SRAM would only become an option if (at some time in the future) one migrated to a different softcore architecture.
--
DietrichBeck,
MathiasKreider - 21 Feb 2019