Advancements In DRAM Technology
INTRODUCTION
In the current era of digital revolution, there is a growing demand for high-capacity, high-bandwidth, low-cost, error-free, low-power semiconductor memories that occupy minimal area. Existing memory types and technologies are being improved, and new ones are being introduced. DRAM is commonly used in smartphones, tablets, PCs, data centers, cloud infrastructure, graphics hardware, consumer products, automobiles, etc. 5G, AI, AR/VR and IoT technologies are also major drivers of the DRAM market.
DRAM
Older DRAMs were asynchronous: no system clock synchronized memory accesses. Data transfer between the synchronous system bus and the asynchronous memory bus was inefficient and introduced longer delays. In synchronous DRAMs, the system clock coordinates memory accesses, which gives higher performance than asynchronous DRAMs. Today, synchronous DRAMs are widely used instead of asynchronous DRAMs. DRAMs can be protected from bit-flip errors using error-correcting codes (ECC).
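To make the ECC idea concrete, the following is a minimal sketch of a single-error-correcting Hamming(7,4) code. The text does not say which code DRAM ECC uses, and practical memory ECC works on much wider words, so the code choice and the function names `hamming74_encode` and `hamming74_decode` are illustrative assumptions only.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    # Codeword layout, positions 1..7: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); error_position is 0 if no flip was seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3  # 1-indexed position of the flipped bit
    if error_position:
        c[error_position - 1] ^= 1  # correct the single-bit error
    return [c[2], c[4], c[5], c[6]], error_position


stored = hamming74_encode([1, 0, 1, 1])
stored[4] ^= 1  # simulate a bit flip in the memory cell at position 5
recovered, position = hamming74_decode(stored)
print(recovered, position)
```

The decoder recovers the original four data bits and reports which position was flipped. A code of this kind corrects one flipped bit per codeword; detecting double-bit errors needs an additional parity bit.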
SYNCHRONOUS DRAMs
SDR SDRAM
Single data rate SDRAM transfers one data word per clock cycle. Typical SDR SDRAM clock rates are 66, 100 and 133 MHz, denoted PC66, PC100 and PC133 respectively, and the devices operate at 3.3 V.
DDR SDRAM
The Double Data Rate SDRAM interface makes higher transfer rates possible through stricter control of the timing of the electrical data and clock signals. The interface transfers data on both the positive and negative edges of the clock, doubling data bus bandwidth without a corresponding increase in clock frequency. The name “double data rate” refers to the fact that a DDR SDRAM with a certain clock frequency achieves nearly twice the bandwidth of an SDR SDRAM running at the same clock frequency.
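The doubling can be checked with a short calculation. This is a sketch of idealised peak bandwidth only: the 64-bit bus width is an assumption made for illustration (the text does not give one), and the function name `peak_bandwidth_mb_s` is hypothetical.

```python
def peak_bandwidth_mb_s(clock_mhz, transfers_per_clock, bus_width_bits=64):
    """Idealised peak bandwidth in MB/s, ignoring refresh, turnaround and idle cycles."""
    mega_transfers_per_second = clock_mhz * transfers_per_clock  # MT/s
    return mega_transfers_per_second * bus_width_bits / 8        # bits -> bytes


# Same 133 MHz clock, one transfer per cycle (SDR) versus two (DDR).
sdr = peak_bandwidth_mb_s(133, transfers_per_clock=1)
ddr = peak_bandwidth_mb_s(133, transfers_per_clock=2)
print(sdr, ddr, ddr / sdr)  # 1064.0 2128.0 2.0
```

In practice the measured gain is somewhat below 2x, which is why the text says "nearly twice".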
The prefetch architecture in DDR SDRAMs fetches multiple words with a single address request. For example, when a memory read occurs to a row, the prefetch buffer also fetches a set of adjacent words without needing individual column addresses. The prefetch buffer depth relates the DRAM core frequency to the DRAM I/O frequency. This feature significantly improves performance because of the principle of locality. Each generation of SDRAM has a different prefetch size. For example, DDR SDRAM’s prefetch buffer size is 2n, i.e., 2 data words per memory access.
DDR SDRAM | Clock frequency (MHz) | Prefetch size | Data rate (MT/s) | Voltage (V) |
DDR1 | 100-200 | 2n | 200-400 | 2.5-2.6 |
DDR2 | 200-533⅓ | 4n | 400-1066⅔ | 1.8 |
DDR3 | 400-1066⅔ | 8n | 800-2133⅓ | 1.5/1.35 |
DDR4 | 800-1600 | 8n | 1600-3200 | 1.2/1.05 |
DDR5 | 1600-3600 | 8n | 3200-7200 | 1.1 |
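The relationship between prefetch depth, core frequency and I/O frequency can be sketched numerically. The model below is a simplifying assumption, not something stated in the text: each core access delivers `prefetch_n` words per data pin, so the core only needs to run at the data rate divided by the prefetch depth. The inputs are the lowest clock of the first four generations in the table above, and the function names are hypothetical.

```python
def data_rate_mt_s(io_clock_mhz):
    """DDR transfers on both clock edges, so the data rate is twice the I/O clock."""
    return 2 * io_clock_mhz


def core_clock_mhz(io_clock_mhz, prefetch_n):
    """Core array rate needed to sustain the data rate (simplified model)."""
    return data_rate_mt_s(io_clock_mhz) / prefetch_n


# (lowest I/O clock in MHz, prefetch depth) taken from the table above.
generations = {"DDR1": (100, 2), "DDR2": (200, 4), "DDR3": (400, 8), "DDR4": (800, 8)}

for name, (io_clock, prefetch) in generations.items():
    print(name, data_rate_mt_s(io_clock), "MT/s, core at",
          core_clock_mhz(io_clock, prefetch), "MHz")
```

Under this model the data rate doubles from DDR1 to DDR2 to DDR3 while the core array rate stays the same, which is the point of a deeper prefetch: the interface gets faster without requiring a faster memory array.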
LPDDR SDRAM
Low-Power Double Data Rate SDRAM consumes less power thanks to features such as a low I/O supply voltage, Temperature Compensated Self Refresh, Partial Array Self Refresh and Deep Power Down. These features make the technology well suited to mobile computers and devices such as mobile phones. Older variants are also known as Mobile DDR, abbreviated mDDR. LPDDR standards are developed independently of the DDR standards. In the “E” (enhanced) versions of the specifications, the memory array is overclocked for a 33% performance boost.
The supply voltages of the LPDDR generations are generally lower than those of the corresponding DDR generations. This reduces power consumption significantly, since switching power is directly proportional to CV²f, where C is the capacitance, V is the voltage swing and f is the operating frequency. With Temperature Compensated Self Refresh (TCSR), the refresh rate changes in response to an on-chip temperature sensor, i.e., at low temperatures the refresh oscillator frequency decreases, which saves power. Partial Array Self Refresh is a mode in which refresh is carried out only in the banks where data retention is required, reducing self-refresh current and thereby saving power. Deep Power Down is the lowest-power mode, in which all memory contents are lost.
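The effect of the quadratic voltage term can be illustrated with a short calculation. This sketch covers switching power only and holds capacitance and frequency equal by default, which is an assumption made to isolate the voltage term; the function name `relative_switching_power` is hypothetical. The 1.8 V and 1.2 V values are taken from the LPDDR1 and LPDDR2 rows of the table below.

```python
def relative_switching_power(v_new, v_old, f_new=1.0, f_old=1.0, c_new=1.0, c_old=1.0):
    """Ratio of switching power (proportional to C * V^2 * f) between two operating points."""
    return (c_new * v_new ** 2 * f_new) / (c_old * v_old ** 2 * f_old)


# Voltage drop from 1.8 V to 1.2 V at unchanged capacitance and frequency.
ratio = relative_switching_power(v_new=1.2, v_old=1.8)
print(f"switching power falls to {ratio:.0%} of the original")

# Same voltage drop, but with the operating frequency doubled.
ratio_2x = relative_switching_power(v_new=1.2, v_old=1.8, f_new=2.0, f_old=1.0)
print(f"with doubled frequency: {ratio_2x:.0%} of the original")
```

Because the voltage enters squared, a one-third reduction in voltage more than halves switching power, and the result stays below the original even when the frequency doubles.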
LPDDR SDRAM | Clock frequency (MHz) | Prefetch size | Data rate (MT/s) | Command/Address bus | Voltage (V) |
LPDDR1 | 200, 266.7 (LPDDR-1E) | 2n | 400, 533.3 (LPDDR-1E) | 19 bits, SDR | 1.8 |
LPDDR2 | 400, 533.3 (LPDDR-2E) | 4n | 800, 1067 (LPDDR-2E) | 10 bits, DDR | 1.2, 1.8 |
LPDDR3 | 800, 1067 (LPDDR-3E) | 8n | 1600, 2133 (LPDDR-3E) | 10 bits, DDR | 1.2, 1.8 |
LPDDR4 | 1600 | 16n | 3200 | 6 bits, SDR | 1.1, 1.8 |
LPDDR4X | 2133 | 16n | 4267 | 6 bits, SDR | 0.6, 1.1, 1.8 |
LPDDR5 | 3200 | 16n | 6400 | 7 bits, DDR | 0.5, 1.05, 1.8 |
LPDDR5X | 4267 | 16n | 8533 | 7 bits, DDR | 0.5, 1.05, 1.8 |
GDDR SDRAM
Graphics DDR SDRAM is designed specifically for high-bandwidth applications such as GPUs. It uses a wider bus to provide higher bandwidth and performance. GDDR3 has much the same technological base as DDR2, but with reduced power and heat-dissipation requirements, which allows simpler cooling systems. GDDR4 SDRAM introduced DBI (Data Bus Inversion) and Multi-Preamble to reduce data transmission delay. GDDR memory is comparatively expensive.
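The text does not describe how Data Bus Inversion decides when to invert, so the following is only a sketch of one common formulation, stated here as an assumption: if more than half the bits of a byte would sit at the electrically costly level, the byte is sent inverted and a flag bit tells the receiver to undo the inversion. Which level counts as costly is left as a parameter, and the function names are hypothetical.

```python
def dbi_encode(byte, costly_level=0):
    """Return (wire_byte, dbi_flag) so that at most 4 data lines carry the costly level."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0..255")
    ones = bin(byte).count("1")
    costly_bits = ones if costly_level == 1 else 8 - ones
    if costly_bits > 4:
        return byte ^ 0xFF, 1  # send the inverted byte and raise the flag
    return byte, 0


def dbi_decode(wire_byte, dbi_flag):
    """Undo the inversion on the receiving side."""
    return wire_byte ^ 0xFF if dbi_flag else wire_byte


wire, flag = dbi_encode(0x01)  # seven zero bits, so the byte is inverted
print(hex(wire), flag, hex(dbi_decode(wire, flag)))  # 0xfe 1 0x1
```

The scheme costs one extra signal per byte lane but bounds how many data lines are at the costly level in any single transfer.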
Chip type | Module type | Memory clock (MHz) | Transfer rate (GT/s) |
GDDR2 | | 500 | |
GDDR3 | 64 lanes | 625 | 2.5 |
GDDR4 | 64 lanes | 275 | 2.2 |
GDDR5 | 64 lanes | 625-1125 | 5-9 |
GDDR5X | 64 lanes | 625-875 | 10-12 |
GDDR6 | 64 lanes | 875-1125 | 14-18 |
GDDR6X | 64 lanes | 594-656 | 19-21 |
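The transfer rates in the table above can be turned into peak bandwidth figures. The sketch assumes that "64 lanes" means 64 data lines, each carrying one bit per transfer; that reading and the function name `module_bandwidth_gb_s` are assumptions, not statements from the text.

```python
def module_bandwidth_gb_s(transfer_rate_gt_s, lanes=64):
    """Peak bandwidth in GB/s: transfers per second times bytes moved per transfer."""
    return transfer_rate_gt_s * lanes / 8


# (lowest, highest) transfer rate in GT/s, copied from the table above.
transfer_rates = {
    "GDDR3": (2.5, 2.5),
    "GDDR5": (5, 9),
    "GDDR6": (14, 18),
    "GDDR6X": (19, 21),
}

for chip, (low, high) in transfer_rates.items():
    print(chip, module_bandwidth_gb_s(low), "to", module_bandwidth_gb_s(high), "GB/s")
```

Under this assumption a 64-lane GDDR3 interface peaks at 20 GB/s and a GDDR6 interface at 112 to 144 GB/s, which shows how much of the generational gain comes from the per-pin transfer rate alone.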
HBM
High Bandwidth Memory (HBM) is a high-speed memory interface for 3D-stacked SDRAM, used in conjunction with high-performance graphics accelerators, network devices, high-performance datacenter hardware, AI ASICs, etc. Compared with other DRAM types, HBM offers higher bandwidth, lower power consumption and a smaller form factor. This is achieved by stacking up to eight DRAM dies vertically, which also shortens the datapath. The HBM memory bus is much wider than that of other DRAM types. For example, an HBM stack of four DRAM dies, each with two 128-bit channels, provides a total width of 1024 bits. The dies within the stack are interconnected by through-silicon vias (TSVs) and microbumps.
Because an HBM stack has far more connections than other DRAM types, a new method of connection is used. As HBM occupies less space, it can be placed near the GPU or CPU. The memory stack can be connected to the memory controller on a GPU or CPU through a substrate such as an interposer. Alternatively, the memory die can be stacked directly on the GPU or CPU chip.
Module type | Typical interface | Maximum pin bandwidth (Gb/s) | Maximum interface bandwidth (GB/s) | Maximum capacity (GB) |
HBM2/2E | Eight 128-bit channels (1024 bits total) | 2.4-3.6 | 307-461 | 8-16 (8H stack) |
HBM3 | 32 32-bit pseudo-channels (1024 bits total) | 4.8-6.4 | 614-819 | 64 (16H stack) |
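The bus width example and the interface bandwidth column above follow from simple arithmetic, sketched here. The function names are hypothetical; the inputs are the figures quoted in the text and table.

```python
def hbm_bus_width_bits(dies, channels_per_die, channel_width_bits):
    """Total interface width of a stack."""
    return dies * channels_per_die * channel_width_bits


def hbm_bandwidth_gb_s(pin_rate_gb_s, bus_width_bits=1024):
    """Interface bandwidth in GB/s from the per-pin rate in Gb/s."""
    return pin_rate_gb_s * bus_width_bits / 8


width = hbm_bus_width_bits(dies=4, channels_per_die=2, channel_width_bits=128)
print(width)  # 1024

for pin_rate in (2.4, 3.6, 4.8, 6.4):
    print(pin_rate, "Gb/s per pin ->", round(hbm_bandwidth_gb_s(pin_rate, width), 1), "GB/s")
```

The results (307.2, 460.8, 614.4 and 819.2 GB/s) match the rounded values in the table, which confirms that the interface bandwidth column is the pin bandwidth multiplied by the 1024-bit width and divided by 8.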
In February 2021, Samsung announced the development of HBM with processing-in-memory (PIM). A DRAM-optimized AI engine is placed inside each memory bank to enable parallel processing and minimize data movement. Samsung claims this will deliver twice the system performance and reduce energy consumption by more than 70%, while not requiring any hardware or software changes to the rest of the system.
CHALLENGES AND SOLUTIONS
Efficient architectural techniques are necessary to make better use of the available memories. To increase the capacity of DRAMs, memory cells are scaled down to smaller dimensions. Smaller transistors switch faster, use less energy and are cheaper to make. To overcome scaling challenges, we need innovative processes and material development. Efficient memory interface technologies can play an important role to meet growing performance requirements.
Compute Express Link (CXL)
CXL is a highly efficient memory interface technology that eliminates proprietary memory interconnects and enables different processors to share a pool of memory. CXL is an open standard for high-speed central processing unit (CPU)-to-device and CPU-to-memory connections.
The CXL standard defines three separate protocols:
- CXL.io - based on PCIe 5.0 with a few enhancements, it provides configuration, link initialization and management, device discovery and enumeration, interrupts, DMA, and register I/O access using non-coherent loads/stores.
- CXL.cache - allows peripheral devices to coherently access and cache host CPU memory with a low latency request/response interface.
- CXL.mem - allows host CPU to coherently access cached device memory with load/store commands for both volatile (RAM) and persistent non-volatile (flash memory) storage.
CXL is designed to support three primary device types:
- Type 1 (CXL.io and CXL.cache) – specialised accelerators (such as smart NIC) with no local memory. Devices rely on coherent access to host CPU memory.
- Type 2 (CXL.io, CXL.cache and CXL.mem) – general-purpose accelerators (GPU, ASIC or FPGA) with high-performance GDDR or HBM local memory. Devices can coherently access host CPU’s memory and/or provide coherent or non-coherent access to device local memory from the host CPU.
- Type 3 (CXL.io and CXL.mem) – memory expansion boards and storage-class memory. Devices provide host CPU with low-latency access to local DRAM or non-volatile storage.
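The mapping between protocols and device types listed above can be written as a small lookup. This is only an illustration of the classification in the text, not an interface from any CXL software stack; the names `CXL_DEVICE_TYPES` and `classify_cxl_device` are hypothetical.

```python
# Protocol combinations and the device type each corresponds to, as listed above.
CXL_DEVICE_TYPES = {
    frozenset({"CXL.io", "CXL.cache"}): "Type 1",
    frozenset({"CXL.io", "CXL.cache", "CXL.mem"}): "Type 2",
    frozenset({"CXL.io", "CXL.mem"}): "Type 3",
}


def classify_cxl_device(protocols):
    """Return the device type for a set of protocols, or None if it matches no listed type."""
    return CXL_DEVICE_TYPES.get(frozenset(protocols))


print(classify_cxl_device(["CXL.io", "CXL.cache"]))             # Type 1, e.g. a smart NIC
print(classify_cxl_device(["CXL.io", "CXL.cache", "CXL.mem"]))  # Type 2, e.g. a GPU with HBM
print(classify_cxl_device(["CXL.io", "CXL.mem"]))               # Type 3, e.g. a memory expander
```

Every listed type includes CXL.io; the types differ only in whether the device caches host memory (CXL.cache), exposes its own memory to the host (CXL.mem), or both.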
Persistent Memory
Persistent memory (PMEM) is a solid-state, high-performance, byte-addressable non-volatile memory that resides on the DRAM bus and provides nearly the same speed and latency as DRAM. Compared with DRAM, PMEM devices are much larger in capacity, cheaper and durable, and they are also cacheable. They are ideal for applications such as big data analytics, metadata servers, AI and IoT technologies that require frequent access to large, complex data sets and are sensitive to downtime caused by system crashes or power failures. If the working datasets fit within the combined capacity of persistent memory and DRAM, system performance increases significantly.
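Byte-addressable, durable access can be illustrated with a memory-mapped file. This sketch uses an ordinary temporary file as a stand-in for a persistent memory region, which is an assumption for illustration: real persistent memory programming has additional flush and ordering requirements that are ignored here. The function name `update_record_in_place` and the file name are hypothetical.

```python
import mmap
import os
import tempfile


def update_record_in_place(path, offset, payload):
    """Overwrite a few bytes inside a mapped region, then flush them to the backing store."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as region:  # length 0 maps the whole file
            region[offset:offset + len(payload)] = payload  # byte-granular store
            region.flush()  # ask the OS to write the change back


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pmem_stand_in.bin")
    with open(path, "wb") as f:
        f.write(bytes(4096))  # 4 KiB zero-filled region
    update_record_in_place(path, 128, b"hello")
    with open(path, "rb") as f:  # a fresh open models reading after a restart
        f.seek(128)
        restored = f.read(5)

print(restored)  # b'hello'
```

Only the five modified bytes are addressed by the program; there is no block-sized read-modify-write in the application code, which is the access pattern byte-addressable persistent memory is meant to support.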
CONCLUSION
DRAM technology is evolving at a fast pace due to the immense growth of memory-intensive applications. Semiconductor giants have already started working on next-generation memory standards with very high data rates, which are expected to reach the market in the coming years.