AN2548
Introduction to DMA controller for STM32 MCUs
Application note
Introduction
Every STM32 microcontroller features at least one DMA controller intended to offload some data transfer duties from the
Cortex® CPU core. This document describes general guidelines on the usage of the basic DMA controller found in most
entry-level, mainstream, and low-power STM32 products. The goal is to explain the bus sharing principles and to provide hints on
efficient usage of DMA transfers.
1 General information
The STM32F0/F1/F3/Cx/Gx/Lx/U0 series 32-bit microcontrollers are based on the Arm® Cortex®‑M processor.
Note: Arm is a registered trademark of Arm Limited (or its subsidiaries) in the US and/or elsewhere.
2 Bus bandwidth
Bus bandwidth is defined as the amount of data that the bus can transfer in a fixed amount of time, typically per
second. It is determined by the clock speed, the bus width, and the bus management overhead. All STM32
products considered in this document feature a 32‑bit data width. The clock speed is configurable by the user.
The following sections provide information about the bus overhead management and present the bus topology.
Figure 1. DMA block diagram for products with DMA and DMA2
(Diagram: the CPU, DMA, and DMA2 masters connect through the bus matrix to the memories, the AHB peripherals, and, via an AHB-to-APB bridge, the APB peripherals.)
DMA2 is only present on more complex products (see Table 2. Differences in DMA implementation for details).
2.3 AHB
A product has one or more AHB buses, connecting the AHB peripherals to the bus matrix. In some documents all
the connections to the bus matrix, or within it, are referred to as AHB, but in this document AHB designates a 32‑bit
wide connection between the bus matrix and the peripherals. Most peripherals are attached to the APB; only a
selected few sit directly on the AHB (for example SDIO, AES).
An AHB bus does not provide the layer parallelism of the bus matrix, but it runs at the same system
clock speed and provides moderate bandwidth, thanks to its pipelined address/data protocol.
When the CPU initiates a data transfer while the DMA is transferring a block of data from a source to a
destination, the round-robin AHB bus matrix suspends the DMA bus access and inserts the CPU access, increasing
the latency of the DMA transfer.
2.4 APB
A product has one or more 32‑bit APB buses.
A DMA data transfer from or to an APB peripheral first crosses the bus matrix and the AHB-to-APB bridge.
Within an APB bus, the peripherals compete with each other, and a transfer can only occur when the bus is idle or
ready.
An APB bus is meant to connect and share several APB peripherals with low bandwidth requirements. APB clock
speeds can typically be tuned down from the AHB speed using configurable clock dividers. A high divider ratio
yields lower power consumption, but at the cost of lower bandwidth and higher latency. Moreover, the APB buses
are connected to the AHB through AHB-to-APB bridges. The latency added by these bridges prolongs the
bus access period, also reducing the bandwidth usable on the AHB bus.
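As an illustration, a minimal sketch of such a divider configuration, assuming an STM32G4 target and its CMSIS register names (the prescaler field names differ on other series):

#include "stm32g4xx.h"   /* assumption: an STM32G4 device header */

/* Divide the APB1 clock down from the AHB clock. A higher divider lowers
 * power consumption, but reduces the APB bandwidth and adds latency. */
static void apb1_set_div4(void)
{
    RCC->CFGR = (RCC->CFGR & ~RCC_CFGR_PPRE1) | RCC_CFGR_PPRE1_DIV4;  /* PCLK1 = HCLK/4 */
}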
3.1 DMAMUX
In STM32C0, STM32G0, STM32G4, STM32L5, STM32L4Rx, and STM32L4Sx products, the DMA capabilities are
enhanced by a DMA request multiplexer (DMAMUX). DMAMUX adds a fully configurable routing of any DMA
request from a given peripheral in DMA mode to any DMA channel of the DMA controllers. DMAMUX does not
add any clock cycle between the DMA request sent by the peripheral and the DMA request received by the
configured DMA channel. It features synchronization of DMA requests using dedicated inputs. DMAMUX is also
capable of generating requests from its own trigger inputs or by software.
More detailed information about the multiplexer can be found in the respective reference manuals (RM0432,
RM0438, RM0440, RM0444, RM0454, RM0490).
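As a minimal sketch of such a routing (assuming an STM32G4 target and CMSIS register names; the DMAMUX-channel-to-DMA-channel mapping and the request IDs are product specific and must be taken from the reference manual):

#include "stm32g4xx.h"   /* assumption: an STM32G4 device header */

/* Route the DMA request identified by request_id (see the DMAMUX request
 * mapping table in the product reference manual) to DMA1 channel 1, which
 * is served by DMAMUX1 channel 0 on this product. */
static void dmamux_route_to_dma1_ch1(uint32_t request_id)
{
    RCC->AHB1ENR |= RCC_AHB1ENR_DMAMUX1EN | RCC_AHB1ENR_DMA1EN;   /* enable clocks */
    DMAMUX1_Channel0->CCR = (DMAMUX1_Channel0->CCR & ~DMAMUX_CxCR_DMAREQ_ID)
                          | (request_id << DMAMUX_CxCR_DMAREQ_ID_Pos);
}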
4 DMA controller
A DMA controller consists of an arbiter that assigns channels to the DMA AHB master port. In products where
DMAMUX is implemented, channels can be assigned to peripherals freely. Typically the channels are preconfigured
for the different types of block-based data transfers used in the application, and are activated during application execution
as needed. A block‑based data transfer consists of a programmed number of data transfers from a source to a
destination, with a programmed incremented or fixed addressing, start address, and data width, each
independent for the source and the destination. The configuration is programmed via the AHB slave port of the
DMA controller.
Figure 2. DMA block diagram
(Diagram: the DMA channels Ch.1 to Ch.n are selected by the arbiter and share the DMA AHB master port to the bus matrix, which connects the Cortex CPU core, the memory, and the AHB peripherals (AES, CRC...). The APB peripherals (ADC, SPI, USART, TIM, DAC...) are reached through a bridge and exchange request/acknowledge signals with the DMA. The DMA configuration registers are accessed through the AHB slave port.)
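A minimal sketch of such a block-transfer configuration through the channel registers (assuming CMSIS register names of the basic DMA controller; the device header, channel choice, and addresses are illustrative):

#include "stm32g4xx.h"   /* assumption: a product with the basic DMA controller */

/* Program DMA1 channel 1 for a peripheral-to-memory block transfer:
 * fixed peripheral address, incrementing memory address, 8-bit data. */
static void dma_setup_rx(uint32_t periph_addr, uint8_t *buf, uint16_t count)
{
    DMA1_Channel1->CCR  &= ~DMA_CCR_EN;    /* the channel must be disabled to reprogram it */
    DMA1_Channel1->CPAR  = periph_addr;    /* source: peripheral data register */
    DMA1_Channel1->CMAR  = (uint32_t)buf;  /* destination: memory buffer */
    DMA1_Channel1->CNDTR = count;          /* number of data to transfer */
    DMA1_Channel1->CCR   = DMA_CCR_MINC    /* increment the memory address only */
                         | DMA_CCR_TCIE    /* interrupt on transfer complete */
                         | DMA_CCR_EN;     /* enable: data move on peripheral requests */
}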
There are two distinct types of transfer available:
• Memory-to-peripheral or peripheral-to-memory
When a peripheral is configured in DMA mode, each transferred data is autonomously managed, at
hardware and data level, between the DMA and the related peripheral, via a dedicated DMA request/
acknowledge handshake protocol. The mapping of the different DMA request signals, from a given
peripheral to a given DMA channel, is listed either in the DMA section of the reference manual, or in
the DMAMUX implementation section of the RM0432, RM0444, and RM0490 product reference manuals.
• Memory-to-memory
The transfer requires no extra control signal; it is activated by software. A channel is allocated by software,
for example to initialize a large RAM data block from the flash memory. It is then likely to compete for access to
the flash memory with CPU instruction fetching. In any case, the DMA arbitration between channels is
reconsidered after every transferred data.
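A hedged sketch of such a software-activated copy (CMSIS register names; the channel choice is illustrative):

#include "stm32g4xx.h"   /* assumption: a product with the basic DMA controller */

/* Software-triggered memory-to-memory copy, for example to initialize
 * a RAM block from flash memory. No peripheral request is involved. */
static void dma_mem_copy(const uint32_t *src, uint32_t *dst, uint16_t words)
{
    DMA1_Channel2->CCR  &= ~DMA_CCR_EN;
    DMA1_Channel2->CPAR  = (uint32_t)src;              /* "peripheral" port used as source */
    DMA1_Channel2->CMAR  = (uint32_t)dst;              /* memory port used as destination */
    DMA1_Channel2->CNDTR = words;
    DMA1_Channel2->CCR   = DMA_CCR_MEM2MEM             /* software-activated transfer */
                         | DMA_CCR_PINC | DMA_CCR_MINC /* increment both addresses */
                         | DMA_CCR_PSIZE_1 | DMA_CCR_MSIZE_1  /* 32-bit accesses */
                         | DMA_CCR_EN;                 /* the transfer starts at once */
    while ((DMA1->ISR & DMA_ISR_TCIF2) == 0) { }       /* wait for transfer complete */
    DMA1->IFCR = DMA_IFCR_CTCIF2;                      /* clear the completion flag */
}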
5 DMA latency
Latency is the delay between the activation of a DMA data transfer request and the actual completion
of the request. The request is either hardware‑driven from a peripheral, or software‑driven when a channel is
enabled.
As a result, having more active DMA channels improves the bus bandwidth usage, but leads to higher latency as
a consequence of the highest-priority-channel DMA arbitration scheme.
Note that the same transfer in the opposite direction (that is, peripheral‑to‑memory) gives the
same latency for the DMA data transfer.
Figure 3. Two DMA channels on AHB bus
(Timing diagram: for each channel, a request is followed by arbitration, the AHB bus access, and the acknowledge; the accesses of the two channels are interleaved on the bus.)
If not only two channels but two DMA controllers are used (in products that offer this possibility), two DMA
transfers can be processed in parallel, as long as they do not conflict within the bus matrix by accessing the
same slave device. For example, on the STM32L486, a DMA1 transfer from SRAM1 to AES can access the
bus matrix simultaneously with a DMA2 transfer from the flash memory to any APB communication interface. No
conflict and no arbitration occur.
A similar case of parallelism is available when the product features usable peripherals on separate buses. If the
UART interface is available on two different APB bridges, the chances of two parallel transfers competing for
system resources diminish.
Figure 4. Delay to serve the request
(Timing diagram: a DMA1 channel 1 request, and the delay until it is served while other masters and channels occupy the bus.)
The longest possible delay depends on the number of masters accessing the resource, as well as on the number of
channels configured on that particular DMA master. For example, in the simple case of minimum latency, each
bus access occupies the bus for three clock cycles. If the CPU core and two DMA
controllers attempt to access the same AHB resource, the arbitration mechanism grants the DMA controller access
to the AHB after a maximum of six clock cycles. If more channels are configured on the DMA controller, only one
of them can be served per time slot. That yields a latency of 18 clock cycles for channel 1 of DMA1, as illustrated
in Figure 4.
6 Insufficient resources threats
The usage of the buses must be carefully considered when designing an application. When the danger of
overloading the bus capacity is underestimated, several problems may arise:
• Overrun: if incoming data is not read from the peripheral register before the next data arrives, the peripheral
raises an overrun flag and data may be lost (see the sketch after this list). This is a typical problem with serial interfaces such as UART or
SPI. Refer to the peripheral documentation for more details.
• Data lost without overrun flag: possible, for example, in case a GPIO port is configured as a parallel
communication port.
• Pauses in transmission: if Tx data are delayed, the communication interface is stalled for a short time. This
is a typical problem of high‑speed communication interfaces such as SPI.
• ADC/DAC sampling timing problems: less likely due to the typically lower sampling speeds, but
not totally impossible.
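As an example of the overrun case above, a hedged detection sketch using the newer USART register names (for example on the G0/G4/L4 series; on older series the flag sits in USART_SR and the clearing sequence differs):

#include "stm32g4xx.h"   /* assumption: an STM32G4 device header */

/* Detect and clear a USART receiver overrun: a byte was lost because the
 * DMA could not read RDR before the next data arrived. */
static void usart_check_overrun(void)
{
    if (USART1->ISR & USART_ISR_ORE)
    {
        USART1->ICR = USART_ICR_ORECF;  /* clear the flag; recovery is application specific */
        /* count the event, signal an error, or resynchronize the protocol here */
    }
}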
It is recommended to carefully examine which of the required peripherals are hooked up on which bus, and
choose bus frequencies to match the projected bandwidth plus some safety margin.
Caution: A reasonable and recommended safety margin is 50%, which leaves one‑third of the total bus
capacity in reserve.
The needed bus bandwidth must be computed from the DMA transfer data rate and the fixed 32‑bit bus width,
independently of the DMA data widths programmed for the source and the destination.
For example, a 2 Mbaud 8‑bit USART reception requires a bandwidth of 250 K transfers per second: although the
internal bus is 32 bits wide, only 8 bits are used per transfer.
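The same computation as a sketch (figures taken from the example above):

#include <stdint.h>

/* Each data item costs one bus transfer regardless of the 32-bit bus width,
 * so the required rate is simply baud / bits_per_data:
 * 2 000 000 / 8 = 250 000 transfers per second. */
static uint32_t required_transfers_per_s(uint32_t baud, uint32_t bits_per_data)
{
    return baud / bits_per_data;
}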
Note: The priority scheme of the DMA arbiter always looks for pending requests from other channels at data transfer
completion, and then switches to the pending request with the highest priority. There is no round
robin here, and transfers with a lower priority may never be served if the DMA is kept busy.
It is still advisable to assign priorities so as to minimize the latency of the selected critical channels.
Caution: Reserve the highest priority for high-speed, slave-mode input peripherals, gradually decreasing the priority level
down to the lowest priority for low‑speed output communication or peripherals in master mode.
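For example (a sketch with illustrative channel assignments; a channel must be disabled while its priority is reprogrammed):

#include "stm32g4xx.h"   /* assumption: a product with the basic DMA controller */

/* Apply the caution above: PL = 11 is "very high", PL = 00 is "low". */
static void dma_assign_priorities(void)
{
    DMA1_Channel2->CCR |= DMA_CCR_PL;    /* e.g. SPI RX (slave-mode input): very high */
    DMA1_Channel5->CCR &= ~DMA_CCR_PL;   /* e.g. low-speed output channel: low */
}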
7 DMA overview
Table 2 summarizes differences related to the DMA implementation and usage in the products addressed by this
document.
Common notable features:
• Circular buffer support (see the sketch after this list)
• Programmable number of data transfers, up to 65535
• Configurable event and interrupt flags for full transfer, half transfer, and error
• Internal data size conversion, allowing 8‑bit communication interface data to be transferred to/from 32‑bit
memory in packed format
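As an illustration of the circular-buffer and half-transfer features listed above (a hedged sketch with CMSIS register names; channel and buffer size illustrative):

#include "stm32g4xx.h"   /* assumption: a product with the basic DMA controller */

static uint8_t rx_buf[256];

/* Continuous reception into a ring buffer: the half-transfer and
 * transfer-complete interrupts let the application drain one half of
 * the buffer while the DMA fills the other half. */
static void dma_setup_circular_rx(uint32_t periph_addr)
{
    DMA1_Channel1->CCR  &= ~DMA_CCR_EN;
    DMA1_Channel1->CPAR  = periph_addr;
    DMA1_Channel1->CMAR  = (uint32_t)rx_buf;
    DMA1_Channel1->CNDTR = sizeof(rx_buf);
    DMA1_Channel1->CCR   = DMA_CCR_CIRC   /* wrap to the start when the buffer is full */
                         | DMA_CCR_MINC
                         | DMA_CCR_HTIE   /* interrupt at half transfer ... */
                         | DMA_CCR_TCIE   /* ... and at full transfer */
                         | DMA_CCR_EN;
}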
Table 2. Differences in DMA implementation
(Table comparing, per product series: the number of DMAMUX channels (4, where present) and DMAMUX request inputs (21 to 26, where present); the CPU architecture (Von Neumann or Harvard); the flash memory interface width (32 or 64 bits) and its 32-bit bus matrix connection; the CPU core (Cortex-M0, M0+, M3, M4, or M33); and the AHB:APB bridge clocks (2 or 3).)
8 Conclusion
When correctly used, the DMA controller increases the product efficiency by handling
data transfer tasks that would otherwise require a more powerful CPU. This document
presents the common DMA features, their performance, and guidelines related to the usage of such a
DMA controller.
Beyond this document, the user must read the reference manual of the specific product to understand its DMA-specific
aspects, especially the system architecture, the memory mapping, and the list and mapping of the DMA
requests from the peripherals to the DMA channels. Based on this information, the user must
distribute the data transfer tasks over the product resources, such as DMA channels, AHB/APB buses, peripherals/memories,
and possibly two DMA controllers.
Revision history
Table 3. Document revision history
Contents
1 General information
2 Bus bandwidth
2.1 Bus architecture
2.2 Bus matrix
2.3 AHB
2.4 APB
3 Product specific features
3.1 DMAMUX
3.2 Presence of TrustZone® security features
4 DMA controller
5 DMA latency
5.1 DMA transfer timing
5.1.1 AHB peripheral and system volatile memory
5.1.2 APB peripheral and system volatile memory
5.2 Total service time and parallelism
5.3 Sharing the bus matrix with CPU
5.4 Impact of APB
6 Insufficient resources threats
7 DMA overview
8 Conclusion
Revision history
List of tables
Table 1. Applicable products
Table 2. Differences in DMA implementation
Table 3. Document revision history
List of figures
Figure 1. DMA block diagram for products with DMA and DMA2
Figure 2. DMA block diagram
Figure 3. Two DMA channels on AHB bus
Figure 4. Delay to serve the request