AN2548 Application Note: Using The STM32F0/F1/Lx Series DMA Controller
Application note
Using the STM32F0/F1/Lx Series DMA controller
Introduction
Every STM32 family microcontroller features at least one DMA controller intended to offload
some data transfer duties from the Cortex CPU core. This document describes general
guidelines about the usage of the basic DMA controller found in most entry-level,
mainstream and low-power STM32 products. The goal is to explain the bus sharing
principles and provide hints on efficient usage of the DMA transfer.
Reference documents
• STM32F0x reference manuals (RM0091, RM0360)
• STM32F1x reference manuals (RM0008, RM0041)
• STM32L0x reference manuals (RM0367, RM0376, RM0377, RM0451)
• STM32L1x reference manual (RM0038)
• STM32L4x reference manuals (RM0392, RM0393, RM0394, RM0395, RM0411,
RM0432)
• Using the STM32F0xx DMA controller (AN4104)
Contents
1 General information
2 Bus bandwidth
2.1 Bus architecture
2.2 Bus matrix
2.3 AHB
2.4 APB
2.5 DMAMUX
3 DMA controller
4 DMA latency
4.1 DMA transfer timing
4.1.1 AHB peripheral and system volatile memory
4.1.2 APB peripheral and system volatile memory
4.2 Total service time and parallelism
4.3 Sharing the bus matrix with CPU
4.4 Impact of APB
7 Conclusion
Revision history
List of figures
Figure 1. DMA block diagram for products with 2 DMAs (DMA and DMA2)
Figure 2. DMA block diagram
Figure 3. Two DMA channels on AHB bus
Figure 4. Delay to serve the request
1 General information
The STM32F0/F1/Lx Series 32-bit microcontrollers are based on the Arm® Cortex®-M
processor.
2 Bus bandwidth
Bus bandwidth is defined as the amount of data that the bus can transfer in a fixed amount of
time, typically per second. It is determined by the clock speed, the bus width and the bus
management overhead. Every STM32 product covered by this document features a 32-bit
data width. The clock speed is configurable by the user. The following section provides
information about the bus management overhead and presents the bus topology.
Figure 1. DMA block diagram for products with 2 DMAs (DMA and DMA2)
[Figure: block diagram showing the CPU, memories, bus matrix, AHB, AHB peripheral, APB, APB
peripheral, and the two DMA controllers]
1. DMA2 is only present on more complex products (see Table 1 for details).
2.3 AHB
A product may have one or more AHB buses connecting the AHB peripherals to the bus
matrix. Some documents refer to every connection to or within the bus matrix as AHB, but in
this document AHB refers to the 32-bit wide connection between the bus matrix and the
peripherals. Most peripherals are attached to the APB; only a select few sit directly on the
AHB (for example SDIO, AES).
An AHB bus does not provide the data (layer) parallelism of the bus matrix, but it runs at
the same system clock speed and provides moderate bandwidth thanks to its pipelined
data/address protocol.
When the CPU initiates a data transfer while the DMA is transferring a block of data
from a source to a destination, the round-robin AHB bus matrix halts the DMA bus access
and inserts the CPU access, which lengthens the latency of the DMA transfer.
2.4 APB
A product may have one or more 32-bit APB buses.
A DMA data transfer from or to an APB peripheral first crosses the bus matrix and then the
AHB to APB bridge. Within an APB bus, all peripherals compete with each other, and a
transfer can occur only when the bus is idle or ready.
An APB bus is meant to connect several APB peripherals with low bandwidth requirements
that share it. APB clock speeds can typically be tuned down from the AHB speed using
configurable clock dividers. A high divider ratio yields lower power consumption, but at the
cost of lower bandwidth and higher latency. Moreover, the APB buses are connected to the
AHB using an AHB to APB bridge. The latency added by the AHB:APB bridges prolongs the bus
access period, which also reduces the bandwidth usable on the AHB bus.
2.5 DMAMUX
In STM32L4Rx and STM32L4Sx products the DMA capabilities are enhanced by a DMA
request multiplexer (DMAMUX). DMAMUX adds fully configurable routing of any DMA
request from a given peripheral in DMA mode to any DMA channel of the DMA controller(s).
DMAMUX does not add any clock cycle between the DMA request sent by the peripheral
and the DMA request received by the configured DMA channel. It features synchronization
of DMA requests using dedicated inputs. DMAMUX is also capable of generating requests
from its own trigger inputs or by software.
More detailed information about the multiplexer can be found in the respective reference
manual (RM0432).
3 DMA controller
A DMA controller consists of an arbiter part, which assigns channels to the DMA AHB master
part. In products where DMAMUX is implemented, channels may be assigned to peripherals
freely. Typically, the channels are preconfigured for the different types of block-based data
transfers used in the application and then activated during application execution as needed.
A block-based data transfer consists of a programmed number of data transfers from a
source to a destination, as well as programmed incrementing or fixed addressing, a start
address and a data width, each being independent for the source and the destination. The
configuration is programmed via the AHB slave port of the DMA controller.
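The block-based transfer described above can be pictured with a small host-side model. This is only a sketch: the structure `dma_block_model_t` and its field names are hypothetical and do not correspond to any register layout, and the data width is fixed to 8 bits on both sides for brevity, whereas the controller allows independent widths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of one channel configuration.
 * Field names are illustrative only, not register names. */
typedef struct {
    const uint8_t *src;     /* source start address */
    uint8_t       *dst;     /* destination start address */
    uint16_t       count;   /* programmed number of data transfers */
    int            src_inc; /* non-zero: source address increments */
    int            dst_inc; /* non-zero: destination address increments */
} dma_block_model_t;

/* Moves the whole block in software, one 8-bit data item at a time.
 * Returns the number of data items moved. */
static uint16_t dma_block_run(const dma_block_model_t *cfg)
{
    assert(cfg != NULL && cfg->src != NULL && cfg->dst != NULL);

    const uint8_t *s = cfg->src;
    uint8_t *d = cfg->dst;

    for (uint16_t i = 0; i < cfg->count; i++) {
        *d = *s;                   /* one data transfer */
        if (cfg->src_inc) { s++; } /* fixed or incrementing source */
        if (cfg->dst_inc) { d++; } /* fixed or incrementing destination */
    }
    return cfg->count;
}
```

A peripheral-to-memory reception would use a fixed source (the peripheral data register) and an incrementing destination, for example `{ &data_reg, buffer, 4, 0, 1 }`.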
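Figure 2. DMA block diagram
[Figure: block diagram showing the memory, the Cortex CPU core and the AHB peripherals (AES, CRC)
on the bus matrix; the DMA with its arbiter, channels Ch1, Ch2 ... Chn and AHB slave port; and the
bridge to the peripherals (ADC, SPI, USART, TIM, DAC) with their req/ack lines]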
4 DMA latency
Latency is the delay between the activation of a DMA data transfer request and
the actual completion of the request. The request is either hardware-driven from a
peripheral or software-driven when the channel is enabled.
where:
• tA is the arbitration time, including address computation
tA = minimum 2 AHB clock cycles, if no higher priority transfer is pending.
• tRD is the peripheral read access time
tRD = 2 APB clock cycles as the base minimum, plus one extra AHB cycle for bridge
synchronization if the APB frequency is lower than the AHB frequency. See chapter 3.4 for
additional details.
• tWR is the SRAM write access time
tWR = same as in the AHB case, 1 or 2 AHB cycles.
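As a rough illustration, the three components listed above can be combined into a best-case estimate expressed in AHB clock cycles. The sketch below assumes that the components simply add up and that the APB clock is the AHB clock divided by an integer ratio; the function name `dma_apb_service_cycles` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Best-case estimate, in AHB clock cycles, of one peripheral-to-SRAM
 * data transfer for a peripheral located on an APB bus.
 * apb_div : AHB-to-APB clock ratio, 1 means same frequency (assumed integer)
 * twr_ahb : SRAM write access time, 1 or 2 AHB cycles */
static uint32_t dma_apb_service_cycles(uint32_t apb_div, uint32_t twr_ahb)
{
    assert(apb_div >= 1);
    assert(twr_ahb == 1 || twr_ahb == 2);

    uint32_t ta  = 2;           /* arbitration, minimum value */
    uint32_t trd = 2 * apb_div; /* 2 APB cycles expressed in AHB cycles */

    if (apb_div > 1) {
        trd += 1;               /* extra AHB cycle for bridge synchronization */
    }
    return ta + trd + twr_ahb;  /* assumption: the three components add up */
}
```

With these assumptions, an APB running at the AHB frequency and a 2-cycle write give 6 AHB cycles, while an APB divider of 4 raises the estimate to 13 AHB cycles.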
When the DMA is idle or after the third step has completed on one channel, the DMA arbiter
compares the priorities of all pending DMA requests (a request may be hardware requested
or software requested). At every data transfer completion, the DMA arbiter checks whether
at least one request from another channel is pending. If so, the DMA arbitration switches
to the highest-priority channel with a pending request for the next data transfer.
As a result, having more active DMA channels improves the bus bandwidth usage, but may
lead to higher latency as a consequence of the highest-priority-first DMA arbitration
scheme.
Note that the same transfer in the opposite direction (i.e. peripheral-to-memory)
would give the same latency for the DMA data transfer.
Figure 3. Two DMA channels on AHB bus
[Figure: timing diagram against HCLK showing, for each of the two channels, the sequence request,
arbitration, AHB bus access and acknowledge]
When not only two channels but two DMA controllers are used (in products that offer this
possibility), two DMA transfers can be processed in parallel, as long as they do not
conflict within the bus matrix, that is, they do not access the same slave device. For
example, on an STM32L486, a DMA1 transfer from SRAM1 to AES can access the bus matrix
simultaneously with a DMA2 transfer from Flash to any APB communication interface. No
conflict or arbitration occurs.
Similar parallelism is available when the product features usable peripherals on
separate buses. If the UART interface is available on two different APB bridges, the chances
of two parallel transfers competing for system resources diminish.
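The parallelism condition above amounts to checking that two transfers share no bus-matrix slave. The following sketch expresses that check on a host; the enumeration `bus_slave_t` and its members are hypothetical names chosen for illustration, and the actual list of slaves has to be taken from the reference manual of the product.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative list of bus-matrix slaves; names are hypothetical. */
typedef enum {
    SLAVE_FLASH,
    SLAVE_SRAM1,
    SLAVE_AHB_PERIPH, /* for example AES */
    SLAVE_APB1,
    SLAVE_APB2,
    SLAVE_COUNT
} bus_slave_t;

/* Source and destination slave of one DMA transfer. */
typedef struct {
    bus_slave_t src;
    bus_slave_t dst;
} dma_path_t;

/* Returns true when the two transfers touch a common slave and
 * therefore cannot run in parallel without arbitration. */
static bool dma_paths_conflict(dma_path_t a, dma_path_t b)
{
    assert((int)a.src < (int)SLAVE_COUNT && (int)a.dst < (int)SLAVE_COUNT);
    assert((int)b.src < (int)SLAVE_COUNT && (int)b.dst < (int)SLAVE_COUNT);

    return a.src == b.src || a.src == b.dst ||
           a.dst == b.src || a.dst == b.dst;
}
```

In this model, the example from the text (SRAM1 to AES on one controller, Flash to an APB interface on the other) reports no conflict, whereas two transfers that both read SRAM1 do.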
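Figure 4. Delay to serve the request
[Figure: timing diagram of the DMA request and the channel activity, with the delay to serve the
request marked]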
The longest possible delay depends on the number of masters accessing the resource as
well as the number of channels configured on that particular DMA master. For example, in
the simple case of minimum latency for each bus access, the bus is occupied for 3 clock
cycles per access. If the CPU core and 2 DMA controllers all attempt to access the same
AHB resource, the arbitration mechanism grants a DMA controller access to the AHB
after a maximum of 6 clock cycles. If more channels are configured on the DMA controller,
only one can be serviced in a time slot. That yields a latency of 18 clock cycles for
channel 1 of DMA1, as illustrated in Figure 4: Delay to serve the request.
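One possible reading of these figures is sketched below. It assumes strict round-robin between the masters, a fixed number of cycles per bus access, and two pending channels on the DMA controller; the channel count is an assumption, since the text does not state it, and the function names are hypothetical. Under these assumptions the model reproduces the 6-cycle and 18-cycle values quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Longest wait before a master gets the bus, assuming every other
 * master is served once first (round-robin, fixed access length). */
static uint32_t bus_wait_cycles(uint32_t masters, uint32_t cycles_per_access)
{
    assert(masters >= 1);
    return (masters - 1) * cycles_per_access;
}

/* Latency until the last of 'channels' pending channels of one DMA
 * controller has completed its access, assuming the controller gets
 * exactly one slot per round and serves one channel per slot. */
static uint32_t channel_worst_latency(uint32_t masters, uint32_t channels,
                                      uint32_t cycles_per_access)
{
    assert(masters >= 1 && channels >= 1);
    return channels * masters * cycles_per_access;
}
```

With 3 masters and 3 cycles per access, `bus_wait_cycles(3, 3)` gives 6 and `channel_worst_latency(3, 2, 3)` gives 18.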
Usage of the buses must be carefully considered when designing an application. When the
danger of overloading the bus capability is underestimated, several problems may arise:
1. Overrun: if incoming data is not read from the peripheral register before the next data
arrives, the peripheral may raise an overrun flag and data may be lost. This is a typical
problem with serial interfaces such as UART or SPI. Refer to the peripheral
documentation for more details.
2. Data lost without an overrun flag: possible, for example, when a GPIO port is configured
as a parallel communication port.
3. Pauses in transmission: if Tx data is delayed, the communication interface may be
stalled for a short time. This is a typical problem of high-speed communication interfaces
such as SPI.
4. ADC/DAC sampling timing problems: less likely due to typically lower
sampling speeds, but not impossible.
It is advised to carefully examine which of the required peripherals are hooked up on which
bus, and choose bus frequencies to match the projected bandwidth plus some safety
margin.
Caution: A reasonable and recommended safety margin here is 50% on top of the projected
bandwidth, which leaves one third of the total bus capacity in reserve.
The needed bus bandwidth is to be computed based on the DMA transfer data rate and a
fixed 32-bit bus width, independently of the programmed DMA data width of the transfer
from the source and to the destination. For example, a 2 Mbaud 8-bit USART reception
requires a bandwidth of 250 ktransfers per second, because while the internal bus is 32 bits
wide, only 8 bits are used at a time.
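The arithmetic behind this example and the 50% margin can be written out as follows. This is a sketch only: the function names are hypothetical, and framing bits such as start and stop bits are ignored, as in the example above.

```c
#include <assert.h>
#include <stdint.h>

/* Bus transfers per second needed by a serial stream. Each data item
 * occupies one bus transfer regardless of the 32-bit bus width.
 * Framing bits are ignored, as in the text's example. */
static uint32_t transfers_per_second(uint32_t bit_rate, uint32_t bits_per_item)
{
    assert(bits_per_item > 0);
    return bit_rate / bits_per_item;
}

/* Capacity to provision when a 50% safety margin is added on top of
 * the projected load; the projected load then uses two thirds of it. */
static uint64_t provisioned_transfers(uint32_t projected)
{
    return (uint64_t)projected + projected / 2u;
}
```

For the 2 Mbaud, 8-bit case this gives 250000 transfers per second, and 375000 transfers per second once the margin is applied.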
Note that the priority scheme of the DMA arbiter always looks for a pending request from
another channel at data transfer completion. It then switches to the channel with the
pending request of the highest priority. There is no round robin here, and transfers with
lower priority may never be served if the DMA is kept busy.
It is still better to assign priorities to minimize latency within the round for selected channels.
Caution: Reserve the highest priority for high-speed input and slave mode peripherals,
gradually decreasing the priority level down to the lowest priority for low-speed
output communication or peripherals in master mode.
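The starvation effect of a fixed-priority arbiter can be shown with a small host-side model. This is a simplified sketch, not the hardware implementation: the type `dma_ch_model_t`, the convention that a larger number means a higher priority, and the tie-break in favor of the lower index are all choices made for this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one channel as seen by the arbiter. */
typedef struct {
    uint8_t  priority; /* larger value = higher priority (model convention) */
    bool     pending;  /* a request is waiting */
    uint32_t served;   /* number of data transfers granted so far */
} dma_ch_model_t;

/* Returns the index of the highest-priority pending channel, or -1 if
 * none is pending. On equal priority the lower index wins in this model. */
static int arbiter_pick(const dma_ch_model_t *ch, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (ch[i].pending &&
            (best < 0 || ch[i].priority > ch[best].priority)) {
            best = (int)i;
        }
    }
    return best;
}

/* Grants 'slots' data transfers while channel 'busy' raises a new
 * request before every arbitration, i.e. the DMA is kept busy. */
static void arbiter_simulate(dma_ch_model_t *ch, size_t n,
                             size_t busy, unsigned slots)
{
    assert(ch != NULL && busy < n);
    for (unsigned s = 0; s < slots; s++) {
        ch[busy].pending = true;
        int k = arbiter_pick(ch, n);
        if (k < 0) {
            break;
        }
        ch[k].pending = false;
        ch[k].served++;
    }
}
```

If the busy channel has the higher priority, the other channel keeps its request pending and is never granted a transfer, which is the situation the caution above is meant to avoid.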
Table 1 summarizes differences related to the DMA implementation and usage in the
products addressed by this document.
Common notable features:
• Circular buffer support
• Programmable transfer size of up to 65535 data items
• Configurable event and interrupt flags for full transfer, half transfer and error
• Internal data size conversion, allowing 8-bit communication interface data to be
transferred to/from 32-bit memory in packed format
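The interplay of the transfer counter, the circular mode and the half/full transfer events listed above can be sketched with a software counter. The model below is an illustration under stated assumptions: the type and function names are hypothetical, the length is restricted to even values so that "half" is unambiguous, and the events are only counted rather than raised as interrupts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a channel's transfer counter. */
typedef struct {
    uint16_t length;      /* programmed number of data items */
    uint16_t remaining;   /* items left in the current pass */
    bool     circular;    /* reload the counter after a full pass */
    uint32_t half_events; /* half-transfer events seen */
    uint32_t full_events; /* full-transfer events seen */
} dma_counter_model_t;

static void dma_counter_init(dma_counter_model_t *c, uint16_t length,
                             bool circular)
{
    /* Model restriction: even, non-zero length. */
    assert(c != NULL && length >= 2 && (length % 2) == 0);
    c->length = length;
    c->remaining = length;
    c->circular = circular;
    c->half_events = 0;
    c->full_events = 0;
}

/* Accounts for one transferred data item.
 * Returns true while further items are expected. */
static bool dma_counter_step(dma_counter_model_t *c)
{
    assert(c != NULL);
    if (c->remaining == 0) {
        return false; /* non-circular transfer already finished */
    }
    c->remaining--;
    if (c->remaining == c->length / 2) {
        c->half_events++; /* first half of the buffer is done */
    }
    if (c->remaining == 0) {
        c->full_events++; /* whole buffer is done */
        if (c->circular) {
            c->remaining = c->length; /* start over */
        }
    }
    return c->remaining > 0;
}
```

In circular mode the application can process one half of the buffer on each event while the other half is being filled.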
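DMA1                 | 7 | 5 | 7 | 5 | 7 | 7 | 7 | 7 | 7
DMA2                 | - | - | 5 | - | - | 5 | 5 | 7 | 7
DMAMUX               | - | - | - | - | - | - | - | - | 4 channels, 26 inputs
Architecture         | Von Neumann | Von Neumann | Von Neumann | Von Neumann | Von Neumann | Harvard | Harvard | Harvard | Harvard
Flash interface      | 32 bits | 32 bits | 32 bits | 32 bits | 32 bits | 32 / 64 bits | 32 / 64 bits | 32 + 32 bits | 32 + 32 bits
CPU                  | M0 | M0 | M0 | M0+ | M0+ | M3 | M3 | M4 | M4
AHB:APB bridge clock | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 2 | 2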
7 Conclusion
When used correctly, the DMA controller increases product efficiency by letting the
application handle data transfer tasks that would otherwise require a more powerful CPU.
This document describes common DMA features, performance figures and guidelines
related to the usage of such a DMA controller.
Beyond this document, the user shall read the reference manual of the specific product in
order to understand its DMA-specific aspects, especially the system architecture, the
memory mapping, and the list and mapping of the DMA requests from the peripherals to the
DMA channels. Based on this information, the user should distribute such tasks
over the product resources: DMA channels, AHB/APB buses,
peripherals/memories, and possibly (two) DMA controllers.
Revision history