Integration Workshop
Important Notice
Texas Instruments and its subsidiaries (TI) reserve the right to make changes to their products or to
discontinue any product or service without notice, and advise customers to obtain the latest version of
relevant information to verify, before placing orders, that information being relied on is current and
complete. All products are sold subject to the terms and conditions of sale supplied at the time of order
acknowledgment, including those pertaining to warranty, patent infringement, and limitation of liability.
TI warrants performance of its semiconductor products to the specifications applicable at the time of sale in
accordance with TI’s standard warranty. Testing and other quality control techniques are utilized to the
extent TI deems necessary to support this warranty. Specific testing of all parameters of each device is not
necessarily performed, except those mandated by government requirements.
In order to minimize risks associated with the customer’s applications, adequate design and operating
safeguards must be provided by the customer to minimize inherent or procedural hazards.
TI assumes no liability for applications assistance or customer product design. TI does not
warrant or represent that any license, either express or implied, is granted under any patent right,
copyright, mask work right, or other intellectual property right of TI covering or relating to any
combination, machine, or process in which such semiconductor products or services might be or
are used. TI’s publication of information regarding any third party’s products or services does not
constitute TI’s approval, warranty or endorsement thereof.
Revision History
November 2001 – Revision 0.1 (ALPHA)
March 2002 – Revision 0.8 (BETA)
April 2002 – Revision 1.0
May 2002 – Revision 1.1
June 2002 – Revision 1.2
October 2003 – Revision 2.0
April 2005 – Revision 2.1 (added Analog Interfacing – Mod 6.5)
August 2005 – Revision 3.1a (update to CCS 3.1, SIO/IOM, errata fixes)
Mailing Address
Texas Instruments
Training Technical Organization
7839 Churchill Way, M/S 3984
Dallas, Texas 75251-1903
Workshop Introduction
What Will You Accomplish This Week?
When you leave the workshop at the end of the week, you should be able to perform certain tasks
and make critical assessments and decisions about the C6000s’ capabilities. We developed this
list based on customer feedback over the past 5 years and our own workshop design experience
spanning the past 25 years. All of the module exercises and labs support these accomplishments
(as you’ll see when we discuss the workshop’s agenda).
The first two accomplishments are really the overall objectives of the entire workshop. Many
students attend the workshop to meet these two needs. The rest of the list supports these two
objectives and provides more insight into the expected outcomes. We hope this list meets or
exceeds most of your expectations. If you think about it, we’re going through the equivalent of a
college semester course in 4 days! We obviously can’t discuss everything given the time
limitations, but we have provided the fastest path toward understanding, using and becoming
confident in these activities.
T TO
Technical Training
Organization
So, if your need falls “inside the box”, be prepared to ask questions when the topic comes up. If
your need falls “outside the box”…
Regarding DSP Theory, we will not cover topics such as IIR/FIR filters, convolution, FFTs, and
the rest of the topics addressed by the numerous DSP theory books and college courses. We
assume that you know this theory if you need to apply it. Our job is to show you how to use the
device (i.e. the CPU and peripherals) to accomplish these tasks, instead of spending time
showing the theory. We do not have time to dive into any one specific algorithm, such as PID,
servo, VSELP, GSM, Viterbi, etc. If we did, it probably would not be the one you wanted. We do
provide details about the on-chip hardware peripherals, which you can apply to the various
hardware/software applications required by your system; we just don’t intend to show the
details of any specific application.
Inside the box:
• Control response to real-time events using interrupts
• Configure peripherals to communicate with various devices
• … software applications
• Use DSP/BIOS APIs to perform various tasks in the system as well as analyze results
• Integrate an XDAIS application into your system
• Use the bootloader and flash programming tools to create a standalone system
• Understand other C6000 capabilities: EMIF, cache, HPI
Outside the box:
• Detailed ASM programming and Code Optimization
• Architectural details
We’ve had to make some decisions about the material in the workshop based on time and what
makes sense for all users. Many app notes have been written (and are available on the TI web site
at http://www.dspvillage.com) which cover, in detail, many of the topics we cannot here. So, if
your need falls “outside the box” (i.e. in addition to the accomplishment list discussed
previously), then you have two options: (1) ask the instructor if a manual or app note is available
which addresses your specific issue; or, (2) let the instructor know before or after class time – we
might be able to shed some light or direct you to other resources. If you communicate your need
then we will do our best to fulfill it.
Workshop Outline
On the first day of the workshop, you will be developing an audio application that requires you to
set up the C6000 DMA and McBSP to send and receive audio from the PC. So, you get to hear
“something” in the speakers by the end of the day. On Day 2, you will increase the complexity of
the system by modifying your application to use a double-buffer instead of a single buffer. You
will also be adding other threads to the system beyond the audio path and integrating a fully
compliant XDAIS algorithm. On Days 3 and 4, we will cover many other system issues including
EMIF, boot, cache, HPI. By the end of the workshop, you will be able to burn your application
into the DSK’s flash memory and boot from power-on reset disconnected from CCS. Wow!
Day 1
1. Introduction
2. Code Composer Studio
3. Basic Memory Management
4. Using the EDMA (Intro to CSL)
Day 3
9. DSP/BIOS Scheduling
10. Advanced Memory Mgmt.
11. Integrating an XDAIS Compliant Algorithm
Note: The outline describes which day each module should fall within. Please understand,
though, that each class moves at its own pace; therefore, you may find the daily breakout
in your workshop differs from that described above.
Introductions
Learning more about you, your application, and your experience will help your instructor tailor
the materials to the class needs. This is important since there is more information than can be
taught during a single week.
Introduce Yourself
Briefly, a little about your application:
Name & Company
Application
Which C6000 DSP do you plan to use?
And, a little about your experience:
Do you have experience with:
TI DSPs (TMS320)
Another DSP
Other microprocessors
C, Assembly, or both
Have you used an OS or RTOS?
Performance
Interfacing
Power
Size
These needs challenge the designer with a series of tradeoffs. For example, while performance is
important in a portable MP3 player, power efficiency and board space are even more important.
On the other hand, a cellular base station might require higher performance to
maximize the number of channels handled by each processor.
Wouldn’t it be nice if the fastest DSP consumed the lowest amount of power? While TI is
working on providing this (and making it software compatible), it provides you with a broad
assortment of DSP families to cover a varying set of system needs. Think of them as different
shoes for different chores …
TI DSP Families
TI provides a variety of DSP families to handle the tradeoffs in system requirements.
The TMS320C2000 (‘C2000) family of devices is well suited to lower cost, microcontroller-
oriented solutions. They are well suited to users who need a bit more performance than today’s
microcontrollers are able to provide, but still need the control-oriented peripherals and low cost.
The ‘C5000 family is the model of processor efficiency. While they boast incredible performance
numbers, they provide this with equally impressive low power dissipation. No wonder they are the
favorites in most wireless phones, internet audio, and digital cameras (just to name a few).
Rounding out the offerings, the ‘C6000 family provides the absolute maximum performance
offered in DSP. Couple this with its phenomenal C compiler and you have one fast, easy-to-
program DSP. When performance or time-to-market counts, this is the family to choose. It also
happens to be the family this course was designed around; thus, the rest of the workshop will
concentrate only on it.
‘C6000 Roadmap
The ‘C6000 family has grown considerably over the past few years. With the addition of the 2nd
generation of devices (‘C64x), performance has increased yet again.
Figure: C6000 Roadmap (object code software compatibility). 1st generation: C62x fixed point (C6201, C6202, C6203, C6204, C6205, C6211) and C67x floating point (C6701, C6711, C6712, C6713). 2nd generation: C64x DSP (C6411, C6412, C6414, C6415, C6416, DM642). Top of roadmap: Floating Point, Multi-core C64x DSP, 1.1 GHz.
Yet, the ease of design within the ‘C6000 architecture has not been abandoned with its growing
family of devices. Software compatibility is addressed by the architecture, rather than by the
hard work of the programmer. With both the ‘C67x and ‘C64x devices being able to run ‘C62x
object code, upgrading DSPs is much easier.
Figure: C6000 roadmap over time, by performance. Floating point: C31, C32, VC33, then C6701, C6711, C6712, C6712D, C6713. 1st-generation fixed point: C6201, C6202, C6203, C6204, C6205, C6211. High-performance C64x: C6414, C6415, C6416 (up to 720 MHz), then C6414T, C6415T, C6416T (90 nm; 720 MHz, 850 MHz, and 1 GHz; production 2Q 2005). Value C64x: C6410, C6411, C6412, C6413, C6418. Next: C645x and C64x+ (breakthrough performance).
Additional Information
For More Information and Support
For support we suggest you try TI’s web site first. Then call your local support – either your local
TI representative or Authorized Distributor Sales/FAE. Finally, here are a few other places to go
for support and information:
In Europe …
European Product Information Center (EPIC)
Web: http://www-k.ext.ti.com/sc/technical_support/pic/euro.htm
Email: [email protected]
Key TI Manuals
Key C6000 Manuals
Hardware
SPRU189 - CPU and Instruction Set Ref. Guide
SPRU190 - Peripherals Ref. Guide
SPRZ122 - SPRU190 Manual Update Sheet (important!)
SPRU401 - Peripherals Chip Support Lib. Ref.
SPRU609 - C67x Two-Level Internal Memory Reference
SPRU610 - C64x Two-Level Internal Memory Reference
SPRU656 - Cache Memory Users Guide
Software
SPRU198 - Programmer’s Guide
SPRU423 - C6000 DSP/BIOS User’s Guide
SPRU403 - C6000 DSP/BIOS API Guide
Code Generation Tools
SPRU186 - Assembly Language Tools User’s Guide
SPRU187 - Optimizing C Compiler User’s Guide
Refer to the C6000 Product Update handout for full list
TI DSP Workshops
DSP Workshops Available from TI
Attend another workshop:
4-day C2000 Workshops
4-day C5000 Integration Workshops
4-day C6000 Integration Workshop
4-day C6000 Optimization Workshop
4-day DSP/BIOS Workshop
4-day OMAP Software Workshop
1-day Workshops (C2000, C5000, C6000)
1-day Reference Frameworks and XDAIS
Sign up at:
http://www.ti.com/sc/training
C6000 Hardware
CPU Architecture & Pipeline Details
Using Peripherals (EDMA, McBSP, EMIF, HPI, XBUS)
Tools
Compiler Optimizer, Assembly Optimizer, Profiler, PBC
CSL, Hex6x, Absolute Lister, Flashburn, BSL
Administrative Details
Administrative Topics
What you have in front of you
Name Cards
Sign-in Sheet
Refreshments
Facilities
Phones
Lunch
Cell Phones – please silence them
Introduction
This chapter introduces the TMS320C6000 (C6000) DSP architecture and peripherals as well as
the C6416 and C6713 DSP Starter Kits (DSKs).
The chapter ends with a simple lab to set up the DSK and Code Composer Studio (CCS). We
like to start small and easy and then build to much more complicated topics and exercises later.
Learning Objectives
Introduction to the:
• C6000 CPU Architecture
• C6000 Peripherals
• C6000 DSKs
Chapter Topics
C6000 Introduction
In its simplest form, most DSP systems receive data from an ADC (analog to digital converter).
The data is processed by the Digital Signal Processor (also called DSP) and the results are then
transformed back to analog to be output. Digitizing the analog signal (by evaluating it to a
number on a periodic basis) and the subsequent numerical (a.k.a. digital) analysis provides a more
reliable and efficient means of manipulating the signal vs. performing the manipulation in the
analog domain. With the growing interest in multimedia, the demand for DSPs to process the
various media signals is growing exponentially.
Figure: ADC → x → DSP → Y → DAC
$$Y = \sum_{i=1}^{N} coeff_i \cdot x_i$$
where N is the number of coefficients (terms in the sum).
While interest in DSP is constantly growing today, the DSProcessor grew out of TI over 20 years
ago in its educational products group, namely the Speak and Spell. These products demanded
speech synthesis and other traditional DSProcessing (like filters) but with quick time-to-market
constraints.
The heart of DSP algorithms hasn’t changed from the early days of TI DSP; they still rely on the
fundamental difference equation (shown above). Often this equation is referred to as a MAC
(multiply-accumulate) or SOP (sum-of-products). TI has concentrated for years on providing
solutions to MAC based algorithms. The wide variety of TI DSPs is a testament to this focus,
even with the widely varying system tradeoffs discussed earlier.
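The sum-of-products equation maps directly onto a C loop. The function below is a minimal sketch, not code from the workshop labs. The name `sop` and the choice of 16-bit inputs with a 32-bit accumulator are assumptions made for illustration. Each loop iteration performs one MAC. The index runs from 0 to n-1 because C arrays are zero-based, while the equation counts from 1.

```c
#include <stdint.h>

/* Sum-of-products (dot product): returns the sum of coeff[i] * x[i]
   for i = 0..n-1. Each iteration is one multiply-accumulate (MAC).
   16-bit inputs with a 32-bit accumulator is an assumed, common
   fixed-point choice, not a requirement of the equation. */
int32_t sop(const int16_t *coeff, const int16_t *x, int n)
{
    int32_t y = 0;
    for (int i = 0; i < n; i++) {
        y += (int32_t)coeff[i] * x[i];   /* multiply, then accumulate */
    }
    return y;
}
```

For example, coefficients {1, 2, 3} applied to samples {4, 5, 6} give 1*4 + 2*5 + 3*6 = 32.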
For the ‘C6000 to achieve its goal, TI wanted to provide record-setting performance while coding
in the universal ANSI C language.
Ease of C Programming
Even using natural C, the ‘C6000 Architecture can
perform 2 to 4 MACs per cycle
Compiler generates 80-100% efficient code
How does the ‘C6000 achieve such performance from C?
TI ‘C6000 devices deliver 200 to 4000 MMACs of performance, where MMAC is mega-MAC or
millions of MACs. It's stellar performance, in any case. When this can be achieved using C code,
it's even better. While providing efficiency ratings for a compiler is difficult, TI has benchmarked
a large number of common DSP kernels to provide an example of the compiler’s efficiency -
please visit the TI website for more information and benchmarking examples.
C6000 Architecture
CPU Architecture Overview
How does the ‘C6000 deliver its performance? The CPU is built to dispatch 8 instructions per
cycle, and cycle times are as short as about 1 ns.
The following example demonstrates the capability of the ‘C6000 architecture. Specifically, the
‘C67x floating-point DSP can execute these eight instructions in parallel, allowing two single-
precision floating point MACs to be performed in just one processor cycle. Oh, and all that from
ordinary C code.
The C64x devices provide tremendous Multiply-Accumulate performance. Not only do they
run at frequencies 2-3 times faster than other C6000 processors, but each of the multiply
units can also perform two 16x16 multiplies plus a 32-bit add in one cycle. This is accomplished
by the DOTP2 assembly instruction.
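Figure: DOTP2 multiplies the packed 16-bit pairs m1:m0 (A5) and n1:n0 (B5), writes m1*n1 + m0*n0 to A6, and A6 is added to the running sum in A7.
;** --------------------------------------------------*
; PIPED LOOP KERNEL
LOOP: ADD .L2 B8,B6,B6          ; accumulate B-side DOTP2 result
|| ADD .L1 A6,A7,A7             ; add m1*n1 + m0*n0 to running sum A7
|| DOTP2 .M2X B4,A4,B8          ; B8 = sum of two 16x16 products
|| DOTP2 .M1X B5,A5,A6          ; A6 = m1*n1 + m0*n0
|| [ B0] B .S1 LOOP             ; branch while loop counter B0 != 0
|| [ B0] SUB .S2 B0,1,B0        ; decrement loop counter
|| LDDW .D2T2 *B7++,B5:B4       ; load 64 bits (four 16-bit values)
|| LDDW .D1T1 *A3++,A5:A4       ; load 64 bits (four 16-bit values)
;** --------------------------------------------------*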
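To make the DOTP2 arithmetic concrete, the sketch below models in portable C what one DOTP2 computes. Each 32-bit operand is treated as two signed 16-bit halves (m1:m0 and n1:n0), and the result is m1*n1 + m0*n0. The names `dotp2_model` and `s16` are hypothetical helpers used only for illustration. On C64x targets, the TI C6000 compiler also offers a `_dotp2()` intrinsic (see SPRU187). This model only illustrates the math; it is not that intrinsic. It returns a 64-bit value because, in C, the sum of two (-32768 x -32768) products would overflow a 32-bit signed int, whereas the hardware writes a 32-bit register.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of v (two's complement) without relying
   on implementation-defined narrowing conversions. */
static int32_t s16(uint32_t v)
{
    v &= 0xFFFFu;
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* Hypothetical C model of DOTP2: a = m1:m0, b = n1:n0 (high:low halves).
   Returns m1*n1 + m0*n0, computed in 64 bits. */
int64_t dotp2_model(uint32_t a, uint32_t b)
{
    int64_t hi = (int64_t)s16(a >> 16) * s16(b >> 16);   /* m1 * n1 */
    int64_t lo = (int64_t)s16(a) * s16(b);               /* m0 * n0 */
    return hi + lo;
}
```

For example, packing m1:m0 = 2:3 and n1:n0 = 4:5 gives 2*4 + 3*5 = 23.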
MMACs
How many 16-bit MMACs (millions of MACs per second)
can the 'C6201 perform?
400 MMACs (two .M units x 200 MHz)
How many can a 1 GHz 'C64x perform?
  2 .M units
x 2 16-bit MACs (per .M unit / per cycle)
x 1 GHz
----------------
4000 MMACs
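The same back-of-the-envelope arithmetic can be written as two one-line helpers. This is only a sketch of peak (theoretical) rates, not sustained throughput, and the function names are illustrative. It also suggests that the 5760 MIPS quoted for the C64x CPU core appears consistent with 8 instructions per cycle at 720 MHz, the top C6416 rate listed in the roadmap.

```c
/* Peak (theoretical) rates, ignoring memory stalls and loop overhead.
   MMACs = .M units x MACs per .M unit per cycle x clock (MHz)
   MIPS  = instructions issued per cycle x clock (MHz)                 */
long peak_mmacs(long m_units, long macs_per_unit, long clock_mhz)
{
    return m_units * macs_per_unit * clock_mhz;
}

long peak_mips(long issue_width, long clock_mhz)
{
    return issue_width * clock_mhz;
}
```

With these helpers, the 'C6201 case is peak_mmacs(2, 1, 200) = 400 and the 1 GHz 'C64x case is peak_mmacs(2, 2, 1000) = 4000.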
Figure: 'C64x (C6415) on-chip bandwidth. CPU core: 5760 MIPS; L1D cache: 16 GB/s; L2 memory: 32 GB/s (2.9 GB/s also shown); EMIF16: 266 MB/s; McBSP 1 and McBSP 2: 12.5 MB/s each; Utopia 2: 100 MB/s; HPI32: 133 MB/s.
Note: While we have been looking at the C6415, you can extrapolate these same concepts to
other C6000 device types. All device types have multiple, fast, internal buses. Most have
a dual-level memory architecture, while a few have a single-level, flat memory.
Figure: the DSP in a system. PCI (32-bit) or HPI (16 or 32-bit) connects to a host μP; McBSP to a serial codec; EMAC to Ethernet (TCP/IP stack available); EMIF (16, 32, or 64-bit) to sync memory, SDRAM, SRAM, and EPROM; video ports on DM64x devices; plus EDMA and the boot loader.
Note: Not all ‘C6000 devices have all the various peripherals shown above.
Please refer to the C6000 Product Update for a device-by-device listing.
Let’s quickly look at each of these connections beginning with VCP/TCP and working counter-
clockwise around the diagram.
Timer / Counters
• Two (or three) 32-bit timer/counters
• Use as a Counter (counting pulses from input pin)
or as a Timer (counting internal clock pulses)
• Can generate:
− Interrupts to CPU
− Events to DMA/EDMA
− Pulse or toggle-value on output pin
• Each timer/counter has both an input and an output pin
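Choosing a timer period usually comes down to one division: how many input-clock ticks fit into one interrupt interval. The helper below is a minimal sketch that rounds that count to the nearest tick. `timer_clk_hz` is an assumed parameter for whatever clock drives the counter (the internal clock or the input pin). How this count maps to the value written to a device's period register should be checked in the Peripherals Reference Guide (SPRU190) for your device. The function name is hypothetical.

```c
#include <stdint.h>

/* Ticks of the timer's input clock per event, rounded to nearest.
   timer_clk_hz: clock feeding the counter (device/mode dependent; assumed)
   event_hz:     desired interrupt or event rate; must be nonzero.
   64-bit intermediate math avoids overflow for large clock values.       */
uint32_t timer_ticks_for_rate(uint32_t timer_clk_hz, uint32_t event_hz)
{
    uint64_t ticks = ((uint64_t)timer_clk_hz + event_hz / 2u) / event_hz;
    return (uint32_t)ticks;
}
```

For example, with a hypothetical 150 MHz timer input clock, a 1 kHz interrupt corresponds to 150000 ticks.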
DMA: Offers four fully configurable channels (additional channel for the HPI), Event
synchronization, Split mode for use with McBSP, and Address/count reload
EDMA: Enhanced DMA (EDMA) offers 16 fully configurable channels (64 channels on
‘C64x devices), Event synchronization, Channel linking, and Channel auto-
initialization.
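Channel linking lets an EDMA channel reload its own parameters after each transfer. This is one way to build a double-buffered (ping-pong) data path without reprogramming the channel from the CPU. The sketch below is a deliberately simplified software model, not the Chip Support Library API. The `ParamSet` structure, its fields, and `edma_model_complete()` are hypothetical names that show only the idea: when a transfer completes, the channel's parameters are reloaded from the set it is linked to.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical, simplified parameter set. Real EDMA parameter sets have
   more fields (options, counts, indexes, reload/link). */
typedef struct {
    uint32_t src;   /* source address                           */
    uint32_t dst;   /* destination address                      */
    uint32_t count; /* elements per transfer                    */
    size_t   link;  /* index of the set to reload on completion */
} ParamSet;

/* Model of linking: on completion, the active channel parameters are
   overwritten with the linked parameter set, ready for the next transfer. */
void edma_model_complete(ParamSet *active, const ParamSet *table)
{
    *active = table[active->link];
}
```

Linking set 0 (ping buffer) to set 1 (pong buffer), and set 1 back to set 0, makes the modeled channel alternate destinations after every completion.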
Boot Loader
• After reset but before the CPU begins running code, the “Boot Loader” can be configured
to either:
− Automatically copy code and data into on-chip memory
− Allow a host system (via HPI, XBUS, or PCI) to read/write code and data into the
C6000’s internal and external memory
− Do nothing and let the CPU immediately begin execution from address zero
• Boot mode pins allow configuration
• Please refer to the C6000 Peripherals Guide and each device’s data sheet for the modes
allowed for each specific device.
Ethernet
• 10/100 Ethernet interface
• To conserve cost, size and power – Ethernet pins are muxed with PCI
(you can use one or the other)
• Optimized TCP/IP stack available from TI (under license)
McASP
• All McBSP features plus more …
• Targeted for multi-channel audio applications such as surround sound systems
− Up to 8 stereo lines (16 channels) -
supported by 16 serial data pins configurable as transmit or receive
− Throughput: 192 kHz (all pins carrying stereo data simultaneously)
• Transmit formats:
− Multi-pin IIS for audio interface
− Multi-pin DIT for digital interfaces
• Receive format:
− Multi-pin IIS for audio interface
• Available on C6713 and DM642 devices.
Utopia
• For connection to ATM (async transfer mode)
• Utopia 2 slave interface
• 50 MHz wide area network connectivity
• Byte wide interface
• Available on ‘C64x devices
PLL
• On-chip PLL provides clock multiplication. The ‘C6000 family can run at one or more
times the provided input clock. Using a slower input clock reduces cost and electromagnetic interference (EMI).
• Clock modes are pin configurable.
• On most devices, along with the Clock Mode (configuration) pins, there are three other
clock pins:
− CLKIN: clock input pin
− CLKOUT: clock output from the PLL (multiplied rate)
− CLKOUT2: a reduced rate clockout. Usually ½ or less of CLKOUT
Please check the datasheet for the pins, pin names, and CLKOUT2 rates available for
your device.
• Here are the PLL rates for a sample of C6000 device types:
Device                       Clock Mode Pins                PLL Rate
C6201, C6204, C6205, C6701   CLKMODE                        x1, x4
C6202, C6203                 CLKMODE0, CLKMODE1, CLKMODE2   x1, x4, x6, x7, x8, x9, x10, x11
C6211, C6711, C6712          CLKMODE                        x1, x4
C6414, C6415, C6416          CLKMODE0, CLKMODE1             x1, x6, x12
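The CPU clock is simply CLKIN times the selected PLL multiplier. As a minimal sketch for the C6414/C6415/C6416 row of the table, the helper below accepts only the x1, x6, and x12 rates listed there. The function name is illustrative, and the 50 MHz input clock in the usage note is an assumed value, not a DSK specification.

```c
#include <stddef.h>

/* PLL multipliers listed in the table above for the C6414/C6415/C6416. */
static const long c641x_pll_mult[] = { 1, 6, 12 };

/* CPU clock (MHz) = CLKIN (MHz) x PLL multiplier.
   Returns -1 if mult is not one of the listed C641x rates. */
long c641x_cpu_mhz(long clkin_mhz, long mult)
{
    size_t n = sizeof c641x_pll_mult / sizeof c641x_pll_mult[0];
    for (size_t i = 0; i < n; i++) {
        if (c641x_pll_mult[i] == mult)
            return clkin_mhz * mult;
    }
    return -1;
}
```

For example, c641x_cpu_mhz(50, 12) returns 600. c641x_cpu_mhz(50, 4) returns -1 because x4 is not a C641x rate.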
Power Down
• While not shown in the previous diagram, the ‘C6000 supports power down modes to
significantly reduce overall system power.
For more detailed information on these peripherals, refer to the ‘C6000 Peripherals Guide.
C6000 DSK’s
Overview
Here’s a detailed look at the DSK board and its primary features:
C6416T DSK
Daughter-Card I/F
The daughter card sockets included on the DSK are similar to those found on other
C5000/C6000 DSKs and EVMs available from Texas Instruments. Thus, any work (by you or
any 3rd Party) applied to daughter card development can be reused with the DSK. If you’re
interested in designing a daughter card for the DSK/EVM, check the TI website for an application
note which describes it in detail.
Block Diagram
Here’s a block diagram view of the C6416 DSK.
C6416 DSK
The C6713 would be almost exactly the same. (We pulled this diagram from the C6416 help file.
Look in the C6713 help file <CCS Help menu> to find a similar diagram for that platform.)
Figure: DSK Diagnostic Utility. Test/diagnose DSK hardware, verify the USB emulation link, use advanced tests to facilitate debugging, and reset the DSK hardware.
Memory Map
The following memory-map describes the memory resources designed into the ‘C6416 DSK.
Address     ‘C6416 DSP           DSK usage
8000_0000   EMIFA CE0: 256MB     SDRAM: 16MB
9000_0000   EMIFA CE1: 256MB
A000_0000   EMIFA CE2: 256MB     Daughter Card
B000_0000   EMIFA CE3: 256MB     Daughter Card
The left map describes the resources available on the ‘C6416 DSP; the right map details how the
external memory resources were used on the DSK.
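Each EMIFA chip-enable space in the 'C6416 DSK memory map is 256 MB (2^28 bytes), and the spaces are contiguous from 0x8000_0000. Finding which CE space an address falls in therefore takes one subtraction and one shift. The helpers below are a minimal sketch based only on that map. The function names are illustrative. The assumption that the 16 MB of SDRAM begins at the start of CE0 is read from the map.

```c
#include <stdint.h>

/* Returns the EMIFA chip-enable space (0..3) containing addr, using the
   'C6416 DSK map: CE0 at 0x8000_0000, CE1 at 0x9000_0000,
   CE2 at 0xA000_0000, CE3 at 0xB000_0000, each 256 MB (0x1000_0000 bytes).
   Returns -1 for addresses outside EMIFA. */
int emifa_ce_space(uint32_t addr)
{
    if (addr < 0x80000000u || addr > 0xBFFFFFFFu)
        return -1;
    return (int)((addr - 0x80000000u) >> 28);   /* 256 MB = 2^28 bytes */
}

/* Nonzero if addr lies in the 16 MB (0x0100_0000 bytes) of SDRAM
   assumed to start at the beginning of CE0. */
int in_dsk_sdram(uint32_t addr)
{
    return addr >= 0x80000000u && addr < 0x81000000u;
}
```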
Software
Code Composer Studio
SD Diagnostic Utility
Example Programs
Lab 1
Hardware
1. Hook up the DSK
2. Supply power and observe POST
Software
1. Run Diagnostic Utility
2. Run CCS Setup
3. Start CCS
4. Configure CCS Options
5. Close CCS
Time: 20 minutes
Computer Login
1. If the computer is not already logged-on, check to see if the log-on information is posted on
the workstation. If not, please ask your instructor.
Note: After plugging in the USB cable, if a message appears indicating that the USB driver
needs to be installed, put the CCS CD from the DSK into the CD-ROM drive and allow
the driver to be installed. In most classroom installations, this has already been completed
for you.
Note: Make sure you insert the audio source and headphone plugs all the way into their
respective sockets. Failing to do this may allow audio to short from the input to the
output. While this may not hurt the board, it will prevent you from effectively evaluating
your DSP code.
4. Plug the AC power cord into the power supply and AC source.
Note: Power cable must be plugged into AC source prior to plugging the 5 Volt DC output
connector into the DSK.
5. Plug the power cable into the board. (note: when the POST runs in the next step and you have
the earpiece in your ear, it will HURT!)
6. When power is applied to the board, the Power On Self Test (POST) will run. LEDs 0-3 will
flash. When the POST is complete all LEDs blink on and off then stay on.
Hint: At this point, if you were installing the DSK for the first time on your own machine you
would now finish the USB driver installation. We have already done this for you on our
classroom PCs.
Note: If using the C6713 DSK, the title on this icon will differ accordingly.
CCS Setup
While Code Composer Studio (CCS) has been installed, you will need to ensure it is set up
properly. CCS can be used with various TI processors – such as the C6000 and C5000 families –
and each of these has various target-boards (simulators, EVMs, DSKs, and XDS emulators).
Code Composer Studio must be properly configured using the CCS_Setup application.
In this workshop, you should initially configure CCS to use either the C6713 DSK or the C6416
V1.1 DSK. Between you and your lab partner, choose one of the DSKs and the appropriate
driver. In any case, the learning objectives will be the same whichever target you choose.
8. Start the CCS Setup utility using its desktop icon:
Be aware there are two CCS icons, one for setup, and the other to start the CCS application.
You want the Setup CCS C6000 icon.
The setup program <cc_setup.exe> is installed to the hard drive for both the full and DSK versions of CCS,
although the desktop icon and Start menu shortcut are only added when installing the full version of CCS.
For your convenience, during installation of the workshop labs and solutions an icon for CCS Setup was
placed on the desktop. If, for some unexpected reason, this icon has been deleted, you can find and run the
program from:
c:\ti\cc\bin\cc_setup.exe (where “\ti\” is the directory in which you installed CCS)
9. When you open CC_Setup you should see a screen similar to this:
Note: If you don’t see the Import Configuration dialog box, you should open it from the menu
using File → Import…
11. Select a new configuration from the list and click the “Import” button.
If you are using the C6416 DSK in this workshop, please choose the C6416 V1.1 DSK:
If you are using the C6713 DSK in this workshop, please choose the C6713 DSK:
Here are a couple options that can help make debugging easier.
− Unless you want the Disassembly
Window popping up every time you load a
program (which annoys many folks),
deselect this option.
− Many find it convenient to choose the “Perform Go Main automatically”. Whenever
a program is loaded, the debugger will automatically run through the compiler’s
initialization code to your main() function.
Conceptually, the CCS Integrated Development Environment (IDE) is made up of two parts:
• Edit (and Build) programs (uses editor and code gen tools to create code).
• Debug (and Load) programs (communicates with DSP/simulator to download/run code).
The Load Program After Build option automatically loads the program (.out file) created
when you build a project. If you disabled this automatic feature, you would have to manually
load the program via the File→Load Program menu.
Note: You might even think of IDE as standing for Integrated Debugger Editor, since those are
the two basic modes of the tool.
Note: To reach this tab of the “Customize” dialog box, you may have to scroll to the right using
the arrows in the upper right corner of the dialog.
You’re Done
Note: Don’t worry if it takes a few seconds to perform Test 2 (External SDRAM test). It can
take a while to test all the SDRAM memory included on the DSK. (Of course, if it takes
more than 15-30 seconds, then there might be a problem.)
DSK Help
This file describes the board design, its schematics, and how the DSK utilities work.
Introduction
The importance of the C language in DSP systems has grown significantly over the past few
years. TI has responded by creating a silicon and compiler architecture that delivers
efficient C performance. Additionally, TI has worked hard to provide easy-to-use software
development tools.
Using these tools, all it takes is a couple of minutes to get your C code running on the 'C6000.
That's the goal of this module: compile, debug, and graph a simple C sine-wave routine.
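The lab's own sine-wave routine may be written differently. As one hedged sketch of a sine generator that avoids math-library calls inside the loop, the function below uses the standard two-term recurrence y[k] = 2cos(w)*y[k-1] - y[k-2], seeded with y[0] = 0 and y[1] = sin(w), which produces y[k] = sin(w*k). The name `sine_fill` and its parameters are hypothetical. The caller precomputes 2cos(w) and sin(w), where w = 2*pi*f/fs for a tone of frequency f at sample rate fs.

```c
/* Fills buf[0..n-1] with sin(w*k) using the recurrence
   y[k] = 2cos(w) * y[k-1] - y[k-2], with y[0] = 0 and y[1] = sin(w).
   two_cos_w = 2cos(w) and sin_w = sin(w) are precomputed by the caller. */
void sine_fill(float *buf, int n, float two_cos_w, float sin_w)
{
    if (n > 0)
        buf[0] = 0.0f;            /* sin(0)  */
    if (n > 1)
        buf[1] = sin_w;           /* sin(w)  */
    for (int k = 2; k < n; k++)
        buf[k] = two_cos_w * buf[k - 1] - buf[k - 2];
}
```

For example, a 1 kHz tone at an 8 kHz sample rate gives w = pi/4, so you would pass 2cos(w) ≈ 1.41421356 and sin(w) ≈ 0.70710678. Because the recurrence accumulates rounding error in single precision, very long runs may drift slightly in amplitude.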
Learning Objectives
Outline
Code Composer Studio (CCS)
Projects
Build Options
Build Configurations
Configuration Tool
C – Data Types and Header Files
Lab 2
Module Topics
Using Code Composer Studio
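Figure: CCS code generation and debug flow. Edit → Asm → Link → Debug produces a .out file that runs on the SIM, DSK, or EVM, supported by the Compiler, Asm Opto, Standard Runtime Libraries, DSP/BIOS Config Tool, DSP/BIOS Libraries, and Third Party plug-ins.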
When TI developed Code Composer Studio, it added a number of capabilities to the environment.
First of all, the code generation tools (compiler, assembler, and linker) were added so that you
wouldn’t have to purchase them separately. Secondly, the simulator was included (only in the full
version of CCS, though). Third, TI has included DSP/BIOS. DSP/BIOS is a real-time kernel
consisting of three main features: a real-time, pre-emptive scheduler; real-time capture and
analysis; and finally, real-time I/O.
Finally, CCS has been built around an extensible software architecture which allows third-parties
to build new functionality via plug-ins. See the TI website for a listing of 3rd parties already
developing for CCS.
Figure: code generation tools and file types (Compiler, Asm Optimizer, .sa, Link.cmd).
If you want to use your own extensions for file names, they can be redefined with code generation
tool options. Please refer to the TMS320C6000 Assembly Tools Users Guide for the appropriate
options.
Projects
Code Composer works with a project paradigm. If you’ve done code development with most any
sophisticated IDE (Microsoft, Borland, etc.), you’ve no doubt run across the concept of projects.
Essentially, within CCS you create a project for each executable program you wish to create.
Projects store all the information required to build the executable. For example, it lists things like:
the source files, the header files, the target system’s memory-map, and program build options.
What is a Project?
Project (.PJT) file contains:
References to files:
Source
Libraries
Linker, etc …
Project settings:
Compiler Options
DSP/BIOS
Linking, etc …
The project information is stored in a .PJT file, which is created and maintained by CCS. To
create a new project, you need to select the Project:New… menu item.
Along with the main Project menu, you can also manage open projects using the right-click
popup menu. Either of these menus allows you to Add Files… to a project. Of course, you can
also drag-n-drop files onto the project from Windows Explorer.
Right-Click Menu
There are many other project management options. In the preceding graphic we’ve listed a few of
the most commonly used actions:
• If your project team builds code outside the CCS environment, you may find Export
Makefile (and/or Source Control) useful.
• CCS now allows you to keep multiple projects open simultaneously. Use the Set as Active
Project menu option or the project drop-down to choose which one is active.
• If you like digging below the surface, you’ll find that the .PJT file is simply an ASCII text
file. Open for Editing opens this file within the CCS text editor.
• Configurations… and Options… are covered in detail, next.
Build Options
Project options direct the code generation tools (i.e. compiler, assembler, linker) to create code
according to your system’s needs. Do you need to logically debug your system, improve
performance, and/or minimize code size? Your C6000 results can be dramatically affected by
compiler options.
There are probably about 100 options available for the compiler alone, which can be a bit
intimidating to wade through. To that end, we’ve provided a condensed set of options. These few
options cover about 80% of most users’ needs.
As you probably learned in college programming courses, you should follow a two-step
process when creating code:
• Write your code and debug its logical correctness (without optimization).
• Next, optimize your code and verify it still performs as expected.
As demonstrated above, certain options are ideal for debugging, but others work best to create
highly optimized code. When you create a new project, CCS creates two sets of build options,
called Configurations: one called Debug, the other Release (which you might think of as Optimize).
Configurations will be explored in the next section.
Note: As with any compiler or toolset, learning the various options requires a bit of
experimentation, but it pays off in the tremendous performance gains that can be
achieved by the compiler.
There is a one-to-one relationship between the items in the text box and the GUI check and drop-
down box selections. Once you have mastered the various options, you’ll probably find yourself
just typing in the options.
The two main differences between the Debug and Release configurations:
• Debug uses the -g option to enable source-level debugging
• Release invokes the optimizer with -o3 (and doesn’t use -g)
Note: $(Proj_dir) indicates the current project directory. This aids in project portability. See
SPRA913 (Portable CCS Projects) for more information.
The following graphic summarizes the default configurations for a project called “modem”.
Additionally, it shows how to:
• Select the configuration before building your project
• Add or Remove configurations from a project (Project→Configurations… menu)
Note: The examples shown here are for a C67x DSP, hence the -mv6700 option.
Linker Options
Option          Description
-o<filename>    Output file name
-m<filename>    Map file name
-c              Auto-initialize global/static C variables
-x              Exhaustively read libraries (resolve back references)
• By default, the linker options include the -o option. We recommend you add the -m option.
• In the Build Options dialog, the options text box reads -c -m "$(Proj_dir)\Debug\lab.map" -o"$(Proj_dir)\De… and corresponds to these settings:
   Output file: $(Proj_dir)\Debug\lab.out
   Map file: $(Proj_dir)\Debug\lab.map
   Autoinit model: Run-time Autoinitialization
• “$(Proj_dir)\Debug\” indicates one subfolder level below the project (.pjt) location.
• Run-time autoinitialization (-c) tells the linker to arrange for global/static variables to be initialized before main() is called. Autoinit is discussed in Ch 3.
Configuration Tool
The DSP/BIOS Configuration Tool (often called Config Tool or GUI Tool or GUI) creates and
modifies a system file called the Configuration DataBase (.CDB). If we talk about using CDB
files, we’re also talking about using the Config Tool.
The following figure shows a CDB file opened within the configuration tool:
When you add a CDB file to your project, CCS automatically adds the generated C and assembly (.S62) files to the project under the Generated Files folder. (You must add the CMD file to the project yourself.)
• Many of the CDB objects will be discussed in this workshop. To get all the details on this
tool, though, we recommend you attend the 4-day DSP/BIOS Workshop.
Here are a few guidelines to keep in mind regarding C data types on the C6000:
1. Use short types for integer multiplication. As with most fixed-point DSPs, our ‘C62x devices
use a 16x16 integer multiplier. If you specify an int multiply, a software function in the
runtime support library will be called. (Note: the ‘C67x devices do have a 32x32→64-bit
multiply instruction, MPYID.)
2. Use int types for counters and indexes. As we will see in the next chapter, all registers and data paths are 32 bits wide.
3. Avoid accidentally mixing long and int variables. Many compilers allocate 32 bits for both types, so some users treat them as interchangeable. The ‘C6000 stores longs as 40-bit values to take advantage of the 40-bit hardware within the CPU. If you mix types, the compiler may be forced to manage the conversions, which will most likely cost you some performance.
Why 40 bits? The extra 8 bits are often used to provide headroom in integer operations (for example, when accumulating many products). They can also act like an 8-bit “carry bit” field.
4. On ‘C67x devices, 32-bit float operations are performed in hardware. The ‘C6000 supports
IEEE 32-bit floating-point math.
5. The double precision floating-point hardware supports IEEE 64-bit floating-point math.
6. Pointers, at 32-bits, can reach across the entire ‘C6000 memory-map.
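As a small illustration of guidelines 1 and 2, the following C sketch multiplies short operands (which fit the 16x16 multiplier) while using int for the loop counter and the accumulator. The function name dotProduct() is hypothetical and is not part of the lab code.

```c
/* Hypothetical helper: dot product written to follow the C6000 type guidelines.
 *  - short inputs: each product is a 16x16 multiply
 *  - int counter:  matches the 32-bit registers and data paths
 *  - int accumulator: holds the products without mixing in long types */
int dotProduct(const short *a, const short *b, int n)
{
    int i;
    int sum = 0;

    for (i = 0; i < n; i++) {
        sum += a[i] * b[i];   /* short * short, promoted to int */
    }
    return sum;
}
```

Keeping the operands short and everything else int avoids both the 32x32 software multiply and any int/long conversions inside the loop.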
#include "sine.h"
#include "edma.h"
Note: You will find that in this lab, the code is working VERY inefficiently. Using the proper
optimization techniques (later in the workshop), you will experience vast improvements
in the code’s performance.
A block sine-wave generator function creates data samples which we can then graph. The block
sine-wave generator function is a basic for loop that uses the following routine to generate
individual sine values:
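/* Assumed state for this illustration: coefficient A and the
   three most recent outputs of the IIR filter. */
static float A;      /* 2*cos(w), set by an init routine */
static float y[3];   /* y[0] = current, y[1] = previous, y[2] = two back */

short sineGen(void) {
    y[0] = y[1] * A - y[2];         /* y[n] = A*y[n-1] - y[n-2] */
    y[2] = y[1];                    /* shift the delay line */
    y[1] = y[0];
    return (short)(28000 * y[0]);   /* scale to the 16-bit range */
}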
The algorithm used in the workshop is similar to that shown above. It uses a marginally stable IIR filter (its poles lie on the unit circle) to generate a sine wave.
The lab’s version of the sine-wave generator, though, provides a sine initialization function which calculates the values for A and y[1] based on the tone and sampling frequencies.
There are many ways to create sine values; we have chosen this simple IIR-based model. While generating a sine wave using a table is probably more MIPS-efficient, this method is more memory-efficient. Also, since this function calculates each sine wave value, it gives the processor some “work” to perform.
main.c
For your convenience, we’ve provided a printout of the code that you will be starting with on the next few pages.
/*
* ======== main.c ========
* This file contains all the functions for Lab2 except
* SINE_init() and SINE_blockFill().
*/
/*
* ======== Include files ========
*/
#include "sine.h"
/*
* ======== Declarations ========
*/
#define BUFFSIZE 32
/*
* ======== Prototypes ========
*/
/*
* ======== Global Variables ========
*/
short gBuf[BUFFSIZE];
SINE_Obj sineObj;
/*
* ======== main ========
*/
void main()
{
SINE_init(&sineObj, 256, 8 * 1024);
sine.h
/*
* ======== sine.h ========
* This file contains prototypes for all functions
* contained in sine.c
*/
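/* Include guard: SINE_Obj is a typedef, which the preprocessor
   cannot see, so a dedicated macro is used instead. */
#ifndef SINE_H
#define SINE_H
typedef struct {
    float freqTone;
    float freqSampRate;
    float a;
    float b;
    float y0;
    float y1;
    float y2;
    float count;
    float aInitVal;
    float bInitVal;
    float y0InitVal;
    float y1InitVal;
    float y2InitVal;
    float countInitVal;
} SINE_Obj;
#endif /* SINE_H */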
sine.c
// ======== sine.c ========
// The coefficient A and the three initial values
// generate a 200Hz tone (sine wave) when running
// at a sample rate of 48KHz.
//
// Even though the calculations are done in floating
// point, this function returns a short value since
// this is what's needed by a 16-bit codec (DAC).
if(freqTone == 0)        // 0 selects the default tone
sineObj->freqTone = 200;
else
sineObj->freqTone = freqTone;
if(freqSampRate == 0)    // 0 selects the default sample rate
sineObj->freqSampRate = 48 * 1024;
else
sineObj->freqSampRate = freqSampRate;
sineObj->a = 2 * cosf(rad);
sineObj->b = -1;
sineObj->y0 = 0;
sineObj->y1 = sinf(rad);
sineObj->y2 = 0;
sineObj->count = sineObj->freqTone * sineObj->freqSampRate;
sine.c (continued)
sineObj->aInitVal = sineObj->a;
sineObj->bInitVal = sineObj->b;
sineObj->y0InitVal = sineObj->y0;
sineObj->y1InitVal = sineObj->y1;
sineObj->y2InitVal = sineObj->y2;
sineObj->countInitVal = sineObj->count;
}
sine.c (continued)
// ======== sineGen ========
// Generate a single sine wave value
static short sineGen(SINE_Obj *sineObj)
{
float result;
if (sineObj->count > 0) {
sineObj->count = sineObj->count - 1;
}
else {
sineObj->a = sineObj->aInitVal;
sineObj->b = sineObj->bInitVal;
sineObj->y0 = sineObj->y0InitVal;
sineObj->y1 = sineObj->y1InitVal;
sineObj->y2 = sineObj->y2InitVal;
sineObj->count = sineObj->countInitVal;
}
Lab 2 Procedure
Start CCS
1. Start CCS using the desktop icon
If using the C6713 DSK, this should say TMS320C67XX.
Note: Make sure that the location is correct. If you need to change it, you can either type it in or
browse to it by clicking on the box next to it.
When the dialog box appears, select the dsk6416.cdb (or dsk6713.cdb) template and click
OK.
If using the C6713 DSK, choose the “dsk6713.cdb” file.
Hint: In some TI classrooms you may see two or more tabs of CDB templates; e.g. TMS62xx,
TMS54xx, etc. If you experience this, just choose the ‘C6x tab.
C:\iw6000\labs\audioapp\audioapp.cdb
Then, close the CDB Config Tool.
The CDB files shown in the aforementioned dialog box are called “seed” CDB files. CDB
files are used to configure a great many objects. Of these, quite a few are board specific; e.g.
type of DSP, MHz, etc. To make life easier, TI provides a seed file with all boards it ships.
• main.c
• audioapp.cdb
• sine.c
Note: You may need to change the "Files of Type" box at the bottom of the Open Dialog Box to
see all of the files. We recommend that you choose "All Files" so that you can add
everything at once.
Click the + sign next to Source in the Project Window to make sure your source (*.c) files
were added successfully. Also, click the + sign next to DSP/BIOS Config to make sure the
.CDB file is displayed.
Notice that the text box at the top of the Build Options window reflects all of the currently
selected options. Click OK to close the Build Options dialog when you’re finished.
Watch Variables
Now that we have the program built and loaded, let's take a closer look at it using the tools
provided by CCS.
14. Add gBuf to the Watch window.
Select and highlight the variable gBuf in the main.c window. Right-click on gBuf and
choose Add to Watch Window.
Note: the value shown for gBuf may differ from that shown below.
After adding a variable, the Watch window automatically opens and gBuf is added to it.
Alternatively, you could have opened the watch window, selected gBuf, and drag-n-dropped
it onto the Watch 1 window.
Click on the + sign next to gBuf to see the individual elements of the array.
Note: At some point, if the Watch window shows an error “unknown identifier” for a variable,
don’t worry, it's probably due to the variable’s scope. Local variables do not exist (and
don’t have a value) until their function is called. If requested, Code Composer will add
local variables to the Watch window, but will indicate they aren’t valid until the
appropriate function is reached.
Setting Breakpoints
18. Set a breakpoint.
Set a breakpoint on the while loop in main( ). Breakpoints can be set in 3 different ways. Choose the one you like best and set the breakpoint:
• Place the cursor on the end brace of the while() loop and click the Toggle Breakpoint toolbar button
• Right-click on the line with the end brace and choose Toggle Breakpoint
• Double-click in the grey area next to the end brace
Running Code
19. Run your code.
Run the code up to the breakpoint. There are 3 different ways to cause CCS to run your code:
The values that are red are the values that have changed with the last update, which occurred
when your code hit the breakpoint.
Note: The workspace includes the current open project. So, when you retrieve the workspace, it
will retrieve the project. If you don’t wish to save the project info with the workspace,
close the project before saving your workspace.
Graphing Data
21. Graph your sine data.
The watch window is a great way to view data in CCS. But, can you tell if this is really a sine
wave? Wouldn’t it be better to see this data graphed? Well, CCS allows us to do this. Select:
View → Graph → Time/Frequency
You’re done with the main lab. Please inform your facilitator before moving on to the optional labs.
Optional Exercises
If you still have some more time, give these simple exercises a try.
• Lab 2a – Customize CCS
• Lab 2b – Using GEL Scripts
• Lab 2c – Fixed vs. Float
CCS lets you remap many of these functions. Let’s try remapping Restart.
1. Start CCS if it isn’t already open.
2. Open the CCS customization dialog.
Option → Customize…
4. Scroll down in the Commands list box to find Debug → Restart and select it.
c:\iw6000\labs
c:\iw6000\labs\ws.w
GEL Scripting
GEL: General Extension Language
• C style syntax
• Large number of debugger commands as GEL functions
• Write your own functions
• Create GEL menu items
File → Save
We chose the name mygel.gel.
Help → Contents
Select the Index tab and type the word “GEL”.
5. Create a submenu item to clear our arrays
The menuitem command that we used in the previous step will place the title “My GEL
Functions” under the GEL menu in CCS. When you select this menu item, we want to be able
to select different operations. Submenu items are created with the hotmenu command.
Enter the following into your GEL file to create a submenu item to clear the memory array:
(Don’t forget the semicolon – as with C, it’s important!)
hotmenu ClearArray()
{
GEL_MemoryFill(gBuf, 0, 16, 0x0);
}
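The length passed to GEL_MemoryFill() above is 16. Assuming that length counts 32-bit words (check the GEL_MemoryFill entry in the CCS online help), 16 words cover exactly the 64 bytes of gBuf: BUFFSIZE = 32 shorts of 2 bytes each. The following C sketch performs the same clear from target code; clearArraySketch() and GBUF_WORDS are hypothetical names, not part of the lab.

```c
#include <string.h>

#define BUFFSIZE 32

short gBuf[BUFFSIZE];

/* Number of 32-bit (4-byte) words needed to cover gBuf. */
#define GBUF_WORDS (sizeof(gBuf) / 4)

/* Hypothetical C counterpart of the ClearArray() GEL hotmenu:
 * zero every element of gBuf. */
void clearArraySketch(void)
{
    memset(gBuf, 0, sizeof(gBuf));
}
```

Using sizeof(gBuf) rather than a hard-coded count keeps the clear correct if BUFFSIZE changes; the GEL script’s literal 16 would need to be updated by hand.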
8. Before trying our GEL scripts, let’s show the gBuf array in a Memory window.
Without looking at the arrays, it will be hard to see the effect of our scripts. Let’s open a
Memory window to view gBuf.
View → Memory…
Title: gBuf
Address: gBuf
Q-Value: 0
Format: 16-bit hex – TI style
Hint: If you modify a loaded GEL file, before you can use the modifications you must reload it.
The easiest way to reload a GEL file:
(1) Right-click the GEL file in the CCS Project Explorer window
(2) Pick Reload from the right-click popup menu
The method used to solve overflow in this application is often called Q-math. Maybe a better name for it is fractional, fixed-point math. The beauty of fractions is that when you multiply them together, the magnitude of the result gets smaller. Hence the result is always bounded (i.e. no overflow), with one corner case: in a signed format such as Q15, -1 × -1 = +1, which lies just outside the representable range.
The problem with integer math is not confined to TI DSPs (or DSPs in general); rather, it is a side effect of two facts: integers get bigger when you add or multiply them, and the C language provides no means of handling overflow for signed numbers. In fact, the C language leaves the result of signed overflow undefined; every compiler writer can handle it however they want (so much for portability).
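Here is a minimal C sketch of a Q15 multiply, assuming the common Q15 convention in which a 16-bit value v represents v/32768. The type q15_t and the function q15Mul() are hypothetical names for illustration. The 32-bit intermediate product cannot overflow, and only the -1 × -1 case needs saturation.

```c
#include <stdint.h>

/* Q15: a 16-bit signed value v represents v / 32768, i.e. range [-1, 1). */
typedef int16_t q15_t;

/* Multiply two Q15 fractions.
 * The 32-bit product is in Q30; shifting right by 15 returns to Q15.
 * Assumes an arithmetic right shift for negative values, as on common
 * compilers (the result rounds toward minus infinity).
 * The only unrepresentable product is (-1) * (-1) = +1, so that single
 * case is saturated to the largest Q15 value. */
q15_t q15Mul(q15_t a, q15_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;  /* Q30; at most 2^30, fits in 32 bits */
    p >>= 15;                             /* back to Q15 */
    if (p > INT16_MAX) {
        p = INT16_MAX;                    /* saturate the -1 * -1 corner case */
    }
    return (q15_t)p;
}
```

For example, 0.5 in Q15 is 16384, and q15Mul(16384, 16384) gives 8192, which is 0.25. Because the result is shifted back into 16 bits, the product never grows the way an integer product does.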
The dynamic range of floating-point variables sure makes life easier. It’s why many folks choose
floating-point to decrease their engineering time (and get to market more quickly). Of course, this
is why the C6713 is so popular – as it’s designed to do floating-point math in hardware.
You will find LAB2c_6416.PJT or LAB2c_6713.PJT already built in the LAB2c folder:
C:\iw6000\labs\lab2c\
Try running the project and comparing all three results in three different graphs. To simplify
setting up the graph windows, try using one of the provided workspaces: C6416.wks or
C6713.wks located in C:\iw6000\labs\lab2c\.
Lab Debrief
Lab 2 Debrief
1. What differences are there in Lab2 between
the C6713 and C6416 solutions?
2. What do we need CCS Setup for?
3. Did you find the “clearArrays” GEL menu
command useful?
Optional Topics
Optional Topic: CCS Automation
As evidenced by the optional lab exercise, CCS provides scripting/automation tools. They are mentioned here to make you aware of their presence. To explore them further, please examine the online documentation.
GEL Scripting
GEL: General Extension Language
• C style syntax
• Large number of debugger commands as GEL functions
• Write your own functions
• Create GEL menu items
Notice the GEL folder in the Project View window. You can load/unload GEL scripts by right-
clicking this window.
GEL syntax is very C-like. Notice that QuickTest() calls LED_cycle(), defined earlier in the file. (This
happens to be a C6711 DSK GEL script.)
You can add items to the GEL menu. An example is shown in the above graphic.
Finally, a GEL file can be loaded upon starting CCS. The startup GEL script is specified using the
CCS Setup application.
Command Window
For those of you old-timers who remember the old command-line debugging tools, you can use the same commands you’ve used for years.
The Command Window is available inside CCS under Tools → Command Window.
CCS Scripting
CCS Scripting is a CCS plug-in. After installing CCS on your PC, you should use the Update
Advisor feature (available from the Help menu) to download and add the CCS Scripting plug-in.
Hint: You may find other useful tools, application notes, and plug-ins available via the CCS
Update Advisor.
CCS scripting provides a method of controlling the CCS debugger from another scripting
language. Any Microsoft COM (i.e. OLE) compliant language should be able to use the CCS
Scripting library, but VB Script and Perl are the two languages for which examples are provided.
CCS Scripting
• Debug using VB Script or Perl
• Using CCS Scripting, a simple script can:
   • Start CCS
   • Load a file
   • Read/write memory
   • Set/clear breakpoints
   • Run, and perform other basic debug functions
Among other things, CCS Scripting is very useful for testing purposes. For example, if you have
a number of test vectors you would like to run against your system, you can use CCS Scripting to
automate this process. Your script could then:
• Build
• Run
• Capture data, memory values, benchmarks
• And compare the results against what you expect (or hope)
• Over and over again …
At this time, the CCS Scripting Plug-in (v1.2) only ships with C5000 based examples. For your
convenience, we have written and included some C6000 based examples along with the workshop
lab files.
/* make all prog objects JavaScript global vars */
utils.getProgObjs(prog);

/* Create Memory Object */
var myMem = MEM.create("myMem");
myMem.base = 0x00000000;
myMem.len = 0x00100000;
myMem.space = "data";

/* generate cfg files (and CDB file) */
prog.gen();

• Textual way to create and configure CDB files
• Runs on both PC and Unix
• Create #include type files (.tci)
• More flexible than Config Tool
Some users find ‘writing code’ preferable to using the Graphical User Interface (GUI) of the
Configuration Tool. This is especially true for users who build their code in the Unix
environment, as there is no Unix version of the GUI.
Introduction
Memory management involves:
• Defining system memory requirements
• Describing the available memory map to the linker
• Allocating code and data sections using the linker
The latter two, along with the C6000 memory architecture, are covered in this chapter.
Defining memory requirements is very application-specific and, therefore, outside the scope of this workshop. If you have any questions regarding this, please discuss them with your instructor during a break.
Learning Objectives
Outline
• C6416 Memory Architecture
• C6713 Memory Architecture
• Section → Memory Placement
Module Topics
Basic Memory Management
Level 1 consists of two 16K-byte cache memories, one for program, the other for data. Since these memories are only configurable as cache, they do not show up in the memory map. (Cache is discussed further in an upcoming chapter.)
Level 2 memory consists of 1M bytes of RAM – and up to 256K bytes can be made cache. (If a
segment is configured as cache, it doesn’t show up in the memory map.) This is a unified
memory, that is, it can hold code or data.
(Figure: C6416 memory architecture: CPU with program and data cache, L2 RAM (prog/data), EMIFA, and EMIFB.)
(Figure: C6416 memory map, running up to FFFF_FFFF. A memory map is a table representation of memory.)
Address      Region       Size     Used on the C6416 DSK for
8000_0000    EMIFA CE0    256MB    SDRAM: 16MB
9000_0000    EMIFA CE1    256MB
A000_0000    EMIFA CE2    256MB    Daughter Card
B000_0000    EMIFA CE3    256MB    Daughter Card
(Figure: C6713 memory architecture: CPU with program and data cache, L2 SRAM (prog/data), and a single EMIF.)
What about the External Memory?
Address      Region
8000_0000    External (CE0)
9000_0000    External (CE1)
A000_0000    External (CE2)
B000_0000    External (CE3)
(The memory map runs up to FFFF_FFFF.)
How does this apply to the DSK?
(Figure: C6713 DSK memory: the CPU’s internal memory and data cache and, through the EMIF, 16MB of SDRAM, a 256KB Flash ROM, an I/O port, and room for expansion.)
One of the biggest differences between the two chips is that the C6713 has only one EMIF. The Flash on the C6713 DSK is 256KB, as opposed to 512KB on the C6416 DSK.
Here is the memory map for the C6713 DSK. This shows the total available memory that a C6713
has, and how that memory was used on the DSK.
Sections
• Every C program consists of different parts called Sections
• All default section names begin with "."
short m = 10;      /* Global Vars (.bss), Init Vals (.cinit) */
short x = 2;
short b = 5;
main()
{
short y = 0;       /* Local Vars (.stack) */
In the TI code-generation tools (as with any toolset based on the COFF – Common Object File
Format), these various parts of a program are called Sections. Breaking the program code and
data into various sections provides flexibility since it allows you to place code sections in ROM
and variables in RAM. The preceding diagram illustrated five sections:
• Global Variables
• Initial Values for global variables
• Local Variables (i.e. the stack)
• Code (the actual instructions)
• Standard I/O functions
But these aren’t all the sections created by the C6000’s compiler …
If you think some of these names are a bit esoteric, we agree with you. (.code might have made
more sense than .text, but we have to live with the names they chose.)
You must link (place) these sections to the appropriate memory areas as provided above. In
simplest terms, initialized might be thought of as ROM-type memory and uninitialized as RAM-
type memory.
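To connect the sections to actual source, here is a small C sketch modeled on the example above, with each item annotated with the section it corresponds to, assuming the run-time autoinitialization model (-c) used in these labs. The function name computeY() and the statements after the declaration of y are illustrative assumptions, since the original example is cut off.

```c
#include <stdio.h>

/* Initialized globals: storage is reserved in .bss, and the initial
 * values (10, 2, 5) are kept in .cinit and copied in before main(). */
short m = 10;
short x = 2;
short b = 5;

/* The instructions of this function are placed in .text. */
short computeY(void)
{
    short y = 0;           /* local variable: lives on the stack (.stack) */

    y = m * x;
    y = y + b;
    printf("y=%d\n", y);   /* standard I/O support (TI tools use a .cio buffer) */
    return y;
}
```

Note that the globals occupy two sections: .bss holds them at run time (RAM-type memory), while .cinit holds their starting values (initialized, ROM-type memory).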
Exercise
(Figure: the C6000 CPU with internal memory; CE0 at 8000_0000 holds 16MB of SDRAM, and CE1 at 9000_0000 holds 4MB of FLASH.)
Hint: Think about what type of memory each one should reside in – ROM or RAM.
Solution? There are actually many solutions to this problem, depending on your system’s needs.
If you are contemplating booting your system from reset, then your answers may be very different
from a non-booted system. Here’s what we came up with:
Solution
(Figure: the same memory map, with the CE1 FLASH marked as Initialized Memory.)
Also, consider a bootable system. Some sections may initially be “loaded” into EPROM but “run”
out of internal memory. How are these sections handled? If you thought of this, great. We’ll
tackle how to do this later.
(Figure: placing the .text, .bss, and .cinit sections into the 16MB SDRAM at 8000_0000 with the Configuration Tool.)
To Create a New Memory Area:
• Right-click on MEM and select Insert Mem
• Fill in base/len, etc.
What about the BIOS Sections?
We haven’t had the opportunity to describe all the BIOS-related sections. Please refer to the
online help for a description of each.
At times you will need to define and place your own user-defined sections; this is discussed later in the chapter.
Initialized Sections
Earlier we discussed putting some sections into initialized (ROM) memory. When debugging our
code with CCS, though, we haven’t been putting these sections into ROM. How can the system
work?
The key lies in the difference between ROM and initialized memory. ROM memory is a form of
initialized memory. After power-up ROM still contains its values – in other words it’s initialized
after power-up.
Therefore, for our system to work, the initialized sections must “exist” before we start running our code. In production we can program EPROMs or Flash memory ahead of time. Or, maybe a host downloads the initialized code and data before releasing the processor from reset.
Initialized Memory
The CCS loader copies the following sections into volatile memory:
.text       .switch
.cinit      .pinit
.const
.bios       .sysinit
.gblinit    .trcdata
.hwi_vec    .rtdx_text
(Figure: the CCS loader copies these sections from the .out file into internal RAM (IRAM) next to the CPU.)
When using the CCS loader (File → Load Program…), CCS automatically copies each of the initialized sections (.text, .switch, .cinit, .pinit, .const, etc.) into volatile memory on the chosen target.
Later in the workshop we will examine more advanced ways to locate initialized sections of code and data. We will even get a chance to burn them into Flash memory and relocate them at runtime. But for now, we won’t try anything that fancy.
When you have finished creating memory regions and allocating sections into these memory
areas (i.e. when you save the .CDB file), the CCS configuration tool creates five files. One of the
files is BIOS’s cfg.cmd file — a linker command file.
(Figure: the five files created from the .CDB file: *cfg_c.c, *cfg.s62, *cfg.cmd, *cfg.h, and *cfg.h62. The linker command file, *cfg.cmd, contains:)
MEMORY {
    EPROM: origin = 0, length = 0x20000
    …
}
SECTIONS {
    .text:  > EPROM
    .cinit: > EPROM
    .bss:   > IDRAM
    …
}
This file contains two main parts, MEMORY and SECTIONS. (Though, if you open and examine
it, it’s not quite as nicely laid out as shown above.)
Later in the workshop we’ll explore linker command files in greater detail. In fact, you will get to
build a custom linker command file in one of the lab exercises.
The linker’s main purpose is to link together various object files. It combines like-named input
sections from the various object files and places each new output section at specific locations in
memory. In the process, it resolves (provides actual addresses for) all of the symbols described in
your code.
(Figure: the linker takes .obj files, libraries (.lib), and appcfg.cmd as inputs and produces myApp.out and a .map file.)
The linker can create two outputs, the executable (.out) file and a report which describes the
results of linking (.map).
Note: If the graphic above wasn’t clear enough, the linker gets run automatically when you
BUILD or REBUILD your project.
Optional Discussion
Entire C6000 Family Memory Description
(Figure: a C6000 CPU with internal program and data memory and an EMIF with four external spaces.)
Address      Region
0000_0000    CE0 (external, 16 MB)
0100_0000    CE1 (external, 4 MB)
0140_0000    Internal Program memory
0200_0000    CE2 (external, 16 MB)
0300_0000    CE3 (external, 16 MB)
8000_0000    Internal Data memory
(The memory map runs up to FFFF_FFFF.)
(Figure: a C6000 device with two-level memory: the CPU’s program cache and data cache backed by Level 2 prog/data memory.)
Introduction
In this chapter, you will learn how to program the EDMA to perform a transfer of data from one
buffer to another.
Learning Objectives
Goals for Chapter 4…
(Figure: the CPU and the EDMA; an EDMA channel copies buf0 to buf1.)
Chapter Topics
Using the EDMA
General Procedure for using CSL
1. Include Header Files
   Library and individual module header files
2. Declare Handle
   For peripherals with multiple resources
3. Define Configuration
   Create variable of configuration values
4. Open peripheral
   Reserves resource; returns handle
5. Configure peripheral
   Applies your configuration to peripheral
Example of step 1:
#include <csl.h>
#include <csl_timer.h>
Overview
EDMA Overview
Each EDMA channel (Channel 0, Channel 1, Channel 2, …, up to Channel 63 on the C64x or Channel 15 on the C67x) has its own set of parameter registers:
• Options
• Source
• Transfer Count
• Destination
• Index
• Count Reload (bits 31:16) | Link Addr (bits 15:0)
C64x has 64 channels; C67x has 16 channels.
The EDMA requires transfer parameters. The most obvious are Src, Dest, and Count.
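Because each channel’s parameter set is simply six consecutive 32-bit words, it can be modeled as a C struct. This is a host-side sketch for reasoning about the layout; the type name EdmaParamSketch is hypothetical and is not the CSL EDMA_Config type, whose definition you should take from the CSL headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one EDMA channel's parameter set:
 * six consecutive 32-bit registers, in the order listed above. */
typedef struct {
    uint32_t opt;   /* Options: ESIZE, SUM, DUM, FS, ...                  */
    uint32_t src;   /* Source address                                      */
    uint32_t cnt;   /* Transfer Count                                      */
    uint32_t dst;   /* Destination address                                 */
    uint32_t idx;   /* Index                                               */
    uint32_t rld;   /* Count Reload (bits 31:16) | Link Addr (bits 15:0)   */
} EdmaParamSketch;
```

With six 32-bit fields and no padding, the whole set occupies 24 bytes, and the reload/link word sits at byte offset 20.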
Definitions
EDMA - How much to move
• A Block consists of Frames: Frame 1, Frame 2, …, Frame M
• A Frame consists of Elements: Elem 1, Elem 2, …, Elem N
• The element size is set by the ESIZE field of the Options register:
   00: 32-bits
   01: 16-bits
   10: 8-bits
   11: rsvd
• The Transfer Count register holds # Frames (M-1) in bits 31:16 and # Elements (N) in bits 15:0.
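The Transfer Count layout described above can be captured in a one-line helper. This is a hypothetical sketch (edmaCntSketch() is not a CSL function); CSL’s own macros should be used in real code.

```c
#include <stdint.h>

/* Hypothetical helper: pack the Transfer Count register.
 * Upper 16 bits: number of frames minus one (numFrames must be >= 1).
 * Lower 16 bits: number of elements per frame. */
uint32_t edmaCntSketch(uint32_t numFrames, uint32_t numElements)
{
    return (((numFrames - 1u) & 0xFFFFu) << 16) | (numElements & 0xFFFFu);
}
```

For the single-frame, 4-element transfer in the upcoming example, edmaCntSketch(1, 4) gives 0x00000004, since the frame field holds M-1 = 0.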
Example
How do we set up the six EDMA parameter registers to transfer 4 byte-wide elements from loc_8 to myDest?
EDMA Example
Goal: transfer 4 elements from loc_8 to myDest.
8-bit values (Src: loc_8 is the element holding 8):
 1   2   3   4   5   6
 7   8   9  10  11  12
13  14  15  16  17  18
19  20  21  22  23  24
25  26  27  28  29  30
myDest (after the transfer): 8, 9, 10, 11
Options fields used in this example:
Addr Update Mode (SUM/DUM)     ESIZE          FS (Frame Sync)
00: fixed (no modification)    00: 32-bits    0: Off
01: inc by element size        01: 16-bits    1: On
10: dec by element size        10: 8-bits
11: index                      11: rsvd
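Assembling the Options fields by hand is a useful check on the hex value used in the next section. The sketch below assumes the C621x/C671x OPT bit layout (PRI 31:29, ESIZE 28:27, 2DS 26, SUM 25:24, 2DD 23, DUM 22:21, TCINT 20, TCC 19:16, LINK 1, FS 0); please verify it against the EDMA reference guide for your device. The macro and function names are hypothetical; in real code, use CSL’s EDMA_OPT_RMK().

```c
#include <stdint.h>

/* Hypothetical field encodings (C621x/C671x OPT layout assumed). */
#define PRILOW     2u   /* PRI     = low priority            */
#define ESIZE8BIT  2u   /* ESIZE   = 10b: 8-bit elements     */
#define ADDRINC    1u   /* SUM/DUM = 01b: inc by element size */

/* Shift each field into its assumed bit position and OR them together. */
uint32_t edmaOptSketch(uint32_t pri, uint32_t esize, uint32_t twoDs,
                       uint32_t sum, uint32_t twoDd, uint32_t dum,
                       uint32_t tcint, uint32_t tcc, uint32_t link,
                       uint32_t fs)
{
    return ((pri   & 0x7u) << 29) |
           ((esize & 0x3u) << 27) |
           ((twoDs & 0x1u) << 26) |
           ((sum   & 0x3u) << 24) |
           ((twoDd & 0x1u) << 23) |
           ((dum   & 0x3u) << 21) |
           ((tcint & 0x1u) << 20) |
           ((tcc   & 0xFu) << 16) |
           ((link  & 0x1u) << 1)  |
           (fs     & 0x1u);
}
```

With low priority, 8-bit elements, incrementing source and destination, frame sync on, and TCINT/TCC left at zero, this yields 0x51200001. Setting TCINT = 1 and TCC = 5 under the same assumed layout changes the result to 0x51350001.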
Using CSL
As shown below, we basically want to get the six 32-bit values we calculated for each register
into the EDMA channel parameter location.
EDMA_Config myConfig = {
    0x51200001,   // options
    &loc_8,       // source
    0x00000004,   // count
    &myDest,      // destination
    0x00000000,   // index
    0x00000000    // reload:link
};
EDMA_config() writes these six values into the channel’s Options, Source, Transfer Count, Destination, Index, and Count Reload/Link Addr registers.
EDMA Parameter Values
options       0x51200001
source        &loc_8
count         0x00000004
dest          &myDest
index         0x00000000
rldcnt:lnk    0x00000000
The same configuration, written with CSL macros:
EDMA_Config myConfig = {
    EDMA_OPT_RMK(
        EDMA_OPT_PRI_LOW,
        EDMA_OPT_ESIZE_8BIT,
        EDMA_OPT_2DS_NO,
        EDMA_OPT_SUM_INC,
        EDMA_OPT_2DD_NO,
        EDMA_OPT_DUM_INC,
        EDMA_OPT_TCINT_YES,
        EDMA_OPT_TCC_OF(5),
        EDMA_OPT_LINK_NO,
        EDMA_OPT_FS_YES
    ),
    EDMA_SRC_OF(loc_8),
    EDMA_CNT_OF(0x00000004),
    EDMA_DST_OF(myDest),
    EDMA_IDX_OF(0),
    EDMA_RLD_OF(0)
};
• _RMK (register make) creates a single hex value from the option symbols you select.
• The _OF macro performs any needed casting (and provides visual consistency).
• The options discussed thus far are esize, sum, dum, fs, src, cnt, and dst.
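To make the effect of these parameter values concrete, the following host-side C sketch models what the channel does with them for a single frame: it moves numElements elements of elemBytes bytes each, incrementing both addresses by the element size (SUM = DUM = 01b). It is only a model for reasoning about the parameters; edmaCopySketch() is a hypothetical name, and on the DSP the EDMA hardware performs the transfer, not the CPU.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a 1-D, single-frame EDMA transfer with the
 * source and destination address modes both set to "increment by
 * element size". elemBytes is 4, 2 or 1 (ESIZE = 32, 16 or 8 bits). */
void edmaCopySketch(const void *src, void *dst,
                    size_t elemBytes, size_t numElements)
{
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d = (unsigned char *)dst;
    size_t i;

    for (i = 0; i < numElements; i++) {
        memcpy(d, s, elemBytes);  /* move one element  */
        s += elemBytes;           /* SUM: inc by esize */
        d += elemBytes;           /* DUM: inc by esize */
    }
}
```

Calling edmaCopySketch(&values[7], myDest, 1, 4) on an array holding 1 through 30 leaves 8, 9, 10, 11 in myDest, matching the example.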
In Chapter 6 we will show how to use interrupt events to trigger the EDMA. This will come in
handy when we use the McBSP to tell the EDMA when to transfer a value to it, or when to pick
up a value from its receive register.
Exercise
Exercise 1 (Takes 20 Minutes)
Instructors: give students 20 minutes to do the exercise, then spend 10 minutes reviewing it. These answers will be used during the upcoming lab.
(Figure: the EDMA copies gBuf0 to gBuf1; each buffer is labeled “BUFFSIZE =” and “512 x 16”.)
Lab 4 – Overview
Lab 4 – Programming the EDMA
(Figure: the CPU fills gBuf0, and the EDMA copies gBuf0 to gBuf1.)
Goals:
1. CPU generates 32 sine values into gBuf0
2. EDMA transfers 32 elements from gBuf0 to gBuf1
• To use CSL to set up the EDMA for copying buf0 to buf1. This will be done
programmatically as discussed in the material.
Lab 4
Understanding Coding/Naming Conventions
1. Reset the DSK, start CCS and open audioapp.pjt.
2. Open main.c
You can open a file by double-clicking on it in the Project View window. You may have to
expand the source files folder to find it.
3. Review coding conventions.
• Take a look at the prototypes and global variables. You’ll notice that each uses titleCase, meaning that the first word is lower case and each concatenated word that follows has its first character capitalized. We suggest titleCase for user-defined functions as well as global variables. Example: gBuf.
• Constants are entirely capitalized (no underscores) – notice the constant BUFFSIZE.
• CSL Functions: the CSL API uses a specific naming convention. The generic form of a
CSL function looks like: MOD_function( ). For example, when using the EDMA module,
its open function appears as:
EDMA_open( )
EDMA is capitalized because it is the module name. The function name (such as “config”, “intEnable”, or “open”) is in titleCase and is separated from the module name by an underscore.
• CSL Data Types: these take the generic form MOD_DataType; that is, the module name and the type name are separated by an underscore. The type name also uses titleCase, with one exception: its first letter is capitalized (e.g. EDMA_Config, EDMA_Handle).
• To distinguish global variables from locals, we will use a small “g” prefix. Globals do not
use underscores. (The small “g” is not required, but it’s a common practice.)
• Handles (or pointers to our resources) will normally begin with a lower case prefix of
“h”. (Not required, but again, it’s a common practice.)
These conventions match TI’s software development guidelines, and are similar to
Microsoft’s naming conventions. For the most part, understanding and using these
conventions will help clarify everyone’s code. Hopefully they’ll quickly become second
nature.
We will be using the Chip Support Library (CSL) to perform setup and initialization (most of the
code you’ll need comes from the paper exercise). Refer to the 5-step CSL procedure for
programming the EDMA from the discussion – and the paper exercise you did just before the lab.
We’re going to follow the first 5 steps of the procedure and save the autoinit step until later.
If you need additional help, you can refer to the CSL Reference Manual (SPRU401) under
Help → Users Manuals in CCS.
We are going to put all of the code that initializes the EDMA into a separate file to keep it all nice
and organized. We have provided a simple file to start with called edma.c.
5. Add edma.c to your project
The file, edma.c, is located in c:\iw6000\labs\audioapp\.
6. Open edma.c and inspect it
There's not much exciting here right now, but we'll add a lot of code to this file by the day's
end.
We're going to add code to this file to initialize and configure the EDMA to do a transfer. We
will basically be following the 5-step procedure that we outlined earlier. Please refer back to
this procedure to help you keep track of what you are doing.
7. Add the two header files necessary for CSL and the EDMA APIs (Step 1 of 5)
In edma.c, our code will reference the functions and data structures declared in these headers
(<csl.h> and <csl_edma.h>). Make sure you add them in the correct order: <csl.h> first, then
<csl_edma.h>. These should be the first #include statements in edma.c.
8. Declare the EDMA Handle in edma.c (Step 2 of 5)
Add a global EDMA handle, named hEdma, to the global variables area of your program in
edma.c. We will use this handle to point to and initialize the channel registers.
9. Copy the Starter EDMA Config Structure
Rather than typing the whole structure from scratch, we have provided a structure for you that
is almost completely filled in (see comments at the top of the file).
Copy the structure from the commented area to the global variables area of edma.c just
beneath the declaration for the EDMA handle. Change the name of the structure from
variableName to gEdmaConfig.
Notice: The TYPE definition EDMA_Config uses an uppercase C for “C”onfig. This is the
naming standard for CSL’s typedefs, i.e. MOD_Config, where MOD is the module name
EDMA. (As opposed to the “config” function that uses a small “c”.)
Hint: If you need some help filling in the values, you may find some hints by accessing
Help → Users Manuals and looking at the CSL Reference Guide (SPRU401).
Search the .pdf file for EDMA_OPT_field_symval. You can find tips here on how
to fill in the config structure.
Set the Options (OPT) register using the _RMK macro as follows:
• Low Priority
• 16 bit Elements
• 1-dimensional source
• Source Increments
• 1-dimensional destination
• Destination Increments
• Do NOT cause a transfer complete interrupt (later in the lab, we’ll change this)
• Set a transfer complete code of 0
(we will change this using EDMA_intAlloc later…)
• Set the transfer complete code upper bits (TCCM) to the default value
• Set the cause alternate transfer complete interrupt (ATCINT) to default
• Set the value of the alternate transfer complete code (ATCC) to the default value
• Set the peripheral device transfer source (PDTS) to default
• Set the peripheral device transfer destination (PDTD) to default
(C64x: leave these bits commented out for the C67x.)
• Disable linking of event parameters (we’ll change this in order to auto-initialize)
• Use Frame Synchronization
Note: If you are using the C67x, make sure to comment out the four fields that are specific to
the C64x.
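EDMA_OPT_RMK combines the OPT settings above into a single 32-bit register value. To illustrate what that macro does, the sketch below packs the same choices by hand. It assumes the C621x/C671x OPT layout for field positions and encodings: PRI at bits 31–29 (LOW = 2), ESIZE at 28–27 (16-bit = 1), 2DS at 26, SUM at 25–24 and DUM at 22–21 (increment = 1), 2DD at 23, TCINT at 20, TCC at 19–16, LINK at 1, and FS at 0. optPack is a hypothetical name. In the lab, use the CSL macro.

```c
#include <stdint.h>

/* Hypothetical stand-in for EDMA_OPT_RMK. Field positions are ASSUMED
 * (C621x/C671x-style OPT layout); verify against the EDMA reference guide. */
#define OPT_PRI_SHIFT    29u  /* 3 bits: 1 = HIGH, 2 = LOW            */
#define OPT_ESIZE_SHIFT  27u  /* 2 bits: 0 = 32-bit, 1 = 16, 2 = 8    */
#define OPT_2DS_SHIFT    26u
#define OPT_SUM_SHIFT    24u  /* 2 bits: 0 = none, 1 = inc, 2 = dec   */
#define OPT_2DD_SHIFT    23u
#define OPT_DUM_SHIFT    21u
#define OPT_TCINT_SHIFT  20u
#define OPT_TCC_SHIFT    16u  /* 4 bits: CIPR bit 0-15                */
#define OPT_LINK_SHIFT    1u
#define OPT_FS_SHIFT      0u

uint32_t optPack(uint32_t pri, uint32_t esize, uint32_t twoDs, uint32_t sum,
                 uint32_t twoDd, uint32_t dum, uint32_t tcint, uint32_t tcc,
                 uint32_t link, uint32_t fs)
{
    /* Mask each value to its field width, then shift it into place. */
    return ((pri   & 0x7u) << OPT_PRI_SHIFT)
         | ((esize & 0x3u) << OPT_ESIZE_SHIFT)
         | ((twoDs & 0x1u) << OPT_2DS_SHIFT)
         | ((sum   & 0x3u) << OPT_SUM_SHIFT)
         | ((twoDd & 0x1u) << OPT_2DD_SHIFT)
         | ((dum   & 0x3u) << OPT_DUM_SHIFT)
         | ((tcint & 0x1u) << OPT_TCINT_SHIFT)
         | ((tcc   & 0xFu) << OPT_TCC_SHIFT)
         | ((link  & 0x1u) << OPT_LINK_SHIFT)
         | ((fs    & 0x1u) << OPT_FS_SHIFT);
}
```

With the lab's settings (low priority, 16-bit elements, incrementing 1-D source and destination, no interrupt, TCC = 0, no link, frame sync), the sketch yields 0x49200001.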
Modifying main( )
16. Add initEdma() call to main( ) in main.c
Now that the function is created, we need to call it. Add a call to initEdma( ) in the main()
function just below the call to SINE_init(…).
17. Include edma.h in main.c
Since we are calling a function that is located in another file, we need to reference it in the
calling file, main.c. We have provided a header file to do this for you, edma.h. Feel free to
open edma.h and check out what it has in it.
You’re Done
Optional Topics
DMA (vs. EDMA)
DMA
• 4 channels with fixed priority
• 1 extra channel dedicated to the HPI
• Global registers shared by all channels
[Figure: each DMA channel (0–3) has Primary Ctrl, Secondary Ctrl, Source, Destination, and Xfr Count registers; the global registers are Count Reload A/B, Index A/B, and Address A/B/C/D.]
DMA
[Figure: DMA channel registers. Primary Ctrl fields: ESIZE (bits 9–8), DSTDIR (bits 7–6), SRCDIR (bits 5–4), START (bits 1–0). Also Secondary Ctrl, Source, and Destination. Xfr Count: # Frames (bits 31–16), # Elements (bits 15–0).]
We often describe the EDMA as a traditional DMA peripheral. While this description works conceptually, the EDMA is actually made of two blocks:
• EDMA channel controller: reads transfer parameters from the channel's location in the 2K parameter RAM and sends a request to the Transfer Controller
• Transfer Controller: moves blocks of data as requested
Transfer Controller
[Figure: the Transfer Controller connects the EDMA, the CPU's program and data caches, and L2 SRAM with the McBSPs, HPI, EMIF, etc.]
QDMA
[Figure: the EDMA channel controller (channels, reloads, 2K parameter RAM, interrupt events) and the QDMA both send transfer requests to the Transfer Controller's transfer engine, which takes move requests from the EDMA, QDMA, and cache.]
QDMA registers: Options, Source, Count, Destination, Index
• Sends a single block transfer request
• Starts (i.e. sends a transfer request) when the last register is written to; it doesn't work with (interrupt) events
• No auto-init, therefore it does not have a Reload:Linking register (this feature is discussed in the next chapter)
• Transfer request goes directly to the Transfer Controller
DAT
• Block copy module
• Simply moves (or fills) a block of data
• No sync or interrupts are provided
DAT Functions: DAT_busy, DAT_close, DAT_copy, DAT_fill, DAT_open, DAT_setPriority, DAT_wait, DAT_copy2d
• DAT is device independent
• Implemented for all C5000/C6000 devices
• Uses whatever DMA capability is available
• Uses QDMA, when available
DAT_open(DAT_CHAANY, DAT_PRI_HIGH);
PDTS/PDTD allows the EDMA to use the EMIF's PDT (peripheral device transfer) capability; that is, it allows the EDMA to transfer data directly between a peripheral and external memory.
//EDMA_OPT_PDTS_DEFAULT, // Peripheral Device Transfer Source (c64x only)
//EDMA_OPT_PDTD_DEFAULT, // Peripheral Device Transfer Dest (c64x only)
...
Introduction
In this chapter, we'll see what the EDMA can do when it finishes a transfer. We will discuss how
the CPU’s interrupts work, how to configure the EDMA to interrupt the CPU at the end of a
transfer, and how to configure the EDMA to auto-initialize.
Learning Objectives
Lab 5…
[Figure: (1) the CPU fills gBuf0, (2) an EDMA channel copies gBuf0 to gBuf1, (3) the EDMA signals Frame Transfer Complete to the CPU.]
Chapter Topics
Hardware Interrupts (HWI) .................................................................................................................... 5-1
You can enable (or prevent) interrupt generation for each channel…
The TCINT bit of each channel turns EDMA interrupt generation on and off.
[Figure: channels 0–15; bit 20 (TCINT) of each channel's Options register enables or disables that channel's interrupt generation.]
Similar to the CPU's interrupt recognition, the EDMA has flag/enable bits...
The CIPR register records which enabled (TCINT set) channels have finished. The CIER register controls which CIPR bits send an interrupt (EDMAINT) to the CPU.
[Figure: the Channel Interrupt Pending Register (CIPR) records that an EDMA transfer complete has occurred; with CIER1 = 0, CIER8 = 1, and CIER15 = 0, only CIPR bit 8 can generate EDMAINT.]
The TCC field in the Options Register allows each channel to set any CIPR bit.
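[Figure: each channel's TCINT and TCC fields (e.g. TCINT = 0, TCC = 0; TCINT = 1, TCC = 1) select which CIPR bit it sets; the CIER bits (e.g. CIER1 = 0, CIER8 = 1) determine which of those bits generate EDMAINT.]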
The Chip Support Library (CSL) has functions for manipulating the various bits used by the
EDMA to control interrupt generation.
Passing a “-1” to EDMA_intAlloc( ) allocates any available CIPR bit, as opposed to allocating a
specific bit.
For now, allocating any CIPR bit is OK. When using EDMA Channel Chaining, though, a
specific CIPR bit must be used. In these cases, it is a good idea either to allocate the specific
CIPR bits first, or to plan out which channels will use which bits. Then use the EDMA_intAlloc()
function to officially allocate (i.e. reserve) each CIPR bit. (Note: Channel Chaining is briefly
discussed at the end of this chapter as an optional topic.)
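A 16-bit reservation mask is enough to model the difference between asking for "any" CIPR bit and asking for a specific one. The sketch below is hypothetical: tccAlloc and tccFree are not CSL names. It only mirrors the documented contract that EDMA_intAlloc returns the requested bit, or -1 when the request cannot be met. Having -1 return the lowest free bit is an assumption of this model.

```c
#include <stdint.h>

/* Hypothetical model of CIPR-bit (TCC) reservation. Bit n of the mask set
 * means CIPR bit n is reserved. Not the CSL implementation. */
static uint16_t gTccInUse = 0;

int tccAlloc(int tcc)
{
    if (tcc == -1) {                          /* "any" available bit */
        for (int n = 0; n < 16; n++) {
            if (!(gTccInUse & (1u << n))) {
                gTccInUse |= (uint16_t)(1u << n);
                return n;
            }
        }
        return -1;                            /* all 16 bits taken */
    }
    if (tcc < 0 || tcc > 15 || (gTccInUse & (1u << tcc)))
        return -1;                            /* out of range or already reserved */
    gTccInUse |= (uint16_t)(1u << tcc);
    return tcc;
}

void tccFree(int tcc)
{
    if (tcc >= 0 && tcc <= 15)
        gTccInUse &= (uint16_t)~(1u << tcc);  /* release the bit */
}
```

If a specific bit is reserved first (for example, bit 8 for chaining), later "any" requests cannot take it. A later request for that same bit fails instead of silently sharing it.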
[Figure: an interrupt source (EDMA, HPI, timers, external pins, etc.) sets a flag in the IFR register.]
The IER register and the GIE bit in the Control Status Register allow users to enable and disable
interrupts.
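The recognition rule can be written as a single Boolean expression. This is a simplified model, not CSL code, and intRecognized is a hypothetical name. NMIE is also an input, because on the C6000 the maskable interrupts INT4–INT15 are recognized only while NMIE is set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model: maskable CPU interrupt n (4-15) is recognized only if
 * its IFR flag is set, its IER bit is set ("individual switch"), CSR.GIE is
 * set ("master switch"), and NMIE is set. */
bool intRecognized(uint16_t ifr, uint16_t ier, bool gie, bool nmie, unsigned n)
{
    uint16_t bit = (uint16_t)(1u << n);
    return gie && nmie && (ifr & bit) != 0 && (ier & bit) != 0;
}
```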
Note, the DSP/BIOS HWI Dispatcher is discussed later (on page 5-12).
Please fill in the code that needs to run in our system when the EDMA finishes transferring a
block of sine wave values:
Hint: Just fill in the functions that need to run. Don’t worry about the arguments for now.
However, you’ll need to come up with the function arguments when coding the ISR in
the upcoming lab.
HWI Objects
Using the DSP/BIOS Configuration Tool, it is easy to configure each HWI object’s Interrupt
Source and ISR function. These settings can also be handled via CSL functions, but the Config
Tool is much easier to use.
void edmaHwi()
{
    ...
}
Note: Since the Config Tool expects an assembly label, you need to place an “_” (underscore)
in front of any C function name that is used (e.g. enter _edmaHwi for the C function edmaHwi).
The HWI object allows you to select the HWI dispatcher. This is found on the 2nd tab:
[Figure: the HWI Dispatcher's context save/restore wraps the call from the vector table (e.g. 15 XINT2 → HWI_nothing) to the ISR, void edmaHWI() { … }.]
The HWI dispatcher is plugged into the interrupt vector table. It saves the necessary CPU context,
and calls the function specified by the associated HWI object. Additionally, it allows the use of
DSP/BIOS scheduling functions by preventing the scheduler from running while an HWI ISR is
active.
Interrupt Initialization
Several concepts have been introduced up to this point. Let's take a moment to make sure that you
understand how to set up the CPU to receive a given interrupt.
void initHWI(void)
{
    ...
}
Since there is only one EDMA ISR, the CIPR bits can be used to tell which EDMA channels have
actually completed transfers and need to be serviced.
To use the EDMA Interrupt Dispatcher, the EDMA interrupt vector needs to be setup to call the
dispatcher.
[Figure: EDMA Interrupt Problem? Vector table: Reset → _c_int00; INT5 (EDMAINT) → _edmaHWI through the HWI Dispatcher (context save/restore); INT15 (XINT2) → HWI_nothing.]
The EDMA Interrupt Dispatcher figures out what channels have finished and calls the function
that has been associated with each CIPR bit that’s been set.
[Figure: EDMA Interrupt Dispatcher. Vector table: Reset → _c_int00; INT5 (EDMAINT) → _EDMA_intDispatcher through the HWI Dispatcher (context save/restore); INT15 (XINT2) → HWI_nothing.]
How do we know which function is associated with which channel (i.e. CIPR bit)?
The EDMA Interrupt Dispatcher needs to be told what function to call for each of the CIPR bits
that we want to cause an interrupt to the CPU. This is referred to as "hooking" a function into the
EDMA Interrupt Dispatcher. And thus, the CSL function is called EDMA_intHook().
EDMA_intHook
void initEDMA()
{
    ...
    EDMA_intHook(8, edmaHWI);
    ...
}
The EDMA_intHook function has two arguments: the CIPR bit number, and the function to be
called when that bit is set by a completed EDMA channel.
For simplicity, the example shown above specifies a CIPR bit with just the number “8”. Most
likely, though, you will use a variable to represent the CIPR bit number. A variable is a better
choice as it can be set when using the EDMA_intAlloc() function to reserve a CIPR bit for an
EDMA channel.
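The sketch below is a minimal model of what an EDMA interrupt dispatcher has to do. One EDMAINT arrives. The dispatcher then walks the CIPR bits that are both pending and enabled in CIER, clears each one, and calls the function hooked to it, passing the bit number. tccHook, edmaDispatch, and sampleIsr are hypothetical stand-ins for EDMA_intHook, EDMA_intDispatcher, and an ISR such as edmaHwi. The real dispatcher's internals may differ, and on the hardware CIPR bits are cleared by writing 1s rather than by masking a variable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dispatcher model; one handler per CIPR bit. */
typedef void (*TccHandler)(int tcc);

static TccHandler gHookTable[16];  /* NULL = nothing hooked            */
static uint16_t gCipr;             /* bits set by completed channels    */
static uint16_t gCier;             /* bits allowed to interrupt the CPU */

void tccHook(int tcc, TccHandler fn) { gHookTable[tcc] = fn; }

/* Called once per EDMAINT: service every pending AND enabled bit. */
void edmaDispatch(void)
{
    uint16_t ready;
    while ((ready = gCipr & gCier) != 0) {
        for (int n = 0; n < 16; n++) {
            if (ready & (1u << n)) {
                gCipr &= (uint16_t)~(1u << n);   /* clear before calling */
                if (gHookTable[n] != NULL)
                    gHookTable[n](n);            /* pass the CIPR bit */
            }
        }
    }
}

/* Example handler (hypothetical): counts calls per CIPR bit. */
static int gIsrCount[16];
void sampleIsr(int tcc) { gIsrCount[tcc]++; }
```

In this model, a bit that is pending but not enabled in CIER stays set and is not serviced. This matches the role the CIER bits play in the discussion.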
EDMA Auto-Initialization
Interrupting the CPU is nice for keeping the EDMA and CPU in sync. This allows the CPU to
know when to perform an action based upon EDMA activity, such as refilling the sine-wave
buffer.
But, how does the EDMA channel get reprogrammed to perform another block transfer?
The CPU could go off and program the EDMA for a new transfer during the ISR. Are there any
negatives to this? Yes, it takes valuable CPU time. What if we could tell the EDMA what job to
do next; that is, in advance?
Notice that the EDMA channel registers actually change as the transfer takes place. The source
address, destination address, and the transfer count are good examples of values that may change
as the transfer occurs. If these values have changed, they can't be used to do the same transfer
again without being refreshed.
The EDMA has a set of "reload" registers that can be configured like an EDMA channel. Each
channel can be linked to a reload set of registers. In this way, the values in the reload registers can
be used to "reload" the “used” EDMA channel.
The reload register sets can also be linked to other reload sets; thus a linked-list can be created.
• Offloads the CPU: can reinitialize all six registers of an EDMA channel
• Next transfer is specified by the Link Address
• Perform simple re-initialization or create a linked list of events
• Useful for ping-pong buffers, data sorting, circular buffers, etc.
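The reload mechanism can be sketched in a few lines of C. The model assumes a tiny parameter RAM in which entry 0 is the channel and entries 1–3 are reload sets. Of the six registers, only SRC, DST, CNT, and the link field are modeled, and src/dst are indices into an array rather than addresses. ParamSet and runChannel are hypothetical names. When the count reaches zero, the whole linked reload set is copied over the channel, including its own link. That is why linking a reload set to itself keeps the channel running indefinitely.

```c
#include <stdint.h>

/* Hypothetical parameter-set model (not the real PaRAM layout). */
typedef struct {
    uint32_t src, dst;  /* element indices into a memory array */
    uint32_t cnt;       /* elements left in the frame          */
    int      link;      /* reload set index, or -1 for no link */
} ParamSet;

static ParamSet gPram[4];   /* [0] = channel, [1..3] = reload sets */

/* Run channel 0 until its count hits zero, then auto-initialize it. */
void runChannel(uint16_t *mem)
{
    ParamSet *ch = &gPram[0];
    while (ch->cnt > 0) {
        mem[ch->dst++] = mem[ch->src++];  /* move one 16-bit element */
        ch->cnt--;
    }
    if (ch->link >= 0)
        *ch = gPram[ch->link];  /* copy reload set, including its link */
}
```

To mirror the lab, the reload set links to itself (as with EDMA_link(hEdmaReload, hEdmaReload)), and the channel starts with the same parameters plus a link to that set.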
6 Steps to Auto-Initialization
Here is a nice 6-step procedure for setting up EDMA Auto-Initialization.
EDMA_link(hMyReload, hMyReload)
Here’s a code summary of the six steps required for setting up a channel for linking:
Summary
Here is the complete flow of EDMA interrupts, from EDMA channel to CPU:
While the flow from EDMA completion to CPU interrupt may be a bit involved, it provides for an
extremely flexible, and thus capable, EDMA controller. (In fact, the EDMA is often called a co-
processor due to its extreme flexibility.)
Part 2:
Enabling CPU Ints
_FMK builds a 32-bit mask that can be used to OR a value into a register. In our case, we’re
using it to put the CIPR value allocated by EDMA_intAlloc into the TCC field of the Options
register. Note: when using this OR operation, the previous TCC value must already be “0000”;
otherwise the old and new bits would be combined. This is why we set TCC = 0 in the global EDMA configuration.
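The sketch below shows why the OR works only when TCC starts at zero. tccFmk is a hypothetical stand-in for EDMA_FMK(OPT, TCC, x) and assumes that the TCC field occupies bits 19–16 of OPT. tccReplace shows a clear-then-set alternative; this is a design choice of the sketch, not something the lab requires.

```c
#include <stdint.h>

/* Hypothetical equivalent of EDMA_FMK(OPT, TCC, x); TCC assumed at 19-16. */
#define TCC_SHIFT 16u
#define TCC_MASK  (0xFu << TCC_SHIFT)

uint32_t tccFmk(uint32_t tcc) { return (tcc << TCC_SHIFT) & TCC_MASK; }

/* The lab's approach: OR the field in (correct only if TCC was 0). */
uint32_t tccOrIn(uint32_t opt, uint32_t tcc) { return opt | tccFmk(tcc); }

/* Alternative: clear the old field first, then OR in the new value. */
uint32_t tccReplace(uint32_t opt, uint32_t tcc)
{
    return (opt & ~TCC_MASK) | tccFmk(tcc);
}
```

For example, ORing TCC = 4 into an OPT word whose TCC field already holds 3 produces 7, not 4.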
Here is the complete summary of the 6-step procedure for setting up an EDMA channel to
interrupt the CPU.
What about setting up hardware interrupts?
When the transfer completes… what happens?
[Figure: EDMA ISR flow. Channel count reaches 0 → EDMAINT → C6000 CPU → HWI Dispatcher → EDMA Dispatcher → EDMA ISR.]
The flow described above is specific to the upcoming lab exercise. Though much of it is generic,
two of the steps are specific:
• The lab asks you to set up auto-initialization for the channel we’re using. This may, or may
not, be what you need in another system.
• The final step triggers the EDMA to run using the EDMA_setChannel() function. Often
this is done automatically by interrupt events. In Lab 5, we will use EDMA_setChannel(),
but the next lab uses the McBSP to trigger the EDMA to run.
• ECR: EDMA_clearChannel
• ER: EDMA_getChannel
• EER: EDMA_enableChannel, EDMA_disableChannel
• CCER: EDMA_enableChaining, EDMA_disableChaining
• CIPR: EDMA_intAlloc, EDMA_intFree, EDMA_intTest, EDMA_intClear
• CIER: EDMA_intEnable, EDMA_intDisable
Here’s the same summary, but we’ve added the functions’ arguments and return values.
• ESR (Event Set Register): EDMA_setChannel(h) sets the ESR bit, which sets the corresponding ER bit
• ECR (Event Clear Register): EDMA_clearChannel(h) sets the ECR bit, which clears the corresponding ER bit
• ER (Event Register): 1 or 0 = EDMA_getChannel(h)
• EER (Event Enable Register): EDMA_enableChannel(h), EDMA_disableChannel(h)
• CIPR (Chan Interrupt Pending Reg): tcc or -1 = EDMA_intAlloc(tcc or -1); EDMA_intFree(tcc); 1 or 0 = EDMA_intTest(tcc); EDMA_intClear(tcc)
• CIER (Chan Interrupt Enable Reg): EDMA_intEnable(tcc), EDMA_intDisable(tcc)
• CCER (Chan Chaining Enable Reg): EDMA_enableChaining(h), EDMA_disableChaining(h)
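The set and clear registers in this summary follow a write-1 pattern. Writing a 1 to an ESR bit sets the matching ER bit, writing a 1 to an ECR bit clears it, and 0s are ignored. The sketch below models only that behavior. EventRegs, writeEsr, writeEcr, and readEr are hypothetical names, not CSL functions.

```c
#include <stdint.h>

/* Hypothetical model of ER with write-1-to-set (ESR) and
 * write-1-to-clear (ECR) companion registers. */
typedef struct { uint32_t er; } EventRegs;

void writeEsr(EventRegs *r, uint32_t v) { r->er |= v; }   /* like EDMA_setChannel   */
void writeEcr(EventRegs *r, uint32_t v) { r->er &= ~v; }  /* like EDMA_clearChannel */

int readEr(const EventRegs *r, int cha)                   /* like EDMA_getChannel   */
{
    return (int)((r->er >> cha) & 1u);
}
```

With this design, software can set or clear a single event bit without a read-modify-write of ER. The CPU's ISR/ICR registers (IRQ_set/IRQ_clear) apply the same idea to IFR.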
Exercise
Exercise 1 (Review)
• Complete the following Interrupt Service Routine.
Here are a few hints:
Follow the code outlined on the “EDMA ISR” slide.
Don’t forget, though, that our exercise (and the upcoming lab) uses
different variable names than those used in the slide’s example code.
To “fill the buffer”, what function did we use in Labs 2 and 4 to create
a buffer of sine wave data?
void edmaHwi(void)
{
}
Exercise 2: Step 1
1. Change gEdmaConfig so that it will: (Just cross out the old value and jot in the new one)
Interrupt the CPU when transfer count reaches 0
Auto-initialize and keep running
EDMA_Config gEdmaConfig = {
EDMA_OPT_RMK(
EDMA_OPT_PRI_LOW, // Priority?
EDMA_OPT_ESIZE_16BIT, // Element size?
EDMA_OPT_2DS_NO, // 2 dimensional source?
EDMA_OPT_SUM_INC, // Src update mode?
EDMA_OPT_2DD_NO, // 2 dimensional dest?
EDMA_OPT_DUM_INC, // Dest update mode?
EDMA_OPT_TCINT_NO, // Cause EDMA interrupt?
EDMA_OPT_TCC_OF(0), // Transfer complete code?
EDMA_OPT_LINK_NO, // Enable link parameters?
EDMA_OPT_FS_YES ), // Use frame sync?
... };
4. Hook the ISR function so it is called whenever the appropriate CIPR bit
is set and the CPU is interrupted.
Exercise 2: Step 5
5. Enable the CPU to accept the EDMA interrupt. (Hint: Add 3 lines of code.)
void initHwi(void)
{
}
7. Allocate one of the Reload sets: (Hint: hEdmaReload gets this value)
9. Modify both the EDMA channel and the reload set to link to the
reload set of parameters:
Lab 5
Overview
In lab 5, you'll have an opportunity to test everything that you have learned about interrupts and
auto-initialization.
Pseudo Code
1. CPU generates 32 sine values into buf0
2. EDMA transfers 32 elements from buf0 to buf1
3. EDMA sends “transfer complete” interrupt to CPU
4. Go to step 1
• To use CSL to configure the EDMA interrupt to the CPU in order to generate another
buffer full of sine wave values.
• To change the configuration of the EDMA so that it uses auto-initialization to setup the
next transfer.
Lab Overview
This lab will follow the basic outline of the discussion material. Here's how we are going to go
about this:
• First, we're going to configure the CPU to respond to interrupts and set up the interrupt
vector using the .cdb file. We're going to configure the CPU to call the EDMA dispatcher
that will call our function to process the EDMA interrupt.
• Next, we'll write the function that we want the EDMA dispatcher to call.
• Then, we'll change some settings in the EDMA configuration and the initEdma( ) code.
One thing that we'll definitely need to do is to tell the EDMA dispatcher to call the
function that we wrote in the previous step.
• Finally, we'll configure the EDMA channel to use auto-initialization.
During this part of the lab, we will be somewhat following the "6-step procedure to program the
EDMA to interrupt the CPU" outlined on pages 5-21 to 5-23. Feel free to flip back and review
that material before trying to write the code.
Initializing Interrupts
We need to set up two things: (1) enable the CPU to respond to the EDMA interrupt (IER) and
(2) turn on global interrupts (GIE). Refer to the discussion material which outlines the 5-step CSL
procedure for initializing an HWI.
7. Add a new function called initHwi( ) at the end of your code in main.c
We will use this function to initialize hardware interrupts. We will add a call to it in main( )
in a few steps.
8. Add a call to IRQ_enable( ) in initHwi( ) to enable the EDMA interrupt to the CPU
This connects the EDMA interrupt to the CPU via the IER register.
9. Enable CPU interrupts globally and terminate the initHwi() function
Add the CSL function call that enables global interrupts (GIE). Add a closing brace to the
function to finish it off.
10. Add the proper include file for interrupts to the top of main.c in the "include" area
11. Add a call to initHwi( ) in main( ) after the call to initEdma( )
Hint: Whenever the instructions ask you to “add a new function”…don’t forget to
prototype it! We've already added it to the header file for you for inclusion in other
files.
13. Add a new function called edmaHwi( ) at the end of your edma.c code
This function will serve as our Interrupt Service Routine (ISR) that will get called by the
EDMA interrupt dispatcher. The EDMA interrupt dispatcher passes the CIPR bit of the
EDMA channel that caused the interrupt to the edmaHwi( ) routine. We will not be using this
argument for now, but we will need it later. So, go ahead and write the function with the
argument in the definition like this:
Modify initEdma( )
19. Configure the EDMA Channel to use a TCC Value
Configure the channel using your new variable. (It’s a two-step process.)
• Inside the initEdma function (after the _open) set gXmtTCC equal to “any” TCC value
as shown in the discussion material.
• Then set the actual TCC field (in the configuration) to this value.
This reserves a specific TCC value so that no other channel can use it.
After referring to the material, you hopefully came up with these two steps to be added to
initEdma( ):
gXmtTCC = EDMA_intAlloc(-1);
gEdmaConfig.opt |= EDMA_FMK(OPT, TCC, gXmtTCC);
20. Hook the edmaHwi( ) function into the EDMA Interrupt Dispatcher
The EDMA Interrupt Dispatcher automatically calls a function for each of the CIPR bits that
get set by an EDMA interrupt and that are enabled.
We need to tell it what function to call when the transmit interrupt fires. The transmit
interrupt is going to assert a given CIPR bit when it occurs. So, we need to tell the EDMA
Interrupt Dispatcher which function is tied to that CIPR bit. Refer back to the lecture material
if you can't figure out which API call to use here, or how to use it. Don't forget about online
help inside CCS as well. Add this code anywhere in the initEdma( ) function that makes sense
to you.
21. Clear any spurious interrupts and enable the EDMA interrupt
At the end of the initEdma( ) function in edma.c, add the following calls to clear the
EDMA’s channel interrupt pending bit associated with the channel we’re using (i.e. clear the
appropriate CIPR bit). Also, enable the EDMA interrupt (i.e. set the required CIER bit). Note,
the same TCC value used earlier is required for both these operations.
EDMA_intClear(gXmtTCC);
EDMA_intEnable(gXmtTCC);
We will be following the "6 Steps to Auto-Initialization" procedure outlined earlier. Please feel
free to refer back to this material to help you understand this part of the lab.
22. Enable the link parameters
Change the LINK field to YES in the EDMA Configuration Structure. This will cause the
channel to link to a reload entry and refresh the channel with its original contents – this is
called autoinitialization. The next few steps will set up the channel’s link address to the
reload entry.
23. Add another global EDMA handle named hEdmaReload to edma.c
24. Initialize the new reload entry handle
In initEdma( ), add the following API call to initialize the reload handle (hEdmaReload) to
ANY reload entry location:
hEdmaReload = EDMA_allocTable(-1);
You can see an example of this in the discussion material. This handle points to the reload
entry that we will initialize with the original channel's EDMA config structure.
EDMA_config(hEdmaReload, &gEdmaConfig);
26. Link the channel and reload entry to the reload handle
After the channel finishes the first transfer, we need to tell it where to link to for the next
transfer. We need to link the channel to the new reload entry handle (acquired in the previous
step) AND we need to link the reload entry to itself for all succeeding transfers. This is the
basis of autoinitialization. Use the proper API to link the channel to the reload entry and use
that same API to link the reload entry to itself. Go ahead and add this code to initEdma( ).
You’re Done
Optional Topics
Saving Context in HWIs
[Figure: Interrupt Keyword. While main() runs, an interrupt occurs; the vector table branches to interrupt myISR(void), which performs a context save, runs the ISR body, performs a context restore, and returns with B IRP to the next instruction in main().]
Interrupt Keyword
When using the interrupt keyword:
• Compiler handles register preservation
• Returns to original location
• No arguments (void)
• No return values (void data type)
The HWI dispatcher…
[Figure: HWI Dispatcher. When an interrupt occurs in main(), the vector table branches to the HWI Dispatcher; execution then resumes at the next instruction.]
2. Interrupt Keyword
• Provides the highest code optimization (by a little bit)
Notes:
• Choose between the HWI dispatcher and the interrupt keyword on an interrupt-by-interrupt basis
Caution:
• For each interrupt, use only one of these two interrupt context methods
Alternatively ...
myASM_ISR:
HWI_enter C62_ABTEMPS, 0, 0xffff, 0
If using assembly, you can handle interrupt context save/restore and return either with
the HWI dispatcher or in your own code.
If you don’t use the HWI Dispatcher, the HWI_enter/HWI_exit macros can handle:
• Context save (save/restore registers)
• Return from interrupt
• Re-enable interrupts (to allow nesting interrupts)
HWI_enter: modifies IER and re-enables GIE
HWI_exit: disables GIE, then restores IER
DMA/EDMA Comparison
                          DMA                              EDMA
CPU interrupts            4                                1
Interrupt conditions      six: 3 for count, 3 for errors   Count = 0
Reload (auto-init) sets   ~2                               69 (C621x/C671x), 21 (C64x)
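A little arithmetic checks the EDMA reload counts in the DMA/EDMA comparison. The calculation assumes a 2 KB parameter RAM and 24-byte entries, one 32-bit word for each of the six registers OPT, SRC, CNT, DST, IDX, and RLD. It also assumes 16 channels on the C621x/C671x and 64 on the C64x. Entries not used by channels are left for reload sets. reloadSets is a hypothetical helper.

```c
/* Assumed sizes: 2 KB parameter RAM, six 32-bit registers per entry. */
enum { PRAM_BYTES = 2048, ENTRY_BYTES = 6 * 4 };

int reloadSets(int channels)
{
    int entries = PRAM_BYTES / ENTRY_BYTES;   /* 2048 / 24 = 85 (8 bytes left over) */
    return entries - channels;                /* what remains after channel entries */
}
```

Under these assumptions, 85 − 16 = 69 and 85 − 64 = 21, matching the comparison.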
[Figure: EDMA channel/interrupt example. Channel 0 (DSPINT): EER0 = 0, TCINT = 0, TCC = 8, CIER0 = 0. Channel 1 (TINT0): EER1 = 1, TCINT = 1, TCC = 1, CIER1 = 1 → EDMAINT. Channel 4 (EXT_INT4): EER4 = 0, TCINT = 0, TCC = 14, CIER4 = 0. Channel 8 (EDMA_TCC8): CCR8 = 0, EER8 = 0, TCINT = 1, TCC = 4, CIER8 = 0. Channel 15 (REVT1). Options register: TCINT = bit 20, TCC = bits 19–16. CIPR8–CIPR11 connect to CCR8–11.]
Enabling Interrupts
What events/conditions are required to recognize an interrupt?
[Figure: each interrupt line (INTx, INTy) reaches the ‘C6000 CPU only through its IER bit (“individual switch”) and the CSR GIE bit (“master switch”).]
IER (Interrupt Enable Register); bits 31–16 are reserved:
Bit:  15   14   13   12   11   10   9    8    7    6    5    4    3    2    1     0
      IE15 IE14 IE13 IE12 IE11 IE10 IE9  IE8  IE7  IE6  IE5  IE4  rsv  rsv  NMIE  1
IE4–IE15 and NMIE: R, W, +0; bit 0: R, +1
// To enable, then disable the timer0 int
IRQ_enable(IRQ_EVT_TINT0);
IRQ_disable(IRQ_EVT_TINT0);
[Figure: INTx timing. INTx is held low, then returns high; the interrupt is latched, then recognized by the CPU.]
• To generate a valid interrupt signal, hold INTx low for 2+ cycles, then high for 2+ cycles.
• The interrupt is latched on the rising edge of CLKOUT1 following a rising edge of INTx (if the above timing is met).
• The interrupt is recognized by the CPU one cycle later.
Interrupt Vectors
[Figure: Interrupt Service Table made of interrupt service fetch packets (ISFPs): 0h RESET, 20h NMI, two reserved (rsvd) entries, 80h INT4, A0h INT5, C0h INT6, then INT7 through INT15, ending at 200h. The RESET ISFP branches to _c_int00 (boot.c); vectors are set through the Config Tool's HWI properties (e.g. HWI_RESET).]
RESET:
    mvkl  _c_int00,b0
    mvkh  _c_int00,b0
    b     b0
    nop
    nop
    nop
    nop
    nop
nmi_vector:
    mvkl  _nmi_isr,b0
    mvkh  _nmi_isr,b0
    b     b0
    nop
    nop
    nop
    nop
    nop
Vector table can be relocated ...
ISTP (Interrupt Service Table Pointer): bits 31–10 = ISTB field (R, W, +0); bits 9–0 (R, +0)
• ISTP is one of the CPU's control registers
• The ISTB field points to the vector table
• Allows you to locate the Interrupt Vector Table on any 1K boundary
• Configure it in the CDB file, or use IRQ_setVecs()
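The address arithmetic behind ISTP is simple enough to sketch. The sketch assumes, as the slides show, that the ISTB field is ISTP bits 31–10 and that each vector is an eight-instruction fetch packet (8 x 4 bytes = 0x20). Under those assumptions, vector n lives at base + n x 0x20, which reproduces the 80h (INT4) and A0h (INT5) offsets in the vector table. vectorAddress and isValidVectorBase are hypothetical names. In practice, use the CDB file or IRQ_setVecs().

```c
#include <stdbool.h>
#include <stdint.h>

#define FETCH_PACKET_BYTES 0x20u   /* 8 instructions x 4 bytes */

/* A table base must sit on a 1K boundary (low 10 bits clear). */
bool isValidVectorBase(uint32_t addr) { return (addr & 0x3FFu) == 0; }

/* Address of interrupt n's fetch packet (n = 0..15). */
uint32_t vectorAddress(uint32_t istp, unsigned intNum)
{
    uint32_t base = istp & ~0x3FFu;   /* keep the ISTB field only */
    return base + intNum * FETCH_PACKET_BYTES;
}
```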
Interrupt Selection
Interrupt Multiplexer High (INT10–INT15): INTSEL15 (bits 29–26), INTSEL14 (24–21), INTSEL13 (19–16), INTSEL12 (13–10), INTSEL11 (8–5), INTSEL10 (3–0)
Interrupt Multiplexer Low (INT4–INT9): INTSEL9 (bits 29–26), INTSEL8 (24–21), INTSEL7 (19–16), INTSEL6 (13–10), INTSEL5 (8–5), INTSEL4 (3–0)
C6701 interrupt sources (Sel #):
0000b DSPINT (HPI)      1000b DMA_INT0
0001b TINT0             1001b DMA_INT1
0010b TINT1             1010b DMA_INT2
0011b SD_INT            1011b DMA_INT3
0100b EXT_INT4          1100b XINT0
0101b EXT_INT5          1101b RINT0
0110b EXT_INT6          1110b XINT1
0111b EXT_INT7          1111b RINT1
• Interrupt Selector registers are memory-mapped
• Configured by HWI objects in the Config Tool
• Or, set dynamically using IRQ_map()
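Each INTSELn field holds one of the 4-bit selector values listed for the C6701. The sketch below writes a selector into the correct field of a multiplexer register image. It uses the field offsets from the slide: bits 3–0, 8–5, 13–10, 19–16, 24–21, and 29–26, for the lowest through highest interrupt in each register. muxSet is a hypothetical name. It only computes a value and does not replace IRQ_map().

```c
#include <stdint.h>

/* Field start bits within MUXL (INT4-9) or MUXH (INT10-15), per the slide. */
static const unsigned kSelShift[6] = { 0, 5, 10, 16, 21, 26 };

/* Return muxReg with INTSEL<intNum> (4-15) replaced by the 4-bit sel. */
uint32_t muxSet(uint32_t muxReg, unsigned intNum, uint32_t sel)
{
    unsigned shift = kSelShift[(intNum - 4u) % 6u];
    return (muxReg & ~(0xFu << shift)) | ((sel & 0xFu) << shift);
}
```

For example, routing XINT0 (1100b) to INT8 places 0xC in MUXL bits 24–21. When the rest of the register is zero, the result is 0x01800000.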
IRP (interrupt return pointer), bits 31–0, R, W, +x:
    B .S2 IRP   ; return, PGIE → GIE
    NOP 5
NRP (NMI return pointer), bits 31–0, R, W, +x:
    B .S2 NRP   ; return, NMIE = 1
    NOP 5
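Two Boolean flags are enough to model the PGIE/GIE handshake noted for B IRP. The sketch assumes the standard C6000 behavior. When a maskable interrupt is taken, the CPU copies GIE into PGIE and clears GIE, so another interrupt cannot nest unless the ISR re-enables GIE. B IRP then copies PGIE back into GIE. Csr, takeInterrupt, and returnFromIrp are hypothetical names.

```c
#include <stdbool.h>

/* Only the two CSR bits involved are modeled. */
typedef struct { bool gie, pgie; } Csr;

/* Interrupt taken: remember GIE in PGIE, then disable interrupts. */
void takeInterrupt(Csr *csr) { csr->pgie = csr->gie; csr->gie = false; }

/* B IRP: restore GIE from PGIE. */
void returnFromIrp(Csr *csr) { csr->gie = csr->pgie; }
```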
• ISR (Interrupt Set Register): IRQ_set sets the ISR bit, which sets the corresponding IFR bit
• ICR (Interrupt Clear Register): IRQ_clear sets the ICR bit, which clears the corresponding IFR bit
• IFR (Interrupt Flag Register): IRQ_test
• IRQ_map, IRQ_config (event mapping and configuration)
• IER (Interrupt Enable Register): IRQ_enable, IRQ_disable, IRQ_restore
• IRP (Interrupt Return Pointer)
• NRP (Non-maskable Int. Return Ptr.)
• ISTP (Interrupt Service Table Ptr.): IRQ_setVecs, or use the Config Tool
void initHWI(void)
{
    IRQ_enable(IRQ_EVT_EDMAINT);
    IRQ_globalEnable();
}
Exercise 2
Exercise 2: Step 1
1. Change gEdmaConfig so that it will: (Just cross out the old value and jot in the new one)
Interrupt the CPU when transfer count reaches 0
Auto-initialize and keep running
EDMA_Config gEdmaConfig = {
EDMA_OPT_RMK(
EDMA_OPT_PRI_LOW, // Priority?
EDMA_OPT_ESIZE_16BIT, // Element size?
EDMA_OPT_2DS_NO, // 2 dimensional source?
EDMA_OPT_SUM_INC, // Src update mode?
EDMA_OPT_2DD_NO, // 2 dimensional dest?
EDMA_OPT_DUM_INC, // Dest update mode?
EDMA_OPT_TCINT_YES, // Cause EDMA interrupt? (was EDMA_OPT_TCINT_NO)
EDMA_OPT_TCC_OF(0), // Transfer complete code?
EDMA_OPT_LINK_YES, // Enable link parameters? (was EDMA_OPT_LINK_NO)
EDMA_OPT_FS_YES ), // Use frame sync?
... };
T TO
Technical Training
Organization
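To see what EDMA_OPT_RMK() packs together, here is a hosted-C sketch that builds the OPT word by hand. edmaOpt() and the enum names are hypothetical, and the bit positions (PRI 31-29, ESIZE 28-27, 2DS 26, SUM 25-24, 2DD 23, DUM 22-21, TCINT 20, TCC 19-16, LINK 1, FS 0) and encodings are assumed from the C621x/C671x EDMA documentation; C64x devices add further fields. With the Step 1 answers (TCINT and LINK set to YES), the result is 0x49300003.

```c
#include <stdint.h>

/* Assumed C621x/C671x encodings. */
enum { PRI_HIGH = 1, PRI_LOW = 2 };
enum { ESIZE_32 = 0, ESIZE_16 = 1, ESIZE_8 = 2 };
enum { UM_NONE = 0, UM_INC = 1, UM_DEC = 2, UM_IDX = 3 };

/* Pack the OPT fields at their assumed bit positions. */
static uint32_t edmaOpt(uint32_t pri, uint32_t esize, uint32_t twoDs,
                        uint32_t sum, uint32_t twoDd, uint32_t dum,
                        uint32_t tcint, uint32_t tcc, uint32_t link,
                        uint32_t fs)
{
    return ((pri   & 0x7u) << 29) |
           ((esize & 0x3u) << 27) |
           ((twoDs & 0x1u) << 26) |
           ((sum   & 0x3u) << 24) |
           ((twoDd & 0x1u) << 23) |
           ((dum   & 0x3u) << 21) |
           ((tcint & 0x1u) << 20) |
           ((tcc   & 0xFu) << 16) |
           ((link  & 0x1u) << 1)  |
           (fs     & 0x1u);
}
```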
4. Hook the ISR function so it is called whenever the appropriate CIPR bit
is set and the CPU is interrupted.
EDMA_intHook(gXmtTCC, edmaHWI);
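The idea behind EDMA_intHook() is that a single EDMA interrupt reaches the CPU, and the pending bits in CIPR tell the dispatcher which hooked function(s) to call. The hosted-C sketch below is a hypothetical model of that dispatch, not the CSL implementation; intHook(), edmaDispatch(), hookTable and lastTcc are illustrative names, and the demo edmaHWI() only records the TCC it was called with.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*EdmaIsr)(int tcc);

static EdmaIsr hookTable[16];     /* one slot per transfer complete code */
static int lastTcc = -1;

/* Demo hook standing in for the lab's edmaHWI(): records the TCC it saw. */
static void edmaHWI(int tcc) { lastTcc = tcc; }

/* Model of registering an ISR for a TCC (compare EDMA_intHook). */
static void intHook(int tcc, EdmaIsr fn)
{
    if (tcc >= 0 && tcc < 16)
        hookTable[tcc] = fn;
}

/* Service every CIPR bit that is also enabled in CIER. Real hardware
   clears a CIPR bit when a 1 is written to it; here we just clear it.
   Returns the bits that were serviced. */
static uint32_t edmaDispatch(uint32_t *cipr, uint32_t cier)
{
    uint32_t handled = 0;
    uint32_t pending;

    /* Repeat until no enabled bits remain, in case new ones arrive. */
    while ((pending = *cipr & cier & 0xFFFFu) != 0) {
        for (int tcc = 0; tcc < 16; tcc++) {
            uint32_t bit = 1u << tcc;
            if (pending & bit) {
                *cipr &= ~bit;
                handled |= bit;
                if (hookTable[tcc] != NULL)
                    hookTable[tcc](tcc);
            }
        }
    }
    return handled;
}
```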
Exercise 2: Step 5
5. Enable the CPU to accept the EDMA interrupt. (Hint: Add 3 lines of code.)
#include <csl_irq.h>
void initHwi(void)
{
    IRQ_enable(IRQ_EVT_EDMAINT);   // enable the CPU interrupt the EDMA event is mapped to
    IRQ_globalEnable();            // set GIE
}
7. Allocate one of the Reload sets: (Hint: hEdmaReload gets this value)
hEdmaReload = EDMA_allocTable( -1 );
9. Modify both the EDMA channel and the reload set to link to the
reload set of parameters:
EDMA_link(hEdma, hEdmaReload);
EDMA_link(hEdmaReload, hEdmaReload);
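To see why linking the reload set to itself keeps the transfer running indefinitely, here is a hypothetical hosted-C model of EDMA linking. Params, serviceEvent() and the array-index "addresses" are illustrative only; the model moves one element per event and ignores element size, indexing and frame sync.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified parameter set: addresses are indices into a memory array. */
typedef struct Params {
    uint32_t src, dst, cnt;
    int linkEnabled;              /* models OPT.LINK */
    const struct Params *link;    /* models the LINK address field */
} Params;

/* Service one sync event by moving one element. When the count expires,
   reload from the linked set (auto-initialization) and return 1, which is
   where TCINT would interrupt the CPU. */
static int serviceEvent(Params *chan, uint16_t *mem)
{
    mem[chan->dst] = mem[chan->src];
    chan->src++;
    chan->dst++;
    if (--chan->cnt == 0) {
        if (chan->linkEnabled && chan->link != NULL)
            *chan = *chan->link;  /* copy the reload set into the channel */
        return 1;
    }
    return 0;
}
```

Because the reload set links to itself, each reload leaves the channel pointing back at the same set, so the cycle repeats; this mirrors EDMA_link(hEdmaReload, hEdmaReload).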
Introduction
In this module, we will learn how to program the C6000 McBSP using the CSL. First, we’ll learn
how the McBSP operates and the choices we can make, and then how to use the CSL to program
the selected options. In the lab, you will finally use the DSK to make some “noise”. If it sounds
like a song, you got it right. If it really is just noise…then you’ll have some debugging to do…
Learning Objectives
Goals for Module 6…
[Figure: ADC → McBSP Rcv → EDMA (RCVCHAN) → gBufRcv → CPU (COPY, +) → gBufXmt → EDMA (XMTCHAN) → McBSP Xmt → DAC]
Chapter Topics
McBSP
McBSP Overview
The Multi-Channel Buffered Serial Port (McBSP) is an extremely flexible serial port. The following graphic takes a humorous approach to describing its many supported standards and capabilities.
[Figure: "Could this be you?" Standards and devices surrounding the McBSP: T1/E1, SPI, IOM-2, Codecs, AICs, ST-Bus, MVIP, AC'97, IIS, μ-Law/A-Law, Multi-Channel, Full-duplex. Caption: "The McBSP is an extremely capable serial port."]
Block Diagram
The McBSP is a full-duplex, synchronous serial port. Either the CPU or the EDMA can read and write its memory-mapped data registers (DRR, DXR).
[Figure: McBSP block diagram. Receive path: DR pin → RSR → RBR → Expand (optional) → DRR (32 bits) → internal bus to CPU/EDMA. Transmit path: CPU/EDMA → DXR → Compress (optional) → XSR → DX pin. Clock and frame sync pins: CLKR, CLKX, CLKS, FSR, FSX. McBSP control registers: SPCR, RCR, XCR, SRGR, PCR.]
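[Figure: an FS pulse followed by serial data bits a1 a0 b7 b6 b5 b4 b3 b2 b1 b0 on D; one bit per clock, and a group of bits forms a word.]
• "Bit": one data bit per serial port clock period
• "Word" or "channel": contains the number of bits specified by WDLEN1 (8, 12, 16, 20, 24, or 32)
Serial port registers: SP Ctrl (SPCR), Rcv Ctrl (RCR), Xmt Ctrl (XCR), Rate (SRGR), Pin Ctrl (PCR)
RWDLEN1: RCR bits 7-5 (receive word length)
XWDLEN1: XCR bits 7-5 (transmit word length)
[Figure: an FS pulse followed by words w6 w7 w0 w1 w2 w3 w4 w5 w6 w7 on D; the words following a frame sync form a frame.]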
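To make the RWDLEN1/XWDLEN1 encoding concrete, here is a minimal hosted-C sketch. The helper names are hypothetical, and the code-to-length table (000b = 8 bits through 101b = 32 bits) is assumed from the McBSP reference guide rather than taken from this page. As a usage check, the XCR value 0x00000040 in the McBSP setup example later in this chapter decodes to WDLEN1 = 010b, i.e. 16-bit words.

```c
#include <stdint.h>

/* Encode a word length for the WDLEN1 field (assumed encoding:
   000=8, 001=12, 010=16, 011=20, 100=24, 101=32 bits).
   Returns -1 for an unsupported length. */
static int wdlenCode(int bits)
{
    static const int lengths[6] = { 8, 12, 16, 20, 24, 32 };
    for (int code = 0; code < 6; code++)
        if (lengths[code] == bits)
            return code;
    return -1;
}

/* Return ctrl (an RCR or XCR value) with bits 7-5 replaced by the
   encoded length; ctrl is returned unchanged for unsupported lengths. */
static uint32_t setWdlen1(uint32_t ctrl, int bits)
{
    int code = wdlenCode(bits);
    if (code < 0)
        return ctrl;
    return (ctrl & ~(0x7u << 5)) | ((uint32_t)code << 5);
}
```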
The McBSP's receive and transmit bit clocks (CLKR, CLKX) can each be set up as either an input or an output pin.
[Figure: FSR, FSX, CLKR, CLKX: input or output?]
When used as an output, the clock generated by the McBSP (CLKG) can be divided down either from the C6000's internal clock or from a separate external clock input (CLKS).
Frame sync signals can also be either generated by the McBSP or input to it. When generated, you can define their period and pulse width. Optionally, the transmit frame sync (FSX) can be generated automatically any time a value is copied from DXR into the Transmit Shift Register (XSR).
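The clock divider and frame sync settings described above live in the Sample Rate Generator Register (SRGR). The hosted-C sketch below decodes an SRGR value; the struct and function names are hypothetical, and the field positions (CLKSM 29, FSGM 28, FPER 27-16, FWID 15-8, CLKGDV 7-0) are assumed from the McBSP reference guide. Decoding 0x20001363, the SRGR value in the setup example later in this chapter, gives CLKSM = 1, FSGM = 0, FPER = 0, FWID = 19 and CLKGDV = 99, so CLKG is the input clock divided by 100. The 100 MHz input in the check is an illustrative number, not the DSK's actual clock.

```c
#include <stdint.h>

typedef struct {
    unsigned clksm;   /* 1 = CLKG derived from the internal clock */
    unsigned fsgm;    /* 0 = FSX on each DXR-to-XSR copy */
    unsigned fper;    /* frame period = FPER + 1 CLKG cycles */
    unsigned fwid;    /* frame sync pulse width = FWID + 1 CLKG cycles */
    unsigned clkgdv;  /* CLKG = input clock / (CLKGDV + 1) */
} SrgrFields;

static SrgrFields decodeSrgr(uint32_t srgr)
{
    SrgrFields f;
    f.clksm  = (srgr >> 29) & 0x1u;
    f.fsgm   = (srgr >> 28) & 0x1u;
    f.fper   = (srgr >> 16) & 0xFFFu;
    f.fwid   = (srgr >> 8)  & 0xFFu;
    f.clkgdv = srgr & 0xFFu;
    return f;
}

/* Bit clock produced by the divider for a given input clock (Hz). */
static uint32_t clkgHz(uint32_t inputHz, unsigned clkgdv)
{
    return inputHz / (clkgdv + 1u);
}
```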
The McBSPs can generate CPU interrupts for a number of conditions (as shown on the next page). In this workshop, we will only use (and study) one of these conditions: data ready.
When the receive channel has data ready to be read (i.e. data has moved from RBR to DRR), it sets the RRDY bit in the Serial Port Control Register (SPCR). This bit can be used to generate an interrupt to the CPU (RINTx) and/or a synchronization event to the EDMA (REVTx).
Similarly, the transmit side of the serial port sets the XRDY bit in the control register and can generate the XINTx interrupt and the XEVTx event, respectively.
McBSP Events/Interrupts
[Figure: received data moves from RBR to DRR]
R/XRDY displays the "status" of the ports:
0: not ready
1: ready to read/write
We use the two terms "interrupt" and "event" to differentiate the destinations of the various synchronization signals. While the signals may be generated by (and thus sent from) a common source, the structures in the CPU and EDMA that deal with them are entirely separate and distinct.
As mentioned on the last page, the McBSP can generate CPU interrupts for various conditions. Shown below are the conditions, along with the bit fields used to select which condition will generate an interrupt.
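CPU interrupt conditions, transmit side (XINTM selects which one drives XINT):
XINTM = 00: XRDY
XINTM = 01: End of Block (XMT)
XINTM = 10: New FSX (frame begin)
XINTM = 11: Transmit Sync Error
RINTM selects the corresponding receive conditions for RINT.
Serial port registers: SP Ctrl (SPCR), Rcv Ctrl (RCR), Xmt Ctrl (XCR), Rate (SRGR), Pin Ctrl (PCR)
SPCR bit fields:
bits 21-20  XINTM  (R/W)
bit 17      XRDY   (R)
bits 5-4    RINTM  (R/W)
bit 1       RRDY   (R)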
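A minimal hosted-C sketch of the SPCR fields above, assuming the bit positions shown (RRDY bit 1, RINTM bits 5-4, XRDY bit 17, XINTM bits 21-20) and the condition order listed (00b = ready, 01b = end of block, 10b = new frame sync, 11b = sync error). The helper and enum names are hypothetical; on the DSP you would use CSL accessors such as MCBSP_rrdy(), which appears later in the lab, rather than raw masks.

```c
#include <stdint.h>

#define SPCR_RRDY (1u << 1)
#define SPCR_XRDY (1u << 17)

enum { INTM_RDY = 0, INTM_EOB = 1, INTM_FRAME = 2, INTM_SYNCERR = 3 };

static int rrdy(uint32_t spcr) { return (spcr & SPCR_RRDY) != 0; }
static int xrdy(uint32_t spcr) { return (spcr & SPCR_XRDY) != 0; }

/* Select which receive condition drives RINT (bits 5-4). */
static uint32_t setRintm(uint32_t spcr, uint32_t mode)
{
    return (spcr & ~(0x3u << 4)) | ((mode & 0x3u) << 4);
}

/* Select which transmit condition drives XINT (bits 21-20). */
static uint32_t setXintm(uint32_t spcr, uint32_t mode)
{
    return (spcr & ~(0x3u << 20)) | ((mode & 0x3u) << 20);
}
```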
The EDMA, on the other hand, only receives data ready events (REVT, XEVT).
The EDMA events and CPU interrupts can work hand-in-hand, though. Normal data ready events can be serviced by the EDMA, while the CPU can be interrupted to handle error conditions that might occur.
The channel-to-sync-event assignments shown above were originally designed for the C6711. The C6713, though, has many more peripherals and thus additional synchronization events. To allow the 16-channel EDMA to accommodate a much larger number of event sources, you can now configure each EDMA channel with whichever event source you prefer. This is done through the memory-mapped EDMA event selector registers. Please refer to the C6713 data sheet for additional information.
The list above shows the default values for the C6713 EDMA channels. Since the events we care about in our lab exercises are on the above list, we won't have to reconfigure the EDMA's event sources.
The C6416 also has a vast number of EDMA event sources. With 64 channels, though, there are still more channels than there are sources. The next page shows the C6416 events and their associated channels.
Included below is a page from the C6416 datasheet which lists the EDMA channel sync events.
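Since an event source sends its signal whether or not you want the EDMA to respond, the Event Enable Register (EER) allows you to prevent the associated channel from running.
[Figure: Event Register bits 14 (XEVT1) and 15 (REVT1). With EER14 = 1, an XEVT1 event runs channel 14; with EER15 = 0, an REVT1 event is ignored. EDMA_setChannel(hMyChan) sets the channel's event bit directly.]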
Hint: When you set an ER bit from the CPU (for example, when using the
EDMA_setChannel() function as we have been doing in our past two lab exercises), the
associated EER bit value is ignored. That is, manually setting a channel to run will occur
regardless of the value in EER.
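The gating rule above, including the hint that CPU-set events ignore EER, fits in a one-line hosted-C model. channelsToRun() is a hypothetical name, and the model ignores priority and event latching.

```c
#include <stdint.h>

/* eventsIn: sync events arriving from peripherals (bit n = channel n)
   eer:      Event Enable Register
   cpuSet:   bits the CPU sets directly, e.g. via EDMA_setChannel()
   Returns the set of channels that will run. */
static uint32_t channelsToRun(uint32_t eventsIn, uint32_t eer, uint32_t cpuSet)
{
    /* Peripheral events are gated by EER; CPU-set events bypass it. */
    return (eventsIn & eer) | cpuSet;
}
```

With EER14 = 1 and EER15 = 0, simultaneous XEVT1 and REVT1 events start only channel 14, while a CPU request for channel 15 still runs it.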
AIC23 Codec
[Figure: AIC23 codec with a control channel and a data channel (left, right)]
The DSK uses two McBSPs to handle AIC23 setup and data transfers, respectively. While one McBSP could be used to handle a single AIC23, it was easier (and saved a small amount of 'glue' logic) to use two McBSPs. Besides, the DSK has only one codec, and the DSPs have two or three McBSPs.
The C6416 DSK was designed first; it uses McBSP1 and McBSP2 for the codec interface. When the C6713 DSK was designed, though, McBSP0 and McBSP1 had to be used, since the C6713 doesn't have a McBSP2.

DSK          Control   Data
C6416 DSK    McBSP1    McBSP2
C6713 DSK    McBSP0    McBSP1

To initialize our data stream, we first initialize the McBSPs, then use the control McBSP to set up the codec.
McBSP Init
The McBSPs can be initialized using CSL functions, definitions, and macros. The process is similar to that of setting up the EDMA, though you'll find the McBSP has more choices and registers. (That is good in that all these options reflect its flexibility; less so in that you have to figure them all out.)
One thing that differs between EDMA configuration and McBSP configuration is that the McBSP configuration choices depend directly on what the port is connected to. On the DSK, this is the AIC23 codec.
With the great flexibility of the McBSP, you can connect to a great many types of serial devices. In each case, though, you will need to read and understand the data sheet of the device you are connecting to and configure the McBSP accordingly. (This isn't unlike the old days of computer modems: to connect to your bank, for example, you usually needed to know the proper settings, such as bit size, parity, etc.)
The process of reading and deciphering a codec datasheet can be time consuming (and sometimes difficult). Because of this, and because all serial devices seem to work differently, we have chosen not to spend the hours required for this process. Rather, we have provided the McBSP settings supplied by the DSK board manufacturer.
The McBSP settings provided by the DSK designers are used in the provided MCBSP_Config structures. Still, you will get to write the remaining McBSP initialization code. Shown below are the same six CSL steps we have been using to configure other peripherals.
1. McBSP Setup
// 1. Include the CSL headers (csl.h first)
#include <csl.h>
#include <csl_mcbsp.h>

// 2. Declare a handle for the serial port
MCBSP_Handle hMcbsp0;

// 3. Configuration structure (values supplied by the DSK designers)
MCBSP_Config mcbspCfgControl = {
    0x00001000, // Serial Port Control Reg. (SPCR)
    0x00000000, // Receiver Control Reg. (RCR)
    0x00000040, // Transmitter Control Reg. (XCR)
    0x20001363, // Sample-Rate Generator Reg. (SRGR)
    0x00000000, // Multichannel Control Reg. (MCR)
    0x00000000, // Receiver Channel Enable (RCER)
    0x00000000, // Transmitter Channel Enable (XCER)
    0x00000A0A  // Pin Control Reg. (PCR)
};

void initMcBSP(void)
{
    // 4. Open McBSP0 and reset it
    hMcbsp0 = MCBSP_open(MCBSP_DEV0, MCBSP_OPEN_RESET);
    // 5. Write the configuration to the McBSP registers
    MCBSP_config(hMcbsp0, &mcbspCfgControl);
    // 6. Start the transmitter and the sample rate generator (with frame sync);
    //    100 is the sample rate generator startup delay
    MCBSP_start(hMcbsp0, MCBSP_XMIT_START |
                MCBSP_SRGR_START | MCBSP_SRGR_FRAMESYNC, 100);
}
While your instructor won’t show the remaining three slides of the McBSP configuration, they
are provided for completeness.
These registers control the multi-channel capabilities of the McBSP. We aren't using these features in our lab exercises.
Codec Initialization
[Figure: the control McBSP (SPCR, RCR, XCR, SRGR, PCR, MCR) connected to the AIC23 codec. Step 1: Set up the McBSP.]
The codec contains a number of control registers that need to be programmed. These registers specify options for input and output gain, codec loopback mode, sample frequency, bit resolution, etc.
Again, since a codec init routine is specific to a given codec, we have provided this routine for you. From the diagram above, you can see the codec routine includes an initialization structure and a routine that sends the values via the McBSP to the codec control registers. You will find this code in the codec.c file.
Of course, the upcoming lab uses the EDMA to perform the codec reads and writes. This is common for most systems, and a good suggestion, since the EDMA can easily off-load this task from the CPU.
Note: Not only do you save the CPU MIPS required to do the reads/writes, but you also minimize the cycles spent on CPU interrupt overhead.
When using the EDMA for McBSP reads/writes, there are a few changes that need to be made to our previous EDMA initialization code. Here's an example of using the EDMA channel for McBSP transmit:
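As a rough model of what changes for a transmit channel, the hosted-C sketch below assumes element synchronization (frame sync off, one element per XEVT), a source that increments through the buffer, and a destination fixed at the McBSP's DXR (no destination update). XmtChannel, onXevt() and the fake DXR in the check are hypothetical; this is not the lab's CSL code.

```c
#include <stdint.h>

typedef struct {
    const int16_t *src;       /* next element in the transmit buffer */
    volatile int16_t *dxr;    /* fixed destination: the serial port DXR */
    uint32_t count;           /* elements left in the block */
} XmtChannel;

/* Called once per XEVT; returns 1 when the block is complete, which is
   where TCINT would interrupt the CPU. */
static int onXevt(XmtChannel *ch)
{
    if (ch->count == 0)
        return 0;             /* nothing left to send */
    *ch->dxr = *ch->src++;    /* source increments, destination does not */
    return --ch->count == 0;
}
```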
Lab 6
Lab 6 – Audio Pass Thru
[Figure: ADC → McBSP Rcv → EDMA (RCVCHAN) → gBufRcv → CPU (COPY, +) → gBufXmt → EDMA (XMTCHAN) → McBSP Xmt → DAC]
Goals:
1. EDMA (RCV) copies values from DRR to gBufRcv
2. CPU copies gBufRcv to gBufXmt
3. EDMA (XMT) copies gBufXmt to DXR
4. Opt: add sine to gBufRcv based on DIP switch
In order to successfully complete this lab, we will need to make the following changes to our
code:
1. Change the buffer names so that they make more sense for an audio pass-through. We used names that indicate whether the buffer is used to receive or to transmit audio.
2. Write the code to initialize two McBSPs (one for control, and one for data).
3. Call a provided routine to initialize the AIC23 codec.
4. Change the transmit EDMA's setup to talk to the McBSP.
5. Add a receive EDMA channel to talk to the McBSP.
6. Modify the EDMA HWI that we have been using to respond to both transmit and receive
interrupts, and copy the data.
We have provided some paper exercises to help you along the way. Please use the exercises to
test your understanding of what you are doing in this lab. If you have any questions, please feel
free to ask your instructor.
As we discussed, the DSK uses two McBSPs to interface with the codec:
• One serial port to set up and configure the codec.
A global variable, called mcbspCfgControl, was created and initialized with the appropriate bitfield values to send control register values to the AIC23.
• A second serial port to send and receive data to/from the codec.
The global variable mcbspCfgData contains the configuration values to set up the serial port which reads/writes data to the AIC23.
When you want to use your modem to connect to your bank, first you must get the configuration choices from them (e.g. 9600 baud, 8 bits, no parity, etc.). Once you have this information, you can configure the modem.
In the same fashion, once you know the configuration options required by the serial device you are connecting the McBSP to, you can easily plug them in. Unfortunately, extracting the required information from an analog data converter datasheet is often not trivial. Ideally, we would have enjoyed taking you through this process for the AIC23. However, given the time constraints of the workshop, plus the fact that you are most likely using another converter (or, if you are using the AIC23, you can just use our code), we decided to provide these Configs for you.
To make this all a little easier, we have provided a space for you to write your answers on paper,
before you try to write the code. You will need to refer back to the lecture material to figure out
exactly what to write. We have provided some hints to help you. These hints are the actual lab
steps that you will do to write the code inside CCS. Please write this code in the space provided
on the next page …
mcbsp.c
// ======== Include files ========
MCBSP_Config mcbspCfgData = {
    // Provided for you. See file for details.
};
// McBSP Handles
Hint: MCBSP_ hMcbspControl;

DSK        Control   Data
6713 DSK   McBSP0    McBSP1
6416 DSK   McBSP1    McBSP2
MCBSP_start(hMcbspControl, MCBSP_XMIT_START |
MCBSP_SRGR_START | MCBSP_SRGR_FRAMESYNC, 100);
Note: This is all one line of code. Since it is so long we broke it up for you. The value 100 is the
sample rate generator delay. McBSP logic requires 2 SRGR clock cycles after enabling
the sample rate generator for its logic to stabilize. This parameter is used to provide the
appropriate delay.
if (MCBSP_rrdy(hMcbspData))
MCBSP_read(hMcbspData);
This code checks whether the receive register (DRR) holds a value. If it does, it reads the value and throws it away.
17. Start the data serial port
We are using different pieces of the data serial port, so the code to start it is a little different:
To make sure that we understand the changes that we are making, let's do another paper exercise
before we write the code. Take a look at the following sheet and try to figure out what changes
will need to be made in order to configure the EDMA to exchange data with the McBSP, for both
receive and transmit.
edma.c
EDMA_Config gEdmaConfig = {
EDMA_OPT_RMK(
EDMA_OPT_PRI_LOW, // Priority
EDMA_OPT_ESIZE_16BIT, // Element size
EDMA_OPT_2DS_NO, // 2 dimensional source
EDMA_OPT_SUM_INC, // Src update mode
EDMA_OPT_2DD_NO, // 2 dimensional dest
EDMA_OPT_DUM_INC, // Dest update mode
EDMA_OPT_TCINT_YES, // Cause EDMA interrupt
EDMA_OPT_TCC_OF(0), // Transfer Complete Code (Hint: Step 24)
EDMA_OPT_TCCM_DEFAULT, // TCC Upper Bits (c64x only)
EDMA_OPT_ATCINT_DEFAULT, // Alternate TCC Interrupt (c64x only)
EDMA_OPT_ATCC_DEFAULT, // Alternate TCC (c64x only)
EDMA_OPT_PDTS_DEFAULT, // PDT Source (c64x only)
EDMA_OPT_PDTD_DEFAULT, // PDT Dest (c64x only)
EDMA_OPT_LINK_NO, // Enable link parameters
EDMA_OPT_FS_YES // Use frame sync
),
EDMA_SRC_OF(gBuf0), // src address
EDMA_CNT_OF(BUFFSIZE), // Count = buffer size
EDMA_DST_OF(gBuf1), // dest address
EDMA_IDX_OF(0), // frame/element index value
EDMA_RLD_OF(0) // reload
};
// Hint: gEdmaConfigXmt already exists; copy it to create gEdmaConfigRcv
EDMA_Config gEdmaConfigXmt = {
EDMA_OPT_RMK(
EDMA_OPT_PRI_LOW, // Priority
EDMA_OPT_ESIZE_16BIT, // Element size
EDMA_OPT_2DS_NO, // 2 dimensional source
EDMA_OPT_SUM_INC, // Src update mode
EDMA_OPT_2DD_NO, // 2 dimensional dest
EDMA_OPT_DUM_INC, // Dest update mode
EDMA_OPT_TCINT_YES, // Cause EDMA interrupt
EDMA_OPT_TCC_OF(0), // Transfer Complete Code (Hint: Step 23)
EDMA_OPT_TCCM_DEFAULT, // TCC Upper Bits (c64x only)
EDMA_OPT_ATCINT_DEFAULT, // Alternate TCC Interrupt (c64x only)
EDMA_OPT_ATCC_DEFAULT, // Alternate TCC (c64x only)
EDMA_OPT_PDTS_DEFAULT, // PDT Source (c64x only)
EDMA_OPT_PDTD_DEFAULT, // PDT Dest (c64x only)
EDMA_OPT_LINK_NO, // Enable link parameters
EDMA_OPT_FS_YES // Use frame sync
),
EDMA_SRC_OF(gBuf0), // src address
EDMA_CNT_OF(BUFFSIZE), // Count = buffer size
EDMA_DST_OF(gBuf1), // dest address
EDMA_IDX_OF(0), // frame/element index value
EDMA_RLD_OF(0) // reload
};
Note: Refer to the lab diagram and draw notes on that diagram to help you gain a mental image
of what is going on in the lab. This will help drive a better understanding of the necessary
steps to get the lab working.
Instructions
Here is another exercise to help you understand the changes that you need to make to your code.
The opposite page is basically a picture of what your initEdma( ) function will look like if you
take the code that we have already written for Lab 5 and modify it to create a Receive channel
and communicate with the McBSP. We've already copied the code to create the Receive EDMA channel for you, like we did with the structures earlier. But we haven't made all of the changes that you will need to make. We did change the comments for you if you need some help.
So, take a few minutes and try to make all of the necessary changes to the code. We've already
made a few of them for you so that you have an idea of what we are looking for. If you need
some help, use the hints provided to refer to the actual lab steps that will help you write the code
in CCS.
Step 27
// configure the transmit channel with the correct structure
EDMA_config(hEdmaXmt, &gEdmaConfigXmt);
Note: The EDMA_enableChannel() API enables the specified channel using the channel's
handle obtained through the _open API. It does not tell the channel to start transferring.
In this lab, we accomplish that by using a sync event.
Note: We need to make sure BOTH interrupts occur. If only one has triggered, the ISR does
nothing but return to the while loop and wait for the 2nd one to trigger.
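One way to satisfy the note above is to record each completion and copy only when both flags are set. The sketch below is a hypothetical hosted-C model of that logic, not the lab solution: edmaCompleted(), rcvDone, xmtDone and the 4-element BUFFSIZE are illustrative, and on the DSK this logic would be driven from the EDMA HWI.

```c
#include <stdint.h>
#include <string.h>

#define BUFFSIZE 4            /* hypothetical size for this sketch */

static int16_t gBufRcv[BUFFSIZE];
static int16_t gBufXmt[BUFFSIZE];

static int rcvDone = 0;       /* receive block finished */
static int xmtDone = 0;       /* transmit block finished */

/* Record one completed transfer; copy only once BOTH have completed.
   Returns 1 if the copy was performed. */
static int edmaCompleted(int isRcv)
{
    if (isRcv)
        rcvDone = 1;
    else
        xmtDone = 1;

    if (!(rcvDone && xmtDone))
        return 0;             /* wait for the other channel */

    rcvDone = xmtDone = 0;    /* re-arm for the next block */
    memcpy(gBufXmt, gBufRcv, sizeof gBufRcv);
    return 1;
}
```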
Source Files
main.c edma.c mcbsp.c
<csl.h> <csl.h> <csl.h>
<csl_edma.h> <csl_edma.h> <csl_mcbsp.h>
"sine.h" "codec.h"
Header
<csl_irq.h> "mcbsp.h"
Files
"sine.h"
"edma.h"
"mcbsp.h"
Don't forget that ordering is also important with header files. For example, csl.h needs to be included before any files that depend on it, such as csl_edma.h, csl_mcbsp.h, or any other csl_*.h file.
41. Build the project and load it to the DSK
42. Run the code
You should hear audio playing from your speakers or headphones. If there is any distortion, adjust the volume level on your PC. If you get noise, go back and debug your code. Follow the data from the input/receive side to the output/transmit side. If your audio doesn't sound good, try to find the error. If, after 5 minutes, you're stuck, compare the solution to your code and fix the error. Sometimes copying in the McBSP configuration from the solution helps. If you get frustrated, ask your instructor for help.
43. Halt the processor
Part A
Note: If you had troubles getting Lab 6 to work, copy the files from \solutions\lab6 and begin
working on the next step shown below.
Note: We used a lower sampling rate in the earlier labs so that the graphs would look better.
Otherwise, you would not see a full cycle of the sine wave.
You’re done
Optional Topics
DMA vs EDMA: Event Synchronization
DMA Synchronization
[Figure: a 6 × 6 block of 16-bit pixels (numbered 1 to 36) in memory (Src: mem_8), moved by the DMA to a D/A converter that signals "Next" on EXT_INT4]
Is the DAC as fast as the DMA? No, the DMA needs to be synchronized to the DAC.
Unlike the EDMA, any DMA channel can be synchronized to any event.
DMA Sync Events
00000  None (default)
00001  TINT0
00010  TINT1
00100  EXT_INT4
...    (see peripheral guide)

DMA channel registers: Primary Ctrl, Secondary Ctrl, Source, Destination, Xfr Count
Primary Control register fields (in this example, WSYNC = 00100 and DSTDIR = 00):
bits 23-19  WSYNC
bits 18-14  RSYNC
bit  13     INDEX
bits 9-8    ESIZE
bits 7-6    DSTDIR
bits 5-4    SRCDIR
bits 1-0    START
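A minimal hosted-C sketch of placing a sync event code into the DMA primary control register, assuming the field positions shown above (WSYNC bits 23-19, RSYNC bits 18-14) and the event codes listed (00100b = EXT_INT4). The helper and enum names are hypothetical.

```c
#include <stdint.h>

enum { DMA_EVT_NONE = 0x00, DMA_EVT_TINT0 = 0x01, DMA_EVT_TINT1 = 0x02,
       DMA_EVT_EXT_INT4 = 0x04 };

/* Write-side synchronization event (bits 23-19). */
static uint32_t setWsync(uint32_t prictl, uint32_t evt)
{
    return (prictl & ~(0x1Fu << 19)) | ((evt & 0x1Fu) << 19);
}

/* Read-side synchronization event (bits 18-14). */
static uint32_t setRsync(uint32_t prictl, uint32_t evt)
{
    return (prictl & ~(0x1Fu << 14)) | ((evt & 0x1Fu) << 14);
}
```

Syncing the writes to the DAC's "Next" signal on EXT_INT4, as in the figure above, would use setWsync(prictl, DMA_EVT_EXT_INT4).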
Frame Synchronization
FS (Frame Sync), bit 26 of the Primary Control register:
0: NO (no frame sync)
1: YES (use frame sync): move the whole frame on each sync event
Related fields: WSYNC (bits 23-19), RSYNC (bits 18-14), START (bits 1-0)
Four addresses are needed when handling the receive and transmit parts of a serial port; unfortunately, the DMA has only two address registers. This is solved by:
1. Selecting SPLIT mode in the Primary Control Register
2. Using the Source/Destination registers for the from/to memory addresses
3. Using a global address register (A, B, or C) for the address of the McBSP's DRR register. In split mode, the DMA knows to find the DXR address in the next word location.

SPLIT field (Primary Control register, bits 11-10):
00  Split disabled
01  Use Global Address Reg A
10  Use Global Address Reg B
11  Use Global Address Reg C
[Figure: DMA channel registers Primary Ctrl (SPLIT = 01), Secondary Ctrl, Source, Destination, Xfr Count]
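The split-mode addressing rule can be sketched in hosted C as follows. The helper names are hypothetical; the SPLIT position (bits 11-10) comes from the table above, and "next word location" is taken to mean the next 32-bit word, i.e. DRR + 4 bytes. The address 0x018C0000 in the check is only an example DRR address.

```c
#include <stdint.h>

/* Select the global address register used for split mode:
   0 = off, 1/2/3 = global address register A/B/C (bits 11-10). */
static uint32_t setSplit(uint32_t prictl, uint32_t gblReg)
{
    return (prictl & ~(0x3u << 10)) | ((gblReg & 0x3u) << 10);
}

/* The global register holds the DRR address; the DMA finds the DXR
   address in the next 32-bit word. */
static uint32_t splitDxrAddr(uint32_t drrAddr)
{
    return drrAddr + 4u;
}
```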
DMA vs EDMA
                          DMA                               EDMA
Sync                      element, frame                    element, frame, 2D (block)
Sync events               any channel can use any event     each channel has a specific event
CPU interrupts            4                                 1
Interrupt conditions      six: 3 for count, 3 for errors    count = 0
Reload (auto-init) sets   ~2                                69
Introduction
In this module, we will consider the steps required to select and apply TI Analog components to
your TI DSP system.
Objectives
At the conclusion of this module you should be able to:
• List various families of TI Analog that relate to DSP systems
• Demonstrate how to find information on TI Analog components
• List key and additional selection criteria for an ADC
• Identify challenges in adding peripherals to a DSP design
• Identify TI support to meet above design challenges
• Describe the types of Analog EVMs available from TI
• Create driver code with the Data Converter Plug-In
• Apply Plug-in generated code to a given system
Module Topics
Analog Interfacing
TI Analog Portfolio
[Figure: the TI Analog portfolio surrounding a TI DSP or MSP430: ADC and DAC (codec), RF (or wireless), digital data transmission to another system/subsystem, power solution, and clocking.]
OP-AMPs/Comparators/Support
- High Speed Amplifiers
- Low Power, RRIO Signal Amps
- Instrumentation Amps
- Audio Power Amps
- Power Amps
- Commodity Amps
- Comparators
- Temp Sensors
- References
- Special Functions
Data Transmission
- Many standards: RS232, RS422, RS485, LVDS, 1394/Firewire, USB, PCI, CAN, SONET, Gigabit Ethernet, GTL, BTL, etc.
- SERDES
Data Converter
- Standard A/D and D/A
- High Resolution/Precision converters
- High Speed converters
- Touchscreen controllers
- μ-Law/A-Law Telecom "SLAC"s
- Communication, Video, & Ultrasound optimized converters/codecs
- Audio & Voice band converters/Codecs
- Industrial converters
- References
Clocks
- Clock Buffer & fanouts
- PLL based buffers & fanouts
- Multipliers & Dividers
- Jitter cleaners & synchronizers
- Memory specific solutions
- Synthesizers
- Real Time Clocks
Power Solution
- Power Modules
- Linear Regulators/LDOs
- DC-DC controllers
- PFC
- Load Share
- Battery Management
- Charge Pumps & Boost Converters
- Supervisory Circuits
- Power Distribution/Hotswap
TI has long been a leader in the development and production of analog ICs. With the recent acquisitions of Burr-Brown, Power Trends, and Unitrode, TI is the world leader in the sale of analog ICs, placing in the top three positions in all major market segments. This makes TI a good place to start when looking for analog ICs to round out a DSP-based system.
Getting Information
• http://analog.ti.com
• Booklet: SSDV004N
• DSP Selection Guide
From the home screen of the TI Analog web page, click on the element of interest and begin
exploring the devices offered to best meet your needs. Also on this site is a wealth of support,
from data sheets and app notes, to software development tools to help get the job done.
• Most contain downloadable software examples for use with CCS or Embedded Workbench!
• Click on "Application Notes" from the Product Folder for links to specific devices
The Amplifier Design Utilities and the FilterPro Design Tool allow for the creation of analog front-end circuitry. FilterPro can design Butterworth and Chebyshev filters using the Sallen-Key topology. It will select component values, provide frequency response plots, and print schematics.
SWIFT supports selection/design of TI power devices, providing values for capacitors, resistors, and inductors based on the input parameters, along with analysis plots of the design's current and voltage ripple. The I-to-V tool is for use with current-output DACs; it helps with op amp selection and shows what effect the chosen op amp has on the DAC response when doing I-to-V conversion.
TI Data Converters
[Figure: Application Areas for TI Data Converters. Areas: High Speed Comm/Ultrasound, High Precision Measurement, Industrial Control/Instrumentation, Audio, Touch-Screen Controller, High Performance DSP, Portable/Low Power. Device families shown: pipeline ADCs, current-steering DACs, high-speed ADCs, oversampling ΔΣ ADCs, precision ADCs, micro systems, current-input ADCs, SAR ADCs (high speed, low power, simultaneous sampling, bipolar, data acquisition systems), string/R2R DACs (single supply, dual supply, monitor & control), voiceband codecs, consumer and professional audio, integrated audio, and stand-alone and embedded intelligent touch-screen controllers.]
TI data converters are made in numerous technologies and are applicable to a wide variety of end equipment.
TI ADC Technologies
[Figure: converter resolution (12, 16, 24 bits) vs. conversion rate for current technology, with regions for Delta Sigma (averages and filters out noise), SAR (Successive Approximation), and Pipeline converters. Highlighted devices:
- ADS1625: 18-bit Delta Sigma, 1.25 MSPS, fastest on the market
- ADS1605: 16-bit Delta Sigma, 5 MSPS
- ADS8411: 16-bit, 2 MSPS, market leader
- ADS5500: 14-bit]
TI DAC Technologies
[Figure: converter resolution vs. settling time (µs) / update rate (MSPS) and number of output DACs, for current technology:
- Resistor String: inexpensive; industrial
- R-2R: more accurate, trimmed at final test; instrumentation & measurement
- ΔΣ
- Current Steering: high-speed video and communication
Notes: typically for calibration; typically voltage out; MDACs coming (digital control of gain/attenuation, waveform generation).]
TI Data Converters
- Delta Sigma DACs: high resolution/accuracy (DAC122X)
- Delta Sigma ADCs: high precision, low bandwidth; intelligent/high resolution (8051 core)
- SAR ADCs: high precision, high bandwidth; medical, industrial control, data acquisition; simultaneous sampling; motor control
- Touch Screen Controllers: stand-alone controllers, integrated audio controllers
- String/R2R DACs: low power; single and bipolar supply; precision
- Audio: consumer (codecs, ADC/DAC), voice (A/C codecs), pro audio (DACs, ADCs, PGAs, SRCs, DITs)
- Pipeline ADCs: versatile, high speed; communication, imaging, ultrasound
As an example, assume a given application required 16-bit samples at a 200 kHz rate. The codec
on the DSK cannot meet this requirement. Via the TI web page, the optimal ADC can be selected
based on a wide range of criteria. Here, the ADS8361 is chosen, since it is supported by an EVM
and the Data Converter Plug-in tool.
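The sample-rate requirement above translates into a serial bit-clock requirement: 16 bits × 200,000 samples/s = 3.2 Mbit/s for a single channel, before any framing overhead. The hypothetical helpers below do that arithmetic and pick the largest SRGR CLKGDV divider that still meets it (CLKG = input / (CLKGDV + 1)). The 100 MHz input clock in the check is an illustrative value, and the ADS8361's own interface timing may impose further constraints.

```c
#include <stdint.h>

/* Minimum bit clock to move `channels` words of `bits` bits per sample. */
static uint32_t minBitClockHz(uint32_t bits, uint32_t sampleRateHz,
                              uint32_t channels)
{
    return bits * sampleRateHz * channels;
}

/* Largest CLKGDV whose CLKG still meets requiredHz, or -1 if the input
   clock is too slow. CLKGDV is 8 bits, so the ratio is clamped to 256. */
static int clkgdvFor(uint32_t inputHz, uint32_t requiredHz)
{
    if (requiredHz == 0 || inputHz < requiredHz)
        return -1;
    uint32_t div = inputHz / requiredHz;   /* largest ratio that works */
    if (div > 256u)
        div = 256u;
    return (int)div - 1;
}
```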
ADS8361
from: http://focus.ti.com/docs/prod/folders/print/ads8361.html
Parameter                          Value
Resolution (bits)                  16
Sample rate (max)                  500 KSPS (500000 SPS)
Input channels (differential)      4
Power consumption (typ)            150 mW
SNR                                83 dB
SFDR                               94 dB
DNL (max)                          +/-1.5 LSB
INL (max)                          +/-4 LSB (+/-0.00375 %)
No missing codes                   14 bits
Analog supply AVDD (min/max)       4.75 / 5.25 V
Logic supply DVDD (min/max)        2.7 / 5.5 V
Input type                         Voltage
Input configuration range          +/-2.5 V at 2.5 V
No. of supplies                    2
Development Challenges
Design Flow…
Product Selection
Key specifications (speed, resolution, …)
Secondary parameters (power, size, price, channels, …)
Research data base of candidate devices
Additional factors: ease of use, cost/value
Hardware Design
ADC / DAC pins, requirements
DSP pin matchup
Layout considerations (noise, supply requirements, etc.)
Software Authoring
Configuring the (serial) port
Configuring the peripheral
Getting/sending data from/to the peripheral
How? Write it yourself or with the help of an authoring tool…
As seen, the TI website facilitates the process of device selection. Next in the design effort is hardware design, which TI supports with Analog EVMs; these provide a pre-built board for test, plus all artwork and the bill of materials for production. Lastly, the DC Plug-in was developed to aid in the otherwise difficult process of programming the port and peripheral to the desired mode.
Debug CCS
Observe / verify performance
Modify design as required
Analog EVMs
Signal Chain Prototyping System
TI Analog EVMs support a wide range of processors. The 5-6K Interface Board adapts TI DSP
DSKs to the A-EVM footprint. Two serial ports and the parallel bus can interface with the EVMs,
several of which can be populated on the IF card to experiment with a number of analog
implementations quickly and easily.
Analog EVMs
5-6K Interface Board
Compatible with TMS320 C5000 and C6000 series DSP starter kits
Supports parallel EVM’s up to 24 bits
Allows multiple clock sources for parallel/Serial converters
Supports two independent McBSP channels
Provides complete signal chain prototyping opportunities
Data Converter EVMs
Three standardized daughter card formats (2 serial, 1 parallel)
Serial – support for SPI, McBSP, I2C; 1-16 I/O channels
Connects to (nearly) any control system
Stackable
Third Party Interface Boards
Avnet, SoftBaugh, Spectrum Digital, Insight - Memec Design …
Analog Interface Boards
Bipolar and single supply
In development – differential amps, instrumentation amps, active filters
$50 each!
The Data Converter Plug-in (DCP) greatly reduces the time and effort required to program a wide
variety of DSP ports and analog peripherals. The plug-in can be downloaded (free of charge)
from: http://www.ti.com/sc/dcplug-in.
The DCP presents simple selections for the engineer to make, indicating the desired properties of
the processor, port, and converter. The DCP then authors the code to implement the selections
specified.
• "API" file: prototypes the 6 functions generated by the DCP tool
• Object file: implements all device coding and creates structures that manage the behavior of the device
The DCP generates a set of files that can be added to a given CCS project, as defined below. All
are in full source so they can be inspected and modified by the user as desired.
All objects created with the Data Converter Plug-In share these six API functions.
All drivers produced by the DCP support an identical set of API functions, as seen above. Below are the object structures of the instance of the 8361 just created, typical of objects created by the DCP.
To interact with the object, a handle should be created, as seen in the code excerpt below:
• Of the three devices shown, the 8361 is the one which will operate to 200KSPS; click on
the 8361 link to learn more about this device
[Figure: DSK board with its power (pwr) and USB connectors]
• Attach the 5-6K Interface Board to the DSK. Note the mating connectors on the right of
the DSK, and those on the bottom of the Interface Board. Carefully align these two and
press them together gently until they are fully connected.
• Attach the ADC8361 EVM to the Interface Board. As per the diagram above, carefully
align the pins beneath the 8361 EVM with the headers on the Interface board; gently
press the boards together until fully connected (might already be connected).
• Attach the Amplifier EVM to the Interface Board. Similarly, add the Amplifier EVM to
the system. This EVM will perform pre-amplification and signal conditioning for the
8361 (might already be connected).
To change the read address, look for the line that begins with gEdmaConfigRcv.src and
change its argument from:
hMcbspData to hADC->serial->hMcbsp.
To use the above argument, the handle must be declared and initialized at the start of the
initEdma function by adding the following two lines after the function’s opening brace:
TADS8361 * hADC;
hADC = &Ads8361_1;
To make the data type above known to this file, add the following line to the inclusions in:
#include "t8361_fn.h"
Time permitting, peruse the DCP files to note the declaration of the TADS8361 type and the
definition of the structure Ads8361_1.
22. Finally, save the modified files and rebuild the project. A handful of warnings will be
generated (the libraries are being revised to eliminate them). Just ignore the warning(s).
Conclusions
Conclusions on TI DSP + TI Analog …
TI offers a large number of low cost analog
EVMs to allow developers to ‘snap together’
a signal chain for ultra-fast test and debug
of proposed components
TI provides CSL and Data Converter Plug-In
to vastly reduce the effort in getting a DSP
to talk to ports and peripherals
Getting to a ‘signs of life’ result is now a
matter of minutes instead of days or weeks
Final tuning will sometimes be required, but
amounts to a manageable effort with a
device already easily observed, rather than
‘groping in the dark’ as often was the case
otherwise
Additional Information
Driver Object Details
t8361_ob.c: code to implement the DC API, e.g. the read function
long ads8361_read(void *pDC)                     // prototype of the DC API
{
    TADS8361 *pADS = pDC;                        // get handle to object
    if (!pADS) return;                           // parameter check
    if (pADS->iXferInProgress) return;           // verify no background op in progress
    while (!MCBSP_rrdy(pADS->serial->hMcbsp));   // spin loop – oops!! (serial port ops use CSL API)
    return MCBSP_read(pADS->serial->hMcbsp);     // when serial port is ready, return data received
}

t8361_ob.c: define instance object type
typedef struct {
    TTIDC f;                                     // std DC API
    void (*CallBack)(void *);
    DCP_SERIAL *serial;
    int iMode;
    int* Buffer;
    unsigned long ulBuffSize;
    volatile int iXferInProgress;
} TADS8361;

t8361_ob.c: make & fill instance object
TADS8361 Ads8361_1 = {
    &ads8361_configure,                          // f: the six DC API functions
    &ads8361_power,
    &ads8361_read,
    &ads8361_write,
    &ads8361_rblock,
    &ads8361_wblock,
    0, 0, 0, 0,                                  // f.reserved[4]
    0,                                           // CallBack
    &serial0,                                    // serial
    ADC1_MODE,                                   // iMode
    0, 0, 0                                      // Buffer, ulBuffSize, iXferInProgress
};
These slides depict the parts of the code generated by the DC Plug-in that relate to the DC object
structures. Above is the code that implements one DC API function, and how its address is loaded
into the function table portion of the first-level structure. Below are the typedefs for the
remaining structures, as well as another portion of the definition of the first-level structure.
Structure Definitions
from TIDC_API.h
typedef struct {
    unsigned int port;            // number of serial port used
    unsigned short intnum;        // which interrupt the driver uses
    MCBSP_HANDLE hMcbsp;          // serial port handle (CSL)
    MCBSP_CONFIG sConfig;         // CSL serial port configuration structure
} DCP_SERIAL;
from csl_mcbsp.h
typedef struct {
    Uint32 allocated;             // is port available?
    Uint32 xmtEventId;            // which interrupts the port will use
    Uint32 rcvEventId;
    volatile Uint32 *baseAddr;    // address of port registers
    Uint32 drrAddr;               // address of data receive register
    Uint32 dxrAddr;               // address of data transmit register
} MCBSP_Obj, *MCBSP_Handle;
from TIDC_API.h
typedef struct {
    TTIDCSTATUS (*configure)(void *pDc);
    void (*power)(void *pDc, int bDown);
    long (*read)(void *pDc);
    void (*write)(void *pDc, long lData);
    void (*rblock)(void *pDC, void *pData, unsigned long ulCount, void (*callback)(void *));
    void (*wblock)(void *pDC, void *pData, unsigned long ulCount, void (*callback)(void *));
    void *reserved[4];
} TTIDC;
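Because each DCP driver object begins with a TTIDC function table, application code can call any driver through the same handle-based interface. The self-contained sketch below imitates that pattern on a host PC. It uses a trimmed-down table with only configure and read, plus a fake converter. MockTIDC, MockADC, Mock_1, mock_configure, mock_read and dc_read are hypothetical stand-ins, not DCP-generated code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for the DCP's TTIDC function table. */
typedef struct {
    int  (*configure)(void *pDc);
    long (*read)(void *pDc);
} MockTIDC;

/* Like TADS8361, the driver object starts with the function table, so a
 * pointer to the object is also a valid pointer to its first member. */
typedef struct {
    MockTIDC f;          /* standard API table, must be the first member */
    long nextSample;     /* fake converter state                         */
} MockADC;

static int mock_configure(void *pDc)
{
    MockADC *p = pDc;
    p->nextSample = 100; /* pretend the converter was set up */
    return 0;
}

static long mock_read(void *pDc)
{
    MockADC *p = pDc;
    return p->nextSample++; /* return a fake sample */
}

/* Instance object filled in the same way as Ads8361_1. */
static MockADC Mock_1 = { { &mock_configure, &mock_read }, 0 };

/* Generic caller: works with any object whose first member is the table. */
static long dc_read(void *pDc)
{
    const MockTIDC *api = pDc;
    assert(api != NULL);
    return api->read(pDc);
}
```

A generic routine such as dc_read never needs to know which converter it is talking to. This is the property that lets every DCP driver share one API.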
New analog design tools are in development at TI and will be available on the website soon.
Examples include OpAmpPro and TINA, as described above. The diagram below shows the kind of
circuit TINA can help users generate.
[Figure: example TINA circuit driving an ADS8325 16-bit ADC; inputs Vinput, Vreference (5) and Vcommon-mode; components R3 100k, R4 40k, R5 40k, C4 100p; ADS reference]
Introduction
In this chapter we are going to explore how to use a very powerful feature of the EDMA called
Channel Sorting. We are going to start with the code that we wrote in the previous chapters and
see how to use some of the other capabilities of the EDMA to sort data. These capabilities can be
used for many other types of transfers, as we will see.
Outline
Background: More EDMA Examples
Packed Data vs Sorted Data
EDMA Channel Sorting
Counter Reload
Channel Sorting Procedure
Using BSL
Exercise
Lab 7
Chapter Topics
Channel Sorting with the EDMA
Here is the same type of example using the indexing capability of the EDMA.
As you can see, we simply change the update mode of the source to use an index, and fill in the
index register with the appropriate value. Note that this value is in bytes.
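To make the index arithmetic concrete, here is a minimal host-side C sketch of what an EDMA channel does when the source update mode is "index". Element e is read from the source start address plus e times the element index, and that index is counted in bytes. The destination simply increments. The function name edma_index_copy and the 6 x 6 image of 8-bit pixels in the usage test are illustrative assumptions, not part of CSL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side emulation of a single-frame EDMA transfer whose
 * source update mode is "index" and whose destination increments.
 * elemIdxBytes is the element index in bytes, as on the C6000. */
static void edma_index_copy(const uint8_t *src, uint8_t *dst,
                            size_t elemCount, size_t elemSize,
                            ptrdiff_t elemIdxBytes)
{
    assert(src != NULL && dst != NULL && elemSize > 0);
    for (size_t e = 0; e < elemCount; e++) {
        /* source address = start + e * element index (in bytes) */
        const uint8_t *from = src + (ptrdiff_t)e * elemIdxBytes;
        /* destination simply increments by the element size */
        memcpy(dst + e * elemSize, from, elemSize);
    }
}
```

With 8-bit pixels stored six per row, an element index of 6 bytes walks down one column. With 16-bit elements, the same walk would need an index of 12 bytes, because the index is always counted in bytes.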
We used an element index above. To move blocks of data, you may need a frame index as well.
The frame index allows you to modify the address after each frame. This capability is one of the
primary enablers of channel sorting with the EDMA.
Here's a more detailed explanation of how to calculate the frame index. One important thing to
remember is that the index register treats everything as bytes.
[Figure: frame index example: 16-bit pixels numbered 1-18 in rows of six; FRAME 1 and FRAME 2 marked]
[Figure: McBSP (RSR, RBR, DRR) with EDMA transfers writing alternating L and R samples into one buffer]
After A/D conversion, the AIC23 shifts out data from alternating channels: Left, then Right, …
This leaves data packed in memory.
(You might also say it’s interleaved in memory.)
The AIC23 codec has been sending us packed data up until this point.
Sorted data is separated out into buffers which contain data for only one channel. So, you would
have one buffer full of left data, and one buffer full of right data. Are there any advantages to this
approach? Most people would say yes. When the data is sorted, you can write your algorithms so
that they simply process a buffer. If you want to add another channel, you simply call the
algorithm again with a new buffer of data. If the data is packed, the algorithm would have to be
specific to the way the data is organized, and therefore less flexible.
[Figure: McBSP (RSR, RBR, DRR) with EDMA transfers writing L samples to one buffer and R samples to another]
Sorting data splits data up by Left or Right channel.
Often, this is called Channel Sorting.
Given the advantages of sorted data, how do we do it efficiently? We could do it with the CPU,
but that takes valuable time.
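For comparison, this sketch shows roughly what sorting costs when the CPU does it. A loop walks the packed (interleaved) buffer and writes each sample to the Left or Right buffer. It assumes 16-bit samples arriving in Left, Right order, as described for the AIC23 above. The function name sortLR is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU-based channel sort: split a packed L,R,L,R,... buffer
 * of 16-bit samples into separate Left and Right buffers. Every sample
 * costs a CPU load and store that the EDMA could perform instead. */
static void sortLR(const short *packed, short *left, short *right,
                   size_t samplesPerChannel)
{
    assert(packed != NULL && left != NULL && right != NULL);
    for (size_t i = 0; i < samplesPerChannel; i++) {
        left[i]  = packed[2 * i];      /* even positions: Left  */
        right[i] = packed[2 * i + 1];  /* odd positions:  Right */
    }
}
```

The loop is trivial, but it runs for every buffer, for the life of the system. That is the CPU time the EDMA approach gives back.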
[Figure: McBSP (RSR, RBR, DRR) and EDMA transfers filling separate L and R buffers]
You could use the CPU to sort data, or …
It is more efficient to use the EDMA to sort the data
Why not do it with the EDMA as it is moving the data from the serial port? It has to do this
anyway, and it doesn't take any time away from the CPU. So, how do we set this up?
[Figure: Frame #1 highlighted]
Given:
  Two buffers: Left, Right
  Buffers each 10 elements long
  ESIZE = 16-bits
EDMA setup:
                Frame        Element
  Count         9 (=10-1)    2
  Source        McBSP
  Destination   Left
To sort L/R data, we need to set up EDMA with 10 frames, each with 2 elements.
In the example above, there are 2 channels of data and we want to grab 10 samples from each
channel. So, we have 10 frames of 2 elements each.
Now we need to figure out how to modify the addresses after each transfer. If each element is 2
bytes wide, how many bytes do we need to add to the address after transferring the first element
to transfer the second to the right place?
[Figure: each buffer element is 2 bytes wide]
Given:
  Two buffers: Left, Right
  Buffers each 10 elements long
  ESIZE = 16-bits
EDMA setup:
                Frame        Element
  Count         9 (=10-1)    2
  Index                      20
  Source        McBSP
  Destination   Left
To sort L/R data, we need to set up EDMA with 10 frames, each with 2 elements.
After EDMA writes Left[1], how many bytes must be skipped to reach Right[1]?
Well, if there are 10 two-byte elements, we need to add 20 bytes. Take a closer look at the example
above. After we write the first element to the Left channel, we need to move down to the first
element of the Right channel. If the address of the first element in the Left channel is 0 and the
channel has 10 two-byte elements, then the address of the first element of the Right channel is 20
(don't forget that addresses on the C6000 are in bytes). So, we need to skip from 0 to 20 between
elements in a frame. That's why the element index above is set to 20.
Now the question becomes, what do we need to do to the addresses after we transfer the first
element of the Right channel? We need to go back up in memory to the second element of the
Left channel. After each frame, we need to go back up. How can we do this?
[Figure: Frame 2 highlighted; each buffer element is 2 bytes wide]
Given:
  Two buffers: Left, Right
  Buffers each 10 elements long
  ESIZE = 16-bits
EDMA setup:
                Frame        Element
  Count         9 (=10-1)    2
  Index         -18          20
  Source        McBSP
  Destination   Left
To sort L/R data, we need to set up EDMA with 10 frames, each with 2 elements.
How many bytes to go back to Left[2]?
We can use the frame index to move us back to the Left channel. So, if the starting address of the
Right channel is 20, and the second element of the Left channel is at 2, we need to go back (the
value is negative) by 18.
Here's a summary of the values and how we got to them. Don't forget that the addresses have to
be normalized to bytes before the indexes are calculated.
[Figure: EDMA index summary with 2-byte elements: forward 10 elements to the next element (element index); back 9 elements to the next frame (frame index)]
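The same arithmetic can be written down once so it follows BUFFSIZE automatically. This sketch computes the count and index values from the buffer length and element size. It assumes the per-channel buffers are contiguous, as required for this scheme. The nChannels parameter extends the two-channel case by the same reasoning, and SortParams and sortParams are hypothetical names.

```c
#include <assert.h>

/* Hypothetical helper: EDMA values for sorting nChannels interleaved
 * channels into nChannels contiguous buffers of buffSize elements, each
 * element nBytes wide. The frame count field holds (frames - 1). */
typedef struct {
    int elemCount;   /* elements per frame = number of channels     */
    int frameCount;  /* FRMCNT field value = number of frames - 1   */
    int elemIdx;     /* bytes from Left[i] to Right[i]              */
    int frameIdx;    /* bytes from the last channel back to Left[i+1] */
} SortParams;

static SortParams sortParams(int buffSize, int nBytes, int nChannels)
{
    SortParams p;
    assert(buffSize > 0 && nBytes > 0 && nChannels > 0);
    p.elemCount  = nChannels;
    p.frameCount = buffSize - 1;
    p.elemIdx    = buffSize * nBytes;
    /* go back over (nChannels - 1) buffers, then forward one element */
    p.frameIdx   = -(buffSize * nBytes) * (nChannels - 1) + nBytes;
    return p;
}
```

For two channels of ten 16-bit samples, this gives an element count of 2, a frame count of 9, an element index of 20 and a frame index of -18. These match the values derived above.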
Counter Reload
When the EDMA transfers a frame of data, the element count goes to 0. It needs a place to
remember how many elements are in a frame. In this topic, we'll look at how this is done.
Counter Reload
[Figure: McBSP -> EDMA -> Left and Right buffers (elements 1-10)]
  Left written   Right written   Frame count   Element count   Count Reload
  (none)         (none)          9             2               (empty)
  Index: frame -18, element 20; Link; Source: McBSP; Destination: Left
Notice how the element count goes to 1 after the first transfer.
Counter Reload
  Left written   Right written   Frame count   Element count   Count Reload
  1              (none)          9             1               (empty)
  Index: frame -18, element 20; Link; Source: McBSP; Destination: Left
After the second transfer (or the last element transfer in a frame) the element count sits at 0.
Counter Reload
  Left written   Right written   Frame count   Element count   Count Reload
  1              1               9             0               2
  Index: frame -18, element 20; Link; Source: McBSP; Destination: Left
What happens when the element count goes to zero? There’s a register for this (Count Reload).
When setting up the EDMA transfer parameters, the "Count Reload" field can be set to the same
value as the original element count. Then the element count can be reloaded before the next frame
transfer. This allows the EDMA to keep up with the number of elements in each frame.
Counter Reload
  Left written   Right written   Frame count   Element count   Count Reload
  1              1               8             2               2
  Index: frame -18, element 20; Link; Source: McBSP; Destination: Left
What happens when the element count goes to zero? There’s a register for this (Count Reload).
This process of reloading the element count after each frame is transferred repeats over and over
until the frame count goes to 0.
Counter Reload (in every step: Index frame -18, element 20; Count Reload 2; Link; Source McBSP; Destination Left)
  Left written   Right written   Frame count   Element count
  1, 2           1               8             1
  1, 2           1, 2            8             0
  1, 2           1, 2            7             2
  1, 2, 3        1, 2            7             1
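The slide sequence above can be reproduced with a small host-side model. Each loop pass stands for one McBSP receive event that moves one 16-bit element. After the last element of a frame, the element count is reloaded from the Count Reload value and the frame index is applied. This is a simplified sketch under those assumptions, not a description of the EDMA's internal logic. The name edma_sort_model is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of an element-synchronized, 1-D indexed EDMA
 * transfer with element count reload. Indexes are in bytes. */
static void edma_sort_model(const short *rxStream, unsigned char *dstBase,
                            int elemCountInit, int frameCountField,
                            int elemIdx, int frameIdx)
{
    int elemCount  = elemCountInit;   /* ELECNT: counts down per element */
    int frameCount = frameCountField; /* FRMCNT: holds (frames - 1)      */
    const int elerld = elemCountInit; /* ELERLD: reload value for ELECNT */
    ptrdiff_t dst = 0;                /* destination offset in bytes     */

    assert(elemCountInit > 0 && frameCountField >= 0);
    for (;;) {
        memcpy(dstBase + dst, rxStream, sizeof *rxStream); /* one element */
        rxStream++;
        if (--elemCount > 0) {
            dst += elemIdx;           /* next element in this frame      */
        } else if (frameCount > 0) {
            frameCount--;             /* frame done: reload element count */
            elemCount = elerld;
            dst += frameIdx;          /* jump back to the next Left slot */
        } else {
            break;                    /* last frame done: transfer complete */
        }
    }
}
```

Feeding it 20 interleaved samples with the values 2, 9, 20 and -18 leaves all Left samples in the first buffer and all Right samples in the second. This only works because the two buffers are contiguous.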
Transfer Count: Frame = BUFFSIZE - 1, Element = # of Buffers = 2
From our previous “How Sorting Works” example:
  BUFFSIZE = 10
  NBYTES = 2
Therefore:
  Elem Count  = 2
  Frame Count = 10 - 1 = 9
  Element Idx = 10 * 2 = 20
  Frame Idx   = -(10*2) + 2 = -18
Note: For the channel sorting configuration described here to work properly, the two buffers
must be aligned properly and contiguous in memory. In ANSI C, declaring two arrays
one after the other does not necessarily guarantee they will be contiguous, though if you
look at the map file created during the lab exercises, you will see that by "luck" they are
contiguous.
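One way to stop depending on luck is to declare both channels as a single two-dimensional array. C lays out the rows of an array one after another, so the Right row always starts exactly BUFFSIZE elements after the Left row. This is a sketch of that alternative, not what the lab files do. The names gBufRcv, gBufRcvL and gBufRcvR are reused here for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define BUFFSIZE 32

/* Row 0 = Left, row 1 = Right; rows of a 2-D array are contiguous. */
static short gBufRcv[2][BUFFSIZE];

/* Hypothetical aliases so the rest of the code can keep L/R names. */
static short *const gBufRcvL = gBufRcv[0];
static short *const gBufRcvR = gBufRcv[1];

/* The array holds exactly two back-to-back buffers, with no padding. */
static_assert(sizeof gBufRcv == 2 * BUFFSIZE * sizeof(short),
              "Left and Right buffers must be back to back");
```

If you adopt this layout, the extern declarations in other files must change to match, for example to `extern short gBufRcv[2][BUFFSIZE];`.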
Here are the three quick steps necessary to use a module in the BSL.
Note: If you’re using the 6416 DSK, just change 6713 to 6416.
Exercise
Exercise: Background
Update the destination EDMA configuration for channel sorting.
This exercise should take 10 minutes.
These are the data declarations and references used in Lab 7:
// ======== Declarations ========
#define BUFFSIZE 32
// ======== References ========
// Buffers for our Left and Right channels
extern short gBufRcvL[BUFFSIZE];
extern short gBufRcvR[BUFFSIZE];
extern short gBufXmtL[BUFFSIZE];
extern short gBufXmtR[BUFFSIZE];
extern SINE_Obj sineObjL;
extern SINE_Obj sineObjR;
// ======== Global Variables ========
EDMA_Handle hEdmaRcv;
EDMA_Handle hEdmaReloadRcv;
EDMA_Handle hEdmaXmt;
EDMA_Handle hEdmaReloadXmt;
short gXmtTCC;
short gRcvTCC;
Exercise: Step 1
Modify the configuration from our previous lab exercise:
EDMA_Config gEdmaConfigRcv = {
EDMA_OPT_RMK(
EDMA_OPT_PRI_LOW, // Priority?
EDMA_OPT_ESIZE_16BIT, // Element size?
EDMA_OPT_2DS_NO, // 2 dimensional source?
EDMA_OPT_SUM_NONE, // Src update mode?
EDMA_OPT_2DD_NO, // 2 dimensional dest?
EDMA_OPT_DUM_INC, // Dest update mode?
EDMA_OPT_TCINT_YES, // Cause EDMA interrupt?
EDMA_OPT_TCC_OF(0), // Transfer complete code?
EDMA_OPT_LINK_YES, // Enable link parameters?
EDMA_OPT_FS_NO // Use frame sync?
),
...
Exercise: Step 4
Using the declarations and variables from the previous slide, fill in the
correct values. Use the symbol BUFFSIZE rather than just the value,
in case we change the buffer size later.
Refer back to page 7-17 for hints on how to fill in the blanks.
Exercise: Step 5
5 Element Reload:
EDMA_RLD_RMK(
// Number of elements, should be the same as Element Count
EDMA_RLD_ELERLD_OF( ),
// We’ll replace “0” later using EDMA_link()
EDMA_RLD_LINK_OF(0)
)
Exercise: Step 6
Complete the “if” condition below using BSL:
If DIP switch 0 is on (down), then add sine-wave values to the
Left and Right receive buffers
if ( )
{
SINE_add(&sineObjL, gBufRcvL, BUFFSIZE);
SINE_add(&sineObjR, gBufRcvR, BUFFSIZE);
}
Lab 7
In this lab, we are going to set up the EDMA to sort the packed left/right data stream into separate
buffers of all left data and right data.
[Figure: Lab 7 data flow: ADC -> RCVCHAN -> gBufRcv, sorted into Rcv L and R buffers; CPU adds (+) and COPYs into Xmt L and R buffers]
Open Audioapp.pjt
2. Reset the DSK and start CCS
3. Open audioapp.pjt
Modify Buffers
We currently have a receive and transmit buffer for the packed left/right data. In order to sort
this data into separate buffers of left data and right data, we need to add two new buffers. We
will use the current buffers for the left channel, and the two new buffers for the right channel.
4. In main.c, create a new receive buffer
Find the place where we create the two current buffers. Copy and paste the gBufRcv buffer.
Make sure to paste it immediately below itself.
5. Rename the buffers
Name the first receive buffer, gBufRcvL, and the second gBufRcvR.
Note: The order in which the buffers are declared is important. The XmtL/XmtR buffers need
to be declared together (left, then right), followed by the Rcv buffers (L then R), AND
they must be contiguous.
EDMA_CNT_RMK(
EDMA_CNT_FRMCNT_OF(),
EDMA_CNT_ELECNT_OF()
),
This macro will build the correct values and put them in the right place in the register.
Hint: Don't forget that the value that goes in the FRMCNT field is supposed to be
NUMFRAMES – 1.
Hint: Refer back to the discussion material to help you figure out what these values should be.
Don't forget that the constant BUFFSIZE represents the number of elements per buffer.
6. The last modification that we need to make is to the RLD register. Since we are doing a
synchronized, frame indexed transfer, we need to fill in the element count reload field of
the RLD register. You'll need to use an RMK macro again, as you did before. Here
are the fields:
• ELERLD - The number that you would like reloaded into the element count field
after each frame completes.
• LINK - The set of reload registers to link to. We do this in code later.
11. Apply EDMA configuration changes to the transmit side.
Does the transmit side get the same changes as the receive side?
_____________________________________________________________________
Apply any changes that you feel need to be applied to the transmit side (very few).
12. Build your code and fix any errors. If you get a clean build, move on.
Run Audio
18. Run the audio
Make sure that the audio on the computer or whatever source you are using is still playing.
Part A
Note: If you had troubles getting Lab 7 to work, copy the files from \solutions for c64x\lab7 or
\solutions for c67x\lab7 and begin working on the next step shown below.
or
C:\CCStudio_v3.1\c6000\dsk6713\lib\dsk6713bsl.lib
if (DSK6416_DIP_get(0) == 0) or if (DSK6713_DIP_get(0) == 0)
SINE_add(…) SINE_add(…)
There are 4 DIP switches on the DSK (near the LEDs). Switch 0 is the one farthest away from the
LEDs. DIP_get simply reads the position: up is 1, down is 0. Using BSL is a quick way to
add functionality to the DSK board without writing your own routines.
27. Add search path for BSL libraries
In order for CCS to find the BSL libraries, we need to add a search path. Under Project ->
Build Options, click on the Preprocessor category and add the following include search path:
c:\ccstudio_v3.1\c6000\dsk6416\include -or- c:\ccstudio_v3.1\c6000\dsk6713\include
28. Build, Run, Debug
29. Try switching the sine wave on and off…
30. Copy project to preserve your solution.
Using Windows Explorer, copy the contents of:
c:\iw6000\labs\audioapp\*.* TO c:\iw6000\labs\lab7
You’re done
Multi-channel Operation
[Figure: McBSP frames (Frame 1, Frame 2, Frame 3), each carrying channels 1-4; only channels 1 and 3 are placed in memory]
Allows multiple channels (words) to be independently selected for transmit and receive
Combined with the DMA’s flexibility ...
[Figure: the same McBSP frames (channels 1-4); the EDMA places channel 1 samples in one buffer and channel 3 samples in another]
EDMA’s flexible (indexed) addressing allows it to sort each channel into separate buffers!
Discussion Solutions
Indexed Single Frame Transfer
[Figure: 8-bit pixels numbered 1-36 in rows of six, transferred to the codec]
Procedure:
  Source & Dest Addr
  Transfer Count
  Element Size: 8 bits
  Increment src/dest
  Frame Sync
  ________________
  (Src: mem_8)

Addr Update Mode (SUM/DUM)     ESIZE          FS (Frame Sync)
00: fixed (no modification)    00: 32-bits    0: Off
01: inc by element size        01: 16-bits    1: On
10: dec by element size        10: 8-bits
11: index                      11: rsvd
Exercise Solutions
Exercise: Step 1
Modify the configuration from our previous lab exercise:
EDMA_Config gEdmaConfig = {
EDMA_OPT_RMK(
EDMA_OPT_PRI_LOW, // Priority?
EDMA_OPT_ESIZE_16BIT, // Element size?
EDMA_OPT_2DS_NO, // 2 dimensional source?
EDMA_OPT_SUM_NONE, // Src update mode?
EDMA_OPT_2DD_NO, // 2 dimensional dest?
EDMA_OPT_DUM_IDX, // Dest update mode?
EDMA_OPT_TCINT_YES, // Cause EDMA interrupt?
EDMA_OPT_TCC_OF(0), // Transfer complete code?
EDMA_OPT_LINK_YES, // Enable link parameters?
EDMA_OPT_FS_NO // Use frame sync?
),
...
Exercise: Step 4
Using the declarations and variables from the previous slide, fill in the
correct values. Use the symbol BUFFSIZE rather than just the value,
in case we change the buffer size later.
Refer back to page 7-17 for hints on how to fill in the blanks.
Exercise: Step 5
5 Element Reload:
EDMA_RLD_RMK(
// Number of elements, should be the same as Element Count
EDMA_RLD_ELERLD_OF( 2 ),
// We’ll replace “0” later using EDMA_link()
EDMA_RLD_LINK_OF(0)
)
Exercise: Step 6
Complete the “if” condition below using BSL:
If DIP switch 0 is on (down), then add sine-wave values to the
Left and Right receive buffers
if ( DSK6713_DIP_get(0) == 0 )
{
SINE_add(&sineObjL, gBufRcvL, BUFFSIZE);
SINE_add(&sineObjR, gBufRcvR, BUFFSIZE);
}
Introduction
In this module, we will discuss some different ways to handle system timing issues. We will
define some terms that can be used to describe a system and its timing. We will also discuss a
couple of different ways to solve timing issues. We'll take a brief look at optimization to see how
it helps solve timing problems. We'll also learn the benefits of a double-buffered system and how
to modify your current single buffered system into a double-buffered system.
Learning Objectives
Goals for Lab 8
[Figure: data flow McBSP -> EDMA -> CPU: Rcv L and R buffers, + (add sine), COPY, Xmt buffers]
The main purpose of this module is to help you implement a double-buffered system on a C6000
DSP.
Chapter Topics
Implementing a Double Buffered System
Definition
Here is a good general definition of "real-time". Again, the true definition can change from
system to system. It basically boils down to "when do you get the data?" and "when do you need
to be finished with it?".
[Figure: sample-by-sample timeline: Process-0 takes tp and produces Out-0; input samples arrive every tS]
Definitions
tp: Processing Time
ts: Sample Period (time between input samples)
Real Time: Generating an output before receiving the
next input (tp < ts)
Latency: Time from input to output (in this case…tp)
This is a minimum latency system (no buffering), ideal for
control systems, but is computationally inefficient.
Most DSP algorithms benefit from "block processing" where you process multiple samples at
once. Some algorithms, FFT for example, require blocks for processing. When processing
samples, the CPU has to do a context save/context restore for each sample. When you buffer up
samples, the context switch time is dramatically reduced. Also, most algorithms can be optimized
to process blocks rather than single samples by using techniques like loop-unrolling and packed data
processing (or single instruction, multiple data). We don't discuss these topics much in this class,
but the TMS320C6000 Optimization Workshop goes into great detail on these subjects.
[Figure: block timeline: Process 0-15 takes tP, fills Xmt Buffer 0-15, with latency until Out-0 ... Out-15]
Why did we have to decrease our buffer size to get lab 7 to work?
The main point to notice here is that we have the same amount of time (ts) to process a buffer that
we had to process a single sample in the previous slide. Does it take longer to process a buffer
than it does a single sample?
A Broken System
Since one sample period is all the system has to process the buffer, a buffer that is too large
may take too much time to process. The system then breaks because it starts dropping samples
and using buffers that may be discontinuous.
[Figure: Process 0-31 takes tp, fills Xmt Buffer 0-31, with latency until Out-0 ... Out-31]
Processing 32 samples takes longer than processing 16
The time available to process the samples hasn’t changed (tS)
There are 3 solutions to this problem:
1. Decrease buffer size (we did this at the end of lab 7)
2. Decrease processing time (tP) with optimization
3. Increase the amount of available time for processing
Let’s see what Solution 2 (optimization) can do for us…
If the system is broken, there are two different ways to fix it:
• Decrease the amount of time needed to process a buffer (the first two solutions above)
• Increase the amount of time that the system has to process a buffer (double-buffering)
Note: Why is the C64x slower? These benchmarks are for the sine wave generator that we
have been using in the labs. Is this algorithm a fixed- or floating-point algorithm? It is a
floating-point algorithm. The C64x is a fixed-point processor, while the C67x is a
floating-point processor, so the C64x has to call library routines that emulate
floating-point on a fixed-point device. These routines are not available to the C compiler
for optimization, which reduces efficiency dramatically.
It is easy to see how big an effect optimization has on system timing. The optimization used here
is very basic, and there are other steps that could have been taken to further optimize these
routines. Even with basic optimization, the performance of these routines can be dramatically
improved.
The Fast RTS Library for the C62x and C64x processors contains optimized floating-point
routines that help these processors handle floating-point much more efficiently. The
library can be downloaded from our web site, www.dspvillage.com.
Notice that the time allowed to process a buffer is no longer the sampling period (tS). It is now
the sampling period times the length of the buffer (tB). This extra time can be used to reduce the
amount of optimization that needs to be done, increase the buffer size for more efficiency, or
simply allow for changes later on.
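The gain from double buffering is easy to put into numbers. The sketch below compares a processing time tp against the single-buffered budget (one sample period, tS) and against the double-buffered budget (tB = buffer length × tS). The 8 kHz rate and 300 µs processing time in the usage test are made-up example values, not lab measurements. sample_period_us and meets_deadline are hypothetical helpers.

```c
#include <assert.h>

/* Sample period tS in microseconds for a given sample rate. */
static double sample_period_us(double sampleRateHz)
{
    assert(sampleRateHz > 0.0);
    return 1.0e6 / sampleRateHz;
}

/* Real-time check from the definitions above: tp must be less than the
 * time available (tS when single-buffered, tB when double-buffered). */
static int meets_deadline(double tp_us, double budget_us)
{
    return tp_us < budget_us;
}
```

At an example rate of 8 kHz, tS is 125 µs. A 300 µs block routine fails against tS but fits easily in tB = 16 × 125 µs = 2 ms for a 16-element buffer.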
Are there any consequences of double-buffering that should be considered? Sure, it takes more
memory and it adds more latency. So, this is something else to add to the engineering balance
sheet.
The concept of double-buffering can also be extended to include more than two buffers. This is
very common in systems where there is a lot of data and latency is not a big
issue (e.g., video).
Pseudo Code
• Allocate reload entries for Ping and Pong
• Src = DRR (McBSP0)
• EDMA_config (…)
• Link: channel -> Pong, Pong -> Ping, Ping -> Pong
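The link step is what lets the EDMA alternate buffers without CPU help. In the host-side model below, the channel starts configured for Ping and linked to Pong, and the Ping and Pong reload entries link to each other. When a transfer completes, the linked entry is copied into the channel. ParamSet and complete_transfer are hypothetical names. The structure is far smaller than the real EDMA parameter set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of EDMA parameter linking for ping-pong buffering. */
typedef struct ParamSet {
    short *dst;                  /* destination buffer for this transfer */
    const struct ParamSet *link; /* entry to load when this one completes */
} ParamSet;

/* On transfer completion, the channel reloads from its linked entry. */
static void complete_transfer(ParamSet *channel)
{
    assert(channel != NULL && channel->link != NULL);
    *channel = *channel->link;   /* copy the reload entry into the channel */
}
```

After each completion, the destination alternates Ping, Pong, Ping, … indefinitely. The CPU only has to work on whichever buffer the EDMA just finished.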
Lab 8
Let's go off and apply all of the new knowledge that we have learned. In Lab 8, we'll take the
single-buffered system that we've had and make it double-buffered.
[Figure: data flow with Rcv L and R buffers, + (add sine), COPY, and Xmt buffers]
Open Audioapp.pjt
1. Reset the DSK, start CCS and open audioapp.pjt
Note: You need to make sure that the sine wave is turned on for this part. If it is not turned on,
you should be able to add quite a bit more load because the system is not generating the
sine wave (which takes CPU cycles and time).
Keep incrementing the loadValue until you hear the system break (ours broke around 10 for
the 6416 and 6 for the 6713). How much load can the system handle before it starts to sound
bad? ______________________________
Now that we know how to break the system (that's the easy part), let’s leave the load in our
code and add another buffer to our system. A double-buffered system gives us a whole buffer
period to process samples instead of just the period between two samples. In other words, with a
double-buffered system, this load will be insignificant.
9. Halt the DSK
Note: If you struggled with Lab 8 and couldn’t get it to work, copy the files from \solutions for
c64x\lab8 or \solutions for c67x\lab8 into your \audioapp directory and begin with the
next step shown below.
Note: Don’t forget that the order of the buffers is important. Due to the way we are using the
EDMA for channel sorting, the buffers for the Right channel need to follow immediately
after their corresponding Left channel buffers.
short * sourceL;
short * sourceR;
short * destL;
short * destR;
if (pingOrPong == PING) {
    sourceL = gBufRcvLPing;
    sourceR = gBufRcvRPing;
    destL = gBufXmtLPing;
    destR = gBufXmtRPing;
    pingOrPong = PONG;
}
else { // pingOrPong must equal PONG
    sourceL = gBufRcvLPong;
    sourceR = gBufRcvRPong;
    destL = gBufXmtLPong;
    destR = gBufXmtRPong;
    pingOrPong = PING;
}
Note: If you’re uncomfortable with adding this control logic to the code, just copy it from the
solution and continue.
21. Change two SINE_add( )'s and the two copyData( )’s
When the code finishes executing the if/else statement that we just added, the active buffers
are pointed to by the four local pointers that we added: sourceL, sourceR, destL, destR. This
makes it easy to change the processing functions, the two SINE_add( )'s and the two
copyData( )'s. Modify these functions to use the active pointer names instead of the globals
that we have been using.
Hint: gBufRcvL should become sourceL, gBufXmtL should become destL, etc.
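For reference, here is a runnable host-side sketch of the finished logic: select the active buffers, toggle pingOrPong, then process through the local pointers. The buffer size, the PING/PONG values and this copyData() signature are assumptions for illustration, and the SINE_add( ) calls are omitted. The lab's own definitions may differ, and processActive is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

#define BUFFSIZE 4                 /* illustrative size only */
enum { PING = 0, PONG = 1 };

/* Stand-ins for the lab's eight buffers. */
static short gBufRcvLPing[BUFFSIZE], gBufRcvRPing[BUFFSIZE];
static short gBufRcvLPong[BUFFSIZE], gBufRcvRPong[BUFFSIZE];
static short gBufXmtLPing[BUFFSIZE], gBufXmtRPing[BUFFSIZE];
static short gBufXmtLPong[BUFFSIZE], gBufXmtRPong[BUFFSIZE];

/* Hypothetical stand-in for the lab's copyData(). */
static void copyData(const short *in, short *out, int n)
{
    assert(in != NULL && out != NULL);
    for (int i = 0; i < n; i++) out[i] = in[i];
}

/* Select the active buffers, toggle pingOrPong, then process. */
static void processActive(void)
{
    static int pingOrPong = PING;
    const short *sourceL, *sourceR;
    short *destL, *destR;

    if (pingOrPong == PING) {
        sourceL = gBufRcvLPing; sourceR = gBufRcvRPing;
        destL   = gBufXmtLPing; destR   = gBufXmtRPing;
        pingOrPong = PONG;
    } else {                       /* pingOrPong must equal PONG */
        sourceL = gBufRcvLPong; sourceR = gBufRcvRPong;
        destL   = gBufXmtLPong; destR   = gBufXmtRPong;
        pingOrPong = PING;
    }
    copyData(sourceL, destL, BUFFSIZE);
    copyData(sourceR, destR, BUFFSIZE);
}
```

Keeping pingOrPong static preserves its value between calls. This is why only the pointers, and not pingOrPong, should become locals.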
You’re done.
Introduction
In this module, you will learn how to use the BIOS scheduler and some additional debugging
techniques provided by BIOS.
Learning Objectives
Goals for Lab 9
[Figure: data flow McBSP -> EDMA -> CPU: Rcv L and R buffers, + (add sine), COPY, Xmt buffers]
Chapter Topics
DSP/BIOS Scheduling
Real-Time Problem
Definition
Lab 9 Requirement - Abstract
Previous Requirement
addSine/copy
DSP pass-through and addSine
Possible Solutions
Possible Solution – while Loop
[Figure: timeline (time 0 to 7) of a while loop with idle time and Timer2_ISR { LED/load }]
Interrupt is missed…
How could we prevent this?
[Figure: Timer1_ISR { addSine/copy } and Timer2_ISR { LED/load } on a timeline (time 0 to 7)]
Use DSP/BIOS HWI dispatcher for context save/restore, and allow preemption
Reasonable approach if you have a limited number of interrupts/functions
Limitation: the number of HWIs and their priorities are statically determined; only one HWI function for each interrupt
What option is there besides hardware interrupts?
HWIs signaling SWIs
[Figure: EDMA INT triggers an HWI, which runs urgent code and then calls SWI_post(); to schedule the SWI]
HWI                                SWI
Fast response to interrupts        Latency in response time
Minimal context switching          Context switch performed
High priority only                 Selectable priority levels
Can post SWI                       Can post another SWI
Could miss an interrupt while      Execution managed by scheduler
executing ISR
Tasks
Another Solution – Tasks (TSK)
[Figure: two threads from start to end: one runs to completion; the other calls SEM_pend and pauses (blocked state) before it reaches end]
BIOS
Enabling BIOS
Enabling BIOS – Return from main()
addSine/copy
LED/load
BIOS is …
DSP BIOS Consists Of:
Real-time scheduler
Preemptive thread management
kernel
Real-time I/O
Allows two-way communication
between threads or between
target and PC host.
Thread Scheduling
Priority Based Thread Scheduling
[Figure: timeline of threads by priority, from HWI 2 (highest) through HWI 1, SWI 3, SWI 2, SWI 1 and MAIN to IDLE (lowest). Interrupts (int1, int2) start HWIs, HWIs post SWIs (post1, post2, post3; e.g. SWI_post(&swi2);), and each thread runs until it returns (rtn).]
User sets the priority...BIOS does the scheduling
How do you create a SWI and set priorities?
SWI Properties
[Figure: SWI Properties dialog; function: _myFunction]
Using a Mailbox
Pass Value to SWI Using Mailbox
HWI:
   …
   SWI_or(&SWIname, value);

SWI (_myFunction):
   temp = SWI_getmbox();
   …
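The mailbox hand-off can be modeled in portable C to show what the SWI actually receives. This is a hypothetical host-side simulation, not DSP/BIOS code: SimSwi, sim_swi_or(), sim_swi_run(), and myFunction_sketch() are illustrative names. It assumes the documented DSP/BIOS behavior that SWI_or() ORs the value into the mailbox and posts the SWI, and that when the SWI runs, SWI_getmbox() returns the mailbox value latched at that moment while the mailbox is reset to its initial value.

```c
#include <stdint.h>

/* Hypothetical stand-in for an SWI object: a mailbox plus a posted flag. */
typedef struct {
    uint32_t initMbox;                 /* value the mailbox is reset to when the SWI runs */
    uint32_t mbox;                     /* current mailbox contents */
    int      posted;                   /* nonzero once the SWI has been posted */
    void   (*fxn)(uint32_t mboxValue); /* the SWI function */
} SimSwi;

/* Models SWI_or(): OR the value into the mailbox, then post the SWI. */
void sim_swi_or(SimSwi *swi, uint32_t value)
{
    swi->mbox |= value;
    swi->posted = 1;
}

/* Models the scheduler running a posted SWI: latch the mailbox (what
   SWI_getmbox() would return), reset it, then call the SWI function. */
void sim_swi_run(SimSwi *swi)
{
    if (!swi->posted)
        return;
    uint32_t snapshot = swi->mbox;
    swi->mbox = swi->initMbox;
    swi->posted = 0;
    swi->fxn(snapshot);
}

/* Example SWI function: records the mailbox value it was given. */
static uint32_t lastValue;
static void myFunction_sketch(uint32_t temp) { lastValue = temp; }
```

Two posts made before the SWI gets to run are combined into one call: posting 0x1 and then 0x4 delivers 0x5 to the function.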
// Prolog…             Initialization (runs once only)
while (‘condition’) {   // Processing loop; option: termination condition
    blocking_fxn();     // Suspend until unblocked
    // Process…         Perform desired DSP work...
}
// Epilog…              Shutdown (runs once, at most)
Periodic Functions
[Figure: the DSP/BIOS CLK generates a periodic tick; a periodic function runs once every period, measured in ticks.]
Let’s use the Config Tool to create a periodic function…
Execution Graph
• Software logic analyzer
• Debug event timing and priority
Statistics View
• Profile routines without halting the CPU
• Capture and analyze data without stopping the CPU
Message LOG
• Send debug messages to the host
• Doesn’t halt the DSP
• Deterministic, low DSP cycle count
• More efficient than traditional printf()
LOG_printf(&logTrace, "addSine ENabled");
Lab 9
In this lab, we’re going to change our copy routine to a SWI, add a routine to blink the LEDs and
analyze other parts of our code using DSP/BIOS tools.
Lab 9
[Figure: Lab 9 data flow: McBSP receive (left/right) → EDMA → CPU (+ sine, COPY) → EDMA → McBSP transmit (left/right).]
3. Move pointers from the old HWI to the new processBuffer( ) function
Remember the four local pointers that we created in the last lab: sourceL, sourceR, destL, and
destR? We need to move (not copy) these pointers from edmaHwi( ) to processBuffer( ). DO
NOT move the static variable pingOrPong.
The edmaHwi( ) function is in edma.c and processBuffer( ) should be in main.c.
if (pingOrPong == PING) {
pingOrPong = PONG;
}
else {
pingOrPong = PING;
}
This code should be inside the if statement that tests the rcvInt and the xmtInt values. We
only want to use this code when both interrupts have occurred.
Note: The APIs for posting a SWI expect a pointer to the SWI object (i.e. a handle). So, make
sure to pass the address of the structure itself (the SWI_Obj) in the API call.
• if PING, assign the source and destination pointers to the PING buffers, just like we did
in the edmaHwi( ) before
• if PONG, assign the source and destination pointers to the PONG buffers
Note: Don’t forget to eliminate the two instructions that change the status of pingOrPong (held
over from the HWI code). These instructions are no longer needed.
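The PING/PONG selection in processBuffer( ) might look like the portable sketch below. The PING/PONG values, BUFFSIZE, the separate left/right arrays, and the processBuffer_sketch() name are hypothetical stand-ins for the lab's own declarations; only the sourceL/sourceR/destL/destR pointers and the PING/PONG choice come from the steps above. In the SWI version of the lab, the pingOrPong argument would come from the SWI mailbox.

```c
#define PING 0          /* hypothetical values; the lab defines its own */
#define PONG 1
#define BUFFSIZE 32     /* hypothetical block size */

/* Hypothetical receive and transmit buffers, one pair per channel. */
short rcvPingL[BUFFSIZE], rcvPingR[BUFFSIZE], rcvPongL[BUFFSIZE], rcvPongR[BUFFSIZE];
short xmtPingL[BUFFSIZE], xmtPingR[BUFFSIZE], xmtPongL[BUFFSIZE], xmtPongR[BUFFSIZE];

/* Copies one block, choosing the PING or PONG buffers from the value passed in. */
void processBuffer_sketch(int pingOrPong)
{
    short *sourceL, *sourceR, *destL, *destR;
    int i;

    if (pingOrPong == PING) {
        sourceL = rcvPingL; sourceR = rcvPingR;
        destL   = xmtPingL; destR   = xmtPingR;
    } else {
        sourceL = rcvPongL; sourceR = rcvPongR;
        destL   = xmtPongL; destR   = xmtPongR;
    }

    for (i = 0; i < BUFFSIZE; i++) {    /* the "copy" step */
        destL[i] = sourceL[i];
        destR[i] = sourceR[i];
    }
}
```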
10. Copy #defines for PING and PONG from edma.c to main.c
We are now using these definitions in both places to help with the control code.
Note: Notice the naming convention here. The object is called processBufferSwi to denote that
it is a SWI object that calls a function named processBuffer( ). Be careful not to use the
same name for both of these. Doing so causes a symbol conflict in the linker, because
there would be two different addresses (one for the SWI object structure and another for the
code) for the same label.
Source Files and Header Files
main.c:   <csl.h>, <csl_edma.h>, <csl_irq.h>, "sine.h", "edma.h", "mcbsp.h",
          "dsk6713.h" or "dsk6416.h", "dsk6713_dip.h" or "dsk6416_dip.h", "audioappcfg.h"
edma.c:   <csl.h>, <csl_edma.h>, "sine.h", "mcbsp.h", "dsk6713.h" or "dsk6416.h",
          "dsk6713_dip.h" or "dsk6416_dip.h", "audioappcfg.h"
mcbsp.c:  <csl.h>, <csl_mcbsp.h>, "codec.h"
Part A
Note: If you struggled with getting Lab 9 to work, simply copy the files from \solutions for
c64x\lab9 or \solutions for c67x\lab9 into your lab9 directory and begin at the next step
shown below.
19. Add the new periodic function, blinkLeds(), to your CDB file
Open the configuration file and insert a new periodic object called blinkLedsPrd that calls
the blinkLeds() function every 250 ticks. Click OK. Right-click on the CLK manager, select
Properties, and ensure that the default setting of 1000 microseconds/int is set. This sets the
BIOS “tick” rate; each periodic function can then be set up to fire after a given number of ticks
has elapsed. We’ll use the default setting of 1000 µs (1 millisecond) for this lab, so blinkLeds()
runs every 250 ms. Click OK.
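The arithmetic behind these settings: with a 1000 µs tick, a period of 250 ticks means blinkLeds() runs every 250 ms, or 4 times per second. The portable sketch below models a periodic object as a countdown decremented on every tick. SimPrd, prd_period_ms(), sim_prd_tick(), and blinkLeds_sketch() are hypothetical names; this is not the DSP/BIOS PRD implementation.

```c
/* Hypothetical model of a periodic object driven by CLK ticks. */
typedef struct {
    unsigned period;     /* period in ticks */
    unsigned countdown;  /* ticks left before the next call; must start at period (>= 1) */
    void   (*fxn)(void);
} SimPrd;

/* Period in milliseconds for a tick length in microseconds and a tick count. */
unsigned prd_period_ms(unsigned tickUs, unsigned ticks)
{
    return tickUs * ticks / 1000u;
}

/* Called once per tick; runs fxn every 'period' ticks. */
void sim_prd_tick(SimPrd *prd)
{
    if (--prd->countdown == 0) {
        prd->countdown = prd->period;
        prd->fxn();
    }
}

/* Example periodic function: counts how many times it has been called. */
static unsigned blinkCount;
static void blinkLeds_sketch(void) { blinkCount++; }
```

Driving the object for 1000 ticks (one simulated second) calls blinkLeds_sketch() at ticks 250, 500, 750, and 1000.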
Part B
Use Real-Time Analysis Tools
Next, we’ll use a few tools that might help us understand what is going on in our code. A few
of these tools, such as the CPU load graph and the execution graph, are “ready to go” and require
no additional coding effort. Others, such as LOG_printf(), require a minimal amount of code.
24. Change priority of the processBuffer() SWI vs. the periodic function
OK. So, what’s the solution? Have you thought about how the processBuffer() function is
prioritized? Is the periodic function set at a higher, lower, or equal priority? Let’s take a look.
Open the audioapp.cdb file. Click on the + sign next to Scheduling. Click on SWI – Software
Interrupt Manager. In the right-hand window, you’ll see the SWI priority list, where 0 is the
lowest and 14 is the highest priority. What is the current setting? Which is more important:
the processing of the audio or the blinking of the LEDs? Assuming that the answer is “the
audio”, we need to set its priority higher than the LED blinking. By the way, if SWIs are set
at the same priority, they execute in first-in, first-out (FIFO) order.
Click and drag processBufferSwi to Priority 2 and release it. The audio is now higher priority.
Close the .cdb file.
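The ordering rule above (the highest-priority ready SWI runs first, and SWIs at the same priority run first-in, first-out) can be modeled with one FIFO per priority level. This is a hypothetical host-side sketch with illustrative names (ReadyList, ready_post(), ready_next()) and an assumed queue depth. It is not how the DSP/BIOS scheduler is implemented, and it ignores preemption of an SWI that is already running.

```c
#define NUM_PRI   15   /* priorities 0 (lowest) to 14 (highest), as in the text */
#define MAX_READY 8    /* hypothetical queue depth per priority level */

/* One FIFO of posted SWI ids per priority level. */
typedef struct {
    int ids[NUM_PRI][MAX_READY];
    int head[NUM_PRI];
    int count[NUM_PRI];
} ReadyList;

/* Append an SWI id to its priority's FIFO.
   Returns 0 on success, -1 if the priority is invalid or the level is full. */
int ready_post(ReadyList *r, int id, int pri)
{
    if (pri < 0 || pri >= NUM_PRI || r->count[pri] == MAX_READY)
        return -1;
    int tail = (r->head[pri] + r->count[pri]) % MAX_READY;
    r->ids[pri][tail] = id;
    r->count[pri]++;
    return 0;
}

/* Remove and return the next SWI to run: the oldest one at the highest
   non-empty priority, or -1 if nothing is ready (the idle loop would run). */
int ready_next(ReadyList *r)
{
    for (int pri = NUM_PRI - 1; pri >= 0; pri--) {
        if (r->count[pri] > 0) {
            int id = r->ids[pri][r->head[pri]];
            r->head[pri] = (r->head[pri] + 1) % MAX_READY;
            r->count[pri]--;
            return id;
        }
    }
    return -1;
}
```

With the audio SWI at priority 2 and two other SWIs posted earlier at priority 1, the audio SWI is selected first, and the two priority-1 SWIs follow in the order they were posted.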
25. Build, Load and Run
Your audio should sound MUCH better now and the LEDs should be blinking normally. The
“enable sine” switch should also work flawlessly.
Part C
Using a TSK Instead of a SWI
Now, we’re going to switch the SWI to a TSK. There are several things that a TSK can
“block” or pend on; in this case, we’re going to use a semaphore. Because TSKs do not have
a mailbox (like SWIs do), we need to use a global variable to pass the status of pingOrPong
between the HWI and processBuffer( ). However, a global variable is not latched the way a
mailbox value is: whenever the HWI changes pingOrPong, the TSK sees the new value immediately.
The first time we enter edmaHwi( ), the PING buffers are full, so we want to post PING
to the TSK. Because the global variable changes instantly (unlike the mailbox used with the
SWI), we must switch the state of pingOrPong before calling SEM_post. So, we
need to initialize pingOrPong to PONG, then switch it back to PING prior to the first
SEM_post, so that the TSK processes PING when PING is ready.
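The order of operations (toggle the global, then post) can be checked with a small single-threaded simulation. SimSem, sim_sem_post(), sim_sem_try_pend(), edmaHwi_sketch(), and processBuffer_tsk_step() are hypothetical stand-ins for SEM_post()/SEM_pend() and the lab's functions; the non-blocking "try pend" replaces the blocking SEM_pend() only so the sketch can run without a scheduler.

```c
#define PING 0   /* hypothetical values; the lab defines its own */
#define PONG 1

/* Hypothetical counting semaphore (stand-in for a DSP/BIOS SEM object). */
typedef struct { int count; } SimSem;

static void sim_sem_post(SimSem *s) { s->count++; }

/* Non-blocking stand-in for SEM_pend(): returns 1 if a post was consumed. */
static int sim_sem_try_pend(SimSem *s)
{
    if (s->count == 0)
        return 0;
    s->count--;
    return 1;
}

int pingOrPong = PONG;        /* initialized to PONG, as described above */
SimSem bufferReady = { 0 };

/* Stand-in for edmaHwi(): a buffer has filled. Toggle the global first,
   then post, so the TSK sees the buffer that is actually ready. */
void edmaHwi_sketch(void)
{
    pingOrPong = (pingOrPong == PING) ? PONG : PING;
    sim_sem_post(&bufferReady);
}

/* One pass of the TSK loop: returns the buffer it would process
   (PING or PONG), or -1 if it would still be blocked. */
int processBuffer_tsk_step(void)
{
    if (!sim_sem_try_pend(&bufferReady))
        return -1;
    return pingOrPong;
}
```

Because the TSK reads the global when it runs rather than when it was posted, a second edmaHwi( ) occurring before the TSK runs would change the value it sees; the SWI mailbox avoids this by latching the value.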
40. Once you have a clean build – load/run. Everything should operate normally.
42. Copy project to preserve your solution.
Using Windows Explorer, copy the contents of:
c:\iw6000\labs\audioapp\*.* TO c:\iw6000\labs\lab9
You’re done.
Introduction
Advanced memory management involves using memory efficiently. We will step through a
number of options that can help you optimize your memory usage as well as your performance
needs.
Outline
Using Memory Efficiently
Keep it on-chip
Use multiple sections
Use local variables (stack)
Using dynamic memory (heap, BUF)
Overlay memory (load vs. run)
Use cache
Summary
Chapter Topics
Advanced Memory Management
Keep it On-Chip
Using Memory Efficiently
1. If possible …
• Put all code/data on-chip
• Best performance
• Easiest to implement
[Figure: CPU with program cache, data cache, and internal SRAM holding .text and .bss; external memory is reached through the EMIF.]
From earlier discussions in this chapter, remember that two sections hold most of our code and
data. They are:
• .text - code and
• .bss - global and static variables.
Unfortunately, keeping everything on-chip is not always possible. Often code and data will
require too much space and you are left with the decision of what should be kept on-chip and
what can reside off-chip. Here are 5 other techniques to help you make the best use of on-chip
memory and maximize performance.
6. Use cache
If these sections are too big to fit on-chip, you will have to place them off-chip. But you may still
want to put critical functions and/or data on-chip.
Custom Sections
In order to use multiple sections, you’ll need a way to create them:
You will have to create new sections to keep critical code and data on-chip and other code and
data off-chip.
Hint: Here is a little rule of thumb: “Create a new section for any code or data that must be
placed in a specific memory location.”
3. Far keyword:
far short m;
No matter how you create additional data sections, they will always be accessed using far
addressing (MVKL/MVKH). Only .bss is ever accessed with the near addressing optimization
(global Data Pointer).
Rather, you must create your own linker command file, as shown below.
[Figure: the linker combines appcfg.cmd and myLink.cmd to produce myApp.out.]
myLink.cmd:
SECTIONS
{
   myVar:          > SDRAM
   critical:       > IRAM
   .text:_dotp:    > IRAM
}
A few points:
1. Using the SECTIONS directive, list all the custom sections you have created and
direct each one into a MEM object. Each line reads as “put this section into this MEM object”
(for example, put myVar into SDRAM).
To learn more about the SECTIONS directive, or linking in general, please refer to the
TMS320C6000 Assembly Language Tools User’s Guide (SPRU186).
2. You should not specify a section in both the Configuration Tool and your own linker
command file.
3. You shouldn’t use the same label for a section name as you did for a label in your code. In
other words, don’t put variable y into section “y”.
If you are concerned that you might forget a custom-named section (or a team member might
create one without telling you), the -w linker option can warn you of unspecified sections.
Whenever a new function is encountered, its local variables are automatically created on the
software stack. Upon exiting the function, they are deleted from the stack. While most folks today
call them “local” variables, they often used to be called “auto” variables. (A fitting name in that
they are automatically allocated and deallocated from memory as they’re needed.)
Linking the software stack (.stack) into on-chip memory – and using local variables – can be an
excellent way to increase on-chip memory efficiency … and performance.
The run-time stack grows from the high addresses to the low addresses. The compiler uses the
B15 register to manage this stack. B15 is the stack pointer (SP), which points to the next unused
location on the stack.
The linker sets the stack size to a default of 1024 bytes. You can change the stack size at link
time by using the -stack option with the linker command. The actual length and location of the
stack is determined at link time. Your link command file can determine where the .stack section
will reside. The stack pointer is initialized at system initialization.
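[Figure: the stack grows from higher addresses toward lower addresses; B15 is the stack pointer (SP), and the top of the stack area is at the higher-address end (toward 0xFFFFFFFF).]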
If arguments are passed to a function, they are placed in registers or on the stack. Up to the first
10 arguments are passed in even-numbered registers alternating between A registers and B registers
starting with A4, B4, A6, B6, and so on. If the arguments are longs, doubles, or long doubles,
they are placed in register pairs A5:A4, B5:B4, A7:A6, and so on.
Any remaining arguments are placed on the stack. The stack pointer (SP) points to the next free
location. This is where the eleventh argument and so on would be placed. Arguments placed on
the stack must be aligned to a value appropriate for their size. An argument that is not declared in
a prototype and whose size is less than the size of int is passed as an int. An argument that is a
float is passed as double if it has no prototype declared. A structure argument is passed as the
address of the structure. It is up to the called function to make a local copy.
In addition to using a stack, C compilers provide another block of memory that can be user-
allocated during program execution (i.e. at runtime). It is sometimes called System Memory
(.sysmem), or more commonly, the heap.
Dynamic Memory
Using Memory Efficiently
3. Local Variables
• If the stack is located on-chip, all functions can use it
4. Use the Heap
• Common memory reuse within the C language
• A heap (i.e. system memory) lets you allocate, then free, chunks of memory from a common system block
[Figure: CPU with internal SRAM holding the stack and external memory holding the heap, reached through the EMIF.]
Here is an example using dynamic memory; in fact, it provides a good comparison between using
traditional static variable definitions and their dynamic counterparts.
Delete:   free(a);
          free(x);
malloc() is a standard C language function that allocates space from the heap and returns an
address to that space.
The big advantage of dynamic allocation is that you can free it, then re-use that memory for
something else later in your program. This is not possible using static allocations of memory
(where the linker allocates memory once-and-for-all during program build).
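A minimal standard-C sketch of the static-versus-dynamic comparison follows. SIZE, a_static, and sum_with_heap_buffer() are hypothetical; the point is that the heap buffer exists only between malloc() and free(), so its memory can be reused afterward, while a_static occupies its space for the whole program.

```c
#include <stdlib.h>

#define SIZE 16   /* hypothetical element count */

short a_static[SIZE];   /* static: placed once and for all by the linker */

/* Dynamic: allocate from the heap, use the buffer, then free it so the
   memory can be reused. Returns the sum of 0..SIZE-1 computed in the heap
   buffer, or -1 if malloc() fails. */
long sum_with_heap_buffer(void)
{
    short *a = malloc(SIZE * sizeof *a);   /* Create */
    long sum = 0;
    int i;

    if (a == NULL)
        return -1;
    for (i = 0; i < SIZE; i++) {           /* Use */
        a[i] = (short)i;
        sum += a[i];
    }
    free(a);                               /* Delete: memory goes back to the heap */
    return sum;
}
```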
Multiple Heaps
Assuming you have infinite memory (as most introductory C classes assume), one heap
should be enough. In the real world, though, you may want more than one. For example, what if
you want both an off-chip and an on-chip heap?
Just as we discussed earlier with Multiple Sections for code and data, multiple heaps allow you
to target critical elements on-chip, while less critical (or larger) ones can be allocated off-chip.
While standard C compilers do not provide multiple heap capability, TI’s DSP/BIOS tools do.
When creating MEM objects, you have the option to create a heap in that memory space. Just
indicate you want a heap (with a checkmark) and set the size. From then on, you can refer to
this specific heap by its MEM object name.
Alternatively, if you don’t want to use the MEM object name to refer to a heap you can define a
separate identification label.
Using MEM_alloc
Q: If standard C doesn’t provide multi-heap capabilities, how would the standard C functions
like malloc() know which heap to use?
A: They can’t; instead, use MEM_alloc(), which takes the heap as its first argument.
Standard C syntax        Using MEM functions
free(a);                 MEM_free(SDRAM, a, SIZE);
free(x);                 MEM_free(IRAM, x, SIZE);
As you can see, there is also MEM_free() to replace free(). Additional substitutions can be found in
the DSP/BIOS library.
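The idea behind MEM_alloc(), naming the heap on every call, can be illustrated with two simple arena allocators in portable C. This is a hypothetical sketch: the heap ids, arena sizes, and heap_alloc() are made up, and the bump-pointer strategy (which never reuses freed space) is chosen only for brevity; it does not reflect how DSP/BIOS MEM manages its heaps.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical heap ids, mirroring the MEM object names used in the text. */
enum { IRAM = 0, SDRAM = 1, NUM_HEAPS = 2 };

/* Each "heap" is a byte array with a bump pointer (no free, for brevity). */
static unsigned char iramArena[1024];
static unsigned char sdramArena[8192];

static struct {
    unsigned char *base;
    size_t size;
    size_t used;
} heaps[NUM_HEAPS] = {
    { iramArena,  sizeof iramArena,  0 },
    { sdramArena, sizeof sdramArena, 0 },
};

/* Allocate 'size' bytes aligned to 'align' (a power of two) from heap 'id'.
   Returns NULL for an invalid id or alignment, or if the heap is exhausted. */
void *heap_alloc(int id, size_t size, size_t align)
{
    if (id < 0 || id >= NUM_HEAPS || align == 0 || (align & (align - 1)) != 0)
        return NULL;
    uintptr_t start = (uintptr_t)(heaps[id].base + heaps[id].used);
    uintptr_t aligned = (start + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t offset = (size_t)(aligned - (uintptr_t)heaps[id].base);
    if (offset > heaps[id].size || size > heaps[id].size - offset)
        return NULL;
    heaps[id].used = offset + size;
    return heaps[id].base + offset;
}
```

A small, critical buffer can then come from IRAM while a large one comes from SDRAM, mirroring MEM_alloc(IRAM, …) and MEM_alloc(SDRAM, …).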
Using BUF
While using dynamic memory via the heap is advantageous from a memory reuse perspective, it
does have its drawbacks.
Heap drawbacks:
− Allocation calls (i.e. malloc) are non-deterministic. That is, each time they are called they
may take longer or shorter to complete.
− The allocation functions are non-reentrant. For example, if malloc() is called while a
malloc() is already running (say, it was called in a hardware interrupt service routine), the
system may break.
− Heap allocations are prone to memory fragmentation if many malloc() and free() calls are
made.
BUF solves these problems by letting users create pools of buffers that can then be allocated,
used, and set free.
BUF Concepts
[Figure: BUF_create builds a POOL of equal-sized BUFs and BUF_delete removes it; TSKs and SWIs take buffers from the pool with BUF_alloc and return them with BUF_free.]
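A fixed-size pool, the idea behind BUF, can be sketched in portable C. This is a hypothetical illustration, not the DSP/BIOS BUF implementation: Pool, pool_init(), pool_alloc(), pool_free(), BUF_COUNT, and BUF_SIZE are made-up names and sizes. Because every buffer has the same size, allocation and release are constant-time list operations and the pool cannot fragment. This sketch is not reentrant; a pool shared between TSKs and SWIs must also protect its free list, for example by disabling interrupts around the list update.

```c
#include <stddef.h>

#define BUF_COUNT 4    /* hypothetical number of buffers in the pool */
#define BUF_SIZE  64   /* hypothetical size of each buffer, in bytes */

/* Every buffer is the same size; a free buffer stores the link to the
   next free buffer in its own first bytes. */
typedef struct {
    union {
        unsigned char data[BUF_SIZE];
        void *next;                    /* used only while the buffer is free */
    } bufs[BUF_COUNT];
    void *freeList;
} Pool;

/* Build the free list (similar in spirit to creating the pool). */
void pool_init(Pool *p)
{
    p->freeList = NULL;
    for (int i = BUF_COUNT - 1; i >= 0; i--) {
        p->bufs[i].next = p->freeList;
        p->freeList = &p->bufs[i];
    }
}

/* Take a buffer from the pool in constant time; NULL if none are left. */
void *pool_alloc(Pool *p)
{
    void *b = p->freeList;
    if (b != NULL)
        p->freeList = *(void **)b;
    return b;
}

/* Return a buffer to the pool in constant time. */
void pool_free(Pool *p, void *b)
{
    *(void **)b = p->freeList;
    p->freeList = b;
}
```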
Memory Overlays
Another traditional method of maximizing use of on-chip memory is to overlay code and data.
(You could even substitute the term overlap for overlay.) While each exists on its own externally,
they run from the same overlayed locations, internally.
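[Figure: algo1 and algo2 are each stored separately in external memory (reached through the EMIF), but both run from the same internal location.]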
With overlays, each code or data item must reside in its own starting location. The TI tools call
this its load location, because this is what is downloaded to the system (when using the CCS
Load Program menu item, or when you download to an EPROM via an EPROM programmer).
During program execution, your code must copy the overlayed data or code elements into their
run location. This is where the program expects the information to reside when it is used (i.e.
when the overlayed function is called, or the overlayed data elements are accessed). The linker
resolves all your code/data labels (i.e. symbols) to the runtime addresses.
In the case of our overlayed functions, though, we don’t want them to be loaded to and run
from the same locations in memory; therefore, we might try something like:
SECTIONS
{
   .FIR:    load = EPROM, run = IRAM
   myIIR:   load = EPROM, run = IRAM
}
In this case, they are both loaded into EPROM and run from IRAM. The problem is that the
linker assigns different run addresses to the two functions. But we wanted them to share (i.e.
overlap) their run addresses. How can we make this happen?
Use the linker’s UNION command. The union concept is similar to that of creating union
types in the C language. In our case, we want to tell the linker to put the run addresses of the
two functions in union.
This allocates separate load addresses for each function, while providing a single run
address for both functions.
Note: To set separate load and run addresses for pre-defined BIOS and compiler sections, there
is an additional tabbed page in the Memory Section Manager dialog of the CCS Config Tool.
3. Last, but not least, you must copy the code from its original location to its runtime
location. Before you run each function you must force the code (or data, in a data overlay) to
be copied from its load addresses to its run addresses. When using the Copy Table feature of
the linker, copying code from its original location is quite easy.
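#include <cpy_tbl.h>
extern far COPY_TABLE fir_copy_table;
extern far COPY_TABLE iir_copy_table;
extern void fir(void);
extern void iir(void);
void main(void)
{
    copy_in(&fir_copy_table);   /* copy fir() from its load address to its run address */
    fir();
    /* ... */
    copy_in(&iir_copy_table);   /* overwrite the shared run area with iir() */
    iir();
    /* ... */
}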
The copy_in() function is a simple wrapper around the compiler’s mem_copy() function. It
reads the table description created by the “table” feature of the linker and uses it to perform a
mem_copy().
From a performance standpoint, though, you are better off using the DMA or EDMA
hardware peripherals. These hardware peripherals can be easily used to copy these tables by
using the DAT_copy() function from TI’s Chip Support Library (CSL).
Overlay Summary
• First, create a section for each function
• In your own linker cmd file:
   − load: where the fxn resides at reset
   − run: tells the linker its runtime location
   − UNION forces both functions to be runtime linked to the same memory addresses (i.e. overlayed)
• You must move it with the CPU or DMA

myLnk.CMD:
SECTIONS
{
   .bss:> IRAM             /* load & run */
   UNION run = IRAM
   {
      .FIR:   load = EPROM
      myIIR:  load = EPROM
   }
}
/**************************************************************************/
/* Copy Record Data Structure */
/**************************************************************************/
typedef struct copy_record
{ unsigned int load_addr;
unsigned int run_addr;
unsigned int size;
} COPY_RECORD;
/**************************************************************************/
/* Copy Table Data Structure */
/**************************************************************************/
typedef struct copy_table
{ unsigned short rec_size;
unsigned short num_recs;
COPY_RECORD recs[1];
} COPY_TABLE;
/**************************************************************************/
/* Prototype for general purpose copy routine. */
/*************************************************************************/
extern void copy_in(COPY_TABLE *tp);
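What copy_in() does with these structures can be sketched in portable C. The types and function here are hypothetical (SimCopyRecord, SimCopyTable, sim_copy_in()): the real COPY_RECORD stores addresses as unsigned int, which is 32 bits on the C6000 but cannot hold a pointer on a 64-bit host, so this sketch uses pointers and memcpy() instead. It is not TI's run-time-support source.

```c
#include <string.h>

/* Host-portable stand-ins for COPY_RECORD / COPY_TABLE. */
typedef struct {
    const void *load_addr;   /* where the code/data is stored (load) */
    void       *run_addr;    /* where it must be when used (run) */
    size_t      size;        /* number of bytes to copy */
} SimCopyRecord;

typedef struct {
    unsigned short       num_recs;
    const SimCopyRecord *recs;
} SimCopyTable;

/* Copy every record from its load address to its run address. */
void sim_copy_in(const SimCopyTable *tp)
{
    for (unsigned short i = 0; i < tp->num_recs; i++)
        memcpy(tp->recs[i].run_addr, tp->recs[i].load_addr, tp->recs[i].size);
}
```

If two tables point at the same run area, as the UNION arranges for .FIR and myIIR, each copy simply overwrites the previous occupant, which is exactly how the overlay is used.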
Overlays can be very useful, but they’re also tedious to set up. Isn’t there an easier way to get the
advantages of overlays? …
Cache
Data and program caching provides the benefits of memory overlays, without all the hassles.
Since modern C6000 devices have both data and program cache hardware, this is the easiest
method of overlaying memory (and hence, most commonly used).
Use Cache
Using Memory Efficiently
6. Use Cache
• Works for code and data
• Covered in Chapter 14
[Figure: program and data caches between the CPU and memory.]
Rather than discuss cache in detail here, we leave it to Chapter 14, which is dedicated to this topic.
Summary
You may notice the order in the summary is a bit different from the order in which we just discussed the
topics. While introducing them to you, we wanted to build the concepts piece by piece. In real
life, though, as you design your system you will probably want to employ them in the following
order.
For example,
1. If you can get everything on-chip, you’re done.
2. If it won’t all fit, you might try enabling the cache. If your system meets its real-time
deadlines, you’re now done.
3. In most cases, you’ve probably already used local variables whenever possible. So this one is
probably a ‘given’.
4. If you’ve enabled the cache and still need to tweak the system for performance, you might try
using dynamic memory … or one of the remaining options.
The advantage to the top 4 methods is that they can all be done from within your C code. The
remaining two require a custom linker command file (or modification of your .cmd file). (Not
difficult, but one more thing to manage.)
Introduction
In this module, you will learn how to incorporate an XDAIS-compliant algorithm into your
application.
Outline
Code Integration Problems
Background Terminology
Basic XDAIS Components
XDAIS Example – Sine Wave Algorithm
Algorithm Instance Lifecycle
Lab 11 – Using a XDAIS FIR Algorithm
Additional Topics
Chapter Topics
Using a XDAIS Algorithm
3. Buying Algorithms
Why is it hard to integrate someone else’s algo?
1. Will the function names conflict with other code in the
system?
2. Will it use memory or peripherals needed by other algos?
3. How can I run the same algo on more than one channel at a
time? (How can I prevent variables from conflicting?)
4. Don’t know how fast it runs …
… or how much memory it uses.
5. How can I adapt the algorithm to meet my needs?
6. How many interfaces (APIs) do I have to learn?
Traditional Solutions
1. Manually integrate algorithms together by finding all (hopefully)
the conflicts and fixing them.
TI XDAIS Solution
[Figure: several algorithms (Algo) plug into your application through a standard interface for input, output, and memory.]
Background Terminology
What is an Instance?
This is a key concept in XDAIS
To demonstrate the concept, let’s
examine an “instance” in C code
typedef struct myType {     // Define datatype:
    int   var1;             //   only a “template”,
    short var2;             //   no memory allocated
    char  var3;
} myType;
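The difference between a type and an instance is what answers the multi-channel question raised earlier: each instance owns its own copy of the state, so two channels cannot overwrite each other's variables. A minimal sketch; the Accumulator type, acc_add(), and the two channel instances are hypothetical.

```c
/* The type: only a template, no memory allocated. */
typedef struct {
    long total;     /* per-instance state */
    int  samples;
} Accumulator;

/* The function works on whichever instance it is handed. */
void acc_add(Accumulator *inst, short x)
{
    inst->total += x;
    inst->samples++;
}

/* Two instances: memory is allocated here, one set of state per channel. */
Accumulator leftChannel, rightChannel;
```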
What is an Interface?
“Interface” can mean many things
We define it conceptually for the purposes
of this chapter
Let’s start by defining a Function interface
What is an Algorithm?
Algos are usually more than just a single function; an algorithm may include:
• Data types:    typedef struct myType {…} myType;
• Data objects:  myType var1;  int var2;
• Functions:     int myFunction(short a, int b);
For example, what parameters might you need for a FIR filter?
A filter called IFIR might have:
typedef struct IFIR_Params {
    Int         size;       // size of params
    XDAS_Int16  firLen;
    XDAS_Int16  blockSize;
    XDAS_Int16 *coeffPtr;
} IFIR_Params;
MemTab:
• Size
• Alignment
• Space: internal / external memory
• Attributes: scratch or persistent memory (discussed later)
• Base Addr
MemTab example:
[Figure: the algorithm provides size, alignment, space, and attributes for each block of memory it needs, everything except the base address. Based on these four memory details in the MemTab, the application allocates each memory block and then provides its base address to the MemTab.]
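The MemTab hand-off can be sketched in portable C: the algorithm fills in size, alignment, space, and attributes, and the application allocates each block and writes back the base address. SimMemRec, algo_request_memory(), app_allocate(), and the single static arena are hypothetical simplifications; the real XDAIS memory-record structure and algAlloc() interface differ in detail, and a real application would choose a heap according to each record's space.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { SPACE_INTERNAL, SPACE_EXTERNAL } MemSpace;
typedef enum { ATTR_SCRATCH, ATTR_PERSIST } MemAttrs;

/* One MemTab entry: the algorithm supplies everything except base. */
typedef struct {
    size_t   size;
    size_t   alignment;   /* power of two */
    MemSpace space;
    MemAttrs attrs;
    void    *base;        /* filled in by the application */
} SimMemRec;

/* Algorithm side (hypothetical): describe the two blocks it needs. */
int algo_request_memory(SimMemRec tab[], int maxRecs)
{
    if (maxRecs < 2)
        return 0;
    tab[0] = (SimMemRec){ 64,  8,   SPACE_INTERNAL, ATTR_PERSIST, NULL }; /* instance object */
    tab[1] = (SimMemRec){ 512, 128, SPACE_EXTERNAL, ATTR_SCRATCH, NULL }; /* work buffer */
    return 2;
}

/* Application side: allocate each block from one static arena (ignoring
   'space' for brevity), honoring alignment, and record the base address.
   Returns 0 on success, -1 if the arena is exhausted. */
static unsigned char arena[4096];
static size_t arenaUsed;

int app_allocate(SimMemRec tab[], int n)
{
    for (int i = 0; i < n; i++) {
        uintptr_t start = (uintptr_t)(arena + arenaUsed);
        uintptr_t aligned = (start + tab[i].alignment - 1)
                            & ~(uintptr_t)(tab[i].alignment - 1);
        size_t offset = (size_t)(aligned - (uintptr_t)arena);
        if (offset > sizeof arena || tab[i].size > sizeof arena - offset)
            return -1;
        tab[i].base = arena + offset;
        arenaUsed = offset + tab[i].size;
    }
    return 0;
}
```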
Functions
sineInit()
sineValue()
sineBlockFill()
Star symbols indicate the small amount of “extra” code required when using XDAIS.
Note: the extra code only affects initialization of the algorithm, not runtime processing.
This example uses “static” allocation of memory in application code.
Sine Algorithm
Functions
SINE_init()
SINE_value()
SINE_blockFill()
Algorithm Lifecycle   Static            Dynamic
Create                SINE_init         algNumAlloc
                                        algAlloc
                                        algInit (aka sineInit)
Process               SINE_value        SINE_value
                      SINE_blockFill    SINE_blockFill

Notice the use of dynamic memory allocation, and the fact that the algo never does the allocation.
[Figure: three ways to create an instance of an algorithm.]
• algNumAlloc(), algAlloc(), algInit (): common for all XDAIS-compliant algos; these functions are specified by the XDAIS algorithm standard.
• ALGRF_create: one create function can instantiate any XDAIS algo; the ALGRF library is provided in the Reference Frameworks (RF), which are discussed further in the next chapter.
• An algorithm-specific create function: can be as simple as a single-line function which only calls ALGRF_create; easier than using ALGRF_create directly (no complex C casting); an optional function per the XDAIS standard.
• Create a C file that will init and create an instance of the algorithm
• Modify our audioapp.c file to call that filter at the appropriate time
Lab 11
[Figure: Lab 11 data flow: ADC → McBSP0 receive → EDMA (RCVCHAN) → gBufferRcv ping/pong → CPU (+ sine, COPY or XDAIS filter, flash LEDs and load; DIP_1 and DIP_2 switches) → gBufferXmt ping/pong → EDMA (XMTCHAN) → McBSP0 transmit → DAC.]
XDAIS Files
FIR.H (Vendor May Provide)
Contains FIR_create & FIR_delete functions
These are framework functions
Not required by algorithm standard (but usually provided)
Lab 11 Procedure
In this lab, we’re going to add a XDAIS algorithm to filter out the sinewave. We're going to use a
FIR module that has been written to use ALGRF to make our job a lot easier. We'll use a DIP
switch to turn the filter on and off so that we can verify that it is working correctly.
3. Examine xdais.c
Let’s take a look at this file from top to bottom. You’ll see:
• A place to put the header files for BIOS
• A place to put the header files for XDAIS
• The function prototypes
• Some declarations and a place for global variables
• One semi-empty function: initAlgs(). You will fill in this function with the code needed
to create two instances of the FIR filter. Here is a summary of the code that you will
write:
• Create two global FIR_Handle's, one for each channel
• Create a local parameters structure
• Fill the parameters structure with the default values
• Change some parameters to meet our needs
• Create two instances of the algorithm using FIR_create()
• Since FIR_create() uses ALGRF, we need to set it up
Set Up ALGRF
The FIR_create() function that we are going to use is really just a "wrapper" for calling
ALGRF_create(). The FIR_create() wrapper takes care of a lot of casting and nasty C "stuff" that
we just don’t want to have to deal with. ALGRF_create() uses BIOS's MEM Memory Manager to
allocate the memory needed by an algorithm. Since BIOS allows you to have multiple heaps,
ALGRF leverages this capability to allow algorithms to use internal and external memory. To do
this, ALGRF needs to be told which heaps to use.
4. Inside xdais.c, add the following function call in initAlgs() to set up ALGRF's heaps
Here is the code that you will need to add (below the definition of firParams):
or
Note: We currently have a heap allocated in each of these memories inside the .cdb file. BIOS
allows you to name the heaps whatever you like. The names are declared as an
enumeration, so we need to reference them as we have done at the top of xdais.c. This
allows us to use the names ISRAM (or IRAM) and SDRAM directly.
8. Examine firParams
Inside the initAlgs() function, inspect the firParams structure that contains the default
parameters, FIR_PARAMS. You should see this below the call to ALGRF_setup().
Also notice the following steps have been completed for you:
• Coeff pointer element (coeffPtr) points to (short *)coeffs. (the coefficients are located in
a header file that we will add later).
• The filter length element of firParams (filterLen) is set to 345 (which is the number of
coefficients that we have).
• Frame length is set to BUFFSIZE (this is the number of elements that we want to process
each time we call the FIR Filter).
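The pattern in these steps (start from a default parameter structure, then override a few fields) is plain C struct assignment. The sketch uses hypothetical names (SimFirParams, SIM_FIR_PARAMS, make_fir_params(), frameLen) and made-up default values; the 345-coefficient filter length comes from the lab text, but this is not the actual fir.h.

```c
#include <stddef.h>

#define BUFFSIZE   32    /* hypothetical; the lab defines its own value */
#define NUM_COEFFS 345   /* number of filter coefficients, per the lab text */

typedef struct {
    int    size;         /* size of this structure */
    short  filterLen;
    short  frameLen;
    short *coeffPtr;
} SimFirParams;

/* Hypothetical vendor defaults (stand-in for FIR_PARAMS). */
const SimFirParams SIM_FIR_PARAMS = { sizeof(SimFirParams), 32, 64, NULL };

short coeffs[NUM_COEFFS];   /* would come from the coefficient header file */

/* Copy the defaults, then change only what this application needs. */
SimFirParams make_fir_params(void)
{
    SimFirParams firParams = SIM_FIR_PARAMS;
    firParams.coeffPtr  = coeffs;
    firParams.filterLen = NUM_COEFFS;
    firParams.frameLen  = BUFFSIZE;
    return firParams;
}
```

Copying the defaults first means any field the application does not touch keeps the vendor's value.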
9. Create two instances of the FIR filter algorithm by calling FIR_create() twice
Now that we’ve initialized the parameters we'll want to create an instance of our filter using
these parameters. We’ll do that with the FIR_create() function. Here is an example that
creates the left channel instance:
Add the code to create an instance of the algorithm for the right channel. None of the
parameters need to change.
FIR_create() calls ALGRF_create() and passes it all of the correct parameters with the
correct types. The first argument to FIR_create() is a pointer to the virtual function table of the
algorithm for which we want to create an instance. This table is defined in the library for the algorithm.
For more information on the FIR_TI_IFIR function table look in the fir_ti.h header file.
Modify main.c
12. Add a call to initAlgs() to main()
Open main.c and add a call to initAlgs() to main(). Call this function just before you call
initMcBSP( );
14. Use the FIR_apply() function to apply the FIR filter to the audio stream
Find the place in the processBuffer() function where the data is currently copied. Just above
this, add two calls to FIR_apply() to filter the audio stream. FIR_apply() is another FIR
module function that makes it easy to call the FIR filter in the xdais instance. The calls to
FIR_apply should look something like this:
15. Use an if/else statement and a DIP switch to control when the filter is applied
Use DIP switch one on the DSK to turn the filter on and off. When the DIP switch is down,
run the filter; when it is up, do the copy as we have been doing.
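The if/else control can be sketched in portable C. The direct-form FIR below is a textbook stand-in for FIR_apply() (which runs the vendor's XDAIS filter), and the dipDown flag stands in for reading DIP switch one through the DSK support library; SimFir, sim_fir_apply(), process_channel(), MAX_TAPS, and the Q15 scaling are all hypothetical. Each channel keeps its own delay line, in the spirit of separate instances.

```c
#define MAX_TAPS 8   /* hypothetical; the lab's filter has 345 coefficients */

/* Per-channel FIR state: coefficients plus a delay line of past inputs. */
typedef struct {
    const short *coeffs;
    int          numTaps;            /* must be <= MAX_TAPS */
    short        history[MAX_TAPS];  /* history[0] is the newest sample */
} SimFir;

/* Direct-form FIR over one block, with Q15 coefficients. */
void sim_fir_apply(SimFir *f, const short *in, short *out, int n)
{
    for (int i = 0; i < n; i++) {
        for (int k = f->numTaps - 1; k > 0; k--)   /* shift the delay line */
            f->history[k] = f->history[k - 1];
        f->history[0] = in[i];

        long long acc = 0;
        for (int k = 0; k < f->numTaps; k++)
            acc += (long long)f->coeffs[k] * f->history[k];
        out[i] = (short)(acc >> 15);
    }
}

/* Filter when the switch is down, otherwise copy, as in the lab step. */
void process_channel(SimFir *f, int dipDown, const short *in, short *out, int n)
{
    if (dipDown) {
        sim_fir_apply(f, in, out, n);
    } else {
        for (int i = 0; i < n; i++)
            out[i] = in[i];
    }
}
```

With two taps of 16384 (0.5 in Q15), the filter averages each sample with the previous one, so the input 100, 200, 300 becomes 50, 150, 250.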
16. Include fir.h in main.c
This file has the prototypes and type information (FIR_Handle) that we need to call
FIR_apply().
;c:\iw6000\xdais\include;c:\iw6000\xdais\algFIR
Source Files and Header Files
main.c:   <csl.h>, <csl_edma.h>, <csl_irq.h>, "sine.h", "edma.h", "mcbsp.h",
          "dsk6713.h" or "dsk6416.h", "dsk6713_dip.h" or "dsk6416_dip.h",
          "audioappcfg.h", "fir.h", "xdais.h"
edma.c:   <csl.h>, <csl_edma.h>, "sine.h", "mcbsp.h", "dsk6713.h" or "dsk6416.h",
          "dsk6713_dip.h" or "dsk6416_dip.h", "audioappcfg.h"
xdais.c:  "algrf.h", "fir.h", "fir_ti.h", "audioappcfg.h", "200hz bandstop order 344.h"
25. When you're done playing, halt the processor and close CCS
You’re done.
Additional Topics
XDAIS Rules and Guidelines
XDAIS Documentation Rules
Don’t know how fast it runs … or how much memory it uses.
Strict rules on vendor-provided documentation (PDF file).
XDAIS file naming convention, for example fir_company123_min.l64 and fir_company123_max.h62:
• fir: module name
• company123: vendor name
• min / max: algorithm variant
• l: library; h: header
• 62: C62x/C67x; 64: C64x
XDAIS Certification
Improved Software Reliability
Introduction
In this chapter, we will discuss a current problem for DSP system design and suggest a possible
solution provided by TI.
Learning Objectives
System Block Diagram
Standard I/O (SIO) - Using Streams
Device Drivers (IOM)
Reference Frameworks (RF)
Lab 12/12a – Using SIO and Modifying an IOM Driver
Chapter Topics
Frameworks
System Software
Program = Code + Data
Embedded System = Program + Mem. Management + Init + H/W + I/O …
[Figure: DSP system software made up of H/W (peripherals), init, memory management, data, and the algorithm (XDAIS), with a “?” marking the unknown link between the hardware and the data.]
Interface first…
[Figure: DSP/BIOS-provided communications interfaces: SIO, PIP, and GIO.]
Let’s look at our lab’s system architecture…
There are actually three types of interfaces as we discussed before (PIP, SIO, GIO). SIO happens
to be the easiest to use when talking to a driver – so that’s what we’re going to use in the lab.
The analogy on the right-hand side fits nicely. The hardware (McBSP, EDMA, codec) is the “power
plant”. It produces the data (electricity). The driver contains the transmission lines and the
adapter to adapt the high voltage lines down to a plug in your house or someone else’s. SIO is the
plug of the fan. You can take your fan (TSK) anywhere you like and plug it into a socket and
make it work. You don’t have to know where the power plant is and you need not be concerned
with how the high voltage is converted to the socket you use in your home. Also, the power plant
and transmission lines need not care WHAT you’re plugging into the wall – but the electricity
flows and everything works nicely. This is the beauty of using streams.
DSP/BIOS SIO-Stream Interface
Let’s take a closer look at SIO…
The IOM driver on the left fills buffers, issues them to the IN stream, and reclaims (takes back) empty buffers from the TSK to fill again. On the TSK side, the code issues empty buffers to the driver to be filled and reclaims (takes) full buffers to process. The OUT stream works the same way.
Instead of copying buffers, SIO streams pass pointers to the buffers, which increases the efficiency of the system. Another nice feature of streams is that a “reclaim” blocks (pends) until a buffer has been issued by the other side. So the TSK can say “give me a buffer” with a reclaim, and it will pend until that buffer is ready. No additional coding steps are necessary.
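To make the pointer-passing idea concrete, here is a host-side model of one input stream. It is only an analogy, and all of its names (model_stream, model_issue, model_reclaim, model_driver_fill) are hypothetical. Unlike the real SIO_reclaim(), model_reclaim() returns NULL instead of pending when no full buffer is ready. What it does show is that issue and reclaim move buffer pointers between two queues: the driver fills the buffer in place, and no sample data is ever copied between queues.

```c
#include <stddef.h>

#define MODEL_QDEPTH 4   /* assumed maximum number of outstanding buffers */

/* Fixed-size FIFO of buffer pointers; only pointers are stored. */
typedef struct {
    short *items[MODEL_QDEPTH];
    int head;
    int count;
} ptr_fifo;

static int fifo_put(ptr_fifo *q, short *p)
{
    if (q->count == MODEL_QDEPTH)
        return -1;                                  /* queue full */
    q->items[(q->head + q->count) % MODEL_QDEPTH] = p;
    q->count++;
    return 0;
}

static short *fifo_get(ptr_fifo *q)
{
    short *p;
    if (q->count == 0)
        return NULL;                                /* nothing queued */
    p = q->items[q->head];
    q->head = (q->head + 1) % MODEL_QDEPTH;
    q->count--;
    return p;
}

/* Hypothetical model of one input stream. */
typedef struct {
    ptr_fifo toDriver;   /* empty buffers issued by the TSK */
    ptr_fifo toApp;      /* full buffers waiting to be reclaimed */
} model_stream;

/* TSK side: hand an empty buffer to the driver (cf. SIO_issue). */
int model_issue(model_stream *s, short *buf)
{
    return fifo_put(&s->toDriver, buf);
}

/* TSK side: take back a full buffer (cf. SIO_reclaim, which would pend
 * here instead of returning NULL). */
short *model_reclaim(model_stream *s)
{
    return fifo_get(&s->toApp);
}

/* Driver side: fill the oldest empty buffer in place and pass the same
 * pointer back toward the TSK. */
int model_driver_fill(model_stream *s, short value, int n)
{
    int i;
    short *buf = fifo_get(&s->toDriver);
    if (buf == NULL)
        return -1;                                  /* no empty buffer issued */
    for (i = 0; i < n; i++)
        buf[i] = value;                             /* stands in for EDMA writes */
    return fifo_put(&s->toApp, buf);
}
```

If you issue ping and then pong, the first reclaim after the driver has filled a buffer returns the very same ping pointer. That ordering is why the TSK never needs to track which buffer it is working on.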
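SIO Concepts
(Figure: the IOM driver and the application TSK exchange buffers through “streams”. On the IN stream, the driver issues FULL buffers, which the TSK reclaims, and the TSK issues empty buffers back for the driver to reclaim and refill. The OUT stream works the same way in the other direction.)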
    /* Allocate buffers for the SIO buffer exchanges using MEM_calloc() */
    rcvPing = (Ptr)MEM_calloc(0, BUFFSIZE*4, BUFALIGN);
    rcvPong = (Ptr)MEM_calloc(0, BUFFSIZE*4, BUFALIGN);
    xmtPing = (Ptr)MEM_calloc(0, BUFFSIZE*4, BUFALIGN);
    xmtPong = (Ptr)MEM_calloc(0, BUFFSIZE*4, BUFALIGN);

    /* Issue the 1st and 2nd empty buffers to the INPUT stream */
    SIO_issue(inStream, rcvPing, BUFFSIZE*4, NULL);
    SIO_issue(inStream, rcvPong, BUFFSIZE*4, NULL);

    /* Issue the 1st and 2nd empty buffers to the OUTPUT stream */
    SIO_issue(outStream, xmtPing, BUFFSIZE*4, NULL);
    SIO_issue(outStream, xmtPong, BUFFSIZE*4, NULL);
}
So far in this class, you’ve done all of that work yourself: writing configuration structures and _open() and _config() code to talk to the hardware. A driver (IOM) encapsulates all of the code necessary to talk to the hardware and places a stream (SIO) interface on top of it to talk to application software (like our TSK).
In the lab, we’re going to do two things: (1) drop in an off-the-shelf driver for the DSK and
change our TSK to use streams to communicate with it; (2) modify the driver to perform channel
sorting. Both of these activities will be beneficial to any system designer.
DSP/BIOS SIO-Stream Interface
Let’s take a closer look at IOM…
You will get a chance to examine several of these functions in the lab.
(Figure: system block diagram with H/W (Peripherals), I/O (IOM), Reference Frameworks, and Algorithm (XDAIS) blocks.)
An Application Blueprint
Reference Frameworks
Design Parameter               RF1 (Compact)   RF3 (Flexible)   RF5 (Extensive)   RF6 (Connected)
Recommended # of Channels      1 to 3          1 to 10+         1 to 100          1 to 100
Recommended # of XDAIS Algos   1 to 3          1 to 10+         1 to 100          1 to 100
Single/Multi Rate Operation    single          multi            multi             multi
Supports                       HWI             HWI, SWI         HWI, SWI, TSK     HWI, SWI, TSK
Other design parameters compared (per-framework marks not recoverable): Static Configuration, Dynamic Object Creation, Static Memory Management, Dynamic Memory Allocation, Absolute Minimum Footprint, Thread Preemption and Blocking, Implements Control Functionality.
(Figure: IOM → In PIP → Split SWI → Audio 0 SWI and Audio 1 SWI (each running FIR and Vol) → Join SWI → Out PIP → IOM.)
How about using the EDMA’s channel sorting capability to replace the “Split” and “Join” SWIs? This is possible because an IOM driver can be written to connect to multiple PIPs. The result is fewer CPU MIPS tied up moving data, and those MIPS can be applied to your algorithms instead.
Instead of creating a driver from our own code, it is much easier to take a driver that already
exists and modify it to meet our system specs. This is what most people will do anyway.
Knowing the low-level EDMA and McBSP structures, you can easily modify an existing driver to
work in your own particular system.
First, we will use a canned “off-the-shelf” driver from the DDK (Driver Development Kit) which
covers the I/O interface (EDMA, McBSP, codec) and then modify our processing code to
communicate with the driver using Standard I/O (SIO, i.e. streams).
In the second part of the lab, we will modify the existing driver to perform channel sorting and get it working with our new processing code. This shows you how to use drivers in the C6000 world and how to modify them to your liking.
(Figure: lab system showing COPY, the XDAIS Filter, Ping/Pong transmit buffers (Xmt, XMTCHAN, gBufferXmt) feeding the DAC, and Flash LEDs and Load.)
Lab 12 Procedure
In the first lab, the driver from the DDK hands us interleaved data – as opposed to the channel
sorting we’ve done all week long. So, we need to add “split” and “join” functions to properly talk
to the off-the-shelf driver. We will add the necessary stream interface (SIO) to talk to the driver
and see how the code runs. Let’s give it a try…
codec.c
edma.c
mcbsp.c
We don’t need these files because what they contain is already written in the DDK driver.
3. Delete code from main.c.
Open main.c and remove the following lines (again, this code is not necessary because it is
already contained in the driver):
#include <csl.h>
#include <csl_edma.h>
#include <csl_irq.h>
#include "edma.h"
#include "mcbsp.h"
#define PING 0
#define PONG 1
Void initHwi(void); //the ISR is inside the driver
extern int pingOrPong;
initMcBSP;
initEdma;
initHwi (both the call and the function)
McBSP_write …
dsk6416_codec_devParams.c
OR
dsk6713_codec_devParams.c
Click on the + next to Input/Output. Click on the + next to Device Drivers. Right-click on
User-Defined Devices and insert a new UDEV. Rename it to udevCodec.
Right-click on this new user-defined device and select Properties. Modify the properties as follows. These names can be found in the header files for the chosen driver. Also, the function table type is IOM_Fxns because the model we’re using is an IOM model. If using the 6713 DSK, replace “6416” with “6713” in the parameters below:
Then select the Preprocessor category. Locate the Include Search Path and add the following
to the path:
C:\CCStudio_v3.1\ddk\include
6416DSK:
dsk6416_edma_aic23.l64
c6x1x_edma_mcbsp.l64
6713DSK:
dsk6713_edma_aic23.l67
c6x1x_edma_mcbsp.l62
Examine sioFunctions.c
10. Add sioFunctions.c to your project and examine its contents.
Add sioFunctions.c to your project and examine the functions in the file. This file was
written by the authors of the workshop to encapsulate all of the SIO functions necessary to
communicate with the driver. In your own system, you will need similar functions to create
and prime the streams for whichever driver you are using.
createStreams( ) – creates the input and output SIO streams hooked to the appropriate
DIO, size and attributes
primeStreams( ) – allocates the dynamic memory buffers for ping and pong.
MEM_calloc is the BIOS API that dynamically creates these buffers in any heap.
splitBuff( ) – the canned driver hands the processing code interleaved data (LRLR) instead of channel sorting it as we did before. So, a splitBuff() function is required to split the (L)eft and (R)ight data channels.
joinBuff( ) – after processing is complete, we need to join the L and R buffers back together.
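The bodies of splitBuff() and joinBuff() are not reproduced in this text. The functions below are a hypothetical sketch that only matches the call signatures used later in this procedure, splitBuff(source, BUFFSIZE, sourceL, sourceR) and joinBuff(destL, destR, BUFFSIZE, dest). They assume each interleaved buffer holds BUFFSIZE left/right pairs, with the left sample first in each pair (LRLR…).

```c
/* Hypothetical sketch: de-interleave n L/R pairs into two channel buffers.
 * Assumes the left sample comes first in each pair (LRLR...). */
void splitBuff(const short *in, int n, short *left, short *right)
{
    int i;
    for (i = 0; i < n; i++) {
        left[i]  = in[2 * i];       /* even samples: left channel  */
        right[i] = in[2 * i + 1];   /* odd samples:  right channel */
    }
}

/* Hypothetical sketch: re-interleave two channel buffers into n L/R pairs. */
void joinBuff(const short *left, const short *right, int n, short *out)
{
    int i;
    for (i = 0; i < n; i++) {
        out[2 * i]     = left[i];
        out[2 * i + 1] = right[i];
    }
}
```

Every sample passes through the CPU twice per buffer (once to split and once to join). Removing that overhead is the motivation for moving the sorting into the EDMA in the second part of the lab.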
Add the following include files. <sio.h> is the header file that contains the APIs necessary for
using streams:
#include <std.h>
#include <sio.h>
12. Delete the allocations of the buffers (ping and pong) in main.c.
As you may have noticed, sioFunctions.c allocates the buffers used by SIO, so we don’t need to allocate them in main.c anymore.
Delete the 8 global variables creating the rcv and xmt ping/pong buffers.
14. Move the SINE_init and initAlgs calls to the prolog of the TSK in processBuffer( ).
Cut (don’t delete) the 2 SINE_init statements and the initAlgs statement and paste them just
above the while(1) in processBuffer( ). This puts them in the prolog of the TSK. main( )
should now be completely empty.
short *source;
short *dest;
We only need two pointers at this time because L and R are combined.
16. Delete the if/else construct for pingOrPong in processBuffer( ).
Delete the entire if/else pingOrPong construct just below SEM_pend in processBuffer().
We no longer need to know whether we are processing ping or pong because the streams handle that protocol for us. We simply issue four buffers (two to each stream), and the driver hands back ping, then pong, then ping, and so on.
17. Add the call to the stream functions to create/prime the streams.
Add the following 2 calls in processBuffer( ) between initAlgs( ) and while(1){ :
createStreams();
primeStreams();
This code is also in the TSK’s prolog and will only run at initialization.
18. Delete SEM_pend( ) and replace it with the _reclaim’s and splitBuff( ).
Delete the SEM_pend( ) statement and replace it with the following 3 lines:
SIO_reclaim(inStream,(Ptr*)&source,NULL);
SIO_reclaim(outStream,(Ptr*)&dest,NULL);
splitBuff(source,BUFFSIZE,sourceL,sourceR);
joinBuff(destL,destR,BUFFSIZE,dest);
SIO_issue(outStream,dest,BUFFSIZE*4,NULL);
SIO_issue(inStream,source,BUFFSIZE*4,NULL);
short sourceL[BUFFSIZE];
short sourceR[BUFFSIZE];
short destL[BUFFSIZE];
short destR[BUFFSIZE];
extern SIO_Handle inStream,outStream;
If you get a message saying that CCS cannot find “divu.asm”, just ignore it. In Debug mode, CCS scans all of the source files so that you can perform mixed-mode (C/asm) debugging. The DDK had a file called “divu.asm” that doesn’t exist anymore. This will be fixed in a future build.
Click Run. The music should sound pretty good (other than the fact that you should be sick of listening to the same MIDI file by now).
Now that our processing code uses streams to hook to the driver, let’s now MODIFY the existing
driver to perform channel sorting. We are going to change the low-level code of the driver to do
exactly what we want it to do. We’ve worked with the low-level EDMA configurations before, so
we have enough information to proceed.
File → Open
and browse the \audioapp folder. Find the folder called IOM original. These are the
original DDK driver files – only renamed with “audioapp” since we’ll be modifying them.
Also, the appropriate #include statements have been changed in order to accommodate the
name changes of the files. Examine the following screen capture of the IOM original
folder for future use and the exact spelling of all the filenames:
Like all I/O mini-drivers, the dsk6416_edma_aic23 driver uses channels and ports. Open the
file dsk6416_edma_aic23_audioapp.c (or 6713 equivalent) and examine it.
mdBindDev() – configures the AIC23 codec as well as the McBSP and binds them to the
driver as a port.
mdCreateChan() – configures the EDMA channels to transport data from the SIO buffer
to the McBSP (output) or from the McBSP to the SIO buffer (input), i.e. between the stream
and the port that was created by mdBindDev(). The AIC23 and McBSP configurations do not
need to be changed. However, the EDMA config structure will need to be modified to
perform channel sorting.
mdSubmitChan() – submits packets from the SIO stream to the driver to be placed in a
queue for linking into the EDMA. Since the EDMA handles the transport of samples to and
from the McBSP into and out of SIO stream buffers, the properties of the EDMA channel will
need to be modified in order to add de-interleaving to the driver.
mdDeleteChan() – you might expect this function to be affected, since it’s related to the EDMA. However, it only frees the EDMA resource for use by the system and does not depend on the mode in which the EDMA was previously operating, so it remains unchanged.
Modify the if/then/else statement that follows the definition of the EDMA configuration
structure to:
if (mode == IOM_INPUT) {
edmaCfg.opt |= EDMA_FMK(OPT, DUM, EDMA_OPT_DUM_IDX);
}
else {
This will change both the source and destination update modes to use element/frame indexing, which is critical for channel sorting.
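Index mode matters for channel sorting because it lets the EDMA jump between channel blocks instead of writing consecutively. The host-side model below illustrates this. It assumes 1-D index-mode semantics, where the element index is applied between elements of a frame and the frame index after the last element of each frame; check the EDMA reference guide for your device. It also assumes one frame per sample period (elements per frame = tdmChans). The function names are hypothetical, and this is not the driver's code. It only shows the index arithmetic.

```c
#include <string.h>

/* Model of a 1-D EDMA transfer in index mode. eleIdx and frmIdx are byte
 * (MAU) offsets, as in the EDMA parameter RAM: the destination advances by
 * eleIdx after each element except the last of a frame, and by frmIdx
 * after the last. "src" stands in for the McBSP receive register, read
 * once per element in arrival order. */
void edma_index_model(const unsigned char *src, unsigned char *dstBase,
                      int elemMaus, int elemsPerFrame, int frames,
                      long eleIdx, long frmIdx)
{
    long dst = 0;                                 /* byte offset into dstBase */
    int f, e;

    for (f = 0; f < frames; f++) {
        for (e = 0; e < elemsPerFrame; e++) {
            memcpy(dstBase + dst, src, (size_t)elemMaus);
            src += elemMaus;
            dst += (e < elemsPerFrame - 1) ? eleIdx : frmIdx;
        }
    }
}

/* Element index: jump from one channel's block to the next. */
long sort_ele_idx(int elemsPerChan, int elemMaus)
{
    return (long)elemsPerChan * elemMaus;
}

/* Frame index: jump back to the first channel's block, one sample later. */
long sort_frm_idx(int tdmChans, int elemsPerChan, int elemMaus)
{
    return -(long)(tdmChans - 1) * elemsPerChan * elemMaus + elemMaus;
}
```

For the lab's stereo case with 16-bit samples, the element index is BUFFSIZE*2 bytes and the frame index is -BUFFSIZE*2 + 2 bytes. An incoming L0 R0 L1 R1 … stream therefore lands as all left samples followed by all right samples.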
Open C6x1x_edma_mcbsp_audioapp.c.
Directly before this line of code (and, more importantly, after the pramPtr variable has been initialized), insert the following code. It calculates the number of Minimum Addressable Units (bytes on the C6000) in each element, which for us is two because we are using shorts. It also calculates the number of elements in each channel, which for us is the transfer count divided by two because we have a left and a right channel. The code is written more generally than that, so the driver is not tied to our particular case.
if(elemMaus == 3)
elemMaus = 4;
Note: chan->tdmChans is an element in the channel object which does not yet exist. We
will add this in later and initialize it in the mdCreateChan() function call.
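The inserted driver code itself is not reproduced here. As a hedged illustration of the arithmetic described above, the hypothetical helpers below compute MAUs per element from a sample word length and elements per channel from a transfer size. Taking the word length in bits and the transfer size in MAUs as inputs is an assumption; the driver may derive these values differently.

```c
/* Hypothetical helper: MAUs (bytes on the C6000) per EDMA element for a
 * sample word length in bits. EDMA elements are 8, 16, or 32 bits, so a
 * 24-bit sample has to travel in a 32-bit element, which appears to be the
 * purpose of the "if (elemMaus == 3) elemMaus = 4;" adjustment above. */
int elem_maus_for_bits(int wordBits)
{
    int elemMaus = (wordBits + 7) / 8;   /* round up to whole bytes */
    if (elemMaus == 3)
        elemMaus = 4;
    return elemMaus;
}

/* Hypothetical helper: samples per channel in one buffer of xferMaus
 * MAUs holding tdmChans interleaved channels. */
int elems_per_chan(int xferMaus, int elemMaus, int tdmChans)
{
    return xferMaus / (elemMaus * tdmChans);
}
```

With 16-bit samples and a buffer of BUFFSIZE*4 bytes split across two channels, elems_per_chan(BUFFSIZE*4, 2, 2) gives BUFFSIZE elements per channel, matching the buffer layout the TSK expects.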
29. Set up auto initialization and indexing for EDMA channel sorting.
Locate the following piece of code within the function, a few lines further down:
/*
* Load the transfer count into the EDMA. Use the ESIZE
* field of the EDMA job to calculate number of samples.
*/
Remove or comment out the EDMA_RSETH command above and replace it with the following:
The variable tdmChans in the code above does not currently exist as part of the ChanObj structure (the instance object that is created every time a channel is opened). Previously, the channels did not perform channel sorting, so there was no reason to have this parameter in the object.
Find the definition of the ChanObj structure at the beginning of
C6x1x_edma_mcbsp_audioapp.c and add the following variable to the structure:
Int tdmChans;
Position within the structure doesn’t matter, but for consistency with how the solutions are
built, insert it directly after the tcc element in the structure.
The value of tdmChans needs to be initialized. The proper place for this is in
mdCreateChan(). Scroll to the portion of this function labeled:
and insert the following line of code among the other initializations:
chan->tdmChans = params->tdmChans;
This will initialize the value of tdmChans within the channel object using the number of
channels which is passed from the calling function. Fortunately, tdmChans is already an
element in the parameter passing structure of this function call, so no further modifications
need to be made.
All 4 C files from the \IOM original folder (for whichever DSK you are using):
dsk6416_edma_aic23_audioapp.c -or-
dsk6713_edma_aic23_audioapp.c
dsk6416_aic23_audioapp.c -or-
dsk6713_aic23_audioapp.c
c6x1x_edma_mcbsp_audioapp.c
dsk6416_codec_devParams.c -or-
dsk6713_codec_devParams.c
Select:
Click the Preprocessor Category. Add the following path to the Include Search Path:
c:\iw6000\labs\audioapp\IOM original
Click OK.
37. Build your new library file and fix any errors.
Build your project and fix any errors. CCS creates a library file, myDriver.lib, containing everything the driver needs to operate. Close the myDriver project.
Open your audioapp project and remove the following libraries and source files from it (or
the 6713 equivalent filenames):
c6x1x_edma_mcbsp.l64
dsk6416_edma_aic23.l64
dsk6416_codec_devParams.c
39. Add the new library file (myDriver.lib) to your project from the \audioapp\Debug folder.
Add the following 4 lines to the start of processBuffer( ) (do not delete the declarations for
source and dest that are already there):
short *sourceL;
short *sourceR;
short *destL;
short *destR;
sourceL = source;
sourceR = source + BUFFSIZE;
destL = dest;
destR = dest + BUFFSIZE;
45. When you're done playing, halt the processor and close CCS.
You’re done.
Introduction
Provides an introduction to the EMIF, the memory types it supports, and programming its
configuration registers.
Learning Objectives
Outline
Memory Maps
Memory Types
Programming the EMIF
Additional Memory Topics
Chapter Topics
External Memory Interface (EMIF)
Memory Maps
Memory Map Review
(Figure: the C6000 CPU with internal L2 SRAM; the EMIF provides four external 128 MB chip-enable spaces, CE0 at 8000_0000, CE1 at 9000_0000, CE2 at A000_0000, and CE3 at B000_0000.)
A Memory Map is a table representation of memory…
0000_0000            L2 (internal memory)
                     Internal on-chip peripherals
8000_0000   128MB    CE0
9000_0000   128MB    CE1
A000_0000   128MB    CE2
B000_0000   128MB    CE3
FFFF_FFFF
‘C6x Addressing
(Figure: the CPU and the DMA or EDMA present 32-bit addresses to the EMIF. Address bits A24:A25 select the chip enable (CE0–CE3), A2:A21 drive the 20 external address pins EA2–21, and A0:A1 select the byte enables BE0–BE3.)
With only 20 address pins, only SDRAM can access the full 128M Bytes per CE space
Not all CPU/DMA address lines are used in the C6x01 example above
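Two small calculations tie these slides together. The first decodes which CE space an address falls in. It uses the 8000_0000-based memory map above; the ’C6x01 addressing figure uses a different map, where A24:A25 select the space. The second shows why only SDRAM reaches a full CE space. With 20 word-address pins on a 32-bit bus, a non-multiplexed memory reaches only 2^20 × 4 bytes = 4 MB. SDRAM multiplexes row and column addresses over the same pins and is not limited this way. Both function names are illustrative.

```c
#include <stdint.h>

/* Which external CE space an address falls in, using the
 * 8000_0000 / 9000_0000 / A000_0000 / B000_0000 map above. Only the top
 * hex digit is examined, so this does not check whether the address lies
 * inside the 128 MB actually decoded for that space. Returns 0..3, or -1
 * for addresses outside CE0-CE3 (L2, peripherals, ...). */
int ce_space(uint32_t addr)
{
    uint32_t top = addr >> 28;
    if (top >= 0x8u && top <= 0xBu)
        return (int)(top - 0x8u);
    return -1;
}

/* Bytes a non-multiplexed memory can reach with "pins" word-address
 * lines on a bus "busBytes" wide (e.g. 20 pins, 4-byte bus). */
uint32_t async_reach_bytes(unsigned pins, unsigned busBytes)
{
    return ((uint32_t)1u << pins) * busBytes;
}
```

For example, ce_space(0x90000400) returns 1 (CE1), and async_reach_bytes(20, 4) returns 4194304 (4 MB).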
Memory Types
Overview
Memory Types Overview
(Figure: the CPU and EDMA access, through the EMIF, a 16M Byte SDRAM, Flash (ASYNC), and an I/O Port (ASYNC).)
Using SDRAM
1. Select SDRAM and verify it meets system performance timing
Therefore:
To run the EMIF at 133MHz and meet the above requirements, the largest
memory size available today is 16M Bytes using two 2Mx32 SDRAMs.
Alternatively:
The largest memory size achievable using x32 devices is 32 MBytes, using 4Mx32 SDRAMs. However, these devices are only available at 166MHz.
Another option is to use x16 devices, but you have to use four of them since the EMIF is 64 bits wide. Also, their fastest speed grade is 167MHz.
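The sizes quoted above follow from simple arithmetic. Each device contributes depth × (width/8) bytes, and enough devices are placed side by side to fill the 64-bit EMIF data bus: two x32 parts or four x16 parts. The helpers below are an illustrative sketch of that calculation.

```c
/* Devices placed side by side to fill the EMIF data bus. */
unsigned devices_needed(unsigned devBits, unsigned busBits)
{
    return busBits / devBits;
}

/* Total bytes for a bank built from identical SDRAMs described as
 * depthM (mega-words, 1M = 2^20) x devBits, on a busBits-wide EMIF. */
unsigned long sdram_bytes(unsigned depthM, unsigned devBits, unsigned busBits)
{
    unsigned long perDevice = (unsigned long)depthM * 1024UL * 1024UL * (devBits / 8u);
    return perDevice * devices_needed(devBits, busBits);
}
```

sdram_bytes(2, 32, 64) gives 16 MB (two 2Mx32 parts), and sdram_bytes(4, 32, 64) gives 32 MB (two 4Mx32 parts), matching the figures above.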
* These guidelines are for DM642 in June 2003. Other C6000 devices require similar consideration.
What is IBIS?
General IBIS Information:
http://www.eigroup.org/ibis/ibis.htm
Model Characteristics
(Register diagram, bits 15–0: TRC at bits 15:12, reserved below.)
(Figure: the TRC value is derived from the EMIF's SDRAM clockspeed and from the SDRAM datasheet.)
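The TRC value combines the two inputs named above: the SDRAM clock supplied by the EMIF and the datasheet's row-cycle time tRC. The sketch below assumes the field is programmed as the number of clock cycles (tRC rounded up) minus one. Confirm the exact encoding in your device's EMIF reference guide before relying on it. The function names are illustrative.

```c
/* Round a datasheet time (ns) up to whole clock cycles at emifMHz, so the
 * programmed time is never shorter than the datasheet minimum. */
unsigned ns_to_cycles(unsigned ns, unsigned emifMHz)
{
    return (ns * emifMHz + 999u) / 1000u;      /* ceil(ns * MHz / 1000) */
}

/* Assumed encoding: TRC holds tRC in clock cycles minus one. The field is
 * 4 bits wide (bits 15:12), so results above 15 cannot be programmed. */
unsigned sdram_trc_field(unsigned tRCns, unsigned emifMHz)
{
    unsigned cycles = ns_to_cycles(tRCns, emifMHz);
    return (cycles > 0u) ? cycles - 1u : 0u;
}
```

As an illustration (values not taken from a specific datasheet), tRC = 70 ns at 100 MHz needs 7 cycles, so the field would be 6. The same ns-to-cycles rounding applies when you work out async setup, strobe, and hold times.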
(Figure: address (A) and data (D) phases of memory accesses 1, 2, and 3.)
(Figure: asynchronous read timing for EA, CE, BE, AOE, ARE, and ED; callouts note that these signals are available via the daughter card connector.)
(Figure callouts: 150ns; 100ns; use the EMIF’s ARE pin.)
Let's figure out the timing for the DSK's async Flash memory …
Writing to Flash
(Figure: asynchronous timing diagrams for EA, CE, BE, AOE, ARE, and ED, and for a write using EA, CE, BE, AWE, and ED.)
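(Figure: CE space control register, with fields at bits 31:28, 27:22, and 21:20 shown, next to a timing diagram for CE, EA, BE, AOE, ARE, AWE, and ED.)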
#include <csl_emifa.h>

void emifInit(void)
{
    EMIFA_config(&C6416DskEmifConfigA);   /* program EMIFA from the CSL config structure */
}
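The bit positions in the figure (31:28, 27:22, 21:20, …) belong to the CE space control register, which holds the async setup, strobe, and hold timings. The config structure passed to EMIFA_config() carries these values. The hypothetical packing helper below shows how cycle counts map onto the register. It assumes the C671x/C64x field layout listed in its comment. Your device's EMIF reference guide is the authority on field positions and on the meaning of each field.

```c
#include <stdint.h>

/* Hypothetical helper that packs the async timing fields of a CE space
 * control register. Assumed layout (C671x/C64x style):
 * WRSETUP 31:28, WRSTRB 27:22, WRHLD 21:20, RDSETUP 19:16, TA 15:14,
 * RDSTRB 13:8, MTYPE 7:4, RDHLD 2:0 (bit 3 left at 0).
 * Each value is masked to its field width. */
uint32_t cectl_pack(unsigned wrSetup, unsigned wrStrb, unsigned wrHld,
                    unsigned rdSetup, unsigned ta, unsigned rdStrb,
                    unsigned mtype, unsigned rdHld)
{
    return ((uint32_t)(wrSetup & 0xFu)  << 28) |
           ((uint32_t)(wrStrb  & 0x3Fu) << 22) |
           ((uint32_t)(wrHld   & 0x3u)  << 20) |
           ((uint32_t)(rdSetup & 0xFu)  << 16) |
           ((uint32_t)(ta      & 0x3u)  << 14) |
           ((uint32_t)(rdStrb  & 0x3Fu) << 8)  |
           ((uint32_t)(mtype   & 0xFu)  << 4)  |
           ((uint32_t)(rdHld   & 0x7u));
}
```

Masking each argument keeps an out-of-range timing from spilling into the neighbouring field. It does not warn you, though, so check computed cycle counts against the field widths first.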
GEL Startup
(Open DSK6211_6711.gel.)
/*
 * The StartUp() function is called every time you start Code Composer.
 * You can customize this function to perform desired initialization.
 * This function may be commented out if no initialization is needed.
 */
StartUp()
{
    setup_memory_map();
    GEL_Reset();
    init_emif();
}
(Figure: CPU access to external memory, PC → memory → registers, through numbered stages 1 to 18.)
Even with zero wait-state off-chip memory, the CPU’s access time for external memory will be upwards of 18 cycles.
The total effect is a 14-cycle delay (18 cycles less the four afforded by the C6000’s hardware pipelining).
C6201 details are shown here. Similar issues affect all C6000 devices (in fact, all high-performance μPs), but they are manifested differently. For example, the cache in more recent devices mitigates the effect of these delays by keeping often-used code and data in faster on-chip memory.
Besides cache, what is a better way to increase EMIF throughput?
EDMA
(Figure: the same external access path, stages 1 to 18, driven by the EDMA.)
Unlike the CPU, the EDMA (and DMA) can pipeline accesses through the EMIF delays to achieve single-cycle throughput from zero wait-state external memories.
While the first access may take 14 cycles, subsequent accesses can get down to a single cycle.
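A first-order model of the two slides makes the difference for a block transfer concrete. It charges 14 cycles for every CPU access, but 14 cycles for the first EDMA access and one cycle for each access after that. This is a simplification that ignores cache, SDRAM refresh, and bus contention, and the function names are illustrative.

```c
/* First-order model: each CPU access to zero-wait-state external memory
 * costs EXT_LATENCY cycles (18 less the 4 hidden by the pipeline). */
#define EXT_LATENCY 14UL

unsigned long cpu_access_cycles(unsigned long accesses)
{
    return accesses * EXT_LATENCY;
}

/* The EDMA pays the latency once, then completes one access per cycle. */
unsigned long edma_access_cycles(unsigned long accesses)
{
    return (accesses == 0UL) ? 0UL : EXT_LATENCY + (accesses - 1UL);
}
```

For 1024 accesses, the model gives 14336 cycles for the CPU versus 1037 for the EDMA. This is why bulk data movement through the EMIF belongs on the EDMA.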
Fanout
‘C6201 Bus Fanout
Bus pin drivers rated for 30 pF loading
Devices are designed for 45 pF loads, but testing equipment cannot guarantee it
Most memory devices present 5 pF loads
Total fanout is six memory devices (30 pF / 5 pF per device)
While this slide is slightly old, the issue remains. Again, IBIS modeling is an excellent way to deal with this issue.
Type     Top Speed*   H/W Wait   Max Size/Fan   Glueless
ASYNC    100 MHz      Yes        16 M/∞         Yes/No
SBSRAM   200 MHz      No         3 MB           Yes
SDRAM    100 MHz      No         48 MB          Yes
(Figure: a ‘C6201 system with Flash, SBSRAM, an FPGA, and SDRAM attached to the CE0, CE2, and CE3 chip-enable spaces.)
Shared Memory
(Figure: the ‘C6201 and another μP share one memory through buffers, under the control of an arbiter.)
How can the μP share the same memory? Using 3-state buffers. One of the μPs or another device arbitrates.
What is the drawback of using a buffer here? Costs extra: speed, power, reliability, money, etc.
(Figure: shared memory arbitrated through the ‘C6201 HOLD and HOLDA signals, with another μP.)
SBSRAM
Synchronous Burst SRAM (SBSRAM)
SBSRAMs pipeline memory accesses
With burst mode, a processor only needs to generate an address every four sequential accesses
Not required by C6000 DSPs, as they're fast enough
'0x devices don't use (have) this feature
'1x devices include the burst feature for power savings
(only one address pin needs toggling for four sequential accesses)
(Figure: asynchronous vs. synchronous burst accesses. In the asynchronous case, accesses 1–4 each present an address (A1–A4) and wait for its data (D1–D4). In the synchronous burst case, addresses and data overlap, so data D1–D6 arrive on successive clocks.)
SBSRAM Timing
(Figure: SBSRAM timing over clock cycles 1–9 showing CE, SSADS, SSOE, and data D1–D3 on ED.)
SDRAM Optimization
SDRAM Extension Register
Bits    Field
31:21   RESERVED
20      WR2RD
19:18   WR2DEAC
17      WR2WR
16:15   R2WDQM
14:12   RD2WR
11:10   RD2DEAC
9       RD2RD
8:7     THZP
6:5     TWR
4       TRRD
3:1     TRAS
0       TCL
(Table residue comparing EMIF memory support across C6000 device families; column headings not recoverable. Rows: CE1 types (async only vs. sync & async), sync memories allowed in the system (both SDRAM & SBSRAM, either SDRAM or SBSRAM, all), and pipelined SBSRAM support.)
Introduction
In this chapter, you will learn how to take a working application (e.g. Lab 11), program the
DSK’s flash with your application and use the bootloader to copy your application from Flash to
internal memory and run.
Outline
Flow of events in the system
Programming Flash
Flash Programming Procedure
Debugging ROM’d code
Lab
Chapter Topics
Creating a Stand-alone System
Device Reset
System Timeline
Hardware: H/W Device Reset
Certain actions are taken at reset that you need to be aware of.
(Figure: RESET, with the resulting hardware status and the actions taken.)
System Timeline
Hardware: H/W Device Reset → EDMA Boot Loader
A bootloader copies the user’s application (or portions of code) from a slower, non-volatile memory to a faster memory (typically internal). Some bootloaders offer options to boot from different types and sizes of memory, via the HPI, or sometimes even via a serial port. Typically, the on-chip DMA is used to perform this copy.
Boot options depend on the selected device. On the C671x devices, the boot options are fairly limited. First, you must ALWAYS boot, and the boot size is limited to 1K bytes. Given this limitation, most users boot a small routine of their own, which then copies the necessary sections from flash/ROM to a faster memory.
You can also boot through a host processor connected to the HPI.
C671x Boot
(Figure: at reset, the boot logic, driven by the BOOT pins, either lets a host boot the ‘C671x through the HPI or has the EDMA copy 1 KB from CE1 into L2 at 0000_0000. The figure also shows the user DIP switches.)
HD[4:3]   Boot Mode
00        Host Boot (HPI)
01        8-bit ROM
10        16-bit ROM
11        32-bit ROM
Mode 0: Host boots C671x via HPI
Modes 1, 2, 3: Memory Copy
  EDMA copies from start of CE1 to 0x0
  Uses default ROM timings
  After transfer, PC = 0x0
  Bootloader copies 1K bytes
Must always boot (No “no-boot” option)
The C64x devices offer a few more options, including the option not to boot at all.
C64x Boot
(Figure: at reset, the boot logic, driven by the BOOT pins, either lets a host boot the ‘C64x through the HPI or PCI, or has the EDMA copy 1 KB from EMIFB CE1 into L2 at 0000_0000.)
BEA[19:18]   Boot Mode
00           None
01           Host Boot (HPI/PCI)
10           EMIFB (8-bit)
11           Reserved
Mode 0: No Boot bootmode; CPU starts at 0x0
Mode 1: Host boots C64x via HPI or PCI
Mode 2: Memory Copy
  EDMA copies from start of EMIFB CE1 to 0x0
  After transfer, PC = 0x0
  Bootloader copies 1K bytes
The C6416 DSK also includes configuration switches to change how it boots up. Additionally, these switches let you select the endian mode and the speed of the CPU and EMIF clocks.
DSK6416 configuration switches 1–8 (* = default; by default, all switches are set to 0):
Endian:      0 = Little endian*    1 = Big endian
Boot mode:   00 = EMIFB boot from 8-bit Flash*    01 = No Boot    10 = Reserved    11 = Host Boot
Clocks:      0000 = 1GHz CPU, 125MHz EMIFA*
             0011 = 720MHz CPU, 125MHz EMIFA
             0100 = 850MHz CPU, 125MHz EMIFA
             1001 = 500MHz CPU, 100MHz EMIFA
             1010 = 600MHz CPU, 100MHz EMIFA
DSK6416 flash is located at EMIFB CE1.
See more details in the DSK help files.
System Timeline
Hardware: H/W Device Reset → EDMA Boot Loader (No Boot, or From EPROM, or Via HPI)
Software: 2nd Boot Loader (boot.asm)
With a 1 KB boot limit on most of the C6000 devices, users need to create their own secondary boot routine. Because the C environment is not yet initialized, this code is normally written in assembly. The routine (boot.asm) is copied at reset by the on-chip EDMA; the PC is then loaded with 0x0 and the boot routine runs. When the boot loader is finished, it normally calls the C init routine, c_int00( ).
C Initialization
The c_int00( ) routine, provided by TI in the run-time support library, initializes the C environment, including all of the BIOS setup, and then calls the application’s main( ).
System Timeline
Hardware: H/W Device Reset → EDMA Boot Loader (No Boot, or From EPROM, or Via HPI)
Software: 2nd Boot Loader (boot.asm: EMIF, Self test, Load remaining initialized sections) → BIOS_init (_c_int00), provided by TI
When your application uses DSP/BIOS, _c_int00 also calls BIOS_init( ), which initializes everything that BIOS needs.
BIOS_init (_c_int00): initialize the C environment …
  Init global and static vars (copy .cinit → .bss)
  Set up the stack pointer (SP) and global pointer (DP)
System Timeline
Hardware: H/W Device Reset → EDMA Boot Loader
Software: 2nd Boot Loader (boot.asm) → BIOS_init (_c_int00), provided by TI → System Init Code (main.c) → BIOS_start
Returning from main( ) invokes the BIOS_start( ) routine, which starts DSP/BIOS.
System Timeline
Hardware: H/W Device Reset → EDMA Boot Loader
Software: 2nd Boot Loader (boot.asm) → BIOS_init (_c_int00), provided by TI → System Init Code (main.c) → BIOS_start, provided by TI → DSP/BIOS Scheduler
When BIOS_start( ) completes, it calls the IDL loop, which runs until a higher-priority thread becomes ready to run.
System Timeline
Hardware: H/W Device Reset → EDMA Boot Loader
Software: 2nd Boot Loader (boot.asm) → BIOS_init (_c_int00), provided by TI → System Init Code (main.c) → BIOS_start, provided by TI → DSP/BIOS Scheduler, provided by TI
Programming Flash
When developing a system, you have various non-volatile memory choices. Many users have
Data I/O programmers available to program their ROM or Flash memory. Others may use a flash
algorithm to perform this task on the fly. In the development stage, it is often handy to be able to
program the flash on the target board itself.
Non-Volatile Memory
Non-volatile options: ROM, EPROM, Flash
(Figure: C6000 CPU with FLASH and RAM devices.)
If you decide to use Flash, you have several options to choose from depending on your system
and development needs. In this class, we will focus on using FlashBurn.
FlashBurn is a CCS plug-in that downloads a small flash algorithm to the DSP and then communicates with the host via JTAG. The flash algorithm running on-chip reads the selected application and programs the flash accordingly. FlashBurn requires the user to create a hex image of the executable .out file.
Flashburn
(Figure: the FlashBurn plug-in in CCS sends the EPROM image file and the FBTC file to the DSK; the FBTC program runs in the DSP’s L2 RAM and programs the Flash.)
1. The Flashburn plugin downloads and runs the FBTC file (FlashBurn Transfer Control) to establish a continuous link between CCS & DSP.
2. Choose “Erase Flash” to tell the FBTC program running on the DSP to erase the flash memory.
3. Select “Program Flash” to stream the EPROM image file (.hex) down to the DSP.
• The FBTC program must be customized for whatever flash memory is on the target board (documentation is provided).
Using FlashBurn
(Screen capture callouts: Flashburn saves these settings to a .CDD file. Flash Burn Transfer Controller (FBTC): when the FBTC has been downloaded to the DSP and is running, FlashBurn is “connected” to the DSP.)
Debug Flow
(Figure: File→Load Program… loads the program from CCS into the DSK, which contains the C6x CPU, L2, Flash, and SDRAM.)
First, you build your project. Then you pass the .out file to the hex6x utility to create the image for the Flash. This image also contains the copy table that is used by the secondary bootloader. Finally, use FlashBurn to program the flash memory with the .hex file. You can now boot from the Flash, so you can reset the board and disconnect CCS!
(Figure: build flow using hex.cmd and app.cdd; FlashBurn programs the image into the DSK’s Flash, shown alongside the C6x CPU and RAM.)
Flash/Boot Procedure
Follow these steps to create your stand-alone system – including boot and programming the flash
memory on the DSK. You’ll get a chance to actually use this procedure in the lab.
Step 1
Flash/Boot Procedure
1 Plan out your system’s memory map – Before and After boot.
Verify address for “top of Flash memory” in your system
Plan for BOOT memory object 1KB in length
o Created for secondary boot-loader (boot.asm)
o Not absolutely required, but provides linker error if
boot.asm becomes larger than 1KB
Note: when using the hardware boot, you do not have to relink your program with run/load addresses; HEX6x will take care of this for you (step #4)
Shown below is the overall system memory map – what we’ve created by using a combination of
the BIOS Mem Manager and our own linker command file. It also points out that some parts of
our system will have separate load and run addresses.
            Before boot (BOOT →)            After boot
8000_0000   SDRAM                          SDRAM: init + uninit
9000_0000   FLASH: “boot.asm”              FLASH: “boot.asm”
9000_0400   FLASH: “initialized sections”  FLASH: “init sections”
9002_0000
Step 2
Now that we have our memory organized, we can create anything that we need for boot loading.
Figure: memory segments, marking the new segments and the memories listed in our previous memory maps.
Step 3
If you have any user created sections, you'll need to place them with your own linker command
file.
You'll probably have at least one user section created for the secondary boot loader code.
Step 4
Now that you've got everything organized, you need to create the .hex image file from your .out
file. We'll use hex6x.exe to do this.
Hex6x converts the application’s .out file to .hex so that the flash programmer can use it. Hex6x
requires a command file which specifies the input file (.out), options, and memory map.
Figure: app.out → hex6x → one of the output formats ASCII-hex, Tektronix, Intel MCS-86, Motorola-S, or TI-tagged.
Hex6x uses hex.cmd to determine how to convert the .out file to .hex. It specifies the input file,
options, flash location and size, and which sections are to be converted. The output of hex6x is the
application's .hex file that is used by the flash programmer.
If -e is not used to set the entry point, then it will default to the
entry point indicated in the COFF object file.
For more information on using Hex6x to build a boot image, please refer to the
C6000 Assembly Language Tools User's Guide (SPRU186).
Here is how the -boot options specify how the ROM image will be built.
The -boot option causes HEX6x to create a COPY_TABLE, which our secondary bootloader can then
use to copy all our initialized sections into their runtime locations. Shown are the copy table
along with a pseudocode version of the secondary bootloader.
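As a rough illustration of what the secondary bootloader does with that table, here is a host-runnable C sketch. It assumes a boot-table layout modeled on the one SPRU186 describes for the hex6x -boot option (an entry-point word, then for each section a byte count, a destination address and the data padded to 32-bit words, ending with a zero count); confirm the exact format in SPRU186 for your tool version. Because it runs on a PC, the hypothetical `ram[]` array stands in for the DSP memory map, and destination addresses are treated as offsets into it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Host-side stand-in for DSP memory (assumption for illustration):
   destination addresses in the table are offsets into this array. */
static uint8_t ram[256];

/* Walks a copy table laid out as (assumed):
 *   word 0      : entry point
 *   per section : byte count, destination address, data padded to 32-bit words
 *   terminator  : byte count of 0
 * Copies every section to its run address and returns the entry point. */
uint32_t run_copy_table(const uint32_t *table)
{
    uint32_t entry = *table++;

    for (;;) {
        uint32_t size = *table++;
        if (size == 0)                       /* end of table */
            break;

        uint32_t dest = *table++;
        assert(dest + size <= sizeof ram);   /* keep the simulation in bounds */
        memcpy(&ram[dest], table, size);     /* load image -> run address */
        table += (size + 3u) / 4u;           /* skip data, rounded up to words */
    }
    return entry;                            /* boot code would branch here */
}
```

On the DSK this logic lives in boot.asm, which typically finishes by branching to the entry point; the C version only illustrates the control flow.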
The resulting MAP file shows the sections that were actually placed into the ROM image.
Step 5
Once we've got a .hex file, we need to burn it to the Flash.
Flashburn is a simple CCS plug-in that can burn the Flash on the DSK.
Steps 6 and 7
The last two steps are really easy; Flashburn does most of the work. Before you burn the Flash
and see if the system works with the boot loader in place, you need to erase the Flash.
In cases where a host processor exists, it is advantageous to combine both processors' boot images
into a single ROM, which is usually owned by the host. In these systems, the DSP would then be
configured to boot in "Host" mode, and the host would be required to copy all the initialized
section information from its Flash ROM to the DSP's memory. Essentially, the host boot
process replaces the need for the secondary boot loader we have just discussed.
The problem, though, is how to get the DSP's boot image (all the initialized section information)
into the host's memory map. This boot image needs to contain both the initialized information,
along with the address where each piece of information needs to go.
Using the Object File Description (OFD6x) tool along with an XML-capable script, the initialized
sections from the .OUT file can be converted into a C data initialization table. This C data table
can then be used by a function on the host to copy each of the initialization values into their
respective address on the DSP.
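A minimal C sketch of that host-side copy function follows. It assumes a hypothetical table format for appimage.c (one entry per initialized section: DSP destination address, length in 32-bit words and a pointer to the data); the real format is whatever the Perl script in SPRAA64 generates. On real hardware the write would typically go through the host port interface (HPI); here a hypothetical `dsp_write_word()` writes into a simulated DSP memory array so the sketch runs anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry format for the generated C table (appimage.c). */
typedef struct {
    uint32_t        dsp_addr;   /* run address on the DSP (byte address) */
    uint32_t        n_words;    /* section length in 32-bit words */
    const uint32_t *data;       /* initialized values */
} BootSection;

/* Simulated DSP memory so the sketch runs on a PC (assumption). */
#define DSP_MEM_WORDS 64u
static uint32_t dsp_mem[DSP_MEM_WORDS];

/* Hypothetical stand-in for a host-port (HPI) write of one 32-bit word. */
static void dsp_write_word(uint32_t dsp_addr, uint32_t value)
{
    assert(dsp_addr % 4u == 0u && dsp_addr / 4u < DSP_MEM_WORDS);
    dsp_mem[dsp_addr / 4u] = value;
}

/* Copies every initialized section from the host's image into DSP memory. */
void host_boot_dsp(const BootSection *sections, size_t count)
{
    for (size_t s = 0; s < count; s++)
        for (uint32_t w = 0; w < sections[s].n_words; w++)
            dsp_write_word(sections[s].dsp_addr + 4u * w, sections[s].data[w]);
}
```

After the copy, the host would then signal the DSP to start running; on the C6000 this is typically done through the HPI as well.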
Figure: target .out → Perl script → appimage.c → host system (CPU, Flash, RAM).
This process is documented in the application note, Using OFD Utility to Create a DSP Boot
Image (SPRAA64.PDF). Along with the app note, you can download a code example which
contains a Perl script to perform the conversion described above.
Solutions:
1. Use hardware breakpoints to help locate the problem.
To debug a ROM program, it’s especially important to put a hardware (H/W) breakpoint at the
start of your program; otherwise you won’t be able to halt the code in time to see what is executing.
2. Create a “stop condition” (infinite loop) in your boot code.
When the code stops, open CCS and load the symbol table.
Here are a few things that burned us when we tried to Flash our first program. We thought we'd
pass them on to make your life easier.
Requirements
Convert application to hex format
Burn FLASH with application/boot code
Run from power-on RESET (debug if necessary)
Objective
The objective of this lab is to set up your system to boot your application from Flash and run
from internal memory. This process will follow the 7-step procedure that we outlined in the
discussion material, including:
• Modify the .cdb file to account for the changes from the previous step
• Create a user-defined linker command file to place user-defined sections
• Use the FlashBurn utility to erase and program the flash with the .hex file
LAB14 Procedure
Open the Audioapp Project
1. Reset the DSK, start CCS and open audioapp.pjt
0x00000400
0x000FFC00
Open your configuration file. Click on the little + next to System to expand it. Next, expand
the MEM – Memory Section Manager. You should see that we currently have the following
segments: CACHE_L2, IRAM, and SDRAM.
Before we create a new segment for BOOT, we need to change the IRAM segment. If we try
to add the BOOT section first, the Configuration Tool will complain that we have overlapping
sections.
Right-click on the IRAM segment and choose properties. Change the base and len properties
to look like this:
0x00000400
0x0002FC00
0x00000000
0x00000400
Note: Make sure to change all of the properties, such as turning off the heap and changing the
space property.
0x90000000
0x00040000
Note: Make sure to change all of the properties, such as turning off the heap and changing the
space property.
SECTIONS
{
.boot_load :> BOOT
}
Tools → Flashburn
13. Open the audioapp_6416.cdd (or audioapp_6713.cdd) file
We have already created a configuration file for you that has all of the information that
Flashburn needs to do its job. Open this file inside of Flashburn. The file is named
audioapp_6416.cdd (or audioapp_6713.cdd) and it is located at:
File → Open
C:\iw6000\labs\audioapp\Debug\audioapp_6416.cdd
You should now see a window that looks like this (the 6713 file will look a little different):
Note: Make sure “Verify Write” is checked in the above dialog box. Flashburn should
automatically connect to the target when you open the .cdd file. If it does not, you need to
use CCS to run the CPU. When you do this, Flashburn should connect to the target and
you should see this icon in Flashburn:
Switch     Up                Down
0          No sine wave      Add sine wave
1          Filter disabled   Filter enabled
21. Congratulations! You just flashed the audio application to the DSK
You now have successfully booted your BIOS application from Flash and are running
independently of the CCS tools.
Let your instructor know when you have reached this point before going on.
Part A
Debug Boot and Application Code with CCS
22. Introduction
You know, it’s wonderful when everything works the first time you burn the flash and boot
from reset. But what if something goes wrong? If your application was working before you
booted/flashed, and now it’s not working, what went wrong? Is the problem in your boot
routine? Your app? Your memory management? Interrupts? Load vs. Run addresses? BIOS?
Well, it’s tough to tell. Also, how do you debug code that is in the flash memory or your boot
code that runs from reset? This next section of the lab will explore the following areas:
• using hardware breakpoints in your boot routine
• using CCS to debug your code – loading “symbols” vs. loading an entire program
• setting breakpoints in bootloaded code executing from RAM
• debugging your application
• using real-time analysis tools with a bootloaded application
Debug → Halt
Note: If you actually click on main() (not the opening brace), you will get an error that CCS
needs to move that breakpoint to a valid line. The breakpoint needs to be associated with
an address, and there is no code (i.e. no address) associated with the line of code that
contains the function name.
Note: If you don't have time to move on to the next part, you may want to skip to the last part of
the lab, Flashing POST, to reprogram the Flash with the POST routine that came with the
DSK.
Part B
Overlay Data Sections on Top of Boot Section
Internal RAM is a precious resource. Many times a user will want to use a single piece of
memory to contain different code or data at different times. We call this an overlay. In our
system, we have the .boot_load section using up the lower 0x400 memory locations and will
never be called again. Why waste 1K bytes of precious internal memory? Why not put
something useful there? We could take another piece of code and use the EDMA to copy it
over those locations, but in this case, it might be easier to map an uninitialized section, like
.bss on top of it. In case you don't remember, .bss contains the uninitialized global and static
variables. Or, you could do the same operation with a user-defined uninitialized section.
32. Check to see if .bss will fit in the first 0x400 bytes of memory.
Let’s make sure .bss will fit in the first 1K of memory. Open up audioapp.map in your
\audioapp\debug\ folder and find the .bss section. What is the length? About 0x0568,
right? That’s larger than 0x0400. We could pick another uninitialized section or we can
increase the size of BOOTRAM to accommodate .bss. Let’s try that. Close the .map file.
Note: If your .bss section happens to be larger than 0x0568, then you'll need to increase the
number in the rest of this lab. If your .bss size is smaller than 0x0568, you can decrease
the number or you can simply use the larger number.
Note: DSK6713 users should use 0x600 for the base and 0x2FA00 for the len in IRAM
properties and 0x600 for the len in BOOT properties below.
IRAM: base 0x00000600, len 0x000FFA00
BOOT: base 0x00000000, len 0x00000600
(Use IRAM, not ISRAM.)
This tells the linker to resolve the run-time addresses of both the .boot_load and .bss
sections within the BOOTRAM memory area, while the load-time address of .boot_load is
within FLASH. The .bss section has no load-time address because it is an uninitialized section.
Close and save link.cmd.
Flashing POST
You probably don't want to leave your DSK running the audio application. Here are the steps to
program the flash with the POST routine.
38. Reconnect your USB emulation cable
39. Open Code Composer Studio
40. Open Flashburn
Tools → Flashburn
File → Open…
Make sure that Flashburn is connected. If not, you may need to run the processor inside of
CCS (in fact, you probably will have to in order to connect).
42. Erase the flash
You’re done
Introduction
As the performance of DSPs increases, the ability to put large, fast memories on-chip decreases.
Current silicon technology has the ability to dramatically increase the speed of DSP cores, but the
memories fast enough to provide single-cycle access for data and instructions to these cores are
limited in size. In order to keep DSP performance high while reducing cost, large, flat
memory models are being abandoned in favor of caching architectures. Caching memory
architectures allow small, fast memories to be used in conjunction with larger, slower memories
and a cache controller that moves data and instructions closer to the core as they are needed. The
‘C6x1x devices provide a two-level cache architecture that is flexible and powerful. We'll look at
how to configure the cache and use it effectively in a system.
Outline
Why Cache?
Cache Basics
Cache Example (Direct-Mapped)
C6211/C671x Internal Memory
‘C64x Internal Memory Overview
Additional Memory/Cache Topics
Using the C Optimizer
Lab 15
Chapter Topics
Why Cache?
  Cache vs. RAM
Cache Fundamentals
Direct-Mapped Cache
  Direct-Mapped Cache Example
  Three Types of Misses
C6211/C671x Internal Memory
  L1 Program Cache (L1P)
  L1 Data Cache (L1D)
  L2 Memory
  L2 Configuration
C64x Internal Memory Overview
Additional Memory/Cache Topics
  'C64x Memory Banks
  Cache Optimization
  Data Cache Coherency
  “Turn Off” the Cache (MAR)
Using the C Optimizer
  Compiler Build Options
  Using Default Build Configurations (Release)
  Optimizing C Performance (where to get help)
Lab15 – Working with Cache
Lab 15 Procedure
  Move Buffers Off Chip and Turn on the L2 Cache
  Use L2 Cache Effectively
Lab15a – Using the C Compiler Optimizer
Optional Topics
  ‘0x Memory Summary
  ‘0x Data Memory – System Optimization
Why Cache?
In order to understand why the C6000 family of DSPs uses cache, let's consider a common
problem. Take, for example, the last time you went to a crowded event like the symphony, a
sporting event, or the ballet, any kind of event where a lot of people want to get to one place at
the same time. How do you handle parking? You can only have so many parking spots close to
the event. Since there are only so many of them, they demand a high price. They offer close, fast
access to the event, but they are expensive and limited.
Your other option is the parking garage. It has plenty of spaces and it's not very expensive, but it
is a ten-minute walk and you are all dressed up and running late. It's probably even raining. Don't
you wish you had another choice for parking?
Parking Dilemma
Parking Choices:
• 0-minute walk @ $100 for close-in parking
• 10-minute walk @ $5 for distant parking
• or … Valet parking: 0-minute walk @ only $6.00
You do! A valet service gives the same access as the close parking for just a little more cost than
the parking garage. So, you arrive on time (and dry) and you still have money left over to buy
some goodies.
Cache is the valet service of DSPs. Memory that is close to the processor and fast can only be so
big. You can attach plenty of external memory, but it is slower. Cache helps solve this problem
by keeping what you need close to the processor. It makes the close parking spaces look like the
big parking garage around the corner.
Why Cache? (figure: cache memory, fast and small, sits next to the sports arena; bulk memory is slower, larger and cheaper; together they work like a big, fast memory)
Memory Choices:
• Small, fast memory
• Large, slow memory
• or … Use Cache:
  o Combines advantages of both
  o Like valet, data movement is automatic
One of the often overlooked advantages of cache is that it is automatic. Data that is requested by
the CPU is moved automatically from slower memories to faster memories where it can be
accessed quickly.
If your entire system code cannot fit on chip but individual, critical routines will fit, place them
into the on-chip program RAM as needed using the DMA. Again, this method is manual and can
become complex very quickly as the system changes and new routines are added.
In the example above, the system has three functions (func1, func2, and func3) that will fit in the
on-chip program memory located at 0x0. The system designer can set up a DMA transfer from
0x8000 to 0x0 for the length of all three functions. Then, when the functions are executed they
will run from quick on-chip memory.
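The bookkeeping this manual approach places on the designer can be sketched in C. This is a host-runnable model, not C6000 code: a `memcpy` into a simulated memory array stands in for the DMA transfer, and the on-chip size and the function sizes are hypothetical values that, in a real system, would come from the linker map (and change whenever the code does).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE      0x10000u
#define LOAD_ADDR     0x8000u   /* external memory holding func1..func3 */
#define RUN_ADDR      0x0000u   /* on-chip program memory */
#define ONCHIP_BYTES  0x1000u   /* hypothetical on-chip program RAM size */

static uint8_t mem[MEM_SIZE];   /* simulated flat memory map */

/* Hypothetical sizes of func1, func2, func3 (contiguous at LOAD_ADDR). */
static const uint32_t func_bytes[] = { 0x200u, 0x340u, 0x120u };

/* Returns the number of bytes copied, or 0 if the functions no longer fit. */
uint32_t copy_funcs_on_chip(void)
{
    uint32_t total = 0;
    for (size_t i = 0; i < sizeof func_bytes / sizeof func_bytes[0]; i++)
        total += func_bytes[i];          /* must be re-derived on every rebuild */

    if (total > ONCHIP_BYTES)
        return 0;                        /* designer must now choose what moves */

    assert(RUN_ADDR + total <= LOAD_ADDR);           /* regions must not overlap */
    memcpy(&mem[RUN_ADDR], &mem[LOAD_ADDR], total);  /* stands in for the DMA */
    return total;
}
```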
Unfortunately, the details of setting up the DMA-copy are left to the designer. Several of these
details change every time the system/code is modified (i.e. addresses, section lengths, etc.).
Worse yet, if the code grows beyond the size of the on-chip program memory, the designer will
have to make some tough choices about what to execute internally, and which to leave running
from external memory. Either that, or implement a more complicated system which includes
overlays.
Using Cache
The cache feature of the ‘C6000 allows the designer to store code in large off-chip memories,
while executing code loops from fast on-chip memory … automatically.
That is, the cache moves the burden of memory management from the designer to the cache
controller, which is built into the device.
Figure: the cache (H/W) sits between the CPU and the EMIF.
Notice that Cache, unlike the normal memory, does not have an address. The instructions that are
stored in cache are associated with addresses in the memory map. Over the next few pages we
further describe the term associated along with how cache works, in general.
Cache Fundamentals
As stated earlier, locations in cache memory do not have their own addresses. These locations are
associated with other memory locations. You may think of it like cache locations “shadowing”
addressable memory locations (usually a larger, slower-access memory).
As part of its function, cache hardware and memory must have an organizational method to keep
track of what addressable memory locations it contains.
Figure: a 16-line cache (indexes 0 to 0xF) beside memory blocks at 0x8010, 0x8020, …
• Conceptually, a cache divides the entire memory into blocks equal to its size
• A cache is divided into smaller storage locations called lines
• The term Index or Line-Number is used to specify a specific cache line
In the example above, the cache has 16 lines. Therefore, the entire memory map (or at least the
part that can be cached) is broken up into 16-line blocks. The first line of each block is associated
with the first line in cache; the second line of each block is associated with the second line of
cache, continuing out to the 16th line. If the first line of cache is occupied by information from the
first block and the DSP accesses the same line from the second block, the information in the
cache will be overwritten because the two addresses map to the same line.
Cache Tag
When values from memory are copied into a line or more of cache, how can we keep track of
which block they are from?
The cache controller uses the address of an instruction to decide which line in cache it is
associated with, and which block it came from. This effectively breaks the address into two
pieces, the index and the tag. The index determines which line of cache an instruction will reside
at in cache (and the lower order bits of the address represent it). The tag is the higher order bits of
the address, and it determines which block the cache line is associated with in the memory map.
Cache Tags (figure: a single tag, 800, identifies the block of external memory at 0x8000 that cache indexes 0 through 0xF shadow)
While a single tag will allow the cache to discern which block of memory is being “shadowed”, it
requires all lines of the cache to be associated with the same block of memory. As caches become
larger, as is the case with the C6000, you may want different lines to be associated with different
blocks of memory. For this reason, each line has an associated tag.
Cache Tags (figure: one tag per line, e.g. tag 800 at index 0 and tag 801 at index 1)
Valid Bits
Just because a cache can hold, say, 4K bytes, that doesn’t mean that all of its lines will always
have valid data. Caches provide a separate valid bit for each line. When data is brought into the
cache, the valid bit is set.
When a CPU load instruction reads data from an address, the cache is examined to see if the
valid, specified address exists in the cache. That is, at the index specified by the address, does the
correct tag value exist and is it marked valid?
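A minimal C sketch of this lookup, using the same toy geometry as the example that follows (16 lines, 16-bit addresses, bits 3 through 0 as the index and bits 15 through 4 as the tag). The `CacheLine` type and the function names are illustrative only, not part of any TI API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_LINES 16u                /* toy cache: 16 lines, one item per line */

typedef struct {
    bool     valid;                  /* set when the line is filled from memory */
    uint16_t tag;                    /* upper address bits of the cached item */
} CacheLine;

static unsigned index_of(uint16_t addr) { return addr & 0xFu; }           /* bits 3..0  */
static uint16_t tag_of(uint16_t addr)   { return (uint16_t)(addr >> 4); } /* bits 15..4 */

/* A hit requires the line at the address's index to be valid AND hold the same tag. */
bool is_hit(const CacheLine lines[NUM_LINES], uint16_t addr)
{
    assert(lines != NULL);
    const CacheLine *line = &lines[index_of(addr)];
    return line->valid && line->tag == tag_of(addr);
}
```

For example, address 0x8000 splits into tag 0x800 and index 0, matching the first entry in the figures around this section.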
Valid Bits (figure: each line has a valid bit alongside its tag, e.g. index 0 valid with tag 800, index 1 valid with tag 801, and index 0xF marked invalid although its tag field still holds 721)
Note: Given a 4K byte cache, do the bits associated with the cache management (tag, valid,
etc.) use up part of the 4K bytes? The answer is No. When a 4K byte cache is specified,
we are indicating the amount of usable memory.
Direct-Mapped Cache
A Direct-Mapped cache is a type of cache that associates each one of its lines with a line from
each of the blocks in the memory map. So, only one line of information from any given block can
be live in cache at a given time.
Direct-Mapped Cache (figure: cache indexes 0 through 0xF, each associated with one line of every 16-line block of external memory at 0x8000, 0x8010, …)
Another way to think about this is, “For any given memory location, it will map into one, and
only one, line in the cache.”
For the 16-line example that follows, address bits 15 through 4 form the tag and bits 3 through 0 form the index.
The best way to understand how a cache works is by studying an example. The example below
illustrates how a direct-mapped cache with 16-bit addresses operates on a small piece of code. We
will use this example to understand basic cache operation and define several terms that are
applicable to caches.
Arbitrary Direct-Mapped Cache Example
The following example uses:
• a 16-line cache
• 16-bit addresses, and
• one 32-bit instruction stored per line
C6000 caches have different cache and line sizes than this example; it is only intended as a
simple cache example to reinforce cache concepts.
Note: The following cache example does not illustrate the exact operation of a 'C6000 cache.
The example has been simplified to allow us to focus on the basic operation of a direct-
mapped cache. The operation of a 'C6000 cache follows the same basic principles.
Example (partial listing; address bits 15 through 4 form the tag, bits 3 through 0 the index)
0026h  L2:  ADD
0027h       SUB     cnt
0028h       [!cnt]  B  L1
The first time instructions are accessed, the cache is cold. A cold cache doesn't have anything in it.
When the DSP accesses the first instruction of our example code, the LDH, the cache controller
uses the index, 3, to check the contents of the cache. The cache controller includes a valid bit for
each line of cache. As you can see below, the valid bit for line 3 is not set. Therefore, the LDH
instruction causes a cache miss. More specifically, this is called a compulsory miss. The
instruction has to be fetched from memory at its address, 0x0003. This operation will cause a
delay until the instruction is brought in from memory.
When the LDH instruction is brought in from memory, it is given to the core and added to the
cache at the same time. This operation minimizes the delay to the core. When the instruction is
added to the cache, it is added to the appropriate index line, the tag is updated, and the valid bit is
set.
The following three instructions are added to the cache in the same manner. When they have all
been accessed, the cache will look like this:
Notice that the branch instruction is the last instruction that was transferred by the cache
controller. A branch by definition can take the DSP to a new location in memory. The branch in
this case takes us to the label tst, which is located at 0x0026.
When the CPU fetches the ADD instruction, it checks the cache to see if it currently resides there.
The cache controller checks the index, 6, and finds that there is something valid in cache at this
index. Unfortunately, the tag is not correct, so the ADD instruction must be fetched from memory
at its address.
Since this is a direct-mapped cache, the ADD instruction will overwrite whatever is in cache at its
index. So, in our example, the ADD will overwrite the B instruction since they share the same
index, 6.
The DSP executes the instructions after the ADD, the SUB and the B. Since they are not valid in
cache, they will cause cache misses.
When the branch executes, it will take the DSP to a new location in memory. The branch in this
case takes the DSP to the address of the symbol lbl, which is 0x0003. This is the address of the
original LDH instruction from above.
When the DSP accesses the LDH instruction this time, it is found to be in cache. Therefore, it is
given to the core without accessing memory, which removes any memory delays. This operation
is called a cache hit.
A few observations can be made at this point. Instructions are added to cache only by accessing
them. If they are only used once, the cache does not offer any benefit. However, it doesn't cause
any additional delays. This type of cache has the biggest benefit for looped code, or code that is
accessed over and over again. Fortunately, this is the most common type of code in DSP
programming.
Notice also what seems to be happening at line 6. Each time the code runs, line 6 is overwritten
twice. This behavior is called thrashing the cache. The cache misses that occur when you are
thrashing the cache are called conflict misses. Why is it happening? Is it reducing the
performance of the code?
Thrashing occurs when multiple elements that are executed at the same time map to the same line
in the cache. Since it causes more memory accesses, it can dramatically reduce the performance of
the code. How can we remove thrashing from our code?
The thrashing problem is caused by the fact that the ADD and the B share the same index. If they
had different indexes, they would not thrash the cache. So, the fix is to make sure that the second
piece of code (ADD, SUB, and B) doesn't share any indexes with the first chunk of code. For
example, move the second chunk down by one line so that its indexes start at 7 instead of 6.
This relocation can be done several different ways. The simplest is probably to make the two
sections contiguous in memory. Code that is contiguous and smaller than the size of the cache
will not thrash because none of the indexes will overlap. Since related code is often placed in the
same memory section, it frequently avoids thrashing. Given the possibility of thrashing, caution
should be exercised when creating different code sections in a cache-based system.
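The walkthrough can be replayed with a small C model of the toy direct-mapped cache (16 lines, one instruction per line, 16-bit addresses). It fetches the seven instruction addresses from the example in a loop and counts hits, compulsory misses and misses on previously fetched addresses, which are conflict misses here because the seven-instruction loop easily fits in 16 lines. The names are illustrative, and the model ignores timing entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_LINES 16u   /* toy cache from the example: 16 lines, one instruction each */

typedef struct {
    bool     valid[NUM_LINES];
    uint16_t tag[NUM_LINES];
    bool     seen[1u << 16];   /* addresses fetched at least once (16-bit space) */
    unsigned hits;
    unsigned compulsory;       /* first access to an address */
    unsigned conflict;         /* re-fetch after eviction by a same-index address */
} DirectMapped;

static void fetch(DirectMapped *c, uint16_t addr)
{
    unsigned idx = addr & 0xFu;   /* bits 3..0  -> index */
    uint16_t tag = addr >> 4;     /* bits 15..4 -> tag   */

    if (c->valid[idx] && c->tag[idx] == tag) {
        c->hits++;
        return;
    }
    if (c->seen[addr])
        c->conflict++;            /* loop fits in 16 lines, so no capacity misses */
    else
        c->compulsory++;
    c->seen[addr] = true;
    c->valid[idx] = true;         /* fill the line, overwriting whatever was there */
    c->tag[idx]   = tag;
}

/* Replays the example loop `passes` times. `shift` moves the ADD/SUB/B chunk
   down by that many lines (0 = original layout, 1 = the relocation fix). */
void run_example(DirectMapped *c, unsigned passes, uint16_t shift)
{
    static const uint16_t trace[] = {
        0x0003, 0x0004, 0x0005, 0x0006,   /* LDH ... B (first chunk)    */
        0x0026, 0x0027, 0x0028            /* ADD, SUB, B (second chunk) */
    };

    assert(c != NULL);
    memset(c, 0, sizeof *c);              /* start with a cold cache */
    for (unsigned p = 0; p < passes; p++)
        for (unsigned i = 0; i < sizeof trace / sizeof trace[0]; i++)
            fetch(c, (uint16_t)(trace[i] + (i >= 4 ? shift : 0)));
}
```

With the original layout, two passes give 7 compulsory misses, 2 conflict misses (both on line 6) and 5 hits. With the second chunk shifted down one line, the second pass hits on all 7 fetches.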
Types of Misses
• Compulsory
  o Miss when first accessing a new address
• Conflict
  o Line is evicted upon access of an address whose index is already cached
  o Solutions: change memory layout; allow more lines for each index
• Capacity (we didn’t see this in our example)
  o Line is evicted before it can be re-used because the capacity of the cache is exhausted
  o Solution: increase cache size
The CacheTune tool within CCS helps visualize different types of cache misses.
CacheTune (figure: cache hits and misses plotted against memory locations)
Figure: memory hierarchy (L1, Level 2, Level 3).
The third memory chunk is called L2 memory. The processor will look for an address in L1
memories first; if not found L2 memory is examined next. L2 memory may be addressable RAM
or cache – its configurability will be discussed shortly.
Finally, on these DSPs, all external memory is considered Level three memory since it is the third
location examined in the memory access hierarchy. Of course, this makes sense since external
accesses are slower than internal accesses.
L1P Cache (figure: CPU, 4KB program cache (L1P), L2, EMIF, external memory)
  for (i = 0; i < 10; i++) {
      sum += x[i] * y[i];
  }
• Cache is always on
• Direct-mapped cache
  o Works exceptionally well for DSP code (which tends to have many loops)
  o Can be placed to minimize thrashing
• The cache is 4K bytes
• Each line stores 16 instructions (Linesize = 16)
L1P has 4KB of cache broken into cache lines that store 16 instructions. So, the linesize of the
L1P is 16 instructions. What do we mean by linesize …
Increasing the linesize does not change the basic concepts of cache. The cache is still organized
with: blocks, lines, tags, and valid-bits. And cache accesses still result in hits and misses. What
changes, though, is how much information is brought into cache when a miss occurs.
Let’s look at a simple linesize comparison. In this case, let’s look at a line that caches one byte of
external memory …
In our earlier cache example, the size was:
  Size: 16 bytes
  Linesize: 1 byte
  Number of indexes: 16
We have now changed it to:
  Size: 16 bytes
  Linesize: 2 bytes
  Number of indexes: 8
What’s the advantage of a greater line size? Speed! When the cache retrieves one item, it gets
another at the same time.
Notice that the block size is consistent in both examples. Of course, when the linesize is doubled,
the number of indexes is cut in half.
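The same arithmetic in C, generalized to any direct-mapped cache size and line size, is sketched below; the function and field names are illustrative. Applied to the L1P described above, 16 instructions of 4 bytes give 64-byte lines, so a 4K-byte L1P has 4096 / 64 = 64 indexes.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t offset;   /* byte within the line */
    uint32_t index;    /* which cache line (the line number) */
    uint32_t tag;      /* which cache-sized block of memory the line came from */
} AddrParts;

/* Splits an address for a direct-mapped cache of cache_bytes with line_bytes per line.
   Real caches use powers of two, which turns these divisions into bit fields. */
AddrParts split_address(uint32_t addr, uint32_t cache_bytes, uint32_t line_bytes)
{
    assert(line_bytes != 0u && cache_bytes % line_bytes == 0u);
    uint32_t num_lines = cache_bytes / line_bytes;   /* number of indexes */

    AddrParts p;
    p.offset = addr % line_bytes;
    p.index  = (addr / line_bytes) % num_lines;
    p.tag    = addr / cache_bytes;                   /* block number */
    return p;
}
```

Doubling `line_bytes` from 1 to 2 in a 16-byte cache halves `num_lines` from 16 to 8, while the tag (the block) stays the same.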
Increasing the linesize often increases the performance of a system. If you are accessing
information sequentially (especially common when accessing code and arrays), only the first
access to a line takes the extra time required to access the addressable memory; each
subsequent access to that cache line occurs at the fast cache speed.
Coming back to the L1P, when a miss occurs, not only do you get one 32-bit instruction, but the
cache also brings in the next 15 instructions. Thus, if your code executes sequentially, on the first
pass through your code loops you will only incur one delay every 16 instructions rather than a
delay for every instruction.
A direct mapped cache is very effective for program code where a sequence of instructions is
executed one after the other. This effect is maximized for looped code, where the same
instructions are executed over and over again. So a direct-mapped cache works well when a
single element (instruction) is being accessed at a given time and the next element is contiguous
in memory.
Caching Data
One instruction may access multiple data elements:
for( i = 0; i < 4; i++ ) {
sum += x[i] * y[i];
}
What would happen if x and y ended up at the following addresses?
x = 0x8000
y = 0x9000
If the addresses of x and y both began at the start of a cache block (here they are exactly 4K
bytes apart, the size of the cache), they would overwrite each other in the cache, which is called
thrashing. x[0] would go into index 0, and then y[0] would overwrite it. x[1] would be placed in
index 1, and then y[1] would overwrite it. And so on.
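The thrashing case can be reproduced with a small direct-mapped cache model. This sketch assumes a 4K cache with 32-byte lines and 4-byte array elements; the type and function names are hypothetical. Because 0x8000 and 0x9000 are exactly one cache size apart, every access to x evicts the line holding y and vice versa.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SIZE 4096u                    /* 4K direct-mapped (assumed) */
#define LINESIZE   32u                      /* bytes per line (assumed)   */
#define NLINES     (CACHE_SIZE / LINESIZE)

typedef struct {
    int      valid[NLINES];
    uint32_t tag[NLINES];
} DirectCache;

/* Returns 1 on a miss (and fills the line), 0 on a hit. */
int dm_access(DirectCache *c, uint32_t addr)
{
    uint32_t line  = addr / LINESIZE;
    uint32_t index = line % NLINES;
    uint32_t tag   = line / NLINES;
    if (c->valid[index] && c->tag[index] == tag)
        return 0;
    c->valid[index] = 1;                    /* evict whatever was there */
    c->tag[index]   = tag;
    return 1;
}

/* Misses for: for (i = 0; i < n; i++) sum += x[i] * y[i]; */
uint32_t dot_product_misses(uint32_t x_addr, uint32_t y_addr, uint32_t n)
{
    DirectCache c;
    memset(&c, 0, sizeof c);
    uint32_t misses = 0;
    for (uint32_t i = 0; i < n; i++) {
        misses += (uint32_t)dm_access(&c, x_addr + 4u * i);
        misses += (uint32_t)dm_access(&c, y_addr + 4u * i);
    }
    return misses;
}
```

dot_product_misses(0x8000, 0x9000, 4) returns 8 (every access misses), while moving y to 0x9020 drops it to 2.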
Increased Associativity
• Split a direct-mapped cache in half (for example, a 4K cache becomes two 2K halves)
• Each half is called a cache way
• Multiple ways make data caches more efficient
C671x/C621x L1D dimensions:
• 4K bytes
• 2 ways
• 32-byte linesize
Cache Sets
All of the lines from the different cache ways that store the same line index from memory form a set.
For example, in a 2-way cache, the first line of each way can store the first line of any of the N
blocks in memory. These two lines form a set, which is the group of lines that store the same
index from memory. This type of cache is called a set-associative cache, so if you have 2
cache ways, you have a 2-way set-associative cache.
What is a Set?
The lines from each way that map to the same index form a set. For example, the lines at index
zero of each way form Set 0, and the lines at index one form Set 1.
Another way to look at this is from the address point of view. In a direct-mapped cache, each
index only appears once. In an N-way set-associative cache, each index appears N times. So, N
items from the same index (with the same lower address bits) can reside in the cache at the same
time. In reality, a direct-mapped cache can be thought of as a 1-way set-associative cache.
Take the same example with two cache ways. Now, x[i] and y[i] each have their own
location in the cache (one in each way of the same set), and the thrashing is eliminated. The
programmer does not have to worry as much about where the data elements end up in memory,
because the associativity allows more flexibility.
Least Recently Used (LRU)
• The least recently used way in the set is replaced
• The LRU algorithm makes sure that the most recently accessed data is in cache
• Whenever the cache is updated, the LRU value is toggled
The cache controller uses a Least Recently Used (LRU) algorithm to decide which cache way’s
line to overwrite when a cache miss occurs. With this algorithm, the most recently accessed data
always stays in the cache. Note that the line that gets replaced is not necessarily the "oldest" item
in the cache; rather, it is the least recently "used" one. In a 2-way set-associative cache, this
algorithm can be implemented with a single bit per set. The LRU algorithm maximizes the effect of
temporal locality, which caches depend upon to maximize performance.
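Extending the same idea to two ways with one LRU bit per set shows the thrashing disappear. This is a sketch, assuming the 4K, 2-way, 32-byte-line L1D dimensions listed earlier; the names are hypothetical, and the `1 - w` trick for picking the other way only works for two ways.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SIZE 4096u
#define WAYS       2u
#define LINESIZE   32u
#define NSETS      (CACHE_SIZE / (WAYS * LINESIZE))   /* 64 sets */

typedef struct {
    int      valid[NSETS][WAYS];
    uint32_t tag[NSETS][WAYS];
    uint8_t  lru[NSETS];        /* one bit per set: the way to replace next */
} TwoWayCache;

/* Returns 1 on a miss, 0 on a hit; updates the LRU bit either way. */
int sa_access(TwoWayCache *c, uint32_t addr)
{
    uint32_t line = addr / LINESIZE;
    uint32_t set  = line % NSETS;
    uint32_t tag  = line / NSETS;
    for (uint32_t w = 0; w < WAYS; w++) {
        if (c->valid[set][w] && c->tag[set][w] == tag) {
            c->lru[set] = (uint8_t)(1u - w);   /* the other way is now LRU */
            return 0;
        }
    }
    uint32_t victim = c->lru[set];             /* replace least recently used */
    c->valid[set][victim] = 1;
    c->tag[set][victim]   = tag;
    c->lru[set] = (uint8_t)(1u - victim);
    return 1;
}

uint32_t dot_product_misses_2way(uint32_t x_addr, uint32_t y_addr, uint32_t n)
{
    TwoWayCache c;
    memset(&c, 0, sizeof c);
    uint32_t misses = 0;
    for (uint32_t i = 0; i < n; i++) {
        misses += (uint32_t)sa_access(&c, x_addr + 4u * i);
        misses += (uint32_t)sa_access(&c, y_addr + 4u * i);
    }
    return misses;
}
```

With x at 0x8000 and y at 0x9000, the 4-iteration loop now costs 2 misses instead of 8, because x and y share a set but occupy different ways.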
L2 Memory
The Level 2 memory (L2) is a middle hierarchical layer that helps the cache controller keep the
items that the CPU will need next closer to the L1 memories. It is significantly larger (64Kbytes
vs. 4Kbytes on the C6711) to help store larger arrays/functions and keep them closer to the CPU.
It is a unified memory, meaning that it can store both code and data.
The L1P and L1D are the 'C6x11's highest order memories in the hierarchy. As you move further
away from these memories, performance decreases. CPU requests are first sent to these fast
memories, then to slower memories lower in the hierarchy. The highest orders are designed to
store the information that the CPU needs based on temporal and spatial locality. Intermediate
levels can be inserted between the highest order (L1P and L1D) and the lowest order (external
memory) to serve as a larger buffer that further increases performance of the memory system.
Again, L2 is a middle hierarchical layer that helps the cache controller keep the items that the
CPU will need next closer to the L1 memories.
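Here is a simple flow chart of the decision process that the cache controller uses to fulfill CPU
requests:
• The CPU requests data.
• Is the data in L1? If yes, the request is serviced from L1. If no, check L2.
• Is the data in L2? If yes, the request is serviced from L2. If no, copy the data from external memory to L2.
(Figure: L2 cache connected through the Enhanced DMA (EDMA) to the EMIF, external memory, and the peripheral port.)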
If you use the DMA to read from on-chip peripherals – such as the McBSP – you might prefer to
use part of the L2 memory as memory-mapped RAM. This setup allows you to store incoming
data on-chip, rather than having to move it off-chip, cache it on-chip, and then move it back
off-chip to send it out to the external world.
The configurability of the L2 memory as RAM or cache allows designers to maximize the
efficiency of their system.
L2 Configuration
The L2 memory is configurable to allow for a mix of RAM blocks and cache ways. The 64KB is
divided into four 16KB chunks, each of which can be either RAM or a cache way. This allows
the designer to set some on-chip memory aside for dedicated buffers and to use the other
memory as cache ways.
L2 Memory Configuration
The L2 configuration can be changed at run time, so a designer could choose to change a RAM block to
cache or vice versa. Before switching from RAM to cache, the user should make sure that
any information needed by the system that is currently in the RAM block is copied somewhere
else. This copy can be done with the DMA to minimize the overhead on the CPU. Before
switching a cache way to RAM, the cache should be free of any dirty data. Dirty data is data that
has been written by the CPU but may not yet have been copied out to memory.
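The arithmetic of splitting the C6711’s 64KB L2 into RAM and cache ways is simple, but it matters when deciding what must be copied out before a switch. This sketch assumes L2 SRAM starts at 0x00000000 and that cache ways are taken from the top of L2, leaving the SRAM contiguous; check the device documentation before relying on that layout. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define L2_TOTAL_BYTES (64u * 1024u)   /* C6711 L2 size             */
#define L2_CHUNK_BYTES (16u * 1024u)   /* four 16KB chunks          */
#define L2_BASE        0x00000000u     /* assumed start of L2 SRAM  */

/* Bytes of L2 used as cache when `ways` chunks are cache ways (0-4). */
uint32_t l2_cache_bytes(uint32_t ways)
{
    assert(ways <= L2_TOTAL_BYTES / L2_CHUNK_BYTES);
    return ways * L2_CHUNK_BYTES;
}

/* Bytes left as memory-mapped SRAM. */
uint32_t l2_sram_bytes(uint32_t ways)
{
    return L2_TOTAL_BYTES - l2_cache_bytes(ways);
}

/* One past the last usable SRAM address, assuming cache ways are
 * taken from the top of L2. */
uint32_t l2_sram_end(uint32_t ways)
{
    return L2_BASE + l2_sram_bytes(ways);
}
```

Under these assumptions, going from one cache way to two shrinks SRAM from 0xC000 to 0x8000 bytes, so anything the system still needs in 0x8000 to 0xBFFF would have to be copied elsewhere (for example, with the DMA) first.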
C64x L2 Memory Configuration
• When cache is enabled, it’s always 4-way (this differs from C671x)
• L2 ways are configurable in size (total L2 cache of 0, 32K, 64K, 128K, or 256K bytes)
• Linesize = 128 bytes (same linesize as C671x)
• Performance:
L2 → L1P: 1-8 cycles
L2 → L1D: L2 SRAM hit: 6 cycles; L2 cache hit: 8 cycles; pipelined: 2 cycles
Sometimes variables need to be aligned to account for the way that memory is organized. The
DATA_MEM_BANK is a specialized data align type #pragma that does exactly this.
DATA_MEM_BANK(var, 0 or 2 or 4 or 6)
#pragma DATA_MEM_BANK(a, 0);
short a[256] = {1, 2, 3, …
#pragma DATA_MEM_BANK(x, 4);
short x[256] = {256, 255, 254, …
#pragma UNROLL(2);
#pragma MUST_ITERATE(10, 100, 2);
for(i = 0; i < count ; i++) {
sum += a[i] * x[i];
}
Unlike some of the other pragmas discussed in this chapter, the DATA_ALIGN pragma does not
have to be used directly before the definition of the variable it aligns. Most users, though, prefer
to keep them together to ease code maintenance.
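The effect of DATA_MEM_BANK can be checked by computing which bank each element lands in. This sketch assumes four interleaved banks, each 16 bits wide (the ’C6201 data memory organization), so the pragma’s 0/2/4/6 argument is read as a byte offset selecting bank 0-3; other devices use different bank widths and counts. bank_of() and bank_conflicts() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 4 banks, 2 bytes wide, interleaved so byte offsets
 * 0-1 are bank 0, 2-3 bank 1, 4-5 bank 2, 6-7 bank 3, then repeat. */
#define BANK_WIDTH_BYTES 2u
#define NUM_BANKS        4u

uint32_t bank_of(uint32_t byte_addr)
{
    return (byte_addr / BANK_WIDTH_BYTES) % NUM_BANKS;
}

/* Count iterations in which loading a[i] and x[i] (16-bit shorts) in
 * parallel would hit the same bank. */
uint32_t bank_conflicts(uint32_t a_base, uint32_t x_base, uint32_t n)
{
    assert(a_base % 2u == 0 && x_base % 2u == 0);   /* shorts are 2-byte aligned */
    uint32_t conflicts = 0;
    for (uint32_t i = 0; i < n; i++) {
        if (bank_of(a_base + 2u * i) == bank_of(x_base + 2u * i))
            conflicts++;
    }
    return conflicts;
}
```

With a starting at bank 0 (DATA_MEM_BANK(a, 0)) and x at byte offset 4 (DATA_MEM_BANK(x, 4)), a[i] and x[i] are always two banks apart, so bank_conflicts() reports 0; placing both arrays at bank 0 makes every iteration conflict.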
Cache Optimization
Here are some great ideas for how to optimize cache:
• Optimize for Level 1
• Multiple ways and wider lines maximize efficiency (we did this for you!)
• Main goal: maximize line reuse before eviction
• Algorithms can be optimized for cache
• “Touch loops” can help with compulsory misses
• Up to 4 write misses can happen sequentially, but the next read or write will stall
• Be smart about data output by one function then read by another (touch it first)
Each one of these subjects deserves to be treated with enough material to fill a chapter in a book.
In fact, a book has been written to cover these subjects.
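A “touch loop” simply reads one element from every cache line of a buffer, so the compulsory misses are taken up front instead of being spread through the processing loop. A minimal portable C sketch follows; touch() is a hypothetical name, the linesize is a parameter (32 bytes would match the C671x L1D), and the volatile sink only keeps the compiler from discarding the reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one byte from each cache line covered by buf[0..bytes-1].
 * Returns the number of lines touched. */
size_t touch(const void *buf, size_t bytes, size_t linesize)
{
    assert(linesize > 0);
    if (bytes == 0)
        return 0;
    const uint8_t *p = (const uint8_t *)buf;
    volatile uint8_t sink;                          /* forces each load */
    uintptr_t start = (uintptr_t)buf;
    uintptr_t end   = start + bytes;                /* one past the last byte */
    uintptr_t line  = start - (start % linesize);   /* base of first line     */
    size_t lines = 0;
    for (; line < end; line += linesize) {
        uintptr_t a = (line < start) ? start : line; /* stay inside buf */
        sink = p[a - start];
        lines++;
    }
    (void)sink;
    return lines;
}
```

For example, touch(buf, 256, 32) on a 32-byte-aligned buffer reads 8 lines; calling it just before a function that reads buf takes those misses back-to-back.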
Example Problem
Let's look at an example that will highlight coherency issues and provide some solutions.
In this example, the coherency between the L1, L2, and external memories is considered. This
example only deals with data.
An important consideration in 'C6x11 based systems is the effect of the EDMA. The EDMA can
read and write memory, and the CPU does not know when the EDMA modifies a memory
location. The CPU and the EDMA can be viewed as two co-processors (which is what they really
are) that are aware of each other but don't know exactly what the other is doing.
Consider the following system. It is supposed to receive buffers from the EDMA, process
them, and send them out via the EDMA. When the EDMA finishes receiving a buffer, it
interrupts the CPU to transfer ownership of the buffer from the EDMA to the CPU.
In order to process the buffers, the CPU first has to read them. The first time the buffer is
accessed, it is not in either of the caches, L1 or L2. When the buffer is read, the data is brought
into both caches. At this point, all three copies of the buffer (L1, L2, and external) are coherent.
When the CPU is finished processing the buffer, it writes the results to a transmit buffer. This
buffer is located out in external memory. When the buffer is written, since it does not currently
reside in L1D, a write miss occurs. This write miss causes the transmit buffer to be written to the
next lower level of memory, L2 in this case, because L1D does NOT allocate space for write
misses. Usually DSPs do a lot more reading than writing, so not allocating space on write misses
leaves more room in L1D for data brought in by read misses.
The net effect is that the transmit buffer gets written to L2.
Remember that the EDMA is going to be used to send the buffer out to the real world. So, where
does it start reading the buffer from? That's right, external memory. Don't forget that caches do
not have addresses. The EDMA requires an address for the source and destination of the transfer.
The EDMA can't transfer from cache, so the buffer has to get from cache to external memory at
the correct time.
Since the cached value which was written by the CPU is different from the value stored in
external memory, the cache is said to be incoherent.
(Figure: A Coherency Issue. Copies of RcvBuf and XmtBuf in L1D, L2, and external memory, accessed by the CPU and the EDMA.)
If coherency is not maintained (by sending the new cache values out to external memory), then
the EDMA will send whatever is at the address that it was told to use. The best case is that this
memory has been initialized with something that won't cause the system to break. The worst case
is that the EDMA sends garbage data that may disrupt the rest of the system. Either way, the
system is not doing what we wanted it to do.
So, when the CPU is finished with the data, performing a writeback of the entire buffer will force
the information out to its real address so that the EDMA can read it. Another way to think of a
writeback is a copy of dirty data from cache to its memory location.
When the CPU is finished with the data (and has written it to
XmtBuf in L2), it can be sent to ext. memory with a cache writeback
A writeback is a copy operation from cache to memory
CSL (Chip Support Library) provides an API for writeback:
CACHE_wbL2((void *)XmtBuf, bytecount, CACHE_WAIT);
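The transmit-side problem can be modeled with two copies of the buffer: the cached copy the CPU writes and the external copy the EDMA reads. This toy model is hypothetical (XmtModel and its functions are not CSL) and treats the whole buffer as one dirty unit; on the target, the CACHE_wbL2() call shown above performs the real writeback.

```c
#include <assert.h>
#include <string.h>

#define BUF_LEN 8   /* hypothetical buffer length, in shorts */

typedef struct {
    short ext[BUF_LEN];    /* external memory: what the EDMA reads    */
    short cache[BUF_LEN];  /* L2 copy: what the CPU writes            */
    int   dirty;           /* cache modified but not yet written back */
} XmtModel;

void cpu_write(XmtModel *b, int i, short v)
{
    assert(i >= 0 && i < BUF_LEN);
    b->cache[i] = v;
    b->dirty = 1;
}

short edma_read(const XmtModel *b, int i)
{
    assert(i >= 0 && i < BUF_LEN);
    return b->ext[i];      /* the EDMA only sees memory, never cache */
}

/* Writeback: copy dirty cached data out to its real memory address. */
void writeback(XmtModel *b)
{
    if (b->dirty) {
        memcpy(b->ext, b->cache, sizeof b->ext);
        b->dirty = 0;
    }
}
```

Before writeback() the EDMA still reads the old value; afterwards it reads what the CPU wrote.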
Now that we know how to get the transmit buffers to their memory addresses to solve the
coherency issue, let's consider another case on the read side. What happens if the EDMA writes
new data to the receive buffer? The CPU needs to process this new data and send it out, just like
before. However, this situation is different because the addresses for the receive buffer are
already in the cache. So, when the CPU reads the buffer, it will read the cached values (i.e. the
old values) and not the new values that the EDMA just wrote.
In order to solve this problem, we need to force the CPU to read the external memory instead of
the cache. This can be done with a cache invalidate, which marks the cache lines holding the
buffer as invalid by clearing (setting to 0) each line’s valid bit.
To get the new data, you must first invalidate the old data before
trying to read the new data (clears cache line’s valid bits)
CSL provides an API to writeback with invalidate:
It writes back modified (i.e. dirty) data,
Then invalidates cache lines containing the buffer
CACHE_wbInvL2((void *)RcvBuf, bytecount, CACHE_WAIT);
The C621x/C671x processors only have a writeback-invalidate operation on L2. They cannot do
an invalidate by itself. A couple of things need to be considered before performing the cache
writeback-invalidate. Since the writeback-invalidate performs a writeback of the data on L2, any
modified or dirty data will be sent out to external memory. So, the writeback-invalidate must be
done while the CPU owns the buffer. Otherwise, the old modified values could overwrite the new
values from the EDMA. Also, a writeback-invalidate should only be performed after the CPU has
finished modifying the buffer. If the writeback-invalidate is performed before the CPU is finished
with the data, it will be brought back in, negating the effect of the writeback-invalidate.
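The receive side and the ordering rule for writeback-invalidate can be modeled the same way. In this hypothetical model (not CSL), the whole buffer behaves as a single cache line. It shows both the stale read that the invalidate fixes and why the writeback-invalidate must happen while the CPU still owns the buffer: if dirty data is still cached when the EDMA writes new values, the writeback half of the operation overwrites them.

```c
#include <assert.h>
#include <string.h>

#define BUF_LEN 8   /* hypothetical receive buffer length, in shorts */

typedef struct {
    short ext[BUF_LEN];    /* external memory: where the EDMA writes */
    short cache[BUF_LEN];  /* cached copy the CPU reads              */
    int   valid;           /* is the cached copy present?            */
    int   dirty;           /* has the CPU modified the cached copy?  */
} RcvModel;

void edma_write(RcvModel *b, int i, short v)
{
    assert(i >= 0 && i < BUF_LEN);
    b->ext[i] = v;         /* the cache is NOT updated */
}

/* CPU read: on a miss (valid == 0) the line is refilled from memory. */
short cpu_read(RcvModel *b, int i)
{
    assert(i >= 0 && i < BUF_LEN);
    if (!b->valid) {
        memcpy(b->cache, b->ext, sizeof b->cache);
        b->valid = 1;
    }
    return b->cache[i];
}

void cpu_write(RcvModel *b, int i, short v)
{
    (void)cpu_read(b, i);  /* make sure the line is cached */
    b->cache[i] = v;
    b->dirty = 1;
}

/* Writeback-invalidate: copy out dirty data, then clear the valid bit. */
void writeback_invalidate(RcvModel *b)
{
    if (b->valid && b->dirty)
        memcpy(b->ext, b->cache, sizeof b->ext);
    b->valid = 0;
    b->dirty = 0;
}
```

After the EDMA writes new data, cpu_read() keeps returning the old cached value until writeback_invalidate() runs. If the CPU dirtied the buffer and the EDMA then wrote new data, the same call would copy the stale cached values over the EDMA’s data.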
This solution (keeping the buffers in on-chip L2 memory mapped as RAM) may be the simplest and
best for the designer. It is a powerful solution, especially when considering that the EDMA could
be transferring from a peripheral such as the McBSP. In this case, it is best to have the EDMA
transfer to on-chip buffers so that they don't have to be brought back in again by the cache
controller, as we discussed earlier. Add to this the fact that all coherency issues are taken care of
for you, and this makes for a powerful, efficient solution.
Using the Memory Attribute Registers (MAR), one can force the CPU to do a long-distance
access to memory every time a read or write is performed. The L1 and/or L2 cache is not used for
these long-distance accesses.
Why would you want to prevent some memory addresses from being cached? Often there are
values found in off-chip, memory-mapped registers that must be read anew each time they are
accessed. One example of this might be a system that references a hardware status register found
in a field programmable gate array (FPGA). Another example where this might be useful is a
FIFO out in external memory, where the same memory address is read repeatedly, but a different
value is accessed for each read.
While MARs may also provide a solution to coherency issues, this is not a recommended
solution because long-distance accesses can be extremely slow. If the memory is accessed
infrequently, this decreased speed may not be an issue, but if MARs are used for real-time data
accesses, the decreased performance may keep the system from operating correctly anyway,
coherency issues or not.
The Memory Attribute Registers allow the designer to turn cacheability on and off for a given
address range. Each MAR controls the cacheability of 16MB of external memory.
These registers can be used to control the caching of different ranges by setting the appropriate bit
to 1 for cache enabled and 0 for cache disabled. These registers can also be set up using the
configuration tool.
MAR0   00000001
MAR1   00000000
MAR2   00000000
MAR3   00000000
…
MAR15  00000000
MAR bit values: 0 = Not cached, 1 = Cached
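Finding which MAR governs a given address is a matter of address arithmetic. This sketch assumes the C621x/C671x layout, where CE0-CE3 start at 0x80000000, 0x90000000, 0xA0000000, and 0xB0000000 and MAR0-MAR15 cover the first 64MB (four 16MB regions) of each CE space; C64x devices use a different MAR set. The function names are hypothetical, and mar_bits stands for the bitmask entered in the configuration tool.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the MAR (0-15) that controls addr, or -1 if no MAR covers it.
 * Assumes four MARs per CE space, each covering 16MB. */
int mar_index(uint32_t addr)
{
    if (addr < 0x80000000u || addr > 0xBFFFFFFFu)
        return -1;                                 /* not CE0-CE3        */
    uint32_t ce    = (addr - 0x80000000u) >> 28;   /* CE space 0-3       */
    uint32_t block = (addr >> 24) & 0xFu;          /* 16MB block in space */
    if (block > 3u)
        return -1;                                 /* beyond MAR coverage */
    return (int)(ce * 4u + block);
}

/* Bit i of mar_bits holds MARi: 1 = cached, 0 = not cached. */
int is_cacheable(uint16_t mar_bits, uint32_t addr)
{
    int mar = mar_index(addr);
    assert(mar >= 0);
    return (mar_bits >> mar) & 1;
}
```

With mar_bits = 0x0001, 0x80000010 is cacheable (MAR0) while 0x81000010 is not (MAR1).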
In the lab, we’ll use the Release configuration and do some benchmarking on code speed and
size.
We’re going to use the L2 Cache on the 'C6416 and the 'C6713 instead of using all of it as
internal SRAM. This will allow us to see how to create a system that uses cache effectively. The
general process will be:
• Use the .CDB file to move the buffers off-chip and turn the L2 cache on
• Use the MAR bits to make the external memory region uncacheable
• Use CSL cache calls to make the system work with L2 cache and cacheable external memory
• Use a nice debugger trick to view the values stored in cache vs. what is in external memory
Lab 15/15A
LAB 15
Move buffers off-chip
Turn on L2 cache
Investigate MAR bits
Solve coherency issues with
writeback/invalidate
Use cache debug techniques
LAB 15A
Use Release Configuration
Benchmark performance and
code size
Lab 15 Procedure
In this lab, we’re going to move the buffers off-chip and turn on the L2 cache. We'll change
several cache settings to see what their effect is on the system.
In order to turn on some of the L2 cache, we need to decrease the amount that is dedicated to
SRAM. Open the properties for the ISRAM segment. Change the len property to
0x000C0000. This will leave us space for 256KB of cache. Click OK.
67 the "621x/671x" tab. Verify that the setting highlighted below is set to 0x0001. This enables
the L2 cache.
0x0001
The value for the MAR bits in the .cdb file allocates 1 bit for each of the MAR registers, and
each register corresponds to a given memory region. The value of the bit in the ith position
determines the cacheability of that region. For example, a 1 in the 0th position makes the
MAR0 region (from 0x80000000 to 0x80FFFFFF) cacheable and leaves the other regions
uncacheable.
8. Build the program, Reload the program, and Run to main()
9. Run and Listen
What is the system doing now? Probably not what you want to hear. Move on to the next step
to figure out what is going on. Halt the CPU.
Debugging Cache
This section will describe a nice little debugger trick that we can use to figure out what is going
on with the cache in our system. In order to use this trick, we need three things:
• The external memory range needs to use aliased addressing. This means that we can use two
different addresses (an alias) to access the same memory location. We also need for these two
addresses to be in two different MAR regions. We will set one region to be cacheable and the
other to be uncacheable. The SDRAM on the DSK has aliased addresses.
• If we are using the memory mapping feature of Code Composer Studio, we need to make sure
that there is a memory range created for each one of the memory region addresses from the
previous requirement.
• Two memory windows, one open at each of the two memory ranges. Depending on how we set the
MAR bits above, one will show the value currently stored in cache, and the other will show
the actual value stored at the memory address (in the SDRAM).
Note: The debugger always shows values from the CPU's point of view. So, when we use a
memory window to view an address, we are seeing what the CPU sees. In other words, if
an address is currently cached, we will see the value in cache and NOT the value in
external memory. The trick above tells the CPU that one of the memory aliases is not
cacheable (the one with the MAR bit set to 0), therefore it will go out to the external
memory and show us what is stored there. With two memory windows, we can see both.
A note within a note: we shouldn't edit the values using the memory windows at this
point, since we could easily corrupt the data.
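The aliasing trick relies on two addresses that reach the same SDRAM location but fall in different MAR regions. This sketch assumes the 16MB SDRAM at 0x80000000 repeats (aliases) every 16MB because the higher address bits are not decoded by the memory; the function names are hypothetical. Given a buffer’s cached address, uncached_alias() gives the address to open in the second memory window.

```c
#include <assert.h>
#include <stdint.h>

#define CE0_BASE   0x80000000u   /* start of the MAR0 region           */
#define SDRAM_SIZE 0x01000000u   /* 16MB, matching the GEL_MapAdd line */

/* Offset into the SDRAM for an address in the 256MB CE0 space,
 * assuming the SDRAM contents repeat every SDRAM_SIZE bytes. */
uint32_t sdram_offset(uint32_t addr)
{
    assert(addr >= CE0_BASE && addr - CE0_BASE < 0x10000000u);
    return (addr - CE0_BASE) % SDRAM_SIZE;
}

/* Alias of an address in the cacheable MAR0 region that falls in
 * the (uncacheable) MAR1 region instead. */
uint32_t uncached_alias(uint32_t cached_addr)
{
    assert(cached_addr >= CE0_BASE && cached_addr < CE0_BASE + SDRAM_SIZE);
    return cached_addr + SDRAM_SIZE;
}
```

A buffer at the made-up address 0x80001234 would be viewed through the cache at 0x80001234 and directly in SDRAM at 0x81001234.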
11. Add a GEL_MapAdd() function call for the new memory region
Find the following line of code in the setup_memory_map( ) function of the GEL file:
GEL_MapAdd(0x80000000,0,0x01000000,1,1); // 16MB SDRAM…
This function adds a 16MB region at location 0x80000000. This represents the SDRAM on
the DSK.
12. Copy and paste this line. Change the address of the copied text to start at location
0x81000000.
This is an aliased address for the SDRAM which happens to fall in the second MAR region.
The MAR bit for this region is currently disabled by the configuration tool.
Save the changes to the GEL file and close the file.
13. Reload the GEL file
Reload the GEL file that we just modified by right-clicking on it in the project view and
selecting reload.
14. Apply the changes to CCS using the GEL menu
We have now made the necessary changes to the CCS memory map, but they have not been
applied yet. Use the following menu command to apply the changes:
Make sure to add a pragma for each of the data buffers. Above the 8 lines declaring the
buffers in main.c, add 8 of these #pragma statements – one for each buffer as shown below:
#pragma DATA_ALIGN(gBufRcvLPing, 128);
#pragma DATA_ALIGN(gBufRcvRPing, 128);
#pragma DATA_ALIGN(gBufRcvLPong, 128);
#pragma DATA_ALIGN(gBufRcvRPong, 128);
#pragma DATA_ALIGN(gBufXmtLPing, 128);
#pragma DATA_ALIGN(gBufXmtRPing, 128);
#pragma DATA_ALIGN(gBufXmtLPong, 128);
#pragma DATA_ALIGN(gBufXmtRPong, 128);
short gBufRcvLPing[BUFFSIZE];
short gBufRcvRPing[BUFFSIZE];
short gBufRcvLPong[BUFFSIZE];
short gBufRcvRPong[BUFFSIZE];
short gBufXmtLPing[BUFFSIZE];
short gBufXmtRPing[BUFFSIZE];
short gBufXmtLPong[BUFFSIZE];
short gBufXmtRPong[BUFFSIZE];
In the processBuffer() function, after you have processed an input buffer, call the CSL
writeback/invalidate API to invalidate the addresses in L2. Make sure to do this for both the
ping and pong receive buffers, and make sure that the invalidate happens for both the FIR
filter and the copy routines for both channels.
The writeback/invalidate operation is necessary to invalidate the addresses for the processed
buffer in L2. If the addresses are NOT invalidated, the CPU will read the values from cache
the next time it wants to read the buffer. Unfortunately, these values will be incorrect as they
will be the OLD data, not the new data that has been written to the buffers in external
memory by the EDMA.
Between the 1st and 2nd closing braces “}” of processBuffer(), add the following code:
CACHE_wbInvL2(sourceL, BUFFSIZE * 2, CACHE_NOWAIT);
CACHE_wbInvL2(sourceR, BUFFSIZE * 2, CACHE_NOWAIT);
21. Add a call to CACHE_wbL2() after the initialization of the transmit buffers
Find the place in main() where we are initializing the output buffers to 0. Add the following
code to writeback the zeroes from cache to the SDRAM where the EDMA will start
transferring:
CACHE_wbL2(gBufXmtLPing, BUFFSIZE * 2, CACHE_NOWAIT);
CACHE_wbL2(gBufXmtLPong, BUFFSIZE * 2, CACHE_NOWAIT);
CACHE_wbL2(gBufXmtRPing, BUFFSIZE * 2, CACHE_NOWAIT);
CACHE_wbL2(gBufXmtRPong, BUFFSIZE * 2, CACHE_NOWAIT);
Open main.c and find the 2 SINE_add calls in processBuffer(). Add the following
statement before the first SINE_add:
STS_set(&sineAddTime,CLK_gethtime());
and add the following statement after the second SINE_add:
STS_delta(&sineAddTime,CLK_gethtime());
Click the + next to Instrumentation. Right-click on STS-Statistics Object Manager and insert
an STS object named sineAddTime. Open its properties and change the Unit Type to High
Resolution time based. Click OK and close/save your cdb.
28. Build/load/run your code.
29. Make sure DIP switches are depressed and look at the CPU load graph.
Make sure DIP switches 0 and 1 are depressed – running the sine wave generator and the FIR
filter. Open the CPU load graph, clear the peak and write your CPU load in the table below
(under Not Optimized). For reference, our results are shown in parentheses.
30. Use Statistics View to check the benchmark for sineAddTime.
Open the BIOS Statistics View, right-click in it and select clear. Write the max sineAddTime
in the table below (under Not Optimized).
31. Find the length of the .text (code) section in the .map file.
Open audioapp.map in the \audioapp\debug\ folder. Find the length of the .text
section and write it below (under Not Optimized).
Now that we have a baseline, let’s run the optimizer. First we’ll have to copy some settings.
Select:
Project → Build Options → Preprocessor Category
Under Include Search Path, copy the entire list of paths. Click Cancel.
33. Choose the Release Build Configuration
After selecting the Release Build Configuration, open the Project Build Options and note the
optimization selections made on the Basic page. Click on the Preprocessor Category and
paste your Include Search path. Add CHIP_6416 to the Pre-Define Symbol. Click OK.
34. Rebuild/load/run and re-do steps 29-31 and add your results to the table.
35. Conclusion
We saw the CPU load drop by about 13% and the sineAddTime reduced by about 23%. We
didn’t see the code length change at all. Certainly these weren’t significant gains, but well
worth the tiny effort. More complex code would likely benefit to a much greater degree.
You’re done
Optional Topics
‘0x Memory Summary
‘0x Internal Memory
‘C6x01/04/05: 1M bit total (program cache/RAM plus internal data)
‘C6202: 3M bit total (128K bytes program RAM, 128K bytes program cache/RAM, 128K bytes internal data)
‘C6203: 7M bit total (256K bytes program RAM, 128K bytes program cache/RAM, 512K bytes internal data)
(Figure: internal data memory organized as interleaved 16-bit banks, with byte addresses 0-7 in the first row and 8-F in the next.)
Improving Performance
(Figure: ‘C6201 internal data memory as two blocks of four 4Kx16 banks, interleaved by halfword so that byte addresses 0-7 span the four banks of one row and 8-F the next.)
The diagram above shows the configuration for the C6201. The C6701 is similar, but each of its
banks is 2Kx32 in size. This gives it the same total number of bytes but allows the C6701 to
perform two LDDW loads in parallel.
Introduction
This module discusses the Host Port Interface (HPI). First, a brief overview of the HPI will
discuss the reasons for including it on these devices and some of the benefits that it provides.
Next, we present examples to help you understand the terminology, capabilities, and basic flow of
the HPI. The module also includes a discussion of the HPI’s other features. The module ends with
a basic comparison of the HPI to the ‘C6202/03/04 Expansion Bus. By the end of this module
you will have a good understanding of the HPI and the Expansion Bus and how they provide a
capable interface to industry-standard host processors.
Learning Objectives
Objectives
HPI Overview
HPI on the DSK
Host Software Example
HPI Hardware Description
Optional Discussions
T TO
Technical Training
Organization
Chapter Topics
Host Port Interface
HPI Overview
The HPI provides an economical 16-bit parallel port for interfacing a ‘C6x to host processors,
other ‘C6xs, and PCI bridge chips. This bus is in addition to the ‘C6x external bus (EMIF) and
multi-channel serial ports, which may be dedicated to memory and A/Ds or codecs.
Why HPI?
(Figure: a host μC connected to the ‘C6x by a dedicated parallel bus used for memory access.)
A dedicated bus is used to transfer data to or from an address in the ‘C6x memory map. The HPI
has a 32-bit register for each of control, address, and data: HPIC is used to control HPI
transfers, HPIA holds the address for the read or write operation, and HPID is the data register.
HPI Overview
(Figure: the host μC connects over the HPI bus to the HPI registers HPIC, HPIA, and HPID, which reach ‘C6x memory through the DMA auxiliary channel.)
What are the requirements for the dedicated bus?
1. Address
2. Data
3. Control
The HPI is connected to the ‘C6x memory via the DMA Auxiliary Channel, which gives the host
access to the entire ‘C6x memory map. The Auxiliary Channel is the fifth channel of the DMA,
and it is dedicated to the HPI.
Since the HPI bus is only 16 bits wide, each 32-bit transfer to or from an HPI register requires two
16-bit read or write operations. Although this is slower, it lowers the pin count of the device.
HPI Overview
(Figure: the same HPI block diagram, with the 16-bit host data bus (HD) highlighted.)
Since the HPI bus (HD) is only 16 bits wide, each read/write requires two operations.
The HPI provides a simple slave interface to a host, which serves as the master. It gives the host
processor access to the entire memory map of the ‘C6x, including the internal memories, the EMIF,
and the peripheral control registers.
(Figure: HPI on the DSK, showing the HPI connector, the DSP, the USB JTAG interface, and the JTAG emulation port.)
Access to the HPI registers is selected by the HCNTL[1:0] pins. These pins select the register that the
host wants to read or write, and they are usually connected to address pins on the host side.
Setup HPIC
HCNTL[1:0] values:
HCNTL1  HCNTL0  Description
0       0       HPIC
0       1       HPIA
1       0       HPID (HPIA++)
1       1       HPID
1. Use HCNTL[1:0] = 00b to enable access to HPIC
Setup HPIC
1. Use HCNTL[1:0] = 00b to enable access to HPIC
HR/W to write (0), HD = ctrl bits (HWOB = xxx1)
HHWIL identifies which halfword is being transferred. For the first halfword of a transfer,
HHWIL will be low. For the second halfword, HHWIL will be high. Remember that the HWOB
bit in the HPIC determines if the first halfword is put in the LSBs (little endian) or the MSBs (big
endian). What happens to HPIC when it is written for the first time? Is the value written to the
LSBs or the MSBs? It turns out that HPIC is really only 16 bits, and the LSBs and MSBs are the
same.
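The halfword ordering controlled by HWOB can be captured in two small functions. This is a sketch of the ordering rule only, not of the bus timing; hpi_split() and hpi_join() are hypothetical names. HWOB = 1 sends the LSBs first (HHWIL = 0); HWOB = 0 sends the MSBs first.

```c
#include <assert.h>
#include <stdint.h>

/* Split a 32-bit value into the two halfwords the host drives on HD.
 * out[0] goes with HHWIL = 0, out[1] with HHWIL = 1. */
void hpi_split(uint32_t value, int hwob, uint16_t out[2])
{
    assert(hwob == 0 || hwob == 1);
    uint16_t lo = (uint16_t)(value & 0xFFFFu);
    uint16_t hi = (uint16_t)(value >> 16);
    out[0] = hwob ? lo : hi;
    out[1] = hwob ? hi : lo;
}

/* Reassemble the 32-bit value from the two halfwords. */
uint32_t hpi_join(const uint16_t in[2], int hwob)
{
    uint16_t first = in[0], second = in[1];
    return hwob ? ((uint32_t)second << 16) | first
                : ((uint32_t)first << 16) | second;
}
```

With HWOB = 1, writing 0x80000000 to HPIA sends 0x0000 first and 0x8000 second, which is the order used in the HPIA setup steps.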
Setup HPIC - 1
1. Use HCNTL[1:0] = 00b to enable access to HPIC
HR/W to write (0), HD = ctrl bits (HWOB = xxx1)
HHWIL = 0 indicates first halfword transfer
The HSTRB signal initiates the transfer. At the falling edge of HSTRB, the other control signals
are sampled and the write operation becomes active. The value on the HD pins is latched into the
HPIC register at the rising edge of HSTRB. The first half of the 32-bit transfer is complete.
HSTRB - 2
1. Use HCNTL[1:0] = 00b to enable access to HPIC
HR/W to write (0), HD = ctrl bits (HWOB = xxx1)
HHWIL = 0 indicates first halfword transfer
2. HSTRB to indicate active
For the second half of the transfer, some of the control pins (HCNTL, HR/W) do not need to
change. In the case of HPIC, HD does not change either. HHWIL transitions high to indicate the
second half of a transfer.
Setup HPIC - 3
3. Use HCNTL[1:0] = 00b to enable access to HPIC
HR/W to write (0), HD = ctrl bits (HWOB = xxx1)
HHWIL = 1 indicates second halfword transfer
The falling edge of HSTRB indicates an active transfer. At the second rising edge of HSTRB, the
transfer is complete and HPIC is set up.
Setup HPIC - 4
3. Use HCNTL[1:0] = 00b to enable access to HPIC
HR/W to write (0), HD = ctrl bits (HWOB = xxx1)
HHWIL = 1 indicates second halfword transfer
4. HSTRB to indicate active
Setup HPIA - 1
Write 8000_0000 to HPIA
1. Use HCNTL[1:0] = 01b to enable access to HPIA
HR/W to write (0), HD = 0000
HHWIL = 0 indicates first halfword transfer
The falling edge of HSTRB indicates an active transfer. Since HWOB=1 indicating little endian,
the value of the HD pins is copied into the LSBs of HPIA.
Setup HPIA - 2
Write 8000_0000 to HPIA
1. Use HCNTL[1:0] = 01b to enable access to HPIA
HR/W to write (0), HD = 0000
HHWIL = 0 indicates first halfword transfer
2. HSTRB to indicate active
For the second half of the transfer, HCNTL and HR/W do not change. HHWIL transitions high to
indicate that this is the second part of the transfer, and the host drives the upper 16 bits of the
address onto the HD pins.
Setup HPIA - 3
[Figure: write 8000_0000 to HPIA; the lower half of HPIA holds 0000]
3. Use HCNTL[1:0] = 01b to enable access to HPIA. HR/W to write (0). HD = 8000.
HHWIL = 1 indicates second halfword transfer.
The falling edge of HSTRB indicates an active transfer and the address is written to the HPIA.
Setup HPIA - 4
[Figure: HPIA now holds 8000 0000]
3. Use HCNTL[1:0] = 01b to enable access to HPIA. HR/W to write (0). HD = 8000.
HHWIL = 1 indicates second halfword transfer.
4. HSTRB to indicate active.
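The HPIA sequence above (0000 with HHWIL = 0, then 8000 with HHWIL = 1) can be modeled as a register that latches one 16-bit half per HSTRB. This is a behavioral sketch, not TI code, and `hpi_latch_half()` is a hypothetical helper. HWOB = 1 puts the first halfword in the LSBs, as described above; the HWOB = 0 branch follows the big-endian ordering listed for the HWOB bit, where the first halfword holds the MSBs.

```c
#include <stdint.h>

/* Latch one 16-bit host transfer into a 32-bit HPI register (model only).
   HWOB = 1: first halfword (HHWIL = 0) = 16 LSBs, second = 16 MSBs.
   HWOB = 0: the order is reversed. */
uint32_t hpi_latch_half(uint32_t reg, uint16_t hd, unsigned hhwil, unsigned hwob)
{
    /* Upper half when (HHWIL = 1, HWOB = 1) or (HHWIL = 0, HWOB = 0). */
    int upper = (hhwil != 0) == (hwob != 0);
    if (upper)
        return (reg & 0x0000FFFFu) | ((uint32_t)hd << 16);
    return (reg & 0xFFFF0000u) | hd;
}
```

Applying it twice with the slide values, starting from any previous HPIA contents, leaves HPIA = 0x80000000.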
1. HCNTL[1:0] = 11b (HPID)
HR/W = 0, HD = 5678
HHWIL = 0
The falling edge of HSTRB initiates the transfer, and the rising edge latches the data into the
lower 16 bits of the HPID register.
1. HCNTL[1:0] = 11b (HPID)
HR/W = 0, HD = 5678
HHWIL = 0
2. HSTRB
For the second half of the transfer, HHWIL transitions high, and the value on the HD pins
changes to reflect the upper 16 bits of data.
3. HCNTL[1:0] = 11b (HPID)
HR/W = 0
Write value: HHWIL = 1, HD = 1234
HSTRB falls low to indicate an active transfer. At the rising edge of HSTRB, the data is latched
into the HPID. The 32-bit transfer to the HPI is now complete, but has the data actually been
written to the address?
3. HCNTL[1:0] = 11b (HPID)
HR/W = 0
Write value: HHWIL = 1, HD = 1234
4. HSTRB
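Putting the HPIA and HPID steps together, one 32-bit host write is four HSTRB cycles. The sketch below, assuming HWOB = 1 and a small hypothetical `hpi_cycle` struct, lists the pin values for writing 1234_5678 to address 8000_0000; `hpi_write_word_cycles()` is an illustrative name only.

```c
#include <stdint.h>

enum { HCNTL_HPIC = 0, HCNTL_HPIA = 1, HCNTL_HPID_INC = 2, HCNTL_HPID = 3 };

struct hpi_cycle {
    unsigned hcntl;  /* HCNTL[1:0] */
    unsigned hr_w;   /* 0 = write, 1 = read */
    unsigned hhwil;  /* 0 = first halfword, 1 = second */
    uint16_t hd;     /* value on HD[15:0] */
};

/* Host write of one 32-bit word, assuming HWOB = 1 (LSBs first):
   two cycles load HPIA with the address, two more write HPID. */
void hpi_write_word_cycles(uint32_t addr, uint32_t data, struct hpi_cycle out[4])
{
    const uint32_t value[2] = { addr, data };
    const unsigned hcntl[2] = { HCNTL_HPIA, HCNTL_HPID };
    for (unsigned reg = 0; reg < 2; ++reg) {
        for (unsigned half = 0; half < 2; ++half) {
            struct hpi_cycle *c = &out[reg * 2 + half];
            c->hcntl = hcntl[reg];
            c->hr_w  = 0;                 /* write */
            c->hhwil = half;
            c->hd    = (uint16_t)(half ? value[reg] >> 16 : value[reg] & 0xFFFFu);
        }
    }
}
```

The sketch leaves out HRDY: after the fourth cycle the host must still wait for the DMA before starting another HPID access.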
When HPID has been written, the HPI signals the DMA Auxiliary Channel to transfer the
data from the HPI to the address in HPIA. Several factors affect how long it takes the DMA
to complete this transfer. These include:
• Speed of the destination memory
• Bus contention
• DMA Auxiliary Channel priority
If the time needed to transfer from the HPI to memory can vary, how does the host know when it
can write a new value to the HPI? The HPI uses the HRDY pin to signal the host that it is busy
with the current transfer. This prevents the host from overwriting information in the HPI. When
HRDY is low, the HPI is ready. So, at the second rising edge of HSTRB, when all of the data is
latched into HPID, HRDY is driven high (not ready) until the DMA has completed the
transfer.
3. HCNTL[1:0] = 11b (HPID)
HR/W = 0
Write value: HHWIL = 1, HD = 1234
4. HSTRB
5. HRDY high (not-ready) until DMA is finished
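On the host side, the HRDY behavior above becomes a bounded wait loop. This is a minimal sketch: `busy` is a hypothetical hook that returns nonzero while the HRDY pin is high (not ready), and `fake_dma` merely stands in for a DMA that needs a few polls to finish. A host with a hardware ready input can let HRDY stretch its bus cycle instead of polling.

```c
#include <stdint.h>

/* Hypothetical hook: returns nonzero while the HRDY pin is high (not ready). */
typedef int (*hrdy_busy_fn)(void *ctx);

/* Poll until HRDY goes low. Returns the number of busy polls seen,
   or -1 if the HPI is still not ready after max_polls polls. */
int hpi_wait_ready(hrdy_busy_fn busy, void *ctx, int max_polls)
{
    for (int polls = 0; polls < max_polls; ++polls) {
        if (!busy(ctx))
            return polls;
    }
    return -1;
}

/* Stand-in for the DMA: HRDY stays high for a fixed number of polls. */
struct fake_dma { int remaining; };

int fake_dma_busy(void *ctx)
{
    struct fake_dma *d = ctx;
    if (d->remaining > 0) {
        d->remaining--;
        return 1;
    }
    return 0;
}
```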
HRDY is really used as a not-ready pin: it indicates either that data is not yet available on a read or
that the DMA hasn’t yet completed the write (and so has not yet freed HPID).
1. HCNTL[1:0] = 11b (HPID)
HR/W = 1
Read value: HHWIL = 0
The falling edge of HSTRB initiates a read from the address in the HPIA register. This address is
copied to the DMA Auxiliary Channel.
1. HCNTL[1:0] = 11b (HPID)
HR/W = 1
Read value: HHWIL = 0
2. HSTRB, HPIA is copied to DMA address
At this point, the HPI has to wait for the DMA to complete the transfer from memory to the HPID
register. HRDY is asserted high to hold off the host until the data is written into the HPID.
1. HCNTL[1:0] = 11b (HPID)
HR/W = 1
Read value: HHWIL = 0
2. HSTRB, HPIA is copied to DMA address
3. HRDY is asserted until HD = 5678
The second half of the read is set up with the appropriate control signals.
4. HCNTL[1:0] = 11b (HPID)
HR/W = 1
Read value: HHWIL = 1
The second half of the read begins with the second falling edge of HSTRB.
4. HCNTL[1:0] = 11b (HPID)
HR/W = 1
Read value: HHWIL = 1
5. HSTRB
What, no Not-Ready before the second 16-bit read? Since the data is already present in the HPID,
HRDY is not required and will not be asserted. This is similar to a transfer to the HPIC or the
HPIA. Since the value is being transferred directly to (or from) the HPI, no delay time is needed
for the DMA to complete a memory transfer.
4. HCNTL[1:0] = 11b (HPID)
HR/W = 1
Read value: HHWIL = 1
5. HSTRB
6. HD = 1234
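The read sequence above (HRDY asserted before the first halfword, none before the second) can be captured in a toy simulation. Everything here is a hypothetical model for illustration: `hpi_model`, the `dma_latency` field (how many polls HRDY stays high) and the word-indexed `mem` array are assumptions, not a description of real DMA timing.

```c
#include <stdint.h>

/* Toy model of the HPI read path (illustration only, not driver code). */
struct hpi_model {
    const uint32_t *mem;   /* memory seen by the DMA auxiliary channel */
    uint32_t hpia;         /* byte address loaded by the host */
    uint32_t hpid;         /* data register */
    int busy;              /* polls remaining with HRDY high (not ready) */
    int dma_latency;       /* assumed DMA delay, in host polls */
};

/* First HSTRB falling edge of a read: HPIA goes to the DMA, which
   fetches the word into HPID while HRDY is held high. */
static void hpi_first_strobe(struct hpi_model *m)
{
    m->hpid = m->mem[m->hpia / 4];
    m->busy = m->dma_latency;
}

/* Host side of a 32-bit HPID read (HCNTL = 11b, HWOB = 1).
   Returns the word and stores the number of not-ready polls in *waits. */
uint32_t hpi_read_word(struct hpi_model *m, int *waits)
{
    *waits = 0;
    hpi_first_strobe(m);
    while (m->busy > 0) {          /* HRDY high: wait for the DMA */
        m->busy--;
        (*waits)++;
    }
    uint16_t lo = (uint16_t)(m->hpid & 0xFFFFu);  /* HHWIL = 0: LSBs */
    /* Second halfword: the data is already in HPID, so no HRDY wait. */
    uint16_t hi = (uint16_t)(m->hpid >> 16);      /* HHWIL = 1: MSBs */
    return ((uint32_t)hi << 16) | lo;
}
```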
1. HCNTL[1:0] = 10b (HPID w/ HPIA++)
HR/W = 1
Read value: HHWIL = 0
The read is set up exactly like a read without increment, except for the value of the HCNTL pins.
The first falling edge of HSTRB initiates the first transfer. After the initial address is sent to the
DMA, the address in HPIA is automatically incremented by four bytes.
1. HCNTL[1:0] = 10b (HPID w/ HPIA++)
HR/W = 1
Read value: HHWIL = 0
2. HSTRB
HRDY is asserted high while the DMA completes the memory transfer to the HPID.
1. HCNTL[1:0] = 10b (HPID w/ HPIA++)
HR/W = 1
Read value: HHWIL = 0
2. HSTRB
3. HRDY is high until HD = 5678, HPIA is incremented
The second halfword of the transfer is completed without HRDY since the data is already in the
HPID.
4. HCNTL[1:0] = 10b (HPID w/ HPIA++)
HR/W = 1
Read value: HHWIL = 1
5. HSTRB
6. HD = 1234
At the second rising edge of HSTRB, when the 32-bit transfer is complete, the new address in the
HPIA is copied to the DMA. The DMA uses this address to pre-fetch the data for the next
transfer. This helps reduce the latency between HPI transfers. Since the DMA is busy with the
pre-fetch, HRDY is asserted high. Thus, when the host tries to initiate the next transfer, it may
encounter a not-ready condition until the DMA completes the memory transfer.
7. The new address in HPIA is copied to the DMA.
The DMA begins to pre-fetch this address.
HRDY is high until the DMA finishes.
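The auto-increment read can be modeled the same way, adding the HPIA++ step and the pre-fetch that starts at the second rising edge of HSTRB. As before, this is a hypothetical behavioral sketch; the structure, the fixed `dma_latency`, and the bounds check on `mem` are modeling assumptions.

```c
#include <stdint.h>

/* Toy model of an auto-increment read (illustration only). */
struct hpi_model {
    const uint32_t *mem;   /* memory seen by the DMA auxiliary channel */
    uint32_t n_words;      /* size of mem, keeps the model in bounds */
    uint32_t hpia;         /* byte address in HPIA */
    uint32_t hpid;         /* data register */
    int prefetched;        /* HPID already holds the word at HPIA */
    int busy;              /* polls remaining with HRDY high (not ready) */
    int dma_latency;       /* assumed DMA delay, in host polls */
};

/* DMA copies the word at HPIA into HPID; HRDY stays high meanwhile. */
static void dma_fetch(struct hpi_model *m)
{
    uint32_t idx = m->hpia / 4;
    m->hpid = (idx < m->n_words) ? m->mem[idx] : 0;
    m->busy = m->dma_latency;
}

/* Host polls HRDY until the DMA is done; returns the number of polls. */
static int wait_ready(struct hpi_model *m)
{
    int polls = 0;
    while (m->busy > 0) {
        m->busy--;
        polls++;
    }
    return polls;
}

/* Read n words with HCNTL = 10b (HPID with HPIA++), HWOB = 1.
   Returns the total number of not-ready polls seen by the host. */
int hpi_read_autoinc(struct hpi_model *m, uint32_t *out, int n)
{
    int polls = 0;
    for (int i = 0; i < n; ++i) {
        if (!m->prefetched)
            dma_fetch(m);          /* first HSTRB: HPIA copied to the DMA */
        polls += wait_ready(m);    /* not ready until HPID is valid */
        out[i] = m->hpid;          /* both halfwords read back to back */
        m->hpia += 4;              /* HPIA incremented by four bytes */
        dma_fetch(m);              /* second rising edge: pre-fetch next word */
        m->prefetched = 1;
    }
    return polls;
}
```

Because the host in this model polls immediately after each word, every word still costs the full DMA latency; the pre-fetch only saves time when the host does other work between reads.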
HPI Pins
The HPI uses several pins to provide a glueless interface to many industry standard hosts. Several of these
pins may or may not be used in any given application. Below is a summary of the typical connections.
Host signal     HPI pin
R/W             HR/W
Data strobes    HDS1, HDS2 (decoded into HSTRB)
Chip select     HCS
ALE             HAS
BE              HBE[1:0]
Ready           HRDY
Interrupt       HINT
Data[15:0]      HD
Sidebar
HSTRB
HSTRB is an internal signal that is decoded from up to three host strobe signals. HSTRB is active
low when both HCS is active and either HDS1 or HDS2 is active.
[Figure: HSTRB is an internal ‘C62xx HPI signal decoded from the host’s HCS, HDS1 and HDS2 inputs]
1. Use HCNTL[1:0] = 00b to enable access to HPIC. HR/W to write (0). HD = ctrl bits (HWOB = x).
Write first halfword, then second, with HHWIL = 0, then 1.
2. HSTRB to indicate active.
HAS
HAS is an input signal to the HPI that can be used with hosts that have multiplexed address and
data lines. HAS allows the HPI to sample the control signals earlier in the access cycle so that the
bus can stabilize before the data is placed on it. HAS is usually connected to the host’s Address
Latch Enable (ALE) pin.
HAS
Facilitates interface to multiplexed
address and data buses by allowing
more time to switch bus states from
address to data information
Allows HCNTL[1:0], HR/W, and
HHWIL to be removed earlier in the
access cycle
Often connected to ALE from µC
An Example Interface
The MC68360 Quad Integrated Communication Controller is a 32-bit controller that is a member
of the Motorola M68300 family. It is a versatile microprocessor that can be used in a variety of
control applications.
Interface Example
MC68360         ‘C6x
Data[31:16]     HD[15:0]
R/W             HR/W
A[3:2]          HCNTL[1:0]
A[1]            HHWIL
DSACK1          HRDY
Here we can see how the address lines are connected to the HPI’s HCNTL and HHWIL pins.
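With A[3:2] wired to HCNTL[1:0] and A[1] to HHWIL, each HPI register half appears at its own host address, so the host selects the register and halfword simply by choosing the address. A minimal sketch, assuming 16-bit host accesses and a hypothetical `base` pointer to the start of the HPI's address window; both helper names are illustrative.

```c
#include <stdint.h>

/* HCNTL[1:0] values from the slides */
enum { HCNTL_HPIC = 0, HCNTL_HPIA = 1, HCNTL_HPID_INC = 2, HCNTL_HPID = 3 };

/* Byte offset of an HPI register half: A[3:2] = HCNTL, A[1] = HHWIL. */
uint32_t hpi_host_offset(unsigned hcntl, unsigned hhwil)
{
    return ((uint32_t)(hcntl & 3u) << 2) | ((uint32_t)(hhwil & 1u) << 1);
}

/* Write a 32-bit value to an HPI register as two 16-bit host accesses,
   first halfword (HHWIL = 0) first. `base` points at the HPI window. */
void hpi_host_write32(volatile uint16_t *base, unsigned hcntl,
                      uint16_t first, uint16_t second)
{
    base[hpi_host_offset(hcntl, 0) / 2] = first;
    base[hpi_host_offset(hcntl, 1) / 2] = second;
}
```

Because HRDY drives DSACK1 in this interface, the 68360's bus cycle presumably waits for the HPI in hardware, so the sketch does no polling.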
HPIC bits
Software handshaking:
FETCH – requests a read at the address pointed to by HPIA
HRDY – ready signal to host; the host can poll this bit to determine the state of the HPI
Interrupts:
DSPINT – host interrupt to ‘6x
HINT – ‘6x can interrupt host; determines the state of the HINT output
HWOB – 0 = Big Endian, 1 = Little Endian
Some of the other capabilities controlled by the HPIC are interrupts and software handshaking. HPI
interrupt capability is controlled by the DSPINT and HINT bits. DSPINT is one of the C6000’s interrupt
sources; it allows the host to interrupt the ‘C6x by writing this bit through the HPI. HINT allows the ‘C6x to
interrupt the host by controlling the state of the HINT output.
Software handshaking is useful for hosts that do not have an external RDY signal. In this case, the
host can poll the HRDY bit in the HPIC to determine the state of the HPI. Notice that this bit is active high,
unlike the hardware pin HRDY. Setting the FETCH bit to 1 initiates a read operation from the address in
HPIA. This capability allows the host to initiate a read operation through software.
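For a host without a ready input, the software handshake above is a loop that reads HPIC and tests the HRDY bit. The bit positions below are assumptions for the ‘C620x HPIC and should be checked against the device data sheet; `hpic_read_fn` is a hypothetical hook for whatever host access reads HPIC, and `fake_hpic` is a stand-in for testing.

```c
#include <stdint.h>

/* Assumed HPIC bit positions; verify against the data sheet. */
#define HPIC_HWOB   (1u << 0)
#define HPIC_DSPINT (1u << 1)
#define HPIC_HINT   (1u << 2)
#define HPIC_HRDY   (1u << 3)
#define HPIC_FETCH  (1u << 4)

/* Hypothetical hook that performs a host read of HPIC. */
typedef uint16_t (*hpic_read_fn)(void *ctx);

/* Software handshake: poll the HRDY bit (active high in HPIC, unlike
   the HRDY pin). Returns the number of reads, or -1 after max_reads. */
int hpic_poll_ready(hpic_read_fn read_hpic, void *ctx, int max_reads)
{
    for (int i = 1; i <= max_reads; ++i) {
        if (read_hpic(ctx) & HPIC_HRDY)
            return i;
    }
    return -1;
}

/* Stand-in for a real HPI: reports ready after a fixed number of reads. */
struct fake_hpic { int not_ready_reads; };

uint16_t fake_hpic_read(void *ctx)
{
    struct fake_hpic *f = ctx;
    if (f->not_ready_reads > 0) {
        f->not_ready_reads--;
        return HPIC_HWOB;               /* HRDY bit clear: busy */
    }
    return HPIC_HWOB | HPIC_HRDY;       /* HRDY bit set: ready */
}
```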
[Figure: a host, SDRAM, and write and read FIFOs all sharing one 32-bit data bus, Data[31:0]]
The Expansion Bus (XB) on the ‘C6202 provides a solution to this problem. It is 32 bits wide and
provides access to off-chip peripherals, FIFOs, host processors, and PCI interface chips.
Solution
[Figure: C6000 with its EMIF connected to a 16-bit wide EPROM and to SDRAM (Data[31:0]), and its XBUS (XD[31:0]) connected through the HPI to the host and through the I/O Ports to sync write and read FIFOs]
The XB includes an HPI that is very similar to the ‘C6201’s. The primary difference is that the
XB is 32 bits wide.
Other important differences are that the XB can be either synchronous or asynchronous, and that
it can serve as the slave or the master of the bus. These differences give the XB the ability to
interface to a PCI interface with a minimum amount of glue logic. The XB also includes an
internal arbiter for bus arbitration.
[Figure: internal arbiter with the shared signals XHOLD, XHOLDA and XBOFF]
The XB uses the DMA Auxiliary Channel to transfer data to and from the host.
When the XBUS is the master, it writes to the host. The DMA Auxiliary Channel is used to service
the XBUS requests to the ‘C6x memory map.
The XB HPI Control Register (XBHC) has a field, XFRCT, which is used to store the frame count.
It also includes fields to start transfers and to control interrupts.
XBHC fields
XFRCT – transfer counter, used when the XBUS is master
START – 01: starts a write burst (*XBIMA to *XBEA); 10: starts a read burst (*XBEA to *XBIMA)
INTSRC – 10: interrupt is caused when XFRCT = 0; 01: DSPINT is the interrupt source
DSPINT – external master to DSP interrupt
In addition to an HPI, the XB includes another sub-block, the I/O Ports. The HPI and the I/O
Ports can co-exist in a system. The I/O Ports address range is divided into four distinct spaces,
XCE0 – XCE3. Each of these spaces has access to 16 word locations. The ‘C6202 memory map shows a
64M-word block for each space, which is really the same 16 locations aliased over and over.
I/O Ports memory map
[Figure: the XBUS provides the HPI (sync or async) and the I/O Ports]
4000_0000   XCE0
5000_0000   XCE1
6000_0000   XCE2
7000_0000   XCE3
8000_0000   Internal Data
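The aliasing described above follows from the address ranges: each XCE space spans 0x1000_0000 bytes (64M words), but only 16 word locations are decoded. The sketch assumes those 16 locations are selected by the low word-address bits (byte-address bits 5:2); `xce_decode()` is a hypothetical helper.

```c
#include <stdint.h>

/* Decode an I/O Ports address: XCE0-XCE3 start at 0x40000000 in
   0x10000000-byte (64M-word) steps. Only 16 word locations exist per
   space, so they repeat (alias) throughout the block.
   Returns 0 on success, -1 if addr is outside XCE0-XCE3. */
int xce_decode(uint32_t addr, unsigned *space, unsigned *word)
{
    if (addr < 0x40000000u || addr >= 0x80000000u)
        return -1;
    *space = (unsigned)((addr - 0x40000000u) >> 28);  /* 0..3 */
    *word  = (unsigned)((addr >> 2) & 0xFu);          /* one of 16 words */
    return 0;
}
```

For example, 0x5000_0000 and 0x5000_0040 decode to the same XCE1 location.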
Each XCEx space can access either 32-bit wide async memory or 32-bit wide clocked FIFOs.
The memory type of each space is configured in its XCE Control Register, in the MTYPE field.
I/O Ports
[Figure: I/O Ports on the data bus (XD31:0), each space configured by its XCE control register]
Address     Space   Device            MTYPE
4000_0000   XCE0    Async bit I/O     010
5000_0000   XCE1    Write sync FIFO   101
6000_0000   XCE2    –                 xxx
7000_0000   XCE3    Read sync FIFO    101
MTYPE: Async = 010, Sync = 101
The I/O Ports asynchronous interface uses other fields in the XCE Control Registers. These fields
should look familiar; they are identical to the EMIF’s CE Control Registers. In fact, the signals
used by the two interfaces are alike.
Asynchronous Interface (XCE Control Register)
Bits    Field          Access, reset value
31:28   Write Setup    RW, +1111
27:22   Write Strobe   RW, +111111
21:20   Write Hold     RW, +11
19:16   Read Setup     RW, +1111
15:14   rsv
13:8    Read Strobe    RW, +111111
7       rsv
6:4     MTYPE          R, +x
3:2     rsv
1:0     Read Hold      RW, +11
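Since the field layout matches the EMIF CE control registers, a control word can be assembled with shifts and masks. A minimal sketch based on the bit positions in the register layout above; `xce_ctl()` is a hypothetical helper, and the arguments are raw field contents (how each field maps to clock cycles is not covered here).

```c
#include <stdint.h>

/* MTYPE values from the slides */
#define MTYPE_ASYNC     2u   /* 010b: 32-bit async memory */
#define MTYPE_SYNC_FIFO 5u   /* 101b: 32-bit clocked FIFO */

/* Build an XCE control register value; oversized fields are masked. */
uint32_t xce_ctl(unsigned wsetup, unsigned wstrobe, unsigned whold,
                 unsigned rsetup, unsigned rstrobe, unsigned rhold,
                 unsigned mtype)
{
    return ((uint32_t)(wsetup  & 0xFu)  << 28) |   /* Write Setup  31:28 */
           ((uint32_t)(wstrobe & 0x3Fu) << 22) |   /* Write Strobe 27:22 */
           ((uint32_t)(whold   & 0x3u)  << 20) |   /* Write Hold   21:20 */
           ((uint32_t)(rsetup  & 0xFu)  << 16) |   /* Read Setup   19:16 */
           ((uint32_t)(rstrobe & 0x3Fu) << 8)  |   /* Read Strobe  13:8  */
           ((uint32_t)(mtype   & 0x7u)  << 4)  |   /* MTYPE         6:4  */
           ((uint32_t)(rhold   & 0x3u));           /* Read Hold     1:0  */
}
```

With every timing field at its maximum and MTYPE = 010, the result is 0xFFFF3F23.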
The I/O Ports synchronous interface is designed to interface gluelessly to 32-bit clocked FIFOs.
The I/O Ports can interface up to 3 write FIFOs and one read FIFO (located in XCE3) without
any glue. A minimum amount of glue can be used to expand the capabilities of this interface to
include other sizes of FIFOs (8 and 16 bit) and up to 16 read and write FIFOs per XCE space.
Synchronous Interface
[Figure: XBUS signals XFCLK, XCE0–XCE3, XWE, XOE, XRE, EXT_INTx and XD[31:0] connected to a write FIFO (WCLK, WEN, EF/FF/HF, D[31:0]) and a read FIFO (RCLK, REN, OE, EF/FF/HF, Q[31:0])]
XB Summary
The XB, composed of the HPI and the I/O Ports, adds five new “ports” for accessing hosts and
peripherals. Each of these ports can operate in an asynchronous mode or a synchronous mode.
Each mode provides different capabilities, which can make your system easier to design and
implement.
Introduction
What do you need to put around your DSP? Most microprocessors usually require some support
chips – power management, clock drivers, bus interface, and so on. DSP systems usually contain
some additional devices – such as sensors, data acquisition, and such – because they receive,
modify, and output real-world signals.
Finally, pull out your DSP Selection Guide and C6000 Product Update sheet to follow along with
the last part of the workshop, which summarizes the C6000 devices, tools, and support.
Outline
Chapter Outline
What Goes Around a DSP?
Linear Products
Logic Products
C6000 Summary
Hardware Tools
Software Tools
What’s Next?
Chapter Topics
Wrap Up
Data Converters
• Analog-to-Digital Converters (ADC)
• Analog input to digital output
• Output is typically interfaced directly to DSP
• Digital-to-Analog Converters (DAC)
• Digital input to analog output
• Input interfaces directly to DSP
• CODEC
• Data converter system
• Combination of ADC and DAC in single package
Power Management
• Power Modules – complete power solutions
• Linear Regulators – regulated power for analog and digital
• DC-DC controllers – efficient power isolation
• Battery Management – for portable applications
• Charge Pumps & Boost Converters – portable applications
• Supervisory Circuits – to monitor processor supply voltages and control reset conditions
• Power Distribution – controlling power to system components for high efficiency
• References – for data converter circuits
[Figure: A Real-Time DSP-Based System – signal conditioning (op-amps), data conversion (ADC/DAC), the digital processor (MSP430/DSP/uP/FPGA/ASIC), data transmission to another system/subsystem, power management, and clocking]
Analog Circuits – Considerations
Op-amps (signal conditioning):
• Supply voltage available?
• Bandwidth required? (kHz or MHz)
• What is the input signal?
• What is the output driving?
• # of channels needed?
• Most important spec(s)?
Data transmission (interface to another system/subsystem):
• Speed? (k or M bits per second)
• Distance?
• Standard?
• SERDES? –or– Topology needed? (point to point, multidrop, multipoint)
• Standards: RS232, RS422, RS485, LVDS, 1394/Firewire, USB, PCI, CAN, SONET, Gigabit Ethernet, GTL, BTL, etc.
Data converter/AIC/codec solution:
• Resolution? (bits… & ask for ENOB!)
• Speed? (KSPS or MSPS for high speed, KHz or MHz for precision ADCs, uS (settling time) for precision DACs)
• # of channels needed?
• Supply voltages available/required?
• What is it interfacing to? (uC/uP/DSP/FPGA/ASIC)
Clocks:
• Input frequencies?
• Output frequencies desired & number of copies necessary
• Special needs? (low jitter/jitter cleaner? low part to part skew? etc.)
Power management:
• Do you build your own power solutions, use modules, or both?
• What input voltage(s) & the source of these voltages (wall, battery, AC/DC, etc.)
• What output voltage(s), and output current(s) do you need?
• How would you prioritize size, efficiency, and cost?
• What are the most important parameters in the design? (efficiency, form factor, ripple voltage, tolerance, etc.)
What is Real-Time Signal Processing?
A Typical Real-Time DSP System
[Figure: RF front end → ADC → real-time signal processing engine → DAC → power amp; the engine handles digital data such as compressed audio, supported by clock, power, and interface circuits]
http://focus.ti.com/docs/tool/toolfolder.jhtml?PartNumber=5-6KINTERFACE
Analog Cards
Logic
[Figure: Welcome to the World of TI Logic – TI logic families arranged by supply voltage, from 5+ V logic down through 3.3 V, 2.5 V, 1.8 V, 1.5 V, 1.2 V and 0.8 V logic. Families shown include specialty logic (GTL, GTLP, BTL, SSTL, HSTL, ETL, TVC, CBT, CBTLV, SSTV), former Harris (CD4000) and Cypress (FCT) lines, and TTL, LS, S, F, AS, ALS, HC/HCT, AC/ACT, AHC/AHCT, ABT, BCT, LV, LVT, ALVT, LVC, ALVC, ALB, AVC and AUC]
LV245: 10 ns; LVC4245: 6.3 ns; LVCC3245: 6.0 ns; LVCC4245: 7.0 ns; ALVC164245: 5.8 ns; AVC*: 2.5 ns
1.8V: LV245: 15 ns; LVC*: 4.8 ns; LVCC3245: 9.4 ns; AVC*: 4.0 ns
0.8V: LVC*: 4.8 ns
* 16245 functions
Little Logic
Easy Naming from TI: SN74 LVC 1G 00 YEA R
SN74 – prefix; LVC – logic family; 1G – single gate (2G dual, 3G triple); 00 – logic function; YEA – package type; R – tape & reel
Package types: YEA = NanoStar, YZA = NanoFree, DCK = SC-70, DBV = SOT-23, DCU = US-8, DCT = SM-8
Examples: SN74AHC2G00DCTR, SN74AHCT2G00DCUR (dual gate); SN74LVC3G04DCTR, SN74LVC3G04DCUR (triple gate)
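The naming scheme is regular enough to split mechanically. This sketch assumes the SN74 prefix, a letters-only family name, a single-digit gate count followed by G, and a three-letter package code as in the list above (YEA, YZA, DCK, DBV, DCU, DCT); `decode_little_logic()` is a hypothetical helper, not a TI tool.

```c
#include <ctype.h>
#include <string.h>

struct little_logic {
    char family[8];    /* e.g. "LVC", "AHCT" */
    int gates;         /* 1G, 2G or 3G */
    char function[4];  /* e.g. "00", "04" */
    char package[4];   /* e.g. "DCT", "YEA" */
    int tape_reel;     /* trailing R */
};

/* Split a part number such as "SN74LVC3G04DCTR". Returns 0 on success. */
int decode_little_logic(const char *pn, struct little_logic *out)
{
    if (strncmp(pn, "SN74", 4) != 0)
        return -1;
    const char *p = pn + 4;

    size_t n = 0;                                     /* family letters */
    while (isalpha((unsigned char)p[n]) && n < sizeof(out->family) - 1)
        n++;
    if (n == 0)
        return -1;
    memcpy(out->family, p, n);
    out->family[n] = '\0';
    p += n;

    if (!isdigit((unsigned char)p[0]) || p[1] != 'G') /* gate count */
        return -1;
    out->gates = p[0] - '0';
    p += 2;

    n = 0;                                            /* function digits */
    while (isdigit((unsigned char)p[n]) && n < sizeof(out->function) - 1)
        n++;
    if (n == 0)
        return -1;
    memcpy(out->function, p, n);
    out->function[n] = '\0';
    p += n;

    if (strlen(p) < 3)                                /* package code */
        return -1;
    memcpy(out->package, p, 3);
    out->package[3] = '\0';
    p += 3;

    out->tape_reel = (p[0] == 'R' && p[1] == '\0');   /* optional R */
    return (p[0] == '\0' || out->tape_reel) ? 0 : -1;
}
```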
CHOOSING LOGIC
Primary / secondary concern   5V          3V                 2.5V               1.8V
HIGH DRIVE                    ABT, 74F    ALVT, LVT, ALVC    AVC, ALVC, ALVT    AUC
HIGH SPEED, LOW NOISE         ABT, 74F    ALVC, LVT, LVC     AVC                AUC
HIGH SPEED                    ABT, 74F    ALVT, LVT, ALVC    AVC, ALVC, ALVT    AUC
TI FIFOs
[Figure: TI FIFOs buffering data between memory, a host bus (host interface) and a TMS320 DSP]
C6000 Summary
TMS320C6000
Easy to Use
Best C engine to date
Efficient C Compiler and Assembly Optimizer
DSP & Image Libraries include hand-optimized code
eXpressDSP Toolset eases system design
SuperComputer Performance
1.38 ns instruction rate: 720x8 MIPS (1GHz sampled)
2880 16-bit MMACs (5760 8-bit MMACs) at 720 MHz
Pipelined instruction set (maximizes MIPS)
Eight Execution Unit RISC Topology
Highly orthogonal RISC 32-bit instruction set
Double-precision floating-point math in hardware
Fix and Float in the Same Family
C62x – Fixed Point
C64x – 2nd Generation Fixed Point
C67x – Floating Point
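The headline performance figures on the summary slide are related by simple arithmetic. The derivation below assumes the 720 MHz clock quoted on the slide and reads "720x8 MIPS" as eight instructions issued per cycle; the per-cycle MAC counts are simply the quoted MMAC figures divided by the clock rate.

```latex
\begin{aligned}
t_{\text{cycle}} &= \frac{1}{720\ \text{MHz}} \approx 1.389\ \text{ns} \\
\text{peak instruction rate} &= 720\ \text{MHz} \times 8\ \tfrac{\text{instructions}}{\text{cycle}} = 5760\ \text{MIPS} \\
\frac{2880\ \text{MMACs}}{720\ \text{MHz}} &= 4\ \text{16-bit MACs per cycle}, \qquad
\frac{5760\ \text{MMACs}}{720\ \text{MHz}} = 8\ \text{8-bit MACs per cycle}
\end{aligned}
```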
C6000 Roadmap
[Figure: C6000 roadmap with object code software compatibility across the family, plotted by increasing performance. 1st generation: C6201, C6202, C6203, C6204, C6205, C6211, C6701, C6711, C6712, C6713. 2nd generation C64x™ DSP (1.1 GHz): C6411, C6412, C6414, C6415, C6416, DM642. Also shown: floating point and multi-core C64x™ DSP]
Hardware Tools
C6416 / C6713 DSK Contents
[Figure: DSK board]
Low-cost video interface demo shows how to connect an inexpensive 'C6000 DSP to a video decoder through a low-cost FPGA.
Tools of the Trade: XDS560
eXtended Development System (XDS)
Industry Standard Connections
PCI plugs into PC
JTAG plugs into DSP target board
Download code up to 500 Kbytes/sec
Advanced Event Triggering for simple and complex breakpoints
Real Time Data Exchange (RTDX) can transfer data at 2 Mbytes/sec
Tools of the Trade: National Instruments LabVIEW
LabVIEW Graphical Development for Debug and Diagnostics of DSP software
Integrate wide variety of I/O for DSP testing
Share real time DSP data with RTDX
Automate routine Code Composer Studio functions from LabVIEW
[Figure: LabVIEW automates Code Composer Studio and communicates directly to the DSP through RTDX]
Tools of the Trade: Hyperception’s VAB
Easy to use graphical tool
Hierarchical:
Can write code graphically (down to ASM level instr.)
One worksheet can become a block in another worksheet
Block/Component Wizard:
You can create an optimized VAB building block
Create XDAIS algorithms
If desired, wrap PC interface into standalone EXE
Outputs:
Directly to DSP
Burn program to Flash with single-click
Create an .OUT file
Create Relocatable Object file (i.e. library) to use in CCS
Tools of the Trade: MATLAB® CCS Plug-in
Capabilities:
DSP program control, memory access, and real time data transfer with RTDX™
MATLAB automates testing and provides advanced analysis
Function call support enables hardware-in-loop simulation and debugging
C28x™ / C5000™ / C6000™ support
Supports XDS560™ and XDS510™
Integrated with MATLAB design environment for a complete design solution
Tools of the Trade: Altera FPGA Daughter Card
http://dspvillage.ti.com/docs/catalog/devtools/dsptoolslist.jhtml?familyId=132&toolTypeId=6&toolTypeFlagId=2&templateId=5154&path=templatedata/cm/toolswchrt/data/c6000_devbds
Software Tools
[Figure: eXpressDSP – target software and host tools]
Tools of the Trade: Largest DSP Third Party Network
Make or buy…
> 650 companies in 3rd party network
> 1000 algorithms from > 100 unique 3rd parties
What’s Next?
Optimizing C Performance
Attend another four-day workshop (see next slide)
Review the Compiler Tutorial
See tutorials in CCS online help, or
http://www.ti.com/sc/c6000compiler
Read:
C6000 Programmer’s Guide (SPRU198)
Cache Memory User’s Guide (SPRU656)
C6000 Optimizing C Compiler Users Guide (SPRU187)
C6000 Hardware
CPU Architecture & Pipeline Details ✓
Using Peripherals (EDMA, McBSP, EMIF, HPI, XBUS) ✓
Tools
Compiler Optimizer, Assembly Optimizer, Profiler, PBC ✓
CSL, Hex6x, Absolute Lister, Flashburn, BSL ✓
Getting Started with TI DSP
Where To Go For More Information
www.ti.com is your starting point
analog.ti.com
• Design Resources
• Technical Documents
• Solution/Selection Guides
Applications Solutions: find complete solutions for your application, including DSP, analog, boards, target software, development tools, and third party support
Install Code Composer Studio Free Evaluation Tools (FET) from the Essential Guide to DSP CD
Check out the DSP Selection Guide; it’s your consolidated resource for all pertinent information
Email: [email protected]
Before Leaving …
Let’s Go Home …
Thanks for your valuable time today
Please fill out an evaluation and let us know how we could improve this class
If you purchased a DSK:
Make sure you pack up (or receive) your DSK before leaving
If available, you may keep the earbud headphones and audio patch cable
Workshop lab and solutions files will be available via CDROM or the Internet.
Please check with your instructor.
Legend
IW6000 = C6000 Integration Workshop
OP6000 = C6000 Optimization Workshop
Topic Discussed ✓
Topic Only Discussed Briefly ✓-
Includes A Hands-On Lab Exercise ✓+
Not Discussed
The C6000 Integration Workshop (IW6000) may better suit your needs if you are tasked with building a
system around the C6000. In this case you may need to know about system design, using the C6000
peripherals to move data on/off-chip, scheduling real-time code, and designing your DSP’s boot-up procedure.
The C6000 Integration Workshop (IW6000) is not a prerequisite to this workshop, though if you are looking
for a broad introduction to all aspects of building a C6000-based system, the Integration Workshop might be
a better choice. On the other hand, if you are evaluating the C6000 CPU architecture or want to learn how to
write better C and assembly code for the C6000, this workshop (OP6000) would be the best choice. (Please
refer to the C6000 Workshop Comparison for differences between the two workshops.)
Bottom Line:
If your main goal is to understand the C6000 architecture and write optimized software for it, then the C6000
Optimization Workshop (OP6000) is the best one to attend. Peripherals and other system foundation software
(DSP/BIOS, XDAIS, CSL) are only peripherally mentioned. Many software engineers are tasked with getting
their algorithms to run ... and run as fast as possible. This course is well designed to handle these issues.
On the other hand, if you need to figure out how to get an entire system working -- from programming the
peripherals to get data in/out all the way to burning the Flash memory with your final program -- the C6000
Integration Workshop (IW6000) is the ticket. Along the way you'll be introduced to (and use in lab exercises)
many of the TI Software Foundation tools (DSP/BIOS, XDAIS, CSL, BSL, and Reference Frameworks). This is
probably the single best course for an engineer/programmer who is new to the C6000 DSP and needs to get a
whole system running, as opposed to just optimizing one or two algorithms.
Of course, some engineers will need to handle both of these jobs: getting everything running and optimizing
their software algorithms. In that case, you may want to take both workshops.
TMS320C6000™ DSP Platform Update
Product Update Sheet
Revised June 6, 2005
Support
Product Info / Tech Support / Literature:
North America: [email protected] or (972) 644-5580
Europe: [email protected]
Texas Instruments Website: www.ti.com or www.dspvillage.com
DSP KnowledgeBase: www.ti.com/kbase
DSP Support: www.ti.com/technicalsupport
Notes for TMS320C62x™, TMS320C64x, TMS320DM64x™ and TMS320C67x™ DSP generation tables:
(1) C6201/C6204/C6205/C6701 DSP internal program memory can be configured as cache or addressable RAM. C6202/C6203 DSP allows 512 KB to be programmed as cache or addressable
RAM, the balance is always addressable RAM.
(2) L1 data cache and L1 program cache are always configurable as cache memory. L2 is configurable between SRAM and cache memory.
(3) DMA has four fully configurable channels, plus one dedicated to host for HPI transfers.
(4) C6211/C6711/C6712 DSP Enhanced DMA (EDMA) has 16 fully configurable channels. Additionally, there is an independent single-channel quick DMA (QDMA) and a channel dedicated to the
host for HPI transfers.
(5) VC33 is an upgrade of TI’s TMS320C3x™ DSP generation. While not a C6000™ DSP, it is part of TI’s floating-point family.
(6) Each Chip Enable (CE) allows the user to assign a specific memory space.
(7) Host Port Interface (HPI) is slave-only async host access. Expansion Bus (XBus) is master/slave async or sync interface; operates in host or FIFO/Memory modes.
(8) These devices are pin-for-pin compatible: (Note, be aware of voltage differences.)
• (GJC) C6201/C6701 DSP • (GJL, GNZ) C6202/C6203 DSP
• (GFN) C6211/C6711/C6712 DSP • (GLS, GNY, GLW) C6202/C6203/C6204 DSP
• (GDP) C6713/C6711C/C6712C DSP • (GLZ) C6414T/C6415T/C6416T DSP
• (GTS) C6410/C6413/C6418 DSP • (GDK, GNZ) C6412/DM643/DM642/DM641/DM640 DSP
(9) Device may operate at 300 MHz with 1.7-V core.
Package Types
GGP = 35 mm × 35 mm, 1.27-mm ball pitch, 352-pin BGA
GJL = 27 mm × 27 mm, 1.0-mm ball pitch, 352-pin BGA
GFN = 27 mm × 27 mm, 1.27-mm ball pitch, 256-pin BGA
GHK = 16 mm × 16 mm, 288-pin MicroStar BGA™
GLS = 18 mm × 18 mm, 0.8-mm ball pitch, 384-pin BGA
GLW = 18 mm × 18 mm, 340-pin BGA
PGE = 20 mm × 20 mm, 0.5-mm pitch, 144-pin TQFP
GLZ = 23 mm × 23 mm, 0.8-mm ball pitch, 532-pin BGA
PYP = 28 mm × 28 mm, 0.5-mm pitch, 208-pin PQFP
GNZ = Same as GJL
GNY = Same as GLS
GDK = 23 mm × 23 mm, 0.8-mm ball pitch, 548-pin BGA
GDH = 17 mm × 17 mm, 1.0-mm pitch, 256-pin BGA
RFP = 22 mm × 22 mm, 0.5-mm ball pitch, 144-pin PowerPAD™ PQFP
GDP = 27 mm × 27 mm, 1.27-mm ball pitch, 272-pin BGA
ZDH = 17 mm × 17 mm, 1.0-mm pitch, 256-pin BGA
GTS = 23 mm × 23 mm, 1.0-mm ball pitch, 288-pin BGA
ZTZ = 24 mm × 24 mm, 0.8-mm ball pitch, 697-pin plastic BGA
GJC = 35 mm × 35 mm, 1.27-mm ball pitch, 352-pin BGA
TMS320C6000™ DSP Development Tools
• Please note that all C6000™ DSP tools support all C6000 platform members (C62x™, C67x™, and C64x™ DSPs and DM64x™ digital media processors) unless otherwise noted.
• Most tools support Windows® 98/2000/NT and XP. Please check with your distributor or the tools folder on TI’s DSPvillage for operating system support on specific products.
On-Line Training
A variety of free on-line training courses is available and on-line training courses including: • TMS320C6000™, TMS320C5000™ and TMS320C2000™
addresses all aspects of using TI devices and tools. • DSP basics DSP platforms
Designed for worldwide access 24/7, these courses vary in • DSP applications • Analog
length and range from beginner overviews to advanced, • Easy-to-use software development tools • Power supplies
highly technical design information. Learn more about how • DSP programming tips and tricks For a complete list of available courses, visit
to design your signal processing application with self-paced http://www.ti.com/onlinetraining
One-Day Workshops
One-day workshops are designed to offer product or technology knowledge and more advanced information about a particular category of devices. These workshops include a significant “hands-on” section and are ideal introductions to get started with DSP. A list of available courses and schedule information can be found at
http://www.ti.com/1dayworkshops
TMS320C6416/C6713 DSK One-Day Workshop
• Introduction to TMS320C6000™ DSPs and Code Composer Studio™ IDE
• C6000™ DSP peripherals
• Using the C6000 DSP system tools and software
• Optimizing C6000 DSP code
Video and Audio Applications Design Workshop Based on TMS320DM642 Digital Media Processor
• Getting started on a new video and audio design
• Hardware platform based on DM642 digital media processor
• MPEG-4 technology
• ADPCM audio compression technology
• Digital video security solution on DM642 – video security application example
Hands-On DSP/BIOS™ OS One-Day Workshop
• Key elements of a real-time DSP system
• Practical designing and problem solving in multithreaded applications
• Minimizing overhead
• Real-time analysis and debug
• Real-time scheduling and resource management
• Host and target communications
Multi-Day Workshops
Multi-day workshops are for engineers who need to sharpen their design and development skills. These workshops include significant “hands-on” labs emphasizing the demonstration and application of techniques and skills. TI workshops are highly beneficial in helping developers implement their DSP designs quickly. A list of available courses and schedule information can be found at
http://www.ti.com/multidayworkshops
TMS320C6000™ DSP Integration Workshop
• Use Code Composer Studio™ IDE
• Design a real-time double-buffered system
• TMS320C6711 Design Starter Kit (DSK)
• DSP/BIOS™ kernel
• Debugging with real-time analysis
• Set up peripherals using the Chip Support Library
• Discuss the McBSP serial ports multi-channel features
• Use the EDMA advanced features (auto-initialization, interrupt synchronization)
• C6000™ DSP system memory management
• C6000 DSP cache operation
• Design your DSP system to allow code/data overlays in memory
• Evaluate and use the C6000 DSP boot loader
• Setting up a bootable image in Flash ROM
• Program the DSK on-board Flash memory
C6000 DSP Optimization Workshop
• C6000 DSP platform CPU architecture
• C6000 DSP platform CPU pipeline
• Building Code Composer Studio projects
• Exploring C6000 DSP compiler build options
• Writing efficient C code
• Writing optimized standard and linear assembly code
• Mixing C and assembly language
• Software pipelining techniques
• Numerical issues with fixed-point processors
• Basic C6000 DSP system memory management
• How caches work and optimizing their usage
DSP/BIOS™ Kernel One-Day Workshop
• Define a real-time system design and its software design challenges
• Apply software development tools in developing a system:
– Generating and loading software for a specific target
– Debugging software and visualizing data using breakpoints
– Visualizing software performance and data during execution using DSP/BIOS kernel
• Integrate system and application software into a real-time design:
– Interfacing to and configuring DSP/BIOS kernel
– Synchronizing events and access to shared data structures using DSP/BIOS kernel
– Communicating between processes and with peripheral devices using DSP/BIOS kernel
• Analyze and optimize software to meet real-time requirements:
– Analyzing real-time performance of software using DSP/BIOS kernel
– Calculating and optimizing I/O buffering
– Optimizing the use of program and data memory
Registration
To register for these workshops, please visit
http://www.ti.com/multidayworkshops
Internet
TI Semiconductor Product Information Center Home Page
support.ti.com