UG934: AXI4-Stream Video IP and System Design Guide
Introduction
This section summarizes the AXI4-Stream interface Video protocol as fully defined in the
Video IP: AXI Feature Adoption section of the AXI Reference Guide (UG1037).
[Figure 1-1 block diagram: a video IP with two AXI4-Stream slave (input) interfaces, s_axis_video0_* and s_axis_video1_*, and two AXI4-Stream master (output) interfaces, m_axis_video0_* and m_axis_video1_*, each carrying tdata, tvalid, tready, tlast, and tuser, connecting from and to video IP or AXI VDMA. Clock and control signals: clk_proc, aclk_s, aclken_s, aresetn_s, aclk_m, aclken_m, aresetn_m.]
Figure 1‐1: Video IP with Multiple AXI4-Stream Slave (Input) and Master (Output) Interfaces
Blank periods, audio data, and ancillary data packets are not transferred through the video
protocol over AXI4-Stream. All signals listed in Table 1-1 and Table 1-2 are required for
video over AXI4-Stream interfaces.
Table 1-1 shows the interface signal names and functions for the input (slave) side connectors. To avoid naming collisions in IP with multiple AXI4-Stream input interfaces, the signal prefix s_axis_video should be extended to s_axis_videok, where k is the index of the respective input AXI4-Stream; for example, axis_video_tvalid becomes s_axis_video0_tvalid for stream 0 and s_axis_video1_tvalid for stream 1.
Table 1-2 shows the interface signal names and functions for the output (master) side connectors. Similarly, for IP with multiple AXI4-Stream output interfaces, the signal prefix m_axis_video should be extended to m_axis_videok, where k is the index of the respective output AXI4-Stream; for example, axis_video_tvalid becomes m_axis_video0_tvalid for stream 0 and m_axis_video1_tvalid for stream 1.
READY/VALID Handshake
A valid transfer occurs whenever READY, VALID, ACLKEN, and ARESETn signals are High at
the rising edge of ACLK, as shown in Figure 1-2.
During valid transfers, DATA only carries active video data. Blank periods and ancillary data
packets are not transferred by video over AXI4-Stream.
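As a minimal behavioral sketch of this rule (plain C, not RTL; the type and function names here are illustrative, not part of the specification), a beat is transferred only on a rising ACLK edge where all four qualifying signals are high:

#include <stdbool.h>

/* One AXI4-Stream video interface as sampled at a rising edge of ACLK.
 * A beat is transferred only when VALID, READY, ACLKEN, and ARESETn
 * are all high in the same cycle. */
typedef struct {
    bool tvalid, tready, aclken, aresetn;
} axis_cycle_t;

static bool beat_transferred(const axis_cycle_t *c)
{
    return c->tvalid && c->tready && c->aclken && c->aresetn;
}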
Data Format
To transport video data, the DATA vector encodes logical channel subsets of the physical DATA signals. AXI4-Stream interfaces between video modules can facilitate the transfer of video using different precisions (for example, 8, 10, or 12 bits per color channel), different formats (for example, RGB or YUV 420), and different numbers of pixels per data beat.
The recommended parameter name for the component data width is C_tk_DATA_WIDTH. The optional format parameter C_tk_VIDEO_FORMAT can help the IP determine the number of color components present on DATA using an HDL function. Video IP typically requires specific formats on its input interfaces and can have the number of color component channels hard coded in the IP. However, when C_tk_VIDEO_FORMAT (set by a default value on the master interface) is propagated in HDL designs to slave interfaces, the IP source code can perform DRC using assertions to ensure that AXI4-Stream video interfaces are driven by video that was encoded in the expected format.
Encoding
The DATA bits are represented using the [N-1:0] bit numbering convention (N-1 through 0). The components of the implicit subfields of DATA should be packed tightly together; for example, DW = 10-bit RGB data packs to 30 bits. If necessary, the packed data word should be zero padded in the most significant bits (MSBs) so that the width of the resulting word is an integer multiple of eight, as shown in Figure 1-4.
Figure 1‐4: Video Data Padding for TDATA for Multiple Pixels
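The padding rule amounts to rounding the packed width up to the next byte boundary. A small C sketch (the helper name is illustrative):

/* TDATA width for a beat of `ppc` pixels, each with `components` color
 * components of `dw` bits: pack tightly, then zero-pad the MSBs up to
 * the next multiple of eight. E.g., RGB at DW = 10, one pixel per beat:
 * 3 * 10 = 30 bits, padded to 32. */
static int tdata_width(int components, int dw, int ppc)
{
    int packed = components * dw * ppc;
    return ((packed + 7) / 8) * 8;
}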
Table 1-4: Video Format Codes and Data Representation for C_tk_MAX_SAMPLES_PER_CLOCK = 1

VF Code  Video Format              [4DW-1:3DW]  [3DW-1:2DW]  [2DW-1:DW]  [DW-1:0]
0        YUV 4:2:2                 -            -            V/U, Cr/Cb  Y
1        YUV 4:4:4                 -            V, Cr        U, Cb       Y
2        RGB                       -            R            B           G
3        YUV 4:2:0                 -            -            V/U, Cr/Cb  Y
4        YUVA 4:2:2                -            α            V/U, Cr/Cb  Y
5        YUVA 4:4:4                α            V, Cr        U, Cb       Y
6        RGBA                      α            R            B           G
7        YUVA 4:2:0                -            α            V/U, Cr/Cb  Y
8        YUVD 4:2:2                -            D            V/U, Cr/Cb  Y
9        YUVD 4:4:4                D            V, Cr        U, Cb       Y
10       RGBD                      D            R            B           G
11       YUVD 4:2:0                -            D            V/U, Cr/Cb  Y
12       Mono/Bayer Sensor (RAW)   -            -            -           Y, RGB, CMY
13       Custom2                   2 Components - No DRC
14       Custom3                   3 Components - No DRC
15       Custom4                   4 Components - No DRC
Note: For any of the 4:2:2 and 4:2:0 formats, the Cb (or U) and Cr (or V) samples are split over two data beats, but only in one-sample-per-clock mode. The first data beat holds Cb (or U); the second data beat holds Cr (or V). In other words, the first active pixel of the frame contains [Cb0:Y0] and the next pixel contains [Cr0:Y1]. The 4:2:0 format adds vertical subsampling to the 4:2:2 format, which is implemented in video over AXI4-Stream by omitting the chroma data on every other line.
Note: Bayer Sensor data is also referred to as RAW data, which is generally in
RAW8/RAW10/RAW12/RAW14/RAW16, etc. formats.
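The number of component channels implied by each VF code follows directly from Table 1-4. The document describes deriving this with an HDL function; the equivalent lookup is sketched here in C for illustration:

/* Component channels per VF code (Table 1-4). Custom formats 13-15
 * carry 2-4 components with no DRC. */
static int vf_num_components(int vf_code)
{
    static const int ncomp[16] = {
        2, 3, 3, 2,  /* YUV 4:2:2, YUV 4:4:4, RGB, YUV 4:2:0     */
        3, 4, 4, 3,  /* YUVA 4:2:2, YUVA 4:4:4, RGBA, YUVA 4:2:0 */
        3, 4, 4, 3,  /* YUVD 4:2:2, YUVD 4:4:4, RGBD, YUVD 4:2:0 */
        1, 2, 3, 4   /* Mono/Bayer, Custom2, Custom3, Custom4    */
    };
    return ncomp[vf_code & 0xF];
}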
When multiple pixels or samples are transferred using the video protocol over AXI4-Stream, color components pertinent to the individual pixels are arranged according to Table 1-5, which presents examples of transferring two pixels for video modes 0, 1, 2, 3, and 12. Pixel data is packed continuously without any padding between pixels. When N*DW is not an integer multiple of 8, the video data is zero padded in the MSBs, as presented in Figure 1-5. If the line size is not divisible by the number of pixels/samples per data beat, the last beat of the line should use the LSBs, and the unused pixel positions in the MSBs of that last data beat should be padded with zeros.
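The packing order can be modeled in a few lines of C (a sketch of the arrangement described above; the helper name is illustrative):

#include <stdint.h>

/* Pack `npix` pixels of `comp` components x `dw` bits into one TDATA
 * beat, pixel 0 in the LSBs, with no padding between pixels. For a
 * partial last beat of a line, pass only the valid pixels: they land
 * in the LSBs and the unused MSBs remain zero, as required above. */
static uint64_t pack_beat(const uint16_t *samples, int npix, int comp, int dw)
{
    uint64_t beat = 0;
    int bit = 0;
    for (int p = 0; p < npix; p++)
        for (int c = 0; c < comp; c++, bit += dw)
            beat |= (uint64_t)(samples[p * comp + c] & ((1u << dw) - 1)) << bit;
    return beat;
}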
IMPORTANT: Although this specification supports dynamically changing the number of pixels/samples
per data beat, this is not recommended because not all IPs support this feature.
Figure 1-6: One Pixel per Data Beat, Eight Bits per Component over a Two-Pixel per Data Beat, 10-Bits per Component Bus
Figure 1-7: Two Pixels per Data Beat, Eight Bits per Component over a Two-Pixel per Data Beat, 10-Bits per Component Bus
Figure 1-8 captures RGB888 (a pixel with three components, each eight bits wide).
Figure 1-8: Two Pixels per Data Beat, Eight Bits per Component (RGB888, VF Code 2) over a Two-Pixel per Data Beat, 14-Bits per Component Bus
Notes:
1. Each R, G, B component sits in a 14-bit component space with MSB alignment.
Figure 1-9 captures RAW14 (a pixel with a single component, 14 bits wide).
Figure 1-9: Two Pixels per Data Beat, 14 Bits per Component (RAW14, VF Code 12) over a Two-Pixel per Data Beat, 14-Bits per Component Bus
Notes:
1. Although RAW14 may only use the lower 28 bits, the full AXI4-Stream interface remains 88 bits wide because it must accommodate the possibility of switching to RGB at a full 14 bits per color, if requested, when dealing with dynamic TDATA. Downstream logic must be aware of this, provide the appropriate bus interface, and internally discard the bits it does not use.
Comparing the two component widths in Figure 1-8 and Figure 1-9: the RAW14 (VF Code 12) data type has 14-bit components, while RGB888 (VF Code 2) has 8-bit components. Therefore, the RGB888 components are placed MSB aligned on the 14-bit component bus, with the LSBs padded with zeros, whereas the RAW14 pixels are packed tightly together.
[Figure 1-10 diagram: quad-pixel packing for RGB/YUV444 at 8, 10, 12, and 16 bits per component (components G0/Y0, B0/U0, R0/V0 through R3/V3, pixel 0 in the LSBs), and for YUV422 at 12 bits per component (Y0, U0, Y1, V0, Y2, U2, Y3, V2).]
Figure 1‐10: Quad Pixels Data Format (Max Bits Per Component = 16)
A data format for a fully compliant AXI4-Stream video protocol with dual pixels per clock is illustrated in Figure 1-11.
[Figure 1-11 diagram: dual-pixel packing for RGB/YUV444 at 8, 10, 12, and 16 bits per component (G0/Y0, B0/U0, R0/V0, G1/Y1, B1/U1, R1/V1, pixel 0 in the LSBs), and for YUV422 at 12 bits per component (Y0, U0, Y1, V0); bit positions 0 through 96.]
Figure 1‐11: Dual Pixels Data Format (Max Bits Per Component = 16)
When the parameter Max Bits Per Component is set to 12, video formats with actual bits per component larger than 12 are truncated to Max Bits Per Component; the remaining least significant bits are discarded. If the actual bits per component is smaller than the Max Bits Per Component set in the Vivado® IDE, all bits are transported MSB aligned, with the remaining LSBs padded with 0. This applies to all Max Bits Per Component settings.
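The truncate-or-pad rule reduces to a single shift per component. A C sketch (illustrative helper name):

#include <stdint.h>

/* Map one component sample of `actual` bits onto a bus lane of
 * `max_bpc` bits: drop LSBs when the sample is wider, or MSB-align and
 * zero-pad the LSBs when it is narrower. */
static uint32_t remap_component(uint32_t sample, int actual, int max_bpc)
{
    if (actual > max_bpc)
        return sample >> (actual - max_bpc); /* truncate LSBs */
    return sample << (max_bpc - actual);     /* MSB-align, pad LSBs with 0 */
}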
As an illustration, when Max Bits Per Component is set to 12, Figure 1-12 shows the data
format for quad pixels per clock to be fully compliant with the AXI4-Stream video protocol.
A data format for a fully compliant AXI4-Stream video protocol with dual pixels per clock is
illustrated in Figure 1-13.
[Figure 1-12 diagram: quad-pixel packing for RGB/YUV444 at 8, 10, and 12 bits per component, and for YUV422 at 12 bits per component.]
Figure 1‐12: Quad Pixels Data Format (Max Bits Per Component = 12)
[Figure 1-13 diagram: dual-pixel packing for RGB/YUV444 at 8, 10, and 12 bits per component, and for YUV422 at 12 bits per component; bit positions 0 through 72.]
Figure 1‐13: Dual Pixels Data Format (Max Bits Per Component = 12)
The video interface can also transport quad and dual pixels in the YUV420 color space. Similarly, for YUV 4:2:0 deep color (10, 12, or 16 bits), the data representation is the same; the only difference is that each component carries more bits (10, 12, or 16). When transported over AXI4-Stream, the data representation must comply with the protocol defined in UG934. With the remapping feature, the same native video data is converted into the AXI4-Stream formats shown in Figure 1-16. The 4:2:0 format adds vertical subsampling to the 4:2:2 format, which is implemented in video over AXI4-Stream by omitting the chroma data on every other line.
[Figure 1-16 waveform: clk, tvalid, tdata[71:60], tdata[59:48], and tlast for YUV 4:2:0 remapped onto AXI4-Stream.]
The subsystem provides full flexibility to construct a system using the configuration parameters Maximum Bits Per Component and Number of Pixels Per Clock. Set these parameters so that the video clock and link clock are supported by the targeted device. For example, when dual pixels per clock is selected, the AXI4-Stream video needs to run at a higher clock rate than a quad-pixels-per-clock design, making it harder for the system to meet timing. Therefore, the quad-pixels-per-clock data mapping is recommended for designs intended to send higher video resolutions. Some video resolutions (for example, 720p60) have horizontal timing parameters (1650 total pixels per line) that are not a multiple of 4; in this case the dual-pixels-per-clock data mapping must be chosen.
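Checking whether a pixels-per-clock setting is viable therefore reduces to a divisibility test on the horizontal total, as in this one-line sketch (for 720p60, 1650 % 4 = 2, so quad is not viable and dual must be used):

/* Nonzero if the horizontal total (pixels per line, including blanking)
 * divides evenly into beats of `ppc` pixels per clock. */
static int ppc_is_viable(int htotal, int ppc)
{
    return (htotal % ppc) == 0;
}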
In addition to extracting video pixel data from the input stream and sending it to
subsequent modules using video over AXI4-Stream, the interface modules must measure
timing information (including the number of pixels per scan-line, number of active rows per
frame, and so on) when receiving video from a standard periodic video source such as DVI,
HDMI, SDI, or an image sensor. Input interface modules make this information available to
video processing and output interface modules, which then recreate periodic timing signals
and embed output video pixel data that was processed by the video system to recreate a
periodic output stream such as DVI (Figure 2-1).
[Figure 2-1 block diagram: DVI input to a Video In to AXI4-Stream core, through Chroma Resampler and Enhance processing cores connected by AXI4-Stream, to an AXI4-Stream to Video Out core driving DVI. A Video Timing Detector and a Video Timing Generator connect over AXI4-Lite to a MicroBlaze or Cortex-A9 AXI4-Lite master.]
Figure 2-1 illustrates the extraction and propagation of timing information. The Video In to AXI4-Stream input interface and Video Timing Detector cores measure timing information and extract video pixel data, then transmit the data using AXI4-Stream (represented by the AXI4-S arrows in Figure 2-1). Timing information is propagated through optional AXI4-Lite interfaces. When present, the system processor (AXI4-Lite master) reads the measured timing information from the timing detector and programs subsequent processing cores and the timing generator using the AXI4-Lite control register interfaces.
When instantiated without an AXI4-Lite control interface, video cores can only process a fixed video format/resolution, specified in the core GUI. In Figure 2-1, the Chroma Resampler and Enhance cores process the video stream. The processing cores might contain line buffers, for which the number of active pixels per scan line is necessary. The processing cores receive active size (number of pixels per scan line, number of scan lines per frame) measurement values, among other timing parameters, from the Video Timing Detector module, which is used with the DVI input interface IP. Processing cores also verify the data by employing pixel counters between subsequent EOL pulses. The AXI4-Stream to Video Out interface core generates standard sync, blank, and active video timing signals, as defined by the timing information received, and embeds the video pixel data received over its AXI4-Stream input interface.
The Video to AXI4-Stream and AXI4-Stream to Video cores are delivered as HDL source
code and provided as examples to expedite custom interface development. For embedded
systems using a processor acting as an AXI4-Lite master or dedicated IP acting as the
AXI4-Lite master, an AXI4-Lite pCore interface should be provided with a standardized
register API to present timing information. For more information, see AXI4-Lite Interface in
Chapter 3.
Using the TUSER signal to transmit periodic sync information, such as hsync or vsync, along with the video data is prohibited because there are no guarantees on IP delay consistency (aperiodicity) or on delay matching between DATA and TUSER bits through IP cores. Furthermore, when video data is written to and retrieved from frame buffers, sync information from TUSER is not recovered.
Reset Requirements
Hardware Reset
Each AXI interface must be designed to accommodate entering or exiting a reset on a
different (preceding or subsequent) cycle than the interface to which it is connected.
Specifically, an IP core must not rely on another connected IP being reset simultaneously
during the same cycle. Video IP should be designed so that any reset of the AXI4-Stream
interfaces re-initializes the IP to reduce disruption on the output video stream.
Although Xilinx® IP can generally have multiple AXI interfaces connected to isolated interconnection networks to support the localized reset of some interfaces, this is not recommended. As a practical system design guideline, the reset source(s) should be held active internally for some minimum number of cycles (of the slowest clock in the system) to ensure that all IP is properly reinitialized and all AXI interfaces reach the quiescent state before the reset is released. If internal extension of the reset pulse is not feasible, video IP data sheets specify the required reset pulse width, if greater than one cycle.
As stated in the Xilinx AXI Reference Guide guidelines, it is recommended that all AXI interfaces in a system be globally reset together. When resetting multiple video cores in a system, all interfaces must be reset before any interface comes out of reset. Video IP should accept and drop (not propagate) valid samples until the SOF signal is received.
AXI4-Stream interfaces must deassert their VALID and READY outputs while in reset. This does not need to happen immediately upon sampling the reset input active, but in time to allow the network of connected IP to reach a quiescent reset state before reset is deasserted at any IP. This allows for arbitrary (but reasonable) internal pipelining of reset inputs, including resynchronization to a different clock domain, if necessary.
Software Reset
When resetting multiple video cores within a system, all interfaces must be reset before any
interface comes out of reset. When reset is performed in the software (which
asserts/deasserts software reset flags sequentially), the IP cores should be reset from the
output towards the input. The software reset pin of video IP closest to the system output
should be asserted first. Subsequent cores near the signal source should then be reset.
Software reset pins should be deasserted in the same sequence.
If permitted by the application, provide a soft software reset option (SSR) for the video IP, where reset is synchronized with video frame boundaries. If sufficient time is available between video frames (for example, when a vertical blanking period is present), a soft reset after the predicted end of frame can facilitate the reset of individual cores without negatively impacting system performance.
When planning output-stream synchronization, consider the following questions:
• Is it possible to hold up the input video stream? Is there a back pressure signal?
• Must the output stream be phase-locked to an external Frame Sync signal?
• Are the input and output video clocks the same or phase-locked to each other?
• Timed video input, such as DVI, that cannot be delayed, feeding timed video output that uses the same video clock: for automatic delay matching, synchronization is necessary.
• Input and output in unrelated clock domains (scaled video), where a frame buffer is necessary.
In all cases, the input interface module is expected to have a "locked" output, originating from the VTC timing detector, which asserts when the input timing measurements have stabilized. The input interface module is expected to drop pixel data until input timing has locked.
The first scenario assumes:
• no frame buffer
• a periodic input stream that cannot be held off
• an output video pixel clock that is either the same, or a derivative of the input pixel
clock, and
• the output video stream does not have to be phase locked to an external Frame Sync
signal.
[Figure 2-2 flowchart: the output timing generator is stopped at the first active pixel; it waits for the input buffer to fill over 50%, then runs locked to the input video clock.]
Figure 2‐2: Output Timing Generator Control Flowchart for Unconstrained Output Video Stream
This scenario applies to a sensor image pre-processing pipeline, where input and output
pixel rates are identical, and the output timing generator does not have to be locked to an
external frame sync source. After power on or reset, the output AXI4-Stream interface
deasserts READY, and the output timing signal generator state machine is initialized to wait
in the state just before the start of active video.
Note: In this case, the function of READY is limited to what the internal buffers allow if the input
stream cannot be held back.
The output timing generator waits for the input interface to signal that timing information
has stabilized (locked). Now, the output AXI4-Stream interface should assert READY, which
propagates backward towards the input of the pipeline. As a result, pixel data is propagated
down the pipeline. Processed pixel data reaches the output interface module when its
VALID input is sampled high. When the input data buffer of the output video interface gets
50% full, the output timing generator can start generating periodic output sync/blank
signals, and pixel data can be fed forward to the output.
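The control flow of Figure 2-2 can be summarized as a three-state machine. The following C sketch is a behavioral illustration only (state and signal names are invented here, not defined by this document):

#include <stdbool.h>

typedef enum { WAIT_LOCKED, FILL_BUFFER, RUNNING } out_tg_state_t;

/* One evaluation step of the output timing generator control flow:
 * wait for input timing lock, accept pixels until the buffer is over
 * 50% full, then run the periodic output generator. */
static out_tg_state_t out_tg_step(out_tg_state_t s, bool timing_locked,
                                  unsigned fill, unsigned depth,
                                  bool *ready, bool *gen_run)
{
    switch (s) {
    case WAIT_LOCKED:
        *ready = false; *gen_run = false;
        return timing_locked ? FILL_BUFFER : WAIT_LOCKED;
    case FILL_BUFFER:
        *ready = true; *gen_run = false;
        return (2 * fill >= depth) ? RUNNING : FILL_BUFFER; /* >50% full */
    default: /* RUNNING: periodic sync/blank output, pixels fed forward */
        *ready = true; *gen_run = true;
        return RUNNING;
    }
}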
[Figure 2-3 block diagram: DVI to AXI4-Stream input, a video processing pipeline writing to an external-memory frame buffer through AXI-VDMA, a second processing pipeline reading from the frame buffer through AXI-VDMA, and an AXI4-Stream to DVI output. A Video Timing Detector and a Video Timing Generator (with an Fsync input) connect over AXI4-Lite to a MicroBlaze or Cortex-A9 AXI4-Lite master.]
Figure 2‐3: Example System with Output Sync Tied to an External Frame Sync Signal
The portion of this system relevant to output stream synchronization is the leg from the frame buffer to the output interface core, which can contain processing cores. These processing cores can change the effective pixel rate. The example presented in Figure 2-4 uses a video scaler, which typically changes the pixel rate, and can operate in three different clock-domain scenarios:
1. The external output clock (Ext clk) is different from the input clock, but there is no external Fsync signal.
2. The Output Timing Generator needs to be locked to an external Fsync.
3. An external Fsync drives the AXI-VDMA readouts.
The recommended external frame buffer for AXI4-Stream-based video IP systems is the AXI-VDMA core, which must be configured to the desired frame size using an AXI4-Lite interface. Figure 2-4 illustrates timing information (from an input interface core or from software) distributed using this interface.
[Figure 2-4 diagram: AXI-VDMA feeding a Video Scaler over AXI4-Stream, then AXI4-Stream to Video Out driving DVI; frame size, timing data, frame pointer, and mode are distributed over AXI4-Lite; the Video Timing Generator runs on an external clock (Ext clk) and can take an external Fsync.]
After power-up or reset, the output interface core should deassert READY and hold all outputs at defaults until timing information is locked (Figure 2-7). The AXI-VDMA should be configured with the write side as the Fsync and Genlock master. When the input buffer of the video output core is 50% filled with data from the AXI-VDMA, output timing signal generation should commence. When the timing generator reaches the phase where active video needs to be sent but pixels are not yet present, blank frames should be generated. If the output interface data buffer becomes full, the output interface core should deassert TREADY.
For scenario 2, the setup and protocol are identical, but the video timing generator should be configured to sync to the external Fsync.
For scenario 3, a frame sync signal originating from the output timing generator or an
external fsync is used to trigger AXI-VDMA frame reads. If an external frame sync signal is
present, ensure that the phase relationship between the external Fsync pulse and the VTC
generator Fsync allows pixel data to be fetched from the AXI-VDMA and propagated
through subsequent cores between the AXI-VDMA and the output interface module. This
allows data and timing signals on the output interface to be synchronized.
A good example of this is when the external frame sync is in phase with the start of vertical
blanking. If output pixels are needed immediately, this sync is too late to trigger readout
from the AXI-VDMA.
The timing generator core contains logic that can generate frame sync pulses at arbitrary phases once the generator is producing periodic timing signals. For scenarios where the external frame sync is too late to trigger data readout, an earlier, regenerated frame sync pulse should be used. This ensures that pixel data reaches the output interface core before it needs to be sent in phase with the periodic output timing signals.
For video systems with a frame buffer but no external output frame sync source, the AXI-VDMA core can automatically fetch the last frame finished on the write side, to be picked up immediately when the read side is idle (reading a frame has completed).
When pixel data propagates to the output interface core, the output interface core should start driving pixel data, deasserting its READY output as needed to maintain synchronization between the input pixel flow and the output sync signals.
When an out-of-sync external frame sync pulse is received, output timing generation should
re-initialize. A new fsync pulse should be generated for the AXI-VDMA, and input pixels
from the existing frame should be dropped until the arrival of the SOF pulse. If necessary, a
blank frame should be sent on the output until sync is reestablished.
If the external frame sync pulse is not present when expected, output timing generation
should continue freewheeling.
Input Interface cores should not start sending incomplete frames. If the timed video source
is disconnected or reconnected, or when the system recovers from reset or power-up, the
input AXI4-Stream interface core should wait until the start of the first frame after timing is
locked before sending data over the AXI4-Stream master interface.
There could be a discrepancy between measured frame dimensions based on EOL and SOF
locations and the frame dimensions provided to the VTC generator side and the processing
cores through the core GUI or the AXI4-Lite register interface.
If the SOF and EOL framing signals occur early, processing cores should immediately start
processing the new line or new frame. If the framing signals are late, processing cores
should purge partial frames by dropping pixels until the expected SOF or EOL signal is
received.
The choice of timing signal sets should be specified when generating the VTC core.
Figure 2-5 shows a typical example of connecting the VID-IN and VTC cores to downstream
video processing cores (“Video IP Sink”) through AXI4-Stream interfaces.
[Figure 2-5 connection detail: the VTC locked status, INTC_IF[8], drives the axis_enable input of the Video In to AXI4-Stream core; aclk, aclken, and aresetn are common to both cores.]
Figure 2‐5: Connecting the Video to AXI4-Stream Core to the Video Timing Controller
• The VID-IN core should not start sending data to downstream core(s) until they are
enabled and initialized.
• The VID-IN core should not start sending data to downstream cores until the VTC core is enabled, initialized, and locked.
After streaming video starts, or after boot or system reset, the VTC core can take more than a full frame of data to accurately measure all timing parameters. During this time the locked status bit of the VTC, available through bit 8 of the optional INTC_IF interface, is 0. It is recommended to connect INTC_IF[8] to the axis_enable input of the VID-IN core. This hardware configuration ensures that no video is sent before the VTC is locked.
Xilinx recommends that the VTC detector be enabled only after the rest of the downstream processing cores are all initialized and enabled. Otherwise, the output FIFO within the VID-IN core can become full while downstream cores in the pipe initialize, ultimately resulting in lost pixels, lines, and/or frames of video.
If the downstream IP core needs to know the input resolution before it can be configured, you should:
[Figure 2-6 diagram: AXI-VDMA with an AXI4-Stream slave interface (external memory write side, S_fsync) and an AXI4-Stream master interface (external memory read side, M_fsync), an external Fsync input, and S-M/M-S Gen-lock crossbars between the write and read sides.]
As illustrated in Figure 2-6, the AXI-VDMA core supports one master and one slave interface. The slave/master interfaces can:
• Use any input SOF signal, or an external Frame Sync input, as the source to initiate frame transfers (AXI-VDMA Frame Sync crossbar).
• Use any AXI slave interface as the Gen-lock master for an AXI master interface.
• Use any AXI master interface as the Gen-lock master for an AXI slave interface (Gen-lock crossbar).
Using a Frame sync crossbar enables video systems with a Frame Buffer, but without
external output Frame sync source, to automatically retrieve the last frame finished on the
write-side. This is picked up immediately after reading a frame has completed on the read
side.
Some IP cores, such as the Video On-Screen Display, can have multiple read channels (slave interfaces) that must be synchronized. You might need multiple instances of a slower core running in parallel to achieve sufficient throughput; these parallel core instances can use multiple write channels (master interfaces), which must also be synchronized. Operating modes for single-write, multiple-read ports:
When the number of pixels between subsequent EOL pulses is less than the line length programmed into the AXI-VDMA core, the core triggers an interrupt indicating the error, and the AXI-VDMA line pointer moves forward to the next line. Data received after the EOL is written to the start of a new line. No padding data is written to the frame buffer to complete the line as programmed into the core.
When the number of pixels between subsequent EOL pulses is more than the line-length
programmed into the AXI-VDMA, the core triggers an interrupt indicating the error and
drops extraneous pixels until EOL is received.
When the number of lines between subsequent SOF pulses is less than the number of lines per frame programmed into the AXI-VDMA, the core triggers an interrupt indicating the error and the frame pointer moves forward to the next frame. Data received after the SOF is written to the next frame in the buffer. No padding data is written to the buffer to complete the frame as programmed into the AXI-VDMA core.
When the number of lines between subsequent SOF pulses is more than the number of lines per frame programmed into the AXI-VDMA, the core triggers an interrupt indicating the error and drops extraneous lines until SOF is received.
Multipoint Interfaces
Some applications require a single AXI4-Stream master interface connected to multiple
slaves, such as a stream splitter, or multiple master interfaces to be connected to a single
slave, such as a stream combiner.
For video applications, the use of stream combiners is discouraged: without the TID and TDEST fields, pixel sources are ambiguous. The recommended solution is to create separate slave interfaces on the receiving IP to distinguish data received from different sources, if necessary. No explicit video IP is provided to split AXI4-Streams; HDL and EDK users can easily implement a video splitter with AND gates.
Such a splitter assumes that downstream target interfaces assert READY as soon as the target is ready to receive data, independent of VALID. Otherwise, a small, distributed-memory-based FIFO must be inserted between the splitter and each target to avoid deadlocks.
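The AND-gate splitter handshake can be modeled in C as follows (a behavioral sketch under the READY assumption above; names are illustrative). The master sees READY only when every target is ready, and each target sees VALID only when every other target is also ready, so a beat is transferred either to all targets in the same cycle or to none:

#include <stdbool.h>

#define NUM_TARGETS 2

/* One combinational evaluation of a 1-to-N AXI4-Stream splitter. */
static void splitter_eval(bool m_tvalid, const bool s_tready[NUM_TARGETS],
                          bool s_tvalid[NUM_TARGETS], bool *m_tready)
{
    bool all_ready = true;
    for (int i = 0; i < NUM_TARGETS; i++)
        all_ready = all_ready && s_tready[i];
    *m_tready = all_ready; /* AND of all target READYs */

    for (int i = 0; i < NUM_TARGETS; i++) {
        bool others_ready = true;
        for (int j = 0; j < NUM_TARGETS; j++)
            if (j != i)
                others_ready = others_ready && s_tready[j];
        s_tvalid[i] = m_tvalid && others_ready; /* gate VALID per target */
    }
}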
Ancillary Data
Ancillary data (which includes audio, teletext, captions, and metadata) is digital data embedded in a video stream. Because video over an AXI4-Stream interface is not packetized to carry both video and non-video data, ancillary data must be de-embedded or discarded by the input interface and transmitted end-to-end using a separate (AXI or non-AXI) auxiliary channel, as seen in Figure 2-7.
[Figure 2-7 diagram: ancillary data carried on a separate auxiliary channel alongside the video over AXI4-Stream path; the Video Timing Detector and Generator connect over AXI4-Lite to a MicroBlaze or Cortex-A9 master, with Fsync to the generator.]
When video frame rates change, buffering, re-sampling, and other processing may be required on the ancillary data. This must be done separately from the video over AXI4-Stream interface: de-embed the ancillary data before the frame rate change, process it, and re-embed it into the video stream after the frame rate change.
This effectively doubles the time resolution (also called temporal resolution) as compared
to non-interlaced footage (for frame rates equal to field rates). Interlaced signals require a
display that is natively capable of showing the individual fields in a sequential order. CRT
displays and ALiS plasma displays are made for displaying interlaced signals.
• Each field consists of a different set of lines. The set of odd lines is separated in time
from the set of even lines.
• The timing may vary on a per frame basis. Because there are usually an odd number of
lines per frame, the number of total lines per field is different by one line. Moreover,
this line difference may appear in the active period or in the blanking period
depending on the particular line standard. This means that timing intervals may be
different in odd frames and even frames.
• There is a need to distinguish fields from each other. For progressive video, it is sufficient to mark video frames, because the timing and line composition of each frame is identical. For interlaced video, however, the two fields must be distinguished from each other, and the correct set of lines must be presented with the correct timing for the picture to be displayed properly.
[Diagram: vblank and field_id timing relationships for NTSC, PAL, and 1080i interlaced formats (per-line numbering detail not reproduced).]
Deinterlacer
The Video Deinterlacer converts live incoming interlaced video streams into progressive video streams. Interlaced images can have temporal motion between the two fields that comprise an interlaced frame, and the conversion to a progressive format recombines these two fields into a single progressive-scan frame. Naively combining the two fields can produce unsightly motion artifacts in the progressive output image. For this reason, the Video Deinterlacer can be configured to use three field buffers and produce progressive frames based on a combination of spatial and temporal processing. This core is part of the Video Processing Subsystem; refer to the Video Processing Subsystem LogiCORE IP Product Guide (PG231) for more information on this IP.
Video In to AXI4-Stream
The Xilinx LogiCORE IP Video In to AXI4-Stream core is designed to interface from a video source (clocked parallel video data with synchronization signals: active video with syncs, blanks, or both) to the AXI4-Stream video protocol interface. The core works with the timing detector portion of the Xilinx Video Timing Controller (VTC) core, and provides a bridge between a video input and video processing cores with AXI4-Stream video protocol interfaces. Interlaced content is supported through the field_id signal.
[Figure 2-10 diagram: native video signals (data_valid, vblank, hblank, vsync, hsync, field_id, VTIMING) bridge to and from AXI4-Stream tvalid, tready, tdata, tlast (EOL), tuser (SOF), and axi_field_id on the master (Video In to AXI4-Stream) and slave (AXI4-Stream to Video Out) sides.]
Figure 2‐10: Video System with Interlaced Content Using AXI4-Stream Bridges
Most video processing cores are field-agnostic: they are not aware of whether the picture being processed is an odd field, an even field, or a progressive frame. Therefore, interlace has no impact on these cores. The Video In to AXI4-Stream core has a field ID output, fid, timed to the native video bus. This signal can be used as needed in the system. The only cores that use this fid bit are the AXI4-Stream to Video Out, frame buffer read/write, and Video Deinterlacer cores.
The AXI4-Stream to Video Out core aligns the axi_field_id signal with the field_id signal generated by the Video Timing Controller module. You can connect the field_id signal directly to the AXI4-Stream to Video Out core, bypassing the video processing cores as shown in Figure 2-10, only when the latency of the processing cores is less than one video frame. If the latency is more than one video frame, the respective video processing cores should delay the field ID signal accordingly.
On the Video In to AXI4-Stream core, the fid bit changes coincident with SOF and remains
constant throughout the remainder of the field. On the AXI4-Stream to Video Out core, the
fid bit is sampled coincident with SOF in Figure 2-11. Therefore, the Video In to
AXI4-Stream can provide the field bit directly to the AXI4-Stream to Video Out core if no
intervening frame buffer exists. When a deinterlacer or frame buffer is used, a similar
scheme can be employed: generate the field ID coincident with the start of the field, and on
the receiving side sample the field ID coincident with the first received pixel.
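The latch-at-SOF behavior on the receiving side can be sketched in C (behavioral illustration only; names are invented here):

#include <stdbool.h>
#include <stdint.h>

/* Latch the field ID on the beat where SOF (TUSER) accompanies a valid
 * transfer; the value then holds for the remainder of the field. */
static uint8_t sample_field_id(bool tvalid, bool tready, bool sof,
                               uint8_t fid_in, uint8_t fid_held)
{
    return (tvalid && tready && sof) ? fid_in : fid_held;
}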
[Figure 2-11 waveform: axi_field_id sampled coincident with SOF.]
The AXI4-Stream to Video Out core has a field ID input (fid), sampled in time with the AXI4-Stream input bus. This fid bit must be asserted by the upstream source of AXI4-Stream video. For systems without a frame buffer or deinterlacing, the field ID input originates from the Video In core, as shown in Figure 2-12.
[Figure 2-12 diagram: Video In to AXI4-Stream feeds a DDR frame buffer (write and read); the field_id from the Video In core is available to the frame buffer write side, and axi_field_id from the frame buffer read side drives the AXI4-Stream to Video Out core; VTIMING comes from the timing controller.]
Figure 2‐12: Video System with Interlaced Content Using Frame Buffer Write/Read
For systems with a frame buffer, the field ID input can come from any core containing a frame buffer. The field ID from the Video In to AXI4-Stream core can be used by the frame buffer if necessary, as shown in Figure 2-12.
Note: In Figure 2-12, the AXI4-Stream to Video Out core is operating in slave mode.
[Diagram: the Video In to AXI4-Stream core and the Video Timing Controller (detector) feed a Video Deinterlacer, which generates axi_field_id for the downstream stream.]
Each pipeline must be reset, configured, reconfigured, enabled, or disabled starting from the output (back-end) and moving toward the input (front-end). Typical video pipeline operations such as software reset, configuration, and enabling must be performed in this back-end to front-end order, as the bring-up sequence below illustrates.
If a video subsystem contains more than one video pipeline, each pipeline can be operated on individually. However, in most applications the input (front-end) pipelines should be operated on first, before the back-end pipelines, to avoid invalid data being processed and/or displayed.
Note: Pipelines are operated on from front-end to back-end. Cores within a pipeline are operated on from back-end to front-end.
• Pipeline 1:
° Video to AXI4-Stream
° Video IP 1
° Video IP 2
° AXI4-Stream to Video
[Figure 2-14 system diagram: HDMI input through Video to AXI4-Stream, Video IP 1, Video IP 2, and AXI4-Stream to Video to HDMI output; VDMA 1, Video Scaler, and VDMA 2 form the scaler pipeline; a Video Timing Controller detector and generator bracket the system.]
To bring up this system in software, the following operations should be performed in the
following order:
1. Initialize core drivers (Perform One time only) using the <core>_CfgInitialize()
functions.
2. Bring up Pipeline 1 (Input Video Pipeline)
a. SW Reset AXI VDMA 1 (S2MM Channel)
b. SW Reset Video IP 1
c. SW Reset VTC detector
d. Configure AXI VDMA 1 (S2MM Channel)
e. Configure Video IP 1
f. Configure VTC detector
g. Enable AXI VDMA 1 (S2MM Channel)
h. Enable Video IP 1
i. Enable VTC detector
3. Bring up Pipeline 2 (Scaler Pipeline)
a. SW Reset AXI VDMA 2 (S2MM Channel)
b. SW Reset Scaler
c. SW Reset AXI VDMA 1 (MM2S Channel)
d. Configure AXI VDMA 2 (S2MM Channel)
e. Configure Scaler
To reconfigure this system, perform the above operations except step 1 (Initialize core
drivers).
Note: VDMA S2MM and MM2S channels should be reset, configured, reconfigured, and enabled separately. Each VDMA channel should be treated as an individual core belonging to its own video pipeline. Avoid operating on both channels at the same time. Channel operations should be synchronized to the pipeline to which the channel belongs.
The following C code snippet shows the code needed to bring up the VDMA 1, Scaler,
VDMA 2 pipeline:
#include <stdio.h>
#include "platform.h"
#include "xparameters.h"
#include "xscaler.h"
#include "xaxivdma.h"
////////////////////////////////////////////////////////////////////
// Global Defines
////////////////////////////////////////////////////////////////////
#define VIDIN_FBADDR 0x31800000
#define SCALEROUT_FBADDR 0x33000000
#define VDMA_CIRC 1
#define VDMA_NOCIRC 0
#define VDMA_EXT_GENLOCK 0
#define VDMA_INT_GENLOCK 2
#define VDMA_S2MM_FSYNC 8
#define COEFF_SET_INDEX 0
////////////////////////////////////////////////////////////////////
// Function Prototypes
////////////////////////////////////////////////////////////////////
void vdma_init(XAxiVdma *VDMAPtr, int device_id);
int vdma_reset(XAxiVdma *VDMAPtr, int direction);
int vdma_setup(XAxiVdma *VDMAPtr,
int direction,
int width,
int height,
int frame_stores,
int start_address,
int mode
);
void scaler_init(XScaler *ScalerPtr, int device_id);
int scaler_setup(XScaler *ScalerInstPtr,
int ScalerInWidth,
int ScalerInHeight,
int ScalerOutWidth,
int ScalerOutHeight);
////////////////////////////////////////////////////////////////////
// Global Core Driver Structures
////////////////////////////////////////////////////////////////////
XAxiVdma VDMA1;
XAxiVdma VDMA2;
XScaler Scaler;
////////////////////////////////////////////////////////////////////
// Function: configure_scaler_pipeline()
// Configure Scaler Pipeline (Pipeline 2)
////////////////////////////////////////////////////////////////////
int configure_scaler_pipeline(
int input_x,
int input_y,
int output_x,
int output_y)
{
int Status;
////////////////////////////////////////////////////////////
// Initialize Drivers – Order not important
// Do after clocks are setup
///////////////////////////////////////////////////////////
vdma_init (&VDMA1, 0);
vdma_init (&VDMA2, 1);
scaler_init(&Scaler, 0);
///////////////////////////////////////////////////////////////////////////
// Pipeline 2: Reset Cores (back-end to front-end; sketch -- the original
// listing elided these calls. The scaler is quiesced in scaler_setup.)
///////////////////////////////////////////////////////////////////////////
vdma_reset(&VDMA2, XAXIVDMA_WRITE); // output side first
vdma_reset(&VDMA1, XAXIVDMA_READ);  // then the scaler input side
///////////////////////////////////////////////////////////////////////////
// Pipeline 2: Configure Cores
///////////////////////////////////////////////////////////////////////////
printf("Setting up VDMA Writer...\n");
vdma_setup(&VDMA2,
XAXIVDMA_WRITE,
output_x,
output_y,
3,
SCALEROUT_FBADDR,
VDMA_NOCIRC|VDMA_INT_GENLOCK);
printf("Setting up Scaler...\n");
scaler_setup(&Scaler, input_x, input_y, output_x, output_y);
///////////////////////////////////////////////////////////////////////////
// Pipeline 2: Enable cores
///////////////////////////////////////////////////////////////////////////
XAxiVdma_DmaStart(&VDMA2, XAXIVDMA_WRITE); // (sketch) start the write channel
XScaler_Enable(&Scaler);
return 1;
}
///////////////////////////////////////////////////////////////////
// Function: vdma_init()
// Initialize VDMA Driver
////////////////////////////////////////////////////////////////////
void vdma_init(XAxiVdma *VDMAPtr, int device_id)
{
    int Status;
    XAxiVdma_Config *VDMACfgPtr;

    VDMACfgPtr = XAxiVdma_LookupConfig(device_id);
    if (!VDMACfgPtr) {
        printf("ERROR: VDMA config lookup failed for device %d\r\n", device_id);
        return;
    }
    Status = XAxiVdma_CfgInitialize(VDMAPtr, VDMACfgPtr, VDMACfgPtr->BaseAddress);
    if (Status != XST_SUCCESS) {
        printf("ERROR: VDMA Configuration Initialization failed %d\r\n", Status);
    }
}
////////////////////////////////////////////////////////////////////
// VDMA Channel Reset
////////////////////////////////////////////////////////////////////
int vdma_reset(XAxiVdma *VDMAPtr, int direction)
{
    // Assert the channel reset, then poll until the core reports the
    // reset is done (sketch; the poll budget is arbitrary).
    int Polls = 1000;

    XAxiVdma_Reset(VDMAPtr, direction);
    while (Polls && XAxiVdma_ResetNotDone(VDMAPtr, direction)) {
        Polls--;
    }
    if (!Polls) {
        printf("ERROR: VDMA %s channel reset failed\n\r",
               (direction == XAXIVDMA_READ) ? "Read" : "Write");
        return XST_FAILURE;
    }
    return 1;
}
////////////////////////////////////////////////////////////////////
// VDMA Channel Configure/Setup
////////////////////////////////////////////////////////////////////
int vdma_setup(XAxiVdma *VDMAPtr, int direction, int width, int height,
               int frame_stores, int start_address, int mode)
{
    int Status, i, Addr;
    XAxiVdma_DmaSetup DmaSetup;

    // Frame geometry (sketch reconstructed from the driver API; the
    // original listing elided these assignments). The stride assumes
    // 16 bits (2 bytes) per pixel, matching the bandwidth examples.
    DmaSetup.VertSizeInput = height;
    DmaSetup.HoriSizeInput = width * 2;
    DmaSetup.Stride = width * 2;
    DmaSetup.FrameDelay = 0;
    DmaSetup.EnableCircularBuf = mode & VDMA_CIRC;
    DmaSetup.EnableSync = (mode & VDMA_INT_GENLOCK) ? 1 : 0;
    DmaSetup.PointNum = 0;
    DmaSetup.EnableFrameCounter = 0;
    DmaSetup.FixedFrameStoreAddr = 0;

    // Consecutive frame stores starting at start_address.
    for (i = 0, Addr = start_address; i < frame_stores; i++) {
        DmaSetup.FrameStoreStartAddr[i] = Addr;
        Addr += DmaSetup.Stride * height;
    }

    // Only set the number of frames if the VDMA can support more than we need
    // NOTE: the VDMA debug feature for writing the frame store num reg
    //       must be enabled.
    if (VDMAPtr->MaxNumFrames > frame_stores) {
        Status = XAxiVdma_SetFrmStore(VDMAPtr, frame_stores, direction);
        if (Status != XST_SUCCESS) {
            return XST_FAILURE;
        }
    }

    if (direction == XAXIVDMA_WRITE) {
        // use the TUSER bit for the frame sync for the write (S2MM) side
        XAxiVdma_FsyncSrcSelect(VDMAPtr, XAXIVDMA_S2MM_TUSER_FSYNC,
                                XAXIVDMA_WRITE);
    } else {
        if (mode & VDMA_S2MM_FSYNC) {
            // VDMA Read (MM2S side) for the scaler input must be synced
            // to the S2MM frame sync
            XAxiVdma_FsyncSrcSelect(VDMAPtr, XAXIVDMA_CHAN_OTHER_FSYNC,
                                    XAXIVDMA_READ); // DMA_CR[6:5] = 0b01
        } else {
            // VDMA 2 Read (MM2S side) must not be synced; it free-runs.
            // Its timing is governed by the output VTC generator
            // and AXI4-Stream to Video Out.
            XAxiVdma_FsyncSrcSelect(VDMAPtr, XAXIVDMA_CHAN_FSYNC,
                                    XAXIVDMA_READ); // DMA_CR[6:5] = 0b00
        }
    }

    Status = XAxiVdma_DmaConfig(VDMAPtr, direction, &DmaSetup);
    if (Status != XST_SUCCESS) {
        printf("ERROR: VDMA channel config failed %d\r\n", Status);
        return XST_FAILURE;
    }
    Status = XAxiVdma_DmaSetBufferAddr(VDMAPtr, direction,
                                       DmaSetup.FrameStoreStartAddr);
    if (Status != XST_SUCCESS) {
        return XST_FAILURE;
    }
    return 1;
}
////////////////////////////////////////////////////////////////////
// Initialize Scaler Driver
////////////////////////////////////////////////////////////////////
void scaler_init(XScaler *ScalerPtr, int device_id)
{
    int Status;
    XScaler_Config *ScalerCfgPtr;

    ScalerCfgPtr = XScaler_LookupConfig(device_id);
    if (!ScalerCfgPtr) {
        printf("ERROR: Scaler config lookup failed for device %d\r\n", device_id);
        return;
    }
    Status = XScaler_CfgInitialize(ScalerPtr, ScalerCfgPtr,
                                   ScalerCfgPtr->BaseAddress);
    if (Status != XST_SUCCESS) {
        printf("ERROR: Scaler Configuration Initialization failed %d\r\n", Status);
    }
}
////////////////////////////////////////////////////////////////////
// Scaler Configure/Setup
////////////////////////////////////////////////////////////////////
int scaler_setup(XScaler *ScalerInstPtr,
int ScalerInWidth, int ScalerInHeight,
int ScalerOutWidth, int ScalerOutHeight)
{
XScalerCoeffBank CoeffBank;
XScalerStartFraction StartFraction;
XScalerAperture Aperture;
/*
* Disable the scaler before setup and tell the device not to pick up
* the register updates until all are done
*/
XScaler_DisableRegUpdate(ScalerInstPtr);
XScaler_Disable(ScalerInstPtr);
/*
* Load a set of Coefficient values
*/
CoeffBank.SetIndex = COEFF_SET_INDEX;
CoeffBank.PhaseNum = ScalerInstPtr->Config.MaxPhaseNum;
CoeffBank.TapNum = ScalerInstPtr->Config.VertTapNum;
/*
* (sketch) point CoeffBank.CoeffValueBuf at a coefficient table and load
* it with XScaler_LoadCoeffBank(ScalerInstPtr, &CoeffBank); the
* coefficient data itself is elided in this listing.
*/
/*
* Load phase-offsets into scaler
*/
StartFraction.LumaLeftHori = 0;
StartFraction.LumaTopVert = 0;
StartFraction.ChromaLeftHori = 0;
StartFraction.ChromaTopVert = 0;
XScaler_SetStartFraction(ScalerInstPtr, &StartFraction);
/*
* Set up Aperture.
*/
Aperture.InFirstLine = 0;
Aperture.InLastLine = ScalerInHeight - 1;
Aperture.InFirstPixel = 0;
Aperture.InLastPixel = ScalerInWidth - 1;
Aperture.OutVertSize = ScalerOutHeight;
Aperture.OutHoriSize = ScalerOutWidth;
XScaler_SetAperture(ScalerInstPtr, &Aperture);
/*
* Set up phases
*/
XScaler_SetPhaseNum(ScalerInstPtr, ScalerInstPtr->Config.MaxPhaseNum,
ScalerInstPtr->Config.MaxPhaseNum);
/*
* Choose active set indexes for both vertical and horizontal directions
*/
XScaler_SetActiveCoeffSet(ScalerInstPtr, COEFF_SET_INDEX,
COEFF_SET_INDEX);
/*
* Commit the register updates (the caller enables the scaling
* operation with XScaler_Enable)
*/
XScaler_EnableRegUpdate(ScalerInstPtr);
return 1;
}
To a memory subsystem, this translates to bursts of video data the size of the active video frame, followed by burst gaps the length of the video blanking period. Therefore, for a given video frame there are periods that require a certain peak bandwidth, BWpeak, followed by quiescent periods with no data transmittal.
BWpeak is calculated from the data width in bits per pixel (bpp) and from the video pixel clock frequency, Fvid. Fvid can be calculated from the video frame rate Fframe (in frames per second), the number of lines per frame (including blanking lines), and the number of pixel clock cycles per line (including blanking cycles), shown in Equation 2-1:

Fvid = Fframe × TotalLinesPerFrame × TotalClockCyclesPerLine    (Equation 2-1)

BWpeak is calculated by multiplying the video pixel clock frequency by the number of bits per pixel, shown in Equation 2-2:

BWpeak = Fvid × bpp    (Equation 2-2)
The average bandwidth requirement, BWave, is defined as the total number of bits in a frame transferred over one entire video frame period (not just during the bursts), times the frame rate. The average bandwidth is always lower than the peak bandwidth requirement (Equation 2-3):

BWave = BitsPerFrame × Fframe    (Equation 2-3)

Equivalently, BWave is calculated by multiplying the frame rate by the number of active pixels per frame and the number of bits per pixel (Equation 2-4):

BWave = Fframe × ActiveLines × ActivePixelsPerLine × bpp    (Equation 2-4)
It is important to keep BWpeak and BWave in mind when designing video subsystems, as these numbers drive the clock frequencies and data widths of the video IP core(s) and of the memory subsystem.
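These relationships are easy to verify numerically. The following C sketch (helper names are illustrative) reproduces the 720p60 numbers used in the examples below, with Fvid = 74.25 MHz, BWpeak of about 1.19 Gb/s, and BWave of about 0.88 Gb/s at 16 bpp:

#include <stdio.h>

/* Video pixel clock: frame rate x total lines x total cycles per line. */
static double f_vid(double fframe, int total_lines, int total_ppl)
{
    return fframe * total_lines * total_ppl;
}

int main(void)
{
    double fframe = 60.0;
    int bpp = 16;

    /* 720p60: 750 total lines x 1650 total pixels, 1280x720 active */
    double fvid   = f_vid(fframe, 750, 1650);  /* 74.25 MHz  */
    double bwpeak = fvid * bpp;                /* ~1.19 Gb/s */
    double bwave  = fframe * 720 * 1280 * bpp; /* ~0.88 Gb/s */

    printf("Fvid = %.2f MHz\n", fvid / 1e6);
    printf("BWpeak = %.2f Gb/s, BWave = %.2f Gb/s\n",
           bwpeak / 1e9, bwave / 1e9);
    return 0;
}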
[Diagram: a video IP block exchanging data with a memory subsystem; each direction requires bandwidth BWmem.]
Bandwidth Examples
Scaling: Down-Scaling/Decimation
In the down-scaling system case, the input video frame is larger than the output video
frame. The average bandwidth of the output is less than the input.
Down-scaling Memory-to-Memory
Figure 2-17 shows an example of down-scaling a video frame. It assumes that the video
frame is read from external memory and written back to external memory. This allows for a
slower minimum operating clock frequency and a lower bandwidth requirement.
[Figure 2-17: down-scaling from a 1280x720 input frame to a 640x480 output frame, memory to memory.]
For the example in Figure 2-17, Table 2-1 shows the input and output minimum frequency
and minimum bandwidth requirements, assuming a data width of 16 and a 60
frames-per-second frame rate.
Figure 2-18 shows an example of down-scaling a video frame. It assumes that the input
video frame is from live external video and the output video frame is to live external video.
The bandwidth requirement in this case is the peak bandwidth and has to take into account
bursts of video at the higher frequency.
[Figure 2-18: down-scaling live video from a 1280x720 input (1650x750 total) to a 640x480 output (800x525 total).]
In the example in Figure 2-18, Table 2-2 shows the input and output minimum frequency
and minimum bandwidth requirements, assuming a data width of 16 and a 60
frames-per-second frame rate.
Table 2-2: Down-scaling Live-Video Example Minimum Bandwidth and Frequency Requirements

                    Input        Output
Minimum Frequency   74.25 MHz    25.20 MHz
Figure 2-19 shows a video system that includes live-external video (peak) bandwidth and
memory (average) bandwidth requirements.
[Figure 2-19 system diagram: camera input to Video In, then AXI4-Stream through VDMA 1, Scaler, VDMA 2, and Video Out to a display; the VDMAs connect to external memory over AXI4-MM.]
In Figure 2-19, 720p live video frames are written to external memory with a bandwidth of
1.19 Gb/s. These frames can be read at an average bandwidth of 0.88 Gb/s. These frames are
then downscaled and written at an average bandwidth of 0.29 Gb/s. The downscaled frames
can then be read from external memory at a peak bandwidth of 0.40 Gb/s to display to
external video.
Thus, the video input bandwidth is BWpeak of the input size, and the video output bandwidth is BWpeak of the output size. The intermediate memory read bandwidth is BWave of the input size, and the intermediate memory write bandwidth is BWave of the output size.
Table 2-3: Downscaling Subsystem Example Total Bandwidth and Minimum Frequency Requirements

                    Video Input      Memory Read   Memory Write   Memory Read       Total/Maximum
                    (Memory Write)                                (Video Output)
Minimum Frequency   74.25 MHz        55.3 MHz      18.43 MHz      25.20 MHz         74.25 MHz (Max)
Minimum Bandwidth   1.19 Gb/s        0.88 Gb/s     0.29 Gb/s      0.40 Gb/s         2.76 Gb/s (Sum)
                                                                                    1.48 Gb/s (W)
                                                                                    1.28 Gb/s (R)
Up-Scaling
In the up-scaling case, the input video frame is smaller than the output video frame. The
average bandwidth of the output is more than the input.
Up-scaling Memory-to-Memory
Figure 2-20 shows an example of up-scaling a video frame. It assumes that the video frame
is read from external memory and written back to external memory. This allows for a slower
minimum operating clock frequency and a lower bandwidth requirement.
[Figure 2-20: up-scaling from a 640x480 input frame to a 1280x720 output frame, memory to memory.]
For the example in Figure 2-20, Table 2-4 shows the input and output minimum frequency and minimum bandwidth requirements, assuming a data width of 16 and a 60 frames-per-second frame rate.
Table 2-4: Up-scaling Mem-to-Mem Example Minimum Bandwidth and Frequency Requirements

                    Input        Output
Minimum Frequency   18.43 MHz    55.3 MHz
Figure 2-21 shows an example of up-scaling a video frame. It assumes that the input video
frame is from live external video and the output video frame is to live external video. The
bandwidth requirement in this case is the peak bandwidth and has to take into account
bursts of video at the higher frequency.
[Figure 2-21: up-scaling live video from a 640x480 input (800x525 total) to a 1280x720 output (1650x750 total).]
For the example in Figure 2-21, Table 2-5 describes the input and output minimum frequency and minimum bandwidth requirements, assuming a data width of 16 and a 60 frames-per-second frame rate.
Table 2-5: Up-scaling Live-Video Example Minimum Bandwidth and Frequency Requirements

                    Input        Output
Minimum Frequency   25.20 MHz    74.25 MHz
Figure 2-22 shows a video system that includes live-external video (peak) bandwidth and
memory (average) bandwidth requirements.
[Figure 2-22 system diagram: camera input to Video In, then AXI4-Stream through VDMA 1, Scaler, VDMA 2, and Video Out to a display; the VDMAs connect to external memory over AXI4-MM.]
In Figure 2-22, 640x480p live video frames are written to external memory with a bandwidth of 0.40 Gb/s. These frames can be read at an average bandwidth of 0.29 Gb/s, up-scaled, and written back at an average bandwidth of 0.88 Gb/s. The upscaled frames can then be read from external memory at a peak bandwidth of 1.19 Gb/s for display to external video.
Thus, the video input bandwidth is BWpeak of the input size, and the video output bandwidth is BWpeak of the output size. The intermediate memory read bandwidth is BWave of the input size, and the intermediate memory write bandwidth is BWave of the output size.
Table 2-6: Up-scaling Subsystem Example Total Bandwidth and Minimum Frequency Requirements

                    Video Input      Memory Read   Memory Write   Memory Read       Total/Maximum
                    (Memory Write)                                (Video Output)
Minimum Frequency   25.20 MHz        18.43 MHz     55.3 MHz       74.25 MHz         74.25 MHz (Max)
Minimum Bandwidth   0.40 Gb/s        0.29 Gb/s     0.88 Gb/s      1.19 Gb/s         2.76 Gb/s (Sum)
                                                                                    1.28 Gb/s (W)
                                                                                    1.48 Gb/s (R)
Zoom
In the zoom system case, the input video frame is the same size as the output video frame, so the average bandwidth at the output is the same as at the input.
Zoom Memory-to-Memory
Figure 2-23 shows an example of zooming a video frame. It assumes that the video frame is
read from external memory and written back to external memory. This allows for a slower
minimum operating clock frequency and a lower bandwidth requirement.
X-Ref Target - Figure 2-23
Figure 2‐23: Zoom Memory-to-Memory Example (1280x720 input frame, 1280x720 output frame)
For the example in Figure 2-23, Table 2-7 shows the minimum input and output frequency and bandwidth requirements, assuming a data width of 16 bits and a frame rate of 60 frames per second.
Table 2‐7: Zoom Mem-to-Mem Example Minimum Bandwidth and Frequency Requirements

                    Input       Output
Minimum Frequency   55.3 MHz    55.3 MHz
Minimum Bandwidth   0.88 Gb/s   0.88 Gb/s
Figure 2-24 shows an example of zooming a video frame. It assumes that the input video frame comes from live external video and the output video frame is sent to live external video.
X-Ref Target - Figure 2-24
Figure 2‐24: Zoom Live-Video Example (1280x720 active pixels in a 1650x750 frame at both input and output)
For the example in Figure 2-24, Table 2-8 describes the minimum input and output frequency and bandwidth requirements, assuming a data width of 16 bits and a frame rate of 60 frames per second.
Table 2‐8: Zoom Live-Video Example Minimum Bandwidth and Frequency Requirements

                    Input       Output
Minimum Frequency   74.25 MHz   74.25 MHz
Minimum Bandwidth   1.19 Gb/s   1.19 Gb/s
Figure 2-25 shows a video system that includes live-external video (peak) bandwidth and memory (average) bandwidth requirements.
X-Ref Target - Figure 2-25
Figure 2‐25: Zoom Subsystem Example (live camera input written to external memory through AXI VDMA 1 over AXI4-MM, read back and scaled by the AXI4-Stream Scaler, written back through AXI VDMA 2, and read out to the display)
In Figure 2-25, 1280x720p live video frames are written to external memory with a bandwidth of 1.19 Gb/s. A 640x480 region in the video frame is read at an average bandwidth of 0.29 Gb/s. These frames are then up-scaled and written at an average bandwidth of 0.88 Gb/s. The up-scaled frames can then be read from external memory at a peak bandwidth of 1.19 Gb/s to display to external video (the same as the input).
Thus, the video input bandwidth is the peak bandwidth (BW_peak) of the input size, and the video output bandwidth is BW_peak of the output size. The intermediate memory read bandwidth is BW_ave of the selected region size, and the intermediate memory write bandwidth is BW_ave of the output size.
Table 2‐9: Zoom Subsystem Example Total Bandwidth and Minimum Frequency Requirements

                    Video Input      Memory      Memory      Memory Read       Total/
                    (Memory Write)   Read        Write       (Video Output)    Maximum
Minimum Frequency   74.25 MHz        18.43 MHz   55.3 MHz    74.25 MHz         74.25 MHz (Max)
Minimum Bandwidth   1.19 Gb/s        0.29 Gb/s   0.88 Gb/s   1.19 Gb/s         3.55 Gb/s (Sum)
                                                                               2.07 Gb/s (W)
                                                                               1.48 Gb/s (R)
IP Development Guide
IP Parameterization
General IP configuration parameters are not covered in this specification. However, commonly used video IP parameters are listed in Table 3-1.
Only one video format can be supported in video IP core systems that use an AXI4 interface without an embedded processor. For this configuration (C_HAS_AXI4_LITE=0), you can define the supported resolution through the generic parameters C_ACTIVE_ROWS and C_ACTIVE_COLS defined in the core GUI. When C_HAS_AXI4_LITE=0, C_MAX_COLS should be equal to C_ACTIVE_COLS.
When an embedded processor is present and the Video core is instantiated with an
AXI4-Lite interface (C_HAS_AXI4_LITE=1), generic parameters C_ACTIVE_ROWS and
C_ACTIVE_COLS assign default values to control registers to define the active resolution.
As an upper bound on the active scanline length supported by the core instance,
C_MAX_COLS is used to define line buffer depths, which have a direct effect on block RAM
footprint. For example, a video core, instantiated to service 720p video (1650 total pixels,
1280 active pixels per line), needs to have C_MAX_COLS set to 1280. This core instance is not able to service 1080p video, but works with 720p or any lower resolution, such as 480p, when the active_size register in the AXI4-Lite control interface is programmed for 720p or 480p.
In other words, C_MAX_COLS refers to the maximum number of non-blank pixels per scan line that a core instance must service, and it is often used to allocate block RAMs for line buffers within the core.
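To make the relationship concrete, the following elaboration-time sketch shows how a core might size its line buffer from C_MAX_COLS and check C_ACTIVE_COLS against it. The module name, the port list, and the message texts are illustrative assumptions, not a defined interface.

// Minimal sketch: line-buffer depth derives from C_MAX_COLS, and an
// elaboration-time check enforces the constraints described above.
module video_core_params #(
  parameter C_HAS_AXI4_LITE = 0,     // 0: fixed resolution, no processor
  parameter C_MAX_COLS      = 1280,  // upper bound on active line length
  parameter C_ACTIVE_COLS   = 1280,  // active pixels per scan line
  parameter C_DATA_WIDTH    = 16
) ();
  // one line buffer sized for the worst-case supported scan line;
  // its depth directly sets the block RAM footprint
  reg [C_DATA_WIDTH-1:0] line_buf [0:C_MAX_COLS-1];

  initial begin
    if (C_ACTIVE_COLS > C_MAX_COLS) begin
      $display("ERROR: C_ACTIVE_COLS (%0d) exceeds C_MAX_COLS (%0d)",
               C_ACTIVE_COLS, C_MAX_COLS);
      $finish;
    end
    if (C_HAS_AXI4_LITE == 0 && C_ACTIVE_COLS != C_MAX_COLS)
      $display("WARNING: with C_HAS_AXI4_LITE=0, C_MAX_COLS should equal C_ACTIVE_COLS");
  end
endmodule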
General IP Structure
Video IP cores should provide an AXI4-Lite interface option that allows dynamic reads and writes of processing parameters, status and control data, and timing parameters. For embedded systems using either a processor or dedicated IP acting as the AXI4-Lite master, an AXI4-Lite interface should be provided with a standardized register API. For systems without an embedded processor, video cores should provide a way to be instantiated that supports one fixed video resolution.
Figure 3-1 is a schematic for a typical video processing core with one AXI4-Stream slave input, one AXI4-Stream master output, and an AXI4-Lite interface. In this example, the IP core processing function and the input and output AXI4-Stream interfaces are part of the same clock domain (ACLK), but the AXI4-Lite interface is in the AXI4-Lite processor clock domain. Typically, the AXI4-Lite interface does not use the same clock as the AXI4-Stream video slave and master interfaces. Therefore, the IP should contain Clock-Domain Crossing (CDC) logic to facilitate re-sampling the AXI4-Lite register data to the processing core clock domain.
X-Ref Target - Figure 3-1
[Figure 3-1 shows the core Signal Processing Function with DATA, VALID, and READY signals on both the slave and master AXI4-Stream sides; CE, SCLR, and IRQ control signals; ACLK, ACLKEN, and ARESETn inputs; and a core AXI4-Lite register interface whose sw_en, sw_rst, user, and timing outputs are resampled through CDC logic into the processing clock domain.]
Figure 3‐1: General Video IP Structure with AXI4-Lite and AXI4-Stream Interfaces
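The CDC block in Figure 3-1 can be as simple as a toggle-synchronized capture register. The following sketch is one common way to implement it; the module and signal names are illustrative assumptions, and a production core would typically add reset handling and synchronizer attributes.

// Minimal CDC sketch: a write in the AXI4-Lite clock domain flips a
// toggle; the toggle is double-flopped into the core clock domain, and
// the (quasi-static) payload is sampled only after the resynchronized
// edge is seen, by which time the payload has been stable for cycles.
module reg_cdc #(parameter W = 32) (
  input              lite_clk,   // AXI4-Lite processor clock
  input              lite_wr,    // register write strobe (lite domain)
  input  [W-1:0]     lite_data,  // register payload (lite domain)
  input              core_clk,   // processing core clock (ACLK)
  output reg [W-1:0] core_data   // resampled register (core domain)
);
  reg         tgl  = 1'b0;
  reg [W-1:0] hold = {W{1'b0}};
  always @(posedge lite_clk)
    if (lite_wr) begin
      tgl  <= ~tgl;       // mark "new value available"
      hold <= lite_data;  // payload held stable until the next write
    end

  reg [2:0] sync = 3'b000;  // two synchronizer flops plus edge detect
  always @(posedge core_clk) begin
    sync <= {sync[1:0], tgl};
    if (sync[2] ^ sync[1])  // resynchronized toggle changed
      core_data <= hold;    // safe to capture the settled payload
  end
endmodule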
All video IP cores should contain control logic to govern the propagation of VALID and
READY signals, enable/disable/initialize the core Signal Processing Function, manage
internal buffers, generate SOF and EOL signals, and monitor error conditions. See READY –
VALID Propagation and Flushing Pipelined Cores for more information.
AXI4-Lite Interface
Many video applications have an embedded processor that can dynamically monitor and
control processing parameters within IP cores. The AXI4-Lite interface provides a
standardized API across which core functionality can be controlled and monitored. Layers of
the API consist of a memory-mapped interface with programmable registers, a low-level driver to identify physical memory locations, and higher-level driver functions to control
multiple registers or complex processes. The proposed standard set of memory mapped
registers is described in Table 3-2.
Control Register
The SW_ENABLE flag, located on bit 0 of the CONTROL register, allows the core to be dynamically enabled or disabled. Disabling the core from software has a similar effect to deasserting ACLKEN. When disabled, the core AXI4-Lite decoding units remain active to facilitate re-enabling the core. The default value of SW_ENABLE is 1 (enabled).
Flags of the CONTROL register are not buffered, which means changes take effect
immediately. The application or higher-level driver functions need to deassert these flags to
re-enable status/error acquisition.
If the core does not provide an AXI4-Lite interface, the IP should be configured to provide
notification of critical status and error events through a dedicated set of pins. These pins
can be connected to an external interrupt controller (INTC) core in an EDK system to
facilitate interrupt requests, identification, and clearing of interrupt sources. For this
application, it is recommended that the dedicated output signals remain asserted only as
long as the status or error event persists.
Any bit of the STATUS register can generate a host-processor interrupt request through the IRQ pin. The IRQ_ENABLE register facilitates selecting which bits of the STATUS register assert IRQ. Bits of the STATUS register are masked by (AND) the corresponding bits of the IRQ_ENABLE register, and the resulting terms are combined (OR) together to generate IRQ.
For more information, see Debugging Features.
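The masking scheme above reduces to a single AND/OR expression. The sketch below shows it with a registered output; the module name and port widths are illustrative assumptions.

// Minimal sketch: each STATUS bit is masked (AND) by its IRQ_ENABLE bit,
// and the masked terms are OR-reduced onto the single IRQ output.
module irq_gen #(parameter W = 32) (
  input              aclk,
  input  [W-1:0]     status,      // STATUS register flags
  input  [W-1:0]     irq_enable,  // IRQ_ENABLE register mask
  output reg         irq          // interrupt request to the host
);
  always @(posedge aclk)
    irq <= |(status & irq_enable);  // AND mask, then OR-combine
endmodule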
Bit fields of the Version register facilitate software identification of the exact version of the
hardware peripheral incorporated into a system. The core driver can use this Read-Only
value to verify that the software version is matched to the hardware. For more information,
see Debugging Features.
The SYSDEBUG0, or Frame Throughput Monitor, register indicates the number of frames processed since power-up or the last time the core was reset. The SYSDEBUG1, or Line Throughput Monitor, register indicates the number of lines processed, and the SYSDEBUG2, or Pixel Throughput Monitor, register indicates the number of pixels processed over the same interval. The SYSDEBUG registers can be useful to identify external memory, frame buffer, or throughput bottlenecks in a video system. For more information, see Debugging Features.
Register Synchronization
Most control registers that provide frame-by-frame control over processing should be
double-buffered to ensure no image tearing occurs if register values are modified while a
frame is being processed. Exceptions are registers which command immediate actuation
(CONTROL, STATUS, ERROR and IRQ_ENABLE registers) or need to be changed multiple
times within a frame (a readout or coefficient address register). With double buffering, register writes update the first set of registers while the processing core uses values from a second set of registers. All writable registers are also readable: any read from a writable register returns the value stored in the first set of registers.
A semaphore mechanism allows you to update multiple registers atomically, without requiring all updates to take place within a single frame or between frames.
Values from the first register set should be copied over (committed) to the second register
set when processing cores receive the SOF signal and semaphore flag REG_UPDATE,
located on bit 1 of register CONTROL, is set.
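A per-register implementation of this scheme needs only two flops per bit and a commit strobe. The following sketch is a minimal illustration; the module and port names are assumptions, and the write strobe is assumed to be already resynchronized to the core clock.

// Minimal double-buffering sketch: software writes land in the shadow
// (first) register set; the active (second) set used by the processing
// core is refreshed only at SOF when the REG_UPDATE semaphore is set,
// so a mid-frame write can never tear the frame being processed.
module dbl_buf_reg #(parameter W = 32) (
  input              aclk,
  input              wr_en,       // AXI4-Lite write strobe (core domain)
  input  [W-1:0]     wr_data,
  input              sof,         // start-of-frame pulse
  input              reg_update,  // CONTROL register bit 1 (semaphore)
  output reg [W-1:0] shadow,      // first set: written and read back by SW
  output reg [W-1:0] active       // second set: used by the core
);
  always @(posedge aclk) begin
    if (wr_en)
      shadow <= wr_data;          // reads from software also return this set
    if (sof && reg_update)
      active <= shadow;           // commit between frames only
  end
endmodule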
Timing Representation
Timing information captures the phase/edge relationships between four periodic timing signals: VBlank, VSync, HBlank, and HSync.
Timing detector/timing generator modules provided as part of the Xilinx Video Timing
Controller core measure and regenerate timing signals. For an embedded processor with
AXI4-Lite interface, measured timing information is accessible through a standardized register set, described in Table 3-3.
Blank/Sync Polarities
The input interface core automatically detects if timing signals (VSync, HSync, VBlank, HBlank) are inverted. Periodic sync pulses are defined as Active Low if the low portion of the signal is shorter than the high portion (the signal pulses low). Bits 0 and 1 of timing variable POLARITY correspond to VSync and HSync respectively, and should be set to 1 when Active Low sync pulses are detected, or to 0 when they are not.
Periodic blank signals are defined as Active Low if the low portion of the signal is shorter than the high portion, because an active area is expected to be longer than the blanked area. Bits 2 and 3 of timing variable POLARITY correspond to VBlank and HBlank respectively, and should be set to 1 when Active Low blank signals are detected, or to 0 when they are not.
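The shorter-phase rule above can be implemented with two free-running counters compared once per period. The sketch below is a minimal, illustrative detector; the counter widths and names are assumptions, and a real core would saturate the counters and qualify the result over several periods.

// Minimal polarity-detector sketch: count the clock cycles spent in the
// high and low phases of a periodic sync/blank signal; if the low phase
// is the shorter one at the end of a period, the signal is Active Low.
module pol_detect (
  input      clk,
  input      sig,          // periodic VSync/HSync/VBlank/HBlank input
  output reg active_low    // 1: Active Low polarity detected
);
  reg [31:0] hi_cnt = 0, lo_cnt = 0;
  reg        sig_d = 1'b0;
  always @(posedge clk) begin
    sig_d <= sig;
    if (sig) hi_cnt <= hi_cnt + 1;
    else     lo_cnt <= lo_cnt + 1;
    if (sig && !sig_d) begin             // rising edge: one full period seen
      active_low <= (lo_cnt < hi_cnt);   // shorter low phase => Active Low
      hi_cnt     <= 32'd0;               // restart the measurement
      lo_cnt     <= 32'd0;
    end
  end
endmodule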
The field periods for even (F0) and odd (F1) fields can differ. A frame period for interlaced video is defined by the sum of two subsequent (odd + even) field periods. The frame period for both interlaced and progressive video is expected to be constant for any given video format.
The intervals when both HBlank and VBlank are inactive mark the active video area of the
frame, where pixel data is considered valid and should be translated from a periodic
standard such as DVI to AXI4-Stream.
X-Ref Target - Figure 3-2
[Figure 3-2 shows the frame as a set of rectangles: the active video area bounded by the horizontal and vertical blanking areas, with the VBlank, VSync, HBlank, and HSync edges (SAV/EAV points) and the HSIZE dimension marked.]
The frame period contains blank and active areas and can be visualized as a set of
rectangles, as seen in Figure 3-2. In the top-left corner of the frame, pixel index 0 (scan line
index 0) is designated to be the first active pixel on the first complete active line.
The timing variable VSIZE reflects the total number of scan lines per frame, both active and blank. The index of the last scan line in a frame is VSIZE-1.
The number of video clock cycles between the HBlank pulses is expected to be equal to the
number of video clock cycles between the HSync pulses in each field. The timing variable
HSIZE reflects the total number of active and blank pixels per scan line. The index of the
last pixel in scan lines is HSIZE-1.
The Xilinx Video Timing Controller IP works with complete scan lines, so the total number of video clock cycles in a frame period is expected to be an integer multiple of the total number of pixels per scan line (that is, HSIZE * VSIZE cycles per frame).
For progressive video, the period between the VBlank pulses is expected to have the same
number of video clock cycles as the period between the VSync pulses. For interlaced video,
the number of total scan lines in even and odd fields can differ. Therefore, two sets of timing
registers (F0 for even fields and F1 for odd fields) keep track of timing variables for
interlaced video fields.
For progressive video, only the F0 bank of timing registers is used.
The falling and rising edges of VBlank might not coincide with the falling edge of HBlank, which can be visualized as VBlank falling on a pixel position other than 0 in a scan line (Figure 3-2). Also, the phase difference between VBlank and HBlank can change between even and odd fields. The horizontal positions of the falling and rising edges of VBlank are captured in the nibbles of registers F0_VBLANK_H and F1_VBLANK_H.
The phase relationships of the VSync and HSync signals can be arbitrary in relationship to
the first active pixel, the origin of the V/H coordinate system (Figure 3-2), and might be
different between even and odd fields. Nibbles in registers F0_VSYNC_V and F0_VSYNC_H
capture the horizontal and vertical positions of falling and rising edges of VSYNC for even
fields. Similarly, nibbles in registers F1_VSYNC_V and F1_VSYNC_H capture the horizontal
and vertical positions of falling and rising edges of VSYNC for odd fields.
The scan line index where VBlank transitions high (VBlank start) marks the vertical end of the active area and the start of the vertical blank area. The pixel index where HBlank transitions high (HBlank start) marks the horizontal end of the active area and the start of the horizontal blank area.
Nibbles of the timing register ACTIVE_SIZE denote the vertical (number of scan lines) and horizontal (number of pixels) sizes of the active area.
Frame Encoding
Bits 0 to 3 (VIDEO_FORMAT) of the ENCODING register define the sampling structure of video using the video format codes (VF) defined in Table 1-4. Bits 4-5 define the data representation, the number of bits per component channel, as defined in Table 3-4.
Bit 6 (INTERLACED) should be set to 1 if the video being processed is interlaced, or to 0 for progressive video. Correspondingly, bit 7 indicates the field polarity (0: even field, 1: odd field) when interlaced video is used. Processing cores should not expect the host processor to update this register value on a frame-by-frame basis. Instead, the IP is expected to toggle the field polarity automatically after processing each field, using the programmed value as the initial value for the first field after the value is committed.
Bit 8 (CHROMA_PARITY) of the ENCODING register specifies whether the first line of video contains chroma information (1) or not (0) when YUV 420 encoded video is being processed. Processing cores should not expect the host processor to update this register value on a line-by-line basis to reflect whether the current line contains chroma or not. Instead, the IP is expected to toggle the value automatically after each line is processed, using the programmed value as the initial value for the first line of the first frame after the value is committed. Table 3-5 provides example values for timing variable assignments for typical video standards using 8-bit data.
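Taken together, the field positions above give the following unpacking, shown as a minimal sketch; the wrapper module and output names are illustrative, while the bit positions follow the text.

// Minimal sketch: unpacking the ENCODING register fields described above.
module encoding_fields (
  input  [31:0] encoding,
  output [3:0]  video_format,   // bits 0-3: VF code (Table 1-4)
  output [1:0]  data_repr,      // bits 4-5: bits per component (Table 3-4)
  output        interlaced,     // bit 6: 1 = interlaced, 0 = progressive
  output        field_polarity, // bit 7: 0 = even field, 1 = odd field
  output        chroma_parity   // bit 8: first line carries chroma (YUV 420)
);
  assign video_format   = encoding[3:0];
  assign data_repr      = encoding[5:4];
  assign interlaced     = encoding[6];
  assign field_polarity = encoding[7];
  assign chroma_parity  = encoding[8];
endmodule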
Input/Output Timing
The recommended design convention for AXI4-Stream component interfaces suggests that
outputs should be registered or driven directly by flip-flops or FIFO/block RAM primitives.
Ideally, inputs are also registered but can be combinatorial. Combinatorial inputs can limit
Fmax so the amount of combinatorial logic present on inputs should be limited.
There must be no combinatorial paths between input and output signals, either within a single master or slave interface or across separate AXI4-Stream interfaces. In some cases, outputs driven by combinatorial logic of registered internal state are a suitable design choice or a reasonable design trade-off, such as when latency is critical. The IP core data sheet describes any AXI4-Stream output signals that are not registered.
Buffering Requirements
The output interface module does not start generating valid output frames until it receives
valid data on its input AXI4-Stream interface. However, after periodic output frame
generation starts, all cores in the processing pipeline should be able to provide data at the
rate required by the output standard.
For most output standards, three different data rates should be defined. As an example, the rates for 720p30 video over DVI are used; Table 3-6 describes the three data rates.
Identifying the above rates helps determine what type of buffering is necessary, if any, within or between processing cores.
• If a processing core can maintain the active pixel rate indefinitely, such as a test-pattern generator core, no buffering is necessary.
• If a processing core cannot maintain the active pixel rate but can maintain the line pixel
rate, a line buffer is necessary on the processing core output.
• If a processing core cannot maintain the line pixel rate but can maintain the frame pixel
rate, a frame buffer is necessary on the processing core output. It is assumed that the
frame buffer IP also contains line buffers to smooth access bursts.
• If a processing core cannot maintain the frame pixel rate due to insufficient
throughput, no amount of buffering is sufficient to produce uninterrupted output
video for the desired output standard.
In this example (Figure 3-3), a hypothetical output interface needs to generate frames with 320 clock cycles per line, with 200 active pixels per line. The external memory interface retrieves pixels in 64-pixel bursts, after which it is unavailable for 16 clock cycles. Core A takes three clock cycles to generate two output pixels. Core B takes three line periods to generate two active lines (no output for 960 pixels, then 400 pixels consecutively).
X-Ref Target - Figure 3-3
Although all cores (external memory, Core A, Core B) have the throughput necessary to generate 200 pixels per 320 clock cycles on average, the throughput degrades when they are connected as a system unless there are line buffers on each core output. For example, if the external memory provides data in 64-cycle bursts, Core A produces 42 output samples per burst, or about 170 pixels per line. Core A requires the whole line period to produce the active pixels, but it is forced to idle during the 4x16 cycles when the external memory is not available.
To avoid processing bubbles, all cores should be appropriately buffered on the output of
the core as if the core was driving the output interface directly. Figure 3-3 illustrates the
scenario when processing cores can maintain the line-pixel rate, but cores need output
buffers to avoid processing bubbles. Green arrows represent subsequent cores reading
from the output buffers of preceding cores.
Buffer Management
Even if sufficiently deep line buffers (FIFOs) are present on the output of processing cores, bubbles can form if buffers under-run. This can happen when a core master interface asserts its VALID output immediately when the core output FIFO is not empty. In this case, data percolates through the processing pipeline rapidly and triggers the output interface to start output timing generation, after which output pixels have to be supplied consistently. If any core cannot sustain the uninterrupted data rate and has to deassert its VALID output, processing bubbles form, which eventually cause a buffer under-run at the output interface core and break the output data-sync alignment.
X-Ref Target - Figure 3-4
Figure 3-4 presents an example scenario in which processing cores A and B run out of valid samples mid-frame. When the output interface asserts its READY output to start a new line, samples must first be retrieved from external memory and processed by Core A and Core B, causing a significant delay that can break the sync-data alignment at the output interface.
To avoid processing bubbles, cores should not assert the VALID signal on their output interfaces until internal FIFOs are almost full, and should then keep VALID asserted until output FIFOs and internal pipeline stages are empty.
The READY output should be driven in a greedy fashion; asserted unless all pipeline stages
are full, internal FIFOs are almost full, and the master interface READY is sampled low, as
described in READY – VALID Propagation, or internal pipelines need to be flushed as
described in Flushing Pipelined Cores. This behavior ensures processing efficiency and
proper flushing of pipelines and processing systems at line and frame ends.
As stated in Input/Output Timing, the READY output on the slave interface and the VALID output on the master interface must be registered. This requirement inserts a propagation delay of at least one clock cycle between a READY deassertion sampled on the IP core master interface input and the READY output on the slave interface. The logic controlling these outputs, as well as the latching of new pixels from the slave interface into internal FIFOs or pipeline registers, must consider the scenario in which all internal buffers (pipeline registers and FIFOs) are full, the downstream slave interface has just deasserted READY, but the upstream master sends one more pixel because the READY output of the core slave interface lags behind.
To avoid pixel drops in the above situation, pipelined cores without internal FIFOs should
contain one (or more) additional pipeline stage(s) to accept one (or more) pixel(s). These
cores should keep the master interface READY output deasserted until the extra pipeline
stage is processed.
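One workable form of such an extra stage is the skid buffer sketched below: the skid register catches the single in-flight beat that arrives after the full condition is reached, exactly because the registered READY lags by one clock. Module and signal names are illustrative; this is a sketch of the technique, not an excerpt from any Xilinx core.

// Minimal one-deep skid-buffer sketch for a registered-READY pipeline.
module skid_buf #(parameter W = 24) (
  input              aclk,
  input              aresetn,
  input  [W-1:0]     s_data,    // slave (input) side
  input              s_valid,
  output reg         s_ready,   // registered, lags by one clock
  output reg [W-1:0] m_data,    // master (output) side
  output reg         m_valid,
  input              m_ready
);
  reg [W-1:0] skid_q;           // catches the one extra in-flight beat
  reg         skid_v;

  wire in_fire  = s_valid && s_ready;  // beat accepted from upstream
  wire out_fire = m_valid && m_ready;  // beat consumed downstream
  // the skid fills only when a beat lands while the output stage is stalled
  wire skid_nxt = skid_v ? ~out_fire
                         : (in_fire && m_valid && !out_fire);

  always @(posedge aclk) begin
    if (!aresetn) begin
      s_ready <= 1'b0;  m_valid <= 1'b0;  skid_v <= 1'b0;
    end else begin
      skid_v  <= skid_nxt;
      s_ready <= ~skid_nxt;     // deassert while the caught beat drains
      if (skid_v && out_fire)
        m_data <= skid_q;       // drain the caught beat first
      else if (in_fire && (!m_valid || out_fire))
        m_data <= s_data;       // output stage free: pass straight through
      else if (in_fire)
        skid_q <= s_data;       // output stalled: catch the extra beat
      m_valid <= skid_v | in_fire | (m_valid & ~m_ready);
    end
  end
endmodule

For cores with internal FIFOs, the same one-clock lag is usually absorbed by driving READY from an almost-full threshold rather than from a completely full condition, as the bullets below describe.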
To mitigate the pixel drop condition for cores with internal FIFOs the master interface
READY output should be asserted unless:
• all pipeline stages are full, internal FIFOs are almost full, and the master interface
READY is sampled low.
• internal pipelines need to be flushed.
Processing pipelines must be flushed at the end of each scan line. If samples for the next line (and next frame) are available immediately, processing cores can use these samples. If samples are not available, processing cores can flush pipelines by repeating the last valid pixel or by applying a more sophisticated edge-padding solution. If padding by zeros or repeated samples from the next line is needed in preparation for the next line or next frame, a processing core might deassert the READY output on its slave interface for as many clock cycles as it takes to empty valid data samples from the pipeline or to pad and re-initialize for a new line.
Example IP
s_axis_video_tdata m_axis_video_tdata
s_axis_video_tvalid m_axis_video_tvalid
s_axis_video_tready m_axis_video_tready
s_axis_video_tlast m_axis_video_tlast
s_axis_video_tuser m_axis_video_tuser
aclk
aclken
aresetn
Figure 3‐5: Simple Video IP with One Slave and One Master AXI4-Stream Interfaces
When flushing is completed and the pipeline is empty, processing cores should assert the
READY output signals on the slave interfaces irrespective of the READY inputs of the master
interfaces, as seen in the READY_out and READY_in signals of Figure 3-5 and described in
READY – VALID Propagation.
X-Ref Target - Figure 3-6
Figure 3‐6: Inefficient Flushing Growing a Processing Bubble at the End of Frame
If the READY output signal (READY_out) assertion is delayed until the slave interface READY_in is asserted, subsequent cores keep inserting longer breaks between lines/frames, as illustrated in Figure 3-6. In this example, the gap between frames/lines of the input stream grows because the flushing periods of subsequent cores accumulate if the IP core holds off re-asserting its READY_out output until its READY_in is asserted.
When SOF is detected early, the output SOF signal should be generated early as well,
meaning the previous frame is not padded to match programmed frame dimensions. When
SOF is detected late, extra lines/pixels from the previous frame should be dropped and the
output SOF signal should be generated according to the programmed values.
In accordance with End of Line Signal in Chapter 1, complex video IP can detect a discrepancy between the expected number of active pixels, as programmed by timing variables, and the actual number of valid pixels received between consecutive EOL pulses.
When EOL is detected early, the output EOL signal should be generated early as well, meaning the previous line is not padded to match the programmed line width. When EOL is detected late, the output EOL signal should be generated according to the programmed values, and extra pixels from the previous line should be dropped.
Interframe Reinitialization
Some video IP cores, such as the Image Statistics and Image Characterization cores, take thousands of clock cycles to initialize between frames because block RAMs holding statistical data must be cleared or large sets of metadata must be written to external memory.
As a general recommendation, video IP cores should re-initialize at the end of the frame,
instead of at the beginning of the frame when the SOF pulse is received.
Interrupt Subsystem
Video processing cores should provide an optional interrupt pin (IRQ). Timing and core-function related STATUS and ERROR flags, described in Table 3-2, should be individually selectable to generate an interrupt.
In EDK, the interrupt controller (INTC) IP can be used to integrate IRQ pins for the system processor. For Vivado® tools, you might need to create a custom-built priority interrupt controller to aggregate interrupt requests and identify interrupt sources.
Video IP core APIs, including registers and driver functions, should enable application
software developers to identify and clear interrupt sources within the IP.
Debugging Features
The following sections recommend video IP core features that ease and accelerate system design, bring-up, and debugging.
Version Register
Bit fields of the Version Register facilitate identification of the exact version of the hardware
peripheral incorporated into a system. The core driver uses this Read-Only value to verify
that the software is matched to the correct version of the hardware.
Bypass and Test-Pattern Modes
Use the BYPASS flag, located on bit 4 of the CONTROL register, to turn bypassing on (1) or off (0). For single-clock-domain IP cores, this switch can control multiplexers in the AXI4-Stream path. For applications where the input and output AXI4-Stream interfaces are in different clock domains, the bypass multiplexers select between a clock-domain crossing FIFO implemented using distributed memory and the actual video processing core.
Use the TEST_PATTERN flag, located on bit 5 of the CONTROL register, to turn test-pattern generation on (1) or off (0). This switch can control multiplexers driving the AXI4-Stream master output and switch between the regular core processing output and the test-pattern generator. When enabled, a set of counters should generate 256 scan lines of color bars, each color bar 64 pixels wide, repetitively cycling through the colors Black, Red, Green, Yellow, Blue, Magenta, Cyan, and White until the end of each scan line. After the color-bars segment, the remainder of the frame should be filled with a monochrome horizontal + vertical ramp.
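The pattern described above reduces to a simple combinatorial decode of the pixel and line counters. The following sketch is illustrative only (8-bit RGB components and 12-bit counters are assumptions); it exploits the fact that the listed color sequence is a binary count across the R, G, and B channels.

// Minimal test-pattern sketch: 256 lines of 64-pixel-wide color bars
// (Black, Red, Green, Yellow, Blue, Magenta, Cyan, White), then a
// monochrome horizontal + vertical ramp for the rest of the frame.
module test_pattern (
  input  [11:0] x,        // active pixel index within the scan line
  input  [11:0] y,        // active line index within the frame
  output [23:0] data      // {R, G, B}, 8 bits per component
);
  // the bar index advances every 64 pixels and wraps through 8 colors;
  // in the listed order, R, G, and B follow the bar-index bits directly
  wire [2:0] bar = x[8:6];
  wire [7:0] r   = {8{bar[0]}};   // Red set on bars 1, 3, 5, 7
  wire [7:0] g   = {8{bar[1]}};   // Green set on bars 2, 3, 6, 7
  wire [7:0] b   = {8{bar[2]}};   // Blue set on bars 4, 5, 6, 7
  // monochrome ramp: intensity increases left-to-right and top-to-bottom
  wire [7:0] ramp = x[7:0] + y[7:0];
  assign data = (y < 12'd256) ? {r, g, b} : {ramp, ramp, ramp};
endmodule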
Throughput Monitors
To debug frame-buffer bandwidth limitation issues, and if possible allow video application
software to balance memory pathways, video IP cores should offer frame, line, and pixel
counter registers.
The recommended names and locations of these registers are SYSDEBUG0, SYSDEBUG1, and SYSDEBUG2, as indicated in Table 3-2. The registers should initialize to 0 after reset, but the core might implement additional mechanisms to clear the counters.
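A minimal sketch of such counters is shown below; the port names (sof, eol, xfer) and the 32-bit widths are illustrative assumptions, with the register names taken from Table 3-2.

// Minimal throughput-monitor sketch: frame, line, and pixel counters,
// cleared by reset and advanced by SOF, EOL, and valid transfers.
module tput_mon (
  input             aclk,
  input             aresetn,
  input             sof,        // transfer carrying start-of-frame (tuser)
  input             eol,        // transfer carrying end-of-line (tlast)
  input             xfer,       // tvalid && tready this cycle
  output reg [31:0] sysdebug0,  // frames processed since reset
  output reg [31:0] sysdebug1,  // lines processed since reset
  output reg [31:0] sysdebug2   // pixels processed since reset
);
  always @(posedge aclk)
    if (!aresetn) begin
      sysdebug0 <= 32'd0;
      sysdebug1 <= 32'd0;
      sysdebug2 <= 32'd0;
    end else begin
      if (xfer && sof) sysdebug0 <= sysdebug0 + 1;
      if (xfer && eol) sysdebug1 <= sysdebug1 + 1;
      if (xfer)        sysdebug2 <= sysdebug2 + 1;
    end
endmodule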
Tool Support
For more information on designing and delivering IP using Vivado tools, see the Vivado
tools documentation at:
https://www.xilinx.com/cgi-bin/docs/rdoc?v=2018.3;t=vivado+userguides
EDK Compatibility
For native Xilinx EDK support, video IP must have a peripheral descriptor file (.mpd file), a
user interface file (.mui file), and driver files. The MPD file lists IP parameters and ports, and
identifies clock, reset, and interrupt pins.
Xilinx Resources
For support resources such as Answers, Documentation, Downloads, and Forums, see Xilinx
Support.
Solution Centers
See the Xilinx Solution Centers for support on devices, software tools, and intellectual
property at all stages of the design cycle. Topics include design assistance, advisories, and
troubleshooting tips.
Documentation Navigator and Design Hubs
Xilinx Documentation Navigator provides access to Xilinx documents, videos, and support resources, which you can filter and search to find information. To open the Xilinx Documentation Navigator (DocNav):
• From the Vivado IDE, select Help > Documentation and Tutorials.
• On Windows, select Start > All Programs > Xilinx Design Tools > DocNav.
• At the Linux command prompt, enter docnav.
Xilinx Design Hubs provide links to documentation organized by design tasks and other
topics, which you can use to learn key concepts and address frequently asked questions. To
access the Design Hubs:
• In the Xilinx Documentation Navigator, click the Design Hubs View tab.
• On the Xilinx website, see the Design Hubs page.
Note: For more information on Documentation Navigator, see the Documentation Navigator page
on the Xilinx website.