UG1449 Multimedia
Chapter 7: Debug
Media Control for Capture Link Up
Modetest for Display Link Up
GStreamer Debugging Techniques Using GST-launch
Chapter 1
Document Scope
This document describes the architecture and features of multimedia systems with the
combination of processing subsystem (PS), programmable logic (PL), and video codec unit (VCU)
IP. Learning about this architecture helps you understand the complete multimedia system, and
facilitates integration with third party IP to develop custom multimedia pipelines. This document
lists several audio-video pipelines to demonstrate expanded and enhanced multimedia solutions.
It also covers the VCU codec parameters that need to be fine-tuned for various video use cases,
and provides information for debugging the multimedia pipeline.
• System and Solution Planning: Identifying the components, performance, I/O, and data
transfer requirements at a system level. Includes application mapping for the solution to PS,
PL, and AI Engine.
• Video Pipelines
• Chapter 6: GStreamer Multimedia Framework
• Chapter 7: Debug
• Chapter 8: Performance and Optimization
• Hardware, IP, and Platform Development: Creating the PL IP blocks for the hardware
platform, creating PL kernels, functional simulation, and evaluating the AMD Vivado™ timing,
resource use, and power closure. Also involves developing the hardware platform for system
integration.
Chapter 2
To achieve high performance and low latency, advanced multimedia systems must have the right
processing engines and the capability to add custom logic. They must also support any-to-any
connectivity required by the wide range of multimedia devices available today. Traditionally,
these requirements suggest a multichip solution. While a multichip solution might provide the
required multimedia and connectivity functions, it can also lead to high power consumption.
A single chip solution like the Zynq UltraScale+ MPSoC is ideal for such scenarios. The existence
of custom logic for hardware acceleration, or any-to-any connectivity on the same device leads
to significant power savings. In addition to the mix of processing engines, hardware codec, and
support for custom logic, the Zynq UltraScale+ MPSoC places these components in different
power domains with independent power rails. You can use this configuration to design optimized
power management schemes for the entire system. The Zynq UltraScale+ MPSoC is built upon
the 16 nm FinFET process node resulting in greater performance and lower power consumption,
and enabling the design of power-efficient next-generation multimedia systems.
The Zynq UltraScale+ MPSoC combines a powerful PS and user PL into the same device. The PS
features the Arm® flagship Cortex®-A53 64-bit quad-core or dual-core processor (APU),
Cortex®-R5F dual-core real-time processor (RPU), and graphics processing unit (GPU).
The Zynq UltraScale+ MPSoC device variants include dual application processor (CG) devices,
quad application processor and GPU (EG) devices, and video codec (EV) devices, creating
unlimited possibilities for a wide variety of applications. The Zynq UltraScale+ MPSoC EV devices
build on the powerful EG platform, and add an integrated H.264/H.265 video codec capable of
simultaneous encode and decode up to 4Kx2K (60fps). Designed with high definition video in
mind, EV devices are ideal for multimedia, advanced driver assistance systems (ADAS),
surveillance, and other embedded vision applications.
○ APU offloading
• 6-port DDR controller with ECC, supporting x32 and x64 DDR3, DDR3L, LPDDR3, LPDDR4,
DDR4 memory
• Integrated Platform Management Unit (PMU) supporting multiple power domains
• Integrated Configuration Security Unit (CSU)
• TrustZone support
• Peripheral and memory protection
For more information, see Zynq UltraScale+ Device Technical Reference Manual (UG1085).
The GPU accelerates both 2D and 3D graphics with a fully programmable architecture that
provides support for both shader-based and fixed-function graphics APIs. It includes anti-aliasing
for optimal image quality, with virtually no additional performance overhead.
For more information on the GPU, refer to Chapter 5: Graphics Processing Unit of the Zynq
UltraScale+ Device Technical Reference Manual (UG1085).
DisplayPort Interface
The Zynq UltraScale+ MPSoC also includes a hardened DisplayPort (DP) interface module. The
DisplayPort interface is located in the PS and can be multiplexed to one of four dedicated high-
speed serial transceivers operating at up to 6 Gbps. This eliminates the need for additional
display chips to further reduce system BOM cost. The DisplayPort interface is based on the
Video Electronics Standards Association (VESA) DisplayPort v1.2a specification and provides
multiple interfaces to process live audio/video feeds from either the PS or the PL, or stored audio/video
from memory frame buffers. It simultaneously supports two audio/video pipelines, providing on-
the-fly rendering features like alpha blending, chroma resampling, color space conversion, and
audio mixing. This block also includes a dedicated video Phase-Locked Loop (PLL) for generating
sync clocks.
For more information on the DP interface, see Chapter 33: DisplayPort Controller of the Zynq
UltraScale+ Device Technical Reference Manual (UG1085).
VCU support for both H.264 and H.265 standards provides a compelling capability to develop
solutions in line with current market needs (H.264), as well as advanced-generation requirements
(H.265). The ability to encode and decode simultaneously with low latency makes it a perfect fit
for video conferencing and transcoding from H.264 to H.265 or vice-versa. Multistream,
multicodec encoding and decoding suits the requirement for video surveillance applications such
as DVRs, video servers, and multistream IP camera head-ends.
VCU Highlights
○ HEVC: Main, Main Intra, Main 10, Main10 Intra, Main 4:2:2 10, Main 4:2:2 10 Intra up to
Level 5.1 High Tier
○ AVC: Baseline, Main, High, High10, High 4:2:2, High10 Intra, High 4:2:2 Intra up to Level
5.2
• Supports simultaneous encoding and decoding of up to 32 streams with a maximum
aggregated bandwidth of 3840x2160 at 60fps
• Low latency rate control
• Flexible rate control: Constant Bit Rate (CBR), Variable Bit Rate (VBR), and Constant
Quantization Parameter (QP)
• Supports simultaneous encoding and decoding up to 4K UHD resolution at 60 Hz
IMPORTANT! 4K (3840x2160) video and lower resolutions are supported in all speed grades.
However, DCI 4K (4096x2160) requires a -2 or -3 speed grade.
For more information on VCU Encoder only and Decoder only configuration, see Design Flow
Steps in H.264/H.265 Video Codec Unit LogiCORE IP Product Guide (PG252).
Chapter 3
Multimedia PL IP
The Arm® Cortex®-A53 cores, together with the memory unit and many peripherals on the AMD
Zynq™ UltraScale+™ MPSoC, play a strong role in managing and capturing the data from many
different sources before making it available to the VCU. PS peripherals such as USB and Ethernet are
used for streaming video devices such as camcorders, network cameras, and webcams. Custom logic is
designed in the PL to capture raw video from live sources such as serial digital interface (SDI) RX,
High-Definition Multimedia Interface (HDMI™) RX, and Mobile Industry Processor Interface
(MIPI) Camera Serial Interface (CSI) IPs. Similarly, video is displayed using the DisplayPort
controller in the PS or by creating a relevant IP such as HDMI TX, SDI TX, or MIPI Display Serial
Interface (DSI).
AMD provides a collection of multimedia IP, which are available in the AMD Vivado™ IP catalog.
The PL provides the required hardware for the IP, resulting in performance improvements
suitable for next-generation technology.
This chapter describes the AMD LogiCORE™ PL IP that are commonly used in multimedia
applications. These IP are categorized into the following types based on their functionality:
1. Video Capture
2. Video Display
3. PHY Controllers
4. Audio
5. Test Pattern Generators
Video Capture
The IPs that are part of the capture pipeline capture video frames into DDR memory from
either an HDMI™ source, MIPI CSI-2 image sensor, SDI source, or test pattern generator (TPG).
Additionally, video can be sourced from a SATA drive, USB 3.0 device, or an SD card.
All capture sources use the V4L2 framework. The V4L2 drivers can be a subset of the full AMD
LogiCORE™ PL IP specifications.
Features
For more information, refer to the HDMI 1.4/2.0 Receiver Subsystem Product Guide (PG236).
Features
○ SMPTE ST 424: 3G-SDI with data mapped by any ST 425-x mapping at 2.97 Gbps and
2.97/1.001 Gbps
○ SMPTE ST 2081-1: 6G-SDI with data mapped by any ST 2081-x mapping at 5.94 Gbps and
5.94/1.001 Gbps
○ SMPTE ST 2082-1: 12G-SDI with data mapped by any ST 2082-x mapping at 11.88 Gbps
and 11.88/1.001 Gbps
○ Dual link and quad link 6G-SDI and 12G-SDI supported by instantiating two or four SMPTE
UHD-SDI RX subsystems
For more information, see SMPTE UHD-SDI Receiver Subsystem Product Guide (PG290).
Features
For more information, see MIPI CSI-2 Receiver Subsystem Product Guide (PG232).
Sensor Demosaic
The AMD LogiCORE IP Sensor Demosaic core provides an optimized hardware block that
reconstructs sub-sampled color data for images captured by a Bayer image sensor. CMOS and
CCD image sensors leverage a Color Filter Array (CFA), which passes only the wavelengths
associated with a single primary color to a given pixel on the substrate. Each pixel is therefore
sensitive to only red, green, or blue. The Sensor Demosaic IP provides an efficient and low-
footprint solution to interpolate the missing color components for every pixel.
Features
For more information, refer to the Sensor Demosaic LogiCORE IP Product Guide (PG286).
Gamma LUT
The AMD LogiCORE IP Gamma Look-up Table (LUT) core provides an optimized hardware block
for manipulating image data to match the response of display devices. This core is implemented
using a look-up table structure that is programmed to implement a gamma correction curve
transform on the input image data.
Features
• Programmable gamma table supports gamma correction or any user defined function
• Three channel independent look-up table structure
• One, two, four, or eight pixel-wide AXI4-Stream video interface
For more information, refer to the Gamma Look Up Table LogiCORE IP Product Guide (PG285).
Features
For more information, see Video Processing Subsystem Product Guide (PG231).
Features
• Input streams can be read from memory mapped AXI4 interface or from AXI4-Stream
interface
• Supports up to eight streams in the memory mapped mode and one stream in the stream
mode
• Supports Y8 and Y10 formats for memory interface
• Supports RGB, YUV 444, YUV 422, and YUV 420 formats for stream interface
• Supports 8, 10, 12, and 16 bits per color component input and output on AXI4-Stream
interface
• Supports one, two, or four-pixel width for stream mode, and one-pixel width for memory
mode
• Supports spatial resolutions ranging from 64 × 64 up to 8,192 × 4,320
• Supports 4k 60 fps in all supported device families
• Supports 32-bit and 64-bit DDR memory address access
For more information, see Video Scene Change Detection LogiCORE IP Product Guide (PG322).
Features
• AXI4 Compliant
• Streaming Video Format support for: RGB, RGBA, YUV 4:4:4, YUVA 4:4:4, YUV 4:2:2, and
YUV 4:2:0
• Memory Video Format support for: RGBX8, BGRX8, YUVX8, YUYV8, UYVY8, RGBA8,
BGRA8, YUVA8, RGBX10, YUVX10, Y_UV8, Y_UV8_420, RGB8, BGR8, YUV8, Y_UV10,
Y_UV10_420, and Y8, Y10
• Provides programmable memory video format
• Supports progressive and interlaced video
• Supports 8-bit and 10-bit per color component on stream interface and memory interface
• Supports spatial resolutions from 64 x 64 up to 8192 x 4320
• Supports 4K60 in all supported device families
For more information, refer to the Video Frame Buffer Read and Video Frame Buffer Write LogiCORE
IP Product Guide (PG278).
VCU Sync IP
The VCU Sync IP core acts as a fence IP between the Video DMA IP and the VCU IP. It is used in
multimedia applications that need ultra-low latencies. The VCU Sync IP performs AXI
transaction-level tracking so that the producer and consumer can be synchronized at the
granularity of AXI transactions instead of at video-buffer granularity.
The VCU Sync IP is responsible for synchronizing buffers between the Capture DMA and the
VCU encoder. Capture hardware writes video buffers in raster scan order. The VCU Sync IP
monitors the buffer level while the capture element is writing into DRAM, and allows the encoder to
read input buffer data if the requested data is already written by DMA. Otherwise, it blocks the
encoder until the DMA completes its writes.
Features
• The VCU Sync IP core can track up to four producer transactions simultaneously
• Each channel can track up to three buffer sets
• Each buffer set has Luma and Chroma buffer features
• Each consumer port can hold 256 AXI transactions without back-pressure to the consumer
• In encoder mode, the Sync IP core supports the tracking and control of four simultaneous
channels
For more information, see Chapter 7 of H.264/H.265 Video Codec Unit LogiCORE IP Product Guide
(PG252).
Video Display
The IPs that are part of the display pipeline read video frames from memory and send them to a
monitor through either the DisplayPort TX controller inside the PS, the SDI transmitter
subsystem through the PL, or the HDMI transmitter subsystem through the PL.
All display IP use the DRM/KMS driver framework. The available DRM/KMS driver can be a
subset of the underlying LogiCORE IP.
Features
For more information, see HDMI 1.4/2.0 Transmitter Subsystem Product Guide (PG235).
Features
○ SMPTE ST 424: 3G-SDI with data mapped by any ST 425-x mapping at 2.97 Gb/s and
2.97/1.001 Gb/s
○ SMPTE ST 2081-1: 6G-SDI with data mapped by any ST 2081-x mapping at 5.94 Gb/s and
5.94/1.001 Gb/s
○ SMPTE ST 2082-1: 12G-SDI with data mapped by any ST 2082-x mapping at 11.88 Gb/s
and 11.88/1.001 Gb/s
○ Dual link and quad link 6G-SDI and 12G-SDI are supported by instantiating two or four
UHD-SDI transmitter subsystems
For more information, see SMPTE UHD-SDI Transmitter Subsystem Product Guide (PG289).
Video Mixer
The AMD LogiCORE IP Video Mixer core provides the following features:
• Flexible video processing block for alpha blending and compositing multiple video and/or
graphics layers
• Support for up to sixteen layers, with an optional logo layer, using a combination of video
inputs from either frame buffer or streaming video cores through AXI4-Stream interfaces
• Programmable core through a comprehensive register interface to control frame size,
background color, layer position, and the AXI4-Lite interface
• Comprehensive set of interrupt status bits for processor monitoring
Features
For more information, see Video Mixer LogiCORE IP Product Guide (PG243).
Features
• AXI4 Compliant
• Streaming Video Format support for: RGB, RGBA, YUV 4:4:4, YUVA 4:4:4, YUV 4:2:2, YUV
4:2:0
• Memory Video Format support for: RGBX8, BGRX8, YUVX8, YUYV8, UYVY8, RGBA8,
BGRA8, YUVA8, RGBX10, YUVX10, Y_UV8, Y_UV8_420, RGB8, BGR8, YUV8, Y_UV10,
Y_UV10_420, Y8, Y10
• Provides programmable memory video format
• Supports progressive and interlaced video
• Supports 8 and 10-bits per color component on stream interface and memory interface
• Supports spatial resolutions from 64 x 64 up to 8192 x 4320
• Supports 4K60 video in all supported device families
For more information, see Video Frame Buffer Read and Video Frame Buffer Write LogiCORE IP
Product Guide (PG278).
PHY Controllers
Video PHY Controller
The AMD Video PHY Controller IP core is designed for enabling plug-and-play connectivity with
Video (DisplayPort and HDMI technology) MAC transmit or receive subsystems. The interface
between the video MAC and PHY layers is standardized to enable ease of use in accessing
shared transceiver resources. The AXI4-Lite interface is provided to enable dynamic accesses of
transceiver controls/status.
Features
○ Non-integer data recovery unit (NI-DRU) support for lower line rates. NI-DRU support is
for the HDMI protocol only
• Advanced clocking support
For more information, see Video PHY Controller LogiCORE IP Product Guide (PG230).
SMPTE UHD-SDI
The SDI family of standards from the SMPTE is widely used in professional broadcast video
equipment. The SMPTE UHD-SDI core supports SMPTE SDI data rates from SD-SDI through
12G-SDI. The core supports both transmit and receive.
Features
• SMPTE ST 259, SMPTE RP 165, SMPTE ST 292, SMPTE ST 372, SMPTE ST 424, SMPTE ST
2081-1, SMPTE ST 2082-1, SMPTE ST 352
For more information, see SMPTE UHD-SDI LogiCORE IP Product Guide (PG205).
Audio
Audio Formatter
The AMD LogiCORE IP Audio Formatter core is a soft AMD IP core that provides high-bandwidth
direct memory access between memory and AXI4-Stream target peripherals supporting audio
data.
Features
○ PCM to PCM
• Supported data formats for audio stream reads from memory (MM2S)
○ AES to AES
○ AES to PCM
○ PCM to PCM
When configured as an Audio Embedder, it can embed up to 32 channels of AES3 audio data
that are transmitted over an AXI4-Stream Audio Interface onto an SDI stream.
Similarly, when configured as an Audio Extractor, it can extract up to 32 channels of audio data
from the incoming SDI stream, and output them in AES3 format on the AXI4-Stream Audio
Interface. In both configurations, it supports multiple audio sample rates (32 kHz, 44.1 kHz,
and 48 kHz).
The UHD SDI Audio core is designed in accordance with SMPTE ST 272 for SD-SDI, SMPTE ST
299-1 for HD-SDI, and SMPTE ST 299-1 & 2 for 3G/6G/12G-SDI.
Features
For more information, see UHDSDI Audio LogiCORE IP Product Guide (PG309).
Features
• AXI4-Stream compliant
• Supports up to four I2S channels (up to eight audio channels)
• 16/24-bit data
• Supports master I2S mode
For more information, refer to the I2S Transmitter and Receiver LogiCORE IP Product Guide
(PG308).
Features
For more information, see Audio Clock Recovery Unit Product Guide (PG335).
Features
• Color bars
• Zone plate with adjustable sweep and speed
• Temporal and spatial ramps
• Moving box with selectable size and color over any available test pattern
• RGB, YUV 444, YUV 422, YUV 420 AXI4-Stream data interfaces
• AXI4-Lite control interface
• Supports 8, 10, 12, and 16-bits per color component input and output
• Supports spatial resolutions from 64 x 64 up to 10328 x 7760
• Supports 4K60 video in all supported device families
For more information, see Video Test Pattern Generator LogiCORE IP Product Guide (PG103).
Features
For more information, see Video Timing Controller LogiCORE IP Product Guide (PG016).
Chapter 4
Video Pipelines
This chapter describes example video capture, display, and processing pipeline designs that can
be generated using the AMD multimedia IPs, and explains how to build and generate bootable
images using the PetaLinux tool.
See Chapter 5 in Zynq UltraScale+ MPSoC ZCU106 Video Codec Unit Targeted Reference Design User
Guide (UG1250) for examples of capture, display, and video pipeline designs that can be
generated on the ZCU106 board using the AMD Multimedia IPs.
PetaLinux is an Embedded Linux System Development Kit targeting AMD FPGA-based system-
on-chip (SoC) designs. Refer to the PetaLinux Tools Documentation: Reference Guide (UG1144) on
how to configure and build bootable images for the AMD Vivado™ generated designs using the
PetaLinux toolchain.
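As an illustration only (the BSP file name and hardware export path below are placeholders, not taken from this guide), a typical PetaLinux flow to build bootable images from an exported hardware description looks like the following:
petalinux-create -t project -s <path-to-bsp>/xilinx-zcu106-<version>.bsp -n vcu_project
cd vcu_project
petalinux-config --get-hw-description=<path-to-exported-xsa-directory>
petalinux-build
petalinux-package --boot --fsbl images/linux/zynqmp_fsbl.elf --u-boot --pmufw \
--fpga images/linux/system.bit --force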
The following section explains the device tree (DT) and how the DT is generated using PetaLinux.
In addition, the section describes all the nodes and the properties that you need to manually
update, to generate the bootable Linux images for video pipelines.
The DT is a data structure used during Linux configuration to describe the available hardware
components of the embedded platform, including multimedia IP.
Embedded Linux uses the DT for platform identification, run-time configuration like bootargs,
and the device node population.
Generally, SoCs have static dts/dtsi files, but FPGA designs can be far more varied, because the
Programmable Logic (PL) IPs vary or have different configurations.
The Device-Tree Generator (DTG), a part of the AMD PetaLinux toolset, dynamically generates
device tree files for the FPGA components.
After the device tree is generated for a hardware design using the AMD PetaLinux tool, the
components folder contains the statically configured DT files for the board PS and the DT files
generated for the FPGA components.
• system-top.dts: Contains the memory information, early console and the boot arguments.
• zynqmp.dtsi: Contains all the PS peripheral information and also the CPU information.
• zynqmp-clk-ccf.dtsi: Contains all the clock information for the peripheral IPs.
• board.dtsi: Based on the board, DTG generates this file under the same output directory.
AR# 75895 describes the DT changes that are not generated by the DTG tool, meaning
system-user.dtsi needs to be updated manually. The system-user.dtsi file is part of the
PetaLinux BSP in the path <petalinux-bsp>/project-spec/meta-user/recipes-bsp/
device-tree/files/system-user.dtsi.
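For illustration (a sketch, not a step from this guide), after editing system-user.dtsi the device-tree component can be rebuilt on its own and then repackaged with the rest of the images:
# Rebuild only the device tree after modifying system-user.dtsi
petalinux-build -c device-tree
# Rebuild the remaining images so that the updated DTB is packaged
petalinux-build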
Chapter 5
The MCU firmware handles the rate control process. No signals (in either the software API or the
FPGA) are triggered during the rate control process.
VBR
When using VBR, the encoder buffer is allowed to underflow (be empty), and the maximum
bitrate, which is the transmission bitrate used for the buffering model, can be higher than the
target bitrate. VBR therefore relaxes the buffering constraints, allows the bitrate to decrease for
simple content, and can improve quality by allowing more bits on complex frames. VBR mode
constrains the bitrate to a specified maximum while keeping it at the target bit rate where
possible. Similar to CBR, it avoids buffer underflow by increasing the QP. However, the bit rate
can exceed the target up to the maximum bit rate, so the QP has to be increased by a smaller
factor. A buffer overflow results in an unchanged QP and a lower bit rate.
CBR
The goal of CBR is to reach the target bitrate on average (at the level of one or a few GOPs) and
to comply with the Hypothetical Reference Decoder (HRD) model, avoiding decoder buffer
overflows and underflows. In CBR mode, a single bitrate value defines both the target stream
bitrate and the output/transmission (leaky bucket) bitrate. The reference decoder buffer
parameters are Coded Picture Buffer (CPBSize) and Initial Delay. The CBR rate control mode tries
to keep the bit rate constant whilst avoiding buffer overflows and underflows. If a buffer
underflow happens, the QP is increased (up to MaxQP) to lower the size in bits of the next
frames. If a buffer overflow occurs, the QP is decreased (down to MinQP) to increase the size in
bits.
Low Latency
The frame is divided into multiple slices; the VCU encoder output and decoder input are
processed in slice mode. The VCU encoder input and decoder output still work in frame mode.
• LOW_DELAY_P: GopPattern with a single I-picture at the beginning followed with P-pictures
only. Each P-picture uses the picture just before as reference. IPPPP….
• LOW_DELAY_B: GopPattern with a single I-picture at the beginning followed by B-pictures
only. Each B-picture uses the picture just before it as first reference; the second reference
depends on the Gop.Length parameter. IBBB…
• PYRAMIDAL_GOP: Advanced GOP pattern with hierarchical B-frame. The size of the
hierarchy depends on the Gop.NumB parameter.
• ADAPTIVE_GOP: The encoder adapts the number of B-frames used in the GOP pattern based
on heuristics on the video content.
• DEFAULT_GOP_B: IBBBBBB... (P frames replaced with B).
• PYRAMIDAL_GOP_B: Advanced GOP pattern with hierarchical B frame. Here, P frames are
replaced with B.
Parameter List
The following table shows the list of VCU parameters and their description.
HDR10 Support
HDR10 is a high dynamic range standard for 10-bit video streams that allows better brightness
and color contrast, resulting in more realistic video frames. The HDR10 information is static
metadata that is passed before the video frames to tell the display that the incoming input is
HDR10. AMD offers this solution as a better alternative to the HDMI-only connectivity solution.
Refer to H.264/H.265 Video Codec Unit LogiCORE IP Product Guide (PG252) for how exactly
HDR10 metadata is parsed by the AMD VCU software stack.
For more information, see the PL DDR HDR10 HDMI Video Capture and Display wiki page.
Dynamic Features
The VCU encoder supports the following dynamic features.
Dynamic Bitrate
The dynamic bitrate feature is useful when network bandwidth is limited. With a static bitrate,
network congestion can cause frame drops because the connection cannot keep up, and frames
are dropped to preserve the stability of the stream. This feature allows you to adjust the encoder
bitrate at run time.
Dynamic bitrate is useful for Video on Demand and Video Conferencing use cases.
For more information, see Dynamic Bitrate in H.264/H.265 Video Codec Unit LogiCORE IP Product
Guide (PG252).
Dynamic GOP
The Dynamic group of pictures (GOP) feature allows you to resize the GOP length, GOP
structure, number of B-frames, or force IDR pictures at run time. This feature helps improve
visual quality if video complexity varies throughout a single stream.
Dynamic GOP is beneficial for video recording and streaming use cases.
For more information, see Dynamic GOP in H.264/H.265 Video Codec Unit LogiCORE IP Product
Guide (PG252).
For more information on Dynamic insertion of B-Frames, see Dynamic GOP in H.264/H.265
Video Codec Unit LogiCORE IP Product Guide (PG252).
For more information on dynamic insertion of key-frames, see the Dynamic GOP section in
H.264/H.265 Video Codec Unit LogiCORE IP Product Guide (PG252).
For more information, see Long Term Reference Picture in H.264/H.265 Video Codec Unit
LogiCORE IP Product Guide (PG252).
Dynamic ROI
Dynamic ROI allows you to specify one or more regions of interest (ROI) for encoding particular
portions of a frame. This feature allows you to specify parameters that affect the visual quality
of the identified regions. These parameters can now be updated dynamically.
For more information, see H.264/H.265 Video Codec Unit LogiCORE IP Product Guide (PG252).
For more information, see Dynamic Scene Change in H.264/H.265 Video Codec Unit LogiCORE IP
Product Guide (PG252).
For more information, see Dynamic Resolution Change in H.264/H.265 Video Codec Unit
LogiCORE IP Product Guide (PG252).
GDR Support
Gradual decoder refresh (GDR) helps the decoder refresh or start decoding without a full Intra-
Frame. In GDR mode, the encoder inserts intra-encoded MB strips (horizontal or vertical) into
inter frames. This additional data allows decoders to refresh, enabling decoding of compressed
streams even after a stream has already started, or recovery when data has been lost due to
unreliable network conditions.
For more information, see GDR Intra Refresh in H.264/H.265 Video Codec Unit LogiCORE IP
Product Guide (PG252).
XAVC
XAVC is a proprietary AVC profile from Sony that is mainly used for video recording.
The VCU encoder can produce XAVC Intra or XAVC Long GOP bitstreams.
To use this profile, you need to provide the information required to create the XAVC bitstream,
as mentioned in the specifications. For instance, you must provide at least the resolution, the
framerate, and the clock ratio.
For more information, see XAVC in H.264/H.265 Video Codec Unit LogiCORE IP Product Guide
(PG252).
HLG Support
HLG is a HDR standard that provides backwards compatibility with non-HDR monitors while
achieving a wider color gamut for HDR monitors. Because HLG EOTF is similar to standard
gamma functions, HLG streams look correct when played on SDR monitors. Meanwhile PQ EOTF
streams area displayed properly and usually look washed out when played on an SDR monitor
because the gamma and PQ curves are different. Unlike HDR10, HLG does not require any extra
metadata with the video data. HLG streams consist of the HLG transfer characteristics/EOTF and
BT2020 color primaries and BT2020 color matrix.
For information on how HLG is processed by the software stack, see H.264/H.265 Video Codec
Unit LogiCORE IP Product Guide (PG252).
PCIe
This design demonstrates the file-based VCU transcode, encode, and decode capabilities over
PCIe® in Zynq UltraScale+ MPSoC EV devices.
This design supports AVC/HEVC video codec, the NV12, NV16, XV15, and XV20 video formats,
and 4k and 1080p resolutions for encode/decode and transcode use cases.
Encode
The host application reads an input .yuv file from the host machine and sends it to the ZCU106
board, connected as an endpoint device to the PCIe slot of the host machine. The data received
from the host is encoded with the provided encoder type using the VCU hardware and muxed with
mpegtsmux, and the encoded data is written back to the host machine in a .ts file.
Decode
The host application reads an input .mp4 or .ts file from the host machine and sends it to the
ZCU106 board, connected as an endpoint device to the PCIe slot of host machine. The data
received from the host is decoded using VCU hardware. It writes the decoded data back to the
host machine in a .yuv file.
Transcode
The host application reads an input .mp4 or .ts file from the host machine and sends it to the
ZCU106 board, connected as an endpoint device to the PCIe slot of the host machine. The data
received from the host is decoded and again encoded with the provided encoder type and
mpegtsmux using VCU hardware. Transcoded data is written back to the host machine in a .ts
file.
Chapter 6
For the PetaLinux installation steps, see PetaLinux Tools Documentation: Reference Guide
(UG1144).
gst-inspect-1.0 --version
The following table shows the list of gst-omx video decoders included in GStreamer-1.0:
Decoder Description
omxh265dec OpenMAX IL H.265 Video Decoder
omxh264dec OpenMAX IL H.264 Video Decoder
The following table shows the list of gst-omx video encoders included in GStreamer-1.0:
Encoder Description
omxh265enc OpenMAX IL H.265/HEVC Video Encoder
omxh264enc OpenMAX IL H.264/AVC Video Encoder
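To confirm that the gst-omx elements are available and to view their capabilities, gst-inspect-1.0 can also be run on an individual element (standard GStreamer usage, shown here for illustration):
gst-inspect-1.0 | grep omx      # list all installed OpenMAX elements
gst-inspect-1.0 omxh265enc      # show pads, caps, and properties of the H.265 encoder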
If a format is not supported between two elements, the cap (capability) negotiation fails and
Gstreamer returns an error. In that case, use a video conversion element to perform format
conversion from one format to another.
For more details, see H.264/H.265 Video Codec Unit LogiCORE IP Product Guide (PG252).
GStreamer Plugins
GStreamer is a library for constructing graphs of media-handling components. The applications it
supports range from simple playback and audio/video streaming to complex audio (mixing) and
video processing.
GStreamer uses a plug-in architecture which makes the most of GStreamer functionality
implemented as shared libraries. The GStreamer base functionality contains functions for
registering and loading plug-ins and for providing the fundamentals of all classes in the form of
base classes. Plug-in libraries get dynamically loaded to support a wide spectrum of codecs,
container formats, and input/output drivers.
The following table describes the plug-ins used in the GStreamer interface library.
Plug-in Description
v4l2src Use v4l2src to capture video from V4L2 devices like Xilinx HDMI-RX and TPG.
kmssink The kmssink is a simple video sink that renders raw video frames directly in a plane of a DRM device.
Example pipeline:
omxh26xdec The omxh26xdec is a hardware-accelerated video decoder that decodes encoded video frames.
Example pipeline:
This pipeline shows a .mp4 multiplexed file where the encoded format is h26x encoded video.
Note: Use the omxh264dec for H264 decoding, and the omxh265dec for H265 decoding. h264parse
parses a H.264 encoded stream. h265parse parses a H.265 encoded stream.
omxh26xenc The omxh26xenc is a hardware-accelerated video encoder that encodes raw video frames.
Example pipeline:
This pipeline shows the video captured from a V4L2 device that delivers raw data. The data is encoded
to the h26x encoded video type, and stored to a file.
Note: Use the omxh264enc for H264 encoding, and the omxh265enc for H265 encoding.
alsasrc Use the alsasrc plug-in to capture audio from audio devices such as AMD HDMI-RX.
Example pipeline:
This pipeline shows that the audio captured from an ALSA source, plays on an ALSA sink.
alsasink The alsasink is a simple audio sink that plays raw audio frames.
Example pipeline:
This pipeline shows that the audio captured from the ALSA source, plays on an ALSA sink.
faad1 The faad is an audio decoder that decodes encoded audio frames.
Example pipeline:
This pipeline shows a .ts multiplexed file where the encoded format is aac encoded audio. The data is
decoded and played on an ALSA sink device.
faac1 The faac is an audio encoder that encodes raw audio frames.
Example pipeline:
This pipeline shows the audio captured from an ALSA device that delivers raw data. The data is
encoded to aac format and stored to a file.
xilinxscd The xilinxscd is a hardware-accelerated IP that enables detection of scene changes in a video stream.
This plugin generates upstream events whenever there is a scene change in an incoming video stream
so that the encoder can insert an intra frame to improve video quality.
Example pipeline:
This pipeline shows the video captured from a V4L2 device that delivers raw data. This raw data is
passed through the xilinxscd plugin which analyzes the stream in runtime and provides an event to
the encoder that determines whether or not a scene change is detected in a video stream. The
encoder uses this information to insert an I-frame in an encoded bit-stream.
Note: Use the omxh264enc for H264 encoding, and the omxh265enc for H265 encoding.
appsrc The appsrc element can be used by applications to insert data into a GStreamer pipeline. Unlike most
GStreamer elements, appsrc provides external API functions.
appsink The appsink is a sink plugin that supports many different methods, enabling the application to
manage the GStreamer data in a pipeline. Unlike most GStreamer elements, appsink provides external
API functions.
queue Queues data until one of the limits specified by the max-size-buffers, max-size-bytes, or max-
size-time properties has been reached
Notes:
1. The faac/faad plugin is not actively maintained in the community. For higher audio quality and less noise, Opus Codec
(opusenc/opusdec) is an alternative.
For more details, see H.264/H.265 Video Codec Unit LogiCORE IP Product Guide (PG252).
Use the following pipeline to play a raw video stream captured from an input source device:
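The original command is not reproduced in this extract; a minimal sketch consistent with the description below (assuming the capture node is /dev/video0 and the display is the mixer at bus-id a0070000.v_mix) is:
gst-launch-1.0 v4l2src device=/dev/video0 io-mode=4 ! \
video/x-raw, format=NV12, width=3840, height=2160, framerate=60/1 ! \
kmssink bus-id=a0070000.v_mix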
In the preceding example, the live source device link is present under the /dev directory. Video
stream resolution is 4kp and 60fps. Video stream color format is NV12.
Use the following pipeline to play raw video and audio captured from the input source device:
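Again as a sketch only (the device nodes and ALSA IDs follow the description below):
gst-launch-1.0 v4l2src device=/dev/video0 io-mode=4 ! \
video/x-raw, format=NV12, width=3840, height=2160, framerate=60/1 ! \
kmssink bus-id=a0070000.v_mix \
alsasrc device=hw:2,1 ! \
audio/x-raw, rate=48000, channels=2, format=S24_32LE ! \
queue ! alsasink device=hw:2,0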
The preceding example shows that video and audio can be captured using a single gstreamer
pipeline. Audio capture device is hw:2,1 and playback device is hw:2,0. For video stream, the
pipeline remains the same as the previous example.
Decode/Encode Example
Video Decode Using Gstreamer-1.0
The following example shows how to decode an input file with H.265 (HEVC) video format using
Gstreamer-1.0.
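The original command is not reproduced in this extract; a representative decode pipeline, assuming the input file is /media/card/abc.mp4 and the display is the mixer at bus-id a0070000.v_mix (both names are illustrative), is:
gst-launch-1.0 filesrc location=/media/card/abc.mp4 ! \
qtdemux name=demux demux.video_0 ! h265parse ! omxh265dec ! \
queue max-size-bytes=0 ! kmssink bus-id=a0070000.v_mix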
The file is present in the SD Card, the format is mp4, and the encoded video format is H265.
Note: To decode a file in H264(AVC) video format, replace the h265 elements with h264.
The following example shows how to encode a captured stream of input source device with
H265 (HEVC) video format using Gstreamer-1.0.
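The original command is not reproduced in this extract; a representative encode pipeline, assuming the capture node is /dev/video0 and an illustrative output file location, is:
gst-launch-1.0 v4l2src device=/dev/video0 io-mode=4 ! \
video/x-raw, format=NV12, width=3840, height=2160, framerate=60/1 ! \
omxh265enc target-bitrate=60000 ! video/x-h265, alignment=au ! \
filesink location=/run/media/sda/test.h265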
The input stream is a live source (for example, HDMI-Rx or MIPI camera) and is present under
the /dev directory. The encoded video format is H265, color format is NV12, and resolution is
4kp at 60fps.
Note: To encode a video stream in H264 (AVC) video format, replace the h265 elements with h264.
CSI Cameras
○ 1920x1080
○ 1280x720
Appsink allows the application to get access to the raw buffer from the GStreamer pipeline.
Appsink is a sink plugin that supports various methods for helping the application get a handle on
the GStreamer data in a pipeline. Unlike most GStreamer elements, Appsink provides external
API functions.
Video Pipelines
A video pipeline consists of three elements:
File Playback
Use the following static pipelines to perform local file playback using Gstreamer-1.0.
In this example, the file is present in the SD card, the container format is TS, and the
encoded video format is H265.
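○ To play a TS file containing only the video stream (the original command is not reproduced in this extract; the following is a sketch consistent with the audio/video pipeline below):
gst-launch-1.0 filesrc location=/media/card/abc.ts ! \
tsdemux name=demux ! queue ! h265parse ! omxh265dec ! \
queue max-size-bytes=0 ! kmssink bus-id=a0070000.v_mix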
○ To play a TS file with both the video and audio streams:
gst-launch-1.0 filesrc location=/media/card/abc.ts ! \
tsdemux name=demux ! queue ! h265parse ! omxh265dec ! queue max-size-bytes=0 ! \
kmssink bus-id=a0070000.v_mix sync=true demux. ! queue ! faad ! audioconvert ! \
audio/x-raw, rate=48000, channels=2, format=S24_32LE ! alsasink device=hw:2,0
In this example, the file is present in the SD Card, the container format is TS, and the
encoded video format is H265. The encoded audio stream is AAC, with a sample rate of
48000 (48kHz) and audio format of S24_32LE. Audio playback device is hw:2,0.
• Dynamic pipeline for file playback using Gstreamer-1.0
GStreamer also provides uridecodebin, a basic media-playback plugin that automatically takes
care of most playback details. The following example shows how to play any file if the
necessary demuxing and decoding plugins are installed.
○ To play a TS file containing only the video stream:
gst-launch-1.0 uridecodebin uri=file:///media/card/abc.ts ! \
queue ! kmssink bus-id=a0070000.v_mix
• To play a TS file containing both the video and the audio streams:
gst-launch-1.0 uridecodebinuri="file:///media/card/test.ts" name=decode !
\
queue max-size-bytes=0 ! kmssink bus-id="a0070000.v_mix" decode. ! \
audioconvert ! audioresample ! audio/x-raw, rate=48000, channels=2,
format=S24_32LE ! \
queue ! alsasink device="hw:2,0"
Recording
Use the following pipelines to record video from an input source to the required file format.
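The original command is not reproduced in this extract; a sketch consistent with the description below (the capture node and record path are illustrative) is:
gst-launch-1.0 v4l2src device=/dev/video0 io-mode=4 ! \
video/x-raw, format=NV12, width=3840, height=2160, framerate=60/1 ! \
omxh265enc ! h265parse ! queue ! mpegtsmux ! \
filesink location=/run/media/sda/test.ts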
This example shows that the live source device link is present under the /dev directory.
Encoded video format is H265 and color format is NV12. Video stream resolution is 4k and
60fps. The record file test.ts is present in SATA drive in TS file format.
Note:
1. For input source with 1080p@60 resolution, replace width and height with 1920 and 1080
respectively. Frame rate can also be changed to 30.
2. To encode input stream into H264 video format, replace h265 in the above pipeline with h264.
In this example, video and audio can be encoded using a single gstreamer pipeline to record
into a single file. Encoded audio stream is AAC. Audio capture device is hw:2,1 and record file
test.ts is present in SATA. For video encoding, the pipeline remains the same as the
previous example.
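Streaming Out
• Static pipeline for stream-out using Gstreamer-1.0. The original command is not reproduced in this extract; the following is a sketch consistent with the description below (bitrate, host address, and port taken from that description):
gst-launch-1.0 v4l2src device=/dev/video0 io-mode=4 ! \
video/x-raw, format=NV12, width=3840, height=2160, framerate=60/1 ! \
omxh265enc target-bitrate=60000 ! video/x-h265, alignment=au ! \
queue ! mpegtsmux ! rtpmp2tpay ! udpsink host=192.168.25.89 port=5004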
This example shows video streamed out from one device (server) to another device (client) on
the same network. The encoded video format is H265 and the color format is NV12. The video
stream resolution is 4kp at 60fps, and the bitrate is 60 Mb/s. The server sends the video stream to
the client host device with IP address 192.168.25.89 on port 5004.
Note: Replace host IP address as per IP configuration of the client device.
This example shows that video and audio can be streamed out using a single gstreamer
pipeline. Encoded audio is in Opus. Audio and video is streamed out simultaneously. For video
stream-out, the pipeline remains the same as in the previous example.
Streaming In
• Static pipelines for stream-in using Gstreamer-1.0
○ For video stream in only:
The following pipeline can be used to receive the video stream from another device
(server), to the host device on the same network.
gst-launch-1.0 udpsrc port=5004 buffer-size=60000000 \
caps="application/x-rtp, clock-rate=90000" ! \
rtpjitterbuffer latency=1000 ! rtpmp2tdepay ! tsparse ! \
video/mpegts ! tsdemux name=demux ! queue ! h265parse ! \
video/x-h265, profile=main, alignment=au ! \
omxh265dec internal-entropy-buffers=5 low-latency=0 ! \
queue max-size-bytes=0 ! kmssink bus-id="a0070000.v_mix"
In this example, the encoded video format is H265. Stream in at client device occurs on
port 5004 over the UDP protocol.
○ For video and audio stream in:
Use the following pipeline to receive video and audio stream from another device (server),
to the host device on the same network.
gst-launch-1.0 udpsrc port=5004 buffer-size=60000000 \
caps="application/x-rtp, clock-rate=90000" ! rtpjitterbuffer \
latency=1000 ! rtpmp2tdepay ! tsparse ! video/mpegts \
! tsdemux name=demux demux. ! queue ! h265parse ! video/x-h265, \
profile=main, alignment=au ! omxh265dec internal-entropy-buffers=5 \
low-latency=0 ! queue max-size-bytes=0 \
! kmssink bus-id="a0070000.v_mix" demux. ! queue \
! opusparse ! opusdec ! audioconvert ! audioresample \
! audio/x-raw, rate=48000, channels=2, \
format=S24_32LE ! alsasink device="hw:2,0"
In this example, the encoded video format is H265. Stream in at client device occurs on
port 5004 over the UDP protocol. Audio playback device is hw:2,0. For video stream-in,
the pipeline remains the same as in the previous example.
• Dynamic pipelines for stream-in using Gstreamer-1.0
○ For video stream-in:
gst-launch-1.0 uridecodebin uri=udp://192.168.25.89:5004 ! \
kmssink bus-id=a0070000.v_mix
Low-Latency
The frame is divided into multiple slices; the VCU encoder output and decoder input are
processed in slice mode. The VCU encoder input and decoder output still work in frame mode.
The VCU encoder generates a slice-done interrupt at the end of every slice, and the output stream
buffer for each slice is available immediately for the next element to process. Therefore, with
multiple slices it is possible to reduce VCU processing latency from one frame to one frame divided
by the number of slices. In the low-latency mode, a maximum of four streams for the encoder and
two streams for the decoder can be run.
• Stream Out:
Use the following pipeline to stream-out (capture → encode → stream-out) NV12 video using
a low-latency GStreamer pipeline. This pipeline demonstrates how to stream-out low-latency-
encoded video from one device (server) to another device (client) on the same network. The
pipeline is encoded with the NV12 color format, and the H265 video format. Video stream
resolution is 4kp with 60fps and bitrate is 25 Mbps. It sends the video stream to the client
host device with an IP Address 192.168.25.89 on port 5004.
gst-launch-1.0 -v v4l2src device=/dev/video0 io-mode=4 !
video/x-raw,format=NV12,width=3840,height=2160,framerate=60/1 !
omxh265enc num-slices=8 periodicity-idr=240 cpb-size=500
gdr-mode=horizontal initial-delay=250 control-rate=low-latency
prefetch-buffer=true target-bitrate=25000 gop-mode=low-delay-p !
video/x-h265, alignment=nal ! rtph265pay !
udpsink buffer-size=60000000 host=192.168.25.89 port=5004 async=false
max-lateness=-1 qos-dscp=60 max-bitrate=120000000 -v
IMPORTANT! Replace the host IP address with the IP of the client device.
• Stream In:
Use the following pipeline to stream-in (stream-in → decode → display) NV12 video using a
low-latency GStreamer pipeline. This pipeline demonstrates how low-latency stream-in data is
decoded and displayed on the client device. The pipeline states that the encoded video format
is H265, and streams-in on the client device on port 5004 - over UDP protocol.
gst-launch-1.0 udpsrc port=5004 buffer-size=60000000
caps="application/x-rtp, media=video, clock-rate=90000,
payload=96, encoding-name=H265" ! rtpjitterbuffer latency=7
! rtph265depay ! h265parse ! video/x-h265, alignment=nal
! omxh265dec low-latency=1 internal-entropy-buffers=5
! video/x-raw ! queue max-size-bytes=0 ! fpsdisplaysink
name=fpssink text-overlay=false
'video-sink=kmssink bus-id=a0070000.v_mix hold-extra-sample=1
show-preroll-frame=false sync=true ' sync=true -v
AMD Low-Latency
AMD Low-Latency: In the low-latency mode, the VCU encoder and decoder work at a subframe
(slice-level) boundary, but the other components at the input of the encoder and the output of the
decoder, namely the capture DMA and the display DMA, still work at a frame-level boundary. This
means that the encoder can read input data only after the capture has completed writing the full frame.
In the AMD low-latency mode, capture and display also work at subframe level and therefore
reduce the pipeline latency significantly. This is made possible by making the producer (Capture
DMA) and the consumer (VCU encoder) work on the same input buffer concurrently, but
maintaining the synchronization between the two such that consumer read request is unblocked
only when the producer is done writing the data required for that read request.
• Stream Out
Use the following pipeline to stream-out (capture → encode → stream-out) NV12 video using
AMD ultra low-latency GStreamer pipeline. This pipeline demonstrates how to stream-out
AMD ultra low-latency encoded video from one device (server) to another device (client) on
the same network. The pipeline is encoded with NV12 color format, and H265 video format.
Video stream resolution is 4kp with 60fps, and bitrate is 25 Mbps. It sends video stream to
client host device with an IP Address 192.168.25.89 on port 5004.
gst-launch-1.0 -v v4l2src device=/dev/video0
io-mode=4 ! video/x-raw\(memory:XLNXLL\),
format=NV12,width=3840,height=2160,framerate=60/1
! omxh265enc num-slices=8 periodicity-idr=240
cpb-size=500 gdr-mode=horizontal initial-delay=250
control-rate=low-latency prefetch-buffer=true
target-bitrate=25000 gop-mode=low-delay-p
! video/x-h265, alignment=nal ! rtph265pay !
udpsink buffer-size=60000000 host=192.168.25.89
port=5004 async=false max-lateness=-1
qos-dscp=60 max-bitrate=120000000 -v
IMPORTANT! Replace the host IP address with the IP of the client device.
• Stream In
Use the following pipeline to stream-in (stream-in → decode → display) NV12 video using
AMD ultra low-latency GStreamer pipeline. The pipeline demonstrates how AMD ultra low-
latency stream-in data is decoded and displayed on the client device. The pipeline states that
the encoded video format is H265 and streams-in on the client device on port 5004 - over
UDP protocol.
gst-launch-1.0 udpsrc port=5004 buffer-size=60000000
caps="application/x-rtp, media=video, clock-rate=90000,
payload=96, encoding-name=H265" ! rtpjitterbuffer latency=7
! rtph265depay ! h265parse ! video/x-h265, alignment=nal
! omxh265dec low-latency=1 internal-entropy-buffers=5
! video/x-raw\(memory:XLNXLL\) ! queue max-size-bytes=0
! fpsdisplaysink name=fpssink text-overlay=false
'video-sink=kmssink bus-id=a0070000.v_mix hold-extra-sample=1
show-preroll-frame=false sync=true ' sync=true -v
For more details, see H.264/H.265 Video Codec Unit LogiCORE IP Product Guide (PG252).
Transcoding
• Transcode from H.265 to H.264 using Gstreamer-1.0:
Use the following pipeline to convert a H.265 based input container format file into H.264
format.
gst-launch-1.0 filesrc location="/run/media/sda/input-h265-file.mp4" ! \
qtdemux name=demux demux.video_0 ! h265parse ! video/x-h265,
alignment=au ! \
omxh265dec low-latency=0 ! omxh264enc ! video/x-h264, alignment=au ! \
filesink location="/run/media/sda/output.h264"
In this example, the file is present in the SATA drive, the H265 based input file format is MP4,
and the file is transcoded into a H264 video format file.
• Transcode from H.264 to H.265 using Gstreamer-1.0:
Use the following pipeline to convert a H.264 based input container format file into H.265
format.
gst-launch-1.0 filesrc location="input-h264-file.mp4" ! \
qtdemux name=demux demux.video_0 ! h264parse ! video/x-h264,
alignment=au ! \
omxh264dec low-latency=0 ! omxh265enc ! video/x-h265, alignment=au ! \
filesink location="output.h265"
In this example, the file is present in the SATA drive, the H264 based input file format is MP4,
and is transcoded into a H265 video format file.
Multi-Stream
The following pipelines show that multiple streams can be played simultaneously using
Gstreamer-1.0.
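The original commands are not reproduced here; the following is a sketch of two backgrounded stream-out instances consistent with the notes below (devices video0 and video1, ports 5004 and 5008, 30 Mb/s at 30 fps):
gst-launch-1.0 v4l2src device=/dev/video0 io-mode=4 ! \
video/x-raw, format=NV12, width=3840, height=2160, framerate=30/1 ! \
omxh265enc control-rate=constant target-bitrate=30000 ! video/x-h265, alignment=au ! \
queue ! mpegtsmux ! rtpmp2tpay ! udpsink host=192.168.25.89 port=5004 &
gst-launch-1.0 v4l2src device=/dev/video1 io-mode=4 ! \
video/x-raw, format=NV12, width=3840, height=2160, framerate=30/1 ! \
omxh265enc control-rate=constant target-bitrate=30000 ! video/x-h265, alignment=au ! \
queue ! mpegtsmux ! rtpmp2tpay ! udpsink host=192.168.25.89 port=5008 &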
In this example, two 4K instances are streamed out. The maximum bit rate is 30 Mb/s and the
frame rate is 30 fps. Encoded video format is H265.
Note:
1. Input source devices can be as per your choice and availability. In the preceding example, two input
devices are used, video0 and video1.
2. To run multiple instances, it is recommended to execute pipelines in the background by adding & at
the end of the pipeline.
3. To stream out both video streams, use two different ports for the same host device (that is,
port=5004, and port=5008) to avoid incorrect data stream.
In this example, four 1080p instances are streamed out. The maximum bit rate is 15 Mb/s and the
frame rate is 60 fps. Encoded video format is H265.
Note:
1. Input source devices can be as per your choice and availability. In the preceding example, four input
devices are used, video0, video1, video2, and video3.
2. To run multiple instances, it is recommended to execute pipelines in background by adding & at the
end of pipeline.
3. To stream out all four video streams, four different ports are used for same host device (that is,
5004, 5008, 5012, and 5016) to avoid incorrect data stream.
DCI 4K
DCI (Digital Cinema Initiatives) is the standards body formed by motion picture studios to
establish architectures, and standards for the industry. DCI defines the Digital Cinema 4K video
(4096 x 2160) format. DCI is only supported with a speed grade of -2 and above.
Use the following pipeline to play a raw video stream captured from the input source device.
In this example, the live source device link is present under the /dev directory. The video stream
resolution is set at 4k for DCI with 60fps.
In this example, the file is present on the SD card, and the encoded video format is H265.
IMPORTANT! The Interlace pipeline supports only the HEVC/H265 mode of VCU.
Here, v4l2video1convert is a multi-scaler GStreamer plugin that is created based on the video1
node.
In the preceding example, the live source device link is present under the /dev directory.
Resolution is 4k with 60fps and bitrate is 60 Mb/s. AMD SCD enables detecting a scene change
in a video stream. The encoder can insert an I-frame to improve video quality.
For 1080p60 resolution, replace the width and height with 1920 and 1080 respectively.
HDR10 Pipeline
The HDR10 pipeline supports the reception and insertion of HDR10 static metadata. This
HDR10 metadata contains critical information needed to support HDR, and it is carried
throughout the pipeline, from the source to the sink.
Run the following gst-launch-1.0 command to display the XV20 HDR10 video on HDMI-Tx using
the GStreamer pipeline (capture (HDR10) → encode → decode → display(HDR10)):
Run the following gst-launch-1.0 command to record XV20 HDR10 video using the
GStreamer pipeline:
For more information, see H.264/H.265 Video Codec Unit LogiCORE IP Product Guide (PG252) and
PL DDR HDR10 HDMI Video Capture and Display.
Videotestsrc
The videotestsrc element is an open-source upstream GStreamer plugin. It is used to
produce test video data in a wide variety of formats. You can use the pattern property to
control the video test data that is produced.
By default, videotestsrc generates data indefinitely, but if the num-buffers property is non-
zero, it generates a fixed number of video frames and sends EOS.
gst-launch-1.0 videotestsrc \
! video/x-raw,format=NV12,width=1920, \
height=1080,framerate=60/1 \
! videoconvert \
! omxh265enc prefetch-buffer=true \
! fakesink
Note: videotestsrc does not support zero copy. The primary aim of this pipeline is to validate encoder
functionality.
In the preceding example, the source device generates a video pattern with 1080p resolution and
60fps. The video stream color format is NV12.
gst-launch-1.0 videotestsrc \
! video/x-raw,format=NV12,width=1920,\
height=1080,framerate=60/1 \
! videoconvert \
! fpsdisplaysink \
video-sink="kmssink bus-id=a00c0000.v_mix" \
text-overlay=false sync=false -v
For more information, see H.264/H.265 Video Codec Unit LogiCORE IP Product Guide (PG252).
In the preceding example, the live source device link is present under the /dev directory. The
video stream resolution is 4kp and 60fps. The video stream color format is NV12.
For more information, see H.264/H.265 Video Codec Unit LogiCORE IP Product Guide (PG252) and
HDMI Video Capture and Display.
In the preceding example, the live source device link is present under the /dev directory. The
video stream resolution is UHD and framerate is 59.94fps. The video stream color format is
XV20.
For more information, see H.264/H.265 Video Codec Unit LogiCORE IP Product Guide (PG252).
HLG Pipeline
Two HLG modes can be enabled for the VCU:
• Backwards compatible (SDR EOTF + HLG SEI): This mode uses the BT2020 value in the
SPS/VUI parameters instead of the HLG transfer characteristics. The VCU encoder inserts
alternative transfer characteristics (ATC) SEI with the HLG value. The following is a sample
serial pipeline:
• HLG only (HLG EOTF): This mode directly uses the HLG value in the SPS/VUI parameters. The
following is a sample serial pipeline:
For more information, see H.264/H.265 Video Codec Unit LogiCORE IP Product Guide (PG252).
3. On the host machine, open the following URL in the VLC player: rtsp://192.168.2.222:50000/test
The following example uses the video test source. The video stream resolution is 1080p and
30fps, and the video stream color format is GRAY8.
The following example uses the video test source. The video stream resolution is 1080p and
30fps, and the video stream color format is GRAY10_LE32.
For more information, see H.264/H.265 Video Codec Unit LogiCORE IP Product Guide (PG252) and
HDMI Video Capture and Display.
Run the following gst-launch-1.0 command to display raw YUV 4:4:4 8-bit video over HDMI
using the GStreamer pipeline.
Run the following gst-launch-1.0 command to display raw YUV 4:4:4 10-bit video over HDMI
using the GStreamer pipeline.
Run the following gst-launch-1.0 command to capture, encode, and stream-out YUV 4:4:4 8-bit
video using the GStreamer pipeline.
Run the following gst-launch-1.0 command to capture, encode, and stream-out YUV 4:4:4 10-bit
video using the GStreamer pipeline.
Run the following gst-launch-1.0 command to stream-in, decode, and play YUV 4:4:4 8-bit/10-
bit video over HDMI using the GStreamer pipeline
Note: The ZCU106 board is required at the client (receiving) side because this is a customized feature
offered by the Zynq UltraScale+ MPSoC VCU hardware codec.
For more information, refer to Zynq UltraScale+ MPSoC VCU TRD 2023.1 - YUV444 Video Capture
and Display.
Audio Pipelines
List of audio input devices:
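The device listing itself is not included in this extract; the standard ALSA utilities can be used on the target to enumerate the devices:
arecord -l     # list ALSA audio capture (input) devices
aplay -l       # list ALSA audio playback (output) devices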
In the following example, the video source device generates video at 4kp resolution and 60fps.
The video stream color format is NV12. The audio source device generates audio at 48 kHz with
S24_32LE format and dual channel from the input device ID. The audio capture device ID is hw:2,1.
In the following example, the audio renderer renders the audio at 48 kHz with S24_32LE format
and dual channel at the output device ID. The audio playback device ID is hw:2,0.
In the following example, the video source device generates video at 4kp resolution and 60fps.
The video stream color format is XV20. The audio source device generates audio at 48 kHz with
S24_32LE format and 8 channels from the input device ID. The audio capture device ID is hw:1,1,
and the audio playback device ID is hw:1,0.
For more information, see AMD Low Latency PL DDR HLG SDI 8-ch Audio+Video Capture and
Display.
The core SDK consists of several hardware accelerator plugins that use various accelerators, such
as the multiscaler (for resize and color space conversion) and the deep learning processing unit
(DPU) for machine learning. By performing all the compute-heavy operations in dedicated
accelerators, VVAS can achieve the highest performance for video analytics, transcoding, and
several other application areas.
Features
Advantages
• Application developers can build seamless streaming pipelines for AI-based video and image
analytics, complex adaptive bitrate transcoding pipelines, and several other solutions using
VVAS, without needing to understand low-level environment complexities.
• VVAS provides the flexibility for rapid prototyping to full production level solutions by
significantly reducing the time to market for the solutions on AMD platforms.
For more information on VVAS and its applications, see the following:
Chapter 7
Debug
Use the following utilities to debug media issues.
Media Control for Capture Link Up
Run the following command to check the link-up status and the formats set for each of the source and sink pads.
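For example (the media device node can differ per design):
media-ctl -p -d /dev/media0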
If the capture device is connected, then the preceding command generates the following media
graph:
Device topology
- entity 1: vcapaxis_broad_out1hdmi_input_a (1 pad, 1 link)
type Node subtype V4L flags 0
device node name /dev/video0
pad0: Sink
<- "amba_pl@0:axis_broadcasterhdmi_":1 [ENABLED]
pad6: Source
[fmt:VYYUYY8_1X24/3840x2160 field:none]
-> "vcapaxis_broad_out1hdmi_input_a":0 [ENABLED]
pad7: Source
[fmt:VYYUYY8_1X24/3840x2160 field:none]
-> "vcapaxis_broad_out1hdmi_input_a":0 [ENABLED]
If a source is not connected to the HDMI-Rx port, then the media-ctl utility generates the
following media graph:
Device topology
- entity 1: vcapaxis_broad_out1hdmi_input_a (1 pad, 1 link)
type Node subtype V4L flags 0
device node name /dev/video0
pad0: Sink
<- "amba_pl@0:axis_broadcasterhdmi_":1 [ENABLED]
Note: The media graph and entity names can vary per design. For the exact media graph of a specific design, refer to the relevant design wiki pages of the desired release.
Modetest for Display Link Up
The modetest utility (part of libdrm) can be used to:
• List all display capabilities: CRTCs, encoders, and connectors (DP, HDMI, SDI, and so on), planes, and modes
• Perform basic tests: display a test pattern, display 2 layers, perform a vsync test
• Specify the video mode: resolution and refresh rate
Running the modetest command with the -D option and passing the bus_id provides the HDMI connector status, the maximum supported resolutions and frame rates, and the supported plane formats. The following is an example HDMI-Tx command.
$ modetest -D a0070000.v_mix
Encoders:
id crtc type possible crtcs possible clones
43 42 TMDS 0x00000001 0x00000000
Connectors:
id encoder status name size (mm) modes encoders
44 43 connected HDMI-A-1 700x390 49 43
modes:
name refresh (Hz) hdisp hss hse htot vdisp vss vse vtot)
3840x2160 60.00 3840 3888 3920 4000 2160 2163 2168 2222 533250 flags:
phsync, nvsync; type:
preferred, driver
3840x2160 60.00 3840 4016 4104 4400 2160 2168 2178 2250 594000 flags:
phsync, pvsync; type: driver
3840x2160 59.94 3840 4016 4104 4400 2160 2168 2178 2250 593407 flags:
phsync, pvsync; type: driver
3840x2160 50.00 3840 4896 4984 5280 2160 2168 2178 2250 594000 flags:
phsync, pvsync; type: driver
3840x2160 30.00 3840 4016 4104 4400 2160 2168 2178 2250 297000 flags:
phsync, pvsync; type: driver
3840x2160 30.00 3840 4016 4104 4400 2160 2168 2178 2250 297000 flags:
phsync, pvsync; type: driver
The preceding command also shows information about the number of planes and formats, and
the DRM properties of those particular planes.
Planes:
id crtc fb CRTC x,y x,y gamma size possible crtcs
41 0 0 0,0 0,0 0 0x00000001
formats: BG24
props:
8 type:
flags: immutable enum
enums: Overlay=0 Primary=1 Cursor=2
value: 1
17 FB_ID:
flags: object
value: 0
18 IN_FENCE_FD:
flags: signed range
values: -1 2147483647
value: -1
20 CRTC_ID:
flags: object
value: 0
13 CRTC_X:
flags: signed range
values: -2147483648 2147483647
value: 0
14 CRTC_Y:
flags: signed range
values: -2147483648 2147483647
value: 0
15 CRTC_W:
flags: range
values: 0 2147483647
value: 3840
16 CRTC_H:
flags: range
values: 0 2147483647
value: 2160
9 SRC_X:
flags: range
values: 0 4294967295
value: 0
10 SRC_Y:
flags: range
values: 0 4294967295
value: 0
11 SRC_W:
flags: range
values: 0 4294967295
value: 251658240
12 SRC_H:
flags: range
values: 0 4294967295
value: 141557760
To run the color pattern on the connected HDMI screen, run the following command:
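For example, using the connector and CRTC IDs from the listing above (the IDs, mode, and format vary per design):
modetest -D a0070000.v_mix -s 44@42:3840x2160-60@BG24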
GStreamer Debugging Techniques Using GST-launch
The following tools and techniques can be used to debug GStreamer pipelines:
• GST_DEBUG
• GST_Shark
• GDB
GST_DEBUG
The first category is the Debug Level, which is a number specifying the amount of desired output:
0 none No debug information is output.
1 ERROR Logs all fatal errors.
2 WARNING Logs all warnings.
3 FIXME Logs all fixme messages.
4 INFO Logs all informational messages.
5 DEBUG Logs all debug messages.
6 LOG Logs all log messages.
7 TRACE Logs all trace messages.
9 MEMDUMP Logs all memory dump messages. This is the heaviest logging of all, and includes dumping the content of blocks of memory.
To enable debug output, set the GST_DEBUG environment variable to the desired debug level. All levels lower than the set level are also displayed. For example, if you set GST_DEBUG=2, both ERROR and WARNING messages appear.
Furthermore, each plugin or part of GStreamer defines its own category, so you can specify a debug level for each individual category. For example, GST_DEBUG=2,v4l2src*:6 uses Debug Level 6 for the v4l2src element, and 2 for all the others.
The '*' wildcard is also available. For example, GST_DEBUG=2,audio*:5 uses Debug Level 5 for
all categories starting with the word audio. GST_DEBUG=*:2 is equivalent to GST_DEBUG=2.
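For example, the variable can be set inline for a single run and the output redirected to a file; the pipeline below is only a placeholder:
GST_DEBUG=2,v4l2src*:6 gst-launch-1.0 v4l2src device=/dev/video0 io-mode=4 ! video/x-raw, format=NV12, width=3840, height=2160, framerate=60/1 ! fakesink 2> /tmp/gst_debug.log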
GST_Shark
Use gst-shark (a GStreamer-based tool) to verify performance and understand the element
that is causing performance drops.
The following is an excerpt of the stream-out pipeline used for tracing:
... filler-data=0 cpb-size=1000 initial-delay=500 ! video/x-h265, alignment=nal ! queue max-size-buffers=0 ! rtph265pay ! udpsink host=192.168.25.89 port=5004 buffer-size=60000000 max-bitrate=120000000 max-lateness=-1 qos-dscp=60 async=false
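A minimal sketch of enabling tracers for such a run; it assumes the gst-shark tracers (and the core latency tracer) are present in the image, and the pipeline is only a placeholder:
GST_DEBUG="GST_TRACER:7" GST_TRACERS="latency;interlatency;proctime" gst-launch-1.0 videotestsrc num-buffers=300 ! omxh265enc ! fakesink 2> /tmp/traces.log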
The latency tracer module gives instantaneous latencies that can differ from the reported
latencies. The latencies are higher if the inner pipeline (element ! element) takes more
time, or lower if the inner pipeline is running faster, but the GStreamer framework waits until the
running time equals the reported latency.
Check the time (in nanoseconds) marked in bold in the following logs. The initial few readings are high due to initialization time, but they become stable after initialization is complete. For example, the following logs show approximately 12 ms of latency for the stream-out pipeline.
GDB
Use the gdb command for general debugging.
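For example, to run a gst-launch-1.0 pipeline under gdb (the pipeline here is only a placeholder):
gdb --args gst-launch-1.0 videotestsrc num-buffers=100 ! fakesink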
The command provides the gdb shell. Type run to execute the application.
(gdb)run
To debug, use the backtrace function to view the last function flow.
(gdb)bt
Chapter 8
Performance and Optimization
To measure pipeline performance, use the fpsdisplaysink in the pipeline. The following is a
sample pipeline with fpsdisplaysink for file playback:
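A sketch of such a playback pipeline; the file path and kmssink bus-id are placeholders, and the stream is assumed to be HEVC in an MP4 container:
gst-launch-1.0 -v filesrc location=/media/test.mp4 ! qtdemux ! h265parse ! omxh265dec ! queue ! fpsdisplaysink video-sink="kmssink bus-id=a0070000.v_mix" text-overlay=false sync=true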
In the single threaded GStreamer pipeline, data starvation can occur. Use the queue element to
improve the performance of a single threaded pipeline.
Data is queued until one of the limits specified by the “max-size-buffers”, “max-size-bytes”
and/or “max-size-time” properties has been reached. Any attempt to push more buffers into the
queue blocks the pushing thread until more space becomes available.
The queue element adds a thread boundary to the pipeline, and support for buffering. The queue
creates a new thread on the source pad to decouple the processing on sink and source pad.
The default queue size limits are 200 buffers, 10 MB of data, or one second worth of data,
whichever is reached first.
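For example, the limits can be overridden on a queue instance (setting all three to 0 makes the queue unbounded); the surrounding pipeline is only a placeholder:
gst-launch-1.0 filesrc location=/media/test.mp4 ! qtdemux ! h265parse ! omxh265dec ! queue max-size-buffers=0 max-size-bytes=0 max-size-time=0 ! kmssink bus-id=a0070000.v_mix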
Quality of Service in GStreamer is about measuring and adjusting the real-time performance of a
pipeline. The real-time performance is always measured relative to the pipeline clock and
typically happens in the sinks when they synchronize buffers against the clock.
When buffers arrive late in the sink, that is, when their running-time is smaller than that of the
clock, the pipeline has a quality of service problem. These are a few possible reasons:
• High CPU load: there is not enough CPU power to handle the stream, causing buffers to arrive late in the sink.
• Network problems
• Other resource problems such as disk load, and memory bottlenecks
The measurements result in QoS events to adjust the data rate in one or more upstream
elements. The following are two possible adjustments.
Use the timestamp and the jitter value in the QoS event to perform a short-term correction. If the jitter is positive, the previous buffer arrived late, and you can be sure that any buffer with a timestamp less than timestamp + jitter is also going to be late. Therefore, drop all buffers with a timestamp less than timestamp + jitter.
Long-term corrections are a bit more difficult to perform. They rely on the value of the proportion field in the QoS event. Elements should reduce the amount of resources they consume as indicated by the proportion field in the QoS message.
• Permanently drop frames or reduce the CPU or bandwidth requirements of the element.
• Switch to lower quality processing or reduce the algorithmic complexity. Care should be
taken that this does not introduce disturbing visual or audible glitches.
• Switch to a lower quality source to reduce network bandwidth.
• Assign more CPU cycles to critical parts of the pipeline. This could, for example, be done by
increasing the thread priority.
In all cases, elements should be prepared to go back to their normal processing rate when the proportion member in the QoS event approaches the ideal proportion.
The encoder and decoder plugins also support the QoS functionality.
• In the decoder, QoS is enabled by default, and frames are dropped after decoding is finished, based on the QoS event from downstream.
• In the encoder, QoS is disabled by default; when enabled, the input buffer is dropped while encoding if the QoS condition is true, based on the QoS event from downstream.
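For example, QoS events from the sink can be disabled entirely for debugging through the qos property, which all GStreamer sinks inherit from the base sink class; the rest of the pipeline is a placeholder:
gst-launch-1.0 filesrc location=/media/test.mp4 ! qtdemux ! h265parse ! omxh265dec ! queue ! kmssink bus-id=a0070000.v_mix sync=true qos=false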
Sync
In the GStreamer pipeline, the sync flag plays an important role. The sync flag is used for audio/video synchronization in the pipeline by checking the timestamp in the sink element. To get the maximum throughput out of a pipeline, disable the sync flag in the sink element, keeping in mind that audio/video synchronization cannot be achieved when it is set to false. Setting the sync flag to false is useful for a record pipeline, which should dump the data as soon as it is received at the sink element.
gst-launch-1.0 v4l2src device=/dev/video0 io-mode=4 ! video/x-raw, format=NV12, width=3840, height=2160, framerate=60/1 ! omxh265enc qp-mode=auto gop-mode=basic gop-length=60 b-frames=0 target-bitrate=60000 num-slices=8 control-rate=constant
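As an illustration, the fragment above can be completed into a record-style pipeline where the sink writes data as soon as it arrives; the parser, muxer, sink, and file path are additions sketched here, not taken from the original example:
gst-launch-1.0 v4l2src device=/dev/video0 io-mode=4 ! video/x-raw, format=NV12, width=3840, height=2160, framerate=60/1 ! omxh265enc qp-mode=auto gop-mode=basic gop-length=60 b-frames=0 target-bitrate=60000 num-slices=8 control-rate=constant ! video/x-h265, alignment=au ! h265parse ! qtmux ! filesink location=/run/test.mp4 sync=false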
Appendix A
Format Mapping
The following table shows the mapping of formats in GStreamer, V4L2, Media Control, and DRM.
S. No.  YUV Format         GStreamer                     V4L2                 Media Bus Format             DRM Format
1       YUV 4:2:0, 8-bit   GST_VIDEO_FORMAT_NV12         V4L2_PIX_FMT_NV12    MEDIA_BUS_FMT_VYYUYY8_1X24   DRM_FORMAT_NV12
2       YUV 4:2:2, 8-bit   GST_VIDEO_FORMAT_NV16         V4L2_PIX_FMT_NV16    MEDIA_BUS_FMT_UYVY8_1X16     DRM_FORMAT_NV16
3       YUV 4:2:0, 10-bit  GST_VIDEO_FORMAT_NV12_10LE32  V4L2_PIX_FMT_XV15    MEDIA_BUS_FMT_VYYUYY10_4X20  DRM_FORMAT_XV15
4       YUV 4:2:2, 10-bit  GST_VIDEO_FORMAT_NV16_10LE32  V4L2_PIX_FMT_XV20    MEDIA_BUS_FMT_UYVY10_1X20    DRM_FORMAT_XV20
5       YUV 4:4:4, 8-bit   GST_VIDEO_FORMAT_Y444         V4L2_PIX_FMT_YUV444  MEDIA_BUS_FMT_VUY8_1X24      DRM_FORMAT_YUV444
6       YUV 4:4:4, 10-bit  GST_VIDEO_FORMAT_Y444_10LE32  V4L2_PIX_FMT_X403    MEDIA_BUS_FMT_VUY10_1X30     DRM_FORMAT_X403
Appendix B
Additional Resources and Legal Notices
The AMD Adaptive Computing Documentation Portal is an online tool that provides robust
search and navigation for documentation using your web browser. To access the Documentation
Portal, go to https://docs.xilinx.com.
Documentation Navigator
Documentation Navigator (DocNav) is an installed tool that provides access to AMD Adaptive
Computing documents, videos, and support resources, which you can filter and search to find
information. To open DocNav:
• From the AMD Vivado™ IDE, select Help → Documentation and Tutorials.
• On Windows, click the Start button and select Xilinx Design Tools → DocNav.
• At the Linux command prompt, enter docnav.
Note: For more information on DocNav, refer to the Documentation Navigator User Guide (UG968).
Design Hubs
AMD Design Hubs provide links to documentation organized by design tasks and other topics,
which you can use to learn key concepts and address frequently asked questions. To access the Design Hubs:
• In DocNav, click the Design Hubs View tab.
• Go to the Design Hubs web page.
Support Resources
For support resources such as Answers, Documentation, Downloads, and Forums, see Support.
References
These documents and links provide supplemental material useful with this guide:
24. Exploring Zynq MPSoC: With PYNQ and Machine Learning Applications (https://www.zynq-mpsoc-book.com/)
25. Zynq UltraScale+ MPSoC VCU TRD Wiki
26. https://gstreamer.freedesktop.org/
Revision History
The following table shows the revision history for this document.
Copyright
© Copyright 2020-2023 Advanced Micro Devices, Inc. AMD, the AMD Arrow logo, Zynq, and
combinations thereof are trademarks of Advanced Micro Devices, Inc. AMBA, AMBA Designer,
Arm, ARM1176JZ-S, CoreSight, Cortex, PrimeCell, Mali, and MPCore are trademarks of Arm
Limited in the US and/or elsewhere. PCI, PCIe, and PCI Express are trademarks of PCI-SIG and
used under license. The DisplayPort Icon is a trademark of the Video Electronics Standards
Association, registered in the U.S. and other countries. HDMI, HDMI logo, and High-Definition
Multimedia Interface are trademarks of HDMI Licensing LLC. Other product names used in this
publication are for identification purposes only and may be trademarks of their respective
companies.