The H.264/AVC Video Coding Standard for the Next Generation Multimedia Communication
M. Mahdi Ghandi, Member, IEEE, and Mohammad Ghanbari, Fellow, IEEE
(Invited paper)

Abstract—The capacity of a communication channel is determined by its bandwidth and signal-to-noise ratio. For a digital user, these parameters determine the bit rate and the probability of error, and so affect the achievable quality of service. In recent multimedia communication systems, bandwidth is still a limiting factor; hence, effective video compression techniques are essential to reduce the amount of video data. The new H.264/AVC video coding standard, jointly developed by the ITU-T and ISO/IEC expert groups, has achieved a significant improvement in compression performance compared to the previous standards and promises to deliver good video quality at low bit rates. However, if the probability of error is higher than normal, correct reception of the data is not guaranteed, which implies a need for useful operation of the video compression algorithms in error-prone environments. This article presents an overview of the H.264/AVC features in terms of video compression techniques and error resilient coding. Some simulation results are provided to demonstrate the efficiency of the codec.

Index Terms—Multimedia communication, video compression standard, error resilient coding.

M. Ghanbari is a professor of video networking at the University of Essex, Wivenhoe Park, Colchester, CO4 3SQ, UK (telephone: +44-1206-872434; fax: +44-1206-972900; e-mail: ghan@essex.ac.uk). M. M. Ghandi is a senior research officer in the video networking group of the University of Essex (e-mail: [email protected]).

I. INTRODUCTION

DIGITAL video compression techniques have played a key role in recent multimedia communications. The limited bandwidth of communication channels and storage media demands more efficient video coding methods. On the other hand, new applications and advances in multimedia technology demand video coding methods with more complex and advanced features. Therefore, standardization of video compression techniques is essential [1].

The H.261 recommendation [2] was the first standard for video conferencing applications; it was developed by the Video Coding Experts Group (VCEG) of ITU-T in the late 1980s. The Motion Picture Experts Group (MPEG) of ISO/IEC then introduced the MPEG-1 standard [3] in the early 1990s for storage of video on CD-ROMs. MPEG-2 (or H.262) [4], which is now the most widely used video coding standard, was a joint project of ISO/IEC (MPEG) and ITU-T (VCEG). Later on, the two groups separately published two new advanced standards, namely H.263 [5] and MPEG-4 [6], for coding video at low bit rates. The progress of video coding standards is summarized in Fig. 1.

Fig. 1. Evolution of the ITU-T and MPEG video coding standards, 1984-2004. ITU-T standards: H.261, H.263, H.263+, H.263++; joint ITU-T/MPEG standards: H.262/MPEG-2 and H.264 (H.26L)/MPEG-4 v10 AVC; MPEG standards: MPEG-1, MPEG-4.

In early 1998 VCEG started a project named H.26L, with L standing for the long-term objective of doubling the coding efficiency relative to the best existing video coding standards. The H.26L project targeted a wide variety of applications, from low to high bit rates (e.g. multimedia over mobile networks and storage on optical devices, respectively). The fruitful outcome of the MPEG-2/H.262 video codec, produced under the joint effort of MPEG and VCEG, encouraged the two expert groups to collaborate further. Therefore, in 2001 VCEG and MPEG formed the Joint Video Team (JVT) to finalize the project, which is called H.264 by ITU-T and MPEG-4 Part 10, Advanced Video Coding (AVC), by ISO/IEC [7], [8]. For consistency, throughout this paper we call it simply H.264.

Simulation results show that H.264 achieves substantially better video quality than H.263++ and MPEG-4. The JVT test model software (a laboratory test model of H.264) has achieved up to 50 per cent bit rate saving compared with the most highly optimized existing H.263 or MPEG-4 codecs [9]-[11]. This means that H.264 offers significantly higher quality at the same bit rates.

As well as superior Rate-Distortion (RD) efficiency, H.264 can operate in a low-delay mode suited to telecommunication applications, while allowing higher processing delay in applications with no delay constraints [12]. In the H.264 design there were efforts to keep the codec at a reasonable level of complexity [13]. The decoder in particular has a reasonably low complexity, which makes it appropriate for applications with limited processing power [14], [15].
The H.264 codec conceptually separates the Video Coding Layer (VCL) from the Network Abstraction Layer (NAL). The VCL provides the core compressed video content, while the NAL supports its delivery over various types of network. This network friendliness facilitates easier packetization and better information priority control. In addition, to adapt H.264 to applications involving bit errors and packet losses, a number of error resilience techniques are provided in the standard.

This paper presents an overview of the H.264/AVC video coding method in Section II, with particular emphasis on the new features that make the standard more efficient than its predecessors. The error resilient features of the codec are discussed in Section III, as they are the most demanding features of the codec for error-prone channels. Section IV describes the non-normative R-D optimization method used in the JVT test model software, which is a key factor in the success of the method. Selected simulation results verifying the R-D efficiency of the algorithm and the error resilient techniques are presented in Section V.

II. AN OVERVIEW OF THE H.264/AVC VIDEO CODING METHOD

Standard video coding methods, including H.264, are based on three fundamental redundancy reduction principles, namely spatial redundancy reduction, temporal redundancy reduction and entropy coding [16]. Spatial redundancy reduction in intraframes (Fig. 2) is carried out by transform coding, applying a Discrete Cosine Transform (DCT) on rectangular blocks followed by quantization and entropy coding. Temporal redundancy reduction is carried out by modeling the movements of objects in interframes (Fig. 6) with Motion Vectors (MVs) generated in the Motion Estimation (ME) process. In this section, a description of the H.264 features for intra coding is first presented, followed by a description of the new inter coding techniques.

A. Intraframe Coding

Although efficient video encoders mainly use interframe prediction, intraframe coding of parts of the picture is still necessary to prevent error propagation. However, intraframe coding generates a large bit rate, and hence, in order for H.264 to be efficient, special attention has been paid to intraframe coding.

H.264 takes advantage of correlations between neighboring blocks to achieve better compression in intra coding. In this coder every intra 16x16-pixel Macroblock (MB) in a picture is first predicted in an appropriate mode from the already coded and reconstructed (R) samples of the same picture. As Fig. 2 shows, the difference between the predicted block (P) and the original one (the prediction residual) is calculated, DCT transformed, quantized and entropy coded. The reconstructed samples are then filtered to generate the decoded frame and stored in a buffer to be used as a reference for coding of future pictures.

Fig. 2. A simplified block diagram of intraframe coding in the H.264 coder: the input video minus the predicted blocks (P) gives the residual data, which is DCT transformed, quantized and entropy coded together with the headers into the output bitstream; inverse quantization, IDCT, the loop filter and the frame buffer form the reconstruction path feeding mode selection and intra prediction.

1) Intra Prediction Modes: There are nine advanced prediction modes for luminance (luma) samples when the MB is partitioned into 4x4 blocks (the intra-4x4 modes). Additionally, four other modes are used for predicting a whole 16x16 intra MB (intra-16x16). The chrominance (chroma) components can also be predicted in four different modes (the intra-chroma modes). The intra-4x4 modes are illustrated in Fig. 3, where the prediction values for the pixels are calculated from the neighboring boundary pixel values. Each mode is suited to predicting directional structures in the picture at a different angle (e.g. horizontal, vertical, diagonal, etc.). The intra-16x16 and intra-chroma modes consist of horizontal, vertical, DC and plane modes. In these modes, similarly to the intra-4x4 modes, an arrangement of the neighboring boundary pixels produces the prediction pixels. For further details of each intra prediction mode, the H.264 recommendation should be consulted [7].

Fig. 3. Intra 4x4 prediction modes (Mode 0: Vertical, Mode 1: Horizontal, Mode 2: DC, Mode 3: Diagonal-Down-Left, Mode 4: Diagonal-Down-Right, Mode 5: Vertical-Right, Mode 6: Horizontal-Down, Mode 7: Vertical-Left, Mode 8: Horizontal-Up). Each 4x4 block is predicted from the already coded and reconstructed pixels (A-D above and I-L to the left) of the neighboring blocks in the same frame; in the DC mode, for example, all pixels are predicted as (A+B+C+D+I+J+K+L)/8.
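To make the prediction operation concrete, the following Python sketch (our illustration, not code from the standard or the JVT software) forms the vertical, horizontal and DC predictions of a 4x4 block from the reconstructed boundary pixels labelled A-D and I-L in Fig. 3; the sample-availability rules and the six remaining directional modes are omitted.

```python
import numpy as np

def intra4x4_predict(top, left, mode):
    """Toy intra-4x4 predictor for three of the nine H.264 modes.

    top  : 4 reconstructed pixels above the block (A, B, C, D in Fig. 3)
    left : 4 reconstructed pixels to the left of it (I, J, K, L in Fig. 3)
    mode : 0 = vertical, 1 = horizontal, 2 = DC
    """
    top, left = np.asarray(top, int), np.asarray(left, int)
    if mode == 0:                      # vertical: copy the row above downwards
        return np.tile(top, (4, 1))
    if mode == 1:                      # horizontal: copy the left column across
        return np.tile(left.reshape(4, 1), (1, 4))
    if mode == 2:                      # DC: rounded mean of the eight neighbours
        dc = (top.sum() + left.sum() + 4) >> 3
        return np.full((4, 4), dc)
    raise ValueError("only modes 0-2 are shown in this sketch")

# the encoder picks the mode whose prediction gives the cheapest residual
block = np.array([[12, 14, 15, 16]] * 4)
pred = intra4x4_predict(top=[12, 14, 15, 16], left=[11, 12, 13, 14], mode=0)
residual = block - pred                # residual is transformed, quantized, entropy coded
```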


2) Transform and Quantization: In H.264, as in the other standards, a transformation and quantization are applied to the prediction residuals. However, H.264 employs a 4x4 integer transform, as opposed to the 8x8 floating-point DCT used in the other standard codecs. The transform is an approximation of the 4x4 DCT and hence has a similar coding gain. Since the integer transform has an exact inverse operation, there is no mismatch between the encoder and the decoder, which is a problem in all DCT based codecs.
         | 1    1    1    1 |
  Tint = | 2    1   -1   -2 |
         | 1   -1   -1    1 |
         | 1   -2    2   -1 |

Fig. 4. The integer 4x4 forward transformation matrix used in the H.264 codec.

After the transformation with the matrix of Fig. 4, each coefficient is scaled by a specified factor to form the final transform coefficient. The scaled coefficients are then quantized with a quantization step size determined by a given Quantization Parameter (QP). In H.264 the values of the quantizer step sizes have been defined such that the scaling and quantizing stages can be merged and performed by simple integer operations in both the encoder and the decoder [17]. In particular, the inverse operations of scaling, quantizing and transforming are described directly in the standard using pure integer operations [7]. This significantly reduces the required processing power, which is useful in power-constrained applications such as video over mobile networks, and allows increased parallelism.
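As an illustration of the pure-integer forward transform, the sketch below applies the matrix of Fig. 4 to a 4x4 residual block as Tint * X * Tint^T. The per-coefficient scaling and the quantization that H.264 folds into a single integer multiply and shift are deliberately left out, so this is only the core transform, not the normative process.

```python
import numpy as np

# forward transform matrix of Fig. 4
TINT = np.array([[1,  1,  1,  1],
                 [2,  1, -1, -2],
                 [1, -1, -1,  1],
                 [1, -2,  2, -1]], dtype=np.int64)

def forward_transform_4x4(residual):
    """Core 4x4 integer transform: Y = Tint * X * Tint^T, integer arithmetic only.
    The per-coefficient scaling and quantization steps are omitted in this sketch."""
    X = np.asarray(residual, dtype=np.int64)
    return TINT @ X @ TINT.T

residual = np.array([[ 5, -2,  0,  1],
                     [ 3,  1, -1,  0],
                     [ 0,  0,  2, -1],
                     [-1,  1,  0,  0]])
coeffs = forward_transform_4x4(residual)   # unscaled, unquantized transform coefficients
```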
It should be mentioned that during the preparation of the standard a variable block-size transform method was also considered [18]. It includes 4x4, 4x8, 8x4 and 8x8 transforms and improves the efficiency of the codec. However, since it was not considered mature enough and adds complexity to the codec, it has not been included in the first generation of the standard.

3) Entropy Coding: Before transmission, the generated data of all types are entropy coded. H.264 supports two different methods of entropy coding, namely Context Adaptive Variable Length Coding (CAVLC) and Context Adaptive Binary Arithmetic Coding (CABAC). As well as being conceptually different, the two methods differ in efficiency: CABAC is more efficient than CAVLC, which itself is superior to the conventional VLC (Huffman) coding used in the other standard codecs.

a) CAVLC: When the codec is in this mode, the residual data are coded using CAVLC, while all other data are coded using simple Exp-Golomb codes, which are listed in Table I. These data are first appropriately mapped to Exp-Golomb code numbers depending on the data type (e.g. MB headers, MVs, etc.), and then the corresponding code words are transmitted.

TABLE I
EXP-GOLOMB CODE WORDS

  Code                 Code word
  0                    1
  1, 2                 010, 011
  3, 4, 5, 6           00100, 00101, 00110, 00111
  7, 8, 9, 10, 11, ... 0001000, 0001001, 0001010, 0001011, 0001100, ...
  ...                  ...
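The mapping of Table I can be generated directly: a code number k is written as M leading zeros, a one, and then the M-bit binary value of k + 1 - 2^M, where M = floor(log2(k + 1)). A minimal sketch:

```python
def exp_golomb(k):
    """Unsigned Exp-Golomb codeword for a non-negative integer k (Table I)."""
    m = (k + 1).bit_length() - 1            # M = floor(log2(k + 1)), the prefix length
    prefix = "0" * m + "1"
    suffix = format(k + 1 - (1 << m), "0" + str(m) + "b") if m else ""
    return prefix + suffix

# reproduces Table I: 0 -> '1', 1 -> '010', 2 -> '011', 3 -> '00100', 7 -> '0001000'
print([exp_golomb(k) for k in range(8)])
```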
The zigzag-scanned quantized coefficients of a residual block are coded using context-adaptive VLC tables. The already coded information of the neighboring blocks (i.e. the upper and left blocks) and the coding status of the current block determine the context. Optimized VLC tables are specifically provided for each context to code the coefficients efficiently under different statistical conditions.

b) CABAC: In this mode, all generated data, including headers and residual data, are coded using a binary arithmetic coding engine. The compression improvement of CABAC is the consequence of non-integer-length symbol assignment, adaptive probability estimation and an improved context modeling scheme. In H.264, CABAC is designed so that it can be implemented using simple integer operations and table look-ups, adapting it to low-complexity applications [19].

A block diagram of the CABAC coding process is depicted in Fig. 5. To code a syntax element, it is first mapped to a binary sequence called a bin string. In the standard, proper binarization mapping schemes are provided for the different types of data. For each element of the bin string (i.e. each bin) a context index is determined based on the neighboring information and the coder status. There are 399 different contexts in the standard for the various types of data, and the context modeling scheme (i.e. the derivation of the context index) for each data type is clearly specified. The binary arithmetic coding engine then codes the bins using the associated probability estimation tables addressed by the context index and generates the output stream. Subsequently, the probability tables are updated based on the coded bins for future use.

Fig. 5. A simplified block diagram of the CABAC coder: a syntax element is binarized into a bin string, a context index (CtxIdx) is determined for each bin, and the arithmetic coding engine codes the bins using, and then updating, the probability tables to produce the output stream.
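The two main sources of CABAC's gain, non-integer-length symbol assignment and adaptive per-context probability estimation, can be illustrated with the toy model below, which keeps a frequency-based probability estimate per context index and accumulates the ideal code length -log2(p) of each bin. This is only an illustration of the principle; the real engine uses the probability state machine, range subdivision and renormalization specified in [19], none of which is reproduced here.

```python
import math
from collections import defaultdict

class ToyContextCoder:
    """Toy illustration of context-adaptive binary coding: per-context adaptive
    probability estimates and ideal (fractional) code lengths. Not the CABAC engine."""
    def __init__(self):
        self.counts = defaultdict(lambda: [1, 1])  # per-context counts of 0s and 1s
        self.bits = 0.0                            # accumulated ideal code length in bits

    def encode_bin(self, ctx_idx, bin_val):
        c0, c1 = self.counts[ctx_idx]
        p = (c1 if bin_val else c0) / (c0 + c1)    # current estimate for this bin value
        self.bits += -math.log2(p)                 # fractional number of bits for this bin
        self.counts[ctx_idx][bin_val] += 1         # adapt the context model

coder = ToyContextCoder()
for b in [0, 0, 0, 1, 0, 0, 0, 0]:                 # a skewed bin sequence for one context
    coder.encode_bin(ctx_idx=42, bin_val=b)
print(round(coder.bits, 2), "bits for 8 bins")     # fewer than 8 bits once the model adapts
```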
4) In-Loop Deblocking Filter: To reconstruct a coded picture, it is filtered using an adaptive deblocking filter. The filter removes the visible block structures at the edges of the 4x4 blocks caused by block-based transform coding and motion estimation [20]. The strength of the filtering is adaptively controlled by the coded information, such as the QP and the block texture.

As well as improving the subjective quality, simulation results show that applying the filter inside the encoder loop improves the compression efficiency of the codec [20]. Furthermore, if the filter were applied only in the decoder, an extra buffer would be needed in the decoder to store the non-filtered frames in order to maintain synchronization with the encoder. These advantages of the in-loop filter were the reasons it was accepted in the standard, despite the increase in codec complexity.

B. Interframe Coding

In H.264, as in its predecessor standards, every MB of an interframe can be coded in intra or inter mode. If the intra mode is selected, the MB is coded as explained in Part A. Otherwise, interframe prediction, including ME, mode selection and Motion Compensation (MC), is performed to produce the predicted blocks. A block diagram of interframe coding in the H.264 coder is depicted in Fig. 6. As in intraframe coding, the residual data of the original and predicted blocks are transform coded. The inter prediction method in H.264 has some interesting features, which are explained in the following parts.

Fig. 6. A simplified block diagram of interframe coding in the H.264 encoder: as Fig. 2, but with motion estimation (ME) and motion-compensated prediction (MC) from the frame buffer providing the predicted blocks, and the MVs transmitted alongside the headers and residual data.
1) Inter Prediction Modes: Interframe predictive coding is where H.264 makes most of its gain in compression efficiency. Motion compensation of each 16x16 MB can be performed with the various block sizes and shapes illustrated in Fig. 7. The partitioning of an MB into 16x16, 8x16, 16x8 or 8x8 blocks is determined by mb-type. In 8x8 mode (i.e. mb-type 3) each of the blocks can be further divided independently into 8x8, 8x4, 4x8 or 4x4 sub-partitions, determined by sub-mb-type. Note that each of these blocks carries its own MV, so more precise motion compensation can be performed when the MB is divided into smaller blocks.

Fig. 7. Top: the 16x16 MB partitioning modes for MC (mb-type 0 to 3), each partition with its own reference index (ref-idx) and MV. Bottom: the sub-partitioning modes of the 8x8 blocks (sub-mb-type 0 to 3) when mb-type is 3.

2) Multiple Reference Prediction: The H.264 standard offers the option of using many previous pictures for prediction. Every MB partition (but not sub-partition) shown in the top part of Fig. 7 can have a different reference picture that is more appropriate for that particular block. This increases the coding efficiency and produces better subjective quality. Moreover, using this mode improves the robustness of the bitstream to channel errors [12].

3) Quarter Sample MVs: The movements of objects between consecutive pictures of a sequence, particularly for reduced-size pictures such as QCIF, are not necessarily in integer pixel units. Therefore, to improve the motion modeling of inter pictures, the MVs have quarter-sample precision. To generate the values at half-pixel positions, a 6-tap Finite Impulse Response (FIR) filter is applied to the integer-position samples. The quarter-pixel samples are then generated by simple interpolation between neighboring (integer or half-pixel position) samples [21].
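For the luma component the half-pixel samples are obtained with the 6-tap filter (1, -5, 20, 20, -5, 1)/32, and the quarter-pixel samples by averaging two neighbouring samples. The sketch below shows the one-dimensional case for a single row of reconstructed samples; picture-boundary handling and the two-dimensional (diagonal) positions are left out of this illustration.

```python
def half_pel(row, i):
    """Horizontal half-pixel sample between row[i] and row[i+1] using the
    6-tap filter (1, -5, 20, 20, -5, 1), with rounding and an 8-bit clip."""
    taps = (1, -5, 20, 20, -5, 1)
    acc = sum(t * row[i - 2 + k] for k, t in enumerate(taps))
    return min(255, max(0, (acc + 16) >> 5))     # divide by 32 with rounding

def quarter_pel(a, b):
    """Quarter-pixel value: rounded average of two neighbouring samples
    (integer or half-pel positions)."""
    return (a + b + 1) >> 1

row = [10, 12, 40, 200, 210, 205, 190, 60]        # one line of reconstructed luma samples
h = half_pel(row, 3)                              # half-way between row[3] and row[4]
q = quarter_pel(row[3], h)                        # a quarter-pel position next to row[3]
```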
4) B-picture Considerations: B-pictures are encoded using both past and future pictures as references, in contrast to P-pictures, which are predicted only from past references. In other words, each block of a B-picture can be predicted from either of two reference blocks or from a linear combination of them. In H.264 these references can both be in the past, or one in the past and one in the future. A new direct prediction mode is also specified in H.264, in which no prediction data (such as MVs and reference indices) are present in the bitstream; instead, they are derived from the available data of the co-located MBs of subsequent pictures. To support B-picture coding as well as P-picture coding, proper mode referencing tables and entropy coding methods are specified [7], [22].

III. H.264 NAL LAYER AND ERROR RESILIENCE FEATURES

In a communication channel the quality of service is affected by two parameters: the bandwidth and the probability of error. Therefore, as well as the video compression efficiency provided through the VCL, adaptation to the communication channel should be carefully considered. The NAL concept and the error resilience features of H.264 provide an appropriate representation of the VCL data for conveyance over a variety of channels and for coping with their error conditions.

A. NAL

The Network Abstraction Layer facilitates the delivery of the H.264 VCL data to the underlying transport layers such as RTP/IP, H.32X and MPEG-2 systems [23]. Each NAL unit can be considered a packet that contains an integer number of bytes, comprising a header and a payload (see Fig. 8). The header specifies the NAL unit type and the payload contains the related data.

TABLE II
NAL UNIT TYPES

  NAL unit type   Class     Content of NAL unit
  0               -         Unspecified
  1               VCL       Coded slice
  2               VCL       Coded slice data partition A
  3               VCL       Coded slice data partition B
  4               VCL       Coded slice data partition C
  5               VCL       Coded slice of an IDR picture
  6-12            Non-VCL   Supplemental information, parameter sets, etc.
  13-23           -         Reserved
  24-31           -         Unspecified
Table II gives a summarized list of the different NAL unit types. NAL units of type 1 to 5 contain the different kinds of VCL data that will be described later. NAL units of type 6 to 12 are non-VCL units containing additional information such as parameter sets and supplemental information. Parameter sets are header data that remain unchanged over a number of NAL units; they are sent separately so that they need not be repeated in every unit. Supplemental information is timing or other auxiliary data that enhances the usability of the decoder but is not essential for decoding the pictures. NAL unit types 13 to 23 are reserved for future H.264 extensions, and types 24 to 31 are available for use by different applications.
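Each NAL unit begins with a one-byte header carrying a forbidden_zero_bit, a two-bit nal_ref_idc (indicating whether the unit is needed as a reference) and the five-bit nal_unit_type of Table II. A minimal parsing sketch:

```python
NAL_TYPE_NAMES = {
    1: "coded slice (non-IDR)",
    2: "coded slice data partition A",
    3: "coded slice data partition B",
    4: "coded slice data partition C",
    5: "coded slice of an IDR picture",
}

def parse_nal_header(first_byte):
    """Split the one-byte NAL unit header into its three fields."""
    forbidden_zero_bit = (first_byte >> 7) & 0x01
    nal_ref_idc = (first_byte >> 5) & 0x03      # 0 means the unit is not used for reference
    nal_unit_type = first_byte & 0x1F           # see Table II
    return forbidden_zero_bit, nal_ref_idc, nal_unit_type

# example: 0x65 = 0b0110_0101 -> nal_ref_idc 3, nal_unit_type 5 (IDR slice)
f, ref, typ = parse_nal_header(0x65)
print(ref, typ, NAL_TYPE_NAMES.get(typ, "other"))
```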
B. Error Resilience Features

H.264 provides several features that make the generated bitstream robust to bit errors and packet losses. A brief description of each of these features follows.

1) Resynchronization (Slice) Headers: Each frame can be divided into several slices, each containing a flexible number of MBs. In each slice the arithmetic coder is aligned and the predictions are reset; hence, every slice in the frame is independently decodable. Slices can therefore be considered resynchronization points that prevent a probable error from propagating to the entire frame. If the resynchronization markers are used more often, they limit the damaged area more tightly and so improve the error resilience of the codec. Obviously, slicing introduces some overhead and reduces the compression efficiency, but in error-prone situations the received video quality can be much better owing to the improved error resilience.

In H.264, each slice is placed in a separate NAL unit (see Table II). The slices of an IDR picture (i.e. a picture with all-intra slices) are located in type 5 NAL units, while those belonging to a non-IDR picture are placed in NAL units of type 1 to 4, depending on the Data Partitioning (DP) mode, which is explained in the following part.

2) Data Partitioning: DP is another important and efficient way to make a bitstream more robust. It exploits the fact that symbols appearing earlier in the bitstream suffer less from errors than those that come later. Therefore, by bringing the more important parts of the video data (such as headers and MVs) ahead of the less important data, the side effect of channel errors can be significantly reduced.

In H.264, when DP is enabled, every slice is divided into three separate partitions (Fig. 8) and each partition is placed in a particular NAL unit (see Table II). Data partitioning can therefore be used as an efficient layering method that separates data of different importance.

Fig. 8. A non-IDR slice placed in NAL units, each consisting of a NAL-unit header and payload. Top: data partitioning disabled; a single type 1 unit carries the slice header and the slice data (MB headers, MVs, etc. and the intra and inter residual data). Bottom: data partitioning enabled; a type 2 unit (part A) carries the slice header, MB headers, MVs, etc., a type 3 unit (part B) the intra residual data, and a type 4 unit (part C) the inter residual data.

By partitioning the data into different NAL units, applying unequal error protection to the various parts of the video data becomes straightforward. For example, a high error protection can be applied to NAL units of type 2, 3 and 5 (headers, intra residuals and IDR pictures, respectively) and a lower protection to type 4 units (inter residual data), since they are less important. It should be noted that there is no DP for IDR pictures; however, since they have a distinct NAL unit type (type 5), special error protection can still be applied to them.

3) Flexible MB Ordering (FMO): In FMO mode, MBs may be assigned to any slice in a frame, so they can be transmitted in a non-scanning order. Since each slice is independently decodable, one can spatially interleave the MBs of different slices. Therefore, if one of the slices is lost, its missing MBs are surrounded by correctly received MBs. Hence, by applying an appropriate error concealment method, the visual artifact of the losses can be significantly reduced.
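As a sketch of the idea, the function below builds a two-group checkerboard macroblock-to-slice-group map for a QCIF frame (11 x 9 MBs), one example of a dispersed FMO pattern. If either slice group is lost, every corrupted MB is surrounded by MBs of the other group, which is exactly what neighbour-based concealment needs. The mapping actually used by an encoder is signalled to the decoder in a parameter set; the checkerboard here is only an illustration.

```python
def checkerboard_mb_map(mb_width=11, mb_height=9):
    """Map each macroblock of a QCIF frame (11x9 MBs) to one of two slice groups
    in a checkerboard pattern -- a simple example of a dispersed FMO mapping."""
    return [[(x + y) % 2 for x in range(mb_width)] for y in range(mb_height)]

mb_map = checkerboard_mb_map()
# if slice group 1 is lost, every corrupted MB has its four neighbours in group 0,
# so neighbour-based error concealment has correct data to work from
for row in mb_map:
    print("".join(str(g) for g in row))
```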
4) Redundant Slices: H.264 has a new feature that allows the encoder to send Redundant Slices (RSs) containing information about the same, primarily transmitted, MBs. If the original slices are lost, the RSs provide alternative data (of somewhat lower quality) that can be used to recover the corrupted MBs.

5) Intra Refresh and Multiple References: Features such as the intra modes and Multiple Reference Selection (MRS) improve the error robustness of a bitstream as well as the coding efficiency. Intra MBs do not use temporal prediction, so they stop errors from propagating through the bitstream. Similarly, using multiple references (especially in the presence of a back-channel) can prevent corrupted pictures from being used as references and hence stop error propagation [24], [25].

Intra MBs can be inserted frequently in the bitstream; in that case the coding efficiency is reduced but the error robustness is improved. The same applies to the MRS mode: the encoder can be optimized purely for coding efficiency, or used to generate more robust bitstreams.

IV. RATE-DISTORTION OPTIMIZATION

In H.264, as in the other standards, only the bitstream syntax and the decoding procedure are specified, and the encoder is free to adopt different implementations. For each MB it has to select parameters such as the MB mode, MVs, QP and residual data from among a very large number of choices, which makes optimization of the encoder crucial for achieving good R-D efficiency. In other words, a poorly designed encoder can generate bitstreams of even lower quality than a simple traditional video coder. In the JVT reference software a Lagrangian optimization technique is applied to achieve R-D efficiency [9].

A. Lagrangian Optimization Technique

Lagrangian techniques are based on converting a constrained optimization problem into an unconstrained one [26]. In (1) the problem is to minimize the distortion D subject to the constraint that the rate R is less than Rc:

  min D   subject to   R < Rc                         (1)

Equation (1) is converted into (2), where the problem is to minimize the Lagrangian cost J, with λ being the Lagrangian parameter:

  min J,   J = D + λ × R                              (2)

The selection of the encoding parameters for every MB in a picture determines its rate (r) and distortion (d), and the sums of these values give R and D. Assuming that the r and d of a particular MB depend only on its own coding parameters (and not on those of the other MBs), the optimization of (2) simplifies to minimizing the cost of each MB separately:

  min j,   j = d + λ × r                              (3)

B. Optimization Process

In the JVT software, motion estimation and mode selection have been optimized using the Lagrangian method. The ME process finds the MVs that minimize jmv, calculated from the number of bits needed to send each MV (rmv) and the corresponding Sum of Absolute Differences (SADmv):

  jmv = SADmv + λMotion × rmv                         (4)

  jm = SSDm + λMode × rm                              (5)

After determining the optimum MVs for the inter modes, the Lagrangian cost jm of each (intra and inter) mode is calculated using (5), where rm is the number of bits required to send all the MB information and SSDm is the Sum of Squared Differences (SSD) between the original and the reconstructed pixels. The mode with the lowest jm is selected as the optimum one. Note that in intraframes only the intra modes are allowed and searched, while in interframes both types are examined.

C. Selection of λ

In the above optimization process, the value of λ is calculated using the empirical formula (6):

  λMode = 0.85 × 2^((QP − 12)/3),   λMotion = λMode   (6)

This relationship between QP and λ was extracted through experiments similar to those described in [27] for H.263. In these experiments the average values of QP selected by the optimization process for various given λ values are calculated, and equation (6) has been established from them. Fig. 9 gives the results of the experiment for the Foreman test sequence. The encoder examines different values of QP as well as different MB modes and selects the QP that gives the minimum Lagrangian cost. From the figure it is clear that when λ is calculated from (6) for a specific QP, the average selected QP equals that particular QP, which verifies the accuracy of relationship (6).

Fig. 9. Average QP occurrence (%) versus QP for λ values calculated from (6) with QP 20, 25, 30, 35 and 40, Foreman QCIF@10Hz.

Therefore, for coding the pictures a QP is first selected and a λ value is then calculated to drive the optimization process. Note that in this method the bit rate is not directly controlled, so the resulting bitstream has a variable bit rate. There are several proposals in the literature for adding a rate control feature to the software [28]-[31]. In addition, since all possible modes of every MB are examined, the method is time consuming, and some proposals have simplified the process with an acceptable delay [32]-[35].
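Putting (2)-(6) together for a single MB, the sketch below derives λMode from the chosen QP and selects the mode with the smallest cost d + λ × r. The SSD and bit figures are made-up placeholders used only for illustration, not measurements from an encoder.

```python
def lambda_mode(qp):
    """Lagrangian parameter of equation (6): 0.85 * 2^((QP - 12) / 3)."""
    return 0.85 * 2 ** ((qp - 12) / 3.0)

def best_mode(candidates, qp):
    """Pick the mode with the smallest Lagrangian cost j = SSD + lambda * bits, as in (5)."""
    lam = lambda_mode(qp)
    costs = {mode: ssd + lam * bits for mode, (ssd, bits) in candidates.items()}
    return min(costs, key=costs.get), costs

# illustrative (made-up) SSD / bit-count figures for one MB coded at QP 28
candidates = {
    "intra16x16": (5200, 96),
    "intra4x4":   (3900, 210),
    "inter16x16": (3100, 150),
    "inter8x8":   (2500, 310),
}
mode, costs = best_mode(candidates, qp=28)
print(mode, {m: round(c) for m, c in costs.items()})
```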
V. SELECTED SIMULATION RESULTS

In this section, through various simulations, we intend to demonstrate the two fundamental properties of an H.264 codec, namely its compression efficiency and its error resilience.

A. R-D Efficiency

Four video test sequences with different texture and movement properties were selected for our simulations, namely Foreman, News, Mobile and Akio. The sequences were coded with the JM7.6 JVT reference software [36]. In the default settings for each test, the picture resolution is QCIF, the frame rate is 10 Hz, the number of reference frames is 1, the ME search window is 16, the entropy coding method is CABAC, Lagrangian R-D optimization is enabled, and the bitstreams contain an intra picture followed by 99 P-pictures.

Fig. 10 shows the results of these four tests in the described default modes. The results clearly show that the R-D efficiency (or quality, PSNR) of the codec is strongly dependent on the content of the pictures. To evaluate the coding features, we have performed a series of tests. In each test, the settings that are unique to that test and differ from the defaults are clearly specified. For brevity, we demonstrate only the two extreme cases (i.e. the tests with the best and the worst results) of each series.
1) Intra Prediction Modes: To evaluate the efficiency of the intra prediction modes adopted in H.264, we selected 10 frames of each sequence, 10 frames apart, coded them in intra mode and averaged the results. Each test was done in three scenarios: first, all intra prediction modes were allowed; second, they were limited to the 16x16 intra modes only; and finally the encoder was forced to select the 4x4 DC mode only. In the latter scenario, we amended the bitstream structure so that no header is sent for intra prediction mode addressing, since there is only one mode.

Fig. 10. R-D curves (PSNR Y in dB versus bit rate in kb/s) of the selected tests in the default modes, for the Akio, News, Foreman and Mobile sequences.

Fig. 11. R-D curves for all-intra pictures at 3 Hz with all intra modes, only the 4x4 DC mode, and only the 16x16 modes. Top: Foreman, Bottom: News.

Fig. 11 shows the results for the Foreman and News sequences, where it is clear that the advanced intra prediction modes of H.264 bring a significant improvement in intra coding.

2) CABAC: The next test evaluates the improvement due to the entropy coding algorithm. In Fig. 12 the R-D (quality) curves of the Foreman and Akio tests are compared in the CABAC and CAVLC modes. In these tests, the CABAC entropy coding mode outperformed the CAVLC method by up to 8%. Obviously, the achieved saving also depends on the test sequence and the bit rate.

Fig. 12. R-D curves in the CABAC and CAVLC modes. Top: Foreman, Bottom: Akio.
3) Deblocking Filter: As mentioned earlier, the use of the deblocking filter in the encoder loop improves the R-D efficiency as well as the subjective quality. Fig. 13 shows how applying the deblocking filter has improved the R-D efficiency for the Foreman test, especially at lower bit rates. In the Mobile test, due to the complex texture and high bit rate, the in-loop filter has not improved the efficiency. Fig. 14 demonstrates the subjective quality of a Foreman picture with the filter enabled and disabled. It is clear that the deblocking filter has subjectively improved the picture quality by smoothing the blocking artifacts.

Fig. 13. Foreman and Mobile tests with the deblocking filter enabled (E) and disabled (D).

Fig. 14. Foreman, QP 40, frame 1, filter disabled (left) and enabled (right).

4) Inter Prediction Modes: We have compared the performance of the inter prediction modes under three scenarios: (i) all inter modes are allowed (the normal H.264 case), (ii) only the 16x16 to 8x8 modes are allowed, and (iii) only the 16x16 inter mode is allowed. For the second and third scenarios, we amended the bitstream semantics so that no sub-mb-mode is sent, and for the 16x16 scenario there is no mb-mode either (since there is only one mode). Fig. 15 shows the results for Akio and Mobile. It can be seen that the use of all the H.264 inter prediction modes significantly improves the coding efficiency; in fact this feature of H.264 is one of the key reasons for its success. Note that at lower bit rates the improvement brought by the sub-MB modes is smaller than at higher rates. The reason is that when the bit budget is very limited, the R-D optimization process mostly selects 16x16 modes rather than the smaller partition modes. Fig. 16 shows the overlay of the MB partition sizes for high and low bit rate coding of the second frame of the Foreman test sequence.

Fig. 15. Top: Akio test, Bottom: Mobile test. (i) All inter prediction modes allowed, (ii) only 16x16 to 8x8 modes allowed and (iii) only the 16x16 mode allowed.

Fig. 16. MB inter partitioning modes, Foreman frame 2, at high bit rate (QP 20, left) and low bit rate (QP 40, right).

5) Multiple Reference Prediction: Fig. 17 shows the R-D (quality) curves of the Foreman and News tests when the number of reference frames for inter prediction is varied from 1 to 9. We have removed some middle values to make the graphs more readable. From the figure it can be observed that having more references for prediction improves the coding efficiency. However, when the movement between frames is not significant (as in the News test), there is not much to gain from multiple references, since there is little difference between the reference pictures. Furthermore, note that when the encoder searches for the best match in more references, the encoding delay increases. For example, in the 9-reference-frame scenario the ME process is 9 times slower than in the 1-reference-frame scenario, and hence the overall coding delay grows significantly.

Fig. 17. R-D curves for different numbers of prediction reference frames (1, 3, 5 and 9). Top: Foreman, Bottom: News.

B. Evaluation of Error Resilience Features

To be able to simulate the impact of channel errors, we have modified the JVT test model decoder software to cope with errors. To introduce channel noise into a bitstream, a discrete two-state Gilbert-Elliott model is used [37]. Based on the channel bit error rate and the mean burst length, the model randomly alters the polarity of the bits (0 or 1). During the decoding procedure, when the first error in a slice is detected, the decoder skips the bits up to the next slice and marks the un-decoded MBs as corrupted.
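A minimal sketch of such a two-state burst-error channel is given below. The mapping from bit error rate and mean burst length to the state transition probabilities, and the choice of a 0.5 error probability in the bad state, are one common parameterization that we assume for illustration; they are not taken from [37].

```python
import random

class GilbertElliott:
    """Toy two-state (good/bad) burst error channel. The good state is error free;
    in the bad state each bit is flipped with probability err_bad. Transition
    probabilities are derived from the target bit error rate and mean burst length
    (an assumed parameterization for this sketch)."""
    def __init__(self, ber, mean_burst_len, err_bad=0.5, seed=0):
        self.err_bad = err_bad
        self.p_bg = 1.0 / mean_burst_len          # probability of leaving the bad state
        frac_bad = ber / err_bad                  # steady-state fraction of time in bad state
        self.p_gb = self.p_bg * frac_bad / (1.0 - frac_bad)
        self.bad = False
        self.rng = random.Random(seed)

    def corrupt(self, bits):
        out = []
        for b in bits:
            self.bad = self.rng.random() < (1 - self.p_bg if self.bad else self.p_gb)
            flip = self.bad and self.rng.random() < self.err_bad
            out.append(b ^ 1 if flip else b)
        return out

channel = GilbertElliott(ber=1e-3, mean_burst_len=8)
received = channel.corrupt([0] * 100000)
print(sum(received), "bit errors in 100000 bits")   # roughly 100 on average
```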
When a picture is completely decoded, the corrupted MBs are concealed using their correctly received neighboring MBs. In the concealment method [38], the MVs are recovered using a boundary matching technique. Finally, the quality of the decoded and concealed pictures is assessed by their PSNR. To make the statistics more reliable, every simulation is run 30 times and the resulting distortions are averaged.
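For reference, the luma PSNR reported in the following figures is computed from the mean squared error between the original and the decoded (and concealed) frame, as in this short sketch:

```python
import math
import numpy as np

def psnr_luma(original, decoded, peak=255.0):
    """PSNR in dB between two 8-bit luma frames: 10 * log10(peak^2 / MSE)."""
    orig = np.asarray(original, dtype=np.float64)
    dec = np.asarray(decoded, dtype=np.float64)
    mse = np.mean((orig - dec) ** 2)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

# per-frame PSNR values are then averaged over the frames and over the 30 runs
```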
1) Data Partitioning: Fig. 18 shows the average luma PSNR of the coded Foreman bitstream at different bit error rates. The bit rates of the bitstreams were set to 100 kb/s using our Lagrangian rate controller [28]. From the figure it can be observed that enabling DP, which is simple and has negligible overhead (and hence causes only a small quality degradation), significantly improves the resilience of the codec to channel errors.

Fig. 18. PSNR vs. bit error rate with DP enabled and disabled. Foreman, 100 kb/s.

2) Slice Structure: Fig. 19 shows the quality of the decoded erroneous bitstream of the same test sequence when the slice structure is enabled. Owing to the insertion of more resynchronization headers (i.e. more slices per frame), a higher overhead is incurred and hence the video quality in the error-free situation is degraded. However, at higher bit error rates the sliced bitstreams have significantly better quality. It should be noted that beyond a certain number of slices (15 in this test), no further improvement is achieved.

Fig. 19. PSNR vs. bit error rate with DP enabled, without slices and with 9, 15 and 20 slices per frame. Foreman, 100 kb/s.

3) Unequal Error Protection: Fig. 20 shows the quality of the received video when the data partition A (DPA) part of the data is error protected and assumed to be error free. This is possible by applying advanced channel coding techniques [39]. In this particular test, the DPA accounts for 40% of the bit rate, and the applied channel coding technique introduces a 25% overhead. Hence, to make the comparison fair, we have set the bit rate of the non-protected bitstream 10% higher than that of the protected one. From the figure it is clear that by applying error protection to the DPA, the average output quality at high error rates is dramatically improved.

Fig. 20. PSNR vs. bit error rate, DP enabled, 9 slices per frame, with and without DPA error protection. Foreman, 100 and 110 kb/s respectively.

VI. CONCLUSION

The H.264/AVC video coding standard achieves a significant improvement over its predecessors. As well as new features that improve the compression efficiency, such as advanced inter and intra prediction, H.264 supports a number of error resilience techniques that enable the codec to cope with different channel conditions. These characteristics make AVC an ideal codec for applications with very limited channel capacity and extremely error-prone channels, such as mobile systems and video telephony. Due to its high compression efficiency, the codec can also be used for coding high quality video at lower rates. Therefore, this standard will be a serious contender for a variety of next generation multimedia applications. For instance, the DVD Forum Steering Committee has recently selected H.264 decoding as a mandatory capability for players of its upcoming new HD-DVD format. This is the first selection of the new standard by a major industry consortium for a consumer end-user product technology with clear potential for extremely widespread use.

ACKNOWLEDGMENT

This work is supported by the Engineering and Physical Sciences Research Council (EPSRC) of the UK.
REFERENCES

[1] Mohammed Ghanbari, Standard Codecs: Image Compression to Advanced Video Coding. London, UK: IEE, 2003, ch. 1.
[2] "Video Codec for Audio Visual Services at px64 kbits/s," ITU-T Recommendation H.261, 1990.
[3] "Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to About 1.5 Mbits/s," ISO/IEC 11172-2: Video (MPEG-1), November 1991.
[4] "Generic Coding of Moving Pictures and Associated Audio Information," ISO/IEC 13818-2: Video (MPEG-2), May 1996.
[5] "Video Coding for Low Bit Rate Communication, Version 1," ITU-T Recommendation H.263, 1995.
[6] "Coding of Audio-Visual Objects, Part 2: Visual," ISO/IEC 14496-2 (MPEG-4 visual version 1), 1999.
[7] "Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC)," Joint Video Team of ISO/IEC and ITU-T, March 2003.
[8] Thomas Wiegand, Gary J. Sullivan, Gisle Bjontegaard and Ajay Luthra, "Overview of the H.264/AVC video coding standard," IEEE Trans. Circuits Syst. Video Technol., vol. 13, July 2003, pp. 560-576.
[9] Thomas Wiegand, Heiko Schwarz, Anthony Joch, Faouzi Kossentini and Gary J. Sullivan, "Rate-constrained coder control and comparison of video coding standards," IEEE Trans. Circuits Syst. Video Technol., vol. 13, July 2003, pp. 688-703.
[10] Daniel Alfonso, Daniele Bagni, Danilo Pau and Antonio Chimienti, "A performance analysis of H.264 video coding standard," in Proc. 23rd Picture Coding Symposium, Saint-Malo, France, April 2003, pp. 23-28.
[11] Anthony Joch, Faouzi Kossentini and Panos Nasiopoulos, "A performance analysis of the ITU-T draft H.26L video coding standard," in Proc. 12th Int. Packet Video Workshop, Pittsburgh, PA, April 2002.
[12] Mohammed Ghanbari, Standard Codecs: Image Compression to Advanced Video Coding. London, UK: IEE, 2003, ch. 9.
[13] Henrique S. Malvar, Antti Hallapuro, Marta Karczewicz and Louis Kerofsky, "Low-complexity transform and quantization in H.264/AVC," IEEE Trans. Circuits Syst. Video Technol., vol. 13, July 2003, pp. 598-603.
[14] Ville Lappalainen, Antti Hallapuro and Timo D. Hamalainen, "Complexity of optimized H.26L video decoder implementation," IEEE Trans. Circuits Syst. Video Technol., vol. 13, July 2003, pp. 717-725.
[15] Michael Horowitz, Anthony Joch, Faouzi Kossentini and Antti Hallapuro, "H.264/AVC baseline profile decoder complexity analysis," IEEE Trans. Circuits Syst. Video Technol., vol. 13, July 2003, pp. 704-716.
[16] Mohammed Ghanbari, Standard Codecs: Image Compression to Advanced Video Coding. London, UK: IEE, 2003, ch. 3.
[17] Iain E. G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next Generation Multimedia. Chichester, UK: John Wiley & Sons Ltd, 2003, ch. 6.
[18] Mathias Wien, "Variable block-size transforms for H.264/AVC," IEEE Trans. Circuits Syst. Video Technol., vol. 13, July 2003, pp. 604-613.
[19] Detlev Marpe, Heiko Schwarz and Thomas Wiegand, "Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard," IEEE Trans. Circuits Syst. Video Technol., vol. 13, July 2003, pp. 620-636.
[20] Peter List, Anthony Joch, Jani Lainema, Gisle Bjontegaard and Marta Karczewicz, "Adaptive deblocking filter," IEEE Trans. Circuits Syst. Video Technol., vol. 13, July 2003, pp. 614-619.
[21] Thomas Wedi and Georg Musmann, "Motion- and aliasing-compensated prediction for hybrid video coding," IEEE Trans. Circuits Syst. Video Technol., vol. 13, July 2003, pp. 577-586.
[22] Markus Flierl and Bernd Girod, "Generalized B pictures and the draft H.264/AVC video-compression standard," IEEE Trans. Circuits Syst. Video Technol., vol. 13, July 2003, pp. 587-597.
[23] Stephan Wenger, "H.264/AVC over IP," IEEE Trans. Circuits Syst. Video Technol., vol. 13, July 2003, pp. 645-656.
[24] Thomas Stockhammer, Miska M. Hannuksela and Thomas Wiegand, "H.264/AVC in wireless environments," IEEE Trans. Circuits Syst. Video Technol., vol. 13, July 2003, pp. 657-673.
[25] Bernd Girod and Niko Farber, "Feedback-based error control for mobile video transmission," Proceedings of the IEEE, vol. 87, no. 10, October 1999.
[26] Antonio Ortega and Kannan Ramchandran, "Rate-distortion methods for image and video compression," IEEE Signal Processing Magazine, vol. 15, Nov. 1998, pp. 23-50.
[27] Gary J. Sullivan and Thomas Wiegand, "Rate-distortion optimization for video compression," IEEE Signal Processing Magazine, vol. 15, Nov. 1998, pp. 74-90.
[28] M. Mahdi Ghandi and Mohammed Ghanbari, "A Lagrangian optimized rate control algorithm for the H.264/AVC encoder," in Proc. IEEE Int. Conf. Image Processing (ICIP), October 2004.
[29] Siwei Ma, Wen Gao, Feng Wu and Yan Lu, "Rate control for JVT video coding scheme with HRD consideration," in Proc. Int. Conf. Image Processing, Spain, Sep. 2003, pp. 793-796.
[30] Simone Milani, Luca Celetto and Gian Antonio Mian, "A rate control algorithm for the H.264 encoder," in Proc. 6th Baiona Workshop on Signal Processing and Communications, Spain, Sep. 2003.
[31] Zhengguo Li, Feng Pan and Keng Pang, "Adaptive basic unit layer rate control for JVT," JVT of ISO/IEC MPEG & ITU-T VCEG, Pattaya, Thailand, 7-14 March 2003.
[32] Chuan-Yu Cho, Shiang-Yang Huang and Jia-Shung Wang, "An embedded merging scheme for H.264/AVC motion estimation," in Proc. Int. Conf. Image Processing, Spain, Sep. 2003, pp. 909-912.
[33] Peng Yin, Hye-Yeon Cheong Tourapis, Alexis Michael Tourapis and Jill Boyce, "Fast mode decision and motion estimation for JVT/H.264," in Proc. Int. Conf. Image Processing, Spain, Sep. 2003, pp. 853-856.
[34] Feng Pan, Xiao Lin and S. Rahardja, "Fast mode decision for intra prediction," JVT of ISO/IEC MPEG & ITU-T VCEG, Pattaya, Thailand, 7-14 March 2003.
[35] K. P. Lim, S. Wu, D. J. Wu, S. Rahardja, X. Lin, F. Pan and Z. G. Li, "Fast inter mode selection," JVT of ISO/IEC MPEG & ITU-T VCEG, San Diego, USA, 2-5 Sep. 2003.
[36] JVT, "Test model software," http://bs.hhi.de/~suehring/tml/download
[37] Mohammed Ghanbari, Standard Codecs: Image Compression to Advanced Video Coding. London, UK: IEE, 2003, Appendix E.
[38] Ye-Kui Wang, M. M. Hannuksela, V. Varsa, A. Hourunranta and M. Gabbouj, "The error concealment feature in the H.26L test model," in Proc. Int. Conf. Image Processing, Sep. 2002, pp. 729-732.
[39] Peter Sweeney, Error Control Coding. Chichester, UK: John Wiley & Sons Ltd, 2002.