
COMMUNICATIONS THEORIES & SYSTEMS

Robust Image Watermarking Based on Generative Adversarial Network
Kangli Hao, Guorui Feng*, Xinpeng Zhang
School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China
* The corresponding author, email: [email protected]

Abstract: A digital watermark embeds information bits into a digital cover such as an image or video to prove the creator's ownership of his work. In this paper, we propose a robust image watermark algorithm based on a generative adversarial network. This model includes two modules, generator and adversary. The generator is mainly used to generate images embedded with the watermark, and to decode the image damaged by noise to obtain the watermark. The adversary is used to discriminate whether the image is embedded with the watermark and to damage the image by noise. Based on the model Hidden (hiding data with deep networks), we add a high-pass filter in front of the discriminator, making the watermark tend to be embedded in the mid-frequency region of the image. Since the human visual system pays more attention to the central area of the image, we give a higher weight to the image center region and a lower weight to the edge region when calculating the loss between the cover and the embedded image. The watermarked image obtained by this scheme has better visual performance. Experimental results show that the proposed architecture is more robust against noise interference compared with state-of-the-art schemes.

Keywords: robust image watermark; deep learning; generative adversarial network; convolutional neural network

Received: Oct. 31, 2019
Revised: Mar. 9, 2020
Editor: Jianjun Lei

I. INTRODUCTION

With the rapid development of multimedia information technology, various electronic products such as smart phones and notebooks have entered every field of life. A large number of smart products connect the world, and everyone can post their work on the web and download other people's work, leading to a large number of images and videos. How to protect the copyright of these images and videos has become an important issue at the moment, which has inspired scholars' research interest in digital watermarking [1][2]. A digital watermark embeds a string of information bits into a cover which needs to be protected. The information can be a piece of text or a sequence of serial numbers. The embedded watermark should be robust enough to prevent tampering or destruction by a third party. The embedded image should be visually similar to the original cover to maintain its primary value.

In 2014, a deep learning model called generative adversarial networks (GAN) quickly swept the entire AI circle [3]. In the process of GAN training, the generator and the discriminator play against each other, and finally the generator can generate images that look real, which makes the discriminator unable to distinguish whether an image is machine-generated.

Digital watermarking and noise destruction are also a pair of competitors. Based on the idea of GAN, we propose a digital watermark method and compare the performance of our model with the previous method Hidden [4] (hiding data with deep networks) in terms of the invisibility and robustness of the watermark. The rest of the paper is organized as follows. Section II introduces related work in the field of digital watermark research. Section III explains the architecture of our model. The experimental results and analysis are shown in Section IV. Section V summarizes the proposed model.

II. RELATED WORK

The indicators for measuring a digital watermark method mainly include the robustness of the secret information after being damaged and the invisibility of the embedded watermark in the image. Watermarking methods can be divided into traditional methods and neural-networks-based methods. Traditional methods can be classified according to the embedding domain and the approach [5]. Neural-networks-based methods can be divided into early methods and deep-learning-based methods. In general, the traditional methods and early neural-networks-based methods required a large amount of prior knowledge and a series of complex operations, such as preprocessing and postprocessing. Deep learning provides an end-to-end approach, which greatly simplifies the complexity of model design. Numerous experimental results show that deep-learning-based methods perform better than traditional and early neural-networks-based methods. To further improve the performance of deep-learning-based methods and to consider the characteristics of the human visual system, the proposed method pays more attention to the central area of the image and weakens the modification of the central area.

According to whether the original cover needs to be accessed when decoding the watermark, digital watermark algorithms are divided into non-blind and blind watermarking [6]. The former requires the participation of the original cover, and the latter does not. Depending on the embedding domain, they can be divided into spatial and transform domain algorithms. Embedding a watermark in the spatial domain is a simple way, such as embedding the watermark into the least significant bit of an image [7], but such methods are not robust enough. Embedding watermarks in the transform domain has been more popular in recent years. These techniques generally modify the transform domain coefficients of the image and then evenly distribute the watermark to various locations of the image. This type of method can achieve higher robustness and invisibility. The discrete cosine transform (DCT) and the discrete wavelet transform (DWT) are currently the most widely used [8][9]. Later, Kumsawat et al. [10] propose a semi-blind watermark algorithm based on the two-band multi-wavelet transform, which is robust to attacks such as compression. Bao et al. [11] propose an adaptive embedded watermark scheme for images. They apply quantization index modulation (QIM) to perform singular value decomposition (SVD) on the image wavelet domain. This scheme is robust to JPEG compression, but it is difficult to resist attacks such as filtering and random noise. Different from the traditional watermark algorithms based on the two-band wavelet transform, which usually embed watermark bits directly into wavelet coefficients, Bi et al. [12] embed the watermark bits in the mean trend of some middle-frequency sub-images in the wavelet domain. This algorithm is based on the multiband wavelet transformation and the empirical mode decomposition. Liu et al. [13] propose a quantization-based image watermarking method, which takes advantage of the perfect reconstruction, approximate shift invariance and directional selectivity of the dual-tree complex wavelets. Experimental results show that the proposed method is robust to JPEG compression, additive white Gaussian noise (AWGN) and some geometric attacks. Watermarks can be embedded not only in images, but also in media such as audio or video to protect their copyright. Lei et al. [14] combine DWT, DCT and SVD to propose a novel audio watermark algorithm.

The simulation results show that the proposed algorithm is robust to conventional speech signal processing methods, such as quantization, MP3 compression, and white Gaussian noise addition.

A number of methods have been proposed for embedding and extracting a watermark in a digital medium using neural networks. In [15], the binary watermark image is embedded into the mid-frequency coefficients of the DCT transform of the cover image. The maximum embedding payload is obtained by an RBF neural network. Mei et al. [16] use an artificial neural network (ANN) to model the human visual system (HVS) to determine the degree of modification of the DCT coefficients, and then adaptively embed the watermark bits into the DCT coefficients. The results show that an ANN can simulate the HVS well, and the calculated watermark strength is much larger than that of the traditional method. Kang et al. [8] use neural networks as a part of the encoder in the model. In recent years, deep learning has developed rapidly and demonstrated its powerful performance in several directions. In the field of information hiding, deep learning was initially applied to steganalysis and steganography. Qian et al. [17] propose a customized convolutional neural network (CNN) for steganalysis. The proposed model can capture complex dependencies. Feature extraction and classification steps are unified in a single framework, which means that classification information can be used in the feature extraction steps. Volkhonskiy et al. [18] first propose a new model for generating image-like containers based on deep convolutional generative adversarial networks (DCGAN). The image generated by this method is more suitable as a steganographic cover. To our knowledge, the first formal watermarking method based on deep learning was proposed in 2017. Mun et al. [19] use a CNN as part of the decoder. In [4], a digital watermark algorithm based on adversarial samples is proposed. Both encoder and decoder are based on CNNs, which further improves the robustness of the watermark algorithm. The algorithm is robust to attacks such as cropping and Gaussian blur, but its performance can still be further improved, and its ability to treat other attacks remains to be explored. In this paper, we propose a robust image watermark method based on GAN. Experimental results show that its ability to resist several attacks exceeds the algorithm from [4]. The innovations of this article mainly include the following two points:

1) We add a high-pass filter before the discriminator to improve the sensitivity of the discriminator to the high-frequency components of the image. This makes the model more inclined to embed the watermark into the mid-frequency region of the image, which can balance the robustness and invisibility of the watermark information.

2) Based on the fact that the visual system pays more attention to the central region of the image, we strengthen the penalty on pixel modification in the central region. This ensures the visual similarity of the image before and after embedding the watermark, and improves the invisibility of the watermark in the image.

III. DIGITAL WATERMARK BASED ON GAN

3.1 Overall architecture of our framework

The proposed model includes two modules, generator and adversary. The overall framework of our model is shown in Figure 1. The generator includes the encoder and the decoder. The input of the encoder is the cover image X and the watermark M_in to be embedded, and the output is the image Y embedded with the watermark. The input of the decoder is the image N damaged by the noise layer. The size (H × W) of the input image X and the size (H' × W') of the noise image N are not necessarily equal. The output is the decoded watermark M_out. The adversary includes the noise layer and the discriminator. The input of the noise layer is the original cover X and the image Y embedded with the watermark. The output is the image damaged by the noise layer.

The noise layer we design includes crop, cropout, Gaussian blur, flip (horizontal and vertical), and JPEG compression. The input of the discriminator is the original cover X or the image Y with the watermark. The output is the probability D(x) that the image is a watermarked image. It is used to control the similarity of the cover image X and the encoded image Y, and it also affects the intensity of the watermark embedded at each position in the image. The generator and the adversary play against each other during the training process. The watermarked image generated by our model can retain most of the information bits after some noise attacks.

We call Conv-BN-ReLU a group. Conv stands for a convolutional layer, BN stands for a batch normalization layer, and ReLU stands for a rectified linear activation layer. The size of the convolution kernel is 3 × 3, the stride is 1, and the padding is 1.

Fig. 1. Overall architecture of our framework. The model includes two modules, generator and adversary. The generator includes the encoder and the decoder. The adversary includes the noise layer and the discriminator. The discriminator is used to determine whether the input image is a watermarked image. During the training process, the generator plays against it with the goal of reducing the loss function L_g; this makes the generated encoded image closer to the cover.

3.2 The structure of generator

In the proposed model, the generator module includes the encoder and the decoder. The encoder adopts the model of [4] and its structure is shown in Figure 2. The input is the original cover X with the shape C × H × W and the watermark bits M_in with length L.

1) The input is the cover X. Through group1-group4, each convolution kernel has a depth of 64. The output is 64 feature maps of size H × W.

2) The input is the information bits M_in with length L. M_in is replicated H × W times. The L × H × W information bits, the 64 feature maps generated in the previous step, and the original cover X are concatenated. This ensures that the watermark information is distributed over the entire image and will not be removed by noise such as cropping. The output is a middle representation with the shape (L + C + 64) × H × W.

3) Through group5, the depth of the convolution kernel is 64. The output is a middle representation with the shape 64 × H × W.

4) Through the final convolutional layer, whose kernel has a size of 1 × 1, a stride of 1, no padding, and a depth of C, the output is the image Y.

Fig. 2. The structure of the encoder. The input is the original cover X and the watermark bits M_in. The output is a watermarked image Y embedded with the watermark, with the shape C × H × W. ⊕ means concatenating all input feature layers together.
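To make the pipeline concrete, here is a minimal PyTorch sketch of the encoder as described above. It is a reconstruction from this section, not the authors' released code; the helper and class names (conv_bn_relu, Encoder) are our own.

```python
import torch
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch):
    # One "group": 3 x 3 convolution (stride 1, padding 1), batch norm, ReLU.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class Encoder(nn.Module):
    def __init__(self, C=3, L=30):
        super().__init__()
        # group1-group4: each convolution kernel has a depth of 64.
        self.groups = nn.Sequential(
            conv_bn_relu(C, 64), conv_bn_relu(64, 64),
            conv_bn_relu(64, 64), conv_bn_relu(64, 64),
        )
        # group5 operates on the concatenated (L + C + 64)-channel tensor.
        self.group5 = conv_bn_relu(L + C + 64, 64)
        # Final 1 x 1 convolution with stride 1, no padding and depth C.
        self.final = nn.Conv2d(64, C, kernel_size=1, stride=1, padding=0)

    def forward(self, X, M_in):
        # X: cover, shape (B, C, H, W); M_in: watermark bits, shape (B, L).
        B, _, H, W = X.shape
        feats = self.groups(X)                         # (B, 64, H, W)
        # Replicate the L bits H x W times: (B, L) -> (B, L, H, W).
        M = M_in.view(B, -1, 1, 1).expand(-1, -1, H, W)
        cat = torch.cat([M, feats, X], dim=1)          # (B, L+64+C, H, W)
        return self.final(self.group5(cat))            # watermarked image Y
```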

The decoder follows the model of [4] and its structure is shown in Figure 3. The input is a noise image N with the shape C × H' × W', which carries the watermark information. H' and W' are the height and width of the noise image, respectively. Sometimes it is not the same size as the cover because it may face a cropping attack.

1) The input is the noise image N. Through group1-group7, each of the kernels has a depth of 64. The output is 64 feature maps of size H' × W'.

2) Through group8, the depth of the kernel is L. The output is a middle representation with the shape L × H' × W'.

3) The middle representation with the shape L × H' × W' is globally average pooled and then passed through the final linear layer with the shape L × L. The output is the decoded information bits M_out with length L.

Fig. 3. The structure of the decoder. The input is a noise image N. The output is the decoded information bits M_out. Global spatial average pooling refers to summing all the points of each feature map and then averaging to obtain a single value. The linear layer is a fully connected layer.
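A matching sketch of the decoder, under the same assumptions (it reuses the conv_bn_relu helper defined with the encoder; names are again our own):

```python
class Decoder(nn.Module):
    def __init__(self, C=3, L=30):
        super().__init__()
        # group1-group7: kernels of depth 64; group8: kernel of depth L.
        layers = [conv_bn_relu(C, 64)]
        layers += [conv_bn_relu(64, 64) for _ in range(6)]
        layers.append(conv_bn_relu(64, L))
        self.groups = nn.Sequential(*layers)
        # Final L x L fully connected layer.
        self.linear = nn.Linear(L, L)

    def forward(self, N):
        # N: noised image, shape (B, C, H', W'); H', W' may differ from H, W.
        x = self.groups(N)          # (B, L, H', W')
        x = x.mean(dim=(2, 3))      # global spatial average pooling -> (B, L)
        return self.linear(x)       # decoded watermark M_out
```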

3.3 The structure of discriminator

In the proposed model, the adversary module includes the discriminator and the noise layer. The input of the discriminator is the original cover X or the watermarked image Y. Both shapes are C × H × W. The structure of the discriminator is shown in Figure 4.

1) The input is the image x. The input image passes through the high-pass filter F, and then we normalize the filtered data. The output is the high-frequency component of the image x, such as the edge information of objects. The parameters of the filter are referenced from [17] and defined as follows:

\[
F = \frac{1}{12}
\begin{bmatrix}
-1 & 2 & -2 & 2 & -1 \\
2 & -6 & 8 & -6 & 2 \\
-2 & 8 & -12 & 8 & -2 \\
2 & -6 & 8 & -6 & 2 \\
-1 & 2 & -2 & 2 & -1
\end{bmatrix}. \tag{1}
\]

2) Through group1-group3, each of the kernels has a depth of 64. The output is 64 feature maps of size H × W.

3) Global average pooling is then performed on the obtained feature maps. The output is 64 pooled values.

4) The 64 values pass through the final fully connected layer with shape 64 × 1. The output is the probability D(x) that the input x is a watermarked image.

Fig. 4. The structure of the discriminator. The input is the original cover X or the watermarked image Y. The output is the probability D(x) that the image x carries watermark bits.
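The high-pass pre-filter of Eq. (1) can be implemented as a convolution with a fixed, non-trainable kernel. A minimal sketch, assuming the kernel is applied depth-wise to each channel and that the normalization in step 1) is a simple standardization (the paper does not spell out its exact form):

```python
import torch
import torch.nn.functional as nnf

# The fixed high-pass kernel F of Eq. (1), taken from [17].
KV = torch.tensor([[-1.,  2.,  -2.,  2., -1.],
                   [ 2., -6.,   8., -6.,  2.],
                   [-2.,  8., -12.,  8., -2.],
                   [ 2., -6.,   8., -6.,  2.],
                   [-1.,  2.,  -2.,  2., -1.]]) / 12.0

def high_pass(x):
    # x: (B, C, H, W). groups=C applies the same 5 x 5 kernel to each
    # channel independently, keeping the output shape at (B, C, H, W).
    C = x.shape[1]
    k = KV.view(1, 1, 5, 5).repeat(C, 1, 1, 1).to(x.device)
    r = nnf.conv2d(x, k, padding=2, groups=C)
    # Assumed normalization of the filtered data (zero mean, unit variance).
    return (r - r.mean()) / (r.std() + 1e-8)
```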

After the convolution with the high-pass filter, the high-frequency components of both the cover and the watermarked image are fed to the discriminator, which makes it convenient for the discriminator to distinguish the high-frequency components of the two types of images. Moreover, in the training process, the generator and the discriminator are in a competing relationship. In order to escape detection by the discriminator, the model is more willing to embed the watermark information into the middle frequency region of the image. The structure of the decoder in the model is similar to the discriminator. In general, the discriminator is more likely to find disturbances in the low frequency part of the image. Therefore, it is easy for the decoder to extract the watermark information embedded in the image middle frequency, thereby improving the accuracy of the decoded information M_out.

The input of the noise layer is the original cover X and the watermarked image Y. Both shapes are C × H × W. A series of perturbations are made to the watermarked image Y. The noise types we have designed are as follows, with a sketch after the list:
1) Crop: The image is randomly cropped to the size H' × W'.

2) Cropout: A region of size H' × W' is randomly selected from the encoded image Y. Except for this region, the other pixel values in Y are replaced with the values in the original cover. The image size does not change after the noise layer.

3) Gaussian blur: Blur the image Y with a Gaussian kernel.

4) Flip: Horizontal flip or vertical flip.

5) JPEG compression: The quality factor is Q.
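As a sketch of this noise layer, the first four perturbations can be written as plain tensor operations; the function names and the blur kernel size are our own choices, and JPEG compression is not differentiable, so a training pipeline would need a codec or a differentiable approximation:

```python
import random
import torch
import torchvision.transforms.functional as TF

def crop_noise(Y, h, w):
    # 1) Crop: randomly crop the watermarked image Y to size h x w.
    _, _, H, W = Y.shape
    i, j = random.randint(0, H - h), random.randint(0, W - w)
    return Y[:, :, i:i + h, j:j + w]

def cropout_noise(Y, X, h, w):
    # 2) Cropout: keep a random h x w region of Y and replace everything
    # outside it with the original cover X; the image size is unchanged.
    _, _, H, W = Y.shape
    i, j = random.randint(0, H - h), random.randint(0, W - w)
    out = X.clone()
    out[:, :, i:i + h, j:j + w] = Y[:, :, i:i + h, j:j + w]
    return out

def blur_noise(Y, sigma):
    # 3) Gaussian blur with standard deviation sigma.
    return TF.gaussian_blur(Y, kernel_size=7, sigma=sigma)

def flip_noise(Y, horizontal=True):
    # 4) Flip: horizontal (width axis) or vertical (height axis).
    return torch.flip(Y, dims=[3] if horizontal else [2])

# 5) JPEG compression with quality factor Q is applied with an image codec
# at test time; training would use a differentiable approximation.
```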
3.4 Loss function

The purpose of our model is to generate a watermarked image which can withstand several types of noise attacks and maintain the quality of the image after the watermark bits are embedded. In order to constrain the bit-wise similarity of the original watermark M_in and the decoded watermark M_out, we set the loss function L_m based on the L2 distance:

\[ L_m(M_{in}, M_{out}) = \| M_{in} - M_{out} \|_2^2 / L. \tag{2} \]

Moreover, to constrain the similarity of each element of the cover image X and the watermarked image Y, we set the loss function L_xy based on the L2 distance. Since the human visual system first focuses on the central region of the image, when designing the loss function we give the image center region a higher cost value and the edge region a lower cost value:

\[ L_{xy}(X, Y) = \sum_{i=1}^{H} \sum_{j=1}^{W} | m_{ij} (x_{ij} - y_{ij}) |^2 / (C \times H \times W). \tag{3} \]

The mask m_ij is defined as:

\[
m_{ij} =
\begin{cases}
\varepsilon & (H - H'')/2 < i < (H + H'')/2 \ \text{and} \ (W - W'')/2 < j < (W + W'')/2 \\
1 & \text{otherwise},
\end{cases} \tag{4}
\]

where H'' × W'' is the size of the central area of interest, ε is the cost value of the central area, and the cost value of the edge area is 1.

Figure 5 compares the mean square error between the original cover and the watermarked images generated with different cost values: one with the same global cost value, i.e. ε = 1, and the other with an enhanced cost value in the central region. In our experiment, we set ε = 1.5. From the figure, we can conclude that when the cost value of the central region is increased, the probability that the watermark is embedded in the center of the image becomes lower, so the loss to the central region of the image is reduced, thereby ensuring the visual effect after embedding the watermark.

In order to ensure the visual similarity between the image X and the image Y, we make the generator and the discriminator play against each other. We set the loss function L_g based on the original GAN:

\[ L_g(X, Y) = \log(1 - D(Y)). \tag{5} \]
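Equations (3) and (4) amount to a mean squared error weighted by a center mask. A small sketch, with ε and the H'' × W'' window as hyperparameters (names are ours):

```python
import torch

def center_mask(H, W, Hc, Wc, eps=1.5):
    # Eq. (4): m_ij = eps inside the central Hc x Wc window, 1 elsewhere.
    m = torch.ones(H, W)
    m[(H - Hc) // 2:(H + Hc) // 2, (W - Wc) // 2:(W + Wc) // 2] = eps
    return m

def image_loss(X, Y, mask):
    # Eq. (3): center-weighted squared error, averaged over C * H * W
    # (and over the batch); the (H, W) mask broadcasts over channels.
    B, C, H, W = X.shape
    return ((mask * (X - Y)) ** 2).sum() / (B * C * H * W)
```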

For the training of the generator, we set different weights θ_m, θ_xy, θ_g for the three loss functions L_m, L_xy, L_g to control the trade-off between the robustness and invisibility of the watermark:

\[ L = \theta_m L_m + \theta_{xy} L_{xy} + \theta_g L_g. \tag{6} \]

For the optimization of the discriminator, we set the loss function L_d:

\[ L_d(X, Y) = \log(1 - D(X)) + \log(D(Y)). \tag{7} \]
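Putting Eqs. (2) and (5)-(7) together, the two objectives might be wired as follows. This is a sketch building on image_loss above; D is the discriminator of Figure 4, the weight defaults anticipate the experimental settings below, and the sign conventions follow the paper's equations:

```python
import torch

def generator_loss(M_in, M_out, X, Y, mask, D,
                   theta_m=1.0, theta_xy=0.7, theta_g=0.001):
    Lm = ((M_in - M_out) ** 2).mean()            # Eq. (2): ||.||_2^2 / L
    Lxy = image_loss(X, Y, mask)                 # Eq. (3)
    Lg = torch.log(1.0 - D(Y) + 1e-8).mean()     # Eq. (5)
    return theta_m * Lm + theta_xy * Lxy + theta_g * Lg   # Eq. (6)

def discriminator_loss(X, Y, D):
    # Eq. (7); the discriminator is trained adversarially on this term,
    # while the generator is trained on Lg above.
    return (torch.log(1.0 - D(X) + 1e-8).mean()
            + torch.log(D(Y) + 1e-8).mean())
```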
IV. EXPERIMENTAL RESULTS

In our experiments, we use the COCO [20] dataset for training, testing and validation. We selected 10,000 images for training, 1,000 images for testing and another 1,000 for validation. All images are cropped to the size of 64 × 64. We randomly select a string of binary bits with length L = 30 as the watermark to be embedded.

We use the Adam [21] optimization algorithm to train the model, with a learning rate of 10^-3. The weight parameters of the generator loss function are θ_m = 1, θ_xy = 0.7, θ_g = 0.001. The image center area cost value is ε = 1.5. All models are trained with a batch size of 48 for 300 epochs. For each mini-batch, we alternately update the weight parameters of the encoder, decoder, and discriminator, as in the sketch below. The experimental environment is an NVIDIA 1080Ti.

Our model is compared with Hidden (the programming environment is Python). In the process of network training, we need to spend an average of 5.34 hours, whereas Hidden takes 2 hours. We spend more time because we need to high-pass filter each image and give a higher weight to the loss of the central area. The evaluation indicators include the invisibility and robustness of the watermark.
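A compressed view of this training procedure, assuming the modules and losses sketched earlier (enc, dec, disc, noise_layer, mask) and a loader yielding (cover, bits) batches; the wiring is our reconstruction, not the authors' script:

```python
import itertools
import torch

# Adam with learning rate 1e-3, batch size 48, 300 epochs,
# alternating generator/discriminator updates per mini-batch.
opt_g = torch.optim.Adam(
    itertools.chain(enc.parameters(), dec.parameters()), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)

for epoch in range(300):
    for X, M_in in loader:
        Y = enc(X, M_in)              # embed the watermark
        N = noise_layer(Y, X)         # attack the watermarked image
        M_out = dec(N)                # decode the bits

        # Update the encoder/decoder on the combined loss of Eq. (6).
        opt_g.zero_grad()
        generator_loss(M_in, M_out, X, Y, mask, disc).backward()
        opt_g.step()

        # Then update the discriminator on Eq. (7).
        opt_d.zero_grad()
        discriminator_loss(X, enc(X, M_in).detach(), disc).backward()
        opt_d.step()
```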
We design two groups of noise attacks. One group is: crop (H' = H × 0.2, W' = W × 0.25), cropout (H' = H × 0.55, W' = W × 0.6), Gaussian blur (σ = 2.0), vertical flip, JPEG compression (Q = 70). The other group is: crop (H' = H × 0.2, W' = W × 0.15), cropout (H' = H × 0.4, W' = W × 0.4), Gaussian blur (σ = 3.0), horizontal flip, JPEG compression (Q = 50).

4.1 Invisibility

Figure 6 shows the watermarked images generated by our model for different noise attacks, and the watermarked images generated by Hidden through the corresponding training process. As can be seen from Figure 6, the watermarked image generated by our model is visually indistinguishable from the original cover.

For image quality assessment, we choose the structural similarity index (SSIM [22]). By comparing with the original cover, we obtain the average SSIM of the watermarked images generated by Hidden and the proposed method under different attacks.

Fig. 5. The mean square error (MSE) comparison between the original cover and the watermarked image generated with different cost values, over repeated experiments, for cropout (0.55, 0.6) and crop (0.2, 0.25). The red line corresponds to ε = 1.5; the blue line corresponds to ε = 1.

As can be seen from Table I, our model performs better in most cases. When against Gaussian blur (3.0), Hidden's SSIM is 0.987 and our model's is 0.990, an improvement of 0.003. Therefore, compared with the existing method, the performance of our model is better.

4.2 Robustness

We test the average bit-wise error rate of the decoded watermark bits under the two groups of attacks. Table II shows the average bit-wise error rate of Hidden and our model under different attacks. It can be seen that under the five kinds of noise attacks (crop, cropout, Gaussian blur, flipping and JPEG compression), the average bit-wise error rate of the information decoded by our model is lower than that of Hidden.

Fig. 6. A cover image and watermarked images from both Hidden and our model. From left to right: cover, crop (0.2, 0.25), cropout (0.55, 0.6), Gaussian blur (σ = 2.0), vertical flip, JPEG compression (Q = 70).

Table I. SSIM value between cover and encoded image under different attacks.

Group 1      Crop (0.2, 0.25)   Cropout (0.55, 0.6)   Vertical flip     Gaussian blur (2.0)   JPEG compression (Q=70)
Hidden       0.974              0.988                 0.986             0.988                 0.974
Our model    0.982              0.988                 0.989             0.990                 0.979

Group 2      Crop (0.2, 0.15)   Cropout (0.40, 0.40)  Horizontal flip   Gaussian blur (3.0)   JPEG compression (Q=50)
Hidden       0.976              0.986                 0.988             0.987                 0.977
Our model    0.980              0.989                 0.989             0.990                 0.982

Table II. Bit-wise error rate of different methods under different attacks.

Group 1      Crop (0.2, 0.25)   Cropout (0.55, 0.6)   Vertical flip     Gaussian blur (2.0)   JPEG compression (Q=70)
Hidden       0.026              0.029                 0.007             0.004                 0.148
Our model    0.010              0.014                 0.001             0.001                 0.086

Group 2      Crop (0.2, 0.15)   Cropout (0.40, 0.40)  Horizontal flip   Gaussian blur (3.0)   JPEG compression (Q=50)
Hidden       0.028              0.041                 0.009             0.010                 0.152
Our model    0.017              0.034                 0.001             0.005                 0.101
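The entries of Table II are average bit-wise error rates. Assuming the decoder emits real-valued scores, as in our earlier sketch, the metric is a one-liner:

```python
import torch

def bitwise_error_rate(M_in, M_out):
    # Fraction of wrongly decoded bits: threshold the decoder output
    # at 0.5 and compare with the embedded binary message M_in.
    bits = (torch.sigmoid(M_out) > 0.5).float()
    return (bits != M_in).float().mean().item()
```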

When against crop (0.2, 0.25), Hidden's error rate is 0.026 and our model's is 0.010, a drop of 0.016. When against cropout (0.4, 0.4), Hidden's error rate is 0.041 and our model's is 0.034, a drop of 0.007. Compared with Hidden, the error rate of the decoder extracting the watermark in our model is significantly reduced under different attacks. This is because we prefer to embed the watermark into the middle frequency region of the image, making it easier for the decoder to extract the watermark.

4.3 Analysis

Unlike a general classifier, the discriminator of our model adds a predefined high-pass filter in front of the convolutional layers. Therefore, after the generator competes with the discriminator, the generated watermarked image tends to be consistent with the original cover in the high frequency coefficients. Because the human eye is sensitive to noise in the low frequency part, if some low frequency coefficients in the image are modified, this area will become conspicuous within its neighborhood, which is easily detected by the discriminator. In our model, the watermark is therefore more likely to be embedded in the image's middle frequency, balancing the invisibility and robustness of the watermark. In the proposed model, the structure of the decoder is similar to the discriminator.

Similar to the HVS, general classifiers easily detect changes in the low frequency area of the image. The high-frequency part of an image is vulnerable to compression and quantization attacks. In order to let the decoder detect the watermark embedded in the image more accurately, we prefer to embed the watermark into the middle frequency region of the image. The experimental results show that embedding the watermark bits into the middle frequency region of an image is beneficial for the decoder to extract the information and reduces the average bit-wise error rate of M_out.

V. CONCLUSION

In this paper, we propose a robust image watermark algorithm based on a generative adversarial network. A convolutional neural network easily detects modifications in the low-frequency part of the image, the HVS is also sensitive to modifications in the low-frequency region, and the high-frequency part is vulnerable to attacks. Considering these characteristics, we adopt a compromise and choose to embed the watermark into the middle frequency region of the image. Because the human eye pays more attention to the central area of the image, we set different cost values for different areas of the image. The experimental results show that the proposed method has good performance in terms of the invisibility and robustness of the watermark.

ACKNOWLEDGEMENT

This work was supported by the National Natural Science Foundation of China under Grants 62072295, 61525203, U1636206, U1936214 and the Natural Science Foundation of Shanghai under Grant 19ZR1419000.

References

[1] M. Kutter, S. K. Bhattacharjee, and T. Ebrahimi, "Towards second generation watermarking schemes," in Proceedings 1999 International Conference on Image Processing (Cat. 99CH36348), vol. 1, 1999, pp. 320-323, IEEE.
[2] M. Lesk, "The good, the bad, and the ugly: What might change if we had good DRM," IEEE Security & Privacy, vol. 1, no. 3, 2003, pp. 63-66.
[3] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672-2680.
[4] J. Zhu, R. Kaplan, J. Johnson, and L. Fei-Fei, "Hidden: Hiding data with deep networks," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 657-672.
[5] H. Sadreazami and M. Amini, "A robust image watermarking scheme using local statistical distribution in the contourlet domain," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 66, no. 1, 2018, pp. 151-155.
[6] J. R. Hernandez, F. Perez-Gonzalez, J. M. Rodriguez, and G. Nieto, "Performance analysis of a 2-d-multipulse amplitude modulation scheme for data hiding and watermarking of still images," IEEE Journal on Selected Areas in Communications, vol. 16, no. 4, 1998, pp. 510-524.

[7] R. G. Van Schyndel, A. Z. Tirkel, and C. F. Osborne, "A digital watermark," in Proceedings of 1st International Conference on Image Processing, vol. 2, 1994, pp. 86-90, IEEE.
[8] X. Kang, J. Huang, Y. Q. Shi, and Y. Lin, "A DWT-DFT composite watermarking scheme robust to both affine transform and JPEG compression," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 8, 2003, pp. 776-786.
[9] S. D. Lin and C.-F. Chen, "A robust DCT-based watermarking for copyright protection," IEEE Transactions on Consumer Electronics, vol. 46, no. 3, 2000, pp. 415-421.
[10] P. Kumsawat, K. Attakitmongcol, and A. Srikaew, "A new approach for optimization in image watermarking by using genetic algorithms," IEEE Transactions on Signal Processing, vol. 53, no. 12, 2005, pp. 4707-4719.
[11] P. Bao and X. Ma, "Image adaptive watermarking using wavelet domain singular value decomposition," IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 1, 2005, pp. 96-102.
[12] N. Bi, Q. Sun, D. Huang, Z. Yang, and J. Huang, "Robust image watermarking based on multiband wavelets and empirical mode decomposition," IEEE Transactions on Image Processing, vol. 16, no. 8, 2007, pp. 1956-1966.
[13] J. Liu and K. She, "Quantization-based robust image watermarking using the dual tree complex wavelet transform," China Communications, vol. 7, no. 4, 2010, pp. 1-6.
[14] M. Lei, Y. Yang, X. Liu, M. Cheng, and R. Wang, "Audio zero-watermark scheme based on discrete cosine transform-discrete wavelet transform-singular value decomposition," China Communications, vol. 13, no. 7, 2016, pp. 117-121.
[15] Z. Zhang, R. Li, and L. Wang, "Adaptive watermark scheme with RBF neural networks," in International Conference on Neural Networks and Signal Processing, 2003, vol. 2, 2003, pp. 1517-1520, IEEE.
[16] S. Mei, R. Li, H. Dang, and Y. Wang, "Decision of image watermarking strength based on artificial neural-networks," in Proceedings of the 9th International Conference on Neural Information Processing (ICONIP'02), vol. 5, 2002, pp. 2430-2434, IEEE.
[17] Y. Qian, J. Dong, W. Wang, and T. Tan, "Deep learning for steganalysis via convolutional neural networks," in Media Watermarking, Security, and Forensics 2015, vol. 9409, p. 94090J, International Society for Optics and Photonics, 2015.
[18] D. Volkhonskiy, I. Nazarov, B. Borisenko, and E. Burnaev, "Steganographic generative adversarial networks," arXiv preprint arXiv:1703.05502, 2017.
[19] S. Mun, S. Nam, H. Jang, D. Kim, and H.-K. Lee, "A robust blind watermarking using convolutional neural network," arXiv preprint arXiv:1704.03248, 2017.
[20] T. Y. Lin, M. Maire, S. Belongie, L. Bourdev, R. Girshick, J. Hays, P. Perona, D. Ramanan, C. L. Zitnick, and P. Dollar, "Microsoft COCO: Common objects in context," 2014.
[21] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[22] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, 2004, pp. 600-612.

Biographies

Kangli Hao, received the B.S. degree in engineering from Shanghai University, China, in 2018. She is now studying for an M.S. degree at Shanghai University, China. Her current research interests include image processing, image analysis and computational intelligence.

Guorui Feng, received the B.S. and M.S. degrees in computational mathematics from Jilin University, China, in 1998 and 2001 respectively. He received the Ph.D. degree in electronic engineering from Shanghai Jiaotong University, China, in 2005. From January 2006 to December 2006, he was an assistant professor at East China Normal University, China. During 2007, he was a research fellow at Nanyang Technological University, Singapore. He is now with the School of Communication and Information Engineering, Shanghai University, China. His current research interests include image processing, image analysis and computational intelligence.

Xinpeng Zhang, received the B.S. degree in computational mathematics from Jilin University, China, in 1995, and the M.E. and Ph.D. degrees in communication and information system from Shanghai University, China, in 2001 and 2004, respectively. Since 2004, he has been with the faculty of the School of Communication and Information Engineering, Shanghai University, where he is currently a Professor. He is also with the faculty of the School of Computer Science, Fudan University. He was with The State University of New York at Binghamton as a Visiting Scholar from 2010 to 2011, and with Konstanz University as an experienced researcher sponsored by the Alexander von Humboldt Foundation from 2011 to 2012. His research interests include multimedia security, image processing, and digital forensics. He has published over 200 papers in these areas. He was an Associate Editor of the IEEE Transactions on Information Forensics and Security from 2014 to 2017.

