Fingerprinting Technique For YouTube Videos Identification in Network Traffic
ABSTRACT Recently, many video streaming services, such as YouTube, Twitch, and Facebook, have
contributed to video streaming traffic, leading to the possibility of streaming unwanted and inappropriate
content to minors or individuals at workplaces. Therefore, monitoring such content is necessary. Although
the video traffic is encrypted, several studies have proposed techniques using traffic data to decipher users’
activity on the web. Dynamic Adaptive Streaming over HTTP (DASH), the most widely adopted video streaming
technology, uses Variable Bit-Rate (VBR) encoding to ensure smooth streaming. VBR causes inconsistencies
in video identification in most prior research. This research proposes a fingerprinting method that accommodates
VBR inconsistencies. First, bytes per second (BPS) are extracted from the YouTube video stream. Bytes per
Period (BPP) are generated from the BPS, and then fingerprints are generated from these BPPs. Furthermore,
a Convolutional Neural Network (CNN) is optimized through experiments. The resulting CNN is used to
detect YouTube streams over VPN, Non-VPN, and a combination of both VPN and Non-VPN network
traffic.
INDEX TERMS Video identification, fingerprinting, deep learning, classification, variable bitrate.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 10, 2022 76731
W. Afandi et al.: Fingerprinting Technique for YouTube Videos Identification in Network Traffic
optimized model is used for video classification with different traffic combinations of Virtual
Private Network (VPN) and Non-VPN network traffic. The main contributions of this study are the
answers to the following research questions:
• Can a Sequential Convolutional Neural Network (SCNN) be used to identify the video in the
network traffic?
• How does the SDF fingerprinting technique fare against other fingerprinting methods, and which
technique is the most effective?
The rest of the paper is organized as follows. Section II presents a summary of previous works and
Section III presents the method for fingerprint creation to handle the inconsistencies in the
network traffic. Section IV presents the experimental setup and hyperparameter tuning of the CNN.
Section V presents the comparison of different fingerprinting methods and finally Section VI
concludes the paper.

II. RELATED WORK
Hypertext Transfer Protocol Secure (HTTPS) started gaining attention as Google was one of the
earliest adopters of HTTPS. In contrast to its predecessor, HTTP, it provides a secure environment
and has captured the interest of many researchers seeking vulnerabilities in the protocol. Some
research has been conducted to identify video streaming on a client computer.
Chen et al. [22] demonstrated the severity of side-channel attacks even with modern encryption
techniques. Certainly, some hardware-level attacks are possible, as shown in [23]. Several works
have attacked Skype network traffic to identify user actions [24], [25]. Furthermore, it is
possible to identify the website that is being viewed on the network [26]–[28]. Moreover, user
activities can be revealed by the network traffic [10], [29]–[31]. Private information can also be
leaked in location-based applications [32]–[35]. WiFi signals can be sniffed [29], and routers can
be hacked to sniff packets if the adversary is present inside the LAN [36]. At first, video
identification researchers leveraged QoE metrics to optimize network bandwidth sharing.
Mangla et al. [37] predict these QoE metrics of video streams by weighing packet headers in
network traffic.
In contrast, [38] uses a set of statistical features that include the quantity and size of the
packets to classify the resolution and bitrate of the video streams. Statistical features are also
used by [39] to identify the flow of video in the network. Gutterman et al. [40] predict quality
metrics for YouTube encrypted videos by exploiting chunk statistics, including chunk length and
chunk duration, as well as flow statistics such as flow duration and direction. Chunk statistics
are also leveraged by [41] to identify variable bitrate adaptation under the HTTPS and QUIC
protocols.
Ameigeiras et al. [42] described a characteristic burst feature in the YouTube network streams.
These bursts are of two types: a long burst and a short burst. At the beginning of streaming,
there is a long burst, after which the video is buffered in smaller bursts. These characteristics
are also discussed by Ravattu and Balasetty [43]. The On-Off period between each burst is
discussed by Rao et al. [44], and Liu et al. [45] leverage these On-Off periods to identify the
streaming video using traditional machine learning approaches.
BPS is an important feature that plays an essential role in video identification.
Khan et al. [21], [46] extracted the BPS of a video stream multiple times in different video
qualities and used them as a feature. This feature is used to train different machine learning
models, including Naïve Bayes, SVMs (Support Vector Machines), and CNN. However, extracting the
BPS and using it as raw data to identify the video in network traffic is not enough to deal with
the irregularities in video identification caused by the VBR.
To address the irregularities of VBR, the study most related to our work [20] discusses the method
of differential fingerprints. The authors propose an algorithm to aggregate BPS into several
periods. This approach reduces the inconsistency that occurs due to VBR. However, they only used
the feature distance measuring technique to predict the queried video. Furthermore, the
differential fingerprinting method presented by the authors ignores the condition of dividing by
zero, and the dataset is missing videos streamed over a VPN. Furthermore, the dataset consists of
only Facebook videos that stream over 180 seconds on Non-VPN traffic. Based on the limitations of
previous studies mentioned above, this paper aims to modify the algorithm proposed in [20] to
handle the cases of zero as the denominator. The dataset in this research contains both VPN and
Non-VPN streamed videos, and the video stream length is 120 seconds. Furthermore, the baseline
convolutional neural network presented in [21] is fine-tuned, and hyperparameters are changed to
improve the accuracy of the results. The accuracy of the baseline model on our dataset is 54.77%.

III. METHODOLOGY
This section illustrates the methodology used for data collection, fingerprinting methodologies,
and producing a list of predictions through various classifiers as shown in Figure 3. The
methodology is defined in steps as (a) data collection, (b) preprocessing of data, (c) bytes per
period, (d) fingerprinting, and (e) summary of neural network.

A. DATA COLLECTION
We use Wireshark to capture the network traffic and generate packet capture (PCAP) files against
each video to generate the dataset of video streams. We utilize the Chrome browser to play the
YouTube videos and Selenium for automation. We selected 43 random videos from YouTube and each
video is downloaded 55 times. A desktop client, SurfShark, is used for capturing the VPN streams.
The resulting dataset consists of 86 labels in total (43 Non-VPN titles and 43 VPN titles), and
each video stream is captured for 120 seconds.
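The path from a 120-second capture to a fingerprint can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the packet representation as `(timestamp, length)` tuples, the function names, the 6-second period, and the choice of summing BPS values within a period are all assumptions made for the example. The three fingerprint variants follow Equations (1)-(3) of Section III-D, including the replacement of zero BPP values by 1.

```python
def bytes_per_second(packets, duration=120):
    """Sum downlink packet lengths into one-second bins.

    `packets` is an iterable of (timestamp_seconds, length_bytes) tuples with
    timestamps relative to the start of the capture (assumed input format).
    """
    bps = [0] * duration
    for timestamp, length in packets:
        second = int(timestamp)
        if 0 <= second < duration:  # ignore packets outside the capture window
            bps[second] += length
    return bps


def bytes_per_period(bps, period=6):
    """Aggregate BPS into periods by summing; the period length is hypothetical."""
    return [sum(bps[i:i + period]) for i in range(0, len(bps), period)]


def fingerprint(bpp, kind="sdf"):
    """Build an SDF, ADF, or DF fingerprint from a BPP sequence."""
    # Zeros are replaced with 1 so that the DF denominator is never zero.
    a = [value if value != 0 else 1 for value in bpp]
    result = []
    for previous, current in zip(a, a[1:]):
        difference = current - previous
        if kind == "sdf":
            result.append(difference)             # Equation (1)
        elif kind == "adf":
            result.append(abs(difference))        # Equation (2)
        elif kind == "df":
            result.append(difference / previous)  # Equation (3)
        else:
            raise ValueError(f"unknown fingerprint kind: {kind}")
    return result


# Toy capture: a few packets given as (timestamp, length).
packets = [(0.1, 100), (0.9, 50), (1.5, 200), (119.9, 10), (120.0, 999)]
bps = bytes_per_second(packets)
bpp = bytes_per_period(bps, period=6)
sdf = fingerprint(bpp, "sdf")
```

A fingerprint has one element fewer than the BPP sequence it is built from, because each element compares two adjacent periods.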
TABLE 1. List of acronyms used in the paper.

B. PRE-PROCESSING
Wireshark exports the captured data in pcap file format. Each generated pcap file contains
120 seconds of the streaming video. As this file contains both uplink and downlink traffic, the
dataset is cleaned by filtering on the IP address of the server, which in this case is YouTube,
and selecting only the downlinks. This process is done for VPN and Non-VPN data. Thus, it
mitigates the streaming noise of unwanted applications. This is achieved by the integrated
Wireshark filter in the conversation section.

D. FINGERPRINTING
Fingerprinting is the process of representing a large piece of data by a short bit string that
uniquely identifies it. In other words, fingerprints are small labels for large data [47]. Due to
its effectiveness, many researchers have utilized the fingerprinting technique in different
scenarios. For instance, fingerprinting is actively used in application discrimination [48], video
identification [20], Web page recognition [49], user activity monitoring [25], and mobile
application identification [50]. In our paper, we utilize the fingerprinting technique for video
identification in the encrypted network traffic. For
this purpose, we created fingerprints of the BPPs of a video stream. The created fingerprints help
to differentiate individual video streams. After generating a BPP sequence of a stream, all the
zeros in the sequence are replaced with 1 to resolve the zero division problem encountered during
fingerprint creation. For a video consisting of n seconds, we get a sequence denoted as
$a = (a_1, a_2, \ldots, a_i, \ldots)$. For two adjacent data amounts $a_{i-1}$ and $a_i$,
fingerprints can be generated as $r = (r_1, r_2, \ldots, r_i, \ldots)$ by applying one of the
equations described below:

1) SIMPLE DIFFERENCE FINGERPRINT (SDF)
Video fingerprint $r_i$ can be calculated by subtracting the previous term $a_{i-1}$ from the ith
term of sequence a, as shown in Equation (1):

$$r_i = a_i - a_{i-1} \tag{1}$$

2) ABSOLUTE DIFFERENCE FINGERPRINT (ADF)
This is a modified form of Equation (1) proposed in [20]. To eliminate the negative values
generated by subtracting $a_{i-1}$ from $a_i$, we take the absolute value of the difference as
shown in Equation (2):

$$r_i = |a_i - a_{i-1}| \tag{2}$$

3) DIFFERENTIAL FINGERPRINT (DF)
Equation (3) is proposed by [20]. In this equation, the differential of two consecutive periods is
calculated as shown below:

$$r_i = \frac{a_i - a_{i-1}}{a_{i-1}} \tag{3}$$

E. CONVOLUTIONAL NEURAL NETWORK (CNN) MODEL
A convolutional neural network (CNN) is a variant of the traditional neural network that can learn
directly from data without manual feature extraction. A CNN generally consists of convolution
layers, pooling layers, and a fully connected layer. CNNs are mainly used in pattern recognition,
and their architecture makes them a preferred model for object detection in images, voice
detection in audio, natural language processing (hate speech detection [2]), activity recognition
(bot detection [11], human activity recognition [10], malware detection [3]), and digital signal
classification. The CNN model designed in this paper comprises four 1D convolutional layers, each
having ReLU as its activation function. Each convolutional layer employs distinct kernels (also
called filters) that independently convolve the input data and produce a feature map as the
output. The kernel size is assigned a small number relative to the input size. The smaller kernel
size helps the model learn more feature maps and improve the overall prediction accuracy. The
generated feature map is passed through an activation function (ReLU in our case) and then passed
to the pooling layers.
The max-pooling layers separate the four convolutional layers. A pooling layer summarizes the
result of the previous layer to its neighboring outputs, ultimately reducing the size of the data
without losing the key features. This reduction of data generalizes the repeating patterns while
simultaneously reducing memory requirements.
After convolutional and max-pooling layers, a dropout layer is added. The dropout layer randomly
disables some of the inputs of the previous layer to prevent the model from learning only a few
input values and to restrain overfitting. The fraction of input features to disable is set by the
probability value p of the dropout layer.
The dropout layer’s output is passed to the flatten layer, which converts the multidimensional
pooled feature map into a one-dimensional vector to make it compatible with forwarding into the
dense layer. The dense layer is a fully connected layer having all of its neurons connected with
the neurons of the previous layer. The output of the dense layer is passed to a final dense layer,
also called the output layer [51].
The fingerprint of BPP is a one-dimensional array; therefore, the input of the proposed CNN model
is a one-dimensional series of BPP with the size of 20. The first convolutional layer has a kernel
size equal to 5 with 300 filters and a stride of 1. A single neuron is connected to a cluster of
five features of the input data. The output of the layer is 300 feature maps of size 19.
Subsequently, the first convolutional layer contains 1800 trainable parameters (1500 weights and
300 bias parameters). This layer is followed by a max-pooling layer that consists of 300 feature
maps of size 50. Each feature map of this layer is connected to two feature maps of the previous
convolutional layer. The max-pooling layer has no trainable parameters.
The second convolutional layer has 512 kernels, each of size 3, forming a total number of
461,312 parameters that yield 512 feature maps of size 3. The trailing max-pooling layer after the
second convolutional layer generates 512 feature maps of size 11. The third convolutional layer
containing 524,800 trainable parameters has 512 kernels of size 1, which output 512 feature maps
of size 1. Its successive max-pooling layer generates 512 feature maps of size 1. The last
convolutional layer has 300 kernels of size 1, having 307,500 trainable parameters.
Consequently, the last max-pooling layer produces 300 feature maps of size 1. After the last
pooling layer, a dropout layer is added to disable arbitrary neurons from the previous layer with
the probability of 0.8. The dropout layer is followed by a flatten layer that converts the pooled
feature maps of the previous layer into a one-dimensional feature size of 3,300. The output layer,
the last layer in the model, contains 141,743 trainable parameters for 43 labels in the dataset.

TABLE 2. Fine tuned CNN model summary.

The activation function selected for all the convolutional layers in this model is the ReLU
function. The softmax function is assigned as the activation function for the output layer. The
ReLU function is quite simple as it outputs the input directly if it is a positive number.
However, it outputs a zero in the case of a negative number. The softmax function is a
generalization of logistic regression to handle multiple classes. For N output classes, it
normalizes an N-dimensional vector of real values to an N-dimensional probability distribution
vector whose values lie in the range [0,1]. The N-dimensional output vector is the probabilistic
score of each corresponding class.
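The two activation functions described above can be illustrated with a few lines of plain Python. This is a minimal sketch for intuition only; the function names and the example scores are made up, and subtracting the maximum score before exponentiation is a common numerical-stability step that is assumed here rather than taken from the paper.

```python
import math


def relu(x):
    """Return x for positive inputs and zero otherwise."""
    return x if x > 0 else 0.0


def softmax(scores):
    """Map N real-valued scores to N probabilities that sum to 1."""
    highest = max(scores)  # shifting by the maximum avoids overflow in exp()
    exponentials = [math.exp(score - highest) for score in scores]
    total = sum(exponentials)
    return [value / total for value in exponentials]


# Hypothetical raw output scores for three classes.
probabilities = softmax([2.0, 1.0, 0.1])
predicted_class = probabilities.index(max(probabilities))
```

The class with the largest raw score always receives the largest probability, so the predicted label is the index of the maximum entry of the softmax output.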
The cost function measures the difference between the model’s prediction and the actual output
and returns an error value. This error value helps the model determine how much more optimization
is needed. The cost function selected for the proposed model is categorical cross-entropy. The
optimization function, which updates the weights of the neurons in each layer to reduce this
error, is the Adam optimizer. The complete architecture of the CNN model is shown in Figure 4.
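As a small worked example of the cost function, categorical cross-entropy for a single sample can be computed as below. This is a sketch under stated assumptions: the label is one-hot encoded, the prediction is a softmax output, and the `eps` clamp that guards against log(0) is an implementation detail added for the example, not something stated in the paper.

```python
import math


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot label and a predicted distribution."""
    return -sum(
        target * math.log(max(predicted, eps))
        for target, predicted in zip(y_true, y_pred)
    )


# Hypothetical three-class sample whose true class is index 1.
confident = categorical_cross_entropy([0, 1, 0], [0.1, 0.8, 0.1])
uncertain = categorical_cross_entropy([0, 1, 0], [0.4, 0.2, 0.4])
```

The loss equals the negative logarithm of the probability assigned to the true class, so it shrinks toward zero as the model becomes more confident in the correct label.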
The softmax activation function is used for the dense layer
and the Adam optimizer is used for model optimization. The model is trained on three types of
datasets: SDF, ADF, and DF. The model summary is presented in Table 2.

IV. EXPERIMENTAL SETUP AND MODEL FINE TUNING
The experiments performed in this paper are heavily based on a Graphics Processing Unit (GPU).
Therefore, all the experiments are conducted on an Intel Core i7 processor @ 3.4GHz with 16GB RAM
and Nvidia GeForce GTX 1060 with 6GB GPU memory. The experiment setup includes changing various
hyperparameter values, including the number of filters, kernel sizes, pool size, adding another
layer, dropout ratio, batch size, and the number of epochs. Table 3 illustrates the summary of
each experiment.
A series of experiments are performed on each convolutional layer of the baseline model presented
in [21]. In each experiment, several hyperparameters of the respective layer are changed. In the
first experiment, we change the number
B. EXPERIMENT #2 TUNING 2nd CONVOLUTIONAL LAYER
After deducing the values of the first layer, this experiment is performed to tune the values of
the second layer. The same procedure is followed as in Experiment #1. In this experiment, we use
various filters and kernel sizes. However, in the case of filters, the settings of the baseline
model provide higher accuracy; increasing or decreasing the number of filters results in a
decrease in accuracy. On the contrary,

C. EXPERIMENT #3 TUNING 3rd CONVOLUTIONAL LAYER
In the continuation of Experiment #1 and Experiment #2, this experiment is performed to fine-tune
the third convolutional layer of the CNN. The experiment highlights that the value 512 of filters
is the most suitable for this layer. Changing this value decreases the accuracy. The size of the
kernel has a positive impact on the accuracy of the model. Reducing the kernel size to the
minimum, that is, 1, increases the
FIGURE 11. Accuracy comparison of various settings applied to third convolutional layer.
As the proposed framework works on known videos, the model must be trained on a large dataset,
limiting the scope of implementation.
Moreover, there are some shortcomings of this technique. The framework is set back by the
phenomenon of ‘concept drift’. Hence, the proposed model requires substantial computational and
storage resources at the observer’s end, thus creating a challenge for large-scale deployment.
Moreover, detection is only possible if the video is streamed from its start through the first
120 seconds. Changing the playback position within the first 120 seconds can lead to abnormal
predictions.

VII. CONCLUSION AND FUTURE WORK
Irregularities and inconsistencies due to the VBR encoding of the video make it challenging to
identify the videos in the network traffic. To address the aforementioned problem, this paper
converts BPSs into BPPs and presents a stable fingerprinting method, SDF. The SDF works on the
difference between the BPPs to identify the VBR video streamed in encrypted network traffic. The
created SDFs are used to train the CNN model. After tuning the model’s hyperparameters, the model
achieves an accuracy of 90% and 99% in predicting videos and classifying traffic, respectively.
Additionally, the effects of variable period length on the model’s prediction accuracy are yet to
be analyzed. We aim to modify the technique to cope with the concept drift problem in the future.
Observing the effect of variable period length and finding the optimal value will make this
technique more foolproof and increase the practical deployment applications.

REFERENCES
[1] U. Cisco. (2020). CISCO Annual Internet Report (2018–2023) White Paper. Accessed: Dec. 15, 2021. [Online]. Available: https://www.cisco.com/c/en/us/solutions/collateral/executive-perspectives/annual-internet-report/whitepaper-c11-741490html
[2] M. U. S. Khan, A. Abbas, A. Rehman, and R. Nawaz, “HateClassify: A service framework for hate speech identification on social media,” IEEE Internet Comput., vol. 25, no. 1, pp. 40–49, Jan. 2020.
[3] M. Khan, D. Baig, U. S. Khan, and A. Karim, “Malware classification framework using convolutional neural network,” in Proc. Int. Conf. Cyber Warfare Secur. (ICCWS), Oct. 2020, pp. 1–7.
[4] A. Reed and B. Klimkowski, “Leaky streams: Identifying variable bitrate DASH videos streamed over encrypted 802.11n connections,” in Proc. 13th IEEE Annu. Consum. Commun. Netw. Conf. (CCNC), Jan. 2016, pp. 1107–1112.
[5] R. Dubin, O. Hadar, I. Richman, O. Trabelsi, A. Dvir, and O. Pele, “Video quality representation classification of safari encrypted DASH streams,” in Proc. Digit. Media Ind. Acad. Forum (DMIAF), Jul. 2016, pp. 213–216.
[6] A. Bremler-Barr, Y. Harchol, D. Hay, and Y. Koral, “Deep packet inspection as a service,” in Proc. 10th ACM Int. Conf. Emerg. Netw. Exp. Technol., Dec. 2014, pp. 271–282.
[7] S. Miller, K. Curran, and T. Lunney, “Detection of virtual private network traffic using machine learning,” Int. J. Wireless Netw. Broadband Technol., vol. 9, no. 2, pp. 60–80, Jul. 2020.
[8] M. U. S. Khan, M. Jawad, and S. U. Khan, “Adadb: Adaptive diff-batch optimization technique for gradient descent,” IEEE Access, vol. 9, pp. 99581–99588, 2021.
[9] K. S. Zaidi, S. Hina, M. Jawad, A. N. Khan, M. U. S. Khan, H. B. Pervaiz, and R. Nawaz, “Beyond the horizon, backhaul connectivity for offshore IoT devices,” Energies, vol. 14, no. 21, p. 6918, Oct. 2021.
[10] M. U. S. Khan, A. Abbas, M. Ali, M. Jawad, and S. U. Khan, “Convolutional neural networks as means to identify apposite sensor combination for human activity recognition,” in Proc. IEEE/ACM Int. Conf. Connected Health, Appl., Syst. Eng. Technol., Sep. 2018, pp. 45–50.
[11] S. Mohammad, M. U. S. Khan, M. Ali, L. Liu, M. Shardlow, and R. Nawaz, “Bot detection using a single post on social media,” in Proc. 3rd World Conf. Smart Trends Syst. Secur. Sustainablity (WorldS), Jul. 2019, pp. 215–220.
[12] O. Jogunola, B. Adebisi, K. V. Hoang, Y. Tsado, S. I. Popoola, M. Hammoudeh, and R. Nawaz, “CBLSTM-AE: A hybrid deep learning framework for predicting energy consumption,” Energies, vol. 15, no. 3, p. 810, Jan. 2022.
[13] S.-U. Hassan, M. Shabbir, S. Iqbal, A. Said, F. Kamiran, R. Nawaz, and U. Saif, “Leveraging deep learning and SNA approaches for smart city policing in the developing world,” Int. J. Inf. Manage., vol. 56, Feb. 2021, Art. no. 102045.
[14] H. Waheed, M. Anas, S.-U. Hassan, N. R. Aljohani, S. Alelyani, E. E. Edifor, and R. Nawaz, “Balancing sequential data to predict students at-risk using adversarial networks,” Comput. Electr. Eng., vol. 93, Jul. 2021, Art. no. 107274.
[15] F. Zaman, M. Shardlow, S.-U. Hassan, N. R. Aljohani, and R. Nawaz, “HTSS: A novel hybrid text summarisation and simplification architecture,” Inf. Process. Manage., vol. 57, no. 6, Nov. 2020, Art. no. 102351.
[16] A. Dvir, A. K. Marnerides, R. Dubin, and N. Golan, “Clustering the unknown—The YouTube case,” in Proc. Int. Conf. Comput., Netw. Commun. (ICNC), Feb. 2019, pp. 402–407.
[17] A. Dvir, A. K. Marnerides, R. Dubin, N. Golan, and C. Hajaj, “Encrypted video traffic clustering demystified,” Comput. Secur., vol. 96, Sep. 2020, Art. no. 101917.
[18] A. Reed and M. Kranch, “Identifying HTTPS-protected Netflix videos in real-time,” in Proc. 7th ACM Conf. Data Appl. Secur. Privacy, Mar. 2017, pp. 361–368.
[19] R. Schuster, V. Shmatikov, and E. Tromer, “Beauty and the burst: Remote identification of encrypted video streams,” in Proc. 26th Secur. Symp., 2017, pp. 1357–1374.
[20] J. Gu, J. Wang, Z. Yu, and K. Shen, “Walls have ears: Traffic-based side-channel attack in video streaming,” in Proc. IEEE Conf. Comput. Commun., Apr. 2018, pp. 1538–1546.
[21] M. U. S. Khan, S. M. A. H. Bukhari, S. A. Khan, and T. Maqsood, “ISP can identify YouTube videos that you just watched,” in Proc. Int. Conf. Frontiers Inf. Technol. (FIT), Dec. 2021, pp. 1–6.
[22] S. Chen, R. Wang, X. Wang, and K. Zhang, “Side-channel leaks in web applications: A reality today, a challenge tomorrow,” in Proc. IEEE Symp. Secur. Privacy, Jan. 2010, pp. 191–206.
[23] J. Han, C. Qian, P. Yang, D. Ma, Z. Jiang, W. Xi, and J. Zhao, “GenePrint: Generic and accurate physical-layer identification for UHF RFID tags,” IEEE/ACM Trans. Netw., vol. 24, no. 2, pp. 846–858, Apr. 2016.
[24] M. Korczynski and A. Duda, “Classifying service flows in the encrypted skype traffic,” in Proc. IEEE Int. Conf. Commun. (ICC), Jun. 2012, pp. 1064–1068.
[25] W. Wang and D. N. Cheng, “Skype traffic identification based on trends-aware protocol fingerprints,” in Vehicle, Mechatronics and Information Technologies II (Applied Mechanics and Materials), vol. 543. Zurich, Switzerland: Trans Tech Publications, 2014, pp. 2249–2254.
[26] X. Gong, N. Kiyavash, and N. Borisov, “Fingerprinting websites using remote traffic analysis,” in Proc. 17th ACM Conf. Comput. Commun. Secur., 2010, pp. 684–686.
[27] X. Cai, X. C. Zhang, B. Joshi, and R. Johnson, “Touching from a distance: Website fingerprinting attacks and defenses,” in Proc. ACM Conf. Comput. Commun. Secur., 2012, pp. 605–616.
[28] T. Wang, X. Cai, R. Nithyanand, R. Johnson, and I. Goldberg, “Effective attacks and provable defenses for website fingerprinting,” in Proc. 23rd Secur. Symp., 2014, pp. 143–157.
[29] F. Zhang, W. He, X. Liu, and P. G. Bridges, “Inferring users’ online activities through traffic analysis,” in Proc. 4th ACM Conf. Wireless Netw. Secur., 2011, pp. 59–70.
[30] M. Conti, L. V. Mancini, R. Spolaor, and N. V. Verde, “Analyzing Android encrypted network traffic to identify user actions,” IEEE Trans. Inf. Forensics Security, vol. 11, no. 1, pp. 114–125, Jan. 2015.
[31] R. Irfan, O. Khalid, M. U. S. Khan, F. Rehman, A. U. R. Khan, and R. Nawaz, “SocialRec: A context-aware recommendation framework with explicit sentiment analysis,” IEEE Access, vol. 7, pp. 116295–116308, 2019.
[32] Z. Zhou, Z. Yang, C. Wu, W. Sun, and Y. Liu, “LiFi: Line-of-sight identification with WiFi,” in Proc. IEEE Conf. Comput. Commun., Apr. 2014, pp. 2688–2696.
[33] X. Chen, X. Wu, X.-Y. Li, X. Ji, Y. He, and Y. Liu, “Privacy-aware high-quality map generation with participatory sensing,” IEEE Trans. Mobile Comput., vol. 15, no. 3, pp. 719–732, Mar. 2015.
[34] Y. Guo, L. Yang, B. Li, T. Liu, and Y. Liu, “RollCaller: User-friendly indoor navigation system using human-item spatial relation,” in Proc. IEEE Conf. Comput. Commun., Apr. 2014, pp. 2840–2848.
[35] Q. Ma, S. Zhang, T. Zhu, K. Liu, L. Zhang, W. He, and Y. Liu, “PLP: Protecting location privacy against correlation analyze attack in crowdsensing,” IEEE Trans. Mobile Comput., vol. 16, no. 9, pp. 2588–2598, Sep. 2016.
[36] M. Conti, N. Dragoni, and V. Lesyk, “A survey of man in the middle attacks,” IEEE Commun. Surveys Tuts., vol. 18, no. 3, pp. 2027–2051, 3rd Quart., 2016.
[37] T. Mangla, E. Halepovic, M. Ammar, and E. Zegura, “Using session modeling to estimate HTTP-based video QoE metrics from encrypted network traffic,” IEEE Trans. Netw. Service Manage., vol. 16, no. 3, pp. 1086–1099, Sep. 2019.
[38] S. Wassermann, M. Seufert, P. Casas, L. Gang, and K. Li, “Let me decrypt your beauty: Real-time prediction of video resolution and bitrate for encrypted video streaming,” in Proc. Netw. Traffic Meas. Anal. Conf. (TMA), Jun. 2019, pp. 199–200.
[39] Y. Liu, S. Li, C. Zhang, C. Zheng, Y. Sun, and Q. Liu, “ITP-KNN: Encrypted video flow identification based on the intermittent traffic pattern of video and K-nearest neighbors classification,” in Proc. Int. Conf. Comput. Sci. Cham, Switzerland: Springer, 2020, pp. 279–293.
[40] C. Gutterman, K. Guo, S. Arora, X. Wang, L. Wu, E. Katz-Bassett, and G. Zussman, “Requet: Real-time QoE detection for encrypted YouTube traffic,” in Proc. 10th ACM Multimedia Syst. Conf., Jun. 2019, pp. 48–59.
[41] S. Xu, S. Sen, and Z. M. Mao, “CSI: Inferring mobile ABR video adaptation behavior under HTTPS and QUIC,” in Proc. 15th Eur. Conf. Comput. Syst., Apr. 2020, pp. 1–16.
[42] P. Ameigeiras, J. J. Ramos-Munoz, J. Navarro-Ortiz, and J. M. Lopez-Soler, “Analysis and modelling of YouTube traffic,” Trans. Emerg. Telecommun. Technol., vol. 23, no. 4, pp. 360–377, Jun. 2012.
[43] R. Ravattu and P. Balasetty, “Characterization of YouTube video streaming traffic,” M.S. thesis, School Comput., Blekinge Inst. Technol., Sweden, 2013. Accessed: Jun. 1, 2022. [Online]. Available: https://www.diva-portal.org/smash/get/diva2:830691/FULLTEXT01.pdf
[44] A. Rao, A. Legout, Y.-S. Lim, D. Towsley, C. Barakat, and W. Dabbous, “Network characteristics of video streaming traffic,” in Proc. 7th Conf. Emerg. Netw. EXperiments Technol., 2011, pp. 1–12.
[45] Y. Liu, S. Li, C. Zhang, C. Zheng, Y. Sun, and Q. Liu, “DOOM: A training-free, real-time video flow identification method for encrypted traffic,” in Proc. 27th Int. Conf. Telecommun. (ICT), Oct. 2020, pp. 1–5.
[46] M. U. S. Khan, S. M. A. H. Bukhari, T. Maqsood, M. A. B. Fayyaz, D. Dancey, and R. Nawaz, “SCNN-attack: A side-channel attack to identify YouTube videos in a VPN and non-VPN network traffic,” Electronics, vol. 11, no. 3, p. 350, Jan. 2022.
[47] A. Z. Broder, “Some applications of Rabin’s fingerprinting method,” in Sequences II. Cham, Switzerland: Springer, 1993, pp. 143–152.
[48] M. Korczynski and A. Duda, “Markov chain fingerprinting to classify encrypted traffic,” in Proc. IEEE Conf. Comput. Commun., Apr. 2014, pp. 781–789.
[49] M. Shen, Y. Liu, S. Chen, L. Zhu, and Y. Zhang, “Webpage fingerprinting using only packet length information,” in Proc. IEEE Int. Conf. Commun. (ICC), May 2019, pp. 1–6.
[50] S. Miskovic, G. M. Lee, Y. Liao, and M. Baldi, “AppPrint: Automatic fingerprinting of mobile applications in network traffic,” in Proc. Int. Conf. Passive Act. Netw. Meas. Cham, Switzerland: Springer, 2015, pp. 57–69.
[51] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, Sep. 2015.

WALEED AFANDI is a Research Assistant with the Department of Computer Science, COMSATS University
Islamabad, Abbottabad Campus. His research interest includes computer security.

SYED MUHAMMAD AMMAR HASSAN BUKHARI is a Senior Research Assistant with the Department of Computer
Science, COMSATS University Islamabad, Abbottabad Campus. His research interest includes computer
security.

MUHAMMAD U. S. KHAN (Member, IEEE) received the Ph.D. degree in electrical and computer
engineering at North Dakota State University, USA, in 2015. He is an Assistant Professor with
COMSATS University Islamabad, Abbottabad Campus. His research interests include data science,
artificial intelligence, and computer security.

TAHIR MAQSOOD is currently an Assistant Professor with COMSATS University Islamabad, Abbottabad,
Pakistan. His research interests include resource allocation, multi/manycore systems, reliable
systems, the Internet of Things, and mobile edge computing.

SAMEE U. KHAN (Senior Member, IEEE) received the Ph.D. degree from the University of Texas, in
2007. He was the Cluster Lead of the Computer Systems Research at the National Science Foundation,
from 2016 to 2020, and the Walter B. Booth Professor at North Dakota State University. Currently,
he is the James W. Bagley Chair Professor and the Head of the Department of Electrical and
Computer Engineering with Mississippi State University (MSU). His work has appeared in over
400 publications. His research interests include optimization, robustness, and security of
computer systems. He is an Associate Editor of IEEE TRANSACTIONS ON CLOUD COMPUTING and Journal of
Parallel and Distributed Computing.