Computer Science (计算机科学), 2019, Vol. 46, Issue (10): 295-398. doi: 10.11896/jsjkx.180801473

Research and Improvement of Web Fingerprint Identification Algorithm Based on Cosine Measure

TANG Wen-liang (汤文亮)1, TANG Shu-fang (汤树芳)2, ZHANG Ping (张平)2

  1. School of Information Engineering, East China Jiaotong University, Nanchang 330013, China
  2. School of Software, East China Jiaotong University, Nanchang 330013, China
  • Received: 2018-08-09 Revised: 2018-08-27 Online: 2019-10-15 Published: 2019-10-21

Abstract: Accurate identification of Web fingerprints in a Web fingerprint database requires an effective identification algorithm. When existing algorithms are applied to such a database, the identification results deviate from the actual results and identification takes a long time, so both accuracy and efficiency are low. This paper proposes a Web fingerprint identification algorithm based on the cosine measure. Web fingerprints are selected by source code auditing with respect to four aspects (structural features, static files, cookie design, and keywords), and a Web fingerprint database is built from them. First, features are extracted from the data in the Web fingerprint database, and abnormal data are removed according to the feature extraction results. Then, with the cosine distance function as the similarity measure, the K-means algorithm is used to cluster the Web fingerprints in the database. Finally, Web fingerprints are identified according to the clustering results. Experimental results show that the proposed method identifies the Web fingerprints in the database accurately and in a short time, offering high identification accuracy and high efficiency.

Key words: Cosine measure, Recognition algorithm, Web fingerprint
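
For reference, the cosine measure named in the abstract is commonly defined as below. This is the standard textbook form, given as an assumption: the abstract does not state the exact formula the paper uses. Here $x$ and $y$ are hypothetical $n$-dimensional fingerprint feature vectors.

$$
\cos(x, y) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}},
\qquad
d_{\cos}(x, y) = 1 - \cos(x, y)
$$

Two vectors pointing in the same direction have $\cos(x, y) = 1$ and $d_{\cos}(x, y) = 0$, regardless of their lengths, which is why the measure compares the pattern of features rather than their magnitude.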


  • CLC Number: TP391.41
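
The clustering and identification steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the four-dimensional feature vectors (one hypothetical score each for structural features, static files, cookie design, and keywords), the function names, and the choice of initial centroids are all assumptions, and the feature extraction and abnormal-data removal steps are omitted.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cosine_distance(a: Vector, b: Vector) -> float:
    """Cosine distance, 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


def normalize(v: Vector) -> List[float]:
    """Scale v to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0.0 else list(v)


def identify(fingerprint: Vector, centroids: List[Vector]) -> int:
    """Return the index of the centroid closest to the fingerprint in cosine distance."""
    return min(range(len(centroids)),
               key=lambda k: cosine_distance(fingerprint, centroids[k]))


def kmeans_cosine(points: List[Vector], init_centroids: List[Vector],
                  max_iter: int = 50) -> Tuple[List[int], List[List[float]]]:
    """K-means style clustering with cosine distance as the assignment criterion."""
    unit_points = [normalize(p) for p in points]
    centroids = [list(c) for c in init_centroids]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        new_labels = [identify(p, centroids) for p in unit_points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(unit_points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical fingerprint vectors: [structure, static files, cookie, keyword]
points = [
    [1, 1, 0, 0], [2, 2, 0, 1], [1, 2, 0, 0],
    [0, 0, 1, 1], [0, 1, 2, 2], [0, 0, 3, 2],
]
labels, centroids = kmeans_cosine(points, init_centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(identify([3, 2, 0, 0], centroids))  # cluster index for a new fingerprint
```

Points are scaled to unit length before averaging so that each centroid reflects the mean direction of its cluster; because cosine distance ignores vector length, the centroid itself does not need to be renormalized.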
