Automated Keyword Extraction From One-Day Vulnerabilities at Disclosure
Clément Elbaz, Univ Rennes, Inria, CNRS, IRISA, Rennes, France
Louis Rilling, DGA, Rennes, France
Christine Morin, Univ Rennes, Inria, CNRS, IRISA, Rennes, France
Abstract—Common Vulnerabilities and Exposures (CVE) databases such as Mitre's CVE List and NIST's NVD database identify every disclosed vulnerability affecting any public software. However, during the early hours of a vulnerability disclosure, the metadata associated with these vulnerabilities is either missing, wrong, or at best sparse. This creates a challenge for robust automated analysis of new vulnerabilities. We present a new technique based on TF-IDF to assess the software products most probably affected by newly disclosed vulnerabilities, formulated as an ordered list of relevant keywords. To do so we rely only on the human-readable description of a new vulnerability, without any need for its metadata. Our evaluation results suggest real-world applicability of our technique.

978-1-7281-4973-8/20/$31.00 © 2020 IEEE

I. INTRODUCTION

The disclosure of a vulnerability is the most critical part of its life cycle. As a confidential zero-day, a vulnerability is a high-value asset used sparingly to attack high-value targets. On the other hand, well-known public vulnerabilities can be mitigated using standard security practices such as applying software updates diligently, or using a signature-based intrusion detection system (IDS). Bilge et al. [1] showed that at disclosure, the usage of exploits of a vulnerability in the wild increases by as much as five orders of magnitude while transitioning from a zero-day to a public vulnerability. A software patch is sometimes already available, but its adoption may not be widespread. At this early stage the vulnerability is not understood well enough to author a proper signature rule for an IDS. All these factors contribute to making the disclosure a dangerous time, since many systems are vulnerable in practice. We call one-day these newly disclosed vulnerabilities that are still in the critical part of their life cycle. One-day should not be taken literally here: a vulnerability disclosure can be 72 hours old and still be at its most threatening period.

The vulnerability disclosure process is coordinated by the Common Vulnerabilities and Exposures (CVE) system overseen by the Mitre Corporation [2]. Newly disclosed vulnerabilities are first published on the CVE List data feed managed by Mitre. They are then forwarded to other security databases, such as NIST's NVD database [3] or SCAP data feeds [4], where they will eventually be annotated by multiple security experts. These annotations include metadata such as the affected software, as described by an entry from the Common Platform Enumeration (CPE) [5]. They also include a Common Vulnerability Scoring System (CVSS) score and vector [6]. NIST security experts take at least a few days to analyze and annotate a vulnerability, and often weeks (see Section III-A). It is common to find vulnerabilities that were disclosed several days ago and are still not analyzed by NVD. For example CVE-2019-9084, disclosed on the CVE List on 06/07/2019, has no NVD analysis as of 06/11/2019. This delay means that in order to reliably analyze one-day vulnerabilities, one should not rely at all on enriched metadata provided by databases such as NVD. Instead one should focus on the data available when the vulnerability is first disclosed on Mitre's CVE List, which consists of three elements only: a unique CVE identifier, a free-form human-readable description, and at least one public reference [7].

The vulnerability analysis ecosystem presented above makes it expensive for organizations to analyze one-day vulnerabilities at disclosure. On the one hand, achieving real-time threat evaluation of new vulnerabilities through manual analysis requires extensive manpower, as hundreds of vulnerabilities are disclosed daily. On the other hand, there is not enough machine-readable metadata available at disclosure for automated analysis. Real-time threat analysis is therefore prohibitively expensive for most organizations, although it would benefit them, as severe vulnerabilities such as Shellshock have been massively exploited within hours of their disclosure [8].

Automating real-time threat evaluation for newly disclosed vulnerabilities would make it affordable for more organizations. This would allow cloud service providers (CSP) and information systems to react in real time to vulnerability disclosures. Examples of automated reactions include reconfiguring security policies by elevating logging levels for critical systems, switching these systems into degraded mode, or even shutting them down while waiting for a remediation to be applied. Such a reaction service could help the CSP to protect both its internal systems and tenants (the latter constituting a potential source of revenues for the CSP).

We propose an automated system that uses free-form descriptions of newly disclosed one-day vulnerabilities to extract the most probably affected software from the description, and can do so in near real time (at most seconds after the disclosure). Identifying which systems are vulnerable can be achieved by extracting relevant keywords from the free-form vulnerability description and forwarding them to an alert service monitoring specific keywords related to these systems (such as names of public software used in the system).
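
The forwarding step described above can be illustrated with a minimal sketch. It assumes the extraction step yields keywords ordered from most to least relevant and that each monitored system is described by a set of watched keywords. The function `match_alerts`, the `top_n` cutoff, the system names, and the keyword ranking are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import Dict, List, Set


def match_alerts(ranked_keywords: List[str],
                 watchlists: Dict[str, Set[str]],
                 top_n: int = 3) -> Dict[str, List[str]]:
    """Map each monitored system to the top-ranked keywords found in its watchlist."""
    # Only the top_n most relevant keywords are considered (assumed cutoff).
    top = ranked_keywords[:top_n]
    alerts: Dict[str, List[str]] = {}
    for system, watched in watchlists.items():
        hits = [kw for kw in top if kw in watched]
        if hits:
            alerts[system] = hits
    return alerts


# Hypothetical ranked output of the extraction step, most relevant first.
ranked = ["genixcms", "php", "sql", "register", "activation"]

# Hypothetical watchlists: names of public software used by each system.
watchlists = {
    "blog-frontend": {"genixcms", "nginx"},
    "billing-db": {"postgresql"},
    "legacy-portal": {"php", "apache"},
}

alerts = match_alerts(ranked, watchlists)
for system, hits in alerts.items():
    print(f"ALERT {system}: matched {hits}")
```

Restricting the match to the first few keywords reflects the idea that the list is ordered by relevance, so a smaller `top_n` trades recall for fewer spurious alerts.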
Our system associates CVE vulnerabilities to keywords extracted from past CPE URIs to quickly point out the most
[Figure labels: "Text description of analyzed vulnerability", "Available CPE URIs", "Word filtering"]
[Figure 4 diagram labels: a "Time" axis with "Description of vulnerability" and "Metadata of vulnerability" for vulnerabilities V1, V2, and V3]
Fig. 4: When discarding the CPE dictionary we get a more robust data life cycle while retaining most of the inherent data.

Fig. 5: Number of days between vulnerability disclosure and analysis in NVD from 2007 to 2018.

Description: SQL injection vulnerability in register.php in GeniXCMS before 1.0.0 allows remote attackers to execute arbitrary SQL commands via the activation parameter.
Keyword set (in alphabetical order): activation, before, commands, genixcms, in, parameter, php, register, remote, sql, the, to, via, vulnerability
TABLE II: Description and extracted keyword set for CVE-2016-10096, a vulnerability disclosed on 01/01/2017. The filtering list included all CPE URIs published between 01/01/2007 and 12/24/2016.
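
The word filtering illustrated by Table II can be sketched as follows. This is a minimal sketch, not the authors' implementation: the tokenization (lowercasing and keeping alphabetic runs) is an assumption, and `CPE_WORDS` is a small hypothetical stand-in for the set of words found in past CPE URIs, hand-picked so that the example reproduces the keyword set of Table II.

```python
import re
from typing import List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic words (assumed tokenization)."""
    return re.findall(r"[a-z]+", text.lower())


def extract_keyword_set(description: str, cpe_words: Set[str]) -> List[str]:
    """Keep only the description words that also occur in the CPE word list."""
    return sorted(set(tokenize(description)) & cpe_words)


# Hypothetical stand-in for the words extracted from past CPE URIs.
# A real filtering list would be built from every CPE URI published
# before the disclosure date of the analyzed vulnerability.
CPE_WORDS = {
    "activation", "apache", "before", "commands", "genixcms", "in",
    "linux", "parameter", "php", "register", "remote", "sql", "the",
    "to", "via", "vulnerability",
}

# Description of CVE-2016-10096, as given in Table II.
DESCRIPTION = (
    "SQL injection vulnerability in register.php in GeniXCMS before 1.0.0 "
    "allows remote attackers to execute arbitrary SQL commands via the "
    "activation parameter."
)

keywords = extract_keyword_set(DESCRIPTION, CPE_WORDS)
print(", ".join(keywords))
```

With this stand-in list, words such as "injection" or "attackers" are discarded because they are absent from the list, while common words such as "in" or "the" survive because they are present in it. Filtering alone therefore does not rank keywords by relevance; a separate weighting step is needed for that.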
of the CPE URIs, as described in Figure 4. As CPE URIs