Obstfeld, J., Chen, X., Frebourg, O. Towards Near Real-Time BGP Deep Analysis A Big-Data


Towards Near Real-Time BGP Deep Analysis: A Big-Data Approach
Joel Obstfeld, Xiaoyu Chen, Olivier Frebourg, Pavan Sudheendra
Cisco, UK

ICM 2017 J. Obstfeld et al.

ABSTRACT
BGP (Border Gateway Protocol) serves as the primary routing protocol for the Internet, enabling Autonomous Systems (individual network operators) to exchange network reachability information. Alongside significant on-going research and development efforts, there is a practical need to understand the nature of events that occur on the Internet.

Network operators are acutely aware of security-related incidents such as ‘Prefix Hijacking’ as well as the impact of network instabilities that ripple through the Internet. Recent research focused on the study of BGP anomalies (both network/prefix instability and security-related incidents) has been based on the analysis of historical logs. Further analysis to understand the nature of these anomalous events is not always sufficient to differentiate malicious activities, such as prefix or sub-prefix hijacking, from those events caused by inadvertent misconfigurations. In addition, such techniques are challenged by a lack of sufficient resources to store and process data feeds in real-time from multiple BGP Vantage Points (VPs).

In this paper, we present a BGP Deep-analysis application developed using the PNDA (Platform for Network Data Analytics) ‘Big-Data’ platform. PNDA provides a highly scalable environment that enables the ingestion and processing of ‘live’ BGP feeds from many vantage points in a schema-agnostic manner. The Apache Spark-based application, in conjunction with PNDA’s distributed processing capabilities, is able to provide high-level insights as well as near-to-real-time statistical analysis.

KEYWORDS
Border Gateway Protocol, Big-data, Anomaly detection, Apache Spark, Hadoop.

1 INTRODUCTION
On the 26th of April, 2017, a ‘prefix hijacking’ event occurred that affected a number of financial services companies around the world. The impact of the event was such that traffic was, in part, diverted to another network that claimed to be the owner of the IPv4 address space.

The nature of the event was such that, depending on the ‘distance’ between your network and that of the impacted companies, the ‘newly’ announced network would now appear to be ‘closer’ (or ‘shorter’ in BGP terms) and therefore would be preferred.

Figure 1: BGP AS Path length preference

In the diagram above, the network operator responsible for Autonomous System (AS) 10 advertises the IPv4 prefix X. From the position of AS20, the ‘distance’ is considered to be an AS Path length of 4 (AS13, AS12, AS11, AS10). In the case of prefix hijacking, the same IPv4 prefix, X,
was announced as being located in AS15. Given the proximity of AS20 to AS15, with an AS Path length of 3, traffic that originally flowed from AS20 toward AS10 will now be directed to AS15. However, if you were located in AS11 or AS12, the AS Path length is still shorter towards AS10 than towards AS15 and, as a result, traffic will continue to flow towards AS10.

An additional element of the 26th of April event was the announcement of ‘more-specific prefixes’ from the source of the attack, also known as ‘sub-prefix injection’. Routers forward traffic on the most specific matching prefix, so a ‘more-specific’ prefix takes precedence before the AS Path length is ever considered.

Figure 2: BGP AS Path length with ‘longer prefix’

In the diagram above both AS10 and AS15 announce the IPv4 prefix X. In addition, AS15 announced the ‘more-specific prefix’, X.1. Since routers in the Internet operate on the premise of ‘longest match’, in such a case, regardless of the AS path length, the path to the more-specific prefix (X.1) will be preferred and traffic from all AS’s other than AS10, destined to addresses in the X.1 address range, will be diverted towards the attacker’s network (AS15).

For the purposes of illustration, let us assume that AS10 announced the IPv4 prefix 123.123.0.0/18. During the attack, AS15 also announced the IPv4 prefix 123.123.0.0/18 and, in addition, the IPv4 prefix 123.123.63.0/24. Given the use of ‘longest match’, any traffic destined to an IPv4 address in the range 123.123.63.0 to 123.123.63.255 would be sent towards AS15, rather than the legitimate origin AS.

While it is possible that the announcement was the result of an error, the announcement of more-specific prefixes that ‘targeted’ the financial institutions makes it less likely to have been a mistake. There are well-known solutions, such as RPKI [1], that provide a mechanism by which one can verify the authenticity of prefix announcements, but gaps in adoption and usage of the RPKI solution provide the opportunity for further events to occur.

It should also be noted that if network operators follow BGP security best practices with respect to filtering incoming advertisements, and specifically blocking the receipt of their own prefixes, they are typically unaware of a prefix hijack incident taking place until an impacted user contacts them!

In order to be able to detect and mitigate a prefix hijack, network operators need to have an ability to ‘see’ outside of their network perimeter, in order to understand what other networks ‘see’ with respect to the prefixes relating to that AS.

With over 660,000 prefixes present in the Internet routing table at the time of writing, looking for erroneous BGP events such as the one described above is akin to looking for a ‘needle in a haystack’. In a world where real-time systems, such as those used by the financial services industry, are impacted, real-time or close to real-time detection and reporting systems are also required.

In this paper, we will describe how event analysis can be performed over the BGP data-set using Big-data analysis techniques, and how we can move towards a position where events can be detected and reported in real-time. In addition, we will describe a generic Big-data platform that can be readily adapted to consume and process a wide range of data types.

2 Related Work
According to the recent technical review [2], techniques in BGP anomaly detection have been dominated by time-series analysis, machine learning, statistical pattern recognition, and validation of BGP updates based on historical logs. These significant works are either limited to identifying some anomalies or focused on ‘after-event’ analysis based on datasets where available. As the survey correctly pointed out in its conclusion, there is still much to be done to detect BGP anomalies in real-time such that network operators can mitigate the spreading of anomalies across networks.

Due to the ‘openness’ of the BGP protocol, one can falsely claim ownership of one or more prefixes (i.e. prefix hijacking) and start a Denial-of-Service (DoS) attack. Such false prefix path announcements can also be propagated over networks due to policy misconfigurations. Existing automated anomaly detection techniques with rules learned from a ‘stable’ environment (experimental testbed or historical data) can easily become obsolete over time and fail to deal with modern network attacks.

In parallel to automated anomaly detection techniques, research efforts have also focused on advanced
visualization techniques to help network operators understand BGP routing instability and identify potential anomalies. Such visualization tools range from those that visualize raw BGP update events, such as BGPlay [3] and LinkRank [4], to those with an additional statistical view to show historical trends and current deviations, such as BGP eye [5] and Vistracer [7].

The emergence of big data technologies makes it possible to combine machine-automated analysis and exploratory analysis in a hybrid manner, in order to provide anomaly detection in real-time or near real-time. Machine-automated processes are used to generate alerts based on patterns/deviations learned from incrementally increasing historical data and live updates. Visualization tools then help network operators to dive into observed BGP events and validate reported alerts. In 2016, the Center for Applied Internet Data Analysis (CAIDA) published BGPStream [6], an open-source software framework for both offline analysis of historical MRT feeds and real-time prefix/AS monitoring. There are also commercial players, Kentik [8] and BGPmon (OpenDNS) [9], that provide platforms for network data analytics and real-time intelligence.

The work presented in this paper uses a similar semi-automated approach, leveraging an open-source BGP feed collection framework in conjunction with an open-source big data platform. Compared to other existing approaches, we provide a versatile and scalable end-to-end solution that enables the collection of live BGP updates in a schema-agnostic manner. In addition, the work offers the ability to process data at scale as well as being a platform that can be applied to other data-sets and applications.

3 Goals and Challenges
BGPv4 is the control-plane protocol that drives the Internet. The Internet itself is a highly volatile environment; from a single vantage point, we observe on average ~3 million BGP updates a day. Our main goal was to create a solution to help network operators extract useful signals from the ‘noise’ such that they can understand the changes in the Internet that may impact their own operations, and to do so in near real-time.

In order to gain better insight, it is important to perform the analysis on data taken from multiple BGP Vantage Points (VPs), each of which has a view of the Internet. Collecting BGP data from globally distributed VPs poses a series of technical difficulties. Due to the distributed nature of VPs, BGP event data cannot be readily synchronized. How one can collect BGP updates from the VPs in a time-synchronous and format-agnostic manner becomes the first technical issue. In this work, we leverage an open-source project, SNAS.io [10] (Streaming Network Analysis System), that supports streaming live BGP updates and is highly scalable.

Another challenge comes from the accumulation of data over time. There are two issues related to this. First, we need to provide scalable storage. Secondly, we need a solution that will enable us to perform the required analysis in near real-time. In this work, we adopted state-of-the-art big data technologies and leveraged PNDA.io [11], an open-source big data framework. With PNDA, we were able to utilize a file system distributed across multiple data nodes to store historical data while minimizing processing time in batch mode to keep pace with live data streaming.

Finally, the solution aims to be versatile and extensible for a broad range of applications and users. Using advanced visualization techniques, we are able to provide continuous monitoring of BGP anomalies along with the capability of investigating historical trends from three different viewpoints: AS connectivity, AS path view, and prefix view.

4 End-to-End Architecture
Figure 3 shows the end-to-end architecture of the BGP deep analysis application, which consists of three main layers:

• Data collection layer
• Data storage & processing layer
• Data visualization layer

Figure 3: BGP deep analysis end-to-end technical architecture
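
As a rough illustration of the time-synchronisation difficulty noted in Section 3, the sketch below merges update streams from several vantage points into a single timeline ordered by timestamp. The `Update` record, its field names, the `merge_by_time` helper and the sample values are all hypothetical; they are not taken from SNAS.io or PNDA. The sketch also assumes that each per-VP stream is already sorted by its own timestamps.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Update(NamedTuple):
    """Hypothetical, simplified BGP update record (not the BMP schema)."""
    timestamp: float      # seconds, as reported by the collector
    vantage_point: str    # label of the VP that observed the update
    prefix: str
    as_path: tuple


def merge_by_time(*streams: Iterable[Update]) -> Iterator[Update]:
    """Lazily merge per-VP streams, each already sorted by timestamp."""
    return heapq.merge(*streams, key=lambda u: u.timestamp)


# Illustrative sample data only.
vp_a = [
    Update(100.0, "vp-a", "123.123.0.0/18", (13, 12, 11, 10)),
    Update(105.0, "vp-a", "123.123.63.0/24", (14, 15)),
]
vp_b = [
    Update(101.5, "vp-b", "123.123.0.0/18", (11, 10)),
    Update(104.0, "vp-b", "123.123.63.0/24", (16, 15)),
]

timeline = list(merge_by_time(vp_a, vp_b))
for update in timeline:
    print(update.timestamp, update.vantage_point, update.prefix)
```

The merged order is only as trustworthy as the timestamps themselves; clock differences between vantage points are not corrected here.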

4.1 Data collection and aggregation
In order to ingest BGP information, the system makes use of an open-source project called SNAS.io. SNAS supports the consumption of data presented in the BGP Monitoring Protocol (BMP) format [12].

In many cases, BGP monitoring is performed using an instrumented BGP ‘listener’ process running on a compute host. The process is added into a network operator’s BGP topology and therefore hears the BGP event updates that pass across the network. A major limitation of this method is that it relies on BGP’s advertisement selection process. The selection process will consume all of the received BGP updates, make its ‘best-path’ selection and advertise only the best paths (Post-RIB). Those received paths that have not been selected are discarded and, more importantly in the context of this work, cannot be readily obtained from the router.

Figure 4: BGP data source

Routers that support the BMP protocol are able to send to a BMP ‘listener’, such as SNAS.io, all of the BGP updates that the router has received before the BGP path selection algorithm comes into effect (Pre-RIB). This key capability means that the system can record events before any form of filtering takes place. In addition, SNAS.io supports RouteViews [13], enabling the system to obtain observations from many locations around the world.

Once the data has been received by SNAS.io, it is passed to a Logstash [14] process, an open-source component that is used to consume and, in this case, encode the BMP data for storage in the PNDA platform’s Hadoop [15] storage system. The BMP data is encoded in an Avro [16] wrapper and placed onto the PNDA system’s Kafka bus. The wrapper provides common metadata that is used by Gobblin [17] in order to organize and write the data within the HDFS folder on a per-topic, per-hour, per-BMP ‘listener’ basis. The BGP data is now available for applications to inspect and process.

One should note that a wide variety of data formats can be processed and encoded using Logstash or could be written directly to the Kafka [18] bus. This makes the PNDA system flexible and extensible.

4.2 Underlying big data platform
Consuming and processing data of a particular type and construct can be performed using specifically-optimized processing pipelines. Such pipelines are typically designed for deployment in a specific environment, reducing the ability to reuse the solution in other environments or with other data types. While there may be performance optimizations that can be realized as a result of the design and technology selection, one is left with a bespoke solution.

When considering the technologies that can be applied to the particular problem space, it is valuable to look at adjacent fields for inspiration and for a sense of ‘scale’. What may be considered to be a sizeable data-set in one field of work is considered small or even insignificant in another.

The BGP Deep Analysis work is considered to be an application running on top of a generalized ‘Big-data’ platform. For many people, ‘Big-data’ is synonymous with ‘Hadoop’ and a series of technologies that have grown up in this space. While the use of Hadoop ecosystem technologies is commonplace in some areas of business and technology today, they have not been applied to the networking space until very recently.

Moreover, building a Big-data platform does not stop with the deployment of a Hadoop cluster. Additional components are required in order to be able to ingest data, potentially in a wide variety of forms, volumes and velocities. Another set of components is required to expose the data in a manner that Data Scientists can interact with. The rate of development in open source Big data analytics technologies is extremely rapid, but combining multiple technologies into an end-to-end solution can be extremely complex and time-consuming.

The current state of the networking industry is that each network operator is solving this problem independently, leveraging upstream Hadoop distributions to build custom analytics platforms and applications. These custom implementations become operator-specific silos, which are not interoperable; each operator owns the software lifecycle and associated development costs for their implementation.

For the BGP Deep Analysis work, the open source PNDA environment has been used. PNDA brings together a number of open source technologies to provide a scalable open platform on top of which analytics applications can be developed and deployed. PNDA provides a Hadoop
cluster, with data distributed across ‘data-nodes’ both for efficiency of processing and for resilience, as well as the components and capabilities required to offer support for multiple end-to-end pipelines on a single platform.

Figure 5: PNDA block diagram

A fundamental premise of ‘Big-Data’ is to obtain and store data in its rawest form. In many traditional data applications, data is modified or transformed on ingest, resulting in the reduction of information as the data is ‘massaged’ into a required format. A side-effect of this method of operation is that if another application wishes to make use of the data, it is already operating on a refined subset, rather than the initial ‘raw’ information.

The PNDA platform supports a wide range of off-the-shelf plugins as well as the creation of custom adapters, enabling the ingestion of structured and unstructured data such as log messages, SNMP events, metrics/time-series data and network telemetry.

Another premise of a Big-data platform is that processing is performed where the data resides, rather than moving the data from storage to a location where the analysis application executes. The platform supports the use of Apache Spark-based batch and streaming applications.

4.3 BGP app
Similar to a conventional three-tier Web application, the BGP Web server presents a Web-based UI that shows the operator the results generated by two Spark-based applications, the BGP deep analysis application and the BGP security application. Both applications are scheduled as Apache Spark micro-batch processes and process raw BGP data streamed into PNDA via Kafka brokers and then persisted in HDFS by a Gobblin process.

The BGP deep analysis application is scheduled at frequent intervals and processes raw BGP data to aggregate behaviors on a per-AS and per-AS Path basis. In particular, it characterizes the stability of each AS and AS path using the key features listed in Table 1. The key features obtained from the BGP data are the ‘AS path change’ (announcement and withdrawal of a prefix), the rank value derived from the number of prefixes that transit via a particular AS or a particular inter-AS path, rank change, and rank change frequency for each AS and AS path. By using these features we can derive the importance of an AS and AS path over the Internet as well as the stability (the higher the change frequency, the less stable).

ID  Feature
1   AS Path change
2   AS rank
3   AS path rank
4   AS rank change
5   AS path rank change
6   AS rank change frequency
7   AS path rank change frequency

Table 1: Key fields for AS analysis

In parallel to the BGP deep analysis application, the BGP security application is scheduled to analyze network anomalies, including RFC5735 [19] prefixes and netblocks that are unallocated and therefore should never be seen in BGP updates, and short prefixes (i.e. those prefixes with a mask between /0 and /7, and those with a mask between /25 and /32). Further work identifies those prefixes that originate from multiple AS’s and ‘sub-prefix injection’ cases. Table 2 lists the key features used to detect network anomalies.

ID  Feature
1   Prefix origin change
2   Prefix length
3   Rare Prefix
4   Subprefix origin change

Table 2: Key fields for Security analysis

Both applications run in batch mode and schedule Apache Spark MapR jobs across data nodes via the Hadoop YARN resource manager, in short ‘spark-on-yarn’. At the end of batch processing, the applications
write their results into HBase tables, a key-value database, and make them available via an Impala query interface. The front-end portal provides a Web-based UI that alerts network operators to network anomalies and employs advanced visualization techniques to enrich the network operators’ understanding of the nature of anomalies, as well as enabling the investigation of data using long historical analysis.

5 BGP Deep Analysis
The Deep Analysis applications investigate two key areas today: BGP AS path and prefix behavior, and BGP Security. With the data platform consuming event data from multiple VPs, the application is able to process aggregate data to understand, for example: Which AS’s appear in AS Paths most frequently? Which AS’s originate the largest number of prefixes? Which AS’s exhibit the largest number of ‘changes’ over time?

Figure 6: Top N analysis

Using further mapping techniques, we can determine the connectivity characteristics of a particular AS, rendering the results in a graph.

Focused analysis of the paths between two AS’s identified, in one 24-hour period:

• Shortest path: 4 hops
• Longest path: 29 hops
• Longest unique AS path: 6
• Unique paths: 9
• Largest prepend count: 17
• Prepend variation: [7-17]

The Security analysis looked for BGP prefixes that should not be observed. This includes looking for ‘unallocated’ address space. Observations over a 12-hour period revealed over 4000 unique ‘unallocated’ prefixes being announced. In addition, a module was developed in order to identify cases of ‘sub-prefix injection’. At the time of submission, this study was still ongoing.

7 Conclusions
The work covered in this paper has shown how techniques and technologies from the Big-data realm can be applied to the Networking space. The work has demonstrated the use of applications that examine, identify and alert network operators to events that have the potential to be highly disruptive to their customers and services.

In addition, the use of a Big-data platform enables the further exploration and analysis of other data sets both inside and outside of the networking realm, without having to create entirely new data acquisition, storage and processing pipelines.

ACKNOWLEDGMENTS
This work was realized with the assistance of the PNDA and SNAS project teams at
Cisco. PNDA.io and SNAS.io are open source projects under the Linux Foundation.

APPENDIX

Appendix 1
The figure below shows a simplistic graph. The ‘top’ line shows
the original prefix announcement, originating from AS 13118. On
the 26th of April, at 22:40, the ‘sub-prefix hijack’ was advertised,
originating in AS12389. This is shown in the ‘spot’ point in the
lower third of the graph. This announcement led to significant
service interruption for the affected institution.

Figure 7: Plot comparison to two AS’s
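
The kind of event plotted above can, in principle, be flagged with a simple check: a newly announced prefix that falls inside a known prefix but carries a different origin AS. The sketch below illustrates that idea, and the ‘longest match’ behaviour from Section 1, using Python's standard `ipaddress` module. It is not the paper's Spark implementation. The `known_origins` table, the function names and the choice of the illustrative 123.123.0.0/18 prefix with AS10 and AS15 are assumptions made for the example; the labels reuse the feature names from Table 2.

```python
import ipaddress

# Hypothetical baseline of prefix -> expected origin AS (illustrative values
# from the Section 1 example, not real routing data).
known_origins = {
    ipaddress.ip_network("123.123.0.0/18"): 10,
}


def classify_announcement(prefix: str, origin_as: int) -> str:
    """Label an announcement relative to the baseline table."""
    net = ipaddress.ip_network(prefix)
    for known_net, known_as in known_origins.items():
        if net == known_net:
            return "ok" if origin_as == known_as else "prefix origin change"
        # subnet_of() requires both networks to share the same IP version.
        if net.version == known_net.version and net.subnet_of(known_net):
            if origin_as != known_as:
                return "sub-prefix origin change"
            return "ok"
    return "unknown prefix"


def best_match(address: str, routes: dict):
    """Longest-prefix match: origin AS of the most specific covering route."""
    addr = ipaddress.ip_address(address)
    candidates = [net for net in routes if addr in net]
    if not candidates:
        return None
    return routes[max(candidates, key=lambda net: net.prefixlen)]


print(classify_announcement("123.123.0.0/18", 10))
print(classify_announcement("123.123.0.0/18", 15))
print(classify_announcement("123.123.63.0/24", 15))

# Routing view once the more-specific /24 has been accepted.
routes = {
    ipaddress.ip_network("123.123.0.0/18"): 10,
    ipaddress.ip_network("123.123.63.0/24"): 15,
}
print(best_match("123.123.63.7", routes))  # covered by both; the /24 wins
print(best_match("123.123.10.7", routes))  # covered only by the /18
```

A linear scan over the baseline is enough for a handful of prefixes; a table the size of the Internet routing table would call for a prefix tree or a distributed join instead.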


Figure 1: Prefix hijacking attack on financial institutions, Apr. 26th, 2017

REFERENCES
[1] R. Bush, R. Austein, https://tools.ietf.org/html/rfc6810, 2013
[2] B. Al-Musawi, P. Branch and G. Armitage, "BGP Anomaly Detection
Techniques: A Survey," in IEEE Communications Surveys & Tutorials, vol.
19, no. 1, pp. 377-396, Firstquarter 2017. doi: 10.1109/COMST.2016.2622240
[3] Di Battista G., Mariani F., Patrignani M., Pizzonia M. (2004) BGPlay: A
System for Visualizing the Interdomain Routing Evolution. In: Liotta G. (eds)
Graph Drawing. GD 2003. Lecture Notes in Computer Science, vol 2912.
Springer, Berlin, Heidelberg
[4] M. Lad, D. Massey, and L. Zhang. LinkRank: A graphical tool for capturing BGP
routing dynamics. In Proceedings of the IEEE/IFIP Network Operations and
Management Symposium (NOMS), April 2004
[5] S. T. Teoh, S. Ranjan, A. Nucci, and C.-N. Chuah. Bgp eye: a new
visualization tool for real-time detection and analysis of bgp anomalies. In
VizSEC ’06: Proceedings of the 3rd international workshop on Visualization
for computer security, pages 81–90, New York, NY, USA, 2006. ACM.
[6] Orsini, C., King, A., Giordano, D., Giotsas, V. and Dainotti, A., 2016,
November. BGPStream: a software framework for live and historical BGP data
analysis. In Proceedings of the 2016 ACM on Internet Measurement
Conference (pp. 429-444). ACM.
[7] Fischer, F., Fuchs, J., Vervier, P.A., Mansmann, F. and Thonnard, O., 2012,
October. Vistracer: a visual analytics tool to investigate routing anomalies in
traceroutes. In Proceedings of the ninth international symposium on
visualization for cyber security (pp. 80-87). ACM.
[8] https://www.kentik.com/
[9] https://bgpmon.net/
[10] http://www.snas.io/
[11] http://www.pnda.io/
[12] J. Scudder, R. Fernando, S. Stuart, https://tools.ietf.org/html/rfc7854, 2016
[13] http://www.routeviews.org/
[14] https://www.elastic.co/products/logstash
[15] http://hadoop.apache.org/
[16] https://avro.apache.org/
[17] https://engineering.linkedin.com/data-ingestion/gobblin-big-data-ease
[18] https://kafka.apache.org/
[19] M. Cotton, L. Vegoda, https://tools.ietf.org/html/rfc5735, 2010
