
NetView: Towards On-Demand Network-Wide Telemetry in the Data Center


Yunsenxiao Lin∗†, Yu Zhou∗†, Zhengzheng Liu∗†, Ke Liu‡,
Yangyang Wang∗†, Mingwei Xu∗†, Jun Bi∗†, Ying Liu∗†, Jianping Wu∗†
∗ Department of Computer Science and Technology, Tsinghua University, Beijing, China
† Beijing National Research Center for Information Science and Technology (BNRist), Beijing, China
‡ Department of Mathematical Sciences, Tsinghua University, Beijing, China

Abstract—Network telemetry collects information (e.g., hop latency, throughput) from network devices. Network-wide telemetry is critical for operators to understand the quality of network performance and to diagnose ongoing failures. The state-of-the-art telemetry approaches are far from ideal, as they cannot fully satisfy the diverse requirements of operators, specifically on-demand, full-coverage, and scalable telemetry. In this paper, we provide a new framework of network telemetry for data center networks, called NetView. NetView can support various telemetry applications and frequencies on demand, monitoring each device via proactively sent dedicated probes. Technically, NetView leverages source routing to forward probes, achieving full coverage. Besides, a series of probe generation algorithms largely reduces the probe number, providing high scalability. The evaluation shows that NetView reduces bandwidth occupancy by more than two orders of magnitude compared with Pingmesh and INT-path, and conducts network-wide telemetry for a large-scale data center network using only one vantage server, without creating a resource bottleneck.

I. INTRODUCTION

Network operators routinely perform continuous monitoring to collect network status ranging from performance information to failure information. This type of monitoring requires constant, real-time measurement and analysis, which is commonly referred to as network telemetry [1]. Network telemetry can help applications diagnose whether the network is responsible for poor performance, simplify network OAM, and give a clue to network failures among thousands of switches [2].

Despite its importance, designing an efficient network telemetry platform is non-trivial due to three key requirements.

On-demand telemetry. Operators may have various telemetry demands, e.g., SLA compliance and node black hole discovery [1] [3]. Besides, because different telemetry applications have different data collection frequency requirements (e.g., 10s, 5min, and 2h) [3], a telemetry system should support both telemetry applications and telemetry frequencies on demand.

Full coverage. Telemetry targets should cover all of the devices in the data center network. Otherwise, glitches or performance degradation in specific parts of the network will evade the supervision of network operators, in severe cases bringing about a sequence of chained events and potential commercial loss [2].

Scalability. Network telemetry inevitably introduces monitoring overhead, which grows with the scale of the network. As the network topology expands, neither the bandwidth occupancy nor the number of vantage servers should grow exaggeratedly.

Unfortunately, existing telemetry methods are fundamentally constrained, and none of them can attain all the above requirements simultaneously.

Passive network telemetry extracts information from normal traffic. In-band network telemetry (INT) [4] is a typical passive technique, which writes device performance information into packets along their paths. Although it provides visibility, INT itself does not define how to achieve network-wide telemetry: it only monitors specific paths and specific traffic at the frequency of packet arrival, and cannot continuously monitor network-wide state at a desired frequency, lacking both coverage and on-demand telemetry. Besides, due to INT's unplanned paths, overlapping paths cause much redundant data, making analysis difficult, especially when the network is at large scale. Other passive methods, like sampling [5] [6], mirroring [7], and counting [8] [9], are quite restrictive. Sampling and mirroring miss events of interest, as it is infeasible to collect information on all packets, while counters only track traffic volume statistics, missing information like queue length.

Proactive network telemetry works at endpoints, proactively sending probes to acquire telemetry data. Early techniques, like Pingmesh [3] and SCMon [10], lack the visibility to localize performance problems at the device level. Moreover, Pingmesh [3] needs all servers to act as vantage points to get information between any two servers, generating massive probes. INT-path [11] uses an Euler trail-based path planning policy to generate probe paths. Although it achieves non-overlapping probe paths, INT-path needs a large number of hosts to send and collect probes, and synchronizing and coordinating telemetry data among these hosts is complicated and bandwidth-consuming for deployment and maintenance [12].

In this paper, we propose a new framework of proactive network telemetry, called NetView. NetView can easily express network-wide telemetry applications via its telemetry API, empowering network operators to customize telemetry targets, characteristics, and frequencies on demand, and to dynamically query the network at runtime without disrupting the underlying infrastructure. NetView then automatically transforms queries into underlying probes. By leveraging source routing to encode the forwarding path into each probe, a probe can traverse any desired path, achieving full coverage. Besides, to enable high scalability and reduce bandwidth occupancy, NetView designs a series of probe generation algorithms to merge telemetry metadata, multiplex telemetry frequencies, and use node-level metadata to compose path-level information.

Figure 1: Architecture and workflow of NetView.

NetView can be implemented on programmable ASICs (e.g., Tofino [13]). In fact, we built the NetView prototype on Tofino hardware and confirmed its feasibility, occupying moderate hardware resources. Our contributions are as follows.
• We propose a new framework of network telemetry for data center networks, called NetView, which needs only one vantage server to inject and collect probes, providing on-demand, full-coverage, and scalable telemetry. (§II)
• We provide a suite of APIs to simplify network OAM and to express various telemetry applications. (§II)
• We develop a suite of efficient algorithms for probe generation and data processing, reducing the total bandwidth occupancy by more than two orders of magnitude compared with the state-of-the-art methods. (§III)
• We implement the NetView prototype with 2659 lines of code on the vantage server and Tofino [13], and elaborately evaluate the performance of NetView. (§IV)

II. DESIGN OF NETVIEW

A. Architecture and Workflow of NetView

As shown in Figure 1, NetView consists of four main components: the telemetry service provider, the telemetry coordinator, the telemetry antenna, and the telemetry analyzer. Operators can specify telemetry applications to NetView and obtain telemetry reports without manipulating the underlying infrastructure. The general procedure is as follows: telemetry applications issue telemetry queries to the telemetry service APIs exposed by the telemetry service provider. The telemetry service provider then allocates the telemetry queries to the telemetry coordinator, which is responsible for generating probes. Afterward, the telemetry antenna injects probes into the underlying network and collects them. Finally, the telemetry analyzer analyzes the received probes and passes the telemetry results to the telemetry service provider, through which applications gain the desired telemetry reports.

B. Metadata and API

1) Supported Metadata: NetView supports two kinds of metadata: primary metadata and derived metadata. Due to space limitations, the full description of the two types of metadata in this section appears in our technical report [14]. (1) Primary metadata is directly supported by the data plane [15], comprising switch metadata (e.g., switch id), ingress metadata (e.g., ingress port rx pkt count), buffer metadata (e.g., instantaneous queue length), and egress metadata (e.g., egress port tx utilization). (2) Derived metadata is high-level empirical metadata that can be calculated from primary metadata. It contains path-level metadata (e.g., path utilization), node-level metadata (e.g., switch workload), and probe-level metadata (e.g., rx probe timestamp). In particular, we can tackle path-level metadata by approximating it with primary metadata at the nodes (§III, Method B). As for delay, it consists of four parts: transmission delay, propagation delay, queueing delay, and processing delay; among these four parts, the queueing delay and processing delay represent the deviation of network performance. As for throughput, the path throughput is decided by the minimal port throughput along the path. Therefore, instead of measuring delay (throughput) along the path, NetView measures the queueing delay (port throughput) of each node, and uses the node-level queueing delay (port throughput) to compose path-level information.
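To make this composition rule concrete, the following Python sketch (ours, not NetView's implementation; the record field names are assumptions) composes path-level delay deviation and path throughput from node-level records:

# Sketch (ours): compose path-level metadata from node-level records.
# Each record is assumed to look like:
# {"queueing_delay_us": 12, "processing_delay_us": 3, "tx_throughput_bps": 9.2e9}
def compose_path_metadata(node_records):
    # Path delay deviation: sum of per-node queueing + processing delay.
    path_delay_us = sum(r["queueing_delay_us"] + r["processing_delay_us"]
                        for r in node_records)
    # Path throughput: decided by the minimal port throughput along the path.
    path_throughput_bps = min(r["tx_throughput_bps"] for r in node_records)
    return path_delay_us, path_throughput_bps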
2) Network Telemetry API: NetView provides declarative APIs that express queries for a wide range of telemetry tasks; an illustrative composition follows the list.
• PathQuery(“SrcNode:SrcPort”, “DstNode:DstPort”) queries the information of the path(s) between the given port of the source node and the given port of the destination node.
• NodeQuery(“Node:Port”) queries the information of (a) given node(s).
• ProbeQuery(“TerminalIP”) queries the information of probes from a single terminal.
• Where(“Scope”) specifies the metadata scope.
• Period(“DesiredTime”) sets a timer to query the network periodically.
• Return(“Metadata”) returns a derived/primary metadata as the telemetry result.
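As an illustration, a periodic queue-length query could be composed as follows. This is a hypothetical Python binding sketched by us; the paper fixes the call names above but not a host language:

# Hypothetical sketch (ours) of a Python binding for NetView's declarative API.
class Query:
    def __init__(self, kind, target):
        self.kind, self.target, self.clauses = kind, target, {}
    def Where(self, scope):
        self.clauses["where"] = scope; return self
    def Period(self, desired_time):
        self.clauses["period"] = desired_time; return self
    def Return(self, metadata):
        self.clauses["return"] = metadata; return self

def NodeQuery(target):
    return Query("node", target)

# Monitor instantaneous queue length on every node, every 10 seconds.
q = NodeQuery("*:*").Where("buffer").Period("10s").Return("instantaneous queue length")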
Figure 2: Sample applications of NetView.

3) Network-Telemetry-Based Applications: As Figure 2 shows, we introduce a non-exhaustive set of applications as examples. (a) Prove Network Innocence. Operators can let NetView periodically measure end-to-end throughput, identifying whether the end hosts or the network should be blamed for a service interruption. (b) Network Planning. With NetView, operators can accumulate a large volume of telemetry data (e.g., queue length) for long-term network capacity planning and topology augmentation. (c) Failure Location. Operators can query parts of the network prone to failure, or even the entire data center network, to detect sporadic failures in the first place.
C. Probe Design and Switch Operations

Figure 3: Probe format of NetView.

1) Probe Format: As Figure 3 shows, the probe contains two label stacks. (1) The Forwarding (Fwd) stack comprises an outport label list and the list length, enabling flexible probe forwarding. In the list, each Fwd label corresponds to one hop traversed by the probe, and popping the current Fwd label means forwarding the probe through the outport specified by that label. (2) The Telemetry (Tele) stack comprises a label list for telemetry records. Each Tele label is composed of a switch ID, a metadata bitmap, and a metadata value list. The metadata bitmap determines which type(s) of metadata the probe should collect, and the metadata value list carries these metadata.

Besides the Fwd stack and the Tele stack, the probe's header looks like a normal packet header, containing layer-2 to layer-4 headers. To inform the switch parser that the packet is a NetView probe, the destination port number is set to a fixed value, “NetView probe”. Furthermore, the destination IP address of the probe is set to the vantage server's IP, to make sure that the probe will finally be forwarded back to the vantage server.
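A minimal sketch of this layout (ours; the Python types below are illustrative, and the field widths are not taken from the paper):

# Illustrative probe layout (ours); not the paper's exact wire format.
from dataclasses import dataclass, field
from typing import List

@dataclass
class TeleLabel:
    switch_id: int                 # which switch should fill this label
    metadata_bitmap: int           # bit i set => collect metadata type i
    values: List[int] = field(default_factory=list)  # filled by the switch

@dataclass
class NetViewProbe:
    dst_ip: str                    # vantage server's IP, so the probe returns
    dst_port: int                  # fixed "NetView probe" value for the parser
    fwd_stack: List[int]           # one outport label per hop, popped en route
    tele_stack: List[TeleLabel]    # telemetry records along the path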
2) Switch Operations: During probe forwarding, Fwd labels in the Fwd stack are popped to forward the probe, while metadata values are pushed into the Tele stack to record telemetry data. Whenever a packet enters a switch, the switch checks whether the destination port number in the TCP/UDP header is “NetView probe”. If not, the switch regards it as a normal packet and refreshes the metadata related to it (e.g., the queue length experienced by the packet) in its registers; the packet is then processed and forwarded normally. If the packet is confirmed to be a NetView probe, the switch compares the switch IDs in the Tele stack with its own ID one by one. On finding an ID identical to its own, which means that the current switch needs to be monitored, the switch rewrites the metadata value list in that Tele label, adding all the telemetry data indicated by the metadata bitmap. At last, one Fwd label is popped, and the probe is forwarded accordingly.
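The per-switch behavior can be summarized by the following control-flow sketch (ours; the real implementation is a P4 pipeline on Tofino, and the port constant below is an assumed placeholder):

# Control-flow sketch (ours) of the switch operations described above.
NETVIEW_PROBE_PORT = 0xBEEF        # assumed placeholder for "NetView probe"

def process_packet(switch_id, pkt, collect_metadata, registers):
    if pkt.dst_port != NETVIEW_PROBE_PORT:
        registers.refresh(pkt)     # normal packet: refresh related metadata
        return "forward_normally"
    for label in pkt.tele_stack:   # probe: search for our own Tele label(s)
        if label.switch_id == switch_id:
            # Append the telemetry data selected by the metadata bitmap.
            label.values.extend(collect_metadata(label.metadata_bitmap))
    outport = pkt.fwd_stack.pop(0) # pop one Fwd label ...
    return outport                 # ... and forward through that outport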
Under such a design, NetView probes do not need to cover all the links; they only need to cover all the switches to obtain per-queue information, reducing the telemetry targets from O(N²) to O(N), where N is the number of switches.
III. ALGORITHMS OF NETVIEW

A. Probe Generation Algorithm

In this section, we start with a straw-man Method A to show how to translate queries into telemetry probes. Then, we optimize the number of probes in Method B.

Before presenting the algorithms, we introduce the thresholds kFwd and kTele. If one probe path is much longer than the others, the vantage server has to wait for the return of the last probe from the longest path, which degrades the effectiveness and accuracy of the telemetry results; hence, balanced path generation is preferable. Besides, the longer the path a probe traverses, the more labels it needs, which risks exceeding the MTU. We solve these two problems by introducing an operator-specified threshold kFwd on the forwarding stack and kTele on the telemetry stack.

1) Method A: Probe generation without optimization: When the operators' query set Q arrives (Q contains n queries, and q_i is the i-th query in the set), a naïve method is to generate a subset of probes P_i for each query q_i independently. For each probe p_ij (the j-th probe in the i-th probe subset P_i), the numbers of the two types of labels may not exceed kFwd and kTele, respectively. The function Decompose() analyzes what metadata needs to be acquired, and where, for each query q_i. As long as the metadata set is not empty, a NetView probe starts at the vantage server, chooses one node to visit, and acquires all the metadata at that node by adding Tele labels; then, along the shortest path, NetView adds Fwd labels to the probe. Finally, the probe returns to the vantage server. If metadata remain, the program iterates, and we obtain a subset of probes P_i for query q_i. When all queries are processed, the probe set P is returned. Specifically, we call the problem of choosing which nodes to visit under a hop limit while finally returning to the start point Problem 1. Because the multi-traveling salesman problem (mTSP) can be transformed into Problem 1 in polynomial time [16], Problem 1 is NP-hard. Thus, we adopt a heuristic, the closest neighbor heuristic [16] (time complexity O(N²) for N switches), for the function ChooseTeleNode() to make fast decisions.
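A compact sketch of Method A under these definitions (ours; decompose(), choose_tele_node() applying the closest neighbor heuristic, and shortest_path() are assumed helpers, not NetView's actual code):

# Sketch (ours) of Method A: one probe subset per query, independently,
# with both label stacks bounded by k_fwd / k_tele.
def method_a(queries, topo, vantage, k_fwd, k_tele):
    probe_set = []
    for q in queries:
        metadata = decompose(q)           # assumed: {node: metadata wanted there}
        probes = []
        while metadata:
            fwd, tele, here = [], [], vantage
            while metadata and len(fwd) < k_fwd and len(tele) < k_tele:
                node = choose_tele_node(metadata, here)  # closest target next
                tele.append((node, metadata.pop(node)))  # Tele labels for node
                fwd += shortest_path(topo, here, node)   # Fwd labels to reach it
                here = node
            fwd += shortest_path(topo, here, vantage)    # Fwd labels back home
            probes.append({"fwd_stack": fwd, "tele_stack": tele})
        probe_set.append(probes)
    return probe_set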

2) Method B: Probe generation with optimization: Method A generates probes for each query independently, resulting in too many probes and impeding normal traffic. Moreover, a large number of probes can affect the accuracy of the telemetry results and make the link connecting the vantage server to the edge switch a bottleneck.

The optimization insights of NetView are to (1) merge redundant metadata, improving telemetry efficiency, (2) use node-level metadata to approximately compute path-level metadata, reducing telemetry targets, and (3) multiplex telemetry frequencies, increasing probe utilization. As Figure 4 shows, Method B contains three steps.

Figure 4: Method B: probe generation with optimization, algorithm workflow.

Step 1: Classify queries into query clusters, and compute each query cluster independently. Design insights (1) and (3) let fewer probes carry more telemetry results. Although this reduces the probe number, a single dropped probe can now affect multiple telemetry applications simultaneously. To reduce the impact of probe loss, we classify queries into two types: performance queries and failure queries. A performance query acquires performance information from the network (e.g., queue length detection), and its probe loss rate is negligible (about 1 × 10⁻⁵) [3]. A failure query detects failures in the network (e.g., node black hole discovery), and its probes run a significant risk of being dropped. Therefore, in the following steps, we optimize probe generation for performance queries. As for failure queries, we can use Method A to generate probes for each query independently. A network operator can specify the query type (performance or failure) when writing a query.

What's more, to multiplex queries with different frequencies, we classify them into a series of clusters whose frequencies are successively doubled (e.g., 0.25Hz, 0.5Hz, 1Hz, 2Hz, etc.). If a frequency does not divide evenly into this series, we classify it into the next higher cluster (e.g., 3Hz is classified as 4Hz). In this way, probes in a higher-frequency cluster can pick up metadata of lower-frequency clusters (see Step 3 for details). Queries with the same taxonomy and frequency belong to the same QueryCluster_i. The time complexity of Step 1 is O(n) for n queries.
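The cluster assignment reduces to a rounding rule, sketched below (ours; the lowest cluster frequency is assumed to be 0.25Hz):

import math

# Sketch (ours): round a query frequency up to the next cluster in the
# doubled series base, 2*base, 4*base, ...
def frequency_cluster(freq_hz, base_hz=0.25):
    exponent = max(0, math.ceil(math.log2(freq_hz / base_hz)))
    return base_hz * (2 ** exponent)

assert frequency_cluster(3.0) == 4.0   # 3Hz is classified as 4Hz
assert frequency_cluster(0.5) == 0.5   # exact cluster frequencies are kept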
Step 2-a: Decompose queries, and merge derived metadata. Different queries may need to acquire the same derived metadata. Therefore, we can merge derived metadata to reduce the telemetry targets. However, we can only merge derived metadata within a single query cluster, because crossing query clusters would mean mixing different query frequencies. As shown in Algorithm 1, in each query cluster (Line 2), each query q_ij (Line 3) is decomposed into several derived metadata (Line 4). If operators write queries that mix derived metadata and primary metadata, the primary metadata is left to be handled in Step 3. Then, each metadata is bound to a query frequency (Line 5). The time complexity of Lines 2-6 is O(n) for n queries.

Algorithm 1: Step 2-a
Input: QueryCluster, G(V, E)
Output: DerivedMetadataSet
 1  DerivedMetadataSet ← ∅;
 2  for i in m do
 3      for q_ij in QueryCluster_i do
 4          Temp.Meta ← Decompose(q_ij);
 5          Temp.Freq ← q_ij.Freq;
 6          DerivedMetadataSet_i.add(Temp);
 7  for i in m do
 8      merge(DerivedMetadataSet_i);
 9      M_i ← MaptoTopo(G, DerivedMetadataSet_i);
10      DerivedMetadataSet.add(DerivedMetadataSet_i);
11  return DerivedMetadataSet;

At last, NetView merges the duplicate derived metadata in each cluster respectively (Line 8), getting a series of derived metadata sets distributed on nodes and links (Line 9). The time complexity of Lines 7-10 is O(md) for m query clusters and d derived metadata in each DerivedMetadataSet_i.
Step 2-b: Decompose derived metadata, approximate primary metadata, and merge primary metadata. NetView decomposes derived metadata into primary metadata according to its mathematical expression. The reason for the two-step decomposition is that decomposing derived metadata without merging first can lead to much redundant primary metadata. As for derived metadata distributed on links, NetView splits it onto the adjacent nodes, using primary metadata to represent it approximately. For example, considering the delay between nodes B and C in Figure 4, we use egress timestamp − ingress timestamp to calculate the processing delay and queueing delay inside a single switch, and ignore the propagation delay, to approximately represent half of RTT_BC. In this way, we can divide one path into multiple segments to measure, and merge path information and node information at the same granularity. The time complexity of this step is O(md) for each DerivedMetadataSet_i.

At last, NetView merges the duplicate primary metadata in each cluster respectively, getting primary metadata sets distributed only on nodes. Similarly, the time complexity of this step is O(mp) for m query clusters and p primary metadata in each PrimaryMetadataSet_i.
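A worked sketch of this approximation (ours; the timestamps are hypothetical, and reading the summed per-switch delays as half of RTT_BC is our interpretation of the text):

# Sketch (ours): per-switch delay from timestamps, propagation delay ignored.
def node_delay_us(ingress_ts_us, egress_ts_us):
    # queueing delay + processing delay inside one switch
    return egress_ts_us - ingress_ts_us

delay_b = node_delay_us(1000, 1040)    # hypothetical: 40 us inside switch B
delay_c = node_delay_us(2000, 2025)    # hypothetical: 25 us inside switch C
half_rtt_bc_us = delay_b + delay_c     # approximates RTT_BC / 2 per Step 2-b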

Algorithm 2: Step 3
Input: PrimaryMetadataSet, M, G, V0, kFwd, kTele
Output: P(P_1, P_2, ..., P_m)
 1  P(P_1, P_2, ..., P_m) ← ∅;
 2  for i in reversed(m) do
 3      while ChooseTeleNode(M_i) != ∅ do
 4          Fwd ← V0;
 5          while p_ij.FwdStack.length < kFwd && p_ij.TeleStack.length < kTele do
 6              Tele ← ChooseTeleNode(M_i);
 7              for Node.Freq in Freq do
 8                  p_ij.TeleStack.add(Tele.PrimaryMetadata);
 9              while Tele != Fwd do
10                  Fwd ← ChooseFwdNode(G, Fwd, Tele);
11                  p_ij.FwdStack.add(Fwd.port);
12          while Fwd != V0 do
13              Fwd ← ChooseFwdNode(G, Fwd, V0);
14              p_ij.FwdStack.add(Fwd.port);
15          P_i.add(p_ij);
16          j++;
17  return P(P_1, P_2, ..., P_m);

Step 3: Generate probe paths, and multiplex different frequencies. Algorithm 2 iterates from the highest-frequency query cluster to the lowest (Line 2). In this way, whenever a probe passes through a node, it picks up all of the primary metadata at that node belonging to the current frequency and all lower frequencies (Lines 7-8). Therefore, we can multiplex the telemetry frequencies and increase probe utilization without extra detouring overhead. Furthermore, as the program iterates (Lines 2-16), far fewer nodes in the lower-frequency query clusters still need to be visited, because higher-frequency probes have already picked up their primary metadata. The main time complexity of Step 3 comes from Lines 2-16 and Lines 5-11, which iterate at most m times and N² times respectively, for m query clusters and N switches.
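The pick-up rule of Lines 7-8 can be sketched as follows (ours; pending is an assumed bookkeeping structure mapping each cluster frequency to the metadata still outstanding at each node):

# Sketch (ours): a probe from the cluster with frequency f collects a node's
# pending primary metadata for f and for every lower-frequency cluster.
def pick_up(node, f, pending):
    collected = []
    for freq in sorted(pending):           # pending: {freq: {node: [metadata]}}
        if freq <= f:
            collected += pending[freq].pop(node, [])
    return collected                       # becomes Tele labels of the probe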
B. Result Analyzing

As soon as probes return to the vantage server, the telemetry antenna hands them to the telemetry analyzer. The telemetry analyzer parses each probe into a series of dictionary entries, like {(switch ID, primary metadata type): primary metadata value}, denoting the primary metadata collected from the corresponding switch, and stores them in the local database at the vantage server. On encountering the same tuple (i.e., both the switch ID and the primary metadata type are identical), the telemetry analyzer refreshes the older entry. Therefore, the latest whole-network status is always summarized at the vantage server. The telemetry analyzer calculates the query results according to the corresponding expressions, composes the telemetry reports, and submits them to the telemetry service provider. Query applications then periodically poll the telemetry service provider according to their query frequency.
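A minimal sketch of this keep-latest store (ours):

# Sketch (ours): keep-latest store keyed by (switch ID, primary metadata type).
database = {}

def ingest(parsed_probe):
    # parsed_probe: {(switch_id, metadata_type): value}
    database.update(parsed_probe)          # identical key => older value refreshed

ingest({(3, "queue_length"): 17, (5, "tx_utilization"): 0.42})
ingest({(3, "queue_length"): 9})           # refreshes switch 3's queue length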
IV. EVALUATION

NetView is implemented with 2037 lines of Python code and 622 lines of P4 code. We conduct our experiments on a server with 32GB RAM and an E3-1280 3.9GHz CPU running Ubuntu 16.04, and on programmable Tofino hardware switches. The evaluation of NetView consists of three parts. (A) Comparing NetView with countermeasures: we compare NetView with state-of-the-art countermeasures in terms of total bandwidth occupancy and the number of needed vantage servers. (B) Algorithm evaluation: this part evaluates the overall performance of the NetView algorithms, including probe generation time, the number of probes, and resource usage. (C) Testbed testing: we deploy NetView on a Tofino hardware testbed to report telemetry results and to demonstrate the data plane resource usage of NetView.

A. Comparing NetView with Countermeasures

In this part, we compare NetView with the state-of-the-art network-wide telemetry systems for data center networks, Pingmesh and INT-path. Due to MTU considerations and the need to ensure that probes return to the vantage server, the label thresholds of NetView should be neither too large nor too small. In particular, a probe length of less than 1500 bytes and kFwd larger than 9 are appropriate in the data center network [14]. Therefore, throughout the evaluation, we set NetView's label thresholds kFwd to 20 and kTele to 50.

Figure 5: Comparing NetView with countermeasures. (a) Total bandwidth occupancy; (b) number of needed vantage servers.

Total bandwidth occupancy. We evaluate the total bandwidth occupancy (the sum of the bandwidth occupancy of every link) under different network sizes on the fat-tree topology, comparing the three systems (Pingmesh, INT-path, and NetView) under one query (monitoring the forwarding delay between all host pairs) with 0.1 Hz telemetry frequency. As Figure 5(a) shows, when the switch number is 3920, NetView reduces the total bandwidth occupancy by more than two orders of magnitude compared with INT-path, and by more than three orders of magnitude compared with Pingmesh.

Number of needed vantage servers. As shown in Figure 5(b), Pingmesh needs every server in the data center to deploy the Pingmesh agent to work properly. Besides, the number of servers required by INT-path is proportional to the number of Euler paths. When the data center is at large scale, INT-path needs thousands of servers to conduct network telemetry. In contrast, NetView needs only one vantage server to conduct network-wide telemetry in the data center, without creating a single-server bottleneck (§IV-B).

B. Algorithm Evaluation

Figure 6: Probe generation time. (a) Each step with network scale; (b) each step with query increase.

Probe generation time. Figure 6(a) shows the probe generation time of Method B as the network scales. We fix the query number to 500 and increase the fat-tree topology size by increasing the port number k per switch; the number of switches on the x-axis thus grows as (5/4)k². The times of Step 1 and Step 2 are very short, so we use a logarithmic scale to show the results. Figure 6(a) shows that the probe generation time increases slowly: the total generation time is 6.63s at 2000 switches and 12.6s at 3920 switches. Figure 6(b) shows the probe generation time of Method B as the number of queries increases. We fix the switch number to 2000 and add queries; specifically, we randomly construct queries in terms of query type, query target, metadata, and frequency. In Figure 6(b), the total generation time is 0.08s at 10 queries and 6.63s at 500 queries, showing great scalability.
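The topology sizes quoted above follow directly from the fat-tree formula; a quick check (ours):

# Quick check (ours): a fat-tree built from k-port switches has (5/4)k^2 switches.
def fat_tree_switches(k):
    return 5 * k * k // 4

assert fat_tree_switches(40) == 2000       # the 2000-switch data point
assert fat_tree_switches(56) == 3920       # the 3920-switch data point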
Figure 7: Number of probes. (a) Method A and Method B; (b) scalability of Method B.

Number of probes. Figure 7(a) shows the number of probes generated by Method A and Method B on a fat-tree topology containing 2000 switches. Method B generates fewer probes than Method A, thanks to merging redundant metadata and multiplexing telemetry frequencies. In Figure 7(b), we fix the number of queries to a constant (100, 500, and 1000, respectively) and increase the network size, using Method B to generate probes. In Figure 7(b), the probe number grows fast at first and then slowly, due to the algorithm optimization. Besides, comparing the different curves in Figure 7(b), we find that although the numbers of queries are 100, 500, and 1000, the probe number does not increase proportionally, because multiplexing different query frequencies further reduces the number of probes, improving scalability.

Figure 8: Resource usage of the vantage server. (a) Memory usage; (b) bandwidth usage.

Resource usage of the vantage server. We randomly construct queries in terms of query types, query targets, and metadata to measure resource usage. Since the program only runs for a few seconds, the CPU is almost idle for the rest of the time and is thus not a resource bottleneck. As for memory usage, in Figure 8(a), even under 3920 nodes and 1000 queries, the memory footprint is less than 190MB, which is negligible for a server in the data center. As for bandwidth usage, in Figure 8(b), we set the number of queries to 500. Even under 3920 nodes with all query frequencies at 1000 Hz, the total bandwidth usage of the switch connected to the vantage server is less than 25MB/s, occupying about 0.24% of the bandwidth in a data center that supports 10Gbps Ethernet. Therefore, the single-vantage-server design of NetView brings convenience without creating a single-server bottleneck.

C. Testbed Testing

Figure 9: Testbed on Tofino. (a) Using NetView to detect network performance degradation; (b) hardware resources consumed by NetView on Tofino.

Testbed on Tofino. We use Tofino to implement the NetView prototype on a leaf-spine topology consisting of 6 switches and 8 hosts, connected by 10Gbps cables and NICs. Then, we use a data center traffic pattern [17] to artificially construct incast traffic, increase the workload sent from seven hosts to one host, and use the NetView prototype to observe the network performance degradation. In Figure 9(a), as the green curve shows, with the workload increase, the end-to-end RTT increases rapidly at first and then tends to be stable, due to link congestion. As the brown curve shows, with the workload increase, the actual throughput of the host increases linearly at first and then tends to stabilize, because the link becomes fully occupied. Figure 9(b) shows the hardware resource usage of NetView on Tofino, where we normalize all measured results against Switch.P4 [18], a canonical data center switch implemented with P4. In Figure 9(b), NetView needs only 3.52% of the VLIW actions to implement compound actions and 13.34% of the stateful ALUs to manipulate registers, while all other resource usage is marginal. In summary, NetView brings moderate resource overheads to the hardware target and has strong practicality.

V. CONCLUSION

In this paper, we propose NetView, a new framework of network telemetry. Unlike existing probe-based solutions, NetView can support various telemetry applications and telemetry frequencies on demand. With a series of optimization algorithms, NetView can conduct network-wide telemetry that checks all locations and monitors the detailed performance of each device, simultaneously achieving on-demand, full-coverage, and scalable telemetry.

ACKNOWLEDGMENTS

The research is supported by the National Key R&D Program of China (2017YFB0801701) and the National Science Foundation of China (61625203, 61832013, 61872426, 11771247). Mingwei Xu is the corresponding author.

REFERENCES

[1] Haoyu Song et al. Network Telemetry Framework. Internet-Draft draft-ietf-opsawg-ntf-02, IETF, October 2019. Work in Progress.
[2] Q. Wu et al. Network Telemetry and Big Data Analysis. Internet-Draft draft-wu-t2trg-network-telemetry-00, IETF, March 2016. Expired.
[3] Chuanxiong Guo et al. Pingmesh: A large-scale system for data center network latency measurement and analysis. In ACM SIGCOMM, 2015.
[4] In-band network telemetry via programmable dataplanes. In Proceedings of ACM SOSR, 2015.
[5] Benoit Claise. Cisco Systems NetFlow services export version 9. http://www.rfc-editor.org/rfc/rfc3954.txt.
[6] sFlow. https://sflow.org/.
[7] Yibo Zhu et al. Packet-level telemetry in large datacenter networks. SIGCOMM Comput. Commun. Rev., 45(4):479–491, August 2015.
[8] Graham Cormode and S. Muthukrishnan. An improved data stream summary: The count-min sketch and its applications. 2004.
[9] Zaoxing Liu et al. One sketch to rule them all: Rethinking network flow monitoring with UnivMon. In ACM SIGCOMM Conference, 2016.
[10] François Aubry et al. SCMon: Leveraging segment routing to improve network monitoring. In IEEE INFOCOM, 2016.
[11] Tian Pan et al. INT-path: Towards optimal path planning for in-band network-wide telemetry. In IEEE INFOCOM, pages 487–495, 2019.
[12] Behnaz Arzani et al. Taking the blame game out of data centers operations with NetPoirot. In ACM SIGCOMM Conference, 2016.
[13] Barefoot Tofino switch. https://barefootnetworks.com/technology/.
[14] NetView technical report. https://github.com/ljjk/NetView/blob/master/TechnicalReport.pdf.
[15] In-band network telemetry specification. https://p4.org/assets/INT-current-spec.pdf.
[16] Rajesh Matai, Surya Singh, et al. Traveling salesman problem: An overview of applications, formulations, and solution approaches. In Traveling Salesman Problem, Theory and Applications, 2010.
[17] Mohammad Alizadeh. Empirical traffic generation. https://github.com/datacenter/empirical-traffic-gen.
[18] The P4 Language Consortium. Consolidated switch repo (API, SAI and Nettlink). https://github.com/p4lang/switch.

