Lecture 3 P4 NetFPGA


CS344 – Lecture 3

Copyright © 2018 – P4.org


P4 Toolchain for BMv2 software simulation



Basic Workflow

[Figure: test.p4 is compiled by p4c-bm2-ss into test.json, which is loaded into simple_switch (BMv2). Inside BMv2, packets flow from the port interface through Parser, Ingress, PRE, Egress, and Deparser, with logging and the P4 debugger attached. simple_switch_CLI (program-independent CLI and client) talks over a TCP socket (Thrift) to the program-independent control server. A packet generator and a packet sniffer attach to the port interface through veth0..n in the Linux kernel.]

Step 1: P4 Program Compilation

$ p4c-bm2-ss -o test.json test.p4

test.p4 → p4c-bm2-ss → test.json


Step 2: Preparing veth Interfaces

$ sudo ~/p4lang/tutorials/examples/veth_setup.sh

# ip link add name veth0 type veth peer name veth1

# for iface in veth0 veth1; do
    ip link set dev ${iface} up
    sysctl net.ipv6.conf.${iface}.disable_ipv6=1
    TOE_OPTIONS="rx tx sg tso ufo gso gro lro rxvlan txvlan rxhash"
    for TOE_OPTION in $TOE_OPTIONS; do
      /sbin/ethtool --offload ${iface} "$TOE_OPTION"
    done
  done

[Figure: test.json runs on top of veth pairs in the Linux kernel. The even-numbered interfaces veth0, veth2, veth4, ..., veth2n are paired with the odd-numbered peers veth1, veth3, veth5, ..., veth2n+1.]


Step 3: Starting the model
$ sudo simple_switch --log-console --dump-packet-data 64 \
    -i 0@veth0 -i 1@veth2 … [--pcap] \
    test.json

[Figure: BMv2 loads test.json and runs Parser, Ingress, PRE, Egress, and Deparser with logging enabled. The program-independent control server listens on a TCP socket (Thrift). The port interface attaches to veth0..n in the Linux kernel; with --pcap, traffic is captured to files such as veth0.pcap.]

Step 4: Starting the CLI
$ simple_switch_CLI

[Figure: the BMv2 CLI (program-independent CLI and client) connects over a TCP socket (Thrift) to the program-independent control server inside the running BMv2 instance, which has test.json loaded and is attached to veth0..n in the Linux kernel.]

Step 5: Sending and Receiving Packets

Packet generator:
• scapy
    p = Ether()/IP()/UDP()/"Payload"
    sendp(p, iface="veth0")
• Ethereal, etc.

Packet sniffer:
• scapy
    sniff(iface="veth9", prn=lambda x: x.show())
• Wireshark, tshark, tcpdump

[Figure: the packet generator and the packet sniffer attach to the BMv2 port interface through the veth pairs (veth0/veth1, veth2/veth3, veth4/veth5, ..., veth2n/veth2n+1) in the Linux kernel.]


Overview



NetFPGA = Networked FPGA
• A line-rate, flexible, open networking platform for teaching and research


NetFPGA Family of Boards

NetFPGA-1G (2006)
NetFPGA-10G (2010)

NetFPGA-1G-CML (2014) NetFPGA-SUME (2014)


International Community
• Over 1,200 users, using over 3,500 cards at 200 universities in over 47 countries

• Join the mailing list: [email protected]

NetFPGA board

[Figure: a PC with NetFPGA. Networking software runs on a standard PC (CPU and memory) and connects over PCI-Express to the NetFPGA, a hardware accelerator built with an FPGA (with its own memory) driving 1/10/100 Gb/s network links, shown as four 10GbE ports.]


NetFPGA consists of …

Four elements:

• NetFPGA board

• Tools + reference designs

• Contributed projects

• Community



Xilinx Virtex 7 690T
• Optimized for high-performance applications

• 690K Logic Cells

• 52Mb RAM

• 3 PCIe Gen. 3 hard cores


Memory Interfaces
• DRAM: 2 x DDR3 SoDIMM, 1866MT/s, 4GB

• SRAM: 3 x 9MB QDRII+, 500MHz


Host Interface

• PCIe Gen. 3

• x8 (only)

• Hardcore IP



Front Panel Ports
• 4 SFP+ Cages
• Directly connected to the FPGA
• Supports 10GBase-R transceivers (default)
• Also supports 1000Base-X transceivers and direct attach cables


Expansion Interfaces
• FMC HPC connector
◦ VITA-57 Standard
◦ Supports FPGA Mezzanine Cards (FMC)
◦ 10 x 12.5Gbps serial links

• QTH-DP
◦ 8 x 12.5Gbps serial links


Storage
• 128MB FLASH

• 2 x SATA connectors

• Micro-SD slot

• Enable standalone operation


Reference Switch Pipeline
• Five stages
◦ Input port
◦ Input arbitration
◦ Forwarding decision and packet modification
◦ Output queuing
◦ Output port
• Packet-based module interface
• Pluggable design

[Figure: four 10GE RxQ ports and DMA feed the Input Arbiter, followed by Output Port Lookup and Output Queues, which drain to four 10GE Tx ports and DMA.]

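Full System Components

[Figure: software on the host sees the network interfaces nf0 to nf3 and accesses registers via ioctl. Across the PCIe bus, CPU RxQ/TxQ queues carry packets and an AXI Lite interface carries register accesses into the NetFPGA user data path, which connects to the 10GE Rx/Tx ports.]
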
NetFPGA – Host Interaction

• Linux driver interfaces with hardware

◦ Packet interface via standard Linux network stack

◦ Register reads/writes via ioctl system call with wrapper functions

■ rwaxi(int address, unsigned *data);

■ E.g.: rwaxi(0x7d4000000, &val)


NetFPGA to Host Packet Transfer

1. Packet arrives; forwarding table sends it to the DMA queue
2. Interrupt notifies driver of packet arrival
3. Driver sets up and initiates DMA transfer over the PCIe bus
4. NetFPGA transfers packet via DMA
5. Interrupt signals completion of DMA
6. Driver passes packet to network stack

Host to NetFPGA Packet Transfer

1. Software sends packet via network sockets. Packet delivered to driver
2. Driver sets up and initiates DMA transfer over the PCIe bus
3. Interrupt signals completion of DMA


NetFPGA Register Access

1. Software makes ioctl call on network socket. ioctl passed to driver
2. Driver performs PCIe memory read/write over the PCIe bus


Overview



General Process for Programming a P4 Target

[Figure: the P4 program and the P4 architecture model are fed to the P4 compiler, which produces a target-specific configuration binary that is loaded onto the target's data plane (tables and extern objects). At run time, the control plane adds/removes table entries, controls externs, and exchanges packet-in/out traffic through the CPU port. For the P4->NetFPGA tools, the architecture is SimpleSumeSwitch and the target is NetFPGA SUME.]

P4→NetFPGA Compilation Overview

[Figure: a P4 program written against the SimpleSumeSwitch architecture passes through the Xilinx P4_16 compiler and the Xilinx SDNet tools. The resulting SimpleSumeSwitch module is placed in the NetFPGA reference switch at the Output Port Lookup stage, between the Input Arbiter (fed by four 10GE RxQ ports and DMA) and the Output Queues (draining to four 10GE Tx ports and DMA).]

Xilinx SDNet Design Flow & Use Model

• Packet processing spec. (.sdnet)
◦ PX (domain-specific language)
◦ Describe function in packet-oriented terms
• SDNet Compiler
◦ Throughput & latency
◦ Resources
◦ Programmability
• Output: tailored packet processor (HDL description) and firmware


Xilinx P4 Design Flow & Use Model

$ p4c-sdnet switch.p4

[Figure: the Xilinx P4_16 compiler translates .p4 into .sdnet. The SDNet tools then generate a top-level Verilog wrapper, Verilog engines (encrypted), lookup engine C++ drivers, and a verification environment consisting of a high-level C++ testbench and a SystemVerilog testbench.]


Considerations When Mapping to SDNet
• Identifying parallelism within P4 parser and control blocks
◦ table lookups
◦ actions
◦ etc.
• P4 packet processing model
◦ extract entire header from packet
◦ updates apply directly to header
◦ deparser re-inserts header back into packet
• SDNet packet processing model
◦ stream packet through “engines”
◦ modify header values in-line without removing and re-inserting



Mapping P4 Architectures to SDNet

[Figure: the P4 pipeline Parser → Ingress Match+Action → Egress Match+Action → Deparser maps onto SDNet engines. The parser becomes a Parsing Engine (read packet, write tuples); each match+action block becomes Lookup Engines and Editing Engines (read tuples, write tuples); the deparser becomes an Editing Engine (read tuples, write packet).]


Support for Multiple Architectures

• SimpleSumeSwitch
• Only Parser
◦ Pull information from packet w/o updates


SimpleSumeSwitch Architecture Model for SUME Target

[Figure: the SimpleSumeSwitch block has tdata and tuser signals on both its input and its output, plus AXI Lite control interfaces.]

• P4 used to describe parser, match-action pipeline, and deparser

Standard Metadata in SimpleSumeSwitch Architecture
/* standard sume switch metadata */
struct sume_metadata_t {
bit<16> dma_q_size;
bit<16> nf3_q_size;
bit<16> nf2_q_size;
bit<16> nf1_q_size;
bit<16> nf0_q_size;
bit<8> send_dig_to_cpu; // send digest_data to CPU
bit<8> dst_port; // one-hot encoded
bit<8> src_port; // one-hot encoded
bit<16> pkt_len; // unsigned int
}

*_q_size – size of each output queue, measured in 32-byte words, when the packet starts being processed by the P4 program
src_port/dst_port – one-hot encoded
user_metadata/digest_data – structs defined by the user
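
To make the one-hot port fields and the queue-size units concrete, here is a small Python sketch. The bit positions used for nf0 to nf3 are an assumption made for illustration only; the slide does not state them, so check sume_switch.p4 for the real layout.

```python
# Hypothetical bit positions for the four front-panel ports.
# These positions are an assumption, not taken from the slides.
PORT_BITS = {"nf0": 0, "nf1": 2, "nf2": 4, "nf3": 6}


def one_hot(port):
    """Return the 8-bit one-hot value that selects a single port."""
    return 1 << PORT_BITS[port]


def port_mask(ports):
    """OR several one-hot values together, e.g. to send to multiple ports."""
    mask = 0
    for port in ports:
        mask |= one_hot(port)
    return mask


def decode(mask):
    """List the port names whose bit is set in an 8-bit mask."""
    return [name for name, bit in PORT_BITS.items() if mask & (1 << bit)]


def queue_bytes(q_size_words):
    """Convert a *_q_size value (counted in 32-byte words) into bytes."""
    return q_size_words * 32
```

Because each port owns one bit, a single dst_port value can name several output ports at once, which is the usual reason for choosing a one-hot encoding over a plain port number.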



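Interface Naming Conventions

src / dst port fields: 8 bits, x-x-x-x-x-x-x-x

[Figure: the bits of the src/dst port fields correspond to the interfaces in the full system diagram: the front-panel ports nf3, nf2, nf1, nf0 (10GE Rx/Tx) and the CPU RxQ/TxQ queues that software sees as nf0 to nf3. Registers are reached via ioctl over AXI Lite.]
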
Overall P4 Program Structure
#include <core.p4>
#include <sume_switch.p4>

/******** CONSTANTS ********/


#define IPV4_TYPE 0x0800

/******** TYPES ********/


typedef bit<48> EthAddr_t;
header Ethernet_h {...}
struct Parsed_packet {...}
struct user_metadata_t {...}
struct digest_data_t {...}

/******** EXTERN FUNCTIONS ********/


extern void const_reg_rw(...);

/******** PARSERS and CONTROLS ********/


parser TopParser(...) {...}
control TopPipe(...) {...}
control TopDeparser(...) {...}

/******** FULL PACKAGE ********/


SimpleSumeSwitch(TopParser(), TopPipe(), TopDeparser()) main;



P4→NetFPGA Extern Function library

• Implement platform-specific functions
• Black box to P4 program
• Implemented in HDL
• Stateless – reinitialized for each packet
• Stateful – keep state between packets
• Xilinx Annotations
◦ @Xilinx_MaxLatency() – maximum number of clock cycles an extern function needs to complete
◦ @Xilinx_ControlWidth() – size in bits of the address space to allocate to an extern function


Stateless vs. stateful operations

Stateless operation: pkt.f4 = pkt.f1 + pkt.f2 - pkt.f3
• Stage 1: pkt.tmp = pkt.f1 + pkt.f2
• Stage 2: pkt.f4 = pkt.tmp - pkt.f3
• Can pipeline stateless operations

Stateful operation: x = x + 1
• Split across three pipeline stages: pkt.tmp = x, then pkt.tmp++, then x = pkt.tmp
• Two back-to-back packets both read x = 0 and both write back tmp = 1, so x ends at 1. x should be 2, not 1!
• Instead, x++ must happen in a single stage
• Cannot pipeline, need atomic operation in h/w
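
The lost update can be reproduced with a toy cycle-by-cycle model. The sketch below is only an illustration under simple assumptions (one packet enters per cycle, three stages, reads see the value of x from the start of the cycle); the function names are hypothetical and nothing here models real hardware timing.

```python
def pipelined_increment(x, num_packets):
    """Run x = x + 1 as three pipeline stages: read, add, write back."""
    after_read = None   # tmp of the packet that has just read x
    after_add = None    # tmp of the packet that has just been incremented
    remaining = num_packets
    while remaining > 0 or after_read is not None or after_add is not None:
        x_at_cycle_start = x
        # Stage 3: x = pkt.tmp
        if after_add is not None:
            x = after_add
        # Stage 2: pkt.tmp++
        after_add = after_read + 1 if after_read is not None else None
        # Stage 1: pkt.tmp = x (sees the value from the start of the cycle)
        if remaining > 0:
            after_read = x_at_cycle_start
            remaining -= 1
        else:
            after_read = None
    return x


def atomic_increment(x, num_packets):
    """Run x++ as one indivisible step per packet."""
    for _ in range(num_packets):
        x += 1
    return x
```

With two back-to-back packets and x starting at 0, the pipelined version returns 1 while the atomic version returns 2, matching the slide.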


P4→NetFPGA Extern Function library
• HDL modules invoked from within P4 programs
• Stateful Atoms [1]

Atom         Description
R/W          Read or write state
RAW          Read, add to, or overwrite state
PRAW         Predicated version of RAW
ifElseRAW    Two RAWs, one each for when predicate is true or false
Sub          IfElseRAW with stateful subtraction capability

• Stateless Externs

Atom         Description
IP Checksum  Given an IP header, compute IP checksum
LRC          Longitudinal redundancy check, simple hash function
timestamp    Generate timestamp (granularity of 5 ns)

• Add your own!

[1] Sivaraman, Anirudh, et al. "Packet transactions: High-level programming for line-rate switches." Proceedings of the 2016 ACM SIGCOMM Conference. ACM, 2016.

Adding Custom Externs

1. Implement Verilog extern module

2. Add entry to $SUME_SDNET/bin/extern_data.py

• No need to modify any existing code

• AXI Lite control interface module auto generated


Using Atom Externs in P4 – Resetting Counter
Packet processing pseudo code:

count[NUM_ENTRIES];

if (pkt.hdr.reset == 1):
count[pkt.hdr.index] = 0
else:
count[pkt.hdr.index]++



Using Atom Externs in P4 – Resetting Counter
#define REG_READ 0
#define REG_WRITE 1
#define REG_ADD 2

// count register: instantiate the atom (using the RAW atom here)
@Xilinx_MaxLatency(64)
@Xilinx_ControlWidth(3)
extern void count_reg_raw(in bit<3> index, in bit<32> newVal,
                          in bit<32> incVal, in bit<8> opCode,
                          out bit<32> result);

// index width must match the extern's 3-bit index
bit<3> index = pkt.hdr.index;

// set metadata for state access
bit<32> newVal; bit<32> incVal; bit<8> opCode;
if (pkt.hdr.reset == 1) {
    newVal = 0;
    incVal = 0; // not used
    opCode = REG_WRITE;
} else {
    newVal = 0; // not used
    incVal = 1;
    opCode = REG_ADD;
}

// single state access: state can be accessed exactly 1 time
bit<32> result; // the new value stored in count reg
count_reg_raw(index, newVal, incVal, opCode, result);
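
As a way to reason about what the RAW atom does per packet, here is a small Python behavioral model. It is a sketch rather than the HDL implementation: the class name RawAtom and its access method are hypothetical, and the choice to return the new stored value follows the "new value stored in count reg" comment on the slide.

```python
REG_READ, REG_WRITE, REG_ADD = 0, 1, 2  # opcodes as defined on the slide
MASK32 = 0xFFFFFFFF


class RawAtom:
    """Behavioral model of a read / add-to / overwrite register array."""

    def __init__(self, num_entries):
        self.state = [0] * num_entries

    def access(self, index, new_val, inc_val, op_code):
        """Perform the single state access allowed per packet."""
        if op_code == REG_WRITE:
            self.state[index] = new_val & MASK32
        elif op_code == REG_ADD:
            self.state[index] = (self.state[index] + inc_val) & MASK32
        elif op_code != REG_READ:
            raise ValueError("unknown opcode")
        return self.state[index]


def process_packet(count, index, reset):
    """Mirror the P4 logic: choose the operands first, then access once."""
    if reset == 1:
        new_val, inc_val, op_code = 0, 0, REG_WRITE
    else:
        new_val, inc_val, op_code = 0, 1, REG_ADD
    return count.access(index, new_val, inc_val, op_code)
```

The point of the structure is that both branches only prepare operands; the register itself is touched in exactly one place, which is what lets the hardware execute it atomically.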
API & Interactive CLI Tool Generation

• Both Python API and C API
◦ Manipulate tables and stateful elements in P4 switch
◦ Used by control-plane program
• CLI tool
◦ Useful debugging feature
◦ Query various compile-time information
◦ Interact directly with tables and stateful externs at run time


P4→NetFPGA Workflow

1. Write P4 program
2. Write externs
3. Write python gen_testdata.py script
4. Compile to Verilog / generate API & CLI tools
5. Run simulations (on fail, go back and revise; on pass, continue)
6. Build bitstream
7. Check implementation results
8. Test the hardware

All of your effort will go into the first three steps.

Debugging P4 Programs

• SDNet HDL Simulation


• SDNet C++ simulation
◦ Verbose packet processing info
◦ Output PCAP file

• Full SUME HDL simulation


• Custom Python Model



Assignment 1: Switch as a Calculator



Switch as a Calculator
• Supported Operations
◦ ADD – add two operands
◦ SUBTRACT – subtract two operands
◦ ADD_REG – add operand to current value in the register
◦ SET_REG – overwrite the current value in the register
◦ LOOKUP – Lookup the given key in the table

header Calc_h {
bit<32> op1;
bit<8> opCode;
bit<32> op2;
bit<32> result;
}
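
For building test packets, the Calc_h layout can be packed on the host with Python's struct module. This is a minimal sketch: the numeric opcode values are hypothetical placeholders (the slides do not give them), and network byte order is assumed.

```python
import struct

# op1 (32 bits), opCode (8 bits), op2 (32 bits), result (32 bits).
# "!" selects network byte order with no padding between fields.
CALC_FORMAT = "!IBII"

# Hypothetical opcode numbering, chosen only for this example.
OPCODES = {"ADD": 0, "SUBTRACT": 1, "LOOKUP": 2, "ADD_REG": 3, "SET_REG": 4}


def pack_calc(op1, op_code, op2, result=0):
    """Serialize the four Calc_h fields into 13 bytes."""
    return struct.pack(CALC_FORMAT, op1, op_code, op2, result)


def unpack_calc(data):
    """Parse the first 13 bytes of data back into (op1, opCode, op2, result)."""
    size = struct.calcsize(CALC_FORMAT)
    return struct.unpack(CALC_FORMAT, data[:size])
```

The resulting bytes would follow the Ethernet header of a packet whose EtherType is CALC_TYPE.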

Switch as a Calculator

1. The user PC sends a packet to the NetFPGA SUME:
Ethernet: DST: MAC1, SRC: MAC2, Type: CALC_TYPE
Calc: op1: 1, opCode: ADD, op2: 2, result: 0
Payload…

2. The switch computes the answer and overwrites the result field (0 becomes 3).

3. The NetFPGA SUME returns the packet to the user PC with the MAC addresses swapped:
Ethernet: DST: MAC2, SRC: MAC1, Type: CALC_TYPE
Calc: op1: 1, opCode: ADD, op2: 2, result: 3
Payload…


Switch Calc Operations
• ADD: result = op1 + op2
• SUB: result = op1 - op2
• ADD_REG: result = op2 + const[op1]
• SET_REG: const[op1] = op2
• LOOKUP: key = op1, result = val

LOOKUP table:
key  val
0    1
1    16
2    16^2
3    16^3
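
A small Python reference model can help when writing expected results for tests. This is a sketch under several assumptions that the slide leaves open: ADD_REG stores the sum back into the register, SET_REG reports the value it wrote, a LOOKUP miss returns 0, arithmetic wraps at 32 bits, and the last two table values are read as powers of 16. The class name Calculator is hypothetical.

```python
MASK32 = 0xFFFFFFFF


class Calculator:
    """Reference model of the five calculator operations."""

    def __init__(self, num_regs=4):
        self.const = [0] * num_regs
        # Table values as read from the slide; 16**2 and 16**3 are an
        # interpretation of the flattened entries.
        self.table = {0: 1, 1: 16, 2: 16 ** 2, 3: 16 ** 3}

    def execute(self, op, op1, op2):
        """Return the value that would be placed in the result field."""
        if op == "ADD":
            return (op1 + op2) & MASK32
        if op == "SUB":
            return (op1 - op2) & MASK32
        if op == "ADD_REG":
            # Assumption: the sum is written back to the register.
            self.const[op1] = (self.const[op1] + op2) & MASK32
            return self.const[op1]
        if op == "SET_REG":
            # Assumption: the written value is echoed in result.
            self.const[op1] = op2 & MASK32
            return self.const[op1]
        if op == "LOOKUP":
            # Assumption: a miss yields 0.
            return self.table.get(op1, 0)
        raise ValueError("unknown operation")
```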
FIN



Research topics



Examples of ongoing P4 Research Topics
• P4 Infrastructure
◦ Programmable scheduling
◦ Programmable target architectures
◦ PacketMod
• Data-plane Programs
◦ In-band network telemetry
◦ Congestion control
◦ Load balancing
• Networking-Offloading Applications
◦ Aggregation for MapReduce applications
◦ Key-value caching
◦ Consensus



Programmable Scheduling

Sivaraman, Anirudh, et al. "Programmable packet scheduling at line rate." Proceedings of the 2016 ACM SIGCOMM Conference. ACM, 2016.



Why scheduler is not programmable ... so far
● Plenty of scheduling algorithms, but no consensus on right abstractions. Contrast to:
○ Parse graphs for parsing
○ Match-Action tables for forwarding

● Scheduler has tight timing requirements
○ One decision every few ns


What does the Scheduler do?
Decides:
● In what order are packets sent?
○ Ex: FCFS, Priorities, WFQ
● At what time are packets sent?

Key observation:
● For many algorithms, the relative order in which packets are
sent does not change with future arrivals
○ i.e. scheduling order can be determined before enqueue



PIFO
● PIFO (Push-In First-Out queue) - proposed abstraction that can be used to implement many scheduling algorithms
● Packets are pushed into an arbitrary location based on computed rank
● Rank computation (programmable) feeds the PIFO scheduler (fixed logic)
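
The split between programmable rank computation and a fixed queue can be sketched in a few lines of Python. This is a software illustration only, not the hardware design from the paper; the class and method names are hypothetical, and ties between equal ranks are broken by arrival order as an assumption.

```python
import bisect
import itertools


class PIFO:
    """Push-in first-out queue: insert by rank, always dequeue from the head."""

    def __init__(self):
        self._queue = []                 # kept sorted by (rank, arrival order)
        self._arrival = itertools.count()

    def push(self, packet, rank):
        """Insert the packet at the position given by its rank."""
        bisect.insort(self._queue, (rank, next(self._arrival), packet))

    def pop(self):
        """Remove and return the packet at the head (lowest rank)."""
        _rank, _arrival, packet = self._queue.pop(0)
        return packet

    def __len__(self):
        return len(self._queue)


def strict_priority_rank(priority_class):
    """Example rank function: a lower class number is served first."""
    return priority_class
```

Swapping strict_priority_rank for a different rank function (for example a virtual finish time) changes the scheduling algorithm without touching the queue itself.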


PIFO Tree



PIFO Remarks
● Very limited scheduling in modern switching chips
○ Deficit Round Robin, traffic shaping, strict priorities
● Scheduling algorithms that can be implemented with PIFO
○ Weighted Fair Queueing, Token Bucket Filtering, Hierarchical Packet Fair
Queueing, Least-Slack Time-First, the Rate Controlled Service Disciplines,
and fine-grained priority scheduling (e.g., Shortest Job First)
● PIFO cannot implement algorithms that require
○ Changing the scheduling order of all packets of a flow
○ Output rate limiting
● PIFO implementation feasibility?



Programmable Target Architectures

Observations:
◦ Current P4 expectation: target architectures are fixed, specified in English
◦ FPGAs can support many different architectures

Idea:
◦ Extend P4 to allow description of target architectures
■ More precise definition than English description

◦ Generate implementation on FPGA


◦ Easily integrate custom modules
◦ Explore performance tradeoffs of different architectures



Many Possible Architectures…
SimpleSumeSwitch:
Parser → M/A → Deparser → Output Queues

V1 Model:
Parser → M/A → TM → M/A → Deparser → Output Queues

Portable Switch Architecture:
Parser → M/A → Deparser → TM → Parser → M/A → Deparser → Output Queues


Many Possible Architectures…

Custom Traffic Manager:
Parser → M/A → My TM → M/A → Deparser → Output Queues

Programmable Packet Generator:
Pkt Gen block added to Parser → M/A → TM → M/A → Deparser → Output Queues


Programmable Target Architectures
package SimpleSumeSwitch<H, M, D>(
    Parser<H, M, D> TopParser,
    Pipe<H, M, D> TopPipe,
    Deparser<H, M, D> TopDeparser) {

    // Top level I/O
    packet_in instream;
    inout sume_metadata_t sume_metadata;
    out D digest_data;
    packet_out outstream;

    // Connectivity of the architecture
    connections {
        // TopParser input connections
        TopParser.b = instream;
        TopParser.sume_metadata = sume_metadata;

        // TopPipe <-- TopParser
        TopPipe.p = TopParser.p;
        TopPipe.user_metadata = TopParser.user_metadata;
        TopPipe.digest_data = TopParser.digest_data;
        TopPipe.sume_metadata = TopParser.sume_metadata;

        // TopDeparser <-- TopPipe
        TopDeparser.p = TopPipe.p;
        TopDeparser.user_metadata = TopPipe.user_metadata;
        TopDeparser.digest_data = TopPipe.digest_data;
        TopDeparser.sume_metadata = TopPipe.sume_metadata;

        // TopDeparser output connections
        digest_data = TopDeparser.digest_data;
        sume_metadata = TopDeparser.sume_metadata;
        outstream = TopDeparser.b;
    }
}


Workflow
• Two Actors: (1) Target Architecture Designer, (2) P4 Programmer

Target Architecture Designer:
• Provides: P4+ architecture declaration
• Implements: non-P4 elements in target architecture, externs
• Someone who is more familiar with FPGA development


Workflow
• Two Actors: (1) Target Architecture Designer, (2) P4 Programmer

[Figure: the implementation of P4 elements is compiled to PX subsystems, and the P4+ architecture description is compiled to a partial PX system. The two are combined into a complete PX system, which is compiled to Verilog to produce the HDL switch design.]

In-band Network Telemetry (INT)

1 Which path did my packet take?
“I visited: switch 1 @ 780ns, switch 9 @ 1.3us, switch 12 @ 2.4us”

2 Which rules did my packet follow?
“In switch 1, I followed rules 75 and 250. In switch 9, rules 3 and 80”
[Figure: rule table with columns # and Rule; for example, rule 75 matches 192.168.0/24.]

INT Slides courtesy of Nick McKeown


In-band Network Telemetry (INT)

3 How long did my packet queue at each switch?
“Delay: 100ns, 200ns, 19740ns”

4 Who did my packet share the queue with?
[Figure: queue occupancy plotted against time.]

In-band Network Telemetry (INT)

1 Which path did my packet take?

2 Which rules did my packet follow?

3 How long did my packet queue at each switch?

4 Who did my packet share the queue with?

No need to add a single additional packet!
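
The idea that the packet itself carries the answers can be sketched as each switch appending a metadata record as the packet passes through. The Python below is only an illustration: the dictionary layout and function names are hypothetical and do not follow any INT header format, and the sample numbers are the ones quoted on the earlier slides (rules for switch 12 are not given there, so they are left empty).

```python
def new_packet(payload):
    """Create a packet with an empty telemetry stack."""
    return {"payload": payload, "int": []}


def switch_hop(packet, switch_id, timestamp_ns, queue_delay_ns, rules):
    """Model one switch appending its telemetry record to the packet."""
    packet["int"].append({
        "switch": switch_id,
        "timestamp_ns": timestamp_ns,
        "queue_delay_ns": queue_delay_ns,
        "rules": list(rules),
    })
    return packet


def path_taken(packet):
    """Question 1: which switches did the packet visit, in order?"""
    return [hop["switch"] for hop in packet["int"]]


def total_queue_delay_ns(packet):
    """Question 3: total queueing delay accumulated along the path."""
    return sum(hop["queue_delay_ns"] for hop in packet["int"])


pkt = new_packet(b"data")
switch_hop(pkt, 1, 780, 100, [75, 250])
switch_hop(pkt, 9, 1300, 200, [3, 80])
switch_hop(pkt, 12, 2400, 19740, [])
```

The receiver reads the stack at the end of the path, so no probe packets are needed.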



Congestion Control

Reactive Congestion Control

[Figure: feedback loop between "Measure Congestion" and "Adjust Flow Rate".]

• No use of explicit information about traffic matrix
• Can only react and move in right direction
• Reactive techniques are slow to converge (10s-100s of RTTs)
• Typical flows will finish in just a few RTTs as we move towards higher link speeds

Proactive Congestion Control

• Proactive techniques converge much more quickly than reactive
• Faster convergence times lead to lower flow completion times

[Figure: plot comparing how reactive and proactive schemes converge on a 10 Gbps link.]


An example proactive scheme

• 10 Gb/s link carrying Flow A
• Control packet carries per-flow state: BW demand, current bottleneck
• Switch computation uses per-link state:
N = 1 flow
C = 10 Gb/s
Fair share = C/N = 10 Gb/s
• Sending host adjusts sending rate


An example proactive scheme

• 10 Gb/s link carrying Flow A and Flow B
• Switch computation:
N = 2 flows
C = 10 Gb/s
Fair share = C/N = 5 Gb/s
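
The per-link computation is simple enough to state as code. The sketch below assumes every flow is bottlenecked at this link and simply divides capacity equally; the class name Link and the method on_control_packet are hypothetical, and a real scheme would also use the per-flow demand and bottleneck fields.

```python
class Link:
    """Per-link state: capacity C and the set of active flows (N = its size)."""

    def __init__(self, capacity_gbps):
        self.capacity_gbps = capacity_gbps
        self.flows = set()

    def on_control_packet(self, flow_id):
        """Register the flow and return the current fair share C / N."""
        self.flows.add(flow_id)
        return self.capacity_gbps / len(self.flows)
```

With a 10 Gb/s link, the first control packet from flow A yields a fair share of 10 Gb/s; once flow B announces itself, both flows are told 5 Gb/s.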



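Proactive Algorithm in P4

[Figure: after the L2 forwarding logic, the pipeline checks whether the packet is a control packet. Control packets are set to high priority, read/update the link state, compute the fair share rate, and set the bandwidth demand; other packets are set to low priority. All packets then enter the priority output queues.]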


In-Network Computation
• Programmable data plane hardware → opportunity to reconsider division of computation
• What kinds of computation should be delegated to network?
• Network computations are constrained:
◦ Limited memory size (10’s of MB of SRAM)
◦ Limited set of actions (simple arithmetic, hashing, table lookups)
◦ Few operations per packet (10’s of ns to process each packet)
• Goals:
◦ Reduce: application runtime, load on servers, network congestion
◦ Increase: application scalability

Sapio, Amedeo, et al. "In-Network Computation is a Dumb Idea Whose Time Has Come." Proceedings of the 16th ACM Workshop on Hot Topics in Networks. ACM, 2017.


In-Network Aggregation
• Aggregate data at intermediate network nodes to reduce network
traffic
• Simple arithmetic operations at switches
• Widely applicable to many distributed applications
◦ Machine learning training
◦ Graph analytics
◦ MapReduce applications
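
A word-count style example shows why aggregation at a switch reduces traffic. The Python below is a sketch of the idea only: the function names are hypothetical, the partial results are plain dictionaries, and addition is assumed as the aggregation function.

```python
from collections import Counter


def aggregate(partials):
    """Combine key/value partial results from several mappers by summing per key."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts for existing keys
    return dict(total)


def reduction_percent(partials, aggregated):
    """Percentage of key/value pairs removed by aggregating before forwarding."""
    pairs_in = sum(len(p) for p in partials)
    return 100.0 * (pairs_in - len(aggregated)) / pairs_in
```

For three mappers sending {"a": 1, "b": 2}, {"a": 3, "c": 1}, and {"b": 4}, the switch forwards three pairs instead of five, a 40% reduction in pairs sent toward the reducer.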



In-Network Aggregation
• Network controller is informed of MapReduce job
◦ Configures switches in aggregation tree to perform aggregation
• Significant network traffic reduction → reduced run time
• How to make robust to loss? Encryption?

[Figure: aggregation tree, and reduction results plotting Reduction [%] for data volume, reduce time, # packets (UDP baseline), and # packets (TCP baseline).]

Figure 3: Reduction on the amount of data, running time and number of packets received at reducers.

The P4 Language Consortium

• http://p4.org

• Consortium of academic and industry members

• Open source, evolving, domain-specific language

• Permissive Apache license, code on GitHub today

• Membership is free: contributions are welcome

• Independent, set up as a California nonprofit