1 Introduction

The rapid development of telecommunication business and the continuous emergence of new applications have put forward higher requirements for bandwidth [1]. With the advent of multiple network interfaces on devices [2], aggregating the bandwidth of multiple paths to gain throughput has become a natural evolution. However, the traditional Transmission Control Protocol (TCP) allows only a single path to access the Internet at a time [3]. As a promising solution, Multipath TCP (MPTCP) extends TCP, enabling devices with multiple network interfaces to transmit data over multiple paths simultaneously [4]. With MPTCP, higher end-to-end throughput can potentially be achieved and robustness during path failures can be significantly improved.

Applying MPTCP to data transmission can effectively cope with the growing bandwidth demands. However, the heterogeneous paths in MPTCP differ in transmission characteristics, including bandwidth, transmission delay, and packet loss rate. Compared with the single path in TCP, the asymmetric heterogeneous paths in MPTCP cause more out-of-order packets, which block the receive buffer and degrade the transmission performance [5, 6]. Since the out-of-order problem is more prominent in multipath transmission scenarios, it has attracted widespread research interest in both academia and industry. A variety of solutions have been proposed to overcome the out-of-order problem [7], most of which are sender-centric. In such solutions, the receiver is restricted to acknowledging receipt of packets and providing feedback through acknowledgment chunks, while the sender executes all key operations based on that feedback. However, the network state may change several times before the feedback reaches the sender, so the sender's decisions may no longer fit the current network state [8]. The situation can be even worse in wireless heterogeneous networks due to frequent packet loss [9]. Besides, enabling these solutions requires customized changes on the sender and/or the receiver, which is relatively complicated. Therefore, these solutions can neither effectively solve the out-of-order problem nor fully utilize the bandwidth of multiple paths.

In this paper, we propose a novel packet scheduling mechanism named Adaptive Delay-Aligned Scheduling (ADAS), which is a fresh attempt to consider both out-of-order mitigation and throughput improvement for data transmission in heterogeneous wireless networks. Different from existing designs, ADAS schedules and processes packets on the last-hop connected to the receiver. First, a virtual link loop is configured on the last-hop to buffer the out-of-order packets within a time threshold T. Second, a packet scheduling algorithm is proposed to efficiently process the out-of-order packets based on the first-hand knowledge parsed from the packet header. Finally, an adaptive weighting algorithm is proposed to dynamically adjust the time threshold T based on the connection state information. With an adaptive time threshold T, over-scheduling can be avoided and the overall throughput is improved. The main contributions of this paper are summarized as follows:

  • We design ADAS for multipath transmission in heterogeneous wireless networks. ADAS utilizes the wisdom of the last-hop connected to the receiver and frees bandwidth aggregation from the out-of-order problem, leaving both the sender and the receiver unchanged.

  • We propose an adaptive weighting algorithm to dynamically adjust the time threshold T. If out-of-order packets stay in the virtual link loop for too long, the overall throughput decreases. An adaptive time threshold T avoids over-scheduling and improves the overall throughput.

  • We establish a real-network environment and carry out extensive experiments to validate the effectiveness and superiority of ADAS. The results show that ADAS outperforms state-of-the-art mechanisms and can run alongside existing scheduling mechanisms to further improve performance.

The remainder of this paper is organized as follows. Section 2 briefly reviews the related work. The core components and design principles of ADAS are introduced in detail in Section 3. In Section 4, extensive experiments are carried out to validate the effectiveness and superiority of ADAS; in addition, the compatibility of ADAS is validated by combining it with LowRTT and OTIAS. Finally, we conclude this paper in Section 5.

2 Related work

In the past few years, researchers have proposed various methods to solve the out-of-order problem in MPTCP. As a rule of thumb, expanding the receive buffer size to store more out-of-order packets is the most straightforward way to address this issue [10]. However, it does not reduce the number of out-of-order packets. Besides, reordering the out-of-order packets causes long latency, which is unacceptable for time-sensitive applications.

Some proposed mechanisms focus on the design of scheduling algorithms to ensure that packets arrive at the receiver in order. Round-Robin (RR) is the simplest scheduling algorithm: it sends packets to each path in turn as long as the path has an available window. RR achieves good performance when the paths have homogeneous transmission characteristics. However, in most real-network scenarios the paths are heterogeneous, and using RR produces a large number of out-of-order packets.

To cope with the asymmetry of heterogeneous paths, many efforts have been devoted to optimizing the scheduling algorithms. Currently, the latest MPTCP implementation (MPTCP v0.95 based on the Linux Kernel Longterm Support release v4.19) uses Lowest-RTT-First (LowRTT) [11, 12] as the default scheduling algorithm, which schedules packets based on the Round-Trip Time (RTT) estimation. It first sends packets over the path with the lowest estimated RTT, then over the path with the next lowest estimated RTT, and so forth. LowRTT performs well when transmitting short flows (whose size is in the KB range), since their performance depends on transmission delay rather than bandwidth [13]. However, LowRTT does not consider other factors in heterogeneous networks, such as the congestion window and the packet loss rate. The receiver may still receive out-of-order packets due to the different transmission characteristics of the asymmetric paths. Unlike LowRTT, some scheduling algorithms are developed based on multiple factors. Sarwar et al. proposed Delay Aware Packet Scheduling (DAPS), which exploits the awareness of per-path delay with respect to the combined overall capacity of the paths to proactively minimize blocking inside the receive buffer [14]. Kuhn et al. developed an analytical model of the maximum receive buffer blocking time to further extend DAPS [15]. Yang et al. proposed the Out-of-order Transmission for In-order Arrival Scheduler (OTIAS) to mitigate jitter for time-sensitive applications [16]. Ferlin et al. showed, through extensive simulations and real-network experiments, that neither DAPS nor OTIAS can react to network changes in a timely manner. They then proposed a send-window BLocking ESTimation scheduler (BLEST) to minimize HoL-blocking in heterogeneous networks [8]. In addition, Ke et al. proposed a multiple attribute-aware data scheduling strategy for MPTCP (MPTCP-MA\(^{2}\)) by taking RTT and the congestion window into account and using an optimized path sorting algorithm to compare and sort all available paths [17]. Similar to MPTCP-MA\(^{2}\), Luo et al. designed a new MPTCP multi-attribute aware data scheduling algorithm based on RTT and congestion state (RCDS) [18]. RCDS first sends packets over the path with the minimum RTT for a number of rounds equal to the RTT ratio, and then selects the path with the best congestion state once.

Although the above proposals can reduce the probability of out-of-order packets and improve the overall throughput to a certain extent, they all ignore the influence of packet loss and suffer from significant performance degradation in lossy heterogeneous networks. To improve the robustness of scheduling algorithms in lossy heterogeneous networks, Xue et al. proposed Forward Prediction based Dynamic Packet Scheduling and Adjusting with Feedback (DPSAF) to detect and reduce out-of-order packets by considering packet loss and using feedback information from SACK options for further correction [19]. However, DPSAF does not consider the scenario where MPTCP may suffer from retransmission timeout (RTO) in a highly lossy network. Dong et al. proposed the Loss-Aware MPTCP Scheduler (LAMPS) for highly lossy networks, which considers both transmission delay and packet loss [20]. LAMPS relies heavily on the accuracy of the packet loss rate estimation. However, the packet scheduling process itself influences that estimation when many packets are pushed onto a certain path. Hence, LAMPS cannot deal with loss rate switching as expected and suffers performance degradation when burst packet losses happen. Yang et al. proposed the Loss-Aware Throughput Estimation scheduler (LATE), which establishes a throughput estimation model that comprehensively considers the transmission characteristics of each path, including RTT, congestion window, and packet loss rate [21]. With the throughput estimation model, LATE can estimate the amount of data that each path can deliver in a certain round and schedule packets adaptively onto different paths accordingly. The authors presented experimental results for LATE under discretely increasing packet loss rate, RTT, and file size, which show that LATE outperforms BLEST and DPSAF. However, both RTT and packet loss rate may fluctuate irregularly, especially in heterogeneous wireless networks.

Recently, more and more Machine Learning (ML) based mechanisms have been used to solve the out-of-order problem in MPTCP, owing to the fusion of ML into the transport layer [22, 23]. Several works leverage Deep Reinforcement Learning to redesign scheduling algorithms for MPTCP [24,25,26]. However, they show no significant improvements over traditional algorithms. Li et al. proposed SmartCC and used an asynchronous reinforcement learning framework to learn a set of congestion rules. With these rules, the sender can observe the environment and take actions to adjust the congestion windows of the paths adaptively [27]. Wu et al. proposed Peekaboo, which is aware of the dynamic transmission characteristics of the heterogeneous paths [28]. However, the speed at which the mobile network changes may surpass the online learning speed of Peekaboo, and there may not be sufficient data for Peekaboo to learn the mobile network. Moreover, there is a series of works applying ML to MPTCP for IoT applications [29, 30], industrial scenarios [31,32,33], and mobile devices [34, 35]. These proposals may work well in specific scenarios, but cannot be applied to others. In general, applying ML to MPTCP and utilizing ML to generate scheduling strategies face two challenges: (1) where to deploy the ML model and execute the training tasks, and (2) how to make the learning speed faster than the speed at which the network state changes. On the one hand, the end devices may have limited computing resources and may not be capable of handling the training tasks. Executing the training tasks on the end devices is prone to slow convergence, which renders the training results inapplicable to the current network state. On the other hand, although offloading the training tasks to remote servers with powerful computing capacity can greatly improve the learning speed, it introduces additional communication latency due to the parameter exchange between the end devices and the remote servers. Only when these challenges are solved can ML be better applied to MPTCP.

Based on the above discussion, we argue that multiple factors, especially packet loss, shall be considered to overcome the out-of-order problem for MPTCP in heterogeneous wireless networks. Furthermore, the proposed mechanism shall be compatible with existing implementations to facilitate realistic deployment. Taking these into consideration, we propose ADAS, which works on the last-hop connected to the receiver. On the one hand, ADAS can take full advantage of the first-hand knowledge parsed from the packet header, which can be easily obtained by the last-hop. On the other hand, ADAS can run in parallel with current packet scheduling algorithms because it can be easily deployed on the network edge device and does not change any implementation of either the sender or the receiver. In fact, ADAS has better compatibility and robustness since it is implemented in the P4 data plane programming language, which gives ADAS the potential to support various existing designs.

3 ADAS mechanism

ADAS is designed for data transmission in heterogeneous wireless networks. A typical scenario of heterogeneous wireless networks with multiple paths is illustrated in Fig. 1. The Sending Unit consists of the sender and the edge device S1. Similarly, the receiver and the edge device S2 compose the Receiving Unit. The Sending Unit transmits data to the Receiving Unit via multiple access technologies, including WiFi, 4G/5G, and so on.

ADAS mainly focuses on the Receiving Unit and is empowered with three core components, namely (1) virtual environment configuration, (2) packet scheduling algorithm, and (3) adaptive weighting algorithm. Specifically, a virtual link loop is configured on the edge device S2 to buffer the out-of-order packets within time threshold T. A packet scheduling algorithm is proposed to send the packets to the receiver as sequentially as possible to address the out-of-order problem. Besides, the time threshold T is dynamically adjusted to achieve a better overall throughput with the adaptive weighting algorithm.

3.1 Virtual environment configuration

Virtual network appliances make packet processing in the network edge devices more flexible, leaving both the sender and the receiver unchanged. For example, network namespaces provide isolation of the system resources associated with networking: network devices, IPv4 and IPv6 protocol stacks, IP routing tables, firewall rules, and so on. A physical network device can live in exactly one network namespace. A virtual Ethernet (veth) pair provides a pipe-like abstraction which can be used to create tunnels between network namespaces or bridges to physical network devices in another namespace. When a network namespace is freed, the physical network devices are moved back to the initial network namespace.

The edge device S2 isolates the traffic of the virtual link loop by using a network namespace with its own set of network interfaces and routing table entries. To connect the default network namespace with the newly added one, veth interfaces need to be assigned to the new namespace. Suppose the edge device S2 has four physical interfaces, three of which are used to connect to the heterogeneous wireless networks with multiple paths, while the other one is used to connect to the receiver. As shown in Fig. 2, two veth pairs are added to connect the different namespaces. The packets passing through port 4 return to port 5 along the path veth11 \(\rightarrow\) veth12 \(\rightarrow\) veth22 \(\rightarrow\) veth21, which forms a virtual link loop. Here, we use the Linux Traffic Control tools to add a 1 ms delay on veth12, so that one trip around the virtual link loop takes 1 ms. In this way, each out-of-order packet spends 1 ms in the virtual link loop per pass. Therefore, delay-aligned scheduling with millisecond accuracy can be achieved as long as the out-of-order packets traverse the virtual link loop an appropriate number of times before being sent to the receiver.
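For concreteness, a minimal sketch of how such a virtual link loop could be set up with standard Linux tools (ip and tc) is given below, wrapped in Python for readability. The namespace name loopns, the bridge br0, and the bring-up order are illustrative assumptions rather than the exact configuration of our testbed.

```python
# Sketch of the virtual link loop in Fig. 2, assuming the interface names
# veth11/veth12/veth21/veth22 from the text; "loopns" and "br0" are
# hypothetical names chosen for illustration.
import subprocess

def run(cmd):
    """Run one configuration command and fail loudly on error."""
    subprocess.run(cmd, shell=True, check=True)

def build_virtual_link_loop():
    run("ip netns add loopns")                            # extra namespace on S2
    run("ip link add veth11 type veth peer name veth12")  # first veth pair
    run("ip link add veth21 type veth peer name veth22")  # second veth pair
    run("ip link set veth12 netns loopns")                # inner ends live in loopns
    run("ip link set veth22 netns loopns")
    run("ip netns exec loopns ip link add br0 type bridge")
    run("ip netns exec loopns ip link set veth12 master br0")  # bridge veth12 <-> veth22
    run("ip netns exec loopns ip link set veth22 master br0")
    for dev in ("veth11", "veth21"):
        run(f"ip link set {dev} up")
    for dev in ("veth12", "veth22", "br0"):
        run(f"ip netns exec loopns ip link set {dev} up")
    # One trip around the loop costs 1 ms: delay veth12 with netem.
    run("ip netns exec loopns tc qdisc add dev veth12 root netem delay 1ms")
```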

Fig. 1 A typical scenario of heterogeneous wireless networks with multiple paths

3.2 Packet scheduling algorithm

In heterogeneous wireless networks, packets with smaller sequence numbers sent over a slower path may arrive at the edge device later than packets with larger sequence numbers sent over a faster path. Directly forwarding these packets from the edge device to the receiver would cause the receive buffer to be blocked by the out-of-order packets.

The edge device S2 can take full advantage of the first-hand knowledge parsed from the packet header to process the out-of-order packets as far as possible before forwarding them to the receiver. Specifically, the first-hand knowledge includes the source IP address, destination IP address, source port number, destination port number, and protocol (i.e., the five-tuple). In ADAS, Cyclic Redundancy Check 16 (CRC16) is used to hash the five-tuple of each packet so that every connection can be identified by its hash value while the connection information is masked. The calculation of the hash value is expressed by Equation (1).

$$\begin{aligned} hash(\bullet ) = CRC16(F)\ mod \ N \end{aligned}$$
(1)

where F represents the contents of the five-tuple of the current packet, and N represents the size of the connection state register, which is a P4 programmable module. The larger the value of N is, the more space the register takes up, and the lower the probability of hash collisions is. Considering the number of data streams and the storage space in real networks, N is set to 256, and the available interval of the index is 0x00–0xff. Therefore, the register contains 256 entries, and the value stored in each entry does not exceed \(2 ^{32}-1\).
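As a rough illustration of Equation (1), the Python sketch below hashes a five-tuple into a 256-entry index. The crc_hqx function from the standard library is one 16-bit CRC variant; the CRC16 offered by the P4 target may use a different polynomial, so the mapping shown here is only indicative.

```python
# Sketch of Equation (1); crc_hqx is a 16-bit CRC from the Python standard
# library and may differ from the CRC16 implemented on the P4 switch.
import binascii
import socket
import struct

N = 256  # size of the connection state register (index range 0x00-0xff)

def connection_index(src_ip, dst_ip, src_port, dst_port, proto):
    """hash(.) = CRC16(F) mod N, where F is the packet's five-tuple."""
    five_tuple = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!HHB", src_port, dst_port, proto)
    )
    return binascii.crc_hqx(five_tuple, 0) % N

# Example: connection_index("10.0.0.2", "10.0.1.3", 43123, 5001, 6) -> a value in [0, 255]
```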

Fig. 2 The virtual link loop configured on S2

Once a connection is well identified by the corresponding hash value, the sequence number of the expected next packet in this connection shall be calculated. To accomplish this task, the edge device S2 also collects the sequence number, the length of each protocol header, and other specific fields of the current packet so as to calculate the sequence number \(E{s_i}\) of the expected next packet in the ith connection. Taking the IPv4 protocol as an example, the abstract expression of the calculation method of \(E{s_i}\) is described as Equation (2).

$$\begin{aligned} \begin{aligned} {E_{{S_i}}} =&\, hdr.tcp.seq + hdr.ipv4.totalLen\\ {}&- hdr.ipv4.hdrLen - hdr.tcp.hdrLen \end{aligned} \end{aligned}$$
(2)

When a packet arrives, the edge device S2 parses it and calculates the hash value to check whether the connection is newly established. If so, the sequence number \(E{s_i}\) of the expected next packet is calculated and stored in the register with the hash value as the corresponding index. If not, the sequence number field is extracted and compared with the corresponding \(E{s_i}\). If the sequence number is greater than \(E{s_i}\), the packet is sent to the virtual link loop. Otherwise, the packet is forwarded directly to the receiver and the \(E{s_i}\) stored in the register is updated. Once a packet \({P_i}\) enters the virtual link loop, it is continuously monitored. If the expected next packet arrives within the time threshold T, it is sent to the receiver together with packet \({P_i}\). Otherwise, i.e., if the waiting time exceeds the time threshold T or the number of packets staying in the virtual link loop exceeds the maximum number M, packet \({P_i}\) is sent directly to the receiver. The pseudo-code of packet processing is shown in Algorithm 1.

Algorithm 1 The pseudo-code of packet processing
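The control flow of Algorithm 1 can be summarized by the following Python sketch (not the actual P4 implementation). It reuses the connection_index() helper sketched above; send_to_receiver() and send_to_loop() are hypothetical names for the forwarding actions of the edge device S2, and the forwarding of the first packet of a new connection as well as the handling of the time threshold T and the maximum number M are only indicated in comments.

```python
# Sketch of the per-packet logic of Algorithm 1.
expected_seq = {}  # hash index -> Es_i, the sequence number expected next

def on_packet(pkt):
    idx = connection_index(pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port, pkt.proto)
    # Equation (2), IPv4 case: sequence number expected after this packet.
    next_seq = pkt.tcp_seq + pkt.ip_total_len - pkt.ip_hdr_len - pkt.tcp_hdr_len
    if idx not in expected_seq:              # newly established connection
        expected_seq[idx] = next_seq         # record Es_i (packet assumed to be forwarded on)
        send_to_receiver(pkt)
    elif pkt.tcp_seq > expected_seq[idx]:    # a gap precedes this packet
        send_to_loop(pkt)                    # buffer it in the virtual link loop
    else:                                    # expected (or duplicate) packet
        expected_seq[idx] = max(expected_seq[idx], next_seq)
        send_to_receiver(pkt)
    # Packets circulating in the loop re-enter this logic every 1 ms; they are
    # forced out to the receiver once their waiting time exceeds T or more than
    # M packets are pending in the loop.
```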

Additionally, a scenario example of packet processing on the edge device S2 is depicted in Fig. 3. The packets are transmitted through two heterogeneous paths with different transmission delays. Since Path 1 is faster than Path 2, packet P5 arrives at the edge device S2 earlier than packet P4. According to the processing logic of ADAS, packet P5 is parsed and sent to the virtual link loop. After the arrival of packet P4, the two packets are delivered to the receiver in order. Besides, when packet loss occurs, such as for packet P6, the next packet P7 enters the virtual link loop. Once the waiting time exceeds the time threshold T, the edge device S2 directly forwards packet P7 to the receiver. The sender then retransmits packet P6 after receiving triple duplicate ACKs or after a timeout.

Fig. 3 A scenario example of packet processing on S2

With the packet scheduling algorithm, the last-hop delivers the packets to the receiver as sequentially as possible, which looks as if all packets take the same time to travel across the network. In this way, the delay-aligned scheduling is achieved and the out-of-order problem can be effectively addressed.

3.3 Adaptive weighting algorithm

The packet scheduling algorithm theoretically minimizes the probability of out-of-order conditions. However, due to the complexity of heterogeneous wireless networks, the transmission characteristics of the heterogeneous paths may fluctuate frequently and packet loss may also occur. Simply setting a larger time threshold T and letting the out-of-order packets stay in the virtual link loop for a fixed time results in performance degradation caused by over-scheduling. Therefore, the scheduling efficiency of ADAS is largely affected by the value of the time threshold T. There must be an appropriate time threshold T to distinguish an out-of-order packet from a loss hole, so that the packet scheduling algorithm can achieve better performance, minimizing the out-of-order rate while maximizing the overall throughput. For these reasons, an adaptive weighting algorithm for adjusting the time threshold T is necessary.

The adaptive weighting algorithm mainly targets the scenario where the transmission characteristics of the heterogeneous paths fluctuate frequently and packet loss occurs at times. Generally, packet loss is detected by the reception of triple duplicate ACKs or by a timeout in traditional TCP. The former further triggers fast retransmit events. Padhye et al. have validated that there are far more retransmission timeout events than fast retransmit events [36]. In particular, MPTCP may suffer from retransmission timeouts especially when transmitting short flows [37]. Besides, it is difficult to obtain the probabilities of fast retransmit events and retransmission timeout events. Therefore, when initializing the time threshold T, we assume that every packet loss triggers a retransmission timeout.

Firstly, the propagation delay \({\tau _i}\) of each path, taking retransmission into account, is calculated according to Equation (3).

$$\begin{aligned} {\tau _i} = (1 - los{s_i})\frac{{RT{T_i}}}{2} + los{s_i}(RT{O_i} + \frac{{RT{T_i}}}{2}) \end{aligned}$$
(3)

where \({loss}_{i}\) represents the packet loss rate of the ith path. \({RTT}_{i}\) and \({RTO}_{i}\) represent the round-trip delay and retransmission time-out of the ith path, respectively. It should be noted that both the packet loss rate \({loss}_{i}\) and the round-trip delay \({RTT}_{i}\) of the ith path are measured multiple times in advance. Furthermore, \({RTO}_{i}\) is calculated by Equations (4), (5), and (6) according to RFC 2988 [38].

$$\begin{aligned} RT{O_i} = SRT{T_i} + 4VA{R_i} \end{aligned}$$
(4)
$$\begin{aligned} SRT{T_i} = (1 - 1/8)SRT{T_{i - 1}} + 1/8RT{T_i} \end{aligned}$$
(5)
$$\begin{aligned} VA{R_i} = (1 - 1/4)VA{R_{i - 1}} + 1/4|{SRT{T_i} - RT{T_i}} | \end{aligned}$$
(6)
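A small Python sketch of Equations (3)–(6) for a single path is given below. The loss rate and the RTT samples are assumed to be measured in advance as stated above; the initial values of SRTT and VAR, and the use of the mean of the RTT samples for \({RTT}_{i}\) in Equation (3), are our assumptions.

```python
# Sketch of Equations (3)-(6): retransmission-aware propagation delay of one
# path. The smoothing gains 1/8 and 1/4 follow RFC 2988; the SRTT/VAR
# initialisation below is an assumption.
def path_propagation_delay(rtt_samples, loss):
    srtt, var = rtt_samples[0], rtt_samples[0] / 2.0
    for rtt in rtt_samples[1:]:
        var = 0.75 * var + 0.25 * abs(srtt - rtt)   # Equation (6)
        srtt = 0.875 * srtt + 0.125 * rtt           # Equation (5)
    rto = srtt + 4.0 * var                          # Equation (4)
    rtt_mean = sum(rtt_samples) / len(rtt_samples)  # pre-measured average RTT
    # Equation (3): one-way delay, weighted by whether a loss (and hence an RTO) occurs.
    return (1.0 - loss) * rtt_mean / 2.0 + loss * (rto + rtt_mean / 2.0)
```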

Based on the propagation delay \({\tau _i}\), the average propagation delay difference \(\alpha\) of n heterogeneous paths can be calculated by Equation (7).

$$\begin{aligned} \alpha = \sum \limits _{i = 1}^n {({\tau _i} {{ - \overline{{\tau _i}} )}/} n} \end{aligned}$$
(7)

Secondly, the propagation delay jitter is introduced to reflect the influence of the transmission characteristics fluctuation of the heterogeneous paths. Equation (8) is used to calculate the propagation delay jitter \(\beta\).

$$\begin{aligned} \beta = \sqrt{{{\sum \limits _{i = 1}^n {{{\left( {\frac{{rt{t_i}}}{2} - \frac{{\overline{rtt} }}{2}} \right) }^2}} } \bigg / n}} \end{aligned}$$
(8)

where \({rtt}_{i}\) represents the round-trip delay of the ith packet while \(\overline{rtt}\) represents the average round-trip delay of m packets. Here, each round-trip delay \({rtt}_{i}\) is also measured in advance. The average propagation delay jitter \(\beta\) represents the overall propagation delay jitter of heterogeneous wireless networks with multiple paths.

Given the definition of the average propagation delay difference \(\alpha\) and the average propagation delay jitter \(\beta\), the initial value of time threshold T can be expressed as Equation (9).

$$\begin{aligned} T_{init} = min\left( {\alpha +\beta ,2\overline{rtt}} \right) \end{aligned}$$
(9)
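The initialization of the time threshold T can then be sketched as follows. Since the signed deviations in Equation (7) would cancel out, the sketch uses the mean absolute deviation for \(\alpha\); this is our reading of the equation, not a statement from the original derivation.

```python
# Sketch of Equations (7)-(9). tau holds the per-path delays from Equation (3);
# rtt_samples holds the pre-measured per-packet RTTs. Using the mean absolute
# deviation for alpha is an assumption (see the note above).
def initial_threshold(tau, rtt_samples):
    n, m = len(tau), len(rtt_samples)
    tau_mean = sum(tau) / n
    alpha = sum(abs(t - tau_mean) for t in tau) / n          # Equation (7)
    rtt_mean = sum(rtt_samples) / m
    beta = (sum((r / 2 - rtt_mean / 2) ** 2 for r in rtt_samples) / m) ** 0.5  # Equation (8)
    return min(alpha + beta, 2 * rtt_mean)                   # Equation (9)
```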

Next, we focus on the dynamic adjustment of time threshold T. When the packet scheduling algorithm is running, labels can be used to distinguish packets that are processed differently in the program. Among all the packets forwarded from the virtual link loop to the receiver, the packets forwarded in order and the exception handling packets (including the packets that are forwarded directly to the receiver due to the process time exceeding time threshold T or the number of packets staying in the virtual link loop exceeding maximum number M) are counted separately. The number of the former is denoted by \(N_{o}\) while the number of the latter is denoted by \(N_{e}\). Therefore, the out-of-order improvement rate \(\eta\) can be calculated by Equation (10), which indicates the ratio of the number of out-of-order packets improved by the packet scheduling algorithm to the number of original out-of-order packets.

$$\begin{aligned} \eta = \frac{{{N_o}}}{{{N_o} + {N_e}}} \end{aligned}$$
(10)

In addition to the out-of-order improvement rate, the overall throughput P is also used as a criterion to evaluate the time threshold T. When P is maximum, it means that the packet scheduling algorithm can maximize the overall throughput in heterogeneous wireless networks. Therefore, we use the product of the out-of-order improvement rate \(\eta\) and overall throughput P as the out-of-order optimization factor to evaluate the effectiveness of the packet scheduling algorithm. Equation (11) indicates the calculation method of the out-of-order optimization factor \(\varepsilon\). The bigger \(\varepsilon\) is, the better the performance of the packet scheduling algorithm is.

$$\begin{aligned} \varepsilon = \eta P \end{aligned}$$
(11)

Finally, with the definition of the out-of-order optimization factor \(\varepsilon\), the edge device S2 can adjust the time threshold T according to Equation (12).

$$\begin{aligned} \begin{array}{l} T(t + 1) = \left\{ {\begin{array}{*{20}{c}} {T(t) + \Delta ,\theta > 0}\\ {T(t) - \Delta ,\theta \le 0} \end{array}} \right. \\ \\ \theta = [\varepsilon (t) - \varepsilon (t - 1)] \times [T(t) - T(t - 1)] \end{array} \end{aligned}$$
(12)

where \(\Delta\) is the scheduling granularity and \(\theta\) reflects whether the last adjustment of the time threshold T was beneficial.

The pseudo-code of the adaptive weighting algorithm is shown in Algorithm 2. During the packet scheduling process, ADAS measures and calculates the out-of-order optimization factor \(\varepsilon\) periodically and executes adaptive weighting according to Equation (12). In this way, ADAS improves the overall throughput while reducing the out-of-order rate.
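One periodic adaptation step can be sketched in Python as follows. The counters \(N_{o}\) and \(N_{e}\) and the measured throughput P are assumed to be collected during the preceding period; the guard against a zero denominator is our addition.

```python
# Sketch of Equations (10)-(12): one adaptive-weighting step.
DELTA = 0.001  # scheduling granularity Delta (1 ms in the experiments)

def adapt_threshold(T_prev, T_curr, eps_prev, N_o, N_e, P):
    eta = N_o / (N_o + N_e) if (N_o + N_e) > 0 else 1.0   # Equation (10), guarded
    eps_curr = eta * P                                     # Equation (11)
    theta = (eps_curr - eps_prev) * (T_curr - T_prev)      # Equation (12)
    T_next = T_curr + DELTA if theta > 0 else T_curr - DELTA
    return T_next, eps_curr
```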

4 Performance evaluation

In this section, extensive experiments are carried out and the performance of ADAS is comprehensively evaluated. Firstly, we set up an experimental platform and implement the RR mechanism as a benchmark to validate the effectiveness of ADAS. Then, comparative experiments involving video stream transmission are carried out to compare ADAS with state-of-the-art mechanisms and demonstrate its superiority. Finally, we combine ADAS with LowRTT and OTIAS to further improve the performance of ADAS.

Algorithm 2 The pseudo-code of the adaptive weighting algorithm

4.1 Experiment settings

The experimental topology is illustrated in Fig. 4. The sender and the receiver are connected to P4 switches S1 and S2, respectively. Specifically, each switch is equipped with two 4G Subscriber Identification Module (SIM) cards, one assigned an IPv4 address and the other an IPv6 address. Besides, S2 is configured with a virtual link loop for packet scheduling and processing. The main parameters of the P4 switches are summarized in Table 1.

Fig. 4 The experimental network topology

In the system, two paths are responsible for data transmission. The bandwidth and delay can be set by configuring the interface parameters of the switches. Both paths are wireless, so delay jitter and packet loss are unavoidable. Besides, the actual throughput is generally slightly less than the available bandwidth. The maximum number M indicates the total number of out-of-order packets that can stay in the virtual link loop. Theoretically, the total volume of traffic in flight is up to \(\sum \limits _{i = 1}^n {{b_i}RT{T_{max}}}\), where \({b_i}\) is the bandwidth of the ith path and \(RT{T_{max}}\) is the highest RTT among all the paths; \({b_i}\) and \(RT{T_{max}}\) are measured in bits per second and seconds, respectively. In the worst case, all these bytes or packets arrive at S2 out of order. Therefore, the maximum number M is \(\sum \limits _{i = 1}^n {{b_i}RT{T_{max}}} /(8 \times 1500)\). The scheduling granularity \(\Delta\) is used to adjust the time threshold T to let the out-of-order packets stay longer (\(T + \Delta\)) or shorter (\(T - \Delta\)) in the virtual link loop. Hence, it is reasonable to make the scheduling granularity \(\Delta\) an integral multiple of the time to traverse the virtual link loop once. Here, the scheduling granularity \(\Delta\) is set to 1 ms, which equals the time to traverse the virtual link loop once. In this way, the out-of-order packets enter the virtual link loop one more time or one less time when the time threshold T is adjusted by the scheduling granularity \(\Delta\).
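For the settings used below, a back-of-the-envelope calculation of M is shown here; taking \(RT{T_{max}}\) as roughly 200 ms (about twice the larger configured one-way delay of 100 ms) is our assumption.

```python
# Worked example of M for the Section 4 settings; RTT_max = 0.2 s is assumed.
bandwidths_bps = [5e6, 3e6]   # Path 1 and Path 2
rtt_max_s = 0.2               # assumption: about twice the 100 ms one-way delay
packet_bytes = 1500
in_flight_bits = sum(b * rtt_max_s for b in bandwidths_bps)
M = in_flight_bits / (8 * packet_bytes)   # about 133 packets in the worst case
```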

Table 1 Main Parameters of the P4 Switches

4.2 Validation of the effectiveness

To validate the effectiveness of ADAS, we first use P4 to implement the RR mechanism. Then, Linux ethtool is used to configure the interface parameters of the switches to set different bandwidth and delay values for the two paths. Here, we set the bandwidth of Path 1 and Path 2 to 5 Mbps and 3 Mbps, respectively. Besides, the delay of Path 1 is set to 50 ms while the delay of Path 2 is set to 100 ms. Iperf is used to send fixed-size packets from the sender to the receiver. The out-of-order rate and overall throughput are measured to evaluate the effectiveness of ADAS. Based on the statistics of packets captured on the receiver, the out-of-order rate is shown in Fig. 5a. The average out-of-order rate of RR is 36.95%, while it decreases to 5.62% when ADAS is applied. With the efficient scheduling of ADAS, the probability of out-of-order packets is greatly reduced, and it keeps decreasing in the later stage.

To further validate that ADAS can efficiently reduce the out-of-order packets, the out-of-order improvement rate is calculated and quantified. Specifically, all the packets forwarded from the virtual link loop (i.e., the edge device S2) to the receiver are counted separately according to the two different processing branches, that is, the packets forwarded in order and the exception handling packets. In this way, the out-of-order improvement rate can be calculated according to Equation (10). Fig. 5b shows that the average out-of-order improvement rate is 63.41% and the maximum out-of-order improvement rate reaches 83.25%. However, the minimum out-of-order improvement rate is only 37.30%. This is because sudden packet loss leads to a large number of out-of-order packets, and there is still a certain lag even if the time threshold T is adjusted promptly.

In addition, we also validate that ADAS can improve the overall throughput. As shown in Fig. 6, the average overall throughput of RR is about 2.58 Mbps, which is much smaller than the sum of the available bandwidth of the two paths. In the case of ADAS without the adaptive weighting algorithm (labeled as Unweighting ADAS), the average overall throughput is about 4.86 Mbps. By configuring a virtual link loop on the edge device S2, the out-of-order packets are scheduled and processed reasonably before being sent to the receiver. However, since the time threshold T is fixed and does not change with feedback, there is still a gap between the average overall throughput and the sum of the available bandwidth of the two paths. After using the adaptive weighting algorithm, the average overall throughput reaches 6.41 Mbps, which is approximately 2.52 times that of RR. This fully shows that dynamically adjusting the time threshold T can further improve the performance of ADAS and the overall throughput.

Fig. 5 The simulation results for ADAS in out-of-order mitigation: a The out-of-order rate; b The out-of-order improvement rate

4.3 Comparison with state-of-the-art mechanisms

In this subsection, we compare ADAS with three of the aforementioned sender-centric scheduling mechanisms, namely LowRTT, OTIAS, and LAMPS. To carry out the experiments, we deploy a video service on the sender and let the receiver access it. As shown in Fig. 7a and Fig. 7b, ADAS outperforms the three other mechanisms, achieving the lowest average out-of-order rate and the highest average overall throughput. Specifically, the average out-of-order rates of LowRTT, OTIAS, LAMPS, and ADAS are 31.62%, 20.86%, 9.04%, and 7.19%, respectively, and the average overall throughputs of LowRTT, OTIAS, LAMPS, and ADAS are 3.25 Mbps, 2.66 Mbps, 5.38 Mbps, and 5.69 Mbps, respectively.

Fig. 6 The simulation results for ADAS in throughput improvement

LowRTT simply allocates more packets to the path with the lowest RTT, which means that the RTT value dominates the performance of LowRTT. When the transmission characteristics of the paths are unstable or the RTT values are the same, LowRTT fails to work well. Besides, the path with the lowest RTT is not always the best path for transmitting data when its bandwidth is very small. Pushing a large number of packets onto a path with small bandwidth only results in congestion and packet loss. Hence, the out-of-order rate of LowRTT is the highest among the four mechanisms.

Compared with LowRTT, OTIAS further considers the impact of the congestion window size and introduces the concept of Delivery Delay (DeD) to indicate the time between when a packet is scheduled to a path and when that packet arrives in order at the receiver. On the basis of DeD, OTIAS transmits packets on different paths, possibly out of order, so that they arrive in order at the receiver. However, similar to LowRTT and other sender-centric scheduling mechanisms, the packets may not traverse the paths as expected once they leave the sender, since the transmission characteristics of the paths fluctuate frequently and packet loss may occur in heterogeneous wireless networks. For this reason, OTIAS can only reduce the out-of-order rate to a certain degree but cannot completely eliminate the out-of-order packets. Besides, every packet may wait for several RTTs if the selected path has no available congestion window, which means that the volume of packets sent out by the sender in a certain time interval is restricted. As a result, the average overall throughput of OTIAS is the lowest among the four mechanisms.

LAMPS further considers packet loss in highly lossy networks; it selects the path with the lowest transfer time and picks packets according to different path states (NORMAL state and REDUNDANT state). LAMPS can effectively reduce the out-of-order packets for video streaming traffic, reducing the extra bandwidth consumption while maintaining QoS. LAMPS shows better performance than LowRTT and OTIAS, in both out-of-order mitigation and throughput improvement. However, both the computation of the transfer time and the distinction between the NORMAL state and the REDUNDANT state rely heavily on the accuracy of the packet loss rate estimation. Pushing many packets onto a certain path may influence the packet loss rate estimation because these packets increase the probability of congestion when the capacity of the path is limited. Therefore, LAMPS fails to reduce the number of out-of-order packets as expected when burst packet loss happens.

Fig. 7 The experimental results for comparing ADAS with state-of-the-art mechanisms: (a) The out-of-order rate; (b) The real-time throughput

Different from the sender-centric scheduling mechanisms, ADAS makes no changes on the sender but utilizes a virtual link loop to schedule and process packets on the last-hop connected to the receiver. According to the processing logic of the packet scheduling algorithm, the last-hop delivers the packets to the receiver as sequentially as possible. In this way, all out-of-order packet processing is done right before the packets are sent to the receiver, which greatly eliminates the influence of the transmission characteristic fluctuation of the paths. As a consequence, the final out-of-order packets are only the exception handling packets, and the average out-of-order rate of ADAS is the lowest among the four mechanisms. In addition, ADAS uses the adaptive weighting algorithm to dynamically adjust the time threshold T to avoid over-scheduling. With a reasonable and adaptive time threshold T, the overall throughput can be further improved and is the highest among the four mechanisms.

4.4 Combination with LowRTT and OTIAS

We have validated the effectiveness and superiority of ADAS in the previous subsections. Next, we combine ADAS with LowRTT and OTIAS respectively to validate that ADAS is easily compatible with other scheduling mechanisms. Specifically, we use LowRTT and OTIAS to replace RRDS (the round-robin scheduler used in the previous experiments) for sending packets from the sender to the receiver, and compare the performance of these combinations with that of the original ADAS (i.e., ADAS combined with RRDS). As shown in Fig. 8a, the average out-of-order rate decreases from 7.19% to 5.73% when combining ADAS with OTIAS, while it increases from 7.19% to 11.45% when combining ADAS with LowRTT. In addition, Fig. 8b shows that the average overall throughput increases from 5.69 Mbps to 6.76 Mbps when combining ADAS with LowRTT, while it decreases from 5.69 Mbps to 5.11 Mbps when combining ADAS with OTIAS.

On the one hand, different from RRDS, which sends packets to each path in turn, OTIAS sends packets out of order to ensure in-order arrival at the receiver. Therefore, combining ADAS with OTIAS further reduces the average out-of-order rate by 20.31% compared to that of the original ADAS. However, the packets may wait for several RTTs to be sent out when the path has no available congestion window, which causes the average overall throughput of ADAS combined with OTIAS to drop slightly, by 0.58 Mbps, compared to that of the original ADAS.

On the other hand, LowRTT prefers the path with the lowest RTT, and more packets can be sent by the sender in a certain time interval compared to RRDS. Hence, combining ADAS with LowRTT significantly increases the average overall throughput, by 1.07 Mbps, compared to that of the original ADAS. However, many out-of-order packets are generated when a packet is lost, because LowRTT keeps sending packets on the selected path until the congestion window is full. These out-of-order packets include not only the packets with larger sequence numbers sent on the other path, but also the packets that follow the lost one on the same path. Besides, the out-of-order problem becomes more severe when the packet loss occurs on the slow path. For this reason, combining ADAS with LowRTT increases the average out-of-order rate by 59.25% compared to that of the original ADAS.

Fig. 8 The experimental results for combining ADAS with LowRTT and OTIAS: a The out-of-order rate; b The real-time throughput

To sum up, ADAS outperforms state-of-the-art mechanisms and can easily work together with both LowRTT and OTIAS, which shows that ADAS has good compatibility. However, compared to ADAS combined with RRDS, neither the combination with LowRTT nor the combination with OTIAS improves the overall throughput and reduces out-of-order packets at the same time. We therefore recommend choosing a combination according to the requirements of the specific application scenario.

5 Conclusion

The out-of-order problem is a critical factor that influences the transmission performance of MPTCP, especially in heterogeneous wireless networks. This paper proposed ADAS to consider both out-of-order mitigation and throughput improvement in a more comprehensive fashion. ADAS took full advantage of the wisdom of the last-hop connected to the receiver. A virtual link loop was used to buffer the out-of-order packets within the time threshold T, scheduling and sending them to the receiver as sequentially as possible. Besides, an adaptive weighting algorithm was used to dynamically adjust the time threshold T to avoid over-scheduling and further improve the overall throughput. Extensive experiments showed that ADAS not only outperforms state-of-the-art mechanisms but is also compatible with them.

In the future, we will test ADAS in a more complex scenario where there are more than two heterogeneous paths. In addition, we will try to combine the latest sender-centric scheduling mechanisms to further improve the performance of ADAS.