1. Introduction
The demand for mobile network capacity is increasing rapidly, with Ericsson forecasting an increase in monthly data transmitted from 130 exabytes (EB) per month in 2023 to 403 EB per month by 2029 [
1]. The fifth generation of cellular networks (5G) aims to address this escalating demand, in part by enabling the user equipment (UE) to use multiple access technologies simultaneously—thereby improving resource efficiency—covering both the operator's own cellular access network and untrusted wireless access, such as WiFi. Notably, Access Traffic Steering, Switching, and Splitting (ATSSS), introduced in 3GPP Rel. 16, enables seamless multi-access aggregation without requiring changes to applications or Internet servers. To realize ATSSS, a multipath proxy can be deployed, e.g., in the operator core network, transparently forwarding packets arriving from the Internet over multiple paths, each path terminating at a different wireless interface on the UE, effectively implementing multipath connectivity between the UE and the tunnel endpoint in the proxy. Apart from capacity aggregation, multipath offers additional advantages, including fault tolerance and load balancing, i.e., shifting traffic away from congested wireless access according to UE needs and the congestion states of the different tunnels.
A considerable amount of research and standardization has been conducted to develop multipath protocols like MPTCP [
2], MP-QUIC [
3], MP-DCCP [
4], and CMT-SCTP [
5], with MPTCP emerging as the de facto standard for multipath support for TCP traffic. In the context of ATSSS, end-to-end TCP connections may be split at a TCP proxy and converted into MPTCP spanning multiple paths between the proxy and the user. However, handling unreliable traffic poses a distinct challenge and calls for a different approach, e.g., a multipath tunnel that need not guarantee reliable delivery. Moreover, any reliable protocol whose connections cannot be split, such as end-to-end encrypted QUIC, also requires a multipath tunnel within the ATSSS architecture. Recently, MP-QUIC and MP-DCCP have emerged as viable candidates for providing multipath support for various transport-layer traffic, including UDP, QUIC, and TCP, as well as lower-layer protocols such as IP or Ethernet. The unique capabilities of MP-DCCP, notably its ability to resolve packet reordering, make it an attractive option for tunneling any type of traffic, including TCP, UDP, or Ethernet, mitigating concerns regarding multipath head-of-line blocking.
In this paper, we employ both network emulations and a physical test bed (for a brief overview, see
Figure 1) to assess the impact of different CC algorithms (CCA) on capacity aggregation within a multipath tunnel context. For network emulations, we use a user-space multipath tunneling framework based on MP-DCCP, where we evaluate the achievable performance when using both TCP and DCCP as tunneling protocols. A scheduler of particular interest in the ATSSS context is a strict priority scheduler [
6], which prioritizes the cheaper path whenever available, known as the Cheapest Path First (CPF) scheduler. Such a cost-based scheduler, which uses the free WiFi connection whenever it is available, is likely to be preferred by telecom operators and users alike, as it can divert traffic away from expensive cellular networks. We identify several aspects that influence the achievable aggregation when using CPF and demonstrate that this impact is specific to the scheduler in use.
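To make the scheduling rule concrete, the following minimal sketch illustrates a strict-priority, cost-based selection of the kind CPF performs; the Path class and its fields are illustrative assumptions, not the interface of the actual MP-DCCP scheduler.

```python
# Minimal sketch of a Cheapest Path First (CPF) decision. The Path class
# and the cwnd_has_room flag are illustrative assumptions; the real
# MP-DCCP scheduler is more involved and operates in kernel space.

from dataclasses import dataclass

@dataclass
class Path:
    name: str
    cost: float          # e.g., WiFi cheaper than cellular
    cwnd_has_room: bool  # set by the per-path tunnel congestion control

def cpf_select(paths):
    """Return the cheapest path whose tunnel CC currently allows sending."""
    for path in sorted(paths, key=lambda p: p.cost):
        if path.cwnd_has_room:
            return path
    return None  # all paths blocked; the packet waits at the proxy

# Spill-over to the expensive path happens only when the cheap path is blocked:
wifi = Path("wifi", cost=0.0, cwnd_has_room=False)
cellular = Path("cellular", cost=1.0, cwnd_has_room=True)
assert cpf_select([wifi, cellular]).name == "cellular"
```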
Our results demonstrate both the gains when using multipath tunnels to increase the throughput over using just a single wireless access and the challenges to achieve effective capacity aggregation. We focus on the impact of nested congestion control, with one CC instance in the server and another CC instance for each path in the tunnel proxy. The challenges for nested CC are numerous, stemming from ATSSS proxy positioning, end-to-end RTT, tunnel reliability in the face of packet loss, bottleneck buffer size, flow size, and a combination of server and tunnel CCA. To illustrate the scale of the challenges,
Figure 2 presents the download time when a multipath tunnel and the CPF scheduler are utilized, normalized against the download time when single-path end-to-end (E2E) transport is used exclusively. Ideally, the multipath solution should reduce the download time by 50%, as each path between the proxy and the UE is provided with the same capacity. A relative download time exceeding 100% indicates a lower throughput than using a single path. The four bars in
Figure 2 compare aggregation for far and near proxy deployments, when using either UDP or TCP end to end. The top two bars represent end-to-end UDP, showing a download time of approximately 50% for both proxy deployments. The bottom two bars depict end-to-end TCP, showing a download time that is cut in half when the proxy is situated close to the user but unchanged relative to the single-path download when the proxy is far from the user. In other words, the addition of an end-to-end congestion control mechanism may have a very negative effect on aggregation, contingent on some other parameter, such as the proxy deployment.
Our findings in this paper demonstrate several key insights: (i) BBRv2 emerges as the preferred choice for a tunnel congestion control algorithm (CCA), demonstrating its ability to effectively circumvent congestion and buffer overflows; (ii) partial reliability for the tunnel framework may be useful to mask loss between the UE and the proxy as some unreliable traffic may still use a CCA that reacts to loss; (iii) CPF performs poorly and is often unable to achieve aggregation; (iv) while our primary focus is on CPF, we also highlight that the commonly used Shortest Round-Trip Time First (SRTT) scheduler encounters fewer challenges within our multipath tunnel framework with respect to throughput aggregation.
The remainder of this paper is organized as follows:
Section 2 presents background information on MP-DCCP and our user-space multipath framework based on MP-DCCP.
Section 3 discusses related work.
Section 4 describes the setup of our test environment and the evaluation scenarios used.
Section 5 presents our results. The paper concludes and outlines future work in
Section 6.
3. Related Work
The work described in this paper lies at the intersection of (i) transport-layer congestion control, (ii) multipath, and (iii) tunneling. Although there is a plethora of related research, previous work has focused only on specific aspects. While there is work on TCP-in-TCP single-path tunneling, the effects investigated in this paper do not appear without multiple paths and a multipath packet scheduler. Similarly, much of the work on multipath scheduling does not consider the performance when a congestion control algorithm runs on top of the scheduler, which is the case in the tunneling context.
Traditionally, CC in multipath networks has been considered an important challenge since the early works on integrated routing and CC in multipath networks [
14] and in the general case of wireless mesh networks [
15]. MPTCP, the dominant multipath transport-layer protocol, establishes a connection composed of multiple sub-flows, each one managing its own individual CC and re-transmission states. A known design problem with CC in MPTCP is presented in [
16], and MPTCP CC approaches are summarized in [
17]. Originally, each MPTCP sub-flow CC instance was independent, i.e., uncoupled. However, due to unfairness when run in parallel with single-path TCP flows [
18], new semi-coupled and closely coupled algorithms emerged that limited the increase in the congestion window variable on some paths. Although the aforementioned research did address
parallel congestion control, it did not address the interactions between
nested congestion control approaches which arise when a transport-layer tunnel is deployed; the latter is the focus of this paper, and is made highly relevant by, e.g., the 5G ATSSS architecture.
Other work [
19] has shown the limitations of coupled CC in wireless networks, where the
head-of-line (HoL) blocking problem of MPTCP is further increased due to the volatility of wireless links, and packet loss due to transmission errors and re-transmissions negatively affect both the average throughput and its stability. MP-QUIC [
3] can overcome HoL blocking with QUIC’s stream-based semantics, demonstrating an advantage over TCP, which lacks streams. However, multipath HoL blocking remains a potential issue as long as a flow or a stream is
split over multiple paths and is particularly pronounced if the protocol is reliable and thereby
never releases packets out of order. Other avenues that have been explored for CC in multipath environments include the application of reinforcement learning [
20], or deep-Q learning [
21]. While this is interesting from our perspective, our aim in this paper is primarily to identify performance problems. Another approach is to use forward error correction (FEC) to mitigate head-of-line blocking [
22]. FEC or network coding has also been combined with rate allocation across different paths to maximize performance [
23,
24]. Finally, solutions have been proposed for application-layer congestion management for video streaming and real-time video traffic in multipath [
25,
26], utilizing delay information to react to congestion earlier than loss-based algorithms do.
Multipath solutions may be deployed at different levels in the network stack. A multipath tunnel solution at the
network layer [
10,
27,
28] has the advantage that the Internet Protocol (IP) is universally used, and that
any transport-layer protocol can run on top, i.e.,
a single architecture can serve all traffic types. The main advantage of splitting or converting at the
transport layer [
29,
30] is that many transport-layer protocols include CC and therefore also path state detection; however, any such solution can only handle a single transport-layer protocol; for instance, a solution that converts between TCP and MPTCP [
31,
32] can only serve TCP traffic. The above examples of multipath proxies and anchors are either at a layer
other than the transport layer or are based on
converting a single-path protocol to its multipath analogue at a proxy, rather than tunneling at an anchor. If a multipath transport-layer solution is implemented
as a tunnel rather than a protocol converter, the tunnel may serve
all traffic types, but as demonstrated in this paper, issues of adverse interactions between nested CC instances arise.
When transport-layer tunnels have been considered, they have typically been
single-path tunnels. Issues explored include how to use TCP tunnels to force CC upon UDP [
13], the usage of TCP tunnels with network coding over high-loss links [
33], the performance difference between TCP and UDP tunnels [
12], or when TCP-over-TCP is helpful and when it is harmful [
11]. Obviously, that research lacks the multipath component which is a key aspect of this paper. A very recent development is
Multiplexed Application Substrate over QUIC Encryption (MASQUE) [
34,
35], whereby multiple end-to-end connections may use a single QUIC tunnel. One possible use case is for end-to-end QUIC; as end-to-end QUIC traffic is encrypted, a tunnel is required to support network functionality that has historically been implemented as a TCP proxy [
36]. With a future multipath QUIC, MASQUE could serve a function very similar to that of MP-DCCP in this paper.
There are several CC algorithms (CCA) in common use today. These typically regulate the sending rate by limiting the amount of in-flight data to no more than the congestion window (CWND). NewReno increases its CWND linearly and is thereby often poor at utilizing
high bandwidth-delay product (BDP) paths,
i.e., paths having a high throughput or a long RTT. Cubic [
37], on the other hand, increases its CWND using a cubic function. The rationale behind this design choice is that a cubic function helps the CCA to plateau at a CWND that ensures the full utilization of the path, without consistently filling the bottleneck buffer. A cubic growth function also ramps up more quickly on high-BDP paths. BBR is designed based on a run-time model of the path that BBR maintains [
38]. This requires BBR to monitor the achievable throughput and the minimum round-trip time. The product of these two is the BDP, and BBR aims to fully utilize the link by keeping an amount of in-flight data equal to or greater than the BDP. BBR version 2 (BBRv2) is currently being developed as an improvement on the original BBR and will, among other things, react to packet loss,
unlike BBR. To understand the issues that arise in this paper, and how to solve them, it becomes necessary to understand and analyze the CC algorithms used. The particulars of a given tunnel CCA may well drive certain problems, as well as constitute a solution in other circumstances, as our results demonstrate. Although we do not take this step in this paper, it is fairly clear that the problems we have illustrated could be solved by using a CCA designed for the transport-layer multipath tunneling context.
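For reference, the standard formulations behind these descriptions can be sketched as follows; the symbols follow RFC 8312 for Cubic and the BBR design papers, and actual implementations may use slightly different constants.

```latex
% Cubic window growth: W_max is the window just before the last loss event,
% K the time needed to grow back to it, C a scaling constant, and beta the
% multiplicative-decrease factor.
W_{\mathrm{cubic}}(t) = C\,(t - K)^{3} + W_{\max}, \qquad
K = \sqrt[3]{\frac{W_{\max}\,(1-\beta)}{C}}

% BBR path model: the product of the bottleneck-bandwidth estimate and the
% minimum RTT gives the BDP; BBR keeps roughly cwnd_gain x BDP in flight
% to fully utilize the link.
\mathrm{BDP} = \mathrm{BtlBw} \cdot \mathrm{RTT}_{\min}, \qquad
\mathrm{inflight} \approx \mathrm{cwnd\_gain} \cdot \mathrm{BDP}
```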
Packet scheduling algorithms are actively being researched. It is known that a poor traffic scheduling strategy cannot fully utilize multipath resources, and that sending too much data over a long-RTT path may cause blocking on that path [
39]. The min RTT scheduler [
40] is the default scheduler for MPTCP and prioritizes the path with the shortest estimated RTT. However, min RTT does not estimate how much traffic should be sent over each path and can be improved, for example, by using a Blocking Estimation-based MPTCP scheduler algorithm [
41], which adds such an estimation. More recent work considers how to avoid blocking for streams in QUIC [
42]. Other scheduling algorithms include Earliest Completion First [
43], Peekaboo [
44], DEMS [
45], and QSAT [
46]. Our work here focuses on the CPF scheduler, which, much like nested CC, is made particularly relevant by the 5G ATSSS architecture, where a cheap path, e.g., WiFi, is often available.
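For contrast with the strict-priority CPF rule, a minimal min-RTT (SRTT-based) selection could look as follows; the Path fields are illustrative assumptions, and real MPTCP/MP-DCCP schedulers add further checks (e.g., blocking estimation) on top of this rule.

```python
# Minimal sketch of a min-RTT (SRTT-based) scheduling decision. The Path
# class and its fields are illustrative assumptions only.

from dataclasses import dataclass

@dataclass
class Path:
    name: str
    srtt_ms: float       # smoothed RTT estimate maintained per path
    cwnd_has_room: bool  # whether the per-path CC allows sending now

def srtt_select(paths):
    """Return the sendable path with the lowest smoothed RTT, else None."""
    candidates = [p for p in paths if p.cwnd_has_room]
    return min(candidates, key=lambda p: p.srtt_ms) if candidates else None

# Unlike CPF, traffic shifts as soon as queuing inflates the cheap path's RTT
# past that of the other path, not only when the cheap path is fully blocked.
wifi = Path("wifi", srtt_ms=35.0, cwnd_has_room=True)      # queue building up
cellular = Path("cellular", srtt_ms=30.0, cwnd_has_room=True)
assert srtt_select([wifi, cellular]).name == "cellular"
```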
5. Results
In our evaluation, we aimed to determine which factors impacted the capacity aggregation of our multi-access system, for different CCA combinations. We pick up from the results in
Section 1, where aggregation varied from nearly optimal to non-existent. In particular, in our evaluation, we aimed to answer the following questions:
How does the proxy deployment impact the performance of the multipath framework (cf.
Section 5.1)?
What is the impact of bottleneck buffer sizes for different congestion controls (cf.
Section 5.2 and
Section 5.3)?
How do different nested congestion control algorithm combinations impact download completion times (cf.
Section 5.4)?
To present our results, we plotted examples of achievable throughput over time, with the per-path throughput stacked. Additionally, we plotted the average throughput for multiple tests using bar diagrams. For a more comprehensive presentation, we used contour plots to illustrate the impact of the proxy placement and the end-to-end RTT. This involves varying both the end-to-end RTT and the percentage of that RTT within the tunnel itself. For instance, if the end-to-end RTT was 40 ms and the tunnel RTT accounted for 25%, then the tunnel RTT was 10 ms and the RTT from proxy to server was 30 ms. We organized the results in batches of 400 tests, where both the end-to-end RTT and the tunnel RTT were randomly selected. The former ranged from 0 to 100 ms, and the latter was determined as a percentage of the end-to-end RTT, ranging from 0 to 100%. To distribute the points more evenly, any point too close to a previous point was discarded and regenerated. The same distribution of points was used for all batches. To convey the information effectively, we used color coding. A deep blue color indicates good aggregation, with throughput reaching 95 Mbps—double the single-path throughput. A deep red color indicates no aggregation at all, with throughput as low as 40 Mbps—less than the single-path throughput.
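As an illustration of the sampling procedure described above, the sketch below generates such a point set; the function name, the minimum spacing, and the seed are assumptions made for illustration rather than the exact tooling used in our experiments.

```python
# Sketch of the test-point generation: the end-to-end RTT is drawn from
# 0-100 ms, the tunnel RTT as a random percentage of it, and any point too
# close to a previously accepted point is discarded and regenerated. The
# minimum spacing and seed are illustrative assumptions.

import math
import random

def generate_points(n=400, min_dist=2.0, seed=1):
    """Return n (e2e_rtt_ms, tunnel_pct) pairs with a minimum spacing."""
    rng = random.Random(seed)
    points = []
    while len(points) < n:
        e2e_rtt = rng.uniform(0.0, 100.0)     # end-to-end RTT in ms
        tunnel_pct = rng.uniform(0.0, 100.0)  # tunnel share of that RTT, in %
        if all(math.hypot(e2e_rtt - x, tunnel_pct - y) >= min_dist
               for x, y in points):
            points.append((e2e_rtt, tunnel_pct))
    return points

# The same point set is reused for every batch so that batches are comparable.
# For example, e2e_rtt = 40 ms with tunnel_pct = 25 corresponds to a 10 ms
# tunnel RTT and a 30 ms proxy-to-server RTT.
points = generate_points()
```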
5.1. Impact of Proxy Deployment When Using BBR over BBR as Congestion Control
As illustrated in the introduction, achievable performance is heavily influenced by the proxy’s location relative to the user (see
Figure 2). This is mainly due to nested congestion control and buffer bloat, which prevent the server from pushing enough packets for the CPF scheduler to make use of the available capacity on the second path. We expected that any solution capable of mitigating buffer bloat would likely improve performance. We focused here on BBR over BBR for the CCs, using TCP with BBR end to end, and tunnels between the proxy and the UE running either TCP with BBR or BBRv2, or DCCP with BBR (CCID5).
Figure 4 shows the per-tunnel throughput over time for four experimental tests when BBR was the end-to-end congestion control algorithm applied. Each of the two paths offered a maximum of 50 Mbps, and it can be seen from
Figure 4 that different scenarios offered very different levels of traffic aggregation over multiple paths. In the top case, when RTT was short and the server was positioned close to the proxy (the starting point of the multipath tunnels running BBR), full aggregation was achieved. In the second case, also using BBR tunnels but with the server now being far away from the proxy, there was no aggregation and only one path was used. The CCID5 tunnels—plots 3 and 4—resulted in partial aggregation. The reasons for the results presented in
Figure 4 are explained in more detail below, using a larger set of results given in
Figure 5. In that Figure, the tunnel server-side endpoint location is specified as the tunnel RTT, as a percent of the end-to-end RTT; a high percent means the proxy (PRX) is close to the server.
The following key take-away messages highlight our observations:
A proxy near the user increases the aggregation ratio. According to
Figure 5, minimal aggregation is achieved when approximately 20% or less of the end-to-end RTT is spent between the server and proxy, as indicated near the upper
x-axis. The color scheme uses white to indicate a secondary path utilization of approximately 50% or an aggregated throughput of around 70 Mbps. The deep red indicates no aggregation at around 45 Mbps throughput. Poor aggregation results from the server and tunnel BBR instances measuring a similar minimum RTT, leading to similar timing and poor interaction between the two CC control loops.
Aggregation does not occur if the end-to-end RTT is short.
Figure 5 shows minimal aggregation when the end-to-end RTT is less than approximately 20 ms, as demonstrated near the left
y-axis. This result is particularly disappointing, given the general expectation that a short RTT would typically be advantageous for aggregation in a distributed system. Interestingly, this issue applies to a lesser extent when tunneling BBR over CCID5, although tunnel RTTs shorter than about 10 ms still notably restrict aggregation. This problem appears to be driven by BBR exhibiting an effective gain higher than two at low RTTs; moreover, at low RTTs, the absolute time difference between the two CC control loops is small, even when the relative difference is large.
BBRv2 tunnels achieve better aggregation than BBR tunnels.
Figure 6 and
Figure 7 illustrate aggregation achieved over BBRv2, normalized against (left) not using a tunnel and (right) using BBR tunnels. In detail,
Figure 6 compares the aggregate throughput of BBRv2 tunnels against single-path transport, while
Figure 7 compares BBRv2 against multipath transport over BBR. The latter shows that BBRv2 achieves higher aggregation than BBR when (i) the tunnel RTT is very long, (ii) CoDel is used at the bottleneck, or (iii) the bottleneck buffer is relatively small. This is not a surprising result, considering that BBRv2—unlike BBR—reacts to loss, and all the aforementioned scenarios generate proportionally more packet loss, as the buffer becomes smaller relative to the BDP.
5.2. Cubic over BBR and Inadequate Bottleneck Buffer Size
In this section, we analyze how the degree of the achieved aggregation depends on the bottleneck buffer size relative to the number of packets that BBR attempts to keep within that buffer. It is worth noting that this number is typically around one bandwidth-delay product.
Figure 8a shows four examples of throughput over time. As with the plots given in
Figure 4, in
Figure 8a, we observe the level of aggregation for four different testing scenarios. This time, Cubic end-to-end congestion control is applied over different tunnels and with different locations of the server relative to the multipath proxy. We can see again that full aggregation at a 100 Mbps rate is achieved only in the top case, with the other three scenarios offering much lower levels of aggregation. The reasons for this are explained in detail below, using the results from
Figure 8b.
Figure 8b,c show contour plots of a large set of results; the tunnel server-side endpoint location is specified as the tunnel RTT, as a percent of the end-to-end RTT; a high percent means the proxy is near the server. For each plot, the path asymmetry was set to 20 ms. The circle segment lines indicate a constant tunnel RTT corresponding to the maximum queuing delay, for the following bottleneck buffer sizes: a 25 ms tunnel RTT when the buffer was 250 packets large; 50 ms when the buffer was 500 packets large; no line when the buffer size was 1000 packets.
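The constant-tunnel-RTT reference lines follow from the usual relation between a drop-tail buffer and the maximum queuing delay it can add; written generally, with B the buffer size in packets, S the packet size in bits, and C the bottleneck rate, and leaving the testbed's concrete values aside:

```latex
% Maximum queuing delay added by a drop-tail bottleneck buffer of B packets
% of S bits each, drained at rate C, and the resulting upper bound on the
% tunnel RTT.
d_{\mathrm{queue}}^{\max} = \frac{B \cdot S}{C}, \qquad
\mathrm{RTT}_{\mathrm{tun}}^{\max} \approx \mathrm{RTT}_{\mathrm{tun}}^{\mathrm{base}} + d_{\mathrm{queue}}^{\max}
```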
A long tunnel RTT reduces aggregation. In
Figure 8b, a circular segment line serves as a reference, representing all points where the tunnel RTT is constant at 25 ms. Clearly, a steep gradient extends from the origin towards the top-right corner, indicating a significant influence of the tunnel RTT on aggregation. This is consistently observed across all nine plots in
Figure 8c, albeit not as steep. As we move along this gradient towards the top-right corner, BBR/CCID5 becomes more likely to overflow the bottleneck buffer.
Figure 8a illustrates this issue using time-series plots—a longer tunnel RTT results in less aggregation.
A small bottleneck buffer reduces aggregation. Examining the middle row in
Figure 8c, we observe the red circle segment shifting towards the upper-right corner as the bottleneck buffer size increases. A larger bottleneck buffer reduces the risk of the buffer overflowing when tunneling Cubic over BBR. The sharp shift from blue to red in the middle row indicates a hard limit on how small the buffer can become before aggregation is severely impacted. This pattern repeats in the other two rows, highlighting the necessity for the bottleneck buffer to be large relative to the tunnel RTT.
BBRv2 tunnels enable more aggregation than BBR tunnels. Examining the top row of
Figure 8c, we observe that BBRv2 yields more aggregation than BBR or CCID5. This is not surprising, as BBRv2 reacts to loss to some degree, whereas BBR and CCID5 do not. Therefore, we may expect BBRv2 to avoid buffer overflows to a greater extent than BBR or CCID5. Poor aggregation is in this case caused by BBR or CCID5 trying to maintain more in-flight packets than the bottleneck buffer can hold.
A receiver wait limit that allows for re-transmissions is beneficial. When using BBR as the tunnel CCA across a set of scenarios, we found that (i) an added margin of 150 ms decreased the download completion time (DLCT) to 68.7% of the single-path DLCT and to 76.2% of the multipath DLCT obtained with the standard margin of 5 ms; (ii) when using BBRv2 as the tunnel CCA, the corresponding numbers were 65.4% and 74.0%, respectively.
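A minimal sketch of the receiver-side rule behind this wait limit is given below; the class, attribute names, and the way the wait budget is measured are illustrative assumptions rather than the actual MP-DCCP reordering module.

```python
# Sketch of an in-order release rule with a wait limit: an out-of-order
# tunnel packet is held until either the gap is filled (e.g., by a
# re-transmission on a reliable tunnel) or the wait budget expires, at
# which point the gap is skipped. Names and the budget bookkeeping are
# illustrative assumptions.

import time

class ReorderBuffer:
    def __init__(self, wait_margin_s=0.150):  # 150 ms margin from the text above
        self.wait_margin_s = wait_margin_s
        self.next_seq = 0
        self.held = {}  # seq -> (payload, arrival_time)

    def on_packet(self, seq, payload):
        """Store an arriving tunnel packet and return everything releasable."""
        self.held[seq] = (payload, time.monotonic())
        return self._release()

    def _release(self):
        out = []
        while True:
            if self.next_seq in self.held:  # in order: release immediately
                out.append(self.held.pop(self.next_seq)[0])
                self.next_seq += 1
            elif self.held and (time.monotonic() -
                    min(t for _, t in self.held.values())) > self.wait_margin_s:
                # Waited long enough for a re-transmission: skip the gap and
                # continue releasing from the oldest held sequence number.
                self.next_seq = min(self.held)
            else:
                return out
```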
5.3. BBR over NewReno and Excessive Bottleneck Buffer Size
Figure 9a shows four examples of throughput over time generated in the user-space framework, while
Figure 9b depicts results from the MP-DCCP testbed.
Figure 9c,d show contour plots of a large set of results; the tunnel server-side endpoint location is specified as the tunnel RTT, as a percent of the end-to-end RTT; a high percent means the proxy is near the server. In the figures, the prefixes
A and
B denote the asymmetry in ms and buffer size in packets. RTT values obtained in the kernel-space testbed are labeled * while the RTT values set in the user-space emulation are labeled **.
This section examines an issue that becomes particularly evident when tunneling BBR over NewReno. Rather counter-intuitively, the challenge here is that the RTT has to be long enough for BBR at the server to retain more in-flight traffic than tunnel-NewReno can sustain over the primary path. It is important to highlight that this is the opposite of the problem described in Section 5.2.
A short RTT inhibits aggregation. In
Figure 9c, we see a nearly vertical red-blue divide where the end-to-end RTT is around 40 ms.
Figure 9a plots the throughput over time for selected results from the user-space framework, while
Figure 9b provides further real-world testbed results obtained using the Linux kernel implementation of MP-DCCP. Matching these results with the corresponding points in
Figure 9d demonstrates that these examples are consistent with the results in
Figure 9c.
Excessive bottleneck buffer size inhibits aggregation. The vertical red-blue divide in
Figure 9c is reproduced in all six plots in
Figure 9d, albeit not at the same RTT. The key pattern that can be observed is that the vertical line shifts to the right as the bottleneck buffer size increases. It is well known that NewReno tends to fill bottleneck buffers, so the buffer size determines the tunnel RTT. As a result, a larger bottleneck buffer necessitates a longer end-to-end RTT before the average number of packets kept in flight end to end by BBR exceeds the number kept in flight by NewReno over the primary path tunnel.
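The pattern can be summarized by the following rough condition, a sketch that ignores BBR's gain cycling and pacing dynamics: CPF only feeds the secondary path once the server's in-flight data exceeds what NewReno holds on the primary tunnel, i.e., roughly one base tunnel BDP plus the bottleneck buffer B (all expressed in the same units).

```latex
% Hedged sketch of the spill-over condition implied above.
\underbrace{\mathrm{BtlBw}\cdot \mathrm{RTT}^{\min}_{e2e}}_{\text{BBR in-flight at the server}}
\;\gtrsim\;
\underbrace{\mathrm{BtlBw}\cdot \mathrm{RTT}^{\mathrm{base}}_{\mathrm{tun}} + B}_{\text{NewReno in-flight on the primary tunnel}}
```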
5.4. Performance Comparison of Nested CCA Combinations
This section presents the achieved aggregation results for all CCA combinations and scenarios described in
Section 4.4. Each scenario was repeated 10 times for every combination of server and tunnel CCA. Results are presented as the download completion time (DLCT) when downloading a 300 MB file, relative to the DLCT without a tunnel. Recall that the optimal DLCT is around 50% when using the multipath framework, as multipath would double the available capacity. A relative DLCT of more than 100% indicates that the DLCT increases when using multipath compared to a single path, which clearly should be avoided.
Figure 10 shows the average DLCT over a multipath tunnel relative to the average DLCT without a tunnel for a set of 15 scenarios, repeated 10 times for each CCA combination. In the figure, the scenario labels indicate the RTT, the bottleneck MAC-layer buffer size (in packets) on both paths, the difference in path RTT (indicated by a prefix), whether the proxy is near to or far from the user, and whether CoDel was used at the bottleneck. The bright and dark highlights help to indicate the RTT; each result is paired with the result of the opposite scenario for easy comparison. The results in
Figure 10a,b are a subset of those in
Figure 10c–f.
Figure 10a,b arrange these results by scenario for a subset of all scenarios, while
Figure 10c–f arrange the same results by CCA combination. Here, we can see how some configurations particularly impact certain CCA combinations.
BBR over BBR/CCID5: The four top bars in
Figure 10c demonstrate that when tunneling BBR over BBR the performance improves when the proxy is deployed near the user, as explained in
Section 5.1.
Figure 10c shows that a small bottleneck buffer or presence of CoDel also inhibits aggregation, likely due to BBR’s limited reaction to loss. A similar pattern is observed when tunneling BBR over CCID5, albeit with a more favorable outcome for CCID5. Our conclusion is that CCID5 performs better, in part because it bases its congestion window (CWND) on packets rather than bytes. While it is plausible that the ability of DCCP and CCID5 to avoid single-path head-of-line blocking contributed to this performance disparity, this explanation remains speculative and requires further validation.
Cubic over BBR/CCID5: The six top bars in
Figure 10d demonstrate that a short tunnel and sufficiently large bottleneck buffer are beneficial when tunneling Cubic over BBR/CCID5. As explained in
Section 5.2, these scenarios experience fewer packet losses. CoDel is another source of packet losses, further deteriorating the performance. A similar pattern is seen when tunneling Cubic over CCID5, but the performance is worse for CCID5. The visibly longer minimum-RTT probing periods of CCID5 have an adverse impact on the more sensitive Cubic congestion control algorithm. In addition,
Section 5.2 already illustrated that both using BBRv2 instead of BBR and tuning the tunnel receiver to wait longer for re-transmitted packets could improve the performance significantly.
BBR over NewReno/CCID2: In
Figure 10e, the top eight bars demonstrate that when tunneling BBR over NewReno/CCID2, a long RTT or a small bottleneck buffer results in very good performance. This was previously analyzed in
Section 5.3. With congestion being one of the fundamental challenges, CoDel at the bottleneck may also improve aggregation, although not when the RTT is long. While a long end-to-end RTT improves the performance, this is not something that is in itself desirable or feasible. Our conclusion, given these results, is that NewReno and CCID2 are bad candidates for a tunneling CCA.
Cubic over NewReno/CCID2: This combination was not dealt with in previous sections. Part of the reason we omitted a deeper evaluation of this combination was that we were unable to pose a good single hypothesis as to which factors determined the performance. As when tunneling BBR over NewReno/CCID2, we conclude that NewReno and CCID2 are in themselves bad candidates for a tunneling CCA. As packet loss is part of the congestion control cycle of NewReno/CCID2, recovering from these losses within the tunnel should improve the performance when tunneling Cubic over NewReno/CCID2. This is only possible using NewReno, as CCID2 and DCCP are unreliable. Although these results were omitted for lack of space, when tuning the receiver wait limit such that re-transmissions arrived before a packet was forcibly released, we found that performance improved in almost all scenarios, and most of all in those scenarios where CoDel was used. We see in
Figure 10f that CoDel is very detrimental to the DLCT when CCID2 is used, but much less so when NewReno is used. This is because the packet loss that CoDel generates is partially hidden from the E2E Cubic sender. We also see in
Figure 10f that a big buffer is very detrimental to performance for NewReno, but preferable for CCID2. A smaller buffer makes packet loss more frequent, so this seeming contradiction makes sense, i.e., when using CCID2, we prefer scenarios where packet loss is infrequent.
Summary: Finally,
Table 3 offers a comprehensive overview of the average DLCT across various scenarios, for each combination of congestion control algorithm (CCA) and tunnel protocol, relative to the baseline of using a single path without a tunnel. This presentation allows us to make some determination as to which tunnel CCA is actually preferable. The results are sorted by DLCT for CPF, with lower values indicating better performance. This order is maintained for the SRTT results. It is worth noting that an optimal DLCT hovers around 50%, assuming that paths have identical capacity. A DLCT above 100% indicates worse performance than using a single path. Notably, BBRv2 emerges as the top-performing tunnel CCA, irrespective of whether BBR or Cubic is employed end to end. Alternatively, if NewReno-style congestion control is preferred, reasonably good performance is achievable by pairing the static reordering module with NewReno—rather than CCID2—thereby exploiting the reliability inherent in TCP.
Figure 10b–f show that deploying the proxy near the user is only beneficial when the tunnel CCA is BBR/CCID5, regardless of which end-to-end CCA is in use. The same is true for a large bottleneck buffer. Conversely, a small bottleneck buffer is seen to be preferable only if the tunnel CCA is NewReno/CCID2, regardless of the end-to-end CCA. The exception to this is tunneling Cubic over CCID2, in which case, a large buffer is preferable. In all these cases, the preferred configuration is that which leads to the least amount of congestion and packet loss. With that in mind, it becomes clear why tunneling Cubic over CCID2 shows performance problems, as a small buffer leads to less congestion but also more frequent loss. In contexts like ATSSS, where deploying a transport-layer multipath tunnel is envisioned, adjusting the proxy deployment might be more feasible than modifying the bottleneck buffer size, thereby favoring the adoption of BBR/BBRv2/CCID5 as the tunnel CCA.
Table 3 also includes results obtained using the SRTT scheduler. Compared to CPF, SRTT performed better in all scenarios. One reason for this is that SRTT often switches to the secondary path earlier than CPF, namely, at the point at which congestion on the primary path has caused the RTTs of the two paths to equalize. As mentioned earlier, however, SRTT is often less preferable than CPF in that it makes heavier use of the expensive path.
6. Conclusions and Future Work
In this paper, we examined the viability of employing transport-layer multipath tunnels alongside the CPF scheduler. The driving scenario behind this investigation was the aspiration to efficiently utilize both cost-effective and expensive wireless access, such as wireless LAN and cellular networks. Our analysis showed various performance challenges associated with such a scenario. These challenges existed for both of the server CCAs assessed in this study and can be expected to exist for other algorithms as well. Specifically, when the server used BBR, aggregation occurred more or less perfectly or not at all. When the server used Cubic, the performance tended to improve gradually over time, with aggregation occurring if the transfer time or file size was sufficiently large.
A helpful conceptual framework for understanding these issues is to consider the distribution of the server’s congestion window over multiple paths. A larger server CWND or a smaller CWND over the primary path both increase the likelihood that the server CWND is distributed over the secondary path. Therefore, it is advantageous to inflate the server CWND whenever possible and manage congestion on the primary path. Strategies that may help in this regard include: (i) adjusting the deployment of the proxy relative to the user; (ii) adjusting the bottleneck buffer size; (iii) deploying an AQM at the bottleneck buffer; (iv) changing the CCA used at the tunnel; (v) using a reliable tunnel protocol to allow for re-transmissions of packets within the tunnel; (vi) changing CPF such that it becomes aware of congestion and considers congestion states when making packet scheduling decisions over the different paths.
Related to (v), a potential solution could involve implementing semi-reliability at the scheduler level, which is a planned area of future work. Finally, with respect to (vi), efforts have already begun to adjust CPF [
49,
50]. Specifically, work has been conducted using BBR in the tunnel to enable CPF to leverage knowledge of the bandwidth-delay product, which in turn allows for more intelligent scheduling that limits congestion. Future work on this enhanced CPF scheduler will include improving its ability to handle dynamic paths with fluctuating capacity.