We all know the feeling: You’re at home browsing the Web or sending an important email message when suddenly your Internet connection slows to a crawl, or stops entirely. Perhaps you can reset your modem to restore service, but often the problem lies elsewhere, in the maze of wide area networks that connect you to the rest of the world. All you can do is call your service provider’s help desk and hope for the best.

Now imagine the plight of military users who rely on these WANs to communicate, only to experience similar network disruptions when accessing mission-critical data. The potential consequences of these disruptions are far more serious.

WAN reliability can fall well short of what mission-critical communication requires. The increasing complexity of WAN infrastructure, along with its susceptibility to cyberattacks, continues to exacerbate both the frequency and duration of service disruptions. Moreover, when disruptions occur, users have little or no visibility into the nature or extent of the problem. This lack of situational awareness hinders their ability to assess the impact on missions or to identify feasible alternative plans.

From their positions at the network edge, enterprises and users need new “fight-through” mechanisms that restore, automatically and in real time, as much connectivity as possible during disruptions without waiting for network operators to diagnose and fix the problem. These mechanisms should convey actionable network situational awareness to decision makers and should also enable enterprises to specify communications priorities for different users, applications, locations and data flows.

Three keys to bolstering resilience from the edge

To ensure network resilience, edge-based systems must observe and understand the nature of disruptions; decide which actions they should take to best mitigate the disruptions, while remaining consistent with priorities; and then effect the mitigations by adapting data flows. These three functions must work closely in tandem and in real time.

- Understanding the network via analytics

WAN operators routinely analyze network measurements to flag performance problems. Many military networks utilize encryption gateways to maintain secure data tunnels across the WAN. Unfortunately, the encryption gateways prevent edge-based systems from observing details of WAN operation and from easily inferring the nature of disruptions. This “blind inference” challenge places new requirements on analysis methods, which must distinguish, for example, between packet drops due to routine network congestion and random drops that could result from, say, hardware malfunction, cyberattack, or route instability. This distinction is important because knowing the nature of the disruption can help to determine how to overcome it.

The good news is that edge-based analytics can often provide a more detailed understanding of users’ application and data-flow performance than is possible in the WAN. Because WAN tunnels carry encrypted aggregations of many data flows, WAN performance monitoring cannot distinguish between different flows within a tunnel, which applications generated them, or what network performance requirements they might have (say, for bandwidth or latency). Given access to unencrypted data flows on either end of a tunnel, edge-based systems can easily make these determinations, allowing them to estimate and govern flow performance with high accuracy.
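To make the “blind inference” problem a little more concrete, here is a minimal sketch of one possible heuristic, not a description of any fielded system. It assumes that congestion-related drops tend to coincide with rising round-trip delay (as queues fill), while random drops do not. The function name `classify_loss`, the thresholds and the sample measurements are all hypothetical.

```python
from statistics import mean

def classify_loss(samples, delay_ratio=1.5, min_loss=0.01):
    """Guess the cause of packet loss from edge-side measurements.

    samples: list of (rtt_ms, loss_fraction) tuples, one per measurement
             interval, as seen from the network edge.
    delay_ratio, min_loss: illustrative thresholds (assumptions).
    """
    # Split intervals into those with meaningful loss and those without.
    lossy = [rtt for rtt, loss in samples if loss >= min_loss]
    clean = [rtt for rtt, loss in samples if loss < min_loss]

    if not lossy:
        return "no significant loss"
    if not clean:
        return "undetermined"  # no loss-free baseline to compare against

    # Assumption: congestion inflates delay while packets are being dropped.
    if mean(lossy) >= delay_ratio * mean(clean):
        return "likely congestion"
    return "likely random"


# Hypothetical measurements: loss appears only when delay spikes.
congested = [(40, 0.0), (42, 0.0), (95, 0.05), (110, 0.08)]
# Hypothetical measurements: loss appears while delay stays flat.
random_drops = [(40, 0.0), (41, 0.03), (42, 0.0), (39, 0.04)]

print(classify_loss(congested))
print(classify_loss(random_drops))
```

A real system would need far more robust statistics, but the sketch shows why the distinction matters: the first case calls for backing off, while the second may call for added redundancy.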

- Determining holistically favorable adaptations

Consider the common case where many enterprise locations share a WAN, and where each location has an edge-based system to bolster communications performance for its users. If all such systems decide, “Let’s send all of our packets in triplicate, just to be safe,” the collective traffic load might destroy WAN performance for everyone. Similarly, if low-priority locations or users grab the lion’s share of available network capacity while higher-priority data flows suffer, the overall enterprise or mission is not well served.

Accordingly, edge-based systems must coordinate their actions when attempting to overcome network disruptions. This coordination is challenging because it must occur rapidly in response to major changes in network conditions (say, the loss of a high-capacity transmission link), and it must occur with a minimum of signaling overhead between the edge-based systems. Otherwise, the signaling overhead might itself cause congestion and service disruption.

- Adaptive transport

Adaptive transport provides an array of actuators that modify data flows on the fly to help users’ applications overcome WAN disruptions. Many such adaptations are possible. The most effective adaptations will depend on the nature of the disruption, the user application, and its priority. For example, one edge-based system could act as a relay for another’s traffic, thus potentially enabling data flows to bypass trouble spots within the WAN. Another possibility is for a system to throttle some data flows, limiting the bandwidth that they consume in favor of other data flows that have higher priority or impending deadlines.
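The throttling idea can be sketched in a few lines. The example below assumes a simple strict-priority policy, in which the most important flows are served first and lower-priority flows receive whatever capacity remains. The function `allocate_bandwidth`, the flow names and the numbers are hypothetical, and a deployed system would likely use a more nuanced policy.

```python
def allocate_bandwidth(flows, capacity_mbps):
    """Grant bandwidth in strict priority order (assumed policy).

    flows: list of dicts with keys "name", "priority" (lower number means
           more important) and "demand_mbps".
    capacity_mbps: capacity currently available across the WAN.
    Returns a dict mapping each flow name to its granted bandwidth.
    """
    remaining = capacity_mbps
    allocation = {}
    for flow in sorted(flows, key=lambda f: f["priority"]):
        # Serve each flow fully if possible; otherwise give it what is left.
        granted = min(flow["demand_mbps"], remaining)
        allocation[flow["name"]] = granted
        remaining -= granted
    return allocation


# Hypothetical flows competing for 10 Mbps after a disruption.
flows = [
    {"name": "video", "priority": 2, "demand_mbps": 8},
    {"name": "command", "priority": 1, "demand_mbps": 3},
    {"name": "backup", "priority": 3, "demand_mbps": 10},
]
print(allocate_bandwidth(flows, capacity_mbps=10))
```

In this example the highest-priority flow gets its full demand, the video flow is throttled to the remainder, and the backup waits until capacity returns.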

Additional alternatives include tailoring the transport protocol to best suit the WAN’s observed performance characteristics (such as high latency), adding error-correction overhead to overcome packet loss, or repairing flows whose packets arrive out of order due to hardware malfunction, routing problems or cyberattack. Out-of-order packet delivery can severely impair nearly all user applications and will often go unnoticed by standard WAN performance monitoring tools.
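As an illustration of error-correction overhead, the sketch below uses the simplest possible scheme: one XOR parity packet per group of equal-length data packets, which allows the receiving edge system to rebuild any single lost packet in the group. This is a textbook technique chosen for brevity, not necessarily what a production system would use, and the function names are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(packets):
    """Build one parity packet for a group of equal-length packets."""
    parity = bytes(len(packets[0]))  # all-zero starting value
    for packet in packets:
        parity = xor_bytes(parity, packet)
    return parity

def recover(received, parity):
    """Rebuild a single missing packet (marked None) using the parity."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("one parity packet can repair exactly one loss")
    rebuilt = parity
    for packet in received:
        if packet is not None:
            # XOR-ing out every surviving packet leaves the lost one.
            rebuilt = xor_bytes(rebuilt, packet)
    repaired = list(received)
    repaired[missing[0]] = rebuilt
    return repaired


# Sender side: three data packets plus one parity packet.
group = [b"AAAA", b"BBBB", b"CCCC"]
parity = make_parity(group)

# Receiver side: the second packet was dropped in the WAN.
print(recover([b"AAAA", None, b"CCCC"], parity))
```

The cost is one extra packet per group, which is far less than sending everything in triplicate, at the price of tolerating only one loss per group.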

A technology ripe for widespread deployment

Edge-based systems incorporate features to facilitate their deployment in existing military and enterprise network architectures. These edge-based capabilities should require no specialized hardware, and can be designed to suit a variety of “form factors” that military and commercial deployments demand. This ties in with the telecom industry’s movement toward software-defined networking.

Edge-based systems can help enterprise and military networks bolster resilience and avoid disruption by better understanding network issues, identifying the proper solution and taking action in real time. No help desk calls or long hold times are required.

Tony Bogovic is vice president at Perspecta Labs.