<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE rfc SYSTEM "rfc2629-xhtml.ent">
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<rfc
      xmlns:xi="http://www.w3.org/2001/XInclude"
      category="info"
      docName="draft-xiong-hpwan-problem-statement-03"
      ipr="trust200902"
      obsoletes=""
      updates=""
      submissionType="IETF"
      xml:lang="en"
      tocInclude="true"
      tocDepth="4"
      symRefs="true"
      sortRefs="true"
      version="3">

 <!-- ***** FRONT MATTER ***** -->

 <front>

   <title abbrev="HP-WAN Problem Statement">Problem Statement for High Performance Wide Area Networks</title>
    <seriesInfo name="Internet-Draft" value="draft-xiong-hpwan-problem-statement-03"/>
   
   <author fullname="Quan Xiong" initials="Q" surname="Xiong">
      <organization>ZTE Corporation</organization>
      <address>
        <postal>
          <street/>
         <city></city>
          <region/>
          <code/>
          <country>China</country>
        </postal>
        <phone></phone>
        <email>xiong.quan@zte.com.cn</email>
     </address>
    </author>

	<author fullname="Kehan Yao" initials="K" surname="Yao">
      <organization>China Mobile</organization>
      <address>
        <postal>
          <street/>
         <city></city>
          <region/>
          <code/>
          <country>China</country>
        </postal>
        <phone></phone>
        <email>yaokehan@chinamobile.com</email>
     </address>
    </author>
	
    <author fullname="Cancan Huang" initials="C" surname="Huang">
      <organization>China Telecom</organization>

      <address>
        <postal>
          <street></street>
          
          <city></city>
          
          <region></region>
  
          <code></code>

          <country>China</country>
        </postal>

        <phone></phone>

        <email>huangcanc@chinatelecom.cn</email>
      </address>
    </author>
	
    <author fullname="Zhengxin Han" initials="Z" surname="Han">
      <organization>China Unicom</organization>

      <address>
        <postal>
          <street></street>
          
          <city></city>
          
          <region></region>
  
          <code></code>

          <country>China</country>
        </postal>

        <phone></phone>

        <email>hanzx21@chinaunicom.cn</email>
      </address>
    </author>
	
	<author fullname="Junfeng Zhao" initials="J" surname="Zhao">
      <organization>CAICT</organization>

      <address>
        <postal>
          <street></street>
          
          <city>Beijing</city>
          
          <region></region>
  
          <code></code>

          <country>China</country>
        </postal>

        <phone></phone>

        <email>zhaojunfeng@caict.ac.cn</email>
      </address>
    </author>		

   <area>Wit</area>
    <workgroup></workgroup>
   <keyword></keyword>
   
   <abstract>
	
	<t>A High Performance Wide Area Network (HP-WAN) is designed for 
	data-intensive applications, such as scientific research, academia, 
	and education, that demand high-speed data transmission over Wide 
	Area Networks (WANs). It needs to provide high-throughput transmission 
    within a requested completion time. This document outlines the problems for HP-WANs.</t>
	  
    </abstract>
  </front>
  <middle>
  
   <section numbered="true" toc="default"> <name>Introduction</name>
	
   <t>As described in <xref target="I-D.kcrh-hpwan-state-of-art" pageno="false" format="default"/>, data is fundamental
   to research, academic, educational, industrial, and other data-intensive 
   applications, such as High Performance Computing (HPC) for scientific 
   research, cloud storage and backup of industrial Internet data, and 
   distributed training of Artificial Intelligence (AI) models. Use cases in 
   non-dedicated networks from public operators, such as large file transfer, 
   traffic across data centers, and traffic sharing between dedicated and 
   non-dedicated networks, are also described in 
   <xref target="I-D.yx-hpwan-uc-requirements-public-operator" pageno="false" format="default"/>.</t>
   
   <t>These applications may generate huge volumes of data by using 
   advanced instruments and high-end computing devices. The data needs to be 
   exchanged among research institutions, universities, and data centers 
   across large geographical areas over long-distance links. For example, 
   data shared between research institutes may have to travel hundreds or 
   thousands of kilometers. The network needs to support large-scale data 
   transfer and provide stable and efficient transmission services over 
   Wide Area Networks (WANs). These applications may require periodic or 
   on-demand high-speed transfers with variable start times, data volumes, 
   and transmission patterns, each of which has to finish within a 
   completion time.</t>
   
   <t>More recently, massive data transmission and long-distance connections 
   over WANs have become key factors affecting the performance of existing
   transport protocols such as the Transmission Control Protocol (TCP), 
   QUIC, and Remote Direct Memory Access (RDMA). Moreover, traditional 
   congestion control algorithms are typically implemented at the hosts 
   (sender and receiver) and perform blind transmission: they control the 
   size of the congestion window and adjust the rate only after detecting 
   overloaded links. Performance is difficult to predict because of the 
   unpredictable behaviour of WANs. For example, a host that is unaware of 
   network capability converges slowly because of slow start and passive 
   rate adjustment, which impacts the completion time. The long feedback 
   loop also leads to fluctuation of the Round-Trip Time (RTT) caused by 
   large buffers and long queues. The network, in turn, carries unscheduled 
   traffic with low bandwidth utilization because of bottleneck links and 
   instantaneous congestion. Concurrent transmission of multiple flows can 
   lead to slow-flow tailing and jitter in Flow Completion Time (FCT). All 
   of the above degrade performance and result in the untimely transmission 
   of high-volume data.</t>
   
   <t>A High Performance Wide Area Network (HP-WAN) is designed for 
   data-intensive applications, such as scientific research, academia, 
   and education, that demand high-speed data transmission over WANs. 
   It needs to provide high-throughput transmission within a requested 
   completion time. This document outlines the specific problems that 
   stand in the way of meeting HP-WAN requirements.</t>
	
    
      <section numbered="true" toc="default"><name>Requirements Language</name>
	  
	 <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
       "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
       "OPTIONAL" in this document are to be interpreted as described in BCP
       14 <xref target="RFC2119" pageno="false" format="default"/> 
	   <xref target="RFC8174" pageno="false" format="default"/> when, and only when, 
	   they appear in all capitals, as shown here.</t>
	   
      </section>
    </section>
	
    <section anchor="Terminology" numbered="true" toc="default"> <name>Terminology</name>
	<t>This document adopts the terminology defined in <xref target="I-D.kcrh-hpwan-state-of-art" pageno="false" format="default"/>. </t>
	
	<t>This document also uses the following abbreviations:</t>
	   
	    <dl newline="false" spacing="normal" indent="15" pn="section-2-3">
		<dt>BDP: </dt>
		<dd>Bandwidth Delay Product</dd>			
		<dt>DC: </dt>
		<dd>Data Center</dd>	
	    <dt>DCI: </dt>
	    <dd>Data Center Interconnection</dd>
	    <dt>HPC: </dt>
	    <dd>High Performance Computing</dd>
	    <dt>WAN: </dt>
	    <dd>Wide Area Network</dd>
		<dt>PFC: </dt>
		<dd>Priority Flow Control</dd>	
	    <dt>ECN: </dt>
	    <dd>Explicit Congestion Notification</dd>
	    <dt>ECMP: </dt>
	    <dd>Equal-Cost Multipath</dd>
	    <dt>RTT: </dt>
	    <dd>Round-Trip Time</dd>
	    <dt>TCP: </dt>
	    <dd>Transmission Control Protocol</dd>
	    <dt>RDMA: </dt>
	    <dd>Remote Direct Memory Access</dd>
	    <dt>QUIC: </dt>
	    <dd>QUIC transport protocol (a name, not an acronym)</dd>	
		<dt>FCT: </dt>
	    <dd>Flow Completion Time</dd>	
		</dl>
    </section>   
   
    
   <section numbered="true" toc="default"><name>Technical Goals for HP-WANs</name>
   
   <t>The services provided by HP-WANs mainly focus on the timely 
   transmission of massive data over long-distance WANs, where multiple 
   services may co-exist, as described below.</t>
  
   <ul spacing="normal">
   <li>Massive data transmission: high-volume data is transferred at high speed, 
   e.g., the rate of a single flow could range from 2 Gbps to 1 Tbps.</li>
   <li>Requested completion time: the data transmission should be completed
   within a requested completion time, which could range from milliseconds 
   to minutes.</li>
   <li>Scheduled transmission: traffic patterns could be scheduled by the 
   sender, e.g., data volume, start time, finish time, and service type.</li>
   <li>Long-distance transmission over non-dedicated WANs: paths have multiple hops
   and domains, long RTT, routing changes, network congestion,
   packet loss, and link quality fluctuations, e.g., the distance between
   two sites or DCs could be more than 100 km or even 1000 km.</li>   
   <li>Co-existence of multiple services: multiple services run with concurrent flows.</li>

   </ul>
   
	<t>It is required to achieve high-speed data transmission within 
	a completion time. Moreover, it is also crucial to maximize bandwidth 
	utilization while ensuring fairness among multiple services. This 
	document outlines the technical goals for HP-WANs as described below.</t>
	
   <ul spacing="normal">
   
   <li>Completion time: achieve the target job completion time, within seconds 
   to minutes, while meeting the FCT requirements of all incoming traffic flows.</li>

   <li>High throughput: ensure high-speed data transmission within
   the requested completion time of a flow, which can be affected by 
   the bandwidth, convergence speed, start time, and RTT.</li>
   
   <li>Efficient use of capacity: use the available network 
   capacity efficiently and fairly to maximize data transfer rates and minimize 
   the completion time of multiple flows.</li> 

   <li>Efficient transmission of concurrent flows: ensure fair sharing of
   link resources among multiple concurrent flows, avoiding the slow-flow tailing and 
   FCT jitter caused by competition among flows.</li>   
   
   </ul>
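	<t>As a rough illustration of the completion-time and throughput goals 
	above, the minimum average rate a flow needs is its data volume divided 
	by the requested completion time. The following Python sketch is not 
	part of any HP-WAN specification; the function name and the example 
	values (1 TB within 10 minutes) are assumptions chosen for illustration.</t>
	<sourcecode type="python"><![CDATA[
```python
def required_rate_gbps(volume_bytes: float,
                       completion_time_s: float) -> float:
    """Minimum average rate in Gbps to finish within the deadline.

    Hypothetical helper for illustration only.
    """
    if completion_time_s <= 0:
        raise ValueError("completion time must be positive")
    return volume_bytes * 8 / completion_time_s / 1e9


# Assumed example: 1 TB (10**12 bytes) requested within 10 minutes.
rate = required_rate_gbps(1e12, 600)
print(f"minimum average rate: {rate:.2f} Gbps")
```
]]></sourcecode>
	<t>This value is only a lower bound. Any time spent below it, for 
	example during slow start or rate probing, has to be recovered by 
	sending above the average later, which is why convergence speed and 
	start time affect the completion time.</t>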
	
	</section>

   
   <section numbered="true" toc="default"> <name>Problem Statement</name>
   
   <t>The specific requirements of HP-WANs may encompass a wide range of 
   aspects. These include transport-related technologies such as proxies, 
   flow control, QoS negotiation, congestion control, admission control,
   and traffic scheduling. They also involve routing-related
   technologies such as traffic engineering, resource scheduling, and load 
   balancing.</t>
   
   <t>Existing network technologies face numerous challenges and fall short
   of meeting performance requirements. This document highlights the key
   issues associated with HP-WANs in the following sub-sections.</t>
   
   
    <section  numbered="true" toc="default"> <name>Poor Convergence Speed</name>
	
	<t>Traditional congestion control mechanisms perform blind transmission:
    they control the size of the congestion window and adjust the rate only 
    after detecting overloaded links. From the host's point of view, the WAN 
	is a black box whose behaviour is unpredictable for high-speed 
	transmission because of multiple hops and domains, long Round-Trip Time 
	(RTT), routing changes, network congestion, packet loss, and link quality 
	fluctuations. The Bandwidth Delay Product (BDP), which represents the 
	maximum amount of data that can be in transit on the network at any given 
	time, varies over WANs, so the amount of inflight data is difficult to 
	predict for host-based congestion control algorithms. This leads to poor 
	convergence speed: the host takes a long time, relative to the requested 
	completion time, to identify the optimal sending rate.</t>
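	<t>To make the variability of the BDP concrete, the following Python 
	sketch computes it for one link rate and several RTTs. The 100 Gbps 
	rate and the RTT values are assumed for illustration and are not taken 
	from any measurement.</t>
	<sourcecode type="python"><![CDATA[
```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """BDP in bytes: bandwidth (bit/s) * RTT (s) / 8."""
    return bandwidth_bps * rtt_s / 8


# Assumed values: a 100 Gbps path whose RTT varies, e.g. after
# a routing change.
LINK_BPS = 100e9
for rtt_ms in (10, 30, 100):
    bdp = bdp_bytes(LINK_BPS, rtt_ms / 1000)
    print(f"RTT {rtt_ms:3d} ms: BDP = {bdp / 1e6:.0f} MB")
```
]]></sourcecode>
	<t>A tenfold change in RTT changes the amount of inflight data needed 
	to fill the path tenfold, and a host has no way to know this in 
	advance.</t>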
	
	<t>For example, algorithms that rely on slow start and blind probing, 
   without awareness of network capability, have long convergence times, 
   such as CUBIC (e.g., over 50 s), BBR (e.g., over 30 s), and BBRv2 
   (e.g., 30 to 50 s). BBR divides the entire process into four states: 
   Startup, Drain, ProbeBW, and ProbeRTT. The probe cycle of the ProbeRTT 
   state is long, e.g., 10 s. Convergence can take multiple probe cycles, 
   which impacts the completion time at the level of seconds. As a result, 
   there is a significant gap between the sending rate chosen by the host 
   and the available network capacity. The transport protocols should 
   signal and collaborate with the network to negotiate the rate at which 
   the host sends traffic.</t>
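	<t>Even in the ideal case, slow start alone needs several RTTs to reach 
	the BDP. The following Python sketch counts those rounds under 
	simplifying assumptions that are not from this document: the congestion 
	window doubles every RTT without loss, the initial window is 10 
	segments of 1460 bytes, and the path is 10 Gbps with a 50 ms RTT. Real 
	convergence takes longer because of loss, queuing, and probing cycles.</t>
	<sourcecode type="python"><![CDATA[
```python
def slow_start_rounds(target_bytes: float,
                      init_cwnd_bytes: float) -> int:
    """RTT rounds until a window doubling per RTT reaches the target."""
    if init_cwnd_bytes <= 0:
        raise ValueError("initial window must be positive")
    rounds = 0
    cwnd = init_cwnd_bytes
    while cwnd < target_bytes:
        cwnd *= 2
        rounds += 1
    return rounds


# Assumed path: 10 Gbps, 50 ms RTT; initial window 10 x 1460 bytes.
RTT_S = 0.05
bdp = 10e9 * RTT_S / 8  # bytes in flight needed to fill the path
rounds = slow_start_rounds(bdp, 10 * 1460)
ramp_s = rounds * RTT_S
print(f"{rounds} RTTs, about {ramp_s:.2f} s to fill the path")
```
]]></sourcecode>
	<t>With these assumed values the ramp-up takes 13 RTTs. The count grows 
	only logarithmically with the BDP, but each round costs a full RTT, so 
	the ramp-up time scales directly with distance.</t>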

    </section>	
	
	<section  numbered="true" toc="default"> <name>Unscheduled Traffic</name>
	
	<t>A host that sends large volumes of unscheduled traffic without 
	collaborating with the network causes instantaneous congestion in WANs. 
	With multiple high-speed flows, the random arrival and departure of 
	unscheduled cross-traffic creates significant fluctuations in the 
	available capacity of WANs. The network infrastructure may struggle to 
	handle high-volume data transfers efficiently if applications do not 
	proactively schedule their traffic. Without awareness of the traffic 
	patterns, the network cannot plan its resource allocation, which leads 
	to low utilization of the bottleneck bandwidth, reduced overall 
	throughput, and uncontrolled completion time.</t>

	<t>For example, HPC applications transmit large amounts of data; the 
	volume of a single flow may range from 10 GB to 1 TB. When the host 
	sends such traffic unscheduled, it causes instantaneous congestion, 
	packet loss, and queuing delay within network devices in WANs, resulting 
	in low throughput. When multiple services with various types of flows 
	co-exist, the optimal bandwidth and transmission time may differ per 
	flow, and flows join and leave at random without being scheduled onto 
	multiple paths and fine-grained network resources, so timely 
	transmission cannot be achieved. The resources of WANs should be 
	scheduled at the network elements along the path to provide predictable 
	capability for high-speed transmission.</t>
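	<t>One way to see why scheduling information helps is a feasibility 
	check that the network could run if transfers announced their volume 
	and deadline in advance. The following Python sketch is a hypothetical 
	illustration, not a mechanism defined by this document. It assumes a 
	single bottleneck link, that all transfers are ready at time zero, and 
	that they are served in earliest-deadline-first order.</t>
	<sourcecode type="python"><![CDATA[
```python
def edf_feasible(requests, capacity_bps: float) -> bool:
    """Can all (volume_bytes, deadline_s) requests meet their deadline?

    Transfers are served earliest-deadline-first on one link.
    """
    by_deadline = sorted(requests, key=lambda r: r[1])
    sent_bits = 0.0
    for volume_bytes, deadline_s in by_deadline:
        sent_bits += volume_bytes * 8
        # Everything due by this deadline must fit in capacity x time.
        if sent_bits > capacity_bps * deadline_s:
            return False
    return True


# Assumed requests on a 100 Gbps link: (volume in bytes, deadline in s).
planned = [(100e9, 10), (500e9, 60)]
print(edf_feasible(planned, 100e9))
print(edf_feasible(planned + [(200e9, 20)], 100e9))
```
]]></sourcecode>
	<t>The first set of requests fits on the link, while adding the third 
	transfer makes the 20-second deadline unreachable. A network that knows 
	this before the transfer starts can reject or reschedule the request 
	instead of discovering the overload through congestion.</t>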
   
   </section> 
   
   <section  numbered="true" toc="default"> <name>Long Feedback Loop</name>
   
   <t>Congestion control algorithms work by controlling the size of the 
   congestion window and adjusting the sending rate based on feedback about 
   the network status. Over long distances, the propagation delay and large 
   RTT delay this feedback, so the sender cannot adjust its transmission 
   rate in a timely manner. This makes it challenging for congestion control 
   over WANs to limit the total amount of data entering the network and keep 
   the traffic at an acceptable level. With high-speed transmission and a 
   long feedback loop, queues grow long and buffers at network devices fill 
   up, which leads to RTT fluctuation. This is especially true when multiple 
   flows target the same aggregating node, where the peak backlog can exceed 
   the device's buffer capacity.</t>
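   <t>The effect of the feedback delay on buffers can be estimated with 
   simple arithmetic. The following Python sketch assumes, for illustration 
   only, that senders keep their rates unchanged for one feedback delay 
   (about one RTT) after the aggregate rate exceeds the egress capacity. 
   All numbers, including the buffer size, are hypothetical.</t>
   <sourcecode type="python"><![CDATA[
```python
def backlog_bytes(flow_rates_bps, egress_bps: float,
                  feedback_delay_s: float) -> float:
    """Queue build-up at an aggregating node before senders react."""
    excess_bps = max(0.0, sum(flow_rates_bps) - egress_bps)
    return excess_bps * feedback_delay_s / 8


# Assumed case: four 40 Gbps flows converge on a 100 Gbps egress
# port, and feedback takes one 50 ms RTT to reach the senders.
queued = backlog_bytes([40e9] * 4, 100e9, 0.05)
BUFFER_BYTES = 100e6  # hypothetical device buffer
print(f"backlog {queued / 1e6:.0f} MB")
print("overflow" if queued > BUFFER_BYTES else "fits")
```
]]></sourcecode>
   <t>Because the backlog grows linearly with the feedback delay, the same 
   overload that a short-RTT network absorbs can overflow the buffer on a 
   long-distance path.</t>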
   
   <t>For example, loss-based congestion control algorithms, such as 
   Reno and CUBIC, depend on packet loss as the congestion signal.
   Explicit Congestion Notification (ECN) can be used to provide 
   end-to-end congestion notification at the IP and transport layers.
   When congestion occurs, the network may signal it by ECN marking or by 
   dropping packets, and the receiver passes this information back to the 
   sender in transport-layer acknowledgements, notifying the source to 
   adjust its transmission rate. These algorithms use slow start and 
   require large buffers, and both are affected by the multiple hops 
   and long RTT of WANs.</t>
   
   <t>Congestion-based congestion control algorithms, such as BBR, 
   depend on measuring congestion. BBR actively measures the 
   bottleneck bandwidth (BtlBw) and the round-trip propagation time (RTprop),
   uses its model to calculate the BDP, and then adjusts the 
   transmission rate to maximize throughput and minimize latency. However, 
   BBR relies on real-time measurement of these parameters. It reduces 
   buffer overflow, but the improvement is not significant under large
   RTT; e.g., retransmissions increase when the buffer size is 
   less than two BDPs, which affects the control precision of BBR
   in long-distance networks.</t>
   </section>
   
   <section  numbered="true" toc="default"> <name>Multi-flow Concurrent Transmission</name>
   
   <t>An AI or HPC job may be decomposed into multiple tasks whose data is transmitted in parallel over a network. 
   Insufficient transmission throughput and blind competition among multiple flows lead to 
   slow-flow tailing and FCT jitter.</t>
   
   <t>For a single flow, traditional congestion control mechanisms implemented on hosts lack explicit rate control, 
   so rate adjustments are unbounded and the transmission rate fluctuates in a sawtooth pattern. 
   When this flow is transmitted concurrently with other flows, it competes with them 
   for bottleneck bandwidth. The competition triggers queuing delay and congestion-induced packet loss, 
   which creates slow flows, makes the completion time of a single flow uncontrollable, 
   and causes tail latency that drags down overall task throughput.</t>

   <t>For multiple flows within a job, passive competition for bandwidth often leads 
   to a cyclical pattern of "peak overflows" (causing queuing delays) and "valley underflows" 
   (causing waiting delays), which results in significant jitter and deviation in the FCTs of the flows. 
   This FCT jitter significantly undermines the reliability and performance of job completion in
   environments with concurrent flows.</t>
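   <t>The impact of unequal bandwidth shares on a job can be illustrated 
   as follows. The Python sketch below uses a deliberately simplified model 
   that is an assumption of this example: each flow keeps a fixed share of 
   the bottleneck for its whole lifetime, so bandwidth released by finished 
   flows is not reused. Flow sizes, shares, and function names are 
   hypothetical.</t>
   <sourcecode type="python"><![CDATA[
```python
def flow_completion_times(sizes_bytes, rates_bps):
    """FCT in seconds of each flow: size in bits / allocated rate."""
    return [s * 8 / r for s, r in zip(sizes_bytes, rates_bps)]


def summarize(fcts):
    """Return (job completion time, FCT spread) in seconds."""
    return max(fcts), max(fcts) - min(fcts)


# Assumed job: four 100 GB flows sharing a 100 Gbps bottleneck.
sizes = [100e9] * 4
fair = flow_completion_times(sizes, [25e9] * 4)
skewed = flow_completion_times(sizes, [40e9, 30e9, 20e9, 10e9])

for name, fcts in (("fair", fair), ("skewed", skewed)):
    jct, spread = summarize(fcts)
    print(f"{name}: job done in {jct:.0f} s, spread {spread:.0f} s")
```
]]></sourcecode>
   <t>Both allocations use the same total bandwidth, but the job finishes 
   only when its slowest flow does, so the skewed shares stretch the job 
   completion time well beyond the fair case.</t>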
   
   </section>
   </section>
   

   <section  numbered="true" toc="default"> <name>Security Considerations</name>
   <t>This document covers several representative applications and
   network scenarios that are expected to make use of HP-WAN
   technologies. The potential use cases do not by themselves raise
   any security concerns or issues, but they may have security 
   considerations from both the use-specific perspective and
   the technology-specific perspective.</t>
   </section>
   <section numbered="true" toc="default"> <name>IANA Considerations</name>
   <t>This document makes no requests for IANA action.</t>
   </section>
	
   <section numbered="true" toc="default"> <name>Acknowledgements</name>
   <t>The authors would like to acknowledge Bin Tan, Guangping Huang, Yao Liu and 
    Zheng Zhang for their thorough review and very helpful comments.</t>
   </section> 
   
  </middle>
  
  <!--  *****BACK MATTER ***** -->

 <back>
 
    <references>
      <name>References</name>
      <references>
        <name>Normative References</name>
        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>
        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8664.xml"/>
        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.9232.xml"/>
        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.7424.xml"/>	
        <xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.3168.xml"/>
		<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.9438.xml"/>
		<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.9331.xml"/>	
		<xi:include href="https://datatracker.ietf.org/doc/bibxml3/draft-yx-hpwan-uc-requirements-public-operator.xml"/>
        <xi:include href="https://datatracker.ietf.org/doc/bibxml3/draft-kcrh-hpwan-state-of-art.xml"/>
      </references>
    </references>
 
 </back>
</rfc>
