Article

Optically Disaggregated Data Centers With Minimal Remote Memory Latency: Technologies, Architectures, and Resource Allocation [Invited]

Journal

Journal of Optical Communications and Networking

Publisher

Optica Publishing Group
DOI: 10.1364/JOCN.10.00A270

Keywords

Hybrid OCS/EPS; Memory, accelerator, and storage disaggregation; On-board silicon photonic transceivers; Reconfigurable and function embedded architecture

Funding

  1. EU [687632]
  2. Huber+Suhner Polatis
  3. Luxtera
  4. EPSRC [EP/J017582/1] Funding Source: UKRI
  5. Engineering and Physical Sciences Research Council [EP/J017582/1] Funding Source: researchfish

Abstract

Disaggregated rack-scale data centers have been proposed as a promising avenue to break the barrier of fixed CPU-to-memory proportionality imposed by conventional server-centric systems with direct-attached main-tray memory. However, memory disaggregation places stringent requirements on the network in terms of latency, energy efficiency, bandwidth, and bandwidth density. This paper identifies the requirements and key performance indicators of a network capable of disaggregating IT resources and summarizes the progress and importance of optical interconnects. Crucially, it proposes a rack- and cluster-scale architecture that supports the disaggregation of CPU, memory, storage, and/or accelerator blocks. Optical circuit switching forms the core of this architecture, while the end points (IT resources) are equipped with on-chip programmable hybrid electrical packet/circuit switches. The architecture offers a dynamically reconfigurable physical topology from which virtual topologies, each embedded with a set of functions, can be formed. The latency overhead of disaggregated DDR4 (parallel) and of the proposed Hybrid Memory Cube (serial) memory elements is analyzed for both the conventional and the proposed architecture. A set of resource allocation algorithms is introduced to (1) optimally select disaggregated IT resources with the lowest possible latency, (2) pool them together by means of a virtual network interconnect, and (3) compose virtual disaggregated servers. Simulation results show up to a 34% increase in resource utilization over traditional data centers while highlighting the importance of placement and locality among compute, memory, and storage resources. In particular, the network-aware, locality-based resource allocation algorithm achieves memory transaction round-trip latencies as low as 15 ns, 95 ns, and 315 ns on 63%, 22%, and 15% of the allocated virtual machines (VMs), respectively, while utilizing 100% of the CPU resources. Furthermore, a formulation to parameterize and evaluate the additional financial cost incurred by disaggregation is reported; it is shown that the more diverse the VM requests, the higher the net financial gain. Finally, an experiment using silicon photonic mid-board optics and an optical circuit switch demonstrates forward-error-correction-free 10⁻¹² bit-error-rate performance on up to five-tier scale-out networks.
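To make the locality-based allocation step more concrete, the sketch below shows how a greedy, network-aware allocator of the kind summarized in the abstract might compose a virtual disaggregated server. This is a minimal illustrative assumption, not the paper's actual algorithm: the class names, block identifiers, and data structures are invented for the example, and the latency tiers only loosely mirror the 15/95/315 ns figures quoted above.

```python
# Minimal sketch of a network-aware, locality-based resource allocator.
# All names and numbers are illustrative assumptions, not the paper's implementation.

from dataclasses import dataclass, field
from typing import List, Optional

# Assumed round-trip latency tiers (ns): same tray, same rack via the optical
# circuit switch, and cross-rack/cluster reach.
TIER_LATENCY_NS = {"tray": 15, "rack": 95, "cluster": 315}

@dataclass
class MemoryBlock:
    block_id: str
    tier: str            # "tray", "rack", or "cluster" relative to a CPU block
    free_gb: int

@dataclass
class CpuBlock:
    block_id: str
    free_cores: int
    reachable_memory: List[MemoryBlock] = field(default_factory=list)

def allocate_vm(cpu_blocks: List[CpuBlock], cores: int, mem_gb: int) -> Optional[dict]:
    """Greedily place a VM: pick a CPU block with enough free cores, then pool
    memory from the lowest-latency tiers first (locality-based selection)."""
    for cpu in sorted(cpu_blocks, key=lambda c: -c.free_cores):
        if cpu.free_cores < cores:
            continue
        # Consider candidate memory blocks nearest-tier first.
        candidates = sorted(cpu.reachable_memory,
                            key=lambda m: TIER_LATENCY_NS[m.tier])
        plan, remaining = [], mem_gb
        for mem in candidates:
            if remaining <= 0:
                break
            take = min(mem.free_gb, remaining)
            if take > 0:
                plan.append((mem, take))
                remaining -= take
        if remaining > 0:
            continue  # not enough memory reachable from this CPU block
        # Commit the allocation: this composes one virtual disaggregated server.
        cpu.free_cores -= cores
        for mem, take in plan:
            mem.free_gb -= take
        worst_ns = max((TIER_LATENCY_NS[m.tier] for m, _ in plan), default=0)
        return {"cpu": cpu.block_id,
                "memory": [(m.block_id, gb) for m, gb in plan],
                "worst_case_latency_ns": worst_ns}
    return None  # request cannot be satisfied

# Example: one CPU block with local tray memory plus rack-level disaggregated memory.
tray = MemoryBlock("mem-tray-0", "tray", free_gb=32)
rack = MemoryBlock("mem-rack-3", "rack", free_gb=256)
cpu = CpuBlock("cpu-0", free_cores=16, reachable_memory=[tray, rack])
print(allocate_vm([cpu], cores=8, mem_gb=64))
# -> {'cpu': 'cpu-0', 'memory': [('mem-tray-0', 32), ('mem-rack-3', 32)], 'worst_case_latency_ns': 95}
```

The nearest-tier-first greedy choice is the intuition behind keeping most allocations in the lowest-latency class, analogous to the abstract's reported 63%/22%/15% split of VMs across the 15 ns, 95 ns, and 315 ns latency levels.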

