Article

Optically Disaggregated Data Centers With Minimal Remote Memory Latency: Technologies, Architectures, and Resource Allocation [Invited]

Journal

Journal of Optical Communications and Networking

Publisher

Optica Publishing Group
DOI: 10.1364/JOCN.10.00A270

Keywords

Hybrid OCS/EPS; Memory, accelerator, and storage disaggregation; On-board silicon photonic transceivers; Reconfigurable and function embedded architecture

Funding

  1. EU [687632]
  2. Huber+Suhner Polatis
  3. Luxtera
  4. EPSRC [EP/J017582/1] Funding Source: UKRI
  5. Engineering and Physical Sciences Research Council [EP/J017582/1] Funding Source: researchfish

Abstract

Disaggregated rack-scale data centers have been proposed as a promising avenue to break the barrier of fixed CPU-to-memory proportionality imposed by conventional server-centric systems with direct-attached main-tray memory. However, memory disaggregation places stringent requirements on the network in terms of latency, energy efficiency, bandwidth, and bandwidth density. This paper identifies the requirements and key performance indicators of a network capable of disaggregating IT resources and summarizes the progress and importance of optical interconnects. Crucially, it proposes a rack- and cluster-scale architecture that supports the disaggregation of CPU, memory, storage, and/or accelerator blocks. Optical circuit switching forms the core of this architecture, while the end points (IT resources) are equipped with on-chip programmable hybrid electrical packet/circuit switches. The architecture offers a dynamically reconfigurable physical topology from which virtual topologies, each embedded with a set of functions, can be formed. The latency overhead of disaggregated DDR4 (parallel) and of the proposed Hybrid Memory Cube (serial) memory elements is analyzed for both the conventional and the proposed architecture. A set of resource allocation algorithms is introduced to (1) optimally select disaggregated IT resources with the lowest possible latency, (2) pool them together by means of a virtual network interconnect, and (3) compose virtual disaggregated servers. Simulation results show up to a 34% increase in resource utilization over traditional data centers while highlighting the importance of placement and locality among compute, memory, and storage resources. In particular, the network-aware, locality-based resource allocation algorithm achieves memory transaction round-trip latencies as low as 15 ns, 95 ns, and 315 ns on 63%, 22%, and 15% of the allocated virtual machines (VMs), respectively, while utilizing 100% of the CPU resources. Furthermore, a formulation to parameterize and evaluate the additional financial cost incurred by disaggregation is reported; it is shown that the more diverse the VM requests, the higher the net financial gain. Finally, an experiment using silicon photonic mid-board optics and an optical circuit switch demonstrates forward-error-correction-free 10⁻¹² bit-error-rate performance on up to five-tier scale-out networks.
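To make the locality-based allocation step more concrete, the sketch below shows how a greedy, network-aware allocator of the kind summarized in the abstract might compose a virtual disaggregated server. This is a minimal illustrative assumption, not the paper's actual algorithm: the class names, block identifiers, and data structures are invented for the example, and the latency tiers only loosely mirror the 15/95/315 ns figures quoted above.

```python
# Minimal sketch of a network-aware, locality-based resource allocator.
# All names and numbers are illustrative assumptions, not the paper's implementation.

from dataclasses import dataclass, field
from typing import List, Optional

# Assumed round-trip latency tiers (ns): same tray, same rack via the optical
# circuit switch, and cross-rack/cluster reach.
TIER_LATENCY_NS = {"tray": 15, "rack": 95, "cluster": 315}

@dataclass
class MemoryBlock:
    block_id: str
    tier: str            # "tray", "rack", or "cluster" relative to a CPU block
    free_gb: int

@dataclass
class CpuBlock:
    block_id: str
    free_cores: int
    reachable_memory: List[MemoryBlock] = field(default_factory=list)

def allocate_vm(cpu_blocks: List[CpuBlock], cores: int, mem_gb: int) -> Optional[dict]:
    """Greedily place a VM: pick a CPU block with enough free cores, then pool
    memory from the lowest-latency tiers first (locality-based selection)."""
    for cpu in sorted(cpu_blocks, key=lambda c: -c.free_cores):
        if cpu.free_cores < cores:
            continue
        # Consider candidate memory blocks nearest-tier first.
        candidates = sorted(cpu.reachable_memory,
                            key=lambda m: TIER_LATENCY_NS[m.tier])
        plan, remaining = [], mem_gb
        for mem in candidates:
            if remaining <= 0:
                break
            take = min(mem.free_gb, remaining)
            if take > 0:
                plan.append((mem, take))
                remaining -= take
        if remaining > 0:
            continue  # not enough memory reachable from this CPU block
        # Commit the allocation: this composes one virtual disaggregated server.
        cpu.free_cores -= cores
        for mem, take in plan:
            mem.free_gb -= take
        worst_ns = max((TIER_LATENCY_NS[m.tier] for m, _ in plan), default=0)
        return {"cpu": cpu.block_id,
                "memory": [(m.block_id, gb) for m, gb in plan],
                "worst_case_latency_ns": worst_ns}
    return None  # request cannot be satisfied

# Example: one CPU block with local tray memory plus rack-level disaggregated memory.
tray = MemoryBlock("mem-tray-0", "tray", free_gb=32)
rack = MemoryBlock("mem-rack-3", "rack", free_gb=256)
cpu = CpuBlock("cpu-0", free_cores=16, reachable_memory=[tray, rack])
print(allocate_vm([cpu], cores=8, mem_gb=64))
# -> {'cpu': 'cpu-0', 'memory': [('mem-tray-0', 32), ('mem-rack-3', 32)], 'worst_case_latency_ns': 95}
```

The nearest-tier-first greedy choice is the intuition behind keeping most allocations in the lowest-latency class, analogous to the abstract's reported 63%/22%/15% split of VMs across the 15 ns, 95 ns, and 315 ns latency levels.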

