An Automatic Performance Model-Based Scheduling Tool for Coupled Climate System Models
October 1, 2019
Nan Ding
Wei Xue
Zhenya Song (宋振亚), corresponding author
Haohuan Fu
Shiming Xu
Weimin Zheng

Abstract
The ability to predict the climate system depends heavily on the efficient integration of observations and simulations of the Earth, which is regarded as a canonical example of a cyber–physical system. The climate system model, the simulation engine in this cyber–physical system, is one of the most challenging applications in scientific computing. It uses multi-physics simulation that couples multiple components, conducts decadal to millennial simulations, and has long been an important application on supercomputers. However, current climate system models suffer from inefficient task scheduling methods, which result in intolerable simulation times. Take the Community Earth System Model (CESM), the most widely used climate system model, as an example: one major reason for its poor performance is the huge overhead of rationally distributing processes among the coupled heterogeneous components. According to a report from NCAR, every percent improvement in CESM performance frees up to the equivalent of $250,000 in computing resources for their scientific experiments. To address this challenge, our paper first constructs a lightweight and accurate performance model that captures and predicts the heterogeneous time-to-solution performance of end-to-end CESM components for a given simulation configuration. Then, based on the performance model, we propose an efficient scheduling strategy based on a rectangular packing method to determine the best process layout among the components and the number of processes assigned to each component. Our evaluations show that we achieve an average run time reduction of 58% on CESM compared with the widely used sequential process layout at a scale of 144–480 cores on typical CPU clusters. We can also save 4 million CPU hours when conducting one standard scientific experiment (a 2870-year simulation), which is equivalent to saving $40,089 at a charge of $0.01 per CPU hour. Meanwhile, our method gains an additional 26% performance improvement over the heuristic branch-and-bound algorithm guided by the known curve-fitting performance model.
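The two steps described in the abstract (fit a performance model per component, then use it to choose a process layout) can be illustrated with a deliberately simplified sketch. This is not the paper's performance model or its rectangular packing strategy. It assumes a toy model T(n) = a/n + b per component, compares only a fully sequential layout against a fully concurrent one, and searches allocations by brute force. The function names, the component names `atm` and `ocn`, and all timings are hypothetical and synthetic.

```python
from itertools import product


def fit_model(samples):
    """Least-squares fit of the toy model T(n) = a / n + b.

    samples: list of (process_count, seconds) pairs (synthetic here).
    """
    xs = [1.0 / n for n, _ in samples]
    ys = [t for _, t in samples]
    m = len(samples)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def predict(model, n):
    """Predicted run time of one component on n processes."""
    a, b = model
    return a / n + b


def sequential_time(models, total):
    """All components run one after another, each on all processes."""
    return sum(predict(model, total) for model in models.values())


def best_concurrent_split(models, total, step):
    """Brute-force search: components run side by side on disjoint
    process sets; the run time is that of the slowest component."""
    names = list(models)
    choices = range(step, total + 1, step)
    best = None
    for combo in product(choices, repeat=len(names) - 1):
        rest = total - sum(combo)
        if rest < step:
            continue  # the last component would get no processes
        alloc = dict(zip(names, combo + (rest,)))
        t = max(predict(models[k], alloc[k]) for k in names)
        if best is None or t < best[0]:
            best = (t, alloc)
    return best


# Synthetic timings (seconds), invented for illustration only.
models = {
    "atm": fit_model([(48, 110.0), (96, 60.0), (192, 35.0)]),
    "ocn": fit_model([(48, 55.0), (96, 30.0), (192, 17.5)]),
}
seq = sequential_time(models, total=144)
conc_time, conc_alloc = best_concurrent_split(models, total=144, step=48)
```

With these synthetic numbers, the concurrent split assigns 96 processes to `atm` and 48 to `ocn` and finishes in roughly 60 s, against roughly 65 s for the sequential layout. Real coupled models mix sequential and concurrent stages and have many more components, which is why an exhaustive search like this one does not scale and a packing-based strategy becomes attractive.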
Type
Publication
Journal of Parallel and Distributed Computing
Authors
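Researcher
Doctoral supervisor, Ph.D. in physical oceanography, and research professor. Currently serves as Executive Editor of the journal Ocean Modelling, editorial board member of Scientific Data, Secretary-General of the Air-Sea Interaction Committee of the Chinese Society for Oceanography, and member of the CLIVAR Ocean Model Development Panel (OMDP). Has long worked on the development and application of Earth system models. Was the first to introduce non-breaking wave-induced vertical mixing and the effects of surface waves on air-sea fluxes into climate models, revealing the important role and mechanisms of small-scale wave processes in the large-scale climate system. Carried out research on efficient parallel algorithms for ocean numerical models on domestically produced processors, load-balancing algorithms for Earth system models, and AI4ClimateModeling, effectively improving model computational efficiency. Developed two generations of the wave-coupled Earth system model FIO-ESM, which, by improving the small-scale processes represented in the model, effectively reduces simulation biases and improves simulation and prediction capability. Built the short-term climate prediction system FIO-CPS, which is in use at several national and local operational centers, including the National Marine Environmental Forecasting Center and the National Climate Center. Has led multiple projects, including NSFC Young Scientists, General, Key, Excellent Young Scientists, and Distinguished Young Scholars grants, as well as a National Key R&D Program project. Has been selected as a "Shu Xingbei" Young Scholar of the First Institute of Oceanography, Ministry of Natural Resources, and for the Ministry of Natural Resources High-Level Science and Technology Innovation Talent program as a Leading Talent and a member of the Second Talent Echelon, among other honors.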