1 (College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590, China)
2 (Shandong Computer Science Center (National Supercomputing Center in Jinan), Jinan 250101, China)
Abstract: The Silicon-Crystal application uses molecular dynamics (MD) to simulate the thermal conductivity of crystals and uses the Tersoff potential function to simulate silicon crystal growth. We successfully ported the Silicon-Crystal application to the Sunway TaihuLight using Sunway Athread, and propose five main optimizations for the problems caused by the memory constraints of the SW26010 heterogeneous many-core processor: 1) prefetch the parameters required for the calculation into LDM; 2) transfer central particle data by DMA; 3) design a suitable software cache and use it to read neighbor particle data; 4) customize the transcendental functions on the CPE (Computing Processing Element) to avoid the discrete memory accesses incurred when a CPE calls the transcendental functions; 5) use register-level communication to realize a step-to-step pipeline and double buffering between CPEs. With these optimizations, a single core group achieves a 12.89 times speed-up over the serial version on the MPE (Management Processing Element) and an 8.7 times speed-up over the serial version on an Intel Xeon E5-2620 v4 processor. This paper also presents scalability tests and analysis of the Silicon-Crystal application. Experimental results show that the Silicon-Crystal application scales well on the Sunway TaihuLight.
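To illustrate the idea behind optimization 3, the following is a minimal sketch of a direct-mapped software cache for neighbor particle data, written in portable C. It is not the paper's implementation. The names `Particle`, `SoftCache`, `cache_init`, and `cache_get`, the line size, and the number of lines are all hypothetical, and `memcpy` stands in for the DMA transfer from main memory into LDM that a CPE would issue.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LINE_PARTICLES 8   /* particles per cache line (assumed value) */
#define NUM_LINES      16  /* cache lines kept in local memory (assumed value) */

/* Hypothetical particle record: position only. */
typedef struct { double x, y, z; } Particle;

/* Direct-mapped software cache held in the core's local memory. */
typedef struct {
    Particle data[NUM_LINES][LINE_PARTICLES];
    long     tag[NUM_LINES];   /* block index stored in each line, -1 if empty */
    long     hits, misses;     /* counters for tuning the cache geometry */
} SoftCache;

static void cache_init(SoftCache *c) {
    for (int i = 0; i < NUM_LINES; i++) c->tag[i] = -1;
    c->hits = 0;
    c->misses = 0;
}

/* Return a pointer to particle idx, loading its block on a miss.
 * main_mem holds n particles; memcpy is a stand-in for a DMA get. */
static const Particle *cache_get(SoftCache *c, const Particle *main_mem,
                                 size_t n, size_t idx) {
    assert(idx < n);
    size_t block = idx / LINE_PARTICLES;   /* which block of main memory */
    size_t line  = block % NUM_LINES;      /* direct-mapped placement */

    if (c->tag[line] != (long)block) {
        size_t first = block * LINE_PARTICLES;
        size_t count = (n - first < LINE_PARTICLES) ? n - first : LINE_PARTICLES;
        /* One bulk copy replaces many scattered single-particle reads. */
        memcpy(c->data[line], main_mem + first, count * sizeof(Particle));
        c->tag[line] = (long)block;
        c->misses++;
    } else {
        c->hits++;
    }
    return &c->data[line][idx % LINE_PARTICLES];
}
```

Neighbor lists in MD tend to reference particles that are close together in memory, so a miss that fetches a whole line is usually followed by several hits on the same line. In practice the line size and line count would be tuned against the available LDM and the measured hit rate.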