روشی برای بهبود الگوریتم بهینه سازی اجتماع ذرات با استفاده از CUDA بر روی پردازنده گرافیکی

نویسندگان

1 واحد سنندج،دانشگاه آزاد اسلامی،سنندج،ایران

2 دانشکده کامپیوتر،واحد سنندج،دانشگاه آزاد واحد سنندج

10.22052/8.2.2

چکیده

همواره زمان صرف‌شده برای حل مسائل سنگین محاسباتی، یکی از دغدغه‌های برنامه‌نویسان کامپیوتر بوده است. الگوریتم PSO، الگوریتمی فرا‌ابتکاری است که به‌دلیل ساد‌گی پیاده‌سازی، برای حل مسائل سنگین محاسباتی استفاده می‌شود ولی با وجود ساد‌گی، این الگوریتم برای حل مسائل سنگین واقعی ناکارآمد است. از طرفی، وجود ویژگی تعاملات محلی ذرات در الگوریتم PSO، این الگوریتم را برای موازی‌سازی مناسب کرده است؛ از طرف دیگر، NVIDIA با اختراع پردازنده‌گرافیکی و معرفی معماری CUDA، تحولات بنیادی را در حل این نوع مسائل، از طریق پیاده‌سازی آن بر روی پردازنده‌گرافیکی ایجاد کرده است. با وجود تمام تحقیقات انجام‌گرفته در زمینه پیاده‌سازی، برخی از جنبه‌های تکنیکی موازی‌سازی به‌منظور پیاده‌سازی الگوریتم به‌صورتی که تسریع و بازدهی مناسب بر روی تمام پردازنده‌های گرافیکی NVIDIA را داشته باشد، رعایت نشده است. در این مقاله سعی شده با انتخاب Geforce GT 525M که پردازنده‌گرافیکی نسبتاً ضعیفی است، جنبه مقیاس‌پذیری روش پیشنهادی رعایت شود؛ به‌طوری که با رسیدن به بیشینه تسریع الگوریتم پیاده‌سازی‌شده بر روی این پردازنده، به بازدهی قابل قبول برای اجرا بر روی سایر پردازنده‌های ‌گرافیکی رسید. برای نیل به این هدف، از مدل چند‌کرنلی ارائه‌شده استفاده شده است. نتایج حاصل از انجام آزمایش‌ها رسیدن به بیشینه تسریع 15/98 برای حل تابع Rastrigin را نشان می‌دهد.

کلیدواژه‌ها


عنوان مقاله [English]

An approach to Improve Particle Swarm Optimization Algorithm Using CUDA

نویسندگان [English]

  • Mohammad Pouya Akbarpour 1
  • Keyhan Khamforoosh 1
  • Vafa Maihami 2
چکیده [English]

The time consumption in solving computationally heavy problems has always been a concern for computer programmers. Due to simplicity of its implementation, the PSO (Particle Swarm Optimization) is a suitable meta-heuristic algorithm for solving computationally heavy problems. However, despite the simplicity, the algorithm is inefficient for solving real computationally heavy problems but the presence of local interactions between particles has made this algorithm suitable for parallelization. On the other hand, by the invention of GPU (Graphical Processor Unit) and introducing the CUDA architecture as a GPU in the NVIDIA graphical processor, fundamental changes has been made in solving this type of problems. Despite all the research done in the field of implementing the algorithms through GPUs, some aspects of parallelization have not been addressed for suitable speedup and efficiency on NVIDIA GPUs. By considering the Geforce GT 525M, which is a relatively weak GPU, this paper tries to achieve the maximum speedup of the algorithm by implementing on this GPU. This experience led to reaching the acceptable efficiency on other GPUs. To reach the achievement, the multi-kernel model was used. The results show the speedup of 15.98 in solving the Rastrigin function.

کلیدواژه‌ها [English]

  • Parallel Algorithm
  • PSO
  • GPU Computing
  • CUDA
  • Fermi
  • HPC
  1. [1] A. P. Engelbrecht, Fundamentals of Computational Swarm Intelligence, John Wiley & Sons, 2005. [2] R. C. Eberhart, Y. Shi and J. Kennedy, Swarm Intelligence (Morgan Kaufmann series in evolutionary computation), Morgan Kaufmann publisher, 2001. [3] X. S. Yang, "Swarm intelligence based algorithms:a critical analysis.Evolutionary intelligence," pp. 17-28, 2014. [4] J. Kennedy and R. Eberhart, "Particle swarm optimization," in IEEE international conference on neural networks, Perth, Australia, 1995. [5] D. Bratton and J. Kennedy, "Defining a standard for particle swarm optimization," IEEE swarm intelligence symposium, pp. 120-127, 2007. [6] Y. Tan and Y. Zhu, “Fireworks algorithm for optimization,” in Advances in Swarm Intelligence (LNCS 6145). Berlin, Germany: Springer, 2010, pp. 355–364. [7] P. Ross, “Why CPU frequency stalled,” IEEE Spectr., vol. 45, no. 4, p. 72, Apr. 2008 [8] 2010. NVIDIA’S Next Generation CUDA Compute Architecture: Fermi. Nvidia. [9] Zahran, M., 2017. Heterogeneous computing. Communications of the ACM, 60(3), pp.42-45. [10] Cheng, J., 2014. Professional Cuda C Programming. Indianapolis, IN: John Wiley and Sons, Inc. [11] Essaid, M., Idoumghar, L., Lepagnot, J. and Brévilliers, M., 2018. GPU parallelization strategies for metaheuristics: a survey. International Journal of Parallel, Emergent and Distributed Systems, 34(5), pp.497-522. [12] Ujaldon, M., 2016. HPC Accelerators with 3D Memory. 2016 IEEE Intl Conference on Computational Science and Engineering (CSE) and IEEE Intl Conference on Embedded and Ubiquitous Computing (EUC) and 15th Intl Symposium on Distributed Computing and Applications for Business Engineering (DCABES),. [13] Mu, D., Lee, E. and Chen, P., 2017. Rapid earthquake detection through GPU-Based template matching. Computers & Geosciences, 109, pp.305-314. [14] Lim, S. and Kang, P., 2020. Implementing Scientific Simulations on GPU-accelerated Edge Devices. 2020 International Conference on Information Networking (ICOIN),. [15] Barajas, C., Gobbert, M. and Wang, J., 2019. Performance Benchmarking of Data Augmentation and Deep Learning for Tornado Prediction. 2019 IEEE International Conference on Big Data (Big Data),. [16] Reddy, B., 2017. Performance Analysis of GPU V/S CPU for Image Processing Applications. International Journal for Research in Applied Science and Engineering Technology, V(II), pp.437-443. [17] Jurczuk, K., Czajkowski, M. and Kretowski, M., 2019. Multi-GPU approach for big data mining. Proceedings of the Genetic and Evolutionary Computation Conference Companion,. [18] Mittal, S., and Vetter, J. S., “A survey of CPU-GPU heterogeneous computing techniques,” ACM Computing Surveys (CSUR), Vol. 47, No. 4, 2015, p. 69 [19] Domanski, L., Bednarz, T., Gureyev, T. E., Murray, L., Huang, E., and Taylor, J. A., “Applications of heterogeneous computing in computational and simulation science,” 2011 Fourth IEEE International Conference on Utility and Cloud Computing, IEEE, 2011, pp. 382–389. [20] Fei, X., Li, K., Yang, W. and Li, K., 2020. Analysis of energy efficiency of a parallel AES algorithm for CPU-GPU heterogeneous platforms. Parallel Computing, 94-95, p.102621. [21] Top500.org. 2020. November 2019 | TOP500 Supercomputer Sites. [online] Available at: <https://www.top500.org/lists/2019/11/> [Accessed 27 May 2020]. [22] Soyata, T., 2019. GPU Parallel Program Development Using CUDA [23] T. Nvidia, CUDA C BEST PRACTICES GUIDE, NVIDIA Corporation, 2015 [24] TechPowerUp. 2020. Techpowerup. [online] Available at: <https://www.techpowerup.com/gpu-specs/> [Accessed 27 May 2020]. [25] Ma, S., Liu, Z., Chen, S., Huang, L., Guo, Y., Wang, Z. and Zhang, M., 2019. Coordinated DMA: Improving the DRAM Access Efficiency for Matrix Multiplication. IEEE Transactions on Parallel and Distributed Systems, 30(10), pp.2148-2164. [26] J. Ghorpade, J. Parande, M. Kulkarni and A. Bawaskar, "GPGPU PROCESSING IN CUDA ARCHITECTURE," Advanced Computing: An International Journal ( ACIJ ), vol. 3, pp. 105-120, January 2012. [27] S. Cook, in CUDA programming: a developer's guide to parallel computing with GPUs, Newnes, 2012. [28] Y. Shi, "Particle swarm optimization: developments, applications and resources," in Proceedings of the 2001 congress on evolutionary computation, 2001. [29] Y. Shi and R. C. Eberhart, "Parameter selection in particle swarm optimization," in International conference on evolutionary programming, Berlin, Heidelberg, 1998. [30] R. C. Eberhart and Y. Shi, "Evolving artificial neural networks," in Proceedings of the international conference on neural networks and brain, PRC, 1998. [31] Y. Shi and R. Eberhart, "A modified particle swarm optimizer.In 1998 IEEE international conference on evolutionary computation proceedings," in IEEE world congress on computational intelligence, Anchorage, AK, USA, USA, 1998, May. [32] Swarmintelligence.org. 2006. Particle Swarm Optimization: Tutorial. [online] Available at: <http://www.swarmintelligence.org/tutorials.php> [Accessed 27 May 2019]. [33] Y. Zhou and Y. Tan, “GPU-based parallel particle swarm optimization,” in Proc. IEEE Congr. Evol. Comput. (CEC), Trondheim, Norway, May 2009, pp. 1493–1500 [34] W. Zhu and J. Curry, “Parallel ant colony for nonlinear function optimization with graphics hardware acceleration,” in Proc. IEEE Int. Conf. Syst. Man Cybern. (SMC), San Antonio, TX, USA, 2009, pp. 1803–1808. [35] A. J. Umbarkar, M. S. Joshi and N. M. Rothe, "Genetic algorithm on general purpose graphical," ICTACT Soft, 2013, pp. 492-497. [36] Y. Zhou and Y. Tan, "GPU-based parallel multiobjective particle swarm optimization," Artificial Intelligence, 2011, pp. 125-141. [37] . L. Mussi, C. Stefano and F. Daolio, "GPU-based road sign detection using particle swarm optimization," in 9th IEEE International Conference on Intelligent Systems Design and Applications, 2009. [38] Y. Zhou and Y. Tan, "GPU-based parallel particle swarm optimization," in IEEE Congress on Evolutionary Computation, 2009. [39] L. P. Veronese and R. A. Krohling, "Swarms flight: Accelerating the particles using C-CUDA," in IEEE Congress on Evolutionary Computation, 2009. [40] Y. L. Chang and J. P. Fang, "Band selection for hyperspectral images based on parallel particle swarm optimization schemes," in IEEE International Geoscience and Remote Sensing Symposium, 2009. [41] W. Zhu and J. Curry, "Particle swarm with graphics hardware acceleration and local pattern search on bound constrained problems," in IEEE Swarm Intelligence Symposium, 2009. [42] C. J. Bastos-Filho, M. A. Oliveira, D. N. Nascimento and A. D. Ramos, "Impact of the random number generator quality on particle swarm optimization algorithm running on graphic processor units," in 10th IEEE International Conference on Hybrid Intelligent Systems (HIS), 2010. [43] P. A. Jambhlekar, M. Mishra and S. V. Subramaniam, "Parallel implementation of MOPSO on GPU using OpenCL and CUDA," in 18th IEEE International Conference on High Performance Computing (HiPC), 2011. [44] C. M. Miguel, A. Miguel, J. J. Rodrguez-Vazquez and G. I. Antonio, "Accelerating particle swarm algorithm with GPGPU," in IEEE 19th International Euromicro Conference on Parallel, Distributed and Network Based Processing, 2011. [45] H. Zhu and Y. Guo, "Paralleling Euclidean particle swarm optimization in CUDA," in 4th International Conference on Intelligent Networks and Intelligent Systems, 2011. [46] L. Wenna and Z. Zhenyu, "A CUDA-based multi channel particle swarm algorithm," in 4th International Conference on Control, Automation and Systems Engineering, 2011. [47] J. Platos, V. Snasel, T. Jezowicz, P. Kromerand and A. Abraham, "A PSO-based document classification algorithm accelerated by the CUDA platform," in IEEE International Conference on Systems,Man and Cybernetics, 2012. [48] B. Zhang, H. Zheng, M. Wei, R. Wu and X. Sheng, "Particle swarm optimization of frequency selective surface," in IEEE International Conference on Cross Strait Quad- Regional Radio Science and Wireless Technology, 2012. [49] B. Sharma, R. K. Thulasiram and P. Thulasiraman, "Portfolio management using particle swarm optimization on GPU," in 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, 2012. [50] R. D. M. Calazan, N. Nedjah and L. D. M. Mourelle, "A Cooperative Parallel Particle Swarm Optimization for High-Dimension Problems on GPUs," in IEEE Computational Intelligence and 11th Brazilian Congress on Computational Intelligence (BRICS-CCI &CBIC), 2013. [51] Cataveral, I. and Rubio, F., 2019. Dealing with Swarm Intelligence on GPUs. 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC),. [52] Tan, Y. and Ding, K., 2016. A Survey on GPU-Based Implementation of Swarm Intelligence Algorithms. IEEE Transactions on Cybernetics, 46(9), pp.2028-2041. [53] C. Woolley, GPU Optimization Fundamentals, NVIDIA, 2013. [54] Y. Tan, GPU-based Parallel Implementation of Swarm Intelligence Algorithms, Morgan Kaufmann, 2016. [55] Docs.nvidia.com. 2020. PTX ISA :: CUDA Toolkit Documentation. [online] Available at: http://docs.nvidia.com/cuda/parallel-thread-execution/index.html [56] M. Molga and C. Smutnicki, "Test functions for optimization needs In Using anEvolutionary Heuristics for Solving the Outdoor Advertising Optimization Problem," Journal of Computer Sciences and Applications, 2005. [57] Y. Shi, "Swarm Intelligence," 2006. [Online]. Available at: <http://www.swarmintelligence.org/codes.php> [58] " Benchmarks / Tech," notebookcheck, 26 05 2012. [Online]. Available: <https://www.notebookcheck.net/Intel-Core-i3-2330M-Notebook-Processor.52200.0.html>.