Evaluation of classical and deep learning methods in sentiment analysis of Persian Telegram data

Article type: Research article

Authors

Department of Artificial Intelligence, Faculty of Computer Engineering, Yazd University, Yazd, Iran

Abstract

Today the Internet, and social networks such as Twitter, Facebook, and Telegram in particular, has become a platform for exchanging ideas and sharing users' opinions. Sentiment analysis of user opinions on these networks can help considerably in explaining and predicting social phenomena, and in finding suitable products or services for individuals, companies, and organizations. Much research has been carried out on English-language social network data, but research on Persian remains limited. This paper proposes a sentiment analysis system for Persian Telegram data. To this end, several feature extraction methods, namely count (occurrence) vectors, term frequency-inverse document frequency (TF-IDF), and a word embedding matrix, are examined for converting textual data into a numerical representation. For classification, the paper examines classical machine learning methods (support vector machine, decision tree, K-nearest neighbors, naive Bayes, and logistic regression), combinations of the classical methods, and deep learning methods (deep neural network, convolutional neural network, and unidirectional and bidirectional long short-term memory networks). Evaluation on data collected from Persian Telegram shows that the best performance is obtained by the word embedding matrix combined with a bidirectional long short-term memory network, with an accuracy of 90.67%, precision of 90.01%, recall of 89.54%, and F-measure of 89.77%.
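The two sparse representations named in the abstract, count vectors and TF-IDF, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy English documents, the whitespace tokenizer, and the helper names `build_vocabulary`, `count_vector`, and `tfidf_vectors` are all assumptions made for illustration, and the unsmoothed textbook TF-IDF formula used here may differ from the variant a given library applies.

```python
import math
from collections import Counter


def build_vocabulary(docs):
    """Collect the sorted set of whitespace-separated tokens (hypothetical tokenizer)."""
    return sorted({token for doc in docs for token in doc.split()})


def count_vector(doc, vocab):
    """Count vector: how many times each vocabulary term occurs in the document."""
    counts = Counter(doc.split())
    return [counts[term] for term in vocab]


def tfidf_vectors(docs, vocab):
    """Textbook TF-IDF: (term count / document length) * ln(N / document frequency)."""
    n_docs = len(docs)
    tokenized = [doc.split() for doc in docs]
    # Document frequency: number of documents that contain each term.
    doc_freq = {t: sum(1 for tokens in tokenized if t in tokens) for t in vocab}
    idf = {t: math.log(n_docs / doc_freq[t]) for t in vocab}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens) or 1  # guard against empty documents
        vectors.append([counts[t] / length * idf[t] for t in vocab])
    return vectors


# Toy data, assumed for illustration only (not from the paper's Telegram corpus).
docs = ["good service good price", "bad service", "good phone"]
vocab = build_vocabulary(docs)
counts = [count_vector(d, vocab) for d in docs]
tfidf = tfidf_vectors(docs, vocab)

print(vocab)
print(counts[0])
print([round(x, 4) for x in tfidf[0]])
```

Count vectors weight every term by raw frequency, whereas TF-IDF down-weights terms such as "good" and "service" that occur in several documents. A word embedding matrix, the third representation the abstract mentions, is instead a dense table learned from data and is not covered by this sketch.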

Keywords


Article title [English]

Evaluating classical machine learning and deep learning methods in sentiment analysis of Persian Telegram messages

Authors [English]

  • Fatemeh Zare Mehrjardi
  • Mahdi Yazdian-Dehkordi
  • Alimohammad Latif
Department of Computer Engineering, Yazd University, Yazd, Iran
Abstract [English]

Today, the Internet, especially social networks such as Twitter, Facebook, and Telegram, has become a platform for exchanging ideas and sharing user opinions. Sentiment analysis based on user opinions in these networks can help explain and predict social phenomena and find suitable products or services for individuals, companies, and organizations. A great deal of research has been done on English-language social media data, but research on Persian remains limited. In this paper, a sentiment analysis system for Persian Telegram data is proposed. For this purpose, several feature extraction methods, including count vectors (CountVectorizer), TF-IDF, and a word embedding matrix, are studied for representing textual data numerically. Then, to classify the data, classical machine learning methods (support vector machine, decision tree, K-nearest neighbors, naive Bayes, and logistic regression), combinations of the classical methods, and deep learning methods (deep neural network (DNN), convolutional neural network (CNN), long short-term memory network, and bidirectional long short-term memory network) are investigated. Finally, evaluation and analysis of the results on data collected from Persian Telegram show that the best performance is obtained by word embedding with a bidirectional long short-term memory network, with an accuracy of 90.67%, precision of 90.01%, recall of 89.54%, and F1 of 89.77%.
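As a quick consistency check on the reported figures, the sketch below recomputes F1 from the stated precision and recall. It assumes that the F-measure in the abstract is the standard F1 score, that is, the harmonic mean of precision and recall; the function name `f1_from_precision_recall` is a hypothetical helper, not something taken from the paper.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (assumed definition of the F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, in percent.
reported_precision = 90.01
reported_recall = 89.54

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(round(f1, 2))  # 89.77
```

Under that assumption the recomputed value agrees with the reported F1 of 89.77%.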

Keywords [English]

  • Sentiment Analysis
  • Telegram Message
  • Machine Learning
  • Deep Learning