Evaluating classical machine learning and deep-learning methods in sentiment analysis of Persian Telegram messages

Document Type : Original Article

Authors

Department of Computer Engineering, Yazd University, Yazd, Iran

Abstract

Today, the Internet, and especially social networks such as Twitter, Facebook, and Telegram, has become a platform for exchanging ideas and sharing user opinions. Sentiment analysis of user opinions on these networks can help explain and predict social phenomena and find suitable products or services for individuals, companies, and organizations. Much research has been done on social media data in English, but research on the Persian language remains limited. In this paper, a sentiment analysis system for Persian Telegram data is proposed. For this purpose, several feature extraction methods, including CountVectorizer, TF-IDF, and a word embedding matrix, are studied to represent textual data numerically. Then, to classify the data, three groups of methods are investigated: classical machine learning methods (support vector machine, decision tree, K-nearest neighbor, Naïve Bayes, and logistic regression), combinations of the classical methods, and deep learning methods (deep neural network (DNN), convolutional neural network (CNN), long short-term memory (LSTM) network, and bidirectional long short-term memory (BiLSTM) network). Finally, evaluation on data collected from Persian Telegram shows that the best performance is obtained by combining word embeddings with the bidirectional long short-term memory network, with an accuracy of 90.67%, precision of 90.01%, recall of 89.54%, and F1 of 89.77%.
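
The four figures reported above are standard classification metrics. The following is a minimal Python sketch of how they are commonly computed from confusion counts. It assumes a single positive class, since the abstract does not state the number of sentiment classes or the averaging scheme. The function names and the confusion counts are hypothetical and are not taken from the paper. The last lines check that the reported F1 is consistent with the harmonic mean of the reported precision and recall.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, F1) for one positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


def harmonic_mean(p, r):
    """F1 is the harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r)


# Hypothetical confusion counts, for illustration only (not the paper's data).
acc, prec, rec, f1 = binary_metrics(tp=45, fp=5, fn=5, tn=45)

# Consistency check on the precision and recall reported in the abstract (in percent).
reported_f1 = round(harmonic_mean(90.01, 89.54), 2)
print(reported_f1)  # 89.77
```

If the paper's precision and recall are averaged over several classes, the averaged F1 need not equal this harmonic mean exactly, so the check is only indicative.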
