Persian Text Classification Based on Deep Neural Networks

Document Type: Original Article

Authors

Department of Computer Engineering, University of Tabriz, Tabriz, Iran

Abstract

Nowadays, with the growing volume of electronic documents, text classification has attracted considerable attention from information retrieval researchers. Given the importance of the task and the work already done for many of the world's languages, the need for Persian text classification is clear. Text classification methods fall broadly into two classes: traditional methods (based on feature selection and machine learning) and methods based on deep learning. Thanks to weight sharing, deep learning methods significantly reduce the number of trainable parameters, which improves generalization and yields better results than other methods. Few deep learning methods exist for Persian text classification. In this study, we propose to use a CNN and a bidirectional LSTM with an attention layer for Persian text classification, named ParsCNN and ParsBiLSTM respectively. Experimental results on the Hamshahri dataset show that ParsCNN achieves a precision of 0.69, a recall of 0.70, and an F-score of 0.69, while ParsBiLSTM achieves a precision of 0.72, a recall of 0.73, and an F-score of 0.72, indicating that methods based on deep neural networks outperform the other approaches.
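To make the described architecture concrete, below is a minimal, illustrative PyTorch sketch of a bidirectional LSTM text classifier with an additive attention layer, in the spirit of the ParsBiLSTM model. It is not the authors' implementation; the vocabulary size, embedding and hidden dimensions, and class count are placeholder assumptions.

# A minimal sketch (not the authors' code) of a BiLSTM classifier with
# an additive attention layer over the hidden states, as described above.
# Vocabulary size, dimensions, and class count are illustrative values.
import torch
import torch.nn as nn

class BiLSTMAttentionClassifier(nn.Module):
    def __init__(self, vocab_size=50_000, embed_dim=300,
                 hidden_dim=128, num_classes=12):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        # Additive attention: score each time step, softmax, then pool.
        self.attn = nn.Linear(2 * hidden_dim, 1)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):                  # (batch, seq_len)
        x = self.embedding(token_ids)              # (batch, seq_len, embed)
        h, _ = self.bilstm(x)                      # (batch, seq_len, 2*hidden)
        scores = self.attn(torch.tanh(h))          # (batch, seq_len, 1)
        weights = torch.softmax(scores, dim=1)     # attention over time steps
        context = (weights * h).sum(dim=1)         # (batch, 2*hidden)
        return self.classifier(context)            # class logits

# Example forward pass on a dummy batch of padded token-id sequences.
model = BiLSTMAttentionClassifier()
dummy = torch.randint(1, 50_000, (4, 64))
logits = model(dummy)  # shape: (4, 12)

As a quick consistency check on the reported figures, the balanced F-score F1 = 2PR/(P + R) gives 2 x 0.72 x 0.73 / (0.72 + 0.73) ≈ 0.72 for ParsBiLSTM, matching the reported value.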

Keywords

