Exploiting Big Data Technology for Opinion Mining

Document Type: Original Article

Authors

1 Faculty of Engineering, Valiasr University of Rafsanjan, Rafsanjan, Iran

2 Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad, Iran

Abstract

Reviews play an important role in the decision-making process for both customers and commercial organizations. Hence, it is necessary to develop methods that mine customer reviews automatically. This task is referred to as opinion mining or sentiment analysis. Opinion mining covers a wide range of sub-problems such as text mining, natural language processing, and classification. However, with the fast growth of opinion data on the web, the opinion mining process faces serious problems: storing, managing, and processing such a large volume of data with traditional approaches is very hard and, in some cases, impossible. While there are many studies in the opinion mining area, few address the problem of mining sentiment from large volumes of Persian text. In this paper, we propose two approaches for sentiment analysis of Persian reviews. These approaches are built on a Persian sentiment lexicon and the MapReduce programming model for distributed systems, using the Hadoop framework. We evaluated the proposed approaches with various stations and discuss the effectiveness of Big Data technology for the opinion mining task. The results show a 100-fold increase in efficiency, not only for high data volumes but also for volumes of about 20 MB.
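To make the idea of lexicon-based scoring under MapReduce concrete, the following is a minimal single-process sketch in Python. It is not the implementation evaluated in the paper: the four-word lexicon, the integer scores, the whitespace tokenization, and the function names `map_review`, `reduce_scores`, and `run_job` are all assumptions made for illustration, and the shuffle phase that Hadoop would perform across nodes is simulated with an in-memory dictionary.

```python
from collections import defaultdict

# Toy lexicon (assumption): word -> polarity score.
# This is NOT the Persian sentiment lexicon used in the paper.
LEXICON = {"خوب": 1, "عالی": 2, "بد": -1, "افتضاح": -2}


def map_review(review_id, text):
    """Map step: emit (review_id, score) for every lexicon word in the review."""
    for token in text.split():  # naive whitespace tokenization (assumption)
        if token in LEXICON:
            yield review_id, LEXICON[token]


def reduce_scores(review_id, scores):
    """Reduce step: sum the scores of one review and assign a polarity label."""
    total = sum(scores)
    if total > 0:
        label = "positive"
    elif total < 0:
        label = "negative"
    else:
        label = "neutral"
    return review_id, total, label


def run_job(reviews):
    """Simulate the shuffle: group mapper output by key, then reduce each group."""
    grouped = defaultdict(list)
    for review_id, text in reviews:
        grouped[review_id]  # keep reviews that contain no lexicon word
        for key, score in map_review(review_id, text):
            grouped[key].append(score)
    return [reduce_scores(key, values) for key, values in sorted(grouped.items())]


reviews = [
    ("r1", "این گوشی خوب و عالی است"),
    ("r2", "کیفیت بد بود"),
    ("r3", "امروز رسید"),
]
results = run_job(reviews)
for row in results:
    print(row)
```

In a real Hadoop deployment the map and reduce functions would run as separate tasks on different nodes, and the grouping by review identifier would be handled by the framework rather than by `run_job`.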

Keywords

