A Hybrid Filter-Wrapper Feature Selection Algorithm and Its Application to Dimensionality Reduction of Gene Expression Data

Authors

University of Qom

Abstract

Today, growth in data volume and in the number of features per dataset reduces the accuracy of learning algorithms and increases their computational complexity. Feature selection, one approach to dimensionality reduction, is carried out in two ways: filter and wrapper. Wrapper methods are more accurate than filter methods, whereas filter methods run faster and have lower computational complexity. Considering the advantages and drawbacks of both families, this study proposes a new hybrid method. The method first considers all features in the dataset; it then combines filter feature selection algorithms and evaluates their results with a wrapper method to select an optimal subset of features. Because many diseases and systems biology problems, such as cancer, can be identified and diagnosed by analyzing microarray data, and because such datasets have a very large number of features, the proposed method was evaluated on microarray data for three types of cancer. Compared with similar methods, it achieves high accuracy in classification and in identifying the factors that contribute to cancer, especially leukemia.

Keywords


Article Title [English]

Developing a Filter-Wrapper Feature Selection Method and its Application in Dimension Reduction of Gene Expression

Authors [English]

  • Zahra Roozbahani
  • M. Yari
  • Razieh Ghiasi
Abstract [English]

Growth in data volume and in the number of attributes per dataset reduces the accuracy of learning algorithms and increases their computational complexity. Feature selection, one approach to dimensionality reduction, is carried out with either filter or wrapper methods. Wrapper methods are more accurate than filter methods, whereas filter methods run faster and carry a lower computational burden. Weighing these advantages and disadvantages, this study proposes a new hybrid approach. The method starts from all features in the dataset, then selects an optimal feature subset by combining filter feature selection algorithms and evaluating their results with a wrapper method. Because many diseases and systems biology problems, such as cancer, can be identified and diagnosed through microarray data analysis, and because such datasets contain a very large number of features, the proposed method was evaluated on microarray data for three types of cancer. Compared with similar methods, the proposed method achieves high accuracy in classification and in identifying the factors that affect cancer, especially leukemia.
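The general filter-then-wrapper idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not name the filters or the search strategy, so the two filter scores, the greedy forward search, the nearest-centroid classifier (swapped in here for simplicity in place of a neural network), the binary 0/1 labels, and every function name below are assumptions made for illustration only.

```python
import random
from statistics import mean, pstdev


def snr_score(column, labels):
    """Filter 1 (assumed): absolute signal-to-noise ratio between classes 0 and 1."""
    a = [v for v, y in zip(column, labels) if y == 0]
    b = [v for v, y in zip(column, labels) if y == 1]
    return abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-12)


def mean_gap_score(column, labels):
    """Filter 2 (assumed): class-mean difference scaled by the overall spread."""
    a = [v for v, y in zip(column, labels) if y == 0]
    b = [v for v, y in zip(column, labels) if y == 1]
    return abs(mean(a) - mean(b)) / (pstdev(column) + 1e-12)


def top_k(X, y, score, k):
    """Rank all features with one filter and keep the k best indices."""
    n_features = len(X[0])
    scores = [score([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def loo_accuracy(X, y, features):
    """Wrapper evaluation: leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for c in set(y):
            rows = [X[r] for r in range(len(X)) if r != i and y[r] == c]
            centroids[c] = [mean([row[j] for row in rows]) for j in features]
        point = [X[i][j] for j in features]
        pred = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += pred == y[i]
    return correct / len(X)


def hybrid_select(X, y, k=5):
    """Filter stage: union of each filter's top-k. Wrapper stage: greedy forward search."""
    candidates = sorted(
        set(top_k(X, y, snr_score, k)) | set(top_k(X, y, mean_gap_score, k))
    )
    selected, best = [], 0.0
    improved = True
    while improved:
        improved = False
        for j in candidates:
            if j in selected:
                continue
            acc = loo_accuracy(X, y, selected + [j])
            if acc > best:  # keep the candidate that raises accuracy the most
                best, choice, improved = acc, j, True
        if improved:
            selected.append(choice)
    return selected, best


def make_toy_data(n_per_class=10, n_features=30, seed=0):
    """Synthetic stand-in for expression data: only features 0 and 1 carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            row[0] += 6.0 * label
            row[1] -= 6.0 * label
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    selected, acc = hybrid_select(X, y, k=5)
    print("selected features:", selected, "LOO accuracy:", acc)
```

The filter stage keeps the wrapper affordable: the costly leave-one-out evaluation runs only on the small candidate pool rather than on every feature, which is the trade-off between speed and accuracy that the abstract describes.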

Keywords [English]

  • dimension reduction
  • feature selection
  • filter-wrapper
  • multilayer perceptron neural network
  • microarray