Predicting Invasive Disease-Free Survival Time in Breast Cancer Patients Using Graph-Based Semi-Supervised Machine Learning Methods

Article type: Research article

Authors

1 Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran

2 National Institute of Genetic Engineering and Biotechnology, Tehran, Iran

3 School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran

10.22052/scj.2022.243330.1039

Abstract

Breast cancer is currently the most commonly diagnosed cancer and the leading cause of cancer death among women worldwide. In recent years, survival-time analysis of patients has received considerable attention in breast cancer research and treatment. Choosing an appropriate model for survival-time analysis is the main challenge in analyzing the survival of these patients. In this applied study, a model for the survival analysis of breast cancer patients is proposed using graph-based semi-supervised machine learning methods. Clinical and pharmacogenomic information, together with the outcomes of tamoxifen use in the treatment of invasive cancer, was used for 3833 breast cancer patients who were followed up over a 5-year period. In addition, by simulating the models in MATLAB, the performance of the proposed model in estimating invasive disease-free survival time and other survival parameters was evaluated against common survival analysis models. The results show that when the proposed survival analysis model was applied to predict invasive breast cancer-free survival time and a combination of clinical and pharmacogenomic features was used, prediction accuracy was 14 percent higher than when only clinical features were used and 15 percent higher than when only pharmacogenomic features were used. Moreover, the proposed survival analysis model achieved higher accuracy than common survival analysis models in predicting invasive disease-free survival time and the hazard ratio parameter.

Article Title [English]

Predicting Invasive Disease-Free Survival Time in Breast Cancer Patients Using Graph-based Semi-Supervised Machine Learning Techniques

Authors [English]

  • Ramazan Taimourei-Yansary 1
  • Mitra Mirzarezaee 1
  • Mehdi Sadeghi 2
  • Babak Nadjar Araabi 3
1 Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
2 National Institute of Genetic Engineering and Biotechnology (NIGEB), Tehran, Iran
3 School of Electrical and Computer Engineering, University College of Engineering, University of Tehran, Tehran, Iran
Abstract [English]

Breast cancer is currently the most frequently diagnosed cancer and the leading cause of cancer death in women worldwide. Researchers have been seeking the best treatment for breast cancer, with a focus on preventing recurrence after patients' initial treatment. Choosing an appropriate model for survival time analysis is the main challenge in the survival analysis of these patients. In this applied research, a new survival analysis model based on graph-based semi-supervised learning is proposed for analyzing the survival of breast cancer patients. A dataset of 3833 patients with breast cancer who were followed up for 5 years was used. Clinical and pharmacogenomic information, including the results of tamoxifen use and its effect on the treatment of invasive cancer, was also available and used. To validate the proposed model, evaluation parameters are introduced and the models are simulated in MATLAB, and the model's performance is compared with that of previous survival analysis models. The results demonstrate that, when the proposed model was applied to predict invasive disease-free survival time using a combination of clinical and pharmacogenomic features, the estimation accuracy was 14% higher than when only the clinical features were used and 15% higher than when only the pharmacogenomic features were used. The proposed survival analysis model shows a high capability for identifying survival risk and high accuracy in predicting patients' survival time.
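
For readers unfamiliar with graph-based semi-supervised learning, the following is a minimal sketch of the general idea, not the model proposed in the paper. It assumes a dense Gaussian (RBF) similarity graph over patient feature vectors and uses the classical harmonic-function solution to spread known survival times to patients whose times are unknown. The function names, the toy one-dimensional features, and the `sigma` bandwidth are all hypothetical and chosen only for illustration.

```python
import numpy as np


def rbf_affinity(X, sigma=1.0):
    """Dense Gaussian (RBF) affinity matrix with a zero diagonal."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def harmonic_propagation(X, y, labeled_mask, sigma=1.0):
    """Estimate values at unlabeled nodes with the harmonic solution
    f_u = (D_uu - W_uu)^{-1} W_ul y_l on an RBF graph.

    Entries of y at unlabeled positions are ignored (they may be NaN).
    """
    W = rbf_affinity(X, sigma)
    laplacian = np.diag(W.sum(axis=1)) - W
    lab = labeled_mask
    unl = ~labeled_mask
    f = y.astype(float).copy()
    # Solve the linear system restricted to the unlabeled nodes.
    f[unl] = np.linalg.solve(
        laplacian[np.ix_(unl, unl)],
        W[np.ix_(unl, lab)] @ f[lab],
    )
    return f


# Toy data (hypothetical): two well-separated groups of "patients",
# one known survival time (in months) per group.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
y = np.array([10.0, np.nan, np.nan, 50.0, np.nan, np.nan])
labeled_mask = ~np.isnan(y)

f = harmonic_propagation(X, y, labeled_mask, sigma=1.0)
print(np.round(f, 2))
```

In this toy setting each unlabeled point ends up close to the known value of its nearest labeled neighbor, and every estimate stays within the range of the labeled values. A real survival model would additionally have to handle censoring, which this sketch does not attempt.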

Keywords [English]

  • AFT Model
  • Breast cancer
  • Cox-PH model
  • Invasive Disease-free Survival Time
  • Graph-based Semi-supervised learning
  • Machine learning
  • Survival analysis
  • Tamoxifen