Predicting invasive disease-free survival time in breast cancer patients using semi-supervised graph-based machine learning techniques

Document Type : Original Article

Authors

1 Department of Computer, Science and Research Branch, Islamic Azad University, Tehran, Iran

2 National Institute of Genetic Engineering and Biotechnology (NIGEB), Tehran, Iran

3 School of Electrical and Computer Engineering, University College of Engineering, University of Tehran, Tehran, Iran

Abstract

Breast cancer is currently the most commonly diagnosed cancer and the leading cause of cancer-related death among women worldwide. Analyzing the survival time of breast cancer patients has become an important area of research in recent years, and the primary challenge in such analysis is selecting an appropriate model. This study proposes a model for analyzing breast cancer patient survival using semi-supervised graph-based machine learning methods. The model uses clinical and pharmacogenomic data, together with the outcomes of Tamoxifen use during invasive cancer treatment, for 3833 patients followed up for five years. The performance of the proposed model in estimating disease-free survival time was evaluated through simulations in MATLAB and compared with that of common survival analysis models. The results show that, when the proposed model was used to predict invasive disease-free survival time, combining clinical and pharmacogenomic features gave an estimation accuracy 14% higher than clinical features alone and 15% higher than pharmacogenomic features alone. The proposed model was also more accurate than commonly used survival analysis models at identifying survival risk and predicting patients' survival time.
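
The general idea of graph-based semi-supervised learning can be illustrated with a minimal sketch. This is not the authors' MATLAB implementation, and the abstract does not specify their algorithm. The sketch assumes a standard label-spreading scheme (iterating F = alpha * S * F + (1 - alpha) * Y on a normalized similarity graph), an RBF similarity between patients, and two hypothetical risk groups. The data points, the function names `rbf_affinity` and `spread_labels`, and the parameters `alpha`, `sigma`, and `n_iter` are all illustrative assumptions, and Python is used only for convenience.

```python
import math


def rbf_affinity(points, sigma=1.0):
    """Dense RBF affinity matrix with a zero diagonal (no self-loops)."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w[i][j] = w[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def spread_labels(points, labels, n_classes, alpha=0.9, sigma=1.0, n_iter=200):
    """Label spreading; labels[i] is a class index, or None if unlabeled."""
    n = len(points)
    w = rbf_affinity(points, sigma)
    deg = [sum(row) for row in w]
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    s = [[w[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
         for i in range(n)]
    # One-hot seed matrix Y; rows of unlabeled points stay all-zero.
    y = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)]
         for i in range(n)]
    f = [row[:] for row in y]
    for _ in range(n_iter):
        # Each point takes scores from its neighbors (weight alpha)
        # while staying anchored to its initial label (weight 1 - alpha).
        f = [[alpha * sum(s[i][k] * f[k][c] for k in range(n))
              + (1.0 - alpha) * y[i][c]
              for c in range(n_classes)]
             for i in range(n)]
    # Assign each point the class with the highest propagated score.
    return [max(range(n_classes), key=lambda c: f[i][c]) for i in range(n)]


# Synthetic two-feature "patients": two well-separated groups, with only
# one labeled example per hypothetical risk group (0 = low, 1 = high).
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
          (3.0, 3.0), (3.1, 2.8), (2.9, 3.2), (3.2, 3.1)]
labels = [0, None, None, None, 1, None, None, None]

predicted = spread_labels(points, labels, n_classes=2)
print(predicted)
```

In this toy setting the two labeled points pass their risk group to the unlabeled points in the same cluster. This shows how a semi-supervised model can use patients whose outcome is not observed. A survival model would additionally have to handle censoring and continuous survival times, which this sketch does not attempt.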
