Link prediction in scientific networks using machine learning and weighted graphs

Document Type: Original Article

Authors

Electrical and Computer Engineering Faculty, University of Kashan, Kashan, Iran

Abstract

As scientific development and article publication accelerate and the number of scientific fields grows, it is becoming increasingly difficult for researchers and relevant institutions to find suitable research partners, sources, and fields. Choosing these well makes the most efficient use of the cost and time spent on research. To address this problem, a scientific network can be built from articles, scientists, and other scientific entities together with the connections between them, and link prediction can then be used to predict the connections that will form in the future. This paper presents a machine learning framework for link prediction in scientific networks. The framework weights the network based on time and content, computes embedded structural and textual features, performs feature selection and extraction, and applies negative sampling using clustering before training a machine learning model for link prediction. Each step of the framework was tested both separately and in combination. The results showed that the proposed weighting method for the citation and author collaboration networks increases the accuracy of the weighted similarity measures and, as a result, the accuracy of the entire algorithm. Negative sampling using clustering also improves the training of the machine learning model, and textual features of scientific data, such as article titles and abstracts, play an effective role in predicting future links.
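
To make the idea of time-based weighting and weighted similarity measures more concrete, the following is a minimal Python sketch. It is not the weighting method proposed in the paper, whose details are not given in this abstract. The exponential decay, the `decay` parameter, the function names, and the choice of a weighted common-neighbors score (one common formulation, summing the weights of both edges through each shared neighbor) are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def time_decay_weight(year, current_year, decay=0.5):
    """Assumed weighting: exponential decay, so recent links weigh close to 1."""
    return math.exp(-decay * (current_year - year))


def build_weighted_graph(edges, current_year, decay=0.5):
    """Build an undirected weighted adjacency map from (u, v, year) triples.

    Repeated links between the same pair accumulate their weights.
    """
    graph = defaultdict(dict)
    for u, v, year in edges:
        w = time_decay_weight(year, current_year, decay)
        graph[u][v] = graph[u].get(v, 0.0) + w
        graph[v][u] = graph[v].get(u, 0.0) + w
    return graph


def weighted_common_neighbors(graph, x, y):
    """Sum w(x, z) + w(y, z) over every common neighbor z of x and y."""
    common = set(graph.get(x, {})) & set(graph.get(y, {}))
    return sum(graph[x][z] + graph[y][z] for z in common)


# Hypothetical co-authorship data: (author, author, year of joint paper).
edges = [
    ("a", "c", 2020),
    ("b", "c", 2020),
    ("a", "d", 2010),
    ("b", "d", 2010),
]
graph = build_weighted_graph(edges, current_year=2020)
score = weighted_common_neighbors(graph, "a", "b")
```

In this toy data, authors `a` and `b` share two neighbors, but the recent shared collaborator `c` contributes far more to the score than the decade-old collaborator `d`. A score of this kind could serve as one structural feature among several in a supervised link prediction model.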
