Link Prediction in Scientific Networks Using Machine Learning and Weighted Graphs

Article type: Research article

Authors

  • Seyed Mehdi Vahidipour
  • Alireza Mohamadi

Electrical and Computer Engineering Faculty, University of Kashan, Kashan, Iran

Abstract

As scientific output accelerates, with more articles published and more fields emerging, it is becoming increasingly difficult for researchers and institutions to find suitable research collaborators, relevant sources, and promising research topics. Choosing these well yields the highest return on the cost and time invested in research. One way to address this problem is to build a scientific network of articles, scientists, and other scientific entities, together with the relationships between them, and then use link prediction to predict the connections that will form in the future. This paper presents a machine learning framework for link prediction in scientific networks. The framework weights the network based on time and content, computes structural and embedded textual features, and applies feature selection and extraction. Finally, negative samples are generated using clustering so that a machine learning model can be trained for link prediction. Each step of the framework was evaluated both separately and in combination. The results show that the proposed weighting method for citation and author collaboration networks increases the accuracy of the weighted similarity measures and, as a result, the accuracy of the overall algorithm. Negative sampling using clustering also improves the training of the machine learning model. Textual features of scientific data, such as article titles and abstracts, also play an effective role in predicting future links.

Keywords

  • Link Prediction
  • Citation Networks
  • Author Collaboration Networks
  • Machine Learning
  • Weighted Graph
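
To make the idea of time-based weighting and weighted similarity measures more concrete, here is a minimal sketch in Python. It is not the authors' method: the abstract does not give the weighting formula, so the exponential decay, the `decay` parameter, and the function names `time_decay_weight`, `build_weighted_graph`, and `weighted_common_neighbors` are all assumptions made for illustration. The similarity score is the common weighted variant of common neighbors, which sums the edge weights on both sides of each shared neighbor.

```python
import math
from collections import defaultdict


def time_decay_weight(link_year, current_year, decay=0.5):
    """Assumed weighting: recent links get weights close to 1, old links decay toward 0."""
    return math.exp(-decay * (current_year - link_year))


def build_weighted_graph(edges, current_year, decay=0.5):
    """Build an undirected weighted adjacency map from (u, v, year) triples.

    Repeated links between the same pair accumulate weight.
    """
    graph = defaultdict(dict)
    for u, v, year in edges:
        w = time_decay_weight(year, current_year, decay)
        graph[u][v] = graph[u].get(v, 0.0) + w
        graph[v][u] = graph[v].get(u, 0.0) + w
    return graph


def weighted_common_neighbors(graph, x, y):
    """Sum w(x, z) + w(z, y) over every common neighbor z of x and y."""
    common = set(graph.get(x, {})) & set(graph.get(y, {}))
    return sum(graph[x][z] + graph[z][y] for z in common)


# Hypothetical co-authorship data: (author, author, year of joint paper).
edges = [
    ("a", "c", 2020),
    ("b", "c", 2020),
    ("a", "d", 2010),
    ("e", "d", 2010),
]
graph = build_weighted_graph(edges, current_year=2020)

recent_score = weighted_common_neighbors(graph, "a", "b")  # shared neighbor via 2020 links
old_score = weighted_common_neighbors(graph, "a", "e")     # shared neighbor via 2010 links
```

In this toy data both candidate pairs have exactly one common neighbor, so an unweighted score would rank them equally. With time-based weights, the pair connected through recent links scores higher, which is the kind of effect a weighted similarity measure is meant to capture.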