Utilizing Machine Learning for Rating Prediction: A Case Study on Scientific Publication Abstract Data

Authors

  • Rizky Zaqi Megantara, Universitas Kristen Maranatha
  • Pace Iryanto Faot, Universitas Kristen Maranatha
  • Ridolof Haba Ito, Universitas Kristen Maranatha
  • Kathleen Felicia Annabel, Universitas Kristen Maranatha
  • Oscar Karnalim, Universitas Kristen Maranatha

DOI:

https://doi.org/10.37802/joti.v7i1.999

Keywords:

K-Nearest Neighbors, Machine Learning, Random Forest, Rating Prediction, Support Vector Regressor, XGBoost

Abstract

As the volume of scientific publications grows, automated approaches to evaluating and analyzing abstracts become increasingly important. This research predicts the ratings of scientific publication abstracts using machine learning algorithms and integrates regression and classification analysis to evaluate the relevance of abstracts more comprehensively. Four models are evaluated for this task: XGBoost Regressor, Random Forest Regressor, Support Vector Regressor (SVR), and K-Nearest Neighbors Regressor (KNN). The dataset passes through several preprocessing stages: removing duplicate entries, representing text with TF-IDF, handling data imbalance with the Synthetic Minority Oversampling Technique (SMOTE), and reducing dimensionality with Truncated Singular Value Decomposition (SVD). The results show that SVR performs best, with the lowest Mean Absolute Error (MAE) of 0.4980, a Mean Squared Error (MSE) of 0.5237, and the highest R² of 0.7321. XGBoost and Random Forest perform competitively, offering advantages in computational efficiency and prediction stability, respectively, while KNN's results vary with the data distribution. Dimensionality reduction with Truncated SVD retains more than 70% of the original variance, improving computational efficiency without losing important information. This research contributes to machine learning-based decision support, particularly in the analysis of scientific publication abstracts. The approach can be extended by exploring ensemble or hybrid models and by testing on larger datasets to improve generalization and accuracy.
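The abstract reports three regression metrics (MAE, MSE, R²) and the share of variance retained by Truncated SVD. The sketch below shows one conventional way to compute these quantities with NumPy. It is not the authors' code. The function names `regression_metrics`, `truncated_svd_variance`, and `smallest_rank_for` are hypothetical, and a random matrix stands in for a TF-IDF matrix. The variance ratio follows the definition used by scikit-learn's `TruncatedSVD` (`explained_variance_ratio_`): the data are not centred before decomposition, and the variance of the projected columns is divided by the total column variance of the input.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R^2) for two equal-length sequences of ratings."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))                # mean absolute error
    mse = np.mean(residuals ** 2)                   # mean squared error
    ss_res = np.sum(residuals ** 2)                 # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true ratings are equal")
    r2 = 1.0 - ss_res / ss_tot                      # coefficient of determination
    return float(mae), float(mse), float(r2)


def truncated_svd_variance(X, k):
    """Fraction of the column-wise variance of X kept by a rank-k truncated SVD.

    Like scikit-learn's TruncatedSVD, X is NOT centred before the SVD.
    """
    X = np.asarray(X, dtype=float)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    projected = X @ vt[:k].T                        # n_samples x k component scores
    return float(np.var(projected, axis=0).sum() / np.var(X, axis=0).sum())


def smallest_rank_for(X, target=0.70):
    """Smallest k whose truncated SVD keeps at least `target` of the variance."""
    max_k = min(np.asarray(X).shape)
    for k in range(1, max_k + 1):
        if truncated_svd_variance(X, k) >= target:
            return k
    return max_k


# Usage on toy data (illustrative only, not the study's dataset).
print(regression_metrics([3, 4, 5, 2], [2.5, 4, 5.5, 2]))  # (0.25, 0.125, 0.9)

rng = np.random.default_rng(0)
X_demo = rng.random((50, 20))                       # hypothetical stand-in for TF-IDF
k_demo = smallest_rank_for(X_demo, 0.70)
print(k_demo, truncated_svd_variance(X_demo, k_demo))
```

Searching for the smallest rank that crosses a variance threshold is one common way to pick the number of SVD components. The abstract does not state how the authors chose theirs, so treat the 0.70 target here as an assumption taken from the reported figure.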
