

A Framework for Implementing Prediction Algorithm over Cloud Data as a Procedure for Cloud Data Mining

Safwan A. S. Al-Shaibani, Parag Bhalchandra
Research Paper | Journal Paper
Volume 2, Issue 2: Special Issue on Artificial Intelligence (ICAI-2021), PP 1-8
DOI: https://doi.org/10.54060/JIEEE/002.02.021


The cloud has become important in data storage for many reasons. Cloud services and applications are widespread in many industries, including healthcare, because they are easy to access. The vast quantity of data available in the cloud has attracted the interest of many researchers in recent years and has motivated the use of machine learning to analyze the data, gain insights, and build models. In this paper, we build a service on Heroku, a cloud platform as a service (PaaS), that holds 15 thousand records with 25 features. The data come from healthcare and relate to post-surgery complications. The CatBoost prediction algorithm was applied for analysis, and the implementation was done in Python. The results helped us determine and tune some of the hyperparameters that have correlations with complications, and the training and testing accuracies were found to be 91% and 88%, respectively.
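As a rough illustration of the kind of workflow the abstract describes (fit a boosted model for binary classification, then compare training and testing accuracy), here is a minimal sketch. It does not reproduce the paper's pipeline: it substitutes plain gradient boosting over decision stumps for CatBoost, uses small synthetic data instead of the healthcare records, and every function name and parameter value below is an assumption made for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Least-squares fit of a one-split tree (stump) to the residuals."""
    best = None  # (sse, feature index, threshold, left mean, right mean)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]

def stump_value(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value

def train(X, y, n_rounds=20, learning_rate=0.5):
    """Gradient boosting on the logistic loss (hypothetical settings)."""
    scores = [0.0] * len(X)
    model = []
    for _ in range(n_rounds):
        # Negative gradient of the log loss: label minus predicted probability.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        j, t, lm, rm = fit_stump(X, residuals)
        stump = (j, t, learning_rate * lm, learning_rate * rm)
        model.append(stump)
        scores = [s + stump_value(stump, row) for s, row in zip(scores, X)]
    return model

def predict(model, row):
    return 1 if sum(stump_value(s, row) for s in model) > 0 else 0

def accuracy(model, X, y):
    return sum(predict(model, row) == yi for row, yi in zip(X, y)) / len(y)

# Synthetic stand-in data: two numeric features, a binary label, 5% label noise.
rng = random.Random(0)
X, y = [], []
for _ in range(180):
    x0, x1 = rng.random(), rng.random()
    label = 1 if x0 + 0.5 * x1 > 0.75 else 0
    if rng.random() < 0.05:
        label = 1 - label
    X.append((x0, x1))
    y.append(label)

X_train, y_train = X[:120], y[:120]
X_test, y_test = X[120:], y[120:]

model = train(X_train, y_train)
train_acc = accuracy(model, X_train, y_train)
test_acc = accuracy(model, X_test, y_test)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

A gap between the two accuracies, like the one reported in the abstract, is the usual signal to revisit settings such as the number of boosting rounds or the learning rate.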

Keywords / Index Terms

Heroku cloud, CatBoost algorithm, prediction model, binary classification

