Evaluation of resampling techniques for deep learning based identification of promising genotypes in sugarcane varietal trials

Syed Sarfaraz Hasan
Arun Baitha
Lal Singh Gangwar
Sanjeev Kumar

Abstract

Deep learning is a class of machine learning algorithms that extract high-level features from raw input to support intelligent decisions. Identification of promising genotypes in varietal trials is one of many agricultural applications in which deep learning can be applied to varietal trial data. However, the varietal trial data used for identification are highly imbalanced, which poses a considerable challenge for classification with deep learning. For example, only 33 genotypes were identified as promising in the zonal varietal trials of the All India Coordinated Research Project (AICRP) on Sugarcane during 2016-21, whereas 148 fell in the non-promising class. Balancing the classes is crucial because a classification model trained on an imbalanced dataset tends to be biased toward the majority class, so its overall accuracy largely reflects that class. One way to address this issue is resampling, which adjusts the ratio between the classes to make the data more balanced. This study implemented and evaluated four resampling techniques, viz. random undersampling, random oversampling, ensemble and SMOTE, to balance the varietal trial dataset before building a deep learning model to identify promising genotypes in sugarcane. The paper describes the methodology used to build the deep learning model with these resampling techniques and presents their comparative performance in identifying promising genotypes. The results indicate that SMOTE and random oversampling performed well in balancing the dataset for deep learning model development compared with no resampling. SMOTE outperformed the other resampling techniques, achieving high precision, recall and F1 score for both the positive and negative classes. However, the ensemble and random undersampling methods did not show good results in comparison with SMOTE and random oversampling. The study will be useful in developing artificial intelligence based tools for automatic identification of promising genotypes in varietal trials of sugarcane in particular, and of other crops in general.
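
To make the resampling step concrete, the following is a minimal Python sketch of three of the techniques named in the abstract (random oversampling, random undersampling and a SMOTE-style interpolation), applied to the reported class counts of 33 promising and 148 non-promising genotypes. It is an illustration under stated assumptions, not the authors' implementation: the two-dimensional features are synthetic placeholders, the function names (`random_oversample`, `random_undersample`, `smote_like`) are hypothetical, the SMOTE-style routine handles only the binary case, and the ensemble method is not sketched because the abstract does not describe it.

```python
import random
from collections import Counter


def rows_of(X, y, label):
    """Return the feature rows that belong to one class."""
    return [x for x, lab in zip(X, y) if lab == label]


def random_oversample(X, y, rng):
    """Duplicate randomly chosen rows until every class matches the largest class."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = rows_of(X, y, label)
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, rng):
    """Keep a random subset of each class, sized to the smallest class."""
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        for x in rng.sample(rows_of(X, y, label), target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def smote_like(X, y, minority, rng, k=5):
    """SMOTE-style oversampling for two classes: each synthetic row lies on the
    segment between a minority row and one of its k nearest minority neighbours."""
    rows = rows_of(X, y, minority)
    n_needed = (len(y) - len(rows)) - len(rows)  # majority count minus minority count
    X_out, y_out = list(X), list(y)
    for _ in range(n_needed):
        base = rng.choice(rows)
        neighbours = sorted(
            (r for r in rows if r is not base),
            key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        X_out.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
        y_out.append(minority)
    return X_out, y_out


# Class counts from the abstract: 1 = promising (33), 0 = non-promising (148).
rng = random.Random(0)
y = [1] * 33 + [0] * 148
# Assumption: placeholder features; real inputs would be varietal trial traits.
X = [(rng.gauss(float(lab), 1.0), rng.gauss(float(lab), 1.0)) for lab in y]

# Accuracy of a model that always predicts the majority class.
majority_baseline = max(Counter(y).values()) / len(y)

X_over, y_over = random_oversample(X, y, rng)
X_under, y_under = random_undersample(X, y, rng)
X_smote, y_smote = smote_like(X, y, minority=1, rng=rng)

print("majority-class baseline accuracy:", round(majority_baseline, 3))
print("oversampled:", dict(Counter(y_over)))
print("undersampled:", dict(Counter(y_under)))
print("SMOTE-style:", dict(Counter(y_smote)))
```

With these counts, a model that always predicts the non-promising class would already be correct for 148 of 181 genotypes, which is why per-class precision, recall and F1 score are more informative than overall accuracy here. Undersampling leaves only 33 genotypes per class, whereas oversampling and the SMOTE-style routine bring both classes to 148. In practice, resampling would normally be applied to the training split only, so that the evaluation data keep the original class ratio.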

Article Details

How to Cite
Hasan, S. S., Baitha, A., Gangwar, L. S., & Kumar, S. (2024). Evaluation of resampling techniques for deep learning based identification of promising genotypes in sugarcane varietal trials. INDIAN JOURNAL OF GENETICS AND PLANT BREEDING, 84(01), 92–98. https://doi.org/10.31742/ISGPB.84.1.8
Section
Research Article

