Neural Networks in Image Processing: A Review of Architectures, Datasets, and Performance

Abstract

The rapid advancement of neural network-based methods has transformed the field of image processing, delivering unprecedented performance across a wide range of applications such as segmentation, classification, enhancement, and generation. This paper provides a comprehensive overview of the main neural network architectures used in image processing: convolutional neural networks (CNNs), autoencoders, generative adversarial networks (GANs), and vision transformers (ViTs). The design principles behind these models are discussed, and their strengths and limitations in various image processing tasks are highlighted. Moreover, the most widely used benchmark datasets and performance metrics that enable objective evaluation are examined, and different approaches are compared. The trade-offs between model accuracy, computational efficiency, and scalability are also explored through an analysis of recent trends. Finally, current challenges are addressed and future research directions are outlined, aimed at developing more efficient, interpretable, and generalizable neural network solutions for image processing.

Mohammad Abid Al-Hashim
Department of Computer Science, College of Computer Science and Mathematics, University of Mosul, Iraq

IRJIET, Volume 9, Issue 10, October 2025, pp. 29–36

https://doi.org/10.47001/IRJIET/2025.910005

References

  1. Fukushima, K. (1980). Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4), 193–202.
  2. LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
  3. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NeurIPS), 25, 1097–1105.
  4. Rawat, W., & Wang, Z. (2017). Deep Convolutional Neural Networks for Image Classification: A Comprehensive Review. Neural Computation, 29(9), 2352–2449.
  5. Vaswani, A., et al. (2017). Attention is All You Need. In Advances in Neural Information Processing Systems (NeurIPS), 30.
  6. Dosovitskiy, A., et al. (2020). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv preprint arXiv:2010.11929.
  7. Khan, S., Naseer, M., Hayat, M., Zamir, S. W., Khan, F. S., & Shah, M. (2022). Transformers in Vision: A Survey. ACM Computing Surveys, 54(10), 1–41.
  8. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems (pp. 2672-2680).
  9. Wang, X., Yu, K., Dong, C., & Loy, C. C. (2018). Recovering realistic texture in image super-resolution by deep spatial feature transform. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 606–615.
  10. Sabour, S., Frosst, N., & Hinton, G. E. (2017). Dynamic Routing Between Capsules. In Advances in Neural Information Processing Systems (NeurIPS), 30.
  11. Khan, S., & Iqbal, R. (2025). A comprehensive survey on architectural advances in deep CNNs: Challenges, applications, and emerging research directions. arXiv preprint arXiv:2503.16546. https://arxiv.org/abs/2503.16546
  12. Google (2024). MobileNet V4. Wikipedia
  13. Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., & Manzagol, P. A. (2010). Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11, 3371-3408.
  14. El-Shafai, W. E., et al. (2023). Image retrieval using convolutional autoencoder, InfoGAN, and vision transformer unsupervised models. ResearchGate. https://www.researchgate.net/publication/368234541
  15. Saharia, C., et al. (2023). Image Super-Resolution via Iterative Refinement. IEEE TPAMI.
  16. Yunusa, H., et al. (2024). Hybrid CNN–ViT Architectures for Computer Vision. arXiv:2402.02941.
  17. Howard, A., et al. (2024). MobileNetV4: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:2403.XXXX.
  18. Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition, 248–255. https://doi.org/10.1109/CVPR.2009.5206848
  19. Krizhevsky, A. (2009). Learning multiple layers of features from tiny images. Technical Report, University of Toronto. http://www.cs.toronto.edu/~kriz/cifar.html
  20. Tiny ImageNet Challenge. (2015). Stanford CS231n: Convolutional Neural Networks for Visual Recognition. http://tiny-imagenet.herokuapp.com/
  21. Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., ... & Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. European Conference on Computer Vision, 740–755. https://doi.org/10.1007/978-3-319-10602-1_48
  22. Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2010). The Pascal Visual Object Classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338. https://doi.org/10.1007/s11263-009-0275-4
  23. Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., ... & Schiele, B. (2016). The Cityscapes dataset for semantic urban scene understanding. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3213–3223. https://doi.org/10.1109/CVPR.2016.350
  24. Liu, Z., Luo, P., Wang, X., & Tang, X. (2015). Deep learning face attributes in the wild. Proceedings of the IEEE International Conference on Computer Vision, 3730–3738. https://doi.org/10.1109/ICCV.2015.425
  25. Yu, F., Zhang, Y., Song, S., Seff, A., & Xiao, J. (2015). LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365. https://arxiv.org/abs/1506.03365
  26. Recht, B., Roelofs, R., Schmidt, L., & Shankar, V. (2019). Do ImageNet classifiers generalize to ImageNet? arXiv preprint arXiv:1902.10811. https://arxiv.org/abs/1902.10811
  27. Open Images Dataset V7. (2023). Google Research. https://storage.googleapis.com/openimages/web/index.html
  28. Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI), 1137–1143.
  29. Lin, T.-Y., Maire, M., Belongie, S., et al. (2014). Microsoft COCO: Common objects in context. European Conference on Computer Vision (ECCV), 740–755.
  30. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., & Hochreiter, S. (2017). GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems (NeurIPS), 30, 6626–6637.
  31. Salimans, T., et al. (2016). Improved techniques for training GANs. Advances in Neural Information Processing Systems (NeurIPS), 29, 2234–2242.
  32. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097–1105.
  33. Tan, M., & Le, Q. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. Proceedings of the 36th International Conference on Machine Learning, 6105–6114.
  34. Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems, 28.
  35. Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 779–788.
  36. Jocher, G., Chaurasia, A., & Qiu, J. (2023). YOLOv8: A cutting-edge object detection and segmentation model. Ultralytics Technical Report. https://github.com/ultralytics/ultralytics
  37. Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention (MICCAI), 234–241.
  38. Chen, L. C., Zhu, Y., Papandreou, G., et al. (2018). Encoder-decoder with atrous separable convolution for semantic image segmentation. European Conference on Computer Vision (ECCV), 801–818.
  39. Xie, E., Wang, W., Yu, Z., et al. (2021). SegFormer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems, 34.
  40. He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2961–2969.
  41. Ledig, C., Theis, L., Huszár, F., et al. (2017). Photo-realistic single image super-resolution using a generative adversarial network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 4681–4690.
  42. Karras, T., Laine, S., & Aila, T. (2019). A style-based generator architecture for generative adversarial networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 4401–4410.
  43. Ramesh, A., Pavlov, M., Goh, G., et al. (2021). Zero-shot text-to-image generation. International Conference on Machine Learning (ICML).
  44. Schlegl, T., Seeböck, P., Waldstein, S. M., et al. (2017). Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. Information Processing in Medical Imaging (IPMI), 146–157.
  45. Tajbakhsh, N., Jeyaseelan, L., Li, Q., Chiang, J. N., Wu, Z., & Ding, X. (2020). Embracing imperfect datasets: A review of deep learning solutions for medical image segmentation. Medical Image Analysis, 63, 101693. https://doi.org/10.1016/j.media.2020.101693
  46. Hendrycks, D., & Dietterich, T. (2019). Benchmarking neural network robustness to common corruptions and perturbations. International Conference on Learning Representations (ICLR).
  47. Strubell, E., Ganesh, A., & McCallum, A. (2019). Energy and policy considerations for deep learning in NLP. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 3645–3650.
  48. Doshi-Velez, F., & Kim, B. (2017). Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608. https://arxiv.org/abs/1702.08608
  49. Buolamwini, J., & Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification. Proceedings of Machine Learning Research, 81, 1–15.