Neural Networks in Image Processing: A Review of Architectures, Datasets, and Performance

Abstract

The rapid advancement of neural network-based methods has transformed the field of image processing, delivering unprecedented performance across a wide range of applications such as segmentation, classification, enhancement, and generation. This paper provides a comprehensive overview of the main neural network architectures used in image processing: convolutional neural networks (CNNs), autoencoders, generative adversarial networks (GANs), and vision transformers (ViTs). The design principles behind these models are discussed, and their strengths and limitations in various image processing tasks are highlighted. Moreover, the most widely used benchmark datasets and performance metrics that enable objective evaluation are examined, and different approaches are compared. The trade-offs between model accuracy, computational efficiency, and scalability are also explored through an analysis of recent trends. Finally, current challenges are addressed and future research directions are outlined, aimed at developing more efficient, interpretable, and generalizable neural network solutions for image processing.

Mohammad Abid Al-Hashim
Department of Computer Science, College of Computer Science and Mathematics, University of Mosul, Iraq

IRJIET, Volume 9, Issue 10, October 2025, pp. 29–36

https://doi.org/10.47001/IRJIET/2025.910005

References

  1. Fukushima, K. (1980). Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4), 193–202.
  2. LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
  3. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NeurIPS), 25, 1097–1105.
  4. Rawat, W., & Wang, Z. (2017). Deep Convolutional Neural Networks for Image Classification: A Comprehensive Review. Neural Computation, 29(9), 2352–2449.
  5. Vaswani, A., et al. (2017). Attention is All You Need. In Advances in Neural Information Processing Systems (NeurIPS), 30.
  6. Dosovitskiy, A., et al. (2020). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv preprint arXiv:2010.11929.
  7. Khan, S., Naseer, M., Hayat, M., Zamir, S. W., Khan, F. S., & Shah, M. (2022). Transformers in Vision: A Survey. ACM Computing Surveys, 54(10), 1–41.
  8. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems (pp. 2672-2680).
  9. Wang, X., Yu, K., Dong, C., & Loy, C. C. (2018). Recovering realistic texture in image super-resolution by deep spatial feature transform. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 606–615.
  10. Sabour, S., Frosst, N., & Hinton, G. E. (2017). Dynamic Routing Between Capsules. In Advances in Neural Information Processing Systems (NeurIPS), 30.
  11. Khan, S., & Iqbal, R. (2025). A comprehensive survey on architectural advances in deep CNNs: Challenges, applications, and emerging research directions. arXiv preprint arXiv:2503.16546. https://arxiv.org/abs/2503.16546
  12. Google (2024). MobileNet V4. Wikipedia
  13. Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., & Manzagol, P. A. (2010). Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11, 3371-3408.
  14. El-Shafai, W. E., et al. (2023). Image retrieval using convolutional autoencoder, InfoGAN, and vision transformer unsupervised models. ResearchGate. https://www.researchgate.net/publication/368234541
  15. Saharia, C., et al. (2023). Image Super-Resolution via Iterative Refinement. IEEE TPAMI.
  16. Yunusa, H., et al. (2024). Hybrid CNN–ViT Architectures for Computer Vision. arXiv:2402.02941.
  17. Howard, A., et al. (2024). MobileNetV4: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:2403.XXXX.
  18. Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition, 248–255. https://doi.org/10.1109/CVPR.2009.5206848
  19. Krizhevsky, A. (2009). Learning multiple layers of features from tiny images. Technical Report, University of Toronto. http://www.cs.toronto.edu/~kriz/cifar.html
  20. Tiny ImageNet Challenge. (2015). Stanford CS231n: Convolutional Neural Networks for Visual Recognition. http://tiny-imagenet.herokuapp.com/
  21. Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., ... & Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. European Conference on Computer Vision, 740–755. https://doi.org/10.1007/978-3-319-10602-1_48
  22. Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2010). The Pascal Visual Object Classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338. https://doi.org/10.1007/s11263-009-0275-4
  23. Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., ... & Schiele, B. (2016). The Cityscapes dataset for semantic urban scene understanding. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3213–3223. https://doi.org/10.1109/CVPR.2016.350
  24. Liu, Z., Luo, P., Wang, X., & Tang, X. (2015). Deep learning face attributes in the wild. Proceedings of the IEEE International Conference on Computer Vision, 3730–3738. https://doi.org/10.1109/ICCV.2015.425
  25. Yu, F., Zhang, Y., Song, S., Seff, A., & Xiao, J. (2015). LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365. https://arxiv.org/abs/1506.03365
  26. Recht, B., Roelofs, R., Schmidt, L., & Shankar, V. (2019). Do ImageNet classifiers generalize to ImageNet? arXiv preprint arXiv:1902.10811. https://arxiv.org/abs/1902.10811
  27. Open Images Dataset V7. (2023). Google Research. https://storage.googleapis.com/openimages/web/index.html
  28. Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI), 1137–1143.
  29. Lin, T.-Y., Maire, M., Belongie, S., et al. (2014). Microsoft COCO: Common objects in context. European Conference on Computer Vision (ECCV), 740–755.
  30. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., & Hochreiter, S. (2017). GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems (NeurIPS), 30, 6626–6637.
  31. Salimans, T., et al. (2016). Improved techniques for training GANs. Advances in Neural Information Processing Systems (NeurIPS), 29, 2234–2242.
  32. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097–1105.
  33. Tan, M., & Le, Q. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. Proceedings of the 36th International Conference on Machine Learning, 6105–6114.
  34. Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems, 28.
  35. Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 779–788.
  36. Jocher, G., Chaurasia, A., & Qiu, J. (2023). YOLOv8: A cutting-edge object detection and segmentation model. Ultralytics Technical Report. https://github.com/ultralytics/ultralytics
  37. Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention (MICCAI), 234–241.
  38. Chen, L. C., Zhu, Y., Papandreou, G., et al. (2018). Encoder-decoder with atrous separable convolution for semantic image segmentation. European Conference on Computer Vision (ECCV), 801–818.
  39. Xie, E., Wang, W., Yu, Z., et al. (2021). SegFormer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems, 34.
  40. He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2961–2969.
  41. Ledig, C., Theis, L., Huszár, F., et al. (2017). Photo-realistic single image super-resolution using a generative adversarial network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 4681–4690.
  42. Karras, T., Laine, S., & Aila, T. (2019). A style-based generator architecture for generative adversarial networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 4401–4410.
  43. Ramesh, A., Pavlov, M., Goh, G., et al. (2021). Zero-shot text-to-image generation. International Conference on Machine Learning (ICML).
  44. Schlegl, T., Seeböck, P., Waldstein, S. M., et al. (2017). Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. Information Processing in Medical Imaging (IPMI), 146–157.
  45. Tajbakhsh, N., Jeyaseelan, L., Li, Q., Chiang, J. N., Wu, Z., & Ding, X. (2020). Embracing imperfect datasets: A review of deep learning solutions for medical image segmentation. Medical Image Analysis, 63, 101693. https://doi.org/10.1016/j.media.2020.101693
  46. Hendrycks, D., & Dietterich, T. (2019). Benchmarking neural network robustness to common corruptions and perturbations. International Conference on Learning Representations (ICLR).
  47. Strubell, E., Ganesh, A., & McCallum, A. (2019). Energy and policy considerations for deep learning in NLP. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 3645–3650.
  48. Doshi-Velez, F., & Kim, B. (2017). Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608. https://arxiv.org/abs/1702.08608
  49. Buolamwini, J., & Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification. Proceedings of Machine Learning Research, 81, 1–15.