Data Quality Management for Effective Machine Learning and AI Modelling: Best Practices and Emerging Trends

Abstract

As artificial intelligence (AI) and machine learning (ML) are increasingly incorporated into services to meet user requirements, data quality has become a central concern for application providers. Processing large volumes of data raises several challenges, including redundant data, unstructured data, data interruptions, discrepancies, inaccuracies, and outdated information. Much of the current work on these challenges focuses on data defects in invariant scenarios and on eight principles associated with data problems. Addressing data quality issues through a variety of approaches makes it easier to integrate ML and AI. Dataset values can be used in pairs within ML models, and several approaches exist for doing so that improve the relevance of the learning process. Machine learning recognizes patterns in historical data and uses them to generate predictions or decisions. Several repercussions have been investigated at other levels while data quality was overlooked, which undermines the trustworthiness and effectiveness of AI systems. Finally, a range of real-time applications involving large-scale data is examined with the aim of ensuring data stability by resolving risks and privacy concerns. Evaluating a wide range of performance measures helps ensure that data quality is maintained as AI and ML are integrated.
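To make several of the defects listed above concrete, the following is a minimal Python sketch, not the paper's method, that scores a batch of records on four simple ratios: redundancy (repeated keys), completeness (missing values), invalidity (an out-of-range value standing in for inaccuracies and discrepancies) and staleness (records not refreshed recently, a proxy for information that is no longer relevant). The `Record` schema, the `quality_report` function, the 365-day threshold and the non-negative income rule are hypothetical assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Record:
    # Hypothetical schema used only for illustration.
    customer_id: str          # key used to detect redundant rows
    income: Optional[float]   # None marks a missing value
    updated: date             # last time the record was refreshed


def quality_report(records, as_of, max_age_days=365):
    """Return simple per-batch data quality ratios (all in [0, 1])."""
    n = len(records)
    if n == 0:
        # An empty batch has no defects to count.
        return {"redundancy": 0.0, "completeness": 1.0,
                "invalid": 0.0, "staleness": 0.0}

    # Redundancy: share of records whose key already appeared earlier.
    seen = set()
    duplicates = 0
    for r in records:
        if r.customer_id in seen:
            duplicates += 1
        seen.add(r.customer_id)

    # Completeness: share of records with the income field populated.
    complete = sum(1 for r in records if r.income is not None)

    # Invalidity: populated values that break an assumed rule (income >= 0),
    # a crude stand-in for inaccuracies and discrepancies.
    invalid = sum(1 for r in records if r.income is not None and r.income < 0)

    # Staleness: records not refreshed within max_age_days of as_of,
    # a proxy for information that is no longer relevant.
    stale = sum(1 for r in records if (as_of - r.updated).days > max_age_days)

    return {"redundancy": duplicates / n, "completeness": complete / n,
            "invalid": invalid / n, "staleness": stale / n}


records = [
    Record("a1", 52000.0, date(2022, 6, 1)),
    Record("a1", 52000.0, date(2022, 6, 1)),   # redundant copy
    Record("b2", None, date(2022, 9, 15)),     # missing income
    Record("c3", -100.0, date(2020, 1, 10)),   # invalid and stale
]
report = quality_report(records, as_of=date(2022, 12, 1))
print(report)
# {'redundancy': 0.25, 'completeness': 0.75, 'invalid': 0.25, 'staleness': 0.25}
```

In a real pipeline such ratios would more likely be tracked per field and over time, with thresholds set per use case; the sketch only shows that each defect named in the abstract can be turned into a measurable quantity.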

Country: USA

Praneeth Reddy Amudala Puchakayala

Data Scientist, Regions Bank, USA

IRJIET, Volume 6, Issue 12, December 2022, pp. 327–340

https://doi.org/10.47001/IRJIET/2022.612062
