Improved Reliability of Photovoltaic Systems through Iterative Dataset and Feature Simplification

Shiva Prakash S; Selvaraju M; Suresh G; M. Muthuraj; Gokilhashree M; Pon Mahesh Kumar

doi:10.1051/epjconf/202636301019

Open Access

Issue		EPJ Web Conf. Volume 363, 2026 International Conference on Low-Carbon Development and Materials for Solar Energy (ICLDMS’26)


Article Number		01019
Number of page(s)		22
Section		Energy Materials
DOI		https://doi.org/10.1051/epjconf/202636301019
Published online		16 April 2026

EPJ Web of Conferences 363, 01019 (2026)

Shiva Prakash S¹, Selvaraju M², Suresh G³, M. Muthuraj⁴, Gokilhashree M⁵ and Pon Mahesh Kumar⁶

¹ Senior Assistant Professor, Department of Mechanical Engineering, New Horizon College of Engineering, Outer Ring Road, Bellandur, Bengaluru, Karnataka, India - 560103
² Associate Professor, Department of Mechanical Engineering, Rathinam Technical Campus, Coimbatore, Tamilnadu, India.
³ Professor, Department of Mathematics, Vel Tech High Tech Dr. Rangarajan Dr. Sakunthala Engineering College, Avadi, Chennai, Tamilnadu, India - 600062
⁴ Assistant Professor, Department of Mechanical Engineering, NPR College of Engineering and Technology, Natham, Dindigul, Tamilnadu, India
⁵ Department of Medical Electronics, Sengunthar Engineering College, Tiruchengode, Namakkal, Tamilnadu, India - 637205
⁶ Assistant Professor, Department of Mechanical Engineering, Nandha College of Technology, Perundurai, Erode, Tamilnadu, India - 638052

Abstract

Photovoltaic (PV) systems are prone to many potential faults, and conventional protection schemes often fail to detect them. This has driven the development of state-of-the-art, fully automated AI-based methods, particularly machine learning (ML), which have already proven useful for PV fault protection. Although AI algorithms are developing and evolving rapidly, their computational cost remains a significant concern because of their high complexity. To keep traditional ML algorithms applicable to PV array protection as new AI algorithms emerge, this study proposes a way to simplify ML models. Specifically, it presents an iterative framework for building an aggregative model from multiple ML methods, whose main aim is to simplify model training. Training is simplified in two ways. First, a horizontal simplification strategy reduces the dataset by limiting the classes it contains. This rests on the assumption that an algorithm that performs poorly at correctly labeling a smaller dataset will almost certainly also fail on a larger one. Second, a vertical simplification strategy uses a random forest algorithm to select the most effective features, further reducing the dataset at each iteration. The effectiveness and soundness of the proposed strategy are demonstrated by applying it to a laboratory PV system in two experimental scenarios with different datasets. The method was also tested on a large-scale PV system modeled in MATLAB/Simulink. The test accuracy is 100% in the first case, 99.59% in the second and 99.17% in the third. The approach also outperforms similar previously published studies in several respects.
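The two simplification steps described in the abstract can be illustrated with a minimal, self-contained Python sketch. It is not the authors' implementation and rests on several assumptions: the candidate models are toy stand-ins (a nearest-centroid classifier and a majority-class baseline) rather than the ML methods used in the paper; the fault class names and data are synthetic and hypothetical; and `class_mean_spread` is a simple substitute for the random forest feature importances used in the paper's vertical step. All function names are hypothetical. `horizontal_screen` evaluates each candidate on progressively larger subsets of classes and discards any candidate that falls below an accuracy threshold, so a model that fails on a small problem is never trained on the full one. `vertical_prune` repeatedly drops the lowest-scoring fraction of features and stops as soon as accuracy decreases; for brevity it measures accuracy on the training data, whereas a real pipeline would use held-out data.

```python
from statistics import mean, pvariance


def accuracy(predict, X, y):
    """Fraction of rows in X whose predicted label equals the true label."""
    return mean(1.0 if predict(x) == t else 0.0 for x, t in zip(X, y))


def fit_nearest_centroid(X, y):
    """Toy stand-in classifier: one centroid per class, predict the nearest one."""
    groups = {}
    for x, t in zip(X, y):
        groups.setdefault(t, []).append(x)
    centroids = {t: [mean(col) for col in zip(*rows)] for t, rows in groups.items()}

    def predict(x):
        # Squared Euclidean distance; ties go to the class seen first in training.
        return min(centroids, key=lambda t: sum((a - b) ** 2 for a, b in zip(x, centroids[t])))

    return predict


def fit_majority(X, y):
    """Weak baseline: always predict the most frequent training label."""
    top = max(set(y), key=y.count)
    return lambda x: top


def horizontal_screen(candidates, X, y, class_order, threshold=0.9):
    """Evaluate candidates on growing class subsets; drop any scoring below threshold."""
    survivors = dict(candidates)
    for k in range(2, len(class_order) + 1):
        keep = set(class_order[:k])
        idx = [i for i, t in enumerate(y) if t in keep]
        train, test = idx[::2], idx[1::2]  # simple alternating hold-out split
        for name, fit in list(survivors.items()):
            model = fit([X[i] for i in train], [y[i] for i in train])
            if accuracy(model, [X[i] for i in test], [y[i] for i in test]) < threshold:
                del survivors[name]  # failed on a smaller problem: never tried on larger ones
    return survivors


def class_mean_spread(X, y):
    """Stand-in importance (NOT random-forest importance): variance of per-class feature means."""
    labels = sorted(set(y))
    return [pvariance([mean(x[j] for x, t in zip(X, y) if t == c) for c in labels])
            for j in range(len(X[0]))]


def vertical_prune(X, y, fit, score_features, drop_fraction=0.25, min_features=1):
    """Repeatedly drop the lowest-scoring features while training accuracy does not decrease."""
    def project(fs):
        return [[row[j] for j in fs] for row in X]

    features = list(range(len(X[0])))
    best = accuracy(fit(project(features), y), project(features), y)
    while len(features) > min_features:
        scores = score_features(project(features), y)  # one score per kept feature
        ranked = sorted(range(len(features)), key=lambda j: scores[j], reverse=True)
        # Always remove at least one feature, but never go below min_features.
        n_keep = min(len(features) - 1, max(min_features, int(len(features) * (1 - drop_fraction))))
        candidate = sorted(features[j] for j in ranked[:n_keep])
        acc = accuracy(fit(project(candidate), y), project(candidate), y)
        if acc < best:
            break  # removing more features would hurt accuracy
        features, best = candidate, acc
    return features


def make_toy_data():
    """Hypothetical 4-class data: features 0-1 separate the classes, 2-3 are class-independent."""
    centers = {"normal": (0, 0), "open_circuit": (10, 0), "short_circuit": (0, 10), "shading": (10, 10)}
    X, y = [], []
    for label, (cx, cy) in centers.items():
        for i in range(6):
            X.append([cx + (i % 3 - 1), cy + (i % 2 - 0.5), float(i), float(5 - i)])
            y.append(label)
    return X, y, list(centers)


if __name__ == "__main__":
    X, y, order = make_toy_data()
    candidates = {"nearest_centroid": fit_nearest_centroid, "majority": fit_majority}
    print(sorted(horizontal_screen(candidates, X, y, order)))  # ['nearest_centroid']
    print(vertical_prune(X, y, fit_nearest_centroid, class_mean_spread))  # [0, 1]
```

On the toy data, the majority baseline is rejected at the two-class stage (it scores 0.5 on a balanced two-class hold-out set), and feature pruning keeps features 0 and 1, the only ones whose class means differ. In this sketch, using actual random forest importances would only require passing a different `score_features` function, for example one that returns a fitted forest's impurity-based importances.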

Key words: Photovoltaics / Machine learning / Fault detection / Dataset / Random Forest

This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
