Automatic DNN architecture design using CPSOTJUTT for power system inspection
Journal of Big Data volume 10, Article number: 150 (2023)
Abstract
To design high-precision deep neural network (DNN) models quickly, accurately, and automatically, this paper proposes an automatic DNN architecture design ensemble model based on consensus particle swarm optimization-assisted trajectory unified and TRUSTTECH (CPSOTJUTT), called CPSOTJUTTEM. The proposed model has three layers, and its core is a three-stage method that addresses the sensitivity of the local solver to the initial point and enables fast and robust DNN training, so that high-quality DNN models are not missed during automatic DNN architecture design. CPSOTJUTT has the following advantages: (1) high-quality local optimal solutions (LOSs) and (2) robust convergence against random initialization. CPSOTJUTTEM consists of a bottom layer that stably and quickly designs high-quality DNN architectures, a middle layer that explores a diverse set of optimal DNN classification engines, and a top layer that builds an ensemble model for higher performance. The performance of CPSOTJUTTEM is tested on public datasets and three self-made power system inspection datasets. Experimental results show that CPSOTJUTTEM has excellent performance in automatic DNN architecture design and DNN model optimization, and that it can automatically design high-quality DNN ensemble models, laying a solid foundation for the application of DNNs in other fields.
Introduction
Deep neural networks (DNNs) are widely used in computer vision, natural language processing, speech recognition, and other fields [1, 2]. State-of-the-art DNN models such as ResNet [3] and EfficientNet [4] still rely mainly on manual design against a common standard dataset, such as ImageNet. However, such models often do not perform well on tasks in specific fields. Designing a DNN with good performance requires extensive professional knowledge of both DNNs and the problem field being studied, which not every interested user has [5]. Automatic DNN architecture design is an efficient solution to these issues that can meet the needs of tasks in different fields [6]. It is expected to significantly reduce labor costs and improve model performance, promoting the application of DNNs in other fields.
In critical and sensitive domains, such as medical diagnosis and autonomous driving, there is an increasing demand for high-precision models. A single model often exhibits different recognition abilities for different categories, which makes it difficult to meet the required accuracy [7]. With advances in hardware performance, large-scale ensemble models are becoming more widespread. An ensemble model improves accuracy and generalization ability by combining the prediction results of multiple models through joint learning and decision-making [8]. However, designing and optimizing an ensemble model requires in-depth professional knowledge and skills, including model selection and ensemble method design, which limits its application in other fields [9].
This paper combines automatic DNN architecture design technology and ensemble model technology to improve DNN performance. Automatic DNN architecture design can automatically search for and generate DNN models that are better suited to specific tasks and domain requirements. Once multiple optimized candidate DNN architectures are obtained, ensemble model technology can combine the prediction results of these models to achieve higher accuracy. However, the high redundancy and nonconvexity of the parameters lead to many local optimal solutions (LOSs) for the DNN, and low-quality and high-quality LOSs have the same local properties [10, 11]. DNN training is usually carried out by a first-order local solver [12]. One disadvantage of the local solver is that the gradients in different directions are uniformly scaled, so it may converge to a bad LOS [13, 14], resulting in poor generalization ability or failure to converge [15]. Because the DNN training method is not robust, the performance of an automatically designed DNN architecture cannot be adequately evaluated.
To solve the above problems, this paper proposes a novel three-layer ensemble model, termed consensus particle swarm optimization-assisted trajectory unified and TRUSTTECH ensemble model (CPSOTJUTTEM). This model is based on automatic DNN architecture design technology, with the CPSOTJUTT algorithm as its core. The bottom layer of CPSOTJUTTEM generates high-quality DNN architectures stably and quickly. With this design, experts from fields other than deep learning can also design a suitable DNN framework for different domain requirements, promoting the application of deep learning in other fields. The middle layer uses a three-stage method for high-precision and robust DNN training to obtain candidate optimized DNN models. The top layer achieves a higher-performance ensemble model by combining high-quality DNN models. The ensemble model can fully leverage the performance advantages of the high-quality sub-DNN models, improving the overall accuracy and generalization ability of the model.
The main contributions of this paper are:

1)
The CPSOTJUTTEM can robustly and automatically design a high-quality DNN architecture according to the application field without extensive expertise in DNNs.

2)
The CPSOTJUTTEM constructs an ensemble model consisting of a diverse set of high-quality classification engines so that the ensemble model takes full advantage of each sub-DNN to maximize recognition accuracy. The generalization ability of the ensemble model is significantly improved.

3)
The CPSOTJUTT methodology, which has a strong theoretical basis, can robustly converge to high-quality LOSs from random initial points.

4)
The CPSOTJUTT methodology, which consists of the consensus-based PSO, the TJU methodology, and the TRUSTTECH methodology, fully utilizes the global view of the consensus-based PSO, the robust convergence ability of the TJU methodology, and the ability of the TRUSTTECH methodology to search for higher-quality LOSs.
Original contributions and novelties
The architecture of the proposed CPSOTJUTTEM is given in Fig. 1.
Bottom layer: Automatically designs high-quality DNN architectures. The CPSOTJUTT methodology trains these DNN architectures and selects high-quality DNN classification engines. In this layer, the consensus-based PSO is used to address the sensitivity of DNN training to the initial value, and it can converge quickly to the optimal stability region.
Middle layer: Explores a diverse set of high-quality DNN classification engines via the CPSOTJUTT methodology. The CPSOTJUTT methodology can robustly converge to a LOS and search for better ones nearby while maintaining its global search ability.
Top layer: Takes the high-quality classification engines from the middle layer as hidden nodes of the ensemble model and applies the CPSOTJUTT methodology to strengthen the training, finding the optimal combination of classification engines to further improve identification accuracy and generalization ability.
Related work
Evolutionary neural network
Several methodologies have been proposed to automatically design DNN architectures. NeuroEvolution of Augmenting Topologies (NEAT) [16] represents early progress in the development of small-scale network architectures and inspired research on DNN-based neuroevolution. CoDeepNEAT [17] combines the NEAT model with the coevolution of modules.
The evolutionary algorithm uses an intuitive mutation operator to add layer structure, which makes the framework complex for a small network [18]. It has achieved remarkable results on the CIFAR dataset. AmoebaNet-A improves the tournament selection evolutionary algorithm by adding an age property that favors younger genotypes, and it surpasses hand designs for the first time [19]. The Genetic CNN algorithm [20] is a neuroevolutionary algorithm that optimizes connections between convolutional layers using mutation and cross-evolution. It can meet the design requirements of DNN models in some fields to some extent. CNN-GA [21] effectively addresses image classification tasks by designing a new encoding strategy for the GA to encode CNNs of arbitrary depth. Auto-evolutionary CNN (AE-CNN) [5] provides effective local and global search ability through a crossover operator and a mutation operator and can design high-quality DNN architectures with limited computing resources. CorrNet is a novel correlation-based pruning (CFP) approach that creates a feature selection scheme to obtain the pruning approaches. This approach achieves an accuracy gain while saving a significant amount of computational cost [22].
Deep neural network training methods
Existing DNN training methods mainly adopt the first-order gradient method and its variants. Optimization algorithms based on first-order gradients are linear in time and memory complexity and have achieved great success. Momentum Stochastic Gradient Descent (SGD) [12] pursues fast and stable convergence and is widely used for its simplicity and intuitiveness. However, the gradients in different directions are scaled uniformly, causing poor convergence when training on sparse data. Therefore, the acceleration of SGD has attracted extensive research. Recently, some adaptive first-order optimization methods have been proposed to achieve rapid convergence. Adagrad [23] accelerates DNN training by dynamically adjusting the learning rate based on the gradient. RMSprop [24] is an adaptive first-order optimization method that discards remote gradients by using an exponentially decaying average of squared gradients. This method has a much lower computational cost than SGD. Adam [25] combines the advantages of Adagrad and RMSprop and scales the gradient by the square root of the accumulated squared gradient to achieve fast convergence. Adam has become the default optimization algorithm for many DNNs due to its rapid convergence [26]. However, because of their sensitivity to initialization and hyperparameters, these optimization methods may converge to suboptimal local optimal solutions, resulting in worse generalization ability [10].
Deep neural network ensemble
The degree of DNN training has a significant impact on the accuracy and generalization ability of image classification. Insufficient DNN training results in low accuracy on hard examples, and overtraining results in poor generalization on easy examples, especially for class-imbalanced datasets [27, 28]. A cascade structure improves the recognition accuracy of a DNN, but high model capacity leads to overfitting. The ensemble model of a DNN is the optimal combination of diverse high-quality sub-DNNs, which can fully exploit the recognition ability of each sub-DNN [29]. Plantdiseasenet uses a majority voting ensemble model to detect plant pests in the early stage of disease, and the results show that the proposed model has reached or exceeded the latest results [30]. The ensemble model of a DNN can obtain better accuracy and generalization ability than each sub-DNN [27].
Power system inspection
As an indispensable infrastructure in modern society, the stable operation of the power system is crucial to the development of social economy and the normal conduct of people’s lives. In order to ensure the safe and reliable operation of the power system, regular power system inspections are particularly important.
Power line insulator inspection is a regular assessment of the status of insulators on power transmission lines to ensure the stable and safe operation of the power system. Literature [31] proposes a power insulator inspection algorithm based on deep learning to eliminate the impact of complex power environments on detection accuracy. Power system substation inspection can promptly detect potential equipment failures so that maintenance measures can be taken to ensure the safe operation of the power grid. Literature [32] proposes a detection algorithm based on improved YOLO v5, in which a backbone with a unique attention mechanism is designed to extract more accurate feature maps, addressing the lack of detection accuracy in unmanned substations. Power line obstacle inspection can identify potential risks so that timely measures can be taken to ensure the normal operation of transmission lines. Literature [33] proposes an object detection algorithm based on RCNN to ensure the safety of power lines.
The automatic DNN architecture design method using CPSOTJUTT proposed in this article improves the accuracy and generalization ability of power system inspection. It addresses the problems faced by power system inspection, such as multiple inspection scenarios, the low accuracy of general single models, and the difficulty of designing specialized models. In the future, we will conduct research on more advanced deep network design models, such as deep learning with prior knowledge [34, 35], to further improve the performance of the proposed methods.
The CPSOTJUTT methodology
The CPSOTJUTT methodology, which consists of the consensus-based PSO, the TJU methodology, and the TRUSTTECH methodology, can converge robustly to high-quality LOSs from random initial points. The CPSOTJUTT methodology is the core of CPSOTJUTTEM and fully utilizes the global view of the consensus-based PSO, the robust convergence ability of the TJU methodology, and the ability of the TRUSTTECH methodology to search for higher-quality LOSs. The pseudocode of the CPSOTJUTT methodology is shown in Algorithm 1.
The architecture of the CPSOTJUTT methodology is as follows:
Stage I: Exploration and Consensus: The positions of the DNN are updated by PSO until all the particles have reached a consensus, at which point PSO is terminated. The three best particles in the consensus region and the weight center point are selected as the initial points of the next stage.
Stage II: Robust Convergence: We use the TJU methodology and a local solver to robustly converge to a high-quality LOS from the representative particles selected in the previous stage.
Stage III: Search Optimal: The TRUSTTECH methodology is applied to effectively jump out of the stability region of the SEP found in stage II, enter the stability regions of neighboring SEPs, and obtain multiple high-quality LOSs in a tier-by-tier search manner.
CPSOTJUTT stage I: exploration and consensus
The DNN is a regularized version of a multilayer perceptron with a multilayer network structure, and its performance is usually evaluated by a loss function. The goal of DNN training is to reduce the cross-entropy (CE) loss function as much as possible, ideally driving it toward 0:
\[F\left(w, x\right)=-\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{C}{t}_{ij}\,\mathrm{log}\left({p}_{ij}\right)\qquad (1)\]
where \(x\) is the input data, \(C\) is the number of classes, \({t}_{ij}\) is the one-hot value of the class, \({p}_{ij}\) is the probability that sample \(i\) belongs to class \(j\), and \(N\) is the size of the mini-batch.
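As a minimal sketch, the mini-batch CE loss described above can be computed as follows. The sketch assumes the standard averaged form \(-\frac{1}{N}\sum_{i}\sum_{j}{t}_{ij}\mathrm{log}\,{p}_{ij}\); the function name `cross_entropy` and the clipping constant `eps` are illustrative choices and are not taken from the paper.

```python
import numpy as np

def cross_entropy(probs, targets, eps=1e-12):
    """Mean cross-entropy over a mini-batch.

    probs:   (N, C) array, probs[i, j] = p_ij (predicted class probabilities)
    targets: (N, C) one-hot array, targets[i, j] = t_ij
    eps:     hypothetical clipping constant that avoids log(0)
    """
    probs = np.clip(probs, eps, 1.0)
    n_samples = probs.shape[0]
    return float(-np.sum(targets * np.log(probs)) / n_samples)

# Two samples, three classes
p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])
t = np.array([[1, 0, 0],
              [0, 1, 0]])
loss = cross_entropy(p, t)
```

Only the probability assigned to the true class of each sample contributes to the sum, so the loss approaches 0 as those probabilities approach 1.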
In this stage, the global search ability of PSO is used to assist the robust convergence of the local solver. To this end, we introduce a consensus-based PSO to locate optimal convergence regions in the search space that contain high-quality LOSs.
All particles exchange information with the personal best position and the global best position at each step of the PSO. The update of the particle is the combination of the original position and velocity, which can be described as follows:
where \(\omega \) is the weight of the DNN, \({C}_{1}\) and \({C}_{2}\) are learning factors, \({R}_{1}\) and \({R}_{2}\) are random numbers distributed between 0 and 1, pbest is the personal best position, and gbest is the global best position.
The updated personal position is calculated by (3).
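The particle update can be illustrated with the textbook PSO rule, in which the new velocity mixes the old velocity with attraction toward pbest and gbest, and the new position is the old position plus the new velocity. The sketch below is an assumption-laden illustration rather than the paper's exact equations: the `inertia` coefficient, the default values of `c1` and `c2`, and the toy `sphere` loss are hypothetical.

```python
import numpy as np

def pso_step(pos, vel, pbest, gbest, inertia=0.7, c1=1.5, c2=1.5, rng=None):
    """One textbook PSO update for all particles (rows of `pos`)."""
    rng = np.random.default_rng() if rng is None else rng
    r1 = rng.random(pos.shape)  # R1 in [0, 1)
    r2 = rng.random(pos.shape)  # R2 in [0, 1)
    new_vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    return pos + new_vel, new_vel

def sphere(x):
    """Toy loss standing in for the DNN loss (assumption)."""
    return np.sum(x ** 2, axis=-1)

rng = np.random.default_rng(0)
pos = rng.uniform(-5, 5, size=(20, 5))   # 20 particles, 5 parameters each
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_loss = sphere(pos)
initial_best = pbest_loss.min()

for _ in range(100):
    gbest = pbest[np.argmin(pbest_loss)].copy()
    pos, vel = pso_step(pos, vel, pbest, gbest, rng=rng)
    loss = sphere(pos)
    improved = loss < pbest_loss          # update personal bests
    pbest[improved] = pos[improved]
    pbest_loss[improved] = loss[improved]

final_best = pbest_loss.min()
```

In a DNN setting each row of `pos` would hold a flattened weight vector, and `sphere` would be replaced by the mini-batch loss.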
However, the PSO lacks a global view and fast convergence ability in the later stage of DNN training [36]. To reduce the computational burden of PSO, we adopt a consensus-based PSO (CPSO).
The CPSO can locate optimal convergence regions in the search space that contain high-quality LOSs by exchanging information with the personal best position and the global best position [36, 37]. As shown in Fig. 2, all particles reach a consensus state by converging into one or more regions.
We use mini-batch K-Means [38] to cluster all the particles into several groups at fixed intervals. The mini-batch K-Means method reduces the computational cost by an order of magnitude and clusters well in high-dimensional optimization problems. The following is the stopping criterion of the consensus-based PSO:

The members of the particle groups do not change over the subsequent 5 generations of CPSO.
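The stopping criterion can be sketched as a check on the history of cluster assignments. The helper names `partition` and `consensus_reached` are hypothetical, and the labels are assumed to come from a clustering step such as mini-batch K-Means. Because cluster labels may be permuted from one clustering run to the next, the sketch compares group membership rather than raw labels.

```python
def partition(labels):
    """Turn a label vector into a set of particle groups, ignoring label names."""
    groups = {}
    for particle, label in enumerate(labels):
        groups.setdefault(label, set()).add(particle)
    return frozenset(frozenset(members) for members in groups.values())

def consensus_reached(label_history, patience=5):
    """True if group membership stayed identical for `patience` generations
    after a reference clustering (patience + 1 clusterings in total)."""
    if len(label_history) < patience + 1:
        return False
    recent = [partition(labels) for labels in label_history[-(patience + 1):]]
    return all(groups == recent[0] for groups in recent)

# Same grouping in every generation, with label names swapped in some runs
stable = [[0, 0, 1, 1], [1, 1, 0, 0]] * 3
# One particle changes group in the last generation
unstable = [[0, 0, 1, 1]] * 5 + [[0, 1, 1, 1]]
```

Here `consensus_reached(stable)` holds while `consensus_reached(unstable)` does not, since a particle moved between groups within the last five generations.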
Numerical studies indicate that all particles have good global search ability in the early stage. With the exchange of information among all particles, the global search ability decreases gradually, while the local search ability increases. The PSO algorithm has global optimal particles and more diversity in the consensus state, with a lower computational cost. Thus, we select representative particles in each particle group as the initial point for the next stage of CPSOTJUTT.
CPSOTJUTT stage II: robust convergence
When stage I is completed, the methodology enters stage II, which is the robust convergence stage, as shown in Fig. 3. At this stage, we use the representative particles selected in the previous stage as the initial points \({w}_{0}\) and use the TJU methodology for robust convergence. The TJU methodology has fast and robust convergence during the early phase but slows down in the late phase. Therefore, the local solver (such as SGD, Adam) is used to enhance convergence after the TJU methodology.
TJU constructs a dynamical system based on the DNN such that the LOSs of the DNN are mapped into SEPs of the dynamical system. Then, starting from the representative initial point selected in the previous stage, the ensuing trajectory enters the stability region of the SEP. The following is the nonlinear system of (1):
where h is the loss of the DNN, w is the weight, x is the input data, and N is the size of the minibatch.
The key to the TJU methodology is to construct an effective dynamical system corresponding to the nonlinear system (4) and to solve (1) via the trajectories of the constructed nonlinear dynamical system, which can be described as follows:
where \(\nabla F\left(w\right)\) is the gradient of \(F\left(w\right)\). When \(\sigma =1\), this is the Focal Loss used by [39]. The system fully considers the gradient information and loss information of the deep neural network model, and mainly focuses training on a sparse set of hard examples.
We apply a technique called the pseudo-transient continuation (PTC) method to compute the steady-state solution quickly. This method can be explained as follows:
where w is the weight value, I is the unit matrix, d is the time step, and D is the Jacobian of dynamical system (5).
The training speed can be accelerated with the correction of the dynamical system search direction.
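A generic PTC iteration can be sketched on a toy gradient system. The sketch assumes the common form \({w}_{k+1}={w}_{k}+{\left(\mathrm{I}/d-D\right)}^{-1}\dot{w}\), with \(D\) the Jacobian of the vector field, which may differ in detail from the paper's formulation. The quadratic loss, the matrices `A` and `b`, and the doubling of the time step are illustrative assumptions.

```python
import numpy as np

def ptc_step(w, field, jac, d):
    """One pseudo-transient continuation step.

    field(w): vector field w_dot evaluated at w
    jac(w):   Jacobian D of the vector field at w
    d:        pseudo time step
    """
    n = w.size
    correction = np.linalg.solve(np.eye(n) / d - jac(w), field(w))
    return w + correction

# Toy loss F(w) = 0.5 * w^T A w - b^T w, so w_dot = -grad F = b - A w
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])
field = lambda w: b - A @ w
jac = lambda w: -A

w = np.zeros(2)
d = 0.1
for _ in range(30):
    w = ptc_step(w, field, jac, d)
    d *= 2.0  # growing the time step is an assumed schedule

steady_state = np.linalg.solve(A, b)  # exact equilibrium for comparison
```

With a small `d` the step behaves like a damped gradient step, and as `d` grows it approaches a Newton step, which is why the iterate reaches `steady_state` in few iterations on this toy problem.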
The PTC methodology can reliably compute a small-scale deep neural network model. To improve the scalability of the TJU methodology and train a large-scale DNN model, the Block-Diagonal Pseudo-Transient Continuation (BDPTC) method is proposed to find the search direction D (D is a 1 × n matrix) [40]:
where \({d}_{t}={\left(\mathrm{I}/\delta -{D}_{t}\right)}^{-1}\dot{w}\) is the corrected update direction, \({s}_{t}={D}_{t-1}^{-1}{\dot{w}}_{t}\), and \({y}_{t}={\dot{w}}_{t+1}-{\dot{w}}_{t}\). \({\theta }_{t}\in {R}^{n}\) denotes the parameter to be optimized, \({\dot{w}}_{t}\in {R}^{n}\) is the dynamical system at \({\theta }_{t}\), \({\eta }_{t}\) denotes the step size, and tr denotes the trace operator. \(Diag\left({s}_{t}^{2}\right)\) is the diagonal matrix with diagonal elements from the vector \({s}_{t}\).
Take the last (Nth) iteration of the BDPTC methodology as the initial point and apply a local solver to locate a LOS for problem (1).
The following describes the pseudocode of CPSOTJUTT Stage II:
CPSOTJUTT stage III: search optimal
The TRUSTTECH methodology can effectively jump out of the stability region of the SEP found in stage II, enter the stability regions of neighboring SEPs, and obtain multiple SEPs in a tier-by-tier search manner. This stage has a strong theoretical basis [41].
An intuitive description of the TRUSTTECH methodology is shown in Fig. 4, where \({w}_{i,j}\) represents different SEPs, \(i\) represents the number of tiers of SEPs, and j represents the number of SEPs in this tier. The key steps of the TRUSTTECH methodology are detailed as follows:

Step 1 Starting from \({w}_{0}\), step outward in the direction of jumping out of this stability region until the exit point on the stability boundary is reached.

Step 2 Enter the adjacent stability region from the exit point and locate Tier-1 of the stability region.

Step 3 Locate multiple SEPs on Tier-1 by adjusting the direction of jumping out of \({w}_{0}\).

Step 4 Repeat steps 1–3.
Note that the exit point on the stability boundary refers to the point where the loss value changes from ascending gradually to descending steadily, indicating that the trajectory has entered the stability region of another SEP. There is a non-empty set of intersection points between the stability boundaries of each tier of SEPs, i.e., the exit point set.
Next, we describe in detail how the trajectory moves during the TRUSTTECH methodology. First, we use a local solver to obtain a first (Tier-0) SEP \({w}_{0}\). Then we define a trainable search direction \({g}_{i}\), and the parameter vector \({w}_{i}\) can be updated by the following equation:
where \({\rho }_{1}\left(i\right)\in \left(0, {\rho }_{max}\right)\) is a learning rate away from \({w}_{0}\), increasing from 0 to \({\rho }_{max}\).
When \({g}_{i}\) is fixed, the search direction of the parameter vector \({w}_{i}\) is fixed. However, for high-dimensional, large-scale models, the probability of finding the exit point along a fixed search direction is very low. Therefore, we adjust the direction \({g}_{i}\) by the following gradient descent equation:
where \({\rho }_{2}\) is the learning rate for the adjustment, and \({\nabla }_{{g}_{i}}F\left({w}_{i}\right)\) is the gradient of the loss function \(F\left(w, x\right)\) w.r.t. \({g}_{i}\).
When the exit point is found or \({\rho }_{1}\) increases to \({\rho }_{max}\), the trajectory stops moving and the local solver is called again to find a new (Tier-1) SEP from the exit point.
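The exit-point search can be sketched on a toy two-well loss. This is a simplified illustration under stated assumptions: the loss `loss`, the step sizes, and the plain gradient-descent `local_solver` are hypothetical, the point moves as \({w}_{0}+\rho g\) with \(\rho \) increasing toward `rho_max`, and the search direction is kept fixed, so the direction adjustment described above is omitted.

```python
import numpy as np

def loss(w):
    """Toy loss with two minima at (-1, 0) and (1, 0) (assumption)."""
    return (w[0] ** 2 - 1.0) ** 2 + w[1] ** 2

def grad(w):
    return np.array([4.0 * w[0] * (w[0] ** 2 - 1.0), 2.0 * w[1]])

def local_solver(w, lr=0.05, steps=500):
    """Plain gradient descent standing in for SGD or Adam."""
    w = np.asarray(w, dtype=float).copy()
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

def find_exit_point(w0, g, rho_max=3.0, step=0.05):
    """Move away from w0 along g until the loss starts to descend.

    Returns the first point after the loss turns from ascending to
    descending, or None if rho reaches rho_max first.
    """
    g = g / np.linalg.norm(g)
    prev = loss(w0)
    k = 1
    while k * step <= rho_max:
        w = w0 + k * step * g
        cur = loss(w)
        if cur < prev:       # loss has passed its peak along g
            return w
        prev = cur
        k += 1
    return None

w_tier0 = local_solver(np.array([-0.8, 0.3]))          # Tier-0 SEP near (-1, 0)
exit_point = find_exit_point(w_tier0, np.array([1.0, 0.0]))
w_tier1 = local_solver(exit_point)                      # Tier-1 SEP near (1, 0)
```

Starting the local solver from the exit point lands in the neighboring stability region, so `w_tier1` differs from `w_tier0` even though both are minima of the same loss.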
Theoretical basis
The stability region
The solution of the deep neural network (1) that starts from \({w}_{0}\in {R}^{N}\) at \(t=0\) is called a trajectory and is denoted as \(\phi \left(\cdot ,{w}_{0}\right)\). A point \({w}_{e}\in {R}^{N}\) is an equilibrium point of (1) if \(\dot{w}{|}_{w={w}_{e}}=0\). An equilibrium point is a degenerate trajectory. If, for every \(\epsilon >0\), there is a \(\delta >0\) such that \(\Vert {w}_{0}-{w}_{e}\Vert <\delta \) implies \(\Vert \phi \left(t,{w}_{0}\right)-{w}_{e}\Vert <\epsilon \) for all \(t>0\), then \({w}_{e}\) is stable. \(A\left({w}_{e}\right)\) denotes the stability region of a SEP \({w}_{e}\); every trajectory that starts in this region converges to \({w}_{e}\).
If no eigenvalue of the Jacobian matrix \(\nabla F\left({w}_{e}\right)\) has a zero real part, then the equilibrium point \({w}_{e}\) is termed hyperbolic [42]. If, furthermore, exactly k eigenvalues of \(\nabla F\left({w}_{e}\right)\) have positive real parts, then \({w}_{e}\) is a type-k hyperbolic equilibrium point. A type-k equilibrium point is unstable for all \(k \ge 1\). Given a type-k equilibrium point \({w}_{e}\), its stable manifold \({W}^{s}\) and unstable manifold \({W}^{u}\) are defined by:
\[{W}^{s}\left({w}_{e}\right)=\left\{w\in {R}^{N}:\underset{t\to \infty }{\mathrm{lim}}\phi \left(t,w\right)={w}_{e}\right\},\quad {W}^{u}\left({w}_{e}\right)=\left\{w\in {R}^{N}:\underset{t\to -\infty }{\mathrm{lim}}\phi \left(t,w\right)={w}_{e}\right\}\]
Observe that \({W}^{s}\left({w}_{e}\right)=A\left({w}_{e}\right)\) when \({w}_{e}\) is a type0 equilibrium point.
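The type of a hyperbolic equilibrium point can be checked numerically by counting the eigenvalues of the Jacobian of the vector field that have positive real parts. The sketch below assumes a gradient system \(\dot{w}=-\nabla F\left(w\right)\) on a hypothetical toy loss \(F\left(w\right)={\left({w}_{1}^{2}-1\right)}^{2}+{w}_{2}^{2}\); the function names are illustrative.

```python
import numpy as np

def hessian(w):
    """Hessian of the toy loss F(w) = (w1^2 - 1)^2 + w2^2 (assumption)."""
    return np.array([[12.0 * w[0] ** 2 - 4.0, 0.0],
                     [0.0, 2.0]])

def equilibrium_type(w):
    """Number k of eigenvalues with positive real part at an equilibrium.

    The vector field is assumed to be w_dot = -grad F(w), so its
    Jacobian is the negative Hessian of F.
    """
    eigenvalues = np.linalg.eigvals(-hessian(w))
    if np.any(np.isclose(eigenvalues.real, 0.0)):
        raise ValueError("equilibrium point is not hyperbolic")
    return int(np.sum(eigenvalues.real > 0))

type_at_minimum = equilibrium_type([1.0, 0.0])   # local minimum of F
type_at_saddle = equilibrium_type([0.0, 0.0])    # saddle point of F
```

The two minima of the toy loss are type-0 (stable) equilibrium points, while the saddle between them is type-1 and lies on the boundary between their stability regions.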
A comprehensive theoretical work in characterizing the stability region and the stability boundary has been developed [42,43,44,45]. If the quotient gradient system (5) satisfies the following assumptions, then its stability boundary can be fully characterized.

A1) All the equilibrium points on the stability boundary are hyperbolic.

A2) The stable and unstable manifolds of equilibrium points on the stability boundary satisfy the transversality condition.

A3) Every trajectory on the stability boundary approaches one of the equilibrium points as \(t\to \infty \).
Remark: Assumption A1) is a general property of quotient gradient system (5) and may be verified for a specific system by directly computing the eigenvalues of the corresponding Jacobian matrix of the vector field. Assumption A2) is also a general property, but it is difficult to check. Although assumption A3) is not a general property, it can be checked in many systems using the V-function or direct analysis.
Theorem 1 (Characterization of the Stability Boundary) [42]:
Consider a nonlinear dynamical system (5) that satisfies assumptions A1) and A3). Let \({w}_{i}^{e}\), \(i\ge 1\) be the equilibrium points on the stability boundary \(\partial A\) of a SEP, say \({w}_{s}\).
Then, the stability boundary is completely characterized as follows:
\[\partial A\left({w}_{s}\right)=\bigcup_{i\ge 1}{W}^{s}\left({w}_{i}^{e}\right)\]
This theorem asserts that the stability boundary of a class of nonlinear dynamical systems satisfying assumptions A1) and A3) can be completely characterized and it equals the union of the stable manifolds of the equilibrium points on the stability boundary.
We note that in solving problem (1), the following sequence of unconstrained optimization problems is solved instead [46, 47]:
We define the following nonlinear dynamical system:
Two important properties of system (15) are to be explored in the following to compute multiple LOSs of the general nonlinear optimization problem (1). These two properties are examined as follows.
Complete stability
Theorem 2 (Complete Stability) [43, Section IV]:
Every trajectory of quotient gradient system (15) converges to an equilibrium point. In addition, almost every trajectory converges to a SEP of (15).
This theorem states that every trajectory converges to an equilibrium point, indicating that the system behavior is simple and does not allow complex trajectory behavior [45]. The trajectory must converge to an equilibrium point of (15) from an initial point.
Furthermore, every trajectory converges to a SEP except for the trajectories that lie on the stability boundary, which converge to an unstable equilibrium point. In addition, we also need to prove that a trajectory of (15) converging to a SEP is equivalent to solving for a LOS of problem (1). The next section establishes this through the equivalence relationship between the LOSs of (1) and the SEPs of (15).
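This behavior can be illustrated by integrating a toy gradient system with forward Euler steps. The loss, the step size `dt`, and the starting points below are hypothetical choices, and the vector field is assumed to be \(\dot{w}=-\nabla F\left(w\right)\).

```python
import numpy as np

def grad(w):
    """Gradient of the toy loss F(w) = (w1^2 - 1)^2 + w2^2 (assumption)."""
    return np.array([4.0 * w[0] * (w[0] ** 2 - 1.0), 2.0 * w[1]])

def flow(w0, dt=0.05, steps=2000):
    """Forward Euler integration of w_dot = -grad F(w) starting from w0."""
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(steps):
        w = w - dt * grad(w)
    return w

left = flow([-0.3, 0.5])       # starts in the stability region of (-1, 0)
right = flow([0.4, -0.2])      # starts in the stability region of (1, 0)
boundary = flow([0.0, 1.0])    # starts on the stability boundary w1 = 0
```

The first two trajectories end at the stable equilibrium points \((-1, 0)\) and \((1, 0)\), while the trajectory that starts exactly on the boundary \({w}_{1}=0\) ends at the unstable equilibrium point \((0, 0)\).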
Equivalence relations
Theorem 3 (Equivalence Relations):
Consider the nonlinear optimization problem (1), which corresponds to the nonlinear dynamical system (15) and satisfies assumptions A1) and A2). Then, the following properties hold.

1)
If \({w}^{*}\) is a local optimal solution of (1), then \({w}^{*}\) is a stable equilibrium point of system (15).

2)
If \({w}^{*}\) is a stable equilibrium point of (15), \({w}^{*}\) is a local optimal solution of (1).
Proof: 1)
Given the \({w}^{*}\) as a LOS, clearly \({\nabla }_{W}F\left({w}^{*}, x\right)=0\), by (14). For system (15):
Since \({w}^{*}\) is a local optimal solution, there exists a vector \(d\) such that \({d}^{T}{\nabla }_{ww}^{2}F\left({w}^{*}, x\right)d>0\) [48]. It remains to be proven that \({w}^{*}\) is a hyperbolic SEP. Consider the Jacobian of \({\nabla }_{w}F\) in (15):
Next, we show that the quadratic form \(J\left(w, x\right)\doteq {w}^{T}J\left({w}^{*}, x\right)w>0\) for \(\forall \left(w, x\right)\ne 0\). Let
Clearly, \(J\left(w, x\right)= P\left(w, x\right)\).
By the above-verified claim, the quadratic form \(J\left(w, x\right)={w}^{T}J\left({w}^{*}, x\right)w=P\left(w, x\right)\), which shows that \(J\left({w}^{*}, x\right)=P\left(w, x\right)>0\) for all \(w\ne 0\). Since \(J\left({w}^{*}, x\right)\) is symmetric, \(J\left({w}^{*}, x\right)\) is a positive definite square matrix, and the eigenvalues of \(J\left({w}^{*}, x\right)\) are all positive real numbers. Therefore, \({w}^{*}\) is a type-0 hyperbolic equilibrium point of (15), because \({\nabla }_{w}\left({\nabla }_{w}F\right)=J\left({w}^{*}, x\right)\) when \(w={w}^{*}\). So, 1) is proved.
Proof: 2)
First, we claim that \({w}^{*}\) is a feasible point of the problem (1), \({w}^{*}\in S\). Given the SEP \({w}^{*}\) of (15), thus \({\nabla }_{w}F\left({w}^{*}, x\right)=0\). Then, \({w}^{*}\) is a LOS of (1), such that \({w}^{*}\) is a feasible point of (1).
\({w}^{*}\) is a (hyperbolic) SEP of (15). Therefore, \({w}^{*}\) is an isolated local minimum point of \(F\left(w, x\right)\) [43]. Then, define \(\Omega \subseteq {R}^{N+M}\) as a neighborhood of \({w}^{*}\), such that \(F\left({w}^{*}, x\right)<F\left(w, x\right)\), for \(\forall \left(w, x\right)\in \Omega \) with \(w\ne {w}^{*}\). A neighborhood \({U}_{{w}^{*}}\subseteq {R}^{N+M}\) of \({w}^{*}\) exists such that:
Hence, there exists a neighborhood \({U}_{{w}^{*}}^{\pi }\subseteq {U}_{{w}^{*}}\) of \({w}^{*}\). Or, let \(\left({w}^{\pi }, x\right)\in {U}_{{w}^{*}}^{\pi }\cap S\subseteq \pi \). Then,
where \({w}^{\pi }\in {U}_{{w}^{*}}^{\pi }\cap S\), \({w}^{\pi }\ne {w}^{*}\). Therefore \({w}^{*}\) is a local optimal solution for problem (1). Thus, assertion 2) is proved.
So far, the equivalence relationship between the LOSs of (1) and the SEPs of (15) has been proved. This is the key to ensuring the effectiveness of the CPSOTJUTT methodology.
CPSOTJUTT-based ensemble model
This paper develops a three-layer ensemble model, CPSOTJUTTEM, for automatically designing and training DNN architectures for power line inspection, as shown in Fig. 1. The pseudocode of CPSOTJUTTEM is shown in Algorithm 3.
In this paper, the CPSOTJUTTEM is developed to enhance the performance of automatically designed DNN architectures in two aspects:

1)
Enhance the robustness of the DNN training method by applying the CPSOTJUTT to efficiently build multiple high-quality classification engines with different DNN architectures.

2)
Improve the generalization ability through the ensemble model by applying the CPSOTJUTT methodology to build an effective ensemble of multiple members, achieving higher accuracy and generalization ability in power line inspection.
Bottom layer: design the DNN architecture
In this layer, the genetic algorithm (GA) is used to design high-quality DNN architectures stably and quickly. Similar constructive strategies for DNNs have been widely studied and have achieved satisfactory results [21]. We provide a binary code representation of a DNN architecture for the GA method, and the automatically designed high-quality DNN architectures serve as the fundamental DNNs for the subsequent stages. The binary code representation of the DNN architecture is shown in Fig. 5.
The encoding area begins with the second convolutional layer and represents the connection between the current and previous convolutional layers: 1 indicates that there is a connection, while 0 indicates that there is no connection. There is a fixed input node and an output node in each stage. Besides the convolutional layer, there are also batch normalization and ReLU, which are proven to play a positive role in DNN training. Fully connected parts are preset.
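The decoding of such a binary code can be sketched as follows. The bit layout is an assumption in the style of Genetic CNN [20], since the exact layout of Fig. 5 is not given in the text: for each convolutional node from the second one onward, the code holds one bit per earlier node in the same stage. The function name `decode_stage` is hypothetical.

```python
def decode_stage(bits, num_nodes):
    """Decode a stage's binary code into a list of (source, target) connections.

    bits:      string of '0'/'1'; node k (k >= 2) owns k - 1 consecutive bits,
               one for each earlier node 1 .. k-1 (assumed layout)
    num_nodes: number of convolutional nodes in the stage
    """
    expected = num_nodes * (num_nodes - 1) // 2
    if len(bits) != expected or set(bits) - {"0", "1"}:
        raise ValueError(f"expected {expected} binary digits, got {bits!r}")

    edges = []
    position = 0
    for node in range(2, num_nodes + 1):       # encoding starts at the second node
        for previous in range(1, node):
            if bits[position] == "1":          # 1 means the two nodes are connected
                edges.append((previous, node))
            position += 1
    return edges

# Four nodes need 1 + 2 + 3 = 6 bits; this code describes a simple chain
chain = decode_stage("101001", 4)
dense = decode_stage("111111", 4)
```

In this layout the code `101001` yields the chain 1→2→3→4, while `111111` connects every node to all earlier nodes, so a GA can explore topologies by flipping and recombining bits.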
A model designed by the automatic DNN architecture algorithm usually needs to be further customized according to the requirements of the specific task. For a classification model, it is usually necessary to add a fully connected layer at the end of the network, followed by a SoftMax layer, to output the classification result. The fully connected layer converts the feature maps extracted by the convolutional layers into the final classification results.
For an object detection model, a common method is to use a region proposal network (RPN) to generate candidate object regions and then use a region of interest (RoI) pooling layer to extract fixed-size feature representations from region proposals of different sizes for input into classification and regression. On the basis of the automatic DNN architecture design algorithm, the RPN layer can be added at an appropriate location to achieve object detection.
However, DNN training is very sensitive to the initial points, so the true capacity of a designed DNN architecture is difficult to verify by a single training run. The CPSOTJUTT methodology proposed in this paper can quickly and robustly train and select the high-quality DNN architectures designed by the GA.
Middle layer: build diverse optimal DNN classification engines
In this layer, based on the optimal DNN architecture designed from the bottom layer and the corresponding initial guess w*, the CPSOTJUTT methodology proposed in this paper explores a set of diverse optimal DNN classification engines:
where w* is the initial guess of the consensus state reached by CPSO in the bottom layer.
Stage II of the CPSOTJUTT methodology proposed in this paper quickly and robustly converges to the SEP (Tier0) from w*. Then we use the TRUSTTECH methodology to jump out of the current region, enter the stability regions of neighboring SEPs, and obtain multiple SEPs in a tier-by-tier search manner.
We apply the CPSOTJUTT methodology to train these DNN architectures to obtain high-quality DNN classification engines in the middle layer.
Top layer: the DNN-based ensemble model
In this layer, diverse high-quality DNN classification engines from the middle layer are used as the hidden nodes of the ensemble model, and the CPSOTJUTT methodology is used for training to find the weight \(\sigma \). The final output of the ensemble model is as follows:
where \(x\) is the input data, \({t}_{ic}\) is the one-hot value of the class, \({\sigma }_{jc}\) is the weight of the cth class of the jth sub-DNN, \({p}_{jc}\) is the probability, predicted by the jth sub-DNN, that sample i belongs to class c, N is the mini-batch size, C is the number of classes, and M is the number of sub-DNNs in the ensemble model.
The structure diagram of the classification ensemble model is shown in Fig. 6. In the figure, F denotes the different feature extractors designed by the automatic architecture algorithm, FC is the fully connected layer, and SM is the SoftMax layer.
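To make the role of these symbols concrete, the sketch below combines the class probabilities of M sub-DNNs with class-wise weights and scores the result with a cross-entropy loss over a mini-batch. The exact formula is not reproduced in this text, so the weighted sum, the renormalization step, and the function names `ensemble_probs` and `cross_entropy` are assumptions chosen to match the symbol definitions, not the authors' confirmed equation.

```python
import math


def ensemble_probs(sub_probs, sigma):
    """Combine sub-DNN outputs for one sample.

    sub_probs[j][c]: probability of class c from sub-DNN j (p_jc).
    sigma[j][c]:     class-wise weight of sub-DNN j (sigma_jc).
    Renormalizing the weighted sum is an assumption of this sketch.
    """
    num_classes = len(sub_probs[0])
    scores = [
        sum(sigma[j][c] * sub_probs[j][c] for j in range(len(sub_probs)))
        for c in range(num_classes)
    ]
    total = sum(scores)
    return [s / total for s in scores]


def cross_entropy(batch_sub_probs, one_hot_targets, sigma):
    """Mean cross-entropy over a mini-batch of N samples (t_ic are one-hot)."""
    loss = 0.0
    for sub_probs, target in zip(batch_sub_probs, one_hot_targets):
        y = ensemble_probs(sub_probs, sigma)
        loss -= sum(t * math.log(p) for t, p in zip(target, y) if t > 0)
    return loss / len(batch_sub_probs)


# Hypothetical example: M = 2 sub-DNNs, C = 2 classes, N = 1 sample.
sigma = [[0.5, 0.5], [0.5, 0.5]]
batch = [[[0.8, 0.2], [0.6, 0.4]]]
targets = [[1, 0]]
print(ensemble_probs(batch[0], sigma), cross_entropy(batch, targets, sigma))
```

In this reading, training the top layer means adjusting `sigma` to lower the loss while the sub-DNN outputs stay fixed.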
The ensemble function of the object detection ensemble model adopts the weighted bounding box fusion method. Assume that the bounding box data are stored in set \(B\), and that \({B}_{c}\) contains the bounding boxes labeled \(c\). \({F}_{c}\) is the ensemble result of the bounding boxes in \({B}_{c}\), represented as \(\left(s, {x}_{tl}, {x}_{br}, {y}_{tl},{y}_{br}\right)\). When a bounding box from the \(j\)th sub-DNN is added to \({B}_{c}\), the confidence level of the ensemble bounding box \({F}_{c}\) is recalculated as:
where N represents the total number of bounding boxes contained in \({B}_{c}\), and \({A}_{k}\) is one of the bounding boxes, namely \(\left(s, {x}_{tl}, {x}_{br}, {y}_{tl},{y}_{br}\right)\).
The coordinates of the ensemble bounding box are updated as follows:
where \({s}_{2}={A}_{k}\left(s\right)\) is the confidence value of \({A}_{k}\).
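As a rough illustration of weighted bounding box fusion, the sketch below fuses all boxes of one class using the commonly used weighted boxes fusion rule: the fused confidence is the mean of the member confidences, and each coordinate is the confidence-weighted average of the member coordinates. The update equations are not reproduced in this text, so this rule and the function name `fuse_boxes` are assumptions and may differ in detail from the authors' formulas.

```python
def fuse_boxes(boxes):
    """Fuse boxes that share a class label c.

    Each box is (s, x_tl, x_br, y_tl, y_br), following the tuple layout in
    the text. Returns the fused box F_c in the same layout.
    """
    if not boxes:
        raise ValueError("need at least one box")
    n = len(boxes)
    total_conf = sum(box[0] for box in boxes)
    fused_conf = total_conf / n  # mean confidence over the N member boxes
    coords = tuple(
        sum(box[0] * box[k] for box in boxes) / total_conf  # confidence-weighted
        for k in range(1, 5)
    )
    return (fused_conf,) + coords


# Hypothetical predictions of the same object from two sub-DNNs.
b_c = [(0.9, 10.0, 50.0, 20.0, 60.0), (0.6, 14.0, 54.0, 24.0, 64.0)]
print(fuse_boxes(b_c))
```

Under this rule, a high-confidence box pulls the fused coordinates toward itself, while low-confidence boxes still contribute instead of being discarded as in non-maximum suppression.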
The structural diagram of the object detection ensemble model is shown in Fig. 7.
As shown in Fig. 7, F is the backbone network designed by the automatic architecture algorithm, RPN is the region proposal network, RoI is the region of interest, N is the network block composed of convolutional layers, C is the classification prediction, and B is the bounding box prediction.
In this layer, the ensemble model composed of diverse high-quality DNN models achieves higher performance than any single model in the middle layer. The ensemble model is an optimal combination of diverse high-quality DNN models, which can fully exploit the advantages of each sub-DNN and further improve the overall accuracy and generalization capability.
The execution process of the CPSOTJUTTEM is relatively complex. In order to clearly display the execution status of each algorithm, we have created a flowchart, as shown in Fig. 8.
Experiment
With the increasing dependence of society on electricity, ensuring the continuous power supply of the power system has become an important component of power supply guarantee. Power system inspection is a key measure to ensure the stable and safe operation of the power system [49,50,51]. With the rapid development of information technology, new technologies such as unmanned aerial vehicle (UAV) inspection and robot inspection have gradually replaced traditional manual inspection methods [52], bringing new opportunities for power system inspection. At the same time, deep learning (DL) has made rapid progress in computer vision technology, especially in the fields of power line object detection and defect recognition, where computer vision technology has been widely applied [53]. The application of deep learning technologies provides more efficient and accurate methods for power system inspection, further improving the safety and reliability of power system operation [54, 55]. Examples of power system inspection objects are shown in Fig. 9.
In this paper, three self-made power system inspection datasets are independently developed for the three key areas of power system inspection. These datasets meet the requirements for both classification model testing and object detection model testing:
Power Line Insulator Inspection Dataset: The power line insulator inspection dataset (PLIID) built in this study consists of 60,000 color images of insulator defects, including 4 classes of defects: ceramic insulator edge loss, ceramic insulator middle loss, glass insulator edge loss, glass insulator middle loss.
Power System Substation Inspection Dataset: Power system substation inspection dataset (PSSID) consists of images of internal equipment defects in power system substations, including 9 classes of color images: suspended matter, component oil contamination, bird nests, ground oil pollution, metal corrosion, meter inspection, oil seal damage, silicone discoloration, dial blurriness.
Power Line Obstacle Inspection Dataset: The power line obstacle inspection dataset (PLOID) is composed of images of obstacles along the power line and contains 10 classes of color images: forklift, crane, wire foreign object, tipper, wildfires, smog, cement pump trucks, tower cranes, excavator, and other construction machinery.
To evaluate the effectiveness of the proposed framework, we also conduct numerical experiments on the public datasets MNIST, CIFAR10, and CIFAR100, and discuss the results.
Public dataset and server configuration
MNIST: The handwritten dataset, commonly known as MNIST, is a fundamental benchmark dataset extensively utilized in the field of machine learning and computer vision. It comprises a collection of grayscale images, each representing a handwritten digit from 0 to 9. With a total of 70,000 examples, it is divided into a training set of 60,000 images and a testing set of 10,000 images.
CIFAR: CIFAR is a picture classification dataset that includes CIFAR10 and CIFAR100. The CIFAR10 dataset contains 60 k 32 × 32 color images divided into 10 classes, each with 6 k images. The CIFAR100 dataset contains 60 k 32 × 32 color images divided into 100 classes, each with 600 images.
Server configuration: We use four servers to evaluate the model proposed in this paper: Intel Core CPU (2.67 GHz) and eight GeForce RTX 2080 Ti GPUs. The software framework is PyTorch, and the operating system is Linux Ubuntu 18.04. We also use Python and its CPU plugin for a test.
Convergence verification of CPSOTJUTT methodology
We train many initial points and use the filternormalized random direction method to draw the landscape around the LOSs, as shown in Fig. 10. To see convergence regions, the results are shown as contour plots rather than surface plots.
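For readers unfamiliar with filter normalization, the sketch below generates one random direction whose per-filter norm matches the norm of the corresponding trained filter, which is the usual way to make landscape plots comparable across networks. It is a minimal pure-Python illustration; representing each filter as a flat list of weights and the names `l2_norm` and `filter_normalized_direction` are assumptions of this sketch.

```python
import math
import random


def l2_norm(values):
    """Euclidean norm of a flat list of numbers."""
    return math.sqrt(sum(v * v for v in values))


def filter_normalized_direction(filters, rng):
    """Draw a Gaussian direction and rescale it filter by filter.

    filters: list of filters, each a flat list of trained weights.
    Each direction filter d is rescaled to d * ||w|| / ||d||, so that it has
    the same norm as the weight filter w it perturbs.
    """
    direction = []
    for w in filters:
        d = [rng.gauss(0.0, 1.0) for _ in w]
        scale = l2_norm(w) / (l2_norm(d) + 1e-10)  # guard against zero norm
        direction.append([v * scale for v in d])
    return direction


# Hypothetical weights of two small filters.
weights = [[0.5, -1.0, 2.0], [3.0, 4.0]]
direction = filter_normalized_direction(weights, random.Random(0))
print([round(l2_norm(d), 6) for d in direction])
```

A two-dimensional contour plot then evaluates the loss at w + a·d1 + b·d2 for two such directions d1 and d2 over a grid of (a, b) values.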
To understand the influence of initial points on convergence, we carefully tuned the SGD method and trained the LeNet-5 network from random initial points. The dataset is the handwritten digit dataset MNIST, and the experiment was repeated 100 times, as shown in Fig. 11.
As shown in Fig. 11, the convergence regions of a DNN trained by the local solver and by the CPSOTJUTT methodology are separated by obvious boundaries according to their test accuracies. Table 1 reports statistics on the convergence regions at the various test accuracies. The time for SGD is measured over 20 epochs, and the time for CPSOTJUTT is measured over 20 epochs after reaching or exceeding consensus.
Figure 11 and Table 1 show that when SGD is applied to train the DNN, 18% of the initial points converge to the optimal convergence region, and each convergence region has obvious boundaries. The CPSOTJUTT methodology has better global convergence ability, and 83% of the initial points converge to the optimal convergence region. Running the CPSOTJUTT methodology past the consensus state (over consensus) did not achieve better results and increased the time cost.
Test results of the CPSOTJUTTEM on the CIFAR
To evaluate the effectiveness of our proposed CPSOTJUTTEM model, we first tested the automatic architecture design algorithm. The experiment aims to verify the performance of the proposed automatic architecture design algorithm in image classification tasks.
We selected several peer competitors and divided them into three categories. The first category covers the most advanced manually designed (MD) DNN architectures, including ResNet [3], DenseNet [56], and VGG [57]. Specifically, we used two different versions of ResNet in the experiment, namely ResNet101 and ResNet1202. The second category includes DNN architecture design algorithms with semi-automatic (SM) methods, such as Genetic CNN [20], Hierarchical Evolution [58], EAS [59], and Block QNNS [60]. The third category covers fully automated (FA) design methods, including Large-scale Evolution [18], CGPCNN [61], NAS [59], and AECNN [21]. The experiment uses two widely used image classification benchmark datasets, namely CIFAR10 and CIFAR100.
To maintain a fair comparison, we followed the parameter settings commonly used by the peer competitors. The population size and the number of generations are both set to 20, and the probabilities of crossover and mutation are set to 0.9 and 0.2, respectively. In addition, following the conventions of the competitors, we set the parameters of the SGD optimizer as follows: the momentum is 0.9, the learning rate is 0.01, and the learning rate is decayed by a factor of 0.0005.
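The GA settings above can be read as the constants of a standard generational loop. The sketch below is a minimal illustration under stated assumptions: binary tournament selection, one-point crossover, a single bit flip applied with the mutation probability per child, and elitism are choices of this sketch, and the fitness function is a placeholder for the validation accuracy of the decoded and trained architecture.

```python
import random

POP_SIZE, GENERATIONS = 20, 20       # values stated in the text
P_CROSSOVER, P_MUTATION = 0.9, 0.2   # values stated in the text


def evolve(fitness, genome_len, rng):
    """Evolve fixed-length bit strings and return the fittest individual."""
    pop = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        best = max(pop, key=fitness)
        next_pop = [best[:]]  # elitism (assumption): carry over the best genome
        while len(next_pop) < POP_SIZE:
            # Binary tournament selection (assumption).
            a = max(rng.sample(pop, 2), key=fitness)
            b = max(rng.sample(pop, 2), key=fitness)
            child = a[:]
            if rng.random() < P_CROSSOVER:
                cut = rng.randint(1, genome_len - 1)  # one-point crossover
                child = a[:cut] + b[cut:]
            if rng.random() < P_MUTATION:
                i = rng.randrange(genome_len)
                child[i] = 1 - child[i]  # flip one connection bit
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)


# Placeholder fitness: count of active connections (stand-in for accuracy).
best = evolve(fitness=sum, genome_len=12, rng=random.Random(0))
print(best, sum(best))
```

In the real pipeline each fitness evaluation requires training a network, which is why the population size and generation count dominate the GPU-day budget.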
In this article, the indicator "GPU days" is used to evaluate the computational complexity. GPU days are obtained by multiplying the number of GPU cards used by the number of days spent finding the optimal architecture. We refer to the optimal model generated by the automatic architecture algorithm as CPGADNN. According to the experimental results, the proposed CPGADNN outperforms manually designed state-of-the-art CNN models on the CIFAR10 and CIFAR100 datasets. The results are shown in Table 2, where the CIFAR10 and CIFAR100 columns give the test error (in %) of each model on the corresponding dataset.
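The GPU-days indicator is simple arithmetic, shown here for a hypothetical run. The helper name `gpu_days` and the example numbers are illustrative only and are not taken from Table 2.

```python
def gpu_days(num_gpus, hours_per_gpu):
    """GPU days = number of GPU cards x days spent searching."""
    return num_gpus * hours_per_gpu / 24.0


# Hypothetical search: 8 GPU cards running for 12 hours each.
print(gpu_days(num_gpus=8, hours_per_gpu=12))
```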
Specifically, on CIFAR10 and CIFAR100, the CPGADNN exhibits lower test errors than the manually designed architectures. Although the parameter size of CPGADNN on CIFAR10 is relatively large compared to ResNet101 and DenseNet40, the increase in computational complexity is not significant for existing hardware devices. Compared to the semi-automatic competitors, CPGADNN exhibits superior performance on CIFAR10 and CIFAR100, surpassing algorithms such as Genetic CNN, EAS, and Block QNNS. Although Hierarchical Evolution slightly leads CPGADNN on CIFAR10, CPGADNN consumes only 1/14 of the GPU days required by Hierarchical Evolution. Among the fully automated competitors, CPGADNN performs best on the CIFAR10 and CIFAR100 datasets, with better test error, number of parameters, and GPU days than the other methods, including Large-scale Evolution, CGPCNN, NAS, and AECNN. On CIFAR10 and CIFAR100, the test errors of CPGADNN were 3.67% and 16.55%, respectively. These results demonstrate the superiority and efficiency of our proposed automatic architecture design algorithm in designing DNN architectures, providing a more reliable and efficient automated method for solving image classification problems.
The above experiments show that the capability of a DNN architecture may not be fully exhibited, because local solvers may converge to bad LOSs and high-quality DNN architectures are therefore missed. We evaluate the robustness of CPSOTJUTT in automated DNN architecture design and record the best network architecture (BNA), as shown in Table 3.
Table 3 demonstrates that the proposed CPSOTJUTT can design higher-quality DNN architectures faster than the local solver. To explain each stage of the CPSOTJUTT more clearly, we design the following experiment:

Step 1: Converge quickly and robustly to the SEP (Tier0) using stage II of the CPSOTJUTT methodology.

Step 2: Search for exit points on 10 paths, starting from the Tier0 using the TRUSTTECH methodology.

Step 3: Converge quickly and robustly to the SEP (Tier1) from each exit point using stage II of the CPSOTJUTT methodology.
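The three steps above can be sketched on a toy one-dimensional objective with two local minima. This is only an illustration under stated assumptions: plain gradient descent stands in for stage II of the CPSOTJUTT, the exit point is detected as the first point along a search ray where the objective starts to decrease, and only two search paths exist in one dimension (the experiment uses 10). The objective and all function names are hypothetical.

```python
def f(x):
    """Toy objective with a shallow minimum near 0.96 and a deeper one near -1.04."""
    return x**4 - 2 * x**2 + 0.3 * x


def grad(x):
    return 4 * x**3 - 4 * x + 0.3


def local_solver(x, lr=0.05, steps=500):
    """Stand-in for stage II: gradient descent to the nearby SEP."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def find_exit_point(x_sep, direction, step=0.05, max_steps=60):
    """Walk along a ray from the SEP until the objective starts to decrease.

    Returns the first point past the local maximum along the ray, or None
    if the objective keeps increasing within the search budget.
    """
    x = x_sep
    for _ in range(max_steps):
        x_next = x + direction * step
        if f(x_next) < f(x):
            return x_next
        x = x_next
    return None


tier0 = local_solver(1.5)                       # Step 1: Tier0 SEP
exits = [find_exit_point(tier0, d) for d in (-1.0, 1.0)]   # Step 2: search paths
tier1 = [local_solver(e) for e in exits if e is not None]  # Step 3: Tier1 SEPs
print(tier0, f(tier0), tier1, [f(x) for x in tier1])
```

In this toy case the search to the right finds no exit point, while the search to the left crosses the barrier and the local solver then reaches a lower-valued minimum, mirroring how a Tier1 solution can improve on Tier0.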
The experimental results of the CPSOTJUTT are given in Table 4. The DNN architecture is the BNA selected in the first layer of the CPSOTJUTTEM.
Table 4 shows the 10 Tier1 search paths of the CPSOTJUTT. The results show that a LOS better than Tier0 can be obtained on each search path, and the best one (Tier1 (#1)) reduces the test error by 0.72% compared to Tier0 (ep 100). Tier0 is also trained for 200 epochs, and the majority of the Tier1 training and test errors are lower than those of Tier0 (ep 200). These findings suggest that it is difficult to find higher-quality LOSs by simply running more iterations at Tier0.
To expand the search space, we further search Tier2 starting with the best point of Tier1(#1), as shown in Table 5.
As shown in Table 5, all Tier2 solutions outperform Tier1 with limited incremental time cost. The results show that the TRUSTTECH methodology can efficiently explore high-quality LOSs in the parameter space.
To evaluate the performance of CPSOTJUTT on a large dataset, the CIFAR100 dataset was used, and the results are given in Table 6, indicating the competitive capability of CPSOTJUTT.
The above experiments show that CPSOTJUTTEM achieves competitive results in testing on the CIFAR dataset. Further evaluations of the performance of CPSOTJUTTEM are given in later sections.
Testing results of the CPSOTJUTT on imbalanced PLOID dataset
We evaluate the ability of the CPSOTJUTT methodology to handle imbalanced datasets effectively in a real-world application: drone-based visual inspection of electric power transmission line corridors. Table 7 shows the results of a statistical analysis of the PLOID, which is heavily imbalanced, with the largest and smallest classes accounting for 38% and 2% of the data, respectively. This imbalance makes the classification task a big challenge.
As shown in Tables 8 and 9, we compared CPSOTJUTT with the most commonly used local solver, SGD. The weight decay and momentum of SGD are fixed at 0.0001 and 0.8, respectively.
Tables 8 and 9 show additional comparisons of a local solver (SGD) and CPSOTJUTT on different models. The experimental data show that CPSOTJUTT can quickly converge to a high-quality LOS. In particular, the minimum error rate of the CPSOTJUTT is only 4.03%, which is 2.21% lower than that of the local solver for BNA2. To test the optimization performance of CPSOTJUTT on a different type of model, we chose ResNet50 and measured its test error on the imbalanced PLOID dataset, as shown in Table 10.
Table 10 demonstrates that the overall error rate of the CPSOTJUTT is lower than that of the local solver. The test error of CPSOTJUTT declines significantly on the hard examples; in particular, the error rate for smog is 13.6% lower than that of the local solver. In general, CPSOTJUTT performs better in training DNNs, especially on an imbalanced dataset, where it can greatly improve the accuracy on hard examples.
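Per-class error rates of the kind discussed for Table 10 can be computed as in the sketch below, which also shows why the overall error rate can hide poor performance on a small class. The labels and predictions are hypothetical and the function name `per_class_error` is illustrative.

```python
from collections import Counter


def per_class_error(y_true, y_pred):
    """Return {class: error rate} computed over the samples of each true class."""
    total, wrong = Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t != p:
            wrong[t] += 1
    return {c: wrong[c] / total[c] for c in total}


# Hypothetical imbalanced sample: "smog" is the minority (hard) class.
y_true = ["crane"] * 8 + ["smog"] * 2
y_pred = ["crane"] * 8 + ["smog", "crane"]
overall = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
print(overall, per_class_error(y_true, y_pred))
```

Here the overall error rate is low even though half of the minority-class samples are misclassified, which is why hard classes are reported separately.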
Performance test of classification models on three power system datasets
In the previous sections, the proposed CPSOTJUTTEM designed diverse high-quality DNN architectures and the corresponding weights on the CIFAR and PLOID datasets. In this section, we apply the CPSOTJUTT methodology to train the ensemble model of these DNN architectures. The results are shown in Table 11, which reports the test error (in %) of each model on the corresponding dataset.
We set up a comparative experiment using the DEnsVGG19 ensemble model and the proposed CPSOTJUTTEM model. The experimental results show that the CPSOTJUTT methodology achieves a lower test error than the local solver in the training of DEnsVGG19, and that the CPSOTJUTTEM achieves a lower test error than the DEnsVGG19 model. The CPSOTJUTTEM achieved lower test errors in classification testing on both the public datasets and the three self-made power system inspection datasets.
Performance testing of object detection models on three power system datasets
In this section, we conduct performance tests on object detection models based on the three self-made power system datasets: PLIID, PSSID, and PLOID. We chose classic object detection models, including YOLO v5, Faster RCNN, Faster FPN, and Cascade RCNN, as benchmarks for comparison. We used the CPSOTJUTTEM algorithm to generate an ensemble object detection model and selected from it the single model with the best accuracy, which is denoted as CPGADNN. The model is optimized with the CPSOTJUTT three-stage optimization algorithm proposed in this article. The experimental results are shown in Table 12.
From Table 12, it can be observed that the CPGADNN outperforms models such as YOLOv5, Faster RCNN, and Faster FPN in object detection accuracy with only 20–22 M parameters. Although there is a slight difference compared to the Cascade RCNN, the parameter quantity of CPGADNN is much lower than that of Cascade RCNN. The accuracy of the ensemble model generated by CPSOTJUTTEM has significantly improved compared to Cascade RCNN and single model CPGADNN.
Conclusion
In this paper, a novel three-layer ensemble model (CPSOTJUTTEM) for power line inspection is developed, which can automatically design DNN architectures quickly and stably without any DNN expertise. In the CIFAR and PLOID datasets, the test errors are reduced by 4.18%, 12.06%, and 12.6%, respectively, especially for hard examples in the PLOID dataset. The CPSOTJUTT methodology proposed in this paper has a strong global convergence ability: 83% of the initial points converge to the optimal convergence region, compared with 18% for SGD, thereby improving the stability by 65%. The ensemble classification model and ensemble object detection model automatically generated by CPSOTJUTTEM have achieved good results on PLIID, PSSID, and PLOID, indicating that the CPSOTJUTTEM three-layer model can achieve high inspection accuracy in power system inspections.
In conclusion, the CPSOTJUTTEM proposed in this paper can automatically design a high-quality ensemble model of DNNs for power system inspection.
Availability of data and materials
No new data were generated or analyzed in support of this research.
References
Yang R, Zha X, Liu K, Xu S. A CNN model embedded with local feature knowledge and its application to timevarying signal classification. Neural Netw. 2021;142:564–72.
Chen T, Wang N, Wang R, Zhao H, Zhang G. Onestage CNN detectorbased benthonic organisms detection with limited training dataset. Neural Netw. 2021;144:247–59.
He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
Tan M, Le Q. Efficientnet: rethinking model scaling for convolutional neural networks. International Conference on Machine Learning. PMLR, 2019.
Sun Y, Xue B, Zhang M, Yen GG, Lv J. Automatically designing CNN architectures using the genetic algorithm for image classification. IEEE Trans Cybern. 2020;50(99):1–15.
Stanley KO, Clune J, Lehman J, Miikkulainen R. Designing neural networks through neuroevolution. Nat Mach Intell. 2019;1(1):24–35.
Zheng Z, Li X. A novel vehicle lateral positioning methodology based on the integrated deep neural network. Expert Syst Appl. 2020;142: 112991.
Ahmed S, Razib M, Alam MS, Alam MS, Huda MN. Ensemble approach for improving generalization ability of neural networks. 2013 International Conference on Informatics, Electronics and Vision (ICIEV). IEEE, 2013.
Ganaie MA, Hu M, Malik AK, Tanveer M. Ensemble deep learning: a review. Eng Appl Artif Intell. 2022;115: 105151.
Chaudhari P, Choromanska A, Soatto S, LeCun Y, Baldassi C, Borgs C, Chayes J, Sagun L, Zecchina R. EntropySGD: biasing gradient descent into wide valleys. J Stat Mech Theory Exp. 2019;2019(12): 124018.
Cheridito P, Jentzen A, Rossmannek F. Nonconvergence of stochastic gradient descent in the training of deep neural networks. J Complex. 2021;64: 101540.
Yuan K, Ying B, Sayed AH. On the influence of momentum acceleration on online learning. J Mach Learn Res. 2016;17(1):6602–67.
Arjevani Y, Carmon Y, Duchi JC, Foster DJ. Lower bounds for nonconvex stochastic optimization. Math Program. 2022;199:165.
Wilson AC, Roelofs R, Stern M. The marginal value of adaptive gradient methods in machine learning. Adv Neural Inf Proc Syst. 2017; 30.
Luo L, Xiong Y, Liu Y, Sun X. Adaptive gradient methods with dynamic bound of learning rate. arXiv preprint arXiv:1902.09843, 2019.
Stanley KO, Miikkulainen R. Evolving neural networks through augmenting topologies. Evol Comput. 2002;10(2):99–127.
Miikkulainen R, Liang J, Meyerson E, Rawal A, Fink D, Francon O, Raju B, Shahrzad H, Navruzyan A, Duffy N, Hodjat B. Evolving deep neural networks. in Artificial Intelligence in the Age of Neural Networks and Brain Computing. Elsevier, 2019, pp. 293–312.
Real E, Moore S, Selle A, Saxena S, Suematsu YL, Tan J, Le QV, Kurakin A. Largescale evolution of image classifiers. in International Conference on Machine Learning. PMLR, 2017, pp. 2902–2911.
Real E, Aggarwal A, Huang Y, Le QV. Regularized evolution for image classifier architecture search. Proc AAAI Conf Artif Intell. 2019;33(01):4780–9.
Xie L, Yuille A. Genetic CNN. in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 1379–1388.
Sun Y, Xue B, Zhang M, Yen GG. Completely automated cnn architecture design based on blocks. IEEE Trans Neural Netw Learn Syst. 2019;31(4):1242–54.
Kumar A, Yin B, Shaikh AM, Ali M, Wei W. CorrNet: pearson correlationbased pruning for efficient convolutional neural networks. Int J Mach Learn Cybern. 2022;13(12):3773–83.
Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res. 2011; 12(7).
Tieleman T, Hinton G. Lecture 6.5rmsprop: divide the gradient by a running average of its recent magnitude. COURSERA Neural Netw Mach Learn. 2012;4(2):26–31.
Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
Reddi SJ, Kale S, Kumar S. On the convergence of adam and beyond. arXiv preprint arXiv:1904.09237. 2019.
Yang J, Zeng X, Zhong S, Wu S. Effective neural network ensemble approach for improving generalization performance. IEEE Trans Neural Netw Learn Syst. 2013;24(6):878–87.
Zhang S, Liu M, Yan J. The diversified ensemble neural network. Adv Neural Inf Process Syst. 2020;33:16001–11.
Zhang YF, Chiang HD. Enhanced eliteload: a novel CPSOATT methodology constructing shortterm load forecasting model for industrial applications. IEEE Trans Industr Inf. 2019;16(4):2325–34.
Turkoglu M, Yanikolu B, Hanbay D. Plantdiseasenet: convolutional neural network ensemble for plant disease and pest detection. Signal Image and Video Processing. 2021;(9): 1–9.
Wang Y, Wang J, Gao F, Hu P, Xu L, Zhang J, Yu Y, Xue J, Li J. Detection and recognition for fault insulator based on deep learning. 2018 11th International Congress on Image and Signal Processing, Biomedical Engineering and Informatics (CISPBMEI). IEEE, 2018.
Dai G, Yuan Y, Huang W, Liu Q, Ju C. Unattended substation inspection algorithm based on improved YOLOv5. 2022 IEEE International Conference on Realtime Computing and Robotics (RCAR). IEEE, 2022.
Zhang W, Liu X, Yuan J, Xu L, Sun H, Zhou J. RCNNbased foreign object detection for securing power transmission lines (RCNN4SPTL). Procedia Comput Sci. 2019;147:331–7.
Zhang J, Zhao Y, Shone F, Li Z, Frangi AF, Xie SQ, Zhang ZQ. Physicsinformed deep learning for musculoskeletal modeling: predicting muscle forces and joint kinematics from surface EMG. IEEE Trans Neural Syst Rehabil Eng. 2022;31:484–93.
Zhang J, Li Y, Xiao W, Zhang Z. Noniterative and fast deep learning: Multilayer extreme learning machines. J Franklin Inst. 2020;357(13):8925–55.
Li S, Tan M, Tsang IW, Kwok JTY. A hybrid PSOBFGS strategy for global optimization of multimodal functions. IEEE Trans Syst Man Cybern Part B (Cybernetics). 2011;41(4):1003–14.
Houssein EH, Gad AG, Hussain K, Suganthan PN. Major advances in particle swarm optimization: theory, analysis, and application. Swarm Evol Comput. 2021;63: 100868.
Sculley D. Webscale kmeans clustering. in Proceedings of the 19th International Conference on World Wide Web, 2010, pp. 1177–1178.
Lin TY, Goyal P, Girshick R, He K, Dollar P. Focal loss for dense object detection. in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2980–2988.
Zhu M, Nazareth JL, Wolkowicz H. The quasicauchy relation and diagonal updating. SIAM J Optim. 1999;9(4):1192–204.
Hao Z, Chiang HD, Wang B. Trusttechbased systematic search for multiple local optima in deep neural nets. IEEE Transactions on Neural Networks and Learning Systems, pp. 1–11, 2021.
Chiang HD, Hirsch MW, Wu FF. Stability regions of nonlinear autonomous dynamical systems. IEEE Trans Autom Control. 1988;33(1):16–27.
Chiang HD, Chu CC. A systematic search method for obtaining multiple local optimal solutions of nonlinear programming problems. Circ Syst I Fundamental Theory Appl IEEE Transactions on. 1993;43(2):99–109.
Chiang HD, Alberto LFC. Stability regions of nonlinear dynamical systems: theory, estimation, and applications. Cambridge University Press; 2015.
Deng JJ, Chiang HD, Zhao TQ. Newton method and trajectorybased method for solving power flow problems: nonlinear studies. Int J Bifurcation Chaos. 2015;25(6):591–484.
Pillo GD, Grippo L. A new class of augmented lagrangians in nonlinear programming. SIAM J Control Optim. 2006;17(5):618–28.
Du X, Zhang L, Gao Y. A class of augmented lagrangians for equality constraints in nonlinear programming problems. Appl Math Comput. 2006;172(1):644–63.
Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Adv Neural Inf Proc Syst. 2012; 25.
Wang W, Peng W, Tong L, Tan X, Xin T. Study on sustainable development of power transmission system under ice disaster based on a new security early warning model. J Clean Prod. 2019;228:175–84.
Glavic M. (Deep) Reinforcement learning for electric power system control and related problems: a short review and perspectives. Annu Rev Control. 2019;48:22–35.
Qin X, Su Q, Huang SH. Extended warranty strategies for online shopping supply chain with competing suppliers considering component reliability. J Syst Sci Syst Eng. 2017;26(6):753–73.
Santos T, Moreira M, Almeida J, Dias A, Martins A, Dinis J, Formiga J, Silva E. Plined: Visionbased power lines detection for unmanned aerial vehicles. in 2017 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC). IEEE, 2017, pp. 253–259.
Lan M, Zhang Y, Zhang L, Du B. Defect detection from uav images based on regionbased cnns. in 2018 IEEE International Conference on Data Mining Workshops (ICDMW). IEEE, 2018, pp. 385–390.
Wang D, Zhao G, Chen H, Liu Z, Deng L, Li G. Nonlinear tensor train format for deep neural network compression. Neural Netw. 2021;144:320–33.
Aldahdooh A, Hamidouche W, Fezza SA. Adversarial example detection for DNN models: a review and experimental comparison. Artif Intell Rev. 2022;55:4403.
Huang G, Liu Z, Maaten LVD, Weinberger KQ. Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
Simonyan K, Zisserman A. Very deep convolutional networks for largescale image recognition. arXiv preprint arXiv:1409.1556. 2014.
Liu H, Simonyan K, Vinyals O, Fernando C. Hierarchical representations for efficient architecture search. arXiv preprint arXiv:1711.00436. 2017.
Zoph B, Le QV. Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578. 2016.
Zhong Z, Yan J, Wu W, Shao J. Practical blockwise neural network architecture generation. Proceedings of the IEEE Conference on Computer Vision and Pattern recognition. 2018.
Suganuma M, Shirakawa S, Nagao T. A genetic programming approach to designing convolutional neural network architectures. Proceedings of the Genetic and Evolutionary Computation Conference. 2017.
Acknowledgements
I would like to thank Mr. Chiang for his guidance on my research. His guidance and encouragement prompted me to write the whole work and complete it successfully. In the process of writing, he always made me have confidence in myself and guided me to many important publications that were quite helpful.
Funding
Funding information is not available.
Author information
Contributions
XlL: Conceptualization, Methodology, Software, Formal analysis, Writing—original draft. HDC: Writing—review & editing, Project administration, Supervision.
Ethics declarations
Ethics approval and consent to participate
This article does not involve animal or human experiments, and no ethics approval is required. And written informed consent was obtained from all participants.
Consent for publication
All authors gave their consent for publication.
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Lv, XL., Chiang, HD. & Dong, N. Automatic DNN architecture design using CPSOTJUTT for power system inspection. J Big Data 10, 150 (2023). https://doi.org/10.1186/s4053702300828y