Automatic DNN architecture design using CPSOTJUTT for power system inspection

To quickly, accurately, and automatically design high-precision deep neural network (DNN) models, this paper proposes an automatic DNN architecture design ensemble model based on consensus particle swarm optimization-assisted trajectory unified and TRUST-TECH (CPSOTJUTT), called CPSOTJUTT-EM. The proposed model is a three-layer model whose core is a three-stage method that addresses the sensitivity of the local solver to the initial point and enables fast and robust DNN training, effectively avoiding missing high-quality DNN models in the process of automatic DNN architecture design. CPSOTJUTT has the following advantages: (1) high-quality local optimal solutions (LOSs) and (2) robust convergence against random initialization. CPSOTJUTT-EM consists of the bottom layer, which stably and quickly designs high-quality DNN architectures; the middle layer, which explores a diverse set of optimal DNN classification engines; and the top layer, which assembles an ensemble model for higher performance. This paper tests the performance of CPSOTJUTT-EM on public datasets and three self-made power system inspection datasets. Experimental results show that CPSOTJUTT-EM has excellent performance in automatic DNN architecture design and DNN model optimization, and that it can automatically design high-quality DNN ensemble models, laying a solid foundation for the application of DNNs in other fields.

In critical and sensitive domains, such as medical diagnosis and autonomous driving, there is an increasing demand for high-precision models. A single model often exhibits different recognition abilities across categories, which makes it difficult to meet the required accuracy [7]. With the advancement of hardware performance, the application of large-scale ensemble models is becoming more widespread. The ensemble model improves accuracy and generalization ability by utilizing multiple models to comprehensively consider the prediction results through joint learning and decision-making [8]. However, designing and optimizing an ensemble model requires in-depth professional knowledge and skills, including model selection and ensemble method design, which hinders its application in other fields [9].
This paper combines automatic DNN architecture design technology and ensemble model technology to improve DNN performance. Automatic DNN architecture design technology can automatically search for and generate DNN models that are better suited to specific tasks and domain requirements. Once multiple candidate optimized DNN architectures are obtained, ensemble model technology can synthesize the prediction results of these models to achieve higher accuracy. However, the high redundancy and nonconvexity of the parameters lead to many local optimal solutions (LOSs) for the DNN, and low-quality and high-quality LOSs share the same local properties [10,11]. DNN training is usually realized by a first-order local solver [12]. One disadvantage of the local solver is that the gradients in different directions are uniformly scaled, so it may converge to a bad LOS [13,14], resulting in poor generalization ability or an inability to converge [15]. Because of the poor robustness of DNN training methods, the performance of an automatically designed DNN architecture cannot be adequately evaluated.
To solve the above problems, this paper proposes a novel three-layer ensemble model, termed the consensus particle swarm optimization-assisted trajectory unified and TRUST-TECH ensemble model (CPSOTJUTT-EM). This model is based on automatic DNN architecture design technology, with the CPSOTJUTT algorithm as its core. The bottom layer of CPSOTJUTT-EM achieves the stable and fast generation of high-quality DNN architectures. Through this design, experts from non-deep-learning fields can also design the most suitable DNN framework for their domain requirements, effectively promoting the application of deep learning in other fields. The middle layer utilizes a three-stage method for high-precision and robust DNN training to obtain candidate optimized DNN models. The top layer achieves a higher-performance ensemble model by assembling high-quality DNN models. The ensemble model can fully leverage the performance advantages of the high-quality sub-DNN models, improving the overall accuracy and generalization ability of the model.
The main contributions of this paper are as follows:
1) The CPSOTJUTT-EM can robustly and automatically design a high-quality DNN architecture according to the application field, without requiring extensive expertise in DNNs.
2) The CPSOTJUTT-EM constructs an ensemble model consisting of a diverse set of high-quality classification engines so that the ensemble model takes full advantage of each sub-DNN to maximize recognition accuracy. The generalization ability of the ensemble model is significantly improved.
3) The CPSOTJUTT methodology, with a strong theoretical basis, can robustly converge to high-quality LOSs from random initial points.
4) The CPSOTJUTT methodology, which consists of the consensus-based PSO, the TJU methodology, and the TRUST-TECH methodology, fully utilizes the global view of the consensus-based PSO, the robust convergence ability of the TJU methodology, and the search ability of the TRUST-TECH methodology for higher-quality LOSs.

Original contributions and novelties
The architecture of the proposed CPSOTJUTT-EM is given in Fig. 1.
Bottom layer: Automatically designs high-quality DNN architectures. The CPSOTJUTT methodology trains these DNN architectures and selects high-quality DNN classification engines. In this layer, the consensus-based PSO is used to overcome the sensitivity of DNN training to the initial value and can converge quickly to the optimal stability region.
Middle layer: Explores a diverse set of high-quality DNN classification engines via the CPSOTJUTT methodology. The CPSOTJUTT methodology can robustly converge to a LOS and search for better ones nearby while maintaining its global search ability.
Top layer: Takes the high-quality classification engines from the middle layer as the hidden nodes of the ensemble model and applies the CPSOTJUTT methodology to strengthen training and find the optimal combination of classification engines, further improving identification accuracy and generalization ability.

Evolutionary neural network
Several methodologies have been proposed to automatically design DNN architectures. NeuroEvolution of Augmenting Topologies (NEAT) [16] represents early progress in the development of small-scale network architectures and inspired research on DNN-based neuroevolution. The NEAT model and the co-evolution of modules are combined in CoDeepNEAT [17].
The evolutionary algorithm in [18] uses an intuitive mutation operator to add layer structures, which makes the framework complex even for a small network, yet it has made remarkable achievements on the CIFAR dataset. AmoebaNet-A improves the tournament selection evolutionary algorithm by adding an age property that favors younger genotypes and surpasses hand-designed architectures for the first time [19]. The Genetic CNN algorithm [20] is a neuroevolutionary algorithm that optimizes connections between convolutional layers using mutation and crossover evolution; it can meet the design requirements of the DNN model in some fields to some extent. CNN-GA [21] effectively addresses image classification tasks by designing a new encoding strategy for the GA to encode CNNs of arbitrary depth. Auto-evolutionary CNN (AE-CNN) [5] provides effective local and global search ability through a crossover operator and a mutation operator and can design high-quality DNN architectures with limited computing resources. CorrNet is a novel correlation-based feature pruning (CFP) approach that creates a feature selection scheme to obtain the pruning approach; it achieves an accuracy gain while saving a significant amount of computational cost [22].

Deep neural network training methods
The existing DNN training methods mainly adopt the first-order gradient method and its variants. Optimization algorithms based on first-order gradients have linear time and memory complexity and have achieved great success. Momentum stochastic gradient descent (SGD) [12] pursues fast and stable convergence and is widely used for its simplicity and intuitiveness. However, the gradients in different directions are scaled uniformly, causing poor convergence when training on sparse data. Therefore, the acceleration of SGD has attracted extensive research. Recently, some adaptive first-order optimization methods have been proposed to achieve rapid convergence. Adagrad [23] accelerates DNN training by dynamically adjusting the learning rate based on the gradient. RMSprop [24] is an adaptive first-order optimization method that discards remote gradients by using an exponentially decaying average of squared gradients; it has a much lower computational cost than SGD. Adam [25] combines the advantages of Adagrad and RMSprop, scaling the gradient by the square root of the accumulated squared gradient to achieve fast convergence, and has become the default optimization algorithm for many DNNs [26]. However, due to their sensitivity to initialization and hyperparameters, these optimization methods may converge to low-quality LOSs, resulting in worse generalization ability [10].
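As an illustration of the adaptive first-order family discussed above, the following is a minimal NumPy sketch of the standard Adam update (exponential averages of the gradient and its square, bias correction, square-root scaling); the quadratic test objective and step sizes are purely illustrative, not settings used in this paper:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential averages of the gradient and squared gradient,
    bias-corrected, with the step scaled by the root of the squared-gradient average."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)   # bias correction for the first moment
    v_hat = v / (1 - beta2 ** t)   # bias correction for the second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# minimize the toy objective f(w) = ||w||^2
w = np.array([1.0, -2.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 201):
    g = 2 * w   # gradient of f
    w, m, v = adam_step(w, g, m, v, t)
```

Because the effective step size is normalized per coordinate, Adam makes similar initial progress along both coordinates even though their gradients differ in magnitude.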

Deep neural network ensemble
The degree of DNN training has a significant impact on the accuracy and generalization ability of image classification. Insufficient DNN training results in low accuracy on hard examples, while overtraining results in poor generalization on easy examples, especially for class-imbalanced datasets [27,28]. A cascade structure improves the recognition accuracy of a DNN, but high model capacity leads to overfitting. The ensemble model of a DNN is the optimal combination of diverse high-quality sub-DNNs, which can fully exploit the recognition ability of every sub-DNN [29]. Plantdiseasenet uses a majority-voting ensemble model to detect plant pests in the early stage of disease, and the results show that the proposed model reaches or exceeds the state of the art [30]. The ensemble model of a DNN can obtain better accuracy and generalization ability than each sub-DNN [27].

Power system inspection
As an indispensable infrastructure in modern society, the stable operation of the power system is crucial to the development of the social economy and the normal conduct of people's lives. To ensure the safe and reliable operation of the power system, regular power system inspections are particularly important.
Power line insulator inspection is the regular assessment of the status of insulators on power transmission lines to ensure the stable and safe operation of the power system. Literature [31] proposes a power insulator inspection algorithm based on deep learning to eliminate the impact of complex power environments on detection accuracy. Power system substation inspection can promptly detect potential equipment failures so that maintenance measures can be taken to ensure the safe operation of the power grid. Literature [32] proposes a detection algorithm based on an improved YOLO v5, in which a backbone with a unique attention mechanism is designed to extract more accurate feature maps, solving the pain point of insufficient detection accuracy in unmanned substations. Power line obstacle inspection can identify potential risks so that timely measures can be taken to ensure the normal operation of transmission lines. Literature [33] proposes an object detection algorithm based on R-CNN to ensure the safety of power lines.
The automatic DNN architecture design method using CPSOTJUTT for power system inspection proposed in this article improves the accuracy and generalization ability of power system inspection. It solves the problems faced by power system inspection, such as multiple inspection scenarios, the low accuracy of general single models, and the high difficulty of designing specialized models. In the future, we will conduct research on more advanced network design approaches, such as deep learning with prior knowledge [34,35], to further improve the performance of the proposed methods.

The CPSOTJUTT methodology
The CPSOTJUTT methodology, which consists of the consensus-based PSO, the TJU methodology, and the TRUST-TECH methodology, can converge robustly to high-quality LOSs from random initial points. The CPSOTJUTT methodology is the core of CPSOTJUTT-EM and fully utilizes the global view of the consensus-based PSO, the robust convergence ability of the TJU methodology, and the search ability of the TRUST-TECH methodology for higher-quality LOSs. The pseudocode of the CPSOTJUTT methodology is shown in Algorithm 1.
The architecture of the CPSOTJUTT methodology is as follows. Stage I: Exploration and Consensus: the positions of the particles (DNN weights) are updated by the PSO, which is terminated when all the particles have reached a consensus. The three optimal particles in each consensus region and the weight center point are selected as the initial points of the next stage.
Stage II: Robust Convergence: We use TJU methodology and a local solver to robustly converge to a high-quality LOS from the representative particles selected in the previous stage.
Stage III: Search Optimal: TRUST-TECH methodology is applied to effectively jump out of the stability region of the SEP found in stage II, enter the stability region of neighboring SEPs, and obtain multiple high-quality LOSs in a tier-by-tier search manner.
Algorithm 1 Pseudocode of the CPSOTJUTT methodology.

1: Initialize the particle swarm, the personal best positions pbest, and the global best position gbest.
2: repeat
3:   Update the positions P^K and velocities V^K of the swarm.
4:   Update pbest and gbest.
5:   if not consensus then
6:     Execute mini-batch k-means clustering of the particles.
7:   end if
8: until the consensus criterion of the consensus-based PSO is satisfied.
9: Apply the TJU methodology followed by a local solver to locate the LOSs.
10: Search for better LOSs by the TRUST-TECH methodology.

CPSOTJUTT stage I: exploration and consensus
The DNN is a regularized version of a multi-layer perceptron with a multi-layer network structure, and its performance is usually evaluated by the cross-entropy (CE) loss function. The goal of optimal DNN training is to reduce the CE loss as much as possible, even infinitely approaching 0:

$$F(w) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{C} t_{ij}\,\log p_{ij} \qquad (1)$$

where x is the input data, C is the number of classification objects, t_ij is the one-hot value of the class, p_ij is the probability that sample i belongs to class j, and N is the size of the mini-batch. In this stage, the global search ability of the PSO is used to assist the robust convergence of the local solver. To this end, we introduce a consensus-based PSO to locate optimal convergence regions in the search space that contain high-quality LOSs.
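For concreteness, the mini-batch CE loss can be computed directly from predicted probabilities and one-hot labels; the probability values below are illustrative, not outputs of any model in this paper:

```python
import numpy as np

def cross_entropy(p, t):
    """Mini-batch CE loss: p[i, j] is the predicted probability that sample i
    belongs to class j; t holds the one-hot labels; rows of p sum to 1."""
    n = p.shape[0]   # mini-batch size N
    return -np.sum(t * np.log(p)) / n

p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])
t = np.array([[1, 0, 0],
              [0, 1, 0]])
print(round(cross_entropy(p, t), 4))
```

Only the probability assigned to the true class enters the sum, so driving the loss toward 0 means driving those probabilities toward 1.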
All particles exchange information with the personal best position and the global best position at each step of the PSO. The update of each particle combines its original position and velocity, which can be described as follows:

$$V^{k+1} = \omega V^{k} + C_1 R_1 \left( pbest - P^{k} \right) + C_2 R_2 \left( gbest - P^{k} \right) \qquad (2)$$

$$P^{k+1} = P^{k} + V^{k+1} \qquad (3)$$

where ω is the inertia weight, C_1 and C_2 are learning factors, and R_1 and R_2 are random numbers distributed between 0 and 1. pbest is the personal best position, while gbest is the global best position. The updated particle position is calculated by (3).
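A minimal NumPy sketch of the velocity and position updates described above; the sphere objective, swarm size, and coefficient values are illustrative choices, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

def pso_step(P, V, pbest, gbest, omega=0.7, c1=1.5, c2=1.5):
    """One PSO update: rows of P are particles; velocity mixes inertia with
    attraction toward the personal and global best positions."""
    r1 = rng.random(P.shape)
    r2 = rng.random(P.shape)
    V = omega * V + c1 * r1 * (pbest - P) + c2 * r2 * (gbest - P)
    return P + V, V

# toy objective: sphere function, minimum at the origin
f = lambda P: np.sum(P ** 2, axis=1)
P = rng.uniform(-5, 5, (20, 2))
V = np.zeros_like(P)
pbest = P.copy()
gbest = P[np.argmin(f(P))].copy()
for _ in range(100):
    P, V = pso_step(P, V, pbest, gbest)
    better = f(P) < f(pbest)          # update personal bests
    pbest[better] = P[better]
    gbest = pbest[np.argmin(f(pbest))].copy()
```

With these standard coefficient values the swarm contracts around the best-known region, which is the behavior the consensus criterion below exploits.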
However, the PSO gradually loses its global view and fast convergence ability in the later stage of DNN training [36]. To solve the problem of the computational burden of the PSO, we adopt a consensus-based PSO.
The CPSO can locate optimal convergence regions in the search space that contain high-quality LOSs by exchanging information through the personal best and global best positions [36,37]. As shown in Fig. 2, all particles reach a consensus state by converging into one or more regions.
We use mini-batch k-means [38] to cluster all the particles into several groups at fixed intervals. The mini-batch k-means method reduces the order of magnitude of the computation and has better clustering performance in high-dimensional optimization problems. The stopping criterion of the consensus-based PSO is as follows: • In the subsequent 5 generations of the CPSO, the membership of the particle groups does not change.
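A self-contained sketch of this consensus check, with a minimal mini-batch k-means written in NumPy; the batch size, iteration count, and farthest-point initialization are our illustrative choices, not the paper's:

```python
import numpy as np

def minibatch_kmeans(X, k, iters=50, batch=16, seed=0):
    """Minimal mini-batch k-means: centers are refined from small random batches,
    keeping the clustering cost low for large, high-dimensional swarms."""
    rng = np.random.default_rng(seed)
    # farthest-point initialization keeps the initial centers spread out
    centers = [X[rng.choice(len(X))].astype(float)]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(-1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)].astype(float))
    centers = np.stack(centers)
    counts = np.zeros(k)
    for _ in range(iters):
        S = X[rng.choice(len(X), batch)]          # one mini-batch of particles
        lab = np.argmin(((S[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            pts = S[lab == j]
            if len(pts):
                counts[j] += len(pts)
                eta = len(pts) / counts[j]        # per-center learning rate decays
                centers[j] = (1 - eta) * centers[j] + eta * pts.mean(0)
    return np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)

def consensus_reached(history, patience=5):
    """Stopping criterion: particle-group membership unchanged for `patience` generations."""
    return len(history) >= patience and all(
        np.array_equal(history[-1], h) for h in history[-patience:])

# usage: particles split into two well-separated groups
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(5, 0.1, (10, 2))])
labels = minibatch_kmeans(X, 2)
history = [labels] * 5
```

In the CPSO loop, `history` would collect one label vector per generation; training stops once the group memberships are stable.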
Numerical studies indicate that all particles have good global search ability in the early stage. With the exchange of information among particles, the global search ability decreases gradually while the local search ability increases. In the consensus state, the PSO retains the global optimal particles and greater diversity at a lower computational cost. Thus, we select representative particles in each particle group as the initial points for the next stage of CPSOTJUTT.

CPSOTJUTT stage II: robust convergence
When stage I is completed, the methodology enters stage II, the robust convergence stage, as shown in Fig. 3. At this stage, we use the representative particles selected in the previous stage as the initial points w_0 and use the TJU methodology for robust convergence. The TJU methodology converges quickly and robustly during the early phase but slows down in the late phase. Therefore, a local solver (such as SGD or Adam) is used to enhance convergence after the TJU methodology.
TJU constructs a dynamical system based on the DNN such that the LOSs of the DNN are mapped into SEPs of the dynamical system. Then, starting from the representative initial point selected in the previous stage, the ensuing trajectory enters the stability region of an SEP. The nonlinear problem corresponding to (1) is:

$$\min_{w} F(w) = \frac{1}{N}\sum_{i=1}^{N} h(w, x_i) \qquad (4)$$

where h is the loss of the DNN, w is the weight, x_i is the input data, and N is the size of the mini-batch.
The key to the TJU methodology is to construct an effective dynamical system corresponding to the nonlinear problem (4) and to solve (1) via the dynamic trajectories of the constructed nonlinear dynamical system:

$$\dot{w} = -F(w)^{\sigma}\,\nabla F(w) \qquad (5)$$

where ∇F(w) is the gradient of F(w). When σ = 1, this is the focal loss used by [39]. The system fully considers the gradient information and the loss information of the deep neural network model and mainly focuses training on a sparse set of hard examples.
We apply a technique called the pseudo-transient continuation (PTC) method to realize fast calculation of the steady-state solution:

$$w_{t+1} = w_t + \left( I/\delta - D_t \right)^{-1} \dot{w}_t \qquad (6)$$

where w is the weight, I is the identity matrix, δ is the time step, and D_t is the Jacobian of dynamical system (5).
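A toy NumPy sketch of the PTC iteration above on a small linear gradient system; the 2 × 2 system matrix and time step δ are illustrative assumptions, and a real DNN would require a scalable Jacobian approximation:

```python
import numpy as np

def ptc_step(w, f, jac, delta=0.5):
    """One pseudo-transient continuation step: w_next = w + (I/delta - D)^(-1) w_dot."""
    wdot = f(w)                                   # velocity of the dynamical system at w
    D = jac(w)                                    # Jacobian of the dynamical system
    d = np.linalg.solve(np.eye(len(w)) / delta - D, wdot)
    return w + d

# toy linear gradient system w_dot = -A w, whose steady state is the origin
A = np.array([[3.0, 0.0], [0.0, 1.0]])
f = lambda w: -A @ w
jac = lambda w: -A
w = np.array([2.0, -1.0])
for _ in range(30):
    w = ptc_step(w, f, jac)
```

The implicit term (I/δ − D)⁻¹ damps stiff directions, which is why PTC reaches the steady state robustly even when the plain explicit iteration would need a very small step.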
The training speed can be accelerated with the correction of the dynamical system search direction.
The PTC methodology can reliably train a small-scale deep neural network model. To improve the scalability of the TJU methodology and train a large-scale DNN model, the Block-Diagonal Pseudo-Transient Continuation (BD-PTC) method is proposed to find the search direction D (a 1 × n matrix) [40]. Here, d_t = (I/δ − D_t)^{−1} ẇ_t is the corrected update direction, s_t = D_{t−1}^{−1} ẇ_t and y_t = ẇ_{t+1} − ẇ_t, θ_t ∈ R^n denotes the parameter to be optimized, ẇ_t ∈ R^n is the dynamical system evaluated at θ_t, η_t denotes the step size, tr denotes the trace operator, and Diag(s_t²) is the diagonal matrix whose diagonal elements come from the vector s_t².
Take the last (Nth) iteration of the BD-PTC methodology as the initial point and apply a local solver to locate a LOS for problem (1).
The following describes the pseudocode of CPSOTJUTT Stage II:

CPSOTJUTT stage III: search optimal
The TRUST-TECH methodology can effectively jump out of the stability region of the SEP found in stage II, enter the stability regions of neighboring SEPs, and obtain multiple SEPs in a tier-by-tier search manner. This stage has a strong theoretical basis [41]. An intuitive description of the TRUST-TECH methodology is shown in Fig. 4, where w_{i,j} represents different SEPs, i represents the tier of the SEP, and j represents the index of the SEP within that tier. The key steps of the TRUST-TECH methodology are as follows:
Step 1: Starting from w_0, step outward in the direction of jumping out of this stability region until the exit point on the stability boundary is reached.
Step 2: Enter the adjacent stability region from the exit point and locate a Tier-1 SEP.
Step 3: Locate multiple SEPs on Tier-1 by adjusting the direction of jumping out of w_0.
Note that the exit point on the stability boundary refers to the point where the loss value changes from ascending gradually to descending steadily, indicating that the trajectory has entered the stability region of another SEP.There is a non-empty intersection point set between the stability boundaries of each tier of SEPs, i.e., the exit point set.
Next, we describe in detail how the trajectory moves during the TRUST-TECH methodology. First, we use a local solver to obtain a first (Tier-0) SEP w_0. Then we define a trainable search direction g_i, and the parameter vector w_i is updated by the following equation:

$$w_i = w_0 + \rho_1(i)\, g_i \qquad (8)$$

where ρ_1(i) ∈ (0, ρ_max) is a learning rate moving away from w_0, increasing from 0 to ρ_max.
When g_i is fixed, the search direction of the parameter vector w_i is fixed. However, for high-dimensional, large-scale models, the probability of finding the exit point along a fixed search direction is very low. Therefore, we adjust the direction g_i by the following gradient descent equation:

$$g_{i+1} = g_i - \rho_2\, \nabla_{g_i} F(w_i) \qquad (9)$$

where ρ_2 is the learning rate for the adjustment and ∇_{g_i} F(w_i) is the gradient of the loss function F(w, x) with respect to g_i.
When the exit point is found or ρ_1 increases to ρ_max, the trajectory stops moving and the local solver is called again to find a new (Tier-1) SEP from the exit point.
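The exit-point search of stage III can be sketched on a one-dimensional double-well toy loss; the direction adjustment uses plain gradient descent on the loss, and the landscape and step-size choices here are illustrative assumptions rather than the paper's settings:

```python
import numpy as np

def trust_tech_search(w0, loss, grad, g, rho_max=3.0, rho2=0.01, steps=60):
    """Step away from a local minimum w0 along a trainable direction g,
    adjusting g by gradient descent, until the loss turns from ascending to
    descending -- a proxy for crossing the exit point on the stability boundary."""
    prev = loss(w0)
    for i in range(1, steps + 1):
        rho1 = rho_max * i / steps        # learning rate grows from 0 toward rho_max
        w = w0 + rho1 * g
        cur = loss(w)
        if cur < prev:                    # loss started descending: exit point crossed
            return w                      # hand w to a local solver for a Tier-1 SEP
        prev = cur
        g = g - rho2 * grad(w)            # adjust the search direction
    return None                           # no exit point found before rho_max

# toy double-well landscape with LOSs at w = -1 and w = +1
loss = lambda w: float((w[0] ** 2 - 1.0) ** 2)
grad = lambda w: np.array([4.0 * w[0] * (w[0] ** 2 - 1.0)])
w_exit = trust_tech_search(np.array([-1.0]), loss, grad, g=np.array([1.0]))
```

Starting from the minimum at w = −1, the trajectory climbs the barrier between the wells and stops just after the loss begins to fall, i.e., inside the stability region of the neighboring SEP at w = +1.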

The stability region
The solution of the deep neural network system starting from w_0 ∈ R^N at t = 0 is called a trajectory and denoted φ(·, w_0). A point w_e ∈ R^N is an equilibrium point if ẇ|_{w_e} = 0; an equilibrium point is a degenerate trajectory. If for every ε > 0 there is a δ > 0 such that ‖w_0 − w_e‖ < δ implies ‖φ(t, w_0) − w_e‖ < ε for all t > 0, then w_e is stable. A(w_e) is defined as the stability region of the SEP w_e, and in this region all trajectories converge to w_e.
If no eigenvalue of the Jacobian matrix ∇F(w_e) has zero real part, the equilibrium point w_e is termed hyperbolic [42]. Furthermore, if exactly k eigenvalues of ∇F(w_e) have positive real parts, w_e is a type-k hyperbolic equilibrium point. A type-k equilibrium point is unstable for all k ≥ 1. Given a type-k equilibrium point w_e, its stable manifold W^s and unstable manifold W^u are defined by:

$$W^{s}(w_e) = \{ w \in R^N : \phi(t, w) \to w_e \text{ as } t \to \infty \}$$

$$W^{u}(w_e) = \{ w \in R^N : \phi(t, w) \to w_e \text{ as } t \to -\infty \}$$

Observe that W^s(w_e) = A(w_e) when w_e is a type-0 equilibrium point. A comprehensive theory characterizing the stability region and the stability boundary has been developed [42-45]. If the quotient gradient system (5) satisfies the following assumptions, then its stability boundary can be fully characterized:
A1) All the equilibrium points on the stability boundary are hyperbolic.
A2) The stable and unstable manifolds of the equilibrium points on the stability boundary satisfy the transversality condition.
A3) Every trajectory on the stability boundary approaches one of the equilibrium points as t → ∞.
Remark: Assumption A1) is a general property of the quotient gradient system (5) and may be verified for a specific system by directly computing the eigenvalues of the corresponding Jacobian matrix of the vector field. Assumption A2) is also a general property, but it is difficult to check. Although assumption A3) is not a general property, it can be checked in many systems using a V-function or direct analysis.
Theorem 1 (Characterization of the Stability Boundary) [42]: Consider a nonlinear dynamical system (5) that satisfies assumptions A1) and A3). Let w_e^i, i ≥ 1, be the equilibrium points on the stability boundary ∂A of a SEP, say w_s.
Then, the stability boundary is completely characterized as follows:

$$\partial A(w_s) = \bigcup_{i} W^{s}(w_{e}^{i})$$

This theorem asserts that the stability boundary of the class of nonlinear dynamical systems satisfying assumptions A1) and A3) can be completely characterized: it equals the union of the stable manifolds of the equilibrium points on the stability boundary.
We note that in solving problem (1), the following sequence of unconstrained optimization problems is solved instead [46,47]:

$$\min_{w} F(w, x) \qquad (14)$$

We define the following nonlinear dynamical system:

$$\dot{w} = -\nabla_{w} F(w, x) \qquad (15)$$

Two important properties of system (15) are explored in the following to compute multiple LOSs of the general nonlinear optimization problem (1). These two properties are examined as follows.

Complete stability
Theorem 2 (Complete Stability) [43, Section IV]: Every trajectory of the quotient gradient system (15) converges, and all converge to an equilibrium point; in addition, almost every trajectory converges to a SEP of (15). This theorem states that every trajectory converges to an equilibrium point, indicating that the system behavior is simple and does not allow complex trajectory behavior [45]. The trajectory must converge to an equilibrium point of (15) from any initial point.
Furthermore, every trajectory converges to a SEP except for the trajectories lying on the stability boundary, which converge to an unstable equilibrium point. In addition, we also need to prove that a trajectory of (15) converging to a SEP is equivalent to solving for a LOS of problem (1). The next section establishes this through the equivalence relationship between the LOSs of (1) and the SEPs of (15).

Equivalence relations
Theorem 3 (Equivalence Relations): Consider the nonlinear optimization problem (1), which corresponds to the nonlinear dynamical system (15) and satisfies assumptions A1) and A2). Then, the following properties hold.
1) If w* is a local optimal solution of (1), then w* is a stable equilibrium point of system (15). 2) If w* is a stable equilibrium point of (15), then w* is a local optimal solution of (1).
Proof: 1) Given that w* is a LOS, clearly ∇_w F(w*, x) = 0 by (14). Since w* is a local optimal solution, d^T ∇²_{ww} F(w*, x) d > 0 for every nonzero vector d [48]. It needs to be proven that w* is a hyperbolic SEP. Consider the Jacobian of the vector field −∇_w F of (15) at w*: J(w*, x) = −∇²_{ww} F(w*, x). Because the quadratic form w^T ∇²_{ww} F(w*, x) w > 0 for all w ≠ 0 and ∇²_{ww} F(w*, x) is symmetric, ∇²_{ww} F(w*, x) is a positive definite matrix whose eigenvalues are all positive real numbers. Therefore, all eigenvalues of J(w*, x) are negative real numbers, so w* is a type-0 hyperbolic equilibrium point of (15), i.e., a SEP. Thus, 1) is proved.
So far, the equivalence relationship between the LOSs of (1) and the SEPs of (15) has been proved. This is the key to ensuring the effectiveness of the CPSOTJUTT methodology.

CPSOTJUTT-based ensemble model
This paper develops a three-layer ensemble model of CPSOTJUTT-EM for automatically designing and training DNN architectures for power line inspection, as shown in Fig. 1.The pseudocode of CPSOTJUTT-EM is shown in Algorithm 3.

1: Initialize the population: the number of individuals is N and the number of generations is T; define the fitness function f(x) and the tolerance ε.
2: for j = 1 to T do
3:   Select the population.
4:   Apply crossover and mutation operations.
5:   Calculate the fitness function f(x); search for the optimal solutions by CPSOTJUTT.
6:   Update the evolutionary population.
7:   if Δf(x) < ε then
8:     break
9: end for
10: Output the optimal individuals (sub-DNNs).
11: Use the optimal individuals to generate an ensemble model.
12: Train the ensemble model weights by CPSOTJUTT.
In this paper, the CPSOTJUTT-EM is developed to enhance the performance of automatically designed DNN architectures in two aspects: 1) Enhance the robustness of the DNN training method by applying the CPSOTJUTT to efficiently build multiple high-quality classification engines with different DNN architectures.2) Improve the generalization ability through the ensemble model by applying the CPSOTJUTT methodology to build an effective ensemble of multiple members to achieve a higher accuracy and generalization ability in power line inspection.

Bottom layer: design the DNN architecture
In this layer, the genetic algorithm (GA) is used to design high-quality DNN architectures stably and quickly. Similar constructive strategies for DNNs have been widely studied and have achieved satisfactory results [21]. We provide a binary-code representation of the DNN architecture for the GA method. The encoding area begins with the second convolutional layer and represents the connections between the current and previous convolutional layers: 1 indicates that there is a connection, while 0 indicates that there is no connection. There is a fixed input node and an output node in each stage. Besides the convolutional layers, there are also batch normalization and ReLU, which are proven to play a positive role in DNN training. The fully connected parts are preset.
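A sketch of how such a binary connection code can be decoded, in the spirit of the Genetic CNN encoding [20,21]; the stage size and bit string below are hypothetical examples:

```python
import numpy as np

def decode_stage(bits, n_conv):
    """Decode a binary string into a connection matrix for one stage:
    conn[j, k] = 1 means conv layer j takes input from earlier conv layer k."""
    conn = np.zeros((n_conv, n_conv), dtype=int)
    idx = 0
    for j in range(1, n_conv):   # the encoding begins at the second conv layer
        for k in range(j):       # one bit per possible predecessor
            conn[j, k] = bits[idx]
            idx += 1
    return conn

# a 4-conv-layer stage needs 1 + 2 + 3 = 6 bits
bits = [1, 0, 1, 1, 1, 0]
print(decode_stage(bits, 4))
```

The GA's crossover and mutation then operate directly on the bit string, while the decoder guarantees every string maps to a valid layer-connection pattern.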
The model designed by the automatic DNN architecture algorithm usually needs to be further customized according to the requirements of specific tasks. For a classification model, it is usually necessary to add a fully connected layer at the end of the network, followed by a SoftMax layer, to output the classification results. The fully connected layer is responsible for converting the feature maps extracted by the convolutional layers into the final classification results.
For an object detection model, a common method is to use a region proposal network (RPN) to generate candidate object regions and then apply a region of interest (RoI) pooling layer to extract fixed-size feature representations from region proposals of different sizes as input to the classification and regression heads. On the basis of the automatic DNN architecture design algorithm, the RPN layer can be added at an appropriate location to achieve object detection.
However, DNN training methods are very sensitive to the initial points, so the true capacity of the designed DNN architecture is difficult to verify with a single training run. The CPSOTJUTT methodology proposed in this paper can quickly and robustly train and select the high-quality DNN architectures designed by the GA algorithm.

Middle layer: build diverse optimal DNN classification engines
In this layer, based on the optimal DNN architecture designed in the bottom layer and the corresponding initial guess w*, the CPSOTJUTT methodology proposed in this paper explores a set of diverse optimal DNN classification engines, where w* is the initial guess taken from the consensus state reached by the CPSO in the bottom layer.
Stage II of the CPSOTJUTT methodology proposed in this paper quickly and robustly converges to the SEP (Tier-0) from w*.Then we use the TRUST-TECH methodology to jump out of the current region, enter the stability region of neighboring SEPs, and obtain multiple SEPs in a tier-by-tier search manner.
We apply the CPSOTJUTT methodology to train these DNN architectures to obtain high-quality DNN classification engines in the middle layer.

Top layer: the DNN-based ensemble model
In this layer, diverse high-quality DNN classification engines from the middle layer are used as the hidden nodes of the ensemble model, and the CPSOTJUTT methodology is used for training to find the weights σ. The training objective of the ensemble model is as follows:

$$\min_{\sigma} F(\sigma) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} t_{ic}\, \log\Big( \sum_{j=1}^{M} \sigma_{jc}\, p_{jc} \Big) \qquad (21)$$

where x is the input data, t_ic is the one-hot value of the class, σ_jc is the weight of the c-th class of the j-th sub-DNN, p_jc is the probability that sample i belongs to class c according to the j-th sub-DNN, N is the size of the mini-batch, C is the number of classes, and M is the number of sub-DNNs in the ensemble model.
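A minimal sketch of the top-layer combination rule implied above: per-class weights σ_jc mix the sub-DNN probability outputs, and the mixture is renormalized over classes. The sub-DNN outputs and weight values here are hypothetical:

```python
import numpy as np

def ensemble_predict(probs, sigma):
    """Weighted combination of sub-DNN outputs: probs[j][i, c] is sub-DNN j's
    probability for sample i and class c; sigma[j, c] is a per-class model weight."""
    mix = sum(sigma[j][None, :] * probs[j] for j in range(len(probs)))
    return mix / mix.sum(axis=1, keepdims=True)   # renormalize over classes

# two hypothetical sub-DNNs, three classes, one sample
p1 = np.array([[0.6, 0.3, 0.1]])
p2 = np.array([[0.2, 0.7, 0.1]])
sigma = np.array([[0.5, 0.5, 0.5],
                  [0.5, 0.5, 0.5]])   # equal weights reduce to simple averaging
out = ensemble_predict([p1, p2], sigma)
```

With per-class rather than per-model weights, a sub-DNN that is reliable only on certain classes can still contribute strongly on exactly those classes.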
The structure diagram of the classification ensemble model is shown in Fig. 6. In the figure, F denotes the different feature extractors automatically designed by the automatic architecture algorithm, FC is the fully connected layer, and SM is the SoftMax layer.
The ensemble function of the object detection ensemble model adopts the weighted bounding box fusion method. Assume that the bounding boxes are stored in a set B, where B_c contains the bounding boxes labeled c, and F_c is the ensemble result of the bounding boxes in B_c, represented as (s, x_tl, x_br, y_tl, y_br). When the bounding box of the j-th sub-DNN is added to B_c, the confidence of the ensemble bounding box F_c is recalculated as the average confidence over the members of B_c:

s = (1/N) Σ_{k=1}^{N} s_k

where N is the total number of bounding boxes contained in B_c, and A_k = (s, x_tl, x_br, y_tl, y_br) is one of these bounding boxes.
The coordinates of the ensemble bounding box are updated as confidence-weighted averages of the member boxes, e.g., x_tl = Σ_k s_k x_tl,k / Σ_k s_k, and similarly for x_br, y_tl, and y_br. As shown in Fig. 7, F is the backbone network automatically designed by the automatic architecture algorithm, RPN is the region proposal network, RoI is the region of interest, N is the network block composed of convolutional layers, C is the classification prediction, and B is the bounding box prediction.
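The fusion rule described above can be sketched as follows (a simplified stand-in assuming the mean-confidence and confidence-weighted-coordinate reading of the update rules; the box layout (s, x_tl, x_br, y_tl, y_br) follows the text):

```python
import numpy as np

def fuse_boxes(boxes):
    """Fuse same-class boxes (s, x_tl, x_br, y_tl, y_br) into one box.

    Confidence is the mean of the member confidences; coordinates are
    confidence-weighted averages, so high-confidence boxes dominate.
    """
    boxes = np.asarray(boxes, dtype=float)
    s = boxes[:, 0]
    fused_conf = s.mean()
    fused_coords = (s[:, None] * boxes[:, 1:]).sum(axis=0) / s.sum()
    return np.concatenate(([fused_conf], fused_coords))

# Two overlapping detections of the same class from different sub-DNNs.
b1 = [0.9, 10, 50, 10, 50]
b2 = [0.6, 14, 54, 14, 54]
fused = fuse_boxes([b1, b2])
print(fused)  # confidence 0.75, coordinates pulled toward the 0.9-confidence box
```

Note how the fused coordinates (≈11.6, 51.6) sit closer to b1 than to the midpoint, reflecting its higher confidence.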
In this layer, the ensemble model composed of diverse high-quality DNN models achieves higher performance than any single model from the middle layer. The ensemble model is an optimal combination of diverse high-quality DNN models that fully exploits the advantages of each sub-DNN and further improves overall accuracy and generalization capability.
The execution process of the CPSOTJUTT-EM is relatively complex. To clearly display the execution flow of each algorithm, we provide a flowchart, as shown in Fig. 8.

Experiment
With society's increasing dependence on electricity, ensuring continuous power supply has become an important component of power supply guarantee. Power system inspection is a key measure to ensure the stable and safe operation of the power system [49-51]. With the rapid development of information technology, new technologies such as unmanned aerial vehicle (UAV) inspection and robot inspection have gradually replaced traditional manual inspection methods [52], bringing new opportunities for power system inspection. At the same time, deep learning (DL) has made rapid progress in computer vision, and computer vision technology has been widely applied in power line object detection and defect recognition [53]. The application of deep learning technologies provides more efficient and accurate methods for power system inspection, further improving the safety and reliability of power system operation [54, 55]. Examples of power system inspection objects are shown in Fig. 9.

Fig. 7 Structure diagram of object detection ensemble model
In this paper, three self-made power system inspection datasets are developed for three key areas of power system inspection; these datasets meet the requirements for both classification model testing and object detection model testing. Power Line Insulator Inspection Dataset: the power line insulator inspection dataset (PLIID) made in this study consists of 60,000 color images of insulator defects, covering 4 classes of defects: ceramic insulator edge loss, ceramic insulator middle loss, glass insulator edge loss, and glass insulator middle loss.
Power System Substation Inspection Dataset: the power system substation inspection dataset (PSSID) consists of images of internal equipment defects in power system substations, covering 9 classes of color images: suspended matter, component oil contamination, bird nests, ground oil pollution, metal corrosion, meter inspection, oil seal damage, silicone discoloration, and dial blurriness.

Fig. 8 The flowchart of the CPSOTJUTT-EM
Power Line Obstacle Inspection Dataset: the power line obstacle inspection dataset (PLOID) is composed of images of obstacles along power lines and contains 10 classes of color images: forklift, crane, wire foreign object, tipper, wildfire, smog, cement pump truck, tower crane, excavator, and other construction machinery.
To evaluate the effectiveness of the proposed framework, we also conduct numerical experiments on the public datasets CIFAR-10 and CIFAR-100 and discuss the results.

MNIST:
The handwritten digit dataset, commonly known as MNIST, is a fundamental benchmark extensively used in machine learning and computer vision. It comprises grayscale images, each representing a handwritten digit from 0 to 9. With a total of 70,000 examples, it is divided into a training set of 60,000 images and a test set of 10,000 images.
Server configuration: we use four servers to evaluate the model proposed in this paper, each with an Intel Core CPU (2.67 GHz) and eight GeForce RTX 2080 Ti GPUs. The software framework is PyTorch, and the operating system is Linux Ubuntu 18.04. We also use Python and its CPU plug-in for one test.

Fig. 9 Examples of the power system inspection object

Convergence verification of CPSOTJUTT methodology
We train from many initial points and use the filter-normalized random direction method to draw the loss landscape around the LOSs, as shown in Fig. 10. To make the convergence regions visible, the results are shown as contour plots rather than surface plots.
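The random-direction visualization can be sketched on a toy loss as follows (the quadratic-plus-sine loss and the whole-vector normalization are simplifications; the actual filter-normalized method rescales each convolutional filter of the direction separately):

```python
import numpy as np

rng = np.random.default_rng(1)

def loss(w):
    # Toy loss standing in for the DNN training loss at weights w.
    return np.sum((w - 1.0) ** 2) + 0.1 * np.sum(np.sin(3 * w))

w_star = np.full(10, 1.0)                 # pretend local optimal solution
d1, d2 = rng.standard_normal((2, 10))     # two random directions
# Normalization step (here over the whole vector for simplicity):
# rescale each direction to the norm of the weights themselves, so the
# plot scale is comparable across networks of different weight magnitudes.
d1 *= np.linalg.norm(w_star) / np.linalg.norm(d1)
d2 *= np.linalg.norm(w_star) / np.linalg.norm(d2)

alphas = np.linspace(-1, 1, 21)
grid = np.array([[loss(w_star + a * d1 + b * d2) for b in alphas]
                 for a in alphas])        # values for a contour plot around w_star
print(grid.shape)  # (21, 21)
```

Plotting `grid` as contours (e.g. `matplotlib.pyplot.contour(alphas, alphas, grid)`) gives a 2D slice of the landscape like Fig. 10.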
To understand the influence of initial points on convergence, we carefully tuned the SGD method to train the LeNet-5 network from random initial points. The dataset is the handwritten digit dataset MNIST, and the experiment was repeated 100 times, as shown in Fig. 11.
As shown in Fig. 11, there are obvious boundaries between the convergence regions corresponding to different test accuracies of a DNN trained by the local solver and by the CPSOTJUTT methodology. We count the number of convergence regions with various test accuracies; the time for SGD is measured over 20 epochs, and that for CPSOTJUTT over 20 epochs after reaching consensus or over-consensus, as shown in Table 1.
Figure 11 and Table 1 show that when SGD is applied to train the DNN, only 18% of the initial points converge to the optimal convergence region, and each convergence region has obvious boundaries. The CPSOTJUTT methodology has better global convergence ability, with 83% of the initial points converging to the optimal convergence region. The CPSOTJUTT methodology with an over-consensus state did not achieve better results but increased the time cost.

Test results of the CPSOTJUTT-EM on the CIFAR
To evaluate the effectiveness of our proposed CPSOTJUTT-EM model, we first tested the automatic architecture design algorithm. The experiment aims to verify the performance of the proposed algorithm on image classification tasks. We selected a number of peer competitors and divided them into three categories. The first category covers state-of-the-art manually designed (MD) DNN models, including ResNet [3], DenseNet [56], and VGG [57]; specifically, we used two versions of ResNet, namely ResNet-101 and ResNet-1202. The second category includes semi-automatic (SM) DNN architecture design algorithms, such as Genetic CNN [20], Hierarchical Evolution [58], EAS [59], and Block QNN-S [60]. The third category covers fully automated (FA) design methods, including Large-scale Evolution [18], CGP-CNN [61], NAS [59], and AE-CNN [21]. The experiments use two widely adopted image classification benchmarks, CIFAR-10 and CIFAR-100.
To ensure a fair comparison, we follow the parameter settings commonly used by the peer competitors. The population size and the number of generations are both set to 20, and the probabilities of crossover and mutation are set to 0.9 and 0.2, respectively. In addition, following the conventions of the competitors, we set the SGD optimizer with momentum 0.9, learning rate 0.01, and weight decay 0.0005.
In this article, the indicator "GPU days" is used to evaluate computational complexity; it is obtained by multiplying the number of GPU cards used by the number of days taken to find the optimal architecture. We refer to the optimal model generated by the automatic architecture algorithm as CPGA-DNN.

According to the experimental results, the proposed CPGA-DNN is superior to manually designed state-of-the-art CNN models on the CIFAR-10 and CIFAR-100 datasets. The results are shown in Table 2, where the CIFAR10 and CIFAR100 columns report the test error (%) of each model on the corresponding dataset. Specifically, compared to manually designed architectures, CPGA-DNN exhibits lower test errors on both datasets. Although the parameter size of CPGA-DNN on CIFAR-10 is relatively large compared with ResNet-101 and DenseNet-40, the increase in computational cost is not significant for existing hardware.

Compared with the semi-automatic competitors, CPGA-DNN performs better on CIFAR-10 and CIFAR-100, surpassing algorithms such as Genetic CNN, EAS, and Block QNN-S. Although Hierarchical Evolution slightly leads CPGA-DNN on CIFAR-10, CPGA-DNN consumes only 1/14 of the GPU days required by Hierarchical Evolution. Among the fully automated competitors, CPGA-DNN performs best on both datasets, with better test error, number of parameters, and GPU days than Large-scale Evolution, CGP-CNN, NAS, and AE-CNN. On CIFAR-10 and CIFAR-100, the test errors of CPGA-DNN are 3.67% and 16.55%, respectively. These results demonstrate the superiority and efficiency of the proposed automatic architecture design algorithm, providing a more reliable and efficient automated method for image classification.
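For reference, the GPU-days indicator is a simple product; a hypothetical example:

```python
def gpu_days(num_gpus, days_elapsed):
    """GPU days = number of GPU cards x days spent searching for the architecture."""
    return num_gpus * days_elapsed

# e.g. a hypothetical search run on 8 GPUs for 2.5 days costs 20 GPU days
print(gpu_days(8, 2.5))  # 20.0
```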
The above experiments show that the capability of a DNN architecture may not be fully exhibited because local solvers may converge to bad LOSs, and thus high-quality DNN architectures may be missed. We evaluate the robustness of CPSOTJUTT in automated DNN architecture design and record the best network architecture (BNA), as shown in Table 3. Table 3 demonstrates that the proposed CPSOTJUTT can design higher-quality DNN architectures faster than the local solver. To explain each stage of CPSOTJUTT more clearly, we design the following experiment: Step 1: Converge quickly and robustly to the SEP (Tier-0) using Stage II of the CPSOTJUTT methodology.
Step 2: Search for exit points on 10 paths, starting from the Tier-0 using the TRUST-TECH methodology.
Step 3: Converge quickly and robustly to the SEP (Tier-1) from each exit point using stage II of the CPSOTJUTT methodology.
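A toy 1-D sketch of this three-step procedure (plain gradient descent stands in for Stage II, and a crude uphill march stands in for the TRUST-TECH exit-point search; the landscape and step sizes are illustrative):

```python
import numpy as np

def loss(w):
    # 1-D toy landscape with several local minima.
    return np.sin(3 * w) + 0.1 * w ** 2

def grad(w, eps=1e-5):
    return (loss(w + eps) - loss(w - eps)) / (2 * eps)

def descend(w, lr=0.01, steps=2000):
    # Stand-in local solver (Stage II converges robustly to an SEP).
    for _ in range(steps):
        w -= lr * grad(w)
    return w

tier0 = descend(0.0)                     # Step 1: converge to Tier-0
tier1 = []
for direction in (-1.0, 1.0):            # Step 2: search paths from Tier-0
    w, prev = tier0, loss(tier0)
    while True:                          # crude exit-point search
        w += 0.05 * direction
        cur = loss(w)
        if cur < prev:                   # passed the barrier: exit point found
            break
        prev = cur
        if abs(w - tier0) > 10:          # no exit along this path
            break
    tier1.append(descend(w))             # Step 3: converge to the neighboring SEP

print(tier0, sorted(tier1))
```

Each path crosses the stability boundary of Tier-0 and lands in a distinct neighboring minimum, mirroring the tier-by-tier search in Tables 4 and 5.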
The experimental results of CPSOTJUTT are given in Table 4. The DNN architecture is the BNA selected in the first layer of the CPSOTJUTT-EM. Table 4 shows 10 Tier-1 search paths of CPSOTJUTT. The results show that a LOS better than Tier-0 can be obtained on each search path, and the best one (Tier-1 (#1)) reduces the test error by 0.72% compared with Tier-0 (ep 100). Tier-0 is trained for 200 epochs, and the majority of the Tier-1 train and test errors are lower than those of Tier-0 (ep 200). These findings suggest that finding higher-quality LOSs from Tier-0 merely by running additional iterations is challenging.
To expand the search space, we further search Tier-2 starting with the best point of Tier-1(#1), as shown in Table 5.
As shown in Table 5, all Tier-2 solutions outperform Tier-1 at limited additional time cost. The results show that the TRUST-TECH methodology can efficiently explore high-quality LOSs in the parameter space.
To evaluate the performance of CPSOTJUTT on a larger dataset, the CIFAR-100 dataset was used; the results, given in Table 6, indicate the competitive capability of CPSOTJUTT. The above experiments show that CPSOTJUTT-EM achieves competitive results on the CIFAR datasets. Further evaluations of the performance of CPSOTJUTT-EM are given in later sections.

Testing results of the CPSOTJUTT on imbalanced PLOID dataset
We evaluate the ability of the CPSOTJUTT methodology to handle imbalanced datasets in a real-world application: drone-based visual inspection of electric power transmission line corridors. Table 7 shows a statistical analysis of PLOID, which is heavily imbalanced, with the largest and smallest classes accounting for 38% and 2% of the samples, respectively. This imbalance makes the classification task a major challenge.
As shown in Tables 8 and 9, we compare CPSOTJUTT with the widely used SGD method; the weight decay and momentum of SGD are fixed at 0.0001 and 0.8, respectively.
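For reference, one step of the SGD baseline with these settings (momentum 0.8, weight decay 0.0001) can be sketched in numpy as follows (the quadratic test function and function names are illustrative, not the paper's training code):

```python
import numpy as np

def sgd_momentum_step(w, g, v, lr=0.01, momentum=0.8, weight_decay=0.0001):
    """One SGD update with momentum and weight decay (baseline settings above)."""
    g = g + weight_decay * w          # L2 penalty folded into the gradient
    v = momentum * v - lr * g         # velocity accumulates past gradients
    return w + v, v

# Minimize f(w) = ||w||^2 / 2 from w = [1, 1]; its gradient is simply w.
w, v = np.ones(2), np.zeros(2)
for _ in range(200):
    w, v = sgd_momentum_step(w, w, v)
print(np.linalg.norm(w))  # close to 0: converged near the optimum
```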
Tables 8 and 9 show additional comparisons between a local solver (SGD) and CPSOTJUTT on different models. The experimental data show that CPSOTJUTT converges quickly to high-quality LOSs. In particular, CPSOTJUTT performs better, and its minimum error rate is only 4.03%, which is 2.21% lower than that of the local solver for BNA-2. To test the optimization performance of CPSOTJUTT on different types of models, we chose ResNet-50 and measured the test error on the imbalanced PLOID dataset, as shown in Table 10. Table 10 demonstrates that the overall error rate of CPSOTJUTT is lower than that of the local solver, and the test error of CPSOTJUTT declines significantly on the hard examples; in particular, the error rate on smog is 13.6% lower than that of the local solver. In general, CPSOTJUTT performs better in training DNNs, especially on imbalanced datasets, where it greatly improves the accuracy on hard examples.

Performance test of classification models on three power system datasets
In the previous sections, the proposed CPSOTJUTT-EM designed diverse high-quality DNN architectures and corresponding weights on the CIFAR and PLIID datasets. In this section, we apply the CPSOTJUTT methodology to train the ensemble model of these DNN architectures, as shown in Table 11; the table reports the test error (%) of each model on the corresponding dataset.
We set up a comparative experiment between the DEns-VGG19 ensemble model and the proposed CPSOTJUTT-EM model. The results show that the CPSOTJUTT methodology achieves a lower test error than the local solver when training DEns-VGG19, and that CPSOTJUTT-EM achieves a lower test error than the DEns-VGG19 model. CPSOTJUTT-EM achieved lower test errors in classification tests on both the public datasets and the three self-made power system inspection datasets. In conclusion, the CPSOTJUTT-EM proposed in this paper can automatically design high-quality DNN ensemble models for power system inspection.

Fig. 1
Fig. 1 The architecture of the CPSOTJUTT-EM for power line inspection

Fig. 3
Fig. 3 TJU is used to robustly and accurately compute a LOS. A1 and A2 are the stability regions of TJU and the local solver, respectively

Fig. 4
Fig. 4 Schematic diagram of search path of the TRUST-TECH methodology

Fig. 5
Fig. 5 Binary code representation of DNN architecture

Fig. 6
Fig. 6 Structure diagram of classification ensemble model

Fig. 10
Fig. 10 2D visualization of the loss surface of DNN

Fig. 11
Fig. 11 Test results of a DNN trained by the local solver and the CPSOTJUTT methodology. a The well-tuned SGD method. b The CPSOTJUTT method. c The CPSOTJUTT methodology with over-consensus

Table 1
Statistics on the number of stability regions with various test accuracies

Table 2
Comparisons between the proposed algorithm and the state-of-the-art peer competitors

Table 7
The proportion of all classes of PLOID

Table 8
Testing results of CPSOTJUTT (Tier-1) on PLOID. Bold values indicate the lowest test error

Table 9
Optimal performances by CPSOTJUTT on PLOID

Table 10
The error rates of all classes under the two training methods are reduced by 4.18%, 12.06%, and 12.6%, respectively, especially for hard examples in the PLOID dataset. The CPSOTJUTT methodology proposed in this paper has strong global convergence ability: 83% of the initial points converge to the optimal convergence region, improving stability by 65%. The ensemble classification model and ensemble object detection model automatically generated by CPSOTJUTT-EM achieve good results on PLIID, PSSID, and PLOID, indicating that the CPSOTJUTT-EM three-layer model can achieve high inspection accuracy in power system inspections.