
# Practical ANN prediction models for the axial capacity of square CFST columns

*Journal of Big Data*
**volume 10**, Article number: 67 (2023)

## Abstract

In this study, two machine-learning algorithms based on the artificial neural network (ANN) model are proposed to estimate the ultimate compressive strength of square concrete-filled steel tubular columns. The development of such prognostic models is achievable since an extensive set of experimental tests exists for these members. The models are developed to use the simplest possible network architecture while attaining very high accuracy. A total dataset of 1022 specimens, with 685 stub columns and 337 slender columns subjected to pure axial compression, is collected from the available literature. This dataset size is significant for the development of the initial model because, for this field, it falls within the scope of big data analysis. The ANN models are validated by comparison with experimental results. The validation study has shown the superiority of surrogate models over the Eurocode 4 design code. The empirical equation derived from the best-tuned Bayesian regularization algorithm shows a better agreement with the experimental results than those obtained by the Levenberg–Marquardt algorithm and the Eurocode 4 design code. A similar conclusion applies to stub and slender columns independently. The Bayesian regularization-based model is negligibly slower than the one developed on the Levenberg–Marquardt algorithm but gives a better generalization even with a simplified ANN. Generally, besides its high accuracy, one of the key benefits of the presented ANN model is its applicability to a broader range of columns than Eurocode 4 and other studies.

## Introduction

In recent years, numerous machine learning (ML) techniques have found applications in different fields within civil engineering. The main reasons are the lower computational effort and the possibility of achieving solutions that are optimized with respect to cost and required performance in various practical problems.

Due to the very complex, highly nonlinear behavior of concrete-filled steel tubular (CFST) columns, it is crucial to correctly predict their ultimate compressive capacity in order to avoid an abrupt failure. Experimental testing is one of the most important parts of this research field, but it is very time-consuming and expensive. These tests involve subjecting CFST columns to axial loading until failure and collecting data about their load–displacement behavior. These data are then used to calculate the ultimate compressive strength of the column and also to validate existing models. Theoretical models and design codes can be used to forecast the performance of CFST columns for various parameter combinations, while predictive models are typically built on experimental data acquired from tests.

The paper explores how regression models can be applied to predict numerical values of the axial capacity. It also discusses the potential implications and applications of such predictions in this specific domain, for example how the predicted values can support decision-making or other practical applications. The limitations, challenges and future research directions related to the implementation of machine learning algorithms for regression problems are considered. The authors highlight the performance of the implemented algorithms in terms of the coefficient of determination (*R*^{2}) and different error metrics.

Many authors have investigated the behavior of concrete-filled steel tubular columns, intending to implement various regression-based ML techniques to predict the ultimate compressive strength of rectangular or circular CFST columns. Regression models such as Decision tree (DT) and Random forest (RF) were employed by Đorđević and Kostić [9] for circular CFST columns. However, due to the small amount of collected data, with just 236 stub columns and 272 slender columns, the relevance of the obtained *R*^{2} results of 0.989 for stub and 0.985 for slender columns is somewhat limited. The same limitations in the size of the dataset and, consequently, in the reliability of the derived conclusions are present in many previous studies. Tran et al. [62] trained an ANN with 300 samples of square CFST columns subjected to concentric loading. Tran et al. [63] developed an ANN model for CFST columns with ultra-high-strength concrete using a database of 768 finite element models created by the ABAQUS software and compared the results with those obtained by different design codes such as EC4 [13], ANSI/AISC 360-10 [2] and GB 50936 [15]. Le et al. [29] collected 880 samples of rectangular columns and achieved a training coefficient of determination of 0.982, using the backpropagation (BP) rule and a feed-forward neural network (FNN). The authors employed a network with 27 neurons in the hidden layer, and consequently a significantly higher number of learnable parameters, using log-sigmoid and hyperbolic-tangent activation functions for the input-to-hidden and hidden-to-output layers, respectively. Khalaf et al. [24] collected a database of 280 circular CFST specimens with experimental results. The developed multilayered feed-forward ANN model based on the backpropagation rule had two hidden layers, with eight neurons and a hyperbolic-tangent activation function and six neurons with a pure linear function, respectively. 
However, the authors did not provide sufficient information on data preprocessing. Allouzi et al. [1] implemented a conventional approach using a 3D nonlinear finite element (FE) model for predicting the behavior of concrete-filled double-skin steel tube columns, while Tran and Kim [61] successfully applied an ANN model to the same task, with eight input parameters, one hidden layer with 17 nodes and the same associated activation functions (hyperbolic-tangent and pure linear). Tran et al. [60] conducted a study with 145 tests of elliptical CFST columns under axial loading by establishing a combination of ANN and the interior-point (IP) algorithm. Several authors, Bu et al. [5], Dinaharan et al. [7], and Soepangkat et al. [57], recommended the application of the basic Levenberg–Marquardt (LM)-ANN algorithm. Zarringol et al. [67], using the same algorithm, achieved better performance for circular than for rectangular CFST columns but with a more complex network architecture. In addition, Zarringol et al. [66] analysed networks with different numbers of neurons using ANN and support vector regression (SVR) and concluded that, for the ANN model with the Bayesian regularization rule, the best accuracy is achieved by a network with two hidden layers with 5 and 25 neurons, respectively. Some previous studies, such as Du et al. [12] or Nguyen et al. [48], have focused on the development of ANN models, but without reporting the resulting empirical equations. Vu et al. [64] proposed a gradient tree boosting (GTB) surrogate model with a dataset size similar to that of this study (1017 samples), consisting of circular CFST samples subjected to concentric loading. They compared it with the support vector machines (SVM), RF and DT models and obtained *R*^{2} values of 0.999, 0.965, 0.971, and 0.963 for the GTB, SVM, RF and DT models, respectively, for all data. 
Additional alternative single methods for successful determination of the axial compressive strength of CFST columns, such as fuzzy logic (FL), multivariate adaptive regression splines (MARS), gene expression programming (GEP) and adaptive neuro-fuzzy inference system (ANFIS), were recommended by Moon et al. [44], Luat et al. [35] and Mansouri et al. [40], Payam et al. [51], Güneyisi et al. [17] and Iqbal et al. [21], Ly et al. [37], Le and Phan [31] and Saed et al. [54]. The hybrid intelligent approaches such as PANN as a combination of ANNs and particle swarm optimization (PSO), a fusion of ANN and genetic algorithm (GA), a mixture of ANN and ABC algorithms for prediction of infilled reinforced concrete frame frequencies, or conjunction of GA, ABC and PSO algorithms with Bayesian Additive Regression Tree (BART) have been successfully applied by Nguyen and Kim [47], Nikoo et al. [49], Asteris and Nikoo [3] and Luat et al. [36]. These proposed hybrid algorithms generally provide highly precise results but require more complex model structures.

The study presented here focuses on ANN models, with the aim of developing networks with the simplest viable architecture but with an accuracy comparable to the most accurate previously developed models. This way, the obtained empirical expressions involve fewer parameters and are simpler. To achieve this, special attention is paid to the selection of hyperparameters. Two approaches are selected, the early stopping rule of the LM algorithm and the regularization method of the Bayesian regularization (BRA) algorithm, and both are developed in the MATLAB [25] environment. Both approaches for predicting the square CFST column's ultimate capacity provide more accurate predictions than other available solutions, including the Eurocode 4 (EC4) design code. It can be concluded that the use of a well-suited ML algorithm has the potential to greatly enhance performance levels [46].

## Problem definition

Due to their high load-bearing capacity and durability, CFST columns are widely used in the construction industry. To guarantee the safety and serviceability of these columns in structural design, it is crucial to correctly predict their ultimate compressive strength. On the other hand, these members consist of steel and concrete portions, i.e. a combination of two materials with very different material behavior. Therefore, the resulting behavior of a composite column is highly nonlinear, and exact analytical solutions for the column's capacity do not exist. The use of ANNs in predicting the axial capacity of CFST columns has several advantages. ANNs are a type of machine learning algorithm that can learn from data and generalize to new data, making them suitable for predicting complex and nonlinear relationships. ANNs are also well suited to modeling the behavior of CFST columns because they can handle big datasets. The goal of this study is to create a model that accurately predicts the ultimate strength of concrete-filled steel tubular columns based on a variety of input parameters. The model is trained and validated using experimental data obtained from tests on CFST columns. To assess the accuracy and efficiency of the suggested ANN model, it is compared with other existing models and with the EC4 design code. The results of this study provide valuable insights into the use of ANNs in predicting the ultimate compressive strength of CFST columns and may lead to the development of improved design code guidelines for the safe and reliable use of these columns in structural design. Figure 1 illustrates a sample of the square CFST column subjected to axial load. Besides the geometrical parameters shown in this figure (B, L, t), the column axial capacity also depends on the parameters that define the material behavior of steel and concrete. 
The use of CFST columns is expected to increase due to the increasing demand for sustainable and resilient infrastructure.

## Existing solutions

This chapter establishes the current state of the art in the field and provides a basis for comparison with the best ANN model delivered by this study. The behavior of CFST columns under axial loading has been extensively studied, and various theoretical models [22, 27, 34, 58] have been developed. The majority of the existing approaches rely on finite element analysis (FEA), but they are limited by their resource consumption, which makes them computationally very expensive and a less optimal solution. To overcome the limitations of existing methods, researchers have explored a number of alternative machine learning methods to predict the axial compression capacity of CFST columns. As ANNs are capable of learning intricate nonlinear relationships between input and output features, they can be used to forecast how CFST columns will behave. The authors first present the key characteristics of the database of square columns before reviewing and evaluating the various regression models.

### Dataset description

This study uses an experimental dataset with 1022 samples of square CFST columns. The database is compiled from the following sources: Denavit [6] (470 samples), Thai et al. [59] (263 samples), Goode [16] (166 samples) and Belete [4] (123 samples). It represents the largest database of square columns currently collected in this field, which places it in the domain of big data problems. The database consists of samples exposed to pure compression only, without load eccentricity and steel reinforcement.

The selected input features are the square section width (\(B\)), the thickness of the steel tube (\(t\)), the length of the column (\(L\)), the steel yield stress (\(f_y\)) and the concrete compressive strength (\({f_c}^{^{\prime}}\)). Table 1 shows the experimental test ranges and the distribution of the input features. As can be seen, the database contains a wide range of samples in terms of geometric and material properties. For samples where the concrete compressive strength was reported for cube specimens (\(f_{cu}\)), these values are converted to a cylinder strength (\({f_c}^{^{\prime}}\)) using Eq. (1), proposed by L'Hermite [33]:
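A commonly cited form of L'Hermite's conversion is \({f_c}^{^{\prime}} = \left[0.76 + 0.2\,\log_{10}\left(f_{cu}/19.6\right)\right] f_{cu}\), with strengths in MPa. The short Python sketch below implements this form as an illustration; the function name is hypothetical, and the expression is assumed to match the one used for the database rather than taken from it.

```python
import math

def cube_to_cylinder_strength(f_cu: float) -> float:
    """Convert cube strength f_cu (MPa) to cylinder strength f_c' (MPa).

    Assumed form (commonly attributed to L'Hermite):
        f_c' = [0.76 + 0.2 * log10(f_cu / 19.6)] * f_cu
    """
    if f_cu <= 0:
        raise ValueError("f_cu must be positive")
    return (0.76 + 0.2 * math.log10(f_cu / 19.6)) * f_cu

# Example: a reported cube strength of 50 MPa
f_c_cyl = cube_to_cylinder_strength(50.0)
```

The ratio \({f_c}^{^{\prime}}/f_{cu}\) grows slowly with strength in this form, so higher-strength concretes are reduced proportionally less than normal-strength ones.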

Based on the length-to-width ratio (\(L/B\)), CFST columns are commonly categorized as stub columns for ratios less than or equal to 4 (i.e. *L*/*B* ≤ 4) and as slender columns for ratios greater than 4 (i.e. *L*/*B* > 4) [67]. The database used in this study contains specimens from both categories: 685 stub columns and 337 slender columns. The section slenderness is defined as the ratio between \(B\) and \(t\), and, according to EC4, the specimens are not prone to local buckling effects when the following relation holds: \(B/t \le 52 \sqrt{\frac{235}{{ f}_{y}}}\). Figure 2 shows the matrix of correlation values between the input and output variables as a heatmap. As expected, this matrix shows that the strongest correlations (0.810, 0.636) exist between the ultimate compressive strength \(N_{exp}\) and the input attributes (\(B, t\)), as presented in Fig. 3. Zarringol et al. [66] obtained similar results, as did Đorđević and Kostić [10], who reached a comparable level of correlation for circular columns. However, the correlation between the input parameters is noticeably low, which is why special care must be taken if dimensionality reduction procedures are applied. Principal component analysis (PCA) would therefore not be suitable, and the application of autoencoder neural networks would be recommended in this case.
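The two classification rules stated above (the \(L/B\) threshold of 4 and the EC4 section slenderness limit) can be expressed as a minimal Python sketch. The function names are hypothetical, and \(f_y\) is assumed to be given in MPa.

```python
import math

def column_category(L: float, B: float) -> str:
    """Return 'stub' for L/B <= 4 and 'slender' otherwise."""
    return "stub" if L / B <= 4.0 else "slender"

def section_slenderness_limit(f_y: float) -> float:
    """EC4 limit on B/t for neglecting local buckling (f_y in MPa)."""
    return 52.0 * math.sqrt(235.0 / f_y)

def local_buckling_negligible(B: float, t: float, f_y: float) -> bool:
    """True when B/t does not exceed the EC4 limit."""
    return B / t <= section_slenderness_limit(f_y)
```

For example, a 100 mm wide tube with a 2 mm wall and \(f_y\) = 235 MPa has \(B/t\) = 50, which is below the limit of 52, whereas the same wall thickness on a 120 mm section (\(B/t\) = 60) exceeds it.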

In order to present the high range of column parameters from the database, the histograms of the number of the experimental specimens for steel yield stress \(f_y\), concrete compressive strength \({f_c}^{^{\prime}}\), relative slenderness (\(\overline{\lambda }\)) and section slenderness (\(B/t\)) are presented in Fig. 4.

In this figure, the dash-dot red lines represent the limitations given by the EC4. These limitations are also summarized later in Table 2. Most of the specimens have properties that meet the criteria given by the EC4. However, there are specimens (although a noticeably smaller number of them) with properties outside the EC4 limits. These primarily refer to high steel and concrete strengths and specimens with higher section slenderness. For these reasons, the derived expressions are more general considering this extended range of column properties.

### Axial strength according to Eurocode 4 design code

Eurocodes are a series of separate standards for designing structures. EC4 [13], the design code for composite structures, offers a simplified method for calculating the ultimate compressive strength of CFST members. The ultimate compressive strength (\({\text{N}}_{\text{u}}^{\text{EC4}}\)) is calculated from Eq. (2):

where \(\chi\) is the reduction factor for the relevant buckling mode, defined as in Eq. (3):

Parameter \(\Phi\) and the relative slenderness \({\overline{\lambda}}\) are calculated from Eqs. 4 and 5:

where \(N_{cr}\) is the elastic critical force for the relevant buckling mode calculated with the effective flexural stiffness \({EI}_{eff}\) obtained from Eq. (6):

The plastic resistance to compression \(N_{us}\) for rectangular and square columns can be determined from Eq. (7):

Table 2 shows the major limitations in the geometric and material properties given by the EC4 for evaluating the axial capacity of CFST columns by a simplified method.

As mentioned before, when the first condition from Table 2 is satisfied, the local buckling of the steel tube can be neglected. The second and third constraints refer to the limitations of the steel and concrete material properties. The last limitation from Table 2 refers to the steel contribution ratio \(\delta\). It should satisfy Eq. (8) (where \({{N}}_{{pl,Rd}}\) is calculated from the same expression as \({{N}}_{{us}}\) but with the design values for \({{f}}_{{y}}\) and \({f}_{c}^{^{\prime}}\), i.e. \({{f}}_{{yd}}\) and \({{f}}_{{cd}}\)):
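In EC4 the steel contribution ratio is \(\delta = A_s f_{yd} / N_{pl,Rd}\), and the simplified method applies for \(0.2 \le \delta \le 0.9\). The whole procedure of this subsection can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' implementation: the section is idealized with sharp corners, the column is taken as pin-ended with effective length \(L\), the imperfection factor \(\alpha = 0.21\) (buckling curve a) and the 0.6 factor on the concrete stiffness follow the standard EC4 simplified method, and the default moduli are placeholder values.

```python
import math

def ec4_axial_capacity(B, t, L, f_y, f_c, E_s=210000.0, E_cm=30000.0, alpha=0.21):
    """Return (N_u, chi, lambda_bar) for a square CFST column.

    Units: mm, MPa, N. E_s, E_cm and alpha are assumed default values.
    """
    b_c = B - 2.0 * t                       # width of the concrete core
    A_c = b_c ** 2                          # concrete area (sharp corners assumed)
    A_s = B ** 2 - A_c                      # steel area
    I_c = b_c ** 4 / 12.0                   # second moment of area, concrete
    I_s = B ** 4 / 12.0 - I_c               # second moment of area, steel
    N_us = A_s * f_y + A_c * f_c            # plastic resistance, Eq. (7)
    EI_eff = E_s * I_s + 0.6 * E_cm * I_c   # effective flexural stiffness, Eq. (6)
    N_cr = math.pi ** 2 * EI_eff / L ** 2   # elastic critical force (pinned ends assumed)
    lam = math.sqrt(N_us / N_cr)            # relative slenderness
    phi = 0.5 * (1.0 + alpha * (lam - 0.2) + lam ** 2)
    chi = min(1.0, 1.0 / (phi + math.sqrt(phi ** 2 - lam ** 2)))  # reduction factor
    return chi * N_us, chi, lam

def steel_contribution_ratio(B, t, f_yd, f_cd):
    """delta = A_s * f_yd / N_pl,Rd, using design strengths f_yd and f_cd."""
    A_c = (B - 2.0 * t) ** 2
    A_s = B ** 2 - A_c
    return A_s * f_yd / (A_s * f_yd + A_c * f_cd)
```

For very short columns the reduction factor is capped at 1, so the capacity equals the plastic resistance of the section; for longer columns \(\chi\) falls below 1 and the capacity decreases accordingly.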

### Solutions from the literature

So far, the analysis of the behavior of CFST columns exposed to axial loading using advanced methods has mainly been limited to circular or rectangular columns. Despite the extensive and very detailed analyses performed by researchers, their studies mostly contained only a few hundred samples, except for some that managed to collect over a thousand members, as presented in the previous sections. The generally small number of collected members is understandable considering the limitations and expense of producing specimens. Moreover, apart from the already mentioned study by Tran et al. [62] with just 300 samples, only a few researchers have included square columns in their studies, also using smaller databases.

Namely, Sarir et al. [55] implemented metaheuristic-based neural network algorithms, i.e. PSO-ANN and an imperialist competitive algorithm (ICA) neural network, for prediction of the ultimate axial load of square CFST (SCFST) columns on 149 samples, with *R*^{2} values of 0.913 and 0.857 on all data, respectively. Ben Seghier et al. [56] used the GEP method for modeling the nonlinear behavior of square columns using 300 specimens and achieved an *R*^{2} of 0.9943. However, the size of the dataset requires a more extensive discussion of the results. Ren et al. [53] obtained a training performance of 0.932 using a hybrid PSO-SVM method on only 180 SCFST specimens. On similar tasks, also using about 300 samples, Le [28], Le and Le [30], and Mai et al. [39] applied other methods, namely Gaussian process regression (GPR), kernel-based Gaussian process regression (KGPR), and a combination of a radial basis function neural network (RBFNN) and the firefly algorithm (FFA). These paradigms achieved training *R*^{2} values of 0.968, 0.993, and 0.9992, respectively. However, although some models achieve impressive results, the generalization and achieved performance of such models are questionable, considering the size of the database.

In order to overcome the shortcomings of existing analyses of SCFST columns, namely the limited number of samples on the one hand and the need for better accuracy and efficiency of the developed model on the other, the authors of this work assembled a wider set of experimental results and developed a simpler and more practical model.

## Proposed solution

The proposed solution builds on two ANN training paradigms that approximate the Hessian matrix (the second derivatives of the performance function) using only first derivatives. This approach is followed by the LM and BRA algorithms, which were used for training and resulted in a very good generalization. The final solution uses a feed-forward neural network with one hidden layer, which makes it computationally efficient and reduces the risk of overfitting. The implemented algorithms additionally help prevent overfitting and improve the generalization of the model, and they also represent an excellent basis for the development of even more advanced models through the transfer learning (TL) procedure, as in Đorđević [8], using feature extraction or fine-tuning techniques. In addition, new perspectives are opened for a deeper analysis of these members through the potential application of techniques that analyze visual imagery for the recognition and classification of damage and the prediction of limit states.

### Artificial neural networks

An ANN model is a network of interconnected neurons inspired by the biological human nervous system. The first attempts to construct a system based on natural neural networks were made by McCulloch and Pitts [41]. In this study, two ANN algorithms, LM and BRA, are used to predict the axial capacity of square CFST columns. Both approaches are based on the most widely used multilayer perceptron (MLP) feedforward neural network. The Levenberg–Marquardt algorithm uses an early stopping rule in the learning phase. The Bayesian method is based on the determination of regularization parameters and is characterized by reaching a better generalization than the LM method [50]. Several prediction models based on ANN algorithms have been developed in the last few years. However, most of them use very complex network architectures. Therefore, the main goal of this study is to obtain relatively simple practical expressions for the ultimate column capacity, focusing on networks with the smallest number of neurons. At the same time, the derived ANN models need to perform comparably to some of the most sophisticated regression models developed by other authors. In order to achieve this goal and prevent overfitting, the model hyperparameters are adjusted, as explained below.

The widespread application of ANN models to different problems comes from their success in describing arbitrary nonlinear relations. In an ANN, the mathematical relations between the output of neuron *k* in the current layer *m*, \({{a}}_{{k}}^{{m}}\), and the outputs of the neurons *l* in the previous layer *m* − *1*, \({{a}}_{{l}}^{m - 1}\), are given by Eqs. (9) and (10):

where \({{z}}_{{k}}^{{m}}\) is the input signal in the current layer *m*, \({{w}}_{{k,l}}^{{m}}\) are the weights, \(n\) is the number of neurons in layer *m* − *1*, \({{b}}_{{k}}^{{m}}\) are the biases of the current layer and \({{f }}^{{m}}\) is the activation function for the current layer *m*.

In this paper, the proposed neural networks have one hidden layer, and the activation functions for the hidden and output layers are adopted as hyperbolic-tangent and pure linear (see Fig. 5), respectively. Functions were adopted based on a suggestion given by Đorđević and Kostić [11] and Ho and Le [18], but also based on the implemented data normalization procedure.

These functions are given by Eqs. (11) and (12):
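In their standard form, the hyperbolic-tangent function is \(f(z) = \tanh(z) = (e^{z} - e^{-z})/(e^{z} + e^{-z})\) and the pure linear function is \(f(z) = z\). A minimal Python sketch of the resulting forward pass through one hidden layer is given below. The function names and the toy weights are hypothetical and serve only to illustrate the layer relation \(a_k^m = f^m\left(\sum_l w_{k,l}^m a_l^{m-1} + b_k^m\right)\); they are not the fitted parameters of the proposed models.

```python
import math

def tansig(z):
    """Hyperbolic-tangent activation for the hidden layer."""
    return math.tanh(z)

def purelin(z):
    """Pure linear (identity) activation for the output layer."""
    return z

def layer_output(a_prev, weights, biases, activation):
    """Compute a_k = f(sum_l w[k][l] * a_prev[l] + b[k]) for every neuron k."""
    return [activation(sum(w * a for w, a in zip(row, a_prev)) + b)
            for row, b in zip(weights, biases)]

def predict(x, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer network with a single output."""
    hidden = layer_output(x, W1, b1, tansig)
    return layer_output(hidden, W2, b2, purelin)[0]

# Hypothetical toy weights (5 inputs, 2 hidden neurons), NOT the fitted model.
W1 = [[1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0]]
b1 = [0.0, 0.0]
W2 = [[2.0, -1.0]]
b2 = [0.5]
y_norm = predict([0.3, -0.2, 0.0, 0.0, 0.0], W1, b1, W2, b2)
```

Because the whole model reduces to these two nested sums, a trained network with a small hidden layer can be written out as a closed-form empirical expression.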

The mathematical background and working principles of the applied LM and BRA algorithms are presented in the following subsections. The intention of this study is to show the basic differences between early-stopping and regularization techniques, as well as their validation methods and their key advantages and disadvantages in terms of interpretability, generalization and final performance of the developed models. It is important to note the dataset division strategies used for tuning the hyperparameters and for the unbiased final evaluation of the models. Namely, a 70/15/15% division into training, validation and test sets, respectively, is proposed. However, this division was achieved in two ways, in accordance with the applied algorithm. The early-stopping-based algorithm uses a single validation set, while the one based on the regularization rule enables a more sophisticated K-fold cross-validation technique. During the final evaluation, both approaches are tested on the same 15% of the data, so the results are generated under the same conditions. Keeping the same samples in the training and test sets of both algorithms is enabled by using the random number generator (rng) command with an identical seed value. This is important for a fair comparison of the two models. The decision on the final division strategy was made on the basis of the recommendations and experience of other works, but also considering the available number of training samples, in order to get the best adaptation and representativeness of the model. A more detailed description of the model training and validation strategies can be found in the following sections. In a preprocessing phase, input and output parameters are normalized to the range between − 1 and 1, according to Eq. (13):

where \({{y}}\) is a normalized value of \({{x}}\), \({{x}}_{{max}}\) and \({{x}}_{{min}}\) are maximum and minimum original values, \({{y}}_{{max}}\) and \({{y}}_{{min}}\) are expected maximum and minimum values, i.e. 1 and − 1, respectively.
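With the symbols defined above, the min-max mapping presumably takes the usual form \(y = (y_{max} - y_{min})(x - x_{min})/(x_{max} - x_{min}) + y_{min}\). The Python sketch below illustrates this mapping, its inverse, and a seeded 70/15/15% split. The function names, the rounding of the subset sizes and the seed value are assumptions for illustration, and Python's generator will not reproduce the sample assignment obtained with MATLAB's rng.

```python
import random

def normalize(x, x_min, x_max, y_min=-1.0, y_max=1.0):
    """Map x from [x_min, x_max] to [y_min, y_max]."""
    return (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min

def denormalize(y, x_min, x_max, y_min=-1.0, y_max=1.0):
    """Inverse mapping, used to bring predictions back to physical units."""
    return (y - y_min) * (x_max - x_min) / (y_max - y_min) + x_min

def split_indices(n, seed, train=0.70, val=0.15):
    """Shuffle indices 0..n-1 reproducibly and split them into three subsets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)   # same seed gives the same split
    n_train = round(train * n)
    n_val = round(val * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# Hypothetical seed; the test subset is whatever remains after train and validation.
train_idx, val_idx, test_idx = split_indices(1022, seed=42)
```

Fixing the seed is what allows two different training algorithms to be evaluated on an identical held-out subset.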

### Levenberg–Marquardt algorithm

The Levenberg–Marquardt algorithm is used here with an early-stopping rule, for which the initial dataset needs to be divided into three parts: training, validation and test sets. As opposed to the basic BP gradient descent algorithms, LM is a high-performance, robust algorithm that, like the conjugate gradient and quasi-Newton methods, is based on standard numerical optimization techniques, and it builds on the Gauss–Newton algorithm. In contrast to Newton's method described by Eq. (14), the LM algorithm avoids the calculation of the second-order derivatives that form the Hessian matrix (\({A}_{k}\)). As a substitute, it uses an approximation based on the first-order Jacobian matrix *J* [20], as described by Eqs. (15) and (16):

where \({g}_{k}\)—current gradient, \(J\)—Jacobian matrix, \(\mu\)—adaptive (damping) parameter, \(I\)—identity matrix, \({J^T} \cdot {e}\)—approximated gradient, \({{x}}_{{k}}\)—current value of variable \({{x}}\), \({{x}}_{{k} + {1}}\)—the updated value of variable \({{x}}\), \({{H}}_{{LM}}\)—LM approximation of the Hessian matrix.
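To make the update rule concrete, the sketch below applies the damped step \(x_{k+1} = x_k - [J^T J + \mu I]^{-1} J^T e\) to a toy two-parameter straight-line fit in plain Python. This is an illustration under stated assumptions rather than the training code used in the study: the model, the data, the function name and the stopping rule are hypothetical, the residual is defined as prediction minus target, and early stopping on a validation set is omitted. The default values of \(\mu\), \(\mu_{dec}\) and \(\mu_{inc}\) are placeholders.

```python
def lm_fit_line(xs, ys, a=0.0, b=0.0, mu=0.1, mu_dec=0.01, mu_inc=10.0,
                max_iter=50, mu_max=1e10):
    """Fit y = a*x + b with Levenberg-Marquardt steps (toy example)."""
    def sse(a_, b_):
        return sum((a_ * x + b_ - y) ** 2 for x, y in zip(xs, ys))

    current = sse(a, b)
    for _ in range(max_iter):
        # Residuals e = prediction - target; each Jacobian row is [x, 1].
        e = [a * x + b - y for x, y in zip(xs, ys)]
        h11 = sum(x * x for x in xs)             # entries of J^T J
        h12 = sum(xs)
        h22 = float(len(xs))
        g1 = sum(x * r for x, r in zip(xs, e))   # entries of J^T e
        g2 = sum(e)
        while mu <= mu_max:
            # Solve (J^T J + mu*I) * step = J^T e for the 2x2 system.
            p, s = h11 + mu, h22 + mu
            det = p * s - h12 * h12
            da = (s * g1 - h12 * g2) / det
            db = (p * g2 - h12 * g1) / det
            trial = sse(a - da, b - db)
            if trial < current:
                # Successful step: accept it and reduce the damping.
                a, b, current = a - da, b - db, trial
                mu *= mu_dec
                break
            mu *= mu_inc                         # failed step: increase damping
        else:
            break                                # mu exceeded mu_max, stop
    return a, b

a_fit, b_fit = lm_fit_line([0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 5.0, 7.0, 9.0])
```

A small \(\mu\) makes the step close to a Gauss–Newton step, while a large \(\mu\) turns it into a short gradient-descent step, which is why the damping is lowered after successful steps and raised after failed ones.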

### Bayesian regularization

The Bayesian regularization/optimization method efficiently upgrades the basic LM algorithm. It generally shows superior behaviour over the LM algorithm, even for ANN networks with simpler architecture. In addition to the better generalization that BRA possesses, with a slightly modified performance function, it tends to limit the weights and biases. This way, it reduces the chance of potential overfitting. The BRA appreciably penalizes large weights and makes a smoother network response. During the training phase, some network parameters can lose their purpose and thus break the connections between some neurons from adjacent layers. To prevent this, the effective network parameters are calculated.

Using the LM notation from Eq. (15), for network training, the BRA approximation of the second-order Hessian matrix is given by Eqs. (17)–(19):

where \({{H}}_{{BRA}}\)—BRA approximation of the Hessian matrix, \(\alpha\) and \(\beta\) are the regularization parameters, \({n}_{t}\) is the total number of ANN parameters, \({E_W}\) is the sum of the squared weights and \({E_P}\) is the selected performance measure. The effective number of network parameters (\(\upgamma\)) is equal to the total number of ANN parameters \({n}_{t}\) in the first iteration, and further is calculated from Eq. (20) [10]:

The modified performance function is calculated as a combination of the errors as given by Eqs. (21)–(23):

where \({\text{MSE}}\) is the mean squared error, \({{y}}_{{i}}\) is a target value, \(\overline{{{y} }_{{i}}}\) is the predicted value, \({{n}}\) is the number of samples, and \({{w}}_{{j,k}}\) are the network weights defined in Eq. (9).
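The interplay of these quantities can be illustrated with a short Python sketch. It assumes the widely used Gauss–Newton formulation of Bayesian regularization, in which the objective is \(F = \beta E_P + \alpha E_W\), the effective number of parameters is \(\gamma = n_t - 2\alpha\,\mathrm{tr}(H^{-1})\), and the regularization parameters are re-estimated as \(\alpha = \gamma/(2E_W)\) and \(\beta = (n - \gamma)/(2E_P)\). The function names are hypothetical, \(E_P\) is taken here as the sum of squared errors, and the exact scaling of the expressions used in this study may differ.

```python
def sum_squared_weights(weights):
    """E_W: sum of the squared network weights."""
    return sum(w * w for w in weights)

def sum_squared_errors(targets, predictions):
    """E_P as a sum of squared errors (assumed performance measure)."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions))

def regularized_objective(E_P, E_W, alpha, beta):
    """Modified performance function F = beta * E_P + alpha * E_W."""
    return beta * E_P + alpha * E_W

def effective_parameters(n_t, alpha, trace_H_inv):
    """gamma = n_t - 2 * alpha * trace(H^-1), H being the approximated Hessian."""
    return n_t - 2.0 * alpha * trace_H_inv

def update_regularization(gamma, E_P, E_W, n_samples):
    """Re-estimate alpha and beta from the effective number of parameters."""
    alpha = gamma / (2.0 * E_W)
    beta = (n_samples - gamma) / (2.0 * E_P)
    return alpha, beta
```

A large \(\alpha\) relative to \(\beta\) pushes the weights toward zero and smooths the network response, whereas a large \(\beta\) prioritizes fitting the training data.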

## Elaboration

The authors provide a detailed description of the proposed ANN models and an evaluation of their performance using two approximate second-order training algorithms (Levenberg–Marquardt and Bayesian regularization) and various metrics. This paper utilizes a dataset of experimental results from previous studies to train and validate the proposed ANN models. The results of this study contribute to the development of more accurate and efficient models for structural engineering applications. ANN models require fewer computational resources and less time than FEM, and they can be easily updated and adapted to new and bigger datasets, whose growth is expected over time.

### Quality evaluation

The performance of developed predictive models is assessed through several error indicators, similar to Murad et al. [45] and Wu et al. [65]: coefficient of determination (\(R^2\)), \({\text{MSE}}\), root mean squared error (\({\text{RMSE}}\)), mean absolute error (\({\text{MAE}}\)) and mean absolute percentage error (\({\text{MAPE}}\)).

These indicators express agreement between the experimental and the predicted results. Namely, lower values of MSE, RMSE, MAE and MAPE errors and the higher value of \(R^2\) show a better agreement with the actual experimental results.
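The five indicators can be computed with a few lines of Python, as sketched below. The function name is hypothetical, \(R^2\) is assumed to be defined as one minus the ratio of the residual to the total sum of squares (other definitions, such as the squared correlation coefficient, exist), and MAPE is expressed in percent.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R2, MSE, RMSE, MAE and MAPE (%) for paired target/prediction lists."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE requires nonzero targets, which holds for measured capacities.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - sum(e * e for e in errors) / ss_tot
    return {"R2": r2, "MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape}

# Toy values only, not data from the study.
scores = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

MSE and RMSE penalize large individual errors more strongly than MAE, while MAPE is scale-free and therefore convenient when capacities span several orders of magnitude.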

### ANN hyperparameters

In order to find the most appropriate ANN model with the best generalization, it is necessary to observe an additional set of hyperparameters used in the ANN algorithm. The proposed algorithms distinguish the following hyperparameters for preventing overfitting and adequately adjusting the learning speed: \(\mu\), \(\mu_{dec}\) and \(\mu_{inc}\). Parameter \(\mu\) is known as the damping factor and can be decreased or increased by the other two factors \(\mu_{dec}\) and \(\mu_{inc}\), as presented by Howard and Mark [19]. With the activation functions for the hidden and output layers defined as above, the following values of the hyperparameters (\(\mu\);\(\mu_{dec}; \mu_{inc}\)) are explored (0.001, 0.01, 0.1; 0.001, 0.01; 10, 50), and then adopted using a trial-and-error method. The accompanying network architectures subjected to analysis are (5-4-1, 5-5-1, 5-6-1, 5-8-1, 5-10-1, 5-12-1, 5-14-1). Finally, the total number of considered combinations is 140 (2·10·7). The LM algorithm obtained results from five runs, while the BRA algorithm used a fivefold cross-validation technique with a different validation subset in each fold. The use of the five runs/folds strategies enabled an approximately equal number of samples in the training and validation sets, and therefore a comparison under approximately equal conditions. However, the K-fold cross-validation technique is more sophisticated than the early-stopping rule, which can be confirmed from several perspectives. In general, it offers several advantages over the classical validation set approach, including better utilization of data, reduced bias, more reliable performance estimation, flexibility in hyperparameter tuning, and a better ability to detect overfitting.
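The fold construction behind K-fold cross-validation can be sketched in plain Python as follows. The function name, the seed and the interleaved assignment of shuffled indices to folds are assumptions for illustration; the point is only that every sample is used for validation exactly once.

```python
import random

def kfold_indices(n, k=5, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)         # reproducible shuffle
    folds = [idx[i::k] for i in range(k)]    # k nearly equal, disjoint folds
    for i in range(k):
        val = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, val

# Hypothetical sample count; each candidate configuration would be trained
# on `train` and scored on `val`, and the k scores averaged.
splits = list(kfold_indices(100, k=5, seed=1))
```

Averaging the validation score over the folds gives a less split-dependent estimate than a single validation set, at the cost of training each candidate configuration k times.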

The results of the least sensitive LM and BRA models are given in Fig. 6. This is reflected in the smallest standard deviation of the results around the mean value of the coefficients of determination. To select the best combination, a minimum average over five runs/folds was selected as a criterion.

The best ANN models have network architectures 5-12-1 and 5-8-1 as in Fig. 7a and b, with associated hyperparameter configurations (0.1, 0.01, 10 and 0.1, 0.001, 10), respectively for LM and BRA approaches. Both algorithms have small standard deviations around the mean *R*^{2} values of 0.977 and 0.984 for LM and BRA, respectively. However, the BRA model is more efficient concerning the consumption of the resources, network dimensions and overall results [23].
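The difference in network size between the explored architectures can be quantified by counting learnable parameters. For a fully connected network with one hidden layer, each hidden neuron has one weight per input plus a bias, and each output neuron has one weight per hidden neuron plus a bias. The helper below is a hypothetical illustration of this count.

```python
def n_parameters(n_in, n_hidden, n_out=1):
    """Number of weights and biases in an n_in - n_hidden - n_out network."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out

# Parameter counts for the explored 5-h-1 architectures.
counts = {f"5-{h}-1": n_parameters(5, h) for h in (4, 5, 6, 8, 10, 12, 14)}
```

Under this count, the 5-12-1 network has 85 parameters and the 5-8-1 network has 57, so the empirical expression derived from the 5-8-1 architecture involves roughly one third fewer coefficients.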

### Results

The performance of the developed ANN models and the corresponding error distributions are depicted in Fig. 8a–d. Both methods show good convergence, as can be seen from the performance plots. The LM and BRA algorithms, with the corresponding architectures and hyperparameters, achieved the best results after only 64 and 189 epochs, respectively. Besides its more favourable data division strategy, BRA also shows better agreement with the experimental results than LM. The *R*^{2} values for training and test data are (0.987, 0.985), as illustrated in Fig. 9a and b. Both ANN algorithms give more accurate results than EC4 on the entire data set, with coefficients of determination (0.986, 0.982, 0.953) for BRA, LM and EC4, respectively (see Fig. 10a–c).

Table 3 shows the values of the coefficients of determination, and Table 4 contains the results of other performance scores (MSE, RMSE, MAE, MAPE) for all data.

Figures 9 and 10 show that the proposed network with one hidden layer and the optimal number of neurons estimated by the trial-and-error method is very effective.

The values of the different error indicators are summarized in Table 4. As mentioned before, smaller error values and larger coefficients of determination generally indicate good prediction performance. The two ANN algorithms used, BRA and LM, outperform the EC4 in all measured criteria. The BRA shows outstanding accuracy and is recommended for predicting the axial capacity of square CFST columns. By applying the BRA method, the error measures are about two to three times lower than those obtained by the EC4 design code.
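For readers who want to reproduce such indicators on their own predictions, the following minimal sketch computes them with the commonly used definitions. It assumes that *R*^{2} is evaluated as one minus the ratio of the residual to the total sum of squares and that MAPE is expressed in percent; the exact definitions used in the study are given in an earlier section, and the sample capacities below are made-up numbers in kN.

```python
import math


def regression_scores(y_true, y_pred):
    """Return R2, MSE, RMSE, MAE and MAPE (%) for paired observations."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE requires non-zero reference values.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    y_mean = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": mae,
        "MAPE": mape,
    }


# Made-up measured and predicted capacities in kN (illustration only).
measured = [1000.0, 2000.0, 3000.0, 4000.0]
predicted = [1100.0, 1900.0, 3000.0, 4200.0]
print(regression_scores(measured, predicted))
```

MSE, RMSE and MAE depend on the unit of the capacity, whereas MAPE and *R*^{2} are dimensionless, which makes the latter two easier to compare across databases with different load ranges.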

Figure 8 depicts the regression lines for the two ANN models and the EC4. These results also support the conclusion that the Bayesian regularization method performs the best. The EC4 shows the most significant scatter in the vicinity of the regression line. For stub and slender columns separately, the BRA gave the highest coefficients of determination 0.986 and 0.981, respectively, while EC4 gave 0.965 and 0.901, for all data.

Figure 11a–d presents the effects of geometric and material properties on the prediction of the axial capacity of square CFST columns. It can be concluded that the results of both ANN algorithms are closer to the experimentally measured values than the results of the EC4, which agrees well with the conclusions of Lee et al. [32] and Peng et al. [52]. The results obtained by the BRA algorithm have the smallest scatter.

The BRA model gives a better generalization of the column’s ultimate compressive capacity within and outside the ranges prescribed by EC4. The output results of the LM algorithm are generally close to the experimentally obtained results with only a few discrepancies. The EC4 results have the most significant disagreements with the experimental results. These disagreements show a similar tendency for samples irrespective of the EC4 limitations (steel yield stress, concrete compressive strength, section slenderness and relative slenderness).

Finally, both presented ANN models can predict the axial column capacity very accurately. However, since the BRA algorithm has proven superior to the LM algorithm, only the empirical equations developed from the best BRA model are presented in the next section.

#### Empirical equations

This section presents the proposed empirical equations from the best ANN model based on the BRA algorithm for calculating the ultimate compressive strength of square CFST columns. These relations are given by Eq. (24) and can be used in practice:

A review of weighting coefficients and biases, either in their raw form or in the form of proposed equations, can be significant. It provides insights into the favoritism of certain network parameters or the uselessness of others, as well as their overall contribution to the final results.
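The way a trained single-hidden-layer network collapses into a closed-form expression can be sketched with a generic 5-8-1 forward pass. The sketch rests on several assumptions that are not confirmed in this section: a hyperbolic tangent activation in the hidden layer, a linear output neuron, and min-max scaling of inputs and output to [-1, 1]. All weights, biases and variable ranges below are random or arbitrary placeholders, not the coefficients of the proposed equations.

```python
import math
import random


def scale(v, lo, hi):
    """Map v from [lo, hi] to [-1, 1]."""
    return 2.0 * (v - lo) / (hi - lo) - 1.0


def unscale(v, lo, hi):
    """Map v from [-1, 1] back to [lo, hi]."""
    return (v + 1.0) * (hi - lo) / 2.0 + lo


def ann_capacity(x, W1, b1, w2, b2, x_lo, x_hi, y_lo, y_hi):
    """Evaluate y = unscale(b2 + sum_j w2[j] * tanh(b1[j] + W1[j] . x_scaled))."""
    xn = [scale(v, lo, hi) for v, lo, hi in zip(x, x_lo, x_hi)]
    hidden = [math.tanh(sum(w * v for w, v in zip(row, xn)) + b)
              for row, b in zip(W1, b1)]
    yn = sum(w * h for w, h in zip(w2, hidden)) + b2
    return unscale(yn, y_lo, y_hi)


# PLACEHOLDER parameters for a 5-8-1 network (not the trained values).
rng = random.Random(0)
W1 = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(8)]  # 8 x 5
b1 = [rng.uniform(-1, 1) for _ in range(8)]
w2 = [rng.uniform(-1, 1) for _ in range(8)]
b2 = 0.0

# PLACEHOLDER scaling ranges for the five inputs and the output.
x_lo = [0.0] * 5
x_hi = [1.0] * 5
y_lo, y_hi = 0.0, 10000.0

print(ann_capacity([0.5, 0.2, 0.8, 0.4, 0.6], W1, b1, w2, b2,
                   x_lo, x_hi, y_lo, y_hi))
```

Written this way, each hidden neuron contributes one term of the form `w2[j] * tanh(...)`, so inspecting the magnitudes of `W1` and `w2` shows directly which inputs and neurons dominate the prediction.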

## Conclusions

In this study, two efficient ANN algorithms, LM and BRA, have been used to predict the axial capacity of square CFST columns exposed to pure compression. The benefits of ANN models are demonstrated through their ability to predict the behaviour of CFST columns without any initial assumptions and limitations. In order to develop the LM and BRA models with the simplest viable architecture but with very high accuracy, a trial-and-error method is applied, with the error measures of the performance function separately evaluated on the validation and test sets. The BRA method has been shown to be superior not only to the LM method but also to other existing solutions, even for a network with a simpler architecture. The power of the proposed paradigm is also evident from the remaining performance indicators, which are far better than the corresponding results delivered by other studies.

Since hyperparameters play a significant role in an ANN model, their values are determined with special attention. Both presented ANN algorithms are validated by comparison with the experimental results and have shown output results closer to the experimentally measured values than those obtained by the EC4 design code. The new empirical equations for the calculation of the axial strength of square CFST columns are derived from the best BRA model.

As the analysis showed that BRA can handle noisy and ill-conditioned data, in future research on big data sets the developed model can serve as a basis for the application of TL and for its implementation in improved and more sophisticated models for other, related problems, including tensile stress, shear stress, torsional stress, etc. The derived equations have shown high overall accuracy, not only for those specimens that satisfy the EC4 limitations. Therefore, the suggested empirical equations and the publicly available scripts in the GitHub repository (https://github.com/filip94grf/Square-CFST-columns-ANN-prediction-models-BRA-LM.git), together with an instruction manual for applying the developed models, may be beneficial for engineers and professionals who deal with coupled structures in practice, and also for those who work with software solutions and their implementation. To further simplify the equations and obtain more interpretable output equations, the application of symbolic regression is recommended for subsequent investigations. It would also be very useful to examine the differences in model performance in more detail using statistical tests such as the t-test or the analysis of variance (ANOVA) test.

In general, ML models have been proven to provide higher productivity and reliability compared to traditional and conservative FEM methods. Evidently, this trend will be even more visible in the future, as evidenced by various modern computer methods and platforms for speeding up the process up to 1000 times and even more, such as DataFlow by the Maxeler DataFlow Engine, or ASIC DataFlow by Google Tensor Processing Unit [26, 38]. However, if algorithms need acceleration while the technology is to stay the same, the options are: (a) the existing computing paradigm could be enhanced [43]; (b) a new computing paradigm could be invented [14]; (c) the number of iterations in iterative algorithms could be decreased using machine intelligence [46]; or (d) each iteration of iterative algorithms could be shortened using suboptimal computing [42]. These are the options in case big data starts to grow uncontrollably. In addition to software solutions, progress in material science is also important, which is reflected in the accelerated development of modern materials in recent years. This can lead not only to the expansion of databases in various construction sectors but also to a reduction in the cost of manufacturing structural elements and even entire buildings. In synergy with the advanced software platforms already mentioned, further research needs to find compromise solutions for newly opened interdisciplinary tasks and to make them available to the wider community, which is a very challenging task for current and future generations.

## Availability of data and materials

The path to the supporting sources is mentioned in the manuscript. Additional information can be obtained by sending a question to the corresponding author’s email address.

## References

Allouzi R, Abu-Shamah A, Alkloub A. Capacity prediction of straight and inclined slender concrete-filled double-skin tubular columns. Multidiscip Model Mater Struct. 2022;18(4):688–707.

ANSI/AISC 360–10. Specification for Structural Steel Buildings, Chicago, USA. 2010.

Asteris PG, Nikoo M. Artificial bee colony-based neural network for the prediction of the fundamental period of infilled frame structures. Neural Comput Appl. 2019;31(9):4837–47.

Belete D. Engineering a database on concrete filled steel tube columns. Addis Ababa: Addis Ababa University; 2016.

Bu L, Du G, Hou Q. Prediction of the compressive strength of recycled aggregate. Materials. 2021;15(20):1–18.

Denavit MD. Steel-concrete composite column database. 2005.

Dinaharan I, Palanivel R, Murugan N, Laubscher RF. Predicting the wear rate of AA6082 aluminum surface composites produced by friction stir processing via artificial neural network. Multidiscip Model Mater Struct. 2020;16(2):409–23.

Đorđević F. A novel ANN technique for fast prediction of structural behavior. In: 6th international construction management conference, we build the future, Belgrade. 2023. http://orel.unionnikolatesla.edu.rs/index.php/orel/article/view/19.

Đorđević F, Kostić SM. Prediction of ultimate compressive strength of CCFT columns using machine learning algorithms. In: The 8th international conference “civil engineering—science and practice”. 2022. p. 8.

Đorđević F, Kostić SM. Estimation of ultimate strength of slender CCFST columns using artificial neural networks. In: 16th congress of association of structural engineers of Serbia, Arandjelovac, Serbia, Arandjelovac. 2022. p. 10.

Đorđević F, Kostić SM. Axial strength prediction of square CFST columns based on the ANN Model. In: First Serbian international conference on applied artificial intelligence. 2022. p. 12.

Du Y, Chen Z, Zhang C, Cao X. Research on axial bearing capacity of rectangular concrete-filled steel tubular columns based on artificial neural networks. Front Comput Sci. 2017;11(5):863–73.

EC4. Eurocode 4: Design of composite steel and concrete structures. Part 1.1, General rules and rules for buildings. EN 1994-1-1:2004, Brussels, Belgium. 2004.

Flynn MJ, Mencer O, Milutinovic V, Rakocevic G, Stenstrom P, Trobec R, Valero M. Moving from petaflops to petadata. Commun ACM. 2013;56(5):39–42.

GB 50936. Technical code for concrete filled steel tubular structures. Beijing: Architecture & Building Press; 2014.

Goode CD. 1819 tests on concrete-filled steel tube columns compared with Eurocode 4. Struct Eng. 2008;8(33):86.

Güneyisi EM, Gültekin A, Mermerdaş K. Ultimate capacity prediction of axially loaded CFST short columns. Int J Steel Struct. 2016;16(1):99–114.

Ho NX, Le TT. Effects of variability in experimental database on machine-learning-based prediction of ultimate load of circular concrete-filled steel tubes. Meas J Int Meas Confed. 2021;176(February):109198.

Howard D, Mark B. Neural network toolbox user’s guide. Natick: The MathWorks Inc.; 2004.

Igel C, Toussaint M, Weishui W. Rprop using the natural gradient. Trends Appl Constr Approx. 2005;1:259–72.

Iqbal M, Zhao Q, Zhang D, Jalal FE, Jamal A. Evaluation of tensile strength degradation of GFRP rebars in harsh alkaline conditions using non-linear genetic-based models. Mater Struct. 2021;54(5):190.

Johansson M. The efficiency of passive confinement in CFT columns. Steel Compos Struct. 2002;2(5):379–96.

Kayri M. Predictive abilities of Bayesian Regularization and Levenberg-Marquardt algorithms in artificial neural networks: a Comparative empirical study on social data. Math Comput Appl. 2016;21(2):20. https://doi.org/10.3390/mca21020020.

Khalaf AA, Naser KZ, Kamil F. Predicting the ultimate strength of circular concrete filled steel tubular columns by using artificial neural networks. Int J Civil Eng Technol. 2018;9(7):1724–36.

Kim P. MATLAB deep learning: with machine learning, neural networks and artificial intelligence, Library of Congress Control Number. 2017.

Kos A, Ranković V, Tomažič S. Chapter Four - Sorting networks on maxeler dataflow supercomputing systems. In: Hurson AR, Milutinovic V, editors. Dataflow processing, vol. 96. Amsterdam: Elsevier; 2015. p. 139–86.

Lai MH, Ho JCM. A theoretical axial stress-strain model for circular concrete-filled-steel-tube columns. Eng Struct. 2016;125:124–43.

Le T-T. Practical machine learning-based prediction model for axial capacity of square CFST columns. Mech Adv Mater Struct. 2022;29(12):1782–97.

Le TT, Asteris PG, Lemonis ME. Prediction of axial load capacity of rectangular concrete-filled steel tube columns using machine learning techniques. Eng Comput. 2022;38:3283–316. https://doi.org/10.1007/s00366-021-01461-0.

Le TT, Le MV. Development of user-friendly kernel-based Gaussian process regression model for prediction of load-bearing capacity of square concrete-filled steel tubular members. Mater Struct. 2021;54(2):59.

Le TT, Phan HC. Prediction of ultimate load of rectangular CFST columns using interpretable machine learning method. Adv Civil Eng. 2020. https://doi.org/10.1155/2020/8855069.

Lee S, Vo TP, Thai HT, Lee J, Patel V. Strength prediction of concrete-filled steel tubular columns using Categorical Gradient Boosting algorithm. Eng Struct. 2021;238(February):112109.

L’Hermite R. Idées Actuelles Sur La Technologie Du Béton, Paris. 1955.

Lin S, Zhao Y-G, Lu Z-H, Yan X-F. Unified theoretical model for axially loaded concrete-filled steel tube stub columns with different cross-sectional shapes. J Struct Eng. 2021;147(10):e0003150. https://doi.org/10.1061/(asce)st.1943-541x.0003150.

Luat NV, Lee J, Lee DH, Lee K. GS-MARS method for predicting the ultimate load-carrying capacity of rectangular CFST columns under eccentric loading. Comput Concr. 2020;25(1):1–14.

Luat NV, Shin J, Lee K. Hybrid BART-based models optimized by nature-inspired metaheuristics to predict ultimate axial capacity of CCFST columns. Eng Comput. 2022;38(2):1421–50.

Ly HB, Pham BT, Le LM, Le TT, Le VM, Asteris PG. Estimation of axial load-carrying capacity of concrete-filled steel tubes using surrogate models. Neural Comput Appl. 2021;33(8):3437–58.

Machupalli R, Hossain M, Mandal M. Review of ASIC accelerators for deep neural network. Microprocess Microsyst. 2022;89:104441.

Mai SH, Ben Seghier MEA, Nguyen PL, Jafari-Asl J, Thai DK. A hybrid model for predicting the axial compression capacity of square concrete-filled steel tubular columns. Eng Comput. 2022;38(2):1205–22.

Mansouri I, Ozbakkaloglu T, Kisi O, Xie T. Predicting behavior of FRP-confined concrete using neuro fuzzy, neural network, multivariate adaptive regression splines and M5 model tree techniques. Mater Struct. 2016;49(10):4319–34.

McCulloch WS, Pitts W. A logical calculus of the ideas immanent in nervous activity. Bull Math Biol. 1990;52(1):99–115.

Milutinovic VM. Comparison of three suboptimum detection procedures. Electron Lett. 1980;16:681–3.

Milutinovic V, Tomasevic M, Markovi B, Tremblay M. A new cache architecture concept: the split temporal/spatial cache. In: Proceedings of 8th mediterranean electrotechnical conference on industrial applications in power systems, computer science and telecommunications (MELECON 96), 1996. Vol. 2, pp. 1108–11.

Moon J, Kim JJ, Lee TH, Lee HE. Prediction of axial load capacity of stub circular concrete-filled steel tube using fuzzy logic. J Constr Steel Res. 2014;101:184–91.

Murad Y, Abdel-Jabar H, Diab A, Abu Hajar H. Exterior RC joints subjected to monotonic and cyclic loading. Eng Comput. 2020;37(7):2319–36.

Ngom A, Stojmenovic I, Milutinovic V. STRIP—a strip-based neural-network growth algorithm for learning multiple-valued functions. IEEE Trans Neural Netw. 2001;12(2):212–27.

Nguyen MST, Kim SE. A hybrid machine learning approach in prediction and uncertainty quantification of ultimate compressive strength of RCFST columns. Constr Build Mater. 2021;302(February):124208.

Nguyen MST, Thai DK, Kim SE. Predicting the axial compressive capacity of circular concrete filled steel tube columns using an artificial neural network. Steel Compos Struct. 2020;35(3):415–37.

Nikoo M, Torabian Moghadam F, Sadowski Ł. Prediction of concrete compressive strength by evolutionary artificial neural networks. In: Tao C, editor. Advances in materials science and engineering, vol. 2015. London: Hindawi Publishing Corporation; 2015. p. 849126.

Okut H. Bayesian regularized neural networks for small n big p data. Artif Neural Netw Models Appl. 2016. https://doi.org/10.5772/63256.

Payam S, Chen J, Panagiotis GA, Armaghani JD, Tahir MM. Developing GEP tree-based, neuro-swarm, and whale optimization models for evaluation of bearing capacity of concrete-filled steel tube columns. Eng Comput. 2019;37:19.

Peng D, Lu Y, Jie W, Keyang N, Yi G. Compressive behavior of concrete-filled square stainless steel tube stub columns. Steel Compos Struct. 2022;42(1):91–106.

Ren Q, Li M, Zhang M, Shen Y, Si W. Prediction of ultimate axial capacity of square concrete-filled steel tubular short columns using a hybrid intelligent algorithm. Appl Sci (Switzerland). 2019;9(14):2802. https://doi.org/10.3390/app9142802.

Saed SA, Kamboozia N, Ziari H, Hofko B. Experimental assessment and modeling of fracture and fatigue resistance of aged stone matrix asphalt (SMA) mixtures containing RAP materials and warm-mix additive using ANFIS method. Mater Struct. 2021;54(6):225.

Sarir P, Armaghani DJ, Jiang H, Sabri MMS, He B, Ulrikh DV. Prediction of bearing capacity of the square concrete-filled steel tube columns: an application of metaheuristic-based neural network models. Materials. 2022;15(9):3309. https://doi.org/10.3390/ma15093309.

Seghier MEAB, Gao XZ, Jafari-Asl J, Thai DK, Ohadi S, Trung NT. Modeling the nonlinear behavior of ACC for SCFST columns using experimental-data and a novel evolutionary-algorithm. Structures. 2021;30(December 2020):692–709.

Soepangkat BOP, Norcahyo R, Rupajati P, Effendi MK, Agustin HCK. Multi-objective optimization in wire-EDM process using grey relational analysis method (GRA) and backpropagation neural network–genetic algorithm (BPNN–GA) methods. Multidiscip Model Mater Struct. 2019;15(5):1016–34.

Teng JG, Hu YM, Yu T. Stress–strain model for concrete in FRP-confined steel tubular columns. Eng Struct. 2013;49:156–67.

Thai S, Thai HT, Uy B, Ngo T. Concrete-filled steel tubular columns: Test database, design and calibration. J Constr Steel Res. 2019;157:161–81.

Tran VL, Jang Y, Kim SE. Improving the axial compression capacity prediction of elliptical CFST columns using a hybrid ANN-IP model. Steel Compos Struct. 2021;39(3):319–35.

Tran VL, Kim SE. Efficiency of three advanced data-driven models for predicting axial compression capacity of CFDST columns. Thin-Walled Struct. 2020;152(April):106744.

Tran VL, Thai DK, Kim SE. Application of ANN in predicting ACC of SCFST column. Compos Struct. 2019;228(April):111332.

Tran VL, Thai DK, Nguyen DD. Practical artificial neural network tool for predicting the axial compression capacity of circular concrete-filled steel tube columns with ultra-high-strength concrete. Thin-Walled Struct. 2020;151(January):106720.

Vu QV, Truong VH, Thai HT. Machine learning-based prediction of CFST columns using gradient tree boosting algorithm. Compos Struct. 2021;259(December):113505.

Wu J, Luo Z, Zhang N, Gao W. A new sequential sampling method for constructing the high-order polynomial surrogate models. Eng Comput. 2018;35(2):529–64.

Zarringol M, Thai HT, Naser MZ. Application of machine learning models for designing CFCFST columns. J Constr Steel Res. 2021;185(December 2020):106856.

Zarringol M, Thai HT, Thai S, Patel V. Application of ANN to the design of CFST columns. Structures. 2020;28(August):2203–20.

## Acknowledgements

The second author thanks the Ministry of Science of the Republic of Serbia for financial support under project number 2000092.

## Funding

Not applicable.

## Author information

### Contributions

Both authors FĐ and SMK carried out the conception and design of the research. FĐ prepared and preprocessed the data, and worked on the implementation of the proposed methods. SMK worked on introduction, literature review, and discussion of the results. Both authors read and approved the final manuscript.

## Ethics declarations

### Ethics approval and consent to participate

Not applicable.

### Consent for publication

Not applicable.

### Competing interests

The authors declare that there are no conflicts of interest in this research.

## Additional information

### Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

## About this article

### Cite this article

Đorđević, F., Kostić, S.M. Practical ANN prediction models for the axial capacity of square CFST columns.
*J Big Data* **10**, 67 (2023). https://doi.org/10.1186/s40537-023-00739-y

DOI: https://doi.org/10.1186/s40537-023-00739-y