
Detecting heterogeneity parameters and hybrid models for precision farming

Abstract

Precision farming (PF) plays a crucial role in agriculture in addressing food shortages. Heterogeneity, multicollinearity and outliers are problems in PF because they can bias estimates and lead to incorrect inferences. However, traditional methods typically assume a homogeneous model, and in machine learning, data scientists often ignore heterogeneity. This study aims to identify the heterogeneity parameters and to develop hybrid models before and after heterogeneity. Data on seaweed were collected with sensor-based smart farming technology attached to a v-Groove Hybrid Solar Drier (v-GHSD). There are 29 drying parameters, each with 1914 observations. Including interactions up to the second order increases the number of parameters from 29 to 435. In high-dimensional data, the number of observations is less than the number of parameters. We propose a method based on the variance inflation factor to identify the heterogeneity parameters. Seven predictive models, namely ridge, random forest, support vector machine, bagging, boosting, LASSO and elastic net, are used to select the 15, 25, 35 and 45 most significant drying parameters for the moisture content removal of the seaweed, and hybrid models are developed using robust statistical methods. Before heterogeneity, the hybrid random forest M Hampel model, with 19 outliers, performs best among the models compared. After heterogeneity, the hybrid boosting M Hampel model, with 19 outliers, performs best. These results are important for seaweed precision farming. The study of heterogeneity not only helps us understand the dynamics of a large number of drying parameters, but also shows how to leverage the data for efficient predictive modelling.

Introduction

Farming involves growing crops and rearing livestock, and it supplies raw materials for industry. The traditional methods used by farmers are imprecise, labour-intensive and time-consuming [1]. Precision farming (PF) plays a vital role in agriculture in addressing food shortages. PF is a subset of smart farming technologies (SFTs), which encompass information systems, the internet of things (IoT), precision agriculture systems, artificial intelligence, cloud computing, farm management, wireless sensor networks, robotics and agricultural automation [2, 3]. PF increases farm profits and reduces production costs [6].

Seaweeds, also called macroalgae, are plant-like organisms that attach to rocks or rock layers and grow in lakes, oceans, rivers and other water bodies [7, 8]. They are an important source of fat, carbohydrates, vitamins, fibre, ash, proteins and beta-carotene [9]. Seaweed is used in many forms (for example, powder, fresh, salted, canned, dried or extracts) as human food, animal feed, biofuel, medicine and fertiliser [10]. Fig. 1 shows the pre-harvest and post-harvest stages of seaweed processing.

Fig. 1

Seaweed processing application [25]

One of the post-harvest problems with seaweed is its high moisture content. Fresh seaweed is easily damaged [11], so it must be dried after harvesting. Drying reduces the moisture content [15] and therefore the biomass weight during transportation, which makes the seaweed available for further processing [12]. Drying also reduces storage and transportation requirements, prevents losses and increases value [14]. The drying methods include freeze-drying (direct drying method), conventional drying and microwave-assisted drying (solar); see Fig. 2 for details. A solar drier is reported to be the most efficient drying method for seaweed and removes water faster [16]. Solar driers have been employed in [13, 17,18,19]. The drying parameters of the v-Groove Hybrid Solar Drier (v-GHSD) were monitored effectively in [13, 17], and an IoT-based solar drying system using the v-GHSD was more effective in monitoring the drying behaviour [13, 15]. All the parameters involved in solar drying should be studied in order to reduce the moisture content of seaweed and improve food quality and quantity. In a related study, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Clustering Large Applications (CLARA), Partitioning About Medoids (PAM) and multiple linear regression were used to find the optimal parameters for increasing crop production [24].

Fig. 2

Types of existing drying methods for seaweed

Machine learning (ML) algorithms are used to model problems that are too complex for humans to analyse directly. These algorithms are also used to detect diseases, predict soil parameters, predict crop yield and identify species [1, 6].

A study of fish drying [26] investigated the moisture content using ridge regression with eight selection criteria, examining the most significant factors and interaction terms that influence the moisture content. The results showed that the moisture content of fish can be predicted from the important drying parameters. A study of seaweed [27] investigated the drying parameters that determine the moisture content removal and found that bagging performed better than boosting in determining these parameters, but heterogeneity was not considered.

Big data analysis faces many challenges, such as outliers and multicollinearity, and many studies have addressed these problems. Another problem is heterogeneity, about which knowledge is still insufficient, especially for seaweed big data in agriculture. Big data also come from varied sources and may be structured or unstructured [28]. All these complexities make the data difficult to analyse. Heterogeneity refers to variation in the data, and this variability needs to be investigated to avoid incorrect results and inferences.

Heterogeneity is a problem in agriculture. For example, [29] found substantial heterogeneity in the forces driving the rice ecosystem, and that the adoption of each management method was heterogeneous. According to [30], heterogeneity stemmed from the spatial characteristics and behaviour of the participants and influenced their decision making. In a study of hydrological response to heterogeneity using a variable infiltration capacity model, [31] found that accounting for land-use heterogeneity gives better estimates of hydrology and evapotranspiration. A study on the effects of ignoring heterogeneity showed that doing so leads to overestimation of technical efficiency and underestimation of the model parameters [32]. A study of farmland heterogeneity found that its changes differ across ecosystem services (ES), and that ES and their market need to be better understood, especially for pest regulation and crop production [33]. According to [34], the effect of temperature on yield showed significant heterogeneity, which is informative for adaptation in cooler versus warmer counties. A study of bird diversity [35] showed that bird communities are affected by cropland heterogeneity and cropland size.

Additionally, there is little research on the parameters that influence the moisture content removal of seaweed. Few researchers have worked with seaweed big data, and few studies have considered interaction terms in seaweed drying. To our knowledge, no study has compared outliers before and after heterogeneity, and no study has examined heterogeneity using big data in agriculture, particularly for the moisture content removal of seaweed.

Many studies have addressed outliers and multicollinearity, but heterogeneity has received little attention; we found no literature in agriculture that addresses heterogeneity in drying parameters. This study therefore focuses on detecting the heterogeneity of drying parameters and developing hybrid models to determine the significant parameters for the moisture content removal of seaweed. Interaction effects up to the second order are incorporated into the model for the seaweed big data. Hybrid models that combine seven supervised ML algorithms with robust estimation are used to determine the significant parameters for the moisture content removal of the seaweed and to reduce the number of outliers. The accuracy of the ML algorithms is assessed with evaluation metrics, and the errors are compared before and after heterogeneity.

Materials and methods

Seven supervised machine learning algorithms, namely ridge, random forest, support vector machine, bagging, boosting, LASSO and elastic net, are used to determine the significant parameters for the moisture content removal of the seaweed before and after heterogeneity. Robust methods are then used to develop the hybrid models. The flowchart in Fig. 3 summarises the procedure used in this research.

Fig. 3

Flowchart for the study

Data description

The data were collected from 8 to 12 April 2021, between 8:00 am and 5:00 pm, during the drying of seaweed in a v-Groove Hybrid Solar Drier (v-GHSD) at Semporna, on the south-eastern coast of Sabah, Malaysia. The parameters include temperature, ambient relative humidity, chamber relative humidity and solar radiation. Table 1 lists the 29 main parameters; each parameter has 1914 observations, which is equivalent to 536,870,912 equations. Each observation area is evaluated as a parameter, and the region is considered as a whole to simplify the system. Analysing all of these equations is not feasible because of the time and complexity involved. Adding the second-order interactions to the 29 main seaweed drying parameters increases the total number of parameters to 435. The problem is then reduced by selecting the 15, 25, 35 and 45 highest-ranking important variables.
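The growth from 29 to 435 parameters follows from adding one product term for every pair of main parameters, \(29+\binom{29}{2}=29+406=435\); the same count gives \(22+\binom{22}{2}=22+231=253\) for the 22 parameters that remain after the seven heterogeneity parameters are removed, as reported in the Results. The minimal sketch below builds such an expansion with plain Python lists, assuming hypothetical column names P1 to P29 in place of the Table 1 labels (in R, a model formula such as `y ~ .^2` produces the same set of terms).

```python
from itertools import combinations
from math import comb


def expand_second_order(data):
    """Return the main-effect columns plus one product column per pair of parameters."""
    expanded = dict(data)
    for a, b in combinations(data, 2):
        # Second-order interaction term: element-wise product of the two columns.
        expanded[f"{a}:{b}"] = [x * y for x, y in zip(data[a], data[b])]
    return expanded


# Hypothetical data: 29 parameters named P1..P29, three observations each.
data = {f"P{i}": [1.0, 2.0, float(i)] for i in range(1, 30)}
expanded = expand_second_order(data)
print(len(expanded), 29 + comb(29, 2))  # 435 435
print(expanded["P2:P3"])                # [1.0, 4.0, 6.0]
```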

Table 1 Representation of parameters

Phase I

Phase I involves constructing all possible models with interactions up to the second order and testing the regression assumptions. According to [15], the total number of possible models can be calculated using Eq. 1.

$$N=\sum_{j=1}^{k} j\binom{k}{j}$$
(1)

where \(N\) is the number of possible models, \(k\) is the total number of explanatory variables and \(j=1,2,3,\dots,k\). The assumptions of linearity, normality of errors, independence of observations, no multicollinearity among the independent variables, and heterogeneity are checked in the R programming language. Ridge, random forest (RF), support vector machine (SVM), boosting, bagging, LASSO and elastic net are then used to select the significant parameters that determine the moisture content removal. Subsets of 15, 25, 35 and 45 parameters are selected because feature selection only ranks the variables by importance and does not indicate how many factors are significant [36]. Finally, the validation metrics are computed: mean absolute percentage error (MAPE), mean squared error (MSE) and coefficient of determination (\(R^2\)).
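Reading Eq. (1) as \(N=\sum_{j=1}^{k} j\binom{k}{j}\), the count can be evaluated directly. By the identity \(\sum_{j=1}^{k} j\binom{k}{j}=k\,2^{k-1}\), it grows exponentially in \(k\), which is why fitting every candidate model is impractical. The short check below computes both forms; it is only an arithmetic illustration of the formula as read here.

```python
from math import comb


def number_of_models(k):
    """Eq. (1) as read here: N = sum_{j=1}^{k} j * C(k, j)."""
    return sum(j * comb(k, j) for j in range(1, k + 1))


def number_of_models_closed_form(k):
    """Closed form of the same sum: k * 2**(k - 1)."""
    return k * 2 ** (k - 1)


for k in (3, 4, 10):
    print(k, number_of_models(k), number_of_models_closed_form(k))
# 3 12 12
# 4 32 32
# 10 5120 5120
```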

Phase II

In Phase II, the VIFs are computed on the original data with the vif function from the car package in R. This gives the range of the VIF values, from which the R-squared values and a 90% confidence interval are computed. If a model has a value that falls below the maximum R-squared, it exhibits heterogeneity. The parameters that exhibit heterogeneity are excluded and those that do not are retained. The ML algorithms of Phase I are then used to select the 15, 25, 35 and 45 significant parameters.
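For numeric predictors, the VIF reported by car's vif equals \(VIF_l=1/(1-R_l^2)\), where \(R_l^2\) comes from an ordinary least squares regression (with intercept) of parameter \(l\) on all the other parameters. The pure-Python sketch below reproduces that calculation through the normal equations. It is a naive illustration for small data, not a replacement for car::vif: the singularity threshold is an arbitrary assumption, and perfectly collinear columns raise an error.

```python
def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system: some columns are perfectly collinear")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared_on_others(columns, idx):
    """R^2 from regressing column idx on all other columns (with intercept)."""
    y = columns[idx]
    others = [col for i, col in enumerate(columns) if i != idx]
    n = len(y)
    design = [[1.0] + [col[i] for col in others] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(p)]
    beta = _solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    if sst == 0:
        raise ValueError(f"column {idx} is constant")
    return 1.0 - sse / sst


def vif(columns):
    """VIF_l = 1 / (1 - R_l^2) for every column; infinite when R_l^2 reaches 1."""
    out = []
    for idx in range(len(columns)):
        r2 = r_squared_on_others(columns, idx)
        out.append(float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return out


# Hypothetical two-parameter example with correlation 0.8, so both VIFs are 1 / (1 - 0.64).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
print([round(v, 4) for v in vif([x1, x2])])  # [2.7778, 2.7778]
```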

Phase III

In Phase III, hybrid models are developed before and after heterogeneity using robust methods, since data with outliers can be analysed with robust estimation [37, 38]. The robust methods used are M Bi-Square, M Hampel, M Huber, MM and S. Finally, the validation metrics are computed and 3-sigma limits are used to identify the number of outliers; sigma limits are widely used for quality improvement [41].
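The M-estimators named above differ mainly in their \(\psi\) (influence) function, which limits how strongly a large scaled residual \(u\) pulls the fit. The sketch below implements the Huber, Hampel and bisquare \(\psi\) functions with commonly used tuning constants (these match the defaults of psi.huber, psi.hampel and psi.bisquare in R's MASS package); the constants used in this study are not stated, so they are assumptions here. For brevity it estimates a single location by iteratively reweighted least squares with a fixed MAD scale rather than fitting the full regression, and it does not cover the MM and S estimators, which add a high-breakdown initial fit and scale.

```python
import math
import statistics


def huber_psi(u, k=1.345):
    """Huber: linear near zero, constant k * sign(u) beyond |u| = k."""
    return u if abs(u) <= k else math.copysign(k, u)


def hampel_psi(u, a=2.0, b=4.0, c=8.0):
    """Hampel three-part redescending psi: linear, flat, linear descent, then zero."""
    x = abs(u)
    if x <= a:
        return u
    if x <= b:
        return math.copysign(a, u)
    if x <= c:
        return math.copysign(a * (c - x) / (c - b), u)
    return 0.0


def bisquare_psi(u, c=4.685):
    """Tukey bisquare: u * (1 - (u/c)^2)^2 inside |u| <= c, zero outside."""
    return u * (1.0 - (u / c) ** 2) ** 2 if abs(u) <= c else 0.0


def m_location(xs, psi, tol=1e-8, max_iter=100):
    """IRLS M-estimate of location with a fixed MAD scale (illustration only)."""
    mu = statistics.median(xs)
    mad = statistics.median([abs(x - mu) for x in xs]) / 0.6745
    scale = mad if mad > 0 else 1.0
    for _ in range(max_iter):
        # Weight w(u) = psi(u) / u, with the limit w(0) = 1.
        weights = [psi(u) / u if u != 0 else 1.0 for u in ((x - mu) / scale for x in xs)]
        total = sum(weights)
        if total == 0:
            break
        new_mu = sum(w * x for w, x in zip(weights, xs)) / total
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


data = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical sample with one gross outlier
print(statistics.mean(data))         # 22.0
for name, psi in [("Huber", huber_psi), ("Hampel", hampel_psi), ("bisquare", bisquare_psi)]:
    print(name, round(m_location(data, psi), 3))
```

The redescending Hampel and bisquare functions give the gross outlier zero weight, whereas Huber only down-weights it; all three stay far from the non-robust mean.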

The v-Groove Hybrid Solar Drier (v-GHSD)

In this study, a v-Groove Hybrid Solar Drier (v-GHSD) was used to dry the seaweed. A solar drier is used in precision agriculture to dry food with solar energy, which improves food quality and reduces wastage. The v-GHSD (Fig. 4) comprises a solar panel, a v-aluminium roof, a drying chamber, a solar collector and internet of things (IoT) sensors for data retrieval. The sensors receive data from different locations in the drier and measure temperature, solar radiation, relative humidity and moisture content. An IoT cloud database was used to understand the performance and interaction of the drying parameters during the drying period. The data were stored in the cloud database every second and later converted to thirty-minute intervals for analysis, for identifying the heterogeneity parameters, and for reducing multicollinearity and outliers with the proposed model to determine the moisture content removal.
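The conversion from per-second cloud records to thirty-minute intervals can be sketched as follows. The sketch assumes that readings arrive as (seconds since start, value) pairs and that each interval is summarised by its mean; the aggregation rule actually used for the v-GHSD data is not stated.

```python
from collections import defaultdict
from statistics import mean


def to_intervals(readings, interval_s=1800):
    """Average (timestamp_seconds, value) readings into fixed-width time buckets."""
    buckets = defaultdict(list)
    for t, v in readings:
        # Bucket key is the start time of the interval containing t.
        buckets[int(t // interval_s) * interval_s].append(v)
    return {start: mean(vals) for start, vals in sorted(buckets.items())}


readings = [(t, 1.0 if t < 1800 else 3.0) for t in range(3600)]  # hypothetical one hour of data
print(to_intervals(readings))  # {0: 1.0, 1800: 3.0}
```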

Fig. 4

v-Groove Hybrid Solar Drier (v-GHSD)

Heterogeneity identification

Heterogeneity refers to the variability of observations, which can lead to inconsistent estimates and distorted conclusions [42]. Consider the multiple linear regression (MLR) model

$$Y={\beta }_{0}+{T}_{1}{\beta }_{1}+{T}_{2}{\beta }_{2}+\dots +{a}_{j}+\varepsilon$$
(2)

where \(Y\) is the moisture content, the \(\beta\)'s are the regression coefficients, the \(T\)'s are the drying parameters, \({a}_{j}\) denotes heterogeneity, that is, the parameters that exhibit heterogeneity, and \(\varepsilon\) is the random error. A common problem in Eq. (2) is multicollinearity, which occurs when explanatory variables are correlated not only with the dependent variable but also with each other. Our interest in this equation is \({a}_{j}\). If Eq. (2) is estimated with a relevant variable omitted, and that variable is correlated with the included parameters, the estimates of the \(\beta\)'s will be biased and inconsistent. According to [43], the variance inflation factor (VIF) quantifies the severity of multicollinearity in multiple regression. For the \(l\)th parameter it is computed as

$${VIF}_{l}=\frac{1}{1-{R}_{l}^{2}}$$
(3)

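where \(R_{l}^{2}\) is the coefficient of determination obtained by regressing the \(l\)th parameter on all the other parameters. Rearranging gives \(R_{l}^{2}=1-\frac{1}{VIF_{l}}\).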

If \(R_{l}^{2}\) satisfies the criterion described in Phase II, the \(l\)th parameter is said to exhibit heterogeneity.
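Eq. (3) and its rearrangement carry the same information, so either quantity can be reported. The sketch below converts in both directions and applies the conversion to the largest VIF reported in the Results (75,337.29), which corresponds to an \(R^2\) of about 0.99999: that parameter is almost perfectly explained by the others. The rule for deciding which \(R^2\) values indicate heterogeneity is described only briefly in Phase II and is not reproduced here.

```python
def r_squared_from_vif(vif_value):
    """Invert Eq. (3): R_l^2 = 1 - 1 / VIF_l."""
    if vif_value < 1:
        raise ValueError("a VIF is always >= 1")
    return 1.0 - 1.0 / vif_value


def vif_from_r_squared(r2):
    """Eq. (3): VIF_l = 1 / (1 - R_l^2)."""
    return 1.0 / (1.0 - r2)


# Largest VIF reported for the 29 main parameters in the Results section.
print(round(r_squared_from_vif(75_337.29), 7))  # 0.9999867
```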

Evaluation metric

The suitability and accuracy of the models were evaluated using the mean absolute percentage error (MAPE), mean squared error (MSE) and coefficient of determination (\({R}^{2}\)). The metrics are given in Table 2, where \({y}_{i}\) is the actual value, \(\overline{y }\) is the mean of the actual values and \({\widehat{y}}_{i}\) is the predicted value.

Table 2 Evaluation metric
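Assuming the standard definitions, \(\mathrm{MAPE}=\frac{100}{n}\sum_{i}\left|\frac{y_i-\widehat{y}_i}{y_i}\right|\), \(\mathrm{MSE}=\frac{1}{n}\sum_{i}(y_i-\widehat{y}_i)^2\) and \(R^2=1-\mathrm{SSE}/\mathrm{SST}\), the three metrics can be computed as in the sketch below. The example values are hypothetical, and MAPE is undefined when any actual value is zero.

```python
def mape(y, y_hat):
    """Mean absolute percentage error in percent; assumes no actual value is zero."""
    return 100.0 / len(y) * sum(abs((yi - fi) / yi) for yi, fi in zip(y, y_hat))


def mse(y, y_hat):
    """Mean squared error."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat)) / len(y)


def r_squared(y, y_hat):
    """Coefficient of determination, R^2 = 1 - SSE / SST."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sse / sst


y = [1.0, 2.0, 3.0, 4.0]      # hypothetical actual values
y_hat = [1.0, 2.0, 3.0, 5.0]  # hypothetical predictions
print(round(mape(y, y_hat), 4), round(mse(y, y_hat), 4), round(r_squared(y, y_hat), 4))  # 6.25 0.25 0.8
```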

Statistical power for percentage change and absolute change

Statistical power is the probability that a test rejects a false null hypothesis, that is, statistical power = \(P(\text{reject } H_0 \mid H_0 \text{ is false})\), where \(H_0\) is the null hypothesis. For a t-test this becomes \(P(|t|>t_{\alpha/2})=P(P_t<\alpha)\), where \(t_{\alpha/2}\) is the critical t-value at significance level \(\alpha\) and \(P_t\) is the p-value of the t-test. The percentage change and absolute change in MAPE are defined as

$$\mathrm{Percentage\,change}\;P_{c}=\frac{B_{MAPE}-A_{MAPE}}{B_{MAPE}}\times 100$$
(5)
$$\mathrm{Absolute\,change}\;A_{c}=\left|B_{MAPE}-A_{MAPE}\right|$$
(6)

where \({B}_{MAPE}\) and \({A}_{MAPE}\) are the MAPE values before and after heterogeneity, respectively.
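Eqs. (5) and (6) translate directly into code. As an illustration, the sketch applies them to the random forest MAPE values for the 45 highest-ranking variables reported in the Results (2.125891 before and 7.588079 after heterogeneity). A negative percentage change means that the MAPE increased after the heterogeneity parameters were removed.

```python
def percentage_change(before_mape, after_mape):
    """Eq. (5): P_c = (B_MAPE - A_MAPE) / B_MAPE * 100."""
    return (before_mape - after_mape) / before_mape * 100.0


def absolute_change(before_mape, after_mape):
    """Eq. (6): A_c = |B_MAPE - A_MAPE|."""
    return abs(before_mape - after_mape)


b, a = 2.125891, 7.588079  # random forest, 45 highest-ranking variables
print(round(percentage_change(b, a), 2))  # -256.94
print(round(absolute_change(b, a), 6))    # 5.462188
```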

To choose between percentage change and absolute change as the indicator, their statistical power is compared [47]. Statistical power has been compared through simulation [48]. Absolute change was used to study weight change in [49] and change in obesity in [50], while percentage change was used to study fat loss in [51]. A test statistic comparing the maximum likelihood of an absolute change with that of a percentage change was developed by [52]. According to [53], the percentage change is not affected by the unit of measurement, but that paper did not explain how to choose between absolute and percentage change.

The ratio of statistical powers is \(R=\frac{\text{statistical power of absolute change}}{\text{statistical power of percentage change}}\) [47]. If \(R>1\), absolute change has better statistical power than percentage change and is chosen; otherwise, percentage change is chosen.
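The simulation details behind the power comparison are not given here, so the following is only a minimal Monte Carlo sketch under our own assumptions: paired before/after measurements with normal noise, "absolute change" taken as the signed difference \(B-A\) in original units (Eq. (6) reports its magnitude), and a two-sided z-test, used as a normal approximation to the t-test, of zero mean change. The sample size, effect size and noise level are hypothetical. The ratio \(R\) and the decision rule follow the text.

```python
import random
from statistics import NormalDist, mean, stdev


def rejection_rate(samples, alpha=0.05):
    """Share of samples in which a two-sided z-test of 'mean change = 0' rejects."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for s in samples:
        z = mean(s) / (stdev(s) / len(s) ** 0.5)
        if abs(z) > z_crit:
            hits += 1
    return hits / len(samples)


def simulate_power_ratio(n=30, reps=1000, baseline=10.0, effect=0.5, noise=1.0, seed=1):
    """Estimate R = power(absolute change) / power(percentage change) by simulation."""
    rng = random.Random(seed)
    abs_samples, pct_samples = [], []
    for _ in range(reps):
        before = [rng.gauss(baseline, noise) for _ in range(n)]
        after = [b - effect + rng.gauss(0.0, noise) for b in before]
        # Signed change in original units, and the same change relative to the baseline.
        abs_samples.append([b - a for b, a in zip(before, after)])
        pct_samples.append([(b - a) / b * 100.0 for b, a in zip(before, after)])
    power_abs = rejection_rate(abs_samples)
    power_pct = rejection_rate(pct_samples)
    ratio = power_abs / power_pct if power_pct > 0 else float("inf")
    return power_abs, power_pct, ratio


def choose_indicator(ratio):
    """Decision rule from the text: R > 1 favours absolute change."""
    return "absolute change" if ratio > 1 else "percentage change"


power_abs, power_pct, ratio = simulate_power_ratio()
print(power_abs, power_pct, round(ratio, 3), choose_indicator(ratio))
```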

Results and discussion

In this research, the assumptions of linear regression are first verified to understand the data, and the heterogeneity parameters among the seaweed drying parameters are identified. Seven popular supervised machine learning algorithms, namely ridge, random forest, support vector machine, bagging, boosting, LASSO and elastic net, are then used to determine the significant factors for the moisture content removal of the seaweed. Finally, the models are validated with the evaluation metrics and hybrid models are developed.

The variability of the 29 main parameters is shown in Fig. 5. Each box-plot represents one seaweed drying parameter and helps to reveal the heterogeneity among the main parameters; the points outside the box-plots are outliers. A box-plot summarises the data with the five-number summary: minimum, first quartile (Q1), median (Q2), third quartile (Q3) and maximum. The assumption of linearity between the dependent and independent variables was checked, and no linear relationship exists between them. The assumption of no multicollinearity among the independent variables is not satisfied: the variance inflation factor (VIF) values are high, with a maximum of 75,337.29, indicating severe multicollinearity. The independence of the observations was checked with the Durbin–Watson test; the p-value of 0 is less than the significance level α = 0.05, so the residuals are autocorrelated and the observations are not independent. The normality assumption was checked with the Kolmogorov–Smirnov test; the p-value of 2.2e−16 is less than 0.05, so there is enough evidence to conclude that the residuals do not come from a normal distribution. Figures 6, 7, 8, 9, 10, 11 and 12 show the standardised residual plots for ridge, RF, SVM, bagging, boosting, LASSO and elastic net before and after heterogeneity.
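The Durbin–Watson statistic behind the independence check is \(d=\sum_{t=2}^{n}(e_t-e_{t-1})^2/\sum_{t=1}^{n}e_t^2\): it is close to 2 when there is no lag-1 autocorrelation in the residuals \(e_t\), and close to 0 under strong positive autocorrelation. The sketch below computes only \(d\); the p-value reported above requires the statistic's null distribution, which R provides through, for example, lmtest::dwtest.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic d for a sequence of regression residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0 (alternating signs: negative autocorrelation)
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # 0.0 (identical residuals: positive autocorrelation)
```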

Fig. 5

Box-plot for the seaweed drying parameters

Fig. 6

Comparison between the standardized residuals for 45 highest ranking variables for ridge before and after heterogeneity

Fig. 7

Comparison between the standardized residuals for 45 highest ranking variables for random forest before and after heterogeneity

Fig. 8

Comparison between the standardized residuals for 45 highest ranking variables for support vector machine before and after heterogeneity

Fig. 9

Comparison between the standardized residuals for 45 highest ranking variables for bagging before and after heterogeneity

Fig. 10

Comparison between the standardized residuals for 45 highest ranking variables for boosting before and after heterogeneity

Fig. 11

Comparison between the standardized residuals for 45 highest ranking variables for LASSO before and after heterogeneity

Fig. 12

Comparison between the standardized residuals for 45 highest ranking variables for elastic net before and after heterogeneity

The results in Table 3 show that the parameters T7, T11, H5, T6, T8, H1 and PY exhibit heterogeneity, which is also evident in Fig. 5. After these seven parameters are removed and the second-order interactions are included, 253 parameters remain for determining the moisture content removal of the seaweed. Feature-importance selection has also been used in [54, 55]. The assessment results for the ML models are summarised in Table 4. Before the heterogeneity parameters are removed, all validation measures show that random forest outperforms the other models in predicting the significant parameters. For the 45 highest-ranking variables, random forest achieves MAPE = 2.125891, MSE = 7.330011 and R-squared = 0.9732063, clearly better than the other models with their 45 highest-ranking variables. After the heterogeneity parameters are removed, all validation measures again show that random forest outperforms ridge, support vector machine, bagging, boosting, LASSO and elastic net in predicting the significant parameters that determine the moisture content removal of the seaweed.

Table 3 Heterogeneity parameters
Table 4 Determination of optimal machine learning models before and after heterogeneity

For the 45 highest-ranking variables after removal, random forest achieves MAPE = 7.588079, MSE = 44.39000 and R-squared = 0.8377405, again better than ridge, support vector machine, bagging, boosting, LASSO and elastic net with their 45 highest-ranking variables. Since random forest performed best on all metrics, the 15, 25, 35 and 45 highest-ranking variables selected by random forest are taken as the parameters that most accurately predict the moisture content removal of the seaweed. This agrees with [27, 54, 59, 60], where random forest also performed better than the other methods. All the random forest MAPE values are below 10, which indicates high prediction accuracy according to [61], where a MAPE below 10 is regarded as high prediction accuracy.

Comparing the validation metrics before and after the heterogeneity parameters are removed (Table 4), the MAPE and MSE of ridge, random forest, support vector machine, bagging, boosting, LASSO and elastic net are generally higher after removal, and the R-squared values are lower. Removing these variables therefore reduced the accuracy of the models.

Removing the heterogeneity parameters did not increase the accuracy of the models. According to [62], if the MAPE is equal or lower after a parameter is removed, this does not mean that the parameter has no effect on the response variable; rather, the variability in the data was not sufficient for the model to capture that effect.

The percentage change is positive for ridge 15, bagging 15, LASSO 15 and elastic net. These four models represent 14.3% of the 28 models and are the few cases in which the MAPE before heterogeneity is higher than the MAPE after heterogeneity. The percentage change is negative for the remaining 24 models (85.7%), meaning that their MAPE before heterogeneity is lower than after heterogeneity. Random forest 15, 25, 35 and 45 have the largest negative percentage changes of all the models.

In summary, the validation metrics were used to evaluate ridge, random forest, support vector machine, bagging, boosting, elastic net and LASSO in order to reach more substantial conclusions. The results for all models are shown in Table 4. Random forest achieves higher accuracy and lower errors than the other models both before and after heterogeneity, which indicates its superiority. According to [54], limiting the number of parameters is important because it reduces training time and helps avoid the curse of dimensionality.

The comparison of statistical power is shown in Table 5. The ratio \(R\) of the statistical power of absolute change to that of percentage change is less than 1, so percentage change has greater statistical power than absolute change and is used to interpret the results and draw conclusions.

Table 5 Comparison of statistical power

Table 6 shows the results of the hybrid models and the original models before and after heterogeneity for the 45 high-ranking variables. The 3-sigma limits are also provided to identify and compare the number of outliers. For ridge before heterogeneity, the best robust estimator is M Hampel with 16 outliers, while the original model has 23 outliers.

Table 6 Comparison between the number and percentage of outliers outside the 3-sigma limits for the original and hybrid models for 45 high-ranking variables
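The outlier counts in Table 6 are based on 3-sigma limits. The minimal sketch below counts the residuals outside such limits and reports their percentage. It assumes that the limits are the residual mean plus or minus three sample standard deviations; the exact centring and scale used for Table 6 are not stated. The example residuals are hypothetical.

```python
from statistics import mean, stdev


def three_sigma_outliers(residuals):
    """Count residuals outside mean +/- 3 sample standard deviations; also return the percentage."""
    centre = mean(residuals)
    spread = stdev(residuals)
    lower, upper = centre - 3 * spread, centre + 3 * spread
    count = sum(1 for r in residuals if r < lower or r > upper)
    return count, 100.0 * count / len(residuals)


residuals = [0.0] * 29 + [10.0]  # hypothetical residuals with one large value
print(three_sigma_outliers(residuals))
```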

For random forest before heterogeneity, the best robust estimator is M Hampel with 19 outliers, while the original model has 45 outliers. For support vector machine before heterogeneity, the best robust estimator is M Hampel with 23 outliers, while the original model has 24 outliers. For elastic net before heterogeneity, the best robust estimators are M Hampel and M Huber with 33 outliers, while the original model has 29 outliers. Overall, before heterogeneity, M Hampel performs better than M Bi-Square, M Huber, MM and S.

For ridge after heterogeneity, the best robust estimators are M Bi-Square and MM with 22 outliers, while the original model has 29 outliers. For random forest after heterogeneity, the best robust estimator is M Hampel with 29 outliers, while the original model has 41 outliers. For support vector machine after heterogeneity, the best robust estimator is M Bi-Square with 27 outliers, while the original model has 24 outliers. For bagging after heterogeneity, the best robust estimator is M Hampel with 21 outliers, while the original model has 28 outliers. For elastic net after heterogeneity, the best robust estimators are M Hampel and M Huber with 23 outliers, while the original model has 33 outliers. Overall, after heterogeneity, ridge performs best with M Bi-Square and MM; random forest, bagging and boosting perform best with M Hampel; support vector machine and LASSO perform best with M Bi-Square; and elastic net performs best with M Hampel and M Huber.

For the original models, the number of outliers outside the 3-sigma limits increases from before to after heterogeneity for ridge, LASSO and elastic net, remains constant for support vector machine, and decreases for random forest, bagging and boosting.

Conclusions and future work

In this study, the heterogeneity parameters were identified and hybrid models were developed to determine the significant drying parameters for the moisture content removal of the seaweed. Seven predictive models, namely ridge, random forest, support vector machine, bagging, boosting, LASSO and elastic net, were combined with robust methods to determine the significant parameters, and these hybrid models are useful for this purpose. Before heterogeneity, the hybrid random forest M Hampel model, with 19 outliers, performs best among the models compared. After heterogeneity, the hybrid boosting M Hampel model, with 19 outliers, performs best.

For future studies, the traditional statistical methods and machine learning models for predicting the moisture content removal of seaweed can be compared. The number of selected drying parameters can be increased or all the parameters with interaction can be used. Other robust estimators such as least trimmed squares (LTS), least absolute deviation (LAD) and least median of squares (LMS) estimators can be used to develop a hybrid model.

Availability of data and materials

Data is available on request. Materials and methodologies are described in this paper.

Abbreviations

PF: Precision farming
v-GHSD: v-Groove Hybrid Solar Drier
LASSO: Least absolute shrinkage and selection operator
MAPE: Mean absolute percentage error
MSE: Mean squared error
SSE: Sum of squared error
R-squared: Coefficient of determination
ML: Machine learning
SFTs: Smart farming technologies
IoT: Internet of things
GPS: Global positioning system
RF: Random forest
SVM: Support vector machine
VIF: Variance inflation factor

References

  1. Durai SKS, Shamili MD. Smart farming using machine learning and deep learning techniques. Decis Anal J. 2022;3: 100041.

  2. Moysiadis V, Sarigiannidis P, Vitsas V, Khelifi A. Smart farming in Europe. Comput Sci Rev. 2021;39. https://doi.org/10.1016/j.cosrev.2020.100345.

  3. Klerkx L, Jakku E, Labarthe P. A review of social science on digital agriculture, smart farming and agriculture 4.0: new contributions and a future research agenda. NJAS Wageningen J Life Sci. 2019;90–91. https://doi.org/10.1016/j.njas.2019.100315.

  4. Rose DC, Chilvers J. Agriculture 4.0: broadening responsible innovation in an era of smart farming. Front Sustain Food Syst. 2018. https://doi.org/10.3389/fsufs.2018.00087.

  5. Balafoutis AT, van Evert FK, Fountas S. Smart farming technology trends: Economic and environmental effects, labor impact, and adoption readiness. Agronomy. 2020;10(5). https://doi.org/10.3390/agronomy10050743.

  6. Sharma A, Jain A, Gupta P, Chowdary V. Machine learning applications for precision agriculture: a comprehensive review. IEEE Access. 2021;9:4843–73.

  7. National Oceanic and Atmospheric Administration. What is seaweed? National Ocean Service. 2017. https://oceanservice.noaa.gov/facts/seaweed.html#:~:text=%22Seaweed%22%20is%20the%20common%20name,Marine%20Sanctuary%20and%20National%20Park.

  8. Guiry MD. What are seaweeds? The Seaweed Site. 2014. https://www.seaweed.ie/algae/seaweeds.php.

  9. Suwati S, Romansyah E, Syarifudin S, Jani Y, Purnomo AH, Damat D, et al. Comparison between natural and cabinet drying on weight loss of seaweed Euchuema cottonii Weber-van Bosse. Sarhad J Agric. 2021;37(SpecialIssue 1):1–8.

  10. Buschmann AH, Camus C, Infante J, Neori A, Israel Á, Hernández-González MC, et al. Seaweed production: overview of the global state of exploitation, farming and emerging research activity. Eur J Phycol. 2017;52(4):391–406.

  11. Pradana GB, Prabowo KB, Hastuti RP, Djaeni M, Prasetyaningrum A. Seaweed drying process using tray dryer with dehumidified air system to increase efficiency of energy and quality product. IOP Conf Ser Earth Environ Sci. 2019. https://doi.org/10.1088/1755-1315/292/1/012070.

  12. Ali MKM, Sulaiman J, Md Yasir S, Ruslan M. Cubic spline as a powerful tools for processing experimental drying rate data of seaweed using solar drier. Malay J Math Sci. 2017;11:159–72.

  13. van Oirschot R, Thomas JBE, Gröndahl F, Fortuin KPJ, Brandenburg W, Potting J. Explorative environmental life cycle assessment for system design of seaweed cultivation and drying. Algal Res. 2017;1(27):43–54.

  14. Xiao HW, Mujumdar AS. Importance of drying in support of human welfare. Drying Technol. 2020;38(12):1542–3.

  15. Suherman S, Djaeni M, Kumoro AC, Prabowo RA, Rahayu S, Khasanah S. Comparison drying behavior of seaweed in solar, sun and oven tray dryers. MATEC Web Conf. 2018. https://doi.org/10.1051/matecconf/201815605007.

  16. Ali MKM, Fudholi A, Sulaiman J, Muthuvalu MS, Ruslan MH, Yasir SMd, et al. Post-harvest handling of eucheumatoid seaweeds. In: Tropical seaweed farming trends, problems and opportunities. Cham: Springer International Publishing; 2017. p. 131–45.

  17. Ali MKM, Sulaiman J, Md Yasir S, Ruslan M. Cubic spline as a powerful tools for processing experimental drying rate data of seaweed using solar drier. Malay J Math Sci. 2017;11:159–72.

  18. Nimnuan P, Nabnean S. Experimental and simulated investigations of the performance of the solar greenhouse dryer for drying cassumunar ginger (Zingiber cassumunar Roxb.). Case Stud Thermal Eng. 2020;22. https://doi.org/10.1016/j.csite.2020.100745.

  19. Lakshmi DVN, Muthukumar P, Layek A, Nayak PK. Drying kinetics and quality analysis of black turmeric (Curcuma caesia) drying in a mixed mode forced convection solar dryer integrated with thermal energy storage. Renew Energy. 2018;120. https://doi.org/10.1016/j.renene.2017.12.053.

  20. Pankaew P, Aumporn O, Janjai S, Pattarapanitchai S, Sangsan M, Bala BK. Performance of a large-scale greenhouse solar dryer integrated with phase change material thermal storage system for drying of chili. Int J Green Energy. 2020;17(11). https://doi.org/10.1080/15435075.2020.1779074.

  21. Vijayan S, Arjunan TV, Kumar A. Exergo-environmental analysis of an indirect forced convection solar dryer for drying bitter gourd slices. Renew Energy. 2020;146. https://doi.org/10.1016/j.renene.2019.08.066.

  22. Hao W, Liu S, Mi B, Lai Y. Mathematical modeling and performance analysis of a new hybrid solar dryer of lemon slices for controlling drying temperature. Energies (Basel). 2020;13(2). https://doi.org/10.3390/en13020350.

  23. Nabnean S, Nimnuan P. Experimental performance of direct forced convection household solar dryer for drying banana. Case Stud Thermal Eng. 2020;22. https://doi.org/10.1016/j.csite.2020.100787.

  24. Majumdar J, Naraseeyappa S, Ankalaki S. Analysis of agriculture data using data mining techniques: application of big data. J Big Data. 2017;4(1). https://doi.org/10.1186/s40537-017-0077-4.

  25. Ali MKM, Critchley AT, Hurtado AQ. The impacts of AMPEP K+ (Ascophyllum marine plant extract, enhanced with potassium) on the growth rate, carrageenan quality, and percentage incidence of the damaging epiphyte Neosiphonia apiculata on four strains of the commercially important carrageenophyte Kappaphycus, as developed by micropropagation techniques. J Appl Phycol. 2020;32(3). https://doi.org/10.1007/s10811-020-02117-0.

  26. Lim HY, Fam PS, Javaid A, Ali MKM. Ridge regression as efficient model selection and forecasting of fish drying using v-groove hybrid solar drier. Pertanika J Sci Technol. 2020;28(4):1179–202.

  27. Majahar Ali MKM, Tahir Ismail M, Hamundu FM, Akhtar NA, et al. Hybrid model in machine learning–robust regression applied for sustainability agriculture and food security. Int J Electric Comput Eng. 2022;12(4):4457–68.

  28. El-Din AMG, Senousy MB. A solution for handling big data heterogeneity problem. In: Lecture notes in networks and systems, vol. 224. Singapore: Springer; 2022. https://doi.org/10.1007/978-981-16-2275-5_11.

  29. Gouraram P, Goyari P, Paltasingh KR. Rice ecosystem heterogeneity and determinants of climate risk adaptation in Indian agriculture: farm-level evidence. J Agribus Dev Emerg Econ. 2022. https://doi.org/10.1108/JADEE-03-2022-0044.

  30. Kanchanaroek Y, Aslam U. Policy schemes for the transition to sustainable agriculture—farmer preferences and spatial heterogeneity in northern Thailand. Land Use Policy. 2018;78:227–35.

  31. Srivastava A, Kumari N, Maza M. Hydrological response to agricultural land use heterogeneity using variable infiltration capacity model. Water Resour Manage. 2020;34(12):3779–94.

  32. Li K, Liu J, Xue Y, Rahman S, Sriboonchitta S. Consequences of ignoring dependent error components and heterogeneity in a stochastic frontier model: an application to rice producers in northern Thailand. Agriculture. 2022;12(8):1078.

  33. Botzas-Coluni J, Crockett ETH, Rieb JT, Bennett EM. Farmland heterogeneity is associated with gains in some ecosystem services but also potential trade-offs. Agric Ecosyst Environ. 2021;322.

  34. Keane M, Neal T. Climate change and U.S. agriculture: accounting for multi-dimensional slope heterogeneity in production functions. Quant Econ. 2020;11:1391–429.

  35. Liao J, Liao T, He X, Zhang T, Li D, Luo X, et al. The effects of agricultural landscape composition and heterogeneity on bird diversity and community structure in the Chengdu Plain, China. Glob Ecol Conserv. 2020;24.

  36. Drobnič F, Kos A, Pustišek M. On the interpretability of machine learning models and experimental feature selection in case of multicollinear data. Electronics (Switzerland). 2020;9(5). https://doi.org/10.3390/electronics9050761.

  37. Alma ÖG. Comparison of robust regression methods in linear regression. Int J Contemp Math Sci. 2011;6(9):409–21.

  38. Javaid A, Ismail MT, Ali MKM. Efficient model selection of collector efficiency in solar dryer using hybrid of LASSO and robust regression. Pertanika J Sci Technol. 2020;28(1):193–210.

  39. Mohamed AE, Almongy HM, Mohamed AH. Comparison between M-estimation, S-estimation, and MM estimation methods of robust estimation with application and simulation. Int J Math Arch. 2018;9(11):55.

  40. Mukhtar Ali MKM, Javaid A, Ismail MT, Fudholi A. Accurate and hybrid regularization—robust regression model in handling multicollinearity and outlier using 8SC for big data. Math Model Eng Probl. 2021;8(4):547–56.

  41. Wijaya IMS, Sari DI. Quality control of optical fiber disruption with big data using the six sigma method. JURTEKSI (J Teknol Sist Inform). 2022;8(2):125–32.

  42. Gormley TA, Matsa DA. Common errors: how to (and not to) control for unobserved heterogeneity. Rev Financ Stud. 2014;27(2):617–61.

  43. Cheng J, Sun J, Yao K, Xu M, Cao Y. A variable selection method based on mutual information and variance inflation factor. Spectrochim Acta A Mol Biomol Spectrosc. 2022;268.

  44. Kim S, Kim H. A new metric of absolute percentage error for intermittent demand forecasts. Int J Forecast. 2016;32(3):669–79.

  45. Chicco D, Warrens MJ, Jurman G. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Comput Sci. 2021;7:1–24.

  46. Gouda SG, Hussein Z, Luo S, Yuan Q. Model selection for accurate daily global solar radiation prediction in China. J Clean Prod. 2019;221:132–44.

  47. Stridbeck R, Zhang L, Han K. How to analyze change from baseline: absolute or percentage change? D-level Essay in Statistics. 2009;1–18.

  48. Vickers AJ. The use of percentage change from baseline as an outcome in a controlled trial is statistically inefficient: a simulation study. BMC Med Res Methodol. 2001. https://doi.org/10.1186/1471-2288-1-6.

  49. Waleekhachonloet OA, Limwattananon C, Limwattananon S, Gross CR. Group behavior therapy versus individual behavior therapy for healthy dieting and weight control management in overweight and obese women living in rural community. Obes Res Clin Pract. 2007;1(4):223–32.

  50. Neovius M, Rössner S. Results from a randomized controlled trial comparing two low-calorie diet formulae. Obes Res Clin Pract. 2007;1(3):165–71.

  51. Kim MK, Tanaka K, Kim MJ, Matuso T, Endo T, Tomita T, et al. Comparison of epicardial, abdominal and regional fat compartments in response to weight loss. Nutr Metab Cardiovasc Dis. 2009;19(11). https://doi.org/10.1016/j.numecd.2009.01.010.

  52. Kaiser L. Adjusting for baseline: change or percentage change? Stat Med. 1989. https://doi.org/10.1002/sim.4780081002.

  53. Törnqvist L, Vartia P, Vartia YO. How should relative changes be measured? Am Stat. 1985;39(1):43–6.

  54. Chen RC, Dewi C, Huang SW, Caraka RE. Selecting critical features for data classification based on machine learning methods. J Big Data. 2020;7(1):1–26.

  55. Han Y. Stable feature selection: theory and algorithms. State University of New York at Binghamton. 2012.

  56. Li J, Cheng K, Wang S, Morstatter F, Trevino RP, Tang J, et al. Feature selection: a data perspective. ACM Comput Surv. 2017;50(6):1–45.

  57. Gupta C. Feature selection and analysis for standard machine learning classification of audio beehive samples. (Doctoral dissertation, Utah State University). 2019.

  58. Ali MKM, Mukhtar, Ismail MT, Ferdinand MH, Alimuddin. Machine learning-based variable selection: An evaluation of Bagging and Boosting. Turk J Comput Math Educ. 2021;12(13):4343–9.

  59. Roell GW, Sathish A, Wan N, Cheng Q, Wen Z, Tang YJ, et al. A comparative evaluation of machine learning algorithms for predicting syngas fermentation outcomes. Biochem Eng J. 2022;186.

  60. Adugna T, Xu W, Fan J. Comparison of random forest and support vector machine classifiers for regional land cover mapping using coarse resolution FY-3C images. Remote Sens (Basel). 2022;14(3). https://doi.org/10.3390/rs14030574.

  61. Sumari ADW, Charlinawati DS, Ariyanto Y. A simple approach using statistical-based machine learning to predict the weapon system operational readiness. In: The 1st International Conference on Data Science and Official Statistics. 2021. p. 343–51.

  62. Jimenez-Marquez SA, Thibault J, Lacroix C. Prediction of moisture in cheese of commercial production using neural networks. Int Dairy J. 2005;15(11):1156–74.

Acknowledgements

The authors are grateful to the “Ministry of Higher Education Malaysia for Fundamental Research Grant Scheme with Project Code: FRGS/1/2022/STG06/USM/02/13” for their support in this research.

Funding

The authors are grateful to the “Ministry of Higher Education Malaysia for Fundamental Research Grant Scheme with Project Code: FRGS/1/2022/STG06/USM/02/13” for their financial support.

Author information

Contributions

OJI: He conducted the analysis and manuscript development. FPS: She designed the experiment, contributed to the results, discussion, and supervision. JS: He contributed to the writing, logic and editing. MKMA: He designed the experiment, contributed to the manuscript writing and supervision of the work. All the authors approved the final manuscript.

Corresponding author

Correspondence to Majid Khan Majahar Ali.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Ibidoja, O.J., Shan, F.P., Sulaiman, J. et al. Detecting heterogeneity parameters and hybrid models for precision farming. J Big Data 10, 130 (2023). https://doi.org/10.1186/s40537-023-00810-8

  • DOI: https://doi.org/10.1186/s40537-023-00810-8

Keywords