
# The run test for two samples in the presence of uncertainty

*Journal of Big Data*
**volume 10**, Article number: 166 (2023)

## Abstract

The run test, which examines whether two samples selected from the same population are random, is used in many applications. However, the existing run test for two samples assumes certainty, an assumption that is not always valid in practice. This paper introduces a modified run test for two samples that accounts for uncertainty. We develop a statistical approach for the run test that treats the sample size, the level of significance, and the observations as possibly uncertain. To evaluate the proposed test, we analyze wind power and photovoltaic power data. The analysis indicates that these variables are randomly selected from the population. The results indicate that the proposed run test is well suited to handling uncertainty in renewable energy data. With this modified test, the randomness of samples can be assessed and reliable conclusions drawn under uncertain conditions.

## Introduction

Statistical tests are used to characterize the nature of data collected in various fields. These tests are also used in decision-making when the population parameter is unknown. Statistical tests are workable when the sample or samples are selected at random from the population under study, and they cannot be applied if the assumption of a random sample is violated. The run test for two samples has many applications in deciding the randomness of data selected from a population. It works under the assumption that the samples are selected from the same population. The null hypothesis states that the two samples selected from the population are random, against the alternative hypothesis that the two samples are away from randomness. Kanji [1] presented the procedure and application of the run test for two samples. Dakhilalian et al. [2], Haramoto [3], Paindaveine [4] and Doğanaksoy et al. [5] proposed various run tests and discussed their applications.

Statistical methods have many applications in fields related to energy and renewable energy. The main aim of statistical analysis or modeling in this area is to predict or forecast the wind speed and the energy produced by wind. Many researchers have applied statistical methods to problems in energy and renewable energy; see, for example, Brano et al. [6], Lü et al. [7], Masseran [8], Ren et al. [9], Arias-Rosales and Osorio-Gómez [10], Vogt et al. [11], Katinas et al. [12], Qing [13], Mohammed et al. [14], Min et al. [15], Akgül and Şenoğlu [16], Mahmood et al. [17], Alrashidi et al. [18], Campisi-Pinto et al. [19], Deep et al. [20], Jang et al. [21], Kapen et al. [22], ul-Haq et al. [23] and Wenxin et al. [24]. Liu et al. [25] studied the effect of the subsidy on wind power and photovoltaic power.

The classical statistical methods for estimating, analyzing and forecasting energy are applied under the assumption that the information about the parameters, sample, and observations is certain. According to Viertl [26], “statistical data are frequently not precise numbers but more or less non-precise also called fuzzy. Measurements of continuous variables are always fuzzy to a certain degree”. Filzmoser and Viertl [27], Tsai and Chen [28], Taheri and Arefi [29], Jamkhaneh and Ghara [30], Chachi et al. [31], Kalpanapriya and Pandian [32], Parthiban and Gajivaradhan [33], Montenegro et al. [34], Park et al. [35] and Garg and Arora [36] presented work using fuzzy logic.

Fuzzy logic gives information about only two measures, the measure of truth and the measure of falseness. It is silent about a third, critical measure, called the “measure of indeterminacy”. Neutrosophic logic is found to be more efficient than fuzzy logic, see Smarandache [37]. Abdel-Basset et al. [38] discussed the application of neutrosophic logic in renewable energy. More details about neutrosophic logic can be found in Das and Edalatpanah [39] and El-Barbary and Abu Gdairi [40]. Smarandache [41] introduced neutrosophic statistics to interpret imprecise data. Several authors have demonstrated the performance of neutrosophic statistics over classical statistics, see, for example, Chen et al. [42, 43], Sherwani et al. [44], Aslam [45] and Albassam et al. [46].

In the literature, the run test for two samples is available under classical statistics. This existing test is limited in its application when the level of significance, the sample size, or the observations in the data are uncertain. By exploring the literature and to the best of the author’s knowledge, there is no work on the run test for two samples using neutrosophic statistics, so a gap remains. The main aim of this paper is to design the run test for two samples using neutrosophic statistics. The main contribution is therefore a run test for two samples that can be applied when the data contain imprecise observations. The proposed test is designed and applied using wind power and photovoltaic power data from renewable energy, see Liu et al. [25]. It is expected that the proposed run test for two samples will be more informative in renewable energy than the existing run test under classical statistics.

## Methodology

The existing run test of randomness for two samples tests the randomness of two samples under the assumption that all parameters associated with the test are specific and determinate. In practice, the assumption of certainty about the level of significance and the sample size is not always fulfilled, and the existing test is not workable in the presence of uncertainty. In this section, we modify the current test under neutrosophic statistics so that the proposed run test for two samples can be used effectively in the presence of indeterminacy/uncertainty. The methodology of the proposed run test for two samples is as follows. Let \({\alpha }_{N}={\alpha }_{L}+{\alpha }_{U}{I}_{N\alpha };\,{I}_{N\alpha }\epsilon \left[{I}_{L\alpha },{I}_{U\alpha }\right]\) be a neutrosophic level of significance and \({n}_{N}={n}_{L}+{n}_{U}{I}_{Nn};\,{I}_{Nn}\epsilon \left[{I}_{Ln},{I}_{Un}\right]\) be the neutrosophic sample size, where \({\alpha }_{L}\), \({n}_{L}\) are the lower values (determinate), \({\alpha }_{U}{I}_{N\alpha }\), \({n}_{U}{I}_{Nn}\) are the upper values (indeterminate), and \({I}_{N\alpha }\epsilon \left[{I}_{L\alpha },{I}_{U\alpha }\right]\), \({I}_{Nn}\epsilon \left[{I}_{Ln},{I}_{Un}\right]\) are the measures of indeterminacy/uncertainty associated with the level of significance and the sample size, respectively. Let \({Z}_{NC}={Z}_{LC}+{Z}_{UC}{I}_{NC};\, {I}_{NC}\epsilon \left[{I}_{LC},{I}_{UC}\right]\) be the neutrosophic tabulated/critical values corresponding to \({\alpha }_{L}\) and \({\alpha }_{U}{I}_{N\alpha }\), respectively. Let \({X}_{N1}={X}_{L1}+{X}_{U1}{I}_{NX1};\,{I}_{NX1}\epsilon \left[{I}_{LX1},{I}_{UX1}\right]\) and \({X}_{N2}={X}_{L2}+{X}_{U2}{I}_{NX2};\,{I}_{NX2}\epsilon \left[{I}_{LX2},{I}_{UX2}\right]\) be two neutrosophic random samples selected from the same neutrosophic population.

Suppose that two samples of sizes \({n}_{N1}={n}_{L1}+{n}_{U1}{I}_{Nn1};\,{I}_{Nn1}\epsilon \left[{I}_{Ln1},{I}_{Un1}\right]\) and \({n}_{N2}={n}_{L2}+{n}_{U2}{I}_{Nn2};\,{I}_{Nn2}\epsilon \left[{I}_{Ln2},{I}_{Un2}\right]\), respectively, have been selected at random from the same neutrosophic population. The operational process of the proposed test is as follows. The two neutrosophic samples are merged and arranged in increasing order with respect to the mid values of \({X}_{N1}\epsilon \left[{X}_{L1},{X}_{U1}\right]\) and \({X}_{N2}\epsilon \left[{X}_{L2},{X}_{U2}\right]\). The + sign is assigned to the elements of the first neutrosophic sample of size \({n}_{N1}\epsilon \left[{n}_{L1},{n}_{U1}\right]\), and the − sign is assigned to the elements of the second neutrosophic sample of size \({n}_{N2}\epsilon \left[{n}_{L2},{n}_{U2}\right]\). According to Kanji [1], “A succession of values with the same sign, i.e. from the same sample, is called a run”. The number of runs \({K}_{N}\epsilon \left[{K}_{L},{K}_{U}\right]\) is counted and the test statistic \({Z}_{N}\epsilon \left[{Z}_{L},{Z}_{U}\right]\) is expressed as

where

When \({Z}_{L}={Z}_{U}\), the statistic \({Z}_{N}\epsilon \left[{Z}_{L},{Z}_{U}\right]\) can be expressed as

Note that \({Z}_{L}\) is the test statistic under classical statistics, \({Z}_{U}{I}_{NZ}\) is the indeterminate value, and \({I}_{NZ}\epsilon \left[{I}_{LZ},{I}_{UZ}\right]\) is the measure of indeterminacy associated with the test statistic. The proposed test statistic \({Z}_{N}\epsilon \left[{Z}_{L},{Z}_{U}\right]\) reduces to the run test for two samples under classical statistics if no uncertainty is found. Note also that \({\mu }_{NK}\epsilon \left[{\mu }_{LK},{\mu }_{UK}\right]\) denotes the neutrosophic average and is expressed by

where

When \({\mu }_{LK}={\mu }_{UK}\), the mean \({\mu }_{NK}\epsilon \left[{\mu }_{LK},{\mu }_{UK}\right]\) can be expressed as

The neutrosophic standard deviation \({\sigma }_{NK}\epsilon \left[{\sigma }_{LK},{\sigma }_{UK}\right]\) is defined by

When \({\sigma }_{LK}={\sigma }_{UK}\), the standard deviation \({\sigma }_{NK}\epsilon \left[{\sigma }_{LK},{\sigma }_{UK}\right]\) can be expressed as

The operational process of the proposed run test for two samples is shown in Fig. 1.
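The operational process described above (merge, order by mid values, assign signs, count runs, standardize) can be sketched in Python. This is a minimal illustration rather than the author's implementation: the function names `count_runs` and `run_test_z`, the representation of each observation as a `(lower, upper)` pair, and the toy data are all hypothetical. The mean and variance of the number of runs follow the classical large-sample run test for two samples, and the continuity term of 0.5 in the numerator is an assumption about the form of the statistic.

```python
from math import sqrt


def count_runs(sample1, sample2):
    """Count runs after merging two interval-valued samples.

    Each observation is a (lower, upper) pair and is ordered by its mid value.
    Elements of sample1 get the sign "+", elements of sample2 get "-".
    Ties between mid values are not treated specially in this sketch.
    """
    labelled = [((lo + hi) / 2, "+") for lo, hi in sample1]
    labelled += [((lo + hi) / 2, "-") for lo, hi in sample2]
    labelled.sort(key=lambda pair: pair[0])
    signs = [sign for _, sign in labelled]
    runs = 1 if signs else 0
    for previous, current in zip(signs, signs[1:]):
        if current != previous:  # a sign change starts a new run
            runs += 1
    return runs


def run_test_z(k, n1, n2, continuity=0.5):
    """Classical large-sample statistic for k runs from samples of size n1, n2."""
    mu = 2 * n1 * n2 / (n1 + n2) + 1
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    return (k - mu + continuity) / sqrt(variance)


# Hypothetical toy data: mid values 1.1, 3.2, 5.0 and 2.1, 2.6, 6.2
sample1 = [(1.0, 1.2), (3.0, 3.4), (5.0, 5.0)]
sample2 = [(2.0, 2.2), (2.5, 2.7), (6.0, 6.4)]
print(count_runs(sample1, sample2))  # sign pattern + - - + + - gives 4 runs
print(round(run_test_z(25, 31, 31), 2))
```

Applying `run_test_z` separately to the lower and upper values of \({K}_{N}\) and of the sample sizes would give the two ends of the interval \(\left[{Z}_{L},{Z}_{U}\right]\); whether the proposed test forms the interval in exactly this way is an assumption here.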

## Applications using wind power and photovoltaic power

The proposed run test for two samples is now applied to data from the renewable energy field. Recently, Liu et al. [25] provided a detailed analysis of the “installed capacity of wind power and photovoltaic (PV) power for the year 2017 and the year 2018”. We apply the proposed run test to see whether the wind power and PV power samples are selected randomly. Let \({\alpha }_{L}=0.02\) and \({n}_{N1}={n}_{N2}=[31, 31]\). Suppose that the decision-makers are uncertain about the level of significance, with the measure of indeterminacy \({I}_{N\alpha }\epsilon \left[0, 0.6\right]\). The neutrosophic form can be written as \({\alpha }_{N}=0.02+0.05{I}_{N\alpha };\,{I}_{N\alpha }\epsilon \left[0, 0.6\right]\), so that the level of significance ranges from 0.02 to 0.05. For renewable energy variables, the mean for wind power and PV power is calculated as

For renewable energy variables, the standard deviation for wind power and PV power is calculated as

The value of \({K}_{N}\) for wind power is 25 and for PV power is 29. The value of the test statistic \({Z}_{N}\) for the wind power is calculated by

The value of the test statistic \({Z}_{N}\) for the PV power is calculated by

The tabulated values for \({\alpha }_{N}\epsilon \left[0.02, 0.05\right]\) from Kanji [1] are 2.33 and 1.96, respectively. By comparing the values of \({Z}_{N}\) for the renewable energy variables with these tabulated values, the null hypothesis \({H}_{0}\): wind power and PV power are random is accepted, and the alternative hypothesis \({H}_{1}\): wind power and PV power are away from randomness is rejected. Although the test was carried out in the presence of uncertainty in the significance level, the proposed test leads to the conclusion that the renewable energy variables are selected randomly from the same population. Further statistical analysis can be carried out accordingly, see Liu et al. [25]. The operational process of the proposed test for wind power and PV power is shown in Fig. 2.
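The arithmetic behind this comparison can be retraced under one assumption: that the mean, standard deviation and statistic take the classical large-sample form of the run test for two samples, with a continuity term of 1/2 in the numerator. Because \({n}_{N1}={n}_{N2}=[31, 31]\) is determinate, the lower and upper values coincide. With \({K}_{N}=25\) for wind power and \({K}_{N}=29\) for PV power, a sketch of the computation is

\[
\begin{aligned}
\mu_{K} &= \frac{2 n_{1} n_{2}}{n_{1}+n_{2}} + 1 = \frac{2(31)(31)}{62} + 1 = 32,\\
\sigma_{K}^{2} &= \frac{2 n_{1} n_{2}\,(2 n_{1} n_{2} - n_{1} - n_{2})}{(n_{1}+n_{2})^{2}\,(n_{1}+n_{2}-1)} = \frac{1922 \times 1860}{62^{2} \times 61} \approx 15.25, \qquad \sigma_{K} \approx 3.90,\\
Z_{\text{wind}} &= \frac{25 - 32 + 0.5}{3.90} \approx -1.66,\\
Z_{\text{PV}} &= \frac{29 - 32 + 0.5}{3.90} \approx -0.64.
\end{aligned}
\]

Under this assumed form, both absolute values lie below 1.96 and 2.33, which agrees with the decision stated above. Without the continuity term the values would be about −1.79 and −0.77, and the decision would be the same.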

## Simulation study

Now, the effect of uncertainty on the decision about wind power and PV power is discussed. To see the effect of uncertainty in the level of significance, various values of the uncertain level of significance are considered. The values of \({\alpha }_{N}\epsilon \left[{\alpha }_{L},{\alpha }_{U}\right]\) range from [0.001, 0.002] to [0.20, 0.318]. The neutrosophic form of each level of significance and the measure of indeterminacy \({I}_{N\alpha }\) associated with it are shown in Table 1. The tabulated values for each level of significance are also reported in Table 1. From Table 1, it can be seen that an increase in \({\alpha }_{N}\) affects the decision about \({H}_{0}\) for wind power. When \({\alpha }_{N}<[0.02, 0.0456]\), the null hypothesis of randomness in wind power is accepted. On the other hand, when \({\alpha }_{N}>[0.02, 0.0456]\), the null hypothesis is rejected, that is, at these significance levels the wind power data are away from randomness. The decision about the randomness in PV power remains unchanged.
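The sensitivity of the decision to the level of significance can be illustrated with a short Python sketch. It rests on several assumptions that are not taken from Table 1: the critical values are two-sided standard normal quantiles, the statistics −1.66 (wind power) and −0.64 (PV power) are illustrative values obtained from the classical formula with a continuity term, the interval [0.05, 0.10] is a hypothetical intermediate row, and the helper names `critical_value` and `decide` are hypothetical.

```python
from statistics import NormalDist


def critical_value(alpha):
    """Two-sided standard normal critical value for significance level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def decide(z, alpha_interval):
    """Decision about H0 at the lower and upper ends of [alpha_L, alpha_U]."""
    return tuple(
        "reject H0" if abs(z) > critical_value(alpha) else "do not reject H0"
        for alpha in alpha_interval
    )


# Illustrative statistics (assumed values, see the lead-in)
z_values = {"wind power": -1.66, "PV power": -0.64}

# End points from the text, plus one hypothetical intermediate interval
alpha_intervals = [(0.001, 0.002), (0.02, 0.0456), (0.05, 0.10), (0.20, 0.318)]

for name, z in z_values.items():
    for interval in alpha_intervals:
        print(name, interval, decide(z, interval))
```

With these assumed inputs, the decision for wind power changes once the level of significance grows large enough, while the decision for PV power stays the same over the whole range, which is the qualitative pattern described above.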

## Competitive study

The proposed run test for two samples generalizes several existing run tests. For example, it reduces to the existing run test for two samples under classical statistics when the decision-makers are certain about the significance level. The run test using interval-statistics is also a special case of the proposed test; it utilizes only the information within the intervals. The proposed test reduces to the fuzzy-based run test for two samples when no information about the measure of indeterminacy is obtained. Returning to the previous example about the renewable energy variables, suppose that the decision-makers are uncertain about the level of significance and consider that it should be from 0.05 to 0.10. The neutrosophic form for this level of significance is \({\alpha }_{N}=0.05+0.10{I}_{N\alpha };\,{I}_{N\alpha }\epsilon \left[0, 0.5\right]\). Note that this neutrosophic form of the level of significance generalizes the level of significance used in the three existing run tests. When the decision-maker is sure that the level of significance is 0.05, the neutrosophic form reduces to this determinate value, which corresponds to \({I}_{N\alpha }=0\). The proposed neutrosophic form of the level of significance has two parts. The first part, 0.05, denotes the level of significance under classical statistics, and the second part, \(0.10{I}_{N\alpha }\), is the indeterminate part, with the measure of indeterminacy associated with the level of significance equal to 0.5. From these results, it can be seen that the proposed run test for two samples is more flexible and informative than the existing run test for two samples using classical statistics. The run test for two samples using fuzzy logic is also a special case of the proposed test. The fuzzy-based test gives information only about the measure of truth and the measure of falseness. The proposed run test for two samples gives information about the measure of truth, [0.95, 0.90], the measure of falseness, [0.05, 0.10], and the measure of indeterminacy, 0.5. Based on these results, it is concluded that many tests are special cases of the proposed test. The proposed test is also an extension of the run test for two samples using interval-statistics. The test using interval-statistics gives information only about the smaller and the larger values of the interval. On the other hand, the proposed test gives information about the measures of truth, falseness and indeterminacy. In a nutshell, the proposed run test for two samples is more efficient than the existing tests in terms of information, flexibility and adequacy.
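The neutrosophic form of the level of significance used in this comparison can be reproduced with a small Python sketch. The upper measure of indeterminacy is taken here as \(({\alpha }_{U}-{\alpha }_{L})/{\alpha }_{U}\), which is inferred from the two numerical examples in the text (0.6 for [0.02, 0.05] and 0.5 for [0.05, 0.10]) rather than stated as a general rule; the measure of truth is taken as one minus the level of significance. The names `neutrosophic_alpha` and `alpha_at` are hypothetical.

```python
def neutrosophic_alpha(alpha_lower, alpha_upper):
    """Describe alpha_N = alpha_L + alpha_U * I with I in [0, I_U].

    I_U = (alpha_U - alpha_L) / alpha_U is an inferred rule (see the lead-in).
    """
    if not 0 < alpha_lower <= alpha_upper < 1:
        raise ValueError("need 0 < alpha_lower <= alpha_upper < 1")
    i_upper = (alpha_upper - alpha_lower) / alpha_upper
    return {
        "determinate": alpha_lower,        # classical level of significance
        "coefficient": alpha_upper,        # multiplies the indeterminacy I
        "indeterminacy": (0.0, i_upper),   # range of I
        "falseness": (alpha_lower, alpha_upper),
        "truth": (1 - alpha_lower, 1 - alpha_upper),
    }


def alpha_at(form, indeterminacy):
    """Evaluate the neutrosophic form at a given value of I."""
    return form["determinate"] + form["coefficient"] * indeterminacy


form = neutrosophic_alpha(0.05, 0.10)
print(form["indeterminacy"])   # range of I for the interval [0.05, 0.10]
print(alpha_at(form, 0.0))     # I = 0 gives the classical level 0.05
print(form["truth"], form["falseness"])
```

Setting the indeterminacy to 0 recovers the determinate level of significance, which mirrors the reduction to classical statistics described above.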

## Concluding remarks

In this paper, the run test for two samples was designed using neutrosophic statistics. The proposed test was found to be a generalization of several existing tests. The test statistic was developed for the case in which the level of significance and the sample size are uncertain. The proposed test was applied to renewable energy variables, namely wind power and PV power. This application showed that the proposed test can be used effectively in renewable energy. The simulation study showed that uncertainty in the level of significance may affect the decision about randomness. The proposed test was found to be more efficient than the existing tests in terms of information. Extending the proposed test to other sampling schemes and to big data can be studied in future research.

## Availability of data and materials

The data is given in the paper.

## References

Kanji GK. 100 Statistical tests. London: Sage; 2006.

Dakhilalian M, Jazi EM, Taghiyar MJ. Analysis of randomness of runs and its application for statistical tests. Int J Comput Sci Netw Secur. 2009;9(9):83–90.

Haramoto H. Automation of statistical tests on randomness to obtain clearer conclusion. *Monte Carlo and quasi-Monte Carlo methods 2008*. London: Springer; 2009. p. 411–21.

Paindaveine D. On multivariate runs tests for randomness. J Am Stat Assoc. 2009;104(488):1525–38.

Doğanaksoy A, Sulak F, Uğuz M, Şeker O, Akcengiz Z. New statistical randomness tests based on length of runs. Math Problems Eng. 2015;2015:1.

Brano VL, Orioli A, Ciulla G, Culotta S. Quality of wind speed fitting distributions for the urban area of Palermo, Italy. Renew Energy. 2011;36(3):1026–39.

Lü X, Lu T, Kibert CJ, Viljanen M. Modeling and forecasting energy consumption for heterogeneous buildings using a physical–statistical approach. Appl Energy. 2015;144:261–75.

Masseran N. Evaluating wind power density models and their statistical properties. Energy. 2015;84:533–41.

Ren G, Liu J, Wan J, Guo Y, Yu D, Liu J. Measurement and statistical analysis of wind speed intermittency. Energy. 2017;118:632–43.

Arias-Rosales A, Osorio-Gómez G. Wind turbine selection method based on the statistical analysis of nominal specifications for estimating the cost of energy. Appl Energy. 2018;228:980–98.

Vogt M, Marten F, Braun M. A survey and statistical analysis of smart grid co-simulations. Appl Energy. 2018;222:67–78.

Katinas V, Gecevicius G, Marciukaitis M. An investigation of wind power density distribution at location with low and high wind speeds using statistical model. Appl Energy. 2018;218:442–51.

Qing X. Statistical analysis of wind energy characteristics in Santiago island, Cape Verde. Renew Energy. 2018;115:448–61.

Mohammed D, Abdelaziz M, Sidi A, Mohammed E, Elmostapha E. Wind speed data and wind energy potential using Weibull distribution in Zagora, Morocco. Int J Renew Energy Develop. 2019;8(3):1.

Min Y, Chen Y, Yang H. A statistical modeling approach on the performance prediction of indirect evaporative cooling energy recovery systems. Appl Energy. 2019;255:113832.

Akgül FG, Şenoğlu B. Comparison of wind speed distributions: a case study for Aegean coast of Turkey. Energy Sources Part A Recovery Util Environ Effects. 2019;2019:1–18.

Mahmood FH, Resen AK, Khamees AB. Wind characteristic analysis based on Weibull distribution of Al-Salman site. Iraq Energy Rep. 2019;2019:1.

Alrashidi M, Rahman S, Pipattanasomporn M. Metaheuristic optimization algorithms to estimate statistical distribution parameters for characterizing wind speeds. Renew Energy. 2020;149:664–81.

Campisi-Pinto S, Gianchandani K, Ashkenazy Y. Statistical tests for the distribution of surface wind and current speeds across the globe. Renew Energy. 2020;149:861–76.

Deep S, Sarkar A, Ghawat M, Rajak MK. Estimation of the wind energy potential for coastal locations in India using the Weibull model. Renew Energy. 2020;161:319–39.

Jang S-K, Choi J-H, Kim J-H, Kim H, Jeong H, Choi I-G. Statistical analysis of glucose production from *Eucalyptus pellita* with individual control of chemical constituents. Renew Energy. 2020;148:298–308.

Kapen PT, Marinette JG, David Y. Analysis and efficient comparison of ten numerical methods in estimating Weibull parameters for wind energy potential: application to the city of Bafoussam, Cameroon. Renew Energy. 2020;2020:1.

ul-Haq MA, Rao GS, Albassam M, Aslam M. Marshall–Olkin Power Lomax distribution for modeling of wind speed data. Energy Rep. 2020;6:1118–23.

Wenxin W, Kexin C, Yang B, Yun X, Jianwen W. Influence of wind energy utilization potential in urban suburbs: a case study of Hohhot. Sci Rep. 2021;11(1):1–19.

Liu D, Liu Y, Sun K. Policy impact of cancellation of wind and photovoltaic subsidy on power generation companies in China. Renew Energy. 2021;2021:1.

Viertl R. Univariate statistical analysis with fuzzy data. Comput Stat Data Anal. 2006;51(1):133–47.

Filzmoser P, Viertl R. Testing hypotheses with fuzzy data: the fuzzy p-value. Metrika. 2004;59(1):21–9.

Tsai C-C, Chen C-C. Tests of quality characteristics of two populations using paired fuzzy sample differences. Int J Adv Manuf Technol. 2006;27(5):574–9.

Taheri SM, Arefi M. Testing fuzzy hypotheses based on fuzzy test statistic. Soft Comput. 2009;13(6):617–25.

Jamkhaneh EB, Ghara AN. Testing statistical hypotheses with fuzzy data. In: Paper presented at the 2010 international conference on intelligent computing and cognitive informatics; 2010.

Chachi J, Taheri SM, Viertl R. Testing statistical hypotheses based on fuzzy confidence intervals. Aust J Stat. 2012;41(4):267–86.

Kalpanapriya D, Pandian P. Statistical hypotheses testing with imprecise data. Appl Math Sci. 2012;6(106):5285–92.

Parthiban S, Gajivaradhan P. A comparative study of two-sample *t*-test under fuzzy environments using trapezoidal fuzzy numbers. Int J Fuzzy Math Arch. 2016;10(1):1.

Montenegro M, Casals MAR, Lubiano MAA, Gil MAA. Two-sample hypothesis tests of means of a fuzzy random variable. Inf Sci. 2001;133(1–2):89–100.

Park S, Lee S-J, Jun S. Patent big data analysis using fuzzy learning. Int J Fuzzy Syst. 2017;19(4):1158–67.

Garg H, Arora R. Generalized Maclaurin symmetric mean aggregation operators based on Archimedean t-norm of the intuitionistic fuzzy soft set information. Artif Intell Rev. 2020;2020:1–41.

Smarandache F. Neutrosophy. neutrosophic probability, set, and logic, proquest information & learning. Ann Arbor Michigan USA. 1998;105:118–23.

Abdel-Basset M, Gamal A, Chakrabortty RK, Ryan MJ. Evaluation approach for sustainable renewable energy systems under uncertain environment: a case study. Renew Energy. 2021;168:1073–95.

Das SK, Edalatpanah S. A new ranking function of triangular neutrosophic number and its application in integer programming. Int J Neutrosophic Sci. 2020;4(2):1.

El-Barbary OG, Abu Gdairi R. Neutrosophic logic-based document summarization. J Math. 2021;2021:1–7.

Smarandache F. Introduction to neutrosophic statistics. London: Infinite Study; 2014.

Chen J, Ye J, Du S. Scale effect and anisotropy analyzed for neutrosophic numbers of rock joint roughness coefficient based on neutrosophic statistics. Symmetry. 2017;9(10):208.

Chen J, Ye J, Du S, Yong R. Expressions of rock joint roughness coefficient using neutrosophic interval statistical numbers. Symmetry. 2017;9(7):123.

Sherwani RAK, Shakeel H, Saleem M, Awan WB, Aslam M, Farooq M. A new neutrosophic sign test: an application to COVID-19 data. PLoS ONE. 2021;16(8):e0255671.

Aslam M. Neutrosophic statistical test for counts in climatology. Sci Rep. 2021;11(1):1–5.

Albassam M, Khan N, Aslam M. Neutrosophic D’Agostino test of normality: an application to water data. J Math. 2021;2021:1.

## Acknowledgements

The author is deeply thankful to the editor and reviewers for their valuable suggestions to improve the quality and presentation of the paper.

## Funding

None.

## Author information

### Contributions

MA wrote the paper.

## Ethics declarations

### Ethics approval and consent to participate

Not applicable.

### Consent for publication

Not applicable.

### Competing interests

No conflict of interest regarding the paper.

## Additional information

### Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

## About this article

### Cite this article

Aslam, M. The run test for two samples in the presence of uncertainty.
*J Big Data* **10**, 166 (2023). https://doi.org/10.1186/s40537-023-00850-0

DOI: https://doi.org/10.1186/s40537-023-00850-0