Data analysis for vague contingency data

Fisher's exact test has been widely applied to investigate whether the difference between observed frequencies is significant. The existing test can be applied only when the observed frequencies are in determinate form and contain no vague information. In practice, owing to the complexity of the production process, it is not always possible to obtain observed frequencies in determinate form, so the use of the existing Fisher's exact test may mislead industrial engineers. This paper presents a modification of Fisher's exact test using neutrosophic statistics. The operational process, a simulation study, and an application using production data are given. From the analysis of industrial data, it is concluded that the proposed Fisher's exact test performs better than the existing Fisher's exact test.


Introduction
Fisher's exact test under classical statistics is applied to investigate whether the observed frequencies from dichotomous distributions are associated with or independent of each other. It is usually applied to a 2 × 2 contingency table. The main aim of the test is to test the null hypothesis that the observed frequencies of the dichotomous distributions are independent against the alternative hypothesis that they are associated. According to Kanji [1], the test statistic p of Fisher's exact test is calculated and compared with the specified level of significance (the probability of rejecting the null hypothesis when it is true); the null hypothesis is rejected when the calculated value of the test statistic is less than the level of significance and is not rejected otherwise. Chen [2] differentiated between the chi-square test and Fisher's exact test for a 2 × 2 contingency table. Choi et al. [3] discussed the foundations and inference of the 2 × 2 contingency table. Zhong et al. [4] discussed the application of the test to biological data. Ma and Mao [5] discussed the application of this test to scanning dependency. More information on Fisher's exact test can be found in [6][7][8].
Fuzzy logic is applied where uncertainty is found in the data. Statistical tests based on classical statistics cannot be applied to analyze uncertain data. Fuzzy-based analysis provides information about two measures (truth and falseness). A logic that carries more information under uncertainty, known as "neutrosophic logic", was introduced by [9]. Smarandache [10] discussed that neutrosophic logic has an edge over interval data analysis and fuzzy logic. Basha et al. [11] and Das et al. [12] discussed the applications of neutrosophic logic. Based on the idea of neutrosophic numbers, neutrosophic statistics was introduced by [13] and further investigated by [14,15]. Neutrosophic statistics was found to be more informative and more efficient than classical statistics by [12,16,17].
The operational process of Fisher's exact test under classical statistics is designed to analyze only determinate (exact) observed frequencies. The existing Fisher's exact test cannot be applied when the observed frequencies are in intervals. From a survey of the literature and to the best of the author's knowledge, no effort has been made to design Fisher's exact test using neutrosophic statistics. In this paper, we extend Fisher's exact test using neutrosophic statistics. The test statistic of Fisher's exact test is modified to analyze neutrosophic numbers. The power of Fisher's exact test is discussed and an application is given using industrial data. It is expected that Fisher's exact test under neutrosophic statistics will be more efficient than the existing Fisher's exact test in terms of the power of the test, information and flexibility.

The proposed Fisher's exact test
The existing Fisher's exact test under classical statistics is applied to investigate whether the difference between observed frequencies is significant. It cannot be applied if the observed frequency is an interval rather than an exact number. To overcome this issue, it is necessary to modify Fisher's exact test using neutrosophic statistics so that the difference in frequencies can be investigated in the presence of interval, fuzzy, imprecise and indeterminate data. Similar to Fisher's exact test under classical statistics, the proposed Fisher's exact test under neutrosophic statistics is applied using a 2 × 2 contingency table. Let $a_N = a_L + a_U I_N$, $b_N = b_L + b_U I_N$, $c_N = c_L + c_U I_N$ and $d_N = d_L + d_U I_N$; $I_N \in [I_L, I_U]$ be the neutrosophic observed frequencies. Note that the first values $a_L, b_L, c_L, d_L$ denote the determinate values, $a_U I_N, b_U I_N, c_U I_N, d_U I_N$ are the indeterminate observed frequencies, and $I_N \in [I_L, I_U]$ is the measure of indeterminacy associated with the observed frequencies. These measures can be calculated from the imprecise data as (upper value − lower value)/upper value. Let $N_N = N_L + N_U I_N$; $I_N \in [I_L, I_U]$ be the total observed frequency. A 2 × 2 contingency table for carrying out Fisher's exact test under the idea of neutrosophy is presented in Table 1; see [18,19] for more details.

The neutrosophic test statistic $p_N \in [p_L, p_U]$ for Fisher's exact test is defined as

$$p_N = p_L + p_U I_{p_N}; \quad I_{p_N} \in [I_{p_L}, I_{p_U}]$$

where the first part $p_L$ denotes the statistic of Fisher's exact test under classical statistics, the second part $p_U I_{p_N}$ denotes the indeterminate part, and $I_{p_N} \in [I_{p_L}, I_{p_U}]$ is the uncertainty measure associated with the proposed test statistic. The proposed test statistic $p_N \in [p_L, p_U]$ reduces to the existing test statistic $p_L$ when $I_{p_L} = 0$. Following [1], the proposed test statistic $p_N \in [p_L, p_U]$ can be expressed as

$$p_N = \sum \frac{(a_N + b_N)!\,(c_N + d_N)!\,(a_N + c_N)!\,(b_N + d_N)!}{N_N!\,a_N!\,b_N!\,c_N!\,d_N!} \qquad (1)$$

As mentioned in [1], "the summation is over all possible 2 × 2 schemes with a cell frequency equal to or smaller than the smallest experimental frequency (keeping the row and column totals fixed as above)".
The computed value of $p_N \in [p_L, p_U]$ is compared with the pre-specified level of significance $\alpha$. The null hypothesis of independence between sample and class is rejected if $p_N \in [p_L, p_U] < \alpha$; otherwise, the null hypothesis is not rejected in favour of the alternative hypothesis that sample and class are not independent. The operational procedure of Fisher's exact test is shown in Fig. 1.

Fig. 1 The procedure of Fisher's exact test under classical statistics
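The summation quoted from [1] can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and computing the interval $[p_L, p_U]$ by applying the classical statistic separately to a table of lower values and a table of upper values is an assumption made here, not a procedure stated in the text. The counts in the example are made up.

```python
from math import factorial


def table_probability(a, b, c, d):
    """Hypergeometric probability of one 2 x 2 table with fixed margins."""
    n = a + b + c + d
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    denominator = (factorial(n) * factorial(a) * factorial(b)
                   * factorial(c) * factorial(d))
    return numerator / denominator


def fisher_p(a, b, c, d):
    """Sum the probabilities of all tables whose smallest cell is equal to
    or smaller than the observed smallest cell, keeping the margins fixed."""
    cells = [a, b, c, d]
    smallest = min(cells)
    # Lowering a cell by k raises its row and column neighbours by k and
    # lowers the diagonally opposite cell by k, so the margins do not change.
    on_main_diagonal = cells.index(smallest) in (0, 3)
    total = 0.0
    for k in range(smallest + 1):
        if on_main_diagonal:
            total += table_probability(a - k, b + k, c + k, d - k)
        else:
            total += table_probability(a + k, b - k, c - k, d + k)
    return total


def neutrosophic_fisher_p(lower_table, upper_table):
    """Assumption: p_L and p_U come from the lower-value and upper-value tables."""
    return fisher_p(*lower_table), fisher_p(*upper_table)


# Hypothetical interval counts, for illustration only.
p_lower, p_upper = neutrosophic_fisher_p((3, 1, 1, 3), (4, 1, 1, 4))
print(p_lower, p_upper)
```

Exact integer factorials are used so that no precision is lost before the final division; for large counts a log-factorial formulation would be a natural alternative.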

Application using industrial data
In this section, the application of the proposed test is illustrated using information obtained from the manufacturing industry. Two machines $A_1$ and $A_2$ work for an hour and produce defective items recorded in intervals. To explain the process of the proposed test, a 2 × 2 contingency table is extracted from [20] and the data are shown in Table 2. Industrial engineers are interested in investigating whether there is a significant difference between the performance of machines $A_1$ and $A_2$. As mentioned before, neutrosophic-based tests are able to analyze interval-based data more effectively than tests using classical statistics.
The neutrosophic test statistic is derived by computing all conceivable combinations utilizing the hypergeometric distribution, as outlined in Table 3. The minimum value among these combinations is identified and compared against all other combinations to ascertain those below this minimum. It is important to emphasize that these combinations are carefully selected to ensure that both the row and column totals remain consistent with those presented in Table 2.
Based on the possible combinations in Tables 2 and 3, $p_N$ is calculated as $p_N \in [0.9218, 0.8980]$. The simplified neutrosophic form of $p_N \in [0.9218, 0.8980]$ is $p_N = 0.9218 - 0.8980 I_{p_N}$; $I_{p_N} \in [0, 0.0267]$. Suppose that $\alpha = 0.05$. Because the values of the statistic $p_N \in [0.9218, 0.8980]$ are greater than 0.05, the industrial engineers do not reject the null hypothesis $H_0$ of no difference between the performance of machines $A_1$ and $A_2$. Figure 2 depicts the operational procedure of the proposed Fisher's exact test for the production data.
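As a quick arithmetic check of the neutrosophic form above, the statistic can be evaluated at the two ends of the indeterminacy interval and compared with $\alpha$. The helper name `neutrosophic_value` is hypothetical; the numbers are those reported in the text.

```python
def neutrosophic_value(p_det, p_indet, indeterminacy):
    """Evaluate p_N = p_det - p_indet * I for one value of the indeterminacy I."""
    return p_det - p_indet * indeterminacy


alpha = 0.05

# End points of I_pN in [0, 0.0267] for p_N = 0.9218 - 0.8980 * I_pN.
p_at_zero = neutrosophic_value(0.9218, 0.8980, 0.0)
p_at_max = neutrosophic_value(0.9218, 0.8980, 0.0267)

# H0 is not rejected when both end points exceed alpha.
both_exceed_alpha = min(p_at_zero, p_at_max) > alpha

print(round(p_at_zero, 4), round(p_at_max, 4), both_exceed_alpha)
```

At $I_{p_N} = 0$ the form returns the determinate part 0.9218, and at $I_{p_N} = 0.0267$ it returns about 0.8978, which matches the reported second end point 0.8980 up to rounding.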

Advantages based on industrial data
The proposed Fisher's exact test using neutrosophic statistics is a generalization of several tests. Its efficiency is now compared, in terms of information and adequacy, with Fisher's exact test under classical statistics, Fisher's exact test using interval statistics, and Fisher's exact test using fuzzy logic. For this comparison, the neutrosophic statistic $p_N \in [p_L, p_U]$ obtained for the production data is considered. The neutrosophic form of the statistic from the data is $p_N = 0.9218 - 0.8980 I_{p_N}$; $I_{p_N} \in [0, 0.0267]$. Note that the first value $p_L = 0.9218$ is the statistic of Fisher's exact test under classical statistics, $0.8980 I_{p_N}$ is the indeterminate part, and $I_{p_N} \in [0, 0.0267]$ is the measure of indeterminacy associated with $p_N \in [p_L, p_U]$. The proposed statistic $p_N \in [p_L, p_U]$ reduces to Fisher's exact test under classical statistics when $I_{p_L} = 0$. Compared with Fisher's exact test under classical statistics, the proposed test provides the values of the statistic $p_N \in [p_L, p_U]$ in an indeterminate interval together with the measure of indeterminacy. For example, for testing the null hypothesis at a level of significance $\alpha = 0.05$, the proposed Fisher's exact test under neutrosophic statistics is interpreted as follows: the probability of accepting the null hypothesis is 0.95, the probability of committing an error is 0.05, and the measure of indeterminacy is 0.0267. From the comparison, it is clear that the proposed Fisher's exact test under neutrosophic statistics is more efficient and more informative than Fisher's exact test using classical statistics.

Next, the proposed test is compared with Fisher's exact test using interval statistics. The statistic $p_N \in [p_L, p_U]$ using interval statistics captures only the data inside the interval: it states that the values of the test statistic may vary from 0.9218 to 0.8980. Similarly, Fisher's exact test using fuzzy logic gives information about the measure of truth, which is 0.95, and the measure of falseness, which is 0.05. Like interval statistics, it states that the statistic $p_N \in [p_L, p_U]$ may change from 0.9218 to 0.8980 in an uncertain environment. From the analysis, it is concluded that the proposed Fisher's exact test under neutrosophic statistics has an edge over the three other Fisher's exact tests. Therefore, the use of the proposed test in the production industry will give more information and assist decision-makers in an indeterminate environment.

Fig. 2 The procedure of Fisher's exact test for production data

Simulation study
This section examines whether the measure of indeterminacy $I_{p_N} \in [I_{p_L}, I_{p_U}]$ affects the decision about the null hypothesis. To study this effect, various interval values of $p_N \in [p_L, p_U]$ are considered in Table 4. The neutrosophic forms of $p_N \in [p_L, p_U]$ for the selected values, the measure of indeterminacy $I_{p_N} \in [I_{p_L}, I_{p_U}]$, and the decision about the null hypothesis are also reported in Table 4. From Table 4, it can be seen that as $p_N \in [p_L, p_U]$ increases, $[p_L, p_U]$ changes but the decision about the null hypothesis at $\alpha = 0.05$ is not affected. We note that larger values of $I_{p_N}$ do affect the decision about the null hypothesis. For example, when $I_{p_N} \in [0, 3]$, the decision about the null hypothesis changes from "Do not reject $H_0$" to "Reject $H_0$". From the study, it is clear that larger values of the measure of uncertainty/indeterminacy affect the decision about the null hypothesis. Therefore, industrial engineers should be very careful in making decisions in the presence of uncertainty.
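The decision rule applied in this study can be sketched as below, using interval values of $p_N$ that appear in the text. The function name `decide` and the third outcome, returned when the interval straddles $\alpha$, are assumptions added for illustration; the text only discusses intervals that lie entirely on one side of $\alpha$.

```python
def decide(p_interval, alpha=0.05):
    """Compare an interval-valued statistic p_N with the significance level."""
    low, high = min(p_interval), max(p_interval)
    if high < alpha:
        return "Reject H0"
    if low >= alpha:
        return "Do not reject H0"
    # Assumption: an interval that straddles alpha gives no clear decision.
    return "Indeterminate"


# Interval values of p_N taken from the text.
for interval in [(0.01, 0.04), (0.79, 0.75), (0.99, 0.95)]:
    print(interval, decide(interval))
```

With $\alpha = 0.05$, the interval [0.01, 0.04] leads to rejection of $H_0$, while [0.79, 0.75] and [0.99, 0.95] do not.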

Sensitivity analysis
The sensitivity of the proposed Fisher's exact test under neutrosophic statistics will be discussed now.

Power of the test
This section discusses the power of Fisher's exact test under neutrosophic statistics. Let $\alpha$ be the probability of rejecting $H_0$ when it is true and $\beta$ be the probability of accepting $H_0$ when it is false. The power of the test is denoted by $(1 - \beta)$.
Following Nosakhare and Bright [21], the steps used to calculate $\beta$ are as follows.
Step-1: Generate a set of 10,000 random samples of the test statistic $p_N$.
Step-2: Compare the values of $p_N$ with the level of significance and record whether the null hypothesis $H_0$ is rejected or accepted.
Step-3: Determine the value of $\beta$ (Type II error rate) as the ratio of the number of erroneous conclusions to the total number of replications.

The values of $\alpha_N \in [\alpha_L, \alpha_U]$ obtained from the above simulation process are placed in Table 6. From Table 6, it can be noted that the lower values of $\alpha_N$ are the same as $\alpha_0$, whereas the upper values of $\alpha_N$ are larger than $\alpha_0$. In addition, as $\alpha_0$ increases, $\alpha_N$ also increases. From the study, it is clear that when the test is implemented under uncertainty, the level of significance may change from $\alpha_0$. For example, when $\alpha_0 = 0.05$, the computed $\alpha_N$ is $\alpha_N \in [0.05, 0.20]$. The level of significance thus changes from 0.05 to 0.20, which can affect the decision about the null hypothesis.
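Steps 1 to 3 above can be sketched as a small Monte Carlo routine. The text does not specify how the samples of $p_N$ are generated, so this sketch makes several assumptions: two machines with hypothetical defect probabilities `p1` and `p2`, binomial counts of defective items, and the classical statistic from [1] computed for each simulated table. All function and parameter names are illustrative.

```python
import random
from math import factorial


def table_probability(a, b, c, d):
    """Hypergeometric probability of one 2 x 2 table with fixed margins."""
    n = a + b + c + d
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    denominator = (factorial(n) * factorial(a) * factorial(b)
                   * factorial(c) * factorial(d))
    return numerator / denominator


def fisher_p(a, b, c, d):
    """Sum over tables whose smallest cell does not exceed the observed one."""
    cells = [a, b, c, d]
    smallest = min(cells)
    on_main_diagonal = cells.index(smallest) in (0, 3)
    total = 0.0
    for k in range(smallest + 1):
        if on_main_diagonal:
            total += table_probability(a - k, b + k, c + k, d - k)
        else:
            total += table_probability(a + k, b - k, c - k, d + k)
    return total


def estimate_beta(p1, p2, items=20, alpha=0.05, replications=10_000, seed=0):
    """Share of replications in which a false H0 (p1 != p2) is not rejected."""
    rng = random.Random(seed)
    not_rejected = 0
    for _ in range(replications):
        # Step-1: simulate defective counts for each machine and compute p.
        a = sum(rng.random() < p1 for _ in range(items))
        c = sum(rng.random() < p2 for _ in range(items))
        p_value = fisher_p(a, items - a, c, items - c)
        # Step-2: record whether H0 is retained at the chosen level.
        if p_value >= alpha:
            not_rejected += 1
    # Step-3: erroneous conclusions divided by the number of replications.
    return not_rejected / replications


beta = estimate_beta(p1=0.10, p2=0.40, replications=2_000)
print("beta:", beta, "power:", 1 - beta)
```

Running the routine once with lower-bound settings and once with upper-bound settings would give an interval for $\beta$, and hence for the power $(1 - \beta)$; that pairing is again an assumption rather than the procedure of [21].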

Concluding remarks
In this paper, Fisher's exact test under neutrosophic statistics was presented. The design of the proposed Fisher's exact test under an indeterminate environment was given. The operational procedure was explained with the help of industrial data. The proposed Fisher's exact test is a generalization of the existing Fisher's exact test under classical statistics. Based on the analysis and the simulation studies, it is concluded that the proposed test efficiently indicates a change in the power of the test and the level of significance when the test is implemented in the presence of imprecise data. Compared with the existing test, the proposed test is adequate for use in an uncertain environment. Based on the analysis and simulation studies, the application of the proposed Fisher's exact test is recommended in industries where the production data are ambiguous, imprecise, and/or in intervals. For future research, other statistical properties of the proposed Fisher's exact test under neutrosophic statistics can be studied. Another fruitful area of research may be the extension of the proposed Fisher's exact test using other sampling schemes.

Table 2 A 2 × 2 contingency table of machines and production

Table 3 1st combination of the original data

Table 4 Effect of indeterminacy

$p_N \in [p_L, p_U]$         $p_L + p_U I_{p_N}$; $I_{p_N} \in [I_{p_L}, I_{p_U}]$         Decision
...                          ...                                                  Do not reject $H_0$
$p_N \in [0.79, 0.75]$       $0.79 - 0.75 I_{p_N}$; $I_{p_N} \in [0, 0.05]$       Do not reject $H_0$
$p_N \in [0.84, 0.80]$       $0.84 - 0.80 I_{p_N}$; $I_{p_N} \in [0, 0.05]$       Do not reject $H_0$
$p_N \in [0.89, 0.85]$       $0.89 - 0.85 I_{p_N}$; $I_{p_N} \in [0, 0.05]$       Do not reject $H_0$
$p_N \in [0.94, 0.90]$       $0.94 - 0.90 I_{p_N}$; $I_{p_N} \in [0, 0.04]$       Do not reject $H_0$
$p_N \in [0.99, 0.95]$       $0.99 - 0.95 I_{p_N}$; $I_{p_N} \in [0, 0.04]$       Do not reject $H_0$

[...] the values of $I_{p_N} \in [I_{p_L}, I_{p_U}]$ decrease. For example, when $p_N \in [0.01, 0.04]$, the value of $I_{p_N}$ is $I_{p_N} \in [0, 3]$. When $p_N \in [0.95, 0.99]$, the value of $I_{p_N}$ is $I_{p_N} \in [0, 0.04]$. In addition, it can be noted when $I_{p_N} \in [0, 0.80]$ are fewer, although the values of $p_N \in$ [...]

The values of $I_{p_N} \in [I_{p_L}, I_{p_U}]$ are shown in Table 4. From Table 4, it can be seen that when $p_N \in [p_L, p_U]$ changes from [0.79, 0.75] to [0.89, 0.85], the measure of indeterminacy remains the same, namely 0.05. When $p_N \in [p_L, p_U]$ changes from [0.94, 0.90] to [0.99, 0.95], the measure of indeterminacy remains the same, namely 0.04. Similarly, there is not much change in $I_{p_N} \in [I_{p_L}, I_{p_U}]$ when $p_N \in [p_L, p_U]$ changes from [0.34, 0.30] to [0.44, 0.40]. This analysis shows that a change in the statistic $p_N \in [p_L, p_U]$ changes the values of $I_{p_N} \in [I_{p_L}, I_{p_U}]$ but does not affect the decision about the null hypothesis. From the analysis, it is concluded that the proposed test is sensitive to higher values of $I_{p_N} \in [I_{p_L}, I_{p_U}]$.

Table 5 The values of power of the tests