Blind Federated Learning without initial model
Journal of Big Data volume 11, Article number: 56 (2024)
Abstract
Federated learning is an emerging machine learning approach that allows several participants, each holding their own private data, to build a model jointly. This method is secure and privacy-preserving, and is suitable for training a machine learning model on sensitive data from different sources, such as hospitals. In this paper, the authors propose two innovative methodologies for Particle Swarm Optimisation-based federated learning of Fuzzy Cognitive Maps in a privacy-preserving way. In addition, one relevant contribution of this research is that the federated learning process requires no initial model, making it effectively blind. This proposal is tested with several open datasets, improving both accuracy and precision.
Introduction
Federated learning is an emerging approach to enable privacy-preserving machine learning by sharing local models instead of the data itself. It is therefore a method for training machine learning models in a distributed way, and it can be used for both classification and regression tasks.
The overall basic process is as follows. The federated learning system is initiated by one server or participant, which sends an initial model to be trained by each participant with their own local data. Each participant in turn delivers the weights or the gradients of the model back to the server (or to all the participants) to be aggregated. The federated model is then sent back to the participants in an iterative way [1, 2]. The process continues until the termination conditions are met. The output is a federated model that has been trained with the private data of all the participants [3].
This approach becomes critical when dealing with sensitive data, for instance in domains such as healthcare or finance. The aim of this paper, which is not an empirical study, is to propose an innovative federated learning approach for Fuzzy Cognitive Maps (FCMs) and to show how appropriate FCMs are for Distributed Artificial Intelligence.
The proposal does not prioritise a particular optimisation method. In fact, this paper’s primary emphasis is not on the training of FCMs, nor on the distributed training of FCMs in general. Instead, the central focus of this paper is on the distributed training of FCMs without an initial model. The main contributions of this paper are threefold:

1. A privacy-preserving machine learning approach for FCMs. The authors design a training scheme for collaborative FCM training that includes data privacy. This proposal allows multiple participants to train an FCM model with their own data in compliance with strict data privacy regulations.

2. Two approaches to distributed learning of Fuzzy Cognitive Maps. The authors propose two Particle Swarm Optimization-based approaches for learning FCMs in a distributed way.

3. Blind Federated Learning as a new federated learning approach without an initial model, since the use of FCMs allows the participants not to define a model. To the best of our knowledge, this is the first federated learning proposal in which an initial model is not needed at all, defined neither by a server nor by the participants.
The authors test the validity of the proposal with well-known open datasets. The results of the experiments show that the proposal achieves a similar performance to the non-distributed method and improves the performance of the non-collaborative approach.
The rest of this paper is organized as follows. We discuss the theoretical background in Section "Theoretical background". The methodological proposal is outlined in Section "Methodological proposal". Section "Experimental approach" describes the details of the experimental approach and the results. Finally, the authors draw the conclusions in Section "Conclusions".
Theoretical background
Fuzzy Cognitive Maps
Fundamentals
Fuzzy Cognitive Maps’ nodes model concepts, variables or features, the edges model the relationships between them, and the weights represent the influence of those relations [4]. The value of a weight \(\varpi _{ij}\) models how much node \(c_i\) impacts node \(c_j\). The fuzzy weights of the edges are normalised within one of the ranges \(\xi =\{[0,+1],[-1,+1]\}\), depending on whether they include just positive values or both positive and negative ones. The maximum positive influence is \(+1.0\) and the opposite influence is 0.0 or \(-1.0\). The zero value shows that there is no correlation between the nodes. From a computational point of view, FCM models are represented by a weight (adjacency) matrix which contains the weights of all the edges between the nodes.
The state of the nodes is shown as a state vector \(c =[c_1,c_2,\ldots ,c_N]\) that gives a snapshot of the states of the nodes at any iteration in the FCM dynamics [5]. The state of the node i in the vector state at time t (denoted as \(c_i(t)\)) is computed as shown in Eq. (1):
where \(c_j\) are the presynaptic nodes and \(\varpi _{ji}\) is the weight of the edge from \(c_j\) to \(c_i\). More formally, an FCM can be denoted as a 4-tuple \(\Phi = \langle c, {\mathcal {W}}, f, \xi \rangle\), where \(c=\{c_i\}_{i=1}^n\) is the nodes’ state with n nodes, \({\mathcal {W}} = [\varpi _{ij}]_{n\times n}\) is the adjacency matrix representing the weights between the nodes, with each \(\varpi _{ij}\) lying in \([0,+1]\) or \([-1,+1]\), \(f(\cdot )\) is the activation function, and \(\xi\) is the nodes’ range [6].
The FCM dynamical analysis begins with an initial vector state \(c(0)=[c_1(0),\ldots , c_n(0)]\), which models the initial state of each node. The state of the nodes is updated in an iterative process. This process includes an activation (transformation) function [7] that monotonically maps the state of each node into a normalized range, \([0, +1]\) for unipolar FCMs or \([-1, +1]\) for bipolar ones. If the range is \([0, +1]\), the sigmoid is the most used transformation function, while the hyperbolic tangent is the most used when the nodes’ range is \([-1, +1]\) [8].
If the selected activation function \(f(\cdot )\) is the unipolar sigmoid, then the component i of the vector state \(c_i(t)\) at the instant t is computed as shown in Eq. (2):
where \(\lambda\) represents the slope of the unipolar sigmoid function. Alternatively, if the selected activation function \(f(\cdot )\) is the hyperbolic tangent, then the node’s state \(c_i(t)\) at the instant t is computed as Eq. (3) shows:
After a number of iterations, the FCM dynamics reach one of three possible states: the map settles down to a fixed pattern of node values (the so-called hidden pattern or fixed-point attractor), to a limit cycle, or to a chaotic attractor [9, 10].
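The inference process described above can be illustrated with a minimal sketch. Since Eqs. (1) to (3) are not reproduced in this text, the code assumes the commonly used update \(c_i(t) = f\big(\sum_j \varpi_{ji}\, c_j(t-1)\big)\), with \(f(x)=1/(1+e^{-\lambda x})\) for the unipolar sigmoid and \(f(x)=\tanh(\lambda x)\) for the hyperbolic tangent. The function names, the `max_iter` cap and the default tolerance are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(x, lam=1.0):
    """Unipolar sigmoid with slope lam; maps into [0, 1]."""
    return 1.0 / (1.0 + np.exp(-lam * x))

def tanh_act(x, lam=1.0):
    """Hyperbolic tangent with slope lam; maps into [-1, 1]."""
    return np.tanh(lam * x)

def fcm_infer(W, c0, activation=sigmoid, lam=1.0, tol=1e-5, max_iter=100):
    """Iterate the (assumed) FCM update until two consecutive states differ by less than tol.

    W[j, i] holds the weight of the edge from node j to node i.
    Returns the last state vector and the number of iterations performed.
    """
    c = np.asarray(c0, dtype=float)
    for step in range(1, max_iter + 1):
        # (W.T @ c)[i] = sum_j W[j, i] * c[j]: the weighted input of node i.
        c_next = activation(W.T @ c, lam)
        if np.max(np.abs(c_next - c)) < tol:
            return c_next, step
        c = c_next
    # No fixed point within max_iter (possible limit cycle or chaotic behaviour).
    return c, max_iter
```

If the loop ends without meeting the tolerance, the dynamics may be in a limit cycle or a chaotic regime, so the returned state should be interpreted with care.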
Augmented FCMs
There are two approaches to building FCMs. The first is through human experts [9]. In this approach, a group of experts is carefully selected and each expert individually designs an FCM model that represents their own knowledge of the system to be modelled. The second approach is automatic construction directly from raw data [5, 6, 11]. Given the purpose of this research, this paper focuses on the distributed automatic construction of FCMs.
According to the literature [7], an augmented adjacency matrix can be built by aggregating the adjacency matrix of each FCM. The aggregation of the elements depends on whether there are common nodes. If the adjacency matrices have common nodes, the elements \(\varpi _{jk}\) in the augmented matrix are computed by adding the adjacency matrix of each FCM model (\({\mathcal {W}}_{i}\)).
The addition method when the adjacency matrices have no common nodes is known as the direct sum of matrices, and the augmented matrix is denoted as \(\odot _{i=1}^{N}{\mathcal {W}}_{i}\). Given a couple of FCMs with no common nodes, and possibly a different number of nodes, with adjacency matrices \(\varpi ^A_{n\times n}\) and \(\varpi ^B_{m\times m}\), the resulting augmented adjacency matrix can be computed as in Eq. (4):
where N is the number of adjacency matrices to join, the zeros denote zero matrices, and the dimension of \(\displaystyle \odot _{i=1}^{N}{\mathcal {W}}_{i}\) is \([\,\cdot \,]_{(m+n)\times (m+n)}\). In the case of common nodes, the corresponding weights would be computed as the average (or even the weighted average) of their values in each adjacency matrix \({\mathcal {W}}_{i}\).
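As an illustration of the direct sum for FCMs without common nodes, the following sketch places each adjacency matrix on the diagonal of a larger zero matrix. It assumes the usual block-diagonal definition of the direct sum; the function name `direct_sum` is hypothetical and the handling of common nodes is deliberately left out.

```python
import numpy as np

def direct_sum(matrices):
    """Block-diagonal (direct sum) augmentation of square adjacency matrices.

    Assumes the FCMs share no nodes, so every off-diagonal block is zero.
    """
    sizes = [m.shape[0] for m in matrices]
    total = sum(sizes)
    augmented = np.zeros((total, total))
    offset = 0
    for m, n in zip(matrices, sizes):
        # Copy each adjacency matrix into its own diagonal block.
        augmented[offset:offset + n, offset:offset + n] = m
        offset += n
    return augmented
```

For two matrices of sizes n and m, the result has dimension (m+n) by (m+n), which matches the dimension stated above.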
Pattern recognition with FCMs
Because of its structure, an FCM model is a neuro-fuzzy technique, and many concepts and procedures from neural networks can be applied to FCMs. FCMs have been applied both in classification and regression tasks. This paper is focused on the first task.
The literature has analysed pattern recognition tasks using Fuzzy Cognitive Maps. Papakostas et al. [12] and Papakostas and Koulouriotis [13] proposed several FCM architectures for pattern recognition. Szwed [14] proposed an FCM-based classifier with a fully connected architecture. Wu et al. [15] applied broad learning systems for time series classification with FCMs. Ramirez-Bautista et al. [16] applied FCMs to the classification of human plantar foot alterations. Baykasoglu and Golcuk [17] proposed alpha-cut based FCM methods and tested them on several case studies. Papakostas et al. [18] applied unsupervised Hebbian learning to pattern recognition problems.
In general terms, the main goal of a conventional classifier is to map an input to a specific output according to a pattern. In this proposal, the input concepts represent the features of the dataset, while the outputs are the labels of the classes to which the patterns belong. Figure 1 shows an example topology of a Fuzzy Cognitive Map classifier, where the states of the concepts \(c_{1}\) and \(c_{2}\) define the class to which the input vector state belongs.
In that sense, if \(c_{1}>c_{2}\) the input vector state belongs to class 1, while if \(c_{1}<c_{2}\) the input vector state belongs to class 2. Note that if \(c_{1}=0.03\) and \(c_{2}=0.8\), then the input vector state would belong to class 2.
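The decision rule above can be written as a short sketch: the predicted class is the index of the output concept with the largest state. The function name is hypothetical, and the tie-breaking behaviour (the first concept wins) is an assumption, since the text does not define the case \(c_{1}=c_{2}\).

```python
import numpy as np

def predict_class(output_states):
    """Return the 1-based class whose output concept has the largest state.

    Ties are resolved in favour of the first concept (an assumption).
    """
    return int(np.argmax(output_states)) + 1
```

With the values used in the text, `predict_class([0.03, 0.8])` selects class 2.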
FCM automatic construction with Particle Swarm Optimisation
FCM automatic construction endeavours are commonly focused on building the adjacency matrix based either on the available historical raw data or on expert knowledge [19,20,21]. FCM learning approaches can be divided into three categories [11, 22]: Hebbian, population-based, and hybrid approaches that mix the main aspects of Hebbian-based and population-based learning algorithms.
The aim of the Hebbian-based FCM learning approaches is to modify adjacency matrices so that the FCM model either achieves a steady state or converges into an acceptable region for the target system. This approach has not been successful for FCM extensions such as Fuzzy Grey Cognitive Maps [10].
Population-based methods do not need human intervention. They compute, from historical raw data, the adjacency matrices that best fit the sequence of input state vectors (the instances of the dataset). The learning goal of FCM evolutionary learning is to generate an optimal adjacency matrix for modelling the system's behaviour.
Particle Swarm Optimization (PSO) is a bio-inspired, population-based and stochastic optimisation algorithm. The PSO algorithm generates a swarm of particles moving in an n-dimensional search space which must include all potential candidate solutions. In order to train the FCM adjacency matrices, we take into account the \(k^{th}\) particle’s position (a candidate solution or adjacency matrix), denoted as \(\varpi _k=[\varpi _{k_1},\ldots ,\varpi _{k_j}]\), and its velocity, \(v_k=[v_{k_1},\ldots ,v_{k_j}]\). Note that each particle is a potential solution or FCM candidate, and its position \(\varpi _k\) represents the adjacency matrix of the kth FCM candidate [6, 23]. Each particle’s velocity and position are updated at each time step. The position and the velocity of each particle are computed as shown in Eqs. (5) and (6):
where \(U(0,\phi _i)\) is a vector of random numbers from a uniform distribution within \([0,\phi _i],\) generated at each iteration and for each particle. Also, \({\dot{\varpi }}_k\) is the best position of particle k in all former iterations, \(\ddot{\varpi }_k\) is the best position of the whole population in all previous iterations, and \(\otimes\) is the component-wise multiplication.
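A single swarm update can be sketched as follows. Eqs. (5) and (6) are not reproduced in this text, so the code assumes the canonical constricted PSO form, \(v_k \leftarrow \chi\,[v_k + U(0,\phi_1)\otimes({\dot{\varpi }}_k-\varpi_k) + U(0,\phi_2)\otimes(\ddot{\varpi }-\varpi_k)]\) followed by \(\varpi_k \leftarrow \varpi_k + v_k\). The constriction coefficient `chi`, the values of `phi1` and `phi2`, and the clipping of positions to the weight range are assumptions chosen for illustration, not the authors' settings.

```python
import numpy as np

def pso_step(pos, vel, pbest, gbest, rng,
             phi1=2.05, phi2=2.05, chi=0.7298, bounds=(-1.0, 1.0)):
    """One velocity and position update for a whole swarm.

    pos, vel, pbest: arrays of shape (swarm_size, n_weights), one flattened
    adjacency matrix per particle. gbest: array of shape (n_weights,).
    """
    # U(0, phi_i): fresh uniform random vectors for every particle and iteration.
    u1 = rng.uniform(0.0, phi1, size=pos.shape)
    u2 = rng.uniform(0.0, phi2, size=pos.shape)
    # Component-wise products pull each particle toward its own best
    # position and toward the best position of the whole swarm.
    new_vel = chi * (vel + u1 * (pbest - pos) + u2 * (gbest - pos))
    # Assumption: positions are clipped so weights stay in the FCM range.
    new_pos = np.clip(pos + new_vel, *bounds)
    return new_pos, new_vel
```

A full optimiser would call this step repeatedly, evaluating the fitness function after each position update and refreshing `pbest` and `gbest` accordingly.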
The goal of the PSO algorithm is to drive all the particles towards the global optimum of a multidimensional search space. The fitness function used in this research is the complement of the Jaccard similarity coefficient (\({\overline{J}}=(Y\times {\hat{Y}})\setminus J\)). The Jaccard score computes the average of the Jaccard similarity coefficients between pairs of label sets, where the ith pair consists of the ground truth label set and the predicted label set of the ith sample. The complement is needed because the fitness function is minimized. The Jaccard similarity coefficient’s complement is computed as follows in Eq. (7):
The fitness function is evaluated after each particle position update, and is the objective function used to measure how close a given particle is to the global optimum.
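The fitness function can be sketched as below. Eq. (7) is not reproduced in this text, so the code assumes the standard per-sample definition \(J = |Y\cap {\hat{Y}}| / |Y\cup {\hat{Y}}|\), averaged over samples and then complemented. Labels are assumed to be given as binary indicator matrices (one row per sample, one column per class), and treating two empty label sets as a perfect match is an illustrative choice.

```python
import numpy as np

def jaccard_complement(y_true, y_pred):
    """1 minus the sample-averaged Jaccard similarity of two indicator matrices."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    inter = np.logical_and(y_true, y_pred).sum(axis=1)
    union = np.logical_or(y_true, y_pred).sum(axis=1)
    # Assumption: a sample with no true and no predicted labels scores 1.0.
    scores = np.where(union == 0, 1.0, inter / np.maximum(union, 1))
    return 1.0 - scores.mean()
```

A value of 0.0 means that every predicted label set matches its ground truth exactly, which is why the optimiser minimizes this quantity.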
Federated learning
Distributed Artificial Intelligence is the subfield of artificial intelligence that studies the sharing of knowledge between agents in order to solve complex problems, classically via the distribution of tasks or data. Such processes may not be feasible in fields such as finance or health, where the characteristics of the data and the regulations make it impossible to share the data.
Conventional machine learning requires all the data collected on local devices to be stored centrally in a data silo. The goal of federated learning is to build a global model that can be trained on distributed data while preserving data privacy [24]. Federated learning is one of the most recent efforts in secure distributed artificial intelligence, proposed by McMahan et al. [2] and further developed in Konecny et al. [25] and McMahan and Ramage [26]. Some advantages of federated learning are privacy protection and the possibility of solving complex problems with small data samples, as in healthcare [27].
In recent years, there have been several attempts to create federated versions of conventional machine learning algorithms, such as federated linear regression [28,29,30], federated logistic regression [31], federated random forest [32], federated XGBoost [33,34,35], and federated support vector machines [36, 37]. To the best of our knowledge, this is the first work focusing on utilizing FCMs in a blind federated setting.
A centralised federated learning system can be described as follows:

1. The central server delivers a model to each agent. In the initial iteration of this process, the server has built an empty model.

2. The participants train the model with their own private data.

3. Each participant sends the parameters of the model or its gradients to the central server in a private way, usually encrypted.

4. The central server builds a federated model by aggregating the parameters of the individual models.

5. The central server checks whether the termination condition is met, in which case the federated model is finished; otherwise the process goes back to step 1.
The ultimate goal of the federated model is to minimize the total loss (Eq. 8) of all participants computed as follows:
where \(\Psi\) are the model parameters, \({\mathcal {D}}_i\) is the dataset of the participant i, \({\mathcal {L}}^*\) is the loss function of the federated model, \({\mathcal {L}}_i(\cdot )\) is the loss function for each participant in the federation, and \(\kappa _i\) represents the importance (weight) of each participant. It is possible to determine \(\kappa _i\) by several criteria, such as dataset size or accuracy.
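The weighted total loss can be illustrated with a small sketch. Eq. (8) is not reproduced in this text, so the code assumes the usual form \({\mathcal {L}}^*(\Psi ) = \sum _i \kappa _i\, {\mathcal {L}}_i(\Psi , {\mathcal {D}}_i)\), and uses normalised dataset sizes as one possible choice for \(\kappa _i\). Both function names are hypothetical.

```python
def dataset_size_weights(sizes):
    """One possible choice of kappa_i: each participant's share of the data."""
    total = float(sum(sizes))
    return [s / total for s in sizes]

def federated_loss(local_losses, kappas):
    """Weighted sum of the participants' local losses (assumed form of Eq. 8)."""
    if len(local_losses) != len(kappas):
        raise ValueError("one weight is needed per participant")
    return sum(k * loss for k, loss in zip(kappas, local_losses))
```

For instance, two participants holding 100 and 300 samples receive weights 0.25 and 0.75 under this choice, so the larger dataset dominates the total loss.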
The first application of federated learning was the creation of collaborative predictive models using private data on Android mobile phones [26]. In particular, a model in Gboard, the Google Keyboard on Android, was trained to predict the next word or phrase that the user is going to write, based on the preceding text and on other users’ (private) data. In this setup the central server manages the federated model and the communications with the agents, while the participants own their data and train the partial models. In this way, a federated learning system ensures that the distributed model is built in a private environment, since the private data never leave the local agent.
Nevertheless, there are always risks associated with data transmission, such as the possibility of reconstructing the model or the training data from the model parameters. Due to these risks, there is an increasing interest in adding an extra layer of privacy to this information, and many studies use privacy-preserving methods in federated learning, such as Differential Privacy [38], Secure Multi-Party Computation [39] or Homomorphic Encryption [40]. The comparison with other privacy-preserving techniques is outside the scope of this work, which is focused on the construction of a federation process using FCMs and without an initial model. In keeping with the philosophy of federated learning, however, an extra security layer, such as Differential Privacy, could be added at the time of sharing the parameters of the model [41].
Federated learning represents a significant step forward in the privacy-preserving machine learning field. Its practical managerial significance lies in its potential to balance the use of valuable data for business insights with respect for privacy regulations and customer trust. By allowing distributed model training on decentralised data sources while preserving privacy, federated learning offers several managerial benefits:

Collaborative business insights: FL can facilitate collaboration between different business units or partners without sharing sensitive data directly. This fosters knowledge sharing and cross-functional collaboration while maintaining data privacy.

Enhanced data privacy compliance: FL enables organisations to comply with strict data protection regulations such as GDPR. This approach avoids the reputational damage that may result from non-compliance or data leaks.

Cost-efficient AI training: Since data remains on local devices or servers, FL reduces the need for extensive data transfer and centralised storage infrastructure.

Customer trust and brand loyalty: Companies can build trust with their customers by demonstrating a strong commitment to data privacy. This trust can lead to increased customer loyalty and positive brand perception.
In this sense, a practical real-world healthcare FL application would involve a consortium of healthcare institutions or health data owners working together to improve patient care and disease prediction while preserving data privacy. In this scenario, each institution would retain control of its patient data, ensuring compliance with strict privacy regulations like HIPAA and GDPR.
In Section "Federated learning for FCMs", the authors detail the federated learning approach and the proposed methodology for FCMs, which enables the creation of a machine learning model between several agents while all the participants keep their data private.
Methodological proposal
Federated learning for FCMs
The proposed Blind Federated Learning methodology for FCMs is shown in Fig. 2, and can be described as follows:

1. Although the central server has no data, it triggers the Blind Federated Learning process by setting in motion the participants, who own the data used to train the final model. Note that the central server does not send any initial FCM to the participants, and this is one contribution of this research. As far as we know, this is the first federated learning proposal that does not need an initial model, so the central server is not even required for that purpose.

2. Each participant trains its own initial local FCM with its own dataset. The authors have used a PSO algorithm to train the FCMs, and have considered that the dynamics have converged when the difference between two consecutive state vectors is under a tolerance value (\(1\times 10^{-5}\) in these experiments), but this proposal is agnostic to the learning approach and to other considerations.

3. Each participant delivers its model parameters, which in this case are the trained adjacency matrices. If needed, the participants could also send any performance metric required to compute the averaging of the models (see section "Aggregation methods"). Due to the privacy concerns discussed earlier, the parameters may be encrypted using a privacy-preserving method. Finally, the local FCM is stored in the participant’s device.

4. The central server aggregates the parameters of the local models in its device using the appropriate weights. The section "Aggregation methods" gives a detailed description of the different aggregation methods considered by the authors. This process results in the parameters of a federated model.

5. The participants receive these parameters from the central server. They build the next iteration of their local model using the federated model parameters. The authors propose two different federation methods: (a) in the proposal that is closer to the Blind Federated Learning, the local model is just the global model, which in turn was created as the aggregation of all the participants’ models; and (b) an innovative federated learning approach, called Blended Blind Federated Learning, where the new local model is obtained by combining the newly received adjacency matrix with the previous local adjacency matrix.

6. In either case, each participant retrains the new FCM on its local data and sends the parameters (and the required performance metrics) back to the central server to be aggregated once again. Also, at this point the participants use their local data to test the performance of their local model.

7. The central server checks whether the termination condition is met. The authors have chosen to run the federation process for 20 iterations. If the condition is not fulfilled, the process goes back to stage 4.

8. If the termination condition is satisfied, then the federated learning process ends, resulting in a Federated FCM.
The proposed approach deals with the issue of federated learning without the need for an initial model. To the best of the authors’ knowledge, this problem remains unsolved. For this reason, we view this paper as a valuable undertaking.
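The steps above can be sketched as a short simulation. This is only an outline under strong assumptions: `train_local` is a hypothetical stand-in for PSO-based FCM learning (it merely moves a matrix towards a participant-specific target that plays the role of the private data), plain averaging is used for aggregation, and the blended variant is assumed to be a convex combination controlled by a hypothetical parameter `alpha`, since the text does not specify how the received and previous matrices are combined.

```python
import numpy as np

def train_local(start, target, rng, step=0.5):
    """Hypothetical stand-in for local PSO training on private data."""
    if start is None:
        # Blind start: nobody supplies an initial model, so the participant
        # begins from its own random adjacency matrix.
        start = rng.uniform(-1.0, 1.0, size=target.shape)
    return np.clip(start + step * (target - start), -1.0, 1.0)

def blind_federated_learning(targets, rounds=20, blended=False, alpha=0.5, seed=0):
    """Simulate the federation loop; returns the federated matrix and local ones."""
    rng = np.random.default_rng(seed)
    # Step 2: each participant trains its own initial local FCM.
    local = [train_local(None, t, rng) for t in targets]
    for _ in range(rounds):
        # Steps 3 and 4: parameters are shared and aggregated (plain average).
        global_w = np.mean(local, axis=0)
        # Step 5: build the starting point of the next local model.
        if blended:
            # Assumed blending rule: convex combination of global and local.
            starts = [alpha * global_w + (1.0 - alpha) * w for w in local]
        else:
            starts = [global_w for _ in local]
        # Step 6: each participant retrains on its own data.
        local = [train_local(s, t, rng) for s, t in zip(starts, targets)]
    # Step 8: the final aggregation yields the federated FCM.
    return np.mean(local, axis=0), local
```

Only adjacency matrices cross the participant boundary in this sketch; the `targets` that stand in for private data are never shared, which mirrors the privacy argument of the methodology.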
Aggregation methods
An important parameter when defining a federated learning approach is the aggregation method employed by the central server to obtain the federated model.
In this paper, the authors propose three different approaches:

1. Federated averaging. This method performs the aggregation using the arithmetic average [2]. The central server sums the parameters of the different models and divides by the number of participants (or models). This way, the federated model weighs all the participants equally. The parameters of the federated model \(\varpi ^*\) would be computed as shown in Eq. (9):
$$\begin{aligned} \varpi ^*=\frac{1}{n}\cdot \sum _{i=1}^n\varpi _i \end{aligned}$$ (9)
where n is the number of participants and \(\varpi _i\) are the parameters of the local model for participant i.

2. Accuracy-based federated weighted averaging, with the normalized accuracy of each model as the weight. The central server receives not only the individual models, but also the accuracy of each model on a test set for the participant. Then, it averages the models’ parameters using a weighted average with the normalized accuracy of each participant as its weight. Therefore, the aggregation reinforces the participants that contribute the most to the general accuracy. In this case, the parameters of the federated model \(\varpi ^*\) would be computed as follows in Eq. (10):
$$\begin{aligned} \varpi ^* =\sum _{i=1}^n \psi _i \cdot \varpi _i, \end{aligned}$$ (10)
where \(\psi _i\) (Eq. 11) is the weight for participant i, computed as the normalized accuracy:
$$\begin{aligned} \psi _i = \frac{\text {accuracy}_i}{\sum _{j=1}^n\text {accuracy}_j}. \end{aligned}$$ (11)

3. Precision-based federated weighted averaging, with the normalized precision of each model as the weight. Similarly to the previous case, the central server receives both the models and the precision on a test set for each participant, and averages the models’ parameters with a weighted average where the weights are the normalized precision of each participant. This way, the distributed system amplifies the contribution of the participants whose models have larger precision. The parameters of the federated model \(\varpi ^*\) are shown in Eq. (12):
$$\begin{aligned} \varpi ^*=\sum _{i=1}^n \phi _i \cdot \varpi _i \end{aligned}$$ (12)
where \(\phi _i\) (Eq. 13) is the weight for participant i, computed as the normalized precision:
$$\begin{aligned} \phi _i = \frac{\text {precision}_i}{\sum _{j=1}^n\text {precision}_j}. \end{aligned}$$ (13)
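The three aggregation rules of Eqs. (9) to (13) can be sketched with two small functions, since the accuracy-based and precision-based variants differ only in the metric used as the score. The function names are illustrative, and rejecting an all-zero score vector is an assumption made here to avoid a division by zero.

```python
import numpy as np

def federated_average(models):
    """Eq. (9): arithmetic mean of the participants' adjacency matrices."""
    return np.mean(np.stack(models), axis=0)

def weighted_average(models, scores):
    """Eqs. (10)-(13): weighted mean with normalised scores as weights.

    scores holds one accuracy (or precision) value per participant.
    """
    scores = np.asarray(scores, dtype=float)
    if scores.sum() == 0:
        # Assumption: with no positive score the weights are undefined.
        raise ValueError("at least one score must be positive")
    weights = scores / scores.sum()
    # Contract the participant axis: sum_i weights[i] * models[i].
    return np.tensordot(weights, np.stack(models), axes=1)
```

When all participants report the same score, `weighted_average` reduces to `federated_average`, which is a convenient sanity check for an implementation.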
Experimental approach
In all of the following cases we will train two FCM models, using PSO for the optimisation stage, with 20 iterations and a swarm size of 10. The first FCM will have a slope of 2 and use a hyperbolic tangent as activation function, while the second model will have a slope of 5 and a sigmoid activation function.
The first experiment for each dataset is a baseline model covering the case where no distribution is made and all agents build a single model as one agent. We compare these results with those obtained after federated learning to see how our methodology improves models trained only on their individual data (which, in general, have worse performance metrics due to the lack of diverse training data) so that they reach results similar to this baseline model.
The other experiments analyse the different combinations of federation methods (Blind Federated Learning and Blended Blind Federated Learning, described in section "Federated learning for FCMs") with the proposed aggregation methods (federated averaging, accuracy-based, and precision-based, defined in section "Aggregation methods"). The authors compare the average accuracy and precision, computed on a test set, for all participants before and after the federation process, and also with the baseline model.
For these experiments, the authors have tested four different data partitions among the participants. In the first one, the dataset is evenly split among the agents. The remaining three are uneven: one is a random partition, and the other two have sharp differences in which several agents have very small datasets. This way, we can test a hypothetical case where a group of agents want to share secure information and a private model even when one or more of the agents have much less information to share than the rest. Moreover, there are no class balancing mechanisms in the partitioning of the data, and therefore the experiments also test the cases where the percentage of positive samples is noticeably dissimilar.
As usual, each participant’s dataset has been split into train and test sets, so that a validation dataset is available to compute the performance metrics.
The results will be shown in tables where the rows are the metrics for each participant, and the columns are the following: the size or percentage of the original dataset that each participant has, the percentage of positives in that participant’s dataset, and the accuracy and the precision on a test set before and after the Blind Federated Learning process.
Experiment 1. Breast Cancer dataset
In this experiment the authors use the Breast Cancer Wisconsin dataset, made publicly available [42] at the UC Irvine Machine Learning Repository. As a baseline model with no distribution, the FCM with slope 2 and hyperbolic tangent activation function achieves an accuracy on a test set of 0.9211 and a precision of 0.7742, as seen in Table 1, while the FCM with slope 5 and sigmoid function has an accuracy of 0.8246 and a precision of 0.5714, see Table 2.
Our first Blind Federated Learning experiment consists of a distributed system trained with the methodology that is closer to the Blind Federated Learning approach, as described in section "Methodological proposal", using federated averaging (the arithmetic average over the participants) as the aggregation method. Each of the five participants is provided with a subset of the breast cancer dataset and trains its initial FCM using these data. The results of this first experiment can be found in Tables 3 and 4, where we see that the Blind Federated Learning process improves the accuracy in every case, although the precision does not increase for the most uneven data distributions.
The next experiment uses the Blended Blind Federated Learning process, also described in section "Methodological proposal". As in the previous experiment, the authors use the arithmetic average over the participants as the aggregation method. The results are described in Tables 5 and 6. As in the previous case, the accuracy increases after the Blind Federated Learning process in all cases, and with this methodology the precision also improves for all partitions.
It is also worth noticing that, with an even partition, the averaged accuracy of the five models is similar to (or even better than) that of a single model using the full dataset (0.9211 for the single model vs. 0.9296 for the federated models in the even case with slope 2 and hyperbolic tangent activation function). Moreover, in the case of the uneven splits, the precision is much better than that of the baseline model (0.7742 vs. 0.8733 for the first random split).
The next experiment uses the accuracy-based aggregation in order to improve the accuracy of the model, in combination with the Blind Federated Learning methodology. As before, there are four different partitions to understand how this methodology deals with participants of different sizes. Tables 7 and 8 collect the results of this experiment, in which the accuracy improves after the federated learning execution as expected, but the precision decreases in all uneven data distributions.
Similarly to the former experiment, the next one uses the accuracy-based aggregation and the four different partitions outlined before, but in this case for a Blended Blind Federated Learning system instead of the conventional one. Tables 9 and 10 show how the Blended Blind Federated Learning methodology improves the results of the previous experiment, in the sense that not only is the accuracy better after the federated learning execution, but so is the precision in all cases but one.
Our next set of experiments deals with the precision-based aggregation method in order to try to improve the precision of the model, since the accuracy improves in every previous test. The results for the Blind Federated Learning methodology, using the same four partitions, can be found in Tables 11 and 12. They show the usual improvement of the accuracy of the model after federated learning, but also an increase in the precision of the model in all cases but one.
Finally, the last experiment is similar to the previous one: the Blended Blind Federated Learning approach, with the precision-based aggregation method and the usual four partitions. The results are described in Tables 13 and 14 and show that the accuracy keeps improving even when the weights of the aggregation method depend on the precision, but the precision only increases in two out of four cases.
Experiment 2. Adult dataset
The dataset for the second experiment will be the adult dataset, with census data from 1994 containing demographic features of adults and their income, from the US Census Bureau, and publicly available [42] at the UC Irvine ML Repository. The baseline FCM model (Table 15) reaches an accuracy of 0.8611 and a precision of 0.9487 in the test set for the FCM model with slope 2 and hyperbolic tangent activation function, and an accuracy of 0.9222 and a precision of 0.9257 (Table 16).
The Blind Federated Learning approach combined with the arithmetic-average aggregation (constant weights across participants) shows a very different behaviour for the even and uneven splits of the second dataset (Tables 17 and 18). The federated learning process improves the performance metrics for the even split compared with the model without federation, but for all the uneven splits the accuracy is much lower because of the amount of data and the target imbalance. Nevertheless, the federation process still improves the accuracy in the uneven splits, but not the precision, which takes higher values than the accuracy.
Next, the Blended Blind Federated Learning with constant weights shows (Tables 19 and 20) results similar to those of the Blind Federated Learning, with high accuracy and precision for the even split and lower accuracy for the uneven splits, and an improvement after the federation in the accuracy but not in the precision.
The results for the Blind Federated Learning using an aggregation based on accuracy (Tables 21 and 22) are similar to those of the two previous experiments, with limited accuracy for uneven splits and no improvement in the precision for the most extreme uneven splits.
Tables 23 and 24 show the results for the Blended Blind Federated Learning with accuracy-based weights, which are very similar to those obtained with Blind Federated Learning and the same aggregation method.
The precision-based aggregation method should drastically improve the precision, a metric that, as we have seen in the previous experiments, is less likely to be improved by the federation process. This is the behaviour that we see in the experiments with this aggregation method, as shown in Tables 25 and 26, where not only does the accuracy improve for all splits, but so does the precision for all but one, the even split.
In the case of the Blended Blind Federated Learning with precision-based aggregation (Tables 27 and 28), the accuracy is improved in all cases, and the precision is also improved but only in the first case of the even split.
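The observation that precision can sit above accuracy on imbalanced data follows from how the two metrics are defined. The sketch below computes both from confusion-matrix counts. The counts are hypothetical and chosen only to illustrate one way the pattern can arise: a model that rarely predicts the positive class, but is usually right when it does.

```python
def accuracy(tp, fp, tn, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp, fp):
    """Share of positive predictions that are correct (0.0 if none are made)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical counts: few positive predictions, many missed positives.
tp, fp, tn, fn = 10, 1, 50, 39

acc = accuracy(tp, fp, tn, fn)   # 60 correct predictions out of 100
prec = precision(tp, fp)         # 10 correct out of 11 positive predictions
```

In this toy case the accuracy is 0.6 while the precision is about 0.91, so an aggregation step that raises accuracy by recovering missed positives does not necessarily raise precision.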
Discussion
The different experiments show that, even in the most imbalanced cases, the federated learning approach improves the average accuracy of the models. FL increases the performance of the models while letting the participants collaborate without exposing their private data.
Nevertheless, there are differences between the efficiency of the models after the distinct federation methodologies. In general terms, the Blind Federated Learning method has lower precision than the Blended counterpart with the same aggregation approach in two out of three datasets used, while in the third one the results are quite similar for all aggregations. For the breast cancer dataset the results show that, in the case of the Blind Federated Learning, the averaged precision of the models does not improve after the federation process in two out of four examples with different imbalanced data (see Tables 3 and 4), while in all cases of the new methodology the averaged precision increases (see Tables 5 and 6).
A similar reasoning can be applied to the accuracy-based aggregation methods. In this case, as mentioned before, the accuracy improves in all cases after the federated learning process, independently of the methodology used. Nevertheless, as in the previous example, the Blind Federation process does not increase the model precision in three out of four experiments, always in the imbalanced cases (see Table 7). Meanwhile, for the Blended Federated Learning procedure, the precision is increased in all cases but one, the second most imbalanced one (see Table 9).
Given the difficulties in improving the average precision of the models, the authors test whether the precision-based aggregation method can improve the precision of the models while maintaining the accuracy levels post-federation. The results provide two insights. Firstly, the performance of the model regarding accuracy is improved in all cases, just like in the previous experiments. Secondly, the precision is improved in most of the experiments with this aggregation method.
The Blind Federated Learning process increases the averaged precision of the models in all cases but one, the most imbalanced test (see Tables 11 and 12). Unlike in the rest of the experiments, which use different aggregation methods, here the Blind methodology performs better than the Blended one, whose averaged precision only increases in two out of four tests (see Tables 13 and 14).
Finally, the authors benchmarked the different methodologies and aggregation methods with the initial case where no distribution is made, to understand if the accuracy and precision metrics of the distributed problems are similar to a conventional problem.
The performance of an FCM with only one agent and trained using the full dataset is shown in Tables 1, 2, 15 and 16. The accuracy and precision are similar to the averaged metrics of the Federated models using the new methodology, with all aggregation methods, in the case of balanced datasets.
In the case of the Blind Federated Learning approach, we can see that the absolute performance values for the balanced datasets are worse than using the Blended Blind Federated Learning procedure: 0.9123 for the averaged accuracy and 0.6981 for the averaged precision with the aggregation using constant weights, 0.9209 for the averaged accuracy and 0.6981 for the averaged precision with the accuracy-based aggregation, and 0.9123 for the averaged accuracy and 0.7137 for the averaged precision with the precision-based aggregation, in the case of the FCM with slope 2 and hyperbolic tangent activation function.
Clearly, the performance metrics for the imbalanced cases are not comparable to those of the non-distributed problem, since the difference in the amount of information each participant holds has to be taken into account.
Conclusions
In this research, the authors propose two innovative methodologies to apply federated learning to FCMs, in order to take advantage of the benefits of this new paradigm for Distributed Artificial Intelligence that allows the sharing of private data in a secure way to train a sophisticated machine learning model.
Both methods show an improvement of the averaged accuracy post-federation in all experiments performed, with both balanced and imbalanced data. In the balanced case, we can see that both the accuracy and the precision are comparable to the performance metrics of the non-distributed case, that is, a single FCM trained with all the available data.
Finally, we compare the two proposals, Blind Federated Learning and Blended Blind Federated Learning. In the Blind approach, the global model is used as the new local model. In the Blended approach, the new local model is obtained by averaging the parameters of the global model with those of the previous local model. The Blended proposal generally performs better across all experiments, except when a precision-based aggregation is used.
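The difference between the two local update rules can be sketched in a few lines of Python. The function names and the `alpha` parameter are illustrative assumptions; `alpha=0.5` corresponds to the plain average of global and previous local parameters described above.

```python
def blind_update(local_w, global_w):
    """Blind rule: the global model replaces the local model."""
    return [row[:] for row in global_w]


def blended_update(local_w, global_w, alpha=0.5):
    """Blended rule: mix the global model with the previous local model.

    alpha is the share given to the global model (assumed 0.5 for a plain average).
    """
    return [
        [alpha * g + (1 - alpha) * l for g, l in zip(g_row, l_row)]
        for g_row, l_row in zip(global_w, local_w)
    ]


# Hypothetical 1x2 slices of a local and a global FCM weight matrix.
local_w = [[0.0, 0.2]]
global_w = [[0.0, 0.6]]

new_blind = blind_update(local_w, global_w)      # copy of the global weights
new_blended = blended_update(local_w, global_w)  # midpoint of local and global
```

The blended rule keeps part of what each participant learned locally, which is one plausible reason it behaves differently from the blind rule on uneven partitions.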
Also, an important benefit to the use of FCM for the federated learning approach is that there is no need for the definition of an initial model as in the case of conventional federated learning with neural networks, where an additional central server usually describes the architecture of the network that every participant will train. In this case, every participant trains the FCM without any predefined model from the server, making it a blind method. The presented approach addresses the challenge of federated learning without the requirement of an initial model, constituting a novelty in the field.
In the course of this research, we have proposed a methodology for federated learning without an initial model, primarily relying on FCMs. While this research has addressed the challenges associated with blind federated learning, an open research question remains unexplored: the development of federated models based on AI architectures different from FCMs. When dealing with AI models other than FCMs, the blind approach needs to be reformulated, as these models lack the specific characteristics of FCMs.
Availability of data and materials
The datasets analysed during the current study are available in the UCI repository, https://archive.ics.uci.edu/ml/datasets.php.
References
Ahmed U, Lin JCW, Srivastava G. 5G-empowered drone networks in federated and deep reinforcement learning environments. IEEE Commun Stand Mag. 2021;4:55–61.
McMahan B, Moore E, Ramage D, y Arcas BA. Federated learning of deep networks using model averaging. ArXiv. 2016. https://arxiv.org/abs/1602.05629.
Chen J, Xue J, Wang Y, Huang L, Baker T, Zhou Z. Privacy-preserving and traceable federated learning for data sharing in industrial IoT applications. Exp Syst Appl. 2023;213: 119036.
Kosko B. Fuzzy cognitive maps. Int J Man Mach Stud. 1986;24(1):65–75.
Nápoles G, Jastrzebska A, Mosquera C, Vanhoof K, Homenda W. Deterministic learning of hybrid fuzzy cognitive maps and network reduction approaches. Neural Netw. 2020;124:258–68.
Salmeron JL, Froelich W. Dynamic optimization of fuzzy cognitive maps for time series forecasting. Knowl Based Syst. 2016;105:29–37.
Salmeron JL, Ruiz-Celma A. Synthetic emotions for empathic building. Mathematics. 2021;9:701.
Bueno S, Salmeron JL. Benchmarking main activation functions in fuzzy cognitive maps. Exp Syst Appl. 2009;36(3):258–68.
Lopez C, Salmeron JL. Modeling maintenance projects risk effects on ERP performance. Comput Stand Interfaces. 2014;36(3):545–53.
Salmeron JL, Palos P. Uncertainty propagation in fuzzy grey cognitive maps with Hebbian-like learning algorithms. IEEE Transact Cybern. 2019;1:211–20.
Napoles G, Salmeron JL, Vanhoof K. Construction and supervised learning of long-term grey cognitive networks. IEEE Transact Cybern. 2021;51(2):686–95.
Papakostas GA, Boutalis YS, Koulouriotis DE, Mertzios BG. Fuzzy cognitive maps for pattern recognition applications. Int J Pattern Recogn Artif Intell. 2008;22(8):1461–86.
Papakostas GA, Koulouriotis DE. Classifying Patterns Using Fuzzy Cognitive Maps. In: Glykas M, editor. Fuzzy Cognitive Maps Studies in Fuzziness and Soft Computing. Berlin: Springer; 2010. p. 291–306.
Szwed P. Classification and feature transformation with fuzzy cognitive maps. Appl Soft Comput. 2021;105: 107271.
Wu K, Yuan K, Teng Y, Liu J, Jiao L. Broad fuzzy cognitive map systems for time series classification. Appl Soft Comput. 2022;128: 109458.
Ramirez-Bautista JA, Huerta-Ruelas JA, Kóczy LT, Hatwágner MF, Chaparro-Cárdenas L, Hernández-Zavala A. Classification of plantar foot alterations by fuzzy cognitive maps against multilayer perceptron neural network. Biocybern Biomed Eng. 2020;40:404–14.
Baykasoğlu A, Gölcük I. Alpha-cut based fuzzy cognitive maps with applications in decision-making. Comput Ind Eng. 2021;152: 107007.
Papakostas GA, Koulouriotis DE, Polydoros AS, Tourassis VD. Towards Hebbian learning of fuzzy cognitive maps in pattern classification problems. Appl Soft Comput. 2012;39(12):10620–9.
Napoles G, Salmeron JL, Vanhoof K. Construction and supervised learning of long-term grey cognitive networks. IEEE Transact Cybern. 2019;51(2):686–95.
Salmeron JL, Mansouri T, Moghadam MRS, Mardani A. Learning fuzzy cognitive maps with modified asexual reproduction optimisation algorithm. Knowl Based Syst. 2019;163:723–35.
Vanhoenshoven F, Napoles G, Froelich W, Salmeron JL, Vanhoof K. Pseudoinverse learning of fuzzy cognitive maps for multivariate time series forecasting. Appl Soft Comput. 2020;95: 106461.
Salmeron JL, Ruiz-Celma A, Mena A. Learning FCMs with multi-local and balanced memetic algorithms for forecasting drying processes. Neurocomputing. 2017;232:52–7.
Salmeron JL, Rahimi SA, Navali AM, Sadeghpour A. Medical diagnosis of rheumatoid arthritis using data driven PSO-FCM with scarce datasets. Neurocomputing. 2017;232:65–75.
Zhu H, Xu J, Liu S, Jin Y. Federated learning on non-IID data: a survey. Neurocomputing. 2021;465:371–90.
Konecný J, McMahan B, Ramage D, Richtárik P. Federated optimization: distributed machine learning for on-device intelligence. 2016. https://arxiv.org/abs/1610.02527.
McMahan B, Ramage D. Google AI Blog. 2017. https://ai.googleblog.com/2017/04/federatedlearningcollaborative.html.
Salmeron JL, Arevalo I. A privacy-preserving, distributed and cooperative FCM-based learning approach for cancer research. In: Ciucci D, editor. International joint conference on rough sets. La Habana: Springer; 2020.
Karr AF, Lin X, Sanil AP, Reiter JP. Privacy-preserving analysis of vertically partitioned data using secure matrix products. J Off Statist. 2009;25:125–38.
Gascon A, Schoppmann P, Balle B, Raykova M, Doerner J, Zahur S, Evans D. Privacy-preserving distributed linear regression on high-dimensional data. Proc Priv Enhanc Technol. 2017;2017(4):345–64. https://doi.org/10.1515/popets20170053.
Zhang Y, Wei S, Liu S, Wang Y, Xu Y, Li Y, Shang X. Graph-regularized federated learning with shareable side information. Knowl Based Syst. 2022;257: 109960.
Hardy S, Henecka W, Ivey-Law H, Nock R, Patrini G, Smith G, Thorne B. Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption. ArXiv. 2017. https://arxiv.org/abs/1711.10677.
Liu Y, Liu Y, Liu Z, Zhang J, Meng C, Zheng Y. Federated forest. ArXiv. 2019. https://arxiv.org/abs/1905.10053.
Cheng K, Fan T, Jin Y, Liu Y, Chen T, Yang Q. Secureboost: A lossless federated learning framework. ArXiv. 2019. https://arxiv.org/abs/1901.08755.
Fang W, Chen C, Tan J, Yu C, Lu Y, xilinx Wang L, Wang L, Zhou J, Alex X. A hybriddomain framework for secure gradient tree boosting. ArXiv. 2020. https://arxiv.org/abs/2005.08479.
Xie L, Liu J, Lu S, Chang TH, Shi Q. An efficient learning framework for federated xgboost using secret sharing and distributed optimization. ArXiv. 2021. https://arxiv.org/abs/2105.05717.
Gu B, Dang Z, Li X, Huang H. Federated doubly stochastic kernel learning for vertically partitioned data. Proceedings of the 26th ACM SIGKDD International conference on knowledge discovery and data mining. 2020.
Yu H, Vaidya J, Jiang X. Privacy-preserving SVM classification on vertically partitioned data. In: PAKDD. 2006.
Phong LT, Aono Y, Hayashi T, Wang L, Moriai S. Privacy-preserving deep learning via additively homomorphic encryption. IEEE Transact Inform Forensics Sec. 2017;13(5):1333.
Berry C, Komninos N. Efficient optimisation framework for convolutional neural networks with secure multiparty computation. Comput Sec. 2022;117: 102679.
Halder S, Newe T. Enabling secure time-series data sharing via homomorphic encryption in cloud-assisted IIoT. Future Gener Comput Syst. 2022;133:351–63.
Arévalo I, Salmeron JL. A chaotic maps-based privacy-preserving distributed deep learning for incomplete and non-IID datasets. 2023.
Dua D, Graff C. UCI machine learning repository. 2017. http://archive.ics.uci.edu/ml.
Acknowledgements
Prof. Salmeron's research was kindly supported by the project Artificial Intelligence for Healthy Aging (Convocatoria 2021 – Misiones de I+D en Inteligencia Artificial: Inteligencia Artificial distribuida para el diagnóstico y tratamiento temprano de enfermedades con gran prevalencia en el envejecimiento, exp.: MIA.2021.M02.0007) led by Capgemini Engineering.
Funding
The authors declare that they have no funding to disclose.
Author information
Contributions
All authors collaborated equally to the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Salmeron, J.L., Arévalo, I. Blind Federated Learning without initial model. J Big Data 11, 56 (2024). https://doi.org/10.1186/s4053702400911y
DOI: https://doi.org/10.1186/s4053702400911y