
Research in computing-intensive simulations for nature-oriented civil-engineering and related scientific fields, using machine learning and big data: an overview of open problems


This article presents a taxonomy and serves as a repository of open problems in computing for numerically and logically intensive problems across the disciplines that must synergize for the best performance of simulation-based feasibility studies on nature-oriented engineering in general and civil engineering in particular. Topics include, but are not limited to: nature-based construction, genomics supporting nature-based construction, earthquake engineering and other geophysical disaster-prevention activities, as well as studies of processes and materials of interest for the above. In all these fields, the problems discussed generate huge amounts of Big Data and are characterized by mathematically highly complex Iterative Algorithms. In the domain of applications, it is stressed that problems could be made less computationally demanding if the number of computing iterations is reduced (with the help of Artificial Intelligence or Conditional Algorithms), or if each computing iteration is made shorter in time (with the help of Data Filtration and Data Quantization). In the domain of computing, it is stressed that computing could be made more powerful if the implementation technology is changed (Si, GaAs, etc.), or if the computing paradigm is changed (Control Flow, Data Flow, etc.).


This introduction concentrates first on the general issues of fundamental importance for computer engineering and then turns to specific issues of importance for the utilization of computer engineering in nature-based engineering supported by Artificial Intelligence applied to Big Data problems, with a focus on civil engineering that utilizes nature-based construction.

This includes: Simulation of natural processes, genomics for new species, earthquake and geophysical disaster prevention engineering, and engineering of new materials.

The above-mentioned applications generate massive Big Data and use complex Iterative Algorithms. Techniques do exist to make the problems less computationally demanding but, no matter how effective these are, computers have to be made more powerful, by changing the Implementation Technology or the Computing Paradigm. These issues are elaborated in Fig. 1.

Fig. 1

Avenues of making the problems less demanding and making the computers more powerful (hybrid solutions are also possible, while the term “powerful”, in its wider sense, has four dimensions: speed, size, precision, and power, in the narrow sense)

General issues

In general, a major problem nowadays is the automatic generation, validation, and integration of scientific research hypotheses into a larger body of scientific knowledge, by applying the scientific methods to the measurements and the assumptions already collected [36].

Another significant problem is the experiment design that would be optimal for verifying a particular set of hypotheses. A high-profile example of such a challenge is the design of the experiment that would be capable of confirming or disproving the quantum nature of gravity—eventually uniting the quantum field theory with the general theory of relativity.

However, many smaller domains contain plenty of cases that would gain a lot from a computational way of designing such an experiment. For example, in the solid-electrolyte research field, several theoretical models explaining the possibility of ion current are waiting to be verified by experimentalists. An engineering application of a computational approach capable of ensuring an optimal design would be the automated construction of devices and production pipelines capable of producing specific outputs (products, material objects, etc.) that meet a set of requirements. The research aimed at the outlined goals is unfolding in several directions, implying advances in many fields of computer science. To the best of our knowledge, there has been no direct attempt to address the outlined problems as a whole.

Although there is a strong community aiming at developing Artificial General Intelligence (AGI), their goal is still a bit different. Mainstream AI research mainly aims at solving local tasks, exploiting the power of machine intelligence to solve a particular problem (e.g., AlphaFold). The main branches of research imply the development of:

(1) Optimization algorithms capable of dealing with large combinatorial spaces and finding optimum solutions in such spaces, defined by a Pareto-optimal frontier of a given multivariate function and a set of constraints. As a target, those algorithms should deal with: (a) high-dimensional real-life systems defined by a computer simulation or experimental facilities, as well as (b) concepts/hypotheses that emerge out of measurements and interactions with human peers.

(2) A mathematical, conceptual modeling language capable of describing an abstract model of a given domain, together with a set of transformational heuristics that allow analysis, inference, and composition of new conceptual models given experimental pieces of evidence.

(3) A language model capable of communicating with human peers about the representation of given experimental domain measurements, conceptual model representations, plans, and actions related to the integrated simulation/experimental environments.

(4) Researchers at the Google AI Language group presented BERT (Bidirectional Encoder Representations from Transformers) for deep bidirectional representations from unlabeled text, obtained by jointly conditioning on both left and right context in all layers [11]. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. It is one powerful example of transformers: deep learning models that adopt the mechanism of self-attention, differentially weighting the significance of each part of the input data. Today, transformer models are used primarily in the fields of computer vision and natural language processing (NLP).
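Item (1) can be made concrete with a small sketch: given a finite set of candidate solutions evaluated on two objectives (minimization assumed; the candidate values are invented for illustration), the Pareto-optimal frontier is simply the subset not dominated by any other candidate:

```python
import numpy as np

def pareto_front(points):
    """Return the non-dominated subset of a set of points.

    A point dominates another if it is no worse in every objective
    and strictly better in at least one (minimization assumed)."""
    points = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(points):
        dominated = False
        for j, q in enumerate(points):
            if j != i and np.all(q <= p) and np.any(q < p):
                dominated = True
                break
        if not dominated:
            keep.append(i)
    return points[keep]

# Toy bi-objective trade-off (e.g., cost vs. error), values illustrative:
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
front = pareto_front(candidates)
# (3.0, 4.0) is dominated by (2.0, 3.0); the other three points survive.
```

Real instances replace the candidate list with the outputs of a simulation or an experimental facility, and the quadratic scan with a scalable frontier-search algorithm.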

A concise state of the art of the outlined branches is discussed next:

(1) Deep learning models have demonstrated astonishing breakthroughs in dealing with large combinatorial problems. Among the most prominent examples of deep learning applied to complex interaction tasks are the DeepMind AlphaGo and AlphaZero systems [44, 45]. AlphaGo, developed to play the game of Go, defeated a professional Go player in a match in 2016, and its successors have since been extended to games such as chess and shogi. AlphaFold is a deep learning system developed by DeepMind to predict the 3D structure of proteins [27]. The system accurately predicted the 3D structure of proteins in a recent competition and is seen as a major breakthrough in the field of protein structure prediction. MuZero [41] is another deep learning system developed by Google DeepMind, designed to play a wide variety of games without any prior knowledge of the rules; it has been shown to achieve strong performance across many of them. Remarkably, the latter system is designed to have an implicit trainable representation, as well as memory, suitable for asking/answering questions about the external environment in order to make long-term predictions. Another notable example in this field is GFlowNet, a deep learning flow-network-based generative method that can turn a given positive reward into a generative policy that samples with a probability proportional to the return [3].

(2) A mathematical field of research focusing on the development of conceptual foundations capable of covering a great variety of domains is rooted in the latest research in category theory [5] and topos theory [30]. Categories are notable mathematical objects that have already found many practical applications in computer science [19], deep learning [46], and modeling [55]. Such formalism allows introspection, resulting in the development of various modeling theories [31] that conceptualize representation transformations for a variety of needs.

(3) Recent attempts to design deep learning models capable of reasoning through a series of deductive steps have demonstrated mixed results. On the one hand, many models can perform simple reasoning tasks such as basic arithmetic; on the other hand, current neural models struggle with the more complex reasoning required for tasks such as solving algebraic equations. This area represents a long-standing challenge for artificial intelligence and deep learning [33, 34]. Several attempts have been made to design models that use deep learning to represent and reason through a series of deductive steps [51]. Using these models, the authors successfully solve simple algebraic and geometric problems from primary-school math. The authors also demonstrate the model's ability to solve complex equation systems and logic puzzles requiring a higher level of reasoning [56].

(4) Another possible avenue of research implies the use of a newly emerging programming paradigm based on Graph Manipulation. This is best explained using an example: assume that a program should predict the ability of a person to repay a loan on time. A common method is to look at the credit history of all of the associated people. However, a more realistic program would be based on many parameters that better characterize a person's ability to repay a loan, such as the place of residence, the car owned, the profile of expenses, etc. Suppose that we would like to evaluate a new applicant using the above. For that purpose, we can use a special type of machine learning algorithm, designed to operate over the network of connections. Such algorithms, which learn how to evaluate the characteristics of future customers, are very common today and are termed Graph Neural Networks (GNNs). GNN algorithms are commonly used in numerous applications, such as detecting anomalies in systems, identifying security attacks, and many more.
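The loan-evaluation example above can be caricatured in a few lines: one round of neighborhood aggregation, the basic operation of a GNN. The graph, features, and weights below are invented toy values, not a trained model:

```python
import numpy as np

def message_passing_layer(features, adjacency, weight):
    """One round of mean-neighbor aggregation followed by a linear map
    and a nonlinearity: the basic building block of a GNN."""
    deg = adjacency.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                      # avoid division by zero
    neighbor_mean = adjacency @ features / deg
    combined = np.concatenate([features, neighbor_mean], axis=1)
    return np.tanh(combined @ weight)

# Toy graph: 4 applicants; edges model social/financial connections.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.array([[0.9], [0.2], [0.5], [0.7]])   # e.g., a credit-history score
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 1))                  # learned during training in practice
scores = message_passing_layer(X, A, W)      # one embedding per applicant
```

In a real system, several such layers are stacked and the weights are trained against known repayment outcomes; the point here is only that each node's evaluation mixes its own features with those of its network neighbors.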

An artificial neural network (ANN) for artificial vision has been suggested in [26], where an ANN was used to facilitate image reconstruction, which could also be of importance for NBC. Images are formed using fabricated compound eyes, inspired by structures found in the compound eyes of different insects. This ANN consists of several layers, depending on which insect's eye is used for biometrics; each layer performs a complex operation on a layered image. The layers' nodes have different weights, determined during the training cycle. The layers are linked, and some are cross-linked to the next layer's nodes, with signal processing performed in the forward direction. The architecture of such an ANN could overcome current limitations in the speed and accuracy of vision systems based on the optical apparatus of the compound eye. Also, this type of ANN could resolve imaging problems in the presence of large background noise, enabling imaging in low-light illumination, down to the single-photon level, which is an important feature when mimicking the vision systems of insects known to be able to see in the "dark". Such an ANN requires large amounts of data if higher resolutions are needed. Therefore, the best-suited methods of computing in applications of such ANNs are those that have been proven effective for Big Data processing using Iterative Algorithms.

The know-how of all these examples could be utilized for the treatment of engineering problems like nature-based construction, data mining from genomics, earthquake engineering, and the discovery of new materials. Machine Learning can considerably decrease the number of iterations in a detailed simulation process. However, the limitations of Machine Learning should be well understood, especially when Big Data are involved [21].

Specific issues

Before a complex project in civil engineering is launched, a proper feasibility study has to be performed, to show that the project is worth the effort, to demonstrate what its positive effects will be, qualitatively and quantitatively, but also to shed light on possible problems and to determine the research efforts and time scales relevant for solving them.

Such simulations are nowadays performed using the classical Control Flow machines (multiprocessors or multicomputers) and may take months, years, or even decades (if extremely complex algorithms and extremely Big Data are involved) to complete. So, the natural question to ask is if computing alternatives do exist, and what they are.

These days, the computing infrastructure could be based on four different paradigms for computing and four related programming models. Some of the paradigms and models are well established, while others are still in their development phases.

The four paradigms are: Control Flow (MultiCores, as in the case of Intel, and ManyCores as in the case of NVidia), Data Flow (Fixed ASIC-based, as in the case of Google TPU, and flexible FPGA-based, initially as in the case of Maxeler DFE and more recently as in the case of many other vendors), Diffusion Flow (as in the case of IoT, Internet of Things, and WSNs, Wireless Sensor Networks), and Energy Flow (as in the case of BioMolecular and QuantumMechanical computing).

For details of each mentioned computing paradigm, the interested reader is referred to the references [1, 2, 8, 16, 43, 49, 52].

Each one of the mentioned paradigms is characterized by different capabilities in the major domains of computing hardware: (a) Speed, (b) Power, (c) Size, and (d) Potential for the highest precision. As for the domain of computing software, each related programming model has its own peculiarities that impact the ease of programming.

Each one of the mentioned paradigms is best suited for its own set of problems.

Some paradigms are best used to serve as hosts, while others are best used to serve as accelerators attached to the host. However, they are utilized most effectively through a proper combination, meaning that each one of the many different threads of a complex simulation process should be run using the paradigm that is best suited for it. Excellent articles that shed more light onto the issues covered in our research are [20, 50, 54].

This article first defines the problem. Then, in the second part, it presents the pros and cons of each mentioned paradigm. Third, it discusses the chip-level infrastructure that combines the four presented paradigms and enables their synergy in the application domain. Fourth, it presents the four problem domains that may need several paradigms to cooperate in a number of possible ways that enable them to synergize. Fifth, it sheds more light on a number of specific problems that are extremely computing-intensive. Finally, it concludes with the open research problems that need immediate attention.

Comparison of four computing paradigms

The Control Flow paradigm was initiated from the research of von Neumann. It is best suited for transactional processing and could be effectively used as the host in hybrid machines that combine all the above mentioned paradigms.

In the case when a Control Flow MultiCore machine is used as a host, the transactional code is best run on the Control Flow host, while the other types of problems are best run on accelerators based on other types of paradigms. In the case when the code operates on data organized in 2D, 3D, or structures of higher dimensions, a better level of acceleration could be achieved by a Control Flow ManyCore accelerator. The programming model is relatively easy to comprehend. Speed, Power, Size, and Potential for high precision of Control Flow machines are well understood [9].

The Data Flow paradigm was inspired by the research of Richard Feynman, and is based on the fact that computing is most effective if data, during the computational process, are being transferred over infinitesimal distances (or not transferred at all), as in the case of computing based on execution graphs.

Compared with Control Flow, the Data Flow approach brings speedups, power savings, a smaller size of machinery, and larger potential for higher precision, but utilizes a more complex programming model. The model could be made less complex if lifted to higher levels of abstraction, but in that case a part of the claimed advantages could disappear.

The Diffusion Flow paradigm is based on research in massive parallelism (IoT) and is also meant for such systems enhanced with sensors (WSNs). One intrinsic characteristic of this approach is a large areal or geographical coverage, which means that it is theoretically impossible, during the computing process, to move data over small distances. This means that the speed-up-related observations of Richard Feynman cannot be utilized, so the processing has to be based on another concept: processing while transferring. Some level of processing while transferring is absolutely necessary, typically for data reduction or for other kinds of pre-processing; this is the reason for using the term "diffusion". Pre-processing for data reduction or other purposes could be done during the "diffusion" of the collected, possibly Big-Data-scale, measurements towards the host, to make them easier to use.
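A toy sketch of such processing-while-transferring (block size and signal are illustrative assumptions): each node forwards block averages instead of raw samples, so the host receives only a fraction of the original traffic:

```python
import numpy as np

def reduce_while_forwarding(samples, block=8):
    """Toy in-network pre-processing: a sensor node averages blocks of
    raw readings before forwarding them, cutting traffic toward the
    host by a factor of `block` (the factor is illustrative)."""
    samples = np.asarray(samples, dtype=float)
    usable = len(samples) - len(samples) % block   # drop a ragged tail
    return samples[:usable].reshape(-1, block).mean(axis=1)

raw = np.sin(np.linspace(0, 4 * np.pi, 160))       # 160 raw sensor readings
forwarded = reduce_while_forwarding(raw, block=8)  # 20 values reach the host
```

Real deployments would use application-specific reductions (filtering, event detection, compression) rather than plain averaging, but the structure is the same: the data shrink as they diffuse toward the host.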

If the energy is scavenged, the power efficiency could be extremely high, so the pre-processing by IoT or WSN would make the host burn less energy during the final processing at the host.

In addition, the size of the machinery is negligible in the case of IoT and WSNs, but so, unfortunately, is the potential for the highest precision. If much higher precision is needed, another paradigm has to be sought. A number of related programming models have evolved for IoT and WSNs since the initial PROTO research at MIT. These models could be fairly complex and have to be mastered properly, which could be a challenge.

The Energy Flow paradigm is nowadays meant only for the acceleration of algorithms that are not well suited for any of the paradigms mentioned so far. No matter whether the BioMolecular or the QuantumMechanical approach is used, the processing is based on energy transformations, and the corresponding programming model has to respect the intrinsic essence of the utilized approach. Only in such a case would the best possible performance be achieved.

For a specific set of algorithms, the speedup could be enormous, the needed energy could be minimal, the size could be small enough for a great number of applications, while the potential for precision could be beyond anything achievable today if the QuantumMechanical approach is used. If the BioMolecular approach is used, potential exists for effective simulations in some fields, like nature-based construction. The related programming models for both sub-paradigms are on the rise.

Possible architecture of a supercomputer on a chip

For the current state of the technology, over 100 Billion Transistors (BTr) on a chip, or over a Trillion transistors (TTr) on a wafer, are doable. Consequently, it is possible to place (on a single chip) both the above-mentioned Control Flow engines (a small number of MultiCores and a large number of ManyCores) and both the above mentioned Data Flow engines (a fixed systolic array for the most frequently used algorithms and a reconfigurable engine for execution graph based computing applicable to all the other algorithms).

However, possible enhancers for data collection (in the form of IoT or WSN) and possible accelerators of nature-inspired or nature-based processes or when extremely high speeds and precisions are needed (in the form of BioMolecular and/or QuantumMechanical) have to be off-chip, but easily accessible via proper on-chip interfaces.

The random access memory and the classical I/O could be placed partially on the chip and partially off the chip. If these resources are placed off the chip, they should be connectable over proper interfaces, so that the speed-up losses due to off-chip placement would be minimal. Of course, the protocols linked to the interface should not lower the effectiveness of the off-chip accelerators.

State-of-the-art supercomputer-on-a-chip (SoC) examples encourage this approach [7, 12, 38, 53]. Future SoCs will have all the above-mentioned components in a package, each one as a separate chiplet, connected with a high-bandwidth on-package interface in a 3D manner.

Therefore, no matter if 100BTr or 1TTr structures are used, the internal architecture, on the highest level of abstraction, should be as in Fig. 2. However, the distribution of resources could be drastically different from one such chip to another, due to different application demands (transactions-oriented or crunching-oriented), and due to different data requirements (memory-intensive for massive Big Data of the static type, or streaming-oriented for massive Big Data of the dynamic type, coming in and going out via the Internet or other protocols).

Fig. 2

A supercomputer on a chip architecture for Artificial Intelligence and Big Data (the effort assumes optimizations both in the architecture and the power domains)

However, note that the BIREN chip has reached 77BTr in 2022 and that the CEREBRAS wafer has reached 2.5TTr also in 2022, but their internal architectures differ from the one advocated in this article.

No matter what the internal architecture of a supercomputer on a chip is, testing is a critical issue, and the following is of ultimate importance for testing:

(a) Identifying the relevant test cases (integration, functionality, booting, and power management);

(b) Managing interaction between the software executed on processing cores and the test environment in pre-silicon simulations.

The type of chip architecture advocated in this article is tuned to applications that need all four paradigms of computing and are highly demanding on speed, power, precision, and size, so they cannot benefit from approaches that combine hardware and software in traditional ways. Such applications are summarized in [36].

The following examples cover categories of problems with extremely high demands related to detailed simulations in the domain of Big Data problems, based on highly Iterative Algorithms. These problems often need the enhancement that comes from Artificial Intelligence (AI). These problems, if relevant for Civil Engineering or related fields, belong to the following four general categories:

(1) NBCE—Nature-Based Construction Engineering

(2) GNBE—Genomics Supporting NBCE

(3) EQIS—EarthQuake (EQ) Information Systems for Predicting and Alarming

(4) NCEM—Creation of new materials for civil engineering that do not emit CO2 and are not sensitive to EQs.

The algorithms used in the above areas could be:

(1) Statistical and stochastic processes mimicking the processes in nature, or variations thereof.

(2) Genomics algorithms like NW (Needleman–Wunsch) or SW (Smith–Waterman) or similar.

(3) Partial Differential Equations (PDEs) of the type FE (Finite Element) or FD (Finite Difference), or hybrid.

(4) Tensor calculus and mathematical logic or hybrid
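Of the algorithm families listed above, the Needleman–Wunsch alignment in (2) is the easiest to sketch. The following score-only dynamic program (scoring constants are illustrative; real genomics codes add traceback and affine gap penalties) shows the regular iterative structure that makes such kernels attractive targets for acceleration:

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score via the Needleman-Wunsch dynamic program
    (minimal score-only sketch with illustrative scoring constants)."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):                 # boundary: all-gap prefixes
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,    # (mis)match
                          F[i - 1][j] + gap,      # gap in b
                          F[i][j - 1] + gap)      # gap in a
    return F[n][m]

score = needleman_wunsch("GATTACA", "GCATGCU")
```

Each cell depends only on three neighbors, so the anti-diagonals of the score matrix can be computed in parallel, which is exactly the kind of regularity that Data Flow engines exploit.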

For such a set of applications, in our research, we assume that the optimal distribution of resources would be as shown in Table 1 [36].

Table 1 A hypothetical distribution of resources in a supercomputer on a chip, with basic memory on-chip and extended memory off-chip

It is important to underline that, for the applications of interest, data come either from the internal memory system, from an Internet Protocol (IP) stream, and/or from WSNs or IoT.


In NBCE, it is better to use biological structures that grow fast and are populated with insects that generate relatively hard nanomaterials than to build concrete walls, which emit CO2 and are EQ-sensitive. The bio-structures need not emit CO2 and may be extremely flexible when it comes to earthquakes. Also, it is better to use fish and plankton or other specialized organisms than to utilize metal nets to protect specific underwater structures [15]. Such solutions may include other side effects of interest for the local or the global economy. Before each and every investment of this type, a feasibility study has to be performed. The most reliable feasibility studies are based on simulations. However, such simulations could be extremely time-consuming and could last for months or even years, especially if they are supported by extremely large data sets coming in via IoT (Internet of Things) and/or WSNs (Wireless Sensor Networks). The solution to duration-related problems lies in switching from the Control Flow paradigm to a proper combination of the Control Flow paradigm and the other three above-mentioned computing paradigms, using a computing architecture in which the Control Flow part is the host and the other three parts are accelerators.

In GNBE, genetics of species, and related processes, in real circumstances, may take months or even years to generate the desired effects, especially when it comes to civil engineering related applications. If various mutations of potential interest are to be studied, the computing complexity may explode. In such cases, computer simulations on Control Flow engines, based on enough details, could take time which is not acceptable. Again, the solution is in proper synergies of the four here-mentioned computing paradigms.

In EQIS, models do exist of cities built of bricks and cement, but simulations of earthquakes with these models as inputs may take not only months or years but even decades to complete on the fastest Control Flow machine of today. The simulation process could be drastically accelerated if only a proper Data Flow accelerator is used. Such accelerators are well suited for PDEs of the FE (Finite Element) type, needed for prediction purposes, and for PDEs of the FD (Finite Difference) type, needed for alarming purposes in various emergency situations, during and after EQs.
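The FD kernels behind such simulations are individually simple but must be repeated over enormous grids and time horizons, which is what makes accelerators attractive. A minimal explicit FD time step for the 1D wave equation (toy grid size, illustrative constants, fixed boundaries) looks like:

```python
import numpy as np

def wave_step(u_prev, u_curr, c=1.0, dx=1.0, dt=0.5):
    """One explicit finite-difference (FD) step of the 1D wave equation
    u_tt = c^2 u_xx; the stability (CFL) number c*dt/dx is 0.5 here."""
    lap = np.zeros_like(u_curr)
    lap[1:-1] = u_curr[2:] - 2 * u_curr[1:-1] + u_curr[:-2]  # discrete u_xx
    u_next = 2 * u_curr - u_prev + (c * dt / dx) ** 2 * lap
    u_next[0] = u_next[-1] = 0.0             # fixed (reflecting) boundaries
    return u_next

n = 101
u_prev = np.zeros(n)
u_curr = np.zeros(n)
u_curr[n // 2] = 1.0                         # initial "shock" at the center
for _ in range(50):                          # march the wavefield in time
    u_prev, u_curr = u_curr, wave_step(u_prev, u_curr)
```

A real seismic code replaces this 1D toy with a 3D grid of billions of cells and hundreds of thousands of time steps; since every cell update reads only nearby neighbors, the kernel maps naturally onto a Data Flow execution graph.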

In NCEM, new processes and materials with desired properties are best found if ML (machine learning) algorithms are combined with classical algorithms used in processes-related and materials-related research, such as studies of two-dimensional materials, of fluid or aerodynamics, and of the non-CO2 emitting or the self-repair structures. Such hybrid algorithms are computing-intensive, so again, the solution is in the synergy of several paradigms.

A summary of highly demanding specific problems needing research towards new algorithms and new acceleration approaches

This section gives an overview of selected problems that need both new algorithms and their acceleration via a synergy of the computing paradigms presented.

Each one of these problems is now defined; its importance is elaborated; the best existing solutions are pointed to; the essence of the suggested research avenues and the related methodologies is explained, along with the relevant details of importance for the methodology selected for the research to follow. Each one of the following examples, in sections "Examples/tasks related to NBCE", "Examples/tasks related to GNBE", "Examples/tasks related to EQIS", and "Examples/tasks related to NCEM", is presented using the same template, if and when appropriate:

(a) General problem and its implications in science, engineering, humanities, and/or social studies.

(b) Specific problems in computer-based simulations that are crucial for feasibility studies and have to be solved.

(c) A brief overview of the existing solution(s) and their criticism, from the viewpoint of interest for the mission of this research.

(d) The essence of the suggested research avenue, with an indication that it will overcome the formerly mentioned criticism.

(e) Suggested research avenues in the methodology domain, using Gantt and PERT charts, aimed at the best possible effectiveness, down to the level of task details of importance for the overall success of the related research.

(f) Logic (explained via a figure, accompanied by related text) and mathematics (given via a formula, accompanied by related text) involved in the most critical aspects of the research in the domain of computing.

(g) Main outcomes of the research and their possible side effects.

(h) Possible obstacles on the way to the goals set, and related risks.

(i) A conclusion which summarizes the essence of the issues presented above.

(j) References relevant to the described research mission, including those that cover the related research history of the author(s).

In summary, if and where appropriate, each one of the parts in sections "Examples/tasks related to NBCE", "Examples/tasks related to GNBE", "Examples/tasks related to EQIS", and "Examples/tasks related to NCEM" includes the above notions (brief and crisp), graphs, figures, and possibly some mathematics, while the related references are merged at the end of this article.

Examples/tasks related to NBCE

Critical research is concerned with mapping of a complex algorithm of interest for NBCE/DoD applications, onto four different computing platforms, each one based on a different computing paradigm. For the paradigm that proves to be the most effective in terms of combined speed, power, precision potential, and physical size, the research should create guidelines for further improvements along four different avenues:

(i) Algorithmic modifications for better fit with the selected computing paradigm.

(ii) Architectural modifications of the selected computing paradigm, for better fit with the algorithm in focus.

(iii) Optimization add-ons to the compiler for better system effectiveness.

(iv) Effective creation of an ASIC chip architecture for additional speedup and additional power savings.

An example that synergizes two algorithms proven effective for a number of NBCE/DoD applications is image understanding under conditions in which the image is corrupted by noise or damaged in a number of different ways. The two algorithms proven to be the most effective are: the Generalized Brain-State-in-a-Box neural network (gBSBNN) introduced in [37], and the combined discrete Fourier transform and neural network (DFT-NN) algorithm from [22]. The synergy of these two symbiotic algorithms is envisioned for future research as follows: for one set of conditions gBSBNN performs better, and for another set of conditions DFT-NN performs better. The exact specification of the two sets of conditions could also be the subject of a future study.

It follows from a preliminary study that the two sets overlap only partially, meaning that the union of the two sets is a lot larger than their intersection. This implies that the best algorithm is a hybrid algorithm that first checks the conditions and then utilizes either gBSBNN or DFT-NN.
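A minimal sketch of such a hybrid, with placeholder restorers standing in for gBSBNN and DFT-NN, and an invented noise proxy as the condition check (all names, thresholds, and the condition test itself are illustrative assumptions, not the algorithms from [22, 37]):

```python
import numpy as np

def estimate_noise(image):
    """Crude noise proxy: mean high-frequency energy via local differences
    (a stand-in for whatever condition test future work settles on)."""
    return float(np.mean(np.abs(np.diff(image, axis=0))) +
                 np.mean(np.abs(np.diff(image, axis=1))))

def restore(image, noise_threshold=0.3,
            gbsbnn=lambda im: im, dft_nn=lambda im: im):
    """Hybrid dispatcher: check the operating conditions first, then hand
    the image to the algorithm assumed to perform better under them.
    The two restorers here are identity placeholders."""
    if estimate_noise(image) > noise_threshold:
        return gbsbnn(image)      # e.g., the heavy-noise regime
    return dft_nn(image)          # e.g., the structured-damage regime

img = np.ones((8, 8))             # toy "image"
out = restore(img)
```

The dispatch step costs almost nothing compared to either restorer, which is why the hybrid can approach the better of the two algorithms over the union of their favorable condition sets.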

With the above in mind, the future research would be to evaluate the hybrid algorithms for several methods of computing:

  • FPGA Data Flow by the Maxeler DataFlow Engine (DFE)

  • ASIC Data Flow by Google Tensor Processing Unit (TPU)

  • MultiCore Control Flow by Intel CPU

  • ManyCore Control Flow by NVidia GPU.

During the comparison, we expect the DFE approach to be superior [18, 29, 35]. To further improve the performance, algorithmic changes would be introduced into the algorithm, as well as the architecture/compiler improvements into the implementation of the paradigm [47, 48].

Another issue is related to the computation that occurs on a much smaller scale, at regulatory and signaling pathways in individual cells and even within single biomolecules; biomolecular computing in biological systems can be simulated.

Biological building blocks have remarkable capacities to compute in highly sophisticated ways. For example, a plaque progression model in the carotid artery was coupled with the Agent-Based Method (ABM) [4, 17].

The ABM model takes the shear stress and the initial LDL distribution from the lumen and then runs an iterative calculation inside the arterial wall for lipid infiltration and accumulation, drawing a random number at each time step. The initial WSS (Wall Shear Stress) profile triggers pathologic vascular remodeling by perturbing the baseline cellular activity and favoring lipid infiltration and accumulation within the arterial wall.
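The described iterative loop can be sketched as follows; the mapping from WSS and lumen LDL to an infiltration probability, and all constants, are illustrative assumptions rather than the calibrated model of [4, 17]:

```python
import numpy as np

def simulate_lipid_accumulation(wss, ldl_lumen, n_steps=100, seed=0):
    """Toy ABM-style update: at every time step, each wall site draws a
    random number; low wall shear stress (WSS) raises the probability
    that LDL infiltrates and a small lipid deposit accumulates there."""
    rng = np.random.default_rng(seed)
    lipid = np.zeros_like(wss)
    # Lower WSS -> higher infiltration probability (illustrative mapping).
    p_infiltrate = ldl_lumen * np.clip(1.0 - wss / wss.max(), 0.0, 1.0)
    for _ in range(n_steps):
        events = rng.random(wss.shape) < p_infiltrate
        lipid += 0.01 * events  # accumulate a small deposit per event
    return lipid

wss = np.linspace(0.2, 2.0, 50)  # pathologically low WSS at one end
lipid = simulate_lipid_accumulation(wss, ldl_lumen=0.5)
```

As expected, the sites with the lowest WSS accumulate the most lipid on average, mirroring the pathologic remodeling described above.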

Power issues are inherent to any system of the type described here, and have to be treated properly [23,24,25].

Today's operations of electric power grids can be enhanced by evolving a hierarchically-designed and operated physical system into an interactive Cyber-Physical System (CPS).

Current industry practice is to coordinate operations through Energy Management Systems (EMS) that send commands to controllable power plants in their area to produce energy in a feed-forward manner. This is done at the Balancing Authority (BA) level, where the EMS uses its SCADA-enabled state estimator to predict power imbalances. Hard-to-predict imbalances are managed by the BAs, most often using dedicated communication and control schemes. Important for harvesting new opportunities by means of digitization is to understand the assumptions implied in today's operation and to design the hardware and software needed to relax them. The emerging poly-centric approach to electricity services, proposed as a possible way forward, should be considered and further explored [23]. The next-generation SCADA becomes a Dynamic Monitoring and Decision System (DyMonDS) that relaxes major assumptions through interactive information exchange [24, 25]. This brings about inter-temporal and inter-spatial flexibility as a means of implementing cooperative gains and increasing efficiency without sacrificing QoS. This CPS design is non-unique for any given social-ecological energy system (SEES), since it depends on the performance objectives and on the system's resources, end users, governance, and their interactions.

System governance and policy making determine the overall organization of the physical system into sub-systems, each with its own sub-objectives and rules for information sharing in operations and planning. As such, they must be accounted for when building the man-made physical portions of the system and the supporting CPS architecture. The design of a man-made physical grid and its cyber layer is done to enhance the performance of an existing man-made system. At the same time, digitization is needed to improve the dynamic interactions of the SEES components and to align their sub-objectives to the best degree possible. Several real-world power grid examples illustrate this key role and the potential benefits.

Examples/tasks related to GNBE

As has already been stressed, plants can provide characteristics of interest for civil engineering, while other species can maintain processes of interest for a given civil-engineering mission. However, the question is whether the existing plants and species are the most suitable options for each given task. It is possible that a more effective genetic variant could be created, one that accomplishes the given goals more effectively. For this issue to be analyzed in depth, a proper genetic analysis is necessary.

Such an analysis requires a large number of possible genetic structures to be compared, in order to find out which directions of mutation bring which potential benefits. The number of combinations to analyze could be enormous, and mining them for hidden knowledge related to desirable characteristics may take huge amounts of computing time. Consequently, a considerable level of acceleration is needed.
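The role of conditional algorithms in such a screen can be illustrated with a toy example: a cheap surrogate filter prunes the combinatorial space before an expensive scoring function (standing in for a full simulation) is applied. The sequences, the filter, and the score below are all hypothetical:

```python
import itertools

BASES = "ACGT"

def expensive_score(seq):
    """Stand-in for a costly trait evaluation (e.g., a full simulation);
    here simply the number of non-overlapping 'GC' dinucleotides."""
    return seq.count("GC")

def cheap_filter(seq):
    """Cheap surrogate check that prunes clearly unpromising variants."""
    return seq.count("G") + seq.count("C") >= len(seq) // 2

def screen(length=6):
    """Enumerate all 4**length variants, but run the expensive score only
    on those that survive the cheap filter (a 'conditional algorithm')."""
    candidates = ("".join(p) for p in itertools.product(BASES, repeat=length))
    survivors = [s for s in candidates if cheap_filter(s)]
    best = max(survivors, key=expensive_score)
    return best, len(survivors)

best, n_scored = screen()  # scores 2688 of the 4096 variants
```

For realistic sequence lengths the enumeration itself becomes infeasible, which is exactly where the acceleration techniques discussed in this article come into play.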

It has been predicted that, in the near future, using genomic mechanisms that analyze gene mutations, it will be possible to generate new organisms with desired properties, which could be of importance for nature-based engineering in general and for nature-based construction in particular, as in civil engineering [28, 39, 40].

Examples/tasks related to EQIS

In order to design buildings for earthquake loads, a simulation of seismic effects has to be carried out. The first step in the seismic design of structures is to perform a modal analysis, which yields the eigenfrequencies and eigenmodes of the structure.

Based on them, the seismic forces are calculated and the design forces can be determined. This is a standard procedure in engineering practice, based on linear simulation. However, for more complex problems, a pushover (non-linear static) analysis may have to be employed; it gives more insight into the non-linear behavior of the studied buildings (Fig. 3). The most precise, but also the most computationally demanding, is the time history analysis. It gives the results in each time step of the numerical integration (Fig. 4), as a solution of the differential equation of forced oscillations. Due to its high computational demands, it is mostly used for special structures (dams, nuclear power plants, industrial facilities, etc.).
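The modal-analysis step can be illustrated on a toy two-story shear frame: the generalized eigenproblem K·φ = ω²·M·φ is reduced to a standard symmetric one and solved with NumPy. The stiffness and mass values are illustrative assumptions, not taken from the cited studies:

```python
import numpy as np

# Two-story shear frame: story stiffnesses [N/m] and floor masses [kg].
k1, k2 = 2.0e7, 1.5e7
m1, m2 = 1.0e4, 0.8e4

K = np.array([[k1 + k2, -k2],
              [-k2,      k2]])
M = np.diag([m1, m2])

# Generalized eigenproblem K @ phi = w2 * M @ phi, reduced to a standard
# symmetric one via the (diagonal) mass-matrix square root.
Ms = np.diag(1.0 / np.sqrt(np.diag(M)))
w2, y = np.linalg.eigh(Ms @ K @ Ms)

freqs_hz = np.sqrt(w2) / (2.0 * np.pi)  # eigenfrequencies, ascending
phi = Ms @ y                            # eigenmodes in physical coordinates
```

For a real building the matrices come from a finite element model with thousands of degrees of freedom, which is what makes the analysis computationally demanding.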

Fig. 3

a Geometry and finite element mesh of the numerical model; b force–displacement curve of the non-linear static analysis; and c deformed shape with compression damage distribution (deformation scaled 5 times) [32]

Fig. 4

a Model of a steel industrial structure; b applied acceleration at the base of the structure in the time history analysis; and c displacements at each story of the model [6]

In recent years, the use of machine learning in the field of earthquake engineering has been increasing. For example, neural networks can be used to optimize the design and calculation of structures (Fig. 5), to enhance the quality of time history simulations, for rapid earthquake damage assessment and recovery, for the treatment of sensor farms in construction project management, etc.

Fig. 5

Use of ML for the prediction of the fundamental period of masonry-infilled RC frames: a architectures of the best ANNs for the infilled frames and b frequencies of prediction errors [13]

The use of machine learning in the seismic design of structures will reduce the number of simulations performed in parametric studies and thus lead to an optimized design much faster. This will additionally reduce the level of mistakes in the design and thereby reduce investment construction costs. Furthermore, the decrease in time and effort needed for parametric optimization will enable the use of time history analysis for the residential and commercial buildings that constitute most of the building portfolio of each country.

To develop and train a machine learning algorithm, a data set of simulation results from time history analysis needs to be provided. This means that for the development and validation of a machine learning algorithm, a parametric study using time history analysis has to be performed to some extent. There is a risk that the process could be demanding on computing infrastructure and could last unacceptably long.

The continuing evolution of supercomputers and artificial-intelligence paradigms could play a key role in preventing the socio-economic consequences of earthquakes, through the development of early warning systems. The aim of such systems should be to enable fast and accurate capture of even extremely weak signals in the early stages, as well as prediction of cracking patterns and the seismic capacity of buildings based on previous experience.

In order to develop such systems, effective computer networking is also essential, enabling greater availability of existing experimental and numerical results to the broader community. To adequately perform seismic isolation of facilities based on location, earthquake magnitude, and building vulnerability, wider deployment of advanced computing techniques is mandatory.

A novel ML technique based on transfer learning (TL) has recently become attractive due to a number of advantages it possesses. Although this technique has its primary applications in image recognition and natural language processing (NLP), it also plays a significant role in predictive analyses through knowledge sharing with robust, pre-trained models. This is particularly important in earthquake engineering, both because of the smaller sample size needed to train such surrogate models and because of their lower sensitivity to over-fitting. TL models reverse the train/test sampling strategy and improve the accuracy of the results compared to the basic pre-trained models. Considering the large number of negative experiences caused by earthquakes, applying TL methods in the future could significantly reduce or eliminate such situations.
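The small-sample benefit of transfer learning can be illustrated schematically: a frozen "pre-trained" feature extractor (here just a fixed random projection with a ReLU, an illustrative stand-in for a real backbone) is reused, and only a small linear head is fitted to the new task:

```python
import numpy as np

rng = np.random.default_rng(42)

# Frozen "pre-trained" backbone: a fixed random projection plus ReLU,
# standing in for a large model trained elsewhere on abundant data.
W_frozen = rng.standard_normal((20, 64))

def features(x):
    return np.maximum(x @ W_frozen, 0.0)

# Small task-specific dataset, as is typical in earthquake engineering.
x_small = rng.standard_normal((30, 20))
true_head = rng.standard_normal(64)
y_small = features(x_small) @ true_head + 0.01 * rng.standard_normal(30)

# Transfer-learning step: fit ONLY the linear head by least squares;
# the backbone weights W_frozen are never updated.
head, *_ = np.linalg.lstsq(features(x_small), y_small, rcond=None)

def predict(x):
    return features(x) @ head
```

Because only the head's 64 parameters are trained, 30 samples suffice, whereas training the full model from scratch on so little data would over-fit badly.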

Examples/tasks related to NCEM

The main fields of research in new materials include:

(1) New materials for energy applications: This includes materials for solar cells, batteries, fuel cells, and other energy-related applications.

(2) New materials for electronic applications: This includes materials for semiconductors, solar cells, sensors, and other electronic-related applications.

(3) New materials for optoelectronic applications: This includes materials for LEDs, lasers, solar cells, as well as other optoelectronics-related applications.

(4) New materials for magnetic applications: This includes materials for magnets, magnetic storage devices, and other magnetic-related applications.

New materials are challenging to discover from a theoretical point of view due to several factors: the number of elements in the periodic table, the variety of possible crystal structures, and the vast number of possible ways to combine those elements chemically. Moreover, one might be interested not only in single-point material properties but also in the functional response of the material to changes in environmental conditions. For example, producing solid ion electrolytes requires predicting ionic mobility over a range of temperatures. Machine Intelligence (MI) models capable of predicting such properties demand a great deal of data to be generated a priori. Such a requirement might effectively render the MI-based approach unfeasible. However, if it were possible to simulate data on the fly, the usability of a hybrid system would improve considerably. Hence, the suggested approach of accelerating computation with a single-chip computing device that combines paradigms would come in handy for a great variety of NCEM tasks.
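The on-the-fly simulation idea can be sketched as a simple active-learning loop: a cheap surrogate is fitted, and the (expensive) simulator is queried only where the surrogate is least constrained. The Arrhenius-like mobility curve, its constants, and the sampling rule are all illustrative assumptions:

```python
import numpy as np

def simulate_mobility(temp):
    """Stand-in for an expensive simulation of ionic mobility; an
    Arrhenius-like curve with an assumed activation constant."""
    return np.exp(-1000.0 / temp)

def fit_surrogate(temps, values):
    degree = min(3, len(temps) - 1)
    return np.polyfit(temps, values, degree)

def active_learning(n_rounds=5):
    temps = np.array([300.0, 600.0, 900.0])  # initial coarse design
    vals = simulate_mobility(temps)
    grid = np.linspace(300.0, 900.0, 61)     # temperatures of interest
    for _ in range(n_rounds):
        # Query the simulator where the surrogate is least constrained:
        # here, simply the grid point farthest from all existing samples.
        dist = np.min(np.abs(grid[:, None] - temps[None, :]), axis=1)
        t_new = grid[np.argmax(dist)]
        temps = np.append(temps, t_new)
        vals = np.append(vals, simulate_mobility(t_new))
    return temps, vals, fit_surrogate(temps, vals)

temps, vals, coeffs = active_learning()
```

An accelerated single-chip simulator would make each `simulate_mobility` call cheap enough for this loop to run interactively over many materials.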

Another class of tasks that would profit considerably from an accelerated chip design comprises inverse problems, in which one must synthesize a material possessing a given set of properties.

An extensive scan of the literature and databases for similar materials, inference of the changes required for a found case, and analysis of possible chemical structures that would provide the required properties constitute just a tiny fraction of the computing load that would be needed.

It would be almost redundant to mention that computing acceleration would dramatically improve the chances of success for various life-changing applications.

Finally, quasicrystals [42] are also among the materials that could be used in applications of interest for nature-based engineering [14], especially because some of them could be synthesized using processes that are highly economical [10].

Quasicrystals can be synthesized using all existing metals; from aluminum alone, several dozen alloys can be created, and similarly from other metals (Mg, Zn, U, Fe, etc.). Around 100 different types of quasicrystals have been created in the lab so far.

Quasicrystals possess properties of interest for a number of classical and emerging applications. These properties could be categorized into four main groups:

(a) Elastic

(b) Electrical

(c) Surface

(d) Thermodynamic

Quasicrystals are exceptionally brittle, their electrical conductivity is fair for some alloys, their surface properties include low friction, and their thermal conductivity is poor, which is of interest for a number of applications.

Each of these four characteristics can be found, far more pronounced, in many other materials, but all of them together, in exactly the mentioned measures, only in quasicrystals. Consequently, research leading to applications that need this special combination of properties is an open field, along the lines of: "Materials need applications and applications need materials."

Also, quasicrystals have served as inspiration for a number of domains in aesthetics-oriented activities (both in arts and sciences):

(a) In urbanism, the shapes of city structures, seen from the air, could be motivated by quasicrystals, inspired by the thoughts of Nobel Laureate Roger Penrose.

(b) In architecture, a central city piazza and the façades of the surrounding buildings could take the shapes of quasicrystals.

(c) In the interior decoration of houses and apartments, tiles for living rooms or utility spaces have already been inspired by quasicrystals.

(d) In personal 2D or 3D art, ties have been designed with quasicrystal-shaped artistic patterns.

Finally, maybe one of the most important impacts of quasicrystals is in teaching scientific methodology and building student motivation (some of us teach, or plan to teach, the FFT over an image, treating electron diffraction, in the way originally treated by the inventor of quasicrystals). The methodology leading to the invention and the related creative processes could serve as a great source of inspiration and motivation, not only for students but also for senior researchers.
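The mentioned teaching exercise, an FFT over an image standing in for an electron-diffraction pattern, can be reproduced in a few lines; the synthetic periodic lattice below is an illustrative stand-in for a real micrograph:

```python
import numpy as np

# Synthetic "micrograph": a periodic 2-D lattice of bright spots.
n, spacing = 64, 8
img = np.zeros((n, n))
img[::spacing, ::spacing] = 1.0

# The FFT magnitude plays the role of the diffraction pattern: a periodic
# lattice concentrates all spectral energy in sharp, regularly spaced
# peaks, whereas a quasicrystal would show sharp peaks with a symmetry
# (e.g., tenfold) that no periodic lattice can produce.
spectrum = np.abs(np.fft.fftshift(np.fft.fft2(img)))

peak = spectrum.max()
n_peaks = int((spectrum > 0.5 * peak).sum())  # 64 equal Bragg-like peaks
```

Replacing the lattice with an aperiodic (e.g., Penrose-like) point set and observing that the spectrum still shows sharp peaks is precisely the conceptual surprise behind the discovery of quasicrystals.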

All these facts point to the need for researchers in geophysics, materials science, civil engineering, and computer science to cooperate synergistically.


This article sheds light on the potential arising from the synergistic interaction of four different computing paradigms, in the context of computationally intensive problems related to mathematics, physics, geosciences, engineering, and engineering-oriented genomics. A number of potential areas for additional research in the realm of computing for NBCE/DoD applications are given special consideration. The mapping of complex algorithms onto various computing platforms, each based on a different computing paradigm, is one of the main areas of this study. The paradigm that offers the best balance of speed, power, precision, and stability must first be identified in order to develop guidelines for future improvements along four different paths: algorithmic adjustments, architectural modifications, compiler efficiency add-ons, and efficient construction of an ASIC chip.

This approach is discussed, if and when applicable, in the context of civil-engineering-oriented research, but it could easily be ported to other contexts. In other words, the stress is on synergies of interest for civil engineering, and on research that has implications in civil engineering, among other fields. For instance, the paper describes the potential synergy of the generalized Brain-State-in-a-Box neural network and the combined discrete Fourier transform and neural network algorithm for image understanding in noisy or damaged circumstances. It is suggested that future research could concentrate on evaluating the performance of these algorithms on various computing platforms, such as MultiCore Control Flow on an Intel CPU, ManyCore Control Flow on an NVIDIA GPU, FPGA Data Flow on the Maxeler DataFlow Engine, and ASIC Data Flow on the Google Tensor Processing Unit. Biological building blocks have the remarkable ability to compute in extremely complex ways, and biomolecular computing in biological systems can be simulated; both observations have the potential to significantly improve NBCE/DoD applications.

This article is intended primarily for research purposes, in academia and industry; however, it can also serve educational purposes, meaning that the effort described could motivate students to pursue research careers. It can likewise serve as a guide for researchers and practitioners working on cutting-edge technologies and applications, as well as for those interested in the potential impact of these technologies on society and the economy.

The approach advocated in this article is best implemented on a chip that includes some of the mentioned paradigms (those used more frequently) and effectively interfaces to other paradigms (those used less frequently) of interest for the fundamental problems outlined here, with stress on nature-based engineering. In summary, the article contends that utilizing cutting-edge big data techniques is essential for further investigation into complex algorithm mapping for NBCE/DoD applications. This includes creating new algorithms and computing paradigms, utilizing several algorithms in concert, and creating interactive cyber-physical systems for power networks. The development of sophisticated CPSs can increase the effectiveness and flexibility of energy management systems, while the suggested research on hybrid algorithms and the evaluation of various computing techniques can assist in identifying the best strategy for particular applications.

Availability of data and materials

Not applicable.


  1. Babovic ZB, Protic J, Milutinovic V. Web performance evaluation for internet of things applications. IEEE Access. 2016;7(4):6974–92.

  2. Benenson Y. Biomolecular computing systems: principles, progress and potential. Nat Rev Genet. 2012;13(7):455–68.

  3. Bengio Y, Deleu T, Hu EJ, Lahlou S, Tiwari M, Bengio E. GFlowNet foundations. CoRR. 2021;abs/2111.09266.

  4. Blagojevic A, Sustersic T, Filipovic N. Agent-based model and simulation of atherosclerotic plaque progression. The IX International Conference on Computational Bioengineering (ICCB2022), Lisbon, Portugal, 11–13 April 2022.

  5. Borceux F. Handbook of categorical algebra: volume 1, basic category theory. Cambridge: Cambridge University Press; 1994.

  6. Butenweg C, Bursi OS, Paolacci F, Marinković M, Lanese I, Nardin C, et al. Seismic performance of an industrial multi-storey frame structure with process equipment subjected to shake table testing. Eng Struct. 2021;243:112681.

  7. Chatha K. Qualcomm® Cloud AI 100: 12TOPS/W scalable, high performance and low latency deep learning inference accelerator. In: 2021 IEEE Hot Chips 33 Symposium (HCS). 2021. p. 1–19.

  8. Chegini H, Naha RK, Mahanti A, Thulasiraman P. Process automation in an IoT–fog–cloud ecosystem: a survey and taxonomy. IoT. 2021;2(1):92–118.

  9. Clarkson M. OCaml programming: correct + efficient + beautiful. 2022.

  10. Bassani GF, Liedl GL, Wyder P, editors. Encyclopedia of condensed matter physics. Amsterdam: Elsevier; 2005.

  11. Devlin J, Chang M-W, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. CoRR. 2018;abs/1810.04805.

  12. Ditzel D, Espasa R, Aymerich N, Baum A, Berg T, Burr J, et al. Accelerating ML recommendation with over a thousand RISC-V/tensor processors on Esperanto's ET-SoC-1 chip. In: 2021 IEEE Hot Chips 33 Symposium (HCS). 2021. p. 1–23.

  13. Đorđević F, Marinković M. Advanced ANN regularization-based algorithm for prediction of the fundamental period of masonry-infilled RC frames. Adv Comput. 2023. Submitted for review.

  14. Dubois J-M. Properties and applications of quasicrystals and complex metallic alloys. Chem Soc Rev. 2012;41(20):6760–77.

  15. Emmett RS, Nye DE. The environmental humanities: a critical introduction. Cambridge: MIT Press; 2017.

  16. Feynman RP. Quantum mechanical computers. Optics News. 1985;11(2):11–20.

  17. Filipovic N, Blagojevic A, Tomasevic S, Arsic B, Djukic T. Agent based and finite element method for plaque development in the carotid artery. AORTA. 2022;10(S 01):A075.

  18. Flynn MJ, Mencer O, Milutinovic V, Rakocevic G, Stenstrom P, Trobec R, Valero M. Moving from petaflops to petadata. Commun ACM. 2013;56(5):39–42.

  19. Fong B, Spivak DI. An invitation to applied category theory: seven sketches in compositionality. Cambridge: Cambridge University Press; 2019.

  20. Guo Y, Yuan X, Wang X, Wang C, Li B, Jia X. Enabling encrypted rich queries in distributed key-value stores. IEEE Trans Parallel Distrib Syst. 2019;30(6):1283–97.

  21. Furht B. CAKE – The NSF Industry/University Cooperative Research Center for Advanced Knowledge Enablement. Florida Atlantic University; 2022.

  22. Hui S, Żak SH. Discrete Fourier transform based pattern classifiers. Bull Polish Acad Sci Tech Sci. 2014:15–22.

  23. Ilić MD. Dynamic monitoring and decision systems for enabling sustainable energy services. Proc IEEE. 2010;99(1):58–79.

  24. Ilic MD. Toward a unified modeling and control for sustainable and resilient electric energy systems. Found Trends Electric Energy Syst. 2016;1(1–2):1–41.

  25. Ilic MD, Lessard DR. A distributed coordinated architecture of electrical energy systems for sustainability. 2022.

  26. Jelenkovic B. Intelligent artificial eye. Project proposal to the Serbian Government; 2022.

  27. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596(7873):583–9.

  28. Khalil AM. The genome editing revolution. J Genet Eng Biotechnol. 2020;18(1):1–6.

  29. Kotlar M, Milutinovic V. Comparing controlflow and dataflow for tensor calculus: speed, power, complexity, and MTBF. In: High Performance Computing: ISC High Performance 2018 International Workshops, Frankfurt/Main, Germany, June 28, 2018, Revised Selected Papers. Springer International Publishing; 2018. p. 329–46.

  30. MacLane S, Moerdijk I. Sheaves in geometry and logic: a first introduction to topos theory. Berlin: Springer; 1992.

  31. Manin YI, Zilber B. A course in mathematical logic for mathematicians. 2010.

  32. Marinković M, Butenweg C. Numerical analysis of the in-plane behaviour of decoupled masonry infilled RC frames. Eng Struct. 2022;272:114959.

  33. Milutinovic D, Milutinovic V, Soucek B. The honeycomb architecture. IEEE Computer. 1987;20(4):81–3.

  34. Milutinovic V. Mapping of neural networks on the honeycomb architecture. Proc IEEE. 1989;77(12):1875–8.

  35. Milutinovic V, Kotlar M, Stojanovic M, Dundic I, Trifunovic N, Babovic Z. DataFlow supercomputing essentials. Cham: Springer; 2017.

  36. Milutinović V, Azer ES, Yoshimoto K, Klimeck G, Djordjevic M, Kotlar M, Bojovic M, Miladinovic B, Korolija N, Stankovic S, Filipović N. The ultimate dataflow for ultimate supercomputers-on-a-chip, for scientific computing, geophysics, complex mathematics, and information processing. In: 2021 10th Mediterranean Conference on Embedded Computing (MECO). IEEE; 2021. p. 1–6.

  37. Oh C, Zak SH. Large-scale pattern storage and retrieval using generalized brain-state-in-a-box neural networks. IEEE Trans Neural Netw. 2010;21(4):633–43.

  38. Ratkovic I. On the design of power- and energy-efficient functional units for vector processors. PhD thesis. 2016.

  39. Riley LA, Guss AM. Approaches to genetic tool development for rapid domestication of non-model microorganisms. Biotechnol Biofuels. 2021;14(1):30.

  40. Riolo J, Steckl AJ. Comparative analysis of genome code complexity and manufacturability with engineering benchmarks. Sci Rep. 2022;12(1):2808.

  41. Schrittwieser J, Antonoglou I, Hubert T, Simonyan K, Sifre L, Schmitt S, et al. Mastering Atari, Go, chess and shogi by planning with a learned model. Nature. 2020;588(7839):604–9.

  42. Shechtman D. Quasicrystals. Nobel Prize in Chemistry; 2011.

  43. Sengupta J, Kubendran R, Neftci E, Andreou A. High-speed, real-time, spike-based object tracking and path prediction on Google Edge TPU. In: 2nd IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS). 2020. p. 134–5.

  44. Silver D, Huang A, Maddison CJ, Guez A, Sifre L, van den Driessche G, et al. Mastering the game of Go with deep neural networks and tree search. Nature. 2016;529(7587):484–9.

  45. Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, et al. Mastering the game of Go without human knowledge. Nature. 2017;550(7676):354–9.

  46. Spivak DI. Learners' languages. arXiv preprint arXiv:2103.01189. 2021.

  47. Trifunovic N, Milutinovic V, Korolija N, Gaydadjiev G. An AppGallery for dataflow computing. J Big Data. 2016;3:1–30.

  48. Trifunovic N, Milutinovic V, et al. Paradigm shift in supercomputing: dataflow vs controlflow. J Big Data. 2017.

  49. Vázquez F, Fernández JJ, Garzón EM. A new approach for sparse matrix vector product on NVIDIA GPUs. Concurr Comput Pract Exp. 2011;23(8):815–26.

  50. Wang S, Zhang Y, Guo Y. A blockchain-empowered arbitrable multimedia data auditing scheme in IoT cloud computing. Mathematics. 2022;10(6):1005.

  51. Wei J, Wang X, Schuurmans D, Bosma M, Chi EH, Le Q, et al. Chain of thought prompting elicits reasoning in large language models. CoRR. 2022;abs/2201.11903.

  52. Wu SD, Kempf KG, Atan MO, Aytac B, Shirodkar SA, Mishra A. Improving new-product forecasting at Intel Corporation. Interfaces. 2010;40(5):385–96.

  53. Wuu J, Agarwal R, Ciraula M, Dietz C, Johnson B, Johnson D, et al. 3D V-Cache: the implementation of a hybrid-bonded 64MB stacked cache for a 7nm x86-64 CPU. In: 2022 IEEE International Solid-State Circuits Conference (ISSCC). 2022. p. 428–9.

  54. Yao J, Zheng Y, Guo Y, Wang C. SoK: a systematic study of attacks in efficient encrypted cloud data search. In: Proceedings of the 8th International Workshop on Security in Blockchain and Cloud Computing. New York: Association for Computing Machinery; 2020. p. 14–20.

  55. Zardini G, Spivak DI, Censi A, Frazzoli E. A compositional sheaf-theoretic framework for event-based systems (extended version). arXiv preprint arXiv:2005.04715. 2020.

  56. Lewkowycz A, Andreassen A, Dohan D, Dyer E, Michalewski H, Ramasesh V, et al. Solving quantitative reasoning problems with language models. 2022.



The authors are thankful to their colleagues from MIT, Purdue, ETH, EPFL, the Universities of Kragujevac and Novi Sad in Serbia, and those from the schools of Electrical Engineering and Civil Engineering at the University of Belgrade, who helped develop the infrastructure and/or the ideas that enabled the success of the research advocated in this article.


Not applicable.

Author information

Authors and Affiliations



All authors helped shape the research and made a valuable contribution to finalizing this work. All equally contributed, in a synergistic interaction, so it is impossible to specify who did what. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Veljko Milutinović.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Babović, Z., Bajat, B., Đokić, V. et al. Research in computing-intensive simulations for nature-oriented civil-engineering and related scientific fields, using machine learning and big data: an overview of open problems. J Big Data 10, 73 (2023).



  • Computing paradigms
  • Artificial intelligence
  • Control flow
  • Data flow
  • Big data