Big data in manufacturing: a systematic mapping study
© O’Donovan et al. 2015
Received: 12 June 2015
Accepted: 31 July 2015
Published: 11 September 2015
The manufacturing industry is currently in the midst of a data-driven revolution, which promises to transform traditional manufacturing facilities into highly optimised smart manufacturing facilities. These smart facilities are focused on creating manufacturing intelligence from real-time data to support accurate and timely decision-making that can have a positive impact across the entire organisation. To realise these efficiencies, emerging technologies such as the Internet of Things (IoT) and Cyber Physical Systems (CPS) will be embedded in physical processes to measure and monitor real-time data from across the factory, which will ultimately give rise to unprecedented levels of data production. Therefore, manufacturing facilities must be able to manage the demands of an exponential increase in data production, and must possess the analytical techniques needed to extract meaning from these large datasets. More specifically, organisations must be able to work with big data technologies to meet the demands of smart manufacturing. However, as big data is a relatively new phenomenon and potential applications to manufacturing activities are wide-reaching and diverse, there has been an obvious lack of secondary research undertaken in the area. Without secondary research, it is difficult for researchers to identify gaps in the field, and to align their work with that of other researchers to develop strong research themes. In this study, we use the formal research methodology of systematic mapping to provide a breadth-first review of big data technologies in manufacturing.
Keywords: Big data; Manufacturing; Smart manufacturing; Industry 4.0; Big data analytics; Engineering informatics; Machine learning; Big data systems; Distributed computing; Cyber physical systems; Internet of things; IoT
Modern manufacturing facilities are data-rich environments that support the transmission, sharing and analysis of information across pervasive networks to produce manufacturing intelligence [1–3]. The potential benefits of manufacturing intelligence include improvements in operational efficiency, process innovation, and environmental impact, to name a few [4, 5]. However, similar to other industries and domains, the current information systems that support business and manufacturing intelligence are being tasked with the responsibility of storing increasingly large data sets (i.e. Big Data), as well as supporting the real-time processing of this ‘Big Data’ using advanced analytics [5–10]. The predicted exponential growth in data production will be a result of an increase in the number of instruments that record measurements from physical environments and processes, as well as an increase in the frequency at which these devices record and persist measurements. The technologies that transmit this raw data will include legacy automation and sensor networks, in addition to new and emerging paradigms, such as the Internet of Things (IoT) and Cyber Physical Systems (CPS) [1, 11, 12]. The low-level granular data captured by these technologies can be consumed by analytics and modelling applications, enabling manufacturers to develop a better understanding of their activities and processes and to derive insights that can improve existing operations.
The focus on big data technologies in manufacturing environments is a relatively new interdisciplinary research area which incorporates automation, engineering, information technology and data analytics, to name a few. At this point in time, it is important to understand the current state of the research pertaining to big data technologies in manufacturing, and to identify areas where future research efforts should be focused to support the next-generation infrastructure and technologies for manufacturing. Therefore, this study aims to classify current research efforts, derive prominent research themes, and identify gaps in the current literature.
This study employs the well-known and formal secondary research method of systematic mapping to capture the broad and diverse research strands currently related to big data technologies in manufacturing. The contribution of this study is a comprehensive report on the current state of research pertaining to big data technologies in manufacturing, including (a) the type of research being undertaken, (b) the areas in manufacturing where big data research is focused, and (c) the outputs from these big data research efforts. The research methodology employed in this study is guided by the systematic mapping process described by Petersen et al. [13].
The remainder of this paper is organised as follows. In section 2, the research methodology and process used in the study (i.e. systematic mapping) are described. In section 3, the results of the study are presented. In section 4, the study results are discussed in detail. In section 5, the threats relating to the validity of the results are considered. Finally, in section 6 the conclusions from the research are presented and future areas of research are identified.
This study employed systematic mapping to capture the current state of the research relating to big data technologies in manufacturing. Compared with other secondary research methods, such as traditional literature reviews, a mapping study provides an approach that facilitates an investigation of great breadth, while sacrificing depth. In the context of this study, a systematic mapping method was deemed appropriate as it provided a formal and well-structured approach to synthesising material. This structure also served to provide a foundation for reducing bias and harmonising literature review efforts across the research team. Furthermore, the breadth-first perspective that can be derived by systematic mapping was especially useful for reporting on a new and pervasive area of research (i.e. big data in manufacturing) that currently lacks prominent and consistent theories. Indeed, it is the lack of strong research themes that makes a depth-first literature review of the area a challenging undertaking.
The mapping process
At the beginning of the study, the initial research questions were agreed to provide a general scope for the study. Based on this scope, primary search terms and phrases were identified and used to find research papers listed in several digital databases. After the results of these searches were recorded, each paper was manually screened using a set of inclusion and exclusion criteria that attempted to identify papers aligned with the theme and scope of the study. Those papers that were deemed relevant to the study were further analysed to determine prominent keywords and phrases that could be used to classify the research being conducted in the area. Finally, the classified papers were aggregated, visualised and mapped in a manner that would enable us to answer the research questions posed in this study.
“How are big data technologies being used in manufacturing?”
To answer the main research question, five ancillary research questions that relate to various aspects of big data in manufacturing were identified. Decomposing the main research question into smaller and more specific questions enables the topic to be considered from multiple perspectives, while also providing the results needed to answer the main research question. The additional research questions are described below.
RQ1: What are the publication fora relating to big data in manufacturing?
Rationale: The intention of this question is to illustrate the interest in the research area over time, as well as identifying the primary sources of literature in the field. This study assumes that the publication rate is indicative of research interest in the area, while the most prominent sources of research in the field are those journals and conferences that have the highest publication frequency of relevant literature.
RQ2: What type of research is being undertaken in the area of big data in manufacturing?
Rationale: The intention of this question is to highlight the type of formal research being undertaken in the area, ranging from philosophical perspectives, to real-world evaluations. By answering this question the study aims to understand the maturity level of the research area, with the assumption that research efforts that do not exhibit rigorous validation and evaluation may be indicative of a field that is still maturing and focused on developing methodologies to support future research efforts.
RQ3: What type of contributions are being made to the area of big data in manufacturing?
Rationale: The intention of this question is to understand the type of contributions and outputs from research efforts in the field. These outputs may vary greatly and range from information system architectures, to analytical tools and methods for process optimisation. By answering this question the study aims to further assess the maturity level of the field, with the assumption that early research efforts may focus on guidelines and methodologies, and more mature research areas may focus on implementing, evaluating and validating these methods. Furthermore, identifying trends and patterns in the research outputs in the field will also provide an understanding to the approaches used to solve specific challenges in the area.
RQ4: What type of analytics are being used in the area of big data in manufacturing?
Rationale: The intention of this question is to identify the prominence of big data analytics in the research relating to big data technologies in manufacturing, as well as classifying the specific type of analytics being used. In recent years, the term analytics has become synonymous with big data technologies. By answering this question the study aims to better understand the relationship between analytics and big data in the context of manufacturing. Furthermore, the classification of the different types of big data analytics used in research can provide an understanding of the types of problems being addressed.
RQ5: In what areas of manufacturing are big data technologies being applied?
Rationale: The intention of this question is to highlight the different areas in manufacturing facilities where researchers are employing big data. By answering this question the study aims to highlight specific research themes, as well as identifying the areas of manufacturing operations that are striving to meet the challenges of large-scale data production and processing.
Main and candidate search terms for big data in manufacturing
Candidate terms relating to manufacturing: Smart Manufacturing, Advanced Manufacturing, Industry 4.0, Cyber Physical Systems, Supply Chain, Factories, Factory, Production, and Process.
Candidate terms relating to big data: Large-scale Data, Cloud Computing, Machine Learning, Big Data Analytics, Data Virtualization, and Master Data Management.
Primary search string used for study
(Manufactur* OR Factor* OR Industry 4.0)
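As a rough illustration of how the wildcard clause above behaves, the following sketch translates it into a case-insensitive regular expression. Only the clause shown in the text is used; the function names `clause_to_regex` and `matches` are hypothetical, and real digital repositories may interpret wildcards and phrases differently.

```python
import re

# The clause exactly as it appears in the text; any further clauses of the
# search string are not shown there and are not assumed here.
CLAUSE = "(Manufactur* OR Factor* OR Industry 4.0)"


def clause_to_regex(clause: str) -> "re.Pattern":
    """Turn an OR-clause with trailing '*' wildcards into one regex (assumed semantics)."""
    terms = [term.strip() for term in clause.strip("()").split(" OR ")]
    parts = []
    for term in terms:
        if term.endswith("*"):
            # 'Manufactur*' becomes: any word beginning with 'manufactur'.
            parts.append(r"\b" + re.escape(term[:-1]) + r"\w*")
        else:
            # Terms without a wildcard are matched as whole phrases.
            parts.append(r"\b" + re.escape(term) + r"\b")
    return re.compile("|".join(parts), re.IGNORECASE)


def matches(clause_regex: "re.Pattern", text: str) -> bool:
    """Return True if any term of the clause occurs in the text."""
    return clause_regex.search(text) is not None
```

Under these assumed semantics, `Factor*` also matches unrelated words such as "factors", which is one reason a manual screening stage is still needed after the automated search.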
Search results from digital repositories
Number of publications
ACM Digital Library
Web of Science
Screening of research
To be considered for inclusion in the study, the research being evaluated had to originate from an academic source, such as a journal or conference, and clearly show that its contribution was focused on big data in manufacturing, which was primarily determined by the presence of the primary search terms. Publications that met these criteria were then processed using exclusion criteria (i.e. filters), with the intention of highlighting the most relevant research in the area of big data in manufacturing.
Filter 1: remove publications that do not contain ‘manufacturing’, ‘factory’, ‘factories’ or ‘Industry 4.0’ in the title, abstract or meta-data section of the document.
Filter 2: remove publications that do not contain ‘big data’ in the title, abstract or meta-data sections of the document.
Filter 3: remove papers that only refer to ‘manufacturing’, ‘factory’, ‘factories’, ‘Industry 4.0’, or ‘big data’ as a fleeting point of reference. For example, many big data related papers cite the potential application of big data to manufacturing, without exclusively investigating the area.
Filter 4: review the introduction and discussion sections of each publication, and remove those that do not focus on, and contribute to, the area of big data in manufacturing.
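Filters 1 and 2 are keyword checks and lend themselves to automation, whereas Filters 3 and 4 rely on a reviewer's judgement. A minimal sketch of the first two filters is given below, assuming each publication is available as a title, abstract and meta-data string; the `Publication` record and the function names are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from typing import List

# Terms named in Filter 1 and Filter 2, lower-cased for comparison.
MANUFACTURING_TERMS = ("manufacturing", "factory", "factories", "industry 4.0")
BIG_DATA_TERMS = ("big data",)


@dataclass
class Publication:
    """Hypothetical record holding the fields the filters inspect."""
    title: str
    abstract: str = ""
    metadata: str = ""  # e.g. author keywords

    def searchable_text(self) -> str:
        return " ".join((self.title, self.abstract, self.metadata)).lower()


def passes_filter_1(pub: Publication) -> bool:
    """Keep publications that mention at least one manufacturing term."""
    text = pub.searchable_text()
    # Plain substring matching: 'factory' would also match 'satisfactory'.
    return any(term in text for term in MANUFACTURING_TERMS)


def passes_filter_2(pub: Publication) -> bool:
    """Keep publications that mention 'big data'."""
    text = pub.searchable_text()
    return any(term in text for term in BIG_DATA_TERMS)


def screen(pubs: List[Publication]) -> List[Publication]:
    """Apply Filters 1 and 2; Filters 3 and 4 remain a manual review step."""
    return [p for p in pubs if passes_filter_1(p) and passes_filter_2(p)]
```

The publications returned by `screen` would then be passed to reviewers for Filters 3 and 4.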
Classification of research
All of the publications in the study were classified using four dimensions. These dimensions were chosen to provide different perspectives on the current state of research in the area, while also building a data set that could be used to answer each of the research questions highlighted in the study.
Type of research
Types of research 
Validation
Research that investigates novel and unique techniques that have not yet been implemented in real-world environments.
Evaluation
Research that includes a significant implementation of a given technique along with a complete evaluation.
Solution
Research that includes an illustration or example of a solution to a particular problem.
Philosophical
Research that provides a conceptual way of looking at a particular problem or field.
Opinion
Research that expresses a personal opinion about whether a particular technique is good or bad, without focusing on related work or standard research methods.
Experience
Research that is written from the personal experience of the researcher, and describes how something was done.
Area in manufacturing
Areas in manufacturing 
Research focusing on the design of products for manufacturing activities.
Process and Planning
Research focusing on all aspects of process and planning, with a core emphasis on the reduction of waste and the increase of output yielded.
Research focusing on quality management in manufacturing environments.
Maintenance and Diagnosis
Research focusing on the health of machinery in manufacturing operations, ranging from predictive maintenance, to real-time diagnostics.
Research focusing on the scheduling, management and optimisation of activities and processes in manufacturing environments.
Research focusing on the control, management and optimisation of operations and processes in manufacturing environments.
Environment, Health and Safety
Research focusing on the factors relating to the environment, energy, as well as health and safety.
Research focusing on the realisation of virtual factories and processes.
Type of contribution
Types of contribution
Research that provides a theoretical view of how various components in a solution will sit together and interact.
Research that describes the encapsulation of multiple software libraries that solve a particular problem, while also being extensible.
Research that develops high-level guidelines and roadmaps for a particular problem.
Research that presents low-level approaches to solving a particular problem.
Research that produces mathematical models for solving particular problems.
Research that provides a system with hardware and software components, which enables applications to execute.
Research that presents low-level processes to solving a particular problem.
Research that develops well-defined software utilities that address a subset of a bigger problem.
Type of analytics
Types of analytics 
Descriptive
Research that is focused on describing the structure, relationships and meaning of data.
Predictive
Research that is focused on predicting an outcome using the available data.
Prescriptive
Research that is focused on prescribing actions using the available data.
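The four classification dimensions can be thought of as one record per publication, which is then counted along any pair of dimensions to build the map. The sketch below shows this idea under stated assumptions: the `Classification` record, the `cross_tabulate` helper and the label strings used with them are illustrative only and do not reproduce the study's data.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class Classification(NamedTuple):
    """One publication classified along the four dimensions (hypothetical record)."""
    research_type: str
    manufacturing_area: str
    contribution_type: str
    analytics_type: str


def cross_tabulate(
    records: Iterable[Classification], row_dim: str, col_dim: str
) -> "Counter[Tuple[str, str]]":
    """Count publications for every (row, column) pair of category labels."""
    return Counter((getattr(r, row_dim), getattr(r, col_dim)) for r in records)
```

For example, `cross_tabulate(records, "research_type", "manufacturing_area")` yields the counts needed for a map of research type against manufacturing area, and the same records can be re-counted for any other pair of dimensions.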
RQ1 – What are the publication fora relating to big data in manufacturing?
RQ2 – What type of research is being undertaken in the area of big data in manufacturing?
RQ3 – What type of contributions are being made to the area of big data in manufacturing?
RQ4 – What type of analytics are being used in the area of big data in manufacturing?
RQ5 – In what areas of manufacturing are big data technologies being applied?
RQ1 – What are the publication fora relating to big data in manufacturing?
The rationale behind this research question was to ascertain the level of research interest in the area, as well as to highlight prominent sources of primary research. The results clearly show that big data in manufacturing is an area of research that is experiencing exponential growth, with publications on the topic increasing by a factor of ten between 2012 and 2014. Looking at the publication results in more depth, there is a correlation between the year-on-year growth in conference and journal publications. This correlation may be a result of early research efforts focusing on the development of short research papers for conferences, and at a later date, developing those papers into in-depth journal papers. The results identified a number of the most prominent sources of research relating to big data technologies in manufacturing, with the International Journal of Production Economics, and the IEEE Conference on Big Data, being the top sources for journal and conference publications respectively. At present, research interest in the area of big data technologies in manufacturing is high, which is clearly illustrated by the year-on-year exponential growth in publications over recent years.
RQ2 – What type of research is being undertaken in the area of big data in manufacturing?
The rationale behind this research question was to understand the maturity of the research area. This was based on the assumption that philosophical research that focuses on theory, with no application or implementation, may be indicative of an area that is relatively immature and requires the development of theory to support future applications. This type of feed-forward process is evident in the results focusing on the type of research approaches being employed, where philosophical research is the most common, with a chronological lag in solution and evaluation research. Most notably, evaluation research, which is considered in this study to be the most mature type of research relating to technology implementations, is associated with the same number of publications in Q1 2015 as in 2013 and 2014 combined. While big data technologies in manufacturing is a new area of research in chronological terms, the exponential growth shown in the results related to RQ1, coupled with the natural cascading of theoretical and philosophical research into rigorous, empirical and demonstrable research, indicates that the area is developing rapidly. However, based on the findings in this study, the area may still be classified as being somewhat immature due to the high proportion of philosophical-based research, coupled with the low quantity of rigorous evaluation-based research.
RQ3 – What type of contributions are being made to the area of big data in manufacturing?
The rationale behind this research question was to further identify the maturity of the area by classifying the type of outputs originating from research, while also highlighting prominent current research themes and trends. The year-on-year data shows an increasing distribution and balance in the area. This may be interpreted as being indicative of a vibrant research community that is maturing and evolving. There appears to be a strong relationship between results in RQ2 and RQ3. In particular, the most prominent classifications in both sets of results are largely analogous, namely philosophical-based research, and theory-based contributions. As this area of research is relatively new and immature, there is an emphasis on developing theories that can be used by future research efforts to solve particular problems in the field. Indeed, the next most prominent research outputs after theory are frameworks and platforms. These types of outputs can be viewed as a midway point between theory and application, as they are developed on a theoretical foundation (e.g. design or architecture) and facilitate the development of applications and systems. One anomaly in the results showed that there was a lack of journal papers identifying platforms as their output, when compared to that of conferences. This could simply be a result of the term ‘platform’ being more prominent in one community (e.g. industry conferences), versus another community (e.g. academic journals). However, investigating the anomaly further is not warranted in this study given that it is not critical to answering the research question. As previously alluded to, the main themes in the research contributions overlap to some extent with the findings from RQ2. Based on these results, it appears that there is a theoretical base being developed to progress the research area, with technologies being developed to implement those theories.
RQ4 – What type of analytics are being used in the area of big data in manufacturing?
The rationale behind this research question was to identify the extent to which analytics is being used with big data technologies in manufacturing, as well as to understand the type of problems being solved. The results show that about half of big data in manufacturing research employs some form of analytics. This is interesting in the sense that it confirms that big data technologies are being used independently of analytics, and the terms should not be used synonymously. The most common type of problem addressed by big data analytics is prediction, where accuracy is a desirable quality in decision-making. The prominence of predictive analytics may be attributed to the presence of theories and methods pertaining to prediction from other fields (e.g. statistics), and the applicability of predictive analytics to real-world problems. In contrast, the lack of prescriptive analytics is evident from the results. However, this can be attributed to the difficulty in constructing prescriptive applications. Prescriptive applications are inherently complex when compared with descriptive and predictive analytics, given the need to align technology, modelling, prediction, optimisation, and subject matter expertise. Therefore, given that the area of big data in manufacturing is still in its infancy, it is of little surprise that only a few of these highly complex prescriptive analytics applications have emerged.
RQ5 – In what areas of manufacturing are big data technologies being applied?
The rationale behind this research question was to highlight research themes relating to big data technologies in manufacturing, with a particular emphasis on understanding the type of manufacturing problems that are being addressed by big data. The results indicate that process and planning in manufacturing is currently receiving the most research interest; together with diverse cross-departmental enterprise applications, and maintenance and diagnosis, these areas make up 74.6 % of publications in this study. However, given that research classified as ‘enterprise’ comprises diverse applications that address a broad range of topics in manufacturing, it should not be classified as a significant research theme for big data in manufacturing. In keeping with the results from RQ2 and RQ3, there is a strong emphasis on theoretical research, as well as the development of frameworks, platforms and architectures to realise those theoretical foundations.
Threats to validity
As with any secondary research methodology, the process of systematic mapping is not infallible, and there are indeed a number of threats to the validity of this study. To this end, every effort was made to mitigate potential risks throughout the process. The threats to the validity that were identified are described in this section.
The search criteria used to acquire papers for this study were chosen collectively by participating researchers. The choices relating to the search criteria were driven by (a) the agreed scope of the research, (b) the research questions that needed to be answered, and (c) the relevance of papers returned from testing various search combinations. However, while the utmost care was taken to choose the most appropriate search strings for the study, there is an inherent risk that relevant papers that did not match the search criteria were not discovered. Nevertheless, based on the experience of the research team, the sophistication of the search facilities in modern digital databases, coupled with the availability of publication meta-data and the commonality of the terms incorporated in our search string, suggests that the risk of omitting relevant papers was at least minimised.
The research team selected the digital databases used to acquire papers for the study. These databases were selected using a combination of prior knowledge relating to engineering and technology research, as well as noting prominent databases used in closely related fields. Therefore, if a particular digital repository was not searched, there is a risk that relevant papers would not be included in the study. However, the amount of overlap experienced in the search results across different types of digital repositories provided a level of redundancy. More to the point, if a particular digital repository was not used in the study, there is a realistic chance that the research which it holds will either be indexed by another source that is being used, or indeed, be discovered by following the references from each paper in the study (i.e. snowballing).
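The overlap between repositories mentioned above can be made concrete by merging search results on a normalised title. The following is a minimal sketch under the assumption that each repository returns a list of titles; the helper names `normalise_title`, `merge_results` and `overlapping_titles` are hypothetical, and matching on titles alone is a simplification.

```python
import re
from typing import Dict, List, Set


def normalise_title(title: str) -> str:
    """Lower-case a title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_results(results_by_repository: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each normalised title to the set of repositories that returned it."""
    merged: Dict[str, Set[str]] = {}
    for repository, titles in results_by_repository.items():
        for title in titles:
            merged.setdefault(normalise_title(title), set()).add(repository)
    return merged


def overlapping_titles(merged: Dict[str, Set[str]]) -> List[str]:
    """Titles returned by more than one repository, i.e. the redundant hits."""
    return sorted(title for title, repos in merged.items() if len(repos) > 1)
```

The number of keys in the merged dictionary gives the count of unique papers, while `overlapping_titles` indicates how much redundancy the chosen repositories provide.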
As specified in the research methodology, there was an issue with constructing an appropriate search string for Google Scholar. While other databases enabled the construction of searches to interrogate titles, abstracts and keywords using Boolean logic, Google Scholar was limited to searching by title or full text. When a full text search was carried out a total of 9540 records were returned, which is obviously too much data to analyse for a study of this size. Therefore, the search by title option was chosen as it returned a manageable 14 publications. By choosing this search approach for Google Scholar, there is a risk that publications with abstracts and keywords that match the study’s search criteria may have been omitted.
Inclusion and exclusion criteria
The criteria defined for inclusion and exclusion in this study stemmed from discussions within the research team, where the rules and conditions that were deemed to be aligned with the scope of the study were identified. Creating rules to identify the initial literature to review means that there is a risk that relevant research may be omitted if it utilises different terminology to that of the inclusion/exclusion criteria. However, the study’s primary search terms, namely manufacturing and big data, are conventional, well-defined and accepted terms, which should reduce the number of publications omitted due to authors using synonymous terms. Furthermore, as the study is focused on identifying the main research in the area of big data in manufacturing, there is not as much of a concern with capturing research that is very loosely related to the topic.
There is a risk that the research team's labelling and categorisation of the research in the study may differ from that of another researcher. To reduce individual bias, and gain confidence in the accuracy of our classification process, each researcher in the team was asked to classify each publication. The results of this classification process were then analysed, with those publications that were classified the same being labelled immediately, and those with differing classifications subject to a review meeting to determine the most relevant classification.
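The agreement step described above amounts to a simple triage: unanimous classifications are accepted, and any disagreement is queued for the review meeting. A sketch of that logic is shown below, assuming the labels are collected as one mapping of researcher to label per publication; the `triage` function and its input layout are hypothetical.

```python
from typing import Dict, List, Tuple


def triage(
    votes: Dict[str, Dict[str, str]]
) -> Tuple[Dict[str, str], List[str]]:
    """Split publications into agreed labels and those needing a review meeting.

    `votes` maps a publication identifier to {researcher: label}.
    """
    agreed: Dict[str, str] = {}
    for_review: List[str] = []
    for publication, labels_by_researcher in votes.items():
        distinct_labels = set(labels_by_researcher.values())
        if len(distinct_labels) == 1:
            # Every researcher chose the same label: accept it immediately.
            agreed[publication] = distinct_labels.pop()
        else:
            # Differing (or missing) labels: defer to the review meeting.
            for_review.append(publication)
    return agreed, for_review
```

The same function would be applied once per classification dimension, so a publication can be settled on one dimension while still awaiting review on another.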
Conclusion and future work
At the time of writing, this is the only research effort focusing on the systematic mapping of big data technologies in manufacturing. The research presented in this paper provided a breadth-first review of the research relating to big data in manufacturing to promote a better understanding of a new and pervasive area. In particular, several fundamental research questions that are relevant to current research efforts focusing on big data in manufacturing were answered, which also provides a platform for further research and investigation in the area. Future work should focus on the development of systematic reviews and literature reviews that are aligned with the areas of manufacturing identified in this study, such as a systematic review of big data in manufacturing that is focused on maintenance and diagnosis. The combination of these reviews, coupled with the systematic mapping presented in this research, can serve to provide a complete perspective of the primary research relating to big data in manufacturing.
The authors would like to thank the Irish Research Council and DePuy Ireland for their funding of this research under the Enterprise Partnership Scheme (EPSPG/2013/578).
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
1. Davis J, Edgar T, Porter J, Bernaden J, Sarli M (2012) Smart manufacturing, manufacturing intelligence and demand-dynamic performance. Comput Chem Eng 47:145–156
2. Chand S, Davis J (2010) What is smart manufacturing? Time magazine
3. Lee J, Kao HA, Yang S (2014) Service innovation and smart analytics for Industry 4.0 and big data environment. Procedia CIRP 16:3–8
4. Hazen BT, Boone CA, Ezell JD, Jones-Farmer LA (2014) Data quality for data science, predictive analytics, and big data in supply chain management: An introduction to the problem and suggestions for research and applications. Int J Prod Econ 154:72–80
5. Fosso Wamba S, Akter S, Edwards A, Chopin G, Gnanzou D (2015) How ‘big data’ can make big impact: Findings from a systematic review and a longitudinal case study. Int J Prod Econ 165:1–13
6. Lee J, Lapira E, Bagheri B, Kao H (2013) Recent advances and trends in predictive manufacturing systems in big data environment. Manuf Lett 1(1):38–41
7. Kumar P, Dhruv B, Rawat S, Rathore VS (2014) Present and future access methodologies of big data. Int J Adv Res Sci Eng 8354(3):541–547
8. McKinsey (2011) Big data: The next frontier for innovation, competition, and productivity
9. Philip Chen CL, Zhang C-Y (2014) Data-intensive applications, challenges, techniques and technologies: A survey on Big Data. Inf Sci (Ny) 275:314–347
10. Vera-Baquero A, Colomo-Palacios R, Molloy O (2014) Towards a process to guide Big data based decision support systems for business processes. In: Conference on ENTERprise information systems towards, vol 00., p 2212
11. Lee J, Bagheri B, Kao H (2015) A cyber-physical systems architecture for Industry 4.0-based manufacturing systems. Manuf Lett 3:18–23
12. Wright P (2014) Cyber-physical product manufacturing. Manuf Lett 2(2):49–53
13. Petersen K, Feldt R, Mujtaba S, Mattsson M (2008) Systematic mapping studies in software engineering. In: EASE’08 Proc 12th Int Conf Eval Assess Softw Eng, pp 68–77
14. Budgen D, Turner M, Brereton P, Kitchenham B (2008) Using mapping studies in software engineering. Proc PPIG 2:195–204
15. Wieringa R, Maiden N, Mead N, Rolland C (2006) Requirements engineering paper classification and evaluation criteria: a proposal and a discussion. Requir Eng 11:102–107
16. Meziane F, Vadera S, Kobbacy K, Proudlove N (2000) Intelligent systems in manufacturing: current developments and future prospects. Integr Manuf Syst 11(4):218–238
17. Delen D, Demirkan H (2013) Data, information and analytics as services. Decis Support Syst 55(1):359–363