  • Survey Paper
  • Open access

A survey on driving behavior analysis in usage based insurance using big data


The emergence and growth of connected technologies and the adoption of big data are changing the face of all industries. In the insurance industry, Usage-Based Insurance (UBI) is the most popular use case of big data adoption. UBI started as a simple unitary Pay-As-You-Drive (PAYD) model, in which the classification of good and bad drivers remained an unresolved task. PAYD progressed to the Pay-How-You-Drive (PHYD) model, in which the premium for personal auto insurance is charged on the basis of post-trip analysis. The drawback of the PHYD model is that it provides no proactive alerts to guide the driver during the trip. PHYD in turn progressed to the Manage-How-You-Drive (MHYD) model, in which proactive engagement in the form of alerts is provided to drivers while they drive. The PAYD, PHYD and MHYD models serve as the building blocks of UBI, and the introduction of the MHYD model helps the insurance industry bridge the gap between insurer and customer. An increasing number of insurers all over the world are launching PHYD or MHYD models, and widespread customer adoption is seen to improve driver safety through the monitoring of driving behavior. Consequently, the data flow between insurers and their customers is increasing exponentially, which makes big data adoption a foundational brick in the technology landscape of insurers. This paper presents a detailed survey of the categories of MHYD. The survey identifies the need to address aggressive driving behavior and road rage incidents during short-term and long-term driving. The survey is also used to propose a solution that assesses the risk posed by aggressive driving and road rage incidents by considering the behavioral and emotional factors of a driver. The outcome of this research would help the insurance industry to assess driving risk more accurately and to calculate a personalized premium based on driving behavior, with the greatest emphasis on risk prevention.


An accident is defined as an unfortunate incident that happens unexpectedly and unintentionally, typically resulting in damage or injury. Considering all the consequences that could follow an accident, there are reasons to believe that a normal person does not drive with an ex-ante intention to cause one. Holding a valid driving license is a prerequisite for driving in any part of the world, and during the licensing process people are educated about the driving rules and safety measures to be followed. Notwithstanding all this, accidents happen and, surprisingly, the human factor is considered the foremost cause. Distraction, drunkenness, speeding, running red lights and stop signs, recklessness, road rage, aggressiveness and drowsiness rank among the topmost human factors.

Transport by self-driving vehicles promises to cut down the human factor, but their widespread use is many decades away. Until then, the total number of passenger car registrations will continue to rise. The growing trend in the number of passenger car registrations across the world is shown in Table 1.

Table 1 Number of passenger car registration worldwide

The number of car registrations was around 800 million in 2011 and burgeoned to around one billion in 2018. It is estimated that this number might exceed 2 billion by the year 2050 [1]. With more vehicles congesting the roads, the probability of accidents is also increasing proportionately. Road traffic crash statistics by geographical location are shown in Fig. 1. For example, road traffic fatalities per 100,000 population are 10.3 for Europe, 18.5 for South East Asia and 24.1 for Africa.

Fig. 1 Road traffic crash

Worldwide mortality as a result of road accidents from 2013 to 2018 [2] is shown in Fig. 2. However, this does not reflect the full picture, as the deaths of pedestrians and cyclists are not included in the statistics; if those numbers were included, the overall mortality figures would be higher. The 2018 statistics of the World Health Organization [3] point out that:

Fig. 2 Number of deaths year wise in road accidents

  • The number of people who die in road crashes each year is reported to be around 1.3 million, an average of 3287 deaths a day, with 20–50 million people injured or disabled every year.

  • More than half of all road traffic deaths occurred among young adults aged 15–44.

  • Road traffic crashes were ranked as the 9th leading cause of death and account for 2.2% of all deaths globally.

  • Unless remedial action is initiated, road traffic injuries are likely to become the fifth leading cause of death by 2030.

The causes of accidents are classified into three categories: bad weather or bad infrastructure (rain, potholes on the road), vehicle malfunction (manufacturing defects or wear and tear), and human factors (physiological or behavioral). Physiological mistakes arise from driver fatigue and drowsiness, whereas behavioral mistakes take many forms, such as distracted driving, drunk driving, aggressive driving, road rage, hard acceleration, hard braking, hard cornering and speeding. Aggressive driving and road rage are behaviors that can lead to fatal or non-fatal road accidents, incidents of physical violence and even murders.

Aggressive driving involves driving a motor vehicle in an unsafe and hostile manner without regard for others. It includes unsafe behavior on the road such as making frequent or unsafe lane changes, running red lights and stop signs, wrong-way driving, improper turns, tailgating and disregarding traffic controls.

Road rage is an angry driving behavior exhibited by the driver, which includes making rude gestures, making physical and verbal threats, and exhibiting dangerous driving methods targeted towards another driver in an effort to intimidate or release frustration.

The increase in road accidents proportionately increases the frequency of insurance claims made by policyholders. The primary reason for insurers to introduce UBI is to bring realistic and correct measurability to the risk to which customers are exposed, and to charge a risk-based premium suggested by an actuary. Under this charging method, policyholders who exhibit higher risk while driving pay a higher premium. The PAYD model follows a simple premise: a policyholder who drives more miles during a year exposes the vehicle to more hours of on-road risk, and consequently faces a higher risk of an accident than a policyholder who drives fewer miles. The milometer readings are studied to factor the risk exposure into the motor insurance premium.

The growth of connected technologies has led to mature UBI solutions and well-developed telematics solutions that closely assess driving behavior patterns and factor them into pricing each customer in a personalized way [4]. Over the years, the method of driver behavior data collection, the parameters collected and the frequency of collection have been changing. As a result, insurance companies are receiving a large volume of driver behavior data. Four different ways of collecting data from customers are suggested:

  1. Black box: An electronic device installed in the car to record information related to vehicle crashes or accidents; it has one-way outward interaction after a crash.

  2. Dongle: An electronic device that allows a server to access the vehicle network. The insurer installs the device in the vehicle, and it has only one-way interaction.

  3. Embedded: Car manufacturers provide embedded telematics equipment for vehicles, such as remote diagnostics devices, navigation sensors and infotainment services.

  4. Smartphones: Smartphone-based telematics is the latest addition to the existing methods. A smartphone works either as a stand-alone device or linked to the vehicle's information system to transmit a variety of information from the car. Smartphones can be used in the following ways to gather data:

    • Smartphone sensors: In-built sensors such as accelerometers, gyroscopes, and magnetometers are used. The advantages of in-built sensors are their low cost and the low implementation complexity of obtaining behavioral events such as hard acceleration and hard braking. The drawbacks are that the smartphone must be kept in a stable position and that hard cornering events are difficult to identify. Table 2 shows a sample of research papers in which in-built smartphone sensors are used.

      Table 2 Parameters used in research papers
    • Global Positioning System (GPS) data: From the GPS signal, values of speed, latitude, longitude, course and altitude are retrieved. Driver behavior events such as hard acceleration, hard braking, hard cornering and over speeding are detected from these values using big data technologies and different algorithms. The advantages are that the smartphone does not need to be held in a stable position and that hard cornering events can be identified. The drawbacks are the complex implementation and the high cost of computation. Table 3 shows a sample of industry solutions in which GPS data are used.

      Table 3 Parameters used in industry solutions
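To make the GPS-based approach concrete, the following is a minimal Python sketch of how behavior events might be derived from consecutive GPS fixes. It is illustrative only: the `GpsFix` record, the function names and all threshold values are assumptions chosen for this example and are not taken from any of the surveyed papers or industry solutions.

```python
from dataclasses import dataclass


@dataclass
class GpsFix:
    t: float       # seconds since trip start
    speed: float   # metres per second
    course: float  # heading in degrees, 0-360


# Hypothetical thresholds, chosen only for illustration.
ACCEL_LIMIT = 3.0        # m/s^2, at or above this counts as hard acceleration
BRAKE_LIMIT = -3.0       # m/s^2, at or below this counts as hard braking
TURN_RATE_LIMIT = 30.0   # degrees per second
TURN_MIN_SPEED = 5.0     # m/s, ignore heading jitter when nearly stationary
SPEED_LIMIT = 27.8       # m/s, roughly 100 km/h


def heading_change(a, b):
    """Smallest signed difference b - a in degrees, in [-180, 180)."""
    return (b - a + 180.0) % 360.0 - 180.0


def detect_events(fixes):
    """Return (time, event_name) tuples derived from consecutive fixes."""
    events = []
    for prev, cur in zip(fixes, fixes[1:]):
        dt = cur.t - prev.t
        if dt <= 0:
            continue  # skip duplicate or out-of-order fixes
        accel = (cur.speed - prev.speed) / dt
        turn_rate = abs(heading_change(prev.course, cur.course)) / dt
        if accel >= ACCEL_LIMIT:
            events.append((cur.t, "hard_acceleration"))
        if accel <= BRAKE_LIMIT:
            events.append((cur.t, "hard_braking"))
        if turn_rate >= TURN_RATE_LIMIT and cur.speed >= TURN_MIN_SPEED:
            events.append((cur.t, "hard_cornering"))
        if cur.speed > SPEED_LIMIT:
            events.append((cur.t, "over_speeding"))
    return events
```

Because every quantity here comes from speed and course rather than from the phone's own accelerometer, the orientation of the phone inside the vehicle does not matter, which is the advantage noted above. A production system would also need to smooth noisy fixes and use location-specific speed limits.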

Most of the research papers discuss the use of both GPS and in-built sensors to collect the driving data. To avoid the limitations of in-built sensors, the proposed solution plans to use GPS to collect data from drivers. Immediately after a driver starts a journey, the server starts receiving the various attributes of the behavior data at regular intervals. The data from all drivers at any given moment will be so large that none of the traditional data management tools will be able to store or process it efficiently. Big data technology makes it possible to handle this data deluge, which is characterized by huge volumes, high velocity, and veracity. While the existing telematics solutions focus more on driver aberrations such as over speeding, hard acceleration, and hard braking, this detailed survey on MHYD found that there is a need for a solution that detects aggressive driving and road rage while the car is being driven.

The proposed solution identifies these aberrations using big data and machine learning technologies. The intended outcome of this research is to alert drivers during inferred incidents of aggressive driving and road rage in order to improve safety, and to calculate personalized motor insurance premiums.

The rest of the paper is organized as follows. The “Introduction” section introduces insurance and big data technologies. The “Background and overview” section gives a brief introduction to big data and insurance, the types of insurance, UBI and the different types of UBI. The “Related works” section outlines the models of UBI and presents the classifications of MHYD: driving pattern monitoring, fatigue monitoring, drowsiness detection, and driver distraction. The “Proposed solution for personalized premium calculation” section proposes a solution to detect driving behavior and highlights its benefits. The “Conclusion and future research directions” section offers the conclusions and future directions.

Background and overview

Big data is generally characterized by the “three V’s” principle put forward by Laney in 2001: increasingly huge volume of data, variety of data that includes raw, unstructured and semi-structured data, and velocity of the data that denotes the fact that these data are produced, harvested and analyzed in real-time. Gartner defines big data as “high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.” Over the years, various other definitions have evolved making the number of V’s increase from three to ten. Figure 3 shows the ten characteristics of big data [5] introduced by Firican in 2017.

Fig. 3 Characteristics of big data

  1. Volume: The size of the data, which plays a crucial role in deriving value from data. Insurance companies receive a huge volume of data every second from different drivers, which helps to predict driving behavior accurately.

  2. Velocity: The speed at which data are generated by insurance customers. Driving data are created, saved, analyzed and visualized at increasing speed, making it possible to predict driving behavior and visualize high volumes of data in the real driving environment.

  3. Variety: The heterogeneity of data formats and the nature of the data, both structured and unstructured. In earlier days, insurance companies managed driving data with spreadsheets and traditional databases. At present, the insurance industry faces challenges in storing, mining and analyzing driving data.

  4. Veracity: The quality or trustworthiness of the data. The data collected by the insurance company should be accurate in order to generate value and to help the enterprise predict driving behavior accurately.

  5. Value: The business value obtained from the data. The insights gleaned from big data can help insurance companies derive value through customer engagement, optimize operations, prevent threats and fraud, and capitalize on new sources of revenue.

  6. Visualization: The representation of data in pictorial form to visualize the huge volume, velocity and variety of data. The insights gained by the insurance company are shared with its customers.

  7. Variability: The representation of data in multiple ways. The challenges of big data for insurance companies arise not only from the sheer volume of data but also from the fact that the data are generated in multiple forms, as a mix of unstructured and structured data.

  8. Validity: The accuracy and correctness of the data. The personalized premium generated by the insurance company should be accurate.

  9. Vulnerability: The weakness that allows a possible threat to become an attack. Changes made to the big data by the insurance company should not harm the system or the premium calculation engine.

  10. Volatility: The availability of the data. Given the velocity and volume of big data in the insurance company, its volatility needs to be carefully considered. The insurance company needs to establish rules for data to ensure rapid retrieval of customer information when required.

Technologies for creating, transmitting, storing, and analyzing data have made giant strides in recent years. The fourth industrial revolution is reflected in an exponential growth in the volume of available data, accompanied by significant improvements in computational power and storage and by the development of sophisticated algorithms to glean insights. Industries in all verticals, including insurance, are making huge investments in research and innovation in big data processing and analytics. Figure 4 shows the increasing investments in big data from 2012 to 2018. Insurers all over the world are planning to increase their investments in big data over the next 3 to 5 years [6].

Fig. 4 Big data investments between 2012 and 2018

As a consequence of the impact of big data and the insight it provides through visualizations, companies are making changes to their core business propositions, as well as to their products and services. IBM [7] surveyed the latest trends in big data and found that 74 percent of insurance companies produce a yearly report that highlights the use of big data processing and analytics for creating a competitive advantage over other industries. A study by Madan [8] reveals that almost 2.5 quintillion bytes of data are created each day by various sources.

The concept of insurance revolves around predicting future risk events and calculating the losses that could arise from them and the premium that insurers need to charge for the underlying risk transfer. Insurance can be broadly classified as life or property and casualty (non-life) insurance, depending on the nature of the subject covered. Based on the type of customer, whether an individual or a company, it can be further categorized into personal or commercial lines. Insurers have always depended on several types of data from multiple sources to infer causal and correlative associations with risk events.

The advances in big data and analytics are now transforming the insurance industry. Insurers can leverage big data to identify more accurate causal associations. In motor insurance, insurers have traditionally looked at the customer profile, historical claims and information from public records, such as traffic police fines or violations, for insights to classify policyholders as good or bad drivers. UBI is a recent innovation in auto insurance that originated from the explosion of telematics data. Instead of depending on proxies and spurious correlations, insurers can now know exactly how a person drives a vehicle and calculate the premium accurately. UBI started with personal lines catering to individual customers and is now expanding to commercial lines, where it is applied to commercial fleet insurance.

UBI can be categorized into the following three types:

  1. Pay-As-You-Drive (PAYD): The premium is calculated from the number of miles driven, based on the milometer readings. The milometer reading is the only parameter considered, and there is no distinction between good and bad drivers, so both pay the same premium.

  2. Pay-How-You-Drive (PHYD): The premium is calculated from the customer's driving pattern, such as over speeding, hard acceleration, hard braking and hard cornering. A driver score is computed at the end of every trip for the particular driver. The individual scores are normalized over the policy year to arrive at the overall driver behavior, which is used to calculate the premium and discount.

  3. Manage-How-You-Drive (MHYD): The premium is calculated in the same way as in PHYD; in addition, real-time alerts and suggestions are given to the driver to ensure safety.
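The difference between the PAYD and PHYD premium calculations can be sketched in a few lines of Python. This is a hedged illustration rather than any insurer's actual formula: the function names, the distance-weighted averaging used to normalize trip scores over the policy year, and the linear discount rule with its 50-point floor and 30% cap are all assumptions made for this example.

```python
def payd_premium(rate_per_km, km_driven):
    """PAYD: the premium depends only on the distance driven."""
    return rate_per_km * km_driven


def yearly_score(trips):
    """Distance-weighted mean of per-trip scores (assumed normalization).

    `trips` is a list of (km, score) pairs, with score on a 0-100 scale.
    """
    total_km = sum(km for km, _ in trips)
    if total_km == 0:
        raise ValueError("no distance driven, score is undefined")
    return sum(km * score for km, score in trips) / total_km


def phyd_premium(base_premium, score, max_discount=0.30):
    """PHYD: discount the base premium according to the yearly score.

    Assumed rule: no discount at or below a score of 50, rising
    linearly to `max_discount` at a score of 100.
    """
    fraction = min(max((score - 50.0) / 50.0, 0.0), 1.0)
    return base_premium * (1.0 - max_discount * fraction)
```

Under these assumptions, two drivers with the same mileage pay the same PAYD premium, whereas their PHYD premiums differ as soon as their yearly scores differ. MHYD would reuse the PHYD calculation and add real-time alerts during the trip.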

Over the years, the PAYD and PHYD models have reasonably stabilized with the emergence of a few dominant designs; MHYD, however, is still evolving and more research is in progress, as shown in Fig. 5. A thorough survey of four categories of MHYD (pattern monitoring, fatigue monitoring, drowsiness detection and driver distraction) is presented in the following sections. A solution is also proposed to calculate the personalized premium from driving behavior with MHYD under UBI. The focus of this paper is only on the MHYD model.

Fig. 5 Classification of UBI

Related works

Recent studies have shown that customer satisfaction is higher among good drivers who participate in UBI programs [9]. UBI programs reward the best drivers and help insurance companies not only to classify and price them appropriately but also to engage with them and build better relationships. In this section, a review of the existing works on MHYD is presented; the classification is shown in Table 4.

Table 4 Classification of MHYD

The monitoring approach varies from one classification of MHYD to another, as summarized in Fig. 6. Driving pattern monitoring uses On-Board Diagnostics (OBD), smartphone sensors or GPS to capture data from the real driving environment.

Fig. 6 Summary of monitoring approaches

Driving pattern monitoring

The optimal way to prevent accidents is to monitor the driving pattern and alert the driver whenever an abnormal event occurs. The policyholder could get additional discounts on the premium for good driving behavior. Driving pattern monitoring falls into three classes: pattern monitoring using GPS, pattern monitoring using mobile phone sensors and pattern monitoring using OBD. These are summarized in the following subsections.

Pattern monitoring using GPS

IBM Corporation [6] has developed geo-fencing services that help teen and commercial drivers stay within a defined boundary. If the driver goes beyond the boundary, the vehicle displays an alert on the car's dashboard screen and the designated contact also receives an alert. The parameters used are speed, hard braking, hard acceleration and hard cornering. In addition to the alerts, a further insurance capability called “Next generation First Notice of Loss” is provided.
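The boundary check behind a geo-fencing alert can be illustrated with a short Python sketch. It assumes a circular fence defined by a center point and a radius and uses the haversine great-circle distance; the source does not describe how the IBM service defines or evaluates its boundaries, so the fence shape and the function names here are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geofence_alerts(center, radius_km, positions):
    """Return the indices at which the vehicle leaves the circular fence.

    An alert is raised once per exit (inside -> outside transition),
    not for every position reported while the vehicle stays outside.
    The vehicle is assumed to start inside the fence.
    """
    alerts = []
    inside = True
    for i, (lat, lon) in enumerate(positions):
        now_inside = haversine_km(center[0], center[1], lat, lon) <= radius_km
        if inside and not now_inside:
            alerts.append(i)
        inside = now_inside
    return alerts
```

Raising the alert only on the transition avoids flooding the dashboard and the designated contact with repeated notifications while the vehicle remains outside the boundary.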

Allstate (a leading US insurance company) [10] has introduced a solution called Drivewise, which rewards drivers for everyday safe driving. The solution records the time and location of the vehicle during trips, the number of trips per day, the speed at which the vehicle is traveling, hard braking and mileage, using big data technologies. After adopting Drivewise, Allstate reported that its usual claims were reduced by 12%. The solution focuses only on PHYD; the proposed solution plans to focus on MHYD in addition to PHYD.

TD Insurance (a leading Canadian insurance company) [11] has adopted a solution called TD My Advantage, which collects and analyzes driving data and assigns a driving score to each trip. The insurance premium is calculated from the assigned score. The solution records speed, hard braking, acceleration, and cornering using big data technologies. Even though the solution records all the behavioral parameters, it considers only the speed parameter when alerting the driver. In the proposed solution for personalized premium calculation, all the behavioral parameters will be used to alert abnormal drivers.

Progressive (ranked as one of the best insurance companies in the United States) [12] has implemented a UBI program called Snapshot. The program personalizes the insurance amount based on actual driving behavior. Snapshot rewards the average driver with a $130 discount, and drivers also get an automatic discount just for signing up. The program focuses only on post-trip analysis and user rewards. In our proposed model, proactive alerts to guide the driver will be provided while the trip is in progress.

State Farm (a large group of insurance and financial services companies in the United States) [13] has launched a UBI program called “Drive Safe and Save”. The recorded data include hard acceleration, hard braking, speeding and the time of day the vehicle is driven. The UBI solution focuses on PHYD and also provides additional services such as roadside assistance, maintenance alerts and a stolen vehicle locator. In the proposed research, an enhanced MHYD solution will be provided in addition to PHYD.

Nationwide (a leading US insurance company) [14] has presented a solution for its UBI program called SmartRide. It analyzes the collected data and gives personalized feedback to help the driver drive safely, besides providing discounts of 5% to 40% to good drivers. The only parameters used are hard braking and hard acceleration. The program has no real-time notifications, and drivers get feedback only after the trip is over. In our research, we plan to provide real-time notifications to abnormal drivers during the trip, when the abnormal event occurs.

Pattern monitoring using mobile phone sensors

Yu et al. [15] have proposed a fine-grained abnormal driving behavior detection and identification system, D3, to detect abnormal driving behavior in real time with high accuracy. SVM and neural network (NN) algorithms are used to detect the abnormality. The authors collected six months of driving traces from a real driving environment. The parameters used are hard cornering and hard braking. D3 achieved an average total accuracy of 95.36 percent with the SVM classifier model and 96.88 percent with the NN classifier model. To improve the accuracy, the proposed solution for personalized premium calculation plans to use a minimum of 12 months of driving traces from the real environment.

Shi et al. [16] have proposed a solution that normalizes driving behavior based on a personalized driver model. Considering only the speed parameter, it uses K-means clustering and neural network algorithms. The authors test the system only on simulated data, so real-time driving data are lacking. The proposed solution explores the use of driving traces from the real environment.

Liu et al. [17] have designed a system based on a “Deep Sparse Auto Encoder (DSAE)”, which extracts hidden features for visualization. The visualization method, called a driving color map, maps the extracted 3-D hidden features to the red, green and blue color space. The generated colors do not tend to appear biased, e.g., reddish or bluish. However, the visualization can yield different results under rotation in the color space even when it uses the same data. Deep learning algorithms are used, and the parameters considered are hard braking and hard cornering. In the proposed research, the plan is to use hard acceleration, hard cornering, hard braking and speed with different machine learning algorithms.
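The idea of mapping three hidden features to a color can be illustrated with a small Python sketch. It covers only the final mapping step and assumes that the 3-D features have already been extracted; the per-dimension min-max scaling used here is an assumption made for illustration and is not claimed to be the scaling used by Liu et al.

```python
def driving_color_map(features):
    """Map a sequence of 3-D feature vectors to (R, G, B) tuples in 0..255.

    Each dimension is min-max scaled independently over the whole
    sequence (an assumed scaling, chosen for simplicity).
    """
    if not features:
        return []
    lows = [min(f[d] for f in features) for d in range(3)]
    highs = [max(f[d] for f in features) for d in range(3)]
    colors = []
    for f in features:
        rgb = []
        for d in range(3):
            span = highs[d] - lows[d]
            # A dimension with no variation carries no information; map it to 0.
            scaled = (f[d] - lows[d]) / span if span > 0 else 0.0
            rgb.append(round(scaled * 255))
        colors.append(tuple(rgb))
    return colors
```

Because each feature axis is tied to one color channel, rotating the feature space before the mapping changes the resulting colors, which is consistent with the rotation sensitivity noted above.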

Daptardar et al. [18] have experimented with a new technique that uses a Hidden Markov Model (HMM) to detect lateral maneuvers and a jerk energy based technique to detect longitudinal maneuvers. The parameters used are hard acceleration and hard braking. Only an Android version is available, and the accuracy of the system is 95%. In this work, the authors consider behavioral factors alone. In the proposed solution for premium calculation, both behavioral and emotional factors will be used with different machine learning algorithms to improve the accuracy.
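As a rough illustration of what a jerk energy measure might look like, the Python sketch below treats jerk as the finite-difference rate of change of longitudinal acceleration and its energy as the time integral of squared jerk over a window. This definition is an assumption made for the example; the source does not give the formula used by Daptardar et al., and the function name is hypothetical.

```python
def jerk_energy(accel, dt):
    """Approximate integral of squared jerk over a window of samples.

    `accel` holds longitudinal acceleration samples (m/s^2) taken at a
    fixed interval `dt` (seconds). Jerk is estimated by finite
    differences between consecutive samples.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    jerks = [(b - a) / dt for a, b in zip(accel, accel[1:])]
    return sum(j * j for j in jerks) * dt
```

Under this definition, an abrupt change in acceleration yields a much larger value than a smooth ramp covering the same range, so a threshold on the result could flag harsh longitudinal maneuvers.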

Zhao et al. [19] have applied the PHYD model to insurance. UBI is introduced into personal motor insurance, with the premium assessed from the time of usage, the distance driven and the driving behavior. The UBI technology uses a dongle, black box, embedded devices and smartphone, and UBI offers cashback, premium discounts, and value-added services. Cars fitted with a telematics device capture the driving behavior information. The data are analyzed and sent to the insurance company, which uses them to calculate the customer's premium. Pricing is based on kilometers driven and on speeding, sharp parking and sudden acceleration events per 100 km driven. The focus is on PHYD. The proposed solution for personalized premium calculation focuses on MHYD with live voice alerts.
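Normalizing event counts per 100 km makes drivers with different mileages comparable. A minimal Python sketch of that normalization follows; the function name and the dictionary layout are assumptions for illustration, and the source does not specify how the resulting rates are turned into a price.

```python
def events_per_100km(event_counts, km_driven):
    """Convert raw event counts into rates per 100 km driven."""
    if km_driven <= 0:
        raise ValueError("km_driven must be positive")
    return {name: count * 100.0 / km_driven for name, count in event_counts.items()}
```

For example, six speeding events over 1200 km give the same rate as three events over 600 km, so neither driver is penalized merely for driving more.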

Tselentis et al. [9] have offered a solution for usage-based motor insurance covering PAYD and PHYD. Drivers pay a premium based on their driving behavior and degree of exposure. A financial incentive is given to drivers to improve their driving behavior, for example by reducing the number of harsh braking and acceleration events, or to reduce their degree of exposure, such as their annual mileage and the time of day they travel, which reduces the traffic risk considerably. The authors focus on PAYD/PHYD and on providing a personalized premium to customers, with less emphasis on safety aspects. The proposed model focuses on personalized premium calculation with strong emphasis on safety aspects as well.

Hu et al. [20] have designed a personalized driver model using a locally designed neural network and real-world Vehicle Test Data (VTD). In addition, an abnormality index is proposed to quantitatively evaluate abnormal driving behavior. The parameters used are speed, hard braking, and hard acceleration. The authors also explain that blood pressure and blood alcohol level are useful physiological signals for indicating abnormal behavior. The importance of using behavioral, psychological, environmental, and emotional factors to detect abnormal driving behavior is discussed in detail. The lack of real-time driving data is considered the drawback of the VTD based system. The proposed system for personalized premium calculation explores the possibility of including emotional factors along with behavioral factors for driving behavior detection.

Zhou et al. [21] have identified aggressive/risky driving behavior patterns on horizontal curves using real field Basic Safety Messages (BSM) data. The parameters used are hard acceleration and hard braking. The Private Usage-Based Scoring (Pri-UBS) algorithm and the Probabilistic Usage Data Audition (Pro-UDA) protocol are used to identify the abnormality. The authors rightly note that many environmental factors, such as real-time traffic and traffic regulations, could influence the driving speed.

Pattern monitoring using OBD

Bergasa et al. [22] have built DriveSafe, a mobile application for iPhone that uses the phone's camera, microphone, GPS and sensors (accelerometer, gyroscope). DriveSafe uses lane drifting/weaving, acceleration, braking and turning events to analyze driving behavior and calculate a driving score. It detects inattentive driving behaviors and generates alarms in unsafe situations, thereby making the driving style safer. DriveSafe classifies drivers into two categories, normal and abnormal. The proposed solution for personalized premium calculation will classify drivers into four categories based on their driving behavior and health condition to improve driver safety.

Zhang et al. [23] have built a system called SafeDrive to detect abnormal driving behaviors from large-scale vehicle data using a State Graph (SG). The parameters used are hard acceleration and hard braking. The accuracy of the SG system is 93%. In this SG based research, behavioral factors are considered, with less importance given to personalized premium calculation. The proposed solution for personalized premium calculation addresses the important objective of abnormal driving behavior detection using behavioral and emotional factors.

Nai et al. [24] have modeled UBI using the Fuzzy Risk Mode and Effect Analysis (FRMEA) method. The risk level of each insured vehicle is assessed by analyzing the driving style from the raw data collected by the OBD module. The risk modes used are jerking at low speed, continual speed changing and jerking at high speed. The parameters used are speed, hard acceleration and hard braking. Since the authors use the OBD device to collect the raw data, alerting abnormal drivers is not considered. The proposed solution plans to use GPS to collect the raw data and alert abnormal drivers.

Researchers have proposed several methods to monitor the driving pattern, attempting to capture real driving data using an OBD device, mobile sensors or GPS data. The parameters used by most researchers are speed, hard braking, acceleration, and cornering. The algorithms used to monitor the driving pattern differ across the research works, and each has its own set of advantages and disadvantages. The work in [6] developed a solution for teen monitoring. The works in [9, 10, 19] focused on providing reward points based on driving behavior. The works in [11,12,13,14] focused on providing a personalized premium based on driving behavior. The works in [15,16,17,18] used machine learning, big data and deep learning algorithms to classify driving behavior. The works in [20, 23] suggested considering factors other than behavioral factors to detect driving behavior.

Fatigue monitoring

Fatigue monitoring is the use of technology to monitor a driver’s behavior and determine their level of fatigue while driving [25]. Its benefits include improved decision-making and response times, increased productivity and a considerable reduction in accident severity. Various studies have suggested that around 20% of all road accidents are fatigue-related [26]. The following are some of the related works in this space:

Warwick et al. [27] investigated a system that calculates the fatigue level from an eye image. If three out of five consecutive frames show closed eyes, an alert is issued to the driver. The method is simple, but its accuracy is limited because it relies on a single data source, the eye image.
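The three-out-of-five frame rule described for [27] can be illustrated with a sliding window over per-frame eye states. The following is a minimal sketch, not the authors' implementation: the function name `closed_eye_alerts`, the boolean frame encoding, and the reading of the rule as "at least three closed frames" are assumptions made for illustration.

```python
from collections import deque
from typing import Iterable, Iterator


def closed_eye_alerts(frames: Iterable[bool], window: int = 5,
                      threshold: int = 3) -> Iterator[int]:
    """Yield the index of every frame at which the alert rule fires.

    frames: per-frame flags, True when the eye is judged closed.
    The rule fires when at least `threshold` of the last `window`
    consecutive frames are closed (assumed reading of the rule).
    """
    recent = deque(maxlen=window)  # keeps only the newest `window` frames
    for index, is_closed in enumerate(frames):
        recent.append(bool(is_closed))
        if len(recent) == window and sum(recent) >= threshold:
            yield index


# Hypothetical eye states for nine consecutive frames.
example = [False, True, False, True, True, False, False, False, False]
alert_frames = list(closed_eye_alerts(example))
```

Because only the last five states are kept, the check runs in constant memory and can be applied to a live camera stream frame by frame.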

Podder et al. [28] have discussed and developed a fatigue monitoring system by using machine vision and Adaboost algorithm. Face and eye classifiers are used. Image preprocessing, face detection, eye state recognition, and fatigue evaluation are used to identify the fatigue level. This system is complex in terms of implementation.

Chaitali et al. [29] tested a driver fatigue detection and monitoring system using a smartphone. The driver’s fatigue state is estimated from the eye blinking rate, yawning detection by mouth tracking, and head rotation detection; gaze tracking is used to detect driver distraction, and facial expression tracking is used to detect stress. The smartphone’s front camera captures the driver’s image, and the rear camera is used for traffic sign detection. Big data and open computer vision technologies are used to track the face in the images.

Qiao et al. [30] illustrated a fatigue monitoring system that uses eye, mouth and face images. The mobile phone’s built-in camera is used to detect the driver’s eyes. The face and eye blinks are captured using a Haar-like technique, and the mouth is detected for yawning with the Canny Active Contour Method. The recorded video is split into frames, which are then processed in real time.

Hu et al. [20] have generated a machine-learning model to detect the abnormal driving by analyzing normalized driving behavior. Three typical abnormal driving behavior patterns are characterized and simulated, namely the fatigue/drunk, the reckless, and the phone use while driving. Only simulated data is used. Based on the analysis of normalized driving behavior, an abnormality index is proposed.

Mandal et al. [31] proposed a system consisting of modules for head-shoulder detection, face detection, eye detection, eye openness estimation, fusion, drowsiness measurement through percentage of eyelid closure estimation, and fatigue level classification. The system is considered easy and flexible to deploy in commercial vehicles, but the approach may not be suitable for private cars.

Al-Sultan et al. [32] studied driver behavior detection using a context-aware system in Vehicular Ad hoc Networks (VANETs) to detect abnormal behavior exhibited by drivers and to warn other vehicles. VANETs use Dedicated Short-Range Communication (DSRC) to allow vehicles in close proximity to communicate with each other or with roadside equipment. Four driver states are distinguished: normal, drunk, reckless and fatigued. Fatigue is estimated from eye movements, reckless driving from the driver’s acceleration, and intoxication from how the driver controls speed. A speed sensor, accelerometer, GPS, cameras and an alcohol sensor are used in coordination with traffic management centers, which provide information on traffic, weather and road conditions, and with an adaptive hello message (vehicle alarm system).

Various researchers have proposed methods to monitor the driver’s fatigue level. Cameras and wearable devices are used to extract visual cues and contextual information. Different techniques are used to detect the fatigue level, and their advantages and disadvantages have been described. The papers [27, 32] monitored the fatigue level using only the eye image. The papers [28, 30, 31] used face and eye blinking, and [29] used the eye image, eye blinking and head rotation.

Drowsiness detection

Drowsiness detection is a car safety activity that helps to prevent accidents caused by the driver getting drowsy. The following are some of the current systems that learn driver styles and detect when a driver is getting drowsy.

Bergasa et al. [22] explained a driver drowsiness detection system. The driver’s heart rate and breathing rate are measured using the BioHarness 3 sensor produced by Zephyr Technology. Detection relies on the observation that the breathing rate goes down and the heart rate goes up when a person falls asleep. The sensor data are processed with a filter algorithm and a fast Fourier transform.

Ke et al. [33] addressed drowsiness detection using the heart rate on Android-based handheld devices. Big data and open computer vision (OCV) technologies are used to track the face in images. The smartphone’s front camera captures the driver’s image, and the rear camera is used for traffic sign detection. Eye, mouth and head parameters are used. In addition to drowsiness, the fatigue level is estimated from the eye blink rate, and yawning is detected by tracking mouth and head movement.

Nai et al. [24] conducted experiments on a drowsiness detection system based on eye blinks. A Hamming window and FFT techniques are used to detect drowsiness. ECG signals are acquired from a sensor, the eye blink rate is obtained from the camera, and both are transferred via Bluetooth to an Android device, which performs the drowsiness detection. The system is intended to work on all Android devices.

Rohit et al. [34] have focused on drowsiness detection by using electroencephalogram (EEG) and wearable sensors. Machine learning and big data technologies are used. SVMs are used to classify the drowsy states. The EEG signals are also used to characterize the eye blink duration and frequency of subjects.

Qian et al. [35] have compared the performance of drowsiness detection with other traditional feature extraction methods. Bayesian Non-negative CP Decomposition (BNCPD) is used to extract common multiway features from the group-level EEG signals. Automatic CP rank determination and plausible multiway physiological information of individual states are used.

Qian et al. [36] designed a system to detect individual drowsiness based on physiological features from EEG signals, using a Bayesian-Copula Discriminant Classifier (BCDC). The results have not been generalized to other experimental environments.

Li et al. [37] collected data for a drowsiness detection system using wireless and wearable technology. A Brain Machine Interface (BMI) system is dedicated to signal sensing and processing for Driver Drowsiness Detection (DDD). An embedded Bluetooth low-energy module communicates with a fully wearable consumer device, a smart watch, which coordinates drowsiness monitoring and brain stimulation with its embedded closed-loop algorithm. A smart watch is therefore required to detect drowsiness in this approach.

Researchers have proposed different techniques to monitor the driver’s drowsiness level. Big data and machine learning technologies are used to extract contextual information from cameras and wearables. Different approaches are used to detect the drowsiness level, and their advantages and disadvantages have been explained. The papers [22, 24, 33] monitored the drowsiness level using heart rate and breathing rate, the papers [34, 37] used wearables, and the papers [35, 36] used EEG signals.

Driver distraction

Driver distraction is any activity that diverts attention from driving, including talking or texting on the phone, eating and drinking, talking to passengers, and fiddling with the stereo, entertainment or navigation system: anything that takes the driver’s attention away from the task of safe driving. The figures on cell phone use while driving are striking: at any given time throughout the day, approximately 660,000 drivers [38] are attempting to use their phones while driving. The following are some of the driver distraction detection systems available in this space:

Sigari et al. [39] built a distraction detection system using eye, mouth and head images. Big data and an SVM with a polynomial kernel are used. Images of the driver’s face are captured, and the symptoms of fatigue and distraction are extracted from the eyes, mouth and head. The success rate of the system is only 91.57% because a small number of images was used to build the model.

Abulkhair et al. [40] adapted a driver face monitoring system to identify fatigue and distraction. It captures the driver’s image and extracts symptoms of fatigue and distraction from the eyes, mouth and head. The extracted symptoms are usually the percentage of eyelid closure over time, eyelid distance, eye blink rate, blink speed, gaze direction, eye saccadic movement, yawning, head nodding and head orientation. Haar-like features and the AdaBoost algorithm are used to process the images.
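Two of the symptoms listed above, the percentage of eyelid closure over time and the eye blink rate, can be computed from a per-frame eyelid openness signal. The sketch below is illustrative only and is not taken from [40]: the openness scale from 0.0 to 1.0, the closure threshold of 0.2, and the function names `perclos` and `blink_count` are assumptions.

```python
from typing import Sequence


def perclos(openness: Sequence[float], closed_below: float = 0.2) -> float:
    """Return the fraction of frames in which the eye counts as closed.

    openness: per-frame eyelid openness, 0.0 (closed) to 1.0 (fully open).
    closed_below: openness under which a frame counts as closed (assumed value).
    """
    if not openness:
        raise ValueError("at least one frame is required")
    closed_frames = sum(1 for value in openness if value < closed_below)
    return closed_frames / len(openness)


def blink_count(openness: Sequence[float], closed_below: float = 0.2) -> int:
    """Count open-to-closed transitions as a simple proxy for blinks."""
    blinks = 0
    was_closed = False
    for value in openness:
        is_closed = value < closed_below
        if is_closed and not was_closed:
            blinks += 1  # a new closure started on this frame
        was_closed = is_closed
    return blinks


# Hypothetical openness values for a ten-frame window.
window = [0.9, 0.8, 0.1, 0.05, 0.7, 0.9, 0.1, 0.85, 0.9, 0.95]
closure_fraction = perclos(window)
blinks_in_window = blink_count(window)
```

Dividing the blink count by the duration of the window gives a blink rate; a rising closure fraction over successive windows is the kind of signal such systems treat as a fatigue symptom.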

Yuen et al. [41] have represented a driver distraction detection system by using face detection and head pose. “Kinect” device is used to work on single images at a time. Feed Forward Neural Network (FFNN) is used. 3-D head rotation angles and the upper body joint positions are recorded. Signals collected from the “Kinect” consisted of a color and depth image of the driver inside the vehicle cabin. The drawback of the system is that it used relatively costly “Kinect” devices.

Li et al. [37] launched a system for early detection of driver drowsiness using a wireless and wearable BMI. Eye, mouth and head are used for extracting the parameters. A Bluetooth low-energy module is embedded in the BMI system and used to communicate with a fully wearable consumer device. The focus is only on changes in the participant’s behavior before and after simulation.

Hu et al. [20] studied a drowsiness detection system using a data-mining approach. Eye, mouth and head parameters are used. An abnormality index is proposed based on the analysis of normalized driving behaviors and applied to evaluate abnormality quantitatively. Peripheral vehicle behaviors during gaze transitions are analyzed, and classifiers are used to discriminate between cognitive distraction and neutral states. The classifiers have been trained to handle various situations and provide high classification accuracy.

Researchers have proposed multiple techniques to detect the driver’s distraction level. Machine learning and big data methods have been used to process real-time video collected by camera, including video-to-frame conversion. Various approaches are used to detect the distraction level, and their advantages and disadvantages have been described. The paper [41] monitored the distraction level using the face and head, while the papers [20, 37, 39, 40] used the eyes, mouth and head.

Proposed solution for personalized premium calculation

The prime challenge faced by the insurance industry is personalized premium calculation based on driving behavior detection. Each driver needs to be assigned a category based on their mode of driving, and real-time alerts that steer the driver towards safer driving need to be provided. Driving behavior detection techniques fall into two classes [42]:

  1. Real-time detection: Identification of driving behavior based on the continuous stream of data collected at frequent intervals while the vehicle is being driven. The collected data will help insurers to precisely identify the driver behavior and calculate personalized premiums.

  2. Non real-time detection: The complete data is received after each trip is completed, and the identification of driving behavior is made on the basis of post-trip analysis.

The following four factors influence driving behavior, and a better understanding of them will aid the development of more appropriate and effective solutions:

  1. Behavioral factors: Driving pattern, fatigue, drowsiness and driver distraction monitoring.

  2. Environment factor: Traffic and road condition.

  3. Physiological/Psychological factor: Blood pressure and blood alcohol level.

  4. Emotional factor: Stress, fear, heart rate and anxiety.

The related work section reviewed all the categories of MHYD and found that most research works consider only behavioral factors to identify abnormal driving. Within the behavioral factors, ample solutions have been implemented to identify fatigued, drowsy and distracted drivers. However, few solutions are available to identify abnormal drivers through driving pattern monitoring. Driving pattern monitoring is based on four vital parameters: acceleration, braking, cornering and speed. Only a small number of studies have considered all four parameters; most others have considered only two or three. For example, Hu et al. [20] developed an abnormal driving behavior system whose only parameters are speed, hard braking and hard acceleration. The authors considered only the behavioral factor, but explained the importance of using two or more factors.

In the proposed solution, the plan is to identify abnormal driving behavior by considering both the behavioral and emotional factors. All four parameters from the behavioral factor and only the heart rate parameter from the emotional factor will be considered. The four behavioral parameters are expected to detect abnormal driving behaviors and identify specific driver types: good driver, regular bad driver, unhealthy driver and road rage/aggressive driver. The classification and detection will be done with machine learning algorithms and big data technologies. Data obtained from the real driving environment will be used to train and create the models. Once road rage/aggressive drivers are identified, alerts will be given to them to improve their safety.

The implementation of the proposed solution is divided into three phases, as shown in Fig. 7.

Fig. 7 Proposed solution for personalized premium calculation

  1. Data collection phase: GPS data will be collected from different users and stored in the big data environment after applying a few pre-processing algorithms.

  2. Rage and aggression phase: Rage/aggressive drivers will be detected and identified in this phase, which is divided into an offline part and an online part. The offline part, modeling driving behaviors, builds a model from the collected data using machine-learning techniques; this model is then synchronized to the driver’s smartphone. In the online part, monitoring driving behaviors, real-time readings from the driver’s smartphone are compared with the synchronized model using a prediction (ML) algorithm. Finally, if any abnormal driving behavior is identified, a live voice alert is sent to the receivers or users.

  3. Enterprise phase: Driver details such as the number of normal trips, the number of road rage and aggressive driving trips, the time and date of travel and the number of alerts are collected. Insurance companies will use the collected data to calculate the personalized premium.
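The per-driver details collected in the enterprise phase could be aggregated as in the following minimal sketch. The `Trip` record, its field names, the label strings and the `summarize_driver` helper are hypothetical; the paper does not define a data schema or a premium formula, so none is computed here.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable

# Trip labels used in this sketch; the names are illustrative assumptions.
NORMAL = "normal"
RAGE_OR_AGGRESSIVE = "road_rage_aggressive"


@dataclass(frozen=True)
class Trip:
    driver_id: str
    started_at: datetime   # time and date of travel
    label: str             # NORMAL or RAGE_OR_AGGRESSIVE
    alerts: int            # live alerts issued during the trip


def summarize_driver(trips: Iterable[Trip], driver_id: str) -> Dict[str, int]:
    """Aggregate the counts that the enterprise phase collects for one driver."""
    labels = Counter()
    total_alerts = 0
    for trip in trips:
        if trip.driver_id != driver_id:
            continue  # ignore trips that belong to other drivers
        labels[trip.label] += 1
        total_alerts += trip.alerts
    return {
        "normal_trips": labels[NORMAL],
        "rage_aggressive_trips": labels[RAGE_OR_AGGRESSIVE],
        "alerts": total_alerts,
    }


# Hypothetical trip log for two drivers.
trip_log = [
    Trip("d1", datetime(2019, 5, 1, 8, 30), NORMAL, 0),
    Trip("d1", datetime(2019, 5, 1, 18, 5), RAGE_OR_AGGRESSIVE, 3),
    Trip("d2", datetime(2019, 5, 2, 9, 0), NORMAL, 0),
    Trip("d1", datetime(2019, 5, 3, 8, 45), NORMAL, 1),
]
summary = summarize_driver(trip_log, "d1")
```

A summary of this kind is the input an insurer would need for any premium rule; how the counts are weighted is left open, as it is in the proposal.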

Aggressive driving and road rage detection are less explored in related work. The proposed methodology could be used as an add-on feature for applications such as Drivewise, Snapshot, SmartRide and Smiles to calculate the personalized premium and to ensure the safety of the driver.

Uniqueness of the proposed solution

  • Real-time data collection: Large volumes of data such as speed, latitude, longitude, altitude and course values are extracted from the GPS signals. Hard acceleration, hard braking, hard cornering and over-speeding events are derived mathematically from the extracted values using big data processing in cloud instances.

  • Driving factors: Both behavioral and emotional factors are considered together to identify abnormal driving behavior while the driver is in the vehicle, using both the raw and the derived real-time data.

  • Driving behavior classification: Driver characteristics are classified into four distinct types, which helps in precise personalized premium calculation, whereas most of the related works classify driving behavior into two or three types such as normal/abnormal or normal/abnormal/moderate.

  • Live voice alert: A text-to-voice engine is planned to convert personalized text into voice. The user gets a live voice alert or warning based on the driving behavior while driving, which helps improve the safety of the driver.

  • Novel solution: A new methodology is introduced to detect road rage and aggressive drivers by collecting and processing large volumes of data of all types in cloud instances, forming a complete ecosystem for driving behavior detection, which is less explored in the related work.
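The derivation of hard acceleration, hard braking, hard cornering and over-speeding events from GPS speed and course values can be sketched as follows. This is one possible reading under stated assumptions, not the authors' method: the `GpsSample` record, the function names and all four threshold values are hypothetical, since the paper does not specify them.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class GpsSample:
    t: float        # timestamp in seconds
    speed: float    # speed in m/s
    course: float   # heading in degrees, 0 to 360


# Illustrative thresholds only; the source does not specify values.
HARD_ACCEL = 3.0      # m/s^2
HARD_BRAKE = -3.0     # m/s^2
HARD_CORNER = 30.0    # degrees of heading change per second
SPEED_LIMIT = 27.8    # m/s, roughly 100 km/h


def course_change(previous: float, current: float) -> float:
    """Smallest signed heading change in degrees, handling the 360-to-0 wrap."""
    return (current - previous + 180.0) % 360.0 - 180.0


def derive_events(samples: Sequence[GpsSample]) -> List[str]:
    """Return the events triggered between consecutive samples, in order."""
    events: List[str] = []
    for before, after in zip(samples, samples[1:]):
        dt = after.t - before.t
        if dt <= 0:
            continue  # skip duplicate or out-of-order fixes
        accel = (after.speed - before.speed) / dt
        turn_rate = abs(course_change(before.course, after.course)) / dt
        if accel >= HARD_ACCEL:
            events.append("hard_acceleration")
        elif accel <= HARD_BRAKE:
            events.append("hard_braking")
        if turn_rate >= HARD_CORNER:
            events.append("hard_cornering")
        if after.speed > SPEED_LIMIT:
            events.append("over_speeding")
    return events


# Hypothetical one-second GPS fixes.
track = [
    GpsSample(0, 10, 350),
    GpsSample(1, 14, 10),
    GpsSample(2, 14, 50),
    GpsSample(3, 9, 50),
    GpsSample(4, 30, 50),
]
detected = derive_events(track)
```

Acceleration is taken as the change in speed divided by the elapsed time, and cornering as the rate of heading change, so a turn from 350° to 10° counts as 20° rather than 340°. In practice the thresholds would need calibration against labeled driving data.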

Conclusion and future research directions

The paper began with the introduction and the motivation for insuring customers on the basis of driving behavior detection, and presented different data collection methods including dongle, black box, embedded devices and smartphones with sensors and GPS signals. A set of parameters adopted by industry and research papers for driving behavior detection is also presented. The need for big data technology for implementing UBI in the insurance industry and the ten V’s of big data are described. The review of existing research shows that much attention has been paid to volume, variety, velocity and veracity, with less attention to inferring value from big data.

UBI and its three classifications, PAYD, PHYD and MHYD, are presented. MHYD is classified into four categories: driving pattern monitoring, distracted driving monitoring, fatigue monitoring and drowsiness monitoring. Driving pattern monitoring is elaborated with the data collection sources such as GPS, mobile phone sensors and OBD. The survey provided a comprehensive analysis of all the categories of MHYD and found that there is considerable scope for research in the MHYD model. Related work is discussed in terms of the objective of the research, data collection technique, identified issues/gaps, implementation algorithms, driving parameters, advantages, disadvantages and inferences. The paper provides a foundation for further research directions on comprehensive driver behavior pattern monitoring and its applications.

The survey has motivated us to propose a solution that identifies road rage/aggressive drivers and offers a personalized premium calculation method. The objective of and need for the proposed methodology, which alerts the rage/aggressive driver during the trip when any such aberration is identified, are discussed. The three phases of the proposed solution (data collection phase, rage and aggression phase, and enterprise phase for personalized premium calculation), which involve machine learning and big data, are highlighted.

The proposed research could be extended in many directions, such as considering additional environmental and physiological/psychological factors. Proactive alerts could be enhanced with prediction using machine learning algorithms and big data to improve the safety and lives of drivers, which could be a real service to mankind.

Availability of data and materials

Not applicable.



Abbreviations

  • Bayesian-Copula Discriminant Classifier (BCDC)

  • Brain Machine Interface (BMI)

  • Bayesian Non-Negative CP Decomposition (BNCPD)

  • Basic Safety Messages

  • Driver Drowsiness Detection (DDD)

  • Deep Sparse Auto Encoder

  • Dedicated Short-Range Communication (DSRC)

  • Feed Forward Neural Network (FFNN)

  • Fast Fourier Transforms (FFT)

  • Fuzzy Risk Mode and Effect Analysis (FRMEA)

  • Global Positioning System (GPS)

  • Hidden Markov Model

  • On-Board Diagnosis (OBD)

  • Open Computer Vision (OCV)

  • Private Usage-Based Scoring

  • Probabilistic Usage Data Audition

  • State Graph (SG)

  • Support Vector Machine (SVM)

  • Usage-Based Insurance (UBI)

  • Vehicular Ad hoc Networks (VANETs)

  • Vehicle Test Data

References

  1. Statista. Number of passenger cars and commercial vehicles in use worldwide. 2019.

  2. World Health Organization. Global status report on road safety. 2013.

  3. World Health Organization. Global status report on road safety 2015. 2015.

  4. Verizon Connect. Advanced GPS fleet tracking software. 2019.

  5. Firican G. The 10 Vs of Big Data. 2017.

  6. IBM. Telematics for insurance: capitalizing on the rise in connected vehicles to enhance customer engagement and develop new value-added services. 2014.

  7. IBM. Analytics: real-world use of big data in insurance. 2019.

  8. Madan N. 3 ways big data can influence decision-making for organizations. 2018.

  9. Tselentis DI, Yannis G, Vlahogianni EI. Innovative insurance schemes: pay as/how you drive. Transp Res Procedia. 2016;14:362–71.

  10. Allstate. Stay smart on the road. 2019.

  11. TDInsurance. TD MyAdvantage. 2019.

  12. Progressive. Snapshot means BIG discounts for good drivers. 2019.

  13. State Farm. Ajusto rewards safe driving. 2019.

  14. Smartride. Nationwide’s SmartRide program rewards safe driving. 2019.

  15. Yu J, Chen Z, Zhu Y, Chen Y, Kong L, Li M. Fine-grained abnormal driving behaviors detection and identification with smartphones. IEEE Trans Mob Comput. 2017;16(8):2198–212.

  16. Shi B, Xu L, Hu J, Tang Y, Jiang H, Meng W, Liu H. Evaluating driving styles by normalizing driving behavior based on personalized driver modeling. IEEE Trans Syst Man Cybern Syst. 2015;45(12):1502–8.

  17. Liu HL, Taniguchi T, Tanaka Y, Takenaka K, Bando T. Visualization of driving behavior based on hidden feature extraction by using deep learning. IEEE Trans Intell Transp Syst. 2017;18(9):2477–89.

  18. Daptardar S, Lakshminarayanan V, Reddy S, Nair S, Sahoo S, Sinha P. Hidden Markov Model-based driving event detection and driver profiling from mobile inertial sensor data. In: IEEE Sensors; 2015.

  19. Zhao J, Lim J, Chung HL, Leung S, Taffel M, Lo L. Introducing pay how you drive insurance; 2016.

  20. Hu J, Xu L, He X, Meng W. Abnormal driving detection based on normalized driving behavior. IEEE Trans Veh Technol. 2017;66(8):6645–52.

  21. Zhou L, Du S, Zhu H, Chen C, Ota K, Dong M. Location privacy in usage-based automotive insurance: attacks and countermeasures. IEEE Trans Intell Transp Syst. 2019;14(1):196–211.

  22. Bergasa LM, Almeria D, Almazan J. Driving fatigue detection based: an app for alerting inattentive drivers and scoring driving behaviors. In: Intelligent vehicles symposium proceedings, Dearborn, MI, USA; 2014. p. 240–5.

  23. Zhang M, Chen C, Wo T, Xie T, Bhuiyan MZA, Lin X. SafeDrive: online driving anomaly detection from large-scale vehicle data. IEEE Trans Ind Inf. 2017;13(4):2087–96.

  24. Nai W, Chen Y, Yu Y, Zhang F, Dong D, Zheng W. Fuzzy risk mode and effect analysis based on raw driving data for pay-how-you-drive vehicle insurance. In: IEEE conference on big data analysis (ICBDA), Hangzhou, China; 2016.

  25. Li Z, Sun G, Zhang F. Smartphone-based fatigue detection system using progressive locating method. IET Intell Transp Syst. 2016;10(3):148–56.

  26. Driver fatigue and road accidents: a literature review and position paper. Royal Society for the Prevention of Accidents; 2001.

  27. Warwick B, Symons N, Chen X. Detecting driver drowsiness using wireless wearables. In: Mobile ad hoc and sensor systems, Dallas, TX, USA; 2015.

  28. Podder S, Roy S. Driver’s drowsiness detection using eye status to improve the road safety. Int J Innov Res Comput Commun Eng. 2013;1(7):1490.

  29. Chaitali Z, Kulkarni KYC. Driver aided system using open source computer vision. Int J Innov Res Comput Commun Eng. 2015;3(5):3779.

  30. Qiao Y, Zeng K, Xu L, Yin X. A smartphone-based driver fatigue detection using fusion of multiple real-time facial features. In: Consumer communications & networking conference, Las Vegas, NV, USA; 2016.

  31. Mandal B, Li L, Wang GS. Towards detection of bus driver fatigue based on robust visual analysis of eye state. IEEE Trans Intell Transp Syst. 2017;18(3):545–57.

  32. Al-Sultan S, Al-Bayatti AH, Zedan H. Context-aware driver behavior detection system in intelligent transportation systems. IEEE Trans Veh Technol. 2013;62(9):4264–75.

  33. Ke K, Zulman MR, Wu H, Huang YF. Drowsiness detection system using heartbeat rate in android-based handheld devices. In: First international conference on multimedia and image processing; 2016.

  34. Rohit F, Kulathumani V, Kavi R. Real-time drowsiness detection using wearable, lightweight brain-sensing headbands. IEEE Trans Intell Transp Syst. 2017;11(5):255–63.

  35. Qian D, Wang B, Qing X. Bayesian nonnegative CP decomposition-based feature extraction algorithm for drowsiness detection. IEEE Trans Neural Syst Rehabil Eng. 2017;25(8):1297–308.

  36. Qian D, Wang B, Qing X. Drowsiness detection by Bayesian-copula discriminant classifier based on EEG signals during daytime short nap. IEEE Trans Biomed Eng. 2017;64(4):743–54.

  37. Li G, Chung W. Combined EEG-Gyroscope-tDCS brain machine interface system for early management of driver drowsiness. IEEE Trans Hum Mach Syst. 2018;48(1):50–62.

  38. WHO. 2013.

  39. Sigari M, Pourshahabi M, Soryani M, Fathy M. A review on driver face monitoring systems for fatigue and distraction detection. Int J Adv Sci Technol. 2014;64:73–100.

  40. Abulkhair MF, Salman HA, Ibrahim LF. Using mobile platform to detect and alerts driver fatigue. Int J Comput Appl. 2015;123:27–35.

  41. Yuen K, Trivedi MM. An occluded stacked hourglass approach to facial landmark localization and occlusion estimation. IEEE Trans Intell Veh. 2017;2(4):321–31.

  42. Chhabra R, Verma S, Krishna CR. A survey on driver behavior detection techniques for intelligent transportation systems. In: International conference on cloud computing, data science & engineering; 2017.



Author information

Contributions

SA proposed the idea of the survey, performed the literature review, analysis for the work, and wrote the manuscript. BR provided technical guidance and assisted with editing the manuscript. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Subramanian Arumugam.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article


Cite this article

Arumugam, S., Bhargavi, R. A survey on driving behavior analysis in usage based insurance using big data. J Big Data 6, 86 (2019).
