3.1. Data Collection
We developed an iPhone app that collects sensor data during a car trip. Although we collected data using an iPhone, we do not rely on any iPhone-specific characteristic or feature; we expect similar performance on other devices that carry the same family of sensors, such as Android smartphones.
The Bluetooth connection between the smartphone and the car’s IVIS triggers the app, which from that moment samples data points at a 10 Hz rate until the Bluetooth disconnection. Each sample includes rotation rate (from the gyroscope) and acceleration, both along the three axes X, Y and Z.
3.2. Smartphone Motion Sensors Data Pre-Processing
In this section, we examine the preliminary processing steps required when data are collected in a real-world scenario with smartphone motion sensors. In this setting, many external factors influence the data samples, so we need to clean the data and prepare them for the subsequent processing steps.
Every time a person moves the smartphone, it changes position and orientation with respect to the car. We need to discard all the data values related to the manual movements of the smartphone and split the trip into intervals in which the phone is stable in an orientation relative to the car so that we can detect it and reorient all the samples along the car axes.
We first use gyroscope samples to detect whether the smartphone is idle in the car or in the user’s hands. We could confirm the observation in [1] that the smartphone’s rotation rate is relatively stable when positioned in the car, while it peaks when the smartphone is manually moved (for example, when the driver gets out of the car). Through some tests, we established a threshold of 85 deg/s and associated higher rotation rates with manual movements of the smartphone.
Therefore, we use gyroscope peaks to split the trip into intervals, in each of which the smartphone has a constant orientation.
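As a minimal sketch of this segmentation step, the following Python snippet splits a trip into stable intervals. It assumes that the 85 deg/s threshold is applied to the magnitude of the three-axis rotation rate (the text does not state whether the comparison is per axis or on the magnitude), and the function name is hypothetical.

```python
import math

THRESHOLD_DEG_S = 85.0  # threshold reported in the text


def split_into_stable_intervals(gyro_samples, threshold=THRESHOLD_DEG_S):
    """Return (start, end) index pairs, end exclusive, of the runs in which
    the rotation-rate magnitude stays at or below the threshold.

    gyro_samples: sequence of (x, y, z) rotation rates in deg/s.
    Assumption: the threshold is compared against the vector magnitude.
    """
    intervals = []
    start = None
    for i, (gx, gy, gz) in enumerate(gyro_samples):
        magnitude = math.sqrt(gx * gx + gy * gy + gz * gz)
        if magnitude <= threshold:
            if start is None:
                start = i  # a new stable interval begins here
        elif start is not None:
            intervals.append((start, i))  # a peak closes the interval
            start = None
    if start is not None:
        intervals.append((start, len(gyro_samples)))
    return intervals
```

Samples above the threshold are dropped, and each returned interval can then be reoriented independently.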
3.3. Least Squares
The acceleration samples are studied to determine the most probable car horizontal plane (Figure 2). During a car trip, accelerations on the vertical axis (i.e., the Z-axis) are close to 0 since they are due mainly to road disturbances and the beginnings of ascents or descents. So, in most cases, the accelerations are concentrated on the X and Y axes. This fact lets us compute the least-squares plane [13] for the acceleration points over time.
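To illustrate the idea, the sketch below fits a plane to 3D acceleration points by ordinary least squares. It assumes the plane can be written as z = a·x + b·y + c in the smartphone frame (so it would not handle a plane that is nearly vertical in that frame) and solves the normal equations with Cramer's rule; the exact fitting procedure of [13] may differ, and the function names are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points):
    """Fit z = a*x + b*y + c to (x, y, z) points; return (a, b, c)."""
    n = len(points)
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sz = sum(p[2] for p in points)
    sxx = sum(p[0] * p[0] for p in points)
    syy = sum(p[1] * p[1] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    sxz = sum(p[0] * p[2] for p in points)
    syz = sum(p[1] * p[2] for p in points)
    # Normal equations A * [a, b, c]^T = rhs
    a_matrix = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    d = det3(a_matrix)
    if abs(d) < 1e-12:
        raise ValueError("points do not determine a unique plane")
    coeffs = []
    for col in range(3):
        m = [row[:] for row in a_matrix]
        for r in range(3):
            m[r][col] = rhs[r]  # Cramer's rule: replace one column by rhs
        coeffs.append(det3(m) / d)
    return tuple(coeffs)


def plane_normal(a, b):
    """Normal vector of z = a*x + b*y + c, chosen with positive z component."""
    return (-a, -b, 1.0)
```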
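However, before deriving least-squares plane candidates, we need to remove outliers (Figure 3). To do that, we use the Z-Score algorithm [14] (indicated as $Z$). First, we compute the standard deviation for every axis. Then, we empirically set the max value that we accept for the Z-Score $Z$ to 5 (value chosen by optimizing FAR and FRR when looking for points of the plane in our dataset), and calculate the Z-Score $Z$ as in Equations (1)–(3) for each point $p_i = (x_i, y_i, z_i)$ and for the axes x, y and z:

$$Z_{x,i} = \frac{x_i - \bar{x}}{\sigma_x} \tag{1}$$

$$Z_{y,i} = \frac{y_i - \bar{y}}{\sigma_y} \tag{2}$$

$$Z_{z,i} = \frac{z_i - \bar{z}}{\sigma_z} \tag{3}$$

where $\bar{x}$, $\bar{y}$, $\bar{z}$ are the per-axis means and $\sigma_x$, $\sigma_y$, $\sigma_z$ the per-axis standard deviations.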
Finally, we discard all points exceeding the max value on any axis. This removal is only temporary and serves to compute the right least-squares plane. Once the plane coefficients are defined, we use them to rotate all points in the space towards the horizontal plane (i.e., the plane $z = 0$).
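A minimal sketch of this outlier filter is given below. It assumes the population standard deviation and a comparison on the absolute Z-Score (the text does not specify either), and the function name is hypothetical.

```python
import statistics

MAX_Z = 5.0  # maximum accepted Z-Score reported in the text


def remove_outliers(points, max_z=MAX_Z):
    """Keep only the (x, y, z) points whose absolute Z-Score is at most
    max_z on every axis.

    Assumptions: population standard deviation, absolute-value comparison.
    An axis with zero spread cannot produce outliers and is skipped.
    """
    axes = list(zip(*points))
    means = [statistics.fmean(axis) for axis in axes]
    stdevs = [statistics.pstdev(axis) for axis in axes]
    kept = []
    for point in points:
        is_outlier = any(
            sd > 0 and abs(value - mean) / sd > max_z
            for value, mean, sd in zip(point, means, stdevs)
        )
        if not is_outlier:
            kept.append(point)
    return kept
```

The filtered list would be used only to fit the plane; the full set of points is rotated afterwards.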
The idea behind the rotation is to rotate each acceleration data point (i.e., a 3D vector composed of the acceleration components on the three axes: X, Y and Z) around the axis common to the two planes.
Let $N$ be the vector normal to the plane $z = 0$, $M$ be the vector normal to the newly found car horizontal plane, $\cdot$ the dot product calculation, $\|\cdot\|$ the norm and $\times$ the cross product calculation. First, we calculate the cosine of the required angle $\theta$ as the ratio between the dot product of the plane vector $N$ and the car horizontal plane vector $M$, and the product of their norms (Equation (4)):

$$\cos\theta = \frac{N \cdot M}{\|N\|\,\|M\|} \tag{4}$$

Then, the axis $u$ around which the rotation will be made is calculated as the ratio between the cross product of $N$ and $M$ and the norm of their cross product (Equation (5)):

$$u = \frac{N \times M}{\|N \times M\|} \tag{5}$$
Finally, the rotation matrix $Q$ is calculated from this angle and this axis; the calculation is further described in [15].
Then, the $Q$ matrix can be used to rotate every single point $p$ as $p' = Q\,p$.
If the two planes are similarly oriented, the rotation does not happen since it would not improve the precision. At this point, the processed data has a similar shape to the one that a smartphone would collect in a parallel position relative to the car’s horizontal plane.
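The rotation step can be sketched as follows, assuming the standard axis-angle (Rodrigues) form of the rotation matrix. In this sketch the axis is taken as M × N so that a positive angle carries the fitted normal M onto N = (0, 0, 1); the sign convention used in [15] may differ. The function names and the tolerance `eps` are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(dot(a, a))


def rotation_matrix(axis, angle):
    """Rodrigues' formula for a unit axis and an angle in radians."""
    ux, uy, uz = axis
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    return [
        [c + ux * ux * t, ux * uy * t - uz * s, ux * uz * t + uy * s],
        [uy * ux * t + uz * s, c + uy * uy * t, uy * uz * t - ux * s],
        [uz * ux * t - uy * s, uz * uy * t + ux * s, c + uz * uz * t],
    ]


def alignment_matrix(m, n=(0.0, 0.0, 1.0), eps=1e-9):
    """Matrix that rotates the fitted plane normal m onto n."""
    cos_theta = dot(n, m) / (norm(n) * norm(m))
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp rounding errors
    axis = cross(m, n)
    length = norm(axis)
    if length < eps:
        # Planes already parallel: skip the rotation (identity matrix).
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    unit_axis = tuple(component / length for component in axis)
    return rotation_matrix(unit_axis, theta)


def rotate(q, p):
    """Apply the 3x3 matrix q to the point p."""
    return tuple(sum(q[r][k] * p[k] for k in range(3)) for r in range(3))
```

With M = (0, 1, 1), for example, a point lying in the fitted plane such as (0, 1, -1) ends up with a zero z component after the rotation.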
3.4. Data Collection in the Simulation
Collecting a sufficient amount of data in actual vehicles is a non-trivial task for different reasons. Since the considered system deals with supervised learning and requires labeled data, the collection process has to be conducted exclusively by operators that are well aware of the system requirements and pay close attention to the collection details. Wrongly collected or incorrectly classified data are deleterious for the quality of the final model. Moreover, operators should annotate details about the smartphone’s orientation in the car during the collection and input them into the dataset. Measuring orientation angles in a natural environment can be challenging for human operators. Finally, to have sufficient samples, the operators would have to make many authentic vehicle trips, which is time- and fuel-consuming. We estimate that a satisfactory dataset should contain a few thousand samples. Random forests perform relatively well on datasets of this size.
The simulation provided by vehiclephysics.com (Figure 4) allows us to emulate the behavior of a car that moves almost the same as a real one. The simulation generates all the movements and data we need from the car. We can extract these data and prepare them the same way we do for the actual samplings.
3.4.1. Structure of the Simulation
The simulation comes in a customizable Unity 3D project, adaptable to specific needs. As a foundation, there are some basic vehicles and maps provided. One of the default maps includes on-street parking with lines drawn on the floor that specify the exact positions of the parking lots. This map also has city streets and highways to emulate real driving situations and different parking motions.
The two default car models provided are a sports car and a pickup truck. We decided to use the latter because its acceleration and brakes are more balanced and similar to most cars. Furthermore, we decided to limit the vehicle’s maximum acceleration and braking power to avoid unnatural levels. In fact, by default, the vehicle would accelerate significantly before automatically shifting the gear. In that case, the engine revolutions indicator would touch 4000–5000 RPM, and the acceleration would be too disruptive. These levels are far from usual city driving.
The vehicle can be controlled with a keyboard or a game controller. The main difference between the two options is that the car control is discrete in the first case, while in the second one, it is continuous thanks to moving cursors. Thus we can apply a variable acceleration to the vehicle that better resembles a real interaction with the car.
Since this is a Unity 3D project, we can also access external frameworks. We experimented with connecting the simulation to an Oculus VR. The Unity project sets a camera in the car by default. In order to connect the Oculus visor, we had to replace the default camera with the Oculus one and attach it to the vehicle. After doing that, the connection was seamless, and the driving experience was even more realistic and straightforward to simulate.
3.4.2. Obtaining the Required Simulation Data
The Vehicle Physics Pro simulation offers a comprehensive API and access to vehicle components. In particular, there is a wide variety of metrics related to physics values in a whole section of the API dedicated to telemetry data. The telemetry section is responsible for collecting the physics data generated by the core simulator and making them programmatically readable.
The Vehicle Physics Pro simulator is written in C#, and its code is accessible from third-party scripts, such as ours. We used a “vehicle” object that contains most of the information related to the car’s motion. This way, we extracted all the features that we needed:
acceleration on X, Y and Z axes (represented as the local acceleration);
rotation rate on X, Y and Z axes (represented as the angular velocity);
speed.
The script launches when the simulation starts, and it contains a thread that can be in two states: “not recording” and “recording”. As soon as the script executes, it spawns the new thread in the “not recording” state. When the user presses a specific key on the keyboard, the thread goes into the “recording” state and starts sampling the data at a fixed rate (10 times per second). When another key is pressed, the thread goes back to the “not recording” state and saves the samples in a JSON file. So, we followed this procedure: (i) starting the simulation; (ii) pressing the record key; (iii) driving the car around the map; and (iv) stopping the recording. We did this many times, trying to cover as much terrain as possible. To populate the dataset with diverse information, we tried to make each drive slightly different from the previous ones by choosing different streets, turns, and acceleration and braking patterns. We also simulated parking motions in different styles and shapes, such as parallel parking along the street and angle parking in the dedicated parking location.
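The two-state logic can be illustrated with the following sketch. The actual script runs in C# inside Unity and uses a dedicated thread; this Python version is only a simplified, single-threaded illustration in which a hypothetical `tick` method is polled with the current time, and `read_telemetry` stands for any callable returning one sample (both names are assumptions, not part of the Vehicle Physics Pro API).

```python
import json


class TelemetryRecorder:
    """Two-state recorder: "not recording" and "recording"."""

    def __init__(self, period_s=0.1):  # 0.1 s corresponds to 10 samples per second
        self.period_s = period_s
        self.recording = False
        self.samples = []
        self._last_sample_time = None

    def start(self):
        """Switch to the "recording" state with an empty buffer."""
        self.recording = True
        self.samples = []
        self._last_sample_time = None

    def tick(self, now, read_telemetry):
        """Store one sample if recording and a full period has elapsed."""
        if not self.recording:
            return
        if (self._last_sample_time is None
                or now - self._last_sample_time >= self.period_s):
            self.samples.append(read_telemetry())
            self._last_sample_time = now

    def stop(self):
        """Switch back to "not recording" and return the samples as JSON."""
        self.recording = False
        return json.dumps(self.samples)
```

The string returned by `stop` could then be written to a file, one file per drive.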
3.4.3. Simulated Data Processing
Unlike the actual data situation, simulated data do not have human interference, so the gyroscope fluctuations problem does not appear. The data collected from the simulator can be seen as coming from a “virtual smartphone” positioned parallel to the car’s horizontal plane and heading towards its front.
3.4.4. Features Extraction for Orientation Model
The required angle in the simulated data is 0°. So, to create valid labels, the data can be rotated on the horizontal plane by a random angle. This kind of 3D rotation is based on Euler angles [16]. We labeled the sample with the random angle value. We repeated this process many times for every sample, picking random angles, eventually obtaining a larger dataset. In our case, we repeated it 100 times. Starting from a 150-sample dataset, we rapidly ramped up to a larger 15,000-sample one.
We rotated the data points around the Z-axis (i.e., the axis orthogonal to the car’s horizontal plane) using a simple rotation matrix (Equation (11), where $\alpha$ is the randomly picked angle) since the rotation only involves one axis:

$$R = \begin{bmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{bmatrix} \tag{11}$$

At this point, we can rotate every single point $p$ using the $R$ matrix, following the procedure described in [17]:

$$p' = R \cdot p$$

where $\cdot$ is the dot product calculation.
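The augmentation can be sketched as below. The snippet assumes a counter-clockwise rotation about the Z-axis and angles drawn uniformly between 0 and 2π (the sampling range is an assumption); the function names are hypothetical. The same rotation would be applied to both the acceleration and the rotation-rate vectors of a sample.

```python
import math
import random


def rotate_about_z(point, alpha):
    """Rotate an (x, y, z) vector by alpha radians around the Z-axis."""
    x, y, z = point
    c, s = math.cos(alpha), math.sin(alpha)
    return (c * x - s * y, s * x + c * y, z)


def augment(samples, copies=100, rng=None):
    """Return (rotated_sample, angle) pairs.

    samples: list of samples, each a list of (x, y, z) vectors.
    copies: number of random rotations generated per original sample.
    Assumption: angles are drawn uniformly in [0, 2*pi].
    """
    rng = rng or random.Random()
    labeled = []
    for sample in samples:
        for _ in range(copies):
            alpha = rng.uniform(0.0, 2.0 * math.pi)
            rotated = [rotate_about_z(p, alpha) for p in sample]
            labeled.append((rotated, alpha))  # the angle is the label
    return labeled
```

With 150 original samples and 100 copies each, `augment` returns 150 × 100 = 15,000 labeled samples.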
At this point, we must compose the feature vector. We used the following features for each sample: acceleration mean for X and Y axes; acceleration standard deviation for X and Y axes; acceleration mean error for X and Y axes; rotation rate mean for X and Y axes; rotation rate standard deviation for X and Y axes; rotation rate mean error for X and Y axes. The Z axis is not significant in this case.
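As an illustration, the feature vector for one sample could be composed as follows. The text does not define "mean error"; this sketch assumes it is the standard error of the mean (standard deviation divided by the square root of the number of points), and it uses the population standard deviation. The feature names are hypothetical.

```python
import math
import statistics


def axis_features(values):
    """Return (mean, standard deviation, standard error of the mean)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    sem = std / math.sqrt(len(values))  # assumed meaning of "mean error"
    return mean, std, sem


def feature_vector(acc, rot):
    """Build a dict of features from (x, y, z) acceleration and
    rotation-rate points; only the X and Y axes are used."""
    features = {}
    for name, points in (("acc", acc), ("rot", rot)):
        for axis_index, axis in enumerate("xy"):
            values = [p[axis_index] for p in points]
            mean, std, sem = axis_features(values)
            features[f"{name}_mean_{axis}"] = mean
            features[f"{name}_std_{axis}"] = std
            features[f"{name}_sem_{axis}"] = sem
    return features
```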
A tabular dataset is generated, composed of rows of features and a final target value. The file format is CSV (comma-separated values), so the first row indicates the names of the columns, separated by commas. All the other rows contain data, namely the relative value for every column, including the target value in the target column. Additionally, the data values are separated by commas, according to the column names.
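A minimal sketch of writing such a file with Python's standard `csv` module is shown below. The column names and the target column name `angle` are assumptions chosen for illustration.

```python
import csv
import io


def write_dataset(rows, stream, target_name="angle"):
    """Write (features_dict, target) pairs as CSV to a text stream.

    The header row lists the feature names followed by the target column.
    """
    if not rows:
        return
    fieldnames = list(rows[0][0].keys()) + [target_name]
    writer = csv.DictWriter(stream, fieldnames=fieldnames)
    writer.writeheader()
    for features, target in rows:
        record = dict(features)
        record[target_name] = target
        writer.writerow(record)


# Usage: write one labeled row into an in-memory buffer.
buffer = io.StringIO()
write_dataset([({"acc_mean_x": 0.1, "acc_mean_y": 0.2}, 1.5)], buffer)
```

Passing an open file instead of the in-memory buffer would produce the dataset file directly.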
3.5. ML Orientation Model
We created the model using Apple’s Create ML tree ensemble regressor, a predictive model composed of a weighted combination of multiple regression trees, also called random forest [18] (Figure 5).
A decision tree regressor is a predictive model that uses a set of binary rules to calculate a target value. Each tree is a simple model with branches, nodes and leaves. Every decision tests one feature of the initial set and is structured as a node that branches out into two separate nodes at the lower layer of the tree. Finally, leaves are the target values. A random forest uses a collection of different trees, with random constraints limiting their learning freedom, to reduce the variance of the overall model. Some examples of constraints applied to the creation of models are the maximum depth, the maximum number of features, and the minimum number of samples required for a split.
Create ML offers a wide range of tools to create, train and test new ML classifiers and regressors for different purposes, either via a GUI or programmatically in the Swift language. For our purposes, we need a regressor that outputs a floating-point number based on the input sample. In our case, the output is an angle value, expressed in radians, from 0 up to $2\pi$.
This kind of model receives a vector of features as input (Figure 6). In particular, the input features of the model are the mean accelerations, the mean rotation rates, the standard deviations of the acceleration, and the standard deviations of the rotation rate. These features have been chosen because they contain information about car maneuvers and movements, and we believe that they are good descriptors for understanding the angle between the smartphone and the car headings. They are taken along the X and Y axes, excluding the Z axis. We exclude the Z axis because the data is oriented in the horizontal plane at this point in the process. Therefore, only road bumps and slope changes are caught along that axis, and that kind of information is of no help for estimating the yaw angle. These features are computed over the entire collection segment. We do so to compress the data coming from the entire interval into a single group of features that is invariant to the length of the segment and can be processed through a single pass of the model.
The set of features used as input is vectorized beforehand and made compatible with the model. The initial set of features also contained mean error values. However, after performing a feature-importance analysis (Table 1), we discovered that they were of no help for the model and excluded them. Interestingly, this analysis revealed that the feature representing the mean of the rotation rate along the Y-axis takes up almost 59% of the total importance.
The training procedure does not involve long waiting times since this model does not have the recursive complexity and layer depth of a more sophisticated neural network.