Tomáš Repčík - 2. 10. 2022

Fall Detection App: Cleaning Data and Extracting Features

Extracting features to differentiate falls from other activities

To create a fall detection app, we need data. As described in a previous post, we measured 3260 activities with smartphones. Volunteers fell to the ground, walked, sat down, and lay down on a bed. However, the raw data has some pitfalls that we need to overcome first.

Fall detection app: How I made people fall

People falling, sitting, walking, and laying down to create the dataset

Thresholds

First, measurements of falls contain high acceleration peaks, which can exceed 3g (g = 9.81 m/s²). Measurements with a small overall amplitude are of no use for this research. The threshold is set to 1.6g, and records that never exceed this value are ignored.

Second, Android phones do not sample at a constant rate. Measurements with a gap of more than 0.2 seconds between two consecutive samples are ignored as well.

Third, records in which the peak occurs in the first or last second of the measurement are ignored too.
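The three rules above can be sketched as a single check per record. This is only a minimal illustration, not the code used in the project: the function name `keep_record`, its parameters, and the assumption that each record is a list of timestamps (in seconds) plus a list of acceleration magnitudes (in g) are all hypothetical.

```python
def keep_record(times, magnitudes_g,
                peak_threshold_g=1.6, max_gap_s=0.2, edge_margin_s=1.0):
    """Return True if a record passes the three rules described in the text.

    times: sample timestamps in seconds, ascending (assumed input format)
    magnitudes_g: acceleration magnitude per sample, in units of g
    """
    if len(times) < 2 or len(times) != len(magnitudes_g):
        return False

    # Rule 1: the record must contain a value above the 1.6 g threshold.
    peak_index = max(range(len(magnitudes_g)), key=magnitudes_g.__getitem__)
    if magnitudes_g[peak_index] <= peak_threshold_g:
        return False

    # Rule 2: no gap between consecutive samples may exceed 0.2 s.
    if any(b - a > max_gap_s for a, b in zip(times, times[1:])):
        return False

    # Rule 3: the peak must not lie in the first or last second.
    peak_time = times[peak_index]
    if peak_time - times[0] < edge_margin_s:
        return False
    if times[-1] - peak_time < edge_margin_s:
        return False
    return True
```

Applying `keep_record` to every measurement and keeping only those that return `True` would reproduce the kind of filtering described here, under the stated assumptions about the data layout.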

Result

After removing measurements according to these rules, the dataset ended up with 3129 measurements in total:

667 walks
840 falls
551 lays
1071 sits

Finding the region of interest (ROI)

Processing whole signals would be counterproductive. The aim is to make the algorithm as efficient as possible for smartphones, so we limit ourselves to at most 1 second of the signal. The highest peak is taken as the centre. The ROI begins at the first dip in acceleration, that is, where the values fall below 0.9g, and extends at most 0.3 s to the left of the main peak. The end of the ROI is defined by values of at least 1.5g, and the right side is at most 0.7 s long.
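One possible reading of this ROI rule is sketched below. The text leaves some details open, so the interpretation is an assumption: the start is taken as the earliest sample below 0.9g within 0.3 s before the peak, and the end as the last sample of at least 1.5g within 0.7 s after the peak. The function name `find_roi` and the input format are hypothetical.

```python
def find_roi(times, magnitudes_g, dip_g=0.9, impact_g=1.5,
             max_left_s=0.3, max_right_s=0.7):
    """Return (start_index, end_index), both inclusive, around the highest peak.

    times: sample timestamps in seconds, ascending (assumed input format)
    magnitudes_g: acceleration magnitude per sample, in units of g
    """
    peak = max(range(len(magnitudes_g)), key=magnitudes_g.__getitem__)
    peak_time = times[peak]

    # Left side: look back at most 0.3 s and keep the earliest sample
    # that is below the 0.9 g dip threshold.
    start = peak
    i = peak - 1
    while i >= 0 and peak_time - times[i] <= max_left_s:
        if magnitudes_g[i] < dip_g:
            start = i
        i -= 1

    # Right side: look ahead at most 0.7 s and keep the last sample
    # that still reaches at least 1.5 g.
    end = peak
    j = peak + 1
    while j < len(times) and times[j] - peak_time <= max_right_s:
        if magnitudes_g[j] >= impact_g:
            end = j
        j += 1

    return start, end
```

With this reading, the ROI can never be longer than 1 second, which matches the limit stated above. If no dip or no later high value is found, the ROI collapses to the peak itself on that side.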

Image illustration of the ROI

Features:

I went through multiple resources and merged my own ideas with other proposals on this topic. The result is a list of 19 statistical features calculated over the ROI. Many of them are borrowed from EEG and ECG research. The research resources are listed at the bottom of the post.

average
standard deviation
variance / activity
mobility, complexity: Hjorth coefficients
average TKEO (Teager-Kaiser energy operator): envelope energy of the signal
average output: average power of the signal
entropy approximation: the amount of regularity in the signal
waveform length: an average of the first derivative
crest factor: max value / root mean square
change in angle: change in angle from the X and Z axes of the acceleration
change in angle with cosine: similar to change in angle, but taken from 1 s before the fall and 1 s after the fall (does not use the ROI)
angle deviation: generalization of change in angle, which takes all axes into account
free fall index: the value of acceleration during the dip before impact
min-max difference: the difference between the min value and the max value
3g ratio: the ratio of samples above 3g to samples below 3g
kurtosis: fourth standardized moment
skewness: third standardized moment
1g cross rate: number of crossings of the 1g threshold
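To make a few of the listed features more concrete, here is a minimal sketch using their common textbook definitions. It is not the code from the linked repository, which may define or scale these features differently; all function names are hypothetical, and `x` is assumed to be the list of acceleration magnitudes (in g) inside the ROI.

```python
import math
from statistics import fmean, pvariance


def diff(x):
    """First difference of the signal, a discrete stand-in for the derivative."""
    return [b - a for a, b in zip(x, x[1:])]


def hjorth(x):
    """Return (activity, mobility, complexity) of a non-constant signal.

    A constant signal, or one with a constant slope, has zero variance
    in x or in its first difference and would cause a division by zero.
    """
    dx = diff(x)
    ddx = diff(dx)
    activity = pvariance(x)
    mobility = math.sqrt(pvariance(dx) / activity)
    complexity = math.sqrt(pvariance(ddx) / pvariance(dx)) / mobility
    return activity, mobility, complexity


def average_tkeo(x):
    """Mean of the Teager-Kaiser operator psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return fmean([x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)])


def crest_factor(x):
    """Maximum value divided by the root mean square."""
    rms = math.sqrt(fmean([v * v for v in x]))
    return max(x) / rms


def ratio_above(x, threshold_g=3.0):
    """Samples above the threshold divided by the remaining samples."""
    above = sum(1 for v in x if v > threshold_g)
    below = len(x) - above
    return above / below if below else math.inf


def cross_rate(x, threshold_g=1.0):
    """Number of times consecutive samples lie on opposite sides of the threshold."""
    return sum(1 for a, b in zip(x, x[1:])
               if (a - threshold_g) * (b - threshold_g) < 0)
```

The first difference is used here without dividing by the sampling interval, which is a simplification that only makes sense if the samples are roughly evenly spaced.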

The code for these features can be found here: GitHub link

Finding and filtering outliers

We can calculate the features for all of the ROIs. However, each feature has a different range of values, so normalization is required. To keep as much data as possible while limiting the effect of outliers, I developed a filter based on IQR rules.

IQR filtering

The IQR rules flag the samples below the 25th percentile and above the 75th percentile, which act as the fences. For these samples, we search for the closest neighbours and calculate the average of these neighbours. The theory is this: if a measurement is similar to others, even though it is flagged as an outlier, it is a valid value, because it has neighbours that are almost equal to it. In other words, a group of values cannot be ignored, because it is part of the data. If the value really is an outlier, it is averaged with its neighbours and its effect is partially cancelled.

Without this step and with straightforward normalization, the models would lose 4% or more of their accuracy.

After cleaning, the data are normalized as follows:
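The filter idea can be sketched for a single feature as follows. This is an illustration under assumptions rather than the filter used for the results: the number of neighbours `k`, the use of the absolute difference as the distance, and the function name `smooth_outliers` are all hypothetical choices.

```python
from statistics import fmean, quantiles


def smooth_outliers(values, k=3):
    """Replace values outside the quartile fences by the mean of their k nearest values.

    values: one feature computed for every measurement (assumed input format)
    k: number of closest neighbours to average (an assumed parameter)
    """
    # Fences at the 25th and 75th percentile, as described in the text.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")

    result = []
    for i, v in enumerate(values):
        if q1 <= v <= q3:
            result.append(v)  # inside the fences: keep unchanged
            continue
        # Outside the fences: average the k closest other values.
        others = [u for j, u in enumerate(values) if j != i]
        nearest = sorted(others, key=lambda u: abs(u - v))[:k]
        result.append(fmean(nearest))
    return result
```

A value that sits in a dense group barely moves, because its neighbours are almost equal to it, while an isolated extreme value is pulled towards the rest of the data.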

Normalized data categorized by activity type

Data interpretation

The goal is to identify the features that differentiate a fall from other activities. The charts show that falling and walking are similar in parameters such as the simple average, crest factor, kurtosis and 1g crosses. The main difference between the two is in the change in angle parameter. During a fall, the phone changes its position relative to the ground, so the angle of acceleration changes. This does not happen during walking. The same goes for the change in angle with cosine.

Most of the features behave similarly: they have higher values for falls than for all other activities. It will be crucial to remove correlated and redundant features, such as complexity or TKEO.

In the next part, we will try to identify the most important features and train SVM and random forest models.
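One simple way to spot correlated features is to compute the Pearson correlation between every pair and list the pairs above a chosen limit. The sketch below only illustrates that idea: the 0.9 threshold, the function names, and the dictionary layout of the features are assumptions, not something taken from this project.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlated_pairs(features, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold.

    features: dict mapping a feature name to its list of values,
              one value per measurement (assumed layout)
    """
    names = list(features)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(features[a], features[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs
```

From each reported pair, one feature could then be dropped before training. Which one to drop is a separate decision that this sketch does not make.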

Resources:

authors: SANTOYO-RAMÓN, José Antonio, Eduardo CASILARI and José Manuel CANO-GARCÍA
work: Analysis of a smartphone-based architecture with multiple mobility sensors for fall detection with supervised learning
DOI: 10.3390/s18041155
authors: FIGUEIREDO, Isabel N., Carlos LEAL, Luís PINTO, Jason BOLITO and André LEMOS
work: Exploring smartphone sensors for fall detection
DOI: 10.1186/s13678-016-0004-1
authors: ABBATE, Stefano, Marco AVVENUTI, Guglielmo COLA, Paolo CORSINI, Janet LIGHT and Alessio VECCHIO
work: Recognition of false alarms in fall detection systems
DOI: 10.1109/CCNC.2011.5766464
author: HJORTH, Bo
work: EEG analysis based on time domain properties
DOI: 10.1016/0013-4694(70)90143-4
authors: MARAGOS, P., J.F. KAISER and T.F. QUATIERI
work: On amplitude and frequency demodulation using energy operators
DOI: 10.1109/78.212729
