Behavioural Biometrics — Essential Tremor Detection

01

The problem

Essential Tremor affects ~70 million people worldwide. Current diagnosis is subjective, infrequent, and misses day-to-day fluctuations. We asked: can a smartphone do better?

Current clinical diagnosis

Relies on the Fahn-Tolosa-Marín (FTM) rating scale applied during infrequent clinic visits. It is subjective, catches only severe cases, and cannot track daily symptom changes. Patients are often diagnosed late, when treatment options are more limited.

Our approach — repurpose the phone

Modern smartphones have high-rate accelerometers and gyroscopes. We repurposed the HMOG behavioural biometrics dataset — collected for user authentication — to investigate whether normal phone use encodes detectable tremor signatures.

No extra hardware | Passive monitoring | TREMOR12 compatible

70M

People affected by ET

Most common movement disorder

4–12Hz

ET tremor frequency band

Captured by phone sensors

100Hz

HMOG sensor sample rate

Sufficient for tremor detection

100

Subjects in HMOG dataset

24 sessions, 3 task types each

02

The pipeline

Nine stages from raw CSV files to a 50-feature subject profile. Built incrementally — validated on one subject before scaling to all 99.

1

Data loading

Load Accelerometer.csv, Gyroscope.csv, TouchEvent.csv, and StrokeEvent.csv for each session. Files have no headers — columns are manually labelled from the HMOG data schema.

Analogy: Like plugging in a flight recorder and reading the raw log files. Each file is one sensor type — motion, rotation, touches, swipes.

→ Accelerometer: 76,855 rows | Gyroscope: 76,851 rows | Touch: 6,966 rows | Stroke: 94 rows
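A minimal sketch of loading one headerless sensor file. The page says column labels come from the HMOG data schema but does not list them, so the names in `ACCEL_COLUMNS` are placeholders, and the function name is ours.

```python
import csv
import io

# Hypothetical column labels: substitute the real names and order
# from the HMOG data schema before using this on actual files.
ACCEL_COLUMNS = ["timestamp_ms", "x", "y", "z"]


def load_headerless_csv(handle, columns):
    """Read a headerless CSV and return one dict of floats per row."""
    rows = []
    for record in csv.reader(handle):
        if not record:
            continue  # skip blank lines
        values = [float(v) for v in record[:len(columns)]]
        rows.append(dict(zip(columns, values)))
    return rows


# Two made-up rows standing in for Accelerometer.csv.
sample = io.StringIO("0,0.1,9.7,0.3\n10,0.2,9.8,0.2\n")
accel = load_headerless_csv(sample, ACCEL_COLUMNS)
print(len(accel), accel[0]["y"])
```

For a real session, pass `open(path, newline="")` instead of the in-memory sample.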

2

Magnitude computation — √(X² + Y² + Z²)

Each sensor gives 3 values: X, Y, Z. We combine them into one magnitude signal using the Euclidean norm. This removes orientation dependence while preserving total movement energy — tremor shakes in all directions simultaneously.

Analogy: Instead of measuring rain going sideways, forward, and down separately — just put a bucket out and measure total water collected. Magnitude captures the total shake regardless of phone orientation.

→ Accel magnitude mean ≈ 9.28 m/s² (mostly gravity, person sitting still) | Gyro mean ≈ 0.3 rad/s
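The magnitude step can be sketched in a few lines. The sample values are made up to show that the same shake gives the same magnitude whichever axis it lands on.

```python
import math


def magnitude(x, y, z):
    """Euclidean norm of one 3-axis sample: sqrt(x^2 + y^2 + z^2)."""
    return math.sqrt(x * x + y * y + z * z)


# Gravity along Y, gravity along X, and an arbitrary 3-4-12 vector.
samples = [(0.0, 9.81, 0.0), (9.81, 0.0, 0.0), (3.0, 4.0, 12.0)]
mags = [magnitude(*s) for s in samples]
print(mags)
```

The first two samples differ only in phone orientation and produce the same magnitude, which is the orientation independence described above.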

3

Sliding windows — 4 seconds, 50% overlap

We chop the roughly 76,000-row signal into 4-second windows (400 samples at 100 Hz) with 50% overlap (200-sample step). The overlap means an event that falls on one window's edge sits near the middle of the neighbouring window. Tremor oscillates 4–12 times per second, so a 4-second window holds 16–48 cycles, enough to detect the rhythm.

Analogy: Like watching a long movie looking for a specific scene — instead of all at once, you watch 4-minute clips each starting 2 minutes after the last. You never miss the scene because clips overlap.

→ 383 windows × 400 samples per window per sensor
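A short sketch of the windowing arithmetic, assuming a trailing partial window is dropped (the page does not say how the remainder is handled). Function names are ours.

```python
def window_starts(n_samples, window=400, step=200):
    """Start index of every full window; a trailing partial window is dropped."""
    if n_samples < window:
        return []
    return list(range(0, n_samples - window + 1, step))


def sliding_windows(signal, window=400, step=200):
    """Slice a sequence into overlapping windows of equal length."""
    return [signal[s:s + window] for s in window_starts(len(signal), window, step)]


starts = window_starts(76_855)   # accelerometer row count from stage 1
print(len(starts))               # 383
```

With 76,855 samples, a 400-sample window and a 200-sample step this gives 383 windows, matching the count reported above.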

4

Feature extraction — 8 features per window

For each 4-second window, we compute frequency-domain features using Welch's Power Spectral Density method, and time-domain statistics. The PSD ratio is the most critical — it compares energy in the 4–12 Hz tremor band to the 0.5–4 Hz baseline band. A high ratio = more tremor-like energy.

Analogy: Think of each window like a smoothie. Instead of analysing every drop, we measure 8 properties — colour, thickness, sweetness. Those 8 numbers describe the whole smoothie without examining every ingredient.

PSD ratio (4–12 Hz / 0.5–4 Hz) | Tremor band power | Dominant frequency | Std deviation | RMS | Range | Kurtosis | Skewness

→ 383 rows × 8 features (accel) + 383 rows × 8 features (gyro) = 16 combined inertial features
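The eight features could be computed along the following lines with NumPy and SciPy. This is a sketch, not the project's code: the Welch segment length (`nperseg=200`), the treatment of band edges, and the function names are our assumptions, and the input is a synthetic 6 Hz oscillation rather than HMOG data.

```python
import numpy as np
from scipy.signal import welch
from scipy.stats import kurtosis, skew

FS = 100  # Hz, the HMOG sample rate stated on this page


def band_power(freqs, psd, lo, hi):
    """Approximate power in [lo, hi) by summing PSD bins times bin width."""
    mask = (freqs >= lo) & (freqs < hi)
    return float(np.sum(psd[mask]) * (freqs[1] - freqs[0]))


def window_features(window, fs=FS):
    window = np.asarray(window, dtype=float)
    # Assumed setting: 200-sample segments give 0.5 Hz frequency resolution.
    freqs, psd = welch(window, fs=fs, nperseg=200)
    tremor = band_power(freqs, psd, 4.0, 12.0)
    baseline = band_power(freqs, psd, 0.5, 4.0)
    return {
        "psd_ratio": tremor / baseline if baseline > 0 else float("inf"),
        "tremor_power": tremor,
        "dominant_freq": float(freqs[np.argmax(psd[1:]) + 1]),  # skip the DC bin
        "std": float(np.std(window)),
        "rms": float(np.sqrt(np.mean(window ** 2))),
        "range": float(window.max() - window.min()),
        "kurtosis": float(kurtosis(window)),  # SciPy default: excess kurtosis
        "skewness": float(skew(window)),
    }


# Synthetic 4-second window: gravity offset + 6 Hz oscillation + small noise.
t = np.arange(400) / FS
rng = np.random.default_rng(0)
shaky = 9.8 + 0.5 * np.sin(2 * np.pi * 6.0 * t) + 0.01 * rng.standard_normal(400)
features = window_features(shaky)
print(features["dominant_freq"], features["psd_ratio"] > 1)
```

Running `window_features` on each accelerometer window and each gyroscope window, then placing the two results side by side, would give the 16 inertial features per window.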

5

Normalisation — StandardScaler (z-score)

Features have very different scales — PSD ratio reaches 83, tremor power is ~0.003. Z-score normalisation rescales every feature to mean=0, std=1 so no single feature dominates K-Means due to its raw magnitude.

Analogy: Comparing athletes using height in cm (175) and weight in tonnes (0.07). Height dominates everything. Express each as a distance from the group average, measured in units of the group's spread, and the comparison becomes fair.

→ All 16 features scaled to mean≈0, std≈1
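A small worked example of the z-score step, written with plain NumPy so the arithmetic is visible. It uses the population standard deviation, which is what scikit-learn's StandardScaler uses. The three rows are made-up values on the two scales mentioned above.

```python
import numpy as np


def zscore_columns(features):
    """Rescale each column to mean 0 and standard deviation 1."""
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    std = features.std(axis=0)   # population std (ddof=0)
    std[std == 0] = 1.0          # constant columns stay at 0 instead of dividing by 0
    return (features - mean) / std


# Column 0 mimics a PSD ratio, column 1 a tremor-band power.
raw = np.array([[83.0, 0.003],
                [2.0, 0.001],
                [5.0, 0.002]])
scaled = zscore_columns(raw)
print(scaled.round(3))
```

After scaling, both columns span a similar range, so neither dominates a distance calculation.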

6

K-Means clustering — 3 clusters

K-Means groups all 383 windows into 3 clusters based on feature similarity. It picks 3 random centres, assigns every window to the nearest centre, recomputes centres, and repeats until stable. Quality measured by Silhouette Score (−1 to +1).

Analogy: Sorting 383 smoothies into 3 groups by taste, without being told the groups in advance. You'd naturally end up with mild / medium / strong. K-Means does this with numbers.

→ Silhouette: 0.226 (accel only) → 0.448 (combined) | Cluster 2 (calm): 147 windows | Cluster 0 (jittery): 24 windows
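The clustering step can be sketched with scikit-learn as below. The data is synthetic (three artificial blobs in 16 dimensions), so the printed score says nothing about the HMOG results. `n_init` and `random_state` are our choices, not settings reported on this page.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for a windows x 16 feature matrix: three well-separated blobs.
centres = np.array([[0.0] * 16, [4.0] * 16, [-4.0] * 16])
X = np.vstack([c + rng.standard_normal((100, 16)) for c in centres])

X_scaled = StandardScaler().fit_transform(X)

# k=3 as in the pipeline; n_init and random_state are assumed settings.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = model.fit_predict(X_scaled)

score = silhouette_score(X_scaled, labels)
print(np.bincount(labels), round(float(score), 3))
```

Comparing `silhouette_score` for an accelerometer-only matrix and a combined matrix is how a change like 0.226 → 0.448 would be measured.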

7

Isolation Forest — anomaly detection

Isolation Forest builds random decision trees. Normal windows need many splits to isolate. Anomalous windows are isolated in very few splits — they are genuinely unusual. The top 5% most isolatable windows are flagged as anomalous (label = −1).

Analogy: Finding the odd one out in a crowd. A person in a bright red suit in a room full of black is spotted immediately — just a few glances needed. Isolation Forest measures how quickly each window gets "singled out."

→ 363 normal windows, 20 anomalous (5.2%) | Gyro std 9× higher in anomalous windows | Cluster 2: 100% anomalous
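A sketch of the anomaly step using scikit-learn's `IsolationForest`. The input is random data with four deliberately extreme rows, and `contamination=0.05` is our reading of "top 5%" rather than a confirmed setting.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic stand-in for 383 windows x 16 scaled features.
X = rng.standard_normal((383, 16))
X[:4] += 8.0  # four windows with extreme values on every feature

# contamination=0.05 asks the model to flag roughly the 5% most isolatable rows.
forest = IsolationForest(contamination=0.05, random_state=0)
flags = forest.fit_predict(X)  # +1 = normal, -1 = anomalous

n_anomalous = int((flags == -1).sum())
print(n_anomalous, round(n_anomalous / len(flags), 3))
```

When the forest is fitted on the same windows it scores, the flagged share is set by `contamination`, so a rate near 5% is expected by construction.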

8

Touch and stroke feature extraction

16 additional features capture how the person physically interacts with the screen. Touch: tap location spread, contact size variability. Stroke: speed variability, length consistency. These capture the "handwriting signature" of steady vs unsteady interaction.

Analogy: Asking someone to sign their name — healthy vs ET patient. The accelerometer tells you the pen was shaking. Touch features tell you the signature itself became messier, more spread out, and inconsistent. Both are needed for the full picture.

→ 195 taps, 94 strokes | tap Y spread std = 169px | stroke speed std = 1,029 px/s
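Two of the touch and stroke features named above, sketched with the standard library. The record layouts (tap as `(x, y)` in pixels, stroke as start point, end point and duration in seconds) are hypothetical, since the page does not give the TouchEvent.csv and StrokeEvent.csv schemas.

```python
import math
import statistics


def tap_spread(taps):
    """taps: list of (x_px, y_px). Returns the std of each coordinate."""
    xs, ys = zip(*taps)
    return statistics.pstdev(xs), statistics.pstdev(ys)


def stroke_speed_std(strokes):
    """strokes: list of (x0, y0, x1, y1, duration_s). Returns std of speed in px/s."""
    speeds = [
        math.hypot(x1 - x0, y1 - y0) / duration
        for x0, y0, x1, y1, duration in strokes
        if duration > 0  # ignore zero-length records
    ]
    return statistics.pstdev(speeds)


# Made-up events for illustration only.
taps = [(100, 200), (100, 210), (100, 220)]
strokes = [(0, 0, 3, 4, 1.0), (0, 0, 6, 8, 1.0)]
print(tap_spread(taps), stroke_speed_std(strokes))
```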

9

Subject-level feature vector — 50 features, 24 sessions averaged

All window-level features are summarised (mean + std) across all windows per session, then averaged across all 24 sessions. This produces a single 50-feature row per subject — a stable "motor fingerprint" representing their movement patterns across months of recorded data.

Analogy: Instead of judging a student by one test, take the average across the whole semester across all subjects. 24-session averaging removes one-off anomalies and reveals the person's true patterns.

32

inertial summaries

2

anomaly stats

7

touch features

9

stroke features
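The 32 + 2 + 7 + 9 breakdown can be sketched as follows. The dictionary keys and the random inputs are placeholders; only the shapes follow the description above.

```python
import numpy as np


def session_summary(window_features):
    """(n_windows, 16) matrix -> 32 values: per-feature mean, then per-feature std."""
    w = np.asarray(window_features, dtype=float)
    return np.concatenate([w.mean(axis=0), w.std(axis=0)])


def subject_vector(sessions):
    """Build one 50-value row per session, then average the rows across sessions."""
    rows = [
        np.concatenate([
            session_summary(s["windows"]),  # 32 inertial summaries
            s["anomaly"],                   # 2 anomaly stats
            s["touch"],                     # 7 touch features
            s["stroke"],                    # 9 stroke features
        ])
        for s in sessions
    ]
    return np.mean(rows, axis=0)


# Random placeholder data for 24 sessions of one subject.
rng = np.random.default_rng(0)
sessions = [
    {
        "windows": rng.standard_normal((383, 16)),
        "anomaly": rng.random(2),
        "touch": rng.random(7),
        "stroke": rng.random(9),
    }
    for _ in range(24)
]
profile = subject_vector(sessions)
print(profile.shape)  # (50,)
```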

03

Results

All 99 subjects processed across all 24 sessions. Key improvements documented at every stage.

0.226

Silhouette score — accelerometer only

Add gyroscope

+98%

improvement

0.448

Silhouette score — accel + gyroscope combined

Silhouette score — accel vs combined

Adding gyroscope revealed an entirely new anomaly class: extreme rotation events. ET produces both translational shaking AND rotational twisting.

Anomaly breakdown by cluster

Cluster 2 (extreme rotation): 4/4 windows anomalous (100%). Cluster 1 (normal): only 3/334 flagged. The model correctly separates movement types.

Key features — normal vs anomalous windows

Gyro std is 9× higher in anomalous windows — the strongest single discriminating signal. PSD ratio, accel std, and kurtosis all elevated.

Session averaging effect — variance reduction

Averaging across 24 sessions reduces PSD ratio variance by 55% and stroke speed variance by 58%. Single-session analysis is unreliable for tremor screening.

5.2%

Mean anomaly rate

Healthy baseline

0.5pp

Anomaly rate spread

5.1% – 5.6% all subjects

3.26

Mean PSD ratio

Max: 16.18 (subject 856401)

4.93Hz

Mean dominant frequency

Tremor band: 4–12 Hz

−55%

PSD variance drop

Session 1 → all 24 avg

23.9

Sessions per subject

Mean (max: 24)

Accel PSD ratio distribution — 99 subjects

Heavy right skew. Most subjects 1–5, but a tail reaches 16. These outliers show consistently elevated tremor-band energy across all 24 sessions — not one-off spikes.

Anomaly rate per subject — all 99 sorted

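Remarkably flat 5.1–5.6% range across subjects. Part of this flatness is by construction: stage 7 flags the top 5% of windows, so a rate near 5% is expected whenever the forest is fitted on the same data it scores. For an ET patient to stand out at ~7–8%, their windows would need to be scored by a model fitted on this healthy baseline.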

Subject-level clustering (k=3) produced a silhouette score of 0.21: modest separation, as might be expected in an all-healthy population.

Cluster 0 — moderate movers

47

subjects

PSD ratio: 2.62
Gyro std: 0.148
Anomaly rate: 5.22%

Largest group

Cluster 1 — calm movers

37

subjects

PSD ratio: 2.27
Gyro std: 0.142
Anomaly rate: 5.22%

Lowest variability

Cluster 2 — high frequency

15

subjects

PSD ratio: 7.71
Gyro std: 0.147
Anomaly rate: 5.24%

Most clinically relevant

Why Cluster 2 matters: These 15 subjects show an average PSD ratio of 7.71, roughly 3× that of the other two clusters (2.62 and 2.27), consistently across all 24 sessions. In a future study with ET patient data, this cluster is the primary comparison target.

04

Composite risk scoring

Seven tremor-relevant features, MinMax-normalised and averaged equally, produce a composite risk score per subject. Multi-feature scoring outperforms any single threshold.
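A minimal sketch of equal-weight composite scoring. The page does not list the seven features, so the example uses three made-up columns and assumes that a higher value always means "more tremor-like". The function name is ours.

```python
import numpy as np


def composite_risk(feature_matrix):
    """(n_subjects, n_features) -> one score per subject in [0, 1].

    Each column is MinMax-normalised across subjects, then columns are
    averaged with equal weight.
    """
    f = np.asarray(feature_matrix, dtype=float)
    lo, hi = f.min(axis=0), f.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by 0 on constant columns
    return ((f - lo) / span).mean(axis=1)


# Illustrative values only: columns stand for PSD ratio, gyro std, anomaly rate (%).
toy = np.array([
    [2.3, 0.226, 5.5],   # low PSD ratio, high on the other two
    [10.4, 0.150, 5.2],  # highest PSD ratio, unremarkable otherwise
    [2.6, 0.148, 5.1],
])
scores = composite_risk(toy)
print(scores.round(3), int(scores.argmax()))
```

In this toy case the first row ranks highest despite having the lowest PSD ratio, which is the behaviour a multi-feature score is meant to allow.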

Risk score distribution — all 99 subjects

Near-normal distribution centred at 0.32. Subject 501973 is a clear outlier at 0.504, followed by a gradual tail. Healthy population shows meaningful spread.

Top 15 subjects by composite risk score

Subject 501973 leads by combining high gyro std, anomaly rate, and stroke variability. Its PSD ratio is not among the highest, but it is elevated on 6 of the 7 features.

Top 10 highest risk subjects — detail view

Subject | Risk score | Main drivers
501973 | 0.504 | High gyro std (0.226) + anomaly rate + stroke variability; top across 6 of 7 features
733162 | 0.487 | High gyro std (0.228) + moderate PSD; only 18 sessions but consistently elevated
540641 | 0.482 | High PSD ratio (7.34) + elevated anomaly rate; strong inertial signal
219303 | 0.478 | Highest anomaly rate across all subjects (5.64%); most consistently unusual windows
342329 | 0.477 | High gyro std (0.218) + anomaly rate combo; consistent rotation variability
862649 | 0.466 | High gyro std (0.200); consistent across all 24 sessions
693572 | 0.461 | High gyro std (0.209) + consistent anomaly rate across sessions
525584 | 0.450 | Highest PSD ratio in top 10 (10.44); strongest tremor-band energy signature
986737 | 0.438 | Highest anomaly rate in session-1 analysis (6.0%); averaged to 5.4% across all sessions
720193 | 0.428 | Elevated PSD (4.37) + gyro std (0.175) combo; moderate but consistent across all features

05

Key findings

📊

The healthy baseline is remarkably consistent

Across all 99 subjects and ~2,376 sessions, anomaly rates ranged only 5.1% to 5.6% — a 0.5 percentage point spread. This tight range establishes a reliable healthy baseline. Any future ET patient consistently above 7–8% would immediately stand out as warranting further clinical assessment.

5.1% – 5.6% across all 99 subjects

🔄

Gyroscope nearly doubles model quality

Adding gyroscope features increased the silhouette score from 0.226 to 0.448 — a 98% improvement. It also revealed a new anomaly class: extreme rotation events where gyro std was 30× above average. ET produces both translational shaking AND rotational twisting — both must be captured for robust detection.

Silhouette: 0.226 → 0.448 (+98%)

🎯

Multi-feature risk outperforms any single metric

The #1 risk subject (501973) had only a moderate PSD ratio of 2.31 — not in the top 5 by that feature alone. It ranked first by combining consistently elevated gyro std, anomaly rate, and stroke variability simultaneously. This validates the composite scoring approach — no single threshold is sufficient.

Top risk score: 0.504 (subject 501973)

📅

Session averaging dramatically improves reliability

PSD ratio variance dropped 55% when averaging across 24 sessions vs session 1 only. Stroke speed std variance dropped 58%. Single-session analysis is insufficient for reliable tremor screening — multiple sessions are essential for stable, representative feature estimates per subject.

PSD variance: −55% | Stroke variance: −58%

〰

Dominant frequencies already in the tremor band

Mean dominant frequency across all 99 subjects was 4.93 Hz, just inside the lower edge of the 4–12 Hz clinical tremor range. Even healthy users produce low-amplitude movement in this range during normal phone use. ET patients would be expected to show much higher power concentrated at specific frequencies within this band.

Mean dominant freq: 4.93 Hz (tremor band: 4–12 Hz)

06

Limitations

No clinical tremor labels

HMOG participants are healthy adults. All anomalies reflect natural motor variability, not confirmed ET. The pipeline establishes what healthy looks like — future work needs ET patient data to validate what unhealthy looks like.

Walking condition confound

Gross body motion during walking sessions may mask or mimic tremor-relevant frequency components in the 4–12 Hz range, requiring careful stratification in future studies.

Pressure data unusable

The HMOG recording device reported a fixed touch pressure (always 1.0), removing one potentially valuable tremor feature. Touch pressure variability is a candidate ET indicator that should be captured on devices with variable pressure reporting.

Single device type

All HMOG recordings were made on the same smartphone model. Accelerometer and gyroscope characteristics vary between devices — generalisation requires cross-device validation.

07

Future work

STEP 01 — IMMEDIATE

TREMOR12 real-world validation

The TREMOR12 iPhone app records accelerometer and gyroscope at 100 Hz and exports CSV files directly compatible with our pipeline. A healthy volunteer records 30–60 seconds of natural phone use. Our model should assign it a low risk score — confirming correct identification of normal motor behaviour. No clinic, no hardware needed.

STEP 02 — CLINICAL

ET patient data collection

Collect TREMOR12 recordings from ET patients (with FTM severity scores) and age-matched healthy controls. This enables supervised classification and direct correlation of model output with ground-truth clinical labels — transforming this exploratory study into a validated screening tool.

STEP 03 — ML

Supervised classification

With labelled ET patient data, replace or augment the unsupervised pipeline with SVM, Random Forest, or gradient boosting — trained on the same 50-feature vectors established in this work. The feature engineering is already complete and validated.

STEP 04 — ANALYSIS

Multi-task stratified analysis

Typing sessions involve rapid, precise finger movements expected to reveal stronger tremor signatures than reading. Stratify analysis by task type (reading, typing, map navigation) to identify which phone interactions are most sensitive to ET-like patterns.

08

Try it yourself

Test your own
movement patterns

Record 60 seconds of natural phone use with Sensor Logger, upload the zip file, and our pipeline will analyse your accelerometer and gyroscope data against the healthy baseline established from 99 subjects across 2,376 sessions.

100 Hz

Recording rate

60 s

Recording time

50

Features analysed

99

Subject baseline

1

Download Sensor Logger

Search for Sensor Logger on the App Store (iOS) or Google Play (Android). It is free. The developer is Kelvin Choi.

Free app

2

Configure sensors

Open the app and enable only Accelerometer and Gyroscope. Turn everything else off — GPS, magnetometer, audio, camera. Set sample rate to 100 Hz.

100 Hz only

3

Record for 60 seconds

Sit down, hold your phone naturally in one hand like you are reading. Keep as still as possible. Hit record and wait 60 seconds. Then stop.

Sit still

4

Export and upload

In Sensor Logger, tap the recording and choose Export as ZIP. Then upload that zip file below. Results appear in seconds.

Upload zip

Only your Accelerometer.csv and Gyroscope.csv are used. No data is stored.
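A sketch of how an uploaded archive could be reduced to the two sensor files, using only the standard library. The CSV header names in the sample are our assumption about the export format and should be checked against a real Sensor Logger export. `read_sensor_zip` is a hypothetical helper, not the site's actual handler.

```python
import csv
import io
import zipfile

WANTED = ("Accelerometer.csv", "Gyroscope.csv")


def read_sensor_zip(zip_bytes):
    """Return {filename: list of row dicts} for the two sensor files only."""
    out = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            base = name.rsplit("/", 1)[-1]  # tolerate files inside a folder
            if base in WANTED:
                text = archive.read(name).decode("utf-8")
                out[base] = list(csv.DictReader(io.StringIO(text)))
    return out


# Build a tiny in-memory archive to exercise the function.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Accelerometer.csv", "seconds_elapsed,x,y,z\n0.00,0.1,9.7,0.3\n")
    archive.writestr("Gyroscope.csv", "seconds_elapsed,x,y,z\n0.00,0.01,0.02,0.03\n")
    archive.writestr("Location.csv", "seconds_elapsed,latitude\n0.00,1.0\n")

data = read_sensor_zip(buffer.getvalue())
print(sorted(data))
```

Any other file in the archive, such as the location file in this example, is never read.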
