This document serves as a README for loading two different versions of the SMU-Textron Cognitive Load (SMU-TexCL) dataset. Specifics of the data collection are available in the methods section of the following paper:
When using this dataset, please cite the papers above. The dataset contains biometric information for a pilot as they fly through varying levels of turbulence in a simulator [1]. Numerous biometrics, as well as features derived from the biometric data, are available. Transfer-learned features from the Wilson et al. model (BM3TX) are also included; an explanation of these features is available in the paper referenced above [2].
Subjective workload scores from the subjects are available for each trial. Cognitive load is captured using the NASA Task Load Index (NASA-TLX). The objective of this dataset is to use the biometric sensor streams to predict the reported task workload. The workload scores have been processed in several different ways, and any of these measures can be used as the ground-truth label; they are described more fully below.
The biometric data was collected on the throttle hand using an Empatica E4 sensor.
The dataset can be loaded in two forms:
- Pre-processed table data (CSV or Parquet) containing precomputed features and labels.
- Per-pilot JSON files containing the windowed raw biometric signals along with precomputed values for each window.
The table data for this dataset has already gone through several preprocessing steps. The NASA TLX labels are also available in several different formats.
Meta Data: For each row, the following information is included:
Cognitive Load Labels: When loading the table data, the following columns are available:
Feature Data: The following precomputed features are available:
- bm3tx_0 to bm3tx_31: These 32 features are the bottleneck features from the BM3TX model published by Wilson et al. [2].
- Statistical summaries of the raw signals (raw_eda_*, scr_*, temp_*, accel_*, ibi_*) and the heart-rate-variability measures sdrr and pnn50, as enumerated in the feature_columns list below.
import pandas as pd
# the table data can be loaded either from CSV or from the parquet files
file_path = 'pre_proccessed_table_data.parquet'
df = pd.read_parquet(file_path)
# select the precomputed feature columns
feature_columns = (
    # 32 bottleneck features from the BM3TX model: bm3tx_0 ... bm3tx_31
    [f'bm3tx_{i}' for i in range(32)]
    + [
        'raw_eda_max', 'raw_eda_min', 'raw_eda_mean', 'raw_eda_std',
        'scr_count', 'scr_max', 'scr_min', 'scr_mean', 'scr_std',
        'temp_max', 'temp_min', 'temp_mean', 'temp_std',
        'accel_max', 'accel_min', 'accel_mean', 'accel_std',
        'ibi_max', 'ibi_min', 'ibi_mean', 'sdrr', 'pnn50',
    ]
)
X = df[feature_columns]
# example: use the quantile-based TLX label, rounded to integer classes
label_column = 'avg_tlx_quantile'
y1 = df[label_column].round()
# alternatively, predict the z-scored TLX label
label_column = 'avg_tlx_zscore'
y2 = df[label_column].round()
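# note (illustrative, not from the dataset docs): for a regression setup you
# could keep the continuous score instead of rounding it, e.g.
# y_reg = df['avg_tlx_zscore']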
# these features and labels can now be used for training and testing a classifier, regressor, etc.
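As a minimal sketch of that last step, the block below fits a scikit-learn classifier on the quantile labels. The random-forest model, the 80/20 split, and the use of scikit-learn itself are illustrative assumptions, not part of the dataset tooling, and the sketch assumes the feature columns contain no missing values.
# minimal illustrative sketch; the model and split are arbitrary choices,
# not part of the dataset tooling
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y1, test_size=0.2, random_state=0
)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print('Held-out accuracy:', accuracy_score(y_test, clf.predict(X_test)))
Note that a random row-level split can place windows from the same trial (and the same pilot) in both partitions; a subject-wise split is usually the stricter evaluation for this kind of data.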
This style of loading is meant for users who want direct access to the raw biometric signals, plus some precomputed values, within each window of data.
The data is stored per pilot in JSON format. Each window contains 60 seconds of biometric data, and consecutive windows are computed with 90% overlap, so adjacent windows are largely redundant; the short sketch below works out the resulting step size.
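As a quick worked example of that geometry (using only the numbers stated above), a 60-second window with 90% overlap advances by 6 seconds per step, which matches the 6-second spacing of the timestamps in the sample output further below:
window_len_s = 60.0  # each window spans 60 seconds
overlap = 0.90       # consecutive windows share 90% of their samples
step_s = window_len_s * (1.0 - overlap)
print(f'Consecutive windows start {step_s:.1f} s apart')  # 6.0 s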
In the example below, we load the data for one pilot into the variable pilot_data. This variable is a list in which each element corresponds to a trial. We load the trial data at the $0^{th}$ index into a new variable one_trial. Each trial is a dictionary with the following keys:
Windowed Features: We then save the various windows of data into the variable windows_of_data. For this example, there are 13 overlapping windows of data. This is a list of the data for each window; each element of the list is a dictionary of the biometric data with the following keys:
import json
pilot_filename = 'ID001.json'
with open(pilot_filename) as fid:
    pilot_data = json.load(fid)
print(f'Found pilot data for pilot id: {pilot_filename} with {len(pilot_data)} trials.')
# get data for one trial
one_trial = pilot_data[0]
avg_tlx_label = one_trial['label']
windows_of_data = one_trial['windowed_features']
duration = one_trial['meta_data']['duration']
print(f'Found pilot data of {duration} seconds, with TLX value of {avg_tlx_label}.')
print(f'Found {len(windows_of_data)} windows of data.')
for one_window in windows_of_data:
    print('Seconds when window ends:', one_window['timestamp'], end=' ')
    # the PPG stream is sampled at 64 Hz, so sample count / 64 gives seconds
    print('duration of window', len(one_window['ppg_input'][0])/64, 'seconds')
    # each window might be processed by an algorithm or used as the input to a
    # sequential network such as an LSTM or transformer; see the sketch after the output below
Found pilot data for pilot id: ID001.json with 10 trials.
Found pilot data of 80.45082000000548 seconds, with TLX value of 10.0.
Found 13 windows of data.
Seconds when window ends: 215.99999999999994 duration of window 60.0 seconds
Seconds when window ends: 221.99999999999994 duration of window 60.0 seconds
Seconds when window ends: 227.99999999999994 duration of window 60.0 seconds
Seconds when window ends: 233.99999999999994 duration of window 60.0 seconds
Seconds when window ends: 239.99999999999994 duration of window 60.0 seconds
Seconds when window ends: 245.99999999999994 duration of window 60.0 seconds
Seconds when window ends: 251.99999999999994 duration of window 60.0 seconds
Seconds when window ends: 257.99999999999994 duration of window 60.0 seconds
Seconds when window ends: 263.99999999999994 duration of window 60.0 seconds
Seconds when window ends: 269.99999999999994 duration of window 60.0 seconds
Seconds when window ends: 275.99999999999994 duration of window 60.0 seconds
Seconds when window ends: 281.99999999999994 duration of window 60.0 seconds
Seconds when window ends: 287.99999999999994 duration of window 60.0 seconds
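Building on the comment at the end of the loop above, here is a minimal sketch that stacks the first PPG channel of every window into a single NumPy array, a common starting shape for a sequential model. Treating ppg_input[0] as one 64 Hz PPG channel follows the duration computation above; the use of NumPy and the array layout are illustrative assumptions.
import numpy as np

# stack the first PPG channel of each 60 s window (64 Hz -> 3840 samples)
# into a (num_windows, samples_per_window) array; the layout is illustrative,
# only the 'ppg_input' key and the 64 Hz rate come from the example above
ppg_windows = np.stack([np.asarray(w['ppg_input'][0], dtype=float)
                        for w in windows_of_data])
print(ppg_windows.shape)  # e.g. (13, 3840) for the trial above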