Visualizing Apple Watch heart rate data

Since late 2017, I have (mostly) been wearing my Apple Watch to track activity, heart rate, and sleep. Here I plot some of my heart rate data as an initial step toward a deeper analysis.

The Jupyter notebook in which this visualization is made can be found here, although commits made after this post may change the latest version of the code.

Code

First, export health data from your iPhone using the Health app.

Then, import modules:

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
import re
import csv
from tqdm import tqdm_notebook

import sys
sys.path.append('../')
from functions import *
```

Read in the resulting data file with the help of some code from https://github.com/minimaxir/get-heart-rate-csv:

```python
# note that this file is not pushed to GitHub
file_name = '20190508_export.xml'
csv_name = '/Users/yuempark/GitHub/data-projects/sensitive-data/20190508-heart-rate.csv'

# pull the start time and the bpm out of each heart rate record
pattern = r'^.*IdentifierHeartRate".*startDate="(.{19}).*value="([0-9]*).*$'

# read in the file, and export to a .csv
with open(csv_name, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['dt', 'bpm'])

    with open(file_name, 'r') as f2:
        for line in f2:
            search = re.search(pattern, line)
            if search is not None:
                writer.writerow([search.group(1), search.group(2)])

# read the .csv back in, parsing the timestamps
HR = pd.read_csv(csv_name, parse_dates=['dt'])

# create a day of year and second of day column
HR['dayofyear'] = HR['dt'].dt.dayofyear
HR['secondofday'] = HR['dt'].dt.hour*3600 + HR['dt'].dt.minute*60 + HR['dt'].dt.second

# sort by time
HR.sort_values('dt', inplace=True)
HR.reset_index(drop=True, inplace=True)
```
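To see what the regular expression above captures, here is a minimal sketch run against a single made-up record line. The sample line is my assumption about what a heart rate entry in the export looks like (the attribute values are invented for illustration), not something copied from a real file:

```python
import re

# same pattern as above, written as a raw string
pattern = r'^.*IdentifierHeartRate".*startDate="(.{19}).*value="([0-9]*).*$'

# hypothetical record line; the attribute values are made up
sample_line = ('<Record type="HKQuantityTypeIdentifierHeartRate" '
               'unit="count/min" '
               'startDate="2018-03-14 07:05:09 -0700" '
               'endDate="2018-03-14 07:05:09 -0700" value="62"/>\n')

# group 1 is the 19-character timestamp, group 2 is the integer bpm
search = re.search(pattern, sample_line)
dt_string, bpm_string = search.group(1), search.group(2)
print(dt_string, bpm_string)
```

Note that `.{19}` keeps the date and time but drops the trailing offset in this sample line, so the parsed timestamps carry no time zone information.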

Pull out just the 2018 data:

```python
slice_2018 = HR[(HR['dt'] >= datetime(2018, 1, 1)) &
                (HR['dt'] < datetime(2019, 1, 1))].copy()
slice_2018.reset_index(drop=True, inplace=True)
```
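As a sanity check on the mask above, here is a small sketch on a made-up `toy` DataFrame (not my real data) showing that the same selection can also be written with the `.dt.year` accessor:

```python
import pandas as pd
from datetime import datetime

# made-up timestamps straddling the edges of 2018
toy = pd.DataFrame({'dt': pd.to_datetime(['2017-12-31 23:59:59',
                                          '2018-01-01 00:00:00',
                                          '2018-12-31 23:59:59',
                                          '2019-01-01 00:00:00']),
                    'bpm': [60, 61, 62, 63]})

# half-open interval: start of 2018 included, start of 2019 excluded
by_bounds = toy[(toy['dt'] >= datetime(2018, 1, 1)) &
                (toy['dt'] < datetime(2019, 1, 1))]

# equivalent selection using the year of each timestamp
by_year = toy[toy['dt'].dt.year == 2018]

print(by_bounds['bpm'].tolist())
```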

Initialize an array where each column is a day, and each row represents a snapshot of the day:

```python
n_snapshots = 100
HR_array_2018 = np.zeros((n_snapshots, 365))
```

Fill that array by linearly interpolating to get the equally spaced snapshots:

```python
# create the second of day that we will interpolate onto
seconds_in_day = 60*60*24
secondofday_interp = np.linspace(10*60, seconds_in_day-10*60, n_snapshots)

# iterate through each day
for i in range(365):

    # get data for this day
    slice_day = slice_2018[slice_2018['dayofyear']==i+1]

    if len(slice_day) != 0:

        # pull out the second of day and the bpm for this day
        secondofday_i = slice_day['secondofday'].values
        bpm_i = slice_day['bpm'].values

        # linearly interpolate, and set values outside of the x-range to be 0
        bpm_i_interp = np.interp(secondofday_interp, secondofday_i, bpm_i,
                                 left=0, right=0)

        # store it
        HR_array_2018[:,i] = bpm_i_interp

# replace zeros with NaNs
HR_array_2018[HR_array_2018==0] = np.nan
```
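To make the interpolation step concrete, here is a minimal sketch on three made-up readings (not my real data). It uses the same `np.interp` call as above, with query times that fall before, between, and after the readings:

```python
import numpy as np

# made-up readings for one day: second of day and bpm
secondofday_i = np.array([3600, 7200, 10800])
bpm_i = np.array([60, 80, 70])

# query times before, between, and after the readings
secondofday_interp = np.array([1800, 5400, 7200, 9000, 12600])

# linearly interpolate, and set values outside of the x-range to be 0
bpm_i_interp = np.interp(secondofday_interp, secondofday_i, bpm_i,
                         left=0, right=0)

# replace zeros with NaNs so the gaps are not drawn as 0 bpm
bpm_i_interp[bpm_i_interp == 0] = np.nan
print(bpm_i_interp)
```

The first and last query times fall outside the readings and end up as NaN, while the middle three are filled in along straight lines. One consequence is that a long gap between two readings within a day gets bridged by a straight line rather than left blank.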

Now plot it:

```python
# set up the figure
fig, ax = plt.subplots(figsize=(15,7))

# plot
c = ax.pcolor(HR_array_2018, cmap='magma')

# create the ticks for the y-axis (every 2 hours)
ax.set_yticks(np.linspace(0, n_snapshots, 13))
ax.set_yticklabels(np.linspace(0, 24, 13).astype(int))

# prettify
ax.set_xlabel('day of 2018')
ax.set_ylabel('hour of day')

# colorbar
cbar = fig.colorbar(c, ax=ax)

# annotations
ax.text(87, 110, 'TA for class that visits\nDeath Valley, CA',
        horizontalalignment='center', verticalalignment='center')

ax.text(165, 110, 'Paris for\nmodeling\nresearch',
        horizontalalignment='center', verticalalignment='center')

ax.text(195, 110, 'Sydney\nto visit\nfamily',
        horizontalalignment='center', verticalalignment='center')

ax.text(307, 110, 'China\nfor field\nsymposium',
        horizontalalignment='center', verticalalignment='center')

ax.text(340, 110, 'Ethiopia\nfor field\nwork',
        horizontalalignment='center', verticalalignment='center')

ax.text(-60, 35, 'tend to work out\nin the morning,\nsleep in on weekends',
        horizontalalignment='center', verticalalignment='center')

plt.show()
```

The resulting figure:

Insights

Most of the missing data can be explained by the fact that I switch to a more durable G-Shock watch (that doesn’t need charging) when I head out into the field to do research. Unfortunately, those field expeditions represent the times during the year in which I am most active (due to all the hiking we do during those trips), so that ‘high activity’ data is simply not captured.

The unusual data in summer 2018 (around days 160-200) is due to travel to different time zones.

The data that remains does give some insight into my standard routine during 2018, without doing any further analysis:

• I did my best to work out on weekday mornings, right after getting up. These workouts represent the highest heart rates recorded. I did pretty poorly in the spring and better in the early to mid fall, then my schedule went all weird because of the field expeditions.
• After working out, I would head to work, which involved a commute to Berkeley then a walk uphill to get to my department. We can see quite clearly a slightly elevated heart rate on weekday mornings as I made this trip.
• The walk back down the hill to head home isn’t readily visible, at least in this particular visualization using a linear color map.
• I slept in on the weekends – probably a little later than I would have liked in retrospect…
• My total sleep didn’t change that much throughout the year, but my sleep schedule does seem to suddenly shift slightly earlier right around spring break, when the Death Valley trip took place. I don’t recall why this happened…
• Overall, I seemed to have a very slightly higher heart rate during the day in the spring than in the fall. This may be due to being more active during the day (I was taking/teaching more classes in the spring than in the fall), or perhaps the increased number of workouts in the fall led to a lower resting heart rate? This question could use some more looking into…
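
One way to start looking into that last question would be to average the interpolated array over blocks of days and hours. Below is a minimal sketch that uses a synthetic stand-in array rather than my real data. The helper `mean_bpm`, the season boundaries, and the choice of 'daytime' rows are all assumptions for illustration:

```python
import numpy as np

def mean_bpm(hr_array, day_start, day_end, row_start, row_end):
    """Mean of the non-NaN values in a block of snapshots (rows) and days (columns)."""
    block = hr_array[row_start:row_end, day_start:day_end]
    return np.nanmean(block)

# synthetic stand-in for HR_array_2018 (100 snapshots x 365 days), NOT real data
rng = np.random.default_rng(0)
fake_array = rng.normal(70, 5, size=(100, 365))
fake_array[:, 80:90] = np.nan  # pretend the watch was off for ten days

# assumed seasons: roughly March-May and September-November (0-based day columns)
# assumed daytime: rows 40-84 of 100, i.e. very roughly mid-morning to evening
spring = mean_bpm(fake_array, 59, 151, 40, 85)
fall = mean_bpm(fake_array, 243, 334, 40, 85)
print(spring - fall)
```

With the real array in place of `fake_array`, the sign and size of the difference would give a first, rough answer. It would not by itself separate the effect of daytime activity from a change in resting heart rate.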

Conclusion

Just a simple visualization can lead to some pretty informative insights! The next step, of course, is to analyze it quantitatively, and perhaps bring some more data into the mix.