My Journey Into Analyzing Apple Health Data

I’ve owned an Apple Watch for years – since the Series 2. I’ve also been running consistently for years – three of them, to be precise. I also like data. I’ve been collecting data on my workouts via my phone and watch for years, but getting data off of the iPhone’s small screen has always been problematic.

“But Peter!” you say, “Apple lets you export data from the Health app!” Yes, it does. Have you ever looked at it? It looks something like this. Correction – it looks EXACTLY like this.

[Screenshot: well-structured XML]

It’s XML data, and that doesn’t easily lend itself to a graph. Also, my data is over 1.3GB at present. That’s a lot of data for one guy. So I looked around for how to analyze my iOS Health data. The first promising things I found were Analyze the Crap Out of Your Apple Health/HealthKit Data (sep.com) and GitHub – jonfuller/health-parse: Parses an Apple Health data export… for reasons. The developer offers an email address: send your health data to it, and it will send back parsed stuff. Sure. I’ll email you my 1.3GB of Apple Health Data. Let me know when you get it. Okay, no, I did not try that because it’s never going to work. So I downloaded his code from GitHub, but I couldn’t get it to compile. Seems like I’m not the only one, as others reported the same issue.

Next I tried the Heartwatch app for iOS. So close! It generates some nice reports but only goes back one year. I want to track data over multiple years. I emailed the developer, and he said he’d consider it.

Then I tried the YouTube video How to download, graph and assess physical activity and exercise data from Apple Watch – YouTube. OMG hilarious. Fail.

Something in Python perhaps? Analyze Your iOS Health Data With Python | by Guido Casiraghi | Better Programming | Medium. Prerequisites: “You know the basics of Pandas.” I don’t even know what pandas is, other than a big bear-like thing that lives in China.

I tried to import the XML files into Excel. Hahahaha. I’m running the 32-bit version. It cannot open a 1.3GB XML file. 

I poked around and found this article by Taras Kaduk: Analyze and visualize your iPhone’s Health app data in R. I was told R is easy to learn and use, so I figured I’d give it a try.

I installed R for Windows from The Comprehensive R Archive Network (case.edu). The UI seems a bit dated and barebones. How do I install libraries, anyway? HodentekHelp: How do you install the XML library for R programming? Okay, manual process, must select stuff from a list by point and click. Yuck.

How do I change directories in R?  how to set path in R on Windows – Google Search

Hm. This looks kinda neat and more polished. Download the RStudio IDE – RStudio. I still have to install those libraries, but at least I can type their names in a comma-separated list. Much quicker.

How do I change directories in R again?  getwd, setwd | R Function of the Day
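For anyone else stuck on the same question, here is a minimal sketch of getwd() and setwd(). The folder is only a stand-in (tempdir()), since the real one depends on where export.xml lives; swap in your own path.

```r
# Show the folder R is currently working in
getwd()

# Switch to another folder. tempdir() is only a placeholder here;
# replace it with the folder that holds export.xml.
# Forward slashes work in Windows paths, e.g. "C:/some/folder".
old_dir <- setwd(tempdir())  # setwd() hands back the previous folder

# List the files R can see in the new working directory
list.files()

# Go back to where we started
setwd(old_dir)
```

Saving the value returned by setwd() makes it easy to hop back afterwards.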

What’s the path to my files in my OneDrive folder without spaces in it?  Use PowerShell to display Short File and Folder Names | Scripting Blog (microsoft.com)

How do you comment in R? Comments in R – GeeksforGeeks

How do you print more lines than it’s showing me?  how to increase the limit for max.print in R – Stack Overflow
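A minimal sketch of the max.print fix, using R's built-in mtcars data set as a stand-in for the health data. The limit of 10000 is an arbitrary number picked for illustration.

```r
# Raise the print limit and remember the old setting so it can be restored
old_opts <- options(max.print = 10000)  # allow up to 10,000 entries to be printed

# Confirm the new limit
new_limit <- getOption("max.print")
print(new_limit)

# Often handier than raising the limit: look at just the first n rows
head(mtcars, n = 20)

# Put the previous limit back
options(old_opts)
```

For a big data frame, head() or tail() is usually easier to read than printing everything.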

What does that %>% do?  Simplify Your Code with %>% · UC Business Analytics R Programming Guide (uc-r.github.io)
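In short, %>% takes the result on its left and feeds it in as the first argument of the function on its right. A minimal sketch, assuming the dplyr package is installed; the workouts table is made up purely for illustration.

```r
library(dplyr)  # provides %>% along with filter() and summarise()

# Made-up workouts, purely for illustration
workouts <- data.frame(
  type = c("Running", "Walking", "Running"),
  minutes = c(30, 45, 50)
)

# Without the pipe: the calls nest and have to be read inside-out
nested_total <- summarise(filter(workouts, type == "Running"), total = sum(minutes))

# With the pipe: read top to bottom, each result flowing into the next call
running_total <- workouts %>%
  filter(type == "Running") %>%
  summarise(total = sum(minutes))

print(running_total)
```

Both versions compute the same thing; the piped one just reads in the order the steps happen.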

Oof. Guess I should take a lesson. Learn R | Codecademy

Yup, that did it! The following R code imports my Health XML data and spits out a CSV. And yeah, it took a lot of floundering to get these few lines of code:

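library(XML)    # xmlParse() and the internal xmlAttrsToDataFrame() helper
library(readr)  # write_csv() comes from readr, not dplyr
# Parse the whole export into memory
xml <- xmlParse('export.xml')
# Turn the attributes of every Workout node into data frame columns
df_workout <- XML:::xmlAttrsToDataFrame(xml["//Workout"])
write_csv(df_workout, 'health_export.csv')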
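One thing to watch for: XML attributes arrive as text, so numbers and dates need converting before they will chart properly. Here is a minimal sketch using a made-up two-row stand-in for df_workout. The column names (workoutActivityType, duration, startDate) and the date format are assumptions about what the export contains, so check names(df_workout) against the real data first.

```r
# Stand-in for df_workout: two made-up rows with every column stored as text.
# Column names and date format are assumptions; verify them on the real export.
df_workout <- data.frame(
  workoutActivityType = c("HKWorkoutActivityTypeRunning",
                          "HKWorkoutActivityTypeRunning"),
  duration = c("31.5", "42.25"),
  startDate = c("2020-03-01 07:30:00 -0500", "2020-03-03 18:05:00 -0500"),
  stringsAsFactors = FALSE
)

# Text to numbers
df_workout$duration <- as.numeric(df_workout$duration)

# Text to date-times; %z reads the "-0500" UTC offset
df_workout$startDate <- as.POSIXct(df_workout$startDate,
                                   format = "%Y-%m-%d %H:%M:%S %z",
                                   tz = "UTC")

# Strip the long prefix so the activity type reads "Running"
df_workout$workoutActivityType <- sub("HKWorkoutActivityType", "",
                                      df_workout$workoutActivityType,
                                      fixed = TRUE)

# Check the resulting column types
str(df_workout)
```

Doing the conversion in R before writing the CSV means the spreadsheet gets real numbers and dates instead of text.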

Now I have a CSV file! Great! I’ll make a chart in Excel. OMFG Excel charting is beyond convoluted. Why is it so F***ING COMPLICATED?!?! 

Google Sheets to the rescue. Finally. I have what I have sought for months.

I realized that with the right libraries, I likely could have accomplished the same thing with Perl or Python, but learning R has been fun and I may have applications for this professionally as well as personally. Also, I should be able to generate the graphs directly from R, but haven’t learned that yet. Finally, I will likely need to dive deeper into the data to incorporate steps per minute and heart rate into the above chart. I’m really interested in overlaying my steps per minute and average heart rate to see how this affects energy used and pace. So while I’ve taken the first step (no pun intended), I’m not done yet!
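As a starting point for charting directly in R, here is a minimal sketch using only base R graphics. The data is made up and stands in for the exported workouts; the column names and the output file name are placeholders, not anything from the real export.

```r
# Made-up workout data standing in for health_export.csv
workouts <- data.frame(
  startDate = as.Date(c("2020-01-05", "2020-02-09", "2020-03-15", "2020-04-19")),
  duration = c(28, 31, 35, 33)  # minutes
)

# Draw into a PDF file; drop the pdf() and dev.off() lines to see the
# chart in the RStudio Plots pane instead
out_file <- file.path(tempdir(), "workouts.pdf")
pdf(out_file, width = 8, height = 5)
plot(workouts$startDate, workouts$duration,
     type = "b",  # points joined by lines
     xlab = "Date", ylab = "Duration (minutes)",
     main = "Workout duration over time")
dev.off()
```

A second series, such as average heart rate, could be layered on with lines() once those columns are available.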