Making sense of Endomondo's calorie estimation

The other day I got curious how Endomondo estimates energy expenditure during exercise.

On their website they mention some paywalled paper but give no specifics, so I figured it'd be interesting to reverse engineer it myself. I extracted my Endomondo data from their JSON export and plotted a regression.

I'm using a Wahoo TickrX chest strap monitor, so the HR data coming from it is pretty decent.

First, I'm importing the dataframe from the Python package I'm using to interact with my data. (I've mentioned it here).

It's private at the moment, but it's pretty specific to my use cases, and the only interface used in this post is a Pandas dataframe, so hopefully that won't confuse you.

In [1]:
from my.workouts.dataframes import endomondo
df = endomondo()
WARNING:workout-provider:Unhandled: Cycling
WARNING:workout-provider:Unhandled: Cycling

Some sample data:

In [2]:
display(df[df['dt'].apply(lambda dt: str(dt.date()) == '2019-04-21')])
                            dt         sport   heartbeats   kcal error
384  2019-04-21 10:11:28+00:00  Rope jumping  3873.500000  310.0  None
385  2019-04-21 10:47:58+00:00       Running  2860.666667  248.0  None

Heartbeats were calculated as average HR multiplied by the duration of exercise.
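
As a minimal sketch of that calculation (the function name and the example numbers are made up for illustration, they are not taken from the data provider):

```python
from datetime import timedelta

def total_heartbeats(avg_bpm: float, duration: timedelta) -> float:
    """Approximate total heartbeats as average HR (beats per minute) times duration in minutes."""
    minutes = duration.total_seconds() / 60
    return avg_bpm * minutes

# hypothetical workout: 25 minutes at an average of 150 bpm
print(total_heartbeats(150, timedelta(minutes=25)))
```

This is only an approximation of the true beat count, but it's all you can do when the export gives you an average HR rather than every beat.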

The error column is a neat way of propagating exceptions from the data provider.

E.g. I only have HR data for the last couple of years or so, so for older Endomondo workouts the data provider doesn't have any HR points. While I could filter these entries out in the data provider, they might still be useful for other plots and analysis pipelines (e.g. if I was only interested in kcal and didn't care about heartbeats).

Instead, I'm just being defensive and propagating exceptions up through the dataframe, leaving it up to the user to handle them.
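
To illustrate the idea, here is a rough sketch of how such a provider could look. It is not the actual code from my package: `workouts` and `as_dataframe` are hypothetical names, and the rows are toy data.

```python
import pandas as pd

def workouts():
    """Hypothetical data provider: yields either a parsed workout (dict) or an Exception."""
    yield {'dt': '2019-04-21', 'sport': 'Running', 'heartbeats': 2860.0, 'kcal': 248.0}
    yield RuntimeError('Unhandled activity: Cycling')

def as_dataframe(items) -> pd.DataFrame:
    rows = []
    for item in items:
        if isinstance(item, Exception):
            # keep the failure as a row instead of raising or silently dropping it
            rows.append({'error': str(item)})
        else:
            rows.append({**item, 'error': None})
    return pd.DataFrame(rows, columns=['dt', 'sport', 'heartbeats', 'kcal', 'error'])

toy = as_dataframe(workouts())
# the consumer decides what to do with errors; here we just drop them
good = toy[toy['error'].isnull()]
```

The point of the design is that nothing gets lost: a consumer that cares about errors can inspect the column, and one that doesn't can filter on it in a single line.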

In [3]:
display(df[df['dt'].apply(lambda dt: str(dt.date()) in ['2015-03-06', '2018-05-28'])])
                            dt         sport  heartbeats   kcal                        error
17   2015-03-06 05:50:38+00:00       Running         NaN  397.0                        no hr
18   2015-03-06 13:20:06+00:00  Table tennis         NaN  127.0                        no hr
297  2018-05-28 10:11:45+00:00           NaN         NaN    NaN  Unhandled activity: Cycling
298  2018-05-28 12:58:33+00:00           NaN         NaN    NaN  Unhandled activity: Cycling

So, first we filter out the entries with errors (and the catch-all 'Other' sport):

In [4]:
df = df[df['error'].isnull() & (df['sport'] != 'Other')]

As well as sports with fewer than 10 entries, which would otherwise end up as outliers:

In [5]:
df = df.groupby(['sport']).filter(lambda grp: len(grp) >= 10)

In [6]:
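%matplotlib inline
import matplotlib
from matplotlib import pyplot as plt
import seaborn as sns

matplotlib.rc('font', size=17, weight='regular')

# number of workouts per sport, used for the legend labels
sports = {
    g: len(f) for g, f in df.groupby('sport')
}

# NOTE: the lmplot arguments are an assumption, inferred from the axis labels and legend below
g = sns.lmplot(
    data=df,
    x='heartbeats',
    y='kcal',
    hue='sport',
    hue_order=list(sports),
    legend=False,
)
ax = g.ax
ax.set_title('Dependency of energy spent during exercise on number of heartbeats')

ax.set_xlim((0, None))
ax.set_xlabel('Total heartbeats, measured by chest strap HR monitor')

ax.set_ylim((0, None))
ax.set_ylabel('Kcal,\nEndomondo\nestimate', rotation=0, y=1.0)

# relabel the legend with the number of points per sport
handles, _ = ax.get_legend_handles_labels()
ax.legend(
    handles=handles,
    labels=[f'{s} ({cnt} points)' for s, cnt in sports.items()],
    loc='upper left',
)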

Unsurprisingly, it looks like a simple linear model (considering my weight and age barely changed).

What I find interesting is that, for me at least, running feels way more intense than any other cardio I'm doing, definitely way harder than skiing!

However, the regression coefficient (basically, calories burnt per 'unit of heart activity') is more or less the same. I guess that could be explained by the fact that running involves more muscle activity, which Endomondo can't capture and doesn't try to infer from the exercise type (which you enter manually when you start logging the exercise).
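
One way to check the "more or less the same" claim numerically would be to fit a separate line per sport and compare the slopes. Below is a minimal sketch with a hand-rolled least-squares fit. The data points are invented for illustration (they are NOT my real workouts), and `fit_line` is a hypothetical helper, not something from my package.

```python
from collections import defaultdict

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# made-up (sport, heartbeats, kcal) points, NOT real data
points = [
    ('Running', 2000, 180), ('Running', 3000, 270), ('Running', 4000, 360),
    ('Skiing', 5000, 460), ('Skiing', 6000, 550), ('Skiing', 8000, 730),
]

# group the points by sport
by_sport = defaultdict(list)
for sport, beats, kcal in points:
    by_sport[sport].append((beats, kcal))

# fit one line per sport and report its slope
slopes = {}
for sport, pts in by_sport.items():
    xs, ys = zip(*pts)
    slopes[sport], _ = fit_line(xs, ys)
    print(f'{sport}: {slopes[sport]:.3f} kcal per heartbeat')
```

With real data the same loop would run over `df.groupby('sport')`; if the per-sport slopes come out close to each other, the estimate most likely doesn't depend on the exercise type.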

With regard to the actual regression coefficient: seaborn wouldn't let you display it on the regplot (the author has a very strong opinion about that, apparently), so we use sklearn to do that for us:

In [7]:
from sklearn import linear_model

reg = linear_model.LinearRegression()
reg.fit(df[['heartbeats']], df['kcal'])

[coef] = reg.coef_
free = reg.intercept_

print(f"Regression coefficient: {coef:.3f}")
print(f"Free term: {free:.3f}")
Regression coefficient: 0.093
Free term: -9.171

Basically, that means I get about 0.1 kcal for each heartbeat during exercise. The free term (the intercept) should ideally be 0 (just as a sanity check: zero heartbeats shouldn't burn or gain any calories), and about -9 is close enough, given that a typical workout is in the hundreds of kcal.
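
As a quick sanity check, we can plug the two sample workouts from 2019-04-21 back into the fitted line. This is a sketch that uses the rounded coefficients as printed above, so the last digits would differ slightly from what `reg.predict` gives; `estimate_kcal` is just an illustrative helper.

```python
# coefficients as printed above (rounded to three decimals)
coef = 0.093
free = -9.171

def estimate_kcal(heartbeats: float) -> float:
    """Linear model: kcal is approximately coef * heartbeats + free term."""
    return coef * heartbeats + free

# (sport, heartbeats, kcal reported by Endomondo) from the 2019-04-21 sample rows
samples = [
    ('Rope jumping', 3873.5, 310.0),
    ('Running', 2860.666667, 248.0),
]
for sport, beats, reported in samples:
    print(f'{sport}: model {estimate_kcal(beats):.0f} kcal, Endomondo {reported:.0f} kcal')
```

The model lands at roughly 351 vs 310 kcal for rope jumping and 257 vs 248 kcal for running, so individual workouts can still sit a few tens of kcal off the line.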

Also, fun calculation:

In [8]:
normal_bpm = 60
minutes_in_day = 24 * 60

print(f'{coef * normal_bpm * minutes_in_day:.3f}')

8K kcal per day? A bit too much for an average person. I wouldn't draw any conclusions from that one though :)