What data I collect on myself and why
This is the list of personal data sources I use or plan to use, with rough guides on how to get your hands on that data if you want it as well.
It's still incomplete and I'm going to update it regularly.
My goal is to automate data collection to the maximum extent possible and make it work in the background, so one can set up pipelines once and hopefully never think about them again.
This is kind of a follow-up on my previous post on the sad state of personal data, and part of my personal way of getting around this sad state.
If you're terrified by the long list, you can jump straight to the "Data consumers" section to find out how I use it.
Table of Contents
- 1. Why do you collect X? How do you use your data?
- 2. What do I collect/want to collect?
- Arbtt (desktop time tracker)
- Bitbucket (repositories)
- Bluemaestro (environment sensor)
- Browser history (Firefox/Chrome)
- Emfit QS (sleep tracker)
- Facebook Messenger
- Github (repositories)
- Github (events)
- Google takeout
- HSBC bank
- Kobo reader
- Monzo bank
- PDF annotations
- Plaintext notes
- Remember the Milk
- Shell history
- 3. Data consumers
¶1 Why do you collect X? How do you use your data?
All things considered, I think it's a fair question! Why bother with all this infrastructure and hoard the data if you never use it?
In the next section, I will elaborate on each specific data source, but to start with I'll list the rationales that all of them share:
- It may feel unnecessary, but shit happens: what if your device dies, your account gets suspended for some reason, or the company goes bust?
- Most digital data comes with timestamps, so it automatically, without manual effort, contributes to your timeline.
- I want to remember more: to be able to review my past, bring back memories and reflect on them. Practicing lifelogging helps with that.
- It feels very wrong that things can be forgotten and lost forever. It's understandable from a neuroscience point of view, i.e. the brain has limited capacity and it would be too distracting to remember everything all the time. That said, I want the choice of whether to forget or remember events, and I'd like to be able to potentially access forgotten ones.
¶2 What do I collect/want to collect?
As I mentioned, most of the collected data serves as a means of backup/lifelogging/quantified self, so I won't repeat these reasons in the 'Why' sections.
All my data collection pipelines are automatic unless mentioned otherwise.
Some scripts are still private so if you want to know more, let me know so I can prioritize sharing them.
¶Arbtt (desktop time tracker)
- haven't used it yet, but it could be a rich source of lifelogging context
¶Bluemaestro (environment sensor)
How: the sensor syncs with its phone app via Bluetooth, and /data/data/com.bluemaestro.tempo_utility/databases/ is regularly copied to grab the data.
- temperature during sleep data for the dashboard
- lifelogging: capturing weather conditions
E.g. I can potentially see temperature/humidity readings along with my photos from hiking or skiing.
¶Thriva (blood tests)
How: via Thriva; data imported manually into an org-mode table (I don't do it frequently, so it wasn't worth automating the scraping).
I also tracked glucose and ketones (with a FreeStyle Libre) for a few days out of curiosity; I didn't bother automating that either.
- contributes to the dashboard, could be a good way of establishing your baselines
¶Browser history (Firefox/Chrome)
¶Facebook Messenger
How: manual archive export.
I barely use Facebook, so I don't even bother doing it regularly.
¶Github (repositories)
How: via API.
- it can export starred repositories as well, so if the authors delete them I will still have them
¶Google takeout
- the only manual step: enable scheduled exports (you can schedule 6 per year at a time) and choose to keep them on Google Drive in the export settings
- mount your Google Drive (e.g. via google-drive-ocamlfuse)
- keep a script that checks mounted Google Drive for fresh takeout and moves it somewhere safe
- Google collects lots of data, which you could put to good use. However, old data gets wiped, so it's important to export Takeout regularly.
- better browsing history
- (potentially) search history for promnesia
- search in youtube watch history
- location data for lifelogging and the dashboard (activity)
¶Hackernews
How: I haven't got to it yet. It's going to require:
- extracting upvoted/saved items via web scraping, since Hackernews doesn't offer an API for that. Hopefully there is an existing library for it.
I'm also using the Materialistic app, which has its own 'saved' posts and doesn't synchronize them with Hackernews.
Exporting those will require copying the database directly from the app's private storage.
Why: same reasons as Reddit.
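For the scraping part: the upvoted page is only visible while logged in, so one approach is to save the page (or fetch it with your session cookie) and parse the HTML. A minimal stdlib sketch; the span class 'titleline' selector is an assumption based on HN's current markup and may need adjusting:

```python
from html.parser import HTMLParser

class UpvotedParser(HTMLParser):
    """Pull (title, url) pairs out of a saved copy of news.ycombinator.com/upvoted."""
    def __init__(self):
        super().__init__()
        self.in_title = False  # inside a <span class="titleline">?
        self._url = None
        self.links = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == 'span' and a.get('class') == 'titleline':
            self.in_title = True
        elif tag == 'a' and self.in_title:
            self._url = a.get('href')

    def handle_endtag(self, tag):
        if tag == 'span':
            self.in_title = False

    def handle_data(self, data):
        if self.in_title and self._url:
            self.links.append((data, self._url))
            self._url = None

parser = UpvotedParser()
parser.feed('<span class="titleline">'
            '<a href="https://example.com/post">Example post</a></span>')
```

A dedicated library would be more robust; this just shows that no heavy scraping machinery is strictly needed.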
¶Jawbone (sleep tracker)
How: via API. Jawbone is dead now, so if you haven't exported your data already, it's likely lost forever.
- sleep data for the dashboard
¶Nomie
How: regular copies of /data/data/io.nomie.pro/files/_pouch_events and /data/data/io.nomie.pro/files/_pouch_trackers
- could be a great tool for detailed lifelogging if you're into it
¶Food logs
I tracked almost all nutrition data for everything I ingested over the course of a year.
How: I found most existing apps/projects clumsy and unsatisfying, so I developed my own system. It's not even a proper app, but something simpler: basically a domain-specific language in Python.
The tracking process was simply editing a Python file and adding entries like:
# file: food_2017.py
july_09 = F(
    [  # lunch
        spinach * bag,
        tuna_spring_water * can,     # can size for this tuna is 120g
        beans_broad_wt * can * 0.5,  # half can. can size for broad beans is 200g
        onion_red_tsc * gr(115),     # grams, explicit
        cheese_salad_tsc * 100,      # grams, implicit as it makes sense for cheese
        lime,                        # 1 fruit, implicit
    ],
    [  # dinner...
    ],
    tea_black * 10,       # cups, implicit
    wine_red * ml * 150,  # ml, explicit
)
july_10 = ...  # more logs
Comments are added for clarity, of course; normally it would be more compact.
Then some code was used for processing, calculating, visualizing, etc.
Having a real programming language instead of an app let me make it very flexible and expressive, e.g.:
- I could define composite dishes as Python objects and then easily reuse them. E.g. if I made four servings of soup on 10.08.2018, ate one immediately and froze the other three, I would define something like soup_20180810 = [...] and then simply reuse soup_20180810 when I eat it again. (The date is easy to find, since I label food when I put it in the freezer anyway.)
- I could make many things implicit, making it pretty expressive without spending time on unnecessary typing
- I rarely had to enter nutrient composition manually; I just pasted the product link from the supermarket website and had a script parse the nutrient information automatically
- For micronutrients (that usually aren't listed on labels) I used the USDA sqlite database
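For illustration, here's a minimal sketch of how such a DSL could be wired up in Python. All names (Food, Entry, gr, F) and the calorie logic are hypothetical, not the actual system:

```python
from dataclasses import dataclass

@dataclass
class Grams:
    grams: float

def gr(g: float) -> Grams:
    return Grams(g)

@dataclass
class Food:
    name: str
    kcal_per_100g: float
    default_grams: float = 100.0  # implicit serving size

    @property
    def kcal(self) -> float:
        # a bare food entry means one default serving
        return self.kcal_per_100g * self.default_grams / 100

    def __mul__(self, factor):
        # food * 2 -> two default servings; food * gr(115) -> explicit grams
        grams = factor.grams if isinstance(factor, Grams) else self.default_grams * factor
        return Entry(self, grams)

@dataclass
class Entry:
    food: Food
    grams: float

    @property
    def kcal(self) -> float:
        return self.food.kcal_per_100g * self.grams / 100

def F(*meals):
    """Flatten meals (lists of entries) and standalone entries into one day's log."""
    day = []
    for m in meals:
        day.extend(m if isinstance(m, list) else [m])
    return day

spinach = Food('spinach', kcal_per_100g=23, default_grams=150)
lime = Food('lime', kcal_per_100g=30, default_grams=67)

day = F([spinach * 2, lime], spinach * gr(100))
total_kcal = sum(e.kcal for e in day)
```

Because entries are plain Python objects, "composite dishes" are just lists you can name, reuse and sum over, which is exactly the flexibility described above.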
The hard part was actually not the data entry, but the missing nutrition information when eating out. That year I was mostly cooking my own food, so tracking was fairly easy.
Also, I was more interested in lower bounds (e.g. "do I consume at least the recommended amounts of micronutrients?"), so not having logged food now and then was fine for me.
I mostly wanted to learn about food composition and how it relates to my diet, and I did.
That logging motivated me to learn about different foods and try them out while keeping dishes balanced. I cooked so many different things, made my diet way more varied and became less picky.
I stopped because cooking did take some time, and I realized that as long as I actually vary my food and try to eat everything now and then, I hit all the recommended amounts of micronutrients. It's kind of an obvious thing that everyone recommends, but hearing it as common wisdom is one thing; coming to the same conclusion from your own data is completely different.
- nutritional information contributes to dashboard
¶Photos
How: no extra effort required if you sync/organize your photos and videos now and then.
- an obvious source of lifelogging; in addition, it comes with GPS data
¶Remember the Milk
How: ical export from the API.
I stopped using RTM in favor of org-mode, but I can still easily find my old tasks and notes, which allowed for a smooth transition.
¶Shell history
How: many shells can keep timestamps along with your commands in history.
- potentially can be useful for detailed lifelogging
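For example, zsh's EXTENDED_HISTORY option stores each command as `: <epoch>:<duration>;<command>` in ~/.zsh_history (bash has HISTTIMEFORMAT for a similar effect). Parsing that back into timestamped entries is basically one regex:

```python
import re
from datetime import datetime, timezone

# matches zsh extended-history lines like ": 1577836800:0;ls -la"
ZSH_LINE = re.compile(r'^: (\d+):(\d+);(.*)$')

def parse_zsh_history(lines):
    """Yield (timestamp, command) pairs from extended zsh history lines,
    silently skipping lines without timestamps."""
    for line in lines:
        m = ZSH_LINE.match(line)
        if m:
            epoch, _duration, cmd = m.groups()
            yield datetime.fromtimestamp(int(epoch), tz=timezone.utc), cmd

entries = list(parse_zsh_history([': 1577836800:0;ls -la', 'plain line']))
```

(Multi-line commands need slightly more care; this sketch only handles the common single-line case.)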
¶Sleep logs
Apart from the automatic collection of HR data etc., I log some extra stats:
- whether I woke up on my own or after alarm
- whether I still feel sleepy shortly after waking up
- whether I had dreams (and I log dreams if I did)
- I log every time I feel sleepy throughout the day
How: org-mode, via org-capture into table. Alternatively, you could use a spreadsheet for that as well.
- I think it's important to find connections between subjective feelings and objective stats like amount of exercise, sleep HR, etc., so I'm trying to find correlations using my dashboard
- dreams are quite fun part of lifelogging
¶Spotify
How: export script, using plamere/spotipy.
- potentially can be useful for better search in music listening history
- can be used for custom recommendation algorithms
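spotipy's endpoints for saved tracks are paged, so an export script is mostly a pagination loop. Here's a sketch with the client call abstracted out; the page shape with 'items'/'total' follows the Spotify Web API, but verify against spotipy's docs, and the lambda wiring is an assumption:

```python
def paginate(fetch, page_size=50):
    """Collect all items from a Spotify-style paged endpoint.
    `fetch(limit, offset)` should return {'items': [...], 'total': N}, e.g.
    lambda limit, offset: sp.current_user_saved_tracks(limit=limit, offset=offset)."""
    items, offset = [], 0
    while True:
        page = fetch(page_size, offset)
        items.extend(page['items'])
        offset += len(page['items'])
        if offset >= page['total'] or not page['items']:
            break
    return items

# fake endpoint standing in for the authenticated spotipy client
DATA = list(range(7))
fake = lambda limit, offset: {'items': DATA[offset:offset + limit], 'total': len(DATA)}
tracks = paginate(fake, page_size=3)
```

With a real client you'd dump each item's track name, artist and added_at timestamp to JSON for the archive.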
¶Taglog
(not using it anymore, in favor of org-mode)
How: regular copying of /data/data/com.waterbear.taglog/databases/Buttons Database
- a quick way of single tap logging (e.g. weight/sleep/exercise etc), contributes to the dashboard
¶23andme (genome)
How: manual raw data export from the 23andme website. I hope your genome doesn't change often enough to bother with automatic exports!
I was planning to set up some sort of automatic search of new genome insights against open source analysis tools.
I haven't really had time to think about it yet, and it feels like a hard project outside my realm of competence.
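If you do want to feed the raw export into analysis tools, parsing it is straightforward. The format sketched here ('#' comment lines, then tab-separated rsid, chromosome, position, genotype) is inferred from the files 23andme serves; double-check against your own export:

```python
def parse_raw_genome(lines):
    """Yield (rsid, chromosome, position, genotype) tuples from a
    23andme-style raw data export, skipping comments and blank lines."""
    for line in lines:
        if line.startswith('#') or not line.strip():
            continue
        rsid, chrom, pos, genotype = line.rstrip('\n').split('\t')
        yield rsid, chrom, int(pos), genotype

snps = list(parse_raw_genome([
    '# This data file generated by 23andMe',
    'rs4477212\t1\t82154\tAA',
]))
```

Once the SNPs are in Python tuples, cross-referencing them against open databases is just dictionary lookups by rsid.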
¶3 Data consumers
Typical search interfaces make me unhappy: they are siloed, slow, awkward to use, and don't work offline. So I built my own ways around that! I write about it in detail here.
In essence, I'm mirroring most of my online data like chat logs, comments, etc., as plaintext. I can overview it in any text editor, and incrementally search over all of it in a single keypress.
orger is a tool that helps you generate an org-mode representation of your data.
It lets you benefit from the existing tooling and infrastructure around org-mode, the most famous being Emacs.
I'm using it for:
- searching, overviewing and navigating the data
- creating tasks straight from the apps (e.g. Reddit/Telegram)
- spaced repetition via org-drill
Orger comes with some existing modules, but it should be easy to adapt your own data source if you need something else.
promnesia is a browser extension I'm working on to escape silos by unifying annotations and browsing history from different data sources.
I've been using it for more than a year now and am working on the final touches to properly release it for other people.
Timeline is a #lifelogging project I'm working on.
I want to see all my digital history, search in it, filter, easily jump at a specific point in time and see the context when it happened. That way it works as a sort of external memory.
Ideally, it would look similar to Andrew Louis's Memex, or might even reuse his interface if he open sources it. I highly recommend watching his talk for inspiration.