Map of my personal data infrastructure

Well, it's been a year since I started the draft, so I guess it's about time to publish this! :)

This is a map of my personal data liberation infrastructure, with links to the scripts and tools used, and to my blog posts elaborating on different parts of it.

My goal for data liberation is approximating the 'personal data mirror' concept, often despite crappy interoperability (or lack thereof) of different platforms.

I prepared this diagram for several reasons:

  • to give more context for my blog posts about data liberation and tools around it
  • to highlight the complexity and the hoops we have to jump through because of the lack of interoperability
  • it was also sort of fun :)

This time I won't write too much text and just let you explore it. Tips for exploring the diagram:

  • perhaps open the full size SVG in a new tab
  • make sure to read the legend
  • links you can follow are marked with blue (and sometimes other colours)
  • there is a bubble (💬) near some nodes/edges, you can hover it to see the comment
  • some integrations are in progress: marked with WIP, construction signs (🚧🚧) and dashed edges
  • arrows roughly represent the direction of data flow
  • arrow colors roughly correspond to the data source (so it's easier to track how it flows)
  • there are some rendering issues
    • it's probably not very mobile friendly (it's barely desktop friendly!)
    • SVG support varies among web browsers, so there might be some minor artifacts (Chromium works better, but Firefox works well enough)
    • navbar centering isn't broken on this page – it's just a temporary hack to fit in the diagram till I figure out wide pages properly

[Diagram: map of my personal data infrastructure (SVG). Arrows show the direction of data flow through the following layers.]

  • Legend: device, cloud service, automatic script, manual step, entry from my blog (clickable), user facing interface, disk storage, dead service/product
  • Meta (why I'm doing all this?): "The sad state of personal data and infrastructure", "How to cope with a human brain", "What data I collect and why?"
  • Devices: GPS, Wahoo Tickr X (HR monitor), Jawbone sleep tracker, Bluemaestro (environment sensor), Garmin watch, Emfit QS sleep tracker, Kobo reader, Remarkable 2 tablet, scales
  • Android phone: Garmin, Runnerup, Gpslogger, Materialistic (Hackernews), Bluemaestro, Jawbone and Endomondo apps, some of which keep their data on the phone's filesystem (sqlite, tcx workouts, gpx tracks)
  • Cloud services: Google (browser history, location, Takeout), Telegram, FB Messenger (private API, fragile), VK (messages API locked down), Twitter (API, scraping, archive), Discord (can't retrieve DMs with the API), Pinboard, Github (only 300 latest events via API), Pocket, Reddit (only 1000 latest items via API; GDPR export only on email request; pushshift), Instapaper, Emfit API, Garmin Connect (scraping), Jawbone (dead, discontinued in 2017), Endomondo (dead, discontinued in December 2020)
  • Export layer: telegram_backup, fbmessengerexport, vkexport, twint, pinbexport, ghexport, pockexport, rexport, pushshift_export, instapexport, kobuddy, emfitexport, jbexport, GarminDB, endoexport, a sync script for Remarkable, plus manual steps (periodic Takeout, archive requests and downloads, manual input for weight, blood tests, sleep and exercise)
  • Filesystem: exported data stored as sqlite, json, zip/json, html, orgmode, tcx, gpx, fit and a custom format (Remarkable)
  • Human Programming Interface (github/HPI): per-source inputs (many via a DAL) feeding modules such as gpslogger, location & timezones, messenger, vk, twitter, discord, pinboard, github, pocket, reddit, instapaper, hackernews, kobo, bluemaestro, body.weight, body.blood, body.sleep, body.exercise
  • Consumers of HPI: Promnesia (browser extension, Archivebox), Orger (read only data mirrors, todo lists/interactive queues; Emacs (Doom), Logseq), Jupyter/IPython, Sqlite (via cachew), Metabase, Datasette, Influxdb, Grafana, and several 🚧WIP🚧 ones: Memacs, HTTP API, Solid project, spreadsheet-like interface, other programming languages (FFI, Apache Arrow), Memri, Timeline/Memex, Dashboard
  • Linked blog posts include: "Building data liberation infrastructure", "In search of a friendlier scheduler", "Against unnecessary databases", "My journey in fixing browser history", "Orger: plaintext reflection of your digital self", "Managing inbound digital content", "Building personal search engine", "Making sense of Endomondo's calorie estimation", "Extending my personal infrastructure", "Configs suck", "Using mypy for error handling"
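
To make the layering in the diagram a bit more concrete, here is a minimal sketch of how one kind of data might flow through it: raw exports on disk, a small data access layer (DAL) per source, and a unified module on top. Everything in it is hypothetical: the `Workout` type, the field names and the sample payloads are made up for illustration, and both sources are shown as JSON only to keep it short (in the diagram RunnerUp data is stored as tcx workouts). It is not the actual HPI or endoexport code.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterator, List


@dataclass(frozen=True)
class Workout:
    # Common shape that consumers work with (hypothetical).
    source: str
    start: datetime
    sport: str
    distance_km: float


# Stand-ins for raw export files on disk; the real formats differ.
ENDOMONDO_EXPORT = json.dumps([
    {"start_ts": 1577872800, "sport": "running", "distance_km": 5.2},
])
RUNNERUP_EXPORT = json.dumps([
    {"started": "2021-01-03T09:00:00+00:00", "activity": "running", "meters": 7300},
])


def endomondo_workouts(raw: str) -> Iterator[Workout]:
    # DAL for the dead service: the old export stays readable.
    for item in json.loads(raw):
        yield Workout(
            source="endomondo",
            start=datetime.fromtimestamp(item["start_ts"], tz=timezone.utc),
            sport=item["sport"],
            distance_km=item["distance_km"],
        )


def runnerup_workouts(raw: str) -> Iterator[Workout]:
    # DAL for the replacement app, normalising units to the same type.
    for item in json.loads(raw):
        yield Workout(
            source="runnerup",
            start=datetime.fromisoformat(item["started"]),
            sport=item["activity"],
            distance_km=item["meters"] / 1000,
        )


def all_workouts() -> List[Workout]:
    # Unified module: consumers don't care which app produced a workout.
    merged = [
        *endomondo_workouts(ENDOMONDO_EXPORT),
        *runnerup_workouts(RUNNERUP_EXPORT),
    ]
    return sorted(merged, key=lambda w: w.start)


if __name__ == "__main__":
    for w in all_workouts():
        print(w.start.date(), w.source, w.sport, w.distance_km)
```

The point of the split is that a dashboard or notebook would only ever call something like `all_workouts()`, so a service shutting down means adding one more small parser rather than rewriting the consumers.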

Some notes regarding the diagram:

  • it's plotted via graphviz, and you can find the source here (although the code is quite domain specific)
  • even though there is a lot of stuff on the diagram, it's still incomplete!
    • here is an (also incomplete) list of data I collect/export
    • HPI modules are a good proxy for the data I'm using
  • note that despite some platforms dying (e.g. Jawbone/Endomondo), I can still use data produced with them!

    E.g. after Endomondo was discontinued, I was able to quickly switch to the open source RunnerUp app, while preserving complete data compatibility.

  • note how many services are outright malicious with their anti-API/anti-scraping/anti-interoperability measures (yellow/red highlight for API nodes)
  • probably more platforms have GDPR exports, I just haven't tried yet
  • indirection is crazy

    Note how some data, before I can get it onto my computer, has to travel like this:

    • device –> phone (over bluetooth)
    • phone –> cloud (over internet)
    • cloud –> computer (over internet)
  • for many phone apps the only way I can sync the data is by rooting my phone in order to access the /data/data directory

    This is getting worse and worse with every Android version. I understand the security concerns, but this is ridiculous.

  • some modules/packages (marked with an sb superscript) were developed by Sean Breckenridge

    He's forked my HPI package and is working on it in parallel. For now, we've decided to hack on it independently, in the hope that we'll eventually figure out a good model for cooperating and maintaining the modules.

    Also, he's done some cool work on automatic HTTP API for HPI!
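
Since the diagram is plotted via graphviz, here is a small sketch of how a few of these pipelines could be emitted as DOT text from Python. It is not the actual generator (that one is linked above and is more domain specific); the `PIPELINES` table and the `to_dot` helper are hypothetical, although the service, tool and format names are borrowed from the diagram. It follows the same conventions: arrows point in the direction of data flow, and in-progress integrations get dashed edges.

```python
from typing import List, Tuple

# (service, export script, on-disk format, is the HPI integration WIP?)
PIPELINES: List[Tuple[str, str, str, bool]] = [
    ("reddit", "rexport", "json", False),
    ("kobo", "kobuddy", "sqlite", False),
    ("remarkable", "script", "custom format", True),
]


def to_dot(pipelines: List[Tuple[str, str, str, bool]]) -> str:
    # Build the graph as plain DOT text, one line per node or edge.
    lines = ["digraph G {", "  rankdir=LR;", '  hpi [label="HPI"];']
    for service, exporter, fmt, wip in pipelines:
        exp_id = f"exp_{service}"
        fs_id = f"fs_{service}"
        # Dashed edge marks an integration that is still in progress.
        style = ' [style=dashed, label="WIP"]' if wip else ""
        lines.append(f'  {service} [label="{service}"];')
        lines.append(f'  {exp_id} [label="{exporter}", shape=box];')
        lines.append(f'  {fs_id} [label="{fmt}", shape=cylinder];')
        lines.append(f"  {service} -> {exp_id};")
        lines.append(f"  {exp_id} -> {fs_id};")
        lines.append(f"  {fs_id} -> hpi{style};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(PIPELINES))
```

Saving the output to a file and running it through `dot -Tsvg` should give a tiny version of the map; keeping the data in a table like this is what makes it cheap to add another service later.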


TODO[C][2021-02-07 19:53] hmm some 'HTML label' boxes seem to have extra padding?

although only in svg mode? png renders fine.

STRT[C][2020-02-03 01:57] fix css so it's occupying full screen width

  • [2020-02-07 19:49] a bit adhoc, but works for now

STRT[C][2020-02-03 01:57] legend

DONE[B][2020-02-07 19:51] labels don't fit into the boxes??

  • [2020-02-14 21:25] apparently only on desktop Firefox =/
  • [2021-02-07 19:46] looks fine now?

STRT[C][2020-02-14 21:30] Chrome doesn't support svg side attribute, so some labels appear upside down :(

fixing with JS for now…

Let me know what you think, and as always, I'm happy to answer your questions!