How to annotate literally everything

Comprehensive overview of existing tools, thoughts on the future, and interacting with your data

TLDR: when I read, I try to read actively, which for me mainly involves using various tools to annotate content: highlighting and leaving notes as I read. I've programmed data providers that parse these annotations and provide a nice interface for interacting with the data from other tools. My automated scripts use them to render the annotations as human-readable, searchable plaintext and to generate TODOs and spaced repetition items.

In this post I'm going to elaborate on all of that: give some motivation, review these tools (mainly focusing on open source, and thus extendable, software) and share my vision of how they could work in an ideal world. I won't try to convince you that my way of reading and interacting with information is superior for you: it doesn't have to be, and there are people out there more eloquent than me who do that. I assume you want this too and are wondering about the practical details.

1 Motivation

At some point in my life I realized I didn't remember most of the books/papers/posts/videos I had consumed a few years before.

That bothered me more and more until I bought a Kindle, which had 'highlight' functionality and a virtual keyboard, and discovered that it helped a lot with recall.

I've become increasingly obsessed with this, and these days the ability to highlight as I read serves multiple purposes for me:

  • the very act of spending conscious effort on highlighting and commenting helps me remember better.
  • it's easier to recall content I've already read: I just skim through the highlights and refresh my memory

    In particular, I often run into something on the internet that I remember reading before. If I have annotations for it, I can quickly go through them and restore the context.

  • it's easier to recommend content to other people because you can refer to specific moments or points you liked/disliked
  • it's got social value if highlights are visible to other people (e.g. Hypothesis, Medium, Goodreads)
  • it helps with book scoring. If I don't have any highlights, it probably means that the content was not interesting at all for me. Fiction books are not an exception: I tend to highlight use of language I liked, inspirational things, etc.
  • it serves as an activity log if you are into .
  • you can populate your TODO list and step up your spaced repetition game.

I'm going to review some of the tools I've tried and still use, and highlight their positive and negative aspects. If you're getting impatient, you can skip straight to my comparison table.

2 Annotating web

Pocket

I won't write much about it, for one reason which is a big deal: while you can highlight text, you can't leave notes. The nearest functionality is 'recommending' a highlight along with a comment, but that's only displayed on your 'timeline'.

The Pocket API doesn't support exporting highlights either (or, to be precise, that functionality seems to be hidden). If you need it, you can use my script where I hacked around it.

Interestingly enough, Kobo readers have Pocket integration, but for some reason when you read Pocket articles on a Kobo, you can't highlight at all (let alone sync highlights with Pocket). I'm not sure what the purpose of this integration is.

Pocket was acquired by Mozilla in 2017, which might be a good thing, but so far their main focus seems to be readability features.

You can also read a rant raising similar issues to what I mentioned.

Instapaper

I won't go into Instapaper's readability capabilities (e.g. fonts and article formatting) because it's not something I care much about, so you might be better off googling that yourself; here I'll concentrate on the annotation aspect. There are a couple of recent, extensive comparisons of Instapaper and Pocket that feature screenshots and cover other aspects of Instapaper.

So, to read something in Instapaper, you first have to import the article into it (which unclutters and optimizes it for reading). Because of this import process, you can only read and highlight in Instapaper's app, and you can only see your highlights there as well, which is its main limitation for me.

The only reason I'm using it at all is that its Android app has offline capabilities: I send things I want to read on the tube to Instapaper, then read and comment on them while I don't have a connection.

Mind that the free version of Instapaper has a limit of 5 notes per month. Personally, I'm happy to pay $3 per month for the premium version of such a decent product, though admittedly that's in the absence of good alternatives.

Instapaper has a JSON API, through which you can access your saved articles, comments and highlights. I'm using a fork of a Python wrapper to access it. Highlights are only stored as text though (as opposed to CSS/XPath locators), so there is no easy way to match them against the original text apart from some sort of fuzzy search.
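Here is a rough sketch of such a fuzzy search. It reflects my assumption about one way to do it, not something Instapaper or the wrapper provides. Python's standard difflib can score every window of the article against the highlight text and keep the best one:

```python
import difflib
from typing import Optional, Tuple


def locate_highlight(article: str, highlight: str,
                     min_ratio: float = 0.8) -> Optional[Tuple[int, int, float]]:
    """Find the span of `article` that best matches `highlight`.

    Returns (start, end, similarity), or None if nothing is similar enough.
    """
    exact = article.find(highlight)
    if exact != -1:
        return exact, exact + len(highlight), 1.0

    width = len(highlight)
    matcher = difflib.SequenceMatcher(autojunk=False)
    matcher.set_seq2(highlight)  # SequenceMatcher caches info about seq2, so set it once
    best = None
    # slide a window as wide as the highlight over the article, one character at a time
    for start in range(max(1, len(article) - width + 1)):
        matcher.set_seq1(article[start:start + width])
        ratio = matcher.ratio()
        if best is None or ratio > best[2]:
            best = (start, start + width, ratio)
    if best is not None and best[2] >= min_ratio:
        return best
    return None
```

This is brute force, so it's fine for a single article but would be slow across a whole library. The 0.8 threshold is an arbitrary choice.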

The search function works for full-text search in saved articles, but it doesn't let you restrict the search to highlights, and you can't search in notes at all.

One red flag was in 2018, when Instapaper wasn't available in Europe for a few months until they resolved GDPR issues. While I don't blame Instapaper for it, this is the kind of thing that happens when you don't own your data and use a closed source product.

Wallabag

Wallabag is the most mature open source/selfhosted read-it-later kind of project I know of. Here's a review featuring some screenshots of their web app and Android app.

It's very similar to Instapaper in that you have to import the article into Wallabag in order to annotate it. I used it for a while and only had issues importing articles heavy on MathJax-rendered LaTeX.

If you don't want to selfhost it, you can use wallabag.it hosting for as little as 9 euros per year, with a two-week trial.

There is also an Android app, but sadly it lacks support for highlighting.

I wish it got more attention from the community, and I might try to work on Android annotation when I have more time.

Hypothes.is

Hypothesis is simply awesome and my favorite web annotation tool. Its killer feature is that it embeds a bit of JS in the page to provide an in-browser overlay, so you don't have to leave the page you're reading and can highlight and add comments natively. They use something cool called fuzzy anchoring to achieve this. It also makes annotations resilient to changes in the document markup, and if Hypothesis can't locate an annotation, it is still kept and shown as 'orphaned', so you never lose your notes.
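Hypothesis's real anchoring logic is more elaborate: it also falls back to approximate matching. The following is a simplified, hypothetical sketch of just the core idea. A highlight is stored as the quoted text plus a bit of context on either side, much like the W3C TextQuoteSelector. It is re-anchored by picking the occurrence whose surroundings agree most, and if the quote can't be found at all, the annotation becomes orphaned:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextQuote:
    exact: str        # the highlighted text itself
    prefix: str = ""  # a few characters right before it
    suffix: str = ""  # a few characters right after it


def _common_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def anchor(document: str, quote: TextQuote) -> Optional[int]:
    """Offset of the occurrence of quote.exact whose surroundings best match
    prefix/suffix, or None if the quote is gone (i.e. the annotation is orphaned)."""
    best_start, best_score = None, -1
    start = document.find(quote.exact)
    while start != -1:
        end = start + len(quote.exact)
        before = document[max(0, start - len(quote.prefix)):start]
        after = document[end:end + len(quote.suffix)]
        # count agreeing context characters, reading outwards from the quote
        score = (_common_prefix_len(before[::-1], quote.prefix[::-1])
                 + _common_prefix_len(after, quote.suffix))
        if score > best_score:
            best_start, best_score = start, score
        start = document.find(quote.exact, start + 1)
    return best_start
```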

Another cool feature is that you can choose to make your annotations public and see other people's annotations, or create a private group if you only want to share them with specific people.

To get a sense of it, you can skim through the tutorial, which has plenty of screenshots. I also strongly recommend checking it out in action here: Annotation Is Now a Web Standard, or trying it on the very page you're reading now.

You don't have to install anything or register, it's just a widget embedded in the page, but do make sure to allow JS. You should see yellow highlights and the sidebar on the right.

It's open source, it can be selfhosted, and they provide their own service for free (but please consider donating to them!).

Since Hypothesis is powered by JavaScript, it actually works well in modern Android browsers via a bookmarklet. It's not very obvious from the browser UI how to actually use bookmarklets, though:

  • in mobile Firefox, once you've added a bookmarklet, you invoke it by tapping on the address bar and clicking the bookmarklet.
  • in mobile Chrome, it's a bit more tedious, but also possible.

One downside of this service is that you can't annotate while offline. I feel it's more of a general problem with mobile browsers than with Hypothesis, though. In principle, you could annotate offline by postponing API calls and keeping the data in localStorage, but that doesn't matter if you can't load the page in the first place. Perhaps browsers could give this better support.

Hypothesis has a JSON API which gives access to your own and other people's public annotations. I'm using the judell/Hypothesis Python wrapper to access and back up this data.
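For reference, here is a minimal standard-library sketch of backing up your own annotations without a wrapper. It assumes the public search endpoint (https://api.hypothes.is/api/search) with its user, limit and offset parameters, and a developer API token sent as a Bearer header. The function names are made up for illustration:

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.hypothes.is/api/search"


def search_url(username: str, limit: int = 200, offset: int = 0) -> str:
    """URL for one page of a user's annotations on the public hypothes.is service."""
    params = {"user": f"acct:{username}@hypothes.is", "limit": limit, "offset": offset}
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def fetch_all(username: str, token: str) -> list:
    """Download all annotations visible to `token`, page by page (needs network access)."""
    rows, offset = [], 0
    while True:
        request = urllib.request.Request(
            search_url(username, offset=offset),
            headers={"Authorization": f"Bearer {token}"},  # your developer API token
        )
        with urllib.request.urlopen(request) as response:
            page = json.load(response)["rows"]
        if not page:
            return rows
        rows.extend(page)
        offset += len(page)
```

For very large accounts, check the API documentation on paging limits. As far as I know, offset is capped, and there is a search_after parameter for going further.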

Grasp

Shameless plug! If you just want to send annotations directly into a plaintext (e.g. org-mode) file and don't really care about displaying them within the original web page you can use my grasp browser addon for that.

I typically use it for highlights that would be good candidates for TODO items, e.g. something actionable like a piece of advice or further reading.

It's not available for mobile yet, but on Android the native select-and-share functionality (e.g. sharing into Orgzly) perhaps makes more sense anyway.

Summary

Hypothes.is is the clear winner for me on desktop, and I'm using Instapaper for offline reading on Android.

3 Annotating PDFs

Small disclaimer: I don't own a Mac, so I have no idea what's going on in their world. Sorry! (UPD: I got a few recommendations from a follower; perhaps they will be helpful.)

The PDF format is a complicated beast, and its native annotations are a whole different story from annotating the web.

First, its ISO standard is not freely available. The Adobe website has some sort of reference, which is not the same as the standard but is apparently close enough.

There are quite a few different kinds of PDF annotations: you can see them in section 12.5.6, Annotation Types, or in the Poppler source code. In addition to the Highlight and Text types, there is support for styling, underlines, strikethroughs, and even (heaven forbid) sounds, movies and 3D.

Using native PDF annotations has one major drawback: you'll have to save the metadata back to the PDF file at some point. At worst, that could be impossible due to DRM, but in any case you need to somehow remember that some of your documents might have private notes inside. I get around this by first making a copy of the file I'm about to annotate and giving it an [annotated] prefix, so I don't confuse it with the original.
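The copying step is trivial to script. Here is a minimal sketch; the exact '[annotated] ' naming, with a space after the prefix, is my assumption:

```python
import shutil
from pathlib import Path


def make_annotated_copy(original: Path) -> Path:
    """Copy `original` to '[annotated] <name>' in the same directory and return the copy."""
    copy = original.with_name(f"[annotated] {original.name}")
    if copy.exists():
        # never clobber a copy that may already contain private notes
        raise FileExistsError(copy)
    shutil.copy2(original, copy)  # copy2 preserves modification times as well
    return copy
```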

Okular, Evince, Atril

These are probably the most widely used PDF readers. All of them use the Poppler library for working with PDFs, which in particular does the messy business of annotation handling.

All of them would let you view existing annotations, but there are some nuances and limitations:

  • Atril (as of 1.20.3) only allows you to add or edit popup notes; other types of annotations aren't even displayed in the sidebar
  • Evince (as of 3.32.0) only allows you to add or edit highlights or popup notes (no inline ones!). Here is an article with some screenshots (not much has changed since 2016).

    However, it has a nasty, years-old bug (1, 2) that doesn't let you save over the file you're editing. To work around it, every time you want to persist your changes you have to save to a new file and reopen the new copy. That makes it pretty unusable unless you only want to make a couple of changes.

  • Okular (as of 1.6.3) allows editing and adding pretty much every type of annotation that you would expect: highlights, popup and inline notes, freehand and more.

    The annotation process (screenshot) is pretty pleasant, and hitting Ctrl-S saves the file you're working on without any problems.

    Okular also supports something called a 'document archive', which saves the original document in a zip file along with metadata.xml. This lets you annotate non-PDF files (e.g. DJVU), which is a very neat feature. It's obviously Okular-specific, though in theory it's possible to process metadata.xml with other tools.

    Search in Okular can't be restricted to annotations only. You can use normal PDF search for inline notes and highlights (along with any other text that happens to match), but it doesn't work at all for popups.

    Even though Okular is part of KDE, there is no reason not to use it in other desktop environments: its UI is not that complicated, it looks quite native in GTK, and a few extra dependencies are barely a problem these days.
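As for processing Okular's metadata.xml with other tools: I'm not documenting its schema here, so this minimal sketch deliberately assumes no element or attribute names. It pulls metadata.xml out of the archive (a regular zip file) and helps you explore what's inside:

```python
import xml.etree.ElementTree as ET
import zipfile


def okular_metadata(archive) -> ET.Element:
    """Parse metadata.xml out of an Okular document archive (a path or a file object)."""
    with zipfile.ZipFile(archive) as zf:
        names = [n for n in zf.namelist() if n.endswith("metadata.xml")]
        if not names:
            raise KeyError("no metadata.xml in this archive")
        return ET.fromstring(zf.read(names[0]))


def elements_with(root: ET.Element, attribute: str):
    """Yield (tag, attributes) for each element that has `attribute`; handy for exploring."""
    for element in root.iter():
        if attribute in element.attrib:
            yield element.tag, dict(element.attrib)
```

Pass whatever attribute name you spot when printing a real file's elements. Anything beyond that would be guesswork on my part.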

Emacs: pdf-tools

Pdf-tools (as of 0.90) is a PDF viewer for Emacs that is meant to be more efficient than the built-in one (in terms of rendering), and it is also capable of interacting with PDF metadata.

Here's a screenshot and a short screencast; the interesting stuff starts somewhere around the 02:00 mark.

One big drawback is that to highlight and add new annotations you still have to use the mouse, which loses half of the benefits of using Emacs for me. It also has a minor issue displaying inline annotation text in the 'Content' buffer and the annotations list (you can still edit it if you click on it with your mouse).

Other Linux readers

There are a few other apps I tried, so I figured they're worth mentioning.

  • mupdf (as of 1.14.0) is both a rendering library (claimed to be faster than poppler) and a PDF viewer. It's capable of displaying all types of highlights and annotations, but there is no way to add or edit them.

    The changelog mentions annotation editing, but in something called 'mupdf-gl', which doesn't seem to be available in Ubuntu.

  • zathura (as of 0.4.3) supports both poppler and mupdf backends, but suffers from the same problem: you can't edit or add new highlights. It's pretty sad, because I like it as a viewer: it's minimalist and has vi-style keybindings.

Emacs: org-noter

Org-noter (as of 1.3.0) allows you to annotate a PDF while keeping the text annotations in a separate org file, which keeps track of PDF locations in Org note properties. Here's a short demo.

For me the main drawback is that it doesn't let you highlight, which I tend to do a lot.

Existing annotations in PDF can be imported via org-noter-create-skeleton function (it didn't work for me for some reason though, and I wasn't motivated enough to investigate).

Xournal

Xournal is different from the above PDF viewers, since it doesn't use the annotation types described in the PDF standard and instead has its own tools.

It doesn't modify the original files. Instead, it keeps a .xoj file containing the metadata and pointing at the original PDF, so in that sense it's pretty similar to Okular's document archive. Similarly, it's Xournal-specific and can't be viewed anywhere else unless you export it to PDF before sharing (at which point your annotations basically become background images).
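Since a .xoj file isn't a PDF, you need something Xournal-aware to get at the notes. Here is a minimal sketch that assumes the layout I'd expect in .xoj files: gzip-compressed XML with <page> elements, a <background> pointing at the PDF, and typed notes in <text> elements inside <layer>:

```python
import gzip
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def _load_xoj(path) -> ET.Element:
    # .xoj files are gzip-compressed XML
    with gzip.open(path, "rb") as f:
        return ET.parse(f).getroot()


def xoj_text_notes(path) -> List[Tuple[int, str]]:
    """(page number, text) for every typed text annotation, pages numbered from 1."""
    notes = []
    for number, page in enumerate(_load_xoj(path).iter("page"), start=1):
        for text in page.iter("text"):
            notes.append((number, (text.text or "").strip()))
    return notes


def xoj_source_pdf(path) -> Optional[str]:
    """Filename of the PDF the .xoj file is annotating, if it has a PDF background."""
    for background in _load_xoj(path).iter("background"):
        if background.get("type") == "pdf" and background.get("filename"):
            return background.get("filename")
    return None
```

Freehand <stroke> elements are just coordinates, so handwritten notes can't be extracted this way.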

Hypothes.is (again)

As mentioned in the previous section, Hypothesis is also capable of annotating PDFs via pdf.js.

Check out their guide, especially if you're using Chrome. Apart from that, it's as easy as opening the PDF in your browser and activating Hypothesis. It fingerprints PDFs, so you don't have to worry about losing your annotations, and it's easy to collaborate with other people.
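Fingerprinting is why you don't have to worry about moving files around. To illustrate, here is a minimal sketch that keys annotations by a hash of the file's bytes rather than by its path. This is not necessarily how Hypothesis or pdf.js compute their fingerprints:

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def fingerprint(path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of the file contents: unaffected by renaming or moving the file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def notes_for(path, store: Dict[str, List[str]]) -> List[str]:
    """Look up notes by fingerprint rather than by path."""
    return store.get(fingerprint(path), [])
```

Hashing the whole file also shows the limitation of this simplest version. As soon as the bytes change (e.g. annotations get saved into the PDF itself), the fingerprint changes too.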

It seems to work fast enough for big PDF books as well. However, reading long things in a browser is generally not very convenient, as you lose your reading position if you close the tab.

Polar

Polar is a new project which aims to be not just a reader, but a 'personal knowledge repository'.

  • supports highlights and comments
  • a document repository, so you have an overview of all the stuff you've ever read or commented on. It also keeps track of your reading position.
  • the PDFs are fingerprinted, so you don't need to worry about moving them around your filesystem
  • the ~/.polar directory holds all the data, which makes it easy to share among your computers (e.g. via git, or by keeping it on Dropbox and symlinking it)
  • metadata is in well-structured JSON files, which makes it easy to access from scripts
  • highlight locators keep the matched text alongside the absolute coordinates, which leaves potential for matching against different editions of the PDF file
  • it has a built-in flashcard engine. Personally, I'm too used to org-drill now, but it's a great way of introducing spaced repetition to people.
  • the author is very passionate about this project, invests a lot of effort in it and is quite ambitious

If you like it, please consider donating to them!

The only downside is that the annotation format is Polar-specific, so it'd be hard to share annotations with other people unless they're willing to use Polar as well.

Annotating on Android

  • Adobe Reader

    Supports most reasonable kinds of annotation: highlights, popup/inline comments, strikethrough, styling, etc. (screenshot). "Comment List" gives an overview of your document's annotations: screenshot.

    It offers Adobe Cloud and Dropbox integration, but I rely on Syncthing for syncing my stuff anyway.

  • Xodo

    Supports basically the same things as Adobe Reader.

    For me, Xodo wins by a very thin margin because its interface tends to be a bit denser and more 'material': interface, annotations list. Otherwise, it's virtually no different from Adobe Reader.

  • mupdf

    The F-Droid description claims it supports annotation, but it couldn't display any of the existing annotations in my PDF files. What's more, the app didn't respond to long taps or my attempts to select text, let alone highlight or comment.

    Perhaps PDF 1.7 is too outdated? Something weird has been going on with the 'full' version, maybe this is somehow related (1, 2).

  • Pen&Pdf: I tried this one since it was open source and claimed to support annotation, but it didn't even manage to pick up any of the existing ones.

Summary

If you want the convenience of editing and viewing on your phone and of working with other people, Okular wins on desktop, and Adobe Reader/Xodo can be used on your phone.

If you care about preserving the original PDF files and want convenience in accessing the annotations programmatically, Polar is best.

4 Annotating E-ink

The two e-ink readers I know of that support highlights and notes are Kindle (I had a Paperwhite 2) and Kobo (I own a Kobo Aura One). Highlighting works as you'd expect on an e-ink touchscreen (long press and drag the selection), and you can leave notes by typing on a virtual keyboard (somewhat laggy, but OK for up to a few sentences). Perhaps the only differences are in how you can search and access the annotations.

Kindle

Kindle stores bookmarks, notes and highlights in My Clippings.txt on the device. The good thing about the format is that it's already plaintext and fairly human-readable, so you might be happy with that alone. However, it's a bit nasty to parse (as you'd expect from something with a .txt extension): dates are locale-dependent, document locators may or may not have Roman numerals, separators are inconsistent at times, etc. When I was using a Kindle, I just copied the file from time to time. You could also set up some sort of automatic copying when the device is connected, similar to what I'm doing with Kobo.
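Here is a minimal parsing sketch. It assumes entries are separated by lines of ten '=' characters and consist of a title line, a metadata line and the clipping text. It splits the file into entries and deliberately leaves the locale-dependent metadata line unparsed:

```python
from dataclasses import dataclass
from typing import List

SEPARATOR = "=========="


@dataclass
class Clipping:
    title: str  # e.g. "Some Book (Some Author)"
    meta: str   # e.g. "- Your Highlight on Location 10-12 | Added on ..."; left unparsed
    text: str   # highlighted text or note body (empty for bookmarks)


def parse_clippings(raw: str) -> List[Clipping]:
    clippings = []
    for entry in raw.split(SEPARATOR):
        lines = [line.strip() for line in entry.strip().splitlines()]
        if len(lines) < 2:
            continue  # e.g. the empty chunk after the final separator
        title, meta, *body = lines
        clippings.append(Clipping(
            title=title.lstrip("\ufeff"),  # drop a byte order mark if one is there
            meta=meta,
            text="\n".join(body).strip(),
        ))
    return clippings
```

To read the actual file, something like parse_clippings(open('My Clippings.txt', encoding='utf-8-sig').read()) should work. The utf-8-sig encoding drops a leading byte order mark.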

Kindle uploads your notes and highlights to Kindle Cloud Reader (screenshot, screenshot), but that only works for stuff bought on the Kindle store. Reportedly, people also have issues displaying their highlights on Cloud Reader due to copyright restrictions.

Kindle also integrates with Goodreads, which synchronizes reading progress and lets you selectively share annotations to Goodreads. But that's also restricted to books bought from Amazon.

The search function is somewhat limited: you can search in a book and it displays your highlights alongside the content it found, but you can't restrict the search to highlights. You can't search in notes either. Funnily enough, though, the My Clippings.txt file can be opened on the Kindle itself (like any other txt file), and then you can search in it. It's not super convenient, but it's better than nothing. (I wasn't brave enough to see what happens if you try to highlight in this file.)

Kobo

Kobo stores all of its data in .kobo/KoboReader.sqlite on the device.

The database has lots of cool stuff: in addition to highlights and notes, you can also access reading progress, time spent reading and possibly some other interesting data I haven't managed to reverse engineer yet. You can check out kobuddy, which is my attempt to extract useful data from the database and provide a nicer high-level Python interface. It's also fairly straightforward to open the database in sqlitebrowser and play with your own queries.
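For a quick look without kobuddy, Python's built-in sqlite3 module is enough. This minimal sketch assumes highlights and notes live in a Bookmark table with VolumeID, Text, Annotation and DateCreated columns; double-check that in sqlitebrowser against your firmware. Run it on a copy, not on the database on the device:

```python
import sqlite3
from pathlib import Path


def kobo_annotations(db_path):
    """(book, highlighted text, note, creation date) rows from a copy of KoboReader.sqlite."""
    # open read-only, so a typo in a query can't damage the database
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return conn.execute(
            "SELECT VolumeID, Text, Annotation, DateCreated FROM Bookmark "
            "ORDER BY DateCreated"
        ).fetchall()
    finally:
        conn.close()
```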

Kobo doesn't seem to support cloud sync for annotations. I was considering syncing the database wirelessly, as there are some SSH modules for its firmware, but people report they may break Wi-Fi on the device. I'm using kobuddy to work around this as well.

There is an official Android app which lets you manage and annotate books from the Kobo store, and it seems to sync reading progress between the e-reader and the phone. However, annotations don't sync between Kobo and phone for me, and other people report the same experience: 1, 2, 3. Some claim it works on iPhones, though.

Kobo lets you conveniently search over all of your highlights and notes.

Koreader

Koreader is an alternative open source software for Kindle, Kobo and other E-ink devices.

It has some very cool features, in particular support for most common document formats, dictionary and Wikipedia lookups, and various plugins.

It also supports highlighting, but as of v2019.06 note taking isn't supported yet, although there is some progress on it. I'd be keen to try it once it's implemented!

5 Miscellaneous

Annotating paper books

So far, the only downside of using nice tools for annotating digital content is that it has ruined the experience of reading paper books for me.

Usually I don't own the books I read, so using a highlighter or pencil would just be mean to the owner. Even if you own the book and are okay with that, it's still not searchable or easily accessible, which feels very wrong to me.

To get around this, I've tried a few tricks:

  • Take pictures of bits I'm interested in, and perhaps highlight them using an image editor on the phone
  • Sticky notes are OK for commenting as long as you don't damage the book with the glue, but they don't help with highlighting
  • Using paper strips as an annotation overlay.

    I'm particularly proud of coming up with this one, as I haven't found anyone else doing it, and I rarely come up with useful meatspace things.

    This is how it looks in action: photo.

    Basically, before reading, I prepare a bunch of paper strips slightly longer than the page height, kind of like bookmarks. A strip serves as a 'sidebar overlay' for writing notes and highlighting, so its width depends on your handwriting and how much you expect to write; I usually go for about 1/4 of the page.

    If you want to annotate a page, align the strip's bottom with the bottom of the page, mark the lines you found interesting on the strip and write your comments on it as well. You can use the other side of the strip to annotate the facing page.

    One downside is that for the annotations to make sense, you need a physical copy of the exact same book. Another is that there are no automatic timestamps, which somewhat bothers my OCD. You can get around that by writing down the time as well, but it's quite distracting.

When I'm done with a book, I spend a bit of time digitizing the annotations by manually typing them into plaintext. Luckily, I don't have to do that often.

Annotating plaintext

Often, I want to leave a quick comment on an org-mode item. I've got a handy Emacs binding which appends a child note with a timestamp and enters edit mode, so the whole process is smooth. If you're not using org-mode, you can still benefit from something similar: most modern text editors let you bind snippets to hotkeys.

One big drawback of Org mode (and, I believe, of most outline/task list formats) is that if you insert a child outline item in the middle of the text, it structurally breaks the text into two parts. So you have to append your comment to the end of the current outline item (which can potentially be very long). On the other hand, plain list items, which you can insert anywhere, are very limited and don't support most of the things outline items support, like tags, timestamps, priorities, etc.
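The Emacs binding itself isn't shown here. As a hypothetical Python illustration of the same operation, this appends a timestamped child note to an org heading. It inserts the note at the end of the heading's subtree rather than right under the heading, which is the workaround described above for not splitting the body text:

```python
import re
from datetime import datetime

HEADING = re.compile(r"^(\*+)\s+(.*)$")


def append_child_note(org_text: str, title: str, note: str, now: datetime) -> str:
    """Add '<stars> [timestamp] note' as the last child of the heading titled `title`."""
    lines = org_text.splitlines()
    for i, line in enumerate(lines):
        m = HEADING.match(line)
        if m and m.group(2).strip() == title:
            level = len(m.group(1))
            break
    else:
        raise ValueError(f"no heading titled {title!r}")

    # the subtree ends right before the next heading of the same or a higher level;
    # inserting there (not right under the heading) keeps the body text in one piece
    end = len(lines)
    for j in range(i + 1, len(lines)):
        m = HEADING.match(lines[j])
        if m and len(m.group(1)) <= level:
            end = j
            break

    stamp = now.strftime("[%Y-%m-%d %a %H:%M]")  # org inactive timestamp
    child = f"{'*' * (level + 1)} {stamp} {note}"
    return "\n".join(lines[:end] + [child] + lines[end:]) + "\n"
```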

Annotating videos

Often when I watch lectures or talks on YouTube or in VLC, I want to leave a bookmark or write a note referring to a specific timestamp. This is pretty much impossible apart from opening a text editor and manually recording the position in the video. All the video annotation software I know of is oriented more towards video editing, effects, etc.

So, if I'm watching something in a browser, I normally end up using grasp and typing the timestamp manually.

This is distracting, but even worse, whatever you use has no means of quickly jumping to the timestamp you recorded; you have to move the slider to it manually.

There is no common standard that I know of for jumping to a certain timestamp, either on the web or in desktop applications (e.g. via a MIME handler).
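This doesn't solve the general problem, but YouTube watch URLs specifically accept a t parameter with a start offset. That means timestamps typed into notes can at least be turned into links that jump to the right spot. Here is a minimal sketch (function names hypothetical):

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def to_seconds(stamp: str) -> int:
    """Convert 'h:mm:ss', 'mm:ss' or 'ss' into a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def youtube_at(url: str, stamp: str) -> str:
    """Return a YouTube watch URL that starts playback at `stamp`, via its t= parameter."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    query["t"] = [f"{to_seconds(stamp)}s"]  # replaces any existing t=
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))
```

VLC and other desktop players are still left out, so this is a workaround rather than a standard.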

I'd say this is a somewhat unsolved problem, which is surprising since presumably it could help lots of students.

Other notable mentions

Due to the lack of a common standard for annotated content, some services try to implement their own:

  • Medium. They use highlights and annotations that also serve a social function: when you read a Medium post, you can see if a certain bit of text was highlighted by many other people.

    They don't tamper with browser selection, so you can still use external annotation tools like Hypothesis. However, judging by their API, there is no way to access your highlights. Anyway, I would encourage people not to use these, especially if you only care about personal use; after all, Medium is not the only source of information out there.

Hall of shame!

These are services that won't let you select text. I'm not sure why: it could be some sort of copyright restriction, them being assholes, or just an accidental, pointless restriction.

  • Facebook: Android app and mobile site prevent text selection.
  • Blinkist: Android app and website.

    You can't use native text selection, as Blinkist forces its own custom JS for highlighting. And their highlights suck: you can't leave a comment.

    In addition, you can't even export your highlights. The best you can do seems to be syncing them to an Evernote notebook and then perhaps using the Evernote API. I didn't bother with it though.

    UPD (20190818): I actually managed to dump my highlight data before cancelling my Blinkist subscription by using their (apparently private) API; here's the script.

6 What makes a good annotation system?

In my quest for the perfect annotation engine, I've identified certain aspects that make (or would make) an annotation tool pleasant to use.

  • Uniform

    Highlighting a piece of content and leaving a comment are fairly straightforward operations, and you shouldn't have to think much about how exactly you do them or which program you use. Most current annotation engines also make it somewhat tedious to interact with annotations: adding content to existing ones, linking them, etc.

    Solving this requires tools that are cross-platform and cross-format.

    Hypothes.is is a move in the right direction, but plenty of things are still left out, from unsupported formats and offline work to paper books.

    The current sad state of having different tools/products for different forms of content is understandable. Ideally, though, annotation should be format-agnostic, with some proper way of fingerprinting content. If humans can tell that a novel published online as HTML and a paper novel are the same thing, so can software.

    A common standard (e.g. the Web Annotation Data Model) is a good start, but even this one is fairly unknown and not widely adopted.

    Perhaps in the near future we could exploit existing (fairly robust) OCR technologies and augmented reality to develop a universal annotation tool, but so far that's a whole different ballpark.

  • Ease of interaction

    Annotations are meant to augment your limited memory, and using them should be as easy as retrieving information from your brain.

    Brain-computer interfaces are not quite there yet. Even with existing technologies, though, you can get close, with as little as a few seconds of lag, just by using plaintext representations, indexing and incremental search.

    Personally, I'm solving this problem via orger.

  • Separate metadata

    The annotation layer should be loosely coupled to the underlying content. Otherwise, you become too dependent on specific tools, and it's harder to keep track of your private data and to share it with other tools.

    For physical sources of information it matters even more (although those might decline completely in a few decades, who knows).

    Good examples of this approach are Polar and Hypothesis which keep metadata in well defined format with locators.

  • Data ownership and resilience

    If annotations are an essential part of your knowledge, you want to be able to access them anytime.

    Ideally everything should work while fully offline without relying on any services.

    Currently it's not always feasible due to technical complications (e.g. having to selfhost), but this is a good value to pursue.

  • Social and collaborative

    Annotations are a valuable tool for collaborative learning and research, and improving tools can make these activities more pleasant.

    Blog comments seem to be somewhat in decline, which is understandable, since it's too annoying to register here and there. On the other hand, platforms like Facebook comments or Disqus have several problems. They are not very privacy friendly, they don't give you access to the stored data (e.g. if Disqus disappears tomorrow, so do the comments on your blog), and they are not very friendly towards people who want to comment anonymously.

    Perhaps in the near future we could ditch all the internet commenting platforms and rely on an annotation layer instead. Hypothesis basically lets you do that already; with a little work on design (a sidebar isn't necessarily convenient for social commenting), it could serve that purpose.

    I also consider the comments people write to be projections of their minds, and it would be great to give others easier access to them, so that we can get to know each other better.

    It's hardly worth mentioning that one should be in control of whether the highlights they make are private or visible to everyone else.

  • Open source: not sure if that even needs justifying :)

    People have somewhat different requirements for their cognitive tools, and it should be possible to hack them and fix annoying bugs. That also gives way more potential for integrating them with other services.
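To make the 'common standard' point from the list above concrete, this is roughly what a highlight with a comment looks like in the W3C Web Annotation Data Model. The helper function is hypothetical. The JSON-LD context, TextQuoteSelector, TextualBody and the motivation values come from the standard:

```python
import json


def web_annotation(source: str, exact: str, comment: str = "",
                   prefix: str = "", suffix: str = "") -> dict:
    """A highlight (plus optional comment) in the W3C Web Annotation Data Model."""
    selector = {"type": "TextQuoteSelector", "exact": exact}
    if prefix:
        selector["prefix"] = prefix
    if suffix:
        selector["suffix"] = suffix
    annotation = {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "target": {"source": source, "selector": selector},
    }
    if comment:
        annotation["motivation"] = "commenting"
        annotation["body"] = {"type": "TextualBody", "value": comment, "format": "text/plain"}
    else:
        annotation["motivation"] = "highlighting"  # a bare highlight needs no body
    return annotation


print(json.dumps(web_annotation("https://example.com/post", "fuzzy anchoring", "neat!"), indent=2))
```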

Comparison

I'm only listing tools that support proper highlighting and commenting.

tool              | mobile annotations | fingerprinting | search in annotations | separate metadata | sharing             | open source
------------------|--------------------|----------------|-----------------------|-------------------|---------------------|-----------------
Instapaper        | Y, offline         | n/a            | N                     | N                 | N                   | N
Wallabag          | N                  | n/a            | N                     | N                 | N                   | Y
Hypothesis        | Y                  | Y              | Y                     | Y                 | Y, web API          | Y
Copy-paste        | Y, offline         | N (manual)     | Y                     | Y                 | Y, file sync        | Y
Okular            | n/a                | N              | limited               | limited           | Y, file sync        | Y
Emacs pdf-tools   | n/a                | N              | N                     | N                 | Y, file sync        | Y
Emacs org-noter   | N                  | N              | Y                     | Y                 | Y, file sync        | Y
Hypothesis (PDF)  | N                  | Y              | Y                     | Y                 | Y, web API          | Y
Xournal           | N                  | N              | N                     | Y                 | Y, file sync        | Y
Polar             | N, on roadmap      | Y              | N                     | Y                 | Y, file sync, cloud | Y
Xodo/Adobe Reader | Y, offline         | N              | N                     | N                 | Y, file sync, cloud | N
Kindle            | N                  | N              | limited               | Y                 | limited             | N, but koreader↑
Kobo              | N, broken↑         | N              | Y                     | Y                 | N, but possible↑    | N, but koreader↑

7 Using annotation data

I have to use multiple tools, none of which can do everything I'd ideally want from an annotation system, so I've developed my own ways of getting closer to the ideal. For that, I've got some infrastructure set up.

Backups: I've already mentioned the script I'm using to back up the Kobo database. For cloud services, I'm running a bunch of daily cron jobs that query the APIs for data. Most of the job scripts are fairly ad hoc, just a GET query with a properly set OAuth token, so perhaps not worth sharing, but let me know if you want something specific. These files are always synced across all of my devices, including my phone, so I always have access to them.

These backups serve not just as a safety net, but also as data providers for my tools: I only interact with the daily snapshots on the filesystem rather than directly with the APIs. That avoids dealing with rate limiting and with flakiness in the network connection or the API itself, and makes iterating and developing way faster. The only downside is that the data isn't necessarily up to date; you can dump it more often to get around this, and I'd still highly recommend that over interacting with the API directly.
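Reading from such snapshots can then be as simple as loading the newest file. A minimal sketch, assuming one JSON dump per day named by ISO date (YYYY-MM-DD.json); latest_snapshot is a hypothetical helper, not part of any existing package:

```python
import json
from pathlib import Path
from typing import Any


def latest_snapshot(backup_dir: Path) -> Any:
    """Load the most recent snapshot from a directory of YYYY-MM-DD.json files.

    ISO dates sort lexicographically in chronological order, so the last
    name in sorted order is the newest dump; no network access is involved.
    """
    files = sorted(backup_dir.glob("*.json"))
    if not files:
        raise FileNotFoundError(f"no snapshots in {backup_dir}")
    return json.loads(files[-1].read_text(encoding="utf-8"))
```

Because tools only ever see local files, a flaky API or an expired token shows up as a stale snapshot rather than as a crash in every script that uses the data.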

I'm using a special Python package to access the data, which I named 'my'. It's always in my PYTHONPATH, so I can use it from any script, tool or REPL. It's got a bunch of different submodules, e.g. my.reading.instapaper, my.reading.kobo, my.reading.polar (there are other modules besides my.reading, but that's material for a whole separate post). The package itself is not public since it contains some personal data; at some point I'll try to strip that out and release it (though I'm not sure it would be that relevant to other people anyway).
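Since the package isn't public, here's only a guess at the shape of one such data provider: it groups raw annotations by page URL into objects with the link and highlights attributes used in the stats snippet in the next section. The field names (uri, text, created, and a TextQuoteSelector under target) follow my understanding of Hypothesis annotation JSON and are assumptions; also, unlike a real provider, this get_pages takes the raw data as an argument instead of reading the latest backup itself.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class Highlight:
    text: Optional[str]  # the quoted passage, if the annotation has one
    comment: str         # the note attached to it (may be empty)
    created: str         # timestamp string, kept as-is


@dataclass
class Page:
    link: str
    highlights: List[Highlight] = field(default_factory=list)


def _quote(annotation: dict) -> Optional[str]:
    # Assumption: the quote sits in a TextQuoteSelector under "target",
    # as in Hypothesis API annotation objects.
    for target in annotation.get("target", []):
        for selector in target.get("selector", []):
            if selector.get("type") == "TextQuoteSelector":
                return selector.get("exact")
    return None


def get_pages(annotations: Iterable[dict]) -> List[Page]:
    """Group raw annotation dicts by their "uri", keeping first-seen page order."""
    pages: Dict[str, Page] = {}
    for a in annotations:
        uri = a["uri"]
        if uri not in pages:
            pages[uri] = Page(link=uri)
        pages[uri].highlights.append(
            Highlight(text=_quote(a), comment=a.get("text", ""), created=a.get("created", ""))
        )
    return list(pages.values())
```

Keeping all the parsing in one module like this is what lets one-off scripts stay a handful of lines long.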

Extracting reading stats

As a specific example of how I use it: recently a friend asked me if I could recommend posts I'd found interesting on Slate Star Codex. With a tiny Python script I was quickly able to give them some stats on the posts I'd read, so they could choose among them.

import my.reading.hypothesis

# print every Slate Star Codex page I've annotated, with its number of highlights
for p in my.reading.hypothesis.get_pages():
    if 'slatestarcodex' in p.link:
        print(f'{p.link} {len(p.highlights)}')

Searching in annotations

I've got a bunch of scripts and a rendering tool which I named orger (yep, I haven't invested that much thought into naming). Basically, each script takes a specific data source as input and produces org-mode output, e.g. it renders the JSON backed up from Instapaper into an instapaper.org file. This runs every few hours, which keeps the contents relatively up to date.

I chose org-mode because I was already used to its features, keybindings and metadata. Also, the hierarchy (e.g. book → highlight → comments) fits naturally into an outline format. That said, it's not a real necessity: I feel that as long as it's searchable plaintext, it's good enough.
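orger itself isn't shown here, but its core idea can be sketched: turn already-parsed documents into an org-mode outline following the document → highlight → comment hierarchy. The input shape (dicts with title, highlights, text and note keys) is a hypothetical simplification, not Instapaper's actual export format:

```python
from typing import Iterable, List


def render_org(documents: Iterable[dict]) -> str:
    """Render documents as an org outline: '* title', then '** highlight', then the note."""
    lines: List[str] = []
    for doc in documents:
        lines.append(f"* {doc['title']}")
        for h in doc.get("highlights", []):
            # Collapse whitespace/newlines so each highlight fits on one heading line.
            lines.append("** " + " ".join(h["text"].split()))
            for note_line in (h.get("note") or "").splitlines():
                # Indented body text can never be mistaken for an org heading.
                lines.append("   " + note_line)
    return "\n".join(lines) + "\n"
```

Writing the returned string to a file such as instapaper.org on every run is enough to keep a searchable plaintext copy in sync.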

To search them, I've got a global keybinding which invokes Emacs with an incremental search prompt over the directory of rendered org files, which lets me get to them in a blink on my computer. On Android I'm using the DocSearch indexer (sadly it's not incremental and the app isn't open source, so I'm looking for an alternative).
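The Emacs setup isn't reproduced here, but as a rough stand-in, a non-incremental search over a directory of rendered .org files that also reports the heading each match sits under can be sketched like this (search_org and the directory layout are assumptions):

```python
import re
from pathlib import Path
from typing import Iterator, Tuple

HEADING = re.compile(r"\*+\s+")  # org headings: one or more stars then whitespace, at column 0


def search_org(root: Path, query: str) -> Iterator[Tuple[Path, str, str]]:
    """Yield (file, enclosing heading, matching line) for each case-insensitive match."""
    needle = query.lower()
    for path in sorted(root.rglob("*.org")):
        heading = ""
        for line in path.read_text(encoding="utf-8").splitlines():
            m = HEADING.match(line)
            if m:
                heading = line[m.end():].strip()
            if needle in line.lower():
                yield path, heading, line.strip()
```

It simply rescans the files on every call rather than maintaining an index like Recoll or DocSearch do; for a modest amount of plaintext that's usually acceptable.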

Finally, I've got a Recoll indexer instance with a web interface running on my VPS, so if necessary I can search my annotations over the internet.

Providing TODO items

While reading, I often encounter something I want to google, check, or read about later, or I just come up with something actionable inspired by what I'm reading. But I also don't want to interrupt my reading and lose context. That matters especially on an E-ink device, where getting distracted from the book to fetch your phone etc. is really annoying.

So, as a workaround, I've programmed rules that pick out notes starting with "TODO", notes marked with a "TODO" tag, etc., and automatically add them to my agenda. Later, when I see one on the agenda, I assign it a priority and reschedule or unschedule it depending on its importance.

Here's an example of me using the my.instapaper module for that.
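Rules like these boil down to a filter plus an org-mode renderer. A minimal illustration, assuming each note is a dict with hypothetical text, tags and created (a datetime.date) fields; the entries are scheduled on the day the note was made, so they show up in the org agenda once the output file is listed in org-agenda-files:

```python
from datetime import date
from typing import Iterable, List

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # fixed English names, locale-independent


def is_todo(note: dict) -> bool:
    """Rule: the note text starts with "TODO", or the note has a "TODO" tag."""
    return note.get("text", "").lstrip().startswith("TODO") or "TODO" in note.get("tags", [])


def todo_entries(notes: Iterable[dict]) -> str:
    """Render matching notes as org TODO headings scheduled on the day they were written."""
    out: List[str] = []
    for note in notes:
        if not is_todo(note):
            continue
        title = " ".join(note.get("text", "").split())
        if title.startswith("TODO"):
            title = title[len("TODO"):].strip(" :")  # drop the marker itself
        day: date = note["created"]
        stamp = f"{day.isoformat()} {DAYS[day.weekday()]}"
        out.append(f"* TODO {title or '(empty note)'}\n  SCHEDULED: <{stamp}>")
    return "\n".join(out) + ("\n" if out else "")
```

Scheduling on the creation date means new items surface on that day's agenda, where they can be prioritised or unscheduled by hand.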

Spaced repetition

It's kind of an extension of the previous use case: often you want to send something straight into your spaced repetition queue without having to remember to add it later.

I've got two rules for that:

  • if something is annotated with a certain marker ('drill' for me, after the org-drill package)
  • if only a single word is highlighted, which is useful for memorizing foreign words

Here's how I'm using it for Kobo highlights.
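Both rules are easy to express in code. A minimal sketch, assuming highlights are dicts with hypothetical text and note fields; matching items are rendered as org headings carrying the :drill: tag, which is what org-drill looks for when collecting items to review:

```python
from typing import Iterable, List

DRILL_MARKER = "drill"  # marker word, following the org-drill naming


def is_drill(highlight: dict) -> bool:
    """Rule 1: the note contains the marker word. Rule 2: exactly one word is highlighted."""
    marked = DRILL_MARKER in (highlight.get("note") or "").lower().split()
    single_word = len(highlight.get("text", "").split()) == 1
    return marked or single_word


def drill_items(highlights: Iterable[dict]) -> str:
    """Render matching highlights as org headings tagged :drill: for org-drill to pick up."""
    out: List[str] = []
    for h in highlights:
        if is_drill(h):
            # Trim surrounding punctuation so "Weltschmerz," becomes "Weltschmerz".
            front = " ".join(h["text"].split()).strip(".,;:!?\"'")
            out.append(f"* {front} :drill:")
    return "\n".join(out) + ("\n" if out else "")
```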

Life log

I'm a big fan of lifelogging, and all the timestamped highlights, comments and reading progress from Kobo are an effortless (no manual logging!) contribution to my personal timeline, which I render and sync to my devices every few hours.
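Merging several sources into one timeline mostly comes down to sorting timestamped events. A minimal sketch, assuming every source yields (datetime, description) pairs and rendering them as an org list with inactive timestamps; render_timeline is a hypothetical name:

```python
from datetime import datetime
from itertools import chain
from typing import Iterable, List, Tuple

Event = Tuple[datetime, str]
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # fixed English names, locale-independent


def render_timeline(*sources: Iterable[Event]) -> str:
    """Merge events from several sources and render them oldest-first as org list items."""
    lines: List[str] = []
    for when, what in sorted(chain(*sources), key=lambda e: e[0]):
        # Square brackets make this an inactive org timestamp (it won't clutter the agenda).
        stamp = f"{when:%Y-%m-%d} {DAYS[when.weekday()]} {when:%H:%M}"
        lines.append(f"- [{stamp}] {what}")
    return "\n".join(lines) + ("\n" if lines else "")
```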

I sometimes use it when a conversation with other people lapses into an awkward silence: I can recall something I've read recently and spark off an interesting (well, at least for me) topic.

8 --

I'd be interested to know what you think, how you manage your annotations, or whether you need some help with your existing workflow. Please also let me know if I've missed any tools or features!