Published on to joshleeb's blog
A few months ago I wrote a post, On Organizing Bookmarks, that proposed an extension to simple bookmark tagging called Hierarchical Namespaced Tagging (HNT).
Simple Tagging is the most common and familiar form of tagging, seen on sites like Pinboard, Lobsters, and YouTube; in tools like Scrivener and Zotero; in browser-based bookmark managers; and in many other places. It involves assigning zero or more terms to describe and classify a resource. For example, this post might be tagged with the terms “bookmarking”, “organization”, and “tagging”.
HNT extends Simple Tagging by assigning paths made up of terms rather than single terms. With HNT, we might tag this post with “bookmarking” and “organization/tagging”, since tagging is one kind of organizational strategy and so is a more specific term within the context of “organization”.
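To make the difference between the two models concrete, here is a minimal sketch in Python. The `Bookmark` class, the `/` separator, and the `has_hnt_prefix` helper are assumptions for illustration only, not how Pinto actually stores tags.

```python
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    url: str
    # Simple Tagging: an unordered set of single terms.
    simple_tags: set[str] = field(default_factory=set)
    # HNT (assumed representation): each tag is a path of terms,
    # ordered from most general to most specific.
    hnt_tags: set[tuple[str, ...]] = field(default_factory=set)


def parse_path(tag: str) -> tuple[str, ...]:
    """Split a tag such as "organization/tagging" into its terms."""
    return tuple(term for term in tag.split("/") if term)


def has_hnt_prefix(bookmark: Bookmark, prefix: str) -> bool:
    """True if any HNT path on the bookmark starts with the given path."""
    wanted = parse_path(prefix)
    return any(path[: len(wanted)] == wanted for path in bookmark.hnt_tags)


post = Bookmark(
    url="https://example.com/on-tagging",  # hypothetical URL
    simple_tags={"bookmarking", "organization", "tagging"},
    hnt_tags={parse_path("bookmarking"), parse_path("organization/tagging")},
)

print(has_hnt_prefix(post, "organization"))  # True
print(has_hnt_prefix(post, "tagging"))       # False: "tagging" only appears under "organization"
```

With Simple Tagging, “tagging” is a top-level term that can be looked up directly; with HNT, the same term is only reachable through its parent, which is exactly the extra context the path is meant to encode.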
My last post on this topic goes into more detail and explains the problems HNT was designed to solve. Needless to say, after writing that post I was very excited to try it out with my own bookmarks. So that’s what I did.
Experimenting with HNT
I’ve been working on a personal project called Pinto, which is yet another bookmarking system but with a few neat features planned. I will be writing more posts and status updates as the project progresses. It also served as my testbed for HNT.
With Pinto, I spent a month using HNT to organize my bookmarks, an experience that was underwhelming and at times a little frustrating. My sense is that Simple Tagging does not provide enough structure, whilst HNT clearly provides too much.
Each term of an HNT path provides context to its child terms. The term “architecture” could mean buildings or software, but the path “software/architecture” is unambiguous. Now that I could provide more specific tags, I found myself doing so even when it wasn’t helpful, to the point where most full paths had only one or two bookmarks.
I also got caught up on how much more specific each subsequent term should be. Should it be “rust/concurrency/async” or “rust/async”? And in what order should the terms go: “rust/async”, “async/rust”, or both?
I had given myself too much flexibility to create a structure and encode context, and I had tied myself up in knots. So it was back to the drawing board.
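The ordering and depth questions show up directly at retrieval time. In this sketch, the bookmark names and paths are hypothetical, and paths are assumed to be plain tuples of terms. A prefix query for “rust” finds paths that begin with “rust” but misses “async/rust”, and “rust/async” and “rust/concurrency/async” end up as separate full paths even though they describe much the same thing.

```python
# Hypothetical bookmarks, each tagged with a single HNT path.
bookmarks = {
    "tokio-tutorial": ("rust", "async"),
    "pin-explained": ("rust", "concurrency", "async"),
    "async-book": ("async", "rust"),
}


def under(prefix: tuple[str, ...]) -> list[str]:
    """Names of bookmarks whose path starts with the given prefix."""
    return sorted(
        name for name, path in bookmarks.items() if path[: len(prefix)] == prefix
    )


def full_path_counts() -> dict[tuple[str, ...], int]:
    """How many bookmarks share each exact full path."""
    counts: dict[tuple[str, ...], int] = {}
    for path in bookmarks.values():
        counts[path] = counts.get(path, 0) + 1
    return counts


print(under(("rust",)))    # ['pin-explained', 'tokio-tutorial']
print(under(("async",)))   # ['async-book']
print(full_path_counts())  # three distinct full paths, one bookmark each
```

Each choice of ordering or depth splits bookmarks on the same topic across different branches, which is how full paths end up holding just one bookmark apiece.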
Purpose of Tagging
Everyone, every system, and every use case has their own reason for tagging, their own tagging structure, and their own approach for choosing terms within that structure. But from what I’ve seen it all comes down to retrieval.
The difficulty is that when adding a bookmark and selecting the tags to use we don’t know how we’re going to want to retrieve that bookmark later. Search solves this problem when we know what it is we’re looking for and can craft a representative query, but it can’t help us when we are less clear about what we want to find.
Looking back through my bookmarks over the years, I noticed that my reasons for adding a bookmark fell into two broad categories:
- Specific. The article is interesting for a specific purpose, such as a project I’m working on or a topic I’m researching. In this case I found it helpful to have tags that are more specific within that context. E.g.: TrueType Fundamentals, when I was building a font atlas for a recent project.
- General. The article is generally interesting even though I don’t have a specific reason to refer back to that information at the time of bookmarking. E.g.: The Myers Diff Algorithm: Part 1.
Scoped Tagging
The variation of HNT we’ve been talking about so far is N-HNT, in that it allows paths of up to some arbitrary length N. The implementation in Pinto I was experimenting with allowed paths up to a depth of ten, so it was 10-HNT, though I rarely created paths deeper than three terms.
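One way to express the depth limit is as a validation step when a tag is added. This is a hypothetical sketch, not Pinto’s code: `MAX_DEPTH` mirrors the 10-HNT limit described above, and `validate_path` is an assumed name.

```python
MAX_DEPTH = 10  # N in N-HNT; 10 mirrors the 10-HNT limit described above


def validate_path(tag: str, max_depth: int = MAX_DEPTH) -> tuple[str, ...]:
    """Split a "/"-separated tag, rejecting empty terms or paths deeper than N."""
    terms = tuple(tag.split("/"))
    if any(not term for term in terms):
        raise ValueError(f"empty term in tag {tag!r}")
    if len(terms) > max_depth:
        raise ValueError(f"tag {tag!r} has {len(terms)} terms; limit is {max_depth}")
    return terms


print(validate_path("rust/concurrency/async"))   # ('rust', 'concurrency', 'async')
print(validate_path("rust/async", max_depth=2))  # ('rust', 'async')
```

In this sketch, setting `max_depth=1` only accepts single terms, so it behaves like Simple Tagging, and `max_depth=2` gives the 2-HNT structure discussed below.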
Scoped Tagging is structurally equivalent to 2-HNT but semantically it’s quite different. The idea is to introduce scopes that provide context on the purpose for storing a bookmark and retrieving it later. Then, within these scopes, bookmarks are tagged with single terms as with Simple Tagging.
For example, if I was doing a deep dive into the Clojure programming language, I might have a scope “clojure-deep-dive” within which bookmarks would be tagged with “transducers”, “concurrency”, “metaprogramming”, and so on. These tags don’t mean much on their own, but within the context of the “clojure-deep-dive” scope they carry much more information. Bookmarks saved out of general interest would go in the unnamed scope.
Semantically, this differs from 2-HNT in that:
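A minimal sketch of the Scoped Tagging model, assuming each tag is a `(scope, term)` pair and the unnamed scope is represented as `None`. The `ScopedTag` type and its `as_2hnt_path` method are hypothetical names; the method only illustrates the structural equivalence with 2-HNT, since every scoped tag maps onto a path of at most two terms.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScopedTag:
    scope: Optional[str]  # None stands for the unnamed (general interest) scope
    term: str             # a single term, as in Simple Tagging

    def as_2hnt_path(self) -> tuple[str, ...]:
        """Structurally, a scoped tag is a 2-HNT path: (scope, term), or (term,) if unscoped."""
        return (self.term,) if self.scope is None else (self.scope, self.term)


deep_dive = [
    ScopedTag("clojure-deep-dive", "transducers"),
    ScopedTag("clojure-deep-dive", "concurrency"),
    ScopedTag("clojure-deep-dive", "metaprogramming"),
]
general = [ScopedTag(None, "clojure")]

for tag in deep_dive + general:
    print(tag.as_2hnt_path())
```

The shape is the same as 2-HNT; what changes is the meaning of the first element, which records a purpose rather than a broader topic.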
- with 2-HNT: every term in the path is answering the question of classification with increasing specificity. The first term answers “what is this bookmark about?” and the second term answers “what is this bookmark about in the context of the first term?”; and
- with Scoped Tagging: the scope answers the question “what is the purpose for adding this bookmark?” whilst the tag answers the classification question “what is this bookmark about in the context of that purpose?”.
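The two questions translate into two kinds of lookup. In this sketch, the bookmark names and the dictionary layout are hypothetical, with `None` again standing for the unnamed scope. Retrieving by scope answers “what did I save for this purpose?”, while retrieving by a term within a scope answers “what did I save about this topic for that purpose?”. A term lookup in one scope ignores the same term in other scopes, so “concurrency” in the deep dive stays separate from general-interest bookmarks on the same topic.

```python
from typing import Optional

# Hypothetical store: bookmark name -> set of (scope, term) pairs.
# The scope None is the unnamed scope for general-interest bookmarks.
Tag = tuple[Optional[str], str]
store: dict[str, set[Tag]] = {
    "clojure-atoms-guide": {("clojure-deep-dive", "concurrency")},
    "transducers-talk": {("clojure-deep-dive", "transducers")},
    "csp-overview": {(None, "concurrency")},
    "clojure-homepage": {(None, "clojure")},
}


def in_scope(scope: Optional[str]) -> list[str]:
    """Everything saved for a given purpose (or for general interest if None)."""
    return sorted(
        name for name, tags in store.items() if any(s == scope for s, _ in tags)
    )


def tagged(scope: Optional[str], term: str) -> list[str]:
    """Bookmarks about `term` within the context of `scope` only."""
    return sorted(name for name, tags in store.items() if (scope, term) in tags)


print(in_scope("clojure-deep-dive"))               # ['clojure-atoms-guide', 'transducers-talk']
print(tagged("clojure-deep-dive", "concurrency"))  # ['clojure-atoms-guide']
print(tagged(None, "concurrency"))                 # ['csp-overview']
```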
I believe this approach will encourage tags in the unnamed scope to remain general, preferring broad terms such as “clojure”, “web-design”, “project-management”, etc. It will also encourage terms in named scopes to be more specific to their context.
Wrapping Up
In On Organizing Bookmarks I also suggested organizing bookmarks with collections. Another way of thinking about Scoped Tagging is as Simple Tagging plus collections or folders. My hope is that Scoped Tagging and full-text search will provide the best of both worlds for organizing and retrieving even large volumes of bookmarks.
There’s a lot more that can be said about Scoped Tagging, but for now, the next steps are to implement and experiment. Then I’ll report back with whatever I learn next.