
The Accidental Taxonomist

There are dozens of things that I do each day that I didn’t set out to do. I do accounting and billing work without a desire or intent to do it. I do sales and marketing – and neither is at the top of my list of things to do. I accidentally picked these things up when I decided to be an entrepreneur and run my own company well over a decade ago. Working with taxonomies – and becoming a taxonomist – can happen by accident too. That’s why The Accidental Taxonomist is appropriate for someone looking to learn how to create taxonomies. I’ve never heard a child say, “I want to grow up to be a taxonomist.” Despite this, there are those who have taxonomy as a part of their job – whether they intended it to be or not.

Long, Long Road

Before I get to the heart of the matter, it’s appropriate to tell you that I didn’t read the book in one sitting. I didn’t read it in a week, a month, or even a year. I started the process of reading The Accidental Taxonomist about half a dozen years ago. It was as I was putting the final touches on my Pluralsight course The Art and Practice of Information Architecture. I got the course done and never finished the book.

In just getting back to it, I felt a bit like some of my clients that struggle to get their taxonomy projects off the ground. Or, rather, my clients that needed to get something accomplished and realized they needed a taxonomy to accomplish their goals. The taxonomy was started, the goals were achieved, and the taxonomy sat idle for a while – sometimes a long while. Before we get too far, we should explain what a taxonomy is.

What is a Taxonomy, Anyway?

Barry Schwartz, in The Paradox of Choice, explains that filtering is one of the basic functions of consciousness. What he doesn’t cover is that organization is another. We’re hardwired to make sense of our world, and, as Gary Klein explains in Sources of Power, that comes through simplification until we have a model that we can run in our heads. Taxonomies allow us to organize our thoughts and information.

We’re all familiar – willingly or not – with the hierarchical taxonomy of zoology. That is to say that we learned Carl Linnaeus’ organization of all animals. We learned Kingdom, Phylum, Class, Order, Family, Genus, and Species as a way of differentiating one animal from another and identifying their nearest cousins.

We also learned, but most of us promptly forgot, how Melvil Dewey organized his library. His system held a brilliant insight about extension: he figured out that he could make the system flexible and allow for increasing levels of detail through the use of a decimal numbering system.
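Dewey’s insight is easy to see in a few lines of code. This is a deliberately simplified model, assuming that a class number narrows purely by adding digits, with trailing zeros acting as padding. The real Dewey Decimal schedules have exceptions, and the numbers below are only examples of the notation’s shape. The helper names `normalize` and `is_narrower` are hypothetical.

```python
def normalize(notation: str) -> str:
    """Drop the decimal point and trailing zero padding: '510' -> '51', '516.3' -> '5163'."""
    digits = notation.replace(".", "")
    return digits.rstrip("0") or "0"


def is_narrower(child: str, parent: str) -> bool:
    """True if `child` extends `parent` with extra digits, i.e. a more detailed class."""
    c, p = normalize(child), normalize(parent)
    return c != p and c.startswith(p)


print(is_narrower("510", "500"))    # True
print(is_narrower("516.3", "516"))  # True
print(is_narrower("520", "510"))    # False
```

The point is that nothing in the existing numbers has to change to add detail. A new, narrower class just appends digits to its parent.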

We probably never learned about S.R. Ranganathan’s different approach to classification. He was frustrated that things could only be placed in one spot. There was in effect one “right” way to find things. His insight was to introduce facets. Instead of trying to capture the uniqueness of any given item in a single hierarchical dimension, he proposed that items be classified in several different categories, or facets, and the combination of these facets would be how the item was classified. This approach was called colon classification, because he chose the colon to separate the various facets.
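To make the facet idea concrete, here is a minimal sketch of faceted classification. It assumes four hypothetical facets (subject, form, place, period) and joins them with a plain colon, as the simplified description above does. Ranganathan’s actual scheme defines its own facets and several connecting symbols, so treat this as an illustration of the principle, not of colon classification itself.

```python
from dataclasses import dataclass

# Hypothetical facets, applied in a fixed order when building a class mark.
FACETS = ("subject", "form", "place", "period")


@dataclass
class Item:
    title: str
    facets: dict  # facet name -> value

    def class_mark(self) -> str:
        """Join the item's facet values with colons, skipping facets it lacks."""
        return ":".join(self.facets[f] for f in FACETS if f in self.facets)


def find(items, **wanted):
    """Return items matching every requested facet value, in any combination."""
    return [i for i in items if all(i.facets.get(k) == v for k, v in wanted.items())]


items = [
    Item("Rice farming in India, 1990s",
         {"subject": "agriculture", "form": "report", "place": "india", "period": "1990s"}),
    Item("Rice farming atlas",
         {"subject": "agriculture", "form": "atlas", "place": "india"}),
    Item("Textile history of India",
         {"subject": "textiles", "form": "book", "place": "india", "period": "1800s"}),
]

print(items[0].class_mark())  # agriculture:report:india:1990s
print([i.title for i in find(items, place="india", subject="agriculture")])
# ['Rice farming in India, 1990s', 'Rice farming atlas']
```

No item has a single “right” shelf. Any facet, or any combination of facets, is a valid way in.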

I include Ranganathan’s system to point out that taxonomies are about organization. They’re not about hierarchy, though that is often the way they’re executed. They’re not about books or animals. Taxonomies are, at their core, about how we make sense of this world that is far too complex for our minds to process.

What’s a Thesaurus?

I remember first “discovering” the thesaurus in grade school. You could make your writing sound more impressive by looking up words that no one knew. I could take a simple, common, everyday word and replace it with something more profound and meaningful. (Perhaps I even looked up the word profound.) To me at the time, thesaurus only meant synonyms. I could find words with similar definitions. Eventually, I found the antonyms. However, for the better part of 30 years, that’s all they were.

When I started diving into information architecture and how we organize information in ways designed to make it easier to access, I realized that my old friend the thesaurus was more powerful than I had given her credit for. More than just synonyms and antonyms, the thesaurus contained the relationships between words. Where a dictionary can tell you what meanings are associated with a word, it’s the thesaurus that can put the word on the map in relationship to other words.

Understanding which words have broader and narrower meanings allows you to choose precise words that encapsulate exactly the scope you wish to cover. A thesaurus can also record alternative spellings, showing that there may be more than one accepted way to spell a word, such as color and colour. The thesaurus had more to offer than I had anticipated.
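A minimal sketch of those thesaurus relationships, assuming a tiny hypothetical vocabulary: each term records its broader term, narrower terms are derived by reversing that link, and spelling variants point to the form the vocabulary uses.

```python
# Hypothetical vocabulary: each term maps to its single broader term.
BROADER = {
    "crimson": "red",
    "scarlet": "red",
    "red": "color",
    "blue": "color",
}
# Alternative spellings point to the spelling this vocabulary uses.
VARIANTS = {"colour": "color"}


def lookup(term):
    """Resolve a spelling variant to the form used in the vocabulary."""
    return VARIANTS.get(term, term)


def narrower(term):
    """Narrower terms are the ones whose broader term is `term`."""
    return sorted(t for t, bt in BROADER.items() if bt == term)


def broader_chain(term):
    """Walk broader-term links upward until reaching a top term."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain


print(narrower("red"))              # ['crimson', 'scarlet']
print(broader_chain("crimson"))     # ['red', 'color']
print(narrower(lookup("colour")))   # ['blue', 'red']
```

Storing only the broader link and deriving the narrower one keeps the two directions from drifting out of sync.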

What’s an Organization’s Thesaurus?

The role of a thesaurus in an organization is even more powerful. Inside an organization, a thesaurus can identify preferred terms and steer people away from less preferred ones. It can record common misspellings. It can define terms across languages. It can translate the scientific to the everyday – and vice versa.
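Here is one way those jobs could fit together in a single entry, as a sketch. The `Concept` structure and every term in it are hypothetical. Each concept has one preferred term, and synonyms, misspellings, translations, and the scientific name all resolve to it. The same entry then answers the reverse question, from the everyday term back to the scientific one.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    preferred: str
    synonyms: list = field(default_factory=list)
    misspellings: list = field(default_factory=list)
    translations: dict = field(default_factory=dict)  # language code -> term
    scientific: str = ""


CONCEPTS = [
    Concept("heart attack", misspellings=["hart attack"],
            translations={"es": "ataque al corazón"},
            scientific="myocardial infarction"),
    Concept("sofa", synonyms=["couch", "settee"], misspellings=["soffa"],
            translations={"fr": "canapé"}),
]


def build_index(concepts):
    """Map every known form of a term (case-insensitively) to its concept."""
    index = {}
    for c in concepts:
        forms = [c.preferred, *c.synonyms, *c.misspellings, *c.translations.values()]
        if c.scientific:
            forms.append(c.scientific)
        for form in forms:
            index[form.casefold()] = c
    return index


INDEX = build_index(CONCEPTS)


def lookup(term):
    """Return the concept for any entry form, or None if the term is unknown."""
    return INDEX.get(term.casefold())


print(lookup("Myocardial Infarction").preferred)  # heart attack
print(lookup("hart attack").scientific)           # myocardial infarction
print(lookup("couch").preferred)                  # sofa
```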

An organization’s thesaurus provides a map of the terms that are used in the organization and notes about how those terms are used – or are intended to be used. It provides the basic relationships between the terms. When the relationships get more complex, we’ve moved from the world of the thesaurus into the world of ontology.

Ontology’s Relationships

In a thesaurus, the focus is on words. They are the tent poles on which the relationships are hung. Ontologies, however, focus much more on the relationships between words, and the nuances of those relationships, than on the words themselves. Instead of focusing on the tent, ontologies focus on the net that keeps circus performers safe. It’s not the individual ropes – or words – that keep performers safe. It’s the connections between them that keep performers from falling.

Ontologies are a way of understanding a field of study or knowledge. Ontologies provide a rich map of how things in the field are connected to one another. The relationships are richer than simply one term being broader or narrower than another.
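A small sketch of what “richer than broader or narrower” can mean in practice. It assumes a hypothetical ontology stored as subject, relation, object triples, where each link carries a named relationship type. Because the relationships are typed, different kinds of links can be combined in one question.

```python
# Each fact is a (subject, relation, object) triple. Relation names such as
# "is_a" and "treats" are hypothetical labels chosen for this illustration.
TRIPLES = {
    ("aspirin", "is_a", "analgesic"),
    ("aspirin", "treats", "headache"),
    ("ibuprofen", "is_a", "analgesic"),
    ("ibuprofen", "treats", "headache"),
    ("ibuprofen", "treats", "inflammation"),
    ("headache", "is_a", "symptom"),
}


def objects(subject, relation):
    """Everything `subject` is connected to through `relation`."""
    return sorted(o for s, r, o in TRIPLES if s == subject and r == relation)


def subjects(relation, obj):
    """Everything connected to `obj` through `relation`."""
    return sorted(s for s, r, o in TRIPLES if r == relation and o == obj)


# Combine two kinds of relationship: which analgesics treat inflammation?
analgesics = set(subjects("is_a", "analgesic"))
print(sorted(analgesics & set(subjects("treats", "inflammation"))))  # ['ibuprofen']
print(objects("ibuprofen", "treats"))  # ['headache', 'inflammation']
```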

What is a Taxonomist?

If an organization organizes its content through a taxonomy in the form of a thesaurus and a set of ontologies, why do we call the role a taxonomist? At the root, it’s the development of an organizational structure – irrespective of which tools are used – that defines the core behaviors of a taxonomist. Their role is to organize information and make it easier to access. The tools they use are just the tools of the trade.

The funny thing is that many taxonomists – but not all – aren’t in full-time roles. Few taxonomists have it in their title, though some have it in their job description. It’s more common to have taxonomy development as a prerequisite for something the role requires, so often the taxonomist isn’t a person who spends all day organizing structures. Most of the time, the taxonomist is someone who has a job to do that is made better by taxonomic development.

Special Skills

If categorization and organization are part of the basic functioning of consciousness, then shouldn’t everyone be considered a taxonomist? At some level, yes. However, what differentiates everyone else from the taxonomist is the set of tools they’ve developed for clarifying, codifying, and communicating an organizing structure. By learning what humanity knows about psychology, neurology, and the organization of large bodies of information, taxonomists set their capabilities apart.

While this knowledge and these skills likely aren’t enough for a taxonomist to feel truly confident in every situation, they are useful.

Taxonomy Purpose

A taxonomy’s purpose is to help organize content; that much is easy. However, taxonomies also provide structure and framing that shape the way people think. As a result, taxonomies are more than just a way to browse to the information you want. They can help shift the way the organization works.

Sometimes this is through the inclusion of detailed terms in a hierarchy to encourage users to be more specific about what they mean. Other times, it might be through the use of preferred terms. Preferred terms in the taxonomy can shift the thinking from package delivery to package assurance. It’s a subtle shift that focuses the corporate consciousness on assuring shippers and recipients that their package will make it to its destination.

Polyhierarchy

Atoms have a limitation: they can exist in only one place at a time. In our electronic taxonomies, however, we can put things in more than one place in the taxonomic tree. Consider a color taxonomy whose first level is red, blue, and green. Where does the second-level color blue-green belong? Under blue, surely, but under green as well. This is a polyhierarchy, where an item has multiple parents. While logically this seems like the exception, polyhierarchies are more common than most would like to admit.
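The color example can be sketched as a polyhierarchy by letting each term list any number of parents. The structure then becomes a directed acyclic graph rather than a strict tree. The `PARENTS` table, the extra term “teal”, and the helper names are hypothetical.

```python
# Each term maps to a list of parents, so one term can sit under several branches.
PARENTS = {
    "red": [],
    "blue": [],
    "green": [],
    "blue-green": ["blue", "green"],  # two parents: the polyhierarchy case
    "teal": ["blue-green"],           # hypothetical third-level term
}


def children(term):
    """Terms that list `term` among their parents."""
    return sorted(t for t, ps in PARENTS.items() if term in ps)


def ancestors(term):
    """Every term reachable by following parent links upward, without repeats."""
    seen = []
    stack = list(PARENTS.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(PARENTS.get(parent, []))
    return sorted(seen)


print(children("blue"), children("green"))  # ['blue-green'] ['blue-green']
print(ancestors("teal"))                    # ['blue', 'blue-green', 'green']
```

Browsing down from either blue or green reaches blue-green, and anything filed under blue-green inherits both lineages.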

The truth is that taxonomy projects are messy. It’s only a matter of time before you’re going to run across the digital equivalent of a platypus. The platypus has a mixture of reptilian, avian, and mammalian genes. It’s a classic challenge for a zoological taxonomy that splits reptiles, birds, and mammals into separate classes high in the tree. With a polyhierarchy, the platypus can find its place in all three branches.

Tagging

Being a taxonomist solves only one part of the puzzle. Taxonomists create the structures, but it’s often up to others to tag the content to fit into the taxonomy. This split means that, in many cases, the taxonomist must make a point of sitting with those who actually do the categorization to understand what is and isn’t working. Similarly, they should sit with users who are actually trying to find the information.

The key challenge in taxonomic development isn’t in designing the taxonomy. The key challenge is getting the users – who are often not dedicated indexers – to enter the metadata necessary to make the taxonomy work. Too many taxonomy projects are abandoned before the work really gets started, because the people indexing the content refuse to do it.

Pre- and Post-Coordination

There are tricks that can be used to improve results. Search can aggregate terms by leveraging synonyms, even if users aren’t always using the preferred term. Facets can go a long way toward simplifying the search process, and full-text indexing makes some level of taxonomic identification unnecessary. Automatic classifiers – whether rule-based or machine-learning-based – can help the content get the correct metadata with minimal help from the indexers.
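As a rough illustration of the rule-based kind of automatic classifier, here is a sketch that suggests taxonomy terms from keyword patterns. The taxonomy terms, the patterns, and the `classify` helper are all hypothetical. A real deployment would tune its rules against actual content and would usually have an indexer confirm the suggestions rather than apply them blindly.

```python
import re

# Hypothetical rules: each taxonomy term lists regex patterns that suggest it.
RULES = {
    "Shipping/Tracking": [r"\btracking number\b", r"\bwhere is my package\b"],
    "Shipping/Damage": [r"\bdamaged\b", r"\bbroken\b", r"\bcrushed\b"],
    "Billing/Refunds": [r"\brefund\b", r"\bcharged twice\b"],
}
COMPILED = {
    term: [re.compile(p, re.IGNORECASE) for p in patterns]
    for term, patterns in RULES.items()
}


def classify(text, min_hits=1):
    """Suggest terms whose patterns match at least `min_hits` times, best first."""
    scores = {}
    for term, patterns in COMPILED.items():
        hits = sum(1 for p in patterns if p.search(text))
        if hits >= min_hits:
            scores[term] = hits
    return sorted(scores, key=lambda t: (-scores[t], t))


print(classify("The box arrived crushed and the vase was broken; I want a refund."))
# ['Shipping/Damage', 'Billing/Refunds']
```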

With all this mess, it’s hard to keep track of when the metadata is known and to judge its reliability. Whether concepts are combined into headings when the content is indexed, at or near the time of creation (pre-coordination), or combined by the searcher at query time (post-coordination), getting it right is hard. Maybe you find that you’re not getting the findability you want, and to fix the problem, you’re going to become The Accidental Taxonomist. Perhaps a quick read can give you tips that will make the process easier and less painful.
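The difference between the two approaches shows up clearly in a small sketch. In the pre-coordinated index, an indexer has already combined concepts into compound headings. In the post-coordinated index, each concept is tagged separately and the searcher combines them with a Boolean AND at query time. The documents, headings, and function names are hypothetical, and the double-hyphen heading style is used only for illustration.

```python
# Pre-coordination: concepts combined into compound headings at indexing time.
PRE_COORDINATED = {
    "doc1": ["Packages--Damage--Claims"],
    "doc2": ["Packages--Tracking"],
    "doc3": ["Packages--Damage"],
}
# Post-coordination: single-concept tags, combined by the searcher later.
POST_COORDINATED = {
    "doc1": {"packages", "damage", "claims"},
    "doc2": {"packages", "tracking"},
    "doc3": {"packages", "damage"},
}


def search_pre(heading):
    """Exact match on a compound heading chosen when the content was indexed."""
    return sorted(d for d, headings in PRE_COORDINATED.items() if heading in headings)


def search_post(*terms):
    """Documents tagged with every requested concept (a Boolean AND)."""
    wanted = set(terms)
    return sorted(d for d, tags in POST_COORDINATED.items() if wanted <= tags)


print(search_pre("Packages--Damage"))      # ['doc3']
print(search_post("packages", "damage"))   # ['doc1', 'doc3']
```

The pre-coordinated heading is precise, but it only finds what the indexer anticipated. The post-coordinated tags let the searcher assemble the combination, at the cost of pushing that work onto the person searching.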