Seth Maislin on The Unscalability of Indexing (And What To Do About It)
By Carol Reed, Fall 2011
Our afternoon session at the fall meeting featured taxonomy consultant Seth Maislin, who challenged us to look at the future of indexing with a broader perspective. “Unscalability” is Seth’s term for the trend we’re seeing in the information world: the volume of information is growing so rapidly that traditional indexing cannot keep up.
The content market is changing drastically, but it’s not threatening traditional indexing as a niche occupation. We’re seeing nonfiction e-book sales grow, but not (yet) at the expense of print sales. O’Reilly’s print book sales, for example, have gradually increased alongside rapid growth of their e-book sales. Scholarly content will be slowest to adopt electronic formats, and will continue to require precise indexing. Traditional book indexing is not going away in the foreseeable future.
However, the definition of “book” is changing, and content producers are finding it easier and more profitable to sell information in bits and pieces. Seth argues that many types of content just plain work better in digital formats: for example, searchable, interactive reference databases; software help systems that encompass both manufacturer content and user-contributed workarounds; or any chunks of information that get recombined in different ways and on different devices. Indexing is becoming far more than just book indexing, and the changes in the content market are opening up a lot of new opportunities for indexers.
The quantity of information has surpassed the resources to index it traditionally. There just aren’t enough indexers, for one thing. Plus, the level of precision that’s appropriate for indexing books is simply not sustainable, or even necessary, for much of the content out there. Content producers can often get more bang for their buck by structuring information well, designing solid navigation in apps and websites, and providing intelligent search than they can by creating traditional indexes. Dealing with this vast quantity of information requires new thinking, but it also requires the knack for context and analysis that indexers already possess. So how do we apply those skills in a changing market?
Seth sees three approaches to dealing with the unscalability challenge:
Let go of those details that take time to index but in all likelihood are rarely used by readers (akin to the “long-tail” or obscure products on internet stores that few, if any, customers will buy). Focus instead on the main topics that will serve the majority of users. This works best if the text is well structured and thesaurus-guided search is also available.
Let authors, data producers, and consumers do the tagging. Folksonomies have some advantages, but also present many governance issues.
Automate where it makes sense. The options involve varying degrees of human involvement and include semi-automatic indexing, statistical and rules-based autoclassification, and entity extraction. Though each has its limitations, these tools are becoming very powerful. Forward-thinking indexers will look for ways to automate parts of the process as appropriate to specific content and business situations. The key to using automation effectively is what Seth calls the “global semantic backbone”: taxonomy, ontology, content specialization, cultural knowledge, human factors, and oversight. An understanding of the global semantic backbone is exactly what indexers and taxonomists can bring to the table.
As the publishing industry changes, indexers are involved earlier and earlier in the book production process, whether we’re indexing while the text is being copyedited, or embedding tags while the text is still in development. When we work with content that gets delivered as electronic bits and pieces rather than bound books, indexers have the opportunity to be involved even earlier in the content production process. Information architecture and taxonomy get designed up front, and sometimes even influence the content itself.
Adapting to the changing content market requires looking objectively at how we can meet current information and business needs. Less detail and more automation may initially be at odds with indexers’ love of precision. But for those who are willing to think outside the index, the future holds exciting opportunities.