We taxonomists have long argued that a taxonomy of
disambiguated concepts tagged to content retrieves more accurate results than
search algorithms alone. But if users prefer simply entering text strings into
a search box and not browsing taxonomies, how best to support users with a
taxonomy can be a challenge.
A faceted taxonomy with taxonomy aspects as filters for
refining search results has become a common taxonomy solution, especially for
intranets, partner portals, and knowledge bases. For these purposes, certain
facets, such as Content type, Product/Service, Location, and Department, are
common and logical. When it comes to designating “Topics,” however, it’s
not so easy.
Specific Terms Gathered from Analysis
When gathering information and sources for terms, most sources
will yield highly specific terms. These include terms arising from search log
analysis, brainstorming sessions with sample users, automated text analytics
term extraction from a large corpus of content, and manual review of a
representative sample of documents/pages. These are all standard methods for
taxonomy design, which I conduct as a consultant.
The difficulty is that there are often so many specific
topics that the new topical taxonomy could potentially have many hundreds of
terms. Some may be relevant to only one or two documents or occur in only a
couple of searches out of thousands. Such terms would not serve the purpose of
refining searches.
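One way to spot such low-value candidates is a simple usage threshold. The sketch below is illustrative only: the term labels, the counts, and the threshold values are all made-up assumptions, not figures from a real project.

```python
from dataclasses import dataclass


@dataclass
class CandidateTerm:
    label: str
    doc_count: int      # documents the term would be tagged to (hypothetical)
    search_count: int   # times the term appeared in the search log (hypothetical)


def split_candidates(terms, min_docs=5, min_searches=10):
    """Separate terms used often enough to work as a filter from the rest.

    The thresholds are arbitrary assumptions; tune them for the content set.
    """
    keep, too_specific = [], []
    for term in terms:
        # A term must clear both thresholds to be worth keeping as a filter value.
        if term.doc_count >= min_docs and term.search_count >= min_searches:
            keep.append(term.label)
        else:
            too_specific.append(term.label)
    return keep, too_specific


candidates = [
    CandidateTerm("Expense reporting", doc_count=42, search_count=310),
    CandidateTerm("Parking permit renewal", doc_count=1, search_count=2),
    CandidateTerm("Onboarding", doc_count=27, search_count=150),
]
keep, too_specific = split_candidates(candidates)
print(keep)
print(too_specific)
```

Terms that fall below the thresholds are not necessarily discarded. They can still inform which broader categories are needed.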
Another problem is that many of the terms suggested from
these methods are not even topical. Often, the top searches found in search
logs of enterprise/intranet searches are for commonly used named tools,
platforms, or services.
The main issue, however, in deriving terms for a topical
facet/filter based on search terms is that the objective of the topical facet,
like all facets, is to limit searches, not to duplicate searches. What
is really needed in the topical facet are topical categories that are broader
than the search terms. How to identify these broader topical categories can be
more challenging.
Identifying Broader Topical Categories
Identifying broader terms or categories for topic filters is
not as simple as identifying specific search terms, nor as straightforward as
identifying the set of facets. The typical methods of obtaining candidate terms
from both users and the content still need to be used, but with a focus on identifying
broader terms or categories.
Categories from Stakeholder Engagement
Engaging stakeholders or other sample users in activities to
brainstorm taxonomy terms will result in a mix of specific and broad terms. It
is then the task of the taxonomist-facilitator to help guide the participants
to identify which terms are broader and which are narrower within the same topical
facet. Involving stakeholders/sample users is important, because if a single
taxonomist or an external consulting team tries to do this on their own, their
designated broader terms, while hierarchically correct, might not suit the
intended users. The taxonomist-facilitator may suggest broader terms and then
obtain immediate validation from the participants of the appropriateness of those
suggestions.
Categories from Content Analysis
Analyzing content for broad topics is more effectively done
manually than with automated methods. Manual content analysis will yield both
specific and potentially broader concepts. A taxonomist or content strategist
experienced in content analysis for identifying meaning will be able to determine
the main concept for a piece of content.
Automated methods, based on text analytics technologies,
tend to focus on term extraction, and will extract terms even more specific and
less useful than search log results. However, if a list of derived search terms
is large enough (as many search logs or automated term extraction lists tend to
be), another, newer option is to make use of LLM and
generative AI technologies to categorize the specific terms and thus generate
broader terms. The LLMs should be trained on the same or similar content, which
is internal enterprise content, not the public web, to provide the correct
context. Even then, the identified broader terms or categories will not always
be correct and will require an experienced taxonomist to review.
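Whatever tool proposes the broader categories, the output is easier to review when it is grouped by category. The following is a minimal sketch, assuming the proposals are already available as term-to-category pairs. The `proposals` data is invented and merely stands in for model output, the `group_for_review` helper and its `min_terms` threshold are hypothetical, and no particular LLM API is implied.

```python
from collections import defaultdict

# Hypothetical (specific term -> proposed broader category) pairs, standing in
# for whatever an LLM or other tool suggested. None marks an uncategorized term.
proposals = {
    "expense report": "Finance",
    "travel reimbursement": "Finance",
    "vpn setup": "IT Support",
    "password reset": "IT Support",
    "parental leave": "Human Resources",
    "badge": None,
}


def group_for_review(proposals, min_terms=2):
    """Group terms by proposed category and flag what a taxonomist should check."""
    groups = defaultdict(list)
    for term, category in proposals.items():
        groups[category].append(term)
    # Terms with no proposed category need a manual decision.
    uncategorized = groups.pop(None, [])
    # Categories with very few terms may be too narrow to work as a filter.
    thin = sorted(c for c, terms in groups.items() if len(terms) < min_terms)
    return dict(groups), thin, uncategorized


groups, thin, uncategorized = group_for_review(proposals)
for category, terms in groups.items():
    print(category, terms)
print("Review (few terms):", thin)
print("Review (uncategorized):", uncategorized)
```

The flagged lists only tell the reviewer where to look first. Deciding whether a category is correct remains a human judgment.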
Other Topical Facets
Topical terms, however, do not all have to be in a single “Topics”
facet. Depending on the use case, there could be other topical facets, which
are not the usual named entities, departments, locations, or product/service
types. These could be for Function, Activity, Issue Type, Technology, Research
Field/Discipline, etc. Whether and how to break out these facets can be a challenge
and should involve extensive discussions or other research with stakeholders
and user representatives.
Finally, a topical facet for filtering search results could even
be based on the existing navigation menu’s top levels, especially on an intranet
or an enterprise content management system. Facets as filters are available to
refine searches only, but if users choose instead to navigate the site menu,
then they have no options to use other facets/aspects to help restrict what
they are looking for. By duplicating the navigation menu’s one or two top
levels into a facet, perhaps called “Topic Area,” users can limit a search with
the categories for the areas with which they are familiar, and they can also restrict
the search further by filtering on terms selected from any of the other facets.
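The combined filtering described above can be sketched in a few lines. This is an illustration under stated assumptions, not how any particular search platform implements facets: the result records, the facet values, and the `filter_results` helper are all hypothetical.

```python
def filter_results(results, selected):
    """Keep the titles of results that match every selected facet.

    `selected` maps a facet name to the set of values the user chose.
    Values within one facet are OR-ed; separate facets are AND-ed.
    """
    def matches(item):
        return all(
            item["facets"].get(facet) in values
            for facet, values in selected.items()
        )
    return [item["title"] for item in results if matches(item)]


# Hypothetical search results, each tagged with one value per facet.
results = [
    {"title": "Travel policy",
     "facets": {"Topic Area": "Finance", "Content type": "Policy"}},
    {"title": "Expense form",
     "facets": {"Topic Area": "Finance", "Content type": "Form"}},
    {"title": "Leave policy",
     "facets": {"Topic Area": "HR", "Content type": "Policy"}},
]

# Limit by the navigation-derived facet alone, then refine with a second facet.
by_topic = filter_results(results, {"Topic Area": {"Finance"}})
by_topic_and_type = filter_results(
    results, {"Topic Area": {"Finance"}, "Content type": {"Policy"}}
)
print(by_topic)
print(by_topic_and_type)
```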
I will be discussing the wider activity of coming up with
terms for a taxonomy in my upcoming Taxonomy Boot Camp presentation, “The complete guide to sourcing terms,” on November 18 in Washington, DC.