Standards for taxonomies are of two kinds:
1) data models for interoperability and machine-readability, namely SKOS
(Simple Knowledge Organization System) published by the W3C, and
2) best practices guidelines, which focus on thesauri but are relevant for
taxonomies. These are ANSI/NISO Z39.19 and ISO 25964. The International
Organization for Standardization will publish a revised edition of ISO 25964
Part 1: Thesauri for Information Retrieval later this year. I have been
contributing to the revision as a member of its international working group.
I have written before, at a high level, on Standards for Taxonomies, and I will write again
about the revisions in the new version of ISO 25964 Part 1 when it is
published. For now, I’d like to discuss some of the specifics of defining an
international standard that I have been working on recently.
Different Language Versions
The international standard is written in English, but it may
be translated into other languages in the future. Since it cannot be assumed that
it will be translated into any particular language, and since the standard covers
multilingual thesauri, it needs to include examples in different languages. Some
of the examples in ISO 25964-1 are translated into common languages, such as
French, German, and Spanish, but other languages are not included. Thus, the
standard also includes an extensive table of the “tags” and “expansions,” or the
terminology that appears in a thesaurus, for 10 additional languages. Examples include
BT (Broader Term), NT (Narrower Term), and SN (Scope Note).
A German reviewer pointed out some errors in the German column
of the table, which prompted me to look more carefully, and I noticed some
issues in the Russian and Arabic columns. These are languages I studied long ago,
and they are not represented by native speakers in our working group. I sought other
sources on thesauri in those languages, examples of thesauri on the web, and native-speaker experts.
As it turns out, for the specialized use of thesauri, it’s
not just a matter of translation, but of what is actually used in context. Scope note
could have various translations in a language, since both the words “scope” and “note”
can be translated in different ways. Even “broader,” “narrower,” and “related” can
be translated differently. “Broad” can mean “wide,” so perhaps “superordinate”
and “subordinate” are better translations in some languages.
Variations and Lack of Standards
The thesaurus terminology is quite standardized in English
and somewhat less so in other languages. Although the original ISO and German
DIN thesaurus standards go back to 1974 and 1972 respectively, these standards have
never been free and are actually rather expensive for the number of pages,
unlike the ANSI/NISO standard, which has been made freely available since 1974.
Thus, the free English-language standard from the United States has been the most
widely read and followed. Creators of other standards sometimes translate from
English inconsistently rather than relying on a standard in their own language.
There are different reasons for variations. Some thesaurus
authors prefer to use terminology closer to English, while others prefer to
use terminology that is more native, when near-synonyms exist. For example, in
Russian, “related” could be “assotsiativny” or “rodstvenny,” and “concept”
could be “kontsept” or “ponyatiya.” There is also the matter of saving space with
concise labels. While English has a single word for “broader” or “narrower,” a
correct translation of the comparative requires two words, as in “more broad”
or “more narrow,” in other languages, such as French, Spanish, and Russian. Often
the word for “more” is omitted to save space, but in other thesauri it is included
for precision. Arabic-language
thesauri vary in their use of tags and terminology depending on the region within
the large Arabic-speaking world.
I found the multilingual UNESCO Thesaurus and the UN library’s UNBIS Thesaurus good sources to consult, since you can switch not only the terms
but also the user interface, with its tags and designations, into different languages.
However, these two UN-related sources are not even consistent with each other!
I suspect that in some thesauri the terminology was simply
translated from English by a translator who was not familiar with thesauri, rather
than developed by a thesaurus specialist/taxonomist who would research the
formats of other thesauri in the language.
Legacy Standards and Future Direction
Thesauri were originally developed for print, where space is
at a premium, so short tags were created. Now thesauri are online, where tags are not
needed and are rarely used. But the new edition of the standard continues to include
tags in order to be comprehensive. I think the tags should at least be de-emphasized,
and we authors of the standard should not be creating them where they do not
exist.
Should the standard be more descriptive or prescriptive?
Descriptive would mean describing what is done in existing thesauri. I
looked up various thesauri online to see what tags and terminology they were
using. If a certain designation is used more than another, such as the phrase used
to mean “broader term,” then we could decide that is the standard for a
language.
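The descriptive approach amounts to a simple majority vote over surveyed thesauri. The sketch below is only an illustration of that reasoning, not a procedure the working group uses: the function name `choose_label`, the 50% cutoff `min_share`, and the survey data are all hypothetical. When no label clearly dominates, it returns `None`, which is the point where a prescriptive decision would be needed instead.

```python
from collections import Counter
from typing import Optional


def choose_label(observed: list[str], min_share: float = 0.5) -> Optional[str]:
    """Pick the most common designation, or None if usage is too inconsistent.

    observed: one label per surveyed thesaurus (hypothetical survey data).
    min_share: hypothetical cutoff; the winning label must be used by MORE
    than this share of the surveyed thesauri to count as the descriptive standard.
    """
    if not observed:
        return None  # nothing surveyed, so nothing to describe
    label, count = Counter(observed).most_common(1)[0]
    # Below the cutoff, no label dominates: fall back to a prescriptive choice.
    return label if count / len(observed) > min_share else None


# Hypothetical survey of Russian labels for "related"
print(choose_label(["assotsiativny", "rodstvenny", "assotsiativny"]))
print(choose_label(["assotsiativny", "rodstvenny"]))
```

With two of three hypothetical thesauri agreeing, the first call prints `assotsiativny`; an even split prints `None`, reflecting the inconsistency that makes a purely descriptive standard hard to apply.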
Prescriptive would mean dictating the standard, typically
based on expertise and belief in what would be best. In the face of inconsistencies,
the standard could be prescriptive. Being prescriptive would also mean that the
latest revision of the standard should follow the prior edition and any previous
translations of it, rather than merely following the usage practice of
leading examples of thesauri on the Web.
Although the distinction between terms and concepts
is addressed in the current ISO thesaurus standard, the current summary table of tags
addresses only “terms” and term relationships. The table of tags and
terminology in the new version will now include Broader concept, Narrower concept,
and Related concept (which do not have tags). As new additions to the standard,
the names for these in other languages thus need to be prescribed by the
standard. Relying on thesauri published on the web, I found, results in too
much inconsistency. Official translations of the SKOS data model are a good source but exist for only some languages. I
even looked at the user interface of a SKOS-based taxonomy management software
(PoolParty) in German and found yet other translations for broader, narrower,
and related.
I hope the new edition of ISO 25964 Part 1: Thesauri for Information
Retrieval will be read more widely and provide more consistency for
thesauri.