Versions and Conversions in the Historical Thesaurus of English

Posted by Marc Alexander on the 23rd of March 2015

This post was written by Dr Fraser Dallachy, Research Associate on the SAMUELS project and Project Assistant on Mapping Metaphor, alongside Dr Marc Alexander.

At the core of the Mapping Metaphor project is the data structure of its parent project, the Historical Thesaurus of English. Whilst Metaphor overlays its own numbering system on the Historical Thesaurus data, it is still essential for users to be able to refer back to the Thesaurus. One of the major differences between the Thesaurus and a Tyrannosaurus is that the former is not fossilised, and so it is important that users are aware of the ways in which this living resource has changed since publication of the print edition in 2009.

The hierarchy of the Historical Thesaurus, as is often noted, is the result of decades of consideration, organisation and re-organisation. However, there is always scope for improvement, and there were aspects of the structure as it stood when the print Thesaurus was produced which have since been adjusted. The current version of the Thesaurus, as available online, is numbered 4.2.2. Version 1 is the printed paper copy, whilst version 2 constituted a change in the database structure in which the data was stored rather than alteration of the hierarchy or numbering. Version 3 took the important step of standardising the numbering system by removing ‘00’ digits which were the legacy of spot-fixes for larger numbering difficulties. However, more substantial changes were felt to be necessary for the fourth version.

One of the areas considered to be in need of improvement was 01.02 Life. 01 The world is, structurally speaking, the largest section in the Thesaurus, containing 121,000 out of its 235,000 categories and subcategories. Within this already large section, Life contained 53,000 categories/subcategories – that is to say almost half of the total for section 01, and roughly a fifth of the entire Thesaurus! This reflects the richness and diversity of the living world, which has provided generations of biologists with occupation and pleasure (and occasionally madness) in the grand project to arrange the planet’s flora and fauna taxonomically.
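The proportions quoted above can be checked with a little arithmetic. The Python sketch below uses only the rounded figures given in this post, so its results are approximate; the constant and function names are ours, chosen purely for illustration.

```python
# Rounded counts of categories and subcategories, as quoted in the post.
THESAURUS_TOTAL = 235_000   # whole Thesaurus
SECTION_01_TOTAL = 121_000  # section 01
OLD_LIFE_TOTAL = 53_000     # 01.02 Life before it was split


def share(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`, to one decimal place."""
    return round(100 * part / whole, 1)


life_of_section_01 = share(OLD_LIFE_TOTAL, SECTION_01_TOTAL)
life_of_thesaurus = share(OLD_LIFE_TOTAL, THESAURUS_TOTAL)
section_01_of_thesaurus = share(SECTION_01_TOTAL, THESAURUS_TOTAL)

print(f"Life as a share of section 01:      {life_of_section_01}%")
print(f"Life as a share of the Thesaurus:   {life_of_thesaurus}%")
print(f"Section 01 as a share of the whole: {section_01_of_thesaurus}%")
```

On these rounded figures the old Life category comes out at about 44% of section 01 and a little under 23% of the Thesaurus as a whole.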

The ungainly mass of 01.02 was more than just cumbersome; it increased the effort required to find concepts within this almost catch-all category, and caused essential everyday ideas such as ‘cat’ and ‘dog’ to be buried remarkably far down the hierarchy compared to concepts of similar importance elsewhere in the Thesaurus. The solution was to split Life into seven new level two categories: 01.02 Life, 01.03 Health and disease, 01.04 People, 01.05 Animals, 01.06 Plants, 01.07 Food and drink, and 01.08 Textiles and clothing. Raising these categories from level three to level two addresses the problem of their previous low status in the hierarchy.

The new 01.02 Life is more focussed on the process of living, rather than an amalgamation of things which can be considered to be alive and things which are associated with being alive. In particular, the categories Food and drink, Textiles, Clothing, and Cleanness sat uneasily alongside sections providing words for plants and animals. Re-evaluation of Cleanness also led to the conclusion that it made more sense to have this category located under (what is now) 01.09 Physical sensation, where it joins similar experiential categories such as Sleeping and waking, Sexual relations, and the bodily senses.

Equally awkwardly bundled were the categories contained within the former 01.05 Existence in space and time. Although coherent, this category gathered too much within itself. The possibility of extracting a portion and naming it Time and relative dimensions in space was considered and reluctantly discarded. The solution adopted was to split the old 01.05 into sensible discrete chunks: 01.11 Existence and causation; 01.12 Space; 01.13 Time; 01.14 Movement; and 01.15 Action.

Similarly, if on a smaller scale, the categories formerly 03.04.13 Law and 03.10.13 Trade and commerce were removed from 03.04 Authority and 03.10 Occupation respectively and promoted to level two categories in their own right. As with the Life categories, this had the double effect of reducing the size of the previously unwieldy Authority and Occupation, and giving more importance to Law and Trade and commerce (now renamed Trade and finance).
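The idea of a category's ‘level’ can be made concrete: in the codes quoted in this post, the level is simply the number of two-digit components separated by full stops. The Python sketch below is a minimal illustration of that reading, assuming main category codes only (it makes no attempt to handle subcategory numbering); the function names are hypothetical and not part of any Thesaurus software.

```python
from typing import Optional


def split_code(code: str) -> list:
    """Split a code such as '03.04.13' into its two-digit components."""
    parts = code.strip().split(".")
    for part in parts:
        # Assumption: every component is exactly two ASCII digits.
        if not (len(part) == 2 and part.isascii() and part.isdigit()):
            raise ValueError(f"not a main category code: {code!r}")
    return parts


def category_level(code: str) -> int:
    """Return the hierarchy level: '03' is 1, '03.04' is 2, and so on."""
    return len(split_code(code))


def parent_code(code: str) -> Optional[str]:
    """Return the code one level up, or None for a level one section."""
    parts = split_code(code)
    return ".".join(parts[:-1]) if len(parts) > 1 else None


# Law before and after its promotion, using codes quoted in the post.
print(category_level("03.04.13"), parent_code("03.04.13"))  # 3 03.04
print(category_level("03.05"), parent_code("03.05"))        # 2 03
```

Read this way, promoting Law from 03.04.13 to 03.05 moves it from level three to level two and changes its parent from 03.04 to section 03 itself.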

02 The mind has also been adjusted, though less extensively. 02.01.15 Attention, judgement has been moved up to level two, and is now 02.02. Now under its aegis are the categories formerly belonging to 02.04 Aesthetics, as they involved making judgements on what is or is not attractive. Also deserving of more importance was the categorisation of good and bad, meaning that the former 02.01.15.07.08 Quality of being good and 02.01.15.07.09 Badness/evil have been transferred to belong to the new category 02.03 Goodness/badness. As with the Life and ‘time and space’ categories, this also gives them a more prominent position in the hierarchy. The problematic 02.06 Refusal/denial is the only one to have had its importance downgraded, and is now to be found under 02.07.06 Statement.

The result of this movement of categories has been an expansion in the level two categories in 01 The world and 03 Society. 02 The mind, on the other hand, has contracted by one of these categories, taking it as far as 02.07. The level two categories in 01 previously extended to 01.07 The supernatural, but have more than doubled so that they now reach 01.17. Likewise, renumbering 03.04.13 and 03.10.13 as 03.05 and 03.12 has had a knock-on effect on the numbering of other level two categories in 03 Society.

It has not escaped our notice that these category changes may cause some concern, especially amongst users of the print copy of the Thesaurus. To address this, a version number converter has been added to the ‘Versions of the Thesaurus’ page, reached through the ‘About the Thesaurus’ tab on its menu bar. This allows a user to enter a category code from the print Thesaurus and receive the current code corresponding to that category, or vice versa.
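For readers curious about what such a converter involves, here is a minimal Python sketch of a two-way lookup. It is not the code behind the website's converter, whose implementation is not described here; the table holds only the three old and new code pairs stated explicitly in this post, and the names PRINT_TO_CURRENT, CURRENT_TO_PRINT and convert are hypothetical.

```python
from typing import Optional

# Print edition code -> current code. Only pairs stated in this post are
# listed; a real table would need an entry for every changed category.
PRINT_TO_CURRENT = {
    "02.01.15": "02.02",  # Attention, judgement
    "03.04.13": "03.05",  # Law
    "03.10.13": "03.12",  # Trade and commerce (now Trade and finance)
}

# The reverse table is derived from the first, so the two cannot disagree.
CURRENT_TO_PRINT = {new: old for old, new in PRINT_TO_CURRENT.items()}


def convert(code: str, to_current: bool = True) -> Optional[str]:
    """Look up a code in the chosen direction.

    Returns None when the code is not in this small illustrative table.
    """
    table = PRINT_TO_CURRENT if to_current else CURRENT_TO_PRINT
    return table.get(code.strip())


print(convert("03.04.13"))                 # 03.05
print(convert("03.12", to_current=False))  # 03.10.13
print(convert("01.05"))                    # None: not in this table
```

The direction has to be stated explicitly because the same code can name different categories in different versions: 01.05 was Existence in space and time in the print edition, but is Animals in the current one.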

The balancing of certain sections in this way makes for a more satisfying structure. Thesaurus Linguae Anglicanae has reached a point of equilibrium in its evolution for the moment, and no further changes are expected to its structure at present – we’re happy with the length of its arms. It could still do with some feathers, though... The exciting prospect of updating its contents is, however, a story for another day.