Abbreviation on Demand

A Crowdsourced Approach to Building an API for Abbreviating Long Words in Labels

Contributors

Mariana Shimabukuro and Christopher Collins

About

This project is the product of my M.Sc. thesis, defended in 2017. The problem we set out to solve is that, in visualization, text labels are sometimes too long to fit in the available space. Common solutions include shrinking the font size, which can cause legibility problems, or applying techniques such as dropping vowels and/or truncating words to make them fit. The picture below gives a visual comparison of these known techniques against the Abbreviation on Demand technique.

Techniques comparison

A comparison of abbreviation techniques, including our Abbreviation on Demand algorithm. In this prototype, given a minimum font size, the text is progressively abbreviated to fit the available space. We compare the techniques against simply resizing the font, which can produce text that is very challenging to read. (Find this prototype at vialab.ca/abbrVisualization/resizablePrototype)

The Abbreviation on Demand technique is essentially a recommendation algorithm that uses data on which letters are less important. The least important letter can then be dropped automatically, and repeating this step creates an abbreviation. The abbreviations created by this technique aim to make the word fit in a given space (screen space in pixels, taking the typeface configuration into account) or within a given number of characters, while keeping as many letters as possible.
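The letter-dropping idea can be illustrated with a minimal Python sketch. The scoring function below is an assumption made purely for illustration (vowels cost less to drop than consonants, and the first letter is never dropped); the actual technique derives letter importance from the crowdsourced study data. The names `drop_cost` and `abbreviate` are hypothetical and are not part of the Abbreviation on Demand API.

```python
VOWELS = set("aeiou")


def drop_cost(word: str, i: int) -> float:
    """Hypothetical cost of dropping word[i]; lower means safer to drop.

    Stand-in heuristic only: the real algorithm learns letter importance
    from study data, based on the character and its position.
    """
    if i == 0:
        return float("inf")  # assumption: always keep the first letter
    cost = 1.0 if word[i].lower() in VOWELS else 2.0
    if i == len(word) - 1:
        cost += 1.0  # assumption: the last letter is a little more important
    return cost


def abbreviate(word: str, max_chars: int) -> str:
    """Drop the cheapest letter repeatedly until the word fits max_chars."""
    letters = list(word)
    while len(letters) > max(max_chars, 1):
        current = "".join(letters)
        # On ties, prefer the rightmost candidate (-i sorts it first).
        idx = min(range(len(letters)), key=lambda i: (drop_cost(current, i), -i))
        del letters[idx]
    return "".join(letters)


if __name__ == "__main__":
    print(abbreviate("visualization", 6))  # vslztn
    print(abbreviate("label", 3))          # lbl
```

Fitting to a pixel width instead of a character count would only change the loop condition: rather than comparing lengths, the loop would measure the rendered width of the current string with the chosen typeface.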

The data fed into the algorithm was collected in an adaptive crowdsourced study, which gathered abbreviations for given English words and original words for given abbreviations. By adaptive crowdsourced study we mean that we had a preset list of words to be abbreviated, and we used a ranking algorithm to select a relevant abbreviation for a word when participants moved on to the task of guessing the original word from an abbreviation. The rationale is that several abbreviations were created for each word, and in order to test them without running two separate studies we had to create a way to select the relevant ones. Relevance was calculated from the similarity of the abbreviation to the original word, plus some other factors for de-biasing the algorithm.
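As a rough illustration of the selection step, the sketch below ranks candidate abbreviations by their string similarity to the original word. It is an assumption-laden simplification: it uses Python's standard `difflib.SequenceMatcher` as the similarity measure and ignores the de-biasing factors, neither of which is confirmed to match the ranking algorithm used in the study. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(abbreviation: str, word: str) -> float:
    """Similarity in [0, 1] between an abbreviation and the original word.

    SequenceMatcher is a stand-in measure chosen for this sketch.
    """
    return SequenceMatcher(None, abbreviation.lower(), word.lower()).ratio()


def rank_candidates(word: str, candidates: list[str]) -> list[str]:
    """Return unique candidates, most similar to `word` first."""
    unique = {c.lower() for c in candidates if c}
    # Sort by descending similarity, then alphabetically for stable ties.
    return sorted(unique, key=lambda c: (-similarity(c, word), c))


if __name__ == "__main__":
    collected = ["info", "inf", "infrmtn", "xyz", "info"]
    for abbr in rank_candidates("information", collected):
        print(f"{abbr}: {similarity(abbr, 'information'):.2f}")
```

In a study like the one described, the top-ranked candidates would be the ones shown to participants in the decoding task.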

The adaptive study design is what allowed us to run both tasks in the same study. However, the same results could have been achieved if we had manually selected the abbreviations using the same criteria.

Finally, the performance of our Abbreviation on Demand API can be improved by feeding it more data about how abbreviations are created: the more data it has, the better the algorithm can abbreviate different words. Our next step is therefore to collect more abbreviations to feed into the Abbreviation on Demand API. To validate our approach, we also plan to run an evaluation comparing the performance of our technique against the previously mentioned techniques.

We can see below some other examples of the Abbreviation on Demand technique being applied to visualizations.

Docuburst text visualization. In the top part of the figure, highlighted by a white rectangle, the nodes in blue were abbreviated. Compared with the bottom part, where the Abbreviation on Demand API was not applied, the top part shows less clutter.

A visualization of the DMOZ dataset. Each of the yellow highlighted labels has been abbreviated by our algorithm, which drops as many letters as needed to fit the text. It chooses the least important letter based on the character and its position within the word.

Thesis Abstract

A known problem in information visualization labeling is when the text is too long to fit in the label space. There are some commonly known techniques used in order to solve this problem like setting a very small font size. On the other hand, sometimes the font size is so small that the text can be difficult to read. Wrapping sentences, dropping letters and text truncation can also be used. However, there is no research on how these techniques affect the legibility and readability of the visualization. In other words, we don’t know whether or not applying these techniques is the best way to tackle this issue. This thesis describes the design and implementation of a crowdsourced study that uses a recommendation system to narrow down abbreviations created by participants allowing us to efficiently collect and test the data in the same session. The study design also aims to investigate the effect of semantic context on the abbreviation that the participants create and the ability to decode them. Finally, based on the study data analysis we present a new technique to automatically make words as short as they need to be to maintain text legibility and readability.

The Abbreviation on Demand API

Based on this project, we implemented and published an API that lets other programmers use our abbreviation algorithm in their web applications.

API available at: https://abbreviation.vialab.ca

GitHub project: https://github.com/vialab/Abbreviation-On-Demand-API

Demo and Supplemental Materials

For demos of our Abbreviation on Demand algorithm and visualizations of our study data, visit: http://vialab.science.uoit.ca/abbrVisualization/

Publications

2017

Shimabukuro, Mariana: An Adaptive Crowdsourced Investigation of Word Abbreviation Techniques for Text Visualizations. 2017, (Master’s Thesis).
Shimabukuro, Mariana; Collins, Christopher: Abbreviating Text Labels on Demand. 2017, (Poster Paper).
Shimabukuro, Mariana; Collins, Christopher: Abbreviating Text Labels on Demand. 2017, (Poster).