Using automated text analysis in an evaluation project

The problem

In a recent evaluation project, I had to analyse records of nearly 2,000 activities undertaken by a large public-sector organisation. As well as numeric and categorical data, each activity had a field of descriptive text about its scope and objectives, written by people involved in it. This text ranged from a few sentences to a few pages, so it wasn’t feasible to review it manually for every activity.

Each activity was already assigned to a primary category relating to its intent, but from reading the descriptive text for a few activities, it seemed that some of them cut across several categories or had wider impacts than the somewhat limited set of pre-defined categories could capture. I wondered whether the descriptive text could be used to place the activities into additional categories to complement the existing classification.

Applying automated text analysis

One use of automated text analysis (or natural language processing, NLP) is topic classification. This involves assigning blocks of text to one or more pre-defined topics based on whether the text relates to each topic. More specifically, a topic classification algorithm estimates the probability that a block of text relates to a given topic; if this probability is high enough, we can treat the topic as relevant to that text.

In technical terms, this can be framed as a zero-shot learning problem: a pre-trained language model is given blocks of text and topics it has not seen before and asked to estimate the probability that each block of text relates to each topic. In this project it wasn’t feasible for me to train a specialised language model myself, so I needed a generic ‘off-the-shelf’ model.

Setting up the model

For this task I selected the bart-large-mnli model, which is based on the bart-large English language model originally developed by Facebook. Without getting into too much detail, the model embodies some semantic ‘understanding’ of English, obtained by training it on large amounts of text, which allows it to recognise topics or concepts even when they are expressed in different ways. Topic classification with this type of model is therefore typically much more accurate than simply searching for keywords, because it can accommodate the variety of ways that concepts can be written in natural language.
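
To make this concrete, here is a minimal sketch of loading the model as a zero-shot classifier in Python. It assumes the Hugging Face transformers library, where the model is published under the identifier facebook/bart-large-mnli; the tooling used in the actual project may have differed.

    from transformers import pipeline

    # Zero-shot classification pipeline backed by bart-large-mnli.
    # Assumes the Hugging Face `transformers` library; "facebook/bart-large-mnli"
    # is the usual hub identifier for this model.
    classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")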

The bart-large-mnli model has been further trained on the MultiNLI dataset of ‘premise-hypothesis’ sentence pairs, which helps it to recognise whether one sentence implies another. Topic classification is then a matter of posing topics as hypotheses to be tested against the available text: the model returns the estimated probability that the text implies the topic hypothesis. I used a probability threshold of 0.5 to decide whether a block of text related to a topic. This worked well in my application, but in general it may be necessary to test different thresholds to achieve a good level of accuracy.
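
Continuing that sketch, scoring a block of text against some candidate topics and applying the 0.5 threshold might look like the following. The activity description and topic labels are invented purely for illustration.

    # Invented activity description and topic labels, for illustration only.
    text = (
        "This activity will upgrade regional transport links and is expected "
        "to improve access to local employment and training opportunities."
    )
    topics = ["transport", "employment", "education and training", "health"]

    # multi_label=True scores each topic independently, so several topics can
    # pass the threshold for the same block of text.
    result = classifier(text, candidate_labels=topics, multi_label=True)

    threshold = 0.5
    selected = [
        label
        for label, score in zip(result["labels"], result["scores"])
        if score >= threshold
    ]
    print(selected)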

The results

Based on the objectives of the evaluation project and my knowledge of the types of activities being evaluated, I created a list of suitable topics, expressed as single words or short phrases. For topics that were harder to define, I used several words or phrases to describe them and combined the detection results by taking the highest probability across all the words or phrases in each group of related topics.
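
A minimal sketch of this grouping step, reusing the classifier from the earlier snippets, is shown below; the topic groups are hypothetical rather than the ones used in the project.

    # Hypothetical topic groups: each hard-to-define topic is described by
    # several alternative phrases.
    topic_groups = {
        "community wellbeing": [
            "community wellbeing",
            "quality of life",
            "social cohesion",
        ],
        "environment": ["environment", "sustainability"],
    }

    def detect_topics(text, topic_groups, classifier, threshold=0.5):
        """Return the topic groups whose best-scoring phrase passes the threshold."""
        phrases = [p for group in topic_groups.values() for p in group]
        result = classifier(text, candidate_labels=phrases, multi_label=True)
        scores = dict(zip(result["labels"], result["scores"]))
        # Take the highest probability across the phrases in each group.
        return [
            group
            for group, group_phrases in topic_groups.items()
            if max(scores[p] for p in group_phrases) >= threshold
        ]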

To test the accuracy of the classification, I randomly sampled some of the activities from the dataset and manually compared the descriptive text against the topics selected by the automated analysis. The automatically selected topics matched my own choices about 80% of the time, which gave me enough confidence to use them for further analysis.
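
A rough sketch of that kind of spot check is below, using a couple of placeholder records in place of the real dataset; the field names, sample size, and example descriptions are all assumptions made for illustration.

    import random

    # Placeholder records standing in for the real dataset; in practice the
    # manual_topics field would be filled in during the manual review.
    activities = [
        {"description": "Community garden and local wellbeing programme.",
         "manual_topics": ["community wellbeing"]},
        {"description": "Tree planting to improve air quality in the town centre.",
         "manual_topics": ["environment"]},
    ]

    sample = random.sample(activities, k=min(50, len(activities)))

    # Count how often the automated topics match the manually chosen ones.
    matches = sum(
        set(detect_topics(a["description"], topic_groups, classifier))
        == set(a["manual_topics"])
        for a in sample
    )
    print(f"Automated and manual topics agree for {matches / len(sample):.0%} of the sample")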

As I had suspected, a substantial fraction of activities cut across more than one topic, and the primary category originally assigned to each activity was not always the best description of what it involved. This was quite interesting for the client, and prompted some thinking about how the classification of activities could be improved.

More text analysis tools for evaluation

There are other automated text analysis techniques that can be applied in evaluation or other research projects, including sentiment analysis, summarisation, and topic modelling. I’m working on a brief guide to these methods for practitioners, which I hope to share soon.