17 October 2023
In the first post in this series, we described how and why Metia uses large data sets to understand the audiences that are important to our clients. Gathering online data at scale provides valuable insights, but it can be challenging to make sense of the vast volumes of data gathered – often we are working with millions of items of data. That’s when advanced topic modelling techniques need to be applied.
Topic modelling is a technique used to extract a list of meaningful topics that appear within a large dataset. In a typical client project, topic modelling helps identify audience priorities and language so that our clients can most effectively understand and address the needs and interests of their customers.
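Metia does not say which algorithm sits behind its topic models. As an illustration only, the sketch below assumes Latent Dirichlet Allocation (LDA). LDA is the classic topic model in which each document is a mixture of topics and each topic is a mixture of terms. Here it is fitted with collapsed Gibbs sampling in plain Python, and it learns topics from which words keep turning up together in the same documents. The function names `fit_lda` and `top_terms`, the hyperparameter values and the toy corpus are hypothetical.

```python
import random


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Hypothetical helper: fit LDA to tokenised documents by collapsed Gibbs sampling.

    Returns (vocab, theta, phi): theta[d][k] is the share of topic k in
    document d, and phi[k][v] is the weight of term vocab[v] in topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    corpus = [[index[w] for w in doc] for doc in docs]

    # Count tables: topics per document, terms per topic, tokens per topic.
    n_dk = [[0] * K for _ in corpus]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K

    # Start by assigning every token to a random topic.
    z = []
    for d, doc in enumerate(corpus):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(corpus):
            for i, w in enumerate(doc):
                # Take this token's current assignment out of the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # A topic is likely if it is common in this document AND this
                # term is common in that topic: word co-occurrence drives learning.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    # Document-topic mixtures (each row sums to 1).
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(corpus)
    ]
    # Topic-term mixtures (each row sums to 1).
    phi = [
        [(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
        for k in range(K)
    ]
    return vocab, theta, phi


def top_terms(vocab, phi, n=3):
    """Hypothetical helper: the n highest-weighted terms in each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in phi
    ]


if __name__ == "__main__":
    # Toy corpus (hypothetical), already lower-cased and tokenised.
    docs = [
        "cloud security data cloud security".split(),
        "data security compliance cloud".split(),
        "flour sugar butter oven flour".split(),
        "sugar oven butter recipe".split(),
    ]
    vocab, theta, phi = fit_lda(docs, num_topics=2)
    for k, terms in enumerate(top_terms(vocab, phi)):
        print(k, terms)
```

On real data, a library implementation such as scikit-learn's `LatentDirichletAllocation` or gensim's `LdaModel` would be the more usual choice. Real data would also need the cleaning, tokenisation and stop-word removal that this sketch skips.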
A topic model treats each document as a mixture of topics, and each topic as a mixture of terms. The topics are learned from which words tend to appear together in the documents of the dataset. The topic modelling process includes the following steps, described here in simplified form:
For example, in Metia’s B2B Directions data set, the following three topics are among the topics generated:
These topics illustrate two important facts about topic models that allow them to capture the complexity of how we speak online:
Topic modelling also calculates the prevalence of each topic, that is, the percentage of the dataset described by that topic. This allows us to quantify the importance of each topic across the whole dataset as well as within different subsets of the data. For example, we may investigate differences in conversation by audience, vertical, year, or thematic area.
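The sketch below assumes that a fitted model gives each document a set of topic proportions, as LDA does. One common way to compute prevalence is then to weight each document's proportions by its length, which gives the share of all words in the dataset attributed to each topic. Repeating the calculation within groups of documents gives the per-audience, per-vertical or per-year comparison. The function names and the toy numbers are hypothetical, and Metia's own definition of prevalence may differ.

```python
from collections import defaultdict


def topic_prevalence(theta, doc_lengths):
    """Hypothetical helper: percentage of all tokens attributed to each topic.

    theta[d][k] is document d's proportion of topic k (each row sums to 1);
    doc_lengths[d] is the number of tokens in document d.
    """
    total = sum(doc_lengths)
    num_topics = len(theta[0])
    return [
        100 * sum(row[k] * n for row, n in zip(theta, doc_lengths)) / total
        for k in range(num_topics)
    ]


def prevalence_by_group(theta, doc_lengths, groups):
    """Hypothetical helper: topic prevalence computed separately per subset.

    groups[d] labels document d, e.g. with an audience, vertical or year.
    """
    members = defaultdict(list)
    for d, label in enumerate(groups):
        members[label].append(d)
    return {
        label: topic_prevalence([theta[d] for d in idx], [doc_lengths[d] for d in idx])
        for label, idx in members.items()
    }


if __name__ == "__main__":
    # Toy example (hypothetical): three documents, two topics, two years.
    theta = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
    lengths = [100, 50, 50]
    print(topic_prevalence(theta, lengths))
    print(prevalence_by_group(theta, lengths, ["2022", "2023", "2023"]))
```

Weighting by length means a long document counts for more than a short one. Taking a plain average of each document's proportions is the alternative if every post should count equally.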
Metia uses topic models so that our clients can align their content, messaging, and commercial strategies to the topics and language that matter most to their target audiences, improving marketing performance and providing a better, more satisfying experience for customers.
Insight Series
This post is one of a series in which the Metia Insight team explain the various tools, systems, research techniques and methods we use to help answer the challenges set by our clients.
Read the first blog in the series: What is Digital Data