Insight Series: What is digital data?

Misia Tramp

09 October 2023

The Metia Insight team mostly work directly with client marketers, rather than market research professionals. This means we hear all their challenges firsthand and so can design the best research approaches and methods to answer them. But often we need to explain the techniques and methods we recommend. Some questions we get asked a lot, so we thought it helpful to share the answers in a series of posts. 

To kick off, we thought it best to start at the source – the data source, that is. At Metia, in addition to traditional market research methodologies such as surveys, in-depth interviews, and focus groups, we leverage digital data and data science techniques to gain differentiating insight into our clients’ target audiences. 

In this post, we explore the following key questions:  

  1. What is digital data and how do we collect it? 
  2. What are the benefits of digital data in discovering market insights? 
  3. What are the limitations of digital data? 

What is digital data and how do we collect it? 

In the context of our insight activities, put simply, digital data is any content, conversation, or comment that individuals create and/or share online. We collect it from the online sources where our target audiences engage in context-relevant conversations, including social media platforms (e.g., X/Twitter, Facebook), blogs and forums (e.g., Reddit), and website reviews and comments (e.g., news article comments). This source data spans text, images, and video.

At Metia, we use social monitoring and analytics platforms to help us access, collect, filter and analyze conversations in the marketplace. In addition to their wide reach, these platforms also enable us to use a combination of keywords, audience profiling, and spam filters to narrow our search efficiently and effectively to the conversations and topics that are relevant to our research questions, rather than being confronted with every digital reference point available. 
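As a rough illustration of the narrowing step described above, the sketch below keeps only mentions that match a keyword list and drops anything containing a spam marker. It is a minimal example under stated assumptions, not a description of how any particular monitoring platform works: the keyword list, the spam markers, and the function names are all hypothetical, and audience profiling is left out.

```python
import re

# Hypothetical keyword and spam lists, chosen for illustration only.
KEYWORDS = {"crm", "customer data"}
SPAM_MARKERS = {"buy now", "free followers", "click here"}


def is_relevant(text, keywords=KEYWORDS, spam_markers=SPAM_MARKERS):
    """Return True if text mentions a keyword and contains no spam marker."""
    lowered = text.lower()
    # Reject anything that contains a known spam phrase.
    if any(marker in lowered for marker in spam_markers):
        return False
    # Match keywords on word boundaries so "crm" does not match "scream".
    return any(
        re.search(r"\b" + re.escape(keyword) + r"\b", lowered)
        for keyword in keywords
    )


def filter_mentions(mentions):
    """Keep only the mentions that pass the keyword and spam checks."""
    return [mention for mention in mentions if is_relevant(mention)]


mentions = [
    "Our CRM migration took three months longer than planned.",
    "Click here for a free CRM trial, buy now!",
    "Lovely weather at the conference today.",
    "Struggling to unify customer data across regions.",
]
relevant = filter_mentions(mentions)
```

Here `relevant` keeps the first and last mentions: the second is rejected as spam and the third matches no keyword. In practice, keyword lists and spam rules are refined iteratively against the research question.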

Using platforms to gather source data removes the burden of privacy compliance and enables us to gather data in massive volumes quickly and efficiently. Once the data is gathered and filtered, we apply our own proprietary data science techniques and tools to derive the insights our clients seek.

What are the benefits of digital data in discovering market insights? 

In employing digital data to answer market research questions, we can benefit from the following methodological strengths:  

  1. Larger sample sizes: As previously mentioned, our digital listening platforms allow us to narrow digital data collection to a relevant domain and context to answer our research questions. However, the sheer volume of digital conversations enabled by machine-led data collection remains a strength for digital data, as our datasets often range in the tens to hundreds of thousands of relevant mentions. These larger sample sizes translate to widely encompassing datasets that can more completely represent the flavours and nuances of the target audience’s perspective. 
  2. Quicker time to insight = quicker time to value: Traditional market research methods and data inherently lag audience trends, as comprehensive survey, interview, or focus group programs take weeks or even months to field. Then, additional time is required to analyze the data for insights. In today’s ever-accelerating digital age, digital data can be collected in near-real-time and analyzed using advanced natural language processing (NLP) techniques, meaning insights can be discovered and actioned faster than with other methods.
  3. Indirect observation minimizes bias: Digital data is collected from a third-party, observational perspective, meaning that the bias introduced by the researcher during data collection is minimized. In other words, whatever the target audience members en masse are willing to share digitally in their natural online environments is likely to be more honest, earnest, and forthcoming than what they may be willing to vocalize when talking to a human researcher or responding to a structured survey. 

What are the limitations of digital data (and how do we mitigate them)?

As with any data type and research methodology, digital data has its limitations. The main limitations we encounter relate to context relevance and, increasingly, the authenticity of authors, sources, and information. To minimize the impact of irrelevant and inauthentic conversations, we employ a rigorous data cleaning process and in-house tools to ensure that datasets are at least 85% relevant to the research domain, context, and topic.
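To make the 85% figure concrete, one simple way to check a threshold like this is to hand-label a sample of mentions as relevant or not and compute the share that is relevant. The sketch below assumes that approach. The function names and the sample labels are hypothetical, and the post does not say how the relevance figure is actually measured.

```python
def relevance_rate(labels):
    """Share of labelled mentions marked relevant (True)."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(1 for label in labels if label) / len(labels)


def meets_threshold(labels, threshold=0.85):
    """Check whether the labelled sample reaches the relevance threshold."""
    return relevance_rate(labels) >= threshold


# Hypothetical sample: an analyst labels 20 mentions, 18 of them relevant.
sample_labels = [True] * 18 + [False] * 2
rate = relevance_rate(sample_labels)
passes = meets_threshold(sample_labels)
```

With 18 of 20 mentions labelled relevant, the sample clears the 0.85 threshold. A sample with only 16 of 20 relevant would not, which would signal that more cleaning is needed before analysis.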

To supplement human-led data cleaning and ensure our large datasets are as relevant as possible, we also leverage data science-led topic modelling to catch and exclude irrelevant commentary. In addition to helping our analysts begin to understand which topics and priorities matter most to target audiences in relevant conversations, topic modelling surfaces irrelevant or spam content so it can be excluded prior to analysis, insights, and interpretation.

To learn more about topic modelling, and how it helps Metia generate insights from millions of digital data conversations, look out for the next blog in our Metia Insight Series: What is a Topic Model?