The Dangers of Small Data Sets for Decision-Making

Small Data Sets and Big Decisions

A Small Data/Correlation Confusion Story

A marketing director walks into a CEO’s office. Tucked under the director’s arm is a company notebook: an enthusiastic report packed with the requisite images, spreadsheets, charts, and graphs, all built to wow a CEO. Ah, visuals! Enthusiasm spills from the glossy pages as the director exclaims,

“We’ve increased traffic to the site from social media networks 400% this quarter! Plus, our social media trend lines tell us that we need to expand our target market to include an even older demographic.”

The director shows the CEO charts and graphs that visually lend credence to her claims. The CEO, thrilled to see these lines head north, says, “Well, that’s fantastic! We should move budget to support those efforts!”

Waiting for the punchline? The only punch here lands on the CEO’s budget.

What the CEO doesn’t realize is that a 400% increase could mean going from 20 visits to 100 visits. The social media data showing more older adults finding the website via social media, especially Facebook, also has other variables to consider. Baked into that data is the fact that younger users are leaving Facebook, while users age 55 and older continue to join the platform.
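To make the arithmetic concrete, here is a minimal sketch of a percent-change calculation. The function name `percent_change` and the visit counts are illustrative assumptions, not figures from any real report.

```python
def percent_change(before: float, after: float) -> float:
    """Return the percentage change from `before` to `after`."""
    if before == 0:
        raise ValueError("percent change is undefined when the starting value is 0")
    return (after - before) / before * 100


# Hypothetical visit counts, chosen only to illustrate the point.
small_site = percent_change(20, 100)            # 80 extra visits
large_site = percent_change(20_000, 100_000)    # 80,000 extra visits

print(f"small site: +{small_site:.0f}% ({100 - 20:,} more visits)")
print(f"large site: +{large_site:.0f}% ({100_000 - 20_000:,} more visits)")
```

Both lines report the same percentage, yet one represents 80 extra visits and the other 80,000. A related slip is worth watching for: going from 20 visits to 80 is four times the traffic, but it is a 300% increase, not 400%.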

These trends are worth a discussion, but without more information they are hardly a solid basis for a strategic shift in target, budget, and resources. Why? Because the data set is too small, and the CEO and possibly the marketing director have fallen prey to the age-old problem of assuming a correlation is a cause.

Size Matters

Small Samples Yield Unreliable Results

The smaller your sample size, the more likely outliers — unusual pieces of data — are to skew your findings.

Sample size is a count of individual samples or observations in any statistical setting.

Small numbers reduce the accuracy and usefulness of your data. Rates and percentages are subject to random variation, and the smaller the count behind them, the more they fluctuate from one period to the next.

Context Is Everything

Percentages don’t mean a lot if you remove context. We recommend always looking at percentages with the corresponding hard numbers. Growth is relative. All those percentage increases might appear impressive until you assess the context.

Think About Contextual Questions

If you’re a small company and just getting into video, reporting that 80% of your Facebook visitors watched your Facebook video ad is a real coup! Way to go. That may be a huge accomplishment for you. But if there were only 5 visitors, that number sort of sucks, huh?
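One way to see how little "80% of 5 visitors" tells you is to put an uncertainty range around it. The sketch below uses the Wilson score interval, a standard formula for proportions. It assumes visitors behave like independent random draws, which real traffic only approximates, and the 400-of-500 comparison is a hypothetical added for contrast.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# 4 of 5 visitors watched the video (the 80% from the example above).
low_5, high_5 = wilson_interval(4, 5)
# Hypothetical larger audience with the same 80% rate.
low_500, high_500 = wilson_interval(400, 500)

print(f"4 of 5:     {low_5:.0%} to {high_5:.0%}")
print(f"400 of 500: {low_500:.0%} to {high_500:.0%}")
```

Under these assumptions, the plausible range for 4 of 5 spans from under 40% to over 95%, while the same rate measured on 500 visitors narrows to within a few points of 80%.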

Now let’s go a step further. Did your video have captions? No? That’s an important contextual question, since 85% of Facebook users consume video content with the sound off. Add captions, and you may increase views by as much as 12%. Which, in your case, would be like darn-near perfection. Now you need to get those overall numbers up, huh?

Placing too much emphasis on small data sets may lead to poor decisions about budget allocation. Just as important, you may miss important contextual questions that, when answered, will help you improve.

Digital marketing numbers fluctuate, and that’s okay! All data is important, but what matters most is how you look at your data and what you do with it.

Avoid Blowing Your Budget On Poorly Informed Strategies

So how do you avoid making big decisions based on small numbers? Shift the conversation away from impressive-looking percentages and focus on real numbers.

A client of ours recently asked for recommendations on how to adjust content on their website. We dove into their Google Analytics account and searched for the biggest opportunities for the company.

Unfortunately, their company only receives about 100 hits to their site each month. If you take away all spam traffic and the visits from the business’s own IP address, we’re down to about 80 hits. That’s barely any data to work with.

A data set this small does not justify recommending an expensive restructure of their site. Eighty visitors are simply not enough of a sample on which to base recommendations.
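For a rough sense of why 80 visitors falls short, the sketch below uses the textbook normal-approximation margin of error for a proportion at 95% confidence. It takes the worst case (a 50% rate) and treats visits as a simple random sample, which is an assumption. The function names are illustrative.

```python
import math


def worst_case_margin(n: int, z: float = 1.96) -> float:
    """Worst-case margin of error (as a fraction) for a rate measured on n visitors."""
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(0.25 / n)


def visitors_needed(margin: float, z: float = 1.96) -> int:
    """Smallest sample size whose worst-case margin of error is at most `margin`."""
    if not 0 < margin < 1:
        raise ValueError("margin must be between 0 and 1")
    return math.ceil(z**2 * 0.25 / margin**2)


print(f"80 visitors: about +/-{worst_case_margin(80):.0%}")
print(f"visitors needed for +/-5 points: {visitors_needed(0.05)}")
```

With about 80 visitors, any rate you measure could be off by roughly eleven percentage points in either direction. Getting that down to five points takes several hundred visitors.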

Beware Confusing Correlation with Causation

It doesn’t take a degree in marketing analytics to figure out that the more you grow a well-planned social media strategy, the more likely organic traffic to your website is to increase. But likes, clicks, and shares on social media don’t automagically turn into conversions. Engagements on social media might make you feel like a prom queen, but they might have little to do with what’s going down on your website.

Correlations May Or May Not Be Coincidental

When data is presented visually, it’s tempting to get pretty excited when the lines run roughly parallel on a graph. Those parallel lines might be wildly coincidental, like the way the divorce rate in Maine correlates with the per-capita consumption of margarine.

Correlation Example
Image Credit: http://www.tylervigen.com/spurious-correlations

Correlations Have Yet-To-Be-Tested Variables

Moz founder Rand Fishkin explains the difference between correlation and cause like this: “Correlation can help you predict what will happen. But finding the ‘cause’ of something means you can change it.”

Another way to define a correlation: a statistical relationship in which two variables tend to move together. No one really thinks margarine consumption is related to divorce. But a hopeful marketing director certainly wants to drill down to see what relationship (if any) there is when trend lines for website traffic and Facebook mentions run parallel.
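One quick way to drill down is to check whether two series still move together once their shared upward trend is removed. The sketch below compares the Pearson correlation of the raw weekly numbers with the correlation of their week-over-week changes. All of the numbers are invented for illustration, and this is a sanity check, not a test of causation.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def changes(series: list[float]) -> list[float]:
    """Week-over-week differences, which strip out a steady trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Hypothetical weekly figures; both simply grow over time.
facebook_mentions = [12, 15, 14, 18, 21, 20, 25, 27]
site_visits = [80, 84, 90, 89, 97, 102, 101, 110]

r_levels = pearson_r(facebook_mentions, site_visits)
r_changes = pearson_r(changes(facebook_mentions), changes(site_visits))

print(f"correlation of raw weekly numbers:   {r_levels:.2f}")
print(f"correlation of week-to-week changes: {r_changes:.2f}")
```

In this made-up data the raw series correlate above 0.9 because both rise over the same eight weeks, yet a good week for mentions is not a good week for visits. Two lines that climb together are enough to produce a parallel chart, even when neither one drives the other.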

Unlike Correlations, Cause And Effect Relationships Aren’t Random Accidents

According to Archana Madhavan, writing for Amplitude, you have to:

  1. Examine your correlations.
  2. Lay out the variables in your metrics.
  3. Figure out which variables you can control and change to meet your stated goals. This usually takes testing by controlling some variables and then measuring the different outcomes.
  4. Keep in mind most causations have lots of factors involved. Even so, knowing even a degree of causality is valuable.
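The testing in step 3 often takes the form of an A/B test: change one variable, hold the rest steady, and compare outcomes. As a minimal sketch, the two-proportion z-test below asks whether a difference in conversion rates is larger than chance alone would explain. The conversion counts are hypothetical, and the test assumes visitors were randomly split between the two versions.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z statistic, two-sided p-value) comparing two conversion rates."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("both groups need at least one visitor")
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (rate_b - rate_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the normal curve
    return z, p_value


# Same 10% vs. 20% conversion rates, measured on very different sample sizes.
z_small, p_small = two_proportion_z_test(4, 40, 8, 40)
z_large, p_large = two_proportion_z_test(400, 4000, 800, 4000)

print(f"40 visitors per version:   p = {p_small:.2f}")
print(f"4000 visitors per version: p = {p_large:.2g}")
```

With 40 visitors per version, a doubling of the conversion rate is still well within what chance could produce. With 4,000 per version, the same doubling is very unlikely to be an accident.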

Finding The Variables In Your Metrics Takes Good Metrics

Ben Yoskovitz, co-author of the book Lean Analytics, defines analytics “as the measurement of movement towards your business goals.” That requires good metrics, and he identifies four characteristics of good metrics:

Comparative: across time periods, user groups, or competitors.
Understandable: the people who use a metric should be able to understand it, so make sure it isn’t too complex.
A ratio or rate: ratios and rates are inherently comparative and make it easier to understand the health of your business.
Behavior changing: a good metric should immediately help you understand what to do differently. If not, it’s a bad metric. Vanity metrics are often bad metrics, because they’re not really actionable.

Takeaways

  1. If you are given a report showing only percentages, push back and ask for the hard numbers.
  2. If someone presents correlations as causes, balk.
  3. Get comfortable with asking for the metrics. Ask if these are the appropriate metrics to use. The findings may surprise you.
  4. Finally, ask questions and get explanations for numbers that look too good to be true.

Information Overload? We Can Help!

Yeah, we eat data and metrics for breakfast and then snack on them throughout the day. Need your information in smaller bites? We can serve that up, too. Or, if you’re at a point where you just need someone to talk to about your digital marketing goals, give us a call or shoot us a message. We can help you figure it out. Contact us!

Cynthia Powell
cynthia@tresemergroup.com

A copywriter by day, a novelist when the sun skips town. With an MA in history, she convinced teens for twenty-seven years that political science mattered. How? Client focus and engaging materials. She now applies that formula to marketing with creative, memorable, and audience-focused copy.