Lisa Morgan's Official Site

Strategic Insights and Clickworthy Content Development

Month: January 2017

Common Biases That Skew Analytics

How do you know if you can trust analytical outcomes? Do you know where the data came from? Is the quality appropriate for the use case? Was the right data used? Have you considered the potential sources and effects of bias?

All of these issues matter, and one of the most insidious of them is bias because the source and effects of the bias aren’t always obvious. Sadly, there are more types of bias than I can cover in this blog, but following are a few common ones.

Selection bias

Vendor research studies are a good example of selection bias because several types of bias may be involved.

Think about it: Whom do they survey? Their customers. What are the questions? The questions are crafted and selected based on their ability to prove a point. If the survey reveals a data point or trend that does not advance the company agenda, that data point or trend will likely be removed.

Data can similarly be cherry-picked for an analysis. Different algorithms and different models can be applied to data, so selection bias can happen there. Finally, when the results are presented to business leaders, some information may be supplemented or withheld, depending on the objective.
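Cherry-picking is easy to demonstrate in a few lines. The sketch below uses hypothetical satisfaction scores (invented for illustration) and compares the average over all respondents with the average over a hand-selected subset:

```python
# Hypothetical satisfaction scores (1-5) for all customers surveyed.
# The numbers are invented for illustration only.
all_scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]

# Cherry-picked sample: only respondents who scored 4 or higher.
happy_scores = [s for s in all_scores if s >= 4]

honest_avg = sum(all_scores) / len(all_scores)      # 3.2
biased_avg = sum(happy_scores) / len(happy_scores)  # 4.5

print(f"All respondents: {honest_avg:.1f}")
print(f"Selected subset: {biased_avg:.1f}")
```

Same data, same arithmetic — the only thing that changed is which rows were allowed into the analysis.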

This type of bias, when intentional, is commonly used to persuade or deceive. Not surprisingly, it can also undermine trust. What’s less obvious is that selection bias sometimes occurs unintentionally.

Confirmation bias

A sound analysis starts with a hypothesis, but never mind that. I want the data to prove I’m right.

Let’s say I’m convinced that bots are going to replace doctors in the next 10 years. I’ve gathered lots of research that demonstrates the inefficiencies of doctors and the healthcare system. I have testimonials from several futurists and technology leaders. Not enough? Fine. I’ll torture as much data as necessary until I can prove my point.

As you can see, selection bias and confirmation bias go hand-in-hand.


Outliers

Outliers are values that deviate significantly from the norm. When they’re included in an analysis, the analysis tends to be skewed.

People who don’t understand statistics are probably more likely to include outliers in their analysis because they don’t understand their effect. For example, to get an average value, just add up all the values and divide by the number of individuals being analyzed (whether that’s people, products sold, or whatever). And voila! End of story. Except it isn’t…

What if 9 people spent $100 at your store in a year, and the 10th spent $10,000? You could say that your average customer spend per year is $1,090. According to simple math, the calculation is correct. However, it would likely be unwise to use that number for financial forecasting purposes.
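The store example works out like this in code (a minimal sketch using Python’s standard library; the median is shown as one common robust alternative to the mean):

```python
from statistics import mean, median

# Nine customers spend $100 in a year; one spends $10,000.
spend = [100] * 9 + [10_000]

print(mean(spend))     # 1090 -- pulled up by the single outlier
print(median(spend))   # 100  -- unaffected by the outlier
print(mean(spend[:9])) # 100  -- the mean with the outlier removed
```

One big spender shifts the mean by an order of magnitude, which is exactly why that $1,090 figure is risky for forecasting.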

Outliers aren’t “bad” per se; they’re critical for use cases such as cybersecurity and fraud prevention. You just have to be careful about the effect outliers may have on your analysis. If you blindly remove outliers from a dataset without understanding them, you may miss an important indicator or the beginning of an important trend, such as an equipment failure or a disease outbreak.

Simpson’s Paradox

Simpson’s Paradox drives another important point home: validate your analysis. When Simpson’s Paradox occurs, trends at one level of aggregation may reverse themselves at different levels of aggregation. Stated another way, datasets may tell one story, but when you combine them, they may tell the opposite story.

A famous example is a gender-discrimination lawsuit filed against the University of California at Berkeley. At the aggregate level, one could “prove” that men were admitted at a higher rate than women. At the departmental level, the reverse proved true in some cases.
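A toy version of the paradox (with numbers invented for illustration, not the actual Berkeley figures) can be checked directly: one group can have the higher acceptance rate in every department yet the lower rate overall, because the groups apply in different proportions to departments with very different acceptance rates.

```python
# (applicants, admits) per department -- invented numbers.
men   = {"A": (100, 80), "B": (20, 2)}
women = {"A": (20, 18),  "B": (100, 20)}

def rate(applicants, admits):
    return admits / applicants

# Women have the higher acceptance rate in each department...
for dept in ("A", "B"):
    assert rate(*women[dept]) > rate(*men[dept])

# ...yet the lower acceptance rate in aggregate.
men_total   = rate(sum(a for a, _ in men.values()),   sum(x for _, x in men.values()))
women_total = rate(sum(a for a, _ in women.values()), sum(x for _, x in women.values()))
print(f"Men overall:   {men_total:.0%}")    # 68%
print(f"Women overall: {women_total:.0%}")  # 32%
```

The reversal happens because most women applied to department B, where almost everyone — of either gender — was rejected. Validating the analysis at more than one level of aggregation is what catches this.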

Emotional Analytics is Next. Are You Ready?

In the near future, more organizations will use emotional analytics to fine-tune their offerings, whether they’re designing games or building CRM systems. Already, there are platforms and software development tools that allow software developers to build emotional analytics into desktop, mobile, and web apps. In a business context, that can translate to mood indicators built into dashboards that show whether the customer on the phone or in a chat discussion is happy, whether the customer service rep is effective, or both — in real time.

Such information could be used to improve the efficiency of escalation procedures or to adapt call scripts in the moment. It could also be used to refine customer service training programs after the fact. In many cases, emotional analytics will be used in real time to determine how a bot, app, IoT device, or human should react.
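As a rough illustration of what a dashboard mood indicator might do behind the scenes, here is a minimal lexicon-based scorer for chat text. The word lists and scoring rule are invented for this sketch; real emotional analytics platforms use machine learning models trained on labeled voice and text data, not fixed keyword lists.

```python
# Hypothetical word lists -- invented for illustration only.
NEGATIVE = {"angry", "terrible", "refund", "cancel", "frustrated", "worst"}
POSITIVE = {"thanks", "great", "perfect", "helpful", "appreciate", "love"}

def mood(message: str) -> str:
    """Classify a chat message as 'happy', 'upset', or 'neutral'."""
    words = set(message.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "happy"
    if score < 0:
        return "upset"
    return "neutral"

print(mood("This is the worst service, I want a refund"))  # upset
print(mood("Thanks, that was really helpful"))             # happy
```

Even this crude version hints at the dashboard use case: score each incoming message, and surface a running mood indicator next to the conversation in real time.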

Although the design approaches to emotional analytics differ, each involves some combination of AI, machine learning, deep learning, neural nets, natural language processing, and specialized algorithms to better understand the temperament and motivations of humans. The real-time analytical capabilities will likely affect the presentation of content, the design of products and services, and how companies interact with their customers. Not surprisingly, emotional analytics requires massive amounts of data to be effective.

Emotion isn’t an entirely new data point, and at the same time, it is. In a customer service or sales scenario, a customer’s emotion may have been captured “for training purposes” in a call or in a rep’s notes. In the modern sense, emotions will be detected and analyzed in real time by software that is able to distinguish the nuances of particular emotions better than humans. Because the information is digital, it can be used for analytical purposes like any other kind of data, without transformation.

Voice Inflection

What people say is one thing. How they say it provides context. Voice inflection is important because in the not-too-distant future, more IoT devices, computing devices, and apps will use voice interfaces instead of keyboards, keypads, or gestures designed for mobile devices.

Because humans and their communication styles are so diverse, contextual information is extremely important. Demographics, personas, account histories, geolocation, and what a person is doing in the moment are just a few of the things that need to be considered. Analyzing all of that information, making a decision about it, and acting upon it requires considerable automation for real-time relevance. The automation occurs inside an app, an enterprise application, or a service that acts autonomously, notifies humans, or both.

Body Language

Body language adds even more context. Facial expressions, micro expressions, posture, gait, and gestures all provide clues to a person’s state of mind.

Media agency MediaCom is using emotional analytics to more accurately gauge reactions to advertisements or campaigns so the creative can be tested with greater accuracy and adjusted.

Behavioral health is another interesting application. Using emotional analytics, healthcare providers can gain insight into conditions such as depression, anxiety, and schizophrenia.

The potential applications go on, including law enforcement interrogations, retail, and business negotiations, to name a few.

A Tough Problem

Natural language processing, which is necessary for speech and text analysis, is hard enough to get right. Apple Siri, Microsoft Cortana, and even spellcheckers are proof that there’s a lot of room for improvement. Aside from getting the nuances of individual languages and their dialects right, there are also cultural nuances that need to be understood – not only in the context of the words themselves but also in the way they are spoken.

The same thing goes for gestures. Large gestures are fine in Italy, but inappropriate in Japan, for example. The meaning of gestures can change with culture, which intelligent systems must understand.

As a result, emotional analytics will crawl before it walks or runs, like most technologies.