Bias in Machine Learning


Machine learning, a branch of AI, is everywhere, and there's no denying it's changing the game in a big way. But as much as I'm all for data-driven decisions and the cool things AI can do, I think people need to be aware that it's not without flaws. It's surprisingly easy for bias to creep in.

And the impact of some of these biases may be insignificant. For instance, a music streaming service might favour a particular genre and suggest recommendations that are slightly off your taste. Yeah, sure, it's annoying, but no biggie.

However, some biases can have serious ramifications, like in predictive policing. Imagine a situation where law enforcement agencies use machine learning algorithms to predict where crimes will happen or who might be likely to commit them. If the data fed into these systems reflects historical biases (say, neighbourhoods with higher minority populations having been over-policed in the past), the algorithm might disproportionately target those communities. You can imagine how far-reaching the consequences can be: restricting people's freedom, eroding trust in the justice system, and perpetuating systemic inequalities.

So, I want to raise awareness about this topic in today's blog. We'll look at:

  • What is Machine Learning Bias
  • Sources of Machine Learning Bias
  • Approaches to Reduce Machine Learning Bias

What is Machine Learning Bias?

Machine learning bias goes by a few names: AI bias or algorithm bias. It refers to systematic errors or unfairness in a model's predictions due to flawed assumptions in the algorithm, bad data, or the way the model's output is interpreted.

Sources of Machine Learning Bias

Bias in machine learning can creep in from various sources, affecting the outcomes of what many of us might consider an objective system. We can group these sources into three main categories based on where they typically originate:

  • Data Collection & Training
  • Algorithm Design
  • Human Cognitive Bias

Data Collection & Training

You know that saying most of us are now well versed in, "garbage in, garbage out (GIGO)"? It couldn't be more accurate in machine learning, or any data work for that matter. Machine learning models learn to make predictions from the data they are fed. This means that if the data is biased, the model's predictions will likely be biased too. Here are a few ways bias manifests in data:

Unrepresentative Data: If the training data doesn't represent the diversity of the real world, or the specific scenario in which the model will operate, it will make skewed predictions. For instance, a facial recognition system trained predominantly on one racial group will perform poorly on others.
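
To make that concrete, here's a minimal sketch in Python (the `ethnicity` column and the population figures are invented for illustration) of how you might compare the group make-up of your training data against the population the model will actually serve:

```python
import pandas as pd

# Hypothetical training data with a demographic column; in practice
# you would load your own dataset here.
train = pd.DataFrame({
    "ethnicity": ["group_a"] * 800 + ["group_b"] * 150 + ["group_c"] * 50
})

# Share of each group in the training data
train_share = train["ethnicity"].value_counts(normalize=True)

# Rough share of each group in the population the model will serve
# (assumed figures, purely for illustration)
population_share = pd.Series({"group_a": 0.60, "group_b": 0.25, "group_c": 0.15})

# A large gap is a warning sign that the model may underperform
# for the under-represented groups.
gap = (train_share - population_share).sort_values()
print(gap)
```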

Historical Bias: Sometimes, the data reflects historical prejudices or societal inequalities. When trained on this data, models can perpetuate or even amplify these biases. For example, suppose a company with a historically male-dominated workforce uses its employee records to train a hiring algorithm. The system might learn to prefer male candidates, reflecting past gender biases. Despite current efforts to ensure equality, the algorithm, influenced by the old data, might perpetuate the gender imbalance by favouring characteristics more common among past successful male employees. This is how historical bias can quietly carry past unfairness into the present.

Labelling Bias: Labelling bias happens when the people putting tags or labels on data for machine learning make mistakes or let their opinions sway the labels. It creeps in because people have subjective views, we aren't always consistent, and we're prone to error.
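
One way to surface this is to check how often independent annotators agree on the same items; low chance-corrected agreement (Cohen's kappa) hints that the labels carry subjective judgement. A hedged sketch, with made-up labels, using scikit-learn:

```python
from sklearn.metrics import cohen_kappa_score

# Labels from two annotators on the same ten items (illustrative only)
annotator_1 = ["toxic", "ok", "ok", "toxic", "ok", "toxic", "ok", "ok", "toxic", "ok"]
annotator_2 = ["toxic", "ok", "toxic", "toxic", "ok", "ok", "ok", "toxic", "toxic", "ok"]

# Cohen's kappa corrects raw agreement for chance; values near 0 mean the
# annotators agree little better than random guessing.
kappa = cohen_kappa_score(annotator_1, annotator_2)
print(f"Inter-annotator agreement (Cohen's kappa): {kappa:.2f}")
```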

Algorithm Design Bias

Algorithm design bias occurs when the very structure and decisions within an algorithm lead to unfair or skewed outcomes, even if the data used is good. This can happen in several ways:

Model Simplification: Algorithms often simplify the real world to understand it better. But if these simplifications miss the full picture or the complexity of real-life behaviours and diversity, the result can be biased. For example, if an algorithm simplifies people's health data without considering variations like genetic predispositions prevalent in certain demographics, it might provide inaccurate health predictions for those groups.

Flawed Assumptions & Weighting: When developers build algorithms, they make many decisions, including how much weight to give certain factors. If they base these decisions on flawed assumptions or their own biases, even unconsciously, it can lead to biased outcomes. For instance, if an algorithm in hiring software gives more weight to applicants from certain universities based on the developer's perception of those universities, it might unfairly favour or discriminate against candidates based on their educational background.
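
Here's a deliberately simplified, entirely hypothetical scoring sketch showing how one hand-picked weight (a "prestige bonus") can swamp stronger signals like experience and matched skills:

```python
# Hypothetical resume-scoring sketch: all names and weights are invented
# to illustrate how a hand-set weight can encode a developer's bias.
PRESTIGE_BONUS = 30  # large weight chosen from the developer's perception

def score_candidate(years_experience: int, skills_matched: int, prestigious_university: bool) -> int:
    score = 2 * years_experience + 5 * skills_matched
    if prestigious_university:
        score += PRESTIGE_BONUS  # this single assumption can outweigh real qualifications
    return score

# A less experienced candidate from a "prestigious" school outscores a
# stronger candidate from elsewhere, purely because of the weighting choice.
print(score_candidate(years_experience=2, skills_matched=3, prestigious_university=True))   # 49
print(score_candidate(years_experience=6, skills_matched=5, prestigious_university=False))  # 37
```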

Programming Errors: Sometimes, simple mistakes in coding or failing to account for certain scenarios can introduce bias. These errors can skew the algorithm's decision-making, leading to unfair outcomes for certain groups.

Cognitive Bias

Finally, the very humans who design, develop and deploy machine learning models bring their own biases to the table. This affects how algorithms are created and used:

Confirmation Bias: Developers might favour data or interpretations confirming their beliefs or hypotheses, overlooking contradictory evidence.

Overfitting to Personal or Cultural Norms: Developers might unintentionally design systems that work well for users like themselves but poorly for others from different backgrounds or cultures.

Groupthink: In a team setting, the desire for harmony or conformity can create an environment where critical examination of biases is discouraged, letting blind spots go unchallenged.

Approaches to Reduce Machine Learning Bias

Here are some ways to reduce bias in machine learning:

Representative Data: Ensure the training data is as diverse and representative as possible. This includes gathering data from various sources and ensuring it reflects the diversity of the population the model will serve.

Transparent and Interpretable Models: Use models that are transparent and explainable. When users understand how and why decisions are made, they can more easily spot potential biases. Simpler models, or tools that visualize the decision-making process, can help achieve this.
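
As a rough sketch of what that can look like, a simple linear model lets you read the weight of each feature straight off its coefficients; the feature names and data below are synthetic, chosen only for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic example with two hypothetical features, "years_experience"
# and "postcode_code" (names chosen purely for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)  # outcome driven by feature 0

model = LogisticRegression().fit(X, y)

# With a linear model the coefficients are directly inspectable: a large
# weight on a proxy feature like postcode would be an immediate red flag.
for name, coef in zip(["years_experience", "postcode_code"], model.coef_[0]):
    print(f"{name}: {coef:+.2f}")
```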

Regular Monitoring: Conduct audits of the machine learning models to check for bias. This includes testing them against different scenarios and monitoring their performance to ensure they remain fair over time. Independent audits can also provide an unbiased review of the models.
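
One minimal audit sketch (assuming you log each prediction together with a group attribute, both hypothetical here): compare the model's positive-outcome rate across groups and flag the gap if it drifts past an agreed threshold:

```python
import pandas as pd

# Hypothetical prediction log: group membership plus the model's decision.
log = pd.DataFrame({
    "group":    ["a", "a", "a", "b", "b", "b", "b", "a", "b", "a"],
    "approved": [ 1,   1,   0,   0,   0,   1,   0,   1,   0,   1 ],
})

# Approval rate per group (a simple demographic-parity style check).
rates = log.groupby("group")["approved"].mean()
gap = rates.max() - rates.min()

print(rates)
if gap > 0.2:  # the threshold is a policy choice, not a technical constant
    print(f"Warning: approval-rate gap of {gap:.0%} between groups; review the model.")
```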

Inclusive Algorithm Design: Involve a diverse team in the design and development of algorithms. A team with varied backgrounds and perspectives can help anticipate and mitigate different types of bias. Encourage a culture of questioning and critical thinking to challenge assumptions and prevent groupthink.

User Feedback Loop: Implement a system where users can report biases or unfair outcomes. This feedback can be used to improve the model continuously.

So, there you have it! Tackling bias in machine learning isn't just a nice to-do; it's a must-do. Recognizing, understanding and mitigating bias is an important part of developing and deploying machine learning technologies responsibly.

We've seen how AI and machine learning are super cool and useful but imperfect. Small biases might only mess up our music playlists, but big ones can really affect people's lives, especially in serious stuff like law enforcement.

The good news is that we can improve at spotting and fixing these biases.