Published: January 16, 2023
"Biased Algorithms are easier to fix than Biased People"
AI bias arises when ML systems discriminate against a particular group of people, and such systems can magnify preexisting bias. Example: "Google's AI-based hate speech detector is biased against black people". AI systems are now the norm, and users and consumers interact with them daily. Examples: Netflix, YouTube, e-commerce portals, and social media.
"ML algorithms should not only be good at prediction but should also be ethical."
Types of AI Bias:
- Data-Driven: When underlying data itself is biased then the trained model on this data is also biased.
- Societal: When AI behaves in ways that reflect deep-rooted social or institutional discrimination.
Let's look more closely at data-driven bias. Societal bias is more about policy and implementation, and there is little a data scientist can do to influence the outcome.
Data-Driven or Algorithmic bias
- The COMPAS system for predicting the risk of re-offending was found to predict higher risk values for black defendants (and lower ones for white defendants) than their actual risk.
- Google's Ads tool for targeted advertising was found to serve significantly fewer ads for high-paying jobs to women than to men.
"Algorithmic bias occurs when AI and computing systems act not in objective fairness, but according to the prejudices that exist with the people who formulated, cleaned, and structured their data."
"Garbage In, Garbage Out" is the same as "Bias In, Bias Out".
An unbiased AI system can be a USP when selling AI products.
How Do You Tackle AI Bias?
Sources of Bias
- Skewed Sample: Dataset is skewed towards a certain group or may not reflect the real world.
- Tainted Examples: Unreliable labels, historical bias.
- Sample Size Disparity: Do we have enough data for every group, not just the largest one?
- Limited Features: Feature collection for certain groups may not be informative or reliable.
- Proxies: Zip code or school can be proxies for race. School or sports activity can be proxies for gender.
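Two of the sources above, skewed samples and proxies, can be checked with a few lines of code. The sketch below is only an illustration: the `rows` data, the column names `zip` and `group`, and both helper functions are hypothetical. It reports each group's share of the sample, and how much better the protected attribute can be guessed from a candidate proxy than from always guessing the largest group.

```python
from collections import Counter, defaultdict

def group_shares(rows, protected):
    """Fraction of rows that belong to each value of the protected attribute."""
    counts = Counter(row[protected] for row in rows)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

def proxy_strength(rows, feature, protected):
    """Return (proxy_accuracy, baseline_accuracy).

    proxy_accuracy: guess the most common protected value for each feature value.
    baseline_accuracy: always guess the overall most common protected value.
    A large gap suggests the feature acts as a proxy.
    """
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[feature]][row[protected]] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    baseline = Counter(row[protected] for row in rows).most_common(1)[0][1]
    return correct / len(rows), baseline / len(rows)

# Hypothetical toy data: two zip codes, two groups.
rows = (
    [{"zip": "A", "group": "x"}] * 5
    + [{"zip": "A", "group": "y"}] * 1
    + [{"zip": "B", "group": "y"}] * 3
    + [{"zip": "B", "group": "x"}] * 1
)

shares = group_shares(rows, "group")
proxy_acc, baseline_acc = proxy_strength(rows, "zip", "group")
print(shares)
print(proxy_acc, baseline_acc)
```

If the proxy accuracy is well above the baseline, dropping the protected column alone will not stop a model from reconstructing it. What gap counts as "large" is a judgment call for your use case.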
Steps to tackle Bias in Representation (Statistical Parity) and Bias in Error (False Positive/Negative).
- Identify protected features in your dataset.
- Select an appropriate “fairness metric” for your use case and value system.
- Build insights to identify & understand your model’s potential bias.
- If possible, mitigate bias uncovered in your data or model.
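As a rough sketch of the "build insights" step, the two kinds of bias named above can be measured per group: bias in representation as the selection rate (the share of positive predictions), and bias in error as the false positive and false negative rates. The function name, the toy labels, and the group names `a` and `b` are assumptions made for illustration only.

```python
def safe_rate(numerator, denominator):
    """Divide, returning NaN when a group has no examples of the needed kind."""
    return numerator / denominator if denominator else float("nan")

def group_metrics(y_true, y_pred, groups):
    """Selection rate, false positive rate, and false negative rate per group."""
    stats = {}
    for truth, pred, group in zip(y_true, y_pred, groups):
        s = stats.setdefault(
            group, {"n": 0, "pred_pos": 0, "pos": 0, "neg": 0, "fp": 0, "fn": 0}
        )
        s["n"] += 1
        s["pred_pos"] += int(pred == 1)
        if truth == 1:
            s["pos"] += 1
            s["fn"] += int(pred == 0)
        else:
            s["neg"] += 1
            s["fp"] += int(pred == 1)
    return {
        group: {
            "selection_rate": safe_rate(s["pred_pos"], s["n"]),
            "fpr": safe_rate(s["fp"], s["neg"]),
            "fnr": safe_rate(s["fn"], s["pos"]),
        }
        for group, s in stats.items()
    }

# Hypothetical toy data: 1 = positive outcome, 0 = negative outcome.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

metrics = group_metrics(y_true, y_pred, groups)
# Statistical parity difference: gap in selection rate between the two groups.
parity_difference = metrics["a"]["selection_rate"] - metrics["b"]["selection_rate"]
print(metrics)
print(parity_difference)
```

A parity difference near zero means both groups receive positive predictions at a similar rate. Similar FPR and FNR values across groups mean the model's mistakes are spread evenly. Which of these matters more depends on the fairness metric you selected for your use case.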
Typical Machine Learning Pipeline and how to check for bias in each step.
- Data Source: Sampling vs Population. Make sure the sample is representative of the population. Also, check if bias exists in the population itself.
- Data Pre-Processing and PCA: Imputation, transformation, PCA, etc. should be done in a manner that avoids introducing bias. Example: if you want to eradicate gender bias, remove the gender flag so that the model cannot learn from it directly (proxies for gender may still remain). You will need to weigh the tradeoff in model performance.
- Modeling: Check for over-fitting and under-fitting. You can also use fairness constraints. For example, in a classification problem the constraint can be FP ≈ FN, that is, a similar number of false positives and false negatives.
- Testing: Test not only for accuracy but also for bias. Check what results the model consistently produces. Are these results biased?
- Usage: What is the final goal of the model? Once deployed, is the model's output used in full, or is only a part of it used? Example: the user might be interested only in TP and ignore FN.
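The FP ≈ FN constraint from the modeling step can be approximated in several ways. The sketch below uses one simple option that is an assumption on my part, not a method prescribed here: after training, scan the candidate decision thresholds and keep the one where the false positive and false negative counts are closest. The function names, scores, and labels are hypothetical.

```python
def confusion_counts(y_true, scores, threshold):
    """Count false positives and false negatives when predicting 1 for score >= threshold."""
    fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= threshold)
    fn = sum(1 for t, s in zip(y_true, scores) if t == 1 and s < threshold)
    return fp, fn

def balanced_error_threshold(y_true, scores):
    """Pick the threshold minimizing |FP - FN|; ties go to the fewest total errors."""
    best = None
    for threshold in sorted(set(scores)):
        fp, fn = confusion_counts(y_true, scores, threshold)
        key = (abs(fp - fn), fp + fn)
        if best is None or key < best[0]:
            best = (key, threshold, fp, fn)
    _, threshold, fp, fn = best
    return threshold, fp, fn

# Hypothetical model scores and true labels.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.4, 0.45, 0.6, 0.7, 0.8, 0.9]

threshold, fp, fn = balanced_error_threshold(y_true, scores)
print(threshold, fp, fn)
```

This adjusts the model after training rather than constraining the training itself. Running the same search separately for each protected group is one way to connect it to the testing step, at the cost of using different thresholds per group.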
AI bias is a developing field, and there is no single definition or documented approach for fixing the different kinds of bias. I am listing a few resources to help you develop your understanding of the subject. Hope you enjoyed the short read.
Online tools and games
- Closing Gaps Ideation Game from the Partnership on AI
- Survival Of the Best Fit: a game on bias in hiring
- Stealing Ur Feelings interactive documentary from Mozilla
- IBM AI Fairness 360 Open Source Toolkit
- WEBVis: A Human-in-the-loop Auditing Tool for Exploration and Mitigation of Social Biases
- Google What-If Tool: Visually probe the behavior of trained machine learning models.
- Fairlearn: A Python package to assess and improve the fairness of machine learning models.
- The Ethical Algorithm, Michael Kearns and Aaron Roth.
- AI bias experts, games, podcasts, and papers — Quartz
- Algorithms to Live By, Brian Christian and Tom Griffiths
- Algorithms of Oppression: How search engines reinforce racism, Safiya Umoja Noble
- Weapons of Math Destruction: How big data increases inequality and threatens democracy, Cathy O’Neil
- Biased: Uncovering the hidden prejudice that shapes what we see, think, and do, Jennifer L. Eberhardt