Breaking the Cycle of Data Bias
Laura Kornhauser explains how we can leverage technology to uncover, understand, and undo bias for a more equitable future.
Thanks for coming back to the Empire Startups newsletter. New here? Subscribe to get our newsletter sent straight to your email.
Hi there,
“Past performance is not indicative of future returns.”
After over a decade of these words appearing at the bottom of every email in my previous banker life, this quote is etched into my brain. And while the statement may seem simple enough, it is something often strangely missed in the world of AI.
You may have heard that AI, and machine learning in particular, is only as good as the data you feed it, a problem often summed up as “garbage in, garbage out.” As these solutions proliferate, what can you do to ensure that “bad” data doesn’t ingrain “bad” decisions into your organization?
First, it is important to understand that data can be bad in many ways, and that biases can be introduced at many points in a process, by both humans and machines. Some features of bad data are obvious, such as inaccurate or missing data, but others are harder to identify. In this piece, we will focus on the biases that make data “bad”.
1. Looking backward ≠ Looking ahead
All data shares the same blind spot, or bias: it represents only what has happened in the past. In times of rapidly changing market conditions, such as today’s, how do you optimize decisions for the future?
While data can provide valuable insights from past patterns, it cannot adapt on its own to unforeseen circumstances or entirely new environments. Government stimulus during COVID, for example, artificially made certain customer segments appear less risky than they later turned out to be. Experts understood this context before it ever showed up in the data.
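One way to make the “looking backward” blind spot measurable is to compare how applicants are distributed today against the window a model learned from. The sketch below uses the Population Stability Index (PSI), a drift metric common in credit risk; it is our choice of illustration, not a method described in this piece, and the bucket shares are hypothetical. A high PSI only signals that the past may be a poor guide; it cannot supply the kind of context, like the stimulus example above, that experts bring.

```python
import math

def population_stability_index(expected, actual, eps=1e-6):
    """PSI between two binned distributions (lists of proportions that each sum to 1).

    A widely used rule of thumb (a convention, not a standard): < 0.1 stable,
    0.1 to 0.25 moderate shift, > 0.25 major shift.
    """
    if len(expected) != len(actual):
        raise ValueError("distributions must have the same number of bins")
    psi = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) for empty bins
        psi += (a - e) * math.log(a / e)
    return psi

# Hypothetical shares of applicants per debt-to-income bucket.
training_window = [0.30, 0.40, 0.20, 0.10]
current_window = [0.15, 0.35, 0.30, 0.20]
print(round(population_stability_index(training_window, current_window), 3))  # prints 0.221
```

Under the rule of thumb above, 0.221 would count as a moderate shift: a cue to revisit the model with people who understand what changed, rather than proof of what did.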
2. Baked-In Bias
In a dataset, there is often a desired outcome that you are predicting and optimizing decisions for: your “source of truth.” However, the data usually lacks the context that produced those outcomes.
The Stratyfy team recently completed a study of 2021 HMDA (Home Mortgage Disclosure Act) mortgage data in which we found that 1,117 Black applicants were unjustly denied mortgages, equating to $387 million of capital withheld from these applicants in just one year. These rejections could not be attributed to any of the applicants’ financial data that would be predictive of creditworthiness, and therefore can only be attributed to human bias.
These biased decisions create a waterfall effect on those individuals and their communities, forcing them to resort to products with higher interest rates and/or hidden fees. If a borrower later defaults on one of these costlier loans, a model that ignores the nature of the loan will incorrectly attribute the default to the borrower’s characteristics, perpetuating the pattern again, and again, and again.
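The HMDA study’s methodology is not spelled out here, so the sketch below is only a generic illustration of the underlying idea, not Stratyfy’s approach: group applicants into buckets with similar financial profiles, then compare denial rates between groups within each bucket, so that any remaining gap cannot be explained by the bucketed financial variables. The field names and records are hypothetical, and a real fair-lending analysis would need far richer controls and statistical testing.

```python
from collections import defaultdict

def denial_gaps_by_profile(applications, group_a, group_b):
    """Per financial-profile bucket, return denial rate of group_a minus group_b.

    Each application is a dict with hypothetical keys: 'profile' (e.g. a
    credit-score/DTI bucket), 'group', and 'denied' (bool).
    """
    counts = defaultdict(lambda: {"n": 0, "denied": 0})
    for app in applications:
        key = (app["profile"], app["group"])
        counts[key]["n"] += 1
        counts[key]["denied"] += int(app["denied"])

    gaps = {}
    for profile in sorted({profile for profile, _ in counts}):
        a, b = counts.get((profile, group_a)), counts.get((profile, group_b))
        if a is None or b is None:
            continue  # a bucket missing one of the groups cannot be compared
        gaps[profile] = a["denied"] / a["n"] - b["denied"] / b["n"]
    return gaps

# Hypothetical toy records.
applications = [
    {"profile": "prime", "group": "A", "denied": True},
    {"profile": "prime", "group": "A", "denied": False},
    {"profile": "prime", "group": "B", "denied": False},
    {"profile": "prime", "group": "B", "denied": False},
    {"profile": "near-prime", "group": "A", "denied": True},
    {"profile": "near-prime", "group": "B", "denied": True},
]
print(denial_gaps_by_profile(applications, "A", "B"))  # {'near-prime': 0.0, 'prime': 0.5}
```

In this toy data, applicants with the same “prime” profile are denied at different rates depending on group, which is the kind of gap the financial variables alone cannot account for.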
3. Reading “Between” The Lines
We have more data at our fingertips today than ever before. But what does it lack? Context. Our datasets almost always lack at least one crucial variable that explains the outcome we’re interested in: the “why.”
In lending, for example, credit data doesn’t capture a borrower’s full story. Credit models assume that a borrower’s repayment behavior will mimic that of others with similar financial variables. This may be true in aggregate, but when past delinquencies are recorded without their context, the resulting errors can disproportionately burden communities with fewer resources.
A recent study from the Urban Institute found that Black and Native American communities have the lowest median credit scores in the US. Young adults in majority-Black and majority-Hispanic communities are more likely to begin adulthood with lower average credit scores than their peers in majority-white communities. This stems from the structure of our financial system, which often gives communities of color less access to financial services, creating roadblocks that further limit their ability to access and build credit and wealth.
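To see how one missing variable can distort conclusions, the toy simulation below uses entirely hypothetical rates. It gives two communities identical repayment behavior for any given loan type, but assumes community B is pushed into high-cost loans more often. Looking only at community-level default rates, B appears riskier (about 11% versus 7% in expectation); once the omitted variable, the loan type, is taken into account, the gap essentially disappears. This is the same mechanism described under “Baked-In Bias”: without the product context, the default gets pinned on the borrower.

```python
import random

def simulate(n=100_000, seed=42):
    """Toy borrowers as (community, got_high_cost_loan, defaulted). All rates are hypothetical."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        community = rng.choice(["A", "B"])
        # Assumption: community B has less access to mainstream credit,
        # so more of its borrowers end up in high-cost products.
        high_cost = rng.random() < (0.6 if community == "B" else 0.2)
        # Assumption: default risk depends only on the product, not on the community.
        defaulted = rng.random() < (0.15 if high_cost else 0.05)
        rows.append((community, high_cost, defaulted))
    return rows

def default_rate(rows, community, high_cost=None):
    """Default rate for a community, optionally restricted to one loan type."""
    outcomes = [d for c, h, d in rows
                if c == community and (high_cost is None or h == high_cost)]
    return sum(outcomes) / len(outcomes)

rows = simulate()
for community in ("A", "B"):
    print(community,
          "overall:", round(default_rate(rows, community), 3),
          "low-cost:", round(default_rate(rows, community, high_cost=False), 3),
          "high-cost:", round(default_rate(rows, community, high_cost=True), 3))
```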
The Future of Machine Learning Is Now
If we do not address and correct for the underlying causes of biases in the data, AI models can perpetuate and multiply these biases in decisions going forward. While these challenges may seem insurmountable, the truth is that advanced modeling and decisioning technology can help address each of these areas of bias, as long as you know what capabilities to look for.
1. Transparency is non-negotiable
We often hear that it's a problem when we don't understand how AI and machine learning systems make decisions (the "black box" issue). To address this, two main approaches have emerged: Interpretability and Explainability.
Interpretable Machine Learning: This means using methods that allow us to see how the AI system is making decisions. It's like having a clear window into the system, so we can understand what's going on inside.
Explainable Machine Learning: This approach involves adding extra models on top of the AI system to try and explain its decision-making process. It's like creating a guide that helps us understand what's happening inside the "black box."
For issues that directly affect people's lives, like who gets approved for a loan, or who gets the opportunity to interview for a job, we must prioritize transparency using interpretable machine learning approaches. We need to be able to look inside the system, understand it, and make necessary changes to dismantle biases and ensure fairness.
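To make the distinction concrete, here is a minimal sketch of a model that is interpretable by construction: a points-based scorecard in which every decision comes with the exact rules that produced it. The rules, point values, and cutoff are hypothetical placeholders, not a real underwriting policy or Stratyfy’s product. Because the logic is the model, a reviewer can see, and change, precisely why an applicant was approved or declined, which an explanation layered on top of a black box can only approximate.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Rule:
    """One human-readable scoring rule: if `applies` returns True, add `points`."""
    description: str
    points: int
    applies: Callable[[dict], bool]

# Hypothetical rules, point values, and cutoff, for illustration only.
RULES = [
    Rule("Debt-to-income ratio below 36%", 30, lambda a: a["dti"] < 0.36),
    Rule("No missed payments in last 12 months", 25, lambda a: a["missed_payments_12m"] == 0),
    Rule("Two or more years of verifiable income", 20, lambda a: a["income_years"] >= 2),
]
APPROVAL_THRESHOLD = 50

def decide(applicant):
    """Return (decision, total score, reasons), where reasons lists every rule that fired."""
    fired = [rule for rule in RULES if rule.applies(applicant)]
    total = sum(rule.points for rule in fired)
    reasons = [f"{rule.description}: +{rule.points}" for rule in fired]
    decision = "approve" if total >= APPROVAL_THRESHOLD else "decline"
    return decision, total, reasons

applicant = {"dti": 0.30, "missed_payments_12m": 1, "income_years": 3}
decision, total, reasons = decide(applicant)
print(decision, total)  # approve 50
for reason in reasons:
    print(" -", reason)
```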
2. Seeing the forest for the trees
When optimizing a machine learning algorithm, modelers aim to closely match patterns in the data. They use various metrics to measure this match, adjust the model's settings on training data to improve it, and then check the result on held-out testing data. It is now possible, though, to set dual objectives, where the model tries to meet more than one goal, or pursues one goal while respecting one or more constraints. It's critical to make bias a key indicator of a model's performance. Setting these objectives and constraints at the outset of model development will not only ensure that your organization is on the same page about priorities but also empower you to see the forest for the trees.
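One common way to build such a dual objective (an assumption on our part, since this piece does not specify a method) is to add a fairness penalty to the usual training loss. The NumPy sketch below trains a logistic regression on log-loss plus `lam` times the squared gap in average predicted score between two groups, a demographic-parity-style measure. The data is synthetic, and `lam`, the learning rate, and the choice of fairness metric are all illustrative; which metric is appropriate is a policy decision, not a coding one.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_fair_logistic(X, y, group, lam=0.0, lr=0.5, steps=2000):
    """Gradient descent on mean log-loss + lam * gap**2, where gap is the difference
    in average predicted score between group == 1 and group == 0.
    lam=0 reduces to ordinary logistic regression."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    w = np.zeros(Xb.shape[1])
    a, b = group == 1, group == 0
    for _ in range(steps):
        p = sigmoid(Xb @ w)
        grad = Xb.T @ (p - y) / len(y)  # gradient of the mean log-loss
        s = p * (1.0 - p)  # derivative of the sigmoid
        gap = p[a].mean() - p[b].mean()
        dgap = Xb[a].T @ s[a] / a.sum() - Xb[b].T @ s[b] / b.sum()
        grad += 2.0 * lam * gap * dgap  # gradient of the fairness penalty
        w -= lr * grad
    return w

def score_gap(w, X, group):
    p = sigmoid(np.column_stack([np.ones(len(X)), X]) @ w)
    return p[group == 1].mean() - p[group == 0].mean()

# Synthetic data: `proxy` is correlated with group membership (think of a variable like geography).
rng = np.random.default_rng(0)
n = 4000
group = rng.integers(0, 2, n)
proxy = group + rng.normal(0.0, 1.0, n)
signal = rng.normal(0.0, 1.0, n)
y = (signal + 0.8 * proxy + rng.normal(0.0, 1.0, n) > 0.5).astype(float)
X = np.column_stack([proxy, signal])

gap_plain = score_gap(fit_fair_logistic(X, y, group, lam=0.0), X, group)
gap_fair = score_gap(fit_fair_logistic(X, y, group, lam=5.0), X, group)
print(f"score gap without penalty: {gap_plain:.3f}, with penalty: {gap_fair:.3f}")
```

Raising `lam` narrows the gap at some cost in raw fit; agreeing on that trade-off before training starts is exactly the kind of up-front priority-setting described above.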
3. Human in the Loop
Data alone cannot give you the full picture of what happened in the past or what will happen in the future. Certain machine learning approaches allow you to learn from data, see what was learned, and then change it. This type of approach can incorporate subject-matter expertise to help address the biases described above.
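A minimal sketch of that learn, see, change loop, assuming the model exposes its learned rules as readable weights. The rule names, point values, and rationale below are hypothetical, and the fitting step that would produce the weights is not shown. A subject-matter expert reviews the rules, overrides one whose signal they know is distorted, and every change is logged with its reason so the adjustment itself stays transparent.

```python
# Hypothetical rule weights, standing in for what a transparent model learned from historical data.
learned_rules = {
    "missed_payment_during_2020_2021": -40,
    "debt_to_income_above_43pct": -25,
    "thin_credit_file": -30,
}

def apply_expert_review(rules, overrides):
    """Return (reviewed_rules, audit_log) without mutating the learned rules.

    `overrides` maps a rule name to (new_points, rationale).
    """
    reviewed = dict(rules)
    audit_log = []
    for name, (new_points, rationale) in overrides.items():
        if name not in reviewed:
            raise KeyError(f"unknown rule: {name}")
        audit_log.append({"rule": name, "old": reviewed[name], "new": new_points, "why": rationale})
        reviewed[name] = new_points
    return reviewed, audit_log

# Hypothetical expert judgment used only to illustrate an override.
overrides = {
    "missed_payment_during_2020_2021": (-10, "Pandemic-era hardship; weak signal of future repayment"),
}
reviewed, audit_log = apply_expert_review(learned_rules, overrides)
print(reviewed["missed_payment_during_2020_2021"])  # -10
print(audit_log[0]["old"], "->", audit_log[0]["new"])  # -40 -> -10
```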
What’s Next
Data will always be biased as long as we look at the past to inform the future. Today, we can uncover, understand, and undo bias with the help of the right technology. This is how we will address these critical issues now and drive a more equitable future.
–
Laura Kornhauser, Co-founder & CEO of Stratyfy, Empire Startups Contributor
Empire Startups Contributors are a community of experts providing unique perspectives and insights on the latest in FinTech. Our model is merit-based and does not offer monetary compensation.
🎪 Empire FinTech Conference 2024
Step right up! Tickets are officially on sale for the 2024 Empire FinTech Conference.
Get ready for a jam-packed day with power-packed keynotes, hands-on masterclasses, live podcasts, cutting-edge demos, and priceless networking opportunities.
Don’t miss out on the most important FinTech event on the East Coast.
🗣 Call For Content
We're on the lookout for the boldest, brightest, and most innovative minds in FinTech to grace our conference stage. Are you one of them? 🎤 ✨
Our team wants to hear from you – the leaders and visionaries of the FinTech community – as to what’s top of mind, and who should be speaking about it.
Submit yours for a chance to be on our 2024 stage.
🗞🎧 The latest news in FinTech.
Reads
📲 Consumer Bureau seeks to supervise digital payment apps | The New York Times
A proposed new rule would subject Google, Apple, PayPal and other digital wallet providers to the same scrutiny banks face.
👀 Millennials led the consumer fintech revolution post-2008. Here’s why Gen Z-ers are about to do the same | Fortune
In the wake of financial volatility, big companies get built to solve new financial pains.
🔨 Klarna lays groundwork for IPO; avoids strike | FinExtra
Klarna is setting up a UK holding company ahead of a widely-expected initial public offering.
😬 FOMO fuels venture-backed fraud trials | Axios
In all three recent major fraud trials (Theranos, FTX, and now Rothenberg), investors skimped on due diligence and good corporate governance amid fears of missing out on the next big thing.
🙅 VCs No Longer Do DTC | Crunchbase News
Among the most heavily funded direct-to-consumer brands, a common theme over the past several quarters has been to promote their wares more heavily in offline channels.
Listens
📉 The VC Funding Downturn | Fintech Insider by 11:FS
Based on year-on-year figures for August, funding is down 37% and at its lowest level since the Covid pandemic.
So in this episode we discuss whether this is just a temporary slump or the start of a worrying downward trend. What are the implications for fintechs? And who, despite these numbers, is managing to buck the trend and find investment?
💪 Breaking Barriers with Fintech’s Most Powerful Women | Humans of Fintech
With women making up less than 30% of the fintech workforce and only 8% holding leadership roles, something needs to change. These inspiring women share their experiences and insights on how to break through the barriers and create a more inclusive industry.
From the importance of storytelling and confidence to the need for more female VCs and founders, this roundtable discussion covers it all, touching on the power of male allies and sponsors in supporting women in fintech.
🚀 Featured FinTech Funding
SEED
EarlyBird, $4.5M (Wealth Management, Chicago)
Revenue Roll, $2.5M (Lending, New York)
Animus Technologies, $1M (Blockchain/Crypto, New York)
SERIES A
Charlie, $16M (Digital Banking, Los Angeles)
Final Offer, $5M (Mortgage/Real estate, Hingham)
SERIES C
Nowsta, $35M (Payments/Billing, Brooklyn)
💼 Featured FinTech Jobs
New York
Director, Private Business Credit, Yieldstreet
Director, Enterprise Sales, Navan
Head of Content Marketing, MoneyLion
[Risk] Portfolio Manager, Capchase
Remote
AVP, Country Compliance Oversight, Flutterwave
Senior Product Manager, Acquisition - Remote, US, Earnest
Alliance Manager II, Strategic Partnerships, AvidXchange
Claims Specialist, Counterpart
San Francisco
VP, Data Science and Analytics, Jerry
Nature Investment Associate / Senior Associate, Ethic
Shareholder Communications Senior Manager, Ripple
Demand Generation Lead, Banyan Infrastructure Corporation