Product Management Tips for Data Science Projects

Data science has traditionally been an analysis-only endeavor: using historical statistics, user interaction trends, or AI/machine learning to predict the impact of deterministically coded software changes. For instance, “how do we think this change to the onboarding workflow will shift user behavior?” This is data science (DS) as an offline toolkit to make smarter decisions.

Increasingly, though, companies are building statistical or AI/Machine Learning features directly into their products. This can make our applications less deterministic – we may not know exactly how applications behave over time, or in specific situations – and harder to explain. It also requires product managers to engage much more directly with data scientists about models, predictability, how products work in production, how/why users interact with our products, and how our end users measure success. (Hint: most users don’t understand or care about F1-scores; they just want to get the right answer.)

So here are a few tips for product managers who may be pulling data science – and data scientists – into our product processes.

1. Provide Much Deeper Context than Traditional Software Projects, Especially Use Cases and Business Goals

We’ve known forever that we (product managers) must include our developers and designers in the earliest stages of problem definition and solution ideation. Putting the best minds together builds shared understanding of users, success metrics, constraints, architectural choices, work-arounds. Piping in real users’ voices and challenges motivates the whole team. And creates a storehouse of resonant insights. Ultimately, we build better products collectively.

But product management’s relationship with data scientists can feel like where we were 25 years ago with core development teams: weak understanding on both sides; specialized terminology; assumptions that data science is easy; opportunities for resentment. We versus They.

Data science teams are often new to our companies, organizationally isolated, with weak understanding of our company’s audiences and economics. Many data scientists are fresh from academia, with less general business experience than our core development teams. Less appreciation of commercial software development pressures.

So when we’re framing market problems or opportunities for data science to improve our products, we need to be aggressive about providing context:

Walk the DS team through detailed use cases. Who are the players, and what do we call them? What happens when? Where might we inject model-based insights or machine learning (ML)? Key constraints or regulatory obligations?
Dig deep into success metrics, business goals, and economics. Is this improvement intended to boost our license revenue from new customers, or reduce our churn rate, or boost customer satisfaction? Why will paying customers care, or what specific pain are we alleviating? How will we measure these, and how much improvement is enough? Be prepared to explain qualitative things to DS teams in quantitative terms.
Share your validation/user research assets: recordings of interviews, videos of users struggling with your interface, product comparisons, revenue projections, sales wins, whatever will connect the DS team emotionally to your end users.

2. Remember That Data Science Projects are Uncertain and Our Judgment May Be Weak

It’s easy to assume good outcomes without having done upfront DS investigation. Of course this data set will predict customer satisfaction; that machine learning process will uncover manufacturing inefficiencies; some AI tool will help cure cancer! But the real world gets in our way: dirty data, models with poor prediction scores, entirely obvious results, existing human processes that already work pretty darn well. We can let our enthusiasm and urgency get out ahead of actual results. So plan to:

Frame a hypothesis in the language of data science: “I think dataset X will help predict Y.” That gives everyone a common starting point – something concrete to critique. And you’ll be bridging cultural gaps by talking like a data scientist.
Have your DS team experiment with several data sets. Which (if any) looks interesting and predictive? Should we combine several data sets?
Avoid committing to final delivery dates until we know that we have something workable.
Ask your DS team to talk you through their 4 C’s: coverage, completeness, correctness, currency. [Unrelated to the diamond industry’s 4C’s.]
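To make that 4 C's conversation concrete, here is a minimal sketch of how a team might score one candidate data set. The definitions used here are illustrative assumptions, not a standard: coverage is the share of expected customers that appear at all, completeness is the share of required fields that are filled in, correctness is the share of records passing a sanity check, and currency is the age of the newest record. The function name `four_cs`, the field names, and the sample records are all hypothetical.

```python
from datetime import date


def four_cs(records, expected_ids, required_fields, is_valid, today):
    """Score a data set on coverage, completeness, correctness, and currency.

    These definitions are assumptions for illustration; your DS team
    may define each "C" differently for a given project.
    """
    # Coverage: how many of the entities we care about show up at all?
    seen_ids = {r["id"] for r in records}
    coverage = len(seen_ids & set(expected_ids)) / len(expected_ids)

    # Completeness: how many required fields are actually filled in?
    filled = sum(
        1
        for r in records
        for field in required_fields
        if r.get(field) not in (None, "")
    )
    completeness = filled / (len(records) * len(required_fields))

    # Correctness: how many records pass a basic sanity check?
    correctness = sum(1 for r in records if is_valid(r)) / len(records)

    # Currency: how many days old is the freshest record?
    newest = max(r["updated"] for r in records)
    currency_days = (today - newest).days

    return {
        "coverage": coverage,
        "completeness": completeness,
        "correctness": correctness,
        "currency_days": currency_days,
    }


# Hypothetical sample: three customer records out of four expected customers.
records = [
    {"id": 1, "region": "EU", "spend": 120.0, "updated": date(2024, 1, 10)},
    {"id": 2, "region": "", "spend": 80.0, "updated": date(2024, 1, 5)},
    {"id": 3, "region": "US", "spend": -5.0, "updated": date(2023, 12, 1)},
]

report = four_cs(
    records,
    expected_ids=[1, 2, 3, 4],
    required_fields=["region", "spend"],
    is_valid=lambda r: r["spend"] >= 0,  # negative spend is treated as an error
    today=date(2024, 1, 20),
)
print(report)
```

With this made-up sample, coverage is 0.75 (one expected customer is missing), completeness is 5 of 6 fields, correctness is 2 of 3 records, and the newest record is 10 days old. The exact numbers matter less than having a shared scorecard to compare candidate data sets.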

For instance, we may have some intuition that machine learning could help us predict future stock market moves based on historical earnings announcements and public disclosures. But there’s good evidence otherwise. (And financial quant firms are spending $100M’s/year chasing this – if anyone has found a solution, they are trading on it rather than telling the world.) Before we promise our Board that we can outthink markets and competitors, it would be great to prove the theory.

3. Choosing/Accessing Data Sets is Crucial

Data science projects succeed or fail on actual data sets and models, not intentions or intuition. Some data sets are better than others: more accessible, more complete, cleaner. And data may be hidden behind organizational or regulatory walls: you may have trouble using your company’s consumer e-commerce history without a GDPR-flavored privacy review. Or the General Manager of US Sales dislikes the Managing Director of EU Sales, so refuses to share. Or getting access permissions from IT takes 6 months. So it might be useful to:

Investigate ownership and permissions for internal data sets at the start of a project. What have similar teams run into recently?
For external data sources, check for acceptable uses and commercial terms. Are we allowed to build them into our products or collect revenue? Any required notifications in documentation or legal disclosures?
Be especially careful with identifiable consumer data and end user permissions.

4. Describe How Accurate This Application Needs to Be, and Anticipate Handling “Wrong” Answers

Some DS projects only need to give us a little extra insight; others have huge downsides when we get it wrong. So level of accuracy is an essential conversation at the very start of any data science project. We might spend a year and $2M on training data when “somewhat better than a coin flip” accuracy is enough… or put lives at risk in a medical prediction application with lots of false negatives.

For example, we might be optimizing routes for urban package delivery trucks. Sorting packages onto the right trucks and sequencing delivery stops might save lots of time and fuel and money, even if it’s far from perfect. 70% accuracy could shave 20 minutes/day off our routes and $M’s in operating costs. Talking through this with our DS team avoids gold-plated solutions that need an extra year of validation – good enough is good enough.

Machine analysis of water quality reports must be much more accurate: failing to identify lead in the drinking water supply (false negatives) has serious implications, while incorrectly raising water quality alarms in safe areas (false positives) creates unnecessary concern. We should have some energetic arguments about 98% vs. 99% vs. 99.95% accuracy and how our users will put our predictions into practice.
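A single accuracy number can hide exactly the false negatives we care about. As a rough sketch, the snippet below turns raw counts of right and wrong answers into accuracy, recall, and precision. The function name `summarize_confusion` and all of the counts are invented for illustration (10,000 water samples, 50 of them truly contaminated); they are not real water quality data.

```python
def summarize_confusion(tp, fp, fn, tn):
    """Convert confusion-matrix counts into rates worth arguing about.

    tp: contaminated samples correctly flagged
    fp: safe samples incorrectly flagged (false alarms)
    fn: contaminated samples the model missed
    tn: safe samples correctly passed
    """

    def ratio(numerator, denominator):
        # Avoid dividing by zero when a category is empty.
        return numerator / denominator if denominator else 0.0

    total = tp + fp + fn + tn
    return {
        "accuracy": ratio(tp + tn, total),
        "recall": ratio(tp, tp + fn),        # share of real problems we caught
        "precision": ratio(tp, tp + fp),     # share of alarms that were real
        "missed_share": ratio(fn, tp + fn),  # share of real problems we missed
    }


# Hypothetical counts, chosen only to illustrate the point.
report = summarize_confusion(tp=30, fp=80, fn=20, tn=9870)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

In this made-up case the model is 99% accurate overall, yet it misses 40% of the contaminated samples. That is why the conversation with the DS team should name which kind of error matters most, not just a headline accuracy target.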

And every DS project will have some results that surprise us. Some answers will be flat wrong, some will teach us something true about the world, and some will highlight that our human experts disagree with each other. We need a plan for human review of results and escalation to humans when outcomes seem incorrect.

For instance, a bank might spin up an AI/Machine Learning approval process for residential mortgages. Responding to mortgage requests fifty times faster than human loan officers would dramatically chop staff costs and improve qualified homebuyers’ satisfaction. But our model will inevitably produce some questionable results. How will we review these, and who will decide if the AI answer is “wrong”?

And analyses (here, here) suggest watching for many kinds of biases or errors in our mortgage app:

Training data could include previous biases (e.g. redlining) or obvious inferences (borrowers in rich zip codes unsurprisingly have lower default rates)
Laws change, economic conditions change, new mortgage products have different risk profiles. Outdated models and outdated inputs give us outdated results.
Training data could be wrongly tagged, not representative, misunderstood

We will need a plan for human review of complaints/escalations; some way to explain results to human borrowers; and a schedule for revisiting and periodically re-building our models. Skynet can’t be left in charge of our lending decisions.
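One way to picture a human review plan is a simple routing rule: the model decides only the clear-cut cases, and everything uncertain or disputed goes to a loan officer. The sketch below is an assumption about how that might look, not a recommended lending policy. The `Decision` class, the `route` function, and the 0.90 / 0.10 thresholds are all hypothetical; real thresholds would come out of the accuracy conversation with the DS team and from compliance.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    application_id: str
    approve_probability: float  # model output between 0.0 and 1.0
    disputed: bool = False      # borrower has complained or asked for review


def route(decision, approve_above=0.90, decline_below=0.10):
    """Decide who handles an application: the model or a human.

    Thresholds are placeholders for illustration only.
    """
    # Any complaint or escalation always goes to a person.
    if decision.disputed:
        return "human_review"
    # Only very confident model outputs are handled automatically.
    if decision.approve_probability >= approve_above:
        return "auto_approve"
    if decision.approve_probability <= decline_below:
        return "auto_decline"
    # Everything in the gray zone gets a human look.
    return "human_review"


applications = [
    Decision("A-1", 0.97),
    Decision("A-2", 0.55),
    Decision("A-3", 0.03),
    Decision("A-4", 0.97, disputed=True),
]
routes = {d.application_id: route(d) for d in applications}
print(routes)
```

Widening or narrowing the gray zone is a product decision as much as a modeling one: it trades staff cost and response time against the risk of letting a questionable answer through.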

5. “Done” Means Operationalized, Not Just Having Insights

Most newly minted data scientists come from an academic environment where success means showing that a model meets some target for accuracy. They get applause for uncovering an interesting correlation or improving natural language processing (NLP) against shared test data. But they rarely have to put their work into ongoing commercial production.

In product development, we have to incorporate those models and insights into working software. Fraud detection systems have to decide in real time if a transaction is suspect; ecommerce recommendation engines have to choose which clothing to display; fire detection systems have to sound emergency alarms when necessary; weather forecasts have to distinguish storm surges from sharpies.

Our main development teams have been building full production software for a long time. So we need to facilitate technical discussions between application engineers and data scientists – early and often:

Where will models and AI processes live in production? How do we turn these processes on/off, secure them, manage capacity, patch/update underlying tools?
How do signals (decisions) get from the model to the core application: APIs, file exchange, nightly analytic runs?
What response times are required from data models to other production modules?
Our automated test suites make sure that traditional code changes don’t break the system. How will we monitor for unexpected DS/AI behaviors?
Who has authority and system access rights to shut down misbehaving models? Can we run some DevOps drills to confirm that works?
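The monitoring and shut-off questions above can be sketched in a few lines. This is a toy illustration under stated assumptions, not a production design: the `ModelGuard` class, its parameters, and the fraud-flagging numbers are all hypothetical. It watches the share of recent transactions a model flags and disables the model when that share drifts too far from an expected baseline, while also giving an authorized person a manual off switch.

```python
from collections import deque


class ModelGuard:
    """Watch a model's recent outputs and disable it if they drift.

    baseline_rate: share of positive outputs we expect (an assumption)
    tolerance: how far the recent share may drift before we shut off
    window: how many recent outputs to consider
    """

    def __init__(self, baseline_rate, tolerance, window=100):
        self.baseline_rate = baseline_rate
        self.tolerance = tolerance
        self.recent = deque(maxlen=window)
        self.enabled = True

    def record(self, positive):
        """Log one model output and re-check the recent positive rate."""
        self.recent.append(1 if positive else 0)
        # Only judge once the window is full, to avoid noisy early readings.
        if len(self.recent) == self.recent.maxlen:
            rate = sum(self.recent) / len(self.recent)
            if abs(rate - self.baseline_rate) > self.tolerance:
                self.enabled = False

    def shut_down(self):
        """Manual kill switch for whoever has the authority to use it."""
        self.enabled = False


# Hypothetical fraud model that normally flags about 2% of transactions.
guard = ModelGuard(baseline_rate=0.02, tolerance=0.05, window=50)

# Normal traffic: 1 flagged transaction out of 50.
for flagged in [True] + [False] * 49:
    guard.record(flagged)
enabled_after_normal = guard.enabled

# Misbehaving model: suddenly flags everything.
for _ in range(50):
    guard.record(True)
enabled_after_spike = guard.enabled

print(enabled_after_normal, enabled_after_spike)
```

When `enabled` turns false, the core application would need a fallback, such as queuing transactions for manual review. Deciding what that fallback is, and rehearsing it, is the kind of issue worth surfacing in those early discussions between application engineers and data scientists.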

Operationalizing data science for the first time will be challenging. It’s not important that we product managers have all of the answers, but instead that we get the right folks in the room to identify and solve issues. (Double points for having someone write these down, so we have a starter list for next time.)

Sound Byte

Data-driven applications are more complicated than deterministic software products. And working with data scientists has some unique challenges. We need to approach these thoughtfully, recognize the patterns, and respect the special talents of each group.