Anatomy of an ML startup: part II

By Jay Bartot, CTO of Madrona Venture Labs, in partnership with Madrona Venture Group

Finding product market fit for your machine learning startup

In Part I, I shared the challenges of creating machine learning (ML) startups and how, oftentimes, the biggest challenges you’ll face will be procuring the data you need to feed your data-hungry ML solution. In this post, let’s dig into what happens when you get that first bit of data and run your ML technology on it, and the challenges you might subsequently encounter trying to find product-market fit (PMF): the degree to which your product satisfies a strong market demand.

In order to illustrate my points and experience, I am going to focus on one specific type of data mentioned in Part I, “Operations Data.” This type of data, sometimes referred to as corporate “exhaust” data, is increasingly finding utility in ML applications being developed by startups.

An increasing number of companies have crossed the digital transformation threshold and are interested in exploiting their data to improve and optimize their businesses. SaaS applications, for example, are increasingly used by companies to manage many aspects of their businesses, and these applications are inherently collecting and storing data. There are opportunities for ML startups because much of this type of data has yet to be exploited for the value it contains, and it can often be collected via API calls to the SaaS platforms. See “Operations Data” in Part I for more detail on the opportunities and challenges of working with this type of data.

You probably started your company with a hypothesis like, “if we were able to get this data from a customer and apply state-of-the-art ML technologies to it, I bet we could figure out X value from it or extract Y actionable insights.” If you have expertise developing ML applications, then this is probably a strong hypothesis; perhaps you’ve worked on similar problems at larger tech companies or other startups. But how will you really know for sure until you see some real data and try out some ML technology on it?

One of the tenets we hear in the world of data and analytics, especially when we are still learning, is that “real-world data is messy and noisy.” It is a tenet because it is true. Unlike the example data used in textbooks to illustrate analytics algorithms and solutions, real-world data often has all kinds of problems and issues that stem from the imperfect systems and processes that create it. Missing values, extreme values, and outright errors are common, for example. These problems can trip up your analytics algorithms, tricking them into producing erroneous signals. The reality is that you don’t really know how well an analytics technology is going to work until you try it on some real data.

A hypothetical ML startup example

In order to illustrate some important points, I am going to present a hypothetical ML startup, GoodMeeting.com. Imagine you co-founded GoodMeeting.com. Your value proposition is that you will apply ML technology to extract patterns that reveal the efficacy and utility of routine company meetings. You are going to take data about meetings in technology companies (which we all know are far too frequent, too expensive, and often lacking in purpose or utility) and classify the usefulness of each meeting based upon the people who attended, the list of written meeting goals, the time of day, the number of participants, the total cost of the meeting (the salaries of the participants for that hour), whether there were laptops present, and so on: whatever features you can dream up and engineer to maximize the effectiveness of the predictive model.

The target variable (whether a particular meeting was useful or not) will be collected by sending a survey to participants after every meeting during a two-month onboarding period. The average survey score for each meeting will be used to train a regression model; thresholding those averaged scores into useful/not-useful labels also gives you a way to measure accuracy. Hopefully, after two months of data collection, your team at GoodMeeting.com can train a model with reasonable performance. From that point, you can start predicting meeting utility and ultimately eliminate excessive meetings, changing the customer’s meeting culture for the better.
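
To make the setup concrete, here is a minimal sketch of the onboarding pipeline, assuming hypothetical file and column names and using scikit-learn. It takes the classifier variant, thresholding averaged survey scores into binary labels, since the accuracy numbers below imply a useful/not-useful split:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical onboarding data: one row per meeting, one row per survey response.
meetings = pd.read_csv("meetings.csv")   # engineered features per meeting
surveys = pd.read_csv("surveys.csv")     # participant ratings, say on a 1-5 scale

# Average each meeting's ratings into a single utility score, then threshold
# into a binary "useful" label so the model can be evaluated by accuracy.
labels = surveys.groupby("meeting_id")["rating"].mean().ge(3.0).rename("useful")
data = meetings.join(labels, on="meeting_id").dropna(subset=["useful"])

feature_cols = ["num_participants", "hour_of_day", "total_cost", "laptops_present"]
model = RandomForestClassifier(random_state=0)
model.fit(data[feature_cols], data["useful"].astype(int))
```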

This sounds doable, but you won’t really know until you get your first batch of labeled data. A quick analysis of the ground truth data will hopefully reveal a good distribution of useful and non-useful meetings, as determined by the meeting participants. The goal of the technology is to separate the two types of meetings and save the company a lot of time, money, and resources. From your previous ML and data science experience on similar problems, you’ve found yourself telling potential pilot customers that you think you can do this with ~90% accuracy. Based on this promise, you convince the first pilot customer to give your company and product a shot. You and your team manage to get through the data security audit and grilling (mentioned in Part I).

You design the survey, give it to the customer, and after a few months of anxious waiting, you have your first batch of training data. With great trepidation you look at the raw data and see an encouraging distribution across positive and negative meeting ratings. You train up a model based upon your first shot at engineered features and run n-fold cross-validation. The results? Not bad: 71% accuracy. You share the results with the customer and they seem generally pleased, although not thrilled. They agree to keep going, and pretty soon you get another month of data. And then another month. The results start to improve, now climbing into the high 70s. Still not in the 90s, but better than the low 70s.
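
Continuing the sketch above, the n-fold cross-validation step might look like this (five folds chosen arbitrarily):

```python
from sklearn.model_selection import cross_val_score

# Estimate out-of-sample accuracy on the first batch of labeled data.
scores = cross_val_score(
    model, data[feature_cols], data["useful"].astype(int),
    cv=5, scoring="accuracy",
)
print(f"mean accuracy: {scores.mean():.2f} (+/- {scores.std():.2f})")
```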

In the meantime, you’ve brought on a second pilot customer. Unlike the first pilot customer, a new and booming technology company full of young employees, the second is an older company with a successful legacy product line that goes back a few decades. You get two months’ worth of onboarding data and this time find there are fewer negative meeting ratings. You train up a model and the performance is poor. You collect a few more months of data and it’s still poor. The pilot customer is starting to lose interest. Then, one of the sponsors mentions offhandedly that the company does four product release cycles per year (maybe they haven’t heard of agile). On a hunch, having noticed that the most recent month had a larger share of negative ratings, you try to train a model on just that month. You train and cross-validate and suddenly, wow, this model’s performance is decent!
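
That hunch translates into a simple time-window experiment. Continuing the running sketch, with a hypothetical meeting_date column:

```python
# Retrain and validate on only the most recent month of labeled meetings.
data["meeting_date"] = pd.to_datetime(data["meeting_date"])
cutoff = data["meeting_date"].max() - pd.Timedelta(days=30)
recent = data[data["meeting_date"] >= cutoff]
scores = cross_val_score(
    model, recent[feature_cols], recent["useful"].astype(int),
    cv=5, scoring="accuracy",
)
```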

Upon further investigation, you realize, just as patience is starting to wear thin, that employees are much more sensitive to wasting time in meetings when they are close to a shipping deadline and have real work to do. You create some features to represent this phenomenon, pull the older data back in, and yes, the model still performs OK, let’s say high 70s. So now you’re in decent (but not awesome) territory with both of your pilots, though each requires similar but not identical ML methodologies. A third pilot customer comes aboard who also shows idiosyncrasies in their meeting behavior and patterns, with model performance in the mid-70s.
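
One way to capture that release-cycle phenomenon as a feature, assuming the customer can supply its ship dates (the dates and 14-day window below are purely illustrative), might look like this:

```python
# Hypothetical quarterly release dates supplied by the pilot customer.
release_dates = pd.to_datetime(
    ["2023-03-15", "2023-06-15", "2023-09-15", "2023-12-15"]
)

def days_until_next_release(meeting_date):
    """Days between a meeting and the customer's next ship date."""
    upcoming = release_dates[release_dates >= meeting_date]
    return (upcoming.min() - meeting_date).days if len(upcoming) else float("nan")

data["days_to_release"] = data["meeting_date"].apply(days_until_next_release)
# Flag meetings held close to a deadline, when wasted time stings the most.
data["near_deadline"] = data["days_to_release"].le(14)
```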

The pressure is starting to mount and tensions are starting to flare up within your team, especially between business and tech. Investors are starting to get worried too. What if each customer requires custom, non-repeatable work? You really don’t want to be a professional services (aka proserv) shop. And although you promised 90% accuracy, two of the pilot customers aren’t giving clear signals as to whether the lower numbers are still valuable. However, one pilot customer reveals that accuracy in the high 70s is actually fine for now. Why? Their problem of meeting-itis was so bad before that anything would help, even a model that was ~75% accurate. Whew! You’re grateful for the lifeline, but you know there is a lot more work ahead of you.

So what’s really happening here? GoodMeeting.com is trying to find product-market fit, which at this point is largely a function of data and ML technology. The three main questions at play here are:

Does this type of data have the signal in it that you hoped it would, ideally across many customers?

Can the same (or very similar) ML technology be leveraged across customers?

What is an accuracy bar that is good enough to provide value, and good enough to charge for? GoodMeeting.com seems to have found that ~75% is good enough for one pilot customer. Will that hold for other customers as well?

The GoodMeeting.com experiences described above may sound gut-wrenching, but they are actually normal. There’s not a lot you can do to find answers to these questions without engaging in the pilot process and seeing for yourself whether you can fulfill your value proposition. And engaging in the pilot process will probably require enough resources that you’ll need some funding/investment upfront. So you’re likely sticking your neck out to your early investors, as well as to your pilot customers, that you can do what you say and think you can do. It can feel very risky and scary.

How to weather the challenging journey to PMF

Build a strong, supportive culture. As startup pundits often say, build a great culture. Here’s where it matters. Stick together as a group, support each other, and be empathetic with one another. The business folks should support the technical folks, who are doing the best they can with data they have never seen before. The technical people should be empathetic that the business people have to go back to the customer, perhaps with underwhelming news, and try to keep them engaged for another round. Before you get started, have a conversation with your team about how challenging these early days can be, and ask everyone to hold on for the ride. Being supportive and sticking together will be paramount to helping the company weather the tough times and ultimately work together on solutions that benefit everyone. Of course, when you have wins, even small ones, be sure to celebrate them wholeheartedly as a group.

Find common ground with your team. There is an element of negotiation to product-market fit, both between you and the customer and within your team. The business and product folks may be saying, “I gotta have X!” while the data science team is saying, “for the time being, I can only give you Y!” Again, this can be stressful and create a lot of tension. Be aware that figuring out how to move forward will be a negotiation, so think of the conversation (or argument) in those terms. Write down precisely where the parties stand and agree on a mediator within the group (e.g., the CEO) who can be empathetic to both sides and help find common ground.

Choose customers by vertical or industry. Not every pilot customer is a good pilot customer. This one is tough because beggars (i.e., startups) usually can’t be choosers. But if you can, try to choose pilot customers who all come from a particular vertical or industry. Doing this increases the likelihood that the data and patterns you see across your first handful of customers will be more homogeneous and thus can benefit from similar ML solutions. Later, after you nail your first vertical, you can start branching out to other verticals that may have different types of data and patterns. Investors will be supportive of this strategy too. Although they want to know you’re playing in a massive space, they’ll want you to focus early on and get a particular use case right before you branch out to a wider audience of customers.

Be careful what you promise to pilot customers. Like finding the right pilot customers, this is easier said than done. Of course you want to sell hard and convince early adopters that you have something really special to offer. But overpromising can get you into trouble. And as GoodMeeting.com found out, some early customers may be grateful for nearly any solution, especially if they have some confidence you’ll get better over time. Know too that you won’t be able to please everyone, especially in the early days. You’ll win some and lose some. Hopefully in the long run you’ll win more than you lose.  

Explore synthetic data generation techniques. This is an area of data science that is gaining momentum. Remember that GoodMeeting.com had to wait two months for their first dataset? Maybe that could be reduced to a few weeks, at which point initial data distributions could be fitted and sampled. Other techniques such as data augmentation or weak supervision can also be helpful depending on the nature of your problem. Waiting for data to accumulate in the early days of a startup running on fumes can be agonizing. And it’s not uncommon to realize after the fact that your earliest attempts to collect data systematically were flawed. Though not a magic bullet, these techniques can help increase the size of the datasets you have to work with, giving you more answers sooner rather than later.
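
As a toy illustration of the distribution-fitting idea, assuming numeric and categorical feature columns like those in the sketches above, a naive per-column sampler might look like the following. It ignores correlations between columns, which is fine for smoke-testing a pipeline but too crude for training a final model:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def sample_synthetic(real: pd.DataFrame, n: int) -> pd.DataFrame:
    """Fit naive per-column distributions to early real data; sample n synthetic rows."""
    synthetic = {}
    for col in real.columns:
        if pd.api.types.is_numeric_dtype(real[col]):
            # Fit a normal distribution to each numeric column.
            synthetic[col] = rng.normal(real[col].mean(), real[col].std(), size=n)
        else:
            # Resample categorical columns according to observed frequencies.
            freqs = real[col].value_counts(normalize=True)
            synthetic[col] = rng.choice(freqs.index.to_numpy(), size=n, p=freqs.to_numpy())
    return pd.DataFrame(synthetic)
```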

Build tools so everyone can help. Business people may feel helpless as they wait for data scientists and ML engineers to do that voodoo they do. This can cause anxiety and add to the pressure the whole team feels. Perhaps there are tools you can build that empower the business people to help with data analysis, cleaning, munging, labeling, etc., making them feel useful and part of the process of finding a solution.
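
Such tools don’t have to be elaborate. Even a minimal command-line labeling loop (a hypothetical sketch with assumed column names) lets non-engineers contribute ground truth:

```python
import pandas as pd

def label_meetings(path: str = "unlabeled_meetings.csv") -> None:
    """Show each meeting and record a useful/not-useful judgment from the reviewer."""
    df = pd.read_csv(path)
    labels = []
    for _, row in df.iterrows():
        print(f"\n{row['title']} | {row['num_participants']} people | {row['duration_min']} min")
        answer = input("Was this meeting useful? [y/n/skip] ").strip().lower()
        labels.append({"y": 1, "n": 0}.get(answer))  # anything else is skipped (None)
    df["useful"] = labels
    df.to_csv("labeled_meetings.csv", index=False)
```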

Don’t be afraid to do things that don’t scale. I have been burned by this before, and so Paul Graham’s essay really spoke to me. The whole point of a lot of ML solutions is automation: the technology should be intelligent enough to do what humans can do, but much faster and more scalably. That’s why venture capitalists invest. They don’t want to invest in proserv companies that do everything manually; that’s what consultants do. But in actuality it is often OK, with that first handful of customers, to do things manually (or take an initial, simple, rules- or heuristic-based approach) in order to please the customer and, more importantly, to learn what they ultimately need badly enough to pay for. Startups definitely don’t want to get stuck doing things that don’t scale, but early on, while learning about the problem and solution (i.e., PMF), this approach can be acceptable.
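
For GoodMeeting.com, that initial rules-based approach might be nothing more than a few hand-written heuristics standing in for the model while training data accumulates. The rules and thresholds below are purely illustrative:

```python
def heuristic_meeting_useful(meeting: dict) -> bool:
    """Rules-based stand-in for the ML model; thresholds tuned by eyeballing pilot data."""
    if meeting["num_participants"] > 10:
        return False  # large meetings are rarely rated useful
    if not meeting.get("written_goals"):
        return False  # no stated goals or agenda
    if meeting["duration_min"] > 60:
        return False  # long meetings trend toward low ratings
    return True
```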

Try not to get discouraged. You may have thought this was going to be easier than it is turning out to be. But remember that you are probably at the beginning of a long journey and this is the first inning of the game. Your company is in an intensive learning cycle that is experiencing a lot of turbulence. This is where characteristics like grit and tenacity that we hear about so much in the startup zeitgeist really come into play and matter.

I hope a few of the lessons I have learned will save you time and resources, which tend to be most precious for a startup as you figure out PMF and ultimately, help raise your next round of funding. That’s all I have for now. Go forth and build your ML startup! 
