Using AI to automate business processes is hard. We’ve been working in this area for a few months now, and I wanted to share some of our learnings. If you’re looking to build a product or a business around AI, this is for you. I’ll assume you’ve validated that the problem exists and your customers are willing to pay for it. This applies to B2B products at the application layer. It does not apply to companies conducting foundational research or trying to solve a problem over a longer time horizon. Most of these learnings are based on using multi-modal models and assume you aren’t training a model of your own.
Do you need an LLM?
Your first step is to evaluate whether AI is the right tool for the job. Start by ignoring AI. Why has the workflow not been automated entirely? Does it have an element of subjectivity that AI can help with, or is there a different reason for it not being automated? Don’t fall into the trap of assuming everything is a software problem. Spend time with your customers to map out the workflow in detail. For example, when considering AI for legal workflows, drafting routine documents like NDAs is a strong fit, as AI can automate the subjective task of generating standard language based on specific inputs. However, advising clients on bespoke litigation strategy may be harder with AI.
Maximum acceptable failure rate
Start by defining a maximum acceptable failure rate.
Once you’ve established that AI could meaningfully improve the workflow, your next step is to determine whether it is reliable enough for that workflow. AI always comes with a certain rate of failure. It will decrease over time, but more on that later. You need to form a view on the maximum acceptable failure rate for the workflow in question. Talk to your customers to understand the failure rate today. Your product needs to beat it. You could argue that a higher failure rate is acceptable in exchange for a reduction in cost, but this feels like a losing strategy long-term.
It’s possible to recover from failure using assistance from a human. This is something you should consider for your workflow, but if you do, consider the following points. First, will your approach actually make the workflow more efficient? If yes, consider the cost of learning a new tool. The increase in efficiency needs to be large enough to warrant your customer investing time in this. If both of these are true, you should ask yourself what stops an existing software vendor from optimising the workflow in the way you imagine.
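As a rough way to reason about the first question, the sketch below estimates human time per case when a person reviews every AI output and repairs the failures. The function name and every number are hypothetical placeholders; replace them with measurements from your own customers.

```python
def expected_minutes_per_case(review_minutes, failure_rate, fix_minutes):
    """Expected human time per case when a person checks every AI output
    and repairs the ones that failed."""
    return review_minutes + failure_rate * fix_minutes


# All values below are hypothetical and only illustrate the arithmetic.
manual_minutes = 30.0      # time to do the task entirely by hand today
assisted = expected_minutes_per_case(
    review_minutes=5.0,    # time to check one AI output
    failure_rate=0.2,      # share of outputs that need repair
    fix_minutes=30.0,      # assume a failed output is redone by hand
)
saving = 1.0 - assisted / manual_minutes
print(f"{assisted:.1f} min per case, {saving:.0%} less human time")
```

If the saving is small, the cost of learning a new tool is unlikely to be worth it for your customer.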
Prototyping the failure rate
Build a prototype to test what the baseline failure rate is.
At this point, you haven’t written a single line of code. You’ve talked to enough customers to determine that AI can meaningfully improve the workflow either entirely or assisted by a human. You’re going to build a prototype to validate the failure rate.
Focus only on the part of the workflow that you believe you need AI for. Take the most capable models available to you: Claude Opus, GPT-4 or Gemini 1.5 are your best candidates. Write code to establish if AI gives you the output you need for each step.
In most cases, this is a simple Python script. You will be tempted to use a library or framework. Don’t do it. You will need lots of bells and whistles to turn this into a product, but ignore all of them for now. Remember that your sole focus is the components of the system that AI can help with. Verify outputs manually, either by yourself or with your customers. The latter is preferred because you want someone with domain expertise on the workflow. Eventually, you will have an automated set of evaluations, but it’s too early for that right now. You need to know what good looks like before you can automate it.
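A minimal sketch of what tracking manual verification could look like, using only the standard library. The step names, case identifiers and verdicts are made up for illustration; the verdicts are meant to come from a human reviewer, not from an automated check.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    step: str      # hypothetical workflow step name
    case_id: str   # identifier of the test case that was run
    passed: bool   # verdict entered by a human reviewer


def failure_rates(results):
    """Return {step: failed / total} from manually reviewed results."""
    totals, failures = {}, {}
    for r in results:
        totals[r.step] = totals.get(r.step, 0) + 1
        if not r.passed:
            failures[r.step] = failures.get(r.step, 0) + 1
    return {step: failures.get(step, 0) / n for step, n in totals.items()}


# Illustrative data only: six reviewed outputs across two steps.
results = [
    StepResult("extract_fields", "case-1", True),
    StepResult("extract_fields", "case-2", False),
    StepResult("extract_fields", "case-3", True),
    StepResult("extract_fields", "case-4", True),
    StepResult("draft_document", "case-1", True),
    StepResult("draft_document", "case-2", True),
]
print(failure_rates(results))
```

A flat list of reviewed results like this is usually enough at the prototype stage, and it can later seed an automated evaluation set once you know what good looks like.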
Most of your time will be spent on trying different approaches with AI, such as optimising prompts, breaking up prompts, using a mix of models (e.g. vision + text), using RAG (retrieval-augmented generation), and in some cases, specialised models.
The failure rate and time to market tradeoff
You’ve established a baseline failure rate. It’s time to make a decision with imperfect information.
Your prototype will give you a failure rate for each step, and a failure rate for the overall workflow. Over time, this failure rate will improve thanks to your team’s efforts and models getting better. The critical question is whether the rate, as it is today, is acceptable to get started with customers.
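To see how per-step failure rates relate to the overall rate, here is a back-of-the-envelope sketch. It assumes that steps fail independently and that any single failed step fails the whole workflow; neither assumption comes from measured data, and the step names and rates are hypothetical. Treat the result as an estimate to compare against the end-to-end rate your prototype actually measures.

```python
import math


def workflow_failure_rate(step_failure_rates):
    """Estimate the overall failure rate, assuming independent steps where
    one failed step fails the whole workflow."""
    success = math.prod(1.0 - f for f in step_failure_rates)
    return 1.0 - success


# Hypothetical per-step failure rates from a prototype.
steps = {"extract_fields": 0.05, "classify": 0.10, "draft_document": 0.02}
overall = workflow_failure_rate(steps.values())
print(f"estimated overall failure rate: {overall:.3f}")
```

Even modest per-step rates compound quickly, so the overall figure is the one to compare against what your customers will accept.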
Instead of thinking of this as a binary yes or no, think of it along the axis of time to market. The best-case scenario is that the failure rate is already acceptable for your customers. The next best case is that it’s not quite there yet but you see a clear path to getting there. If the failure rate is very high but you believe you can bring it down over time, it might actually be a sizeable opportunity, and you should go for it. Equally, you should acknowledge that it’s going to take some time before you can market the product and sell it. This is okay. It just helps to set expectations for your team and your stakeholders.
Ship fast, and ship often
There’s really no substitute for getting into the weeds. If you must abide by one principle, it is to ship fast and ship often. Choosing the workflow carefully and setting clear expectations on time to market will help you find the best opportunity to go after.