
An AI forecast rarely comes undone in some dramatic fashion. The failure is more likely to start in a quiet way.
Perhaps a product team puts together a prompt that says what they want it to say. Or the sales side will request a probability score but leave “success” undefined. A manager might go along with a model’s recommendation on the strength of its apparent objectivity. Before you know it, the forecast is on a dashboard, the dashboard is in a meeting and the number has been made into a decision. At that point you are not so much dealing with a technical issue as a procedural one.
Rebecca Surtay has been looking at just that kind of thing in her work on how AI is used for decision-making. A B2B marketing and product development specialist from Kazakhstan, Surtay zeroes in on the practical matter of how to vet the human assumptions that exist between what a model spits out and the business call that follows.
She doesn’t offer up another model for the solution. Instead she has put forward a protocol for human-factor validation so teams can have a good look at their AI-assisted forecasts before they let them inform any managerial or marketing moves. It comes down to this: technical validation will show you if the model is doing its job. Human-factor validation is about whether your organisation is reading it in a reliable manner.
The Missing Step in Many AI Workflows
You will find that the majority of AI workflows are driven by data, model and output. In its simplest form, such a workflow is a five-part process:
- Set out the task.
- Get your data in order.
- Put the prompt or model to work.
- Look at what comes out.
- Decide on a course of action.
Step four is where things get problematic; it is all too often given short shrift. A team might be content to see if the output “looks right” without pausing to consider if they have framed the question properly. They could put two of the AI’s answers side by side for comparison and miss the fact that both rest on the same faulty premise. Or they will have the AI predict a project’s success but not bother to specify what they mean by that – is it about revenue, user engagement, contract closure, retention or simply whether the operation is feasible?
Rebecca Surtay has an answer for this. Her work zeroes in on the gap between what the model produces and what a human does with it. There is plenty of room for error in that gap. The prompt can carry a bias, or a team might let the apparent precision of an output lull them into ignoring uncertainty. You can have a forecast taken out of context, or a decision that is propped up by the AI instead of being put to the test. The model may be sound from a technical standpoint but wrong in its strategic application. For a product manager or business analyst, it is a vital point to understand: the model can do its job and the decision can still be a failure.
Model Validation Is Not Decision Validation
Rebecca Surtay makes a handy distinction between validating the model and validating the decision. When you are doing model validation, you want to put the numbers to the test: can you trust the data? Is the model up to the job in terms of accuracy? You will be looking for performance hiccups or any red flags on fairness, privacy and security. And does it hold up consistently from one test case to the next?
Decision validation is an altogether different exercise. Here the team has to ask whether it posed the right question in the first place and whether its interpretation of the output is sound. What sort of assumptions are we making in the prompt? For this recommendation to be viable, what has to be true? Has someone from outside the team put the result to the test? Or are we simply using AI to put a stamp of approval on a decision we have already come to?
You cannot swap one for the other. A model may be technically sound, but if the problem was poorly framed by the team, it will underpin a weak decision. A probability score can give you a false sense of confidence when the context is anything but certain; a generated recommendation might be persuasive yet ignore the realities of the market or your organization. For that reason, you should make human-factor review an integral part of the AI workflow and not some soft afterthought.
A Human-Factor Audit for AI Forecasts
Surtay’s validation protocol works as an audit of the human element. It is not a substitute for model testing, data governance, a cybersecurity review or legal compliance checks. Its function is more focused: to look at the assumptions and interpretations that turn AI output into a business decision.
One way to put this kind of audit to work in practice is to run it through five questions.
1. What have we asked the AI to evaluate?
Too often an AI-assisted decision is predicated on a question that lacks precision. A team might put forward something like “Will this product launch be a success?” which is overly broad. You are better off with: “Given comparable launches in this market, what will drive first-quarter adoption with mid-sized enterprise customers?” The latter is more useful since it lays out the context, the audience and the time frame for the outcome you want to see.
Mrs. Surtay would say problem framing is key. A well-crafted answer can still be misleading if the question was vague to begin with. Or if your prompt is already leaning towards a positive result, the model may give you a forecast that simply validates the team’s optimism. So a proper human-factor audit has to involve some prompt inspection:
- Are we asking for evidence or just reassurance?
- What is the actual decision at hand?
- Does the prompt make the outcome unambiguous?
- What sort of assumptions have we put in there?
2. For the forecast to be valid, what has to be true?
The next step is causal analysis. When an AI puts forward a forecast that a project will be a success, the team ought to question it. Why does it think so? You want to know what is behind that figure: which factors are taken for granted as stable, which risks have been dismissed as inconsequential and where the unmanageable dependencies lie.
Take the case of a prediction based on customer interest. The model might be of the opinion that this will lead to sales. Yet in B2B you can’t make that assumption so easily. Interest is only part of the story; the outcome is just as likely to be shaped by procurement rules, budget cycles, internal politics or the trust of your stakeholders, not to mention the burden of implementation.
A good causal review would put some hard questions to the table:
- What is really driving the predicted outcome?
- Can we put any of those drivers to the test?
- Where are the uncertain assumptions?
- Is there anything that could invalidate the forecast?
- Are we mistaking correlation for causation?
You can’t afford to skip this when making calls on market entry, product strategy or business development with the help of AI. In the end, data doesn’t tell the whole story; human behavior and the context of the organization do.
3. What happened in similar cases?
Then you have reference class forecasting for the third step. It is true that teams will tell you their project is one of a kind, and in some ways it may be. Yet most business decisions are part of a larger category: think of product launches or software rollouts, market expansions, enterprise sales, AI pilots, wellness programs or user adoption drives.
The point of reference class forecasting is to put your present decision up against the likes of it from the past. You want to move beyond the question “What do we think will happen?” and instead put to the team: “What is the usual course of events with a project of this sort?”
It is a way to check overconfidence. Say there is a history of similar initiatives running into trouble due to poor stakeholder alignment, no clear owner or adoption hurdles; those are patterns you should factor in before turning a forecast into a decision.
In a review you might go through some practical questions:
- How does our case stack up against comparable ones?
- What were the real results of those projects?
- What drove them to succeed or fail?
- Were there any risks that kept cropping up?
None of this is to say you should disregard your own circumstances. Only that you shouldn’t be guided by internal optimism alone.
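To make the comparison concrete, here is a minimal sketch of a base-rate check in Python. It is only one way to set a model’s forecast beside a reference class, not a method taken from Surtay’s protocol. The names PastCase, base_rate and optimism_gap are hypothetical, and both the past cases and the 0.80 model probability are invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class PastCase:
    """One comparable project from the reference class (hypothetical data)."""
    name: str
    succeeded: bool


def base_rate(cases: list[PastCase]) -> float:
    """Share of comparable past cases that succeeded."""
    if not cases:
        raise ValueError("reference class is empty")
    return sum(case.succeeded for case in cases) / len(cases)


def optimism_gap(model_probability: float, cases: list[PastCase]) -> float:
    """Model forecast minus the historical base rate.

    A large positive gap is a prompt for discussion, not proof of error.
    """
    return model_probability - base_rate(cases)


# Invented reference class: five comparable launches, two of which succeeded.
reference_class = [
    PastCase("launch A", True),
    PastCase("launch B", False),
    PastCase("launch C", False),
    PastCase("launch D", True),
    PastCase("launch E", False),
]

model_probability = 0.80  # assumed model output, for illustration only
gap = optimism_gap(model_probability, reference_class)
print(f"base rate: {base_rate(reference_class):.2f}, gap: {gap:+.2f}")
```

A wide gap does not mean the model is wrong. It means the team should be able to say what makes this case different from the ones that came before it.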
4. Who is reviewing the forecast from outside the project team?
Then you have the fourth step: an independent review by stakeholders. There is a risk with AI in decision-making, namely that the team tasked with making sense of the forecast is also the one with an interest in a good result. It is a natural conflict and, even if there is no ill will, they are inclined to go along with outputs that suit their purposes.
To counter that, Mrs. Surtay’s protocol puts in place a formal challenge from a party not on the project team. You might bring in an executive from another function, an outside adviser or domain expert, a risk reviewer, a user rep, or someone from a different market altogether.
The point isn’t to put the brakes on things. Rather, it is to get a fresh set of eyes on it before the organization puts its resources on the line. A competent reviewer will be asking the hard questions: What is missing? What has the team taken for granted without proof? What would cause this to fail? And most importantly, would we have made the same call had the AI not been there? In doing so, human review ceases to be a mere formality and serves as proper governance.
5. What decision record are we leaving behind?
Then you come to the matter of documentation. All too often an organization will not put on record how an AI-assisted decision was reached, making it hard to go back and review. You might have the output on file but not the prompt; or the final call is documented while the assumptions are not. A forecast can be cited without any mention of the objections that were put forward prior to approval.
What a human-factor audit ought to produce is a decision record. In it you would find the original model query, the recommendation or forecast, the assumptions at play, the reference cases that were looked at, and the independent reviewer’s input. It should also show the final decision and why the team chose to accept or spurn what the AI put forward.
There is no need to make it complicated; a brief decision log is enough to ensure accountability. And this is not about being bureaucratic, it is for learning. When a decision holds up, you know your assumptions were sound. When it does not, the team can trace the failure back to the execution, the interpretation, the data or the model itself.
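A brief decision log of this kind could be sketched as a small Python data structure. The fields follow the items listed above, but the class name DecisionRecord, the field names and the missing_fields helper are assumptions made for illustration, and the sample values are invented. Treat it as one possible shape for a record rather than a prescribed format.

```python
from dataclasses import dataclass, asdict


@dataclass
class DecisionRecord:
    """Hypothetical decision-log entry; field names are illustrative."""
    model_query: str            # the original prompt or query sent to the model
    forecast: str               # the recommendation or forecast that came back
    assumptions: list[str]      # assumptions the team identified
    reference_cases: list[str]  # comparable cases that were looked at
    reviewer_input: str         # what the independent reviewer said
    final_decision: str         # what the team actually decided
    rationale: str              # why the AI output was accepted or rejected

    def missing_fields(self) -> list[str]:
        """Return the names of fields left empty (empty string or empty list)."""
        return [name for name, value in asdict(self).items() if not value]


# Invented example values, for illustration only.
record = DecisionRecord(
    model_query="Given comparable launches in this market, what will drive "
                "first-quarter adoption with mid-sized enterprise customers?",
    forecast="(model output goes here)",
    assumptions=["Customer interest converts into purchases"],
    reference_cases=[],  # deliberately left empty to show the check
    reviewer_input="Procurement cycle not accounted for",
    final_decision="Delay launch by one quarter",
    rationale="Reviewer objection accepted",
)

print(record.missing_fields())  # ['reference_cases']
```

The completeness check is there so that a record missing its prompt, its assumptions or its objections gets noticed at the time of approval, not months later during a review.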
Why This Matters for Product and Data Teams
You will hear a lot of talk about AI governance at the executive or compliance table. Yet in practice, AI-assisted decisions are shaped long before they get there, by product managers, analysts, marketers and the like.
It is these people on the ground – your data teams and operational leaders – who determine what to ask of the system and which data to put in. They make sense of the outputs and craft the recommendations for leadership, often spinning model results into a business narrative.
So you have to factor in human validation at the working level. It has its uses for every department: a product team might avoid building a strategy on shaky assumptions; a data team can be sure the output is being read correctly; marketing can be kept from turning an AI forecast into some well-polished wishful thinking; and it gives managers a more disciplined way to go from prediction to action.
The logic holds up outside of B2B forecasting as well. Whether you are in education, healthcare, wellness or movement analytics, an AI may give you an output, but it is still up to a human to decide what it means and what to do with it.
From AI Output to Human Context
You can see the link to Surtay’s wider preoccupation with human-centered AI in her work on things like movement analytics and wellness education. Take BodyFusion for instance: the AI is not churning out a sales forecast here, but rather offering movement feedback or flagging some pattern in how the user behaves.
The underlying principle is no different. It does not matter if the system has picked up on a pattern; it is the human experience that will tell if the feedback actually improves behavior. The same holds for forecasts in a business context. A model might put forward a recommendation, but the organization has to be the one to interpret it with responsibility. For the AI output to be of any use, people have to be able to trust it and make sense of it in context.
What you find running through all of Surtay’s work is this: do not judge an AI by how sophisticated its output looks, but by the quality of the decision-making that goes on around it.
A Practical Checklist
Teams relying on AI for their forecasts would do well to run through a few questions:
- Have we been clear on what the decision is?
- Have we looked at the prompt for any hidden assumptions?
- Do we know the causal logic of the forecast?
- Have we put the case side by side with comparable projects?
- Has an outsider to the project team put the result to the test?
- Is there a record of why we did or did not go with the AI’s suggestion?
- And did we distinguish between the confidence of the model and our own?
That final point is perhaps the most critical. There is a difference between the two. A model may report high confidence simply because it recognizes the input. An organization should only be confident in its decision once it has vetted the risks and context of the output for itself.
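For teams that want to make the checklist above a routine gate, it could be encoded along these lines. This is a rough Python sketch: the item keys and the open_items function are hypothetical names chosen for illustration, and the sample answers are invented.

```python
# Hypothetical keys, one per checklist question, in the same order.
CHECKLIST = [
    "decision_defined",
    "prompt_inspected",
    "causal_logic_understood",
    "reference_cases_compared",
    "independent_review_done",
    "decision_recorded",
    "model_vs_team_confidence_separated",
]


def open_items(answers: dict[str, bool]) -> list[str]:
    """Return checklist items answered 'no' or not answered at all."""
    return [item for item in CHECKLIST if not answers.get(item, False)]


# Invented example: one item answered 'no', one left unanswered.
answers = {item: True for item in CHECKLIST}
answers["independent_review_done"] = False
del answers["decision_recorded"]

print(open_items(answers))  # ['independent_review_done', 'decision_recorded']
```

Treating an unanswered question the same as a “no” is a deliberate choice in this sketch: a forecast should not move forward just because nobody got round to asking.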
The Real Challenge Is Not Just Better AI
AI is taking hold in business workflows, and the challenge ahead is no longer just a matter of building more powerful models. The task is to put better decision systems in place around them.
That means you need technical as well as human-factor validation; data governance and prompt discipline in equal measure. There has to be a human-in-the-loop review, one with some real independence to it.
Take Rebecca Surtay’s protocol for instance, which shows how teams can put this missing layer in place. It gets an organization to stop asking “What did the AI predict?” and start posing a more useful question: “Have we tested the decision this prediction will sway?”
For those integrating AI into their day-to-day work, that is a question that will only grow in significance. In the end, the worth of your AI will not be in the speed of its answers so much as in whether you have the wherewithal to challenge them before you act.