Lesson 2 — Critically Evaluating the Solution

Module 3, Unit 1 | Lesson 2 of 3

By the end of this lesson, you will be able to:

  • Distinguish the four main AI solution archetypes — pre-built API, off-the-shelf product, fine-tuned foundation model, and custom build — and the realistic trade-offs of each (K7, K13)
  • Apply a vendor and solution evaluation framework that is specific to AI, not borrowed wholesale from generic IT procurement (K24, S3, S5)
  • Estimate the total cost of an AI solution honestly, including the costs that do not appear in vendor pricing (S22, S15)
  • Recognise when the right answer is not to build at all — and articulate that recommendation defensibly (B1, B2, B3)

Why solution evaluation is its own discipline

By the time a project reaches the "how should we build this?" question, the team has usually already fallen in love with one of the answers. The data scientists want to fine-tune; the procurement team wants to buy a product; the senior sponsor read an article last week and wants to use the API the article mentioned. These are not deliberations. They are preferences dressed up as analysis, and they tend to lock in a delivery path before the actual trade-offs have been examined.

Solution evaluation is the discipline of slowing down long enough to ask which of the realistic options would actually serve the problem best. It sits between problem framing (Unit 1 Lesson 1) and delivery planning (Unit 2 Lesson 3), and it is the step the portfolio specifically asks you to do well: "critically evaluate the solution" is the third item on the list. The reason it is its own discipline, rather than a sub-bullet under scope, is that the four AI solution archetypes have radically different cost curves, time-to-value profiles, lock-in exposures, and capability implications. Choosing between them by instinct, or by what the team already knows how to build, is one of the more common ways AI initiatives end up technically successful and operationally regretted.

🔑 Key term: Solution archetype — the structural shape of how an AI capability will be delivered. The four most common are pre-built API (calling someone else's hosted model), off-the-shelf product (a vendor application with AI inside), fine-tuned foundation model (adapting a third-party model to your data), and custom build (training a model from scratch). Each archetype implies different costs, risks, and capabilities — and choosing between them is rarely the same decision as the technical implementation choices that follow.


The four archetypes

These four archetypes are separate categories, not a journey. Choosing between them is a question of which one fits the problem you are trying to solve — not a question of which one is more advanced or more mature. Each represents a different bet about where the value sits: in the model itself, in the integration around it, or in the data it learns from.

The pre-built API archetype calls a hosted model that someone else trained, hosts, and updates. OpenAI, Anthropic, Google, AWS, and others all offer general-purpose APIs for language, vision, and speech. The model is theirs; you supply the prompts and the integration. The strength is speed: you can prototype in hours and deploy in days. The exposure is that you do not control the model — its behaviour can change without notice, your data may or may not be used to improve it depending on contract terms, and the per-request cost can become expensive at scale. This is usually the right archetype for general-purpose tasks where the value sits in your integration, not in the model itself.

Worked example. A bank's customer-services team uses the OpenAI or Anthropic API to summarise call transcripts and draft follow-up emails. The team owns the prompt design and the integration into the ticketing system; the model itself is rented, not built. The bank's effort sits in the workflow around the model — quality checks, escalation rules, and the human review step before a draft is sent.

The off-the-shelf product archetype is a vendor application that already wraps an AI capability — a customer-service platform with built-in summarisation, a recruitment system with built-in screening, a coding assistant. You are not building anything; you are licensing a product that has done the AI work for you. The strength is that you also inherit the vendor's investment in evaluation, fairness testing, security, and ongoing improvement — work that is invisible from outside but expensive to do well. The exposure is that you also inherit the vendor's roadmap. If they discontinue the feature, change the pricing, or interpret the model's behaviour differently from how you would, you have limited recourse.

Worked example. An insurer rolls out Microsoft 365 Copilot for the workforce, and licenses a vendor platform such as HireVue for first-stage screening of recruitment applications. The AI sits inside the vendor product; the insurer's effort goes into procurement, vendor management, governance sign-off, and configuring the product to its own policies. The insurer is buying capability, not building it.

The fine-tuned foundation model archetype takes a general-purpose model and adapts it to your specific domain or task by training it further on your own data. The base model's general capability is preserved; the fine-tuning adds specificity. The strength is that you get behaviour shaped to your context — your terminology, your tone, your edge cases — without the cost of training from scratch. The exposure is that you now own a model artefact, with all the lifecycle obligations that brings: monitoring, retraining when the base model updates, evaluation, governance documentation. Fine-tuning is often pitched as "the best of both worlds" but is in practice a significant capability investment, not a quick adaptation.

Worked example. An insurer fine-tunes a foundation model on its own claims-handling notes so the system understands the firm's terminology, the recurring edge cases, and the tone its customers expect. The base model still belongs to its provider; the adaptation belongs to the insurer — and so does the ongoing duty to monitor, evaluate, and re-tune the model whenever the underlying base model updates.

The custom build archetype trains a model from scratch on your own data. It is rare, and it is rarely the right answer. Custom build belongs at the end of a process of elimination, not at the start. The bar to clear is high: the data is genuinely proprietary and unavailable to existing models, the problem sits outside what foundation models already address, the organisation has the in-house ML and MLOps capability to maintain a model over years (not months), and the value of doing this in-house clearly exceeds the multi-year cost of building, evaluating, governing, and retraining it. Most problems that feel unique on first inspection turn out to be solvable with one of the three lighter archetypes — and a 'custom build' recommendation should be treated as suspicious until each of the lighter archetypes has been ruled out on the evidence.

Worked example. A research-led bank trains its own credit-risk model on decades of internal underwriting data because no external model has access to that history, and the regulatory regime requires the bank to own and audit the model end-to-end. The team accepts that this is a multi-year capability investment, not a project, and resources it accordingly.

The four AI solution archetypes — four separate categories, with the kind of problem each one fits best


The decision framework

Choosing between archetypes is not a checklist; it is a structured conversation about five trade-offs. The five questions that matter most are these.

The first is where the value actually sits. Is the value in your data (something proprietary about your customers, your processes, your domain), or is it in the integration, the workflow change, the user experience, the operational fit? Fine-tuning or a custom build only makes sense if the value really is in the data. For most problems the value is in the integration, and a pre-built API or off-the-shelf product is enough.

The second is time-to-value. Pre-built APIs and off-the-shelf products can deliver value in weeks. Fine-tuning takes months. Custom build takes much longer. The right question is not which is fastest, but which timeline matches the problem's urgency.

The third is what capability you are building inside the organisation. Each archetype leaves the team with different skills afterwards — vendor management, prompt design, data engineering, applied ML. None of these is intrinsically better, but a plan that does not name which capability you are choosing to build will produce capability by accident rather than by design.

The fourth is what it ties you to in the long term. Every archetype creates dependency: a vendor, a roadmap, a base model, your own maintenance team. The right question is not "how do we avoid lock-in?" but "which dependency are we choosing, and is it one we are willing to live with?"

The fifth is where the data goes. Different archetypes have very different implications for who processes your data, where it is stored, and which rules apply. The earlier this is checked, the better — initiatives have been stopped at deployment over data questions that should have been raised at planning.
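
One way to keep the five-question conversation honest is to write the scores down and see whether the ranking matches the team's instinct. The sketch below is illustrative only: the 1-to-5 scores, the weights, the key names, and the function name `rank_archetypes` are hypothetical placeholders rather than values from this lesson, and a weighted average is just one simple way to summarise the trade-offs. A higher score means a better fit on that trade-off.

```python
from typing import Dict, List, Tuple

# The five trade-offs from the decision framework, in lesson order.
TRADE_OFFS = ["value_location", "time_to_value", "capability_built",
              "lock_in", "data_handling"]


def rank_archetypes(scores: Dict[str, Dict[str, int]],
                    weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (archetype, weighted average score) pairs, best fit first."""
    total_weight = sum(weights[t] for t in TRADE_OFFS)
    ranked = []
    for archetype, by_trade_off in scores.items():
        # Refuse to rank an archetype that skipped a trade-off.
        missing = [t for t in TRADE_OFFS if t not in by_trade_off]
        if missing:
            raise ValueError(f"{archetype} has no score for: {missing}")
        weighted = sum(by_trade_off[t] * weights[t] for t in TRADE_OFFS)
        ranked.append((archetype, round(weighted / total_weight, 2)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical weights: this imaginary team cares most about where value sits.
EXAMPLE_WEIGHTS = dict(zip(TRADE_OFFS, [3, 2, 1, 2, 2]))

# Hypothetical 1-5 scores for an imaginary project, listed in TRADE_OFFS order.
EXAMPLE_SCORES = {
    "pre_built_api":         dict(zip(TRADE_OFFS, [4, 5, 2, 2, 3])),
    "off_the_shelf_product": dict(zip(TRADE_OFFS, [3, 4, 2, 2, 3])),
    "fine_tuned_model":      dict(zip(TRADE_OFFS, [3, 2, 4, 3, 4])),
    "custom_build":          dict(zip(TRADE_OFFS, [2, 1, 4, 4, 4])),
}

for name, score in rank_archetypes(EXAMPLE_SCORES, EXAMPLE_WEIGHTS):
    print(f"{name:<24}{score}")
```

The number is a prompt for discussion, not a verdict. If changing one weight flips the ranking, that trade-off is where the conversation should go next.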

Doing this with AI

Once you have a draft recommendation, paste it into an AI assistant with this prompt: "Score each of the four archetypes against the five trade-offs from my evaluation, and tell me which archetype I have not seriously considered and why I should." That last part is what pays off. Most teams have already chosen their archetype before the evaluation begins, and a model has no stake in the team's preferred answer, so it is often more willing than colleagues to call that out.

Total cost of ownership for AI solutions

Vendor pricing rarely tells you what an AI solution will cost. The headline number — per-request API fees, annual licence, fine-tuning compute — is usually the smallest component of the actual cost over the system's life. The costs that get missed sit in three categories.

The first is integration and adaptation. Hooking an AI system into existing workflows, identity systems, data pipelines, monitoring infrastructure, and audit logging is rarely trivial. For pre-built APIs and products, this is often the dominant cost — frequently several times the licence fee. Budget plans that show only the vendor cost are signalling that integration has not yet been thought about.

The second is operational and oversight cost. AI systems require monitoring (drift, performance by group, error patterns), evaluation refresh, and human oversight. This is recurring, not one-off. For models you fine-tune or build, it also includes retraining cycles and the data engineering required to keep training data current. A common pattern is for organisations to budget for build but not for run, and to discover the run cost six months in when the system has degraded.

The third is risk and governance cost. Documentation, fairness assessments, security reviews, regulatory engagement, incident response capability — these are real costs and they scale with the sensitivity of the use case. A model that supports an internal productivity tool has a different governance cost than one that makes decisions about customers. The temptation is to treat governance as overhead; the more accurate framing is that it is part of the actual delivery cost, and pretending otherwise produces budgets that fail at the gate where they would have succeeded if they had been honest.

The total-cost picture frequently changes the archetype recommendation. A pre-built API that looks cheap on the surface can become expensive once usage grows; a custom build that looks expensive can become cheaper over five years if the use case is high-volume and the per-request API fees would have kept accumulating. The point is not that there is a universally correct answer. It is that the headline number is rarely the right number, and a defensible recommendation has done the longer arithmetic.
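
The longer arithmetic can be sketched in a few lines. This is a minimal model under stated assumptions, not a costing method from this lesson: the class `CostProfile`, the function `break_even_volume`, and every cost figure are hypothetical, except the £0.02 per request and 5 million requests per year, which are borrowed from the quiz question later in this lesson. Integration is treated as a one-off cost, and operations, oversight, and governance as a flat annual cost.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CostProfile:
    name: str
    upfront: float           # one-off integration or build cost (GBP)
    annual_fixed: float      # recurring run, oversight and governance cost (GBP)
    cost_per_request: float  # usage-based cost per request (GBP)

    def usage_fees(self, requests_per_year: int, years: int) -> float:
        """The headline number: usage-based fees only."""
        return self.cost_per_request * requests_per_year * years

    def total(self, requests_per_year: int, years: int) -> float:
        """Upfront cost plus fixed and usage-based costs for every year."""
        return (self.upfront + years * self.annual_fixed
                + self.usage_fees(requests_per_year, years))


def break_even_volume(a: CostProfile, b: CostProfile,
                      years: int) -> Optional[float]:
    """Requests per year at which a and b cost the same over `years`.

    Returns None when the per-request costs are equal or the two totals
    never cross at a positive volume.
    """
    per_request_gap = years * (a.cost_per_request - b.cost_per_request)
    if per_request_gap == 0:
        return None
    fixed_gap = (b.upfront - a.upfront) + years * (b.annual_fixed - a.annual_fixed)
    volume = fixed_gap / per_request_gap
    return volume if volume > 0 else None


# Hypothetical profiles; only the 0.02 per-request fee comes from the lesson.
API = CostProfile("pre-built API", upfront=250_000,
                  annual_fixed=120_000, cost_per_request=0.02)
CUSTOM = CostProfile("custom build", upfront=900_000,
                     annual_fixed=400_000, cost_per_request=0.002)

print("API usage fees, 3 years:", API.usage_fees(5_000_000, 3))
print("API total cost, 3 years:", API.total(5_000_000, 3))
print("Break-even volume over 5 years:", break_even_volume(API, CUSTOM, 5))
```

With these made-up inputs the usage fees are roughly a third of the three-year total, and the custom build only becomes cheaper at several times the quiz's request volume. Different inputs will move both results, which is the reason to run the numbers rather than trust the headline price.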

Total cost of ownership for AI solutions — vendor cost is usually the smallest of three layers

Did you know?

In 2023, Bloomberg announced BloombergGPT, a large language model they had trained from scratch, drawing heavily on their own financial data. At the time it was held up as the example of doing AI seriously: train your own model, on your own data, for your own industry. Within about eighteen months, general-purpose models like GPT-4 and Claude, given the right prompts, were reported to do as well or better on many of the same finance tasks. The point is not that custom build never works. The point is that the choice of archetype has to take into account how fast the field is moving: what feels like a smart, defensible build today can be overtaken before the project even finishes.

When the right answer is no AI for this problem

The fifth option is "no AI for this problem". You are AI practitioners, and making this recommendation is not abandoning the discipline; it is refusing to apply it where it will not help. Sometimes a process change, a policy fix, training, a clearer escalation path, or a non-AI tool will produce a better outcome at lower cost than any AI solution. Forcing AI onto a problem that does not need it produces solutions in search of problems, and tends to fail loudly later. The patterns are familiar: benefits do not justify run cost once total cost of ownership is honest; the data required does not exist or cannot be obtained at acceptable cost or risk; the regulatory landscape is moving fast enough that committing now would be premature; or the capability the solution would build is not one the organisation actually wants.

A "no AI here" recommendation is credible when it shows the AI options were genuinely examined, when at least one non-AI alternative is named with a rough estimate of cost and benefit, and when the recommendation comes with conditions that would change the answer — "we revisit if data quality improves to X, if regulatory clarity emerges on Y, if volumes exceed Z." That conditional framing is what distinguishes a thought-through recommendation from a retreat.


Project Activity — Complete section 2.1: critically evaluate the solution

Open the Module 3 Project workbook and complete section 2.1 Critically evaluate the solution. Use the problem statement and root-cause analysis from Part 1 as your anchor.

  1. Score all four solution archetypes: pre-built API, off-the-shelf product, fine-tuned foundation model, and custom build.
  2. For each archetype, use the same five criteria from the lesson: where value sits, time-to-value, capability built, lock-in, and where the data goes (including regulatory exposure).
  3. Take the "no AI for this problem" option seriously. Name the conditions under which the right recommendation would be to pause, redirect to a non-AI alternative, or not proceed, and identify what that non-AI alternative might be.
  4. Choose your recommended archetype and write the reason as a decision, not a preference. Link it back to the problem, the root cause, and the evidence you have.

Project Checklist

  • Section 2.1 evaluates all four archetypes, not only the option the team already prefers.
  • Each score has a short justification tied to my own project context.
  • I have included lifecycle implications: integration, maintenance, monitoring, governance, and capability.
  • The lock-in and regulatory exposure are explicit for the recommended option.
  • My recommended archetype follows from the problem and evidence in Part 1.

A team is choosing between calling a hosted language-model API and fine-tuning a foundation model on their own data. Which question, more than any other, should drive the decision?

A vendor is offering a pre-built API for £0.02 per request. The project will make 5 million requests per year. The total cost of the AI solution over three years is most likely to be:

Of the four AI solution archetypes, which is most often the right answer for problems where the AI capability is general-purpose (e.g. summarisation, classification, basic extraction) and the differentiating value of the solution sits in how it integrates into an existing workflow?


⏭️ Up next — Lesson 3: With the solution archetype chosen, Lesson 3 turns to the pitch. You will move from analytical work to communicating it — the executive one-pager, the pitch deck, and the four AI-specific failure modes that send otherwise strong projects back to the drawing board.