Lesson 1 — Goal Statements and Success Metrics

Module 3, Unit 2 | Lesson 1 of 3

By the end of this lesson, you will be able to:

Apply the SMART framework honestly, recognising the failure modes that make most SMART goals look correct but mean nothing (K3, K4, K7)

Include the AI-specific metrics that almost always belong in the goal — fairness, drift, override rate, reversibility — and explain why each one is not optional (B1, B2, B3, B4)

Why goal statements are the part of a project that fails first

Most AI projects fail their goals quietly, and they do it before delivery even begins. The failure happens at the moment the goal is written. If the goal is vague — "improve customer experience", "streamline operations", "leverage AI to drive efficiency" — the project will eventually declare success against it because vague goals cannot be falsified. If the goal is specific but measures the wrong layer — counting workshops held, or systems deployed, rather than what the project was meant to actually change — the project will declare success against the wrong evidence. Either way, the goal-setting step is the one that determines whether success will be a recognisable thing or a rhetorical thing.

Goal-setting also has a particular quality on AI projects that makes it harder than on traditional initiatives. Traditional projects can usually predict their outcomes with reasonable confidence: this system will replace that one, this report will be automated, this process will move from manual to digital. AI projects have an additional layer of uncertainty: the model's behaviour will depend on data it has not yet seen, on edge cases that have not yet appeared, on patterns that may shift after deployment. A goal that does not acknowledge this uncertainty will misrepresent what success means; a goal that acknowledges it too much becomes unfalsifiable. The discipline is in finding goals that are falsifiable, honest about uncertainty, and useful as decision aids — and most goal-setting templates do not help with this.

🔑 Key term: Goal statement — a written commitment to what the project will achieve, expressed precisely enough that a reasonable observer could later determine whether the commitment was met. If the statement cannot be tested against evidence, it is not a goal — it is an aspiration. AI projects need goals that include behavioural and operational measures, not only delivery measures.

SMART, applied honestly

The SMART framework (Specific, Measurable, Achievable, Relevant, Time-bound) is one of the most widely taught goal-setting structures. It is also one of the most widely abused. The problem is that it is easy to satisfy each letter cosmetically while still producing a goal that means nothing. "Deploy the AI assistant by Q4 to improve productivity" satisfies SMART on its face, yet tells you nothing about what success would look like.

A more useful version of SMART acknowledges the failure mode behind each letter.

Specific is meant to mean clear about what is being changed and for whom. The failure mode is generic specificity — a goal that uses concrete-sounding words but applies to almost any project. "Improve customer service quality" is generic; "reduce average claim handling time for first-notification cases in personal lines insurance" is specific.

Measurable is meant to mean associated with a metric that can be observed. The failure mode is measurable but meaningless — a metric that exists but does not capture what the project was meant to change. "Number of training sessions held" is measurable. "Number of cases handled correctly within the new SLA" is meaningfully measurable.

Achievable is meant to mean plausible given the resources and constraints. The failure mode is achievable but trivial — a goal so easy that achieving it tells the organisation nothing it did not already know. The discipline here is to ask: if we hit this goal, would anyone care? If the answer is no, the goal is too easy.

Relevant is meant to mean connected to a real organisational priority. The failure mode is relevant by assertion — a goal whose link to organisational priority is asserted in a covering memo but not actually demonstrated. The test is whether someone outside the project team could reconstruct the link from the goal statement alone.

Time-bound is meant to mean with a date by which success will be assessed. The failure mode is time-bound but unowned — the date exists, but no one is named as accountable for the assessment, so the date passes without the assessment happening. SMART rarely names this; it should.

Used honestly, SMART is a useful discipline. Used cosmetically, it produces goals that look professional and behave like nothing.
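The five failure modes above can be turned into a rough structural check. The sketch below is illustrative only: the GoalStatement record, its field names, and the cosmetic_gaps function are hypothetical and not part of the course workbook. It can flag missing parts, but it cannot judge whether a metric is meaningful or a target is ambitious enough. Those remain human calls.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class GoalStatement:
    """Hypothetical record of one goal; all field names are illustrative."""
    change: str                 # Specific: what changes, and for whom
    metric: str                 # Measurable: the quantity that will be observed
    baseline: Optional[float]   # value of the metric before the project
    target: Optional[float]     # value that would count as success
    priority_link: str          # Relevant: the organisational priority served
    assess_by: Optional[date]   # Time-bound: when success is assessed
    owner: str                  # who is accountable for running the assessment


def cosmetic_gaps(goal: GoalStatement) -> List[str]:
    """Return the structural gaps that would leave the goal untestable."""
    gaps = []
    if not goal.change.strip():
        gaps.append("no statement of what changes and for whom")
    if not goal.metric.strip():
        gaps.append("no metric")
    if goal.baseline is None or goal.target is None:
        gaps.append("no baseline or target, so the metric cannot be tested")
    elif goal.baseline == goal.target:
        gaps.append("target equals baseline, so the goal is trivial")
    if not goal.priority_link.strip():
        gaps.append("no stated link to an organisational priority")
    if goal.assess_by is None:
        gaps.append("no assessment date")
    if not goal.owner.strip():
        gaps.append("no named owner for the assessment")
    return gaps


# The cosmetic example from the text, with a placeholder date for "by Q4".
vague = GoalStatement(
    change="Deploy the AI assistant to improve productivity",
    metric="", baseline=None, target=None,
    priority_link="", assess_by=date(2030, 12, 31), owner="",
)
for gap in cosmetic_gaps(vague):
    print("-", gap)
```

A goal that returns an empty list has only passed the structural test. It still needs the honesty checks described for each letter.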

[Figure: SMART goal-setting applied honestly. The failure mode behind each letter, and how to avoid it.]

AI-specific metrics that almost always belong in the goal

Generic project metrics (schedule, budget, scope, satisfaction) are necessary but not sufficient for AI initiatives. Four further metrics almost always belong in the goal statement, because when they are absent the project is later judged against standards its goal never mentioned.

The first is a fairness measure. The metric will depend on the use case — performance parity across demographic groups, false-positive rates by segment, equality-of-opportunity, demographic representation in the training data — but the principle is the same: a goal that does not commit to a fairness outcome is a goal that has decided fairness is not part of success. That decision will be revisited later, by someone less sympathetic to the project, and at higher cost.
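One of the fairness measures named above, false-positive rate by segment, can be sketched as follows. This is a minimal illustration, not a recommended fairness methodology: the record layout, the segment names, the sample data, and the 0.10 maximum gap are all assumptions made for the example. The right measure and threshold depend on the use case.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Each record is (segment, actual_label, predicted_label), with labels 0 or 1.
Record = Tuple[str, int, int]


def false_positive_rates(records: Iterable[Record]) -> Dict[str, float]:
    """False-positive rate per segment: FP / (FP + TN)."""
    false_pos = defaultdict(int)
    negatives = defaultdict(int)
    for segment, actual, predicted in records:
        if actual == 0:                 # only true negatives can be false positives
            negatives[segment] += 1
            if predicted == 1:
                false_pos[segment] += 1
    # Segments with no actual negatives are omitted rather than guessed.
    return {seg: false_pos[seg] / n for seg, n in negatives.items()}


def parity_gap(rates: Dict[str, float]) -> float:
    """Largest difference in rate between any two segments."""
    return max(rates.values()) - min(rates.values())


# Made-up data for two hypothetical segments, A and B.
sample = [
    ("A", 0, 1), ("A", 0, 0), ("A", 0, 0), ("A", 0, 0),
    ("B", 0, 1), ("B", 0, 1), ("B", 0, 0), ("B", 0, 0), ("B", 1, 1),
]
MAX_GAP = 0.10  # assumed goal commitment, for illustration only

rates = false_positive_rates(sample)
gap = parity_gap(rates)
print(rates, "gap:", gap, "breach:", gap > MAX_GAP)
```

Writing the maximum gap into the goal statement is what turns this from a dashboard number into a commitment that can be met or missed.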

The second is a drift threshold. Models in production change behaviour over time as the underlying data changes. The goal should specify what level of drift would trigger retraining, escalation, or a service decision. "Performance is monitored monthly; if accuracy on group X falls below threshold T, the model is reviewed within Y days" is a goal commitment, not a monitoring detail.
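The commitment quoted above ("if accuracy on group X falls below threshold T, the model is reviewed within Y days") maps directly onto a small check. The sketch below is one possible shape under stated assumptions: the DriftCommitment record, the review_deadline function, and the example values for T and Y are hypothetical, and accuracy is only one of several quantities a team might monitor.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class DriftCommitment:
    """Hypothetical encoding of the drift clause in a goal statement."""
    group: str            # "group X" in the goal statement
    min_accuracy: float   # "threshold T"
    review_days: int      # "Y days"


def review_deadline(
    commitment: DriftCommitment,
    observed_accuracy: float,
    checked_on: date,
) -> Optional[date]:
    """Return the date by which a review is due, or None if no breach."""
    if observed_accuracy < commitment.min_accuracy:
        return checked_on + timedelta(days=commitment.review_days)
    return None


# Illustrative values only: T = 0.90 and Y = 14 are assumptions.
commitment = DriftCommitment(group="X", min_accuracy=0.90, review_days=14)
deadline = review_deadline(commitment, observed_accuracy=0.87,
                           checked_on=date(2030, 3, 1))
print("review due by:", deadline)
```

Returning a deadline rather than a simple true or false keeps the "within Y days" part of the commitment visible, which is the part most likely to be forgotten.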

The third is a human-override rate. For systems that include human-in-the-loop or human-on-the-loop oversight, the rate at which humans modify or reject the model's suggestions, rather than accept them, is an outcome metric in its own right. A system whose override rate is too high is producing low value. A system whose override rate is suspiciously low may signal automation bias: reviewers approving suggestions without real scrutiny. Both are detectable only if the metric is in the goal.
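Both failure directions described above can be checked against an expected band. The sketch below is illustrative: it assumes decisions are logged as the strings "accept", "modify", or "reject", it counts a modification as an override (a choice a team might make differently), and the 5% to 40% band is invented for the example.

```python
from collections import Counter
from typing import Iterable


def override_rate(decisions: Iterable[str]) -> float:
    """Share of reviewed suggestions that were modified or rejected."""
    counts = Counter(decisions)
    total = counts["accept"] + counts["modify"] + counts["reject"]
    if total == 0:
        raise ValueError("no reviewed suggestions to measure")
    return (counts["modify"] + counts["reject"]) / total


def classify(rate: float, low: float, high: float) -> str:
    """Compare an override rate with the band committed to in the goal."""
    if rate < low:
        return "below band: check for automation bias"
    if rate > high:
        return "above band: suggestions may be adding little value"
    return "within expected band"


# Made-up decision log: 97 accepted, 2 modified, 1 rejected.
log = ["accept"] * 97 + ["modify"] * 2 + ["reject"]
rate = override_rate(log)
print(f"{rate:.2%}", "->", classify(rate, low=0.05, high=0.40))
```

A rate below the band does not prove automation bias. It is a prompt to sample the accepted cases and check whether reviewers are actually reviewing.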

The fourth is a reversibility test. Some AI deployments are easy to roll back; others bake themselves into operations in ways that make rollback expensive or impossible. A goal that includes a commitment to reversibility — "the system can be removed and the prior workflow restored within Y days, with these specific implications" — is one that has thought about the failure case before celebrating the success case.

These four metrics are not optional decorations. They are the difference between a goal that holds up under scrutiny and one that becomes embarrassing when the scrutiny arrives.

[Figure: Four AI-specific metrics (fairness, drift, override rate, reversibility) and the failure mode each one prevents.]

Did you know?
Google's famous OKR system (Objectives and Key Results) has a surprising lineage. Peter Drucker wrote about "management by objectives" in 1954. Andy Grove adopted and reshaped the idea at Intel in the 1970s, pairing each objective with measurable key results. John Doerr learned the system as a young Intel engineer, and when he became a venture capitalist at Kleiner Perkins he taught it to a thirty-employee startup he had just invested in. The startup was Google. The framework that now runs goal-setting in much of Silicon Valley passed from a management theorist to an Intel executive to a venture capitalist over more than four decades before it became the household name it is today. Many of the teams using OKRs have not heard of Drucker.

Project Activity — Complete section 2.3: goal statement

Open the Module 3 Project workbook and complete section 2.3 Well-defined goal statement. This is where your recommendation becomes measurable and governable.

Write one SMART goal for the initiative. Include baseline, target, timeframe, accountable owner, and the organisational outcome it serves.
Add the four AI-specific commitments from the workbook: fairness measure, drift threshold, override rate, and reversibility.
Write the condition that would invalidate the goal. If no evidence could prove the goal wrong, it is not yet measurable enough.

Project Checklist

Section 2.3 contains a SMART goal with baseline, target, timeframe, and owner.
I have committed to a fairness measure appropriate to the people affected by the system.
I have set a drift threshold and explained what happens if it is breached.
I have defined override-rate expectations where human oversight applies.
I have explained how the system could be reversed or removed if it fails.
The goal is specific enough for a reviewer outside the project to verify later.

⏭️ Up next — Lesson 2: With goals committed, Lesson 2 turns to scope discipline — how to draw the line between what is in and out of an AI initiative using MoSCoW, SWOT, and the Iron Triangle, and how to translate scope into measurable deliverables.
