Lesson 1 — Data Protection and UK GDPR
Module 2, Unit 2 | Lesson 1 of 3
By the end of this lesson, you will be able to:
- Identify the six lawful bases for processing personal data under UK GDPR and explain which are most likely to apply in AI automation contexts
- Describe what counts as personal data in an AI context, including less obvious categories such as behavioural patterns and inferred characteristics
- Explain the rules around special category data and automated decision-making under Article 22, and recognise when they apply to an AI workflow
- Describe the principles of data minimisation, purpose limitation, and privacy by design, and explain what they mean for the way an AI system should be built
Why data protection is not optional
UK GDPR is not a bureaucratic formality that applies only to organisations large enough to have a data protection team. It is a legal framework that applies to anyone who processes personal data about UK residents — and "processing" means almost anything you do with that data: collecting it, storing it, routing it through a system, using it to generate an output, sharing it, or deleting it.
If your AI automation project touches data about people — which almost every AI project does — UK GDPR applies to your work. Understanding it at the level of a practitioner means knowing which questions to ask, which obligations exist in principle, and when to escalate to a qualified expert or your organisation's Data Protection Officer.
The governing legislation is the UK General Data Protection Regulation (UK GDPR) and the Data Protection Act 2018, which together form the UK's post-Brexit data protection framework. The primary regulator is the Information Commissioner's Office (ICO), which publishes guidance, investigates complaints, and can issue fines of up to £17.5 million or 4% of global annual turnover, whichever is higher, for serious violations.
Key references:
- UK GDPR: https://www.legislation.gov.uk/eur/2016/679/contents
- Data Protection Act 2018: https://www.legislation.gov.uk/ukpga/2018/12/contents
- ICO guidance hub: https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/
What counts as personal data?
Personal data is any information that relates to an identified or identifiable living individual. That sounds straightforward, but in an AI context the definition extends further than most practitioners initially assume.
The obvious examples — names, email addresses, phone numbers, national insurance numbers — are clearly personal data. Less obvious but equally covered: IP addresses, device identifiers, location data, and any combination of data points that could, together, identify a specific person even if no single point could on its own.
In an AI context specifically, the following are regularly treated as personal data and require the same care:
Behavioural patterns — if your system learns from how specific users interact with a platform (what they click, when they respond, how long they spend on particular content), those patterns are personal data even if no name is attached.
Voice and image data — recordings of calls, photos, or video feeds from which individuals could be identified constitute personal data. In some cases they are biometric data, which triggers additional protections.
Inferred characteristics — if an AI system infers something about a person — their likely income range, their political views, their health status — from patterns in their data, those inferences are personal data even if never directly observed.
Model training data — if personal data was used to train an AI model, the fact that it is now embedded in model weights rather than stored in a database does not automatically remove the data protection obligation. This is an evolving area with active ICO guidance.
Most data protection law was written before large-scale AI training existed. The question of whether a model 'holds' personal data — and therefore whether deletion requests can apply to it — is one of the most actively debated areas in AI regulation right now. The ICO is developing specific guidance on this, and several national regulators across Europe have already investigated foundation model providers on exactly this basis.
🔑 Key term: Personal data — any information relating to an identified or identifiable living person. Identifiability is the key test: if someone could be identified from the data either alone or in combination with other information you hold, it is personal data.
💬 Reflection
Think about the process you are planning to automate. What data does it currently use or produce? Go through each type and ask: does this relate to a living person? Could it contribute to identifying someone? This exercise will form the basis of Part A of your Legal Compliance Checklist in the unit activities.
Key references:
- ICO reference on what constitutes personal data: https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/personal-information-what-is-it/
The six lawful bases for processing
Under UK GDPR, you cannot process personal data unless you have a lawful basis for doing so. There are six. They are not ranked in order of importance, and only one needs to apply — but it must genuinely apply, and you must identify it in advance.
Consent — the individual has given clear, specific, informed, and unambiguous agreement to the processing. In professional AI contexts, consent is often harder to rely on than it sounds: it must be freely given (which is difficult when there is a power imbalance, as between employer and employee), and it must be as easy to withdraw as to give.
Contract — processing is necessary to perform a contract with the individual, or to take steps at their request before entering one. If you are automating a customer onboarding process, and the automation processes data the customer provided to set up their account, this basis may apply.
Legal obligation — processing is necessary to comply with a legal obligation on the organisation. Tax record-keeping or fraud prevention reporting are examples.
Vital interests — processing is necessary to protect someone's life. This is a narrow basis relevant in emergency or healthcare scenarios.
Public task — processing is necessary for a task carried out in the public interest or in the exercise of official authority. Relevant primarily to public sector organisations.
Legitimate interests — the organisation has a legitimate interest in the processing, the processing is necessary for that interest, and the impact on the individual's rights does not override it. This is the basis most commonly relied upon in commercial AI contexts — but it requires a documented Legitimate Interests Assessment (LIA) and cannot be used as a default simply because no other basis fits.
Legitimate interests is the most flexible basis, but that flexibility comes with conditions. Think of the LIA as a three-step test you have to pass in writing:
- Is the interest real and specific — not just 'we want to improve our service'?
- Is processing genuinely necessary to achieve it, or would a less intrusive approach work just as well?
- Would the people whose data you are processing reasonably expect this use, and does the benefit outweigh any privacy impact on them?
Fail any of those three, and the basis does not hold.
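The three tests above can be written down as a simple structured record. A minimal Python sketch follows; the field names and pass rule are illustrative, not an official ICO template:

```python
from dataclasses import dataclass

@dataclass
class LegitimateInterestsAssessment:
    """Illustrative record of the three LIA tests. Not an ICO template."""
    purpose: str                      # the specific interest being pursued
    purpose_is_specific: bool         # purpose test: real and specific, not vague
    processing_is_necessary: bool     # necessity test: no less intrusive route works
    balance_favours_processing: bool  # balancing test: within reasonable expectations,
                                      # benefit outweighs privacy impact

    def passes(self) -> bool:
        # Fail any one of the three tests and the basis does not hold.
        return (self.purpose_is_specific
                and self.processing_is_necessary
                and self.balance_favours_processing)

lia = LegitimateInterestsAssessment(
    purpose="flag suspected duplicate refund claims for human review",
    purpose_is_specific=True,
    processing_is_necessary=True,
    balance_favours_processing=True,
)
print(lia.passes())  # all three tests met -> True
```

The point is not the code itself but the discipline it encodes: each test is answered explicitly, in writing, before processing begins.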
For AI automation projects, the most commonly relevant lawful bases are legitimate interests (particularly for internal or operational processing) and contract (where processing is strictly necessary to deliver a service to a customer). In employment contexts, consent is rarely appropriate due to the imbalance of power, as highlighted in ICO guidance. Instead, organisations typically rely on legitimate interests or legal obligation for employee data processing.
Key references:
- ICO guidance on lawful bases: https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/lawful-basis/a-guide-to-lawful-basis/
- ICO guidance on employee monitoring and consent: https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/employment/monitoring-workers/
Special category data
Some categories of personal data are considered particularly sensitive and receive additional protections under UK GDPR. Processing special category data requires both a lawful basis and a separate condition from Article 9. The eight special categories are:
- Racial or ethnic origin
- Political opinions
- Religious or philosophical beliefs
- Trade union membership
- Genetic data
- Biometric data (where used for identification)
- Health data
- Data concerning sex life or sexual orientation
For AI practitioners, the most common risk is processing special category data without realising it. An AI system trained on call recordings may inadvertently process voice data that reveals health conditions or accents indicating ethnic origin. A sentiment analysis tool applied to employee communications may infer political views. A scheduling or performance monitoring system may reveal health-related absences.
The principle here is not to avoid all contact with special category data — it is to recognise it when it is present and ensure the additional legal conditions are met before processing it. If you are uncertain whether your system touches special category data, that uncertainty is itself significant: it should be investigated before deployment, not after.
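One practical safeguard is a crude first-pass screen over field names that flags anything which might indicate special category data for human investigation. A sketch follows; the watchlist is illustrative and deliberately over-broad, and a clean result proves nothing on its own:

```python
# First-pass screen: flag field names that *might* indicate special category
# data so a human can investigate. The hint list is illustrative only and is
# no substitute for a proper data audit or DPIA.
SPECIAL_CATEGORY_HINTS = {
    "health": "health data",
    "diagnosis": "health data",
    "absence_reason": "health data (possible)",
    "ethnicity": "racial or ethnic origin",
    "religion": "religious or philosophical beliefs",
    "union": "trade union membership",
    "biometric": "biometric data",
    "voice": "biometric data (possible)",
}

def screen_fields(field_names):
    """Return {field_name: suspected category} for any field that matches a hint."""
    flags = {}
    for field in field_names:
        for hint, category in SPECIAL_CATEGORY_HINTS.items():
            if hint in field.lower():
                flags[field] = category
    return flags

print(screen_fields(["customer_id", "Absence_Reason", "voice_recording_url"]))
```

A flag here means "investigate before deployment", not "special category data confirmed" — and the absence of a flag means only that this crude check found nothing.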
Key references:
- ICO reference on special category data: https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/lawful-basis/special-category-data/
Automated decision-making: Article 22
Article 22 of the UK GDPR gives individuals specific rights where decisions are made about them using automated systems without meaningful human involvement, and where those decisions have legal or similarly significant effects.
🔑 Key term: Automated decision-making (Article 22) — a decision made solely by automated means, without meaningful human involvement, that produces a legal effect or a similarly significant effect on an individual. Examples include automated credit scoring, CV screening systems, and automated redundancy selection.
The key questions for a practitioner are:
Is this decision solely automated? If a human genuinely reviews the AI's recommendation before acting on it, Article 22 may not apply. The review must be meaningful, though: a superficial or "rubber stamp" check is not sufficient and does not remove Article 22 obligations. This aligns closely with the principle of meaningful human oversight.
The phrase 'solely automated' is doing a lot of legal work. The ICO's position is that a human review must be genuinely meaningful — the reviewer must actually look at the AI's output, have the information to evaluate it, and have the authority to override it. A reviewer who approves 95% of outputs in under ten seconds is not providing meaningful oversight. The obligation depends on the reality of the review, not the existence of a review step in the process diagram.
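One way to test whether review is meaningful in practice is to monitor review behaviour itself. A sketch follows; the thresholds are assumptions for illustration, not ICO-defined limits, and metrics like these prompt scrutiny rather than settle the legal question:

```python
# Illustrative monitor for whether human review looks meaningful in practice.
# Thresholds are assumptions for this sketch: the legal test is the reality
# of the review, but fast, near-universal approval is a warning sign.
def review_looks_superficial(reviews, min_seconds=10, max_approval_rate=0.95):
    """reviews: list of (duration_seconds, approved) tuples for one reviewer."""
    if not reviews:
        return False
    durations = sorted(d for d, _ in reviews)
    approvals = [a for _, a in reviews]
    median = durations[len(durations) // 2]
    approval_rate = sum(approvals) / len(approvals)
    return median < min_seconds and approval_rate > max_approval_rate

reviews = [(4, True)] * 99 + [(6, False)]
print(review_looks_superficial(reviews))  # fast, 99% approval -> True
```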
Does it have legal or similarly significant effects? Legal effects include decisions about employment, credit, housing, or benefits. "Similarly significant" effects are those that substantially affect a person's circumstances — being rejected for a role, having an account restricted, or being routed to a lower level of service.
Where Article 22 applies, individuals have the right to obtain human review of the decision, to express their point of view, and to challenge the decision. Organisations must inform individuals that automated decision-making is taking place and provide meaningful information about the logic involved.
This has direct implications for AI automation in recruitment, performance management, customer credit decisions, and content moderation — all common automation use cases. If your project touches any of these areas, Article 22 should be explicitly addressed before deployment.
Key references:
- ICO guidance on automated decision-making: https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/individual-rights/individual-rights/right-related-to-automated-decision-making-including-profiling/
Data minimisation, purpose limitation, and retention
Three of UK GDPR's core principles are especially relevant to AI automation design.
Data minimisation means you should collect and process only the data that is adequate, relevant, and limited to what is necessary for the specified purpose. AI systems are often data-hungry by design — more training data can improve performance, and more input data can improve output quality. The data minimisation principle pushes back directly against this instinct. If a classification task can be performed reliably on the text of a customer email, the system should not also ingest the customer's account history, browsing behaviour, and previous complaint record unless that additional data is genuinely necessary for the stated purpose.
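The email-classification example can be made concrete as an allow-list filter: the system receives only the fields the task needs, and everything else is stripped before any model call. A sketch with illustrative field names:

```python
# Data minimisation as an allow-list: only the fields the classification task
# genuinely needs pass through; everything else is stripped before the call.
# Field names are illustrative assumptions.
NEEDED_FOR_CLASSIFICATION = {"email_subject", "email_body"}

def minimise(record, allowed=NEEDED_FOR_CLASSIFICATION):
    """Drop every field not on the allow-list for this purpose."""
    return {k: v for k, v in record.items() if k in allowed}

record = {
    "email_subject": "Refund request",
    "email_body": "My order arrived damaged...",
    "account_history": ["..."],      # not needed for this task
    "browsing_behaviour": ["..."],   # not needed for this task
}
print(sorted(minimise(record)))  # ['email_body', 'email_subject']
```

The design choice worth noting is the direction of the default: fields are excluded unless listed, so expanding the system's data intake requires a deliberate edit rather than happening by accident.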
Purpose limitation means data collected for one purpose cannot simply be reused for a different purpose without fresh justification. If customer data was collected to process transactions, it cannot automatically be used to train a sentiment analysis model without a separate legal basis for that new purpose.
Retention means you must not keep personal data for longer than necessary. Automated systems that retain all historical data indefinitely — because storage is cheap and historical data might be useful someday — create legal exposure. Retention periods must be defined and enforced.
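Retention is most reliable when it is enforced in code on a schedule, not just stated in a policy document. A sketch follows; the 90-day period and record shape are illustrative assumptions:

```python
from datetime import datetime, timedelta, timezone

# Retention enforced in code: records older than the defined period are
# purged on a schedule. The 90-day period is an illustrative assumption;
# real periods come from your documented retention policy.
RETENTION = timedelta(days=90)

def purge_expired(records, now=None):
    """Keep only records still within the retention period."""
    now = now or datetime.now(timezone.utc)
    return [r for r in records if now - r["created_at"] <= RETENTION]

now = datetime(2025, 6, 1, tzinfo=timezone.utc)
records = [
    {"id": 1, "created_at": datetime(2025, 5, 20, tzinfo=timezone.utc)},  # within 90 days
    {"id": 2, "created_at": datetime(2025, 1, 5, tzinfo=timezone.utc)},   # expired
]
print([r["id"] for r in purge_expired(records, now=now)])  # [1]
```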
These principles matter at the design stage, not after deployment. Building a system and then asking the data protection team to review it is the wrong order. The right approach — privacy by design — is the subject of the next section.
Key references:
- ICO guidance on data minimisation: https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/data-protection-principles/a-guide-to-the-data-protection-principles/the-principles/data-minimisation/
Privacy by design
Privacy by design is the principle that data protection should be built into a system from the start — not layered on afterwards. It is required by UK GDPR (Article 25) and is good engineering practice for reasons that go well beyond compliance.
In practice, for an AI practitioner at Level 4, privacy by design means:
Asking data protection questions at the design stage, not at review. Before specifying what data your system will ingest, ask what data it genuinely needs. Before building a logging function, ask what is being retained and for how long. Before connecting to a database, ask whether the system needs access to the full dataset or only a filtered subset.
Designing for the minimum. Default settings should collect and retain the minimum data necessary, not the maximum. Expanding data collection should require a positive decision and a documented justification.
Involving the right people early. If your organisation has a Data Protection Officer, they should be involved in the design process for any AI system that processes personal data — not consulted at the end for sign-off.
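The "minimum by default" idea can be expressed directly in configuration: defaults collect the least, and widening collection requires an explicit, documented decision. A sketch with illustrative settings:

```python
from dataclasses import dataclass

# Privacy-by-design defaults: the minimal configuration is the zero-argument
# one, and any expansion must carry a written justification. The setting
# names are illustrative assumptions.
@dataclass
class PipelineConfig:
    retain_raw_inputs: bool = False      # minimum by default
    log_full_payloads: bool = False      # minimum by default
    extra_fields: tuple = ()             # empty unless justified
    expansion_justification: str = ""    # must be filled in to expand

    def validate(self):
        expanded = (self.retain_raw_inputs or self.log_full_payloads
                    or bool(self.extra_fields))
        if expanded and not self.expansion_justification:
            raise ValueError(
                "Expanding data collection requires a documented justification")

PipelineConfig().validate()  # minimal defaults pass without comment
```

With this shape, `PipelineConfig(log_full_payloads=True).validate()` raises unless a justification is recorded, which mirrors the requirement that expansion be a positive, documented decision.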
Key references:
- ICO guidance on privacy by design: https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/accountability-and-governance/guide-to-accountability-and-governance/accountability-and-governance/data-protection-by-design-and-default/
Data subject rights
UK GDPR grants individuals a set of rights over their personal data. As a practitioner building automated systems, you need to know that these rights exist and that your system design must accommodate them. The rights most relevant to AI automation are:
Right of access — individuals can request a copy of the personal data held about them, including any data used to make automated decisions. An AI system that produces outputs based on personal data must be designed so that data can be identified and retrieved.
Right to explanation — where automated decision-making applies, individuals have the right to meaningful information about the logic involved. "The algorithm decided" is not a sufficient explanation. Practitioners must be able to describe, in plain language, what inputs the system considers and how they affect the output.
Right to object to automated processing — individuals can object to processing of their personal data for profiling or automated decision-making purposes. A system must have a pathway to handle such objections.
Right to erasure — individuals can request deletion of their personal data under certain circumstances. In an AI context, this extends to the question of whether data used in model training can be effectively "forgotten" — an area of active legal and technical debate.
The 'Right to be forgotten' in AI training is genuinely unsolved — technically and legally. Researchers have developed approaches called 'machine unlearning' that attempt to remove the influence of specific data points from a trained model, but these are not yet mature enough for consistent regulatory reliance. Meanwhile, the Italian and French data protection authorities have both issued enforcement actions against AI providers on erasure grounds. Search for 'machine unlearning GDPR' or 'right to erasure AI model training ICO' to explore where the debate currently sits.
Understanding that these rights exist is the first step. The second step, for a practitioner, is to ask whether the system you are designing makes it possible to honour them — and to flag it if the answer is no.
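In system design terms, these rights are only honourable if the system can find one person's data on demand. One design choice that makes that possible is keying every stored record by a stable subject identifier. A sketch with an illustrative store layout:

```python
# A subject access request is only answerable if every record can be traced
# back to the person it concerns. The store layout and field names here are
# illustrative assumptions.
store = {
    "decisions": [{"subject_id": "u42", "outcome": "approved"}],
    "emails":    [{"subject_id": "u42", "body": "Refund request..."},
                  {"subject_id": "u99", "body": "Password reset..."}],
}

def subject_access_export(subject_id):
    """Gather everything held about one person, table by table."""
    return {table: [r for r in rows if r.get("subject_id") == subject_id]
            for table, rows in store.items()}

export = subject_access_export("u42")
print({table: len(rows) for table, rows in export.items()})
```

If a system cannot support a lookup like this, because personal data is scattered in free-text logs or embedded in model weights with no index back to the individual, that is a design problem to flag before deployment.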
Key references:
- ICO guide to individual rights: https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/individual-rights/
If your work crosses UK and EU borders
UK GDPR and EU GDPR are, in substance, nearly identical. UK GDPR was created by retaining the EU regulation into UK law at the point of Brexit and making targeted amendments to account for the UK operating as a third country. For most practical compliance purposes — the lawful bases, the individual rights, the special category rules, the Article 22 obligations — the two frameworks require the same things.
The difference is jurisdictional, not substantive.
Which law applies to your AI system depends on whose data it processes and where your organisation operates:
- EU GDPR applies when you process personal data of people in the EU, regardless of where your organisation is based. A UK company building an AI system that processes data about EU customers is subject to EU GDPR for that processing.
- UK GDPR applies when you process personal data of people in the UK. An EU company processing UK customer data must comply with UK GDPR.
- If your AI system processes data about people in both jurisdictions, both frameworks apply simultaneously. This is not unusual — it simply means your legal and compliance review needs to cover both regulators.
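The jurisdiction rules above can be reduced to a simple decision helper. This is a simplification for illustration only: real determinations also involve establishment, targeting, and transfer questions that belong with your legal team.

```python
# Simplified sketch of the jurisdiction rules: which framework applies
# depends on whose data is processed. Location labels are illustrative.
def applicable_frameworks(data_subject_locations):
    """Return the set of frameworks triggered by the data subjects' locations."""
    frameworks = set()
    if "UK" in data_subject_locations:
        frameworks.add("UK GDPR")
    if "EU" in data_subject_locations:
        frameworks.add("EU GDPR")
    return frameworks

print(sorted(applicable_frameworks({"UK", "EU"})))  # ['EU GDPR', 'UK GDPR']
```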
Data transfers between the UK and EU are currently facilitated by the EU's adequacy decision for the UK, which allows personal data to flow from the EU to the UK without additional safeguards. This decision is subject to periodic review. Transfers in the other direction (UK to EU) are unrestricted, as the UK has designated the EU and EEA as adequate.
Different regulators are the most significant practical difference. In the UK, the regulator is the ICO. In the EU, the relevant supervisory authority depends on where your organisation has its main establishment — the "lead supervisory authority" under the one-stop-shop mechanism. For organisations without an EU establishment, the supervisory authority in each member state where affected individuals are located has jurisdiction.
If your organisation operates in EU markets, or your AI system processes data about EU residents, you should flag this during the design phase so that your legal team can confirm whether EU GDPR compliance obligations apply and whether any differences from the UK position are material for your specific use case.
Key references:
- ICO guidance on UK GDPR and EU GDPR: https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/
- European Data Protection Board (EDPB) — guidelines and opinions: https://www.edpb.europa.eu/our-work-tools/general-guidance/guidelines-recommendations-best-practices_en
- ICO guidance on international data transfers: https://ico.org.uk/for-organisations/uk-gdpr-guidance-and-resources/international-transfers/
📝 Activity 1 — The Legal Compliance Checklist
Estimated time: 60 minutes (complete in stages across the unit)
Work through the two sections below, applying each area to your own project process. Complete Sections 1 and 2 now, after this lesson. Section 3 follows Lesson 2. Section 4 follows Lesson 3.
For each question, answer from your own knowledge first. If you are confident, write your answer and move on. If you are uncertain — or want to check your thinking — use an approved GenAI tool to help, then note briefly what it added, confirmed, or got wrong.
The flag question at the end of each section is always yours to answer. That is where you name what needs escalating to a colleague, DPO, or specialist before your project proceeds.
Section 1 — Data Protection and UK GDPR
Question 1. Does your process involve personal data? If yes, what type and whose?
Question 2. What is your proposed lawful basis for processing? Which of the six bases applies, and why?
Question 3. Does your AI component make automated decisions with legal or similarly significant effects? If yes, what Article 22 obligations apply?
Question 4. What data minimisation and retention approach will your system take?
Flag. What areas in this section require further investigation or input from a DPO or legal colleague?
Section 2 — Special Category Data
Question 1. Does your process involve any of the eight special categories? If yes, which — and is that presence direct or indirect?
Question 2. What additional safeguards would be required if special category data is confirmed to be present?
Flag. If you are uncertain whether special category data is involved, note that uncertainty here. It should be investigated before deployment, not after.
💡 Complete Sections 3 and 4 after Lessons 2 and 3. Those sections cover employment and equality considerations and transparency and accountability — the content you need is ahead.
⏭️ Up next — Lesson 2: With the data protection framework established, Lesson 2 turns to employment law, equality, and the legal considerations that arise specifically when AI automation affects people's roles, opportunities, and working conditions.