Does AI Development Qualify for SR&ED? The 2026 Guide

Using AI isn't R&D. Advancing it is.

Every Canadian tech company is building with AI right now — and CRA reviewers know it. AI-related SR&ED claims are exploding in volume, which means the category gets read with sharper eyes than any other. The companies that get paid are the ones that understand a distinction the CRA cares about deeply: using AI is not R&D. Advancing it is.

We see two opposite mistakes, both expensive. Teams that wrapped a hosted LLM API in a slick product claim everything and get carved up in review. Meanwhile, teams doing genuinely hard applied-AI work — fine-tuning under data constraints, fighting hallucinations against a measurable quality bar, squeezing inference onto hardware it was never meant for — assume “we just used existing models” disqualifies them, and never file. This guide is about where the line actually falls in 2026.

TL;DR: Calling a model API with documented patterns is not SR&ED. Solving problems the documentation doesn't cover, with hypotheses, experiments, and measured results, usually is. The credit follows the engineering decisions you had to validate empirically, not the fact that AI was involved.

The three-part test, applied to AI

SR&ED eligibility always reduces to the same three criteria: a technological uncertainty a competent professional couldn't resolve from existing knowledge, a systematic investigation (hypothesis → experiment → measured result → iteration), and an advancement of technical knowledge, even if only internal to your team. We cover the general version in our SaaS eligibility guide; what makes AI different is how fast the “existing knowledge” baseline moves.

In 2023, getting a basic retrieval-augmented generation (RAG) pipeline to work at all involved real unknowns. In 2026, standing up stock RAG with a vector database and an embedding API is a documented, tutorial-grade exercise — the CRA's hypothetical “competent professional” can do it from public material. The uncertainty has migrated up the stack: into retrieval quality under your specific corpus, evaluation methodology, latency and cost budgets, and failure modes that only appear at your scale. Claims that describe 2023-era uncertainty in 2026 filings are exactly what triggers reviews.

AI work that typically qualifies

Fine-tuning and training beyond the recipe

Following a vendor's fine-tuning quickstart with default hyperparameters is adoption. But the moment your problem leaves the recipe — a domain where labeled data is scarce or noisy, catastrophic forgetting you had to diagnose and mitigate, a custom loss or curriculum because the standard objective plateaued below your quality bar — you're running experiments whose outcomes weren't knowable in advance. That's the pattern the program pays for.

Retrieval and context engineering past stock patterns

Hybrid retrieval strategies you benchmarked against each other because no published result covered your corpus; chunking and re-ranking experiments driven by a measured answer-quality target; context compression to fit a latency or token budget that stock pipelines blow through. The signal is empirical comparison: you built an evaluation, tried competing approaches, and kept score.
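To make "built an evaluation, tried competing approaches, and kept score" concrete, here is a minimal sketch of a retrieval comparison. Everything in it is an assumption for illustration: the eval set, the document IDs, and the two canned rankings standing in for real retrievers. Recall@k is used as one possible metric, not the only reasonable one.

```python
from typing import Callable, Dict, List, Set

# Hypothetical labeled eval set: query -> IDs of documents judged relevant.
EVAL_SET: Dict[str, Set[str]] = {
    "refund timeline": {"d1", "d4"},
    "proxy overhead": {"d2"},
}

def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents found in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def score_strategy(retrieve: Callable[[str], List[str]], k: int = 3) -> float:
    """Mean recall@k of one retrieval strategy over the fixed eval set."""
    scores = [recall_at_k(retrieve(q), rel, k) for q, rel in EVAL_SET.items()]
    return sum(scores) / len(scores)

# Canned rankings standing in for two real retrievers (illustrative only).
KEYWORD_RANKINGS = {
    "refund timeline": ["d1", "d3", "d5"],
    "proxy overhead": ["d5", "d3", "d1"],
}
HYBRID_RANKINGS = {
    "refund timeline": ["d4", "d1", "d3"],
    "proxy overhead": ["d2", "d5", "d1"],
}

results = {
    "keyword_only": score_strategy(lambda q: KEYWORD_RANKINGS[q]),
    "hybrid": score_strategy(lambda q: HYBRID_RANKINGS[q]),
}

# Print the scoreboard, worst strategy first.
for name, score in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: recall@3 = {score:.2f}")
```

The point for a claim is less the metric than the fixed eval set: every strategy is scored against the same queries, so the comparison is reproducible and dated.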

Reliability engineering for agents and tool use

Getting an agentic system from a demo to a dependable product is, right now, a frontier full of genuine unknowns: error recovery when a tool call fails mid-sequence, guardrails that hold under adversarial input, orchestration patterns whose failure modes you had to discover by instrumented testing. If you can show the iteration trail — what broke, what you hypothesized, what you changed, what the failure rate did — this work maps cleanly onto the SR&ED test.

Inference optimization under hard constraints

Quantization and distillation to hit a cost or latency target the published model card says nothing about; batching and caching strategies for spiky production traffic; getting acceptable quality on-premises or at the edge because your customers can't ship data to a hosted API. Constraint-driven optimization with measured trade-offs is one of the most defensible AI claim patterns we file.
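A rough sketch of what "constraint-driven optimization with measured trade-offs" can look like once the runs are recorded. The variant names, metrics, and budgets below are hypothetical numbers invented for illustration, not benchmark results.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Run:
    name: str
    quality: float          # score on your own eval set, higher is better
    p95_latency_ms: float   # measured 95th-percentile latency
    memory_gb: float        # peak memory on the target hardware

def pick_variant(
    runs: List[Run],
    quality_floor: float,
    latency_budget_ms: float,
    memory_budget_gb: float,
) -> Optional[Run]:
    """Return the highest-quality run that satisfies every constraint, or None."""
    feasible = [
        r for r in runs
        if r.quality >= quality_floor
        and r.p95_latency_ms <= latency_budget_ms
        and r.memory_gb <= memory_budget_gb
    ]
    return max(feasible, key=lambda r: r.quality, default=None)

# Hypothetical measurements for three quantization levels of one model.
runs = [
    Run("fp16", quality=0.91, p95_latency_ms=410, memory_gb=28),
    Run("int8", quality=0.89, p95_latency_ms=260, memory_gb=15),
    Run("int4", quality=0.82, p95_latency_ms=180, memory_gb=9),
]

chosen = pick_variant(runs, quality_floor=0.88, latency_budget_ms=300, memory_budget_gb=16)
print(chosen.name if chosen else "no variant meets the constraints")
```

A `None` result is worth recording too: a documented run where no variant met the floor is evidence that the outcome was not knowable in advance.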

Evaluation harnesses and hallucination mitigation

When off-the-shelf benchmarks don't measure what your product needs, building the eval is the experiment: defining metrics for factuality or grounding in your domain, building labeled test sets, and using them to drive iteration. Hallucination-reduction work (grounding strategies, verification layers, abstention tuning) qualifies when it's done against numbers rather than vibes.
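As a minimal sketch of "numbers rather than vibes", the snippet below scores answers with a deliberately crude grounding check: an answer is flagged if it contains a number that never appears in its source passage. The check, the `EvalCase` type, and the test cases are all assumptions for illustration; a real harness would use a metric suited to your domain.

```python
import re
from dataclasses import dataclass
from typing import List

@dataclass
class EvalCase:
    source_text: str  # passage the answer is supposed to be grounded in
    answer: str       # model output being scored

NUMBER = re.compile(r"\d+(?:\.\d+)?")

def unsupported_numbers(case: EvalCase) -> List[str]:
    """Numbers in the answer that do not appear anywhere in the source."""
    source_numbers = set(NUMBER.findall(case.source_text))
    return [n for n in NUMBER.findall(case.answer) if n not in source_numbers]

def hallucination_rate(cases: List[EvalCase]) -> float:
    """Share of cases with at least one unsupported number."""
    flagged = sum(1 for c in cases if unsupported_numbers(c))
    return flagged / len(cases)

# Hypothetical fixed test set; rerun it unchanged after every iteration.
cases = [
    EvalCase("The warranty lasts 24 months.", "Your warranty runs for 24 months."),
    EvalCase("The warranty lasts 24 months.", "Your warranty runs for 36 months."),
    EvalCase("Returns are accepted within 30 days.", "You have 30 days to return it."),
    EvalCase("Returns are accepted within 30 days.", "You have 14 days to return it."),
]

rate = hallucination_rate(cases)
print(f"hallucination rate: {rate:.0%}")
```

Logging this rate per iteration, against the same cases, is what turns mitigation work into a measured series.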

Data pipelines and synthetic data with quality uncertainty

Synthetic data generation where it was genuinely unclear whether the generated distribution would transfer; deduplication and contamination detection at scales where published methods break down; labeling strategies you had to invent and validate. Unglamorous, heavily claimable, chronically under-claimed.

AI work that gets denied

“We integrated GPT / Claude into our product”

Calling a hosted model through a documented API, with documented SDK patterns, is not experimental development — no matter how transformative the feature is for your users. Product impact and technological advancement are different axes. The CRA only pays on the second one.

Prompt tweaking without measurement

Iterating on prompts until the output “looked better” is trial and error, not systematic investigation. The same activity can qualify when it's run as an experiment (a fixed eval set, scored runs, documented hypotheses about why a structure should help), but casual prompt fiddling logged nowhere is the single most common soft spot reviewers poke in AI claims.

Standing up vendor infrastructure as documented

Deploying a vector database, wiring an orchestration framework, following the LangChain or LlamaIndex docs — this is competent engineering, and none of it clears the uncertainty bar. Pushing one of those tools past its documented behavior, and proving it with benchmarks, can.

“We built faster because we used Copilot”

How the code got written is irrelevant to eligibility. AI-assisted development doesn't make routine work experimental — and, equally important, it doesn't disqualify real experimental work. The CRA looks for evidence of human technical judgment: hypotheses formed, alternatives evaluated, results measured. The credit follows the engineering, not the keystrokes.

Activity-by-activity: where the line falls

| Activity | Qualifies? | Why |
| --- | --- | --- |
| Added an AI chat feature by calling a hosted LLM API with standard streaming patterns | No | Documented integration; no technological uncertainty |
| Fine-tuned an open-weights model for clinical terminology after stock recipes plateaued, iterating on data mix and evaluation | Yes | Outcome unknowable in advance; systematic, measured iteration |
| Set up a vector database and embeddings pipeline following vendor docs | No | Tutorial-grade in 2026; competent-professional baseline |
| Built a domain-specific eval harness and drove hallucination rate from 14% to 3% across scored iterations | Yes | Bespoke measurement methodology; empirical, documented advancement |
| Quantized a model to run within an on-prem GPU budget while holding a measured quality floor | Likely | Constraint-driven experimentation beyond published results |
| Rewrote marketing copy and internal workflows using ChatGPT | No | Tool usage; no engineering investigation at all |

The pattern: Every “Yes” row contains a constraint, an experiment, and a number. Every “No” row describes following instructions. When you write your project narratives, that contrast is the entire game.

The hybrid trap: products that mix both

Real AI products are a sandwich: a layer of genuinely experimental work between thick slices of routine integration, UI, and plumbing. The claims that survive review are the ones that allocate honestly, claiming the engineers and the months spent on the experimental core, not the whole team for the whole year because “the product is AI.”

Practically, that means time allocation by activity, not by job title. An ML engineer's month spent wiring SSO doesn't qualify; a full-stack developer's sprint building the instrumented eval pipeline does. Reviewers respond well to claims that visibly drew this line themselves — it signals the rest of the file can be trusted.
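Allocation by activity rather than job title can be sketched in a few lines. The timesheet rows, role names, and hours below are hypothetical, and the qualifying flag is assumed to have been assigned per activity by whoever reviewed the work.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical timesheet rows: (person, activity, hours, qualifies_for_sred)
Entry = Tuple[str, str, float, bool]

entries: List[Entry] = [
    ("ml_engineer", "wire SSO", 120, False),
    ("ml_engineer", "fine-tuning experiments", 40, True),
    ("fullstack_dev", "instrumented eval pipeline", 80, True),
    ("fullstack_dev", "billing UI", 80, False),
]

def qualifying_share(rows: List[Entry]) -> Dict[str, float]:
    """Per-person share of hours spent on qualifying activities."""
    total: Dict[str, float] = defaultdict(float)
    eligible: Dict[str, float] = defaultdict(float)
    for person, _activity, hours, qualifies in rows:
        total[person] += hours
        if qualifies:
            eligible[person] += hours
    return {person: eligible[person] / total[person] for person in total}

shares = qualifying_share(entries)
for person, share in shares.items():
    print(f"{person}: {share:.0%} of hours on qualifying work")
```

In this made-up example the full-stack developer ends up with the larger qualifying share, which is exactly the outcome a title-based allocation would miss.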

Documentation: you probably already have it

Here's the good news unique to AI work: the tooling you already run generates exactly the contemporaneous evidence the CRA wants to see.

  • Experiment trackers. Weights & Biases, MLflow, or even a disciplined spreadsheet of runs: hyperparameters, data versions, metrics per iteration. This is a lab notebook in all but name.
  • Eval dashboards and scored runs. Benchmark results over time prove both the systematic method and the advancement. Keep the failed runs: abandoned approaches are some of the strongest evidence of genuine uncertainty.
  • Model cards, dataset versions, and design docs. Anything that records why an approach was chosen and what was expected before the result came in.
  • Git history of prompts, evals, and pipelines. Iteration trails with commit messages that name the problem being attacked are quietly excellent SR&ED evidence.

The one habit worth adding: a short written hypothesis before each experiment cycle. One paragraph — what you expect, why, and what number will tell you it worked. It takes five minutes and it converts a pile of runs into a narrative a reviewer can follow.
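One way to capture that paragraph in a form that sits next to your runs is a small structured record. This is only a sketch: the field names, the experiment ID, and the numbers are hypothetical, and a plain text file works just as well.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date
from typing import Optional

@dataclass
class Hypothesis:
    experiment_id: str
    written_on: str                   # ISO date, recorded before the run
    expectation: str                  # what you expect to happen
    rationale: str                    # why you expect it
    metric: str                       # the number that decides it
    target: float
    lower_is_better: bool = True
    observed: Optional[float] = None  # filled in after the run

    def outcome(self) -> str:
        """Compare the observed metric with the target once it exists."""
        if self.observed is None:
            return "pending"
        if self.lower_is_better:
            met = self.observed <= self.target
        else:
            met = self.observed >= self.target
        return "supported" if met else "not supported"

# Hypothetical entry written before the experiment cycle starts.
h = Hypothesis(
    experiment_id="exp-017",
    written_on=date(2026, 1, 15).isoformat(),
    expectation="A verification layer will cut unsupported claims.",
    rationale="Most flagged answers cite facts absent from the retrieved context.",
    metric="hallucination_rate",
    target=0.05,
)
status_before = h.outcome()

h.observed = 0.08  # hypothetical measured result
status_after = h.outcome()

# One JSON line per hypothesis keeps the log easy to diff and commit.
line = json.dumps(asdict(h))
print(status_before, "->", status_after)
```

A "not supported" entry is not a failure of the record; it is the kind of abandoned-approach evidence described above.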

What it's worth (BC CCPC, 2026)

  • Federal: 35% refundable on the first $3M of qualified expenditures for Canadian-controlled private corporations, paid as cash even pre-revenue.
  • BC: 10% provincial credit on top, refundable for CCPCs.
  • Combined headline: up to 64% of eligible R&D salary cost recovered, depending on labour mix and the proxy overhead method.

For an AI team of four engineers spending half their time on qualifying work, that's routinely a six-figure annual refund. Plug your own numbers into the SR&ED calculator for a real-time estimate.
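For a sense of how the rates above could combine into a number of that size, here is a back-of-envelope sketch. It rests on several assumptions that are not stated in this guide: a hypothetical $120,000 salary per engineer, a proxy overhead of 55% of eligible salary, and the BC credit being subtracted from the federal base. It ignores caps, other assistance, and any labour-mix adjustments, so treat it as arithmetic illustration rather than an estimate for your file.

```python
from typing import Dict

def estimate_refund(
    eligible_salary: float,
    proxy_rate: float = 0.55,    # assumed proxy overhead on eligible salary
    bc_rate: float = 0.10,       # BC credit rate from the list above
    federal_rate: float = 0.35,  # federal CCPC rate from the list above
) -> Dict[str, float]:
    """Rough combined credit, assuming the BC credit reduces the federal base."""
    qualified = eligible_salary * (1 + proxy_rate)
    bc_credit = bc_rate * qualified
    federal_credit = federal_rate * (qualified - bc_credit)
    total = bc_credit + federal_credit
    return {
        "qualified_expenditures": qualified,
        "bc_credit": bc_credit,
        "federal_credit": federal_credit,
        "total": total,
        "share_of_salary": total / eligible_salary,
    }

# Hypothetical team: four engineers at $120,000, half their time qualifying.
eligible_salary = 4 * 120_000 * 0.5
estimate = estimate_refund(eligible_salary)

for key, value in estimate.items():
    if key == "share_of_salary":
        print(f"{key}: {value:.1%}")
    else:
        print(f"{key}: ${value:,.0f}")
```

Under these assumptions the combined credit lands at roughly 64% of eligible salary, which is where a headline figure like the one above can come from.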

Next steps

  1. Inventory last quarter against the table above. Tag each AI workstream: constraint-experiment-number, or following-instructions. Most teams find more of the first category than they expected.
  2. Turn your experiment tracker into the system of record. If runs, evals, and hypotheses live in one place, the filing basically writes itself at year-end.
  3. Get a second opinion before you file, in either direction. Over-claiming invites a review that can taint clean years; under-claiming is just donating your refund. See how we work on the SR&ED service page, or describe your stack via the contact page and we'll send back a one-page eligibility read within a business day.

The AI gold rush has flooded the CRA with claims that boil down to “we used the new thing.” That's exactly why well-framed claims stand out: a constraint the docs didn't cover, a hypothesis, an experiment, a number that moved. If your team is doing that work — and most serious AI teams are — the program was built for you.