AI in Legal Research and Writing Applications: Assessment Guide

The full text of the report and the accompanying worksheet are available for download.

In August 2023, the Canadian Association of Law Libraries (CALL) Executive Board approved a proposal for the creation of a new AI Standards Working Group (AI WG). The AI WG was created as a sub-committee of the Vendor Liaison Committee (VLC).

This Guide is licensed under Creative Commons CC BY-NC-SA 4.0.

The Committee members are:

  • Annette Demers, University of Windsor (proponent and co-Chair)
  • Sandy Hervieux, McGill University (co-Chair)
  • James Bachmann, University of British Columbia
  • Katarina Daniels, Davies Ward Phillips & Vineberg S.E.N.C.R.L., s.r.l.
  • Erica Friesen, Queen’s University
  • Sarah Gibbs, Parlee McLaws LLP
  • Bryony Livingston, Legislative Assembly of Ontario
  • Anita Susac-Bilyk, Goodmans LLP

This group is well positioned to provide guidance on the topic, as its members have decades of combined experience applying electronic legal research tools in legal practice.

The WG consulted widely in preparing its recommendations. The consultations included CALL members, legal professionals, vendor representatives, and related professionals.

The goal of this Guide is to shape informed consumers who will set baseline standards and expectations that all vendors of AI legal research and writing (LRW) solutions will aim to meet.

As many Law Societies in Canada 1 have stated, licensees who use artificial intelligence have ethical and professional responsibilities in doing so. Accordingly, this Guide was written to help users. Users include law librarians, judges, lawyers, administrators, paralegals, clerks, faculty, and students. The public and other users, such as legislators or policy makers, may also benefit from the information provided in this Guide.

Users need to know what to look for, and what to ask for when:

  • procuring
  • learning how to use
  • teaching and mentoring others on the use of
  • searching or requesting information from
  • employing the results of, and
  • managing the risks of

AI applications designed to assist with LRW.

For those users who are evaluating a particular system for procurement or other purposes, we have also provided a box at the end of each section that allows users to score a system on those criteria and to record other notes. A summary table of all scores, which allows users to apply weighted results, is provided at the end of the document. An Excel spreadsheet has also been developed as a complementary tool to this Guide.

The content of this Guide is provided for both AI users and vendors for information only. It is not intended as legal advice. Users rely on AI products at their own risk and should undertake due diligence to ascertain any liability in using them. The authors of this Guide make no warranty whatsoever with respect to the accuracy or reliability of the information contained in this Guide or any related content provided.

This Guide is intended to complement other guidelines and best practices that may be provided in court practice directions, rules of professional conduct, legislation, standards, benchmarks, and policies.

This Guide does not endeavour to address all possible ethical issues pertaining to AI systems, such as concerns relating to the environment, human rights, etc. Users are encouraged to make their own inquiries.

Definitions

Algorithm

“Mathematics and Computing. A procedure or set of rules used in calculation and problem-solving; a precisely defined set of mathematical or logical operations for the performance of a particular task.” 2

Artificial Intelligence

“Information technology that performs tasks that would ordinarily require biological brainpower to accomplish, such as making sense of spoken language, learning behaviours or solving problems.” 3

Artificial Intelligence System (AI System) 4

An Artificial Intelligence System is “any computing system using artificial intelligence algorithms.”5

Data

Structured and unstructured information that is collected, stored, and analyzed to support decision-making.

Labelled Data

“In machine learning, data labeling is the process of identifying raw data (images, text files, videos, etc.) and adding one or more meaningful and informative labels to provide context so that a machine learning model can learn from it. For example, labels might indicate whether a photo contains a bird or car, which words were uttered in an audio recording, or if an x-ray contains a tumor. Data labeling is required for a variety of use cases including computer vision, natural language processing, and speech recognition.” 6
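For illustration only, the short Python sketch below shows what a tiny labelled dataset might look like for a legal text-classification task. The texts and labels are invented for this example and are not drawn from any real product.

  # Hypothetical labelled dataset: each raw text is paired with a
  # human-assigned label that a machine learning model could learn from.
  labelled_examples = [
      ("The appeal is dismissed with costs.", "judicial_decision"),
      ("No person shall operate a vehicle without a licence.", "statute"),
      ("In my view, the trial judge erred in principle.", "commentary"),
  ]

  for text, label in labelled_examples:
      print(f"{label:18} <- {text}")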

Ethical Walls

Ethical walls, also known as “information barriers” or “firewalls”, are procedural safeguards used within organizations to prevent the exchange of confidential information between departments, teams, or individuals whose interactions could result in a conflict of interest or ethical breach. These barriers are designed to protect sensitive information, maintain client confidentiality, and uphold the integrity of professional practices by ensuring that only authorized personnel have access to certain information.

Foundation Models

"Foundation models are artificial intelligence (AI) models trained on vast, immense datasets and can fulfill a broad range of general tasks. They serve as the base or building blocks for crafting more specialized applications."7

Generative AI

Generative AI, also known as generative artificial intelligence, is a subset of AI that uses advanced models to create new content such as text, images, videos, and audio based on user inputs. These models, often deep learning algorithms, learn patterns and structures from large datasets and use this knowledge to generate original content.

Hallucination

A generative AI output “is considered hallucinated if it is either incorrect or misgrounded.” In other words, an AI system has hallucinated if it “makes a false statement or falsely asserts that a source supports a statement.”8

Large Language Model (LLM)

“A deep learning algorithm that uses massive amounts of parameters and training data to understand and predict text. This generative artificial intelligence-based model can perform a variety of natural language processing tasks outside of simple text generation, including revising and translating content. LLMs aim to produce the most probable outcome of words for a given prompt.”9
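As a deliberately simplified illustration of “the most probable outcome of words for a given prompt”, the Python sketch below picks the likeliest continuation from hand-written probabilities. A real LLM learns such probabilities from massive training data rather than from a fixed table, and the words and numbers here are invented.

  # Toy next-word prediction; probabilities are invented for illustration.
  next_word_probs = {
      ("the", "court"): {"held": 0.6, "found": 0.3, "adjourned": 0.1},
  }

  def most_probable_next(context):
      probs = next_word_probs.get(context, {})
      return max(probs, key=probs.get) if probs else None

  print(most_probable_next(("the", "court")))  # prints "held"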

Machine Learning (ML)

Machine learning (ML) is a branch of artificial intelligence (AI) focused on enabling computers and machines to imitate the way that humans learn, to perform tasks autonomously, and to improve their performance and accuracy through experience and exposure to more data.10

Semi-supervised Learning

“Machine learning that makes use of both labelled and unlabelled data during training.” 11

Supervised Learning

“Machine learning that makes only use of labelled data during training.”12

Unsupervised Learning

“Uses machine learning algorithms to analyze and cluster unlabeled data sets. These algorithms discover hidden patterns in data without the need for human intervention (hence, they are ‘unsupervised’).”13
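The contrast between the learning styles defined above can be sketched in a few lines of Python using scikit-learn (assumed to be installed); the data points are invented for illustration. Semi-supervised learning would combine the two approaches by mixing labelled and unlabelled examples.

  import numpy as np
  from sklearn.linear_model import LogisticRegression
  from sklearn.cluster import KMeans

  X = np.array([[1.0, 0.2], [0.9, 0.1], [0.1, 0.9], [0.2, 1.0]])  # features
  y = np.array([0, 0, 1, 1])                                      # labels

  # Supervised: the model is shown the labels y during training.
  classifier = LogisticRegression().fit(X, y)
  print(classifier.predict([[0.95, 0.15]]))    # predicts a label for new data

  # Unsupervised: the model sees only X and groups it into clusters itself.
  clusters = KMeans(n_clusters=2, n_init=10).fit_predict(X)
  print(clusters)                              # cluster assignments, no labels used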

Reinforcement Learning

“Reinforcement learning (RL) is a machine learning (ML) technique that trains software to make decisions to achieve the most optimal results. It mimics the trial-and-error learning process that humans use to achieve their goals. Software actions that work towards [the] goal are reinforced, while actions that detract from the goal are ignored.”14
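The trial-and-error idea can be illustrated with a very small Python sketch in which an agent learns, from invented rewards, which of two actions pays off more often. Real reinforcement learning systems are far more elaborate; this is only a toy example.

  import random

  true_reward = {"A": 0.8, "B": 0.3}   # hidden from the agent
  estimates = {"A": 0.0, "B": 0.0}
  counts = {"A": 0, "B": 0}

  for step in range(500):
      # Mostly choose the action that currently looks best, sometimes explore.
      if random.random() < 0.1:
          action = random.choice(["A", "B"])
      else:
          action = max(estimates, key=estimates.get)
      reward = 1 if random.random() < true_reward[action] else 0
      counts[action] += 1
      # Reinforce: nudge the estimate for this action toward the observed reward.
      estimates[action] += (reward - estimates[action]) / counts[action]

  print(estimates)   # the estimate for "A" should end up higher than for "B"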

Model

An AI model is a program that analyzes data to find patterns or make predictions.15

Natural Language Processing (NLP)

“Natural language processing (NLP) is a subfield of computer science and artificial intelligence (AI) that uses machine learning to enable computers to understand and communicate with human language.”16

Prompt

(See also “user input”).

Output

Content that is produced by an AI system in response to a prompt/input.

Retrieval-Augmented Generation (RAG)

RAG is an artificial intelligence technique that can improve the quality of generative AI output by allowing large language models (LLMs) to access specialized data sets (e.g. primary or secondary legal databases) outside of their training data sources.
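The sketch below is a minimal, hypothetical illustration of the RAG pattern: retrieve the most relevant documents, then hand them to the model along with the question. The two-case “database”, the keyword-overlap retriever, and the generate() stand-in are all invented for this example; a production system would use a real document index and an actual LLM.

  CASE_DATABASE = {
      "Smith v Jones": "The limitation period for contract claims is two years.",
      "R v Doe": "Evidence obtained without a warrant was excluded.",
  }

  def retrieve(question, k=1):
      # Rank documents by naive keyword overlap with the question.
      words = set(question.lower().split())
      ranked = sorted(
          CASE_DATABASE.items(),
          key=lambda item: len(words & set(item[1].lower().split())),
          reverse=True,
      )
      return ranked[:k]

  def generate(prompt):
      # Stand-in for a call to a large language model.
      return "[the model would answer from the prompt below]\n" + prompt

  question = "What is the limitation period for contract claims?"
  context = "\n".join(f"{name}: {text}" for name, text in retrieve(question))
  prompt = f"Answer using only these sources:\n{context}\n\nQuestion: {question}"
  print(generate(prompt))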

User Input

Instructions or data provided by a user to an AI system which the system uses to produce an output. 17 (See also “prompt”.)

Vendor

A natural or legal person, public authority, agency or other body that sells or provides access to an AI system.18

Marketing and Product Documentation

All marketing and product documentation should be provided in plain language.

Marketing

Statements made in marketing should be true, accurate and transparent. Marketing should reflect realistic benefits and risks presented by the product.

When drawing on survey results to market the product, vendors should make available the questions asked and the methodology used.

Documentation for Prospective Users

Vendors should make available to all prospective users documentation that sets out, in a transparent manner, all aspects of the AI system, as detailed in this Guide.

Documentation Designed to Assist Subscribed Users

Vendors should make available to all subscribed users documentation that sets out, in a transparent manner, all aspects of the AI system, as detailed in this Guide.

Additionally, subscribed users should have access to tips for how to use the product.

All documentation and tips should be accessible from within the product (e.g. a “Help” page).

The documentation should outline the types of support available to a user, including, for example, access to live support from a human representative.


Marketing and Documentation Score: 0 - 5 (0 - product doesn't meet the requirement at all. 5 - best possible score.)

Notes:




Data Transparency

Users should recognize that most, if not all, AI applications for LRW are built on foundation models trained on the open internet. LRW applications should be built upon a transparent collection of legal information sources. Vendors should clearly provide information about the following:

1. What data sources are used by the AI system? Examples include:

  • a. The vendor’s proprietary data
  • b. Data owned by the user’s organization
  • c. External resources (specify)
  • d. The open internet
  • e. A combination of any of the above

2. Is the scope of each dataset provided? The scope should minimally include, where relevant:

  • a. Jurisdiction
  • b. Level of court (name of court, tribunal, or body)
  • c. Type of material (judicial decisions, statutes, regulations, rules of court, forms, by-laws, Hansard debates, legislative reports, public commentary on proposed legislation, journal articles, etc.)
  • d. Date of the earliest content in each dataset
  • e. Currency date (how up to date is each dataset)
  • f. Currency statement (how often is each dataset updated)
  • g. Gaps in content coverage

3. What data is NOT included when producing output? For example, is the AI using or not using certain subscribed content, and is unsubscribed content excluded?

4. Do different services offered within the product also specify the scope of data (as described above) upon which each service is trained and from which outputs are constructed?

5. Have data quality standards been established and documented?

Bias

What, if anything, is being done to mitigate bias originating from data sources? See also the section on Bias below.


Data Transparency Score: 0 - 5 (0 - product doesn't meet the requirement at all. 5 - best possible score.)

Notes:




Data Used in Training (Training Data)

The data on which an AI system has been trained affects the accuracy, validity, and relevance of AI outputs. Vendors should disclose the data from which tools draw, and the quality control mechanisms deployed to assess outputs and prevent errors. If this information is not known for foundation models, the vendor should clearly disclose this. Information on these points should include:

1. On what data is the AI being trained? For example, is the training data limited to content from legal databases (if so, which ones?) or is it drawn from external resources or the open internet? Can the user find out all the sources used in training? What gaps have been identified in the training data that might contribute to hallucinations or anomalous outputs?

2. Has the user been alerted to the possibility of incomplete, irrelevant, or hallucinated outputs resulting from gaps in the specialized training data?

3. Are user inputs used as data to train the AI system? If so, is the data anonymized or otherwise stripped of identifying data? Does the vendor provide the option to opt out?

4. Are the AI system’s own outputs used to train itself?

5. Was data labelling done by humans, by another AI system, or a combination thereof? Describe any processes, standards, or policies relevant to the labelling process.

Bias

What, if anything, is being done to mitigate bias originating from the training data and the training processes? See also the section on Bias below.


Data Used in Training Score: 0 - 5 (0 - product doesn't meet the requirement at all. 5 - best possible score.)

Notes:




Algorithm Transparency

Vendors should provide a detailed overview of the AI system, including the components outlined below. If this information is not known for foundation models, the vendor should clearly disclose this.

Algorithms

What methods and algorithms are used to train and operate the AI system (e.g., natural language processing, retrieval augmented generation, type of learning (e.g., semi-supervised learning), transformer-based models, etc.)? Examples of related questions to ask:

1. How do the algorithms work?

a. How do the algorithms prioritize the content of datasets in producing outputs? For example:

i. Are court decisions prioritized based on precedential value, judicial treatment, jurisdiction and/or level, recency, citation frequency, etc?

ii. How does the AI system prioritize primary sources versus secondary sources, when secondary sources are included in the data?

b. How do the algorithms determine relevancy of sources in producing outputs? For example:

i. Can, and if so, how do they parse out the facts to determine whether and to what extent a specific case applies to a given legal situation?

ii. How do they interpret sources such as judicial decisions, legislation, and specific wording therein?

iii. How does proprietary editorialized content/classification/tagging affect relevancy?

iv. How are secondary sources prioritized, including those involving conflicting viewpoints by different authors and conflicting study outcomes?

2. How are the algorithms trained? For example:

a. Supervised, unsupervised, semi-supervised, or reinforcement learning, or a combination thereof?

b. What is the training process? What are the standards of review, how in-depth is the review, etc.?

c. To what extent and in what ways are humans involved? For example:

i. Are they lawyers? Are they subject matter experts? Are they data analysts? etc.

ii. What methods are used to minimize errors?

3. What third-party product(s)/AI is the AI system based on (e.g. GPT-5, DeepSeek) and to what extent is the third-party product trained on specialized/subject specific-data?

a. When and to what extent is the algorithm relying on third-party training versus local/specialized training?

b. Is the AI system based on more than one third-party product? How many and which ones?

c. Does it rely on one third-party product more than another? To what extent and in which situations?


Algorithm Transparency Score: 0 - 5 (0 - product doesn't meet the requirement at all. 5 - best possible score.)

Notes:



Output Factors

Vendors should be transparent about how the AI system’s outputs are tested, such as:

1. What constitutes adequate versus inadequate output?

a. What quality control measures are in place to ensure output is adequate?

b. How is adequacy of the output determined and who is responsible for the determination?

c. Does quality control occur continuously, and how often are outputs tested?

d. How often are the quality control measures reviewed and updated?

e. What happens when an output is deemed inadequate?

f. When and how are algorithms/AI systems updated if outputs are found to be inadequate, and what steps are taken between the discovery of insufficient output and the completion of sufficient algorithm/AI system updates? For example:

i. How often are quality control checks performed, who performs them, and what level of expertise do they have?

ii. What level or type of problem triggers updates or revisions to the algorithms/AI system?

iii. What level or type of problem triggers a temporary suspension of the AI system, and when and how are users notified of current or past problems (including those reported by other users)?

g. What should a user do when an inadequate output is identified? (e.g., does the vendor solicit feedback from users?)

2. Are the testing instrument(s) and test results available? When did the testing take place? How representative of the current system configuration are the test results?


Output Factors Score: 0 - 5 (0 - product doesn't meet the requirement at all. 5 - best possible score.)

Notes:




Hallucinations

For generative AI systems:

1. How are hallucinations defined by the vendor? Are there types of hallucinations that are not acknowledged by the vendor? See for example our definition of hallucination above.

2. Does the vendor specify that hallucinations are possible? If so, which type(s)?

3. What percentage of output is hallucinatory? Does (and if so, how does) this percentage vary depending on the type of prompt/task that the AI is asked to perform? Does (and if so, how does) this percentage vary depending on the type of hallucination in question?

4. What controls are put in place to prevent hallucinations?

5. What should a user do when a hallucination is identified? (e.g., does the vendor solicit feedback from users on instances of hallucinations?).

Bias

1. What steps have been taken to create algorithms that reduce and eliminate bias and discrimination?

2. To what extent and how are humans involved in reducing and eliminating bias in algorithms, data, and outputs?

See also: the section on Bias below.


Hallucinations Score: 0 - 5 (0 - product doesn't meet the requirement at all. 5 - best possible score.)

Notes:




User Inputs / Prompt and Prompt Engineering

1. Are there instructions embedded in the product to advise users about how to interact with the AI system effectively (e.g. best practices for crafting prompts)?

2. Are training materials and documentation available in a variety of formats (e.g. videos, interactive tutorials, checklists, guides)?

3. Is jurisdiction clearly addressed in the interaction (e.g. through a specific prompt or pre- or post-filtering options)?

4. Does the AI system allow users to create reusable or shareable personas within their accounts?

5. Does the AI system allow users to create prompt libraries or otherwise save or reuse user inputs? (A brief illustration follows this list.)

6. Does the AI system provide information about further cases or related questions to explore?

7. Does the AI system learn from user inputs (e.g. from user prompts, from user feedback to outputs, etc.)? If so, how?
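As a purely hypothetical illustration of questions 1, 3, and 5 above, the Python sketch below shows one way a reusable prompt library with a jurisdiction field might look. The template names and wording are invented and do not describe any particular product.

  # Hypothetical saved prompt templates; each has fields the user fills in.
  PROMPT_LIBRARY = {
      "case_summary": (
          "You are assisting with legal research in {jurisdiction}. "
          "Summarize the key holdings of {case_name}, citing paragraph numbers."
      ),
      "limitation_check": (
          "For {jurisdiction}, state the limitation period that applies to "
          "{claim_type} claims, with statutory references."
      ),
  }

  def build_prompt(name, **fields):
      # Fill a saved template with the user's details to produce a reusable prompt.
      return PROMPT_LIBRARY[name].format(**fields)

  print(build_prompt("limitation_check",
                     jurisdiction="Ontario", claim_type="contract"))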


User Inputs / Prompt and Prompt Engineering Score: 0 - 5 (0 - product doesn't meet the requirement at all. 5 - best possible score.)

Notes:




Outputs

Credibility of Outputs

1. How does the AI system respond if it cannot produce an output? Does the AI indicate clearly that it cannot provide an output, does it provide an error message, or will it always produce an output?

2. Is a confidence indicator provided? If so, how is this calculated?

Attributing Generative AI Outputs19

Does the vendor provide sufficient information to cite AI generated outputs considering the requirements of law societies, court practice directions, etc. to provide attribution of AI generated outputs?

Referencing Sources

1. Are references consistently provided?20 If not, is there a reminder or indicator provided to encourage users to verify the output?

2. Are sources objectively recognizable and verifiable? See also: Algorithm Transparency – Bias


Outputs Score: 0 - 5 (0 - product doesn't meet the requirement at all. 5 - best possible score.)

Notes:




Other Risk Management

1. Do the training materials, terms of reference, system reminders or warnings include:

a. A notice that outputs must be verified by the user, because:

i. Outputs are not comprehensive or authoritative

ii. Outputs are dependent in part on the input provided by the user

iii. Outputs may differ with or without algorithm changes, data updates, or additional system learning.

b. A statement of professional responsibilities, including the vendor’s level of liability or role therein?

c. Whether users will be charged per search; if so, can organizations request that a paywall be displayed to prevent extraneous charges?

2. Does the vendor provide documentation demonstrating compliance with the laws and regulations that apply to this AI system?


Other Risk Management Score: 0 - 5 (0 - product doesn't meet the requirement at all. 5 - best possible score.)

Notes:




Privacy

1. Does the vendor explain how user data will be used and protected, including:

a. What types of user data are collected (e.g. prompts/user inputs, IP addresses, personal information such as name, organization, or email address, and microphone or camera input)? If the system can receive input from a microphone or camera, can these be turned off?

b. What will user data be used for? (e.g. for sale to third parties, for internal metrics).

c. Who "owns" the user data once collected?

d. Does the AI system learn from user inputs? If so, is the data anonymized or otherwise stripped of identifying data? Does the vendor provide the option to opt out?

e. Who can access user inputs (e.g. vendor IT staff or third-party LLM developers)?

i. What restrictions exist on access to user data?

f. What is the data retention policy for user inputs and outputs?

g. What is the data retention policy for uploaded content?

h. Can the user request their data be removed, or is manual deletion available?

i. In what jurisdiction do the servers reside?

j. Through what jurisdictions does user data travel?

k. How is data received as feedback from users protected?

l. How does the vendor define data breach?

m. How does the vendor monitor for data breaches?

n. Has the risk of a data breach been assessed, and if so, how is the risk measured?21

o. How will users be notified of data breaches?

2. For AI systems that integrate with other applications or software, including but not limited to:

a. workplace productivity suites such as Microsoft Office or Google Workspace,

b. document management systems, or

c. billing and finance tools,

does the vendor explain whether and/or how ethical walls or other conflict or security mechanisms will carry over to ensure that users are only able to see content to which they should have access? How often will the ethical walls or other conflict or security mechanisms synchronise across the systems?

If ethical walls or other conflict or security mechanisms cannot be carried over, does the tool allow for custom security measures? How are these managed?


Privacy Score: 0 - 5 (0 - product doesn't meet the requirement at all. 5 - best possible score.)

Notes:




Copyright

1. Is there a policy that governs the use of copyrighted data used by the AI system?

2. Is there a policy that governs user uploading of copyrighted content to the AI system?

3. What is the vendor doing to protect users from inadvertently breaching copyright when using content produced by the AI system?

4. Does the vendor confirm that copyrighted materials have been used with permission? If this information is not known for foundation models, the vendor should clearly disclose this.

5. Is citation or attribution made available to users along with the outputs?

6. Does the vendor allow authors and licensed content providers to opt out of having their content used by AI systems?


Copyright Score: 0 - 5 (0 - product doesn't meet the requirement at all. 5 - best possible score.)

Notes:




Bias

1. Does the vendor acknowledge the existence of bias in the AI system and the risk of perpetuating or compounding bias?

2. What is the vendor doing to highlight or mitigate bias in the AI system? If this information is not known for foundation models, the vendor should clearly disclose this. For example:

a. Does the vendor warn users that outputs may be based on biased training data or algorithms?

b. Is the vendor transparent about how sources are ranked or prioritized? Does the AI present alternative viewpoints in the authorities relied on and/or cited, particularly as it relates to secondary sources?

c. Does the vendor identify any steps taken or technological guardrails deployed to minimize algorithmic bias?

d. Does the vendor provide information on whether, and if so, how they include diverse groups in product development and testing?

e. When and how are algorithms/systems updated if outputs are found to be problematic, and what steps are taken between the discovery of problematic output and the completion of sufficient algorithm/system updates? For example:

i. How often are quality control checks performed, who performs them, and what level of expertise do they have?

ii. What level or type of problem triggers updates or revisions to the algorithms/system?

iii. What level or type of problem triggers a temporary suspension of the AI system, and when and how are users notified of current or past problems (including those reported by other users)?

3. Does the vendor ensure that the AI system complies with all legislative and regulatory requirements and conforms to relevant international standards?

4. Does the AI system block some prompts? If so, under what circumstances?

Users should be particularly wary of bias in tools that purport to discern subjective characteristics of a document, for example, truthfulness, tone and sarcasm.

See also:

  • Algorithm Transparency – Bias
  • Data Transparency – Bias


Bias Score: 0 - 5 (0 - product doesn't meet the requirement at all. 5 - best possible score.)

Notes:




AI Generated Legal Commentary

1. Does the vendor have a public-facing policy, for both authors and for commentary generated in-house, on the use of AI in generating, updating or editing commentary products?

2. Does the publisher state clearly how AI is used in generating, updating or editing commentary products?

3. In such situations, how is AI usage attributed? See for example: COAL-RJAL Editorial Group, Canadian Open Access Legal Citation Guide, Canadian Legal Information Institute, 2024 CanLIIDocs 830, https://canlii.ca/t/7nc6q, section 8.


AI Generated Legal Commentary Score: 0 - 5 (0 - product doesn't meet the requirement at all. 5 - best possible score.)

Notes:




Scores Summary Table

Criteria Score Weight
Marketing and Product Documentation
Data Transparency
Data Used in Training (Training Data)
Algorithm Transparency
User Inputs / Prompt and Prompt Engineering
Outputs
Other Risk Management
Privacy
Copyright
Bias
AI Generated Legal Commentary
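For readers who prefer to script the calculation rather than use the accompanying Excel spreadsheet, the Python sketch below shows one possible way to tally weighted results. The scores and weights shown are invented placeholders; users would substitute their own values for all of the criteria above.

  # Each criterion maps to (score out of 5, weight chosen by the user).
  scores = {
      "Marketing and Product Documentation": (4, 1.0),
      "Data Transparency":                   (3, 2.0),
      "Privacy":                             (5, 2.0),
  }

  weighted_total = sum(score * weight for score, weight in scores.values())
  max_possible = sum(5 * weight for _, weight in scores.values())
  print(f"Weighted score: {weighted_total} / {max_possible}")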


1 For example, the Law Societies of Alberta, BC, Manitoba, Ontario, Newfoundland and Labrador, Northwest Territories, Nova Scotia, Prince Edward Island, Quebec and Saskatchewan have all issued guidance on the use of AI in legal practice, and the Government of Canada has issued its Voluntary Code of Conduct on the Responsible Development and Management of Advanced Generative AI Systems (Innovation, Science and Economic Development Canada, September 2023), online: https://ised-isde.canada.ca/site/ised/en/voluntary-code-conduct-responsible-development-and-management-advanced-generative-ai-systems [perma.cc/EAF7-QP8R].

2 Oxford English Dictionary, (last accessed September 4, 2025) online: https://www.oed.com/dictionary/algorithm_n?tab=meaning_and_use&tl=true. [perma.cc/8FR9-P94S].

3 Government of Canada Treasury Board Secretariat, "Directive on Automated Decision Making", (last modified 2023-04-25), online: https://www.tbs-sct.canada.ca/pol/doc-eng.aspx?id=32592#appA. [perma.cc/NCS8-NHYJ].

4 As this document considers a range of AI products that rely on varied technologies, we use 'AI System' unless a topic addresses a specific type of AI model.

5 Definition derived from: Montréal Declaration on Responsible AI, (Université de Montréal, 2018), online: https://declarationmontreal-iaresponsable.com/wp-content/uploads/2023/04/UdeM_Decl-IA-Resp_LA-Declaration-ENG_WEB_09-07-19.pdf. [perma.cc/WX8P-UD9E].

6 Amazon Web Services, “What is Data Labelling?”, (last accessed March 7, 2025), online: https://aws.amazon.com/what-is/data-labeling/#:~:text=In%20machine%20learning%2C%20data%20labeling,model%20can%20learn%20from%20it. [perma.cc/YDB9-W6DH].

7 Rina Diane Caballar, “What are foundation models?” (last accessed August 26, 2025), online: https://www.ibm.com/think/topics/foundation-models [perma.cc/8ET5-FUHF].

8 Varun Magesh et al., “Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools” (2025) J Empirical Legal Studies 1, online: https://doi.org/10.1111/jels.12413 [Stanford Study] at 8.

9 Britannica, “large language model” (2025), (last accessed February 28, 2025), online: https://www.britannica.com/topic/large-language-model. [perma.cc/R7G4-YYSV].

10 Adapted from IBM, “What is machine learning?”, (last accessed February 28, 2025), online: https://www.ibm.com/think/topics/machine-learning [perma.cc/J7FH-M3YQ] and UC Berkeley School of Information, “What is Machine Learning (ML)?” (June 26, 2020), online: https://ischoolonline.berkeley.edu/blog/what-is-machine-learning. [perma.cc/PST3-K73E].

11 ISO/IEC, Artificial Intelligence Concepts and Terminology, ISO/IEC 22989:2022(en) https://www.iso.org/obp/ui/#iso:std:iso-iec:22989:ed-1:v1:en. [perma.cc/3CRX-Y96Z].

12 Ibid.

13 IBM, “Supervised versus unsupervised learning: What’s the difference?”, (last accessed March 7, 2025), online: https://www.ibm.com/think/topics/supervised-vs-unsupervised-learning. [perma.cc/G2WL-39CR].

14 Amazon Web Services, “What is Reinforcement Learning?” (last accessed March 14, 2025), online: https://aws.amazon.com/what-is/reinforcement-learning [perma.cc/8N52-U3YR].

15 Adapted from IBM, “What is an AI Model?”, (last accessed March 7, 2025), online: https://www.ibm.com/think/topics/ai-model#:~:text=An%20AI%20model%20is%20a,they%27ve%20been%20programmed%20for [perma.cc/T6BK-39W9].

16 IBM, “What is NLP?”, (last accessed March 14, 2025), online: https://www.ibm.com/think/topics/natural-language-processing [perma.cc/5TBQ-W3LL].

17 Regulation (EU) 2024/1689 of the European Parliament and of the Council of 13 June 2024 laying down harmonised rules on artificial intelligence and amending Regulations (EC) No 300/2008, (EU) No 167/2013, (EU) No 168/2013, (EU) 2018/858, (EU) 2018/1139 and (EU) 2019/2144, PE/24/2024/REV/1, OJ L, 2024/1689, 12.7.2024, (in force 1 August 2024), online: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=OJ%3AL_202401689 [perma.cc/26Y7-HQ4L]. [EU Act].

18 Adapted from definition of “provider” in EU Act, ibid.

19 Some court practice directions and other guidelines are requiring clear attribution when AI is used to generate content. The recommended enquiries below are designed to assist with fulfilling these requirements. See for example, Practice Direction Re Use of Artificial Intelligence in Court Submissions, (Court of King's Bench of Manitoba, June 23, 2023), online: https://www.manitobacourts.mb.ca/site/assets/files/2045/practice_direction_-_use_of_artificial_intelligence_in_court_submissions.pdf [perma.cc/X4DY-595A] and Notice to the Parties and the Profession: The Use of Artificial Intelligence in Court Proceedings, (Federal Court of Canada, updated May 7, 2024), online: https://www.fct-cf.gc.ca/Content/assets/pdf/base/FC-Updated-AI-Notice-EN.pdf [perma.cc/F54B-N29G]. COAL-RJAL Editorial Group, Canadian Open Access Legal Citation Guide, Canadian Legal Information Institute, 2024 CanLIIDocs 830, online: https://canlii.ca/t/7nc6q, provides users with sample citations to use when citing AI generated outputs.

20 Some scholars refer to outputs that do not cite a source to support a statement as “ungrounded outputs”. See for example, Stanford Study supra note 8 at 6, 7 and 27.

21 See for example, Canada, Office of the Privacy Commissioner of Canada, “Assess if a privacy breach poses a real risk of significant harm to an individual” (last accessed August 11, 2025), online: https://www.priv.gc.ca/en/privacy-topics/business-privacy/breaches-and-safeguards/privacy-breaches-at-your-business/rrosh-tool/ [perma.cc/RX5N-Z6NG].

Last updated by A. Demers, September 5, 2025.

Please send comments or questions to office@callacbd.ca.