Bringing AI to government through the backdoor

Can decision-makers use Microsoft Word if it’s tightly integrated with AI?

Simon Wallace


March 22, 2023

AI comes to Microsoft Word

Earlier this week Microsoft announced 365 copilot, an integration across its suite of business applications with OpenAI’s GPT models. Promising a “whole new way to work,” one of copilot’s core features relates to writing. Sometime in “the months ahead,” Microsoft Word will use large language models to write first drafts, offer feedback, and suggest edits.

Prompt: “A computer, carefully weighing the evidence, decides whether a person is entitled to social assistance”

For governments, this technology raises a host of concerns (What are the privacy implications of streaming text to Microsoft’s servers? How secure are the connections? What happens if the AI models aren’t very good in particular domains?) but I want to address a discrete sub-question: would it be legal for IT departments to deploy copilot in Canadian courtrooms and tribunals?

I say probably no, at least for federal tribunals and departments charged with making administrative decisions. And my reason is simple: to do so would run afoul of the Directive on Automated Decision-Making. This said, and this is a matter I return to at the end, the Directive is not as clear as it could or should be on this issue. The Treasury Board of Canada Secretariat, which is responsible for the Directive, is in the process of updating the text. My hope is that the new text will explicitly say that it is not appropriate for government decision makers to use tools that can auto-generate and edit text, especially when the model used to produce that text is proprietary and unreviewable.

The well-documented (see, for an example, Ruha Benjamin) capacity of AI to reason inappropriately, to deploy racist ways of thinking, and to internalize and hide lines of reasoning that would be inappropriate for human decision makers to use all mean that extra care is required when it comes to the use of AI in decision-making contexts.

The Directive on Automated Decision-Making

The Directive, which came into force in 2019, regulates the use of systems in the federal government that use technology to perform “tasks that would ordinarily require biological brainpower to accomplish, such as making sense of spoken language, learning behaviours, or solving problems.”

When a government department seeks to use an automated decision-making tool, the Directive requires the organization to complete an algorithmic impact assessment. Depending on how the system is assessed, the agency may use the system, may use it after deploying risk mitigation measures, or may not use the system at all.

As of today, 11 federal projects have published their assessments. Most of those assessments (and automated decision making projects) are, interestingly, in the immigration domain.

Algorithmic Impact Assessment for Copilot

Would Microsoft’s new copilot feature be caught by the Directive? Yes, assuming that a government user is making a decision and is using a Microsoft product with copilot assistance to draft, write, or make the decision.

I completed a theoretical algorithmic impact assessment for copilot to see how it would be treated by the process. I assumed that the stakes for the decision were relatively high but that a human would always review, and potentially modify, any copilot-generated decision or decision content. The assessment, even though the information I supplied was limited, determined that using copilot in a decision-making context had a ‘level 3’ impact, which is the second highest rating available.

Level 3 systems can be used in government, subject to certain limitations. The agency must notify the public about the use of the system, seek independent peer review of the system, ensure all users have training on the system, and get explicit approval from the organization’s deputy head to deploy the system.

My view is that these requirements are functionally impossible for any government department to comply with, at least based on what we know about copilot today. Unless Microsoft and OpenAI make their models available to independent auditors, it will be impossible for any government department to conduct the required peer review of the system.
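The reasoning above can be sketched as a simple checklist. This is only an illustration of the argument, not the Treasury Board's assessment tool: the requirement names are taken from the list in this post, while the `Deployment` class, its field names, and the `unmet_requirements` helper are hypothetical. The sketch assumes, as argued here, that independent peer review is only possible when the underlying model is open to outside auditors.

```python
from dataclasses import dataclass

# Level 3 requirements as summarized in this post (not the Directive's full text).
LEVEL_3_REQUIREMENTS = (
    "public_notice",
    "independent_peer_review",
    "user_training",
    "deputy_head_approval",
)


@dataclass
class Deployment:
    """Hypothetical description of a proposed tool; field names are illustrative."""
    name: str
    model_open_to_auditors: bool
    public_notice: bool = True
    user_training: bool = True
    deputy_head_approval: bool = True


def unmet_requirements(deployment: Deployment) -> list[str]:
    """Return the Level 3 requirements the deployment cannot satisfy."""
    satisfied = {
        "public_notice": deployment.public_notice,
        # Assumption from the post: peer review needs access to the model.
        "independent_peer_review": deployment.model_open_to_auditors,
        "user_training": deployment.user_training,
        "deputy_head_approval": deployment.deputy_head_approval,
    }
    return [req for req in LEVEL_3_REQUIREMENTS if not satisfied[req]]


# A proprietary model that auditors cannot inspect fails on peer review alone.
copilot = Deployment(name="copilot", model_open_to_auditors=False)
print(unmet_requirements(copilot))
```

On this framing a department could meet every other condition and still be blocked, because the one requirement it cannot satisfy depends on Microsoft and OpenAI rather than on the department itself.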

Reconsidering the Directive

As Treasury Board reconsiders the Directive, copilot raises two issues for consideration.

First, the Directive should be stricter. Originally drafted in 2019, the Directive reads like an out-of-date document. Its subject is not the generative AI technologies that have flourished in the past six months, but more traditional machine learning and prediction systems. To state it plainly, there is a qualitative difference between (usually rules-based, usually supervised, and usually deterministic) machine learning systems and programs that can generate bespoke text.

When a person is hired to be an adjudicator to make important decisions, and they are asked to use words and text to justify those decisions, the public expects that person to write the text. The Directive should say as much.

Second, the Directive’s definitions should be clearer. Right now, reasonable people could disagree about whether copilot is even captured by the Directive. Consider how the Directive defines its scope:

This Directive applies to any system, tool, or statistical models used to recommend or make an administrative decision about a client.

But do the terms “recommend” or “make” capture the relationship between a human and a machine, like copilot, that generates and edits text?

Likewise, the Directive’s concept of an “administrative decision” is too narrow because it simply imports the traditional definition used by the Courts for administrative law purposes (i.e. something that “affects legal rights, privileges or interests”).

There is good text already in the Directive that should be the basis of it going forward. Instead of tethering the Directive to the acts of making or recommending an administrative decision, the Directive should build out from its excellent definition of ‘automated decision making system,’ which:

Includes any technology that either assists or replaces the judgement of human decision-makers. These systems draw from fields like statistics, linguistics, and computer science, and use techniques such as rules-based systems, regression, predictive analytics, machine learning, deep learning, and neural nets.

This definition is considerably clearer and more useful for explaining that the Directive applies whenever an automated system engages in human decision making. Putting this definition explicitly at the centre of the Directive will make it more adaptable as the technology changes and new, unanticipated use cases for that technology emerge.

Summing up

I am sure that Microsoft is going to try to sell copilot licenses to the Government of Canada. The federal government ought to be intentional about how it engages with this technology and decide whether, in the context of adjudication, for example, it is even appropriate to use a tool like copilot.

One way of shaping those decisions in a principled way is to think about how the Directive does, does not, and should interact with new generative AI technology. Here I suggest, first, that the Directive does apply and that it will preclude Canada from deploying copilot (at least in traditional adjudicative spaces). This is because any use of copilot in an adjudicative space will trigger the requirement for peer review of the AI system, which will be impossible because the underlying copilot models are proprietary. Second, I suggest that the Directive should be redrafted to be more explicit about its scope, to make it clear that copilot-like technology is off-side in traditional adjudicative spaces.