The Ghost in the Machine: Admissibility of AI Evidence


[This Article has been authored by Rishi Nookala, a 3rd Year Student at NALSAR University of Law, Hyderabad.]

Introduction

We live in an era where our lives are increasingly governed by the invisible hand of algorithms. Over the last few years, Artificial Intelligence (AI) has significantly affected various industries, and the legal system is no exception. The rapid integration of AI into criminal justice systems worldwide, from predictive policing algorithms patrolling our streets to risk assessment tools influencing sentencing, has strained legal frameworks originally designed around human judgment and raised crucial questions about the reliability, fairness, and ethical implications of AI. At this juncture of the technological revolution, we need to ask a critical question: Is our legal framework equipped to handle a witness who cannot be cross-examined, does not swear an oath, and often cannot explain its own reasoning?

Defining the Ghost: What is AI Evidence?

AI is not a neutral technical tool; it creates an imbalance of power and poses risks to the right to a fair trial and to equality. Most current applications are Narrow AI, that is, systems designed to solve tightly constrained problems, such as speech recognition, rather than Strong AI or Artificial General Intelligence, which attempts to mechanise human-level intelligence across domains. The output of an AI system depends primarily on the data on which it was trained. A Luxembourg research team led by Professor Ligeti identified three categories: (1) AI-gathered evidence for time-intensive tasks (such as searching large datasets for incriminating emails), (2) AI-generated evidence, including forensic analyses and enhanced surveillance footage (where AI improves quality), and (3) AI-assisted investigative processes that generate leads, such as using facial recognition to identify a suspect.

Seng and Mason propose a similar categorisation. It distinguishes between evidence created with human input, which is primarily hearsay; data produced by a self-contained automated device, which is considered real or mechanical evidence; and a combination of the two, which requires both human input and automated data processing. To determine the admissibility of AI evidence, judges must understand AI algorithms, training data, underlying biases, and potential risks such as deepfakes.

Legislative Gap in BSA and the Black Box Problem

The foundational issue with AI evidence in India lies in our reliance on a presumption of reliability. According to Section 63(4) of the Bharatiya Sakshya Adhiniyam, 2023 (BSA), if a device was operating properly during the relevant period, its output is presumed to be accurate. Applying this presumption to AI systems, however, is dangerous. It fails to account for the fact that systems based on Machine Learning can produce subtle errors that are not apparent from the system's operation. Unlike a mechanical clock that either works or stops, an AI system can be fully operational and free of bugs in its code, yet produce a biased output because of the data on which it was trained.

For an Indian court, this poses a unique challenge. When a prosecutor produces an output from a predictive policing tool, they will attach a Section 63 BSA certificate. A judge may admit the evidence on the strength of that certificate, presuming the computer to be reliable. This is where the problem lies. A certificate proving that the hardware was functional tells us nothing about whether the algorithm's decision-making logic was flawed. We are effectively certifying the pen while ignoring the validity of the essay it wrote.

Validity v. Reliability

The core evidentiary challenge lies in establishing the validity and reliability of the AI system in question. These terms are often used interchangeably, but in the context of AI they have distinct meanings. Validity refers to accuracy: does the AI system measure, classify, or predict what it is designed to measure, classify, or predict? Reliability refers to consistency: does the AI system produce the same results when applied to substantially similar circumstances?
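To make the distinction concrete, here is a minimal Python sketch, not a forensic validation protocol. It assumes a hypothetical risk tool (`always_flag`) and four hypothetical cases whose true outcomes are known; every name and data point is illustrative. Validity is approximated as accuracy against verified outcomes, and reliability as agreement between two runs on the same inputs.

```python
def accuracy(predictions, ground_truth):
    """Validity proxy: share of predictions that match the verified outcome."""
    matches = sum(p == t for p, t in zip(predictions, ground_truth))
    return matches / len(ground_truth)


def agreement(run_a, run_b):
    """Reliability proxy: share of cases on which two runs give the same answer."""
    same = sum(a == b for a, b in zip(run_a, run_b))
    return same / len(run_a)


def always_flag(cases):
    """Hypothetical tool that labels every case 'high' risk, whatever the facts."""
    return ["high" for _ in cases]


# Hypothetical cases with known (verified) outcomes.
cases = ["case1", "case2", "case3", "case4"]
truth = ["high", "low", "low", "low"]

# Run the same tool twice on the same inputs.
run1 = always_flag(cases)
run2 = always_flag(cases)

print(agreement(run1, run2))  # 1.0  -> perfectly consistent (reliable)
print(accuracy(run1, truth))  # 0.25 -> mostly wrong (not valid)
```

The hypothetical tool never crashes and always gives the same answer, so it is perfectly reliable, yet it is wrong in three of four cases. A certificate that the device was "operating properly" would be true of it all the same.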

The Hearsay Trap: Who is Actually Testifying?

One of the most fascinating puzzles posed by AI is the question of hearsay. In India, we generally exclude hearsay statements unless an exception applies. We have traditionally treated computer output as direct evidence rather than hearsay, on the reasoning that a computer cannot lie. AI challenges this orthodoxy, because AI systems produce results from datasets that embed human biases and assumptions. Consider, for instance, a medical AI that diagnoses a condition based on notes from thousands of doctors. If this diagnosis is brought to court, whose statement is it? It is a synthesis of thousands of doctors' notes, none of which may have been tested through cross-examination.

If we treat AI output as purely mechanical, we deny the defence the right to challenge the human subjectivity that trained the machine. If an AI system's output relies on supervised learning using training data labelled by humans, then the output is linked to those human inputs. If the programmer or the dataset curator is not called to testify, the evidence could be considered hearsay. If an algorithm is a black box of hidden human subjectivity, then admitting its output without scrutiny would violate the fundamental principles of a fair trial.

Furthermore, because AI often inherits biases from flawed historical data, it can perpetuate societal prejudices. If the dataset used to train a risk assessment tool is derived from historically biased policing data, the AI system will learn that bias and replicate it. Even with access to the source code, explaining why a model produced a particular outcome remains difficult, which frustrates lawyers and judges alike.
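The mechanism can be illustrated with a deliberately simple Python sketch. The "model" here is just a per-area arrest rate, a stand-in for far more complex real systems, and the records are invented. The loud assumption is that areas "A" and "B" have the same underlying offending, but area "A" was patrolled more heavily, so more of its offences were recorded as arrests.

```python
from collections import Counter

# Hypothetical historical records: (area, was_arrested).
# Illustrative assumption: both areas offend at the same rate, but area "A"
# was patrolled more heavily, so twice as many arrests were recorded there.
history = (
    [("A", True)] * 20 + [("A", False)] * 80
    + [("B", True)] * 10 + [("B", False)] * 90
)


def train_risk_model(records):
    """Toy 'training': learn each area's risk score as its past arrest rate."""
    totals = Counter(area for area, _ in records)
    arrests = Counter(area for area, arrested in records if arrested)
    return {area: arrests[area] / totals[area] for area in totals}


model = train_risk_model(history)
print(model)  # {'A': 0.2, 'B': 0.1}
```

Nothing in this code is buggy, yet residents of area "A" receive twice the risk score of area "B" purely because of how the past data was collected. Any future patrols directed by these scores would generate more records in "A", reinforcing the skew.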

In India, we often rely on the testimony of the person who recorded a video to authenticate it. But what if the video is a perfect, AI-generated fabrication? Article 3(60) of the EU Artificial Intelligence Act defines deepfakes as AI-generated or manipulated image, audio, or video content that resembles existing persons or objects and falsely appears authentic. The author notes that in recent times deepfakes have become highly realistic and can create convincing false representations of individuals. Well-known examples include an image of Pope Francis wearing a Balenciaga coat and an audio clip of former U.S. President Joe Biden apparently urging people not to vote. Such manipulations are dangerous because they could mislead judges and result in wrongful convictions or the dismissal of legitimate claims. Fake content of this kind erodes trust: attorneys on both sides will struggle to prove or disprove authenticity, as deepfakes are becoming almost indistinguishable from genuine evidence. On the positive side, forensic experts can also use AI to detect deepfakes.

For Indian courts, this suggests that a video can no longer be accepted at face value. We cannot simply look at the image; we must examine the metadata and the chain of custody, and employ counter-AI forensics to detect manipulation.
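One routine chain-of-custody check is cryptographic hashing: a fingerprint of the file is recorded at seizure and recomputed when the file is produced in court. The Python sketch below uses the standard `hashlib` module; the byte strings and the helper name `fingerprint` are hypothetical stand-ins for a real seized file and a real forensic workflow.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of an evidence file's contents."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical scenario: the digest is recorded when the video is seized...
original_video = b"placeholder bytes standing in for the seized video file"
digest_at_seizure = fingerprint(original_video)

# ...and recomputed when the file is later produced in court.
produced_in_court = original_video
tampered_copy = original_video + b"\x00"  # even a single appended byte changes the digest

print(fingerprint(produced_in_court) == digest_at_seizure)  # True
print(fingerprint(tampered_copy) == digest_at_seizure)      # False
```

A matching hash shows only that the file has not changed since the digest was recorded. It says nothing about whether the video was already a deepfake when it was seized, which is why hashing complements, rather than replaces, counter-AI forensic analysis.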

Comparative View of Foreign Jurisdictions

United States: Structured Gatekeeping

The US Federal Rules of Evidence govern the types of evidence that can be presented in court. Under Rule 401, evidence is relevant if it has any tendency to make a fact of consequence more or less probable than it would be without the evidence. Under Rule 403, courts may exclude relevant evidence if its probative value is substantially outweighed by the danger of unfair prejudice. Rule 702 permits a qualified expert to testify in the form of an opinion, provided, among other things, that the testimony rests on sufficient facts and reliable methods. The Frye standard is rigid: by limiting admissibility to techniques generally accepted in the relevant scientific community, it restricts emerging techniques. In contrast, the Daubert standard allows judges to act as gatekeepers in determining admissibility, based on factors such as testing, peer review, known error rates, and general acceptance.

In Washington v. Puloka, the King County Superior Court in Washington State excluded AI-enhanced video evidence, treating the enhancement as a novel technique under the Frye standard. The defence, which had enhanced low-quality footage using Topaz Video AI, failed to prove that the forensic video analysis community generally accepted the method. The court noted that Topaz Video AI's enhancement tools had not undergone peer review by that community. It held the AI-enhanced video unreliable because it did not accurately depict what occurred; instead, it used opaque methods to display what the AI believed should be shown. Although this ruling does not set a binding precedent, it clearly foreshadows the challenges that lie ahead for the admissibility of AI-generated evidence in U.S. courts.

European Union: National Discretion

The EU does not have a unified test of admissibility. Unlike the US, most member states follow the principle of freedom of evidence, which leaves admissibility largely to national discretion and allows courts to rule on the admissibility and authenticity of any evidence the parties present. Some commentators instead describe a dual approach: some countries use controlled systems that strictly screen courtroom evidence, while others use free proof systems, in which judges assess evidence freely but may still exclude illegally obtained evidence.

The current Dutch Code of Criminal Procedure allows the use of AI algorithms. In the Marengo trial, the defendants contested the use of the Hansken digital forensics platform on grounds of opacity and limited access to counter-expertise. The Supreme Court upheld its use, despite defence claims that the platform was unregulated and violated the right to a fair trial under Article 6 of the ECHR. The Court reasoned that Hansken merely analyses existing evidence and that the findings did not rest on it alone. While acknowledging the risks of relying on algorithmic analysis of bulk data, the Court stressed that the defence had failed to request targeted access to the platform's source data to prepare a counter-argument.

The European Court of Human Rights permits national courts to consider evidence, even if it was obtained unlawfully, provided the trial as a whole remains fair in accordance with Article 6 of the ECHR. There is no automatic exclusionary rule; the Court assesses the nature of the unlawfulness (for example, whether human rights were violated) and whether the accused could effectively challenge the evidence. The Court of Justice of the EU has similarly indicated that evidence should be excluded only where the accused cannot effectively respond to it. This lack of a clear, mandatory standard for excluding improperly obtained evidence leads the author to conclude that European legal systems are not ready to handle the complexity of AI evidence.

Hybrid Approach for the Indian Context

Section 39 of the BSA, which addresses expert opinions, has generally been interpreted to refer to scientific opinions. Yet we rarely demand error rates or peer-reviewed validation studies for forensic software. India could adopt a hybrid approach that combines elements of the US and EU systems. Like the US, India has an adversarial system built on the principles of relevance and reliability. However, the absence of Daubert-equivalent scientific validation standards creates uncertainty. In their absence, courts fall back on judicial discretion akin to the EU's flexibility, which risks inconsistency. To apply a hybrid approach, Indian courts may consider requiring a pre-admission hearing for AI evidence. This hearing would not check whether the device was working under Section 63 of the BSA; instead, it would scrutinise the AI's methodology. Key questions for this inquiry could include: Is the AI being used for a purpose for which it was not designed? Was the data sanitised to remove potential biases before it was used for training? Has the system been tested in real-world conditions or only in the laboratory, and what are its error rates?

Given the evolving role of AI as evidence in courts, the Indian legal system needs a two-pronged reform. First, it must create specific standards for evaluating AI, potentially adapting the Daubert standard to the Indian context. This includes requiring technical validation and demanding transparency, while balancing trade secrets against a defendant's rights.

Second, India needs procedural safeguards: mandatory disclosure when AI is used in investigations, expert access for the defence to challenge it, and comprehensive training for judges, all of which are essential components of a fair system.

Several questions remain open. Should humans always verify AI outputs? Is AI suitable as evidence, or only as a source of investigative leads? Can prosecutors rely on it selectively? Does withholding details from the defence, often on the ground of trade secrets, violate fair trial rights? Privacy concerns are also critical, as technologies like facial recognition can infringe privacy rights through data collection alone. Existing privacy laws may not be sufficient; criminal proceedings may require tailored protections.

AI evidence must meet admissibility standards if it is to be integrated safely and the integrity of the justice system is to be maintained. However, India's unique context, including vast socioeconomic diversity, historical policing biases, and limited technical training for judges, requires careful adaptation of international approaches.

The most practical immediate step is not to wait for new laws, but for lawyers and judges to adopt a new strategy. Lawyers and judges cannot evaluate AI from a state of ignorance. It is high time we understood what AI is, how it works, what it does accurately and reliably and what it does not, and the logic it adopts. This is a wake-up call. We cannot leave the technical details to the IT cell. We need to know enough about Machine Learning to ask the crucial questions: What data was the model trained on? What is its error rate?

As we enter a new era of digital evidence, we must recognise that AI is not merely a sophisticated calculator but a complex probabilistic witness. The black box problem of AI systems cannot be addressed through a check-box certificate under the BSA. Reliance on a Section 63 certificate, which proves only the hardware's functionality and not the validity of the algorithmic logic, is risky. We need to distinguish between the machine's reliability and the algorithm's validity. The path forward is not to exclude AI evidence but to move beyond procedural comfort with certification toward substantive inquiry into the AI's methodology, error rates, and training data. This demands a new kind of legal literacy, in which cross-examining the algorithm becomes as fundamental as cross-examining a human witness. Ultimately, if the machinery of justice is to rely on the machinery of code, we must ensure that this silent witness is held to the same rigorous standards of transparency that define our constitutional commitment to a fair trial.
