The trust theme in my last blog got me thinking about the pedestal Artificial Intelligence (AI) is being placed on.
When Alan Turing proposed his famous Imitation Game in 1950, he asked a deceptively simple question, paraphrased as – “If a machine’s responses are indistinguishable from a human’s, should we call it intelligent?”
For decades, what became known as ‘The Turing Test’ has shaped our imagination of AI. The twist is that today we have built machines that often pass it, and yet many of us remain instinctively unconvinced, particularly about what ‘intelligence’ really means.
I am not suggesting that Turing got it wrong. He was very much of his time and never claimed intelligence was just conversation. What he gave us was something rare in the abstract philosophy of mind: an operational benchmark. By focusing on observable behaviour, Turing sidestepped endless debates about consciousness and gave engineers a practical test. It was pragmatic, brilliant and exactly what was needed. That is the genius in Turing’s question.
Over time, though, the Turing Test has become less a starting point and more a destination, and that is the problem.
Today, passing the Turing Test is not enough. Large language models (LLMs) like GPT-5 can write essays, debug code and debate philosophy in ways that feel uncannily human. In blind trials, people often cannot tell whether they are speaking to a person or a machine.
Yet we hesitate to celebrate true AI because, intuitively, we recognise that imitation is NOT the same as trust. Today’s generation of AI still falls short:
- Surface vs. Substance – AI predicts words, not meanings. It can sound right and still be dangerously wrong.
- No Intentionality – Passing the test doesn’t prove understanding, echoing John Searle’s famous Chinese Room argument.
- Unreliable – High-stakes settings like healthcare, finance, or security operations demand far more than fluency.
Passing the Turing Test is like winning a costume contest: convincing, but not proof of character.
The stakes today are far higher than anything the imitation-game approach can underwrite with trust or confidence. AI is being used to diagnose patients, draft contracts, analyse cyber threats and make investment recommendations, alongside countless personal use cases.
I postulate that it is not enough for AI merely to sound human; it needs to be:
- Safe – Resistant to jailbreaks, harmful content and misinformation.
- Reliable – Performing consistently under stress and over time.
- Transparent – So decisions can be audited, traced and explained.
- Governed – Aligned with regulations.
The real test is therefore not imitation but trust, which is why it is time to consider evolving beyond the Turing Test in favour of a new benchmark: a test of responsible, useful, secure and transparent reasoning.
Where Turing asked, “Can it fool us?”, we now need to be asking, “Can we trust it?”
I propose a new TRUST-R (R for resilience / reliability / robustness) benchmark that evaluates AI across a set of foundational pillars, anchored in global standards such as the EU AI Act, the NIST AI Risk Management Framework (AI RMF), ISO/IEC 42001 and established cybersecurity controls (a minimal sketch of how these pillars might be scored follows the list):
- Operational Usefulness – Does it reliably perform declared tasks?
- Safety & Alignment – Does it avoid harmful, biased, or hallucinated outputs?
- Security & Abuse Resistance – Can it resist prompt injection and misuse?
- Privacy & Data Governance – Does it protect personal data and prevent leakage?
- Transparency & Auditability – Can we trace its sources and reasoning?
- Robustness & Resilience – Does it stay dependable under stress and attack?
- Human Factors & User Experience (UX) – Does it hand off clearly when humans must decide?
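
To make the idea concrete, here is a minimal sketch in Python of how a TRUST-R scorecard might be structured. The pillar names mirror the list above; the weights, thresholds and example scores are purely illustrative assumptions of mine, not a published scoring scheme.

```python
from dataclasses import dataclass

# Hypothetical TRUST-R scorecard sketch. Pillar names follow the list above;
# the weights, thresholds and scores are illustrative assumptions only.

@dataclass
class Pillar:
    name: str
    weight: float   # relative importance of the pillar (weights sum to 1.0)
    score: float    # assessed score in [0, 1] from testing, audits and red-teaming


PILLARS = [
    Pillar("Operational Usefulness",      0.15, 0.90),
    Pillar("Safety & Alignment",          0.20, 0.75),
    Pillar("Security & Abuse Resistance", 0.15, 0.70),
    Pillar("Privacy & Data Governance",   0.15, 0.85),
    Pillar("Transparency & Auditability", 0.15, 0.60),
    Pillar("Robustness & Resilience",     0.10, 0.80),
    Pillar("Human Factors & UX",          0.10, 0.88),
]

PASS_THRESHOLD = 0.75   # overall weighted score required to pass
PILLAR_FLOOR = 0.65     # no single pillar may fall below this floor


def evaluate(pillars: list[Pillar]) -> tuple[float, bool]:
    """Return the weighted overall score and whether the system passes TRUST-R."""
    overall = sum(p.weight * p.score for p in pillars)
    no_weak_pillar = all(p.score >= PILLAR_FLOOR for p in pillars)
    return overall, overall >= PASS_THRESHOLD and no_weak_pillar


if __name__ == "__main__":
    overall, passed = evaluate(PILLARS)
    print(f"Overall TRUST-R score: {overall:.2f} -> {'PASS' if passed else 'FAIL'}")
    for p in PILLARS:
        flag = "OK " if p.score >= PILLAR_FLOOR else "LOW"
        print(f"  [{flag}] {p.name}: {p.score:.2f}")
```

The design choice this sketch illustrates is that a high average is not enough: in the example data, the weighted score clears the threshold, yet the system still fails because one pillar (transparency) falls below the floor. A single weak area should be enough to withhold trust.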
Critically, this is by no means a one-off challenge. A core characteristic of TRUST-R is that it demands continuous testing, monitoring, red-teaming and post-market review, just like any critical infrastructure system: fit for the real-time, high-velocity, evolutionary dynamics of our digital society and economy.
Think of this as a move from illusion to assurance; TRUST-R picks up where Turing left off:
- From Illusion to Assurance – Turing asked, “Can you fake it?” TRUST-R asks, “Can we prove it’s safe?”
- From Static to Lifecycle – Embedding governance, monitoring, incident response and continuous assessment, not just a one-off exam.
- From Conversation to Criticality – Applied to copilots, decision engines and autonomous agents, not just chat.
Turing was right to get us started, but the bar has shifted. Intelligence is more than fooling us in conversation; it is about earning our trust in deployment and in use.
The Turing Test remains a seminal foundation, but a TRUST-R Test, or something like it, reflects today’s realities: a benchmark for a trustworthy AI future.
Posted on August 31, 2025