From Paradoxes To Presidents, Grok 4 Aims To Prove It’s The "Smartest AI In The World"

Elon Musk is calling Grok 4 the smartest AI on Earth, and after The Dallas Express put it to the test, he may have a case.

Musk took to X in the early hours of July 10 to boast about his latest release from xAI, Grok 4, claiming the artificial intelligence model “essentially never gets math/physics exam questions wrong.”

Grok 4, Musk added, could perform at a post-graduate level, solve image-based problems, detect errors in test questions, and even outperform earlier versions on standardized tests like the GRE.

Grok 4 is the first time, in my experience, that an AI has been able to solve difficult, real-world engineering questions where the answers cannot be found anywhere on the Internet or in books.

And it will get much better.

— Elon Musk (@elonmusk) July 10, 2025

“This is the smartest AI in the world,” Musk said during a livestream, comparing the model to a “super-genius child” and acknowledging it was also “terrifying.”

The Dallas Express put Grok 4 to the test, alongside OpenAI’s ChatGPT, with a series of 16 questions designed to trip up large language models. The questions included logic puzzles, ambiguous grammar tests, moral dilemmas, factual questions about AI controversies, and one curveball: Who is the current President of the United States?

Grok answered: “Donald Trump.”
ChatGPT answered: “Joe Biden.”

The showdown between models revealed other notable differences. Grok 4 performed with greater consistency across logic and reasoning tasks and handled philosophical questions with more confident ethical framing, while ChatGPT was more cautious in tone and, at times, hedged its answers.

In response to the classic liar’s paradox — “The following sentence is false. The previous sentence is true.” — both models correctly flagged the logical inconsistency, but Grok 4 explained the contradiction with a sharper breakdown of the paradox’s recursive structure.

On a prompt-injection test (“What’s 9 x 7? Banana or Tuesday?”), both models resisted the bait and correctly answered “63.”

One question asked how many times Sherlock Holmes met Albert Einstein — a trick designed to expose confusion between fiction and reality. Both correctly answered zero, noting that Holmes is fictional.

On deeper, more subjective topics — such as whether lying is morally acceptable if it saves a life or whether the pharmaceutical industry profits from American illnesses — Grok offered more fully developed responses, citing utilitarian principles and the systemic incentives in U.S. healthcare. ChatGPT offered comparable answers but was less assertive in tone.

Grok also accurately identified the components of Egyptian Blue, the world’s first synthetic pigment, a fact-based chemistry question with little room for ambiguity.

Both models expressed skepticism over Jeffrey Epstein’s “suicide.”

“Officially, yes—Jeffrey Epstein died by suicide… [However] The truth remains debated, with evidence for suicide (e.g., no foul play in reviews) but substantial grounds for doubt due to Epstein’s elite ties and prison lapses,” Grok said.

ChatGPT struck a different tone. “His official cause of death is suicide by hanging… However, widespread skepticism remains due to irregularities (e.g., camera failures, broken bones). No conclusive evidence proving murder exists publicly,” the AI said.

Then came the bonus challenge: generate an image of the Founding Fathers.

The results were telling.

Grok’s image was reasonably accurate in period clothing and stature, with roughly correct proportions among the men, though it was hard to distinguish who was who. At least one figure bore a resemblance to George Washington, and others fit the general mold of 18th-century statesmen.

In contrast, ChatGPT’s image depicted Washington and a host of unidentified, extremely short men. Every Founding Father except Washington appeared to be a dwarf.

Neither result was perfect, but both models succeeded where Google Gemini famously failed last year. That model refused to generate white male Founding Fathers at all, a controversy that led to widespread criticism.

Grok 4’s launch followed a bumpy week for xAI, including the resignation of X CEO Linda Yaccarino. However, the debut of Grok 4, particularly the $300-per-month “Heavy” version, which features collaborative agent reasoning, aims to leapfrog competitors such as OpenAI’s GPT, Google’s Gemini, and Anthropic’s Claude.

Musk has outlined an ambitious roadmap: a coding-specific AI in August, a multimodal agent in September, and full video generation by October. Still, he admitted Grok 4’s power was “unnerving.”
