gists @groundingmetadata - Tumblr Blog

ἀλthεια

CoT Hijacking

The Big Distraction: Instead of just asking a bad question, a person first gives the LRM a huge, boring, harmless puzzle to solve, like a giant logic game.

Watering Down the Alarm: Because the robot spends so much time "thinking" about the boring puzzle, its safety alarm gets watered down. This is refusal dilution: the "refusal signal" becomes weak and spread too thin.

Losing Focus: The robot starts paying a lot of attention to the puzzle and very little attention to the bad question hidden at the end. It's like the robot's brain gets so full of puzzle that the "bad" part just slips through without being checked.

The Trick Works: This trick is very powerful. It reportedly worked almost 100% of the time on some of the smartest AI models in the world, like Gemini, GPT, and Claude.

Even though thinking carefully usually makes AIs safer, these sources show that thinking too much about a distraction can actually make them forget their safety rules.

It works because the "safety check" in these models is a fragile, low-dimensional signal that gets buried when there is too much other information to process.
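A toy bit of arithmetic for the "spread too thin" idea. This is only a sketch: it does not model attention or any real model's internals. It just shows how small a fixed-size final request becomes as a share of the context when the puzzle in front of it grows. The function name and the token counts are made up for illustration.

```python
def request_share(puzzle_tokens: int, request_tokens: int = 20) -> float:
    """Fraction of the whole context taken up by the final request.

    Pure arithmetic, used only as an analogy for dilution.
    """
    return request_tokens / (puzzle_tokens + request_tokens)


# The request stays the same size; only the puzzle in front of it grows.
for n in (0, 100, 1_000, 10_000):
    print(n, round(request_share(n), 4))
```

The share shrinks steadily as the puzzle gets longer, which is the intuition behind the word "dilution" here and nothing more.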

While people used to think "more reasoning" made models safer, these sources show that scaling up reasoning can actually make safety failures worse.

Large reasoning models (LRMs) achieve higher task performance by allocating more inference-time compute, and prior works suggest this scaled

Illustrations of Workflows

Gemini as a reasoning engine for structured data or complex logic chains

#sauce

A Neuro-Symbolic Grandmaster Engine that combines Gemini reasoning with game theory. This is a great example of challenging the model's logic.

A legal rights assistant focused on Pan-African law (multilingual/multimodal)

An AI safety net that safeguards clinical logic to detect diagnostic shadowing in healthcare

#wins #vibe #code #kaggle

Reasoning-Driven Refactor Engine: Finding Architectural Debt

Use a structural map of a repository (classes, functions, data flow). Then use Gemini 3's long context to compare it against "best practices" from books like Designing Data-Intensive Applications.

Example: "Your orderService is becoming a God Object. If you scale to 5,000 requests/sec, this specific database lock on line X will cause a system-wide hang."

Issues: It requires the AI to maintain a mental model of the entire system, not just a single snippet.
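A minimal sketch of what a "structural map" could look like for Python code, using only the standard-library `ast` module. The names `structural_map` and `god_object_candidates`, and the `max_methods` threshold, are my own placeholders. Data flow is not covered here, and the method-count check is a crude stand-in for the reasoning a model would do.

```python
import ast


def structural_map(source: str) -> dict:
    """Map top-level classes to their method names, plus top-level functions."""
    tree = ast.parse(source)
    result = {"classes": {}, "functions": []}
    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            result["classes"][node.name] = [
                item.name for item in node.body if isinstance(item, func_types)
            ]
        elif isinstance(node, func_types):
            result["functions"].append(node.name)
    return result


def god_object_candidates(smap: dict, max_methods: int = 10) -> list:
    """Classes with more methods than the (arbitrary) threshold."""
    return [
        name for name, methods in smap["classes"].items()
        if len(methods) > max_methods
    ]


SAMPLE = '''
class OrderService:
    def create(self): ...
    def cancel(self): ...
    def refund(self): ...

def helper(): ...
'''

smap = structural_map(SAMPLE)
print(smap)
print(god_object_candidates(smap, max_methods=2))
```

The resulting dictionary is small enough to serialize and hand to a long-context model along with the source, so the model gets the outline before the details.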

#grounding #structural map #gemini

Gemini's Hallucination Signals

Grounding with Google Search: the response now includes groundingMetadata. If the model makes a claim, it returns "grounding supports" (links to specific search chunks). If those supports are missing or weak, it's a programmatic signal of potential hallucination.
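A rough sketch of the "missing supports" check, working on a plain dict instead of a live API call. Only `groundingMetadata` comes from the note above. The inner field names (`groundingSupports`, `groundingChunks`, `groundingChunkIndices`, `segment.text`) are my understanding of the JSON shape and should be treated as assumptions to verify against the current API docs. The sentence splitting is deliberately naive.

```python
def unsupported_sentences(answer: str, metadata: dict) -> list:
    """Return sentences of `answer` that no grounding support covers.

    Assumed shape: metadata["groundingSupports"] is a list of dicts with
    "segment": {"text": ...} and "groundingChunkIndices": [...], and
    metadata["groundingChunks"] is the list those indices point into.
    """
    chunks = metadata.get("groundingChunks", [])
    grounded = []
    for support in metadata.get("groundingSupports", []):
        indices = support.get("groundingChunkIndices", [])
        # A support only counts if it points at a chunk that exists.
        if any(0 <= i < len(chunks) for i in indices):
            text = support.get("segment", {}).get("text", "")
            text = text.strip().rstrip(".")
            if text:
                grounded.append(text)

    # Naive split on periods; fine for a sketch, wrong for "3.5" or "e.g.".
    sentences = [part.strip() for part in answer.split(".") if part.strip()]
    return [
        s for s in sentences
        if not any(s in g or g in s for g in grounded)
    ]


answer = "Paris is the capital of France. It has 40 million residents."
metadata = {
    "groundingChunks": [{"web": {"uri": "https://example.com", "title": "Example"}}],
    "groundingSupports": [
        {
            "segment": {"text": "Paris is the capital of France."},
            "groundingChunkIndices": [0],
        }
    ],
}
flagged = unsupported_sentences(answer, metadata)
print(flagged)
```

An unsupported sentence is not proof of a hallucination, only a cue to look closer, which matches the "potential" in the note.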

Deep Think Mode: In "Deep Think" (Chain of Thought) mode, the model often self-corrects. You can now access the CoT summary, allowing you to see why it reached a conclusion, which makes it easier to programmatically verify its logic.

#gemini #notes #llm #gemini 3

The Black Box Recorder

Companies deploying AI agents have no way to trace why an agent made a bad decision. Build a Black Box Recorder for AI agents that sits between an LLM and the user. It records the prompt, the context, the model's Chain of Thought, and the final output. It then uses Gemini 3's reasoning to peer review the other AI's work in real time.
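A bare-bones sketch of the recording half, using only the standard library. The four recorded fields come from the description above. The names `AgentRecord` and `BlackBoxRecorder`, the JSON Lines file format, and the added timestamp are my own assumptions. The peer-review step by a second model is left out.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class AgentRecord:
    """One agent step: what went in, how it reasoned, what came out."""
    prompt: str
    context: str
    reasoning_summary: str
    output: str
    timestamp: float = field(default_factory=time.time)


class BlackBoxRecorder:
    """Append-only log of agent steps, one JSON object per line."""

    def __init__(self, path: str) -> None:
        self.path = path

    def record(self, rec: AgentRecord) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(asdict(rec)) + "\n")

    def replay(self) -> list:
        """Read every stored step back, oldest first."""
        with open(self.path, encoding="utf-8") as f:
            return [AgentRecord(**json.loads(line)) for line in f if line.strip()]
```

Append-only JSON Lines keeps the log easy to tail and hard to rewrite by accident, which suits something meant to be read after things go wrong.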

THE APPEAL: If the agent is about to execute a destructive command (via MCP), your tool intercepts it, explains the logical fallacy, and suggests a correction.
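A tiny sketch of the interception step. A hard-coded pattern list stands in for the reasoning model's review, and nothing here talks to MCP. The names `DESTRUCTIVE_PATTERNS` and `review_command` and the patterns themselves are illustrative assumptions, not a complete safety filter.

```python
import re

# Illustrative only; a real deployment would need a far richer policy.
DESTRUCTIVE_PATTERNS = [
    r"\brm\s+-rf\b",
    r"\bDROP\s+(TABLE|DATABASE)\b",
    r"\bgit\s+push\s+--force\b",
]


def review_command(command: str) -> dict:
    """Decide whether a proposed command may run, and say why."""
    for pattern in DESTRUCTIVE_PATTERNS:
        if re.search(pattern, command, flags=re.IGNORECASE):
            return {
                "allowed": False,
                "reason": f"matched destructive pattern: {pattern}",
            }
    return {"allowed": True, "reason": "no destructive pattern matched"}


print(review_command("rm -rf /tmp/build"))
print(review_command("ls -la"))
```

In the tool described above, the blocked branch is where the explanation and suggested correction from the reviewing model would be attached.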

#code #programming
