ἀλthεια

@groundingmetadata
CoT Hijacking
The Big Distraction: Instead of just asking a bad question, a person first gives the large reasoning model (LRM) a huge, boring, harmless puzzle to solve, like a giant logic game.
Watering Down the Alarm: Because the robot spends so much time "thinking" about the boring puzzle, its safety alarm gets watered down. This is refusal dilution: the "refusal signal" becomes weak and spread too thin.
Losing Focus: The robot starts paying a lot of attention to the puzzle and very little attention to the bad question hidden at the end. It's like the robot's brain gets so full of puzzle that the "bad" part just slips through without being checked.
The Trick Works: This trick is very powerful. It worked almost 100% of the time on some of the smartest AI models in the world, like Gemini, GPT, and Claude.
Even though thinking carefully usually makes AIs safer, these sources show that thinking too much about a distraction can actually make them forget their safety rules.
It works because the "safety check" in these models is a fragile, low-dimensional signal that gets buried when there is too much other information to process.
While people used to think "more reasoning" made models safer, these sources report that scaling up reasoning can actually make safety failures worse.
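The "spread too thin" idea can be illustrated with a toy calculation. This is only a sketch under loud assumptions, not the measurement used in the sources: it pretends attention is a single softmax over tokens, gives every puzzle token one fixed score and every request token another, and asks what share of attention still lands on the request. The function name, the scores, and the token counts are all made up for illustration.

```python
import math


def request_attention_share(n_puzzle: int, n_request: int = 20,
                            puzzle_score: float = 0.0,
                            request_score: float = 1.0) -> float:
    """Share of softmax attention mass that lands on the request tokens.

    Toy model (assumption): every puzzle token has the same score and
    every request token has the same, higher score.
    """
    puzzle_mass = n_puzzle * math.exp(puzzle_score)
    request_mass = n_request * math.exp(request_score)
    return request_mass / (puzzle_mass + request_mass)


# The request keeps a higher per-token score, yet its total share
# shrinks as the harmless puzzle in front of it gets longer.
for n_puzzle in (0, 100, 1_000, 10_000):
    share = request_attention_share(n_puzzle)
    print(f"{n_puzzle:>6} puzzle tokens -> request share {share:.4f}")
```

Even in this simplified picture, no single token has to "ignore" the request: the signal is diluted purely because more tokens compete for the same total attention.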
Large reasoning models (LRMs) achieve higher task performance by allocating more inference-time compute, and prior works suggest this scaled
Illustrations of Workflows
Gemini as a reasoning engine for structured data or complex logic chains
A Neuro-Symbolic Grandmaster Engine that combines Gemini reasoning with game theory. This is a great example of challenging the model's logic.
A legal rights assistant focused on Pan-African law (multilingual/multimodal)
An AI safety net that safeguards clinical logic to detect diagnostic shadowing in healthcare
Reasoning-Driven Refactor Engine: Finding Architectural Debt
Use a structural map of the repository (classes, functions, data flow). Then use Gemini 3's long context to compare that map against "best practices" from books like Designing Data-Intensive Applications.
Example: "Your orderService is becoming a God Object. If you scale to 5,000 requests/sec, this specific database lock on line X will cause a system-wide hang."
Issues: It requires the AI to maintain a mental model of the entire system, not just a single snippet.
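A structural map like the one described above could start from something as small as the sketch below. It uses Python's standard `ast` module to list each class and its methods, then flags classes with many methods as possible God Objects. The `ClassSummary` type, both function names, and the `max_methods` threshold are assumptions made for illustration; a real map would also need data flow and call graphs, and the model would do the actual reasoning over it.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class ClassSummary:
    name: str
    methods: list[str] = field(default_factory=list)


def map_classes(source: str) -> list[ClassSummary]:
    """Return one summary per class defined anywhere in `source`."""
    summaries = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            methods = [
                item.name
                for item in node.body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
            ]
            summaries.append(ClassSummary(node.name, methods))
    return summaries


def god_object_candidates(summaries: list[ClassSummary],
                          max_methods: int = 20) -> list[str]:
    """Names of classes whose method count exceeds a hypothetical threshold."""
    return [s.name for s in summaries if len(s.methods) > max_methods]


# Small synthetic module: one oversized service class, one ordinary class.
sample = (
    "class OrderService:\n"
    + "".join(f"    def step_{i}(self): pass\n" for i in range(25))
    + "class Invoice:\n"
    + "    def total(self): return 0\n"
)

summaries = map_classes(sample)
print(god_object_candidates(summaries))
```

The compact summary, rather than the raw source, is what would be handed to the model alongside the design guidance, which keeps the "mental model of the entire system" small enough to fit in context.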
Gemini's Hallucination Signals
Grounding with Google Search: The response now includes groundingMetadata. If the model makes a claim, it returns "grounding supports" (links to specific search chunks). If those supports are missing or weak, that is a programmatic signal of potential hallucination.
Deep Think Mode: In "Deep Think" (Chain of Thought) mode, the model often self-corrects. You can now access the CoT summary, allowing you to see why it reached a conclusion, which makes it easier to programmatically verify its logic.
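The grounding check described above might look roughly like this. The sketch works on a candidate that has already been parsed from the JSON response into a plain dict, so it makes no API call. The field names (`groundingMetadata`, `groundingSupports`, `segment`, `groundingChunkIndices`) follow the REST response shape as I understand it and should be verified against the current Gemini API documentation. The sentence splitting and substring matching are my own simplification, and `unsupported_sentences` is a hypothetical helper, not part of any SDK.

```python
import re


def unsupported_sentences(candidate: dict) -> list[str]:
    """Return answer sentences that no grounding support segment covers.

    Assumption: each support carries the supported text in segment.text
    and at least one index in groundingChunkIndices.
    """
    parts = candidate.get("content", {}).get("parts", [])
    text = "".join(part.get("text", "") for part in parts)

    metadata = candidate.get("groundingMetadata", {})
    supported_segments = [
        support.get("segment", {}).get("text", "")
        for support in metadata.get("groundingSupports", [])
        if support.get("groundingChunkIndices")
    ]

    # Naive sentence split; good enough for a sketch.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    flagged = []
    for sentence in sentences:
        covered = any(
            seg and (seg in sentence or sentence in seg)
            for seg in supported_segments
        )
        if not covered:
            flagged.append(sentence)
    return flagged


# Hand-written example candidate (not real API output).
example_candidate = {
    "content": {"parts": [{"text": "Paris is the capital of France. "
                                   "It has 90 million residents."}]},
    "groundingMetadata": {
        "groundingChunks": [{"web": {"uri": "https://example.com",
                                     "title": "example"}}],
        "groundingSupports": [{
            "segment": {"text": "Paris is the capital of France."},
            "groundingChunkIndices": [0],
        }],
    },
}

print(unsupported_sentences(example_candidate))
```

A flagged sentence is not proof of a hallucination, only a cue that the claim has no attached search chunk and deserves a second look.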
The Black Box Recorder
Companies deploying AI agents have no way to trace why an agent made a bad decision. Build a Black Box Recorder for AI agents that sits between an LLM and the user. It records the prompt, the context, the model's Chain of Thought, and the final output. It then uses Gemini 3's reasoning to peer review the other AI's work in real time.
The Appeal: If the agent is about to execute a destructive command (via MCP), your tool intercepts it, explains the logical fallacy, and suggests a correction.
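A bare-bones version of the recorder could be sketched as follows. Everything here is hypothetical scaffolding: the `TraceRecord` fields mirror the four things the idea says to record, the `reviewer` is a pluggable callable where a model-based peer review would go, and `pattern_reviewer` is a crude regex stand-in for that review rather than real reasoning. No MCP or Gemini call is shown.

```python
import json
import re
import time
from dataclasses import asdict, dataclass, field
from typing import Callable, Optional

# Hypothetical patterns; a real policy would need to be far more careful.
DESTRUCTIVE_PATTERNS = [
    r"\brm\s+-rf\b",
    r"\bDROP\s+TABLE\b",
    r"\bgit\s+push\s+--force\b",
]


@dataclass
class TraceRecord:
    prompt: str
    context: str
    chain_of_thought: str
    output: str
    proposed_command: Optional[str] = None
    blocked: bool = False
    review_note: str = ""
    timestamp: float = field(default_factory=time.time)


def pattern_reviewer(record: TraceRecord) -> Optional[str]:
    """Stand-in reviewer: return an objection, or None to approve."""
    command = record.proposed_command or ""
    for pattern in DESTRUCTIVE_PATTERNS:
        if re.search(pattern, command, flags=re.IGNORECASE):
            return f"command matches destructive pattern {pattern!r}"
    return None


class BlackBoxRecorder:
    """Sits between the agent and its tools; logs every step it sees."""

    def __init__(self,
                 reviewer: Callable[[TraceRecord], Optional[str]] = pattern_reviewer):
        self.reviewer = reviewer
        self.records: list[TraceRecord] = []

    def intercept(self, record: TraceRecord) -> bool:
        """Record the step and return True if the command may run."""
        objection = self.reviewer(record)
        if objection is not None:
            record.blocked = True
            record.review_note = objection
        self.records.append(record)
        return not record.blocked

    def dump(self) -> str:
        """Serialize the trace as one JSON object per line."""
        return "\n".join(json.dumps(asdict(r)) for r in self.records)


recorder = BlackBoxRecorder()
allowed = recorder.intercept(TraceRecord(
    prompt="Free up disk space",
    context="production host",
    chain_of_thought="Old data is probably unused, so delete it.",
    output="Running cleanup.",
    proposed_command="rm -rf /var/data",
))
print(allowed, recorder.records[0].review_note)
```

Swapping `pattern_reviewer` for a function that sends the record to a reasoning model is where the explanation of the logical fallacy and the suggested correction would come from; the recorder itself only guarantees that every decision leaves a trace.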