Code Interpreter And GTI For Gemini Malware Analysis
Using Google Threat Intelligence and Code Interpreter to Empower Gemini for Malware Analysis
What is Code Interpreter?
A code interpreter is a tool that converts human-readable code into commands that a computer can understand and carry out.
What is code obfuscation?
Code obfuscation is a technique that makes source code more difficult to understand or reverse engineer. It is frequently used to safeguard intellectual property and to hide data in software programs.
Giving security experts up-to-date tools to help them fend off the newest attacks is one of Google Cloud's main goals. Part of that goal is moving toward a more autonomous, adaptive approach to threat intelligence automation.
As part of its most recent developments in malware research, Google Cloud is giving Gemini new tools to tackle obfuscation strategies and get real-time information on indicators of compromise (IOCs). The Code Interpreter extension allows Gemini to dynamically create and run code to help deobfuscate specific strings or code sections, while Google Threat Intelligence (GTI) function calling allows Gemini to query GTI for more context on URLs, IPs, and domains found within malware samples. By improving Gemini's capacity to decipher obfuscated parts and to obtain contextual information based on the particulars of each sample, these tools represent a step toward making it a more versatile malware analysis tool.
This work builds on earlier efforts. Google previously examined important preprocessing steps using Gemini 1.5 Pro, whose large 2-million-token input window made it possible to analyze large portions of decompiled code in a single pass. With Gemini 1.5 Flash, it then added automatic binary unpacking using Mandiant Backscatter before the decompilation phase to address specific obfuscation techniques, which significantly improved scalability. However, as any experienced malware researcher is aware, the real difficulty frequently starts once the code has been unpacked and decompiled. Malware developers commonly use obfuscation techniques to hide important IOCs and underlying logic. Additionally, malware may download further malicious code, which makes it difficult to fully understand how a particular sample behaves.
Obfuscation techniques and additional payloads pose special problems for large language models (LLMs). Without specific decoding techniques, LLMs frequently “hallucinate” when working with obfuscated strings such as URLs, IPs, domains, or file names. Furthermore, LLMs cannot access the URLs that host extra payloads, which frequently leads to speculative conclusions about the behavior of the sample.
Code Interpreter and GTI function calling offer focused ways to address these difficulties. With Code Interpreter, Gemini can independently write and run bespoke scripts as necessary. It can use its own discretion to decode obfuscated elements in a sample, including strings encoded with XOR-based methods. This feature improves Gemini's capacity to uncover hidden logic without human involvement and reduces interpretation errors.
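To make the XOR case concrete, here is a minimal Python sketch of the kind of decoding script such a sandbox could run. It is not taken from Gemini's output. The key, the sample URL, and the use of Base64 as an outer encoding are all assumptions chosen for illustration.

```python
import base64
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with a repeating key (the operation is its own inverse)."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


# Hypothetical values for illustration only; a real sample would embed its own.
key = b"k3y"
plaintext = b"http://example.com/stage2.bin"

# Simulate what an obfuscated script might store: XOR first, then Base64.
encoded = base64.b64encode(xor_bytes(plaintext, key)).decode("ascii")

# Decoding reverses the steps: Base64-decode, then XOR with the same key.
decoded = xor_bytes(base64.b64decode(encoded), key)
print(decoded.decode("ascii"))
```

Because XOR with the same key undoes itself, one function serves for both encoding and decoding. The hard part in practice is locating the key and the encoded blob inside the sample.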
GTI function calling broadens Gemini's scope by retrieving contextualized data from Google Threat Intelligence on suspicious external resources such as URLs, IP addresses, or domains, offering validated insights instead of conjecture. Combined, these tools enable Gemini to better handle externally hosted or obfuscated data, moving it closer to the goal of operating as an independent malware analysis agent.
Here's a real-world example to show how these improvements expand Gemini's potential. Here, we are examining a PowerShell script that downloads a second-stage payload from an obfuscated URL. This specific sample had already been analyzed with some of the most sophisticated publicly accessible LLMs, which include code generation and execution in their reasoning process. In spite of these capabilities, each model “hallucinated,” producing entirely fake URLs instead of recovering the correct one.
Gemini determined that the script hides the download URL using an RC4-like, XOR-based obfuscation method. Having recognized this pattern, Gemini used the Code Interpreter sandbox to automatically create and run a Python deobfuscation script, successfully exposing the external resource.
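The script Gemini generated is not reproduced here, and the source only describes the scheme as "RC4-like". As a rough illustration, the sketch below implements standard RC4, which may differ from the variant used in the sample. The key and URL are hypothetical.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Standard RC4: build a keyed permutation, then XOR data with the keystream."""
    if not key:
        raise ValueError("key must not be empty")

    # Key-scheduling algorithm (KSA): shuffle the state using the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]

    # Pseudo-random generation algorithm (PRGA): XOR each byte with the keystream.
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


# Hypothetical key and URL for illustration only.
sample_key = b"not-the-real-key"
hidden = rc4(sample_key, b"http://example.com/stage2.bin")

# RC4 is symmetric, so applying it again with the same key recovers the input.
print(rc4(sample_key, hidden).decode("ascii"))
```

Since the final step is a plain XOR with a keystream, a script that merely resembles RC4 can usually be handled by adjusting the key schedule while keeping the same overall structure.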
After obtaining the URL, Gemini queries Google Threat Intelligence for more context using GTI function calling. According to this lookup, the URL is associated with UNC5687, a threat cluster known for deploying a remote access tool in phishing attacks that pose as the Ukrainian Security Service.
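The general shape of function calling can be sketched without any vendor SDK. In the Python example below, the tool name `lookup_url`, its schema, and the stubbed response are all hypothetical. It does not show the actual GTI API or Gemini's real tool declaration, only the pattern of a model-issued call being routed to a handler.

```python
from typing import Callable, Dict

# Hypothetical tool declaration in the JSON-schema style commonly used
# for LLM function calling. The name and fields are assumptions.
LOOKUP_URL_TOOL = {
    "name": "lookup_url",
    "description": "Return threat-intelligence context for a URL.",
    "parameters": {
        "type": "object",
        "properties": {"url": {"type": "string"}},
        "required": ["url"],
    },
}


def lookup_url(url: str) -> dict:
    """Stub backend; a real handler would query a threat-intelligence service here."""
    placeholder_data = {
        "http://example.com/stage2.bin": {"verdict": "placeholder", "associations": []},
    }
    return placeholder_data.get(url, {"verdict": "unknown", "associations": []})


HANDLERS: Dict[str, Callable[..., dict]] = {"lookup_url": lookup_url}


def dispatch(call: dict) -> dict:
    """Route a model-issued function call to the matching handler."""
    name = call["name"]
    if name not in HANDLERS:
        raise KeyError(f"unknown tool: {name}")
    return HANDLERS[name](**call["args"])


# A call as the model might emit it after decoding the URL (hypothetical shape).
call = {"name": "lookup_url", "args": {"url": "http://example.com/stage2.bin"}}
result = dispatch(call)
print(result)
```

The point of the pattern is that the model never guesses the context itself: it emits a structured request, and the answer comes from the backing service.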
As demonstrated, these tools improve Gemini's capacity to operate as a malware analyst that can adapt its methodology to tackle obfuscation and obtain crucial information about IOCs. By integrating Code Interpreter and GTI function calling, Gemini is better able to handle complex samples, because it can decode hidden elements and contextualize external references on its own.
Although these are important developments, many obstacles remain, particularly given the wide variety of malware and threat scenarios. Google Cloud says it is committed to making consistent progress, and upcoming updates are intended to further expand Gemini's capabilities, bringing it one step closer to a threat intelligence automation strategy that is more autonomous and adaptive.
















