I'm constantly seeing stories of people devastated that ai deleted their code or borked their computer or permanently deleted all their important work records or whatever, and while I'm sure many stories are fake I know at least some of them are real, and I don't get it. I don't understand why the ai is capable of doing this. Even if you do think "vibe coding" or whatever is a useful practice, you... you check the code and push it yourself, right? The ai can't push code? The ai shouldn't be able to touch your backups? WHY IS THE AI PHYSICALLY CAPABLE OF PUSHING CODE? WHY IS IT ABLE TO DELETE YOUR FILES AND DELETE YOUR BACKUP FILES? People tell it "do not change this without approval" and then get confused and upset when it does but WHY WAS IT PHYSICALLY ABLE TO? Am I misunderstanding something here? Shouldn't this be basic? Is there some complicated computer reason I don't understand where ais have to be physically able to fuck around with your important shit and your important shit's backups if you use them? Because that doesn't sound right. At least back up your records to an external device nightly or something if there's some architectural reason I don't understand that means ais have to be able to delete them all.
I'm of the opinion that even if you have to use AI, it shouldn't have access to anything that you wouldn't be comfortable giving your thirteen-year-old nephew access to. Why is "AI irretrievably deleted all my emails" a possibility?
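It’s a failure of sandboxing and not wanting to be overly prescriptive of all file permissions. If I want the agent to be able to modify a file, it has to have permission to write that file. Write permission effectively means delete permission too: anything that can write to a file can overwrite or empty it. (On Unix, actually removing the file is governed by write permission on the folder it sits in, but an agent working in your project folder usually has that as well.)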
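To make the "write means delete" point concrete, here is a minimal Python sketch. The helper name wipe_contents is made up for illustration. It only needs write access to the file itself, and the data is gone afterwards even though the file technically still exists.

```python
import os
import tempfile


def wipe_contents(path: str) -> None:
    """Destroy a file's data using nothing but write access to it.

    Hypothetical helper for illustration: opening in "w" mode truncates
    the file to zero bytes. The file name survives, the contents do not.
    """
    with open(path, "w"):
        pass  # opening for writing has already emptied the file


# Demo on a throwaway file in a temp folder, so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    records = os.path.join(tmp, "records.txt")
    with open(records, "w") as f:
        f.write("ten years of important work\n")

    wipe_contents(records)
    still_there = os.path.exists(records)
    size_after = os.path.getsize(records)

print(still_there, size_after)
```

No delete call appears anywhere in that snippet, which is why "it can edit but not delete" is not a permission most systems can express per file.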
Pushing code is a one line command: git push. Having it push commits to your own working branch is (sorta) fine. It’s when it pushes to the deploy branch (master or main) that it’s like “ok no it shouldn’t be doing that.” But people don’t differentiate that sometimes or the agent ignores the directive and. Well.
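If you do want the agent pushing to its own branch but never to the deploy branch, the check has to live outside the agent. Here is a rough Python sketch of what such a gate could look like. The function name, the branch set, and the "when in doubt, ask" policy are all my assumptions, and it only helps if the agent's git commands are forced through it. Server-side branch protection is the sturdier version of the same idea.

```python
PROTECTED_BRANCHES = {"main", "master"}  # assumption: your deploy branches


def push_needs_human(argv: list[str]) -> bool:
    """Return True if this git command should be stopped and shown to a human.

    Hypothetical policy check, deliberately conservative: a push is only
    waved through when it plainly targets a non-protected branch.
    """
    if len(argv) < 2 or argv[0] != "git" or argv[1] != "push":
        return False  # not a push; some other rule has to judge it
    rest = argv[2:]
    if any(a.startswith("-") for a in rest):
        return True  # any flag at all (--force, --all, ...): ask
    if len(rest) < 2:
        return True  # bare "git push": the target depends on config, so ask
    for refspec in rest[1:]:  # rest[0] is the remote name
        if refspec.startswith("+"):
            return True  # a leading "+" is a force push
        target = refspec.split(":")[-1].removeprefix("refs/heads/")
        if target in PROTECTED_BRANCHES or target in {"", "HEAD"}:
            return True
    return False


print(push_needs_human(["git", "push", "origin", "feature-x"]))
print(push_needs_human(["git", "push", "origin", "feature-x:main"]))
```

The point of returning True for anything ambiguous is that "the agent ignored the directive" stops being possible: the directive is no longer something the agent gets to interpret.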
See maybe I'm a dinosaur but I just assumed that the ai would be writing the code in a safe locked off environment and the user would have to export it to somewhere else from where it could be pushed. Like the AI would be writing in Notepad or something, or at least a similar program that preserved code formatting properly. But I guess that very simple security feature is too time consuming. Sure let Spicy Autocomplete fuck around with your important shit. I don't know, I'm not a coder.
I'm actually a software engineer and security researcher who has been working on this kind of problem for years, since well before LLMs came around. The short version is that everything has this problem because a long time ago we decided to build all computers on a model that is harder to explain, harder to use, and less secure, but that fits very nicely with a corporation that wants to be the sole authority on what a computer can do, mixed with programmers/users who don't want to feel like there's anything they can't do. Most of the time you don't notice this because the malice, incompetence, and mistakes of other people usually get filtered out before they bite you in particular too hard, and people don't think about putting up safeguards that protect them from themselves. AI just lets a new source of mistakes crop up, one that no one reviews and that doesn't appear to be your mistake. The sandboxing you were talking about used to be the case at the very dawn of AI dev tools, when it could only be a chat. But as soon as people started adding features like "let the AI check test results" and "let the AI look for what might be relevant", that punched massive holes through the security, holes that couldn't be fixed without dramatically reimagining how the security is done. That takes vision and time and effort, which are notably lacking in current software development.
The long version is that by default absolutely everything you run on a computer has full authority to do anything you can do. Your web browser can read any file on your computer, any game can snoop on your browser history, your dev tools can wipe your hard drive. The only ways around this without changing our minds on how our computers secure things are very difficult and time consuming and require individual attention to get working per program.
The current model is (mostly) something called "access control lists" (ACLs), which means that for every thing on your computer there is a list of who is allowed to access it, and whenever something tries to access it, the computer checks whether the "who" that something is acting for is on the list. Therefore, everything you run on the computer can do everything you can, unless you specifically make a new "who" for it to run as and go through and enumerate in a list everything it should be able to do. And whenever you want to, say, give it a new file to read, you have to add that file to the list of things it may access before providing the file, rather than just giving it the file. Anything that looks like it is violating this (outside of some very niche systems like seL4, capnp, or my own research, which I don't expect you to have encountered) is actually just transparently making a "who" and setting up lists behind the scenes. For example, a web browser will make a new "who" (not the same as the OS's "who") for each website origin you visit, keep those apart by default, and automatically extend the list from the Cross-Origin Resource Sharing (CORS) settings the server provides. That's why going to one website doesn't steal your credentials on other websites unless it can trick the browser or trick you into telling the browser to let it.
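Here is the ACL model boiled down to a toy in Python, in case seeing it helps. Everything in it (the paths, the user names, the may_access function) is invented for illustration and is far simpler than what a real OS does, but the shape of the check is the same: it asks who you are, not which program is asking.

```python
# Toy ACL: for every object, a list of which "who" may do what.
ACL = {
    "/home/sam/taxes.pdf": {"sam": {"read", "write"}},
    "/home/sam/project/main.py": {"sam": {"read", "write"}},
}


class Process:
    """A running program. It carries the identity of whoever started it."""

    def __init__(self, name: str, user: str):
        self.name = name
        self.user = user


def may_access(proc: Process, path: str, mode: str) -> bool:
    # The check looks at the user, never at which program is asking.
    return mode in ACL.get(path, {}).get(proc.user, set())


agent = Process("coding-agent", user="sam")
print(may_access(agent, "/home/sam/project/main.py", "write"))  # what you wanted
print(may_access(agent, "/home/sam/taxes.pdf", "write"))        # what you also got

# The ACL-style fix: invent a new "who" and fill in its list by hand.
ACL["/home/sam/project/main.py"]["agent-user"] = {"read", "write"}
jailed_agent = Process("coding-agent", user="agent-user")
print(may_access(jailed_agent, "/home/sam/taxes.pdf", "write"))
```

Note that the jailed version needed a new identity plus a list entry for every file it should touch, which is exactly the bookkeeping people skip.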
The alternative model is called "capabilities", with a bunch of adjectives describing different ways they can be represented in different scenarios (ocaps, zcaps, etc.). In this, instead of having a big list of things you are allowed to do, with anything you ask to do things inheriting that list, when you ask something to do things you hand it some reference that both lets it find the object and tells the system it's allowed to access the object. This means that programs you run can't do everything you can; they can only do things you say they can. And the act of saying they can do something and telling them to do it is the same action, which makes it secure by default instead of insecure by default.
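And here is the capability version as a toy, again in Python with made-up names (ReadCap, count_words). Python does not actually enforce any of this, so read it as a picture of the idea rather than a secure implementation: the function is handed a reference to one readable thing, and that reference is both "which object" and "you may read it".

```python
import io


class ReadCap:
    """A capability: holding this object is the permission to read one document.

    Toy model. It wraps an in-memory stream so the example runs anywhere,
    and nothing in Python stops a determined caller from poking inside it.
    """

    def __init__(self, stream: io.TextIOBase):
        self._stream = stream

    def read(self) -> str:
        return self._stream.read()


def count_words(doc: ReadCap) -> int:
    # No path, no user identity, no ambient file system: the only thing
    # this function can reach is the one document it was handed.
    return len(doc.read().split())


essay = ReadCap(io.StringIO("caps are neat and tidy"))
n = count_words(essay)
print(n)
```

There is no separate "grant access" step here. Passing essay to count_words is the grant, and there is nothing to grant for writing or deleting because the reference has no such method.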
The Android permissions system, where the Tumblr app can't just steal and wipe your entire photo roll, can be viewed as a sort of attempt to build a capability system on top of an ACL system without quite understanding that. For example, trying to open a file in Android opens a separate privileged file picker, and the app that asked to open a file only gets the result of the action, which is how a cap should work. In Windows, however, when you ask an app to open a file, it just looks directly at every file you have access to and then you tell it which one.
I and a variety of other researchers are working, sometimes at cross purposes, to make things based around caps more flexible and easier to get started with and more powerful. ACLs have a lot of accumulated software and infrastructure, but caps can subsume it and make things nicer and cleaner and easier in the future. And as you might guess from the length of this post I am always excited to talk about what these systems can become.
I understand that the original Unix security model was "users don't trust each other, and root doesn't trust users, but users trust the system programs and their own programs with their own data" while the Android security model is "users don't trust their own apps with their data". I understand your "ocaps, zcaps" parenthesis to mean something like intents (on Android) or XDG Desktop Portal.
I'd go for something less "research grade" and more "practical". I'm not trying to disparage your research, but if you want to be less "ivory tower", I can see two ways you can go:
Option A: Vibe Coding IDE
You make something that is like "Eclipse for vibe coding" or "AI native Visual Studio". You have different plug-ins, different tools, and an LLM system that can do stuff for you. Plug-ins could be a build system, a test runner, a GUI designer that takes MSPaint.exe drawings and turns them into code or XAML, a version control plug-in, and so on. These plug-ins all run with user permissions, and they aren't self-modifying LLM output. They might be vibe-coded, but there was a human in the loop. They are part of the IDE; that is the important part. Maybe there is also a plug-in for documentation, so the LLM knows about all the APIs you are using. Maybe there is a standardised format for docs. If you press "run", it runs, without any extra steps.
Now importantly, the LLM has access to a suite of non-destructive tools. It can look at the version history, it can query LLVM-based parsers and linters, it can see compiler errors and warnings, it can use static analysis. But what it can't do is delete tests, download files from the Internet, or change the build system. (Maybe it can change the build scripts, but then you'd have to run the build system inside a sandbox, too.) This works best when the LLM works with a language like Haskell, where side effects show up in the types, or to a lesser extent Rust, which has no effect system but is strict enough to give static analysis a lot to work with. If the LLM wants to add a dependency or change the build scripts, or if the LLM wants to do a git command that isn't just adding and committing changes, this must go through the user.
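A rough Python sketch of that split, with every name in it (the tool names, call_tool, the ask_user callback) invented for illustration and the tools themselves stubbed out as strings: read-only tools run straight away, anything that changes the project goes through a prompt, and anything not registered simply doesn't exist as far as the LLM is concerned.

```python
from typing import Callable

# Assumption: tools are plain functions registered under a name.
# The bodies are stubs; a real IDE would call git, a linter, and so on.
READ_ONLY_TOOLS: dict[str, Callable[[str], str]] = {
    "show_history": lambda arg: f"(version history for {arg})",
    "lint": lambda arg: f"(linter findings for {arg})",
}
GATED_TOOLS: dict[str, Callable[[str], str]] = {
    "add_dependency": lambda arg: f"added {arg}",
    "edit_build_script": lambda arg: f"edited {arg}",
}


def call_tool(name: str, arg: str, ask_user: Callable[[str], bool]) -> str:
    """Dispatch a tool request coming from the LLM."""
    if name in READ_ONLY_TOOLS:
        return READ_ONLY_TOOLS[name](arg)  # harmless, no prompt needed
    if name in GATED_TOOLS:
        if ask_user(f"LLM wants to run {name}({arg!r}). Allow?"):
            return GATED_TOOLS[name](arg)
        return "denied by user"
    return "no such tool"  # there is no generic "run a shell command" tool


# Usage with a user who always says no:
print(call_tool("lint", "src/", lambda question: False))
print(call_tool("add_dependency", "leftpad", lambda question: False))
```

The design choice that matters is the last return: the LLM cannot ask for something destructive that nobody thought to forbid, because it can only ask for things somebody deliberately registered.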
If the LLM wants to run or debug the code it has written (i.e. code that isn't part of a trusted IDE plug-in), then it will run inside a firejail or a similar jail, so that the generated code doesn't have permission to read $HOME or to read ./.git.
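For the curious, this is roughly what "run it inside a firejail" could look like from the IDE's side, as a Python sketch that only builds the command line and doesn't execute anything. The function name and the particular choice of options are my assumptions. The options themselves (--whitelist, --blacklist, --net=none) are real firejail options, but how they combine on your system is worth verifying before you trust it.

```python
import shlex


def firejail_argv(project_dir: str, command: list[str]) -> list[str]:
    """Build (but do not run) a firejail command line for untrusted code.

    Hypothetical helper. Intended effect, to be verified on a real system:
    only the project folder is visible from the home directory, its .git
    is hidden, and there is no network.
    """
    return [
        "firejail",
        f"--whitelist={project_dir}",       # the rest of $HOME is hidden
        f"--blacklist={project_dir}/.git",  # version history stays off limits
        "--net=none",                       # no downloads, no uploads
        *command,
    ]


argv = firejail_argv("/home/sam/project", ["python3", "generated_script.py"])
print(shlex.join(argv))
```

An IDE would hand that list to something like subprocess.run, so the generated code never runs outside the jail.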
You could even configure the system so that it runs static analysis first, and when it can't prove that a piece of LLM-generated code will behave, the test suite runs that code inside a sandbox. You could also configure the system so that the user is prompted when the LLM generates or modifies "naughty" code, or alternatively, the LLM can be asked to refactor "naughty" code to behave better, and if that fails, the decision is escalated to the user.
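As a sketch of the "analyse first, sandbox if unsure" step, here is a small Python example that uses the standard ast module to look for "naughty" calls and imports in generated Python code. The denylists and the function names are my own placeholders, and a denylist is much weaker than actually proving code safe, so "run" here only means "nothing obviously bad was found".

```python
import ast

# Assumption: hand-picked denylists standing in for real static analysis.
NAUGHTY_CALLS = {"os.system", "os.remove", "shutil.rmtree", "subprocess.run", "eval", "exec"}
NAUGHTY_IMPORTS = {"socket", "urllib", "requests"}


def dotted_name(node: ast.AST) -> str:
    """Turn an expression like shutil.rmtree into the string "shutil.rmtree"."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return f"{dotted_name(node.value)}.{node.attr}"
    return "?"  # something we cannot name statically


def findings(source: str) -> list[str]:
    """List the suspicious calls and imports found in the source text."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and dotted_name(node.func) in NAUGHTY_CALLS:
            found.append(dotted_name(node.func))
        elif isinstance(node, ast.Import):
            found += [a.name for a in node.names if a.name.split(".")[0] in NAUGHTY_IMPORTS]
        elif isinstance(node, ast.ImportFrom) and (node.module or "").split(".")[0] in NAUGHTY_IMPORTS:
            found.append(node.module)
    return found


def decide(source: str) -> str:
    """Return "sandbox" if anything suspicious was found, otherwise "run"."""
    return "sandbox" if findings(source) else "run"


print(decide("print(1 + 1)"))
print(decide("import shutil\nshutil.rmtree('/')"))
```

The same findings list is what you would show the user, or feed back to the LLM with a request to refactor, before escalating.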
Theoretically, a combination of static analysis, interfaces to trusted tools, and sandboxing should be good enough for an LLM-based IDE, unless you really want the thing to operate completely autonomously.
The biggest problem with this approach is that in practice, you need to run arbitrary code in the build system. In the real world the build system often needs to download things from the internet and update itself, and every project uses a different build system, so you write plug-ins for build systems to interact with each other, and then another build system crops up and nothing is interoperable. Gone are the days of "./configure && make && make install". Gone are the days of self-contained Visual Studio projects with .sln files. Everything needs an SQL database and a job queue and a document database and the credentials to an S3 bucket.
That brings us to:
Option B: Containerise It!
You let the LLM run wild and execute arbitrary shell commands. Instead of giving it an interface to tools, you just let it run GDB and clang and whatever it desires. I realise that the idea of sandboxing the program with firejail also involves ACLs, but this one is really different. You have the LLM running (predicting tokens) on your machine, and you have git running on your machine, or some other system to periodically snapshot the progress, but any code generated by the LLM and any commands executed by the LLM are always running inside a container. There is no ACL to maintain, in the sense that everything the LLM ever sees is already inside the sandbox.
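To sketch what "always running inside a container" could mean in practice, here is a Python helper that builds a docker command line for each command the LLM wants to run. Docker itself is my assumption (the idea works with any container runtime), as are the function name, the image name, and the /work mount point. The workdir is assumed to be a scratch copy of the project, with the real repository and its snapshots kept outside the mount.

```python
import shlex


def container_argv(workdir: str, image: str, command: list[str]) -> list[str]:
    """Build (but do not run) a docker command line for one agent command.

    Hypothetical helper. Only the scratch workdir is visible inside the
    container; the host's home folder, git history, and backups are not.
    """
    return [
        "docker", "run",
        "--rm",                    # throw the container away afterwards
        "--network", "none",       # no network; loosen only if the build needs it
        "--cap-drop", "ALL",       # drop Linux capabilities inside the container
        "-v", f"{workdir}:/work",  # the only host folder the agent can touch
        "-w", "/work",
        image,
        *command,
    ]


argv = container_argv("/tmp/agent-scratch", "agent-sandbox:latest", ["make", "test"])
print(shlex.join(argv))
```

Whatever the agent does to /work, the snapshots on the host side are untouched, so the worst case is throwing the scratch copy away and restoring from the last snapshot.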
This is still problematic. The LLM could generate a program that runs "sudo rm -rf /" inside your container, and then later you take the output of your independent LLM agent, and you compile it, and it wipes your machine. It's still possible. The solution is still code review and a human in the loop, but at least when it happens, you only have yourself to blame.