We cannot have a safe model that doesn’t interact with reality and doesn’t get feedback, because it won’t be coupled with reality. But we can’t trust a model that interacts with reality unless we’re already been convinced its representations are valid and not likely to slip. This is not to conclude that the problem is unsolvable - it is just saying that the ways that model developers are coupling models to reality seems to assume we’ve solved the problem that we’re trying to fix by doing so.
Cooperative AI fails unless its words reliably point to the world. Unfortunately, it seems like the way models are built is assuming the pro










