SIGCHLD, PTYs, Why Exet Still Can't Properly Deal With Fully Interactive Programs, and What It Will Take to Fix It
Programming can be hard. Sometimes everything just falls into place, and other times, it can seem like every party responsible for the tools you're presented with conspired to ensure that combining them in the way you need would be as difficult as possible.
For this rant, I will be looking at a case of the latter scenario. Get comfortable, because this is going to be a long ride.
I have been working on something called exet. It is designed to do everything the expect utility does, and then some, using its own purpose-built programming language. In many ways, it already accomplishes this, and I'm nearing an alpha release so that others can start playing around with it.
However, there is still one major thing that exet can't do. It can't properly deal with fully interactive programs: programs that present a prompt and wait for a user to enter something.
Currently, exet deals with external programs through commands, which are file channels (what exet has instead of conventional variables) with a path that begins with a bang (!). Exet executes commands in much the same way as the shell, connecting to them using pipes. Most interactive programs will actually switch themselves into a non-interactive mode when they detect that they're connected to a pipe, which lets this method work for the majority of programs. It is also easier and more efficient than the alternative, which is why I set it up first and have avoided the other "solution" until now. But what happens when a program doesn't have a non-interactive mode, or when you need the interactive version of the command because of some quirk of what you're trying to do?
A pseudoterminal, or PTY as I will refer to them from here on, is like a pipe, in that it allows one process to be connected to another. The difference is that, to the process on the slave end of the PTY, it appears to be connected to a terminal device, and therefore believes that it is interacting with a user. Whenever you open a graphical terminal window, or "terminal emulator", one of the first things that software does is start a PTY session. It then controls details of the PTY to reflect the size of the window and the type of terminal being emulated, as well as capturing special "control codes" and responding to them, so that the software running in the session behaves as we would expect it to on a proper terminal.
So how does this help us, and what is the problem?
The first real problem we encounter when dealing with programs attached to pipes is buffering. Many programs use the standard I/O library to read and write text to whatever they're connected to. If that is a terminal, the program's output is "line buffered" by default. This means the program holds on to its output until either the buffer is full or a newline is written to the buffer. This makes sense if you're interacting with a user and want every full line to be immediately visible to them. However, if standard output is connected to a pipe, it instead defaults to "fully buffered", and the output won't be sent until the buffer is full or the program explicitly flushes it. Remember that last bit, it'll be important later.
So, if we're using a pipe to redirect a program's output into exet, we could end up waiting forever for input that will never come: the buffer never fills, because the process is waiting for exet to make the first move, and exet has no way of knowing that. But we can solve this by making the program think it's talking to a user, by connecting it to a PTY. Cool. So we just use a PTY then?
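To see the buffering difference directly, here is a small C sketch, assuming a POSIX system with glibc or a similar libc; demo_pipe_buffering() and read_now() are hypothetical helpers for illustration. It writes one complete line into a pipe through stdio and checks what the reading end can see before and after an explicit fflush().

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Non-blocking read from rfd: returns bytes read, or 0 if the pipe is empty. */
static ssize_t read_now(int rfd, char *buf, size_t len) {
    ssize_t n = read(rfd, buf, len);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return n;
}

/* Writes one complete line through stdio into a pipe and records how many
   bytes the reading end could see before and after an explicit fflush(). */
int demo_pipe_buffering(ssize_t *before, ssize_t *after) {
    int fds[2];
    char buf[64];
    if (pipe(fds) < 0)
        return -1;
    /* Non-blocking, so reading an empty pipe returns instead of hanging. */
    fcntl(fds[0], F_SETFL, fcntl(fds[0], F_GETFL) | O_NONBLOCK);

    /* A pipe is not a terminal, so stdio gives this stream full buffering. */
    FILE *out = fdopen(fds[1], "w");
    if (out == NULL) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    fputs("hello\n", out);            /* the newline does not trigger a write */
    *before = read_now(fds[0], buf, sizeof buf);

    fflush(out);                      /* now the bytes actually reach the pipe */
    *after = read_now(fds[0], buf, sizeof buf);

    fclose(out);                      /* also closes fds[1] */
    close(fds[0]);
    return 0;
}
```

On glibc, before comes back as 0 and after as 6: the newline alone wasn't enough to push "hello\n" into the pipe. Had the stream been attached to a terminal, it would have been line buffered and the newline would have flushed it.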
If it were only that simple...
Let's start by looking at the process for setting up pipes, and compare that to the process of setting up a PTY. I will try to condense this as much as possible. It's a lot of information, and we don't really need the details of it.
To start a process and connect it via pipes, we first need to create a pair of pipes using the pipe() function, one for the new process' stdin and one for its stdout. Then we create a duplicate of the current process using fork(), and close the ends of the pipes we don't need in their respective processes (if we set the close-on-exec flag on the pipes, either with O_CLOEXEC through pipe2() or with FD_CLOEXEC through fcntl(), we can skip this for the child process). In the new child process, we duplicate the correct ends of the correct pipes onto STDIN_FILENO and STDOUT_FILENO using dup2(), then replace the child process' image with the image of the program we want using an exec call of some flavour (exet uses execvp()). Once we've done all that, we can read from and write to the pipe ends in the parent process to see and respond to what the child is doing.
If that sounded complicated, remember, that's the easy way. Now let's look at what it takes to set up a PTY.
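As a rough C sketch of the pipe steps (assuming a POSIX system; spawn_piped() and set_cloexec() are hypothetical names, not exet's actual code), using fcntl() to set close-on-exec so the child has nothing to close by hand:

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Mark an fd close-on-exec so exec closes it in the child automatically. */
static int set_cloexec(int fd) {
    int flags = fcntl(fd, F_GETFD);
    return flags < 0 ? -1 : fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

/* Start argv[0] (searched in PATH) with its stdin/stdout on pipes.
   On success returns the child's pid and stores the parent's ends in
   *to_child (feeds the child's stdin) and *from_child (reads its stdout).
   Returns -1 on failure. */
pid_t spawn_piped(char *const argv[], int *to_child, int *from_child) {
    int in[2], out[2];               /* in: parent -> child, out: child -> parent */
    if (pipe(in) < 0)
        return -1;
    if (pipe(out) < 0) {
        close(in[0]);
        close(in[1]);
        return -1;
    }
    for (int i = 0; i < 2; i++) {
        set_cloexec(in[i]);
        set_cloexec(out[i]);
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(in[0]); close(in[1]);
        close(out[0]); close(out[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: dup2() clears FD_CLOEXEC on the new descriptor, so fds 0 and 1
           survive exec while every original pipe end is closed by it. */
        if (dup2(in[0], STDIN_FILENO) < 0 || dup2(out[1], STDOUT_FILENO) < 0)
            _exit(127);
        execvp(argv[0], argv);
        _exit(127);                  /* only reached if exec failed */
    }

    /* Parent: close the ends that belong to the child. */
    close(in[0]);
    close(out[1]);
    *to_child = in[1];
    *from_child = out[0];
    return pid;
}
```

The caller writes to *to_child, reads from *from_child, and is still responsible for eventually waiting on the returned pid.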
There are actually a number of ways to open a PTY, some of which are much easier to work with than others. However, because portability is something I strive for, the one we want is the POSIX method, which also happens to be one of the ones that requires a bit more work.
The first thing we need to do is open the PTY master using posix_openpt(), which gives us the master's file descriptor. Next, we need to make sure that the slave PTY has the correct owner and permissions by calling grantpt() on the master. If that sounds stupid and pointless, it is, but it's also necessary, and it's a problem; we'll come back to that. Next, we need to unlock the slave PTY with unlockpt(), so that we can actually connect something to it. Once that's done, we can finally get the path to the slave PTY using ptsname(), open the slave like any other file, fork() a new child process, attach its standard I/O to the slave PTY using dup2(), just like we did with the pipes, set the PTY's terminal mode if needed, and exec into the command we want to run.
Okay, so that's all we need to do, right? We can just do that.
There is one thing I haven't talked about until now: what happens after a child process is running. Or, more specifically, what happens when a process ends.
On UNIX-like systems, when a process dies, it doesn't immediately die completely. Instead, the kernel records its exit status and the conditions that caused it to terminate, and keeps this information around in the form of a zombie. These mostly dead processes linger in the system's process table until their parent process waits on them using something like waitpid(). The parent process is notified of its undead children by the SIGCHLD signal, so by catching that signal and performing a wait, it can free the zombie processes, allowing the kernel to put their souls to rest while returning the details of how they died and what they had to say about it.
Without getting too far into the details of this system, and how absolutely awful it is to actually deal with, know that exet needs to be able to do this for its child processes. Capturing SIGCHLD and waiting on any dead children took a lot of effort to do correctly. It would be really inconvenient if something conflicted with that system.
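The POSIX sequence, sketched in C. Assumptions: a system providing posix_openpt() and the XSI PTY functions, and spawn_in_pty() is a hypothetical name. Unlike the order described above, this sketch opens the slave in the child after setsid(), a common variation that lets the slave become the child's controlling terminal on Linux (some systems also need an explicit ioctl(TIOCSCTTY) for that, which is left out here).

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Start argv[0] on the slave side of a new PTY. Returns the child's pid and
   stores the master fd in *master_out, or returns -1 on failure. */
pid_t spawn_in_pty(char *const argv[], int *master_out) {
    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master < 0)
        return -1;

    /* grantpt() is the call that may fork a helper on some systems. */
    if (grantpt(master) < 0 || unlockpt(master) < 0) {
        close(master);
        return -1;
    }

    /* ptsname() returns a static buffer, so copy the name right away. */
    const char *name = ptsname(master);
    char slave_name[256];
    if (name == NULL ||
        snprintf(slave_name, sizeof slave_name, "%s", name) >= (int)sizeof slave_name) {
        close(master);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(master);
        return -1;
    }
    if (pid == 0) {
        close(master);
        setsid();                              /* new session, no controlling tty */
        int slave = open(slave_name, O_RDWR);  /* becomes the ctty on Linux */
        if (slave < 0)
            _exit(127);
        dup2(slave, STDIN_FILENO);
        dup2(slave, STDOUT_FILENO);
        dup2(slave, STDERR_FILENO);
        if (slave > STDERR_FILENO)
            close(slave);
        /* Terminal modes (tcsetattr) or window size would be set here if needed. */
        execvp(argv[0], argv);
        _exit(127);
    }

    *master_out = master;
    return pid;
}
```

The parent then reads and writes the master fd. On Linux, once the slave side has been closed, reads from the master fail with EIO rather than returning 0, so a reader should treat EIO as end-of-file.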
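The usual shape of this, as a C sketch (not exet's actual code; on_sigchld(), install_sigchld_handler() and reap_children() are hypothetical names): the handler only sets a flag, and the main loop later calls waitpid() with WNOHANG until nothing is left, because several child deaths can be coalesced into a single SIGCHLD.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

static volatile sig_atomic_t child_exited = 0;

/* Only record that something happened; waitpid() runs outside the handler. */
static void on_sigchld(int sig) {
    (void)sig;
    child_exited = 1;
}

int install_sigchld_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;   /* no SIGCHLD for stopped children */
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Reap every zombie currently waiting, calling report(pid, status) for each.
   Returns the number of children reaped. */
int reap_children(void (*report)(pid_t pid, int status)) {
    int reaped = 0;
    child_exited = 0;        /* a SIGCHLD arriving during the loop sets it again */
    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0) {
            if (report)
                report(pid, status);
            reaped++;
        } else if (pid < 0 && errno == EINTR) {
            continue;
        } else {
            break;           /* 0: others still running; -1/ECHILD: none left */
        }
    }
    return reaped;
}
```

Note that waitpid(-1, ...) reaps any child at all. That is exactly why a library call that quietly forks and waits on its own helper process doesn't mix well with this pattern.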
Remember that grantpt() step?
The grantpt() and unlockpt() combination was designed to deal with a potential security flaw, specifically the case where an application reuses a PTY. Most contemporary systems don't have to worry about it, and some, like most BSDs, don't do anything in grantpt() other than check that it was passed a valid PTY master. However, on Linux, specifically under glibc, grantpt() can actually do something: depending on how the system is set up, it fork-execs an external command to set the owner and permissions of the slave PTY.
Wait. So it forks a new process?
Let's take a look at what the manual has to say:
The behavior of grantpt() is unspecified if a signal handler is installed to catch SIGCHLD signals.
Why wouldn't it conflict with SIGCHLD?!
Okay, so we can open a PTY, but not in the main exet process, at least not without a lot of extra work to resolve a conflict with a system that was already a massive pain to set up. Are we cooked?
Well, there is a way to get around it. We can fork a process using the pipe method first, then use that as a barrier between the main process and the process connected via the PTY. It's more overhead, but we knew what we were getting into. In for a penny, in for a pound.
So then that's it. We spawn a process connected to the main exet process via a pipe, reset all the signal handlers in the new process, create a PTY session, and spawn another process connected via PTY. That will fix all of our problems, right?
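For the "reset all the signal handlers" step in the middle process, something like this C sketch would do, assuming it runs right after fork() and before grantpt() is called; reset_signal_state() is a hypothetical name. Doing it by hand matters because a forked child inherits the parent's handlers, including exet's SIGCHLD handler. exec would reset caught signals to their defaults, but ignored signals and the blocked-signal mask survive exec, and grantpt() runs before any exec anyway.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>

/* NSIG is not part of POSIX; 65 covers Linux signal numbers (an assumption
   for other systems, most of which define NSIG themselves). */
#ifndef NSIG
#define NSIG 65
#endif

/* Put every catchable signal back to its default disposition and unblock
   everything, so no inherited SIGCHLD handler is installed when this
   process later calls grantpt(). */
int reset_signal_state(void) {
    struct sigaction dfl;
    memset(&dfl, 0, sizeof dfl);
    dfl.sa_handler = SIG_DFL;
    sigemptyset(&dfl.sa_mask);

    for (int sig = 1; sig < NSIG; sig++) {
        if (sig == SIGKILL || sig == SIGSTOP)
            continue;                 /* these cannot be caught or changed */
        sigaction(sig, &dfl, NULL);   /* EINVAL for reserved numbers is ignored */
    }

    sigset_t none;
    sigemptyset(&none);
    return sigprocmask(SIG_SETMASK, &none, NULL);
}
```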
Well, remember way back near the start, when I mentioned that a program can explicitly flush its output buffer? It's time to talk about the real problem. It's time to talk about-
The Problem With Prompts:
When an interactive program wants to read something from the user, it typically expresses this by presenting the user with a prompt. If you've ever used any kind of command line utility, you've almost certainly seen this. It prints something right before the cursor to let you know that it's waiting on your input, and usually tells you something about your environment.
Note that I said before the cursor. A prompt doesn't typically end in a newline.
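For reference, this is roughly what an interactive program does when it shows a prompt, as a small C sketch; ask() is a hypothetical helper, not code from any particular program. Note the explicit fflush(): without it, a fully buffered stdout would sit on the prompt while the program blocks waiting for input.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <string.h>

/* Print a prompt with no trailing newline, flush it by hand so it is visible
   even when the stream is not line buffered, then block reading the reply.
   Returns buf with the newline removed, or NULL on EOF. */
char *ask(FILE *in, FILE *out, const char *prompt, char *buf, size_t cap) {
    fputs(prompt, out);
    fflush(out);                       /* the prompt leaves without a newline */
    if (fgets(buf, (int)cap, in) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';    /* drop the newline the user typed */
    return buf;
}
```

Whatever reads the other end of out receives the prompt text with no newline after it, and then nothing more until it answers.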
Do you see the problem yet?
Currently, exet only accepts full lines. Partial lines are ignored until a newline is seen. So how do we detect a prompt?
The answer is, there is no real answer. Any solution we use will have some kind of flaws that we'll need to work around. So, with that in mind, what is the best option?
Well, in a post a few days ago, I mentioned rlwrap. It uses a PTY to wrap utilities that don't natively support things like command history and line editing in GNU readline. Looking at how it detects prompts, it uses a timeout (and some other magic that we don't need to worry about). This isn't perfect, but it seems to work well enough.
A naive way to apply this in exet would be to handle the timeout in the process controlling the terminal. That way, it could simply append a newline to the current buffer once the timer expires and send it on as though it were a complete line. Realistically, however, we want our exet programs to be aware of whether a line is complete, incomplete, or a continuation of an incomplete line. So, with this in mind, we can implement the timeout in the main exet process, then make the details of how "complete" the line was available to our exet programs through special channels.
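One way the main-process side could look, as a C sketch under stated assumptions: struct line_reader, next_line() and the LINE_* values are hypothetical and not exet's design, and the timeout is measured from the last byte received, using poll(). A line ending in a newline comes back as LINE_COMPLETE; if output stalls mid-line for timeout_ms, the buffered text comes back as LINE_PARTIAL (the likely-prompt case), and *continues marks text that picks up where an earlier partial line left off.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <unistd.h>

enum line_kind { LINE_COMPLETE, LINE_PARTIAL, LINE_EOF, LINE_ERROR };

/* Hypothetical reader state: buffered bytes not yet handed out, plus whether
   the last thing handed out was a partial line. Zero-initialise, then set fd. */
struct line_reader {
    int fd;
    char buf[4096];
    size_t len;
    int after_partial;
};

/* Copy the first n buffered bytes into out (NUL-terminated, truncated if out
   is too small) and shift the remaining bytes to the front of the buffer. */
static void take(struct line_reader *r, size_t n, char *out, size_t cap) {
    size_t copy = n < cap - 1 ? n : cap - 1;
    memcpy(out, r->buf, copy);
    out[copy] = '\0';
    memmove(r->buf, r->buf + n, r->len - n);
    r->len -= n;
}

/* Hand out the next line, or a partial one after timeout_ms of silence. */
enum line_kind next_line(struct line_reader *r, int timeout_ms,
                         char *out, size_t cap, int *continues) {
    for (;;) {
        char *nl = memchr(r->buf, '\n', r->len);
        if (nl != NULL) {
            *continues = r->after_partial;
            r->after_partial = 0;
            take(r, (size_t)(nl - r->buf) + 1, out, cap);
            return LINE_COMPLETE;
        }
        if (r->len == sizeof r->buf) {          /* full with no newline */
            *continues = r->after_partial;
            r->after_partial = 1;
            take(r, r->len, out, cap);
            return LINE_PARTIAL;
        }

        /* Wait forever for the first byte, but only timeout_ms for more. */
        struct pollfd p = { .fd = r->fd, .events = POLLIN };
        int ready = poll(&p, 1, r->len > 0 ? timeout_ms : -1);
        if (ready < 0) {
            if (errno == EINTR)
                continue;
            return LINE_ERROR;
        }
        if (ready == 0) {                       /* stalled mid-line: likely a prompt */
            *continues = r->after_partial;
            r->after_partial = 1;
            take(r, r->len, out, cap);
            return LINE_PARTIAL;
        }

        ssize_t n = read(r->fd, r->buf + r->len, sizeof r->buf - r->len);
        if (n > 0) {
            r->len += (size_t)n;
        } else if (n == 0 || errno == EIO) {    /* EOF, or a hung-up PTY on Linux */
            if (r->len == 0)
                return LINE_EOF;
            *continues = r->after_partial;
            r->after_partial = 1;
            take(r, r->len, out, cap);
            return LINE_PARTIAL;
        } else if (errno != EINTR) {
            return LINE_ERROR;
        }
    }
}
```

A short timeout reacts quickly to prompts but risks splitting lines from a slow program; a long one does the opposite. That is the trade-off any timeout-based detector has to live with.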
Alright then. We finally have our answer.
We have to fork a piped process, which will create a PTY, which will be connected to another process, which will execute our command, which we can then communicate with through the piped process, through the PTY, and if the main process times out waiting for a response from the command, then we submit whatever's currently in the buffer to our exet program, even if there's no newline, while also letting it know how complete the information was.
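The piped middle process in that chain mostly just shuttles bytes. A minimal C sketch of that loop, assuming the PTY has already been created and the command started on its slave side; relay() and write_all() are hypothetical names. For simplicity, when exet closes its end, this version just stops watching it instead of forwarding an end-of-file to the command.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Write all of buf to fd, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Copy exet -> in_fd -> master and master -> out_fd -> exet. Returns 0 when
   the PTY side hangs up (EOF, or EIO as Linux reports it), -1 on error. */
int relay(int in_fd, int out_fd, int master) {
    char buf[4096];
    struct pollfd fds[2] = {
        { .fd = master, .events = POLLIN },
        { .fd = in_fd,  .events = POLLIN },
    };
    for (;;) {
        if (poll(fds, 2, -1) < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (fds[0].revents & (POLLIN | POLLHUP | POLLERR)) {
            ssize_t n = read(master, buf, sizeof buf);
            if (n > 0) {
                if (write_all(out_fd, buf, (size_t)n) < 0)
                    return -1;
            } else if (n == 0 || errno == EIO) {
                return 0;                      /* the command side is gone */
            } else if (errno != EINTR) {
                return -1;
            }
        }
        if (fds[1].revents & (POLLIN | POLLHUP | POLLERR)) {
            ssize_t n = read(in_fd, buf, sizeof buf);
            if (n > 0) {
                if (write_all(master, buf, (size_t)n) < 0)
                    return -1;
            } else if (n == 0) {
                fds[1].fd = -1;                /* poll() skips negative fds */
            } else if (errno != EINTR) {
                return -1;
            }
        }
    }
}
```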
It's done. We know how to fix it.
So that's it. It's "solved", and the only thing left is to make it happen. It'll be a while, but I will be adding this functionality. The language will never really be complete without it.
I still need to decide on some of the exact details of the implementation, but that's a problem for when I actually go to build it. Until then, I can think about it and toy with ways to improve this incredibly flawed system. The alpha release comes first anyway.
Sorry for the somewhat anticlimactic ending. The post is long enough as it is, and I didn't want to keep dragging it out. If you actually managed to get through this, thank you for taking the time. I hope it was entertaining and informative to see what happens when nothing goes the way you want, and what it takes to overcome these kinds of issues. Feel free to stick around, ask questions, what have you. I'm always happy to share knowledge and help with whatever I can.
I would have done some illustrations, but I didn't have the mental or physical energy. Maybe next time, if anyone actually reads this.