Java NIO: Getting Started (Part II)
Stream-based IO model of Java
Before we take up NIO, it's important that we refresh our understanding of the stream-based IO model. It forms the core of java.io, the fundamental package which "provides for system input and output through data streams and serialization."
Let us analyse the model with a focus on data streams, as this is where the need for an improved treatment of IO in Java was felt.
A data stream is a sequence of data units that can be accessed in order. The unit of data is fundamentally either a byte or a character; character streams convert between bytes and Java's 16-bit Unicode characters using a charset such as UTF-8. We can either read from a stream or write to it. Any application that needs to read or write data ultimately works at the level of data streams. A typical program reads data from a "SOURCE" in the form of an "INPUT" stream and delivers it to the "DESTINATION" as an "OUTPUT" stream. This approach generally translates into a loop that reads the input stream and writes to the output stream byte by byte or character by character. The following diagram illustrates this:
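The read/write loop described above can be sketched in Java as follows. This is a minimal illustration rather than code taken from java.io itself: the class `StreamCopy` and its `copy` helper are hypothetical names, and in-memory `ByteArrayInputStream`/`ByteArrayOutputStream` objects stand in for a real SOURCE and DESTINATION such as a file or a socket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class StreamCopy {

    // Hypothetical helper: reads the source one byte at a time and writes
    // each byte to the destination. Returns the number of bytes copied.
    public static long copy(InputStream source, OutputStream destination) throws IOException {
        long count = 0;
        int b;
        // read() returns the next byte as an int in 0..255, or -1 at end of stream
        while ((b = source.read()) != -1) {
            destination.write(b);
            count++;
        }
        destination.flush();
        return count;
    }

    public static void main(String[] args) throws IOException {
        // In-memory streams stand in for a real SOURCE (file, socket) and DESTINATION
        InputStream in = new ByteArrayInputStream("Hello, streams!".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copy(in, out);
        String text = new String(out.toByteArray(), StandardCharsets.UTF_8);
        System.out.println(copied + " bytes: " + text); // prints "15 bytes: Hello, streams!"
    }
}
```

Note that `read()` returns each byte as an `int` from 0 to 255 and reserves -1 for end of stream, which is why the loop tests for -1 rather than for any particular byte value.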
Implications of this "stream-oriented" approach:
1.) We cannot move back and forth. A data stream is a continuous flow of data. Unlike an array, it is not indexed, and its data units (bytes or characters) are not cached anywhere, so we cannot traverse the stream at will. If, while writing to an output stream, you suddenly need to rewrite a portion of it, it becomes virtually impossible. Similarly, if while reading an input stream you wish to reread an earlier portion, you are stuck. The only way around this is to cache the data in a buffer and then move back and forth within that buffer.
2.) The thread remains blocked. The streams in java.io are "blocking": a thread performing a read or a write operation remains blocked until there is some data to read, or until it has finished writing. Moreover, if two threads read from the same input stream, one calling read() after the other, the second thread can only continue from where the first one left off.
3.) Impedance mismatch. As discussed in the previous blog post, "impedance mismatch" stems from the byte-by-byte or character-by-character treatment of data streams by the JVM. Put succinctly, the OS moves data in large chunks, and java.io must break these chunks down before they can be used. As a result, there is a large mismatch between the speed at which the OS can deliver data to java.io and the speed at which java.io actually processes it. To quote Ron Hitchens in Java NIO, "The operating system wants to deliver data by the truckload. The java.io classes want to process data by the shovelful." This aptly sums up how much of a performance spoiler java.io can sometimes be.
4.) Every connection has its own thread. In the stream-based model, each connection (i.e., each input source or output destination) requires a separate thread. This is a direct consequence of the blocking nature of java.io streams: to deal with multiple connections at a time, you need multiple threads, one for each. This becomes cumbersome when the connections number in the thousands and the amount of data on each is small.
5.) Overhead generated by each IO request. This is one of the factors behind the impedance mismatch described above. When a program loops over the data byte by byte (or character by character) without buffering, each read or write request can translate into an expensive operation such as a system call, disk access, or network activity. Again, to minimize these overheads we need to buffer the data first.
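Point 1 above notes that the only way to move back within a stream is to buffer it. In java.io this is what `mark()` and `reset()` on `BufferedInputStream` offer: the wrapper keeps the bytes read since the mark in its own buffer, so it can rewind even when the underlying stream cannot (a `FileInputStream`, for example, reports `markSupported()` as false). The sketch below assumes ASCII input for simplicity; `RereadWithBuffer`, `readTwice` and `readBytes` are hypothetical names.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class RereadWithBuffer {

    // Hypothetical helper: reads up to n bytes into a String (assumes ASCII input).
    static String readBytes(InputStream in, int n) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            int b = in.read();
            if (b == -1) {
                break; // end of stream reached early
            }
            sb.append((char) b);
        }
        return sb.toString();
    }

    // Reads the first n bytes, rewinds, and reads the same n bytes again.
    // This works only because BufferedInputStream keeps the marked bytes in its buffer.
    public static String[] readTwice(InputStream raw, int n) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(n);                       // remember this position; stay valid for n bytes
        String first = readBytes(in, n);
        in.reset();                       // jump back to the marked position
        String second = readBytes(in, n);
        return new String[] {first, second};
    }

    public static void main(String[] args) throws IOException {
        InputStream raw = new ByteArrayInputStream("ABCDEFGH".getBytes(StandardCharsets.US_ASCII));
        String[] result = readTwice(raw, 4);
        System.out.println(result[0] + " / " + result[1]); // prints "ABCD / ABCD"
    }
}
```

The argument to `mark()` is a read limit: once more than that many bytes have been read past the mark, the mark may become invalid and `reset()` can fail with an `IOException`.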
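Points 2 and 4 show up together in a classic blocking echo server. The sketch below, a hypothetical `ThreadPerConnectionEcho` class listening on a free local port, hands every accepted `Socket` to a new thread, because `accept()` and `readLine()` block the calling thread until a client connects or sends data. It illustrates the pattern only; it has no thread pool, connection limit, or timeouts.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class ThreadPerConnectionEcho {

    // Serves one client: echoes each line back until the client closes the connection.
    // readLine() blocks this thread whenever the client has not sent anything yet.
    static void serve(Socket client) {
        try (Socket s = client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println("echo: " + line);
            }
        } catch (IOException e) {
            // connection dropped; this connection's thread simply ends
        }
    }

    // Starts a server on a free port; every accepted connection gets its own thread.
    public static ServerSocket start() throws IOException {
        ServerSocket server = new ServerSocket(0);
        Thread acceptor = new Thread(() -> {
            while (!server.isClosed()) {
                try {
                    Socket client = server.accept();           // blocks until a client connects
                    Thread worker = new Thread(() -> serve(client));
                    worker.setDaemon(true);
                    worker.start();                            // one thread per connection
                } catch (IOException e) {
                    return;                                    // server socket was closed
                }
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
        return server;
    }

    // Minimal client for the demo: sends one line and returns the server's reply.
    public static String ask(int port, String message) throws IOException {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(message);
            return in.readLine(); // blocks until the server answers
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = start()) {
            int port = server.getLocalPort();
            System.out.println(ask(port, "first"));  // prints "echo: first"
            System.out.println(ask(port, "second")); // prints "echo: second"
        }
    }
}
```

With thousands of mostly idle clients, this design needs thousands of threads, each parked in a blocking read, which is exactly the cost point 4 describes.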
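Point 5 can be made concrete by counting how many requests actually reach the underlying stream. In this sketch a hypothetical `CountingInputStream` wrapper counts the read calls made on it, and an in-memory `ByteArrayInputStream` stands in for a real file or socket, where each such call would typically be a system call. Reading byte by byte straight from the source issues one request per byte, while a `BufferedInputStream` with an 8 KB buffer refills itself with one bulk read per chunk.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferingOverhead {

    // Hypothetical helper: counts how many read requests reach the wrapped stream.
    // Each call stands in for a costly request (system call, disk or network access).
    static class CountingInputStream extends FilterInputStream {
        int calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Consumes the whole stream one byte at a time, as a typical java.io loop does.
    static void drain(InputStream in) throws IOException {
        while (in.read() != -1) {
            // discard the byte
        }
    }

    public static int unbufferedCalls(byte[] data) throws IOException {
        CountingInputStream source = new CountingInputStream(new ByteArrayInputStream(data));
        drain(source); // every single-byte read goes straight to the source
        return source.calls;
    }

    public static int bufferedCalls(byte[] data) throws IOException {
        CountingInputStream source = new CountingInputStream(new ByteArrayInputStream(data));
        drain(new BufferedInputStream(source, 8192)); // refills up to 8 KB per source call
        return source.calls;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100_000];
        System.out.println("unbuffered: " + unbufferedCalls(data) + " calls");
        System.out.println("buffered:   " + bufferedCalls(data) + " calls");
    }
}
```

The exact buffered count depends on the buffer size and the JDK's `BufferedInputStream` implementation, but it stays close to the data size divided by the buffer size, whereas the unbuffered count is one per byte plus the final end-of-stream read.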
With the aim of rectifying these inefficiencies of the stream-oriented approach, an improved model was developed and packaged as Java NIO!
References:
https://blogs.oracle.com/slc/entry/javanio_vs_javaio
http://docs.oracle.com/javase/tutorial/essential/io/bytestreams.html
http://tutorials.jenkov.com/java-nio/nio-vs-io.html