We snuck the 0.2.4 release out in February without saying too much about it, so I think it's time for a long-overdue discussion of some release highlights. Overall, 0.2.4 represents a gain in momentum towards the ambitious 0.3.0, which should include awesome new additions such as ddfs and discodex.
The new scheduler and scheduler framework provide fair scheduling and FIFO scheduling options. FIFO scheduling works fine in small, restricted environments, but fair scheduling is a must when jobs are competing for resources. This is the single biggest feature provided in 0.2.4 and a great reason to upgrade.
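To make the difference between the two policies concrete, here is a toy sketch in plain Python. It is not Disco's scheduler, and `fifo_order` and `fair_order` are hypothetical names made up for illustration. It assumes "fair" means a simple round-robin across jobs that still have pending tasks, which is only one possible reading of fairness.

```python
from collections import deque

def fifo_order(jobs):
    """Run every task of the oldest job before touching the next job."""
    order = []
    for name, ntasks in jobs:
        for task in range(ntasks):
            order.append((name, task))
    return order

def fair_order(jobs):
    """Round-robin: take one task from each job that still has work left."""
    queues = deque((name, deque(range(ntasks))) for name, ntasks in jobs)
    order = []
    while queues:
        name, tasks = queues.popleft()
        order.append((name, tasks.popleft()))
        if tasks:
            # The job still has pending tasks, so it rejoins the back of the line.
            queues.append((name, tasks))
    return order

# Hypothetical workload: a big job submitted just before a small one.
jobs = [("big", 4), ("small", 2)]
fifo = fifo_order(jobs)
fair = fair_order(jobs)
```

With FIFO the small job's first task waits behind all four tasks of the big job, while under the round-robin policy it runs second. That gap is what hurts once several jobs compete for the same workers.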
Input and output streams were created to make it easier (or possible at all) to do things like read and write compressed inputs, and they make disco's input mechanism way more flexible. An input stream has a similar signature to a map reader (which will eventually be deprecated in favor of input streams), except that input streams take an additional params object. Input/output streams can also be chained together by providing a list of them.
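Chaining is easier to see in code than in prose. The sketch below is plain Python and does not import disco. It assumes each input stream takes `(stream, size, url, params)` and returns a `(stream, size, url)` triple for the next one in the list. The helper `open_chain`, the two stream functions, the `raw://example` URL, and the dict used for `params` are all hypothetical stand-ins.

```python
import gzip
import io

def gzip_stream(stream, size, url, params):
    # Wrap the raw byte stream in a decompressing file object.
    # The decompressed size is unknown up front, so pass None along.
    return gzip.GzipFile(fileobj=stream), None, url

def line_stream(stream, size, url, params):
    # Turn the byte stream into an iterator of decoded, newline-stripped lines.
    encoding = params.get("encoding", "utf-8")
    lines = (line.decode(encoding).rstrip("\n") for line in stream)
    return lines, size, url

def open_chain(streams, stream, size, url, params):
    # Feed the output of each stream function into the next one in the list.
    for fn in streams:
        stream, size, url = fn(stream, size, url, params)
    return stream

# Hypothetical compressed input with two records.
raw = gzip.compress(b"first record\nsecond record\n")
records = list(open_chain([gzip_stream, line_stream],
                          io.BytesIO(raw), len(raw),
                          "raw://example", {"encoding": "utf-8"}))
```

Each function only knows about its own layer, so decompression and record parsing can be mixed and matched just by changing the list.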
The default input stream function uses the scheme of the input URL to delegate to a more appropriate input stream. For instance, in discodex we build indices stored as distributed index chunks, which can later be queried using a normal disco job. An output stream writes the resulting URLs as `discodb://...`, and query jobs that provide the list of these URLs as inputs automatically use the discodb input stream to open the discodbs and get the appropriate iterator for the map function.
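The delegation idea can be sketched as a lookup table keyed by URL scheme. This is an illustration only, not the actual Disco implementation: the `HANDLERS` table, the handler functions, the tagged tuples they return, and the `node01` URL are all made up, and the `(stream, size, url, params)` signature is the same assumption as before.

```python
from urllib.parse import urlparse

def open_file(stream, size, url, params):
    # Stand-in for a handler that would open a local file.
    return ("file-iterator", url), size, url

def open_discodb(stream, size, url, params):
    # Stand-in for a handler that would open a discodb and return its iterator.
    return ("discodb-iterator", url), size, url

# Hypothetical registry mapping URL schemes to input streams.
HANDLERS = {"file": open_file, "discodb": open_discodb}

def default_input_stream(stream, size, url, params):
    # Pick the handler from the URL scheme and delegate to it.
    scheme = urlparse(url).scheme
    try:
        handler = HANDLERS[scheme]
    except KeyError:
        raise ValueError("no input stream registered for scheme %r" % scheme)
    return handler(stream, size, url, params)

opened, _, _ = default_input_stream(None, None, "discodb://node01/index/chunk-0", {})
```

The job itself never has to say which reader to use; handing it a list of `discodb://` URLs is enough to route each input to the right handler.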
Input streams can also be used to do things like read from a database, or pretty much anything else you can think of that has to do with reading the input and providing appropriate objects for map.
The `disco test ...` command uses unit tests that were rewritten to be safer and more controlled. The idea here is to eventually provide a test suite that makes step-by-step debugging of an installation easier, even filing reports to the issues list automatically when possible.
events format and exception handling:
Disco 0.2.4 uses a new format for sending events to the disco master, providing the groundwork for the new events API that will be part of 0.3.0. Related to this are changes in the way exceptions are handled in the disco worker, so that errors and their causes are always bubbled up to the master. Corner cases where certain types of exceptions were being masked were eliminated by these changes.
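As a rough illustration of the "never mask an exception" idea, here is a sketch of a worker-side wrapper in plain Python. Everything in it is hypothetical: the JSON layout is not Disco's actual event format, and `run_task` and `send_event` are invented names. The only point is that a failure is reported along with its cause and then re-raised rather than swallowed.

```python
import json
import traceback

def run_task(task, send_event):
    """Run task(); report any failure to the master before re-raising it."""
    try:
        result = task()
    except Exception as exc:
        # Report both the error and where it came from, then let it propagate.
        send_event(json.dumps({
            "kind": "error",
            "error": repr(exc),
            "traceback": traceback.format_exc(),
        }))
        raise
    send_event(json.dumps({"kind": "done"}))
    return result

# A list stands in for the channel to the master.
events = []

def failing_map():
    raise KeyError("missing field")

try:
    run_task(failing_map, events.append)
except KeyError:
    pass

reported = json.loads(events[0])
```

Catching `Exception` at a single choke point, instead of handling specific exception types here and there, is one way to avoid the kind of corner cases where an error silently disappears.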
These are the highest-profile changes from 0.2.4, but under the hood there was a lot more going on. As Disco (and the contributing community) grows, it's important that we streamline the processes for developing Disco. That's why 0.2.4 includes many readability and style-converging improvements as well as functional changes. We've started using the GitHub issues list more as a roadmap for Disco, not just for reporting bugs. The issues list is a great way to prioritize and tag TODO items while improving the visibility of Disco development. We are constantly looking for ways to refine the development process (all suggestions are welcome :). Congratulations and thanks to everyone who helped make this release possible!