A very active BaseX user took the time to write a review about BaseX 6.5 and related changes. Thanks to Jan Vlčinský!
Storage: Text Compression, Backups, Import
XQuery 3.0: Group By, Try/Catch, Switch, Serialization
Built-In Functions: EXPath, HTTP, DB, Fulltext
GUI: Views, General Features and Hints
" The goal of this article is to find out what the new version 6.5 of BaseX brings and to make sure that we can safely use it in our project. In addition, I want to learn some basic GUI skills.
The new website of the project looks very nice. Completely redesigned, it offers much more information in a clearer way. Also worth mentioning is the short video tutorial that shows the first steps with the GUI mode of BaseX. Although the old documentation was already usable, the new Wiki seems much more mature.
In contrast to BaseX 6.4, the new version 6.5 comes as a bundled download containing the BaseX GUI and BaseX Server, as well as the available APIs, the REST server, sample documents and start scripts for popular platforms (exe, dmg, zip). On Windows, an installer guides you through the installation process. A simple JAR file is also included and serves as the entry point for the individual scripts.
Text Compression reduces the size of text nodes by up to 50%. Although I haven't thoroughly tested it, my experience shows that disk space requirements were already very reasonable in previous versions. In my case, a database consumes approximately the same amount of disk space as the imported XML files. In some cases the disk requirements double, depending on indexes and overall document characteristics.
Enhanced Collection Support means that documents can be stored in folders and subfolders. It was already working well when I tested it with version 6.4. For example, the following query returns all documents that folder1 contains:
collection("hierarchicCollection/folder1")
Backups In the GUI, this feature can be found under Database/Manage. It's also possible to call the Backup command. A zip archive that contains the database is created without prompting and stored in the location of the database folder. Restoring works much the same way: the most recent database backup (by timestamp) is restored, again without any prompting. At first this seemed surprising, but I like the simplicity and straightforward manner of the approach. In the end it seems very usable.
Import Of CSV, HTML And Text enables the user to create new databases from CSV and Text data out of the box. Select the required input format in the 'Parsing' tab of the 'Create Database' dialog. The 'SET PARSER' command can be used to import such data from the command line. At the moment, tab, comma, and semicolon work as field separators. Column headers are not required; if a CSV file contains headers, they are stored in the first record. Plain text is imported so that each individual line is stored as the text node child of a 'line' element. All in all, the import function is very useful, as it creates an internal XML representation and enables the user to explore and search the data using XQuery or one of the interactive visualizations.
Below are a few examples for imported CSV and Text data.
<csv>
<record row="0">
<field col="0">date</field>
<field col="1">num</field>
</record>
<record row="1">
<field col="0">2009-04-22</field>
<field col="1">3</field>
</record>
</csv>
<text>
<line>Dear all,</line>
<line/>
<line>We are excited to announce the relaunch of our homepage, which now</line>
<line>includes commercial offerings such as professional support, individual</line>
<line>software solutions, and training courses:</line>
<line/>
<line> http://basex.org/</line>
<line/>
</text>
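To make the mapping concrete, here is a minimal Python sketch that produces the same element layout as the two samples above. It only mimics the result and says nothing about how BaseX implements the import; the function names csv_to_xml and text_to_xml are made up for this illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(text, delimiter=","):
    """Map CSV rows to <csv>/<record row=..>/<field col=..> elements (hypothetical helper)."""
    root = ET.Element("csv")
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for row_no, row in enumerate(reader):
        record = ET.SubElement(root, "record", row=str(row_no))
        for col_no, value in enumerate(row):
            field = ET.SubElement(record, "field", col=str(col_no))
            field.text = value
    return root


def text_to_xml(text):
    """Wrap every input line in a <line> element; empty lines become empty elements."""
    root = ET.Element("text")
    for raw in text.splitlines():
        line = ET.SubElement(root, "line")
        line.text = raw or None
    return root


# A header row is not treated specially: it simply ends up in record 0.
csv_doc = csv_to_xml("date,num\n2009-04-22,3\n")
text_doc = text_to_xml("Dear all,\n\nWe are excited\n")
print(ET.tostring(csv_doc, encoding="unicode"))
print(ET.tostring(text_doc, encoding="unicode"))
```

Passing delimiter=";" or delimiter="\t" covers the other two separators mentioned above.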
XQuery 3.0 (formerly XQuery 1.1) is still a working draft and not yet fully implemented in BaseX. For a list of currently supported functionality, visit the documentation.
Group By The 'Group By' clause extends the FLWOR expression and has been a long-awaited feature among real-life users. I tested its functionality without stumbling over any problems. I recommend reading the specification of 'Group By' first, as there are some peculiarities that seem odd at first.
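One of those peculiarities, as I understand the specification, is that after grouping a variable that is not a grouping key no longer holds a single value but all values of its group. The following Python sketch is only an analogy to that behaviour, not XQuery; the sample rows are invented for the illustration.

```python
from collections import defaultdict

# Hypothetical sample data: (date, num) pairs.
rows = [("2009-04-22", 3), ("2009-04-22", 5), ("2009-04-23", 1)]

# Group by the key (date). The non-key value (num) becomes a list per group,
# similar to how a non-grouping variable refers to a whole sequence after 'group by'.
groups = defaultdict(list)
for date, num in rows:
    groups[date].append(num)

# Aggregating per group then works on the collected values.
totals = {date: sum(nums) for date, nums in groups.items()}
print(totals)
```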
Try/Catch The try/catch block comes in handy in functions and for type-related operations. For example:
let $txtval := "123.15"
let $intval := try {
xs:integer($txtval)
} catch * {0}
return $intval
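The query above returns 0, because "123.15" is not a valid lexical form for xs:integer and the cast error is caught. For readers more at home in Python, a rough equivalent of the same fallback pattern might look like this (to_int is a made-up helper name):

```python
def to_int(text, default=0):
    """Return text parsed as an integer, or default if it is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return default


print(to_int("123.15"))  # not an integer literal, so the default is returned
print(to_int("123"))
```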
Switch A switch construct replaces bulky if-else chains in an elegant way. For example:
let $animal := "Fish"
return
switch ($animal)
case "Cow" return "Moo"
case "Cat" return "Meow"
case "Duck" return "Quack"
default return "What's that odd noise?"
Serialization Serializing XML to text works very well. In the past I ran into the problem that lines started with an extra space. This issue seems to be resolved. In the following example I create a CSV file using semicolons as the field delimiter:
(: Using the XML created by importing a CSV file :)
declare option output:method "text";
for $line in //record
let $date := $line/field[@col = 0]
let $num := $line/field[@col = 1]
return text {concat($date, ";", $num, codepoints-to-string(10))}
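For comparison, the same transformation can be sketched outside of BaseX with Python's standard library. This is only a stand-in for the query above, assuming input in the record/field layout shown for imported CSV data; the sample document is inlined here to keep the snippet self-contained.

```python
import xml.etree.ElementTree as ET

# Sample input in the record/field layout produced by the CSV import.
xml = """<csv>
  <record row="0"><field col="0">date</field><field col="1">num</field></record>
  <record row="1"><field col="0">2009-04-22</field><field col="1">3</field></record>
</csv>"""

root = ET.fromstring(xml)
lines = []
for record in root.iter("record"):
    # Pick the two columns by their 'col' attribute, as the query does.
    date = record.findtext("field[@col='0']")
    num = record.findtext("field[@col='1']")
    lines.append(f"{date};{num}\n")

output = "".join(lines)
print(output, end="")
```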
New And Modified Functions XQuery 3.0 offers some really handy functions. Visit the BaseX wiki and the specification for a complete overview. Below are some functions that I find very useful:
get first, get all but first
# fn:head()
# fn:tail()
element id generation
# fn:generate-id()
access environment
# fn:environment-variable()
# fn:available-environment-variables()
quick "dir" for documents
# fn:uri-collection()
formatting values into readable text
# fn:format-integer()
# fn:format-number()
# fn:format-dateTime()
# fn:format-date()
# fn:format-time()
File The EXPath File module has been implemented. It offers functions to read, write and access the file system. A complete list of the functions, each with a short explanation, can be found in the wiki. All functionality is subject to change as the specification is still a W3C Candidate Module.
HTTP Another novelty is the HTTP client module, which contains an XQuery function to send HTTP requests and handle responses. As this function is quite young, I have yet to be fully convinced by this feature.
DB There's a new module that allows you to perform general database operations. An overview is listed here. I still miss a way to delete single documents one by one. For example, a document is stored multiple times if I insert it repeatedly, but the only way to delete it is to delete it together with all documents that have the same name.
Fulltext Although I don't use the full text functionality myself, there are some new functions worth mentioning. The W3C Full Text recommendation is extended by functions to mark results, extract relevant full text sections and explicitly access indexes and score values. An additional feature is the built-in Lucene and Snowball stemming for XQuery Full Text. Again, consult the documentation for details.
The GUI was already excellent in older versions and has become even better. Each view is briefly presented below, although all of them were already part of previous versions.
XQuery Editor For me, this is the best environment for writing XQuery code that I have found so far. I also own XMLSpy, but BaseX is far better. While I type, it tells me the exact location of any error in my query.
Text View The text view ranks second among my most used visualizations. Database contents and XQuery results are displayed here. When I click on the home button and a database is opened, the view shows the original input XML data. There's also a quick search option. I would like the Folder View to display query results as well. I am aware that this sounds easy but is actually a complicated task.
Map View I call the Map View 'a tree with rectangular branches'. It looks impressive, but I don't use it, although it might be useful for closer inspection of the data, as the number of displayed nodes is reduced.
Tree View This is a real tree visualization with the root being placed at the top.
Folder View I like this one a lot. People who like the XML representation of Microsoft Internet Explorer with its folding option will also find this very useful.
Table View This view is new to me and looks very promising, but I still have to learn to use it to my advantage. I would like to control the table columns in some way.
Plot View This is a great tool if you want to gain some quick insight into your data. It requires some thinking at first, but then it is a very efficient tool for visualization.
Explorer View The Explorer View allows for some simple filtering without the use of XQuery.
General features and hints
Context Menu There lies great value in using your right mouse button. Call 'Copy Path' from the context menu on any node to get an XPath location path for the selected node - which is really handy for writing queries. This feature is available in most views.
Navigate The Document Tree Navigate into subtrees by clicking on nodes. Navigate back up via the 'Back' button in the menu bar, using backspace or the context menu. To reset all views press the 'Home' button.
I still miss an option to select and delete a particular document from a collection. So far I have to use XQuery and enter the document name manually. The GUI is already very useful, but some more instructions would greatly help to unlock its full potential.
Conclusion The GUI is the best place to explore heaps of XML data (and more: CSV, Text, ...). It features short response times and is well designed. There are a lot of options for evaluating XML data and its atoms as well. Yet the documentation is rather short, so there are always new gimmicks to be found.
I personally use the Python binding. The API seems to have evolved since the last version: binding values to variables now allows parameterized queries. The wiki shows some examples with PHP. Most APIs offer about the same functionality, I guess, which includes creating a session, creating queries, binding a variable to a query, executing a query and, if required, fetching results in an iterative manner.
I haven't yet tested the Python binding with the current release, but I expect it to be as usable as with older releases. Note that Python uses one global socket, which might lead to problems with concurrent sessions.
EXE-Installer I provide a quick overview on the installation process using Windows and the new EXE Installer.
File Associations Be aware that the installer associates all XML and .xq files with BaseX by default. I played around with the associations and they seem very promising and handy. If I open an XML file, BaseX creates a database with the name of the input file. If such a database already exists, it is silently overwritten. My recommendation would be to accept the file associations, as this is a very comfortable way to explore XML files. It's also much faster than using Internet Explorer.
To protect other databases I just use some naming convention - two underscores for databases that I want to keep safe are sufficient, as common XML files do not use underscores. This way, nothing is overwritten accidentally. If I want an imported XML document to become my long term database, I simply rename it via Database-Manage-Rename.
When it comes to .xq files (XQuery), BaseX is simply the best editor I have found so far. These files are opened in a new BaseX window where one can edit and save them. If you want to test the query, just open the corresponding database. By default, BaseX is only associated with .xq files - I recommend adding .xquery files manually.
The release of BaseX 6.5 and all related changes, including the website and commercial offerings, show that the authors take their work very seriously.
One finds great support on the mailing list! It's a bit slower than a Google search (although not much), but provides real, customized answers to your problems.
I think BaseX features a great GUI to explore XML and develop XQuery, plus a server that is capable of handling lots of data while consuming a reasonable amount of resources. In addition, the API allows using BaseX from many different programming languages. "