Apache Gora data modelling and data bean to Oracle NoSQL mapping
Each class that must be persisted in Apache Gora should be defined as a data bean. Apache Gora uses Apache Avro to define its data beans. According to the Avro terminology such a definition is called an Avro Schema and is declared using JSON. An example of such an Avro Schema is the following (borrowed from the Apache Gora examples).
{
  "type": "record",
  "name": "WebPage",
  "namespace": "org.apache.gora.examples.generated",
  "fields" : [
    {"name": "url", "type": "string"},
    {"name": "content", "type": ["null","bytes"]},
    {"name": "parsedContent", "type": {"type":"array", "items": "string"}},
    {"name": "outlinks", "type": {"type":"map", "values":"string"}},
    {"name": "metadata", "type": {
      "name": "Metadata",
      "type": "record",
      "namespace": "org.apache.gora.examples.generated",
      "fields": [
        {"name": "version", "type": "int"},
        {"name": "data", "type": {"type": "map", "values": "string"}}
      ]
    }}
  ]
}
The above JSON is an Avro Schema that will be used by Apache Gora to hold WebPage information. We save this in a file with the extension .avsc.
Next, we use the Gora compiler to compile this schema into a Gora data bean. This data bean is a Java class that extends the PersistentBase class and therefore represents a persistent class; in other words, a class whose instances can be persisted by Apache Gora. Gora data beans should be generated by the Gora compiler, not written by hand. As of v0.3 of Apache Gora, the generated data bean would be the following:
package org.apache.gora.examples.generated;

import java.nio.ByteBuffer;
import java.util.Map;
import org.apache.avro.AvroRuntimeException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericArray;
import org.apache.avro.util.Utf8;
import org.apache.gora.persistency.ListGenericArray;
import org.apache.gora.persistency.StateManager;
import org.apache.gora.persistency.StatefulHashMap;
import org.apache.gora.persistency.impl.PersistentBase;
import org.apache.gora.persistency.impl.StateManagerImpl;

public class WebPage extends PersistentBase {

  // The Avro schema, embedded by the Gora compiler
  public static final Schema _SCHEMA = Schema.parse("{\"type\":\"record\","
      + "\"name\":\"WebPage\",\"namespace\":\"org.apache.gora.examples.generated\","
      + "\"fields\":[{\"name\":\"url\",\"type\":\"string\"},"
      + "{\"name\":\"content\",\"type\":[\"null\",\"bytes\"]},"
      + "{\"name\":\"parsedContent\",\"type\":{\"type\":\"array\",\"items\":\"string\"}},"
      + "{\"name\":\"outlinks\",\"type\":{\"type\":\"map\",\"values\":\"string\"}},"
      + "{\"name\":\"metadata\",\"type\":{\"type\":\"record\",\"name\":\"Metadata\","
      + "\"fields\":[{\"name\":\"version\",\"type\":\"int\"},"
      + "{\"name\":\"data\",\"type\":{\"type\":\"map\",\"values\":\"string\"}}]}}]}");

  // Field indices and names, in the same order as in the schema
  public static enum Field {
    URL(0, "url"),
    CONTENT(1, "content"),
    PARSED_CONTENT(2, "parsedContent"),
    OUTLINKS(3, "outlinks"),
    METADATA(4, "metadata"),
    ;
    private int index;
    private String name;
    Field(int index, String name) { this.index = index; this.name = name; }
    public int getIndex() { return index; }
    public String getName() { return name; }
    public String toString() { return name; }
  };

  public static final String[] _ALL_FIELDS = {"url", "content", "parsedContent", "outlinks", "metadata",};

  static {
    PersistentBase.registerFields(WebPage.class, _ALL_FIELDS);
  }

  private Utf8 url;
  private ByteBuffer content;
  private GenericArray<Utf8> parsedContent;
  private Map<Utf8, Utf8> outlinks;
  private Metadata metadata;

  public WebPage() {
    this(new StateManagerImpl());
  }

  public WebPage(StateManager stateManager) {
    super(stateManager);
    parsedContent = new ListGenericArray<Utf8>(getSchema().getField("parsedContent").schema());
    outlinks = new StatefulHashMap<Utf8, Utf8>();
  }

  public WebPage newInstance(StateManager stateManager) {
    return new WebPage(stateManager);
  }

  public Schema getSchema() {
    return _SCHEMA;
  }

  // Generic accessor: returns the field with the given index
  public Object get(int _field) {
    switch (_field) {
      case 0: return url;
      case 1: return content;
      case 2: return parsedContent;
      case 3: return outlinks;
      case 4: return metadata;
      default: throw new AvroRuntimeException("Bad index");
    }
  }

  // Generic mutator: marks the field dirty before storing the new value
  @SuppressWarnings(value="unchecked")
  public void put(int _field, Object _value) {
    if (isFieldEqual(_field, _value)) return;
    getStateManager().setDirty(this, _field);
    switch (_field) {
      case 0: url = (Utf8) _value; break;
      case 1: content = (ByteBuffer) _value; break;
      case 2: parsedContent = (GenericArray<Utf8>) _value; break;
      case 3: outlinks = (Map<Utf8, Utf8>) _value; break;
      case 4: metadata = (Metadata) _value; break;
      default: throw new AvroRuntimeException("Bad index");
    }
  }

  // Typed accessors delegate to get()/put() by field index
  public Utf8 getUrl() { return (Utf8) get(0); }
  public void setUrl(Utf8 value) { put(0, value); }
  public ByteBuffer getContent() { return (ByteBuffer) get(1); }
  public void setContent(ByteBuffer value) { put(1, value); }
  public GenericArray<Utf8> getParsedContent() { return (GenericArray<Utf8>) get(2); }
  public void addToParsedContent(Utf8 element) {
    getStateManager().setDirty(this, 2);
    parsedContent.add(element);
  }
  public Map<Utf8, Utf8> getOutlinks() { return (Map<Utf8, Utf8>) get(3); }
  public Utf8 getFromOutlinks(Utf8 key) {
    if (outlinks == null) { return null; }
    return outlinks.get(key);
  }
  public void putToOutlinks(Utf8 key, Utf8 value) {
    getStateManager().setDirty(this, 3);
    outlinks.put(key, value);
  }
  public Utf8 removeFromOutlinks(Utf8 key) {
    if (outlinks == null) { return null; }
    getStateManager().setDirty(this, 3);
    return outlinks.remove(key);
  }
  public Metadata getMetadata() { return (Metadata) get(4); }
  public void setMetadata(Metadata value) { put(4, value); }
}
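The generated bean follows a simple pattern: every field has a fixed index, get(int) and put(int, Object) dispatch on that index, and put skips unchanged values and otherwise marks the field dirty in the StateManager before storing it, so a data store can tell which fields were modified. The sketch below imitates that pattern with plain JDK classes so that it runs without Avro or Gora. MiniBean and its BitSet-based dirty tracking are hypothetical stand-ins, not Gora's StateManager API.

```java
import java.util.BitSet;
import java.util.Objects;

// Hypothetical stand-in for a Gora data bean: fields are addressed by index,
// and a BitSet plays the role of the StateManager's dirty tracking.
class MiniBean {
    static final String[] ALL_FIELDS = {"url", "content"};

    private final Object[] values = new Object[ALL_FIELDS.length];
    private final BitSet dirty = new BitSet(ALL_FIELDS.length);

    Object get(int field) {
        if (field < 0 || field >= values.length) {
            throw new IllegalArgumentException("Bad index");
        }
        return values[field];
    }

    void put(int field, Object value) {
        // Like the generated put(): do nothing if the value is unchanged...
        if (Objects.equals(get(field), value)) {
            return;
        }
        // ...otherwise flag the field as dirty, then store the new value.
        dirty.set(field);
        values[field] = value;
    }

    boolean isDirty(int field) {
        return dirty.get(field);
    }

    // E.g. after the changed fields have been written to the data store
    void clearDirty() {
        dirty.clear();
    }
}
```

Only fields that actually changed are flagged, which is what lets a data store write back just the modified fields.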
Now that we have the Gora data bean in place, Apache Gora needs a way to know where in the datastore it should persist instances of this data bean and their fields. This is the job of the mapping. All we have to do is define an XML mapping file that tells Gora the class of the key, the name of the database table in which instances of the data bean are persisted, which field is the primary key, and, for each field, the name under which it is persisted in the database. Note that each datastore has its own specific structure for the mapping file, both because each NoSQL database has its own unique structure and so that the mapping can take advantage of the features of each datastore. An example mapping file for Oracle NoSQL is the following; note that it maps Employee, another data bean from the same org.apache.gora.examples.generated package, rather than the WebPage bean above:
<gora-orm>
<class name="org.apache.gora.examples.generated.Employee" keyClass="java.lang.String" table="Employee">
<primarykey name="ssn" column="ssn" />
<field name="name" column="info/name" />
<field name="dateOfBirth" column="info/dateOfBirth" />
<field name="salary" column="info/salary" />
<field name="boss" column="info/boss" />
<field name="webpage" column="info/webpage" />
</class>
</gora-orm>
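To see which pieces of information the data store gets out of this file, the sketch below reads a mapping with the JDK's built-in DOM parser and collects the table name, the primary key column, and the column of each field. ClassMapping and parse are hypothetical names; this is not how the Gora-Oracle data store actually loads its mapping, only an illustration of the attributes involved.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical holder for what a gora-orm mapping tells the data store.
class ClassMapping {
    String className;
    String keyClass;
    String table;
    String primaryKey;
    // Field name -> column ("family/field"), kept in document order.
    final Map<String, String> fieldColumns = new LinkedHashMap<>();

    // Reads the first <class> element of a gora-orm mapping document.
    static ClassMapping parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element cls = (Element) doc.getElementsByTagName("class").item(0);

        ClassMapping m = new ClassMapping();
        m.className = cls.getAttribute("name");
        m.keyClass = cls.getAttribute("keyClass");
        m.table = cls.getAttribute("table");

        Element pk = (Element) cls.getElementsByTagName("primarykey").item(0);
        m.primaryKey = pk.getAttribute("column");

        NodeList fields = cls.getElementsByTagName("field");
        for (int i = 0; i < fields.getLength(); i++) {
            Element f = (Element) fields.item(i);
            m.fieldColumns.put(f.getAttribute("name"), f.getAttribute("column"));
        }
        return m;
    }
}
```

For the Employee mapping, this yields the table Employee, the primary key column ssn, and columns such as info/name and info/salary.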
But what does this mapping file mean for the Gora-Oracle datastore and, consequently, for the Oracle NoSQL database? It is essential because it allows the Gora-Oracle data store to construct the Oracle NoSQL keys according to a predefined data model. Before we describe the data model of the Gora-Oracle datastore, let's first review the basics of the Oracle NoSQL data model.
Oracle NoSQL is a key/value store. The value is an opaque byte array, and the key is made of strings. To be more specific, the key (full key) is composed of a Major key and a Minor key. The Major key is a sequence of string components, and the Minor key is likewise a sequence of string components.
Using the same notation as Oracle NoSQL uses (http://docs.oracle.com/cd/NOSQL/html/javadoc/oracle/kv/Key.html#toString()), the full key can be illustrated as:
/MajorKeyComponent1/MajorKeyComponent2/-/MinorKeyComponent1/MinorKeyComponent2 : Value
Here "/" separates the key components, and the lone "-" component separates the Major key from the Minor key.
Apache Gora uses some terms in its API that do not exist in the Oracle NoSQL database, such as table, schema, column, and column family. Oracle NoSQL knows only about key/value pairs. However, because Oracle NoSQL keys are made of multiple components, we were able to emulate a table, a column, and a column family. The Gora-Oracle datastore uses the following data model:
We use the 1st component of the Major key to map the table/schema.
We use the 2nd component of the Major key to map the persistent key.
We use the 1st component of the Minor key to map the column/field family (if any).
We use the 2nd component of the Minor key to map the field.
This results in the following fullKey : value pair:
/TableName/PersistentKey/-/FieldFamily/Field : Value
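Going the other way, when a key is read back, the table, persistent key, field family, and field have to be recovered from its components. Oracle NoSQL's Key.toString() writes a full key as a path such as /Employee/123-45-6789/-/info/name, where the lone "-" component separates the major path from the minor path. The sketch below splits such a path with plain string operations so that it runs without the Oracle NoSQL client; KeyPathParts is a hypothetical name, and the sketch ignores the escaping of special characters that real key paths use. The real oracle.kv.Key class (Key.fromString, getMajorPath, getMinorPath) handles that properly.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: decodes a key path of the form
// /table/persistentKey/-/family/field (or /table/persistentKey/-/field)
// into the parts of the Gora-Oracle data model.
// Escaped characters inside components are NOT handled in this sketch.
class KeyPathParts {
    final String table;
    final String persistentKey;
    final String fieldFamily; // null when the minor key holds only the field
    final String field;

    private KeyPathParts(String table, String persistentKey, String fieldFamily, String field) {
        this.table = table;
        this.persistentKey = persistentKey;
        this.fieldFamily = fieldFamily;
        this.field = field;
    }

    static KeyPathParts parse(String keyPath) {
        if (!keyPath.startsWith("/")) {
            throw new IllegalArgumentException("Key path must start with '/': " + keyPath);
        }
        // Drop the leading "/" and split the rest into components.
        List<String> components = Arrays.asList(keyPath.substring(1).split("/"));
        // The first component equal to "-" marks the start of the minor key.
        int dash = components.indexOf("-");
        if (dash < 0) {
            throw new IllegalArgumentException("No minor key in: " + keyPath);
        }
        List<String> major = components.subList(0, dash);
        List<String> minor = components.subList(dash + 1, components.size());
        if (major.size() != 2 || minor.isEmpty() || minor.size() > 2) {
            throw new IllegalArgumentException("Unexpected key layout: " + keyPath);
        }
        String family = minor.size() == 2 ? minor.get(0) : null;
        String field = minor.get(minor.size() - 1);
        return new KeyPathParts(major.get(0), major.get(1), family, field);
    }
}
```

Note that a component containing dashes, such as 123-45-6789, is not mistaken for the separator, because only a component that is exactly "-" counts.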
By "table" I do not mean an actual table; this key component simply acts as a container (called a table in several NoSQL databases, although it has nothing to do with the relational table of an RDBMS).
In the Gora API the term schema is loosely defined; in practice, each data store can define what "schema" means for its data model. Other Gora data stores use a table as a schema. Oracle NoSQL has neither schemas nor tables, so it was decided that the 1st component of the Major key would serve as both the table and the schema for the Gora-Oracle data store.
By "field family" I mean that this key component serves as a container for fields that share a common context; Oracle NoSQL does not actually support field/column families. The field family exists to stay consistent with the existing Gora datastores.
Currently, the value of each field is stored as a byte array in the Value part of the key/value pair. Since Oracle recommends using Avro to store values, this might change in the future.
Let's look at an illustrated example of how this data model maps a Gora data bean to Oracle NoSQL key/value pairs:
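The same mapping can also be traced in code. The sketch below turns one Employee record into the key/value pairs implied by the data model, using the columns from the mapping file above (info/name, info/salary, and so on): the table and persistent key form the major key, and the column supplies the field family and field of the minor key. EmployeeKeys and toKeyValues are hypothetical names, the record is passed as a plain map of strings, and the UTF-8 encoding of the values is an assumption; how Gora-Oracle actually serializes each field is not described here, and a real implementation would build oracle.kv.Key objects rather than strings.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the Gora-Oracle data model:
//   major key = [table, persistent key], minor key = [field family, field].
class EmployeeKeys {
    static Map<String, byte[]> toKeyValues(String table, String persistentKey,
                                           Map<String, String> fieldColumns,
                                           Map<String, String> fieldValues) {
        Map<String, byte[]> pairs = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fieldColumns.entrySet()) {
            String value = fieldValues.get(e.getKey());
            if (value == null) {
                continue; // nothing to store for an unset field
            }
            // A column such as "info/name" already reads "family/field",
            // so it becomes the minor path after the "-" separator.
            String keyPath = "/" + table + "/" + persistentKey + "/-/" + e.getValue();
            // Assumption: values encoded as UTF-8 text; the real encoding may differ.
            pairs.put(keyPath, value.getBytes(StandardCharsets.UTF_8));
        }
        return pairs;
    }
}
```

For an employee with ssn 123-45-6789, this produces keys such as /Employee/123-45-6789/-/info/name and /Employee/123-45-6789/-/info/salary, all sharing the same major key. Oracle NoSQL places records with the same major path in the same partition, so all fields of one persistent object end up stored together.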
I hope that I managed to successfully explain some of the basics of Apache Gora and the data model of the Gora-Oracle data store.