
Cassandra in a Nutshell

What is Cassandra ?
-------------------

Cassandra is a NoSQL database that persists its data in SSTable files and is optimized for clustered environments. In terms of the CAP theorem, Cassandra favors availability and partition tolerance. It can be tuned for stronger consistency, but that comes at the cost of higher latency. It supports clustering through a ring of many Cassandra nodes, and various configuration parameters let you leverage the strengths of a cluster. Cassandra is optimized for writes: writes are super-fast, while reads can become slow for various reasons. In particular, read latency can grow sharply if data is deleted frequently. Cassandra does not delete data immediately; it marks the data with tombstones and waits for compaction to purge them. Until then, read queries still have to scan past this "ghost" data, which causes delays. More details in [1].

How to model your data
----------------------

Cassandra's data model is built on the following elements:

Keyspaces : Similar to a schema in relational DBs. A keyspace contains multiple column families.

Column families : Similar to tables in relational databases, but different internally. Instead of a rigid set of columns, column families use a dynamic, cell-based approach.

Rows : A row is not restricted to a fixed column count within its column family; different rows can hold different columns.

Columns : Different from a relational column. A column is bound only to its row and holds one attribute (a name/value pair) of that row's data.
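To make the sparse rows above, and the tombstone behaviour described earlier, a bit more concrete, here is a toy in-memory model in Python. It is only an illustration under simplifying assumptions, not how Cassandra is implemented: the class `ToyColumnFamily` and its methods are hypothetical, a delete simply overwrites a cell with a tombstone marker, and `compact()` purges tombstones immediately (real Cassandra also waits for `gc_grace_seconds` before dropping them).

```python
from typing import Dict, Tuple

# Hypothetical marker standing in for a tombstone (a "deleted" cell) in this toy model.
TOMBSTONE = object()


class ToyColumnFamily:
    """Toy model: each row key maps to its own sparse set of columns."""

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, object]] = {}

    def insert(self, row_key: str, column: str, value: object) -> None:
        # Rows are sparse: each row may hold a different set of column names.
        self.rows.setdefault(row_key, {})[column] = value

    def delete(self, row_key: str, column: str) -> None:
        # A delete does not remove the cell; it overwrites it with a tombstone.
        self.rows.setdefault(row_key, {})[column] = TOMBSTONE

    def read_row(self, row_key: str) -> Tuple[Dict[str, object], int]:
        """Return (live columns, number of cells the read had to scan)."""
        cells = self.rows.get(row_key, {})
        live = {name: value for name, value in cells.items() if value is not TOMBSTONE}
        return live, len(cells)

    def compact(self) -> None:
        # Compaction finally drops tombstoned cells, so later reads scan less.
        self.rows = {
            key: {name: value for name, value in cells.items() if value is not TOMBSTONE}
            for key, cells in self.rows.items()
        }


if __name__ == "__main__":
    users = ToyColumnFamily()
    users.insert("alice", "email", "alice@example.com")
    users.insert("bob", "phone", "555-0100")  # a different column set per row
    for i in range(1000):                     # frequent insert + delete churn
        users.insert("bob", f"session_{i}", i)
        users.delete("bob", f"session_{i}")
    print(users.read_row("bob"))  # ({'phone': '555-0100'}, 1001)
    users.compact()
    print(users.read_row("bob"))  # ({'phone': '555-0100'}, 1)
```

The read of "bob" returns a single live column either way, but before compaction it has to walk past 1000 tombstones to find it.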

Refer to [2] to understand more about the above elements. All of them have specific size limitations you need to respect for Cassandra to work as expected [3]. CQL is a modern query language for Cassandra with SQL-like syntax [4].

Handy, in-built tools used to investigate Cassandra
---------------------------------------------------

This is the most important part of this article. Since Cassandra is a very versatile DBMS, it can be tuned in many ways, and different combinations of settings can lead to very different outcomes. In such a situation, knowing how to understand and investigate the software is crucial. To this end, I have summarized a few tools and commands I used frequently during my work.

nodetool
--------

This is the Swiss Army knife for collecting statistics from a running Cassandra cluster. A bit of reading and the command help will give you in-depth explanations of each nodetool command, but I will explain a few here for quick reference.

Check the status of the cluster - "./nodetool -host <ip> ring". This displays the status of each Cassandra node, its token value, its disk usage (load), and how much of the ring it owns. Useful to check whether the ring is balanced.
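As a rough sketch of where the ownership figures come from, assuming the Murmur3Partitioner token range of -2**63 to 2**63 - 1 (RandomPartitioner uses a different range) and one token per node (no virtual nodes): each node owns the range from the previous node's token (exclusive) up to its own token (inclusive), wrapping around the ring. `balanced_tokens` and `ownership` are hypothetical helpers, not nodetool features.

```python
from typing import List

# Assumption: Murmur3Partitioner, whose tokens span [-2**63, 2**63 - 1].
RING_SIZE = 2**64
MIN_TOKEN = -(2**63)


def balanced_tokens(node_count: int) -> List[int]:
    """Evenly spaced tokens, one per node (same formula as the one-liner in this post)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [(RING_SIZE // node_count) * i + MIN_TOKEN for i in range(node_count)]


def ownership(tokens: List[int]) -> List[float]:
    """Fraction of the ring each node owns, listed in sorted token order.

    A node owns the range (previous node's token, its own token],
    wrapping around from the highest token to the lowest.
    """
    ordered = sorted(tokens)
    if len(ordered) == 1:
        return [1.0]
    shares = []
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # for i == 0 this wraps to the last token
        shares.append(((token - previous) % RING_SIZE) / RING_SIZE)
    return shares


if __name__ == "__main__":
    print(ownership(balanced_tokens(4)))  # [0.25, 0.25, 0.25, 0.25]
    print(ownership([0, 2**62]))          # [0.75, 0.25] -> unbalanced ring
```

A skewed result like the second one is the situation where moving nodes to evenly spaced tokens helps.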

Move a node to a different token - "./nodetool -host <ip> move <newToken>". Even if a Cassandra cluster is balanced initially, it can become imbalanced when some nodes fail and shut down. This command helps you rebalance it.

View compaction stats - "./nodetool -host <ip> compactionstats". This shows the state of any compaction currently running on the node.

View per-column-family statistics - "./nodetool -host <ip> cfstats". This displays statistics for each column family, such as its SSTable count, read and write counts and latencies, and much more.

Find tokens to balance your cluster - python -c 'print([str(((2**64 // <NODE_COUNT>) * i) - 2**63) for i in range(<NODE_COUNT>)])'

cqlsh
-----

cqlsh is a query tool for investigating the actual data in Cassandra, SQL style. Start it with the command "./cqlsh <ip>".

Display all keyspaces - DESCRIBE keyspaces;

Use a specific keyspace - USE "<keyspaceName>";

Count the number of rows in a column family - select count(*) from "<keyspaceName>"."<columnFamily>";

View all metadata about a column family - DESCRIBE TABLE "<columnFamilyName>";

Change the compaction settings of a column family (sample query; tombstone_compaction_interval is in seconds) - ALTER TABLE "MessageContent" with compaction={'tombstone_compaction_interval':600,'class':'SizeTieredCompactionStrategy'};

Tell cqlsh to treat a specific column's values as a given data type (useful if you want to read the values of byte columns) - ASSUME "<keyspaceName>"."<columnFamily>"(<columnName>) VALUES ARE text;
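cqlsh shows blob (byte) values as hex literals such as 0x68656c6c6f. If you only need to inspect a few values, or your cqlsh version lacks ASSUME, you can also decode them yourself. A minimal Python sketch, assuming the bytes hold UTF-8 text (`decode_blob` is a hypothetical helper):

```python
def decode_blob(hex_literal: str, encoding: str = "utf-8") -> str:
    """Turn a cqlsh blob literal like 0x68656c6c6f back into text (assumes UTF-8 by default)."""
    digits = hex_literal.strip()
    if digits[:2].lower() == "0x":
        digits = digits[2:]  # drop the 0x prefix cqlsh prints
    return bytes.fromhex(digits).decode(encoding)


if __name__ == "__main__":
    print(decode_blob("0x68656c6c6f"))  # hello
```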

Apologies if this post feels rushed :) I wanted to share the information quickly before I lost track of it. And again, there could be mistakes and misunderstandings here. Everything is open for discussion.

References :
[1] : http://wiki.apache.org/cassandra/ArchitectureOverview
[2] : http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/#.VFuG59a35dI
[3] : http://wiki.apache.org/cassandra/CassandraLimitations
[4] : http://www.slideshare.net/DataStax/understanding-how-cql3-maps-to-cassandras-internal-data-structure

#Hands-on-Technology

Moving to a Git Workflow - my experience

As someone with only a light touch of git and more experience with svn's traditional version control, I was a bit overwhelmed when moving to a completely git-based work environment. However, since we had a few helpful super-gits around (shout out to asankaab), I was able to settle into a tried-and-tested workflow fast. Here's what I do now; I hope it helps you :)

Contributing to someone else's repository (with GitHub)

1. Fork the repo into your own GitHub account from the UI and clone your fork [1].
2. Add the original repo as a new remote. This ensures that you pull from the original repo and push to your fork. "git remote -v" shows the current remotes; if you have just cloned, you will see only one remote (origin) with the same URL for both fetch and push. "git remote add <name> <url>" adds a new remote. (Let's assume the name is "parent_repo" for the rest of the article.)

Now your clone is set up with one branch (master) that lets you move changes cleanly from the original repo to your fork, and from your fork back to the original repo. Ideally, keep it updated at all times. As a rule of thumb, also keep this one branch (master) fresh, without any local changes.

Updating your fork

Once you have forked, the fork will not automatically pick up future changes from the original repo. You need to bring them in as needed. For this I use the following steps:

1. Fetch changes from the original repo -> "git fetch parent_repo"
2. Move to your master branch (if you are on a different one) -> "git checkout master"
3. Rebase your master branch onto the relevant branch of parent_repo (mostly we work with the master branch of the original repo) -> "git rebase parent_repo/master". Since your master branch is clean, you should not get conflicts in this step.
4. Now if you run "git log master...parent_repo/master" you will see nothing: since you have rebased, the original repo's master and your local master are identical. But if you run "git log origin/master..master" [2] you will see commits that are not yet in origin/master. Push them now to keep your fork up to date -> "git push -u origin master"

Update your fork every time before you send a feature to your fork and before you create a new branch. This reduces conflicts.

Working on a feature

1. Update your fork as above.
2. Create a new branch with a proper name for the feature -> "git branch feature_1"
3. Check it out ("git checkout feature_1") and start working on it. Commit to the branch after each sub-feature is complete, so you can clearly refer to the different stages of the feature's development.

Submitting your feature as a pull request

This can get a bit messy in the first few attempts, so here's what I have found to work consistently:

1. Update your fork (using the local master branch; as I said, the original repo's changes should go only through your master branch).
2. View the difference between your branch and your master ("git log master..feature_1" = commits only in your branch | "git log feature_1..master" = commits only in your master). Now you know how much of a backlog your branch is going to face.
3. Check out your branch ("git checkout feature_1") and rebase it onto your master ("git rebase master") [3].
4. Chances are high that the rebase will pause, because git can't work out how to merge some changed files. It handles most scenarios by itself, but leaves the critical ones to you: it will report the conflicts and tell you to resolve them manually before continuing.
4.1. Run "git status" and identify the conflicted files. These are the ones listed under "Unmerged paths" (not yet added).
4.2. Open each file in your favorite editor, find the conflict blocks (starting with "<<<<<<< HEAD" and ending with ">>>>>>>") and resolve them as necessary [7].
4.3. Once you have resolved all the conflicted files, stage them using "git add <file path>".
4.4. Run "git rebase --continue".
4.5. git may now prompt you for the commit message of the commit being replayed, possibly with a pre-filled note listing the conflicted files. Add your own comments on top and proceed.
5. Voila! You have overcome the conflict dilemma. Now if you run "git log master..feature_1", you will see only your feature commits (as rewritten by the rebase). This is how it should be: commits unrelated to your feature should not appear in this result. If they do, something went wrong in the previous steps.
6. You are now ready to push the feature branch to your fork on GitHub. Run "git push -u origin feature_1" and your changes will be pushed to the fork as a new branch called "feature_1".
7. After verifying that the pushed branch contains only feature-related commits, you can create the pull request to the original repo [4].

What if something goes wrong ?

The whole process looks easy, but there are many things that could go wrong.

1. After you have made the pull request, if it takes too long to be merged into the original repo, new commits on the original repo can conflict with your pull request. To handle this, I do the following:
1.1. Update your fork as explained above.
1.2. Create a new branch from the updated master -> "git branch feature_1_rebased", and check it out -> "git checkout feature_1_rebased".
1.3. Use the git cherry-pick command [5] to copy your feature commits from the feature_1 branch onto the feature_1_rebased branch. You may face conflicts in each cherry-pick operation. Do not fear; resolve them as above and move on.
1.4. Now the newly created feature_1_rebased branch has your feature commits and is up to date with the original repo too. You can make a new pull request with it and, after double-checking, close the old one. (Get the new request merged soon, or you will have to do this again :D)
2. What if my individual commits look a bit messy and I want to combine them into one cohesive commit ?
2.1. Run "git log" and identify the hash of the last commit before the commits you want to combine (let's call this hash <last_sane_commit>).
2.2. Run "git reset --soft <last_sane_commit>" and all later commits will be undone, with their changes left staged. If you run "git status" now, you will see that all files changed in the commits after <last_sane_commit> are waiting to be committed.
2.3. Commit these changes using "git commit -m "<collective commit message>"". Voila, all your tiny commits are now one sensible commit :)
3. What if I lose a commit accidentally ?
3.1. Not to fear :) git keeps "dangling commits", commits no longer reachable from any branch (for example, ones left behind by a rebase, a git pull, or a similarly dangerous command), until they are garbage-collected [6]. You can list such commits with "git fsck --lost-found".
3.2. Once you find your lost commit, you can restore it with "git merge <commit_hash>".

Well, that's it for now. I had a really frustrating first few days, but I'm grateful to everyone who helped me learn git and get used to my own work pattern with it. The cool thing is, once you are confident with your own workflow, you won't fear git, but wield it as a pretty handy tool. The above steps and commands are basically what I use every day now to resolve any issue with my clones. But git + GitHub is a vast, recursive infrastructure and there's much more to learn.
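The ".." and "..." ranges used in the steps above can be confusing at first. Below is a toy Python model (not git's implementation) of which commits each range selects: A..B means commits reachable from B but not from A, and A...B means commits reachable from either side but not from both. The commit ids and branch tips form a hypothetical scenario: your fork (origin/master) is two commits behind, your local master has just been rebased onto parent_repo/master, and feature_1 has two commits on top of it.

```python
from typing import Dict, List, Set

# Hypothetical scenario: commit id -> parent commit ids (a linear history here).
GRAPH: Dict[str, List[str]] = {
    "c1": [],
    "c2": ["c1"],
    "c3": ["c2"],
    "c4": ["c3"],
    "f1": ["c4"],  # feature commits on top of the updated master
    "f2": ["f1"],
}

# Hypothetical branch tips right after "git rebase parent_repo/master".
REFS: Dict[str, str] = {
    "origin/master": "c2",       # your fork, not pushed yet
    "parent_repo/master": "c4",  # the original repo
    "master": "c4",              # local master, now equal to parent_repo/master
    "feature_1": "f2",
}


def reachable(tip: str) -> Set[str]:
    """All commits reachable from `tip` by following parent links."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(GRAPH[commit])
    return seen


def log_range(spec: str) -> Set[str]:
    """Which commits `git log <spec>` would list (as a set, ignoring order)."""
    if "..." in spec:
        left, right = spec.split("...")
        # Symmetric difference: on one side or the other, but not both.
        return reachable(REFS[left]) ^ reachable(REFS[right])
    left, right = spec.split("..")
    # Reachable from the right-hand ref but not from the left-hand ref.
    return reachable(REFS[right]) - reachable(REFS[left])


if __name__ == "__main__":
    print(log_range("master...parent_repo/master"))     # set()
    print(sorted(log_range("origin/master..master")))  # ['c3', 'c4']
    print(sorted(log_range("master..feature_1")))      # ['f1', 'f2']
```

In the real workflow, an empty master...parent_repo/master means your master is in sync with the original repo, and origin/master..master is the set of commits your push sends to the fork.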

All of the above is open for discussion. Just sharing my experience :)

References :
[1] : https://help.github.com/articles/fork-a-repo/
[2] : https://www.atlassian.com/git/tutorials/inspecting-a-repository/git-log (git log <since>..<until>)
[3] : http://git-scm.com/book/en/v2/Git-Branching-Rebasing
[4] : https://help.github.com/articles/creating-a-pull-request/
[5] : https://ariejan.net/2010/06/10/cherry-picking-specific-commits-from-another-branch/
[6] : http://gitready.com/advanced/2009/01/17/restoring-lost-commits.html
[7] : https://help.github.com/articles/resolving-a-merge-conflict-from-the-command-line/

#hands-on-technology

MQTT in a Nutshell

What is it ?

A lightweight, resource-optimized publish/subscribe messaging protocol targeting device communication

Uses TCP/IP for connectivity

Minimalistic

Why is it cool ?

One-to-many message distribution

Agnostic to payload content

The fixed header can be as small as 2 bytes

Ideal for IoT device communication
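To see what a 2-byte fixed header looks like, here is a Python sketch following the layout in the MQTT 3.1/3.1.1 specifications: the first byte holds the packet type in its upper 4 bits and flags in the lower 4, followed by the "remaining length" in a variable-length encoding of 1 to 4 bytes (7 bits per byte, with the high bit meaning "more bytes follow"). The helper names are hypothetical; this is not the Paho client API.

```python
# Control packet type numbers from the MQTT 3.1.1 specification.
PACKET_TYPES = {
    "CONNECT": 1, "CONNACK": 2, "PUBLISH": 3, "PUBACK": 4,
    "PUBREC": 5, "PUBREL": 6, "PUBCOMP": 7, "SUBSCRIBE": 8,
    "SUBACK": 9, "UNSUBSCRIBE": 10, "UNSUBACK": 11,
    "PINGREQ": 12, "PINGRESP": 13, "DISCONNECT": 14,
}
MAX_REMAINING_LENGTH = 268_435_455  # largest value that fits in 4 bytes


def encode_remaining_length(length: int) -> bytes:
    """7 bits per byte, least significant group first; high bit = more bytes follow."""
    if not 0 <= length <= MAX_REMAINING_LENGTH:
        raise ValueError("remaining length out of range")
    encoded = bytearray()
    while True:
        byte = length % 128
        length //= 128
        if length > 0:
            byte |= 0x80  # continuation bit
        encoded.append(byte)
        if length == 0:
            return bytes(encoded)


def publish_flags(dup: bool = False, qos: int = 0, retain: bool = False) -> int:
    """Lower 4 bits of a PUBLISH header: DUP, QoS (2 bits), RETAIN."""
    return (int(dup) << 3) | (qos << 1) | int(retain)


def fixed_header(packet_type: str, flags: int, remaining_length: int) -> bytes:
    """First byte = packet type (upper 4 bits) | flags (lower 4 bits)."""
    first_byte = (PACKET_TYPES[packet_type] << 4) | (flags & 0x0F)
    return bytes([first_byte]) + encode_remaining_length(remaining_length)


if __name__ == "__main__":
    print(fixed_header("PINGREQ", 0, 0).hex())  # c000 -> the whole packet is 2 bytes
    print(fixed_header("PUBLISH", publish_flags(qos=1, retain=True), 10).hex())  # 330a
    print(encode_remaining_length(321).hex())  # c102
```

Any packet whose remaining length is under 128 bytes gets a 2-byte fixed header; larger ones grow by one byte per extra 7 bits of length.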

Other Extras

Notifies others of disconnections with the "Last Will and Testament" feature

Keep-alive messages

The "will" message is published by the broker when a client disconnects unexpectedly

The last retained message on a topic is kept by the broker and given to new subscribers

Supports durable subscribers (persistent sessions)

Supports hierarchical topics
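Hierarchical topics are matched against subscription filters where "+" matches exactly one level and "#" matches all remaining levels (including the parent level itself). The sketch below combines that matching with retained messages in a hypothetical in-memory `ToyBroker`. It deliberately ignores sessions, QoS, filter validation and the special handling of topics starting with "$"; it is not a real broker API.

```python
from typing import Dict, List, Tuple


def topic_matches(topic_filter: str, topic: str) -> bool:
    """'+' matches exactly one level; '#' (last level only) matches the rest."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True  # also matches the parent level, e.g. home/# vs home
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


class ToyBroker:
    """Toy in-memory sketch that keeps the last retained message per topic."""

    def __init__(self) -> None:
        self.retained: Dict[str, bytes] = {}

    def publish(self, topic: str, payload: bytes, retain: bool = False) -> None:
        # Delivery to current subscribers is omitted; only retention is modelled.
        if not retain:
            return
        if payload:
            self.retained[topic] = payload  # replaces the previous retained message
        else:
            self.retained.pop(topic, None)  # an empty retained payload clears it

    def subscribe(self, topic_filter: str) -> List[Tuple[str, bytes]]:
        """Retained messages a new subscriber to `topic_filter` would receive."""
        return sorted(
            (topic, payload)
            for topic, payload in self.retained.items()
            if topic_matches(topic_filter, topic)
        )


if __name__ == "__main__":
    broker = ToyBroker()
    broker.publish("home/kitchen/temperature", b"21", retain=True)
    broker.publish("home/hall/temperature", b"19", retain=True)
    broker.publish("home/kitchen/light", b"on")  # not retained
    print(broker.subscribe("home/+/temperature"))
    # [('home/hall/temperature', b'19'), ('home/kitchen/temperature', b'21')]
```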

QoS Parameters

At most once (QoS 0) - delivery relies only on TCP/IP; a message may be lost

At least once (QoS 1) - messages will arrive, but may be duplicated

Exactly once (QoS 2) - each message arrives once and only once (most reliable, but with the most overhead)
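To illustrate why QoS 1 can produce duplicates while QoS 2 does not, here is a toy receiver in Python, loosely following the MQTT 3.1.1 receiver flow. QoS 1 acknowledges with PUBACK and delivers every PUBLISH it sees. QoS 2 remembers the packet identifier after sending PUBREC and ignores retransmitted copies until the matching PUBREL arrives. The class and method names are hypothetical, and networking and the sender side are left out.

```python
from typing import List, Optional, Set, Tuple

Ack = Optional[Tuple[str, int]]


class ToyReceiver:
    """Toy receiver side of MQTT QoS 0/1/2 delivery (no networking)."""

    def __init__(self) -> None:
        self.delivered: List[str] = []          # messages handed to the application
        self.awaiting_pubrel: Set[int] = set()  # QoS 2 ids seen, not yet released

    def on_publish(self, packet_id: int, payload: str, qos: int) -> Ack:
        if qos == 0:
            # Fire and forget: no acknowledgement at the MQTT level.
            self.delivered.append(payload)
            return None
        if qos == 1:
            # Deliver and acknowledge; a retransmitted copy is delivered again.
            self.delivered.append(payload)
            return ("PUBACK", packet_id)
        if qos == 2:
            # Deliver only the first copy; remember the id until PUBREL.
            if packet_id not in self.awaiting_pubrel:
                self.awaiting_pubrel.add(packet_id)
                self.delivered.append(payload)
            return ("PUBREC", packet_id)
        raise ValueError("QoS must be 0, 1 or 2")

    def on_pubrel(self, packet_id: int) -> Ack:
        # The sender confirms it will not resend; the id can be reused now.
        self.awaiting_pubrel.discard(packet_id)
        return ("PUBCOMP", packet_id)


if __name__ == "__main__":
    receiver = ToyReceiver()
    # The sender missed the PUBACK and retransmits: QoS 1 delivers twice.
    receiver.on_publish(1, "temp=21", qos=1)
    receiver.on_publish(1, "temp=21", qos=1)
    # The sender missed the PUBREC and retransmits: QoS 2 delivers once.
    receiver.on_publish(2, "door=open", qos=2)
    receiver.on_publish(2, "door=open", qos=2)
    receiver.on_pubrel(2)
    print(receiver.delivered)  # ['temp=21', 'temp=21', 'door=open']
```

The extra PUBREC/PUBREL/PUBCOMP round trip is exactly the overhead that makes QoS 2 the most reliable but most expensive level.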

Security

SSL/TLS for security (over TCP)

Username/Password in connect message

Encrypted payloads


#IOT #MQTT #PubSub Patterns #Hands-on-Technology

WSO2 ESB in a Nutshell

A short note of sorts I made to act as a quick reference to the tool :) It might not have 100% coverage or accuracy, but it's a good reference to start off with.

#Hands-on-Technology #Quick References #WSO2 #WSO2 ESB #wso2 esb

MongoDB in a Nutshell

*This article does not dive into the finer details of MongoDB or document databases. That might come in another post :)

Experience and Learning Curve :

Pretty easy to get something running. Flexible. No structure is imposed on a database beyond the "collections" and "documents" concepts.

A collection is similar to a table and a document can be defined as a data row in a collection.

Once the database is created from the shell, collections can be created simply by inserting data into them.

The syntax is intuitive, and if you've used any kind of database, simple queries will just work on MongoDB (I tried filter conditions, the equivalent of WHERE clauses, even on arrays and they worked on the first attempt. Maybe sheer luck :P)

Install on Ubuntu : http://docs.mongodb.org/manual/tutorial/install-mongodb-on-ubuntu/

Using MongoDB with Java : http://mvnrepository.com/artifact/org.mongodb/mongo-java-driver

The driver's functions are almost the same as the mongo shell commands (e.g. db.customer.find() will give you all customers).

Java objects (even arrays) can be passed straight into Mongo queries. The driver converts them to BSON (MongoDB's binary JSON format) and uses them.

I used a set of Object Mapping (OM) classes to communicate with the DB and noticed that if an OM class has an "_id" property, it is populated when that object is passed to the DB as a new "document".

How to store data : http://docs.mongodb.org/manual/MongoDB-data-models-guide.pdf

Basically, there are two extremes in MongoDB. You can either

1. associate objects using only their IDs (referencing), or
2. embed full objects within each other (this causes repetition, but might give you better query performance).

Both patterns are used, in balance, in different scenarios. However, if you've used relational schemas before, it's important not to get addicted to foreign-key-like associations all the time. It should always be an objective decision.
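A small sketch of the two extremes, using plain Python dicts as stand-ins for MongoDB documents (no driver or server involved; the collection contents and helper functions are hypothetical). Referencing needs an extra lookup to read an order together with its customer, while embedding duplicates the customer, so an update has to touch every copy.

```python
import copy
from typing import Dict, List

# Referencing: orders hold only the customer's _id (hypothetical data).
customers: Dict[int, dict] = {1: {"_id": 1, "name": "Ann", "city": "Colombo"}}
orders_ref: List[dict] = [
    {"_id": 100 + n, "customer_id": 1, "total": 10.0 * n} for n in range(3)
]

# Embedding: each order carries its own full copy of the customer document.
orders_emb: List[dict] = [
    {"_id": o["_id"], "customer": copy.deepcopy(customers[1]), "total": o["total"]}
    for o in orders_ref
]


def order_with_customer(order: dict) -> dict:
    """Resolve a reference by hand: an extra lookup, like a second query."""
    return {**order, "customer": customers[order["customer_id"]]}


def move_customer_referenced(customer_id: int, city: str) -> int:
    """Update the single customer document; returns documents touched."""
    customers[customer_id]["city"] = city
    return 1


def move_customer_embedded(customer_id: int, city: str) -> int:
    """Update every embedded copy; returns documents touched."""
    touched = 0
    for order in orders_emb:
        if order["customer"]["_id"] == customer_id:
            order["customer"]["city"] = city
            touched += 1
    return touched


if __name__ == "__main__":
    print(order_with_customer(orders_ref[0])["customer"]["name"])  # Ann (needed a lookup)
    print(orders_emb[0]["customer"]["name"])                       # Ann (no lookup needed)
    print(move_customer_referenced(1, "Kandy"))  # 1
    print(move_customer_embedded(1, "Kandy"))    # 3
```

Which trade-off wins depends on how often the shared data is read together with its parent versus how often it changes.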

Basic Security : http://learnmongo.com/posts/quick-tip-mongodb-users/ http://blog.mongodirector.com/10-tips-to-improve-your-mongodb-security/

Access MongoDB in Ubuntu with authentication:

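First, a user must be created. Here the user "orderAdmin" gets the userAdmin role on the orderdb database (this role lets it manage users and roles in that database; reading and writing data needs a role such as readWrite):

use orderdb
db.createUser({ user: "orderAdmin", pwd: "hasd", roles: [{ role: "userAdmin", db: "orderdb" }] })

Then, that user must be used to open the session. Since the user was created in orderdb, that is its authentication database:

mongo --port 27017 -u orderAdmin -p hasd --authenticationDatabase orderdb

This cleared up my confusion about how to use MongoDB with user permissions. Basically, if you just run "mongo" without credentials, you might not be able to do some things you expect to be able to do with the default privileges.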

#Hands-on-Technology #Document Databases #MongoDB

Creating an Axis2 Helloworld Service with Maven

What is Axis2 ?

A super-cool Java web service container and enabler.

Provides extensive support for configuring messaging patterns, deployment, and code generation features like WSDL2Java and Java2WSDL.

Takes care of message parsing with a super-fast object model (AXIOM)

More info : (Axis2 in a Nutshell post - coming soon)

How to create a HelloWorld Service :

Maven has been, and will remain, my favourite build manager in Java. And when it comes to setting up a simple Axis2 web service, it doesn't disappoint.

Pre-requisites :

Java

Maven

A Servlet Container - Tomcat7 (Optional)

Steps to generate from maven archetype :

1. Choose where you want to create the project and navigate to that directory. (If in windows -> CMD. If in Linux -> Terminal)

2. Call for the maven axis2 archetype with following command.

[code language="bash"] mvn archetype:generate -DarchetypeCatalog=http://axis2m.sourceforge.net/repo/ [/code]

3. Fill out the information required by Maven to create your project (e.g. project name, package containing the Axis2 services). The values are totally up to you.

Confirm with "Y" and voila! You have an Axis2 service that, by default, runs on an embedded Jetty server (port 8080).

At this point Maven will create a complete project for you, with all the required Axis2 dependencies as well as the extra libraries needed to host the service in something like Jetty. Refer to the generated "pom.xml" file to see the different components that make the service work.

All hail archetypes.

Deployment

So, okay. You have the service and it's runnable in a Jetty server. But to deploy it in an existing Axis2 container, you need to generate the .aar (Axis2 archive) file from the project. The following Maven command will create it in the "target" folder.

[code language="bash"] mvn axis2-aar:aar [/code]
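Before deploying, it can help to peek inside the generated archive. An .aar is a ZIP file whose META-INF/services.xml descriptor declares the service(s). Below is a small Python sketch that lists the service names declared there; the function name and the archive path in the usage comment are hypothetical, and it assumes services.xml is either a single `<service>` element or a `<serviceGroup>` of them.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET
from typing import List

DESCRIPTOR = "META-INF/services.xml"


def declared_services(aar_path: str) -> List[str]:
    """Return the service names declared in an Axis2 archive's services.xml."""
    with zipfile.ZipFile(aar_path) as archive:
        if DESCRIPTOR not in archive.namelist():
            raise ValueError(f"{aar_path} has no {DESCRIPTOR}")
        root = ET.fromstring(archive.read(DESCRIPTOR))
    # The root is either a single <service> or a <serviceGroup> of them;
    # iter() includes the root element itself.
    return [svc.get("name", "") for svc in root.iter("service")]


if __name__ == "__main__":
    # Hypothetical usage: python check_aar.py target/<your-artifact>.aar
    for path in sys.argv[1:]:
        print(path, declared_services(path))
```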

There you go. Axis2 Web Services : check :)

#Hands-on-Technology