Diving into the NoSQL world with Riak KV and Riak TS

Hello and welcome to my first post! Over the last six months, I have been creating end-user trainings for two NoSQL databases, Riak KV (Key-Value) and Riak TS (Time Series). It has been an interesting journey, and I am excited to formalize a few thoughts. To start, I would like to note the two biggest differences between relational databases and NoSQL databases. I will then provide some context about my familiarity with databases. Finally, I will outline my top five learning experiences with Riak KV and Riak TS.

First, I would like to note that the term ‘NoSQL’ is not a precisely defined term. It is generally used to describe non-relational databases that do not use SQL (Structured Query Language). Relational databases, on the other hand, store data according to the relational model and use SQL as their query language.

Under the NoSQL umbrella there are actually many different types of databases and data stores. Here are a few examples by product name and store type:

  • Riak KV: key-value data store.
  • RethinkDB: document store.
  • DynamoDB: Amazon’s key-value data store.
  • Redis: in-memory key-value data store.
  • ArangoDB: graph database.

 

My Experience with Relational Databases

At previous companies, we used MySQL and PostgreSQL as production backends. My experience with both databases helped me become comfortable using SQL.

At company “X”, we installed MySQL on Windows XP laptops. The laptops were used to store motion-tracking sensor data in senior citizens’ homes, IoT style. It was a fun research project, to say the least. Anyway, back to MySQL: most of my interactions with the database involved querying tables to ensure the data was being collected and truncating tables if there was any data corruption.

At company “Y”, we used PostgreSQL to store various bits of server performance data. After I became comfortable with the psql command line, I was ready to query the database. Well, not quite: company “Y” encouraged employees to use their application APIs to query and update data. This meant becoming very comfortable with GET, POST, PUT, and DELETE curl requests.

How Is Riak TS Different from Riak KV?

Before diving into the general details, I want to spend a few minutes talking about Basho’s latest product, Riak TS. The “TS” stands for Time Series, and Riak TS was built on top of Riak KV specifically to store time series data from IoT devices. The major differences are that TS implements tables and SQL (kinda)! Since I have a strong SQL background, these differences were exciting!

The Details

Ok, let’s dive into my top five lessons learned! Like many other NoSQL databases, Riak is highly available, resilient, masterless, and scales easily. These are very appealing qualities; however, my experiences with Riak were at a more granular, implementation level. Here are my thoughts about Riak at that level:

  1. The installation process was quick and easy. Not once did I have a failed install! I quite enjoy painless installs, so big thumbs up for Riak. Technical specifics of installing:
    1. I used curl to download x version from docs.basho.com. Then I used the OS’s package manager to install it: yum install riak.package.version. At a minimum, I would change the nodename in riak.conf to riak@<serverIP> and the HTTP and Protocol Buffers listener settings (listener.http.internal and listener.protobuf.internal) to <serverIP>:<port>. I would then start the node with riak start.
  2. During my six months of testing I configured many clusters, and this process was also easy. However, if you do not plan ahead and update the settings mentioned in point 1, the process can get a little sticky. On one occasion I accidentally started the node before updating the nodename. When I attempted to update it afterwards, I ran into a side effect: essentially, the node vomited and refused to start again. I resolved the issue by clearing out the data directory.
  3. For a couple of trainings I needed to configure the Riak clients for Java and Python. Both were easy enough to configure. Generally, all the Riak clients seemed robust and highly customizable, and they all implement a flavor of round-robin load balancing.
  4. Remember when I said SQL (kinda)? Well, there are some language caveats. Riak TS implements its own version of SQL, with limited table operations and restricted SELECT statements. For example, I like to run select * from <table>; after adding a few rows of test data. In TS this query is not permitted, because there *must* be a WHERE clause that covers the fields of the Riak TS primary key, including a bounded time range (more thoughts in point 5).
  5. By specifying the Riak TS primary key, you decide how (sort order) and where (location) your data is stored within the cluster! That is pretty cool, right? A well-chosen primary key can greatly improve query performance. In the long run, the benefits of having a primary key far outweigh my initial annoyance while querying.
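To make the advice in points 1 and 2 concrete, here is a minimal Python sketch that rewrites the text of a riak.conf file so the node name and both listeners point at the server’s IP before the node is started for the first time. It assumes the Riak KV 2.x setting names listener.http.internal and listener.protobuf.internal with their default ports (8098 and 8087); the function name configure_riak_conf is hypothetical, so check the setting names against the riak.conf that ships with your version.

```python
# Sketch: point riak.conf at this server's IP before the first `riak start`.
# ASSUMPTION: Riak KV 2.x setting names and default ports (8098 HTTP,
# 8087 Protocol Buffers); verify against the riak.conf for your version.
# configure_riak_conf is a hypothetical helper, not part of Riak.


def configure_riak_conf(conf_text: str, server_ip: str,
                        http_port: int = 8098, pb_port: int = 8087) -> str:
    """Return conf_text with the node name and listeners bound to server_ip."""
    overrides = {
        "nodename": f"riak@{server_ip}",
        "listener.http.internal": f"{server_ip}:{http_port}",
        "listener.protobuf.internal": f"{server_ip}:{pb_port}",
    }
    seen = set()
    out_lines = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        key = stripped.split("=", 1)[0].strip()
        # Rewrite only active "key = value" lines; leave comments untouched.
        if "=" in stripped and not stripped.startswith("#") and key in overrides:
            out_lines.append(f"{key} = {overrides[key]}")
            seen.add(key)
        else:
            out_lines.append(line)
    # Append any setting the file did not already contain.
    for key, value in overrides.items():
        if key not in seen:
            out_lines.append(f"{key} = {value}")
    return "\n".join(out_lines) + "\n"


default_conf = """## Name of the Erlang node
nodename = riak@127.0.0.1
listener.http.internal = 127.0.0.1:8098
listener.protobuf.internal = 127.0.0.1:8087
"""
new_conf = configure_riak_conf(default_conf, "10.0.0.5")
```

Writing the returned text back to riak.conf before running riak start is meant to avoid the renamed-node trouble from point 2, since the node never comes up under its default name.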
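Point 3 mentions that the clients use a flavor of round-robin load balancing. The hypothetical RoundRobinNodes class below only illustrates the general idea (rotate through node addresses, skipping any that are marked as down); it is not the code used by the official Java or Python clients, and the addresses are made up.

```python
from typing import Sequence


class RoundRobinNodes:
    """Illustrative round-robin node picker (hypothetical; not the official
    Riak client code). Rotates through nodes, skipping any marked down."""

    def __init__(self, nodes: Sequence[str]) -> None:
        if not nodes:
            raise ValueError("at least one node is required")
        self._nodes = list(nodes)
        self._index = 0
        self._down: set[str] = set()

    def mark_down(self, node: str) -> None:
        self._down.add(node)

    def mark_up(self, node: str) -> None:
        self._down.discard(node)

    def next_node(self) -> str:
        # Try each node at most once per call, starting where we left off.
        for _ in range(len(self._nodes)):
            node = self._nodes[self._index]
            self._index = (self._index + 1) % len(self._nodes)
            if node not in self._down:
                return node
        raise RuntimeError("no nodes available")


pool = RoundRobinNodes(["10.0.0.1:8087", "10.0.0.2:8087", "10.0.0.3:8087"])
picks = [pool.next_node() for _ in range(4)]
```

Here picks cycles back to the first node on the fourth call. A real client also has to deal with connection pooling and retries; this sketch only shows the rotation.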
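Points 4 and 5 are easier to picture with a toy model of how a Riak TS primary key places data. The commented CREATE TABLE statement is a hypothetical table in Riak TS SQL: the first, parenthesized part of the PRIMARY KEY is the partition key, which groups rows by region, state, and a 15-minute QUANTUM of the timestamp, and the remaining columns form the local key that sets the sort order within a partition. The Python functions (all hypothetical) simulate that grouping and offer one way to think about the WHERE requirement: a bounded time range maps to a finite set of quanta to visit, while an unbounded select * does not. This is a simplified illustration, not how Riak TS is implemented internally.

```python
from collections import defaultdict

# Hypothetical Riak TS table this sketch models (Riak TS SQL, for reference):
#
#   CREATE TABLE SensorReadings (
#       region      VARCHAR   NOT NULL,
#       state       VARCHAR   NOT NULL,
#       time        TIMESTAMP NOT NULL,
#       temperature DOUBLE,
#       PRIMARY KEY (
#           (region, state, QUANTUM(time, 15, 'm')),  -- partition key: location
#           region, state, time                        -- local key: sort order
#       )
#   )

QUANTUM_MS = 15 * 60 * 1000  # 15-minute quantum; TS timestamps are in ms


def quantum_start(ts_ms: int) -> int:
    """Floor a millisecond timestamp to the start of its quantum."""
    return ts_ms - ts_ms % QUANTUM_MS


def partition(rows: list[dict]) -> dict[tuple, list[dict]]:
    """Toy model: group rows by partition key, then sort each group by local key."""
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for row in rows:
        pkey = (row["region"], row["state"], quantum_start(row["time"]))
        groups[pkey].append(row)
    for group in groups.values():
        group.sort(key=lambda r: (r["region"], r["state"], r["time"]))
    return dict(groups)


def quanta_for_range(lo_ms: int, hi_ms: int) -> list[int]:
    """Quantum start times a query with lo_ms <= time <= hi_ms must visit."""
    return list(range(quantum_start(lo_ms), quantum_start(hi_ms) + 1, QUANTUM_MS))


rows = [
    {"region": "South Atlantic", "state": "South Carolina", "time": 1_200_000, "temperature": 22.1},
    {"region": "South Atlantic", "state": "South Carolina", "time": 1_000_000, "temperature": 21.0},
    {"region": "South Atlantic", "state": "South Carolina", "time": 400_000, "temperature": 20.5},
]
groups = partition(rows)
```

The rows at 1,000,000 and 1,200,000 ms share a partition (the quantum starting at 900,000 ms) and come back sorted by time, while the 400,000 ms row lands in the quantum starting at 0; a query bounded between 400,000 and 1,200,000 ms therefore touches exactly those two quanta.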

Concluding Thoughts

Overall, my experiences with Riak were great, and it was a fun adventure! Before starting my training, I was only vaguely aware of the different types of data stores. I am glad to finally feel comfortable with non-relational databases. On a final note, I do not recommend any particular database. When selecting a data store, you have to decide what is right for your project.
