Cassandra Query Language
I’ll be using a cloud instance of Cassandra for this walkthrough.
Data Types
In CQL there are many data types, but they can be grouped into three main categories:
- built-in data types
- collection data types
- user-defined data types
The user can choose any of them according to the requirements of the application and data model.
Built-in
The built-in data types are pre-defined in Cassandra. Any column can be declared with one of them.
- Regular data types like ascii, boolean, decimal, double, float, int, and text are fairly straightforward. A few others deserve a closer look.
Blob
Although Cassandra mainly stores text-based information, it can also store blobs (binary large objects).
- Blobs are typically used to store images, audio, or other multimedia objects.
- A blob is a collection of binary data stored as a single entity. In Cassandra it is recommended that a blob's size does not exceed 1 megabyte.
- Thus, you could store a small image or string using a blob.
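As a rough illustration of the size guideline above, the following Python sketch checks a payload against a 1 MB limit and renders it as a hexadecimal constant, the `0x...` form that CQL uses for blob literals. The helper name `to_blob_literal` is hypothetical, and reading "1 megabyte" as 1,048,576 bytes is an assumption made for this example.

```python
MAX_BLOB_BYTES = 1024 * 1024  # assumed reading of the "1 megabyte" guideline


def to_blob_literal(payload: bytes) -> str:
    """Return the payload as a CQL-style hexadecimal blob constant (hypothetical helper)."""
    if len(payload) > MAX_BLOB_BYTES:
        raise ValueError(
            f"payload is {len(payload)} bytes, above the recommended limit"
        )
    return "0x" + payload.hex()


# A short string stored as binary data.
literal = to_blob_literal("hello".encode("utf-8"))
print(literal)  # 0x68656c6c6f
```

Anything larger than the limit raises an error here, which mirrors the recommendation rather than a hard rule enforced by Cassandra.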
Bigint
The bigint data type can be used for a 64-bit signed long integer.
- This data type covers a much wider range of integers than the int data type, which is a 32-bit signed integer.
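To make the difference in range concrete, here is a small Python sketch that derives the limits of 32-bit and 64-bit signed integers from their bit widths. The function names are made up for this example.

```python
def signed_range(bits):
    """Return (min, max) of a two's-complement signed integer with the given width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


INT_MIN, INT_MAX = signed_range(32)        # range of a 32-bit int
BIGINT_MIN, BIGINT_MAX = signed_range(64)  # range of a 64-bit bigint


def fits_int(value):
    """True if the value can be stored in a 32-bit signed integer."""
    return INT_MIN <= value <= INT_MAX


print(INT_MAX)                  # 2147483647
print(BIGINT_MAX)               # 9223372036854775807
print(fits_int(3_000_000_000))  # False, so this value needs a bigint
```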
Varchar
- The well-known varchar is also available in Cassandra as a data type.
- It represents UTF-8 encoded strings and is an alias for the text data type.
Collection Data Type
Cassandra provides collection types as a way to group and store data together in a column.
For example, in a relational database, a grouping such as a user’s multiple email addresses is modeled as a many-to-one relationship, joining a users table and an email table.
Cassandra avoids joins between two tables by storing the user’s email addresses in a collection column in the users table.
- Each collection specifies the data type of the elements it holds.
- A collection is appropriate if the amount of data to be stored in it is limited.
- If the data has unbounded growth potential, like messages sent or sensor events registered every second, do not use the collection data type.
- Instead, use a table with a compound primary key where data is stored in the clustering columns.
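The alternative for unbounded data can be sketched as follows. The CQL schema is shown only as a string, and the table name `sensor_events` and its columns are hypothetical. The Python code underneath is a simplified in-memory model of the idea, not of Cassandra's storage engine: one partition per sensor, with rows kept in order of the clustering column.

```python
from collections import defaultdict

# Hypothetical CQL schema, shown as a string; running it needs a live cluster.
# sensor_id is the partition key and event_time is the clustering column.
CREATE_SENSOR_EVENTS = """
CREATE TABLE sensor_events (
    sensor_id  text,
    event_time timestamp,
    reading    double,
    PRIMARY KEY (sensor_id, event_time)
);
"""

# In-memory model: partition key -> {clustering value -> reading}.
partitions = defaultdict(dict)


def insert_event(sensor_id, event_time, reading):
    # Writing the same (sensor_id, event_time) again overwrites the row.
    partitions[sensor_id][event_time] = reading


def read_partition(sensor_id):
    # Rows come back ordered by the clustering column.
    return sorted(partitions[sensor_id].items())


insert_event("s1", 20, 0.7)
insert_event("s1", 10, 0.5)
insert_event("s2", 15, 1.2)
print(read_partition("s1"))  # [(10, 0.5), (20, 0.7)]
```

Each new event becomes its own row instead of growing a single collection value, which is why this layout suits data that keeps accumulating.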
Within the collection data types category, there are three data types.
Lists
A list represents an ordered collection of one or more elements stored in a column.
- Use a list when the order of the elements must be maintained and the same value may be stored multiple times, such as entries in a log.
Maps
A map represents a collection of key-value pairs. Use a map to store elements that are looked up by key, such as journal entries where the key is a date and the value is text.
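The journal example can be modeled with a Python dictionary. This is only an analogy for a column declared as `map<date, text>` (an assumed declaration), and the entries are invented. It shows two properties of CQL maps: keys are unique, and entries are returned sorted by key.

```python
from datetime import date

journal = {}  # stands in for a map<date, text> column
journal[date(2024, 3, 2)] = "Started the migration"
journal[date(2024, 3, 1)] = "Planned the schema"
journal[date(2024, 3, 2)] = "Finished the migration"  # same key: value is replaced


def read_map(entries):
    """Return the entries ordered by key, as a CQL map would."""
    return sorted(entries.items())


for day, text in read_map(journal):
    print(day.isoformat(), text)
```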
Sets
A set represents a collection of one or more unique elements stored in a column.
- Use a set to store a group of elements where duplicates are not allowed.
- The elements of a set are returned in sorted order. An example would be a user's email addresses.
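A Python set gives a reasonable mental model for a column declared as `set<text>` (an assumed declaration). The addresses below are invented. Sorting on read imitates the sorted order in which Cassandra returns set elements.

```python
emails = set()  # stands in for a set<text> column
emails.add("zoe@example.com")
emails.add("adam@example.com")
emails.add("zoe@example.com")  # duplicate, so the set is unchanged


def read_set(elements):
    """Return the elements in sorted order, as a CQL set would."""
    return sorted(elements)


print(read_set(emails))  # ['adam@example.com', 'zoe@example.com']
```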
Example
Let’s go back to the users table and add a new column to the table called jobs, which is basically just a list of jobs.
- We would like to store the jobs in the order of their occurrences.
- Remember that the users table is a static table with its primary key consisting of the user id column.
- In Cassandra, we will store all of a user’s jobs in a single column.
- Since we cannot perform joins, we will use the list type of the collection data types, because we want to preserve the order of the jobs.
- Another reason is that a person can work at a specific company more than once, so uniqueness is not required.
- We can add a job at the beginning or the end of the list, or overwrite the value at a specific existing position.
- The entries in the list can repeat, as they do not have to be unique.
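The jobs column could be sketched as below. The CQL statements are shown as strings because running them needs a live cluster. The `user_id` column name, its integer value, and the company names are assumptions made for illustration. The Python list underneath mirrors the same kinds of operations so the resulting order is easy to follow. Note that assigning to an index such as `jobs[1]` overwrites an existing position and does not insert a new element.

```python
# Hypothetical CQL for the jobs column; names and values are assumed.
ADD_COLUMN = "ALTER TABLE users ADD jobs list<text>;"
APPEND = "UPDATE users SET jobs = jobs + ['Globex'] WHERE user_id = 1;"
PREPEND = "UPDATE users SET jobs = ['Initech'] + jobs WHERE user_id = 1;"
SET_AT = "UPDATE users SET jobs[1] = 'Acme Corp' WHERE user_id = 1;"

# Plain Python list mirroring the same operations.
jobs = ["Acme"]
jobs = jobs + ["Globex"]   # append to the end
jobs = ["Initech"] + jobs  # prepend to the beginning
jobs = jobs + ["Acme"]     # same company again: duplicates are allowed
jobs[1] = "Acme Corp"      # overwrite the element at an existing index

print(jobs)  # ['Initech', 'Acme Corp', 'Globex', 'Acme']
```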