2018/2/16

What is MySQL NDB Cluster?

By Jesper Krogh

MySQL is most famous for its Server product using the InnoDB storage engine for the back-end storage. However, MySQL uses an architecture that allows for pluggable storage engines, so there are other options. The NDB storage engine is the most notable.

The MySQL NDB engine dates back to the 1990s when it was first developed as the Network DataBase (NDB) by Mikael Ronström (who is still a MySQL Cluster developer) while he was with Ericsson. The focus was originally on the database requirements of the telecom industry in performing tasks such as the routing of phone calls. Ericsson later split NDB out into its own virtual company, Alzato, which MySQL acquired in 2003.

It is after that acquisition in 2003 that the pluggable storage engine API comes into play. It allowed MySQL to create an SQL front end to what otherwise could only be accessed using the C++ NDB API. Thus, MySQL NDB Cluster was born! (The actual product name is MySQL Cluster, but MySQL NDB Cluster is used here to avoid confusion with MySQL InnoDB Cluster and other clustered MySQL solutions.)

So why bother about NDB Cluster when MySQL also has the excellent InnoDB storage engine? The answer lies in the need for high availability. The telecom industry demands high availability – after all, it is not good for their business when phone calls cannot be routed. The NDB database was designed from day one to be highly available. In this context, the sense of “highly available” is extended to include stable query execution times. The solution was to make NDB a distributed (thus the network part of the name), in-memory database. Distributing the data and keeping duplicate copies of it means that a node can be offline while the data can still be read from the other nodes. With availability taken care of, querying data that is stored in-memory then provides more reliable and stable response times than reading data from disk.

Before diving further into the MySQL NDB Cluster architecture, it is worth considering the following graphical representation of a cluster:

The core of the cluster is the data nodes. They are where data is stored and the main work of queries is done. There can be up to 48 data nodes, and the most common and recommended practice is to have two replicas (or copies) of the data. Data nodes that share the same data are said to belong to the same node group.
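To make the relationship between data nodes, replicas, and node groups concrete, here is a minimal Python sketch. It is not how NDB is implemented; the function name form_node_groups and its parameters are hypothetical, and the sketch assumes node groups are filled in ascending node ID order with the same number of nodes in each group.

```python
from typing import Dict, List


def form_node_groups(data_node_ids: List[int], replicas: int = 2) -> Dict[int, List[int]]:
    """Group data nodes into node groups of `replicas` nodes each.

    Assumption (illustrative only): groups are filled in ascending
    node-ID order, and every group holds exactly `replicas` nodes.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if len(data_node_ids) % replicas != 0:
        raise ValueError("number of data nodes must be a multiple of replicas")
    ordered = sorted(data_node_ids)
    return {
        group: ordered[group * replicas:(group + 1) * replicas]
        for group in range(len(ordered) // replicas)
    }


# Four data nodes with two replicas give two node groups.
groups = form_node_groups([1, 2, 3, 4], replicas=2)
print(groups)  # {0: [1, 2], 1: [3, 4]}
```

With two replicas, the number of node groups is simply half the number of data nodes, so 48 data nodes would form 24 node groups in this model.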

There are two main ways to execute queries in MySQL NDB Cluster:

Use the MySQL command-line client, a standard MySQL connector, or API to connect to an SQL node that in turn submits the requests for data to the data nodes.
Use one of the NoSQL APIs to submit queries directly from the application to the data nodes.

Using SQL nodes is the most common way of executing queries. An SQL node is the same as an instance of MySQL Server with the NDB storage engine compiled in. The NDB storage engine provides the bridge from MySQL Server to the data nodes. This bridging means that executing queries through SQL nodes makes it transparent to the application whether you use the InnoDB storage engine or the NDB storage engine (with some exceptions, as the storage engine dependent limitations are not quite the same). All SQL nodes and other API nodes that are connected to the cluster have the same view of the data and can be used concurrently; this is because they relay the data requests to the data nodes.
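As a small illustration of that transparency, the sketch below builds two CREATE TABLE statements that differ only in the ENGINE clause. The helper create_table_sql, the table name subscriber, and its columns are hypothetical and chosen purely for illustration; the statements are only constructed as strings here, not sent to a server.

```python
def create_table_sql(table: str, engine: str) -> str:
    """Build a CREATE TABLE statement; only the ENGINE clause varies."""
    return (
        f"CREATE TABLE {table} (\n"
        "  id INT NOT NULL PRIMARY KEY,\n"
        "  name VARCHAR(64) NOT NULL\n"
        f") ENGINE={engine}"
    )


# Same table definition, stored either locally in InnoDB or in the data nodes.
innodb_ddl = create_table_sql("subscriber", "InnoDB")
ndb_ddl = create_table_sql("subscriber", "NDBCLUSTER")
print(ndb_ddl)
```

Once such a table exists, ordinary SELECT, INSERT, UPDATE, and DELETE statements issued through an SQL node look the same to the application regardless of which engine stores the rows, subject to the engine-specific limitations mentioned above.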

Finally, there are the management nodes in the cluster. The roles of the management nodes range from managing the configuration of the cluster and handling new connections (those are the “ad hoc” network links in the figure) to creating backups and handling arbitration in case of a node failure.

The following are some key features and characteristics to be aware of now that you have a sense of what NDB Cluster can deliver in terms of availability:

While the data is primarily stored in-memory, the data is persisted through checkpoints. Non-indexed data can be stored in on-disk tablespaces.
NDB Cluster is transactional, supporting the READ COMMITTED transaction isolation level.
NDB Cluster is ACID compliant, though with the assumption that two data nodes in the same node group do not crash within two seconds of each other.
The data nodes support retrieving data in parallel even for a single query.
Sharding and partitioning of the data are done automatically and are transparent to the application.
Node failures are handled automatically.
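The automatic partitioning and the handling of node failures can be pictured with a simplified Python model. This is a sketch under stated assumptions, not NDB's actual algorithm: the functions partition_for_key and cluster_is_available are hypothetical, the hash-and-modulo mapping is only a stand-in for distributing rows by a hash of the primary key, and the node group layout is an example with four data nodes and two replicas.

```python
import hashlib
from typing import Dict, List, Set


def partition_for_key(primary_key: str, partition_count: int) -> int:
    """Map a primary key to a partition with a stable hash (illustrative only)."""
    digest = hashlib.md5(primary_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % partition_count


def cluster_is_available(node_groups: Dict[int, List[int]], failed: Set[int]) -> bool:
    """All data stays reachable while every node group keeps at least one live node."""
    return all(
        any(node not in failed for node in nodes)
        for nodes in node_groups.values()
    )


# Example layout: two node groups, each holding two copies of its share of the data.
node_groups = {0: [1, 2], 1: [3, 4]}

print(partition_for_key("subscriber:42", partition_count=4))
print(cluster_is_available(node_groups, failed={2}))      # True: node 1 still serves group 0
print(cluster_is_available(node_groups, failed={2, 3}))   # True: one node left in each group
print(cluster_is_available(node_groups, failed={1, 2}))   # False: group 0 has lost both copies
```

The application never chooses a partition or a node itself; in this model, as in the cluster, a row's location follows from its key, and a single node failure leaves every row with a surviving copy.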

About the Author

Jesper Wisborg Krogh is a member of the Oracle MySQL Support team and has spoken on several occasions at Oracle OpenWorld. He holds a Ph.D. in computational chemistry and moved to working with MySQL and other software development in 2006. Jesper lives in Sydney, Australia, and enjoys spending time outdoors walking, traveling, and reading. His areas of expertise include MySQL Cluster, MySQL Enterprise Backup, and the Performance and sys schemas. He is an active author in the Oracle Knowledge Base and regularly blogs on MySQL topics.

Has this short introduction whetted your appetite to learn about MySQL NDB Cluster? I certainly hope so! You can read more in the book Pro MySQL NDB Cluster by Mikiya Okuno and yours truly.