
Upgrading the Aerospike cluster Part-2

by riflerrick

Check out part-1 of this series here.

Feature Validations with Docker


For our feature validations, the goal was to test the scenarios discussed in part-1 with docker.

For validating the new features, we built a docker image for aerospike 4.5.3.2 enterprise.

FROM aerospike/aerospike-server-enterprise:4.5.3.2

# debugging utilities (ps, netstat, telnet) plus ntp for clock sync between nodes
RUN apt-get update && apt-get install -y --no-install-recommends \
    procps \
    net-tools \
    telnet \
    ntp

# ship our own ntp configuration into the image
COPY aerospike.ntp.conf /etc/ntp.conf

# service, fabric, mesh heartbeat and info ports
EXPOSE 3000 3001 3002 3003

CMD [ "/bin/bash" ]
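
With the Dockerfile in place, the image can be built and tagged locally. A minimal sketch, assuming the Dockerfile and aerospike.ntp.conf live in the current directory, using the aerospike:4.5.3.2 tag that the run commands below refer to:

# build the custom enterprise image with the debugging tools baked in
docker image build -t aerospike:4.5.3.2 .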

For creating a cluster, we created a docker network and attached 3 different containers to that network. This way we could control which IPs docker allocated to the corresponding network interfaces, and therefore use those very IPs as the mesh seed IPs in the aerospike configurations of the 3 containers (a.k.a. the nodes of the cluster).

We also published the aerospike service port of each container to a different port on our host machine so that we could connect to any one of the nodes during the tests we ran.

docker network create aerospike-c-new-network

docker container run -it --network aerospike-c-new-network -p 3000:3000 --name aerospike-new-1 aerospike:4.5.3.2

docker container run -it --network aerospike-c-new-network -p 4000:3000 --name aerospike-new-2 aerospike:4.5.3.2

docker container run -it --network aerospike-c-new-network -p 5000:3000 --name aerospike-new-3 aerospike:4.5.3.2

Inspecting the docker network, we get its CIDR and the IPs allocated to the corresponding containers. These IPs can then be used as the mesh addresses in the aerospike configs and voila, we have an aerospike cluster.
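
A minimal sketch of that step, assuming the network created above; the IPs in the heartbeat stanza are placeholders for whatever docker actually allocates:

# list the network CIDR and the IP assigned to each attached container
docker network inspect aerospike-c-new-network

# those IPs go into the mesh heartbeat stanza of each node's
# /etc/aerospike/aerospike.conf, roughly:
#
#   heartbeat {
#       mode mesh
#       port 3002
#       mesh-seed-address-port 172.18.0.2 3002
#       mesh-seed-address-port 172.18.0.3 3002
#       interval 150
#       timeout 10
#   }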

For testing Strong Consistency, we had to simulate a network partition between the nodes. This could easily be done by disconnecting the corresponding containers from the docker network. One thing to keep in mind though is that the host still needs to be able to connect to all the containers even when one of them is detached from the cluster's docker network. For this, the containers also have to be connected to the default bridge network, which ensures that they can still talk to the host.

docker network connect bridge  aerospike-new-1

docker network connect bridge  aerospike-new-2

docker network connect bridge  aerospike-new-3
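
With that in place, a partition can be simulated by detaching a node from the cluster network and re-attaching it once the test is done. A sketch, using the third node as the victim:

# detach node 3 from the cluster network to simulate a partition
docker network disconnect aerospike-c-new-network aerospike-new-3

# ... run the strong consistency checks ...

# heal the partition
docker network connect aerospike-c-new-network aerospike-new-3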

The Migration

Migration strategies typically must be well thought through, and all options should ideally be explored so that the final solution is efficient and cost effective. Given that aerospike is a distributed data store, on the surface a migration should be smooth; a simple rolling upgrade should do the job just fine. However, as with any major version upgrade, a way back should always be kept open so that a rollback is possible in case things don't go according to plan.

In our case, we were upgrading from aerospike 3.12 to aerospike 4.5.3.2. A series of architectural changes were made between 3.12 and 4.5, and according to aerospike's documentation the upgrade required a jump version upgrade from 3.12 to 3.13 first. This meant that if we went for a rolling upgrade, we would have to do the jump version upgrade first and then the final 4.5 upgrade. Given this procedure, a rollback would be considerably hard. We therefore decided to set up a parallel cluster and incrementally shift traffic to the new cluster.

Thankfully, aerospike comes out of the box with a solution for migrating data to a parallel cluster. Aerospike's XDR, or cross-datacenter replication, provides the perfect means to ship incoming data to a new cluster. As described in the official documentation, XDR can be set up in different topologies. The most basic is an active-passive XDR, meaning data simply flows from one cluster to the other. It is important to point out that XDR only replicates incoming writes; it will not migrate data that is already present in the cluster. For that we need a backup and restore once XDR has been set up on the cluster.

Our clients are configured to use a private domain (a domain present only within our VPC) to talk to the aerospike cluster. To migrate our application stack onto the new aerospike cluster, all we really had to do was point the domain to the IPs of the new cluster and do a rolling restart of the application across our entire fleet of instances; the restart drops the old connections and the application starts talking to the new cluster. The catch is that a rolling restart is not an instantaneous process. It takes a tangible amount of time, which means that while the processes across the fleet are restarting, some machines would be pointing to the old cluster and some to the new. Both clusters would therefore have to be in sync the whole time to avoid consistency issues.

Needless to say, an active-passive XDR replicating data from the old cluster to the new would not cut it during the cutover, as we would run into consistency issues with our data. Note that we also use aerospike as a persistent store for some source-of-truth data, so we could not afford any inconsistency during the cutover.
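
As a rough sketch of that backup-and-restore step, using the asbackup/asrestore tools that ship with aerospike (host addresses, namespace name and backup path are placeholders):

# dump the namespace from a node of the old cluster
asbackup --host <old-cluster-node> --namespace <namespace> --directory /data/asbackup

# restore it into the new cluster once XDR is already shipping fresh writes
asrestore --host <new-cluster-node> --namespace <namespace> --directory /data/asbackup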

XDR also provides an active-active topology. At first it seemed like we had found our solution. However, after going through the aerospike documentation we realized that with active-active XDR, if the same key is updated in both clusters at approximately the same time, there is a possibility of data corruption. For instance, let's assume key X got updated at time t in the old cluster and the same key X got updated at time t+d in the new cluster. Unlike the conflict resolution strategies used during migrations within a cluster, XDR relies essentially on the write timestamp to decide which value for a given key should prevail. If the latency between the 2 clusters is inconsistent and flaky, it might very well happen that the value written in the old cluster at time t prevails over the later write, corrupting the data. We therefore could not take the active-active XDR approach to switch the clusters on a live system.

After brainstorming on these scenarios, we realized that the only option left was to enable active-active XDR between the old and new clusters, but take a downtime to actually switch them. This way, once the new cluster was live, we could still go back to the old cluster if things did not go according to plan.

Terraform

The parallel cluster preparation provided us with the opportunity to create the new cluster with terraform. One of the perils of creating clusters (especially data layer clusters) with terraform is that if we forget to tie all the attributes to the cluster beforehand, it is very difficult to tie them in later once the cluster is already live. Any change that would require replacement of the machines (in case they were tainted) could not be done, as terraform by default does not provide a strategy for rolling replacement of instances.

Here is a blueprint of the terraform template we had prepared:

provider "aws" {
  ...
}

resource "aws_placement_group" "aerospike_spg" {
  count    = "${var.as_pg_count}"
  ...
}

resource "aws_instance" "ec2_instance_aerospike" {
  count                  = "${var.as_cluster_nodes}"
  ...
}

resource "null_resource" "aerospike_cluster_provisioner" {
  count = "${var.as_cluster_nodes}"

  provisioner "remote-exec" {
    when = "destroy"
    ...
    on_failure = "fail"
  }
  provisioner "remote-exec" {
    ...
    on_failure = "fail"
  }
  ...
}

resource "aws_ebs_volume" "as_volume_attachment_sdg" {
  count             = "${length(var.ebs_attachment_sdg)}"
  ...
}

resource "aws_ebs_volume" "as_volume_attachment_sdh" {
  count             = "${length(var.ebs_attachment_sdh)}"
  ...
}

// attach ebs volumes
resource "aws_volume_attachment" "as_instance_vol_attachment_sdg" {
  count       = "${length(var.ebs_attachment_sdg)}"
  ...
}

resource "aws_volume_attachment" "as_instance_vol_attachment_sdh" {
  count       = "${length(aws_instance.ec2_instance_aerospike)}"
  ...
}

One important thing in the case of any data layer cluster is that it should be easy to add and remove instances from the cluster. With aerospike, adding a machine (in AP mode) at the bare minimum involves 2 steps: spinning up the instance and adding the IP address of one of the existing machines of the cluster as the mesh address on the new machine. For removing an instance however, if we simply stop the aerospike daemon on the instance to be removed, in-flight requests to that instance will fail, so the application must have a retry mechanism that can handle this scenario. With quiescing now available, it is possible to quiesce the node to be removed first and then remove it using terraform. Quiescing involves a fair bit of monitoring of the cluster state during the exercise. Unfortunately aerospike does not provide programmatic access to cluster state and other cluster functionality, so scripted quiescing is not an easy problem to solve; we would have to dive into the abyss of shell output parsing and the like to make it happen.
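
For reference, a minimal sketch of what that scripted quiescing could look like; the host addresses and namespace name are placeholders, and the polling loop and its thresholds are assumptions:

# quiesce the node that is about to be removed
asinfo -h <node-to-remove> -v 'quiesce:'

# trigger a recluster from any node so the quiesce takes effect
asinfo -h <any-cluster-node> -v 'recluster:'

# poll the namespace statistics, parsing the semicolon-separated output by
# hand, until the node reports that the quiesce is effective
while true; do
  stats=$(asinfo -h <node-to-remove> -v 'namespace/<namespace>' | tr ';' '\n')
  echo "$stats" | grep 'effective_is_quiesced\|pending_quiesce'
  echo "$stats" | grep -q 'effective_is_quiesced=true' && break
  sleep 5
done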

The on-destroy provisioners in our case simply stop the aerospike daemon, on the assumption that the corresponding node has already been quiesced. Once the daemon is stopped, the node is terminated along with all its dependencies.


Upgrading our aerospike cluster gave us insight into aerospike like never before, and an opportunity to share this insight with the community.

