Cloudify and IBM InfoSphere BigInsights

Posted By: Yoram Weinreb on September 25, 2012


Following Nati’s blog post about big data in the cloud, this post focuses on Cloudify’s integration with IBM InfoSphere BigInsights: it dives into the integration specifics and shows how to get your feet wet by running the Cloudify BigInsights recipe hands-on.

At its core, the IBM InfoSphere BigInsights product uses the Hadoop framework, with IBM improvements and additions that tailor it for enterprise customers: administrative, workflow, provisioning, and security features, along with best-in-class analytical capabilities from IBM Research.

Cloudify’s value for BigInsights-based applications:

As Nati explained in his post, applications typically consist of a set of services with inter-dependencies and relationships. BigInsights itself is a set of services, and a typical application will use some of them, plus additional home-grown or commercial services. Cloudify provides the application owner with the following benefits:

  1. Consistent Management

    1. Deployment automation
    2. Automation of post-deployment operations
    3. SLA-based monitoring and auto-scaling
  2. Cloud Enablement and Portability

Let’s dive into the actual integration and see how these line items map to the Cloudify BigInsights recipe:

Deployment automation:

When building a Cloudify recipe, we have to choose between using the existing installer and manually installing each component on each node and tying it all together. We decided to use the provided installer, both to capitalize on the existing BigInsights tool and to stay as closely aligned as possible with how IBM intended the tool to be used. The sequence of events to get to a working BigInsights service is as follows:

  1. Analyze the service and application recipe to decide on the initial cluster topology.
  2. Provision new servers or allocate existing servers (from a cloud or existing hardware in the enterprise) to satisfy the topology requirements.
  3. Prepare the cluster nodes for the BigInsights installer by fulfilling the install prerequisites and requirements, such as consistent hostname naming, passwordless SSH or passwords, software packages…
  4. Build a silent install XML file based on the actual cluster nodes and the topology.
  5. Run the installer and verify everything is working when it is done.
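
Step 4 above can be sketched in a few lines of Python. This is only an illustration of generating an install file from the actual cluster nodes: the function name and every element name (`cluster-configuration`, `master-node`, `data-nodes`, `node`) are hypothetical placeholders, not the real BigInsights silent-install schema, and the recipe itself may build the file differently.

```python
import xml.etree.ElementTree as ET


def build_silent_install_xml(master_host, data_hosts):
    """Render a cluster topology as an XML string.

    NOTE: all element names here are hypothetical placeholders,
    not the real BigInsights silent-install schema.
    """
    if not data_hosts:
        raise ValueError("at least one data node is required")

    root = ET.Element("cluster-configuration")

    # One master node, taken from the provisioned servers.
    ET.SubElement(root, "master-node").text = master_host

    # One <node> entry per provisioned data node.
    nodes = ET.SubElement(root, "data-nodes")
    for host in data_hosts:
        ET.SubElement(nodes, "node").text = host

    return ET.tostring(root, encoding="unicode")
```

Calling `build_silent_install_xml("master-1", ["data-1", "data-2"])` returns a single XML string that lists the master host and both data hosts, which could then be written to disk and handed to the installer.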

This takes care of bringing up the BigInsights cluster and letting us hook it up to the rest of the services.

Automation of post-deployment operations:

Post-deployment operations in Cloudify are handled by Cloudify’s built-in service management capabilities, such as dynamically adjusting the number of instances each service has. In addition to these generic built-in capabilities, which in the BigInsights case can be used, for example, to change the number of data nodes in the cluster, Cloudify recipes define “Custom Commands” that handle specific post-deployment operations.

In the BigInsights recipe we have custom commands that handle Hadoop operations such as adding and removing Hadoop services (Flume, HBase regions, Zookeeper…) to/from existing nodes, re-balancing the cluster, running DfsAdmin commands as well as DFS commands, all from the Cloudify console.

SLA-based monitoring and auto-scaling:

In addition to the option I mentioned earlier to manually set the number of nodes in the cluster during run-time, Cloudify monitors the application’s services and lets us define, in the recipe, SLA-driven policies that can dynamically change the cluster size and the balance between the different services based on the monitoring metrics.
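
To make the idea of an SLA-driven policy concrete, here is a minimal Python sketch of a threshold rule that maps a monitored metric to a target instance count. It is not Cloudify’s recipe syntax; the function name, parameters, and thresholds are all assumptions made for illustration.

```python
def desired_instances(current, metric_value, high, low,
                      min_instances, max_instances):
    """Return the target instance count for a simple threshold policy.

    Hypothetical illustration only: scale out by one instance when the
    metric exceeds `high`, scale in by one when it drops below `low`,
    and always stay within [min_instances, max_instances].
    """
    if low >= high:
        raise ValueError("low threshold must be below high threshold")

    if metric_value > high:
        target = current + 1
    elif metric_value < low:
        target = current - 1
    else:
        target = current

    # Clamp the result to the bounds allowed for the service.
    return max(min_instances, min(max_instances, target))
```

With thresholds of 0.8 and 0.2 and bounds of 1 to 3 instances, a metric value of 0.9 on a two-instance service yields a target of three instances, while the same reading on a three-instance service leaves it at three.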

The BigInsights recipe monitors the Hadoop service using the JMX MBeans that Hadoop exposes. The metrics we monitor can easily be changed by editing the metrics list in the master-service.groovy recipe, available in this gist:
git://gist.github.com/3507945.git
These metrics are then tied to visual widgets that are shown in the Cloudify Web-UI and can be referenced in the SLA definition.

For this version of the recipe, we decided to skip automatic scaling rules and let the user control scaling through custom commands. In Hadoop, automatic scaling, and specifically the re-balancing of the cluster that follows it, has to take into account the future workloads planned to run on the cluster. Re-balancing can also be a lengthy process that actually decreases performance until it is complete.

Cloud Enablement and Portability:

Cloudify handles cloud enablement and portability using Cloud Drivers, which abstract the cloud-specific or bare-metal-specific provisioning and management details away from the recipe. There are built-in drivers for popular clouds such as OpenStack, EC2, Rackspace and more, as well as a BYON driver to handle your bare-metal servers.

The Cloud Driver lets you define the hardware templates that will be available to your recipe, as well as your cloud credentials.

For the BigInsights recipe, we define two templates that we will later reference from the recipe. Here is the template definition for the OpenStack cloud driver:

Finally, let’s dive into a hands-on on-boarding of BigInsights in the cloud:

The recipe is located in the BigInsights App folder and the BigInsights Service folder.

Download the recipe and do the following:

  1. The recipe expects two server templates: MASTER & DATA. You will need to edit the cloud driver you will be using (under Cloudify home/tools/cli/plugins/esc/…) and add the two templates (shown above) alongside the existing SMALL_LINUX template.

Deployment automation:

  1. Copy the BigInsights recipe to the recipes folder. Verify that you have a BigInsights folder under the services and the apps folders under the Cloudify home/recipes root folder.
  2. Open the Cloudify console and bootstrap your favorite cloud (which has the two templates defined in #1).
  3. Install the default BigInsights application by running the following command (assuming the current directory is Cloudify home/bin):
    install-application -timeout 45 ../recipes/apps/hadoop-biginsights

Automation of post-deployment operations:

  1. To add data nodes manually, increase the number of dataOnDemand service instances by running the following command:
    set-instances dataOnDemand X
    (where X is a number higher than the current number of instances and bounded by the max instances count defined in the recipe; the default max is 3)
  2. To rebalance the HDFS cluster after adding data nodes, run the following command:
    invoke master rebalance
  3. To add an HBase region to one of the existing data nodes, run the following custom command:
    invoke master addNode x.x.x.x hbase (where x.x.x.x is the IP of the data node instance)
  4. You can also trigger dfs and dfsAdmin commands from the Cloudify console, for example:
    invoke master dfs -ls
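
The bounds described in step 1 can be checked before the command is issued. The Python sketch below is a hypothetical helper, not part of Cloudify or the recipe: it only builds the `set-instances` command string and rejects counts outside the allowed range, using the default maximum of 3 mentioned above.

```python
def set_instances_command(service, requested, current, max_instances=3):
    """Build a `set-instances` console command after validating bounds.

    Hypothetical helper for illustration; it does not talk to Cloudify.
    """
    if requested <= current:
        raise ValueError("requested count must be higher than the current count")
    if requested > max_instances:
        raise ValueError("requested count exceeds the recipe's max instances")
    return f"set-instances {service} {requested}"
```

For a cluster currently running two data nodes, `set_instances_command("dataOnDemand", 3, 2)` produces the string `set-instances dataOnDemand 3`, while asking for four instances raises an error because it exceeds the default maximum.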

SLA-based monitoring and auto-scaling:

  1. Open the Cloudify Web-UI and select the BigInsights application. You will see the deployment progress and can start the IBM BigInsights management UI directly from the services section of the master service.
  2. From the same Cloudify Web-UI, make sure the master service in the BigInsights application is selected. Click on the Metrics tab in the middle of the page. You will see the Hadoop metrics shown in the GUI widgets as we defined in the master-service.groovy recipe.

Here is a short video that captures the bootstrapping and deployment of BigInsights using Cloudify:

