Carrier Grade Cloud/PaaS

Posted By: Nati Shalom on March 7, 2013

Cloud is becoming the next-generation backbone for running the web

As the technology for building and running cloud infrastructure matures, it is spreading into more industries and revolutionizing how even the most conservative organizations run their entire operations.

One of the areas undergoing a transformation is carrier backbone services. For those who are not familiar, carrier backbone services include cell and network services (DHCP, DNS,…), content serving (SMS, MMS, …), activation services, CRM, call centers, etc. Running these critical carrier services in the existing environment tends to be labor intensive and proprietary. Choosing to move these applications into a more open and virtualised environment such as the cloud could yield a significant cost saving. An open cloud environment also enables carrier organisations to reduce their time to market for delivering new services.

At a time when the Telecom/Carrier market is increasingly competitive, moving to a cloud-based carrier backbone can be more than a cost-saving initiative. It can be a differentiator from the competition, and it is critical for the survival and success of the business.

That said, running carrier grade services requires special care to meet the required SLAs in terms of latency, deterministic behaviour, performance, location awareness, etc. These challenges are unique enough that they don't fit into your mom-and-pop cloud.

The purpose of the Carrier Grade Cloud and Carrier Grade PaaS is to address these gaps and challenges.

In this post I’ll try to provide a more detailed overview of what Carrier Grade Cloud and PaaS actually mean. I will use examples based on GigaSpaces’ joint work with Alcatel-Lucent, which recently launched a new product in this space named CloudBand that uses Cloudify for its Carrier PaaS layer.

What does Carrier Grade mean?
Learning from the Weather Channel experience during the Sandy super-storm

The Weather Channel’s experience during Sandy is an excellent example of the need for carrier grade services. Below is a list of some of the key statistics during Hurricane Sandy:

  • 1000% – The Weather Company’s traffic increase during Hurricane Sandy
  • 110 GB – The amount of data served every second during Sandy
  • 170,000 – Peak number of simultaneous video streams served during Sandy
  • 1 – The number of data centers that went bust during the storm
To address this demand during the storm, the Weather Channel was running from 13 data centers managed by Verizon across North America, all with load balancing between them. During the storm Verizon increased their bandwidth capacity to meet the peak demand.
This sort of traffic surge wasn’t unique to the Weather Channel.

So what can be learned from this process? What makes a service Carrier Grade?

Learning from the Weather Channel experience, we can define a Carrier Grade service as a service with the following attributes:
  • Critical to the business function
  • Designed for massive scale
  • Designed to deal with major usage spikes
  • Location sensitive
  • Designed to provide deterministic response under extreme conditions
This is obviously a fairly simplistic definition, but for the sake of this discussion I think it will suffice.

What Makes a Cloud/PaaS Carrier Grade?

As I noted earlier, various attributes make a Cloud/PaaS carrier grade. The two most important ones, IMO, are the network and multi-site deployment. Let me explain why:

The Network

One of the most important elements in a Carrier Grade environment is the ability to assert control over the network.
That includes control over:
  • Isolation
  • Bandwidth
  • Latency

Cross-Cloud/Data Center Deployments:

Another critical element of a successful Carrier environment is multi-site deployment. As seen with the Weather Channel’s use of 13 sites, multi-site deployment is important for continuous availability and scaling. Optimizing latency by serving content from a site closer to the end user also helps with the challenges of data delivery.
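
To make the latency point a bit more concrete, here is a minimal sketch of how a request router might choose a site for a user: skip any site that is down, then take the one with the lowest measured round-trip time. Everything in it (the `Site` record, the `pick_site` function, the site names and the latency figures) is hypothetical and only meant to illustrate the idea; it is not taken from CloudBand or from the Weather Channel setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str       # hypothetical site label
    healthy: bool   # False once the site has failed
    rtt_ms: float   # assumed round-trip time measured from the user's region


def pick_site(sites):
    """Return the healthy site with the lowest round-trip time."""
    candidates = [s for s in sites if s.healthy]
    if not candidates:
        raise RuntimeError("no healthy site available")
    return min(candidates, key=lambda s: s.rtt_ms)


sites = [
    Site("east-1", healthy=False, rtt_ms=12.0),  # closest, but down
    Site("east-2", healthy=True, rtt_ms=18.0),
    Site("west-1", healthy=True, rtt_ms=74.0),
]
chosen = pick_site(sites)
print(chosen.name)
```

With one site down, traffic simply moves to the next-closest healthy site, which is the behaviour that makes multi-site deployment useful for both availability and latency.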

So how are things done today?

The current Carrier backbone runs on physical appliances, which basically means lots of iron. In this environment, scaling capacity means buying more appliances. While this model works, it has two main drawbacks:

  1. Cost (infrastructure and operations)
  2. Lack of agility (i.e. it takes months, and sometimes years, to launch a new service in this environment)

Alcatel CloudBand - Carrier Grade IaaS/PaaS

Alcatel CloudBand is a new platform that lets Telecom apps easily leverage carrier cloud services.

It is comprised of a few main elements.

Multi node/site IaaS - a multi-site/Cloud infrastructure. The CloudBand infrastructure is essentially a policy-based management layer over a large number of cloud nodes. Each cloud node can run either an OpenStack or a CloudStack-based infrastructure. These nodes can live in many disparate data centers. Alcatel CloudBand glues all of the disparate nodes together into a single big cloud that is accessible through an OpenStack API.

CPaaS - stands for Carrier Grade PaaS, which is essentially the framework that enables on-boarding of carrier services into the CloudBand infrastructure via a simple click-and-run user interface. Cloudify is integrated into this as an integral part of the CloudBand offering.

CloudBand’s Unique Approach: Putting Network and Application Together

One of the unique aspects of the CloudBand architecture is its holistic approach to Network and Application. Standard cloud infrastructures tend to look at the two pieces as separate black boxes that run one on top of the other.

What does this new approach to Network and Applications really mean?

Two example scenarios that I often use to describe the value of putting network and application together are Disaster Recovery and Cloud Bursting. In today’s cloud, DR involves lots of wiring, in which I need to explicitly point one segment of the application at a particular cloud zone and another segment at another zone. Beyond the complexity of setting these zones up, it also means that a good degree of manual intervention is required to handle a recovery or a scaling process in this environment.

Taking an automated SLA-driven approach to IaaS

Instead of explicitly identifying the zones in our cloud, with automated SLA we can simply ask the cloud to figure out the right zone for the job based on our application SLA. For example, a user could simply say something like “deploy RingTone service where continuous availability=true, redundancy=3 and distance between sites=100km”. Most of the information needed to satisfy such a request is known to the CloudBand management at the time of deployment, so it can allocate machine instances based not solely on image ID and zone ID, but also on those SLA requirements.
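
As a rough illustration of what SLA-driven placement could look like, the sketch below takes a redundancy count and a minimum distance between sites and searches a site catalogue for a group that satisfies both. It is not CloudBand's API or algorithm: the `SITES` catalogue, the `place` function and the SLA field names are all assumptions made for the example, the distance in the SLA is read as a minimum, and a brute-force search is used only because it is the simplest thing that shows the idea.

```python
import math
from itertools import combinations

# Hypothetical site catalogue: name -> (latitude, longitude) in degrees.
SITES = {
    "paris": (48.8566, 2.3522),
    "versailles": (48.8049, 2.1204),
    "lyon": (45.7640, 4.8357),
    "marseille": (43.2965, 5.3698),
}


def distance_km(a, b):
    """Great-circle distance between two (lat, lon) points (haversine)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def place(sla, sites=SITES):
    """Return the first group of sites that satisfies the SLA, or None."""
    for group in combinations(sorted(sites), sla["redundancy"]):
        # Every pair of chosen sites must be at least min_distance_km apart.
        if all(distance_km(sites[x], sites[y]) >= sla["min_distance_km"]
               for x, y in combinations(group, 2)):
            return list(group)
    return None


# Assumed encoding of "redundancy=3 and distance between sites=100km".
ringtone_sla = {"redundancy": 3, "min_distance_km": 100}
placement = place(ringtone_sla)
print(placement)
```

The caller never names a zone. It states the requirement, and the placement logic rejects any combination that puts two copies too close together (here, Paris and Versailles) or returns nothing if the SLA cannot be met.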

Integrating the PaaS with the network

Many of the current PaaS solutions were designed to work with a simple cloud infrastructure.

If we design our PaaS solutions to work on top of a more intelligent infrastructure like CloudBand, one that can accept SLA-driven calls to coordinate infrastructure management, a revolution will happen. We can start offloading to the infrastructure some of the responsibility for allocating the right machine instance to a particular application tier. The infrastructure could be made aware that we’re deploying a data service and would therefore ensure that the nodes of that database don’t reside on the same physical machine, or even in the same data center. Another area where responsibility could be delegated to the infrastructure is network isolation. Instead of dealing with security groups, the system can attach a particular network to a given application, or to a tier within that application, and the infrastructure will make sure that any machine allocated for this service is attached to this network.
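
The two ideas in this section, keeping database nodes apart and attaching every machine of a tier to one network, can be sketched in a few lines. This is only a toy model under stated assumptions: `Host`, `place_data_tier` and the network name are hypothetical, and a real infrastructure would weigh capacity, load and many other constraints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    data_center: str


def place_data_tier(replicas, hosts, network):
    """Give each replica a host in a different data center, all on one network."""
    placement = {}
    used_data_centers = set()
    for host in hosts:
        if len(placement) == replicas:
            break
        if host.data_center in used_data_centers:
            continue  # anti-affinity: at most one replica per data center
        used_data_centers.add(host.data_center)
        placement[f"replica-{len(placement)}"] = {
            "host": host.name,
            "data_center": host.data_center,
            "network": network,  # every replica joins the tier's network
        }
    if len(placement) < replicas:
        raise RuntimeError("not enough distinct data centers for this tier")
    return placement


hosts = [
    Host("h1", "dc-a"),
    Host("h2", "dc-a"),  # skipped: dc-a already holds a replica
    Host("h3", "dc-b"),
    Host("h4", "dc-c"),
]
tier = place_data_tier(3, hosts, network="ringtone-data-net")
for replica, info in tier.items():
    print(replica, info)
```

The application only states that this is a three-node data tier on a given network; the spreading across data centers and the network attachment are decisions made on the infrastructure side.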

Final Words

For years there has been discussion of the missing piece in the cloud puzzle – the network. Today we’re at a point where this gap is starting to be filled by projects like Quantum in OpenStack. In addition to OpenStack, the Telecom industry is also launching a new initiative titled NFV, which stands for Network Functions Virtualization. NFV was born in October of 2012 when AT&T, BT, China Mobile, Deutsche Telekom and many other Telecom companies introduced the NFV Call to Action document. It basically aims to combine new network APIs with virtualization and thus provide a standard model for a virtualized Carrier Cloud.

While it seems that the industry is moving in the right direction toward the virtualization of the backbone systems, most of the effort seems to be focused on standardisation at the lower levels of the stack. Very little has been done to draw the real end game, i.e. what an end-to-end Carrier backbone would look like once that new virtualized infrastructure is in place. More importantly, we haven’t yet begun to think about what that infrastructure change implies for the applications and services on top of it.

This is what excites me about the CloudBand project. CloudBand doesn’t just end up as yet another fancy infrastructure piece that we don’t necessarily know what to do with. It takes a holistic approach and maps those fancy features into a real end-to-end solution. At the most basic level, that means setting up data and network clusters, disaster recovery or cloud bursting scenarios can now be fully automated, in a much simpler fashion than in most current cloud infrastructure environments.

At a more strategic level, that means carriers can now rely on the cloud as an infrastructure that can manage their backbone services, and thus leverage cloud economics to meet their cost and business challenges.