Spring cleaning our cloud saved us 80%… per month

By: Ilyas Iyoob


I’ve had an old 350z sitting in my garage collecting dust for a few years now. Every time my wife wants me to get rid of it, I remind her how much safer it is than a motorcycle. So yeah, it may be spring-cleaning time, but that 350z isn’t going anywhere anytime soon. On the other hand, I use this time to re-evaluate all my monthly bills because I’m more interested in recurring savings.

However, it isn’t fun digging through each and every bill to find out what I’m paying for. It’s even more annoying to have to call customer service to make changes, especially when both my wife and I are extreme introverts. But we still do it, and you know why? Because of the recurring savings that accumulate over time.

So, why don’t we apply these cost-saving techniques at our workplaces? Even though it’s been over two years since our startup Gravitant was acquired by IBM (best experience ever!), the cost-saving and efficiency-loving attitude still runs in our blood. That’s why I told my team at IBM that we’re going to spring-clean our cloud environment and optimize our spend for recurring savings.

Yes, we attempted to use off-the-shelf tools in the beginning, but all of them just ended up creating another bill that needed to be paid. Instead, we got our own clients to help us create a minimalistic solution that would just give us the Pareto actions — the 20 percent of changes driving 80 percent of the savings — and skip the rest of the nonsense. Here are the steps we took:

Step 1: Quarantine high-cost, low-use assets

By analyzing our bills and usage APIs, we identified a small number of expensive items that were hardly used. This included server reservations that were unallocated, virtual machines that were heavily provisioned but underutilized, as well as software subscriptions purchased from the marketplace and forgotten. We immediately quarantined most of these assets and gave the owners a 48-hour warning period prior to termination.
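The triage rule above can be sketched as a simple filter over asset records. The field names, thresholds, and sample data here are illustrative assumptions, not the actual billing API schema we used:

```python
# Hypothetical asset records, as might be assembled from billing and usage APIs.
# Field names (monthly_cost, avg_utilization) are illustrative, not a real schema.
assets = [
    {"id": "vm-prod-01",  "monthly_cost": 910.0, "avg_utilization": 0.04},
    {"id": "vm-batch-07", "monthly_cost": 120.0, "avg_utilization": 0.71},
    {"id": "sub-mkt-3",   "monthly_cost": 450.0, "avg_utilization": 0.00},
]

COST_THRESHOLD = 300.0   # "high cost": tune to your bill's distribution
USAGE_THRESHOLD = 0.10   # "low use": under 10% average utilization

def quarantine_candidates(assets):
    """Return high-cost, low-use assets to quarantine, pending a 48-hour owner warning."""
    return [a for a in assets
            if a["monthly_cost"] >= COST_THRESHOLD
            and a["avg_utilization"] < USAGE_THRESHOLD]

print([a["id"] for a in quarantine_candidates(assets)])  # → ['vm-prod-01', 'sub-mkt-3']
```

The point of the two thresholds is that neither alone is actionable: a cheap idle asset isn’t worth the owner conversation, and an expensive busy one isn’t waste.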

Step 2: Shut off abandoned assets and mark for termination

There was a small number of assets that were previously used for important projects and then simply not shut off in time. This was especially the case among our dev and test accounts. We immediately shut these assets off and marked them for termination within 30 days unless the owners reclaimed them.
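The 30-day grace period can be expressed as a small date check. This is a minimal sketch, assuming a per-asset record with an invented `reclaimed` flag and `terminate_on` date:

```python
from datetime import date, timedelta

def termination_date(shutoff_date, grace_days=30):
    """Assets are terminated 30 days after shutoff unless the owner reclaims them."""
    return shutoff_date + timedelta(days=grace_days)

def due_for_termination(asset, today):
    # Terminate only stopped, unreclaimed assets whose grace period has lapsed.
    return (asset["state"] == "stopped"
            and not asset["reclaimed"]
            and today >= asset["terminate_on"])

asset = {"id": "dev-db-2", "state": "stopped", "reclaimed": False,
         "terminate_on": termination_date(date(2024, 3, 1))}
print(due_for_termination(asset, date(2024, 4, 2)))  # → True (past the grace period)
```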

Step 3: Snapshot aged assets and archive them

Next, we turned our attention to storage assets. While many of these didn’t cost very much, we realized keeping them up on the cloud for too long would only invite more trouble. As a result, all storage assets older than a year were snapshotted and switched to a lower-availability storage class, while those over two years old were archived.
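That age-based policy amounts to a two-tier lifecycle rule. A minimal sketch, with the action names invented for illustration:

```python
from datetime import date

def storage_action(created, today):
    """Lifecycle rule from the article: snapshot and lower the tier after
    one year; archive after two."""
    age_days = (today - created).days
    if age_days > 730:       # older than ~2 years
        return "archive"
    if age_days > 365:       # older than ~1 year
        return "snapshot_and_lower_tier"
    return "keep"

today = date(2024, 4, 1)
print(storage_action(date(2023, 1, 15), today))  # → snapshot_and_lower_tier
print(storage_action(date(2021, 6, 1), today))   # → archive
```

On most public clouds the same effect can be automated natively (for example, object-storage lifecycle configurations), so a script like this is mainly useful for auditing what the policy *should* be doing.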

Step 4: Delete any dependent assets

Whenever we deleted any assets, all the other related ones would get flagged as abandoned. In this way, we cleaned up a number of Elastic IPs, storage volumes, load balancers, NAT gateways and hosted zones.
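Cascading the cleanup is a graph traversal: deleting one asset flags everything attached to it, and everything attached to those. The dependency map below is a made-up example:

```python
# Hypothetical dependency map: resource -> assets attached to it.
dependents = {
    "vm-prod-01": ["eip-12", "vol-98", "elb-front"],
    "elb-front":  ["zone-app.example.com"],
}

def flag_abandoned(deleted, dependents):
    """Walk the dependency graph so a single deletion flags every orphaned resource."""
    flagged, stack = set(), [deleted]
    while stack:
        node = stack.pop()
        for child in dependents.get(node, []):
            if child not in flagged:
                flagged.add(child)
                stack.append(child)
    return flagged

print(sorted(flag_abandoned("vm-prod-01", dependents)))
# → ['eip-12', 'elb-front', 'vol-98', 'zone-app.example.com']
```

Note the transitive hop: the hosted zone is flagged only because the load balancer it pointed at was flagged first.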

Step 5: Review and return to step 1

Just when we thought we were done, we realized some of the virtual machines that we shut off were back on. Upon further investigation, we found that auto-scaling groups in a different geographic region were triggering these virtual machines back on. So, we had to return to Step 2 to delete the auto-scaling groups and stop the chain of events. It took several iterations of pruning the cloud account before we were able to reach steady state.
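The loop itself is the important idea: keep running pruning passes until a pass removes nothing. A toy sketch of why multiple iterations were needed, with invented asset names:

```python
def prune_until_steady(assets, prune_pass):
    """Repeat pruning passes until a pass removes nothing (steady state).
    Returns the number of passes taken."""
    iterations = 0
    while True:
        removed = prune_pass(assets)
        iterations += 1
        if not removed:
            return iterations
        for a in removed:
            assets.remove(a)

# Toy pass: an auto-scaling group keeps reviving a VM, so the first pass can
# only remove the group; the next pass can finally remove the VM it protected.
def sample_pass(assets):
    if "asg-eu-1" in assets:
        return ["asg-eu-1"]
    if "vm-zombie" in assets:
        return ["vm-zombie"]
    return []

print(prune_until_steady(["asg-eu-1", "vm-zombie"], sample_pass))  # → 3
```

The third pass is the empty one that confirms steady state, which mirrors the article: you only know you’re done when a full review finds nothing left to prune.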

The result

Within a couple of months, we were able to decrease our cloud usage cost by over 80 percent while cutting 35 percent of our cloud assets, the result of 88 recommended changes. I guess we’re close enough to claim Pareto! There are still a little over 75 recommendations to follow through on, but we are already celebrating our success.

We wouldn’t be so quick to celebrate if we hadn’t performed each step in a data-driven manner, which also gave us labeled examples to train a classification machine learning model. As a result, we are now in autopilot mode: the system automatically surfaces new recommendations each day for execution. Isn’t this true Artificially Intelligent Ops?

Bring in expert help

If this sounds daunting, rest assured you’re not the only IT professional to feel that way. IBM has served many customers who don’t want to tackle this alone. Our Global Technology Services portfolio includes targeted hybrid cloud offerings to help you develop a capable and flexible cloud platform that will serve your organization’s needs.

Our cloud brokerage managed service can help to plan, buy and coordinate cloud-based services from multiple suppliers.

To learn more about hybrid cloud services, or for any questions you may have on related topics, schedule a one-on-one consultation with an IBM expert.


About The Author

Ilyas Iyoob

Distinguished Engineer, IBM

Dr. Ilyas Iyoob is a Distinguished Engineer with IBM Global Technology Services. He has pioneered the application of Operations Research to Cloud Computing. He is at the forefront of developing Cloud Analytics, which includes the algorithms for IT Supply Chain Optimization, Virtual Data Center Capacity Planning, and Automatic Scaling and Provisioning of Virtual Machines.
