How can AI and machine learning help data centers?

By: Neil Alhanti

What are AI and machine learning doing in today’s data centers?

AI and machine learning have roots in science fiction and a colorful history that stretches from antiquity to the present day. Today they are used in countless areas to make our daily lives better. In many enterprises, AI and machine learning are quickly moving into the data center as tools for solving complex problems, including prediction and optimization.

What’s deep learning?

Techopedia defines deep learning as “a collection of algorithms [that are] used to model high-level abstractions in data [through] model architectures [composed] of multiple nonlinear transformations.” Deep learning is a subset of machine learning and belongs to a family of methods based on learning representations of data. It is a specific approach to constructing and training neural networks, which are networks of simple decision-making nodes with vast potential.

An algorithm is considered “deep” if its input data passes through a series of nonlinear transformations before becoming output. Conversely, most of today’s other machine learning algorithms can be deemed “shallow,” since their input passes through only a few levels of subroutine calls.

Deep learning removes the need to identify features in the data manually. Instead, it relies on the training process to discover useful patterns in the input examples. This makes training neural networks easier and faster and helps produce better results, which advances the field of AI.
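
To make the idea of “a series of nonlinear transformations” concrete, here is a minimal sketch in plain Python. It is not taken from any particular library: the function names, the choice of tanh as the nonlinearity, and the fixed weights are all assumptions made for illustration, and nothing here is trained.

```python
import math

def dense_layer(inputs, weights, biases):
    """One nonlinear transformation: a weighted sum per unit, then tanh."""
    return [
        math.tanh(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Pass the input through every layer in order; more layers means 'deeper'."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations

# Hypothetical fixed weights, chosen only for illustration (not trained).
layers = [
    ([[0.5, -0.2], [0.1, 0.8]], [0.0, 0.1]),  # layer 1: 2 inputs -> 2 units
    ([[1.0, -1.0], [0.3, 0.3]], [0.0, 0.0]),  # layer 2: 2 units -> 2 units
    ([[0.7, 0.7]], [-0.1]),                   # layer 3: 2 units -> 1 output
]

output = forward([1.0, 2.0], layers)
print(output)
```

In this sketch, a “shallow” model would correspond to a `layers` list with a single entry, while a deep one stacks many such transformations before producing output.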

How do businesses use deep learning, AI and machine learning to boost efficiency and cut costs?

Deep learning, AI and machine learning are tools that can analyze a vast amount of data, locate patterns within that data and determine when these patterns might repeat in the future. AI and machine learning can model alternative configurations to boost uptime and resiliency, locate opportunities for preventative maintenance, and target potential cybersecurity risks.

Data centers usually feature a wealth of resources and sensors that supply their operators with real-time and historical data on overall performance and the environment. Optimizing resources and predicting and preventing downtime are important functions for AI and machine learning in data centers. By monitoring the real-time performance data used to regulate power management and system cooling, AI and machine learning can not only conserve and otherwise optimize resources, but also predict where a failure might occur in the data center. If they can locate where a failure is likely to occur, then preventative maintenance can be performed, and system downtime or an outage can be prevented.

How Google used DeepMind to optimize their data centers’ cooling ability

In the article “Smartening up: How AI and machine learning can help data centers,” Peter Judge notes that in 2014, Google data center engineer Jim Gao began using DeepMind technology as a recommendation engine. By 2016, a couple of neural networks had learned to predict future temperatures and suggest how to respond proactively. This use of AI allowed Google to optimize the cooling of its Singapore facility, reducing the cost of the site’s cooling by 40% and cutting the facility’s power usage effectiveness (PUE) overhead by 15%.
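
For readers unfamiliar with PUE, it is the ratio of a facility’s total power draw to the power drawn by its IT equipment alone, so 1.0 is the theoretical ideal. The short sketch below shows how a cut in cooling power flows through to PUE. The load figures are hypothetical, picked only to make the arithmetic easy to follow; they are not Google’s numbers.

```python
def pue(it_kw, cooling_kw, other_overhead_kw):
    """PUE = total facility power / IT equipment power (1.0 is the ideal)."""
    return (it_kw + cooling_kw + other_overhead_kw) / it_kw

# Hypothetical loads in kW, chosen only to make the arithmetic easy to follow.
IT_KW = 1000.0
COOLING_KW = 75.0
OTHER_OVERHEAD_KW = 125.0  # lighting, power conversion losses, and so on

cooling_after_kw = COOLING_KW * (1 - 0.40)  # a 40% cut in cooling power

pue_before = pue(IT_KW, COOLING_KW, OTHER_OVERHEAD_KW)
pue_after = pue(IT_KW, cooling_after_kw, OTHER_OVERHEAD_KW)

# Overhead is everything the facility draws beyond the IT load itself.
overhead_before_kw = COOLING_KW + OTHER_OVERHEAD_KW
overhead_after_kw = cooling_after_kw + OTHER_OVERHEAD_KW
overhead_reduction = 1 - overhead_after_kw / overhead_before_kw

print(f"PUE before: {pue_before:.2f}")
print(f"PUE after:  {pue_after:.2f}")
print(f"Overhead reduction: {overhead_reduction:.0%}")
```

With these made-up loads, PUE moves from 1.20 to 1.17. The change looks small because cooling is only one part of the overhead, and the overhead is small next to the IT load.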

In 2018, Google built on the approach used at its Singapore facility and created a self-driving data center cooling system in which AI controlled the data center’s operational settings under human supervision. With safety in mind, the bar was set for the automatic system to reduce the cooling bill by only 30 percent. Ultimately, the data center saw a “40% drop in the amount of energy needed to cool the facility [and achieved] the lowest [PUE] score in its history of 1.06.”

To predict how actions would affect energy consumption and determine the best choice for the future, the AI system used thousands of sensors to record a snapshot of the data center’s cooling system every five minutes. These snapshots were fed to a cloud-based AI system, which selected what it predicted to be the best course of action. This action was then forwarded to the data center, where it was verified by that center’s human operators and, if deemed safe, carried out.
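
The loop described above (take a snapshot, predict the energy cost of each candidate action, pick the cheapest, then verify it before acting) can be sketched roughly as follows. This is an illustrative outline only, not Google’s implementation: the field names, the toy energy formula standing in for the trained model, and the `is_safe` limits standing in for operator verification are all assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Snapshot:
    """One periodic reading of the cooling system (fields are hypothetical)."""
    outside_temp_c: float
    it_load_kw: float

@dataclass(frozen=True)
class Action:
    """A candidate cooling setting (fields are hypothetical)."""
    chilled_water_setpoint_c: float
    fan_speed_pct: float

def predict_cooling_kw(snapshot, action):
    """Toy stand-in for a trained model that predicts cooling power."""
    lift = max(0.0, snapshot.outside_temp_c - action.chilled_water_setpoint_c)
    chiller_kw = 0.01 * snapshot.it_load_kw * lift
    fan_kw = 0.4 * action.fan_speed_pct
    return chiller_kw + fan_kw

def is_safe(action):
    """Stand-in for the verification step; the limits here are made up."""
    return (10.0 <= action.chilled_water_setpoint_c <= 20.0
            and action.fan_speed_pct >= 40.0)

def choose_action(snapshot, candidates):
    """Return the safe candidate with the lowest predicted power, or None."""
    safe = [a for a in candidates if is_safe(a)]
    if not safe:
        return None  # nothing passed verification, so keep current settings
    return min(safe, key=lambda a: predict_cooling_kw(snapshot, a))

snapshot = Snapshot(outside_temp_c=25.0, it_load_kw=1000.0)
candidates = [
    Action(chilled_water_setpoint_c=12.0, fan_speed_pct=80.0),
    Action(chilled_water_setpoint_c=18.0, fan_speed_pct=60.0),
    Action(chilled_water_setpoint_c=24.0, fan_speed_pct=30.0),  # cheapest, but unsafe
]
best = choose_action(snapshot, candidates)
print(best)
```

Note that the third candidate has the lowest predicted power but is rejected by the safety check, which mirrors the point of the verification step: the cheapest action is only carried out if it is also deemed safe.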

Eventually, the AI learned to predict environmental changes and to take advantage of them. For example, the AI used chilly winter conditions to create colder water that reduced the energy needed for cooling the data center.

How can AI and machine learning help businesses to understand their customers?

Businesses are using AI and machine learning to analyze the vast amounts of customer information held in their data centers. If the AI or machine learning software is connected to a customer relationship management (CRM) system, it can locate and retrieve customer data that the CRM system would otherwise leave unused. Ultimately, businesses could use AI and machine learning to create strategies for generating leads, boosting customer success and reducing customer churn.

How can AI and machine learning help businesses to use prediction?

When there is a big environmental change, people can overreact or simply make the wrong decisions. Here AI can perform better than people can, maintaining stability by using logic-based predictive approaches to choose the best actions.

Temperature fluctuations among servers can waste a lot of a data center’s resources, because energy and effort go into bringing an overheated server back under control. If a server has to be taken offline temporarily, the data center’s productivity drops. Data center infrastructure management (DCIM) firms are working to remedy this problem by integrating predictive analytics with AI and machine learning.

Raw data from sensors is processed and fed into predictive modeling engines. AI and machine learning use pattern matching to regulate temperature and locate signs of refrigerant leaks. Some systems also analyze an AI system itself to identify areas for improvement.
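
As a very simple illustration of spotting an abnormal temperature reading in sensor data, the sketch below flags any reading that sits far from the average of the readings just before it. This is a basic statistical check standing in for the predictive modeling engines mentioned above, not how any DCIM product works; the function name, window size, thresholds and sample readings are all assumptions.

```python
from statistics import mean, stdev

def flag_anomalies(readings, window=5, threshold=3.0, min_sigma=0.1):
    """Return the indexes of readings that deviate from the mean of the
    preceding `window` readings by more than `threshold` standard deviations.

    `min_sigma` is a floor on the standard deviation so that a very steady
    baseline does not turn tiny fluctuations into alarms.
    """
    flagged = []
    for i in range(window, len(readings)):
        baseline = readings[i - window:i]
        sigma = max(stdev(baseline), min_sigma)
        if abs(readings[i] - mean(baseline)) / sigma > threshold:
            flagged.append(i)
    return flagged

# Hypothetical server inlet temperatures in degrees Celsius.
inlet_temps_c = [22.0, 22.1, 21.9, 22.0, 22.1, 22.0, 27.5, 22.1, 22.0]
print(flag_anomalies(inlet_temps_c))  # prints [6]
```

Only the spike at index 6 is flagged. A production system would work with many more signals and a trained model, but the principle of comparing new readings against a learned baseline is the same.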

How Nlyte uses IBM Watson® to move beyond prediction

Peter Judge notes that in 2018 the DCIM vendor Nlyte integrated its tools with IBM Watson. The goals of this collaboration were to improve AI and machine learning-based preventative maintenance and to move “beyond predictive things [and] into workloads and managing workloads.”

Judge goes on to state that AI can help reduce workload size and the risk of workload failure. He cites Amy Bennett, the North American marketing lead for IBM Watson IoT, who says AI is “a member of the data center team [who] never takes a vacation.”

What direction will AI and machine learning take in the future?

The ultimate challenge is to use AI and machine learning not only to improve the power efficiency of data center cooling, but also, through container orchestration, to move the IT loads themselves and reduce IT energy costs.

Judge references Suvojit Ghosh, the head of the Computing Infrastructure Research Centre (CIRC), who is using AI to analyze the sounds of a data center and their correlation with power consumption. Ghosh is working to create an AI that can predict when something needs to be repaired or replaced. Human operators and engineers could then receive condition reports on the data center from the AI and respond with repairs or problem solving.

In the article “How machine learning in data centers optimizes operations,” Julia Borgini argues that machine learning software is predicting issues and resolving them faster than ever before. Machine learning is an extension of the hybrid data center environment and a burgeoning arm of data center infrastructure. Borgini states, “IDC predicts that 50% of IT assets in data centers will run autonomously using embedded AI functionality by 2022.”

AI and machine learning could eventually reach the point where the system autonomously performs digital actions and assigns robots to perform physical actions, such as the day-to-day physical maintenance and operations of the data center. This futuristic ability for AI to run a data center without the need for human interaction would create a model for self-sustaining data centers.

About The Author

Neil Alhanti

Senior Writer, IBM Global Technology Services

Neil Alhanti is a content marketing writer, wordsmith and editor extraordinaire with IBM Global Technology Services. He spends his days crafting conversational communications across multiple mediums and otherwise “fighting the good fight and writing the good write.”