A self-managing data center is getting closer to reality with HPE’s announcement last week of what it claims is the first artificial intelligence (AI) predictive engine for trouble in the data center.
HPE says next year it will offer an AI recommendation engine add-on that’s designed to predict and stop storage and general infrastructure trouble before it starts. It’s one of a number of autonomous data center components that we should expect to see soon from vendors. Other AI and machine learning systems geared towards data centers will be available from companies such as Litbit (which I wrote about in the summer) and Oracle, among others.
“Infrastructure solutions should utilize data science and machine learning,” HPE says in a white paper in which it attempts to explain why AI and machine learning are better at preventing downtime than humans.
Currently, IT workers have to constantly carry out “intricate forensic work to unravel the maze of issues that impact data delivery to applications.” That creates a bottleneck, HPE says.
However, the company says that through a form of machine learning, poorly performing components can be identified automatically. That can be done without any traditional human guesswork. It can happen early on, too, before users perceive any kind of problem. Essentially, it’s accomplished by collecting massive amounts of data from throughout the IT infrastructure stack and then analyzing it.
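To make the idea concrete, here is a minimal sketch of flagging an underperforming component from collected telemetry. It is not HPE's method, which the white paper does not spell out. It assumes a simple statistical test (a z-score against the fleet average), and the function name, the component names and the latency figures are all made up for illustration.

```python
from statistics import mean, stdev


def flag_outliers(latency_ms, threshold=1.5):
    """Return the names of components whose latency sits more than
    `threshold` sample standard deviations above the fleet mean.

    `latency_ms` maps a (hypothetical) component name to its average latency.
    """
    values = list(latency_ms.values())
    if len(values) < 2:
        return []  # not enough data to estimate spread
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # every component behaves identically
    return sorted(
        name for name, value in latency_ms.items()
        if (value - mu) / sigma > threshold
    )


# Illustrative telemetry: five healthy disks and one slow one.
telemetry = {
    "disk-a": 5.0, "disk-b": 5.2, "disk-c": 4.9,
    "disk-d": 5.1, "disk-e": 5.0, "disk-f": 20.0,
}
print(flag_outliers(telemetry))
```

A production system would presumably look at many more signals over time, but the principle is the same: the slow component stands out from the data before anyone files a ticket.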
How HPE's self-managing solution works
The idea is to “detect and rapidly identify the root cause” and then to “resolve the problem through data collection.” Signatures are then built to identify other users, elements or customers that might be affected. Rules are then developed to trigger a fix, which can be automated.
Further, if one user does experience a failure, the AI and machine learning solution, with its new signatures and rules, can quickly intervene across the entire system and stop others from inheriting the same issue. Future software updates are optimized based on what’s learned through that AI.
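The signature-and-rule approach described above can be sketched in a few lines. This is only an illustration under stated assumptions, not HPE's implementation: the `Signature` class, the helper functions, the configuration fields (`firmware`, `hba_driver`) and the remedy text are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Signature:
    """A known problem: a test that recognizes it, plus a rule for fixing it."""
    name: str
    matches: Callable[[Dict[str, str]], bool]  # does this system fit the pattern?
    remedy: str                                # action the rule would trigger


def affected_systems(signature: Signature, fleet: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the IDs of systems whose configuration matches the signature."""
    return [sid for sid, config in fleet.items() if signature.matches(config)]


def plan_remediation(signatures: List[Signature],
                     fleet: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each affected system to the remedies its matching signatures call for."""
    plan: Dict[str, List[str]] = {}
    for signature in signatures:
        for sid in affected_systems(signature, fleet):
            plan.setdefault(sid, []).append(signature.remedy)
    return plan


# Hypothetical fleet: a failure on one system produced a signature that is
# then checked against every other system.
fleet = {
    "site-1": {"firmware": "1.2", "hba_driver": "x"},
    "site-2": {"firmware": "1.3", "hba_driver": "x"},
    "site-3": {"firmware": "1.2", "hba_driver": "x"},
}
bad_combo = Signature(
    name="firmware 1.2 with driver x",
    matches=lambda c: c.get("firmware") == "1.2" and c.get("hba_driver") == "x",
    remedy="upgrade firmware",
)
print(plan_remediation([bad_combo], fleet))
```

Once a signature exists, checking it against every other system is cheap, which is how one customer's failure can become everyone else's preventive fix.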
HPE got to where it is with its AI offering partly through its purchase earlier this year of flash storage and predictive analysis company Nimble Storage. Nimble has been collecting telemetry and applying data science to it for a decade, HPE says. Nimble has, in fact, analyzed over 12,000 cases of app-gap. That’s the moniker HPE uses for the productivity-reducing bottleneck between application and data. Issues, in other words.
To accomplish downtime reduction, one needs full analysis of the entire IT stack, HPE says.
Through that, downtime can be predicted, the company claims. The causes of slowing infrastructure will be identified and then prevented with AI, as opposed to merely being monitored by humans and flagged as potential trouble.
And “prescriptive resolution” should be employed if the engine can’t prevent a failure. That means that the engine should be able to fix the problem if it occurs. It should do that by knowing the root cause predictively and analytically, rather than through traditional, manual troubleshooting with tools such as web-based forum lookups.
Self-managing systems reduce staffing levels
Finally, HPE is really serious about autonomy. With this technology, staffing levels could conceivably drop. HPE says eliminating front-line tech support staff, who are often simply collecting information and documenting issues, brings the autonomous data center closer to becoming a reality. (The AI engine knows there’s a problem, so you don’t need anyone fielding calls.)
“For the small percentage of problems that require the need to talk to an engineer, a customer can immediately reach a level three engineer,” HPE says. Levels one and two are eliminated.