
Want to use AI and machine learning? You need the right infrastructure

Dec 21, 2018 · 7 mins
Artificial Intelligence, Data Center, Machine Learning

IT is being tasked with supporting artificial intelligence and machine learning initiatives, and that requires thinking broadly about infrastructure needs today and tomorrow.


Artificial intelligence (AI) and machine learning (ML) are emerging fields that will transform businesses faster than ever before. In the digital era, success will be based on using analytics to discover key insights locked in the massive volume of data being generated today.

In the past, these insights were discovered using manually intensive analytic methods. Today, that doesn’t work, because both the volume and the complexity of data continue to grow. AI and ML are the latest tools for data scientists, enabling them to refine the data into value faster.

Data explosion necessitates AI and ML

Historically, businesses operated with a small set of data generated from large systems of record. Today’s environment is completely different: orders of magnitude more devices and systems generate their own data that can be used in the analysis. The challenge for businesses is that there is far too much data to be analyzed manually. The only way to compete in an increasingly digital world is to use AI and ML.

AI and ML use cases vary by vertical

AI and ML apply across all verticals, although there is no universal “killer application.” Instead there are a number of “deadly” use cases that apply to various industries. Common use cases include:

  • Healthcare – Anomaly detection to diagnose MRI scans faster
  • Automotive – Classification is used to identify objects in the roadway
  • Retail – Predictions can accurately forecast future sales
  • Contact center – Translation enables agents to converse with people in different languages

The right infrastructure, quality data needed

Regardless of use case, AI/ML success depends on making the right infrastructure choice, which requires understanding the role of data. AI and ML success is largely based on the quality of data fed into the systems. There’s an axiom in the AI industry stating that “bad data leads to bad inferences,” meaning businesses should pay particular attention to how they manage their data. One could extend the axiom to “good data leads to good inferences,” highlighting the need for the right type of infrastructure to ensure the data is “good.”

Data plays a key role in every use case of AI, although the type of data used can vary. For example, innovation can be fueled by having machine learning find insights in the large data lakes being generated by businesses. In fact, it’s possible for businesses to cultivate new thinking inside their organization based on data sciences. The key is to understand the role data plays at every step in the AI/ML workflow. 

AI/ML workflows have the following components:

  • Data collection: Data aggregation, data preparation, data transformation and storage
  • Data science/engineering: Data analysis, data processing, security and governance
  • Training: Model development, validation and data classification
  • Deployment: Execution and inferencing

One of the most significant challenges with data is building a data pipeline in real time. Data scientists who conduct exploratory and discovery work with new data sources need to collect, prepare, model and infer. Therefore, IT requirements change during each phase and as more data is gathered from more sources.

It’s also important to note that the workflow is an iterative cycle in which the output of the deployment phase becomes an input to data collection and improves the model. The success of moving data through these phases depends largely on having the right infrastructure.
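The iterative cycle described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a real ML pipeline: the function names (`collect`, `prepare`, `train`, `deploy`, `run_cycles`) are hypothetical, the "model" is simply the mean of the prepared data, and "inference" just flags inputs above that mean. The point is only the shape of the loop, in which deployment output flows back into data collection.

```python
# Toy sketch of the four workflow phases; all names and logic are
# illustrative assumptions, not part of any real framework.

def collect(raw, feedback):
    """Data collection: aggregate new raw records with deployment feedback."""
    return list(raw) + list(feedback)

def prepare(records):
    """Data science/engineering: basic cleaning (drop missing values)."""
    return [r for r in records if r is not None]

def train(records):
    """Training: the stand-in 'model' is the mean of the prepared data."""
    return sum(records) / len(records)

def deploy(model, live_inputs):
    """Deployment: 'inference' flags live inputs above the model's mean."""
    return [x for x in live_inputs if x > model]

def run_cycles(raw_batches, live_inputs):
    """Run one workflow pass per batch, feeding deployment output back in."""
    feedback = []
    models = []
    for raw in raw_batches:
        records = prepare(collect(raw, feedback))
        model = train(records)
        feedback = deploy(model, live_inputs)  # becomes input to the next pass
        models.append(model)
    return models

if __name__ == "__main__":
    print(run_cycles([[1, 2, None, 3], [4, 5]], [1, 5, 9]))
```

Each pass retrains on the new batch plus whatever the previous deployment flagged, which is why the infrastructure behind every phase, not just training, affects how quickly the model improves.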

Key considerations for infrastructure that supports AI and ML

  • Location: AI and ML initiatives are not conducted solely in the cloud, nor are they handled solely on premises. These initiatives should be executed in the location that makes the most sense given the output. For example, a facial recognition system at an airport should conduct the analysis locally, as the time taken to send the information to the cloud and back adds too much latency to the process. It’s critical to ensure that infrastructure is deployed in the cloud, in the on-premises data center, and at the edge so the performance of AI initiatives is optimized.
  • Breadth of high-performance infrastructure: As mentioned earlier, AI performance is highly dependent on the underlying infrastructure. For example, graphics processing units (GPUs) can accelerate deep learning by 100 times compared to traditional central processing units (CPUs). Underpowering the server will cause delays in the process, while overpowering wastes money. Whether the strategy is end-to-end or best-of-breed, ensure the compute hardware has the right mix of processing capabilities and high-speed storage. This requires choosing a vendor that has a broad portfolio that can address any phase in the AI process.
  • Validated design: Infrastructure is clearly important, but so is the software that runs on it. Once the software is installed, it can take several months to tune and optimize to fit the underlying hardware. Choose a vendor that has pre-installed the software and has a validated design in order to shorten the deployment time and ensure the performance is optimized.
  • Extension of the data center: AI infrastructure does not live in isolation and should be considered an extension of the current data center. Ideally, businesses should look for a solution that can be managed with their existing tools.
  • End-to-end management: There’s no single “AI in a box” that can be dropped in and turned on to begin the AI process. It’s composed of several moving parts, including servers, storage, networks, and software, with multiple choices at each position. The best solution would be a holistic one that includes all or at least most of the components that could be managed through a single interface.
  • Network infrastructure: When deploying AI, an emphasis is put on GPU-enabled servers, flash storage, and other compute infrastructure. This makes sense, as AI is very processor and storage intensive. However, the storage systems and servers must be fed data that traverses a network. Infrastructure for AI should be considered a “three-legged stool” where the legs are the network, servers, and storage. Each leg must be fast enough to keep up with the others. A lag in any one of these components can impair performance. The same level of due diligence given to servers and storage should be given to the network.
  • Security: AI often involves extremely sensitive data such as patient records, financial information, and personal data. Having this data breached could be disastrous for the organization. Also, the infusion of bad data could cause the AI system to make incorrect inferences, leading to flawed decisions. The AI infrastructure must be secured from end to end with state-of-the-art technology.
  • Professional services: Although services are not technically considered infrastructure, they should be part of the infrastructure decision. Most organizations, particularly inexperienced ones, won’t have the necessary skills in house to make AI successful. A services partner can deliver the necessary training, advisory, implementation, and optimization services across the AI lifecycle and should be a core component of the deployment.
  • Broad ecosystem: No single AI vendor can provide all technology everywhere. It’s crucial to use a vendor that has a broad ecosystem and can bring together all of the components of AI to deliver a full, turnkey, end-to-end solution. Having to cobble together the components will likely lead to delays and even failures. Choosing a vendor with a strong ecosystem provides a fast path to success.
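
Two of the considerations above, location and the “three-legged stool,” come down to simple arithmetic. The Python sketch below is a back-of-the-envelope model, not a sizing tool: the function names are hypothetical and every number in the usage example is a made-up placeholder. It assumes the sustained data rate of a pipeline is capped by its slowest leg, and that response time is the network round trip plus inference time.

```python
# Back-of-the-envelope sketch; function names and all figures are
# illustrative assumptions, not measurements or vendor data.

def effective_throughput(rates_gbps):
    """The pipeline can move data only as fast as its slowest component."""
    return min(rates_gbps.values())

def bottleneck(rates_gbps):
    """Name of the component that limits the pipeline."""
    return min(rates_gbps, key=rates_gbps.get)

def response_time_ms(network_round_trip_ms, inference_ms):
    """Total time to answer one request: network round trip plus inference."""
    return network_round_trip_ms + inference_ms

if __name__ == "__main__":
    # Hypothetical sustained rates, in Gbit/s, for the three legs.
    legs = {"network": 10.0, "storage": 40.0, "servers": 25.0}
    print(bottleneck(legs), effective_throughput(legs))

    # Hypothetical round-trip times for the same 20 ms inference step.
    print("cloud:", response_time_ms(80, 20), "ms")
    print("edge:", response_time_ms(2, 20), "ms")
```

With these placeholder figures, faster storage and servers do nothing until the network leg is upgraded, and moving inference to the edge removes most of the response time without touching the model.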

Historically, AI and ML projects have been run by data science specialists, but that responsibility is quickly shifting to IT professionals as these technologies move into the mainstream. As this transition happens and AI initiatives become more widespread, IT organizations should think more broadly about the infrastructure that enables AI. Instead of purchasing servers, network infrastructure, and other components for specific projects, the goal should be to plan for the business’s needs both today and tomorrow, similar to the way data centers are run today.


Zeus Kerravala is the founder and principal analyst with ZK Research, and provides a mix of tactical advice to help his clients in the current business climate and long-term strategic advice. Kerravala provides research and advice to end-user IT and network managers, vendors of IT hardware, software and services and the financial community looking to invest in the companies that he covers.
