Microsoft just released an open-source beta of the Microsoft Cognitive Toolkit on GitHub. This announcement represents a shift in Microsoft’s customer focus from research to implementation. It is an update to the Computational Network Toolkit (CNTK). The toolkit is a supervised machine learning system in the same category as other open-source projects such as TensorFlow, Caffe and Torch.
Microsoft is one of the leading investors in and contributors to the open machine learning software and research community. A glance at the Neural Information Processing Systems (NIPS) conference reveals that there are just four major technology companies committed to moving the field of neural networks forward: Microsoft, Google, Facebook and IBM.
This announcement signals Microsoft’s interest in bringing machine learning into the mainstream. The open source license reflects Microsoft’s continued collaboration with the machine learning community.
Microsoft’s shift from supporting the research community to enabling customers to use machine learning in new developments is timely. In just a few years, machine learning and neural networks have moved from obscurity, when few artificial intelligence (AI) practitioners believed they were useful, to the mainstream, where they are deployed or in development at many companies and institutions.
Xuedong Huang, Ph.D., distinguished engineer and chief scientist of speech R&D at Microsoft, briefed me on the development. Huang joined Microsoft from Carnegie Mellon University in 1993, bringing speech recognition research to the company’s research group.
Describing the shift from research to implementation, Huang highlighted several important features of the Cognitive Toolkit:
* Optimized for the Azure Cloud: The Cognitive Toolkit has been optimized to run across clusters on the Azure N platform powered by Nvidia GPUs, currently in preview. Nvidia has invested in research and development to optimize its GPU platforms for machine learning, and its GPUs are widely used in the field. Microsoft has optimized the Cognitive Toolkit to run large models on multiple GPUs and across multiple servers, a very important consideration for companies developing and implementing new applications intended to support millions of users. Supporting infrastructure such as this matters because only a handful of companies have the resources to optimize machine learning infrastructure themselves.
* Machine learning made accessible to data scientists and software developers: Huang said there are more than 20 models built with the Cognitive Toolkit, including customized speech recognition, made available through application programming interfaces (APIs) that developers can use to build apps on these models.
The number of people capable of developing machine learning models is much smaller than the number of software developers and data scientists who can apply them to a wide spectrum of problems. Few companies are prepared to implement machine learning if required to staff a team capable of building models from scratch. Huang’s approach is intended to facilitate applications that use models, such as computer vision, emotion recognition and language understanding, at companies that do not have large machine learning research staffs.
* Accessibility to different types of developers: Huang said Microsoft added support for both the Python and C++ programming languages. This serves two kinds of developers outside of Microsoft’s large base of C# developers. Python is widely used by web developers and is the programming lingua franca for STEM-trained professionals who work in diverse fields, from sociology to biopharma. Support for Python broadens the reach of the Cognitive Toolkit and Microsoft’s prepackaged machine learning models beyond traditional enterprise developers.
C++ support serves open source developers who have chosen the Cognitive Toolkit to build large, application-specific machine learning systems but who cannot rely on Microsoft to optimize either the training of the models they build or the execution of those models, commonly referred to as the inference stage.
Huang mentioned a few other interesting developments. Reading between the lines, I have added some interpretation to his comments.
Huang spoke about very favorable benchmarks of the Cognitive Toolkit, demonstrating multi-GPU performance in comparison to TensorFlow, Caffe and Torch. The benchmarks were run by Hong Kong Baptist University, which provided the data below.
Benchmarks are application-specific and should be read with “your mileage may vary.” Machine learning benchmarks are more revealing at the inference stage than at the learning stage. Inference becomes an issue when deploying very large models to millions of users in a production system, where the cost of execution can be very high, whereas learning is a scientific computing problem with less critical hardware cost constraints.
Nvidia’s GPUs delivered the horsepower to advance academic research into commercial development. Microsoft’s attention to performance is very reassuring because the underlying GPU hardware is still in development and will be for years to come. Advancements in machine learning will require hardware that is orders of magnitude faster than what is available today. Attention to the performance of parallel GPU execution will be critical to the commercial deployment of machine learning applications.
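To make "parallel GPU execution" a little more concrete, here is a minimal sketch of data parallelism, one common way to spread training across several devices. It is not taken from the Cognitive Toolkit and nothing in the briefing says the toolkit works this way; the function names, the toy one-parameter model and the two "shards" standing in for GPUs are all assumptions chosen for illustration. Each worker computes a gradient on its own slice of the data, the gradients are averaged, and every worker applies the same update.

```python
# Illustrative only: plain Python standing in for work that real
# systems would run on separate GPUs. All names are hypothetical.

def gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_step(w, shards, lr):
    """One training step: per-shard gradients, then an average, then an update."""
    grads = [gradient(w, shard) for shard in shards]  # one per "GPU"
    averaged = sum(grads) / len(grads)                # stands in for an all-reduce
    return w - lr * averaged


# Toy data generated from y = 3x, split evenly across two workers.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[:4], data[4:]]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards, lr=0.01)

print(round(w, 6))
```

With equally sized shards, the averaged gradient equals the gradient a single worker would compute over all the data, so the result is the same while the per-step work is divided. In practice the cost of exchanging gradients between devices is what makes this hard to scale, which is why attention to multi-GPU performance matters.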
Including humans in the training loop is under development. Huang’s description seemed similar to the work Facebook is doing building bots for Messenger. Often, the sample data to train machine learning models is incomplete. Building applications that learn through interaction with humans at the inference stage captures new sample data that increases the accuracy of the model over time.
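The idea of learning from people at the inference stage can be sketched in a few lines. This is a toy illustration, not Microsoft's or Facebook's design: the `WordCountClassifier` class, the `answer_with_human_fallback` function, the confidence threshold and the `ask_human` callback are all hypothetical names invented for this example. When the model is unsure, the question goes to a person, and the person's answer becomes a new training sample.

```python
from collections import Counter, defaultdict


class WordCountClassifier:
    """Tiny bag-of-words classifier, used here only as a stand-in model."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count

    def learn(self, text, label):
        """Add one labeled sample to the model."""
        self.word_counts[label].update(text.lower().split())

    def predict(self, text):
        """Return (label, confidence); confidence is the winner's share of the score."""
        words = text.lower().split()
        scores = {
            label: sum(counts[word] for word in words)
            for label, counts in self.word_counts.items()
        }
        total = sum(scores.values())
        if total == 0:
            return None, 0.0  # nothing in the text has been seen before
        best = max(scores, key=scores.get)
        return best, scores[best] / total


def answer_with_human_fallback(model, text, ask_human, threshold=0.75):
    """Use the model when it is confident; otherwise ask a person and learn from them."""
    label, confidence = model.predict(text)
    if confidence < threshold:
        label = ask_human(text)   # a person supplies the answer
        model.learn(text, label)  # the answer becomes new training data
    return label


model = WordCountClassifier()
model.learn("book a flight to paris", "travel")
model.learn("what is the weather today", "weather")

# A lambda stands in for the human here; a real system would prompt a person.
print(answer_with_human_fallback(model, "will it rain tomorrow", lambda t: "weather"))
print(model.predict("will it rain tomorrow"))
```

The first call falls back to the human because none of the words have been seen before. After that, the same question is answered by the model alone, which is the sense in which interaction fills gaps in incomplete sample data.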
Huang also offered the following dialog, demonstrating highly accurate speech recognition built using the Cognitive Toolkit.
It is an interesting demonstration of accuracy because, Huang said, few speech recognition models perform well with children’s voices.
This announcement is more than a point release of a toolkit. It is the recognition of AI and machine learning as the next big platform after mobile.