
Conventional computer vision coupled with deep learning makes AI better

Nov 29, 2017 | 5 mins
Artificial Intelligence, Internet of Things

Machine learning is driving a revolution in vision-based IoT applications, but new research combining classic computer vision with deep learning shows significantly better results.

Credit: Thinkstock

Computer vision is fundamental for a broad set of Internet of Things (IoT) applications. Household monitoring systems use cameras to provide family members with a view of what’s going on at home. Robots and drones use vision processing to map their environment and avoid obstacles in flight. Augmented reality glasses use computer vision to overlay important information on the user’s view, and cars stitch images from multiple cameras mounted in the vehicle to provide drivers with a surround or “bird’s eye” view which helps prevent collisions. The list goes on.

Over the years, exponential improvements in device capabilities including computing power, memory capacity, power consumption, image sensor resolution, and optics have improved the performance and cost-effectiveness of computer vision in IoT applications. This has been accompanied by the development and refinement of sophisticated software algorithms for tasks such as face detection and recognition, object detection and classification, and simultaneous localization and mapping.

The rise and challenges of machine learning

More recently, advancements in artificial intelligence (AI) – particularly in deep learning – have further accelerated the proliferation of vision-based applications in the IoT. Compared to traditional computer vision techniques, deep learning provides IoT developers with greater accuracy in tasks such as object classification. Since neural networks used in deep learning are “trained” rather than “programmed,” applications using this approach are often easier to develop and take better advantage of the enormous amount of imaging and video data available in today’s systems. Deep learning also provides superior versatility because neural network research and frameworks can be re-utilized across a larger variety of use cases compared to computer vision algorithms, which tend to be more purpose-specific.

But the benefits delivered by deep learning don’t come without trade-offs and challenges. Deep learning requires an enormous amount of computing resources, for both the training and inference stages. Recent research shows a tight relationship between the compute power required by different deep learning models and their accuracy: going from 75% to 80% accuracy in a vision-based application can require billions of additional math operations.

Vision processing results using deep learning are also dependent on image resolution. Achieving adequate performance in object classification, for example, requires high resolution images or video – with the consequent increase in the amount of data that needs to be processed, stored, and transferred. Image resolution is especially important for applications in which it is necessary to detect and classify objects in the distance – for instance, enterprise security cameras.

Mixing computer vision with machine learning for better performance

There are clear compromises between traditional computer vision and deep learning-based approaches. Classic computer vision algorithms are mature, proven, and optimized for performance and power efficiency, while deep learning offers greater accuracy and versatility – but demands large amounts of computing resources.

Those looking to implement high performance systems quickly are finding that hybrid approaches, which combine traditional computer vision and deep learning, can offer the best of both worlds. For example, in a security camera, a computer vision algorithm can efficiently detect faces or moving objects in the scene. Then, a smaller segment of the image where the face or object was detected is processed through deep learning for identity verification or object classification – saving significant computing resources compared to using deep learning over the entire scene, on every frame.
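The detect-then-classify pipeline described above can be sketched in a few lines. This is a minimal, illustrative toy, not Qualcomm’s implementation: simple frame differencing stands in for the classic computer vision stage, and `classify_crop` is a hypothetical stub where a real system would run a neural network. The point it demonstrates is the compute saving: the expensive stage only ever sees the small region the cheap stage flagged.

```python
# Hypothetical hybrid pipeline sketch (stdlib only). Frames are 2D lists of
# grayscale pixel values. The cheap "classic CV" stage finds the bounding box
# of pixels that changed between frames; only that crop is handed to the
# expensive "deep learning" stage, stubbed here as classify_crop.

def detect_motion_roi(prev_frame, frame, threshold=30):
    """Classic-CV stage: bounding box (r0, c0, r1, c1) of changed pixels."""
    rows = [r for r in range(len(frame))
            if any(abs(frame[r][c] - prev_frame[r][c]) > threshold
                   for c in range(len(frame[0])))]
    cols = [c for c in range(len(frame[0]))
            if any(abs(frame[r][c] - prev_frame[r][c]) > threshold
                   for r in range(len(frame)))]
    if not rows or not cols:
        return None
    return (min(rows), min(cols), max(rows) + 1, max(cols) + 1)

def classify_crop(crop):
    """Stand-in for the deep learning stage; a real system would run a CNN here."""
    return "object" if sum(sum(row) for row in crop) > 0 else "background"

def hybrid_process(prev_frame, frame):
    """Run the cheap detector; invoke the expensive classifier only on the ROI."""
    roi = detect_motion_roi(prev_frame, frame)
    if roi is None:
        return None, 0  # nothing changed: the deep learning stage never runs
    r0, c0, r1, c1 = roi
    crop = [row[c0:c1] for row in frame[r0:r1]]
    pixels_processed = (r1 - r0) * (c1 - c0)
    return classify_crop(crop), pixels_processed

# 8x8 frames; an "object" appears in a 2x2 patch of the second frame.
prev = [[0] * 8 for _ in range(8)]
curr = [row[:] for row in prev]
curr[3][4] = curr[3][5] = curr[4][4] = curr[4][5] = 255

label, pixels = hybrid_process(prev, curr)
print(label, pixels)  # the classifier sees 4 pixels instead of all 64
```

In this toy case the classifier processes 4 of 64 pixels; at security-camera resolutions the same idea is what turns per-frame deep learning over the whole scene into inference over a small crop.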

At the Embedded Vision Europe conference in October, I presented a hybrid vision processing implementation from Qualcomm Technologies, which combines computer vision and deep learning. The hybrid approach delivers a 130X-1,000X reduction in multiply-accumulate operations and about 10X improvement in frame rates compared to a pure deep learning solution. Furthermore, the hybrid implementation uses about half of the memory bandwidth and requires significantly lower CPU resources. This is a significant performance advantage for manufacturers and developers choosing to implement this strategy.

Making best use of edge computing

Just like pure deep learning, hybrid approaches to vision processing take great advantage of the heterogeneous computing capabilities available at the edge. A heterogeneous compute architecture improves vision processing performance and power efficiency by assigning each workload to the most efficient compute engine. Test implementations show a 10X reduction in object detection latency when deep learning inferences are executed on a DSP rather than a CPU.

Running algorithms and neural network inferences on the IoT device itself also helps lower latency and bandwidth requirements compared to cloud-based implementations. Edge computing can also reduce costs by cutting cloud storage and processing requirements – while protecting user privacy and security by avoiding transmission of sensitive or identifiable data over the network.

Deep learning innovations, together with hybrid techniques that combine them with traditional algorithms, are driving exciting breakthroughs for the IoT. Vision processing is just the start: the same principles can be applied to other areas, such as audio analytics. As devices on the edge get smarter and more capable, innovators can build products and applications that were never possible before. These are truly exciting times for the IoT.


Raj Talluri serves as senior vice president of product management for Qualcomm Technologies, Inc. (QTI), where he is currently responsible for managing QTI’s Internet of Everything (IoE), mobile computing and Qualcomm Snapdragon Sense ID 3D fingerprint technology businesses.

Prior to this role, Talluri was responsible for product management of Qualcomm Snapdragon application processor technologies. He has more than 20 years of experience spanning across business management, strategic marketing and engineering management.

Talluri began his career at Texas Instruments (TI), working on media processing in their corporate research labs. During that time, Talluri started multiple new businesses in digital consumer electronics and wireless technologies. He also served as general manager of the imaging and audio business for five years, where he led the development of successful digital signal processing technologies for various consumer electronics devices. Later, Talluri was named general manager of the cellular media solution business in TI’s wireless terminals business unit. In this role, he led the successful launch of TI’s OMAP3 and OMAP4 application processor platform for smartphones.

Talluri holds a Ph.D. in electrical engineering from the University of Texas at Austin. He also holds a Master of Engineering from Anna University in Chennai, India, and a Bachelor of Engineering from Andhra University in Waltair, India.

Talluri has published more than 35 journal articles, papers, and book chapters in many leading electrical engineering publications. He has been granted 13 U.S. patents for image processing, video compression and media processor architectures.

Raj Talluri was chosen as No. 5 in Fast Company’s list of 100 Most Creative People in business in 2014.

The opinions expressed in this blog are those of Raj Talluri and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.
