Computer vision is fundamental for a broad set of Internet of Things (IoT) applications. Household monitoring systems use cameras to provide family members with a view of what’s going on at home. Robots and drones use vision processing to map their environment and avoid obstacles in flight. Augmented reality glasses use computer vision to overlay important information on the user’s view, and cars stitch images from multiple cameras mounted in the vehicle to provide drivers with a surround or “bird’s eye” view, which helps prevent collisions. The list goes on.

Over the years, exponential improvements in device capabilities – including computing power, memory capacity, power consumption, image sensor resolution, and optics – have improved the performance and cost-effectiveness of computer vision in IoT applications. This has been accompanied by the development and refinement of sophisticated software algorithms for tasks such as face detection and recognition, object detection and classification, and simultaneous localization and mapping.

The rise and challenges of machine learning

More recently, advancements in artificial intelligence (AI) – particularly in deep learning – have further accelerated the proliferation of vision-based applications in the IoT. Compared to traditional computer vision techniques, deep learning provides IoT developers with greater accuracy in tasks such as object classification. Since the neural networks used in deep learning are “trained” rather than “programmed,” applications using this approach are often easier to develop and take better advantage of the enormous amount of imaging and video data available in today’s systems.
Deep learning also provides superior versatility, because neural network research and frameworks can be reused across a wider variety of use cases than computer vision algorithms, which tend to be more purpose-specific.

But the benefits delivered by deep learning don’t come without trade-offs and challenges. Deep learning requires an enormous amount of computing resources, in both the training and inference stages. Recent research shows a tight relationship between the compute power a deep learning model requires and the accuracy it achieves: going from 75% to 80% accuracy in a vision-based application can require billions of additional math operations.

Vision processing results using deep learning also depend on image resolution. Achieving adequate performance in object classification, for example, requires high-resolution images or video – with a consequent increase in the amount of data that needs to be processed, stored, and transferred. Image resolution is especially important for applications that must detect and classify objects in the distance – for instance, enterprise security cameras.

Mixing computer vision with machine learning for better performance

There are clear trade-offs between traditional computer vision and deep learning-based approaches. Classic computer vision algorithms are mature, proven, and optimized for performance and power efficiency, while deep learning offers greater accuracy and versatility – but demands large amounts of computing resources.

Those looking to implement high-performance systems quickly are finding that hybrid approaches, which combine traditional computer vision and deep learning, can offer the best of both worlds. For example, in a security camera, a computer vision algorithm can efficiently detect faces or moving objects in the scene.
Then, only the smaller segment of the image where the face or object was detected is processed through deep learning for identity verification or object classification – saving significant computing resources compared to running deep learning over the entire scene, on every frame.

At the Embedded Vision Europe conference in October, I presented a hybrid vision processing implementation from Qualcomm Technologies, which combines computer vision and deep learning. The hybrid approach delivers a 130X–1,000X reduction in multiply-accumulate operations and about a 10X improvement in frame rates compared to a pure deep learning solution. Furthermore, the hybrid implementation uses about half the memory bandwidth and requires significantly less CPU resources. This is a significant performance advantage for manufacturers and developers choosing to implement this strategy.

Making best use of edge computing

Just like pure deep learning, hybrid approaches to vision processing take great advantage of the heterogeneous computing capabilities available at the edge. A heterogeneous compute architecture helps improve vision processing performance and power efficiency by assigning each workload to the most efficient compute engine. Test implementations show a 10X reduction in latency for object detection when deep learning inferences are executed on a DSP instead of a CPU.

Running algorithms and neural network inferences on the IoT device itself also lowers latency and bandwidth requirements compared to cloud-based implementations. Edge computing can also reduce costs by cutting cloud storage and processing requirements – while protecting user privacy and security by avoiding transmission of sensitive or identifiable data over the network.

Deep learning innovations are driving exciting breakthroughs for the IoT, as are hybrid techniques that combine deep learning with traditional algorithms.
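As a concrete sketch of the hybrid strategy described above – a cheap computer vision detector gating an expensive deep learning classifier – the following Python example uses hypothetical `detect_regions` and `classify_roi` stubs (stand-ins for a real detector and neural network; neither function comes from the original article) to show how restricting deep learning to detected regions of interest cuts the pixels processed per frame:

```python
import numpy as np

def detect_regions(frame):
    # Hypothetical lightweight CV stage (e.g., motion detection or a face
    # detector). Returns bounding boxes as (x, y, w, h) tuples. Hard-coded
    # here purely for illustration.
    return [(200, 120, 64, 64)]

def classify_roi(roi):
    # Hypothetical deep learning classifier, applied only to the cropped
    # region of interest rather than the full frame.
    return "person"

def hybrid_pipeline(frame):
    results = []
    full_pixels = frame.shape[0] * frame.shape[1]
    dl_pixels = 0
    for (x, y, w, h) in detect_regions(frame):
        roi = frame[y:y + h, x:x + w]          # crop the detected region
        dl_pixels += roi.shape[0] * roi.shape[1]
        results.append((classify_roi(roi), (x, y, w, h)))
    # Fraction of the frame that never reaches the expensive DL stage.
    savings = 1.0 - dl_pixels / full_pixels
    return results, savings

frame = np.zeros((720, 1280, 3), dtype=np.uint8)   # one blank 720p frame
labels, savings = hybrid_pipeline(frame)
print(labels, round(savings, 3))  # prints [('person', (200, 120, 64, 64))] 0.996
```

With one 64x64 detection in a 720p frame, over 99% of the pixels skip the deep learning stage entirely, which is the source of the compute savings the hybrid approach exploits. Actual savings depend on how many regions the detector fires on per frame.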
Vision processing is just the start, as the same principles can be applied to other areas such as audio analytics. As devices on the edge get smarter and more capable, innovators can start building products and applications that were never possible before. These are truly exciting times for the IoT.