What will you learn from this webinar?
01. How hardware and software components make up an Edge AI-enabled computer vision product
02. How to choose the right embedded camera for your computer vision solution
03. How to choose the right Nvidia Jetson edge processor for your Edge AI and computer vision camera
04. What Nvidia DeepStream is, and what its capabilities, pros, and cons are
05. How to use Nvidia DeepStream plugin-enabled VisAI Platform accelerators to accelerate your algorithm development
06. The primary business considerations you need to look into before building an Edge AI product
So what are we going to see today? These are the six things you will learn as part of this webinar. First, what are the components of an edge AI vision-based solution? We will go into each of these components: how to choose the right camera for your application, how to choose the right processor and why Nvidia is the best processor for an edge AI solution, and how to identify algorithms or models for your prototyping, that is, where to start with your algorithms.
We will also touch on how to accelerate: once you know all these components and start development, how do you quickly accelerate it, and when should you work with an AI or edge development expert?
In other words, how to start your development, and when you should reach out to experts.
So, let me start right away with the core components of an edge AI application.
When I talk about a vision-based application, the camera is obviously the first part, and the camera is a very subjective choice. It is not one size fits all, and you always have to make the right choice of camera.
The second main component is the edge processor. As I said, running a model or making an inference at the edge is a challenging task, so you have to choose the right processor. That is the second part.
And the third is the kind of algorithm you are going to use, or the kind of model you are going to train and deploy in your edge application, with all the limitations that you have at the edge.
So these are the three components we will be talking about today. I will have my colleague Gomathi Shankar, who is an expert in cameras, take over the webinar from here and explain how you can choose the camera for your edge AI solution.
Thank you, Maha!
Hello everyone, this is Gomathi. Image quality is very critical for a vision-based edge AI system. If you get a better-quality image, the life of the inference system is going to be simple; if the image quality is not good enough, the application and algorithm development is going to be hard.
So what we would recommend is to consider these four aspects when you choose the camera. You always look at the resolution of the camera, whether you need full HD, 4K, or more, but that is just one aspect of the camera system. What else do we need to consider when we choose the camera?
What kind of optics do we have to choose for the application? We always look for an autofocus lens, but is it suitable for all applications? That is something we need to look at.
What is the purpose of an image signal processor, and do we need one for the application? And what is a suitable interface for the camera? Let's see them one by one.
Okay, so let's look at one type of application. Assume your end application is outdoor, like a smart surveillance system, a smart parking lot system, a smart traffic system, or any outdoor video analytics device. The first and foremost thing to consider is that you are going to have harsh lighting conditions, with sunlight hitting the sensor directly.
So if you choose a sensor with a high dynamic range, that would be a perfect fit. The other thing to consider for an outdoor application is whether you are going to use the camera for night vision as well.
For night vision, is it just low light, or will there be no light at all? If it is a low-light application, you will have to choose a larger sensor, which provides better image quality at low light; with a smaller sensor you are going to get a lot of noise in low light.
In the other case, if there is no light at all, the sensor has to support the infrared region as well, so look for good quantum efficiency in the infrared region.
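As a rough illustration, the lighting conditions above can be mapped to sensor requirements in a small helper. This is only a sketch of the decision logic described here; the feature names are descriptive labels from this discussion, not vendor part options:

```python
def sensor_requirements(outdoor=False, low_light=False, no_light=False):
    """Map operating conditions to the sensor features discussed above."""
    needs = []
    if outdoor:
        needs.append("high dynamic range (HDR)")  # harsh sunlight on the sensor
    if low_light:
        needs.append("larger sensor size")        # less noise at low light
    if no_light:
        needs.append("high quantum efficiency in the IR region")
    return needs

# An outdoor surveillance camera that must also work in total darkness:
print(sensor_requirements(outdoor=True, no_light=True))
```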
Let's assume another application, where the object is going to be moving really fast, for example, license plate recognition.
If you use a rolling shutter camera, you may run into an issue where you are not able to read the number plate properly, and if it is an AGV (automated guided vehicle), a rolling shutter may again not be a good option. You may need a global shutter camera, and you may need synchronization as well, so a global shutter would be best for any moving-object application.
Let's assume another application, parking lot management. You may not need a high frame rate in a parking lot application, because objects are not going to move really fast. But in the case of industrial automation, one of our customers asked for 400 to 500 frames per second. So if you are going to use the camera for industrial automation, go for a sensor that can support a high frame rate; resolution may not be a big deal at that point, but frame rate is.
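To see why frame rate matters as much as resolution, a quick back-of-envelope calculation (the capture modes are illustrative) shows that a modest-resolution, high-frame-rate industrial mode can demand more raw pixel throughput from the sensor than 4K at 30 fps:

```python
def throughput_mpps(width, height, fps):
    """Raw sensor throughput in megapixels per second for a capture mode."""
    return width * height * fps / 1e6

uhd_30 = throughput_mpps(3840, 2160, 30)     # 4K at 30 fps
fast_720p = throughput_mpps(1280, 720, 400)  # 720p at 400 fps (industrial)
print(f"4K30: {uhd_30:.0f} Mpps, 720p400: {fast_720p:.0f} Mpps")
```

The 720p mode at 400 fps moves roughly half again as many pixels per second as 4K30, which is why sensor readout speed, not resolution, becomes the limiting spec.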
Now let's look at the optics. If it is a facial recognition or face detection application, or any simple object detection application where the object is close enough, within a meter or two, then autofocus would be a good option.
But if the object distance is more than two meters, autofocus may not be a very good option; we have to look for a fixed-focus lens that gives enough depth of field for the application.
For smart surveillance, we may need a wider field-of-view lens, but when you choose one, watch for the fisheye effect: either it is something you can manage in your application, or you may have to choose something like a 100-degree lens, where you will not get much fisheye distortion but the field of view will still be good enough.
So these are some of the things to consider when you choose a sensor or optics.
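The trade-off between a wide field of view and fisheye distortion comes down to focal length. A quick sketch using a simple pinhole model (which ignores lens distortion) and an assumed sensor width of about 5.1 mm, roughly a 1/2.8-inch sensor:

```python
import math

def horizontal_fov_deg(sensor_width_mm, focal_length_mm):
    """Horizontal field of view of an ideal pinhole lens model."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))

wide = horizontal_fov_deg(5.1, 2.1)    # short focal length: around 100 degrees
narrow = horizontal_fov_deg(5.1, 6.0)  # longer focal length: much narrower view
print(f"wide: {wide:.0f} deg, narrow: {narrow:.0f} deg")
```

The shorter the focal length, the wider the view, and the more barrel distortion a real lens will typically exhibit.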
What is the need for an ISP? First, what is an ISP? An ISP (image signal processor) is a hardware or software block responsible for most of the auto functions: auto exposure, auto white balance, autofocus. All of these are taken care of by the image signal processor. The image on the left is what you usually get from the sensor; with a fine-tuned ISP, you get something like the image on the right.
If the ISP is not fine-tuned well enough for your sensor, you may not get very good image quality, regardless of which sensor you use.
If it is a monochrome sensor, we may not need a full-fledged ISP. All we need is auto exposure, which can be handled in software, and some of these monochrome cameras come with an auto exposure input.
But if it is a color camera, an ISP is a must, and choosing an ISP is the most underrated step. We underestimate how time-consuming integrating an ISP with a sensor is, but fine-tuning an ISP is the most time-consuming work in the product development cycle.
Also, getting access to these ISP settings is not easy. Partners like us have access to the Nvidia ISP; it is not available to everyone.
So when you select a camera system, the ISP is one of the most important blocks.
The next important part is the right camera interface for your application. Generally, sensors come with a MIPI interface. If you need a USB interface, the MIPI data has to be converted into UVC packets and sent across; if it is GigE, it has to be packaged as IP packets and sent across; and for GMSL, it has to be serialized and sent across.
So MIPI is the natural interface of any camera system, and reliability-wise, MIPI is the most sturdy interface. GMSL, in turn, can be used over a longer distance.
MIPI can be supported up to about 30 to 50 centimeters, but when you need a longer distance, as in fleet management, you may not be able to use the MIPI interface directly.
You will have to serialize the signals; you can use GMSL.
In terms of bandwidth, MIPI provides the highest bandwidth. To give an example, if you want 4K at 60 frames per second, MIPI would be the only option. USB is the next best option if you want 4K at 30 fps, and with GMSL, 4K 30 or 1080p 60 is possible. When it comes to GigE, you usually will not get that much bandwidth.
For longer range, GigE would be the better option; it can go several meters. GMSL can go up to 15 meters, and USB by default supports up to 1 meter, though there are cables that can support up to 5 meters as well.
For a multi-camera system, MIPI would be the better option, and the next best option is GMSL, because these processors provide multiple interfaces for MIPI cameras.
When it comes to USB, if you connect multiple cameras, say a surround-view system with four or five cameras, and you connect them all to the same USB 3.0 interface, the bandwidth is going to be shared, and you may not be able to get enough frame rate from all the cameras.
GMSL is just a conversion between MIPI and GMSL, so you can have multiple cameras over the GMSL interface as well.
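These bandwidth and multi-camera points can be checked with a rough feasibility calculation. The link figures below are nominal, illustrative numbers, not exact usable throughput (real links have protocol overhead, and compression changes the picture entirely):

```python
# Nominal link bandwidths in Gbit/s (illustrative approximations)
LINK_GBPS = {"MIPI CSI-2 4-lane": 10.0, "USB 3.0": 5.0, "GMSL2": 6.0, "GigE": 1.0}

def stream_gbps(width, height, fps, bits_per_pixel=16):
    """Uncompressed bandwidth of a single camera stream."""
    return width * height * fps * bits_per_pixel / 1e9

def fits(link, n_cams, width, height, fps):
    """Do n uncompressed streams fit within the link's nominal bandwidth?"""
    return n_cams * stream_gbps(width, height, fps) <= LINK_GBPS[link]

# 4K60 needs roughly 8 Gbit/s uncompressed: only the MIPI link carries it,
# and four 1080p60 cameras oversubscribe a shared USB 3.0 connection.
print(fits("MIPI CSI-2 4-lane", 1, 3840, 2160, 60), fits("USB 3.0", 4, 1920, 1080, 60))
```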
So now we know the factors to consider before choosing the right camera. How do we accelerate, that is, how do you make a fast prototype and turn it into a production-ready product? You can go for off-the-shelf cameras. If you look at our product portfolio, we have a wide range of cameras: USB 3.0 cameras, MIPI cameras, GMSL cameras. You can choose an off-the-shelf camera that already has an image signal processor built in, and you have a wide range of interface options as well.
To start with, you can go for a USB 3.0 camera and begin your evaluation. These are UVC cameras: you can connect them to a PC, and whether it is Windows or Linux, they work with the default driver. You do not need any additional driver for these UVC cameras.
That will give you an idea of what kind of image quality you are getting with the camera. Then, if you want to continue with USB, you can; and if you want to switch to a MIPI or GMSL camera, that is possible too, since all our USB cameras are also available with MIPI and GMSL interfaces.
When it comes to Nvidia, one of the interesting aspects of the Nvidia processors is multi-camera support: up to six cameras in some cases, and even eight cameras. e-con Systems has multi-camera synchronized support for all the Nvidia platforms, and one of our latest launches is a six-camera GMSL solution for the Xavier platform.
And we have cameras for every single application: if you are looking for a global shutter camera, we have options in both monochrome and color, and if you want a stereo camera for your AGV application, we have cameras for that too.
If you are looking for HDR cameras, GMSL cameras, or IP67-rated enclosure cameras, we have the options.
And these are pre-certified cameras: all our cameras are designed for certification, so that you can take them to market without further development.
Okay, that's it on the cameras. Now, the processor: we strongly recommend the Nvidia Jetson processors for any edge AI device, since they are designed for artificial intelligence applications.
The Jetson processors bring the power of modern AI, deep learning, and inference to embedded systems at the edge.
As I said, they are designed for edge AI, and there is a system on module (SoM) for every application, be it a small facial recognition application or a fleet management application where you need up to six cameras and may have to run your neural network on all six streams. You have options for all applications.
Let's look at them one by one. The Jetson Nano is the most cost-effective, low-power Nvidia processor. It comes with a quad-core CPU and 128 GPU cores. The more GPU cores, the better it is for the algorithm, and the floating-point operations (FLOPS) figure is an indicator of how powerful it is for inference: the more FLOPS, the better for inference. Nvidia also has a complete ecosystem.
So you can start your edge AI application right away with the sample DeepStream SDK.
We have a DeepStream demo at the end of the presentation, so you have a complete ecosystem.
And the application stack can be shared across all these processors: if you develop an application on the Jetson Nano and want to move to a higher-end processor, the same application will run on the Xavier as well.
The software stack is exactly the same for all four processors provided by Nvidia. So when you choose a processor, look at the number of cameras you need: the Nano can support up to four two-lane or three four-lane cameras, so you can have practically four 4K cameras. Also look at the encoder capability; the encoder is important for applications like remote monitoring or telepresence.
The next processor up, with better performance, is the Nvidia Jetson TX2. It has multiple CPU clusters (one quad-core and one dual-core), twice the GPU cores, more than twice the inference FLOPS, and it can take up to six cameras.
Some of our customers are already in the field with six cameras, and all six cameras can stream full HD at 30 frames per second.
Then the Xavier NX and the Xavier are the most powerful edge AI platforms available today. The Xavier NX is one of the latest launches from Nvidia: it has a six-core processor and a powerful GPU, plus 48 tensor cores specifically for neural network inference. It can take up to three 4K cameras, or six full HD cameras like the TX2, and it is pin-to-pin compatible with the Jetson Nano as well.
So if you develop an application with the Jetson Nano and you want to scale it up in the future, you can use the same carrier board and just switch to the Xavier NX.
And finally, the AGX Xavier is the most powerful Nvidia platform. It can take up to four 4K cameras, and as I said, it has the most powerful GPU.
So let's see how this translates into applications: some example use cases and what kind of system you can use for each.
Based on our experience, we have customers in most of these applications. For a telepresence, face detection, or facial recognition application, the horsepower needed for the algorithm is low, and you can go for a cost-effective option like the Nano. But when it comes to smart surveillance or a smart parking lot, you may need to pull a lot of data from the cloud and compare against it, so in that case you can go for the TX2.
When it comes to sports vision or a multi-camera system, you can choose the Xavier NX; one of our customers has integrated up to six cameras with the Xavier NX. And for a surround-view system, for example fleet management, you can go for the AGX Xavier.
So you can have all six cameras streaming at the same time, and also run the inference engine on top of it.
So how do you speed up? Now that we know the best processor for the application, how do we speed up prototyping and production? To give you a rough estimate, developing a carrier board will take at least two to three months; that is the minimum. You will also have to take it through certification, and that is going to take further time.
So we would strongly recommend off-the-shelf carrier boards. e-con Systems is partnered with Connect Tech, and we provide cameras for all the Connect Tech carrier boards. You can take a Jetson SoM, use an off-the-shelf carrier board, and start prototyping right away on day one. Then, if the interfaces are good enough, you can take the same carrier board to mass production; you do not need to redesign everything.
But if you want a custom design, either on the camera or on the processor side, we are open to that and can help you customize the platform. And Connect Tech has a carrier board for every single Jetson SoM.
Okay, now we know the hardware; the next important part is the algorithm.
Thanks, Gomathi, thanks for covering the camera and processor sections.
So the key takeaway from what Gomathi shared is that you should pick the right hardware that is readily available off the shelf, where the camera is already integrated with the Jetson, so that you do not have to do all the hard work of making your camera work on the board before you start your project. That is the key takeaway.
When you choose such a development platform, make sure the components you are picking also have a production-ready version.
That is, when you choose the camera and the Jetson hardware platform, check that production-ready equivalents are available, so that once you achieve your goal of developing the algorithm, you can easily move on to production.
Perfect! So, having covered the hardware part, let’s move on to the algorithm and software part.
Here, there are a lot of different combinations to consider when you choose the software.
Your solution could be a computer vision algorithm plus an AI model, just a computer vision algorithm alone, or an AI model alone; there are multiple combinations when you approach the solution.
For example, when you want to detect a circle, a computer vision algorithm is good enough, whereas when you want to detect people, or recognize people based on their gender, an AI model is much better suited, and solving that problem with classical vision alone might not be enough. Identifying people could use a model, and then tracking them could be a computer vision algorithm.
So there are many possible combinations of these algorithms and models. When we start working on the algorithm, there is an algorithm development part: either you start from scratch, or you pick an existing algorithm or model, see how it works for you, and then optimize and improve it. If you choose a model, you prepare your dataset and train it before applying it in your final solution.
So algorithm development, or building the solution from existing blocks, is not always a straight path.
And once you make everything work, there are a lot of other things when it comes to final productization: putting it on the edge, optimizing it to work with a live stream of data, and so on. That is another thing to consider when choosing these algorithms and blocks.
How do we start? How do we get to a fast prototype of all this? That is where Nvidia's DeepStream helps: Nvidia has a framework to help us get started on this AI journey.
And from VisAI Labs, part of e-con Systems, we also have the VisAI platform, which offers a collection of algorithm accelerators that can help you move forward as well.
Let us first see how Nvidia's DeepStream helps us get started with AI development. What is DeepStream? DeepStream is a flexible and scalable SDK with building blocks for rapidly prototyping complex AI solutions from edge to cloud.
Wherever Nvidia GPUs are present, you can develop a model and it will work right from the edge to the cloud. Of course, there will be limitations based on computing power, but you can use the same development tools to work with an AI model from edge to cloud. That is DeepStream. So what is in DeepStream? DeepStream is not a one-size-fits-all solution where you can just take an existing example and go to production.
We have to understand that it is more of a development tool that makes our life easy when building algorithms, not an algorithm in itself that you can pick up and put on the market. I might be repeating "not one-size-fits-all" very frequently, because the problems you are trying to solve with AI are not one-size-fits-all: you cannot make one model that works in every situation.
For example, if I make a model to detect people, it might work at one particular camera angle, in one particular condition, field of view, and distance from the camera to the people; many factors affect whether it works efficiently. If I use the same model with a different field of view, where I see more people, I may have to retrain the model for that particular situation.
So optimization, fine-tuning, and retraining of the model might all be needed; that is why we always say that solving an AI problem is not one-size-fits-all.
Now, I said DeepStream is a framework, an SDK that makes our job easy. Where does it help? When you connect a camera to an Nvidia platform, you need to take the frames from the camera into RAM, then pass them through your inference engine, where the inferencing for your AI model is done, and then you have the results out. You may also have to take the video stream and stream it out to another server, or maybe to the display, and you may want to overlay your results on top of the screen.
How do you connect all these different blocks of hardware and software? How do you take the stream into memory, pass it down to your inference engine, and then work with your results? DeepStream makes it very easy to connect all of this end to end, and it has very clear examples of how this is done, which you can download freely from Nvidia, try out, and use.
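As an illustration of how those blocks chain together, here is a sketch of a single-camera gst-launch line built from the standard DeepStream plugins (nvarguscamerasrc, nvstreammux, nvinfer, nvdsosd). The resolution caps and the config file path are placeholders you would adapt to your own camera and model; this is not a ready-to-run production pipeline:

```python
def deepstream_cmd(config_path="my_model_config.txt"):
    """Build an illustrative gst-launch-1.0 line:
    camera -> batch -> inference -> overlay -> display."""
    return (
        "gst-launch-1.0 nvarguscamerasrc ! "
        "'video/x-raw(NVMM),width=1920,height=1080,framerate=30/1' ! "
        "m.sink_0 nvstreammux name=m batch-size=1 width=1920 height=1080 ! "
        f"nvinfer config-file-path={config_path} ! "  # TensorRT inference
        "nvvideoconvert ! nvdsosd ! nveglglessink"    # draw results, render
    )

print(deepstream_cmd())
```

Each `!` link corresponds to one of the handoffs described above: capture, batching into the inference engine, and overlaying results for the display.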
DeepStream has a place where you plug in your models, and it ships with some sample pre-trained models for evaluation as well.
You can find a list here: PeopleNet, TrafficCamNet, FaceDetect-IR. These are sample models DeepStream provides to demonstrate how the framework works.
So this is a good starting point for anybody who wants to explore AI on the Nvidia platform.
So what does the VisAI platform offer? It offers algorithm blocks, or algorithm accelerators. We have created blocks like people counting, congestion detection, and social distancing checkers. These are models we have trained on particular datasets, and they work with DeepStream.
And if anybody has a requirement for people counting in a different situation, these models can be retrained or used for their applications as well.
This is what we call algo accelerators. They are mainly for people who do not want to develop algorithms from scratch: they have a requirement and want to pick something that already exists, which is available with us and which we can fine-tune for the final use case.
Okay, so now we have covered the hardware parts, and we have also covered where you could pick your algorithms from and how you could start.
So, as a developer or an engineer, if you are going to start your first edge AI development or product development, what will it look like? In broad terms, this is how it goes.
You pick a development board that has the Jetson as well as the camera built in, and then choose a model that you want to work with.
For example, if you are going to do people detection, it could be PeopleNet, or YOLO from open source. You pick a model and work with it: first evaluate it on the platform to see how it works for your particular use case. Obviously, it is not going to be a perfect fit; it will not work 100 percent, but you will be able to appreciate some of what the algorithm does, and you will also see where it is lacking. That is when you go ahead and tune the model for your requirements: figure out what is not working, tune the model accordingly, and train it with your dataset so that it starts delivering the accuracy you require.
So the first thing to do is make a proof of concept: have a goal for your algorithm, then see how far you can get with your dev kit, an existing model, and a camera, and then fine-tune the model or algorithm. If you realize that an existing model is not going to solve your problem, you could go ahead and design your own model, but in most AI cases, an existing model, fine-tuned and trained with the right kind of data, is the final solution.
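When you evaluate an off-the-shelf model against your accuracy goal, a small script comparing its detections to hand-labeled ground truth is usually the first measurement. A minimal sketch, with boxes as (x1, y1, x2, y2) tuples (a real evaluation would also track false positives and per-class scores):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def recall(preds, truths, thresh=0.5):
    """Fraction of ground-truth boxes matched by at least one detection."""
    hits = sum(any(iou(t, p) >= thresh for p in preds) for t in truths)
    return hits / len(truths)
```

Running this over a few hundred labeled frames from your own camera tells you quickly whether fine-tuning will be enough or a different model is needed.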
And once you have a proof of concept that works to your requirements and accuracy levels, you can go ahead and design the final product or solution.
So before we close this webinar,
I would like to show how the first two steps are done: choosing the right development kit, a hardware and camera combination, and then using an existing AI model. I'll show you a video of how we did it.
Okay, so I'll show you a dev kit.
This is a Rudi-NX kit, which has an Nvidia Xavier NX inside; it is from Connect Tech, and we have connected our STURDeCAM, a rugged camera with GMSL connectivity, to the Rudi-NX box. Once we connected this, we attached it to a vehicle and took it out on the road, so that we could capture video and also run some live inference.
If you order such a kit and a camera, once you get it, you can use DeepStream and run one of its samples in two to three days. That is what we did here, and I will show you how it looked at first. Please be aware that the video I am going to show has DeepStream working on it, but it is not an optimized or ready-to-use application; we are showing what you would see on day one of your development.
The reason we are showing this is to emphasize that it is a great starting point: you have a model that is working, and you have the camera and the hardware with you, so you can start your development from here. So let's watch this.
So we come to the end of this webinar. If you are going to go ahead and try your own edge AI development, these are the steps you will have to follow. We will share this presentation and the video as well, so you can study a bit more about these hardware and software blocks, and in case you need help at any stage, we are there through every step of your vision edge solution development.
Thanks a lot!