In Depth: Cameras

This is the beginning of a new series of articles which provide a deeper explanation of the concepts across the NVIDIA Jetson family of products. In many of the previous articles on JetsonHacks, we have been concentrating on “how to” do a particular task.

“In Depth” will focus more on how different subsystems work and can be combined for specific tasks such as vision processing and machine learning. Articles in this series will cover the Jetson hardware itself, external devices, plus the software that binds everything together.

Here we focus on digital video cameras. As cameras provide the images for vision and machine learning analysis, understanding how a camera gathers and distributes those images is important.

The Intent

The intent here is to familiarize you with (or if you’re like me, refresh your memory on) different terms and concepts associated with digital streaming video cameras. You can use this overview as a jumping off point to dive deeper into the subjects presented.


Digital video cameras are ubiquitous. Billions of people have smartphones or tablets with built-in cameras, and hundreds of millions have webcams attached to their computer.

Digital video has a brief history. The first semiconductor image sensor (CCD) was invented in 1969 at Bell Laboratories. A second type, called a CMOS sensor, was invented at the Jet Propulsion Laboratory down the street here in Pasadena, California in 1993. It was in the early 1990s that there was a convergence of technology which allowed digital video to be streamed into consumer level computers. The first popular consumer webcam, the Connectix QuickCam, was introduced in 1994 for $100. 320×240 resolution, 16 shades of gray. ’Twas amazing at the time.

CMOS technology is now in use in the vast majority of the sensors in consumer digital video products. Over time the resolution of the sensors has improved, while adding a myriad of capabilities.

Even with a short history, there is a forest of abbreviations and acronyms to navigate to understand what people are talking about in a given context. It is hard to talk about something if you don’t know the right name.

Here we will concentrate on cameras that we attach to a Jetson, though these same cameras can be attached to lesser machines. Just one example, here’s a 4K camera:

Arducam IMX477 for Jetson Nano and Xavier NX

You can think of a camera as several different parts. First is the image sensor, which gathers light and digitizes it. The second part is the optics, which helps focus light on the sensor and provides a shutter. Then there is the supporting electronic circuitry which interfaces with the sensor, gathers the images and transmits them.

Image Sensor

There are two types of image sensors predominantly in use today. The first is CMOS, the other is CCD. CMOS is dominant in most lower cost applications. The raw sensors provide monochrome (greyscale) images.

Here’s an image of an image sensor, the Sony IMX477:

Color Images

There are different ways to get color images from these sensors. By far the most common way is to use a Bayer filter mosaic, which is a color filter array. The mosaic arranges color filters on the pixel array of the image sensor. The filter pattern is half green, one quarter red, and one quarter blue. The human eye is most sensitive to green, which is why there is extra green in the filter pattern.

Each filter passes a particular band of wavelengths through to the sensor pixel. For example, a blue filter makes the sensor pixel sensitive to blue light. The pixel emits a signal depending on how many photons it sees, in this case how much blue light.

Figure 1

There are other variations of this type of approach using color filter arrays. The Bayer method was patented, so some people tried to work around that. Alternatives are CYGM (Cyan, Yellow, Green, Magenta) and RGBE (Red, Green, Blue, Emerald).

In the Bayer filter, the colors may be arranged in different patterns. To cover all the combinations, you may see BGGR (Blue, Green, Green, Red), RGBG, GRBG and RGGB. This pattern is used to interpolate a color image using demosaicing algorithms.

The raw output of Bayer-filter cameras is called a Bayer pattern image. Remember that each pixel is filtered to record only one of three colors. The demosaicing algorithm examines each pixel and its surrounding neighbors to estimate a full Red Green Blue (RGB) color for that pixel. That’s why it is important to know the arrangement of the colors in the filter.
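To make the neighbor idea concrete, here is a minimal sketch of a bilinear-style demosaic in plain Python. It assumes an RGGB arrangement and an image of at least 2×2 pixels. The name `demosaic_rggb` is made up for illustration, and this is not what the Jetson ISP actually runs.

```python
def demosaic_rggb(raw):
    """Estimate an (R, G, B) tuple for every pixel of an RGGB Bayer image.

    `raw` is a list of rows. Each value is the single color sample that
    the sensor recorded at that pixel. Assumes at least 2x2 pixels.
    """
    height, width = len(raw), len(raw[0])

    def channel_at(y, x):
        # Assumed RGGB tile: even rows are R G R G, odd rows are G B G B.
        if y % 2 == 0:
            return "R" if x % 2 == 0 else "G"
        return "G" if x % 2 == 0 else "B"

    rgb = []
    for y in range(height):
        row = []
        for x in range(width):
            sums = {"R": 0, "G": 0, "B": 0}
            counts = {"R": 0, "G": 0, "B": 0}
            # Visit the 3x3 neighborhood, skipping positions off the edge.
            for ny in range(max(0, y - 1), min(height, y + 2)):
                for nx in range(max(0, x - 1), min(width, x + 2)):
                    channel = channel_at(ny, nx)
                    sums[channel] += raw[ny][nx]
                    counts[channel] += 1
            own = channel_at(y, x)
            pixel = []
            for channel in "RGB":
                if channel == own:
                    # Keep the sample the sensor actually measured.
                    pixel.append(raw[y][x])
                else:
                    # Estimate the missing color from the neighbors.
                    pixel.append(sums[channel] / counts[channel])
            row.append(tuple(pixel))
        rgb.append(row)
    return rgb


# A flat scene: every red sample is 100, green 50, blue 20.
flat_scene = [
    [100, 50, 100, 50],
    [50, 20, 50, 20],
    [100, 50, 100, 50],
    [50, 20, 50, 20],
]
full_color = demosaic_rggb(flat_scene)
```

Swap the two branches in `channel_at` for a different filter arrangement and the same raw data produces different colors, which is the practical reason the arrangement has to be known.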

Figure 2

These algorithms can be simple or complex, depending on the computational elements onboard the camera. As you can imagine, this is quite the problem. The algorithms make tradeoffs and assumptions about the scene that they are capturing and take into account the time allowed to calculate the color values. There can be artifacts in the final color image depending on the scene and algorithms chosen.

Time is an important factor when you are trying to estimate the color of each pixel in real time. Let’s say you are streaming data at 30 frames per second. That means you have about 33 milliseconds between frames. Your image better be done and gone before the next one arrives! If you have a couple of million pixels to demosaic per frame, you have your work cut out for you! Accurate color estimation can be the enemy of speed, depending on which algorithm is in use.
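A quick back-of-the-envelope sketch of that time budget follows. The function names are hypothetical, and the numbers assume a single worker handling every pixel in sequence, which real ISP hardware does not do.

```python
def frame_period_ms(fps):
    """Time available between consecutive frames, in milliseconds."""
    return 1000.0 / fps


def per_pixel_budget_ns(width, height, fps):
    """Time available per pixel if the pixels were handled one at a time."""
    pixels = width * height
    return (1.0 / fps) / pixels * 1e9


period = frame_period_ms(30)                    # roughly 33 ms
budget = per_pixel_budget_ns(1920, 1080, 30)    # roughly 16 ns per pixel
```

At around 16 nanoseconds per pixel for 1080p, it is easy to see why this work is handed to dedicated, highly parallel hardware.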

Sensor modules contain just the image sensor. The Raspberry Pi V2 camera (IMX219) and the High Quality version (IMX477) are two such modules that work on the Jetson Nano and Xavier NX. These sensors transmit the raw Bayer pattern image over the Camera Serial Interface (CSI) bus. The Jetson then uses onboard Image Signal Processors (ISP) to perform a variety of tasks on the images. The Tegra configurable ISP hardware handles demosaicing, auto white balance, down scaling and so on. Check out Image Processing and Management for an expanded overview.

Camera modules, on the other hand, include the smarts onboard the module to perform those tasks. Some of these modules have a CSI interface, but they are typically used in cameras with alternate interfaces, such as USB. While some of these modules transmit raw Bayer pattern images, the most likely use case you will encounter is an encapsulated video stream, either raw color images or compressed.

Infrared Light

The Bayer filter is transparent to infrared light. Many image sensors can detect near infrared wavelengths. Most color cameras add an infrared filter on the lens to help with better color estimation.

However, sometimes it is useful to look at a scene that is illuminated by infrared light! Security “night vision” systems typically have an IR emitter combined with a camera image sensor without an infrared filter. This allows the camera to “see in the dark”. One example is the Raspberry Pi NoIR Camera Module V2. This Jetson compatible sensor is the same as the previously mentioned V2 IMX219 RPi camera with the infrared filter removed.


Optics

The optics for a digital video camera consist of the lens and the shutter. Most inexpensive cameras use a plastic lens, and provide limited manual focus control. There are also fixed-focus lenses which have no provision for adjustment. Other cameras have glass lenses, and some have interchangeable lenses.

You will hear lenses classified in several ways. Typically a lens is specified by its focal length. The focal length of a lens can be a fixed distance. If the focal length is variable, this is called a zoom lens.

Another classification is the aperture, which is denoted by an f-number, e.g. f/2.8. A lens can have a fixed aperture, or a variable one. The size of the aperture determines how much light can hit the sensor. The larger the aperture, the more light is allowed through the lens. The larger the aperture, the smaller the f-number.
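The inverse relationship comes from the definition: the f-number is the focal length divided by the diameter of the aperture opening. Since the light gathered scales with the area of the opening, it scales with one over the f-number squared. A small sketch, with hypothetical function names and an illustrative 6 mm lens:

```python
def aperture_diameter_mm(focal_length_mm, f_number):
    """Diameter of the aperture opening: focal length divided by f-number."""
    return focal_length_mm / f_number


def relative_light(f_number_a, f_number_b):
    """How many times more light setting A admits than setting B.

    Light scales with the area of the opening, so with 1 / f_number**2.
    """
    return (f_number_b / f_number_a) ** 2


diameter = aperture_diameter_mm(6.0, 2.0)   # a 6 mm lens at f/2.0
gain = relative_light(2.8, 4.0)             # f/2.8 compared with f/4
```

Going from f/4 to f/2.8 roughly doubles the light reaching the sensor, which photographers call one stop.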

The lens Field of View (FoV) is also important. Typically this is expressed in degrees, both in the horizontal and the vertical dimension, or diagonally, with the center of the lens being the midpoint of both of the angles.
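If you know the sensor size and the focal length, you can estimate the field of view yourself. The sketch below assumes an ideal rectilinear lens focused at infinity, so wide-angle and fisheye lenses will not follow it. The sensor dimensions are illustrative round numbers, not taken from a datasheet.

```python
import math


def field_of_view_deg(sensor_size_mm, focal_length_mm):
    """Angle of view across one sensor dimension, in degrees.

    Assumes an ideal rectilinear lens focused at infinity.
    """
    half_angle = math.atan(sensor_size_mm / (2.0 * focal_length_mm))
    return math.degrees(2.0 * half_angle)


# Illustrative numbers: a sensor about 6.3 mm wide and 4.7 mm tall
# behind a 6 mm lens.
horizontal_fov = field_of_view_deg(6.3, 6.0)
vertical_fov = field_of_view_deg(4.7, 6.0)
```

The same sensor behind a shorter focal length sees a wider angle, which is why lens and sensor have to be chosen together.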

Figure 3

The fourth classification is the mount type for cameras that have interchangeable lenses. Interchangeable lenses allow for much more flexibility when capturing images. In the Jetson world, you may hear of an M12 mount. It uses a metric M12 thread with 0.5mm pitch. This is also known as an S-mount. Another common term is a C or CS lens mount. These may attach directly to the PCB of the sensor. The Raspberry Pi High Quality camera uses this type of mount.

The shutter for the camera may be mechanical or electronic. The shutter exposes the sensor for a predetermined amount of time. There are two main types of exposure methods that shutters use. The first is a rolling shutter. The rolling shutter scans across the sensor progressively, either horizontally or vertically. The second is a global shutter, which exposes the whole sensor at the same instant. The rolling shutter is most common as it tends to be less expensive to implement on a CMOS device, though there may be image artifacts, like smearing, for fast moving objects in a scene.

For scenes that do not have any fast moving objects, a rolling shutter can be a good choice. However, for other applications this may be unacceptable. For example, a mobile robot, which is inherently a shaky platform to begin with, may not be able to produce good enough images for visualization if the images are smeared. In that case a global shutter is more appropriate.

Electronic Circuitry

The electronic circuitry of the digital video camera controls image acquisition, interpolation and routing of the images to the awaiting world. Some cameras have this circuitry on the sensor die (many phone cameras do this to save space), others have external circuitry to handle the task.

Camera sensors, on the other hand, simply interface with a host that handles the data acquisition directly. The Jetsons have multiple Tegra ISPs to handle this task.

Data compression is an important task. Video data streams can be very large. Most inexpensive webcams have a built-in ASIC to do image interpolation and video compression.

Newer to the market ‘smart’ cameras may have additional circuitry to process the video data stream. This includes more complicated tasks such as computer vision or depth image processing. These specialty cameras may combine more than one sensor in the camera.

For example, an RGBD camera (Red, Green, Blue, Depth) may have two sensors for calculating depth, and another sensor for grabbing color images. Some of these cameras use infrared illuminators to help the depth sensors in low light situations.

The electronic circuitry transmits the video data from the camera to a host device. This can be through one of several physical paths. On the Jetson, this is the MIPI Camera Serial Interface (MIPI CSI) or through the familiar USB. Third parties offer GMSL (Gigabit Multimedia Serial Link) connectors on Jetson carrier boards. GMSL allows longer transmission distances than the typical CSI ribbon cables by serializing/deserializing the video data stream with buffers. For example, you may see these types of connections in use in robots or automobiles.

GMSL camera connectors, image courtesy of Connect Tech

Data Compression and Transmission

Here’s where it starts to get interesting for us. Data is coming across the wire, so how do we interpret it?

We talked about creating full color images. Typically we think about these as three channels of Red, Green and Blue (RGB). The number of bits in each of these channels determines how many “true” colors can be displayed. 8 bits per channel is common, and you may see 10 bits. In professional video, you will see higher numbers. The more bits, the more colors you can represent.
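The relationship between bit depth and color count is a power of two, as this small sketch shows (the function name is made up for illustration):

```python
def representable_colors(bits_per_channel, channels=3):
    """Number of distinct values a pixel can take at a given bit depth."""
    return 2 ** (bits_per_channel * channels)


colors_8_bit = representable_colors(8)     # about 16.7 million
colors_10_bit = representable_colors(10)   # about 1.07 billion
```

Adding just two bits per channel multiplies the number of colors by 64.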

Let’s say it is 8 bits per color channel, so that’s 24 bits per pixel. If an image is 1920×1080 pixels, that’s 2,073,600 pixels × 3 bytes = 6,220,800 bytes. If there are 30 frames per second, you get 186,624,000 bytes per second. Of course, if you are using 4K video then you get 4x that amount. Now, we love our pixel friends, but we don’t want to drown in them.

As I’m sure you have pointed out by now, we took a Bayer pattern image and expanded it. Certainly we can transmit the image itself along with an identifier indicating which pattern of colors is on the sensor! Of course we can! However, this forces the receiver to do the color conversion, which may not be an optimal solution.
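Here is the uncompressed bandwidth arithmetic as a small sketch, so you can plug in your own resolution and frame rate. The function name and parameters are hypothetical. The Bayer case is modeled as one 8-bit sample per pixel, which is an assumption; many sensors actually output 10 or 12 bits per sample.

```python
def raw_video_bytes_per_second(width, height, fps, channels=3, bits_per_channel=8):
    """Uncompressed data rate of a video stream, in bytes per second."""
    bits_per_frame = width * height * channels * bits_per_channel
    return bits_per_frame * fps // 8


rgb_1080p = raw_video_bytes_per_second(1920, 1080, 30)
rgb_4k = raw_video_bytes_per_second(3840, 2160, 30)
# Assumed: a Bayer pattern image carries one 8-bit sample per pixel.
bayer_1080p = raw_video_bytes_per_second(1920, 1080, 30, channels=1)
```

Under these assumptions the Bayer stream is a third of the size of the expanded RGB stream, which is the appeal of sending it as-is and letting the receiver do the conversion.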

Types of Data Compression

There are many ways to reduce the amount of image data being transmitted from a video stream. Generally this is done by:

  • Color space conversion
  • Lossless Compression
  • Lossy Compression
  • Temporal Compression

We won’t go too deeply into this subject here. Subsequent articles will cover the highlights as we get down the road. Entire industries are devoted to these subjects. However, if you have used cameras in the past you are probably already familiar with some of the names of the subjects here.

In color space conversion, YUV coding converts the RGB signal to an intensity component (Y) that ranges from black to white plus two other components (U and V) which code the color. This can be either a lossless or lossy approach. Lossless means that we can convert the image back to the original without any loss, lossy means that we will lose some of the data.
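As a rough illustration, here is a sketch of one common RGB to YUV conversion. It assumes the BT.601 luma weights and the classic analog U and V scale factors; real cameras may use other variants (different weights, offsets, or value ranges), so treat the constants as an assumption rather than what any particular device does.

```python
# Assumed constants: BT.601 luma weights and analog U/V scale factors.
KR, KG, KB = 0.299, 0.587, 0.114
U_SCALE, V_SCALE = 0.492, 0.877


def rgb_to_yuv(r, g, b):
    """Split a color into intensity (Y) and two color-difference values."""
    y = KR * r + KG * g + KB * b
    u = U_SCALE * (b - y)
    v = V_SCALE * (r - y)
    return y, u, v


def yuv_to_rgb(y, u, v):
    """Invert rgb_to_yuv. With floating point values nothing is lost."""
    r = y + v / V_SCALE
    b = y + u / U_SCALE
    g = (y - KR * r - KB * b) / KG
    return r, g, b


gray = rgb_to_yuv(128, 128, 128)            # U and V come out near zero
restored = yuv_to_rgb(*rgb_to_yuv(200, 100, 50))
```

The conversion itself is reversible here. The data reduction comes when U and V are stored at lower resolution than Y or rounded to fewer bits, and that is the point at which it becomes lossy.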

Then there is image compression. You are probably familiar with a PNG file, which uses lossless bitmap compression. A JPEG file uses a lossy compression method based on a discrete cosine transform. In general, you can get up to a ~4x size reduction using lossless compression, whereas with lossy compression you can go much higher. The quality of the lossy compressed image may suffer, of course.
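The difference between the two is easy to see in a toy example. The sketch below uses Python's `zlib` module, which implements DEFLATE, the same algorithm PNG uses internally. The `quantize` step is only a crude stand-in for a lossy method; it is not how JPEG works, and the pixel values are made up.

```python
import zlib


def lossless_roundtrip(pixels):
    """Compress and decompress, returning both the packed and restored data."""
    packed = zlib.compress(pixels)
    return packed, zlib.decompress(packed)


def quantize(pixels, step=16):
    """Crude lossy step: round every value down to a multiple of `step`."""
    return bytes((value // step) * step for value in pixels)


# A made-up row of 8-bit pixel values with a repeating texture.
row = bytes([10, 11, 10, 11, 200, 201, 200, 201] * 64)

packed, restored = lossless_roundtrip(row)   # restored is identical to row
coarse = quantize(row)                       # fine detail is gone for good
```

After the lossless round trip every byte is back. After quantizing, the small differences between neighboring pixels cannot be recovered, which is the trade that lossy methods make for a smaller stream.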

Temporal compression measures and encodes differences in the video stream images over time. Generally a frame is set as the key (keyframe), and differences are measured between subsequent frames from there. That way, you only need to send the one keyframe and then the differences. New keyframes are usually generated after a given interval, or generated on a scene change. For mostly static scenes, the size savings can be quite dramatic.
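A toy version of the keyframe-plus-differences idea might look like the sketch below. Frames are flat lists of pixel values, each difference is measured against the previous frame, and `encode_stream` and `decode_stream` are hypothetical names. Real encoders are far more sophisticated (motion estimation, for one), so this only illustrates the principle.

```python
def encode_stream(frames, keyframe_interval=30):
    """Encode frames as keyframes plus lists of changed pixels."""
    encoded = []
    previous = None
    for index, frame in enumerate(frames):
        if previous is None or index % keyframe_interval == 0:
            # Send the whole image.
            encoded.append(("key", list(frame)))
        else:
            # Send only (position, new value) for pixels that changed.
            changes = [
                (i, value)
                for i, (value, old) in enumerate(zip(frame, previous))
                if value != old
            ]
            encoded.append(("delta", changes))
        previous = frame
    return encoded


def decode_stream(encoded):
    """Rebuild every frame from the keyframes and differences."""
    frames = []
    current = None
    for kind, payload in encoded:
        if kind == "key":
            current = list(payload)
        else:
            current = list(current)   # copy so earlier frames stay intact
            for i, value in payload:
                current[i] = value
        frames.append(current)
    return frames


# A mostly static four pixel "scene" in which one pixel changes once.
scene = [[0, 0, 0, 0], [0, 0, 9, 0], [0, 0, 9, 0]]
stream = encode_stream(scene)
```

For this scene, the second entry carries a single changed pixel and the third carries nothing at all, yet the decoder can rebuild all three frames.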

There are a wide variety of algorithms for this task, which is called encoding. The names of these encoders include H.264, H.265, VP8, VP9 and MJPEG. A matching decoder on the receiving end reconstructs the video.


A four character identifier (fourcc) identifies how the video data stream is encoded. This is a throwback to the old Macintosh days where QuickTime built upon the Apple File Manager idea of defining containers with four characters. The four characters conveniently fit in a 32 bit word. Audio uses this method too.
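Here is a short sketch of how four characters pack into a 32 bit word. It assumes the first character lands in the lowest byte, which is the convention Video4Linux uses; other systems may order the bytes differently. The function names are made up for illustration.

```python
import struct


def fourcc_to_int(code):
    """Pack a four character code into a 32 bit integer.

    Assumes the first character goes in the lowest byte (little-endian).
    """
    if len(code) != 4:
        raise ValueError("a fourcc must be exactly four characters")
    return struct.unpack("<I", code.encode("ascii"))[0]


def int_to_fourcc(value):
    """Unpack a 32 bit integer back into its four character code."""
    return struct.pack("<I", value).decode("ascii")


mjpg = fourcc_to_int("MJPG")
name = int_to_fourcc(mjpg)
```

Comparing one integer is cheaper than comparing strings, which is part of why this scheme has stuck around.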

Some of the fourcc codes are easy to guess, such as ‘H264’ and ‘H265’. ‘MJPG’ means that each image is JPEG encoded. Others are not so easy: ‘YUYV’ is fairly common, and is a packed format with ½ horizontal chroma resolution, also known as YUV 4:2:2. Part of this confusion is because manufacturers can register these format names. Also, over the years the same code may have an alias on different platforms. For example, on the Windows platform ‘YUYV’ is known as ‘YUY2’.
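To show what “packed with ½ horizontal chroma resolution” means in practice, here is a sketch that unpacks a YUYV buffer. Each group of four bytes (Y0, U, Y1, V) describes two neighboring pixels that share one U and one V value. The function name is hypothetical and the buffer length is assumed to be a multiple of four.

```python
def unpack_yuyv(data):
    """Expand packed YUYV bytes into one (Y, U, V) tuple per pixel.

    Every four bytes hold Y0, U, Y1, V: two pixels sharing one color pair.
    Assumes len(data) is a multiple of four.
    """
    pixels = []
    for offset in range(0, len(data), 4):
        y0, u, y1, v = data[offset:offset + 4]
        pixels.append((y0, u, v))
        pixels.append((y1, u, v))
    return pixels


# Two pixels with different brightness but the same (neutral) color.
two_pixels = unpack_yuyv(bytes([10, 128, 20, 128]))
```

That works out to two bytes per pixel instead of three, a one third saving before any further compression is applied.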


This is an overview of cameras. There are multiple books and research articles on each of the subjects here. Hopefully this gives you a starting point on where to start exploring when digging deeper into the subject.

In the next article, we will go over how to actually get the video stream into the Jetson!


Debayering Demystified by Craig Stark, PhD

How does YUV Color Encoding Work?

Figure 1: Image Author: Interiot at English Wikipedia, CC BY-SA 3.0, via Wikimedia Commons

Figure 2, 3: CC BY-SA 3.0 from Optical Camera Communications

Figure 4: Raspberry Pi HQ Camera, CS Mount. Image courtesy

The post In Depth: Cameras appeared first on JetsonHacks.
