Inside image processing pipelines

Have you ever wondered how a mere smartphone takes images that rival those of your high-end DSLR camera? Even though a smartphone camera has far smaller optics and a far smaller sensor, its results can compete with those of much larger cameras. The credit goes to computational photography, because the image captured by the mobile phone is largely “calculated” by the internal system. Central to this computation and processing of images is the Image Signal Processor (ISP), which can be called the visual cortex of your smartphone.

An ISP is a dedicated processing unit for photography and video that sits right alongside the CPU and other processing components. It runs various algorithms to process image signals in real time. The control structure of an ISP consists of two parts: the ISP logic and the firmware running on it. The logic unit processes the pixel stream and also computes real-time statistics of the current image. The firmware reads these statistics from the ISP logic and generates feedback controls for the lens, sensor, and ISP logic so that the image quality is adjusted automatically. The ISP outputs the data in YUV or RGB format. The image is then transmitted to the CPU for further processing, storage, and display.

Image processors convert RAW images from image sensors into high-quality images in YUV/RGB formats. When you press the button to capture an image with your camera, the shutter moves out of the way, allowing light to strike the image sensor. The photosites of the sensor convert the varying intensity of the incoming light into small bursts of current. The signals from the individual photosites are then read out and assembled to form an image. The way a color image is captured is a little more difficult to understand. An integral part of the image sensor is the Bayer filter, which performs color separation. A color image is represented as a combination of red, green, and blue intensities. Bayer sensors therefore use a simple strategy: capture only one of red, green, or blue at each photosite, alternating the colors so that there are twice as many green photosites as red or blue ones, because the human eye is more sensitive to green. Values from these photosites are then combined to produce full-color pixels using a process called "demosaicing". The result is treated as the input to the image processor. In some cases, demosaicing is also done in the ISP itself.
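The layout of the color filters can be sketched in a few lines of Python. This is only an illustration: it assumes the common RGGB tile ordering (other orderings exist), and the function names bayer_color and bayer_pattern are hypothetical, not part of any real ISP API.

```python
def bayer_color(row, col):
    """Return the filter color at a photosite, assuming an RGGB 2x2 tile."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"


def bayer_pattern(height, width):
    """Build the full color-filter layout for a sensor of the given size."""
    return [[bayer_color(r, c) for c in range(width)] for r in range(height)]


pattern = bayer_pattern(4, 4)
for row in pattern:
    print(" ".join(row))

# Count how many photosites record each color.
counts = {ch: sum(row.count(ch) for row in pattern) for ch in "RGB"}
print(counts)
```

Counting the entries confirms the two-to-one ratio of green photosites to red or blue ones described above.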

Fig. 1. The Bayer arrangement of color filters on the pixel array of an image sensor.

Even after demosaicing, images captured by the sensor may suffer from blurring, unnatural colors, or other problems caused by the limitations of the camera’s lens or the skill of the person taking the picture. The image processor takes the raw image and applies the processing stages shown in Fig. 2 to convert it into a high-quality image.

Fig. 2. Typical block diagram of an Image Signal Processor (ISP).

Defective Pixel Correction 

Defects in image sensors typically affect single pixels or small pixel clusters, and may also include whole columns. These types of image defects are usually unacceptable to the human observer and thus need to be removed. Relatively simple methods use a horizontal register of data and average or substitute pixel values at the defective locations.
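A minimal sketch of the averaging approach is shown below. It assumes that the defect locations are already known (for example from factory calibration) and that the data is a Bayer mosaic, so the nearest neighbors of the same color in a row are two columns away. The function name and the sample values are hypothetical.

```python
def correct_defective_pixels(raw, defects):
    """Replace each defective pixel with the mean of its nearest
    same-color horizontal neighbors (two columns away in a Bayer mosaic)."""
    corrected = [row[:] for row in raw]  # leave the input untouched
    for r, c in defects:
        neighbors = [
            raw[r][cc]
            for cc in (c - 2, c + 2)
            if 0 <= cc < len(raw[r]) and (r, cc) not in defects
        ]
        if neighbors:  # at an edge only one neighbor may be available
            corrected[r][c] = sum(neighbors) // len(neighbors)
    return corrected


raw = [
    [10, 20, 12, 22, 14, 24],
    [30, 40, 1023, 42, 34, 44],  # the value 1023 is a stuck "hot" pixel
]
fixed = correct_defective_pixels(raw, defects={(1, 2)})
print(fixed[1])
```

Only one row of data is needed at a time, which is why this kind of correction is cheap to implement in hardware.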

Noise Filter  

Noise is often the limiting factor in image processing. Therefore, in the design of any imaging system, one must closely consider the noise performance of the system, especially how the noise affects the visual appearance of the picture and how it perturbs the subsequent digital processing. Median filters are commonly used to remove impulse noise from images. De-noising is a preliminary step in the online processing of images; thus, hardware implementation of median filters is of great interest.
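The following is a small software sketch of a 3x3 median filter on a grayscale image stored as a list of rows. The border handling (repeating the nearest valid pixel) is an assumption made for this example; a hardware implementation would typically use a sorting network on a sliding window instead of a general-purpose sort.

```python
from statistics import median


def median_filter_3x3(image):
    """Replace every pixel with the median of its 3x3 neighborhood."""
    height, width = len(image), len(image[0])

    def pixel(r, c):
        # Clamp coordinates so border pixels reuse the nearest valid value.
        return image[min(max(r, 0), height - 1)][min(max(c, 0), width - 1)]

    return [
        [
            int(median(pixel(r + dr, c + dc)
                       for dr in (-1, 0, 1) for dc in (-1, 0, 1)))
            for c in range(width)
        ]
        for r in range(height)
    ]


noisy = [
    [50, 50, 50],
    [50, 255, 50],  # a single impulse ("salt") pixel in the center
    [50, 50, 50],
]
filtered = median_filter_3x3(noisy)
print(filtered)
```

Because the median ignores outliers, the isolated bright pixel is removed without averaging its value into the neighbors.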

Lens shading correction 

The method by which images are produced (the interaction between objects in real space, the illumination, and the camera) frequently leads to situations where the image exhibits significant shading across the field-of-view. In some cases, the image might be bright in the center and decrease in brightness towards the edge of the field-of-view. In other cases, the image might be darker on the left side and lighter on the right side. Lens shading correction is applied to reduce brightness and color non-uniformity towards the image periphery.
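One simple way to picture the correction is as a gain that grows with the distance from the image center. The sketch below assumes a purely radial, quadratic falloff with a hypothetical strength parameter and handles a single channel; a real pipeline would more likely use per-channel gains obtained from calibration of the actual lens.

```python
def lens_shading_gain(row, col, height, width, strength=0.5):
    """Gain of 1.0 at the image center, rising to 1.0 + strength at the
    corners (assumed quadratic radial model)."""
    cy, cx = (height - 1) / 2, (width - 1) / 2
    r2 = (row - cy) ** 2 + (col - cx) ** 2
    r2_max = cy ** 2 + cx ** 2
    return 1.0 + strength * (r2 / r2_max if r2_max else 0.0)


def correct_lens_shading(image, strength=0.5, white_level=255):
    """Multiply every pixel by its position-dependent gain and clip."""
    h, w = len(image), len(image[0])
    return [
        [
            min(white_level,
                round(image[r][c] * lens_shading_gain(r, c, h, w, strength)))
            for c in range(w)
        ]
        for r in range(h)
    ]


# A flat gray target that appears darker towards the corners.
shaded = [
    [80, 90, 80],
    [90, 100, 90],
    [80, 90, 80],
]
corrected = correct_lens_shading(shaded, strength=0.25)
print(corrected)
```

After correction the corners and edges are brought back close to the brightness of the center.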

Auto white balance  

Auto white balance (AWB) is applied by camera hardware at capture time to remove the color cast caused by the scene illumination. AWB consists of two steps. First, the scene illuminant color as observed by the camera’s sensor is estimated using an illuminant estimation algorithm. This first step assumes there is a single (i.e., global) illuminant in the scene. Second, the captured image is corrected based on the estimated illumination. These two steps are applied early in the camera’s ISP to the raw image. 
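The two steps can be sketched with the gray-world method, which is only one of many illuminant estimation algorithms and is used here as an assumption, since no specific algorithm is named above. For simplicity the example works on a list of RGB triples instead of raw Bayer data, and it assumes that no channel average is zero.

```python
def estimate_illuminant(pixels):
    """Step 1 (gray-world assumption): the average color of the scene
    is taken as the color of the single global illuminant."""
    n = len(pixels)
    return tuple(sum(p[ch] for p in pixels) / n for ch in range(3))


def white_balance(pixels, white_level=255):
    """Step 2: scale red and blue so that their averages match green."""
    r_avg, g_avg, b_avg = estimate_illuminant(pixels)
    gains = (g_avg / r_avg, 1.0, g_avg / b_avg)
    return [
        tuple(min(white_level, round(v * g)) for v, g in zip(p, gains))
        for p in pixels
    ]


# Two gray patches photographed under a strongly reddish light.
scene = [(200, 100, 50), (100, 50, 25)]
balanced = white_balance(scene)
print(balanced)
```

In this toy scene the color cast is removed completely because the scene really is gray on average; on real images the gray-world assumption holds only approximately.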

Tone mapping 

Tone mapping is a technique used to map one set of colors to another to approximate the appearance of high-dynamic-range images in a medium that has a more limited dynamic range. Tone mapping algorithms can be classified into two categories: global and local. Global tone mapping algorithms apply a single function to all pixels and disregard the statistics of each pixel's neighborhood. In general, they are relatively easy to implement in hardware. However, they may be prone to loss of detail and insufficient contrast. Local tone mapping algorithms take the statistics of each pixel's neighborhood into account and can produce images with more contrast and brightness than global tone mapping algorithms.
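As an illustration of a global operator, the sketch below applies the simple curve L / (1 + L), often attributed to Reinhard, to every luminance value independently. The choice of this particular curve and the 8-bit output range are assumptions made for the example.

```python
def reinhard_global(luminance):
    """Compress a luminance in [0, infinity) into [0, 1)."""
    return luminance / (1.0 + luminance)


def tone_map(hdr_luminances, white_level=255):
    """Apply the same curve to every pixel; no neighborhood is consulted,
    which is what makes the operator global."""
    return [round(white_level * reinhard_global(l)) for l in hdr_luminances]


# Scene luminances spanning a wide range, in arbitrary linear units.
hdr = [0.0, 3.0, 99.0]
ldr = tone_map(hdr)
print(ldr)
```

The brightest input is 33 times larger than the middle one, yet both fit into the 8-bit output range. The price is that differences between very bright values are strongly compressed, which is the loss of detail mentioned above.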

Gamma correction 

A nonlinear effect of signal transfer exists between an electrical and an optical device. The nonlinear effect distorts the color displayed by the output display. To compensate for the nonlinear effect, a gamma correction is applied to the video signal before it is displayed to make the intensity output linear. 

Gamma correction is, in the simplest cases, defined by the following power-law expression:

$$V_{\text{out}} = A \, V_{\text{in}}^{\gamma}$$

where the non-negative real input value $V_{\text{in}}$ is raised to the power $\gamma$ and multiplied by the constant $A$ to get the output value $V_{\text{out}}$.
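The power-law expression translates directly into code. The sketch below also builds a lookup table, since mapping each of the possible pixel codes once and then indexing into a table is a common way to apply such a curve cheaply. The encoding gamma of 1/2.2 and the 8-bit range are typical values chosen for illustration, not values taken from a specific ISP.

```python
def gamma_correct(v_in, gamma, a=1.0):
    """Evaluate V_out = A * V_in ** gamma for a non-negative input."""
    if v_in < 0:
        raise ValueError("v_in must be non-negative")
    return a * v_in ** gamma


def build_gamma_lut(gamma, levels=256):
    """Precompute the corrected code for every possible input code."""
    max_code = levels - 1
    return [
        round(max_code * gamma_correct(code / max_code, gamma))
        for code in range(levels)
    ]


lut = build_gamma_lut(gamma=1 / 2.2)
print(lut[0], lut[64], lut[128], lut[255])
```

With a gamma below 1 the curve lifts dark and mid tones while leaving black and white fixed, which compensates for a display whose response follows a gamma above 1.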

Auto focus 

To capture focused images, the image signal processor (ISP) includes both a built-in auto focus (AF) control block and an AF lens driver to implement AF with fast and accurate performance. The auto-focus circuit may be connected to an image sensor and may be separate from the statistics circuit and the other image processing pipelines of the ISP. An image sensor may include one or more focus pixels that are used to generate data for auto-focusing. The auto-focus circuit may extract the focus pixel values and generate a signal to control the lens position of the image sensor.

Auto Exposure 

The ISP performs automatic fine-tuning of the image brightness according to the amount of light that reaches the camera sensor.
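A possible shape for this feedback loop is sketched below: the mean brightness of the current frame is compared with a target value, and the exposure time for the next frame is moved part of the way towards the value that would hit the target. The target of 128, the damping factor, and the exposure limits are all hypothetical parameters chosen for the example, and the sketch assumes that brightness scales linearly with exposure time.

```python
def mean_luminance(image):
    """Average brightness of a grayscale frame (the image statistic)."""
    values = [v for row in image for v in row]
    return sum(values) / len(values)


def next_exposure(exposure_ms, measured_mean, target_mean=128.0,
                  damping=0.5, limits=(0.1, 100.0)):
    """One step of a damped proportional exposure controller."""
    if measured_mean <= 0:
        return limits[1]  # completely black frame: use the longest exposure
    ideal = exposure_ms * target_mean / measured_mean
    stepped = exposure_ms + damping * (ideal - exposure_ms)
    return min(limits[1], max(limits[0], stepped))


frame = [[64, 64], [64, 64]]  # an underexposed frame
new_exposure = next_exposure(10.0, mean_luminance(frame))
print(new_exposure)
```

The damping keeps the brightness from oscillating between frames, at the cost of needing several frames to converge.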

Future Trends 

Computer vision has entered the imaging industry as a new driver of technological change, bringing new challenges for image processing and computation. In addition to the ISP, it is now often necessary to add an AI-capable vision processor (VP) chip. A vision processor is a class of microprocessor and a specific type of AI accelerator, designed to accelerate machine vision tasks. In many intelligent vision fields, the integration of the vision processor and the image signal processor has brought about significant change. The rapid development of vision processors runs parallel to the development of the ISP industry, because realizing intelligent vision requires a suitable front-end image processing solution.


As we live in an image-centric, networked world increasingly shaped by computer vision, dedicated image signal processors are needed to solve real-world business challenges and meet customer expectations. Beyond converting raw images into YUV/RGB formats, the focus of the ISP is to produce a high-quality image while delivering maximum performance and capability in area- and power-constrained applications, making it well suited to smart camera and edge AI vision use cases.