How to use structured light and image sensors for range detection and identification
There are many reasons why measuring the distance to an object just a few centimeters away is useful. In its simplest form, a single data point provides object detection. With more resources, a point cloud can represent a larger, three-dimensional area. This enables an object to be detected and identified from its surface profile. Once the object is identified, an appropriate action can be taken.
Detecting objects in 3D is essential when either the sensor, the object or both are free to move in any direction. Free movement creates a constantly changing perspective, which would present challenges for sensors that only detect in two dimensions. Adding the third dimension, range, provides vital additional information about the scene.
For example, face scanning becomes much more accurate when using a 3D technology. This is useful for security purposes, but face scanning is also used to detect the direction a person is looking. This can deliver a better experience for user interfaces, for example. It is also being developed for driver monitoring in semi- or fully autonomous vehicles.
Point clouds form the basis of 3D sensing. Each point in the cloud has coordinates, positioning it in the X, Y and Z axes. Every point represents a very small area but the distance between each point is set by the resolution of the 3D sensing solution. The closer the points, the higher the resolution. In simple terms, resolution is the difference between detecting a face and recognizing a person.
Resolution has a second significance in 3D sensing. It relates to the size of the smallest measurable change in distance. When scanning features on a face, this type of resolution is becoming more relevant. Some 3D systems can resolve range differences of less than 1 mm.
Active 3D sensing technologies
The distance between pixels on a sensor is an absolute. How that relates to points in a cloud is relative, based on the distance to an object. Ranging always requires some form of reference and the main challenge is often creating that reference. There are various ways to measure distance between two objects, but 3D image sensing is complex. It involves scanning a scene and creating that point cloud. The location of each point is relative to the sensor, the object and the other points.
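The relationship between the fixed pixel spacing and the distance-dependent point spacing can be sketched with a simple pinhole camera model. This is an illustration only: the function name, the 3 µm pixel pitch and the 4 mm focal length are assumed values, not figures from any sensor discussed in this article.

```python
def point_spacing_mm(pixel_pitch_um: float, focal_length_mm: float,
                     distance_mm: float) -> float:
    """Approximate lateral spacing between neighbouring cloud points.

    Assumes an ideal pinhole camera looking straight at a flat surface:
    one pixel pitch on the sensor maps to (distance / focal length)
    times that pitch on the surface.
    """
    pixel_pitch_mm = pixel_pitch_um / 1000.0
    return distance_mm * pixel_pitch_mm / focal_length_mm


# Hypothetical optics: 3 um pixels behind a 4 mm lens.
for distance_mm in (300.0, 1500.0):
    spacing = point_spacing_mm(3.0, 4.0, distance_mm)
    print(f"{distance_mm:6.0f} mm away -> points about {spacing:.3f} mm apart")
```

The pixel pitch never changes, yet the same sensor samples the surface five times more coarsely when the object is five times farther away.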
The three main technologies used in 3D image sensing are stereoscopic vision, time of flight (ToF) and structured light. Of the three, stereoscopic vision is probably the simplest to understand. A stereoscopic vision system uses two cameras mounted a known distance apart but focused on the same scene. Each camera will provide a slightly different image and by comparing the differences the distance to the object can be triangulated. This is how a biological brain perceives depth and distance.
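The triangulation step can be sketched for the idealized case of two identical, rectified cameras. Under that assumption, depth is the focal length multiplied by the baseline and divided by the disparity, where disparity is the horizontal shift of a feature between the two images. The function name and the numbers below are illustrative assumptions.

```python
def stereo_depth_m(focal_length_px: float, baseline_m: float,
                   disparity_px: float) -> float:
    """Depth of a feature seen by two rectified cameras: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical rig: 800 px focal length, cameras 6 cm apart.
for disparity_px in (96.0, 48.0, 16.0):
    depth = stereo_depth_m(800.0, 0.06, disparity_px)
    print(f"disparity {disparity_px:5.1f} px -> depth {depth:.2f} m")
```

A real system also has to find matching features in both images before it can measure disparity, which is where most of the processing effort goes.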
Biological brains can only sense range over relatively short distances and stereoscopic vision systems suffer from the same limitation. A simple stereoscopic vision system is passive because it relies on incidental light reaching the sensors. An active system uses a light source to improve performance. Both ToF and structured light are active by their nature, but stereoscopic vision systems can also be active with the addition of a light source.
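One way to see why triangulation favors short distances is to look at how a small matching error translates into a depth error. For rectified cameras, where depth Z equals focal length f times baseline B divided by disparity d, an error of δd in the disparity produces a depth error of roughly Z² / (f × B) × δd. The sketch below uses assumed values (800 px focal length, 6 cm baseline, quarter-pixel matching error) purely for illustration.

```python
def stereo_depth_error_m(distance_m: float, focal_length_px: float,
                         baseline_m: float, disparity_error_px: float) -> float:
    """First-order depth uncertainty of a rectified stereo pair.

    Derived from Z = f * B / d, so |dZ| is about Z**2 / (f * B) * |dd|.
    """
    return distance_m ** 2 / (focal_length_px * baseline_m) * disparity_error_px


# Hypothetical rig and matching accuracy.
for distance_m in (0.5, 1.0, 5.0):
    error = stereo_depth_error_m(distance_m, 800.0, 0.06, 0.25)
    print(f"at {distance_m:.1f} m the depth error is about {error * 1000:.1f} mm")
```

Because the error grows with the square of the distance, an object ten times farther away is measured about one hundred times less precisely with the same rig.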
Active 3D image sensing systems operate in conjunction with this controlled light source. The type of active light source used will depend on the sensing technology. For structured light and stereoscopic vision systems, the sensor will usually be a CMOS image sensor and create an output comparable to a regular camera. In fact, the image sensors used in 3D vision systems are the same sensors used in regular cameras. This makes them cost effective.
A ToF sensor would not necessarily provide a human-centric output because the point cloud generated is really intended for a machine. The ToF concept is scalable in this respect: in its simplest form, a single sensor element generates a single response at a fixed distance from the sensor. This might be detecting a hand under a dryer, for example. It can also be used with an array of sensor elements to generate a complex 3D image.
Each of the three 3D sensing technologies mentioned calculates range by measuring modulation in the light source. For the systems based on regular image sensors, which include structured light and active stereoscopic vision, the modulation is typically of a pattern generated by the light source. This pattern becomes distorted by the object(s) in the field of illumination (FOI).
Active 3D imaging
The pattern can be simple and repeatable, like dots in a grid. This regular pattern becomes distorted (irregular) by any variation on the object’s surface. This distortion is easily identified in the video signal produced. By analyzing the video signal, the shape of the surface can be understood. The range can be calculated by measuring the distance between the dots.
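As a rough illustration of how dot positions can be turned into range, the sketch below uses a reference-plane triangulation model that is common in structured light systems. It is an assumed model, not necessarily the method used by any product mentioned in this article. The projector and camera are taken to sit a known baseline apart, the dot pattern is recorded once on a flat reference plane at a known distance, and each dot's sideways shift from that reference position gives its depth. All names and values are hypothetical, and a positive shift is taken by convention to mean the surface is closer than the reference plane.

```python
def structured_light_depth_m(shift_px: float, reference_depth_m: float,
                             focal_length_px: float, baseline_m: float) -> float:
    """Depth from the shift of a projected dot relative to a reference image.

    Model: 1/Z = 1/Z0 + shift / (f * b), where Z0 is the reference plane
    distance, f the focal length in pixels and b the projector-camera baseline.
    """
    inverse_depth = 1.0 / reference_depth_m + shift_px / (focal_length_px * baseline_m)
    if inverse_depth <= 0:
        raise ValueError("shift is outside the range this model can represent")
    return 1.0 / inverse_depth


# Hypothetical module: reference plane at 1 m, 800 px focal length, 5 cm baseline.
for shift_px in (-4.0, 0.0, 8.0):
    depth = structured_light_depth_m(shift_px, 1.0, 800.0, 0.05)
    print(f"dot shifted {shift_px:+.1f} px -> surface at about {depth:.3f} m")
```

A dot that has not moved lies on the reference plane, and larger shifts correspond to surfaces progressively closer to or farther from the module.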
As the name suggests, a time-of-flight system measures the time it takes light to make a round trip to the object and back. In practice, for a sensor with multiple pixels, accurate time measurement across all pixels would be difficult. Instead, a technique called indirect time of flight, or iToF, is used.
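A quick calculation shows why timing the round trip directly is demanding. Distance is the speed of light multiplied by half the round-trip time, so small range differences correspond to extremely short time differences. The function name below is an illustrative assumption.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def round_trip_time_s(distance_m: float) -> float:
    """Time for light to travel to an object and back: t = 2 * d / c."""
    return 2.0 * distance_m / SPEED_OF_LIGHT_M_PER_S


# Round-trip times for an object 1 m away and for a 1 mm change in range.
print(f"1 m  -> {round_trip_time_s(1.0) * 1e9:.2f} ns")
print(f"1 mm -> {round_trip_time_s(0.001) * 1e12:.2f} ps")
```

Resolving millimeter-scale range differences by direct timing would mean resolving time differences of a few picoseconds at every pixel.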
Active 3D imaging uses a controlled light source to emit a signal, which is reflected and detected using a sensor. The signal is processed to determine what the image has captured in 3D. (Source: ams OSRAM)
With iToF, a waveform modulates the light source. Each pixel will detect the waveform in the returning signal at around the same time, but its phase will be different depending on how far it has traveled. The sensor uses the phase change measured at each pixel to calculate the time (and therefore distance) between when the signal was emitted and when it was received. The phase difference of the returning waveform correlates with the distance from a pixel to the object’s surface.
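The phase-to-distance step can be sketched using the widely used four-sample demodulation scheme, in which each pixel measures the returning signal at four offsets of 0°, 90°, 180° and 270° relative to the emitted waveform. This is a minimal sketch under stated assumptions: sinusoidal modulation, an ideal pixel, and a hypothetical 20 MHz modulation frequency. The function names are made up for the example, and the samples are simulated rather than taken from a real sensor.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def simulate_samples(distance_m, f_mod_hz, amplitude=1.0, offset=2.0):
    """Simulate the four correlation samples an ideal pixel would record."""
    phase = (4.0 * math.pi * f_mod_hz * distance_m / SPEED_OF_LIGHT_M_PER_S) % (2.0 * math.pi)
    return [offset + amplitude * math.cos(phase - k * math.pi / 2.0) for k in range(4)]


def itof_distance_m(samples, f_mod_hz):
    """Recover distance from four phase-stepped samples.

    The differences cancel the constant offset (ambient light), and atan2
    recovers the phase shift, which maps to distance as c * phase / (4 * pi * f).
    """
    a0, a1, a2, a3 = samples
    phase = math.atan2(a1 - a3, a0 - a2) % (2.0 * math.pi)
    return SPEED_OF_LIGHT_M_PER_S * phase / (4.0 * math.pi * f_mod_hz)


def unambiguous_range_m(f_mod_hz):
    """Beyond this distance the phase wraps around and the range aliases."""
    return SPEED_OF_LIGHT_M_PER_S / (2.0 * f_mod_hz)


f_mod_hz = 20e6  # hypothetical modulation frequency
print(f"unambiguous range: {unambiguous_range_m(f_mod_hz):.2f} m")
print(f"recovered distance: {itof_distance_m(simulate_samples(1.5, f_mod_hz), f_mod_hz):.3f} m")
```

Because phase repeats every full cycle, a lower modulation frequency extends the unambiguous range while a higher one gives finer range resolution for the same phase accuracy.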
Strengths, weaknesses and trade-offs
The table below provides a guide to the relative strengths and weaknesses of the three 3D ranging technologies discussed here.
The best light sources for 3D ranging
Both active stereoscopic vision systems and structured light systems are now designed using tiny but powerful light sources. These dedicated light sources project structured patterns such as an array of dots onto a surface. The PCB-mounted light sources measure around 4 mm or less on each side but can feature arrays of up to 5,000 dots arranged linearly or pseudo-randomly.
Unlike a stereoscopic system, a 3D system using structured light only needs a single image sensor. This makes structured light systems easier to implement than stereoscopic vision systems. Using only a single image sensor also requires less processing power, making structured light ideal for implementation in embedded IoT systems for short-range sensing.
Adding structured light to a stereoscopic system improves its performance, particularly in scenes with low profile variation. As it already uses two image sensors, the scene is captured by two complementary sub-systems. This can result in more accuracy and higher resolution, with only a marginal increase in the hardware costs. The software would be more complex, but the gains made should justify the increased complexity.
An indirect time-of-flight ranging system requires a different kind of light source, referred to as a flood illuminator. As the name suggests, a single source lights the entire scene, and this happens as a burst or pulse of activity.
As outlined earlier, a flood illuminator’s output is modulated. This would typically be with a pulse-width modulation (PWM) signal. The phase shift in the pulse holds the information needed to calculate the range, based on the speed of light. Each pixel in the detecting sensor would detect a phase shift based on the distance from that pixel to the corresponding location on the object in the scene.
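For pulsed illumination, one commonly described indirect scheme integrates the returning light in two consecutive windows, each as long as the emitted pulse. The later the echo arrives, the more of its energy falls into the second window, so the ratio of the two readings gives the delay. The sketch below assumes this two-window scheme, an ideal rectangular pulse and no ambient light. It is not taken from any specific product, and the names and numbers are hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def pulsed_itof_distance_m(charge_window_1: float, charge_window_2: float,
                           pulse_width_s: float) -> float:
    """Distance from the charge split between two back-to-back windows.

    Window 1 coincides with the emitted pulse and window 2 follows it.
    delay = pulse_width * Q2 / (Q1 + Q2), and distance = c * delay / 2.
    """
    total = charge_window_1 + charge_window_2
    if total <= 0:
        raise ValueError("no returning light detected")
    delay_s = pulse_width_s * charge_window_2 / total
    return SPEED_OF_LIGHT_M_PER_S * delay_s / 2.0


pulse_width_s = 30e-9  # hypothetical 30 ns pulse
print(f"maximum range: {SPEED_OF_LIGHT_M_PER_S * pulse_width_s / 2.0:.2f} m")
print(f"distance: {pulsed_itof_distance_m(750.0, 250.0, pulse_width_s):.3f} m")
```

Using the ratio rather than the absolute readings makes the result largely independent of how reflective the surface is, since reflectivity scales both windows equally.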
In both cases, the light wavelength used would be in the infrared range, making it invisible to the human eye. The technology behind these IR light emitters needs to be cost effective, small and extremely efficient. One example is the vertical-cavity surface-emitting laser, or VCSEL diode. These diodes can be manufactured using conventional semiconductor techniques and are also used in applications such as laser mouse pointers.
The VCSEL-based laser diodes offered by ams OSRAM are suitable for both structured light and iToF 3D image sensing. They have been designed to offer high performance in a small form factor that is easy to use in automated manufacturing processes. The families offered include the TARA2000-AUT flood illuminators, which are qualified to AEC-Q102 and ISO 26262 for use in automotive applications.
The Belago 1.1 dot projector combines an IR VCSEL with advanced optics for use in active stereo vision systems designed for autonomous applications. A partnership between ams OSRAM and Luxonis, a developer of embedded artificial intelligence and computer vision technology, was recently announced. Luxonis will use the Belago 1.1 dot projector in the 3D solutions it develops for automated guided vehicles, robots and drones.
CMOS sensors for 3D sensing
While the light source is critical in 3D range detection, the sensor is equally important. This is another area where ams OSRAM provides a differentiating advantage. One of the important parameters in CMOS image sensors designed for 3D sensing is the quantum efficiency (QE), particularly at near IR (NIR) wavelengths. The Mira family of CMOS sensors delivers a QE of 0.4 at 940 nm and features global shutter technology. Global shutter, as opposed to rolling shutter, offers better image capture in scenes with fast movement. This makes global shutter better suited to autonomous applications.
The Mira family has also been designed to provide a better silicon-to-optical area ratio than its competitors. This makes it more power efficient in 2D and 3D sensing applications. The 2.2 megapixel version, the Mira220, measures just 5.3 mm on each side but offers a resolution of 1600 by 1400 and a frame rate of 90 fps. Even so, its active power is less than 350 mW. The Mira family is also available in 1.3 MP, 0.5 MP and 0.3 MP versions.
Other CMOS image sensors from ams OSRAM include the CMV and CSG families, with a resolution of up to 50 MP. The sensors in these families have been designed for 2D applications such as factory automation, display inspection, barcode reading, traffic cameras and prosumer photography.
Design trade-offs in 3D vision systems
The cost of the light source in all three technologies is comparable. If CMOS image sensors are used, so too is the cost of the sensor. The main differences relate to how the data is processed after capture. Stereoscopic vision systems require two sensors and therefore two sensor processing stages. If the system is high speed, that may dictate two processors, but for low-speed applications one processor may be able to handle both sensors.
A structured light system operates with one image sensor, so the processing requirements are lower. The scalability of iToF means the processing requirements will probably relate to the number of points in the point cloud and the resolution needed.
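A rough way to compare the processing load is to count the pixels per second each configuration has to handle. The sketch below uses the Mira220 resolution and frame rate quoted earlier (1600 by 1400 at 90 fps) and simply assumes the same sensor is used in both a one-sensor structured light design and a two-sensor stereoscopic design. Raw pixel rate is only a proxy, since the actual workload depends heavily on the algorithms used, and the function name is illustrative.

```python
def pixel_rate_per_s(width_px: int, height_px: int,
                     frames_per_s: int, sensor_count: int) -> int:
    """Raw pixels per second delivered to the processing stage."""
    return width_px * height_px * frames_per_s * sensor_count


structured_light = pixel_rate_per_s(1600, 1400, 90, sensor_count=1)
stereoscopic = pixel_rate_per_s(1600, 1400, 90, sensor_count=2)

print(f"structured light: {structured_light / 1e6:.1f} Mpixel/s")
print(f"stereoscopic:     {stereoscopic / 1e6:.1f} Mpixel/s")
```

At full resolution and frame rate, the stereoscopic configuration has to move and process twice the pixel data before any matching between the two images begins.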
Carefully consider the application’s needs before choosing the most appropriate technology.
3D sensing adds a new dimension
The Belago 1.1 dot projector and Mira130 high QE image sensor have been brought together by ams OSRAM in the Athena reference design. This is a self-contained 3D face authentication module that can recognize a subject at a distance of between 30 cm and 150 cm. The module can wake from sleep and authenticate a subject in less than 1 second.
Distance sensing is fast becoming a must-have feature in a modern, automated and autonomous world. It adds context to almost any application. This includes obstacle detection and avoidance for anything from large automated guided vehicles to domestic cleaning robots. In an industrial manufacturing setting, depth perception and object recognition work together in pick-and-place production and assembly. Virtual safety ringfencing can also be improved using ranging and object detection in the third dimension.
All of this points toward an increased demand for depth sensing. It is a natural complement to artificial intelligence and machine learning, but it can also increase the capabilities of more conventional embedded systems. To learn more about how ams OSRAM’s 3D image sensing solution could help you, contact your local Avnet representative.
Philip Ling is a senior technology writer with Avnet.