Challenges and Advancements in Neural Networks for Eye Tracking
Dependable head and eye tracking is vital for driver monitoring. So why do so many automotive companies rely on outdated and overly complex methods to monitor this behavior? At Neonode, our bespoke approach to neural networks provides our customers with robust and reliable data they can trust.
Head pose estimation and eye tracking have many use cases. Knowing where a person’s attention is directed is useful in commercial applications like visual merchandising, advertising and consumer electronics. In applications like driver monitoring, however, accurate tracking becomes vital, so high data integrity and accuracy are essential.
Neural networks are the go-to solution for head and eye tracking. However, they are still commonly applied in outdated ways: instead of reading the desired value directly from the network, these systems require complex post-processing logic to extract approximations from the network's output.
Landmark Eye Tracking Can Fail When the Line of Sight Is Obscured
A classic example of head and eye tracking is to train a network to find certain landmark points around the face and eyes. Based on these landmark points, you then add logic that calculates the data you’re actually interested in. For example, eye openness can be found by training a network to locate landmark points around the eye, and then writing logic that uses the positions of those points in the 2D image to calculate the aspect ratio between the eyelid opening (the distance between the upper and lower eyelids) and the width of the eye. To be reliable, this logic has to account for cases where some of the points of interest are hidden from the camera, and cases where the angle of the eye affects the aspect ratio.
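The landmark-based calculation described above can be sketched with the widely used eye aspect ratio (EAR), which divides the summed vertical eyelid distances by twice the eye width. This is a minimal illustration under assumed conventions (six landmarks in a hypothetical fixed order), not any particular product's implementation. It also reproduces the two problems mentioned: an occluded landmark leaves the ratio undefined, and a head turn changes the ratio even though the eye is just as open. The `apply_yaw` helper is a deliberately crude, hypothetical foreshortening model.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def eye_aspect_ratio(landmarks: Sequence[Optional[Point]]) -> Optional[float]:
    """Eye aspect ratio from six 2D eye landmarks.

    Assumed (hypothetical) ordering: p1 and p4 are the eye corners,
    p2 and p3 lie on the upper eyelid, p6 and p5 on the lower eyelid,
    so that p2 sits above p6 and p3 sits above p5.
    A landmark the detector could not see is passed as None.
    """
    if len(landmarks) != 6:
        raise ValueError("expected exactly six landmarks")
    if any(p is None for p in landmarks):
        # Occlusion: the ratio is undefined, so extra fallback logic is needed.
        return None
    p1, p2, p3, p4, p5, p6 = landmarks
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    if horizontal == 0:
        return None
    return vertical / (2.0 * horizontal)


def apply_yaw(landmarks: Sequence[Point], yaw_degrees: float) -> List[Point]:
    """Crude foreshortening model: a head turn compresses horizontal image coordinates."""
    scale = math.cos(math.radians(yaw_degrees))
    return [(x * scale, y) for x, y in landmarks]


# Hypothetical, fully open eye seen from the front (pixel coordinates).
open_eye = [(0.0, 0.0), (10.0, -5.0), (20.0, -5.0), (30.0, 0.0), (20.0, 5.0), (10.0, 5.0)]
print(eye_aspect_ratio(open_eye))                 # approximately 0.33
print(eye_aspect_ratio(apply_yaw(open_eye, 60)))  # approximately 0.67, same eye, head turned
print(eye_aspect_ratio([None] + open_eye[1:]))    # None: one corner is occluded
```

In this toy model the EAR doubles at a 60 degree head turn, which is why landmark pipelines need extra pose-compensation logic on top of the network.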
This method stems from a time when the only training data available was hand-annotated photos. Annotating these photos consistently is hard, so the solution ends up built on several layers of approximation. To make up for these approximations, the networks must be complex as well, requiring more computational power.
Landmark points around an eye – an outdated approach to finding eye openness.
Pupil Center Corneal Reflection (PCCR) Requires Complex Installation and Calibration
Another established method of determining eye openness is to analyze reflections that active illumination causes in the eyes of the person you are analyzing. For example, by measuring how much of the illumination the eyes reflect, it is possible to determine their degree of openness. Gaze tracking can be done using the Pupil Center Corneal Reflection (PCCR) method, where the offset between the center of the pupil and the corneal reflection is used to determine the gaze direction.
While this method is slightly more sophisticated than the landmark method described above, it shares some of the same flaws. It requires additional logic after the points of interest have been found, and you need a strategy for handling occlusion of those points. It also places high demands on hardware installation accuracy: for the solution to work, the active illumination must hit the face correctly.
The technology has its roots in eye tracking headsets, where the distance between the eyes being studied and the illumination source is always known, and the eyes are always observed from the same angle. In this controlled environment the method can give highly accurate results, but achieving that accuracy requires extensive unit calibration. When the method is used to track subjects from afar (so-called remote tracking), the solution can only track subjects who are within a pre-calibrated “headbox”. If you want to track people wherever they are in the image, this solution will fail you. The method is simply not designed to handle the full complexity of a 3D world.
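To make the calibration requirement concrete, here is a minimal sketch of a common textbook PCCR mapping, assumed here rather than taken from any specific system: the pupil-glint vector is mapped to gaze coordinates with a second-order polynomial fitted by least squares to a few calibration targets. The function names, the normalized units, and the simulated user's "true" mapping are all hypothetical.

```python
import numpy as np


def pupil_glint_vector(pupil_center, glint):
    """Offset from the corneal reflection (glint) to the pupil center, in image units."""
    return np.asarray(pupil_center, dtype=float) - np.asarray(glint, dtype=float)


def design_matrix(vectors: np.ndarray) -> np.ndarray:
    """Second-order polynomial terms [1, dx, dy, dx*dy, dx^2, dy^2] for (N, 2) vectors."""
    dx, dy = vectors[:, 0], vectors[:, 1]
    return np.column_stack([np.ones_like(dx), dx, dy, dx * dy, dx**2, dy**2])


def fit_calibration(vectors: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Least-squares fit mapping pupil-glint vectors (N, 2) to known gaze targets (N, 2)."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(vectors), targets, rcond=None)
    return coeffs  # shape (6, 2): one column per gaze coordinate


def estimate_gaze(coeffs: np.ndarray, vectors: np.ndarray) -> np.ndarray:
    """Apply a previously fitted calibration to new pupil-glint vectors."""
    return design_matrix(vectors) @ coeffs


# Hypothetical calibration session: nine pupil-glint vectors (normalized units)
# recorded while the user fixates a 3x3 grid of known targets.
vectors = np.array([[x, y] for y in (-1.0, 0.0, 1.0) for x in (-1.0, 0.0, 1.0)])
dx, dy = vectors[:, 0], vectors[:, 1]
# Made-up "true" mapping for this simulated user, in normalized screen coordinates.
targets = np.column_stack([0.5 + 0.3 * dx + 0.02 * dx**2,
                           0.5 + 0.25 * dy + 0.01 * dx * dy])
coeffs = fit_calibration(vectors, targets)
print(estimate_gaze(coeffs, np.array([[0.5, -0.5]])))  # approximately 0.655 and 0.3725
```

The fitted coefficients only hold for roughly the geometry under which they were collected. Large head movements change the relationship between the pupil-glint vector and the gaze point, which is one reason remote PCCR tracking is confined to a calibrated headbox.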
New Approach to Neural Networks Removes Complexity and Improves Robustness
The Neonode approach to neural networks is founded on our long experience with synthetic data. Our synthetic data factory lets us create data that meets our exact needs, so we can build neural networks that provide a direct answer to the question we’re asking. Synthetic data also ensures that all eyes and all faces are treated the same way, with annotations that are consistent across the entire training data set. This helps us create lightweight networks suited to embedded applications.
Eye Tracking: Eye Openness Detection
At Neonode, we use a purely AI-based method for head pose estimation and eye tracking. In the case of eye openness, we can train our networks to output an openness level directly. For a truly robust answer, this estimate is combined with other information that our networks gather from the scene.
By simulating different lens parameters, camera positions and types, and occlusion phenomena in our training data, we can create a system that handles the challenges of the real world without having seen the real world in training. Our networks continue to track a person successfully even when important features are hidden from sight. Classical landmark detection still has a place in validation and visualization, but relying on landmarks alone results in a system that is unfit to handle the full complexity of the 3D world.
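Rendering realistic occlusions in a synthetic pipeline is beyond a short example, but a much simpler, hypothetical stand-in illustrates the principle: paste a random rectangle over a training image (similar in spirit to random-erasing augmentation) while leaving its label untouched. Because generated data comes with known ground truth, the openness label stays valid even when the eye is hidden, so the network can learn to answer under occlusion. The names `add_random_occluder` and `occlude_sample`, and the parameters `max_frac` and `p`, are assumptions for this sketch.

```python
import numpy as np


def add_random_occluder(image: np.ndarray, rng: np.random.Generator,
                        max_frac: float = 0.4, fill: float = 0.0) -> np.ndarray:
    """Return a copy of `image` with one random axis-aligned rectangle painted over it."""
    h, w = image.shape[:2]
    # Occluder side lengths: at least 1 pixel, at most max_frac of the side (capped at the side).
    oh = int(rng.integers(1, max(1, min(h, int(h * max_frac))) + 1))
    ow = int(rng.integers(1, max(1, min(w, int(w * max_frac))) + 1))
    # Random top-left corner such that the rectangle fits inside the image.
    top = int(rng.integers(0, h - oh + 1))
    left = int(rng.integers(0, w - ow + 1))
    out = image.copy()
    out[top:top + oh, left:left + ow] = fill
    return out


def occlude_sample(image: np.ndarray, openness: float,
                   rng: np.random.Generator, p: float = 0.5):
    """With probability p, occlude the image. The openness label is deliberately
    left unchanged: a synthetic ground-truth label stays valid when the eye is hidden."""
    if rng.random() < p:
        image = add_random_occluder(image, rng)
    return image, openness


rng = np.random.default_rng(42)
eye_crop = np.ones((64, 64), dtype=np.float32)  # stand-in for a rendered eye crop
occluded, label = occlude_sample(eye_crop, openness=0.8, rng=rng, p=1.0)
print(label, int((occluded == 0).sum()), "pixels hidden")
```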
Unlike PCCR solutions, our neural networks can track people without any additional calibration, and they can do so wherever a person is in the image, provided the person is close enough to the camera to be detected. The Neonode approach to head and eye tracking was developed specifically for remote tracking of subjects and therefore excels in difficult real-world conditions.
A Solution You Can Trust
Contact us for more information about Neonode's head pose estimation and eye tracking solutions for driver and in-cabin monitoring.