RE: WIRED article, EVEN ARTIFICIAL NEURAL NETWORKS CAN HAVE EXPLOITABLE ‘BACKDOORS’
The current visual systems are essentially dumb recognition systems that see the world mostly in 2D. Depth perception, and information about the layers of objects between the target and the viewer, have not been considered in current models. Some phones are equipped with infrared depth sensors for this very purpose, and Google’s 3D-sensing Tango project is another example. Still, a visual recognition system that sees a stop sign with a Post-It note on it and classifies it as a speed limit sign is a huge problem for current models. I think this is another reason we need to fundamentally reconsider our presuppositions about how a typical neural net works. Just as epigenetics was a late addition to genetics, there is more involved than just synapses and neurons.
Current models also deal with images fixed in time. The input data is usually too flat; the incoming data needs to be combined with more metadata to make it multi-dimensional. A parked car at a certain location is a car, but once it leaves the lot, the image reverts to an empty parking space. Current models do not account for this as a “missing” car. Likewise, the Post-It note should be recognized as an object “appended” to a pre-existing object at an earlier point in time. The human visual system would not be able to recognize anything if it did not scan constantly, or if things were static and not moving. There is also a factor of expectation, since we anticipate motion.
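The “missing car” idea above amounts to change detection across time. A minimal sketch of that, assuming grayscale frames as NumPy arrays (the array sizes, pixel values, and threshold here are all illustrative, not from any real system):

```python
import numpy as np

def change_mask(prev_frame, curr_frame, threshold=30):
    """Return a boolean mask of pixels that changed between two frames.

    prev_frame, curr_frame: 2-D uint8 grayscale arrays of equal shape.
    threshold: minimum absolute intensity change to count as a change
    (an assumed tuning parameter).
    """
    # Widen to int16 so the subtraction cannot wrap around on uint8.
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold

# Toy scene: an 8x8 "parking lot" where a 2x2 "car" disappears.
scene_with_car = np.zeros((8, 8), dtype=np.uint8)
scene_with_car[3:5, 3:5] = 200          # the parked "car"
scene_without_car = np.zeros((8, 8), dtype=np.uint8)

mask = change_mask(scene_with_car, scene_without_car)
print(mask.sum())  # 4 changed pixels: the footprint of the "missing" car
```

A real system would of course run this over aligned video frames and reason about the changed region (car gone, note added) rather than raw pixels, but the temporal delta is the core signal the comment argues current models ignore.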
One of the simplest solutions to the stop sign issue could be to train the system to retrieve the best expected target and then deal with the delta between the actual target and the expected target.
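That expected-versus-actual comparison can be sketched as a simple template delta. This is a toy illustration of the idea only: the templates, patch, and suspicion threshold are all hypothetical, and a real system would compare learned features, not raw pixels:

```python
import numpy as np

def delta_score(expected, actual):
    """Mean absolute pixel difference between the expected template
    and the actual observation (equal-shape uint8 arrays)."""
    return float(np.mean(np.abs(expected.astype(np.int16)
                                - actual.astype(np.int16))))

# Hypothetical 16x16 templates: an idealized stop sign vs. one with
# a bright square patch standing in for the Post-It note.
expected_stop = np.full((16, 16), 180, dtype=np.uint8)
observed = expected_stop.copy()
observed[2:6, 2:6] = 255                 # the "appended" patch

score = delta_score(expected_stop, observed)
SUSPICION_THRESHOLD = 4.0  # assumed tuning parameter
print(score, score > SUSPICION_THRESHOLD)
```

When the delta exceeds the threshold, the system would treat the input as "expected target plus an appended object" and flag it for closer inspection instead of silently reclassifying it.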