We have been using "direct" image-to-action mapping for a while now, and the word "direct" took me back to tracking in SLAM. Back in the 90s, many people were looking at recovering camera pose directly from the images, rather than doing feature tracking first and then recovering a homography afterwards.
add a skeleton here at some point