When a person visits a strange city, they can follow directions by finding landmarks in a map. For example, the can follow instructions like “Go to the Wegmans supermarket” by finding the supermarket in their map of the city.
Autonomous systems on robots need the ability to understand natural language instructions referring to landmarks as well . Crucially, the robot’s model for understanding the user’s instructions should be flexible to new environments. If the robot has never seen “the Wegmans supermarket” before but it is provided with a map of the environment that marks these locations, it should be able to follow these sorts of instructions. Unfortunately, existing approaches need not just a map of the environment, but also a corpus of natural language commands collected in that environment.
Our recent paper accepted to ICRA 2020, Grounding Language to Landmarks in Arbitrary Outdoor Environments, addresses this problem by enabling robots to use maps to interpret natural language commands. Our approach uses a novel domain-flexible language and planning model that enables untrained users to give landmark-based natural language instructions to a robot. We use Open Street Maps (OSM) to identify landmarks in the robot’s environment, and leverage semantic information encoded in OSM’s representation of landmarks (such as type of business and/or street address) to assess similarity between the user’s natural language expression (for example, “medicine store”) and landmarks in the robot’s environment (CVS, medical research lab). You can download the collected natural language data on the h2r GitHub repository: https://github.com/h2r/Language-to-Landmarks-Data
We tested our model with a Skydio R1 drone in simulation, then outdoors. Here’s the video!