Robot Motion Planning in Dynamic Pedestrian Environment Using PO/MDP Based Techniques

Manzoor, Mohsin

URI: http://10.250.8.41:8080/xmlui/handle/123456789/21141

Date: 2016

Abstract:

Human-robot interaction (HRI) is a field of robotics focused on developing robots that can communicate with, assist, and collaborate with humans. Robots are useful in many roles, such as collaboration, personal care, and search and rescue, and they can operate in hazardous environments. Research has demonstrated that a robot with a better understanding of human behavior performs better in tasks that require it to operate around humans. Therefore, a mobile robot that has to operate among pedestrians should exhibit human-like behavior. Such a robot has greater social acceptance, and people feel safer around it. Inverse reinforcement learning (IRL) based methods are being used to train robots for human-like behavior. Training a robot as it operates around people may give biased results, because at present humans may not act naturally around robots. Moreover, information may be lost when tracking the crowd state due to sensor errors and other setup-related issues. One major disadvantage of using existing datasets is that they contain far fewer human-human interactions and lack diversity in environmental scenarios. Another approach is to generate crowd data from a crowd simulator such as the social force model (SFM). In most experiments, crowd trajectories are used directly for behavior learning. Since the governing factor behind the motion is the social force, it is plausible to use these forces instead of the output tracks for learning. We propose an IRL-based method to learn pedestrian behavior directly from the social force. Apprenticeship learning is used to match the optimal value function of the MDP model to the underlying social force field. It is observed that convergence in this case is faster than when using trajectories, and the learned reward function is more likely to converge to the actual reward. The path planning problem is implemented using a MOMDP-based planner in which the unknown destination location is predicted using a recursive Bayesian estimator.
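
To illustrate the recursive Bayesian estimation of an unknown destination mentioned in the abstract, the following is a minimal sketch and not the thesis implementation. It assumes a finite set of candidate goals and a hypothetical observation model in which a pedestrian's velocity is more likely the better it points toward the true goal; the function name `update_goal_belief`, the concentration parameter `kappa`, and the example goals are all assumptions introduced here for illustration.

```python
import math


def update_goal_belief(belief, position, velocity, goals, kappa=2.0):
    """One recursive Bayes step: posterior is proportional to likelihood * prior.

    The likelihood exp(kappa * cos(angle)) is an assumed model, where angle is
    measured between the observed velocity and the direction to each goal.
    """
    speed = math.hypot(velocity[0], velocity[1])
    if speed == 0.0:
        # A stationary pedestrian carries no heading evidence; keep the prior.
        return list(belief)

    unnormalized = []
    for prior, goal in zip(belief, goals):
        dx, dy = goal[0] - position[0], goal[1] - position[1]
        dist = math.hypot(dx, dy)
        if dist == 0.0:
            cos_angle = 1.0  # already at this goal; treat as fully consistent
        else:
            cos_angle = (velocity[0] * dx + velocity[1] * dy) / (speed * dist)
        unnormalized.append(prior * math.exp(kappa * cos_angle))

    total = sum(unnormalized)
    return [p / total for p in unnormalized]


# Hypothetical scenario: two candidate goals and a pedestrian walking along +x.
goals = [(10.0, 0.0), (0.0, 10.0)]
belief = [0.5, 0.5]  # uniform prior over the candidate goals
position = (0.0, 0.0)
for _ in range(5):
    velocity = (1.0, 0.0)
    belief = update_goal_belief(belief, position, velocity, goals)
    position = (position[0] + velocity[0], position[1] + velocity[1])
```

Each call reuses the previous posterior as the new prior, which is what makes the estimator recursive. In this scenario the belief shifts toward the first goal as observations accumulate; a planner could consume such a belief as its distribution over the hidden destination variable.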