Specifying User Preferences for Autonomous Robots through Interactive Learning
Loading...
Date
2020-09-28
Authors
Wilde, Nils
Advisor
Smith, Stephen
Kulic, Dana
Kulic, Dana
Journal Title
Journal ISSN
Volume Title
Publisher
University of Waterloo
Abstract
This thesis studies a central problem in human-robot interaction (HRI): How can non-expert users specify complex behaviours for autonomous robots? A common technique for robot task specification that does not require expert knowledge is active preference learning. The desired behaviour of a robot is learned by iteratively presenting the user with alternative behaviours of the robot. The user then chooses the alternative they prefer. It is assumed that they make this decision based on an internal, hidden cost function. From the user's choice among the alternatives, the robot learns the hidden user cost function.
We use an interactive framework allowing users to create robot task specifications. The behaviour of an autonomous robot can be specified by defining constraints on allowable robot states and actions. For instance, for a mobile robot a user can define traffic rules such as roads, slow zones or areas of avoidance. These constraints form the user-specified terms of the cost function. However, inexperienced users might be oblivious to the impact such constraints have on the robot task performance. Employing an active preference learning framework we present users with the behaviour of the robot following their specification, i.e., the constraints, together with an alternative behaviour where some constraints might be violated. A user cost function trades-off the importance of constraints and the performance of the robot. From the user feedback, the robot learns about the importance of constraints, i.e., parameters in the cost function.
We first introduce an algorithm for specification revision that is based on a deterministic user model: We assume that the user always follows the proposed cost function. This allows for dividing the set of possible weights for the user constraints into infeasible and feasible weights whenever user feedback is obtained. In each iteration we present the path the user preferred previously again, together with an alternative path that is optimal for a weight that is feasible with respect to all previous iterations. This path is found with a local search, iterating over the feasible weights until a new path is found. As the number of paths is finite for any discrete motion planner, the algorithm is guaranteed to find the optimal solution within a finite number of iterations. Simulation results show that this approach is suitable to effectively revise user specifications within few iterations. The practicality of the framework is investigated in a user study. The algorithm is extended to learn about multiple tasks for the robot simultaneously, which allows for more realistic scenarios and another active learning component: The choice of task for which the user is presented with two alternative solutions. Through the study we show that nearly all users accept alternative solutions and thus obtain a revised specification through the learning process, leading to a substantial improvement in robot performance. Also, the users whose initial specifications had the largest impact on performance benefit the most from the interactive learning.
Next, we weaken the assumptions about the user: In a probabilistic model we do not require the user to always follow our cost function. Based on the sensitivity of a motion planning problem, we show that different values in the user cost function, i.e., weights for the user constraints, do not necessarily lead to different robot behaviour. From the implied discretization of the space of possible parameters we derive an algorithm for efficiently learning a specification revision and demonstrate the performance and robustness in simulations. We build on the notion of sensitivity to an active preference learning technique based on maximum regret, i.e., the maximum error ratio over all possible solutions. We show that active preference learning based on regret substantially outperforms other state of the art approaches. Further, regret based preference learning can be used as an heuristic for both discrete and continuous state and action spaces.
An emerging technique for real-time motion planning are state lattice planners, based on a regular discrete set of robot states and pre-computed motions connecting the states, called motion primitives. We study how learning from demonstrations can be used to learn global preferences for robot movement, such as the trade-off between time and jerkiness of the motions. We show how to compute a user optimal set of motion primitives of given size, based on an estimate of the user preferences. We demonstrate that by learning about the motion primitives of a lattice planner, we can shape the robot's behaviour to follow the global user preferences while ensuring good computation time of the motion planner. Furthermore, we study how a robot can simultaneously learn about user preferences on both motions of a lattice planner and parts of the environment when a user is iteratively correcting the robot behaviour. We demonstrate in simulations that this approach is suitable to adapt to user preferences even when the features on the environment that a user considers are not given.