In this paper we propose a system consisting of a manipulator equipped with range sensors that is instructed to follow a trajectory demonstrated by a human teacher wearing a motion-capture device. During the demonstration, a three-dimensional occupancy grid of the environment is built from the range-sensor readings and the demonstrated trajectory. The demonstration is followed by an exploration phase, in which the robot self-improves the task while using the occupancy grid to avoid collisions. In parallel, a reinforcement learning (RL) agent, biased by the demonstration, learns a point-to-point task policy. When the workspace changes, both the occupancy grid and the learned policy are updated online by the system.
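To make the occupancy-grid component concrete, the following is a minimal sketch of a 3-D grid updated from range-sensor hit points using log-odds cell updates. The class name, grid resolution, and update constants are illustrative assumptions, not the implementation used in this paper.

```python
import numpy as np

class OccupancyGrid3D:
    """Illustrative 3-D occupancy grid with log-odds updates (assumed design)."""

    def __init__(self, shape=(50, 50, 50), resolution=0.05, origin=(0.0, 0.0, 0.0)):
        self.log_odds = np.zeros(shape)      # 0 corresponds to p(occupied) = 0.5
        self.resolution = resolution         # cell edge length in meters (assumed)
        self.origin = np.asarray(origin)     # world coordinates of cell (0, 0, 0)
        self.l_hit = 0.85                    # log-odds increment per sensor hit (assumed)

    def world_to_cell(self, point):
        """Convert a world-frame point to integer cell indices."""
        idx = np.floor((np.asarray(point) - self.origin) / self.resolution).astype(int)
        return tuple(idx)

    def update(self, hit_points):
        """Mark cells containing range-sensor hit points as more likely occupied."""
        for p in hit_points:
            idx = self.world_to_cell(p)
            if all(0 <= i < s for i, s in zip(idx, self.log_odds.shape)):
                self.log_odds[idx] += self.l_hit

    def occupied(self, point, threshold=0.5):
        """True if the occupancy probability (sigmoid of log-odds) exceeds threshold."""
        idx = self.world_to_cell(point)
        p = 1.0 / (1.0 + np.exp(-self.log_odds[idx]))
        return p > threshold


grid = OccupancyGrid3D()
grid.update([(1.0, 1.0, 1.0)] * 3)        # three range hits in the same cell
print(grid.occupied((1.0, 1.0, 1.0)))     # the cell is now treated as an obstacle
```

A collision-avoidance planner, as used in the exploration phase, would query `occupied` along candidate trajectory points and reject motions that pass through occupied cells.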