GPS¶
Guided Policy Search¶
This code is a reimplementation of the guided policy search algorithm and LQG-based trajectory optimization, meant to help others understand, reuse, and build upon existing work. It includes a complete robot controller and sensor interface for the PR2 robot via ROS, and an interface for simulated agents in Box2D and Mujoco. Source code is available on GitHub.
While the core functionality is fully implemented and tested, the codebase is a work in progress. See the FAQ for information on planned future additions to the code.
Relevant work¶
Relevant papers which have used guided policy search include:
- Sergey Levine*, Chelsea Finn*, Trevor Darrell, Pieter Abbeel. End-to-End Training of Deep Visuomotor Policies. 2015. arXiv 1504.00702. [pdf]
- Marvin Zhang, Zoe McCarthy, Chelsea Finn, Sergey Levine, Pieter Abbeel. Learning Deep Neural Network Policies with Continuous Memory States. ICRA 2016. [pdf]
- Chelsea Finn, Xin Yu Tan, Yan Duan, Trevor Darrell, Sergey Levine, Pieter Abbeel. Deep Spatial Autoencoders for Visuomotor Learning. ICRA 2016. [pdf]
- Sergey Levine, Nolan Wagener, Pieter Abbeel. Learning Contact-Rich Manipulation Skills with Guided Policy Search. ICRA 2015. [pdf]
- Sergey Levine, Pieter Abbeel. Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics. NIPS 2014. [pdf]
If the codebase is helpful for your research, please cite any relevant paper(s) above and the following:
- Chelsea Finn, Marvin Zhang, Justin Fu, Xin Yu Tan, Zoe McCarthy, Emily Scharff, Sergey Levine. Guided Policy Search Code Implementation. 2016. Software available from rll.berkeley.edu/gps.
For bibtex, see this page.
Installation¶
Dependencies¶
The following are required:
- python 2.7, numpy (v1.7.0+), matplotlib (v1.5.0+), scipy (v0.11.0+)
- boost, including boost-python
- protobuf (apt-get packages libprotobuf-dev and protobuf-compiler)
One or more of the following agent interfaces is required: Box2D, Mujoco, or ROS. Setup instructions for each are below.
One of the following neural network libraries is required for the full guided policy search algorithm:
- Caffe (master branch as of 11/2015, with pycaffe compiled, python layer enabled, PYTHONPATH configured)
- TensorFlow (coming soon)
Setup¶
Follow these steps to get set up:
1. Install the necessary dependencies above.
2. Clone the repo:
git clone https://github.com/cbfinn/gps.git
3. Compile the protobuf messages:
cd gps
./compile_proto.sh
4. Set up one or more agents below.
Box2D Setup (optional)
Here are the instructions for setting up Pybox2D.
1. Install Swig and Pygame:
sudo apt-get install build-essential python-dev swig python-pygame subversion
2. Check out the Pybox2D code via SVN:
svn checkout http://pybox2d.googlecode.com/svn/trunk/ pybox2d
3. Build and install the library:
python setup.py build
sudo python setup.py install
Mujoco Setup (optional)
In addition to the dependencies listed above, OpenSceneGraph (v3.0.1+) is also needed.
1. Install Mujoco (v1.22+) and place the downloaded mjpro directory into gps/src/3rdparty. Mujoco is a high-quality physics engine and requires a license. Obtain a key, which should be named mjkey.txt, and place the key into the mjpro directory.
2. Build gps/src/3rdparty by running:
cd gps/build
cmake ../src/3rdparty
make -j
3. Set up paths by adding the following to your ~/.bashrc file:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/gps/build/lib
export PYTHONPATH=$PYTHONPATH:/path/to/gps/build/lib
Don't forget to run source ~/.bashrc afterward.
ROS Setup (optional)
1. Install ROS, including the standard PR2 packages.
2. Set up paths by adding the following to your ~/.bashrc file:
export ROS_PACKAGE_PATH=$ROS_PACKAGE_PATH:/path/to/gps:/path/to/gps/src/gps_agent_pkg
Don't forget to run source ~/.bashrc afterward.
3. Compilation:
cd src/gps_agent_pkg/
cmake .
make -j
ROS Setup with Caffe (optional)
This is required if you intend to run neural network policies with the ROS agent.
1. Run steps 1 and 2 of the above section.
2. Check out and build Caffe, including running make -j && make distribute within the caffe directory.
3. Compilation:
cd src/gps_agent_pkg/
cmake . -DUSE_CAFFE=1 -DCAFFE_INCLUDE_PATH=/path/to/caffe/distribute/include -DCAFFE_LIBRARY_PATH=/path/to/caffe/build/lib
make -j
To compile with GPU support, also include the option -DUSE_CAFFE_GPU=1.
Examples¶
Box2D example¶
There are two examples of running trajectory optimization using a simple 2D agent in Box2D. Before proceeding, be sure to set up Box2D.
Each example starts from a random controller and learns through experience to minimize cost.
The first is a point mass learning to move to a goal position.
To try it out, run the following from the gps directory:
python python/gps/gps_main.py box2d_pointmass_example
The progress of the algorithm is displayed on the GUI. The point mass should start reaching the visualized goal by around the 4th iteration.
The second example is a 2-link arm learning to move to a goal state.
To try it out, run this:
python python/gps/gps_main.py box2d_arm_example
The arm should start reaching the visualized goal after around 6 iterations.
All settings for these examples are located in experiments/box2d_[name]_example/hyperparams.py, which can be modified to input different target positions and change various hyperparameters of the algorithm.
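For example, the goal position is typically set in the agent dictionary of that file. The sketch below is only illustrative; the key name target_state and the (x, y, angle) state layout are assumptions, so check the shipped hyperparams.py for the exact structure.

import numpy as np

# Hypothetical excerpt of experiments/box2d_pointmass_example/hyperparams.py.
# The key 'target_state' and the state layout are assumptions.
agent = {
    'target_state': np.array([5.0, 20.0, 0.0]),  # goal the point mass should reach
}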
Mujoco example¶
To run the mujoco example, be sure to first set up Mujoco.
The first example uses trajectory optimization for peg insertion. To try it, run the following from the gps directory:
python python/gps/gps_main.py mjc_example
Here the robot starts with a random initial controller and learns to insert the peg into the hole. The progress of the algorithm is displayed on the GUI.
Now let’s learn to generalize to different positions of the hole. For this, run the guided policy search algorithm:
python python/gps/gps_main.py mjc_badmm_example
The robot learns a neural network policy for inserting the peg under varying initial conditions.
To tinker with the hyperparameters and input, take a look at experiments/mjc_badmm_example/hyperparams.py.
PR2 example¶
To run the code on a real or simulated PR2, be sure to first follow the instructions above for ROS setup.
1. Start the controller¶
Real-world PR2
On the PR2 computer, run:
roslaunch gps_agent_pkg pr2_real.launch
This will stop the default arm controllers and spawn the GPSPR2Plugin.
Simulated PR2
Note: If you are running ROS hydro or later, open the launch file pr2_gazebo_no_controller.launch and change the include line as specified.
Launch gazebo and the GPSPR2Plugin:
roslaunch gps_agent_pkg pr2_gazebo.launch
2. Run the code¶
Now you’re ready to run the examples via gps_main. This can be done on any machine as long as the ROS environment variables are set appropriately.
The first example starts from a random initial controller and learns to move the gripper to a specified location.
Run the following from the gps directory:
python python/gps/gps_main.py pr2_example
The PR2 should reach the target position and achieve a cost of around -600 within 10 iterations.
The second example trains a neural network policy to reach a goal pose from different starting positions, using guided policy search:
python python/gps/gps_main.py pr2_badmm_example
To learn how to make your own experiment and/or set your own initial and target positions, see the next section.
Running a new experiment¶
Set up a new experiment directory by running:
python python/gps/gps_main.py my_experiment -n
This will create a new directory called my_experiment/ in the experiments directory, with a blank hyperparams.py file.
Fill in a hyperparams.py file in your experiment. See pr2_example and mjc_example for examples.
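For orientation, a hyperparams.py file generally defines a few dictionaries (common, agent, algorithm) that are collected into a top-level config dictionary. The skeleton below is only a rough, hedged sketch of that shape; the real examples define many more keys, so copy pr2_example or mjc_example rather than starting from this.

# Hypothetical skeleton of experiments/my_experiment/hyperparams.py.
# Key names beyond 'common', 'agent', 'algorithm', 'gui_on', and 'info'
# are assumptions; see the shipped examples for the full set.
common = {
    'info': 'My experiment.',  # text displayed by the GPS Training GUI at startup
}

agent = {}      # agent settings (see Agent Interfaces under Configuration & Hyperparameters)
algorithm = {}  # algorithm settings (see Algorithm and Optimization)

config = {
    'common': common,
    'agent': agent,
    'algorithm': algorithm,
    'gui_on': True,  # set to False to run without the GPS Training GUI
}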
If you wish to set the initial and/or target positions for the pr2 robot agent, run target setup:
python python/gps/gps_main.py my_experiment -t
See the GUI documentation for details on using the GUI.
Finally, run your experiment:
python python/gps/gps_main.py my_experiment
All of the output logs and data will be routed to your experiment directory. For more details, see the Intended Usage section.
Documentation¶
In addition to the inline docstrings and comments, see the following sections for more detailed documentation:
Learning with your own robot¶
The code was written to be modular, to make it easy to hook up your own robot. To do so, either use one of the existing agent interfaces (e.g. AgentROS), or write your own.
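A rough sketch of what a custom agent might involve is shown below. The import path and the sample signature are assumptions based on the layout of the existing agents (e.g. AgentROS), so verify them against gps/agent/agent.py before implementing.

from gps.agent.agent import Agent  # base class path assumed; check the repo

class AgentMyRobot(Agent):
    """Hypothetical agent interface for a custom robot, analogous to AgentROS."""

    def __init__(self, hyperparams):
        super(AgentMyRobot, self).__init__(hyperparams)
        # Connect to the robot's sensors and actuators here.

    def sample(self, policy, condition, verbose=True, save=True, noisy=True):
        # Reset the robot for the given condition, roll out `policy` while
        # recording states and actions, and return the collected sample.
        raise NotImplementedError('Robot-specific rollout logic goes here.')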
Reporting bugs and getting help¶
You can post questions on gps-help. If you want to contribute, please post on gps-dev. When your contribution is ready, make a pull request on GitHub.
Licensing¶
This codebase is released under the CC BY-NC-SA license.
Intended Usage¶
The intended command line usage is through the gps_main.py script:
cd /path/to/gps
python python/gps/gps_main.py -h
usage: gps_main.py [-h] [-n] [-t] [-r N] experiment
Run the Guided Policy Search algorithm.
positional arguments:
experiment experiment name
optional arguments:
-h, --help show this help message and exit
-n, --new create new experiment
-t, --targetsetup run target setup
-r N, --resume N resume training from iter N
Usage:
python python/gps/gps_main.py <EXPERIMENT_NAME> -n
Creates a new experiment folder at experiments/<EXPERIMENT_NAME> with an empty hyperparams file hyperparams.py. Copy and paste your old hyperparams.py from your previous experiment and make any modifications.
python python/gps/gps_main.py <EXPERIMENT_NAME> -t
(ROS only) Opens the Target Setup GUI, for target setup when using ROS. See the Target Setup GUI section for details.
python python/gps/gps_main.py <EXPERIMENT_NAME>
Opens the GPS Training GUI and runs the guided policy search algorithm with your experiment's hyperparams. See the Training GUI section for details.
python python/gps/gps_main.py <EXPERIMENT_NAME> -r N
Resumes the guided policy search algorithm, loading the algorithm state from iteration N. (The file experiments/<EXPERIMENT_NAME>/data_files/algorithm_itr_<N>.pkl must exist.)
For your reference, your experiment directory contains the following:
- data_files/ - holds the data files.
- data_files/algorithm_itr_<N>.pkl - the algorithm state at iteration N.
- data_files/traj_samples_itr_<N>.pkl - the trajectory samples collected at iteration N.
- data_files/pol_samples_itr_<N>.pkl - the policy samples collected at iteration N.
- data_files/figure_itr_<N>.png - an image of the GPS Training GUI figure at iteration N.
- hyperparams.py - the hyperparams used for this experiment. For more details, see the Configuration & Hyperparameters section.
- log.txt - the log text of output from the Target Setup GUI and the GPS Training GUI.
- targets.npz - the initial and target states used for this experiment (ROS agent only; set for other agents in hyperparams.py).
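The pickled data files can also be inspected directly from Python. A minimal sketch follows; the experiment name and iteration number are placeholders, and unpickling requires the gps package to be importable (e.g. with the repository on your PYTHONPATH).

import pickle

# Load a saved algorithm state for offline inspection; the object's type
# depends on the algorithm class configured in hyperparams.py.
with open('experiments/my_experiment/data_files/algorithm_itr_05.pkl', 'rb') as f:
    algorithm_state = pickle.load(f)
print(type(algorithm_state))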
GUI Docs¶
There are two GUI interfaces which are useful for interacting with and visualizing experiments, documented below.
GPS Training GUI¶
python python/gps/gps_main.py <EXPERIMENT_NAME>
The GPS Training GUI is composed of seven parts:
The Action Panel: Consists of 4 actions which can be performed by clicking the button, pressing the keyboard shortcut, or using the PS3 Controller shortcut:
- stop - stop the robot from collecting samples (after the current sample has completed); used to perform a manual reset of the robot's arms
- reset - reset the robot to the initial position (after the current sample has completed); used to perform a manual reset of the objects in the scene
- go - start/restart the robot to collect samples (after using stop, reset, or fail); used to resume training after a stop, reset, or fail
- fail - fail the current sample being collected and recollect that sample (after the current sample has completed); used to recollect a sample that was conducted under faulty conditions
The Action Status TextBox: Indicates whether or not the actions were performed successfully or failed.
The Algorithm Status TextBox: Indicates the current status of the algorithm (sampling, calculating, logging data, etc.).
The Cost Plot: After each iteration, plots the cost per condition (as points) and the mean cost (as a connected line between iterations).
The Algorithm Output Textbox: After each iteration, outputs the iteration number and the mean cost, and for each condition, the cost, the step_multiplier, the linear Gaussian controller entropy, and the initial and final KL Divergences (for BADMM only).
To change what is printed, see the update method of gps_training_gui.py.
The 3D Trajectory Visualizer: After each iteration, plots the trajectory samples collected (green lines), the policy samples collected (blue lines), the linear Gaussian controller means (dark red crosses), and the linear Gaussian controller distributions (red ellipses).
The Image Visualizer: Displays the real-time images outputted by the PR2's on-board Kinect. Contains the two actions below:
- overlay_initial_image - overlays the initial image set by the Target Setup GUI (press reset at the end of sampling to reset the robot to the initial position and check if the image matches the initial image set by the Target Setup GUI)
- overlay_target_image - overlays the target image set by the Target Setup GUI (press stop at the end of sampling to stop the robot at the target position and check if the image matches the target image set by the Target Setup GUI)
Target Setup GUI (for ROS only)¶
python python/gps/gps_main.py <EXPERIMENT_NAME> -t
The Target Setup GUI is composed of four parts:
The Action Panel: Consists of 12 actions which can be performed by clicking the button, pressing the keyboard shortcut, or using the PS3 Controller shortcut:
- prev_target_number - switch to the previous target number (0-9)
- next_target_number - switch to the next target number (0-9)
- prev_actuator_type - switch to the previous actuator type (TRIAL_ARM, AUXILIARY_ARM)
- next_actuator_type - switch to the next actuator type (TRIAL_ARM, AUXILIARY_ARM)
- set_initial_position - set the initial position (manually move the robot to the desired position and then press set)
- set_target_position - set the target position (manually move the robot to the desired position and then press set)
- set_initial_image - set the initial image (associated with the initial position)
- set_target_image - set the target image (associated with the target position)
- move_to_initial - move to the initial position (queries AgentROS to physically move the robot and lock it in place)
- move_to_target - move to the target position (queries AgentROS to physically move the robot and lock it in place)
- relax_controller - relaxes the robot controllers so they can be moved again (after move_to_initial or move_to_target locks them in place)
- mannequin_mode - toggles mannequin mode on or off (so the robot's arms stay in place after you physically move them)
The Action Status TextBox: Indicates whether or not the actions were performed successfully or failed.
The Targets TextBox: Displays the target values for the current target number and actuator type, including:
- target number - the current target number
- actuator type - the current actuator type
- initial pose - the initial pose's joint angles, end effector points, and end effector rotations
- target pose - the target pose's joint angles, end effector points, and end effector rotations
- initial image - the initial image associated with the initial pose, displayed on the bottom left
- target image - the target image associated with the target pose, displayed on the bottom right
The Image Visualizer: Displays the real-time images outputted by the PR2’s on-board Kinect (defaults to the topic /camera/rgb/image_color). Has options to overlay the initial image and the target image.
GUI FAQ¶
1. How do I turn off the GPS Training GUI?
At the bottom of the hyperparams.py file, in the config dictionary, set gui_on to False.
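For example, a minimal sketch of the relevant entry (the other keys of config are omitted here):

# In experiments/<EXPERIMENT_NAME>/hyperparams.py:
config = {
    # ... other experiment settings ...
    'gui_on': False,  # disable the GPS Training GUI
}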
2. Why is the algorithm output text overflowing?
Matplotlib does not handle drawing text very smoothly. Either decrease the amount of iteration data printed (see the update function of gps_training_gui.py), decrease the size at which it is printed (see the fontsize parameter in the initialization of the self._algthm_output OutputAxis in gps_training_gui.py), or increase your window size.
3. How do I change the experiment information displayed at the start of learning?
The experiment information displayed at the start of learning is set by the info key of the common dictionary in hyperparams.py.
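For example (a sketch; the other keys of common are omitted):

common = {
    # ... other common settings ...
    'info': 'Peg insertion, 4 conditions, BADMM.',  # text shown at the start of learning
}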
Configuration & Hyperparameters¶
All of the configuration settings are stored in a config.py file in the relevant code directory. Each hyperparameter can be changed from its default value in the hyperparams.py file for a particular experiment.
This page contains all of the config settings that are exposed via the experiment hyperparams file. See the corresponding config files for more detailed comments on each variable.
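For example, an experiment's hyperparams.py can override a couple of the Algorithm base class settings listed below. The values here are illustrative only, not the actual defaults from config.py:

# Hypothetical override of two Algorithm base class hyperparameters.
algorithm = {
    'kl_step': 0.5,         # step size of the KL-divergence constraint between iterations
    'min_step_mult': 0.01,  # lower bound on the step-size multiplier
}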
Algorithm and Optimization¶
Algorithm base class
- initial_state_var
- sample_decrease_var
- kl_step
- init_traj_distr
- sample_increase_var
- cost
- inner_iterations
- dynamics
- min_step_mult
- min_eta
- max_step_mult
- traj_opt
BADMM Algorithm
- fixed_lg_step
- exp_step_decrease
- init_pol_wt
- max_policy_samples
- policy_dual_rate
- inner_iterations
- exp_step_lower
- exp_step_increase
- policy_sample_mode
- exp_step_upper
- lg_step_schedule
- policy_dual_rate_covar
- ent_reg_schedule
LQR Traj Opt
- del0
- eta_error_threshold
- min_eta
Caffe Policy Optimization
- gpu_id
- batch_size
- iterations
- weights_file_prefix
- weight_decay
- init_var
- use_gpu
- ent_reg
- lr
- network_model
- network_arch_params
- lr_policy
- momentum
- solver_type
Policy Prior & GMM
- strength
- keep_samples
- max_clusters
- strength
- min_samples_per_cluster
- max_samples
Cost Function¶
State cost
- wp_final_multiplier
- ramp_option
- l2
- data_types
- l1
- alpha
Forward kinematics cost
- evalnorm
- wp_final_multiplier
- alpha
- target_end_effector
- l2
- ramp_option
- env_target
- wp
- l1
Action cost
- wu
Sum of costs
- costs
- weights
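As an illustration of how the cost settings above fit together, a weighted sum of a state cost and an action cost might be configured roughly as below. The class names and import paths are assumptions inferred from the key names, so verify them against the cost config files:

from gps.algorithm.cost.cost_sum import CostSum      # assumed import paths
from gps.algorithm.cost.cost_state import CostState
from gps.algorithm.cost.cost_action import CostAction

algorithm = {}  # normally defined earlier in hyperparams.py
algorithm['cost'] = {
    'type': CostSum,
    'costs': [
        {'type': CostState},               # set 'data_types', 'l1', 'l2', etc. as needed
        {'type': CostAction, 'wu': 1e-3},  # 'wu' weights the control-effort penalty
    ],
    'weights': [1.0, 0.1],
}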
Initialization¶
Initial Trajectory Distribution - PD initializer
- init_action_offset
- pos_gains
- init_var
- vel_gains_mult
Initial Trajectory Distribution - LQR initializer
- init_acc
- stiffness
- init_gains
- init_var
- stiffness_vel
- final_weight
Agent Interfaces¶
Agent base class
- noisy_body_var
- x0var
- dH
- noisy_body_idx
- smooth_noise_renormalize
- smooth_noise
- smooth_noise_var
- pos_body_offset
- pos_body_idx
Box2D agent
Mujoco agent
- substeps
ROS agent
FAQ¶
What does this codebase include?
This codebase implements the BADMM-based guided policy search algorithm, including LQG-based trajectory optimization, fitting of local dynamics models using a Gaussian Mixture Model prior, and the cost functions used in our work.
For ease of applying the method, we also provide interfaces with three simulation and robotic platforms: Box2D for its ease of access, Mujoco for its more accurate and capable 3D physics simulation, and ROS for interfacing with real-world robots.
What does this codebase not include?
The codebase is a work in progress.
It includes the algorithm detailed in (Levine, Finn et al., 2015) but does not yet include support for images and convolutional networks, which is under development.
It does not include the constrained guided policy search algorithm in (Levine et al., 2015; Levine and Abbeel, 2014). For a discussion on the differences between the BADMM and constrained versions of the algorithm, see (Levine, Finn et al. 2015).
Other extensions of the algorithm, including (Finn et al., 2016; Zhang et al., 2016; Fu et al., 2015), are not currently implemented, but the former two extensions are in progress.
Why Caffe?
Caffe provides a straightforward Python interface that makes it easy to define, train, and deploy simple architectures.
We plan to add support for TensorFlow to enable training more flexible network architectures.
How can I find more details on the algorithms implemented in this codebase?
For the most complete, up-to-date reference, we recommend this paper:
Sergey Levine*, Chelsea Finn*, Trevor Darrell, Pieter Abbeel. End-to-End Training of Deep Visuomotor Policies. 2015. arXiv 1504.00702. [pdf]