GPS

Intended Usage

The intended command line usage is through the gps_main.py script:

cd /path/to/gps
python python/gps/gps_main.py -h
usage: gps_main.py [-h] [-n] [-t] [-r N] experiment

Run the Guided Policy Search algorithm.

positional arguments:
  experiment         experiment name

optional arguments:
  -h, --help         show this help message and exit
  -n, --new          create new experiment
  -t, --targetsetup  run target setup
  -r N, --resume N   resume training from iter N

Usage:

  • python python/gps/gps_main.py <EXPERIMENT_NAME> -n

    Creates a new experiment folder at experiments/<EXPERIMENT_NAME> containing an empty hyperparams.py file. Copy hyperparams.py from a previous experiment into this folder and modify it as needed.

  • python python/gps/gps_main.py <EXPERIMENT_NAME> -t (for ROS only)

    Opens the Target Setup GUI, for target setup when using ROS. See the Target Setup GUI section for details.

  • python python/gps/gps_main.py <EXPERIMENT_NAME>

    Opens the GPS Training GUI and runs the guided policy search algorithm for your specific experiment hyperparams. See the Training GUI section for details.

  • python python/gps/gps_main.py <EXPERIMENT_NAME> -r N

    Resumes the guided policy search algorithm, loading the algorithm state from iteration N. (The file experiments/<EXPERIMENT_NAME>/data_files/algorithm_itr_<N>.pkl must exist.)

For your reference, each experiment folder at experiments/<EXPERIMENT_NAME> contains the following:

  • data_files/ - holds the data files produced during training (a short loading sketch follows this list).
    • data_files/algorithm_itr_<N>.pkl - the algorithm state at iteration N.
    • data_files/traj_samples_itr_<N>.pkl - the trajectory samples collected at iteration N.
    • data_files/pol_samples_itr_<N>.pkl - the policy samples collected at iteration N.
    • data_files/figure_itr_<N>.png - an image of the GPS Training GUI figure at iteration N.
  • hyperparams.py - the hyperparams used for this experiment. For more details, see the Configuration & Hyperparameters section below.
  • log.txt - the log text of output from the Target Setup GUI and the GPS Training GUI.
  • targets.npz - the initial and target states used for this experiment (ROS agent only; for other agents these are set in hyperparams.py)
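
The .pkl files above are ordinary Python pickles, so they can also be inspected offline. The sketch below is illustrative only: the experiment name and iteration number are placeholders, the zero-padded file names are an assumption (match whatever appears in your data_files/ directory), and the gps package must be on your Python path so the pickled classes can be reconstructed.

    import pickle

    # Placeholder experiment name and iteration number.
    data_dir = 'experiments/my_experiment/data_files/'

    # Algorithm state saved after iteration 5.
    with open(data_dir + 'algorithm_itr_05.pkl', 'rb') as f:
        algorithm = pickle.load(f)

    # Trajectory samples collected at iteration 5.
    with open(data_dir + 'traj_samples_itr_05.pkl', 'rb') as f:
        traj_samples = pickle.load(f)

    print(type(algorithm))    # the Algorithm subclass used for training
    print(len(traj_samples))  # one entry per condition sampled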

GUI Docs

There are two GUI interfaces which are useful for interacting with and visualizing experiments, documented below.


GPS Training GUI

python python/gps/gps_main.py <EXPERIMENT_NAME>

[Screenshot of the GPS Training GUI]

The GPS Training GUI is composed of seven parts:

The Action Panel: Consists of 4 actions which can be performed by clicking the button, pressing the keyboard shortcut, or using the PS3 Controller shortcut:

  • stop - stop the robot from collecting samples (after the current sample has completed), used to perform a manual reset of the robot’s arms
  • reset - reset the robot to the initial position (after the current sample has completed), used to perform a manual reset of the objects in the scene
  • go - start or restart the robot collecting samples; used to resume training after stop, reset, or fail
  • fail - fail the current sample being collected and recollect that sample (after the current sample has completed), used to recollect a sample that was conducted under faulty conditions

The Action Status TextBox: Indicates whether each requested action succeeded or failed.

The Algorithm Status TextBox: Indicates the current status of the algorithm (sampling, calculating, logging data, etc.).

The Cost Plot: After each iteration, plots the cost per condition (as points) and the mean cost (as a connected line between iterations).

The Algorithm Output Textbox: After each iteration, outputs the iteration number and the mean cost, and for each condition, the cost, the step_multiplier, the linear Gaussian controller entropy, and the initial and final KL Divergences (for BADMM only). To change what is printed, see the update method of gps_training_gui.py.

The 3D Trajectory Visualizer: After each iteration, plots the trajectory samples collected (green lines), the policy samples collected (blue lines), the linear Gaussian controller means (dark red crosses), and the linear Gaussian controller distributions (red ellipses).

The Image Visualizer: Displays the real-time images output by the PR2’s on-board Kinect. Contains the following two actions:

  • overlay_initial_image - overlays the initial image set by the Target Setup GUI (press reset at the end of sampling to reset the robot to the initial position and check whether the current image matches that initial image)
  • overlay_target_image - overlays the target image set by the Target Setup GUI (press stop at the end of sampling to stop the robot at the target position and check whether the current image matches that target image)

Target Setup GUI (for ROS only)

python python/gps/gps_main.py <EXPERIMENT_NAME> -t

[Screenshot of the Target Setup GUI]

The Target Setup GUI is composed of four parts:

The Action Panel: Consists of 12 actions which can be performed by clicking the button, pressing the keyboard shortcut, or using the PS3 Controller shortcut:

  • prev_target_number - switch to the previous target number (0-9)
  • next_target_number - switch to the next target number (0-9)
  • prev_actuator_type - switch to the previous actuator type (TRIAL_ARM, AUXILIARY_ARM)
  • next_actuator_type - switch to the next actuator type (TRIAL_ARM, AUXILIARY_ARM)
  • set_initial_position - set the initial position (manually move the robot to the desired position and then press set)
  • set_target_position - set the target position (manually move the robot to the desired position and then press set)
  • set_initial_image - set the initial image (associated with the initial position)
  • set_target_image - set the target image (associated with the target position)
  • move_to_initial - move to the initial position (queries AgentROS to physically move the robot and lock it in place)
  • move_to_target - move to the target position (queries AgentROS to physically move the robot and lock it in place)
  • relax_controller - relaxes the robot controllers, so they can be moved again (after move_to_initial or move_to_target locks them in place)
  • mannequin_mode - toggle mannequin mode on or off (when on, the robot’s arms stay in place after being physically moved)

The Action Status TextBox: Indicates whether each requested action succeeded or failed.

The Targets TextBox: Displays the target values for the current target number and actuator type, including:

  • target number - the current target number
  • actuator type - the current actuator type
  • initial pose - the initial pose’s joint angles, end effector points, and end effector rotations
  • target pose - the target pose’s joint angles, end effector points, and end effector rotations
  • initial image - the initial image associated with the initial pose, displayed on the bottom left
  • target image - the target image associated with the target pose, displayed on the bottom right

The Image Visualizer: Displays the real-time images outputted by the PR2’s on-board Kinect (defaults to the topic /camera/rgb/image_color). Has options to overlay the initial image and the target image.


GUI FAQ

1. How do I turn off the GPS Training GUI?

At the bottom of the hyperparams.py file, in the config dictionary set gui_on to False.
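
As a rough sketch, the relevant entry at the bottom of experiments/<EXPERIMENT_NAME>/hyperparams.py would look like the following; every other key comes from your existing experiment file and is omitted here.

    # Only the relevant key is shown; keep your other entries as they are.
    config = {
        'gui_on': False,   # disable the GPS Training GUI
        # ... remaining experiment settings ...
    }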

2. Why is the algorithm output text overflowing?

Matplotlib does not handle drawing text very smoothly. Either decrease the amount of iteration data printed (see the update function of gps_training_gui.py), decrease the size at which it is printed (see the fontsize parameter used when the self._algthm_output OutputAxis is initialized in gps_training_gui.py), or increase your window size.

3. How do I change the experiment information displayed at the start of learning?

The displayed text comes from the info key of the common dictionary in hyperparams.py; edit that entry to change what is shown.
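
For example, the entry might look roughly like the sketch below; the experiment name and the other keys shown are placeholders for whatever is already in your file, and only the info string is the point of the example.

    common = {
        'experiment_name': 'my_experiment',
        'info': 'my_experiment: 4 conditions, BADMM, LQR initialization',
        # ... other common settings ...
    }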

Configuration & Hyperparameters

Default values for all configuration settings are stored in a config.py file in the relevant code directory. Any of these defaults can be overridden in the hyperparams.py file of a particular experiment.

This section lists all of the config settings that can be overridden via the experiment hyperparams file. See the corresponding config files for more detailed comments on each variable.
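
To illustrate the override pattern, the sketch below sets a few of the Algorithm base class keys from an experiment's hyperparams.py. Both the import path and the values are illustrative assumptions, not recommended settings.

    from gps.algorithm.algorithm_badmm import AlgorithmBADMM

    # Any key not listed here keeps its default from the corresponding config.py.
    algorithm = {
        'type': AlgorithmBADMM,
        'kl_step': 0.5,
        'min_step_mult': 0.01,
        'max_step_mult': 10.0,
        'inner_iterations': 4,
    }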


Algorithm and Optimization

Algorithm base class

  • initial_state_var
  • sample_decrease_var
  • kl_step
  • init_traj_distr
  • sample_increase_var
  • cost
  • inner_iterations
  • dynamics
  • min_step_mult
  • min_eta
  • max_step_mult
  • traj_opt

BADMM Algorithm

  • fixed_lg_step
  • exp_step_decrease
  • init_pol_wt
  • max_policy_samples
  • policy_dual_rate
  • inner_iterations
  • exp_step_lower
  • exp_step_increase
  • policy_sample_mode
  • exp_step_upper
  • lg_step_schedule
  • policy_dual_rate_covar
  • ent_reg_schedule

LQR Traj Opt

  • del0
  • eta_error_threshold
  • min_eta

Caffe Policy Optimization

  • gpu_id
  • batch_size
  • iterations
  • weights_file_prefix
  • weight_decay
  • init_var
  • use_gpu
  • ent_reg
  • lr
  • network_model
  • network_arch_params
  • lr_policy
  • momentum
  • solver_type
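
As a hedged illustration of these keys, a Caffe policy optimization entry in hyperparams.py might look like the sketch below; the import path, file prefix, and values are assumptions for illustration only.

    from gps.algorithm.policy_opt.policy_opt_caffe import PolicyOptCaffe

    algorithm = {}  # in a real hyperparams.py this dictionary is defined earlier

    # Any key omitted here keeps its default from the Caffe policy opt config.
    algorithm['policy_opt'] = {
        'type': PolicyOptCaffe,
        'iterations': 3000,
        'batch_size': 25,
        'weights_file_prefix': 'experiments/my_experiment/policy',
    }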

Policy Prior & GMM

  • strength
  • keep_samples
  • max_clusters
  • strength
  • min_samples_per_cluster
  • max_samples

Dynamics

Dynamics GMM Prior

  • max_clusters
  • strength
  • min_samples_per_cluster
  • max_samples
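
For illustration, a dynamics entry using this GMM prior might be configured as in the sketch below; the import paths and values are assumptions modeled on the example experiments shipped with the code, not prescribed settings.

    from gps.algorithm.dynamics.dynamics_lr_prior import DynamicsLRPrior
    from gps.algorithm.dynamics.dynamics_prior_gmm import DynamicsPriorGMM

    algorithm = {}  # in a real hyperparams.py this dictionary is defined earlier

    algorithm['dynamics'] = {
        'type': DynamicsLRPrior,
        'prior': {
            'type': DynamicsPriorGMM,
            'max_clusters': 20,
            'strength': 1.0,
            'min_samples_per_cluster': 40,
            'max_samples': 20,
        },
    }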

Cost Function

State cost

  • wp_final_multiplier
  • ramp_option
  • l2
  • data_types
  • l1
  • alpha

Forward kinematics cost

  • evalnorm
  • wp_final_multiplier
  • alpha
  • target_end_effector
  • l2
  • ramp_option
  • env_target
  • wp
  • l1

Action cost

  • wu

Sum of costs

  • costs
  • weights
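
To show how these cost entries fit together, the sketch below combines a state cost and an action cost with a sum-of-costs dictionary. The import paths, the sensor type, the dimensions, and the weights are assumptions for illustration.

    import numpy as np

    from gps.algorithm.cost.cost_action import CostAction
    from gps.algorithm.cost.cost_state import CostState
    from gps.algorithm.cost.cost_sum import CostSum
    from gps.proto.gps_pb2 import JOINT_ANGLES

    algorithm = {}  # in a real hyperparams.py this dictionary is defined earlier

    # One dictionary per cost term.
    state_cost = {
        'type': CostState,
        'data_types': {
            JOINT_ANGLES: {'wp': np.ones(2), 'target_state': np.zeros(2)},
        },
    }
    action_cost = {'type': CostAction, 'wu': 1e-3 * np.ones(2)}

    # The sum-of-costs entry pairs each cost term with a weight.
    algorithm['cost'] = {
        'type': CostSum,
        'costs': [state_cost, action_cost],
        'weights': [1.0, 1.0],
    }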

Initialization

Initial Trajectory Distribution - PD initializer

  • init_action_offset
  • pos_gains
  • init_var
  • vel_gains_mult

Initial Trajectory Distribution - LQR initializer

  • init_acc
  • stiffness
  • init_gains
  • init_var
  • stiffness_vel
  • final_weight
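
As an example of these LQR initializer keys, an init_traj_distr entry might look like the sketch below; the import path, the action dimension, and the values are assumptions for illustration.

    import numpy as np

    from gps.algorithm.policy.lin_gauss_init import init_lqr

    algorithm = {}  # in a real hyperparams.py this dictionary is defined earlier

    dU = 7  # assumed action dimension; use your agent's actual value

    algorithm['init_traj_distr'] = {
        'type': init_lqr,
        'init_gains': np.ones(dU),
        'init_acc': np.zeros(dU),
        'init_var': 1.0,
        'stiffness': 1.0,
        'stiffness_vel': 0.5,
        'final_weight': 1.0,
    }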

Agent Interfaces

Agent base class

  • noisy_body_var
  • x0var
  • dH
  • noisy_body_idx
  • smooth_noise_renormalize
  • smooth_noise
  • smooth_noise_var
  • pos_body_offset
  • pos_body_idx

Box2D agent

Mujoco agent

  • substeps

ROS agent

FAQ

What does this codebase include?

This codebase implements the BADMM-based guided policy search algorithm, including LQG-based trajectory optimization, fitting of local dynamics models with a Gaussian mixture model prior, and the cost functions used in our work.

For ease of applying the method, we also provide interfaces with three simulation and robotic platforms: Box2D for its ease of access, Mujoco for its more accurate and capable 3D physics simulation, and ROS for interfacing with real-world robots.

What does this codebase not include?

The codebase is a work in progress.

It includes the algorithm detailed in (Levine, Finn et al., 2015) but does not yet include support for images and convolutional networks, which is under development.

It does not include the constrained guided policy search algorithm in (Levine et al., 2015; Levine and Abbeel, 2014). For a discussion on the differences between the BADMM and constrained versions of the algorithm, see (Levine, Finn et al. 2015).

Other extensions on the algorithm, including (Finn et al., 2016; Zhang et al., 2016; Fu et al., 2015) are not currently implemented, but the former two extensions are in progress.

Why Caffe?

Caffe provides a straightforward Python interface that makes it easy to define, train, and deploy simple architectures.

We plan to add support for TensorFlow to enable training more flexible network architectures.

How can I find more details on the algorithms implemented in this codebase?

For the most complete, up-to-date reference, we recommend this paper:

Sergey Levine*, Chelsea Finn*, Trevor Darrell, Pieter Abbeel. End-to-End Training of Deep Visuomotor Policies. 2015. arXiv:1504.00702.