Experiments

Training and evaluation

pfrl.experiments.train_agent_async(outdir, processes, make_env, profile=False, steps=80000000, eval_interval=1000000, eval_n_steps=None, eval_n_episodes=10, eval_success_threshold=0.0, max_episode_len=None, step_offset=0, successful_score=None, agent=None, make_agent=None, global_step_hooks=[], evaluation_hooks=(), save_best_so_far_agent=True, use_tensorboard=False, logger=None, random_seeds=None, stop_event=None, exception_event=None, use_shared_memory=True)[source]

Train agent asynchronously using multiprocessing.

Either agent or make_agent must be specified.

Parameters:
  • outdir (str) – Path to the directory to output things.
  • processes (int) – Number of processes.
  • make_env (callable) – (process_idx, test) -> Environment.
  • profile (bool) – Profile if set True.
  • steps (int) – Number of global time steps for training.
  • eval_interval (int) – Interval of evaluation. If set to None, the agent will not be evaluated at all.
  • eval_n_steps (int) – Number of evaluation timesteps at each evaluation phase.
  • eval_n_episodes (int) – Number of evaluation episodes at each evaluation phase.
  • eval_success_threshold (float) – Reward threshold above which an evaluation episode counts as a success.
  • max_episode_len (int) – Maximum episode length.
  • step_offset (int) – Time step from which training starts.
  • successful_score (float) – If not None, finish training when the mean evaluation score is greater than or equal to this value.
  • agent (Agent) – Agent to train.
  • make_agent (callable) – (process_idx) -> Agent
  • global_step_hooks (list) – List of callable objects that accepts (env, agent, step) as arguments. They are called every global step. See pfrl.experiments.hooks.
  • evaluation_hooks (Sequence) – Sequence of pfrl.experiments.evaluation_hooks.EvaluationHook objects. They are called after each evaluation.
  • save_best_so_far_agent (bool) – If set to True, after each evaluation, if the score (= mean return of evaluation episodes) exceeds the best-so-far score, the current agent is saved.
  • use_tensorboard (bool) – Additionally log eval stats to tensorboard
  • logger (logging.Logger) – Logger used in this function.
  • random_seeds (array-like of ints or None) – Random seeds for processes. If set to None, [0, 1, …, processes-1] are used.
  • stop_event (multiprocessing.Event or None) – Event to stop training. If set to None, a new Event object is created and used internally.
  • exception_event (multiprocessing.Event or None) – Event that indicates another thread raised an exception. Training will be terminated and the current agent will be saved. If set to None, a new Event object is created and used internally.
  • use_shared_memory (bool) – Share memory amongst asynchronous agents.
Returns:

Trained agent.
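The stop_event and exception_event parameters are plain multiprocessing.Event objects. The following sketch (illustrative, not PFRL's internal code) shows how a worker loop can poll such events to decide when to halt:

```python
import multiprocessing as mp

def worker_loop(stop_event, exception_event, max_steps=5):
    """Illustrative worker: count steps until stopped or an exception is flagged."""
    steps_done = 0
    for _ in range(max_steps):
        if stop_event.is_set() or exception_event.is_set():
            break
        steps_done += 1  # a real worker would run one training step here
    return steps_done

stop_event = mp.Event()
exception_event = mp.Event()
worker_loop(stop_event, exception_event)  # no event set: runs all 5 steps

stop_event.set()
worker_loop(stop_event, exception_event)  # stop requested: 0 steps
```

Passing your own events lets an external controller request a clean shutdown of all training processes.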

pfrl.experiments.train_agent_batch(agent, env, steps, outdir, checkpoint_freq=None, log_interval=None, max_episode_len=None, step_offset=0, evaluator=None, successful_score=None, step_hooks=(), return_window_size=100, logger=None)[source]

Train an agent in a batch environment.

Parameters:
  • agent – Agent to train.
  • env – Environment to train the agent against.
  • steps (int) – Number of total time steps for training.
  • outdir (str) – Path to the directory to output things.
  • checkpoint_freq (int) – frequency at which agents are stored.
  • log_interval (int) – Interval of logging.
  • max_episode_len (int) – Maximum episode length.
  • step_offset (int) – Time step from which training starts.
  • return_window_size (int) – Number of training episodes used to estimate the average returns of the current agent.
  • successful_score (float) – If not None, finish training when the mean evaluation score is greater than or equal to this value.
  • step_hooks (Sequence) – Sequence of callable objects that accepts (env, agent, step) as arguments. They are called every step. See pfrl.experiments.hooks.
  • logger (logging.Logger) – Logger used in this function.
Returns:

List of evaluation episode stats dicts.
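The return_window_size parameter bounds the number of recent episode returns used to report a running average. A minimal sketch of that bookkeeping (an assumed illustration, not PFRL's actual implementation):

```python
from collections import deque

return_window_size = 3  # keep only the 3 most recent episode returns
recent_returns = deque(maxlen=return_window_size)

for episode_return in [1.0, 2.0, 3.0, 10.0]:
    recent_returns.append(episode_return)  # older returns fall off the left end

# Only the last return_window_size episodes contribute to the reported mean.
mean_return = sum(recent_returns) / len(recent_returns)  # (2 + 3 + 10) / 3 = 5.0
```

A bounded window keeps the reported average responsive to the current policy rather than diluted by early, untrained episodes.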

pfrl.experiments.train_agent_batch_with_evaluation(agent, env, steps, eval_n_steps, eval_n_episodes, eval_interval, outdir, checkpoint_freq=None, max_episode_len=None, step_offset=0, eval_max_episode_len=None, return_window_size=100, eval_env=None, log_interval=None, successful_score=None, step_hooks=(), evaluation_hooks=(), save_best_so_far_agent=True, use_tensorboard=False, logger=None)[source]

Train an agent while regularly evaluating it.

Parameters:
  • agent – Agent to train.
  • env – Environment to train the agent against.
  • steps (int) – Number of total time steps for training.
  • eval_n_steps (int) – Number of timesteps at each evaluation phase.
  • eval_n_episodes (int) – Number of episodes at each evaluation phase.
  • eval_interval (int) – Interval of evaluation.
  • outdir (str) – Path to the directory to output things.
  • log_interval (int) – Interval of logging.
  • checkpoint_freq (int) – Frequency at which agents are stored.
  • max_episode_len (int) – Maximum episode length.
  • step_offset (int) – Time step from which training starts.
  • return_window_size (int) – Number of training episodes used to estimate the average returns of the current agent.
  • eval_max_episode_len (int or None) – Maximum episode length of evaluation runs. If set to None, max_episode_len is used instead.
  • eval_env – Environment used for evaluation.
  • successful_score (float) – If not None, finish training when the mean evaluation score is greater than or equal to this value.
  • step_hooks (Sequence) – Sequence of callable objects that accepts (env, agent, step) as arguments. They are called every step. See pfrl.experiments.hooks.
  • evaluation_hooks (Sequence) – Sequence of pfrl.experiments.evaluation_hooks.EvaluationHook objects. They are called after each evaluation.
  • save_best_so_far_agent (bool) – If set to True, after each evaluation, if the score (= mean return of evaluation episodes) exceeds the best-so-far score, the current agent is saved.
  • use_tensorboard (bool) – Additionally log eval stats to tensorboard
  • logger (logging.Logger) – Logger used in this function.
Returns:

A tuple of the trained agent and eval_stats_history, a list of evaluation episode stats dicts.

Return type:

(agent, list)

pfrl.experiments.train_agent_with_evaluation(agent, env, steps, eval_n_steps, eval_n_episodes, eval_interval, outdir, checkpoint_freq=None, train_max_episode_len=None, step_offset=0, eval_max_episode_len=None, eval_env=None, successful_score=None, step_hooks=(), evaluation_hooks=(), save_best_so_far_agent=True, use_tensorboard=False, eval_during_episode=False, logger=None)[source]

Train an agent while periodically evaluating it.

Parameters:
  • agent – A pfrl.agent.Agent
  • env – Environment to train the agent against.
  • steps (int) – Total number of timesteps for training.
  • eval_n_steps (int) – Number of timesteps at each evaluation phase.
  • eval_n_episodes (int) – Number of episodes at each evaluation phase.
  • eval_interval (int) – Interval of evaluation.
  • outdir (str) – Path to the directory to output data.
  • checkpoint_freq (int) – frequency at which agents are stored.
  • train_max_episode_len (int) – Maximum episode length during training.
  • step_offset (int) – Time step from which training starts.
  • eval_max_episode_len (int or None) – Maximum episode length of evaluation runs. If None, train_max_episode_len is used instead.
  • eval_env – Environment used for evaluation.
  • successful_score (float) – If not None, finish training when the mean evaluation score is greater than or equal to this value.
  • step_hooks (Sequence) – Sequence of callable objects that accepts (env, agent, step) as arguments. They are called every step. See pfrl.experiments.hooks.
  • evaluation_hooks (Sequence) – Sequence of pfrl.experiments.evaluation_hooks.EvaluationHook objects. They are called after each evaluation.
  • save_best_so_far_agent (bool) – If set to True, after each evaluation phase, if the score (= mean return of evaluation episodes) exceeds the best-so-far score, the current agent is saved.
  • use_tensorboard (bool) – Additionally log eval stats to tensorboard
  • eval_during_episode (bool) – Allow running evaluation during training episodes. This should be enabled only when env and eval_env are independent.
  • logger (logging.Logger) – Logger used in this function.
Returns:

A tuple of the trained agent and eval_stats_history, a list of evaluation episode stats dicts.

Return type:

(agent, list)
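Evaluation phases are triggered periodically as global steps accumulate. As an illustrative sketch of that scheduling (an assumption about the timing, not PFRL's scheduler code), the steps at which evaluations would fire can be computed as:

```python
def evaluation_steps(steps, eval_interval, step_offset=0):
    """Steps at which an evaluation phase would start, assuming an
    evaluation fires every eval_interval steps after step_offset."""
    return [t for t in range(step_offset + 1, steps + 1)
            if (t - step_offset) % eval_interval == 0]

evaluation_steps(10, 4)                 # → [4, 8]
evaluation_steps(10, 4, step_offset=2)  # → [6, 10]
```

This makes the interaction between steps, eval_interval, and step_offset concrete: shifting the offset shifts the whole evaluation schedule.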

Training hooks

class pfrl.experiments.StepHook[source]

Hook function that will be called in training.

This class is for clarifying the interface required for Hook functions. You don’t need to inherit this class to define your own hooks. Any callable that accepts (env, agent, step) as arguments can be used as a hook.
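Since any callable accepting (env, agent, step) qualifies, a hook needs no subclassing at all. A minimal sketch of a custom hook that records the steps it was called on:

```python
class RecordingHook:
    """A minimal step hook: any callable taking (env, agent, step) works,
    so subclassing pfrl.experiments.StepHook is optional."""

    def __init__(self):
        self.seen_steps = []

    def __call__(self, env, agent, step):
        self.seen_steps.append(step)

hook = RecordingHook()
# The training loop invokes hooks as hook(env, agent, step) every step:
for step in range(1, 4):
    hook(None, None, step)
# hook.seen_steps is now [1, 2, 3]
```

Such a hook can be passed in the step_hooks sequence of the training functions above.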

class pfrl.experiments.LinearInterpolationHook(total_steps, start_value, stop_value, setter)[source]

Hook that will set a linearly interpolated value.

You can use this hook to decay the learning rate by using a setter function as follows:

def lr_setter(env, agent, value):
    # PFRL agents hold a PyTorch optimizer, whose learning rate
    # lives in param_groups rather than a plain lr attribute.
    for param_group in agent.optimizer.param_groups:
        param_group["lr"] = value

hook = LinearInterpolationHook(10 ** 6, 1e-3, 0, lr_setter)
Parameters:
  • total_steps (int) – Number of total steps.
  • start_value (float) – Start value.
  • stop_value (float) – Stop value.
  • setter (callable) – (env, agent, value) -> None
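The value handed to setter at a given step can be understood as plain linear interpolation, clamped once total_steps is reached. A sketch of that computation (assumed to match the hook's behavior):

```python
def interpolated_value(step, total_steps, start_value, stop_value):
    """Linearly interpolate from start_value to stop_value over total_steps,
    holding stop_value afterwards."""
    fraction = min(step, total_steps) / total_steps
    return start_value + fraction * (stop_value - start_value)

interpolated_value(500_000, 10 ** 6, 1e-3, 0.0)    # halfway through the decay
interpolated_value(2 * 10 ** 6, 10 ** 6, 1e-3, 0.0)  # past total_steps: stop_value
```

With the lr_setter example above, this schedule decays the learning rate from 1e-3 to 0 over the first million steps.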

Experiment Management

pfrl.experiments.generate_exp_id(prefix=None, argv=sys.argv) → str[source]

Generate a reproducible, unique, and deterministic experiment id.

The generated id is a string derived from the prefix, the Git commit checksum, the git diff from HEAD, and the command-line arguments.

Returns:

A generated experiment id (str) that is usable as a directory name.
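Conceptually, such an id is a hash of the Git state and the command line. The following sketch (illustrative only; the helper name and format are assumptions, not PFRL's code) shows how a deterministic id could be built from those ingredients:

```python
import hashlib

def make_exp_id(prefix, git_head, git_diff, argv):
    """Hypothetical helper: deterministic id from a Git commit hash,
    a diff, and command-line arguments."""
    payload = "\n".join([git_head, git_diff] + list(argv)).encode("utf-8")
    digest = hashlib.sha1(payload).hexdigest()[:10]
    return f"{prefix}-{digest}" if prefix else digest

# Identical inputs always yield the same id, so the experiment
# directory name is reproducible across runs.
exp_id = make_exp_id("dqn", "abc123", "", ["train.py", "--lr", "1e-3"])
```

Because the id depends on the diff as well as the commit, uncommitted changes produce a different id than a clean checkout.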
pfrl.experiments.prepare_output_dir(args, basedir=None, exp_id=None, argv=None, time_format='%Y%m%dT%H%M%S.%f', make_backup=True) → str[source]

Prepare a directory for outputting training results.

An output directory, whose name ends with the current datetime string, is created. Then the following information is saved into the directory:

  • args.txt: argument values and arbitrary parameters
  • command.txt: the command itself
  • environ.txt: environment variables
  • start.txt: timestamp when the experiment was executed

Additionally, if the current directory is under git control, the following information is saved:

  • git-head.txt: result of git rev-parse HEAD
  • git-status.txt: result of git status
  • git-log.txt: result of git log
  • git-diff.txt: result of git diff HEAD
Parameters:
  • exp_id (str or None) – Experiment identifier. If None is given, a reproducible ID is automatically generated from the Git version hash and the command-line arguments. If the code is not under Git control, it is generated from the current timestamp using time_format.
  • args (dict or argparse.Namespace) – Arguments to save as the experiment's parameters.
  • basedir (str or None) – If a string is specified, the output directory is created under that path. If not specified, it is created in the current directory.
  • argv (list or None) – The list of command-line arguments passed to a script. If not specified, sys.argv is used instead.
  • time_format (str) – Format used to represent the current datetime. The default format is the basic format of ISO 8601.
  • make_backup (bool) – If an old experiment with the same name exists, back it up with an additional suffix formatted by time_format.
Returns:

Path of the output directory created by this function (str).
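A simplified, stdlib-only sketch of the directory preparation described above (the function name is hypothetical and only args.txt is written; PFRL's implementation records more):

```python
import datetime
import json
import os
import tempfile

def prepare_output_dir_sketch(args, basedir, time_format="%Y%m%dT%H%M%S.%f"):
    """Create <basedir>/<timestamp>/ and save the arguments as args.txt."""
    exp_id = datetime.datetime.now().strftime(time_format)
    outdir = os.path.join(basedir, exp_id)
    os.makedirs(outdir)
    with open(os.path.join(outdir, "args.txt"), "w") as f:
        json.dump(args, f)
    return outdir

base = tempfile.mkdtemp()
outdir = prepare_output_dir_sketch({"lr": 1e-3, "seed": 0}, base)
# outdir now contains args.txt with the saved parameters
```

Saving the arguments alongside the results makes each run self-describing, which is the point of the real prepare_output_dir.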