Replay Buffers

ReplayBuffer interfaces

class pfrl.replay_buffers.ReplayBuffer(capacity: Optional[int] = None, num_steps: int = 1)[source]

Experience Replay Buffer

As described in https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf.

Parameters:
  • capacity (int) – capacity in terms of number of transitions
  • num_steps (int) – Number of timesteps per stored transition (for N-step updates)
append(state, action, reward, next_state=None, next_action=None, is_state_terminal=False, env_id=0, **kwargs)[source]

Append a transition to this replay buffer.

Parameters:
  • state – s_t
  • action – a_t
  • reward – r_t
  • next_state – s_{t+1} (can be None if terminal)
  • next_action – a_{t+1} (can be None for off-policy algorithms)
  • is_state_terminal (bool) – True if s_{t+1} is a terminal state of the episode
  • env_id (object) – Object that is unique to each env. It indicates which env a given transition came from in multi-env training.
  • **kwargs – Any other information to store.
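For illustration, a minimal sketch of filling a buffer from a classic Gym-style interaction loop. The CartPole-v0 env and the random policy are placeholders, not part of this API, and the loop assumes the old gym API where step() returns a 4-tuple:

    import gym
    import pfrl

    env = gym.make("CartPole-v0")
    rbuf = pfrl.replay_buffers.ReplayBuffer(capacity=10 ** 5)

    obs = env.reset()
    for _ in range(1000):
        action = env.action_space.sample()
        next_obs, reward, done, _ = env.step(action)
        # Store the transition (s_t, a_t, r_t, s_{t+1}).
        rbuf.append(
            state=obs,
            action=action,
            reward=reward,
            next_state=next_obs,
            is_state_terminal=done,
        )
        obs = env.reset() if done else next_obs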
load(filename)[source]

Load the content of the buffer from a file.

Parameters: filename (str) – Path to a file.
sample(num_experiences)[source]

Sample num_experiences unique transitions from this replay buffer.

Parameters: num_experiences (int) – Number of transitions to sample.
Returns: Sequence of num_experiences sampled transitions.
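For example, once enough transitions have been stored, a minibatch can be drawn (the batch size of 32 is arbitrary here):

    if len(rbuf) >= 32:
        transitions = rbuf.sample(32)
        assert len(transitions) == 32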
save(filename)[source]

Save the content of the buffer to a file.

Parameters: filename (str) – Path to a file.
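A sketch of a save/load round trip (the file name is illustrative):

    rbuf.save("rbuf.pkl")

    # Later, possibly in a new process:
    rbuf2 = pfrl.replay_buffers.ReplayBuffer(capacity=10 ** 5)
    rbuf2.load("rbuf.pkl")
    assert len(rbuf2) == len(rbuf)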
stop_current_episode(env_id=0)[source]

Notify the buffer that the current episode is interrupted.

You may want to interrupt the current episode and start a new one before observing a terminal state. This is typical in continuing envs. In such cases, you need to call this method before appending a new transition so that the buffer will treat it as an initial transition of a new episode.

This method should not be called after an episode whose termination has already been notified by appending a transition with is_state_terminal=True.

Parameters: env_id (object) – Object that is unique to each env. It indicates which env’s current episode is interrupted in multi-env training.
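A sketch of truncating a continuing env after a fixed horizon (the 200-step cutoff is arbitrary, and done is assumed to stay False in a continuing env):

    obs = env.reset()
    for t in range(200):
        action = env.action_space.sample()
        next_obs, reward, done, _ = env.step(action)
        rbuf.append(state=obs, action=action, reward=reward,
                    next_state=next_obs, is_state_terminal=done)
        obs = next_obs
    # The horizon was reached without a terminal state, so notify the
    # buffer before appending transitions of the next episode.
    rbuf.stop_current_episode()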

ReplayBuffer implementations

class pfrl.replay_buffers.EpisodicReplayBuffer(capacity=None)[source]
class pfrl.replay_buffers.ReplayBuffer(capacity: Optional[int] = None, num_steps: int = 1)[source]
class pfrl.replay_buffers.PrioritizedReplayBuffer(capacity=None, alpha=0.6, beta0=0.4, betasteps=200000.0, eps=0.01, normalize_by_max=True, error_min=0, error_max=1, num_steps=1)[source]

Stochastic Prioritization

As described in https://arxiv.org/pdf/1511.05952.pdf, Section 3.3 (proportional prioritization).

Parameters:
  • capacity (int) – capacity in terms of number of transitions
  • alpha (float) – Exponent of errors to compute probabilities to sample
  • beta0 (float) – Initial value of beta
  • betasteps (int) – Steps to anneal beta to 1
  • eps (float) – To revisit a step after its error becomes near zero
  • normalize_by_max (bool) – Method to normalize weights. 'batch' or True (default): divide by the maximum weight in the sampled batch. 'memory': divide by the maximum weight in the memory. False: do not normalize
  • error_min (float) – Minimum value used to clip errors when computing priorities
  • error_max (float) – Maximum value used to clip errors when computing priorities
  • num_steps (int) – Number of timesteps per stored transition (for N-step updates)
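A construction sketch; the values shown are the defaults, spelled out here only to annotate their roles:

    rbuf = pfrl.replay_buffers.PrioritizedReplayBuffer(
        capacity=10 ** 6,
        alpha=0.6,              # how strongly errors shape sampling probabilities
        beta0=0.4,              # initial importance-sampling correction ...
        betasteps=2 * 10 ** 5,  # ... annealed to 1 over this many steps
    )

When used with agents that support prioritized replay (e.g. pfrl.agents.DQN), TD errors are fed back to the buffer after each update, so priorities normally need no manual handling.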
class pfrl.replay_buffers.PrioritizedEpisodicReplayBuffer(capacity=None, alpha=0.6, beta0=0.4, betasteps=200000.0, eps=1e-08, normalize_by_max=True, default_priority_func=None, uniform_ratio=0, wait_priority_after_sampling=True, return_sample_weights=True, error_min=None, error_max=None)[source]
class pfrl.replay_buffers.PersistentReplayBuffer(dirname, capacity, *, ancestor=None, logger=None, distributed=False, group=None)[source]

Experience replay buffer whose data is saved to disk storage

ReplayBuffer stores sampled experience data in DRAM, so the data is lost when the program terminates. This class adds persistence to ReplayBuffer so that the learning process can be restarted from previously saved replay data.

Parameters:
  • dirname (str) – Directory name where the buffer data is saved. Note that existing data is also loaded from this directory, and that this argument cannot be used together with ancestor.
  • capacity (int) – Capacity in terms of number of transitions
  • ancestor (str) – Path to pre-generated replay buffer. The ancestor directory is used to load/save, instead of dirname.
  • logger – logger object
  • distributed (bool) – Use a distributed version for the underlying persistent queue class. You need the private package pfrlmn to use this option.
  • group – torch.distributed group object. Only used when distributed=True and the pfrlmn package is available

Note

Contrary to the original ReplayBuffer implementation, state and next_state, as well as action and next_action, are pickled and stored as separate objects even when they point to the same object. This may lead to inefficient use of storage space, but it is recommended to buy more storage - hardware is sometimes cheaper than software.
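A minimal persistence sketch (the directory name is illustrative):

    rbuf = pfrl.replay_buffers.PersistentReplayBuffer(
        dirname="replay_data", capacity=10 ** 6)
    # ... append/sample as with an ordinary ReplayBuffer ...

    # A later run constructed with the same dirname picks up the
    # previously stored transitions automatically.

PersistentEpisodicReplayBuffer below is constructed the same way.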

class pfrl.replay_buffers.PersistentEpisodicReplayBuffer(dirname, capacity, *, ancestor=None, logger=None, distributed=False, group=None)[source]

Episodic version of PersistentReplayBuffer

Parameters:
  • dirname (str) – Directory name where the buffer data is saved. This cannot be used with ancestor
  • capacity (int) – Capacity in terms of number of transitions
  • ancestor (str) – Path to pre-generated replay buffer. The ancestor directory is used to load/save, instead of dirname.
  • logger – logger object
  • distributed (bool) – Use a distributed version for the underlying persistent queue class. You need the private package pfrlmn to use this option.
  • group – torch.distributed group object. Only used when distributed=True and the pfrlmn package is available

Note

The current implementation is inefficient: in EpisodicReplayBuffer, the episodic memory and the flat memory reference almost the same data while exposing different structures, whereas in this persistent version the two do not share data and are backed by separate file structures, so the same transitions may be stored twice.