Replay Buffers
ReplayBuffer interfaces
-
class pfrl.replay_buffers.ReplayBuffer(capacity: Optional[int] = None, num_steps: int = 1)
Experience Replay Buffer
As described in https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf.
Parameters: -
append(state, action, reward, next_state=None, next_action=None, is_state_terminal=False, env_id=0, **kwargs)
Append a transition to this replay buffer.
Parameters: - state – s_t
- action – a_t
- reward – r_t
- next_state – s_{t+1} (can be None if terminal)
- next_action – a_{t+1} (can be None for off-policy algorithms)
- is_state_terminal (bool) –
- env_id (object) – Object that is unique to each env. It indicates which env a given transition came from in multi-env training.
- **kwargs – Any other information to store.
-
load(filename)
Load the content of the buffer from a file.
Parameters: filename (str) – Path to a file.
-
sample(num_experiences)
Sample num_experiences unique transitions from this replay buffer.
Parameters: num_experiences (int) – Number of transitions to sample. Returns: Sequence of num_experiences sampled transitions.
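The append and sample calls described above can be illustrated with a small stand-in. This is a minimal sketch, not PFRL's implementation: ToyReplayBuffer is a hypothetical class that only mirrors the documented method names, and it ignores env_id and num_steps.

```python
import random
from collections import deque


class ToyReplayBuffer:
    """Hypothetical stand-in that mirrors the documented method names."""

    def __init__(self, capacity=None):
        # deque(maxlen=None) is unbounded; otherwise the oldest items are dropped.
        self.memory = deque(maxlen=capacity)

    def append(self, state, action, reward, next_state=None,
               next_action=None, is_state_terminal=False, env_id=0, **kwargs):
        # env_id is accepted for signature parity but unused in this sketch.
        self.memory.append(dict(
            state=state, action=action, reward=reward,
            next_state=next_state, next_action=next_action,
            is_state_terminal=is_state_terminal, **kwargs))

    def sample(self, num_experiences):
        # random.sample draws without replacement, so no stored
        # transition appears twice in one batch.
        return random.sample(list(self.memory), num_experiences)

    def __len__(self):
        return len(self.memory)


buf = ToyReplayBuffer(capacity=100)
for t in range(10):
    buf.append(state=t, action=0, reward=1.0, next_state=t + 1,
               is_state_terminal=(t == 9))
batch = buf.sample(4)
```

Each element of batch is a dict holding one transition. Requesting more transitions than the buffer holds raises ValueError in this sketch, because random.sample cannot draw that many unique items.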
-
save(filename)
Save the content of the buffer to a file.
Parameters: filename (str) – Path to a file.
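As a rough illustration of what a save and load round trip involves, the sketch below serializes a deque of transitions with pickle. The helper names save_buffer and load_buffer are hypothetical, and using pickle as the file format is an assumption made for this example rather than a statement about the library's on-disk format.

```python
import os
import pickle
import tempfile
from collections import deque


def save_buffer(memory, filename):
    # Hypothetical helper: write the whole container to one file.
    with open(filename, "wb") as f:
        pickle.dump(memory, f)


def load_buffer(filename):
    # Hypothetical helper: read the container back.
    with open(filename, "rb") as f:
        return pickle.load(f)


memory = deque([{"state": 0, "action": 1, "reward": 0.5}], maxlen=10)
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "replay.pkl")
    save_buffer(memory, path)
    restored = load_buffer(path)
```

Pickle files can execute arbitrary code when loaded, so a file written this way should only be loaded if it comes from a trusted source.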
-
stop_current_episode(env_id=0)
Notify the buffer that the current episode is interrupted.
You may want to interrupt the current episode and start a new one before observing a terminal state. This is typical in continuing envs. In such cases, you need to call this method before appending a new transition so that the buffer will treat it as an initial transition of a new episode.
Do not call this method after an episode whose termination has already been signaled by appending a transition with is_state_terminal=True.
Parameters: env_id (object) – Object that is unique to each env. It indicates which env’s current episode is interrupted in multi-env training.
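The reason for signaling an interruption is easiest to see with multi-step sequences. The following sketch is one possible way to keep a per-environment window of recent transitions and reset it at an episode boundary. ToyNStepBuffer, its pending attribute, and its _flush helper are hypothetical names for this illustration, not PFRL's implementation.

```python
from collections import defaultdict, deque


class ToyNStepBuffer:
    """Hypothetical sketch of per-env episode tracking."""

    def __init__(self, num_steps=2):
        self.num_steps = num_steps
        self.memory = []  # each entry is a list of up to num_steps transitions
        # One sliding window of recent transitions per environment.
        self.pending = defaultdict(lambda: deque(maxlen=num_steps))

    def _flush(self, env_id):
        # Store the remaining (shorter) sequences, then leave the window empty.
        window = self.pending[env_id]
        while window:
            self.memory.append(list(window))
            window.popleft()

    def append(self, state, reward, is_state_terminal=False, env_id=0):
        window = self.pending[env_id]
        window.append({"state": state, "reward": reward})
        if is_state_terminal:
            self._flush(env_id)
        elif len(window) == self.num_steps:
            self.memory.append(list(window))

    def stop_current_episode(self, env_id=0):
        window = self.pending[env_id]
        if len(window) == self.num_steps:
            # A full window was already stored by the last append.
            window.popleft()
        self._flush(env_id)


buf = ToyNStepBuffer(num_steps=2)
buf.append(state=0, reward=0.0)
buf.append(state=1, reward=0.0)
buf.stop_current_episode()          # episode interrupted without a terminal state
buf.append(state=2, reward=0.0)     # first transition of a new episode
buf.append(state=3, reward=1.0, is_state_terminal=True)
stored_states = [[t["state"] for t in seq] for seq in buf.memory]
```

Without the stop_current_episode call, the window would still contain state 1 when state 2 arrives, and the sketch would store a sequence that spans two unrelated episodes.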
-
ReplayBuffer implementations
-
class pfrl.replay_buffers.ReplayBuffer(capacity: Optional[int] = None, num_steps: int = 1)
Experience Replay Buffer
As described in https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf.
Parameters:
-
class pfrl.replay_buffers.PrioritizedReplayBuffer(capacity=None, alpha=0.6, beta0=0.4, betasteps=200000.0, eps=0.01, normalize_by_max=True, error_min=0, error_max=1, num_steps=1)
Stochastic Prioritization
Proportional prioritization as described in Section 3.3 of https://arxiv.org/pdf/1511.05952.pdf.
Parameters: - capacity (int) – capacity in terms of number of transitions
- alpha (float) – Exponent of errors to compute probabilities to sample
- beta0 (float) – Initial value of beta
- betasteps (int) – Steps to anneal beta to 1
- eps (float) – Small positive constant that keeps priorities nonzero, so that a step can be revisited after its error becomes near zero
- normalize_by_max (bool) – Method to normalize weights.
  'batch' or True (default): divide by the maximum weight in the sampled batch.
  'memory': divide by the maximum weight in the memory.
  False: do not normalize.
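To show how alpha, beta, eps, and batch normalization interact, here is a minimal sketch of proportional prioritization that follows the formulas in the cited paper: priority p_i = |error_i| + eps, sampling probability P(i) proportional to p_i^alpha, and importance weight (N * P(i))^(-beta). The function names are hypothetical, the library may place eps or clip errors differently, and the linear beta schedule is an assumption.

```python
import random


def sample_proportional(errors, batch_size, alpha=0.6, beta=0.4, eps=0.01,
                        rng=random):
    # Paper-style priority: eps keeps a zero-error transition sampleable.
    priorities = [(abs(e) + eps) ** alpha for e in errors]
    total = sum(priorities)
    probs = [p / total for p in priorities]
    # Draw indices with replacement, for simplicity.
    indices = rng.choices(range(len(errors)), weights=probs, k=batch_size)
    n = len(errors)
    weights = [(n * probs[i]) ** (-beta) for i in indices]
    # 'batch' normalization: divide by the largest weight in this batch.
    max_w = max(weights)
    return indices, [w / max_w for w in weights]


def anneal_beta(beta0, betasteps, step):
    # Assumed linear schedule from beta0 up to 1 over betasteps steps.
    return min(1.0, beta0 + (1.0 - beta0) * step / betasteps)


rng = random.Random(0)
indices, weights = sample_proportional([0.0, 0.5, 2.0], batch_size=5, rng=rng)
beta_mid = anneal_beta(0.4, 200000, 100000)
```

With alpha=0 every transition is sampled with equal probability, and with beta=1 the weights fully compensate for the non-uniform sampling. Transitions with larger error are sampled more often and therefore receive smaller weights.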
-
class pfrl.replay_buffers.PrioritizedEpisodicReplayBuffer(capacity=None, alpha=0.6, beta0=0.4, betasteps=200000.0, eps=1e-08, normalize_by_max=True, default_priority_func=None, uniform_ratio=0, wait_priority_after_sampling=True, return_sample_weights=True, error_min=None, error_max=None)
-
class pfrl.replay_buffers.PersistentReplayBuffer(dirname, capacity, *, ancestor=None, logger=None, distributed=False, group=None)
Experience replay buffer that is saved to disk storage
ReplayBuffer is used to store sampled experience data, but the data is held in DRAM and is lost when the program terminates. This class adds persistence to ReplayBuffer, so that the learning process can be restarted from previously saved replay data.
Parameters: - dirname (str) – Directory name where the buffer data is saved. Note that the buffer also tries to load data from this directory. It cannot be used together with ancestor.
- capacity (int) – Capacity in terms of number of transitions
- ancestor (str) – Path to pre-generated replay buffer. The ancestor directory is used to load/save, instead of dirname.
- logger – logger object
- distributed (bool) – Use a distributed version for the underlying persistent queue class. You need the private package pfrlmn to use this option.
- group – torch.distributed group object. Only used when distributed=True and pfrlmn package is available
Note
Contrary to the original ReplayBuffer implementation, state and next_state, and likewise action and next_action, are pickled and stored as different objects even if they point to the same object. This may lead to inefficient use of storage space, but the recommended remedy is to buy more storage: hardware is sometimes cheaper than software.
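The storage cost mentioned in this note follows from how pickle treats shared references. The sketch below uses only the standard library and plain lists as stand-in states. It shows the general effect and does not reproduce the buffer's actual storage layout.

```python
import pickle

state = list(range(1000))
next_state = state  # same object, as when an in-memory buffer keeps references

# Pickled in one call: the second reference is stored as a back-reference.
together = pickle.loads(pickle.dumps((state, next_state)))

# Pickled in separate calls: each call writes a full, independent copy.
separate = (pickle.loads(pickle.dumps(state)),
            pickle.loads(pickle.dumps(next_state)))

shared_together = together[0] is together[1]
shared_separate = separate[0] is separate[1]
size_together = len(pickle.dumps((state, next_state)))
size_separate = len(pickle.dumps(state)) + len(pickle.dumps(next_state))
```

Objects pickled separately come back as distinct copies and take roughly twice the bytes, which is the inefficiency the note accepts in exchange for persistence.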
-
class pfrl.replay_buffers.PersistentEpisodicReplayBuffer(dirname, capacity, *, ancestor=None, logger=None, distributed=False, group=None)
Episodic version of PersistentReplayBuffer
Parameters: - dirname (str) – Directory name where the buffer data is saved. This cannot be used with ancestor
- capacity (int) – Capacity in terms of number of transitions
- ancestor (str) – Path to pre-generated replay buffer. The ancestor directory is used to load/save, instead of dirname.
- logger – logger object
- distributed (bool) – Use a distributed version for the underlying persistent queue class. You need the private package pfrlmn to use this option.
- group – torch.distributed group object. Only used when distributed=True and pfrlmn package is available
Note
The current implementation is inefficient. In EpisodicReplayBuffer, the episodic memory and the memory share almost the same data by reference, although they expose different data structures. The persistent versions, by contrast, do not share data between them: their backing files are kept separate.