evorl.replay_buffers.prioritized_replay_buffer

Prioritized Replay Buffer with LAP (Loss-Adjusted Prioritization) support.

Module Contents

Classes

PrioritizedReplayBuffer

ReplayBuffer with proportional prioritized sampling (LAP).

PrioritizedReplayBufferState

State for the prioritized replay buffer.

API

class evorl.replay_buffers.prioritized_replay_buffer.PrioritizedReplayBuffer[source]

Bases: evorl.replay_buffers.replay_buffer.ReplayBuffer

ReplayBuffer with proportional prioritized sampling (LAP).

Uses Loss-Adjusted Prioritization from TD7 paper. New samples are assigned max_priority. Priorities are updated based on TD-error after critic training.

Variables:
  • capacity – the maximum capacity of the replay buffer.

  • sample_batch_size – the batch size for sample().

  • min_sample_timesteps – the minimum number of timesteps before the replay buffer can sample.

  • alpha – the exponent determining how much prioritization is used (0 = uniform, 1 = full).

add(buffer_state: evorl.replay_buffers.prioritized_replay_buffer.PrioritizedReplayBufferState, xs: chex.ArrayTree, mask: chex.Array | None = None) evorl.replay_buffers.prioritized_replay_buffer.PrioritizedReplayBufferState[source]
alpha: float

0.6

init(spec: chex.ArrayTree) evorl.replay_buffers.prioritized_replay_buffer.PrioritizedReplayBufferState[source]
reset_max_priority(buffer_state: evorl.replay_buffers.prioritized_replay_buffer.PrioritizedReplayBufferState) evorl.replay_buffers.prioritized_replay_buffer.PrioritizedReplayBufferState[source]

Recompute max_priority from current buffer entries.

sample(buffer_state: evorl.replay_buffers.prioritized_replay_buffer.PrioritizedReplayBufferState, key: chex.PRNGKey, beta: float | chex.Array = 0.4) tuple[chex.ArrayTree, chex.Array, evorl.replay_buffers.prioritized_replay_buffer.PrioritizedReplayBufferState][source]

Sample a batch proportional to priorities with IS weights.

Parameters:
  • buffer_state – Current buffer state.

  • key – PRNG key.

  • beta – Importance sampling exponent.

Returns:

A tuple of (batch, weights, updated_buffer_state). The weights are the computed Importance Sampling (IS) weights.

update_priority(buffer_state: evorl.replay_buffers.prioritized_replay_buffer.PrioritizedReplayBufferState, priority: chex.Array) evorl.replay_buffers.prioritized_replay_buffer.PrioritizedReplayBufferState[source]

Update priorities at the last sampled indices.

Parameters:
  • buffer_state – Current buffer state (must have valid sample_indices).

  • priority – New priority values, shape (sample_batch_size,).

Returns:

Updated buffer state with new priorities.

class evorl.replay_buffers.prioritized_replay_buffer.PrioritizedReplayBufferState[source]

Bases: evorl.replay_buffers.replay_buffer.ReplayBufferState

State for the prioritized replay buffer.

Variables:
  • priority – Priority values for each entry in the buffer.

  • max_priority – Current maximum priority value.

  • sample_indices – Indices of the last sampled batch (for priority updates).

max_priority: chex.Array

‘ones(…)’

priority: chex.Array

‘zeros(…)’

sample_indices: chex.Array

‘zeros(…)’