evorl.algorithms.offpolicy_utils
Module Contents
Classes
OffPolicyWorkflowTemplate | Wraps a common template for off-policy RL with TD learning. |
Functions
clean_trajectory | Clean the trajectory to make it suitable for the replay buffer. |
|
skip_replay_buffer_state | Utility function to remove replay_buffer_state from state. |
API
- class evorl.algorithms.offpolicy_utils.OffPolicyWorkflowTemplate(env: evorl.envs.Env, agent: evorl.agent.Agent, optimizer: optax.GradientTransformation, evaluator: evorl.evaluators.Evaluator, replay_buffer: evorl.replay_buffers.AbstractReplayBuffer, config: omegaconf.DictConfig)
Bases: evorl.workflows.OffPolicyWorkflow
Wraps a common template for off-policy RL with TD learning.
- learn(state: evorl.types.State) → evorl.types.State
- evorl.algorithms.offpolicy_utils.clean_trajectory(trajectory: evorl.sample_batch.SampleBatch) → evorl.sample_batch.SampleBatch
Clean the trajectory to make it suitable for the replay buffer.
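The description above does not say which fields are removed. As a rough illustration only, the sketch below assumes that "cleaning" means keeping the transition fields a TD-learning replay buffer typically stores and dropping everything else. The dict-based trajectory, the `REPLAY_FIELDS` whitelist, and the function name `clean_trajectory_sketch` are hypothetical stand-ins, not the evorl `SampleBatch` type or the library's implementation.

```python
from typing import Any, Dict

# Hypothetical whitelist (an assumption): the fields the real function
# keeps are not documented in this section.
REPLAY_FIELDS = ("obs", "action", "reward", "next_obs", "done")


def clean_trajectory_sketch(trajectory: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict holding only the fields listed in REPLAY_FIELDS."""
    return {k: trajectory[k] for k in REPLAY_FIELDS if k in trajectory}


# A toy two-step trajectory with one extra field that the buffer does not need.
raw = {
    "obs": [0.0, 1.0],
    "action": [1, 0],
    "reward": [0.5, 1.0],
    "next_obs": [1.0, 2.0],
    "done": [False, True],
    "debug_info": {"step_time_ms": [3.1, 2.9]},
}

cleaned = clean_trajectory_sketch(raw)
```

The sketch returns a new dict rather than mutating its input, so the rollout data stays available to any code that still needs the extra fields.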
- evorl.algorithms.offpolicy_utils.skip_replay_buffer_state(state: evorl.types.State) → evorl.types.State
Utility function to remove replay_buffer_state from state.
Usually used when saving the off-policy workflow state to disk.
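The idea of dropping the buffer before saving can be sketched as follows. This is a minimal illustration, assuming the state behaves like a mapping with a `replay_buffer_state` entry. The plain dict, the `params` key, and the function name `skip_replay_buffer_state_sketch` are hypothetical and do not reflect the actual `evorl.types.State` API.

```python
import pickle
from typing import Any, Dict


def skip_replay_buffer_state_sketch(state: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the state without its 'replay_buffer_state' entry."""
    return {k: v for k, v in state.items() if k != "replay_buffer_state"}


# Toy workflow state (assumption): small parameters plus a large buffer.
state = {
    "params": [0.1, 0.2],
    "replay_buffer_state": list(range(10_000)),
}

# Serialize only the slimmed-down state; the in-memory state is left untouched.
checkpoint = pickle.dumps(skip_replay_buffer_state_sketch(state))
restored = pickle.loads(checkpoint)
```

In this sketch the checkpoint is smaller than a full dump because the buffer is usually the bulk of the state. The trade-off is that the restored state carries no buffer contents.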