evorl.algorithms.contrib.pop_td3
Module Contents
Classes
PopTD3Workflow | Independent TD3 agents with a shared replay buffer.
WorkflowMetric |
Functions
flatten_pop_rollout_trajectory | Flatten the trajectory from [#pop, T, B, …] to [#pop×T×B, …].
API
- class evorl.algorithms.contrib.pop_td3.PopTD3Workflow(env: evorl.envs.Env, agent: evorl.agent.Agent, optimizer: optax.GradientTransformation, evaluator: evorl.evaluators.Evaluator, replay_buffer: evorl.replay_buffers.AbstractReplayBuffer, config: omegaconf.DictConfig)
Bases: evorl.algorithms.td3.TD3Workflow
Independent TD3 agents with a shared replay buffer.
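A minimal sketch of the idea behind this class, assuming a population of `POP_SIZE` agents that each keep their own parameters while every transition they collect goes into one pooled buffer. Everything here is hypothetical: `SharedReplayBuffer`, `collect`, and `toy_update` are illustrative stand-ins, not evorl APIs, and the least-squares step only takes the place of a real TD3 actor/critic update.

```python
import numpy as np

POP_SIZE, OBS_DIM, T, B = 4, 3, 5, 2  # hypothetical population and rollout sizes


class SharedReplayBuffer:
    """Toy FIFO buffer that every agent writes to and samples from (hypothetical)."""

    def __init__(self, capacity: int, obs_dim: int):
        self.obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.rew = np.zeros((capacity,), dtype=np.float32)
        self.capacity, self.size, self.ptr = capacity, 0, 0

    def add(self, obs: np.ndarray, rew: np.ndarray) -> None:
        # Write N transitions at the ring-buffer pointer (assumes N <= capacity).
        n = len(rew)
        idx = (self.ptr + np.arange(n)) % self.capacity
        self.obs[idx], self.rew[idx] = obs, rew
        self.ptr = (self.ptr + n) % self.capacity
        self.size = min(self.size + n, self.capacity)

    def sample(self, rng: np.random.Generator, batch_size: int):
        # Uniform sampling with replacement over the filled part of the buffer.
        idx = rng.integers(0, self.size, size=batch_size)
        return self.obs[idx], self.rew[idx]


def collect(rng: np.random.Generator):
    # Stand-in rollout: every agent yields a [T, B] block of transitions.
    obs = rng.normal(size=(POP_SIZE, T, B, OBS_DIM)).astype(np.float32)
    rew = obs.sum(axis=-1)  # toy reward, so the ideal weights are all ones
    return obs, rew


def toy_update(w: np.ndarray, obs: np.ndarray, rew: np.ndarray, lr: float = 0.1):
    # One least-squares gradient step, standing in for a TD3 update.
    grad = obs.T @ (obs @ w - rew) / len(rew)
    return w - lr * grad


rng = np.random.default_rng(0)
buffer = SharedReplayBuffer(capacity=1_000, obs_dim=OBS_DIM)
params = np.zeros((POP_SIZE, OBS_DIM), dtype=np.float32)  # one row per agent

for _ in range(50):
    obs, rew = collect(rng)
    # Transitions from the whole population go into the single shared buffer.
    buffer.add(obs.reshape(-1, OBS_DIM), rew.reshape(-1))
    for i in range(POP_SIZE):
        # Each agent draws its own minibatch and updates only its own row.
        batch_obs, batch_rew = buffer.sample(rng, batch_size=32)
        params[i] = toy_update(params[i], batch_obs, batch_rew)
```

In this sketch an agent can learn from transitions gathered by the others, while its parameters are never averaged with or copied from theirs.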
- classmethod build_from_config(config: omegaconf.DictConfig, enable_multi_devices: bool = False, enable_jit: bool = True) → typing_extensions.Self
- evaluate(state: evorl.types.State) → tuple[evorl.metrics.MetricBase, evorl.types.State]
- learn(state: evorl.types.State) → evorl.types.State
- setup(key: chex.PRNGKey) → evorl.types.State
- step(state: evorl.types.State) → tuple[evorl.metrics.MetricBase, evorl.types.State]
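The method signatures above suggest how the pieces fit together: `setup` turns a PRNG key into an initial `State`, `step` advances training and returns metrics plus the new state, and `evaluate` returns evaluation metrics plus a state. The driver below is a hypothetical sketch of such a loop; `run`, `WorkflowLike`, and `CountingWorkflow` are not part of evorl, and the sketch does not claim to reproduce what `learn` does internally.

```python
from typing import Any, Protocol


class WorkflowLike(Protocol):
    """Structural type matching the setup/step/evaluate signatures listed above."""

    def setup(self, key: Any) -> Any: ...
    def step(self, state: Any) -> tuple[Any, Any]: ...
    def evaluate(self, state: Any) -> tuple[Any, Any]: ...


def run(workflow: WorkflowLike, key: Any, num_iters: int, eval_interval: int):
    """Hypothetical training driver; evorl's own learn() may be organized differently."""
    state = workflow.setup(key)
    train_log, eval_log = [], []
    for it in range(1, num_iters + 1):
        metrics, state = workflow.step(state)  # one training iteration
        train_log.append(metrics)
        if it % eval_interval == 0:
            eval_metrics, state = workflow.evaluate(state)
            eval_log.append(eval_metrics)
    return state, train_log, eval_log


class CountingWorkflow:
    """Dummy workflow used only to exercise the driver."""

    def setup(self, key):
        return {"key": key, "iterations": 0}

    def step(self, state):
        state = {**state, "iterations": state["iterations"] + 1}
        return {"iterations": state["iterations"]}, state

    def evaluate(self, state):
        return {"evaluated_at": state["iterations"]}, state


state, train_log, eval_log = run(CountingWorkflow(), key=0, num_iters=10, eval_interval=5)
```

With a real workflow, `key` would be a `chex.PRNGKey` such as `jax.random.PRNGKey(0)` rather than the placeholder integer used here.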
- class evorl.algorithms.contrib.pop_td3.WorkflowMetric
Bases: evorl.metrics.MetricBase
- iterations: chex.Array = zeros(…)
- sampled_episodes: chex.Array = zeros(…)
- sampled_timesteps: chex.Array = zeros(…)
- sampled_timesteps_per_agent: chex.Array = zeros(…)
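The field names of `WorkflowMetric` suggest running counters. A hedged sketch of how they might be advanced once per iteration, assuming each of the `pop_size` agents collects a rollout of `rollout_length` steps in `num_envs` parallel environments (the [#pop, T, B, …] layout that `flatten_pop_rollout_trajectory` expects). `WorkflowMetricSketch`, `update_metric`, and the `finished_episodes` argument are hypothetical; evorl's actual bookkeeping may differ.

```python
from dataclasses import dataclass, field, replace

import numpy as np


def _zero() -> np.ndarray:
    return np.zeros((), dtype=np.int64)


@dataclass(frozen=True)
class WorkflowMetricSketch:
    """Hypothetical mirror of WorkflowMetric's four counters."""

    iterations: np.ndarray = field(default_factory=_zero)
    sampled_episodes: np.ndarray = field(default_factory=_zero)
    sampled_timesteps: np.ndarray = field(default_factory=_zero)
    sampled_timesteps_per_agent: np.ndarray = field(default_factory=_zero)


def update_metric(m, pop_size, rollout_length, num_envs, finished_episodes):
    # Steps gathered by a single agent in one [T, B] rollout.
    per_agent = rollout_length * num_envs
    return replace(
        m,
        iterations=m.iterations + 1,
        sampled_episodes=m.sampled_episodes + finished_episodes,
        # The whole population contributes pop_size * T * B steps.
        sampled_timesteps=m.sampled_timesteps + pop_size * per_agent,
        sampled_timesteps_per_agent=m.sampled_timesteps_per_agent + per_agent,
    )


m = WorkflowMetricSketch()
for _ in range(3):
    m = update_metric(m, pop_size=4, rollout_length=5, num_envs=2, finished_episodes=1)
```

Under these assumptions `sampled_timesteps` always equals `pop_size` times `sampled_timesteps_per_agent`.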
- evorl.algorithms.contrib.pop_td3.flatten_pop_rollout_trajectory(trajectory: evorl.sample_batch.SampleBatch) → evorl.sample_batch.SampleBatch
Flatten the trajectory from shape [#pop, T, B, …] to [#pop×T×B, …], merging the population, time, and batch axes into one.
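A minimal NumPy sketch of the reshape this function performs, using a plain `dict` of arrays as a stand-in for `SampleBatch` (an assumption; the real class is a JAX pytree). The first three axes are merged in row-major order, so element `[p, t, b]` lands at flat index `p*T*B + t*B + b`, and any trailing feature axes are kept. `flatten_pop_rollout` is a hypothetical name.

```python
import numpy as np


def flatten_pop_rollout(trajectory: dict) -> dict:
    """Collapse the leading [pop, T, B] axes of every field into a single axis.

    `trajectory` is a plain dict standing in for SampleBatch (hypothetical);
    fields that are None are passed through unchanged.
    """
    flat = {}
    for name, x in trajectory.items():
        if x is None:
            flat[name] = None
        else:
            # Row-major reshape: keep trailing feature axes, merge the first three.
            flat[name] = x.reshape((-1,) + x.shape[3:])
    return flat


pop, T, B, obs_dim = 3, 4, 2, 5
traj = {
    "obs": np.arange(pop * T * B * obs_dim).reshape(pop, T, B, obs_dim),
    "rewards": np.ones((pop, T, B)),
    "extras": None,
}
flat = flatten_pop_rollout(traj)
print(flat["obs"].shape, flat["rewards"].shape)  # (24, 5) (24,)
```

In JAX the same per-field reshape can be applied to every leaf with `jax.tree_util.tree_map(lambda x: x.reshape(-1, *x.shape[3:]), trajectory)`; `tree_map` treats `None` fields as empty subtrees and leaves them as `None`.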