Quick Start¶

Initializing environments in MARP is very similar to doing that in PettingZoo and Gym.

from marp.ma_env import MARP
env = MARP(N=3, layout='small', orthogonal_actions=True, one_shot=True, render_mode='human')

This creates a multi-agent environment where each agent takes actions simultaneously. In other words, every time the environment takes as input an action profile (i.e., an joint-action) and proceeds to the next step. We provide similar interfaces as PettingZoo

from marp.ma_env import MARP

env = MARP(N=3, layout='small', orthogonal_actions=True, one_shot=True, render_mode='human')
observations, infos = env.reset()

while env.agents:
    actions = {
        agent: env.action_space(agent).sample(infos[agent]['action_mask'])
        for agent in env.agents
    }
    observations, rewards, terminations, truncations, infos = env.step(actions)

In addition to the conventional step() interface that is commonly used in the RL community, we also provide interfaces that help obtain the explicit transition between (global or system) states.

from marp.ma_env import MARP

env = MARP(N=3, layout='small', orthogonal_actions=True, one_shot=True, render_mode='human')
env.reset()

curr_state = env.get_state()
actions = {
        agent: env.action_space(agent).sample(infos[agent]['action_mask'])
        for agent in env.agents
}
succ_state = env.transit(curr_state, actions)

Compared to the step() interface, the transit() interface explicitly takes in a state, which can be aquired by get_state() in advance, and an action profile, and returns a successor state. Note that, calls to this function will not change the internal state of the environment, therefore, can be used to implement search algorithms that plan ahead.