
The upper and lower limits of action space in a custom multi-agent environment #112

Open
limuhan33 opened this issue Mar 8, 2025 · 2 comments


@limuhan33

Hello! I found that the current custom environment does not seem to support a discrete action space, so I changed my model to a one-dimensional continuous action space. However, my definition of action_space does not seem to take effect.
At first I noticed that the action output always hovered around 0.9 during training, even though the range I defined was [-20, 20]. To check whether this was just randomness, I changed the range to [-0.1, 0.1], but the actions passed into step() at each timestep are still around 0.9~1.1.
[Screenshots: training logs showing actions staying around 0.9~1.1]
I noticed that the upper and lower limits of the action space in "off_policy_marl.py" seem to be wrong and are only returned as "None". Could this be the root cause of the incorrect action range, or did something go wrong in how I defined the environment?
[Screenshot: action-space bounds in off_policy_marl.py evaluating to None]
I checked the observation_space and state_space, and their values are both normal.
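
For reference, the action space is declared roughly as follows (a minimal sketch using gymnasium's Box; the exact shape and dtype in my environment may differ):

import numpy as np
from gymnasium.spaces import Box

# One-dimensional continuous action space with bounds [-20, 20]
# (illustrative only; the real environment defines further attributes).
action_space = Box(low=-20.0, high=20.0, shape=(1,), dtype=np.float32)
print(action_space.low, action_space.high)  # [-20.] [20.]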

@wenzhangliu
Collaborator

Hi, the range of continuous actions returned by agent.policy in XuanCe is determined by the activation function you choose. For instance, the sigmoid activation restricts the action range to [0, 1], while tanh restricts it to [-1, 1]. This range cannot be changed in any other way.

If your custom environment has an action space of [low, high], you can rescale the actions inside env.step() before they are executed. That is, if the action activation is tanh, you can modify your code like this:

def step(self, action):
    # Map the tanh output from [-1, 1] to the environment's [low, high] range.
    action_execute = (action + 1) / 2 * (high - low) + low
    ...

This rescaling will not affect the performance of your tasks.
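
For completeness, here is a minimal sketch of how such a rescaling might look inside a custom environment, assuming a tanh action activation (policy outputs in [-1, 1]) and a Box action space; the attributes self.low and self.high are illustrative names, not part of XuanCe's API:

import numpy as np
from gymnasium.spaces import Box

class MyCustomEnv:
    def __init__(self):
        # The action bounds the environment actually expects.
        self.low, self.high = -20.0, 20.0
        self.action_space = Box(low=self.low, high=self.high, shape=(1,), dtype=np.float32)

    def step(self, action):
        # The policy emits actions in [-1, 1] (tanh); map them to [low, high].
        action_execute = (np.asarray(action) + 1.0) / 2.0 * (self.high - self.low) + self.low
        # ... apply action_execute to the simulation, then compute obs, reward, etc. ...
        obs, reward, terminated, truncated, info = None, 0.0, False, False, {}
        return obs, reward, terminated, truncated, info

With this mapping, a tanh output of -1 executes as low and +1 as high; for [low, high] = [-20, 20], a policy output of 0.5 would be executed as 10.0.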

@limuhan33
Author

Thanks for the reminder! There are two activation parameters in the configuration file, "activation" and "action_activation". What do they correspond to?
