
Request for documentation on the RNN method #122

Open
HawkQ opened this issue Mar 28, 2025 · 10 comments

HawkQ commented Mar 28, 2025

When using the multi-agent MAPPO algorithm, I tried replacing Basic_MLP with Basic_RNN. After also setting use_rnn: True in the config file, I get the following error:

Traceback (most recent call last):
  File "/Users/hawkq/Desktop/frigatebird_multi/new_run.py", line 22, in <module>
    Agent.train(configs.running_steps // configs.parallels)  # Train the model for numerous steps.
  File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/agents/core/on_policy_marl.py", line 287, in train
    self.run_episodes(None, n_episodes=self.n_envs, test_mode=False)
  File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/agents/core/on_policy_marl.py", line 384, in run_episodes
    policy_out = self.action(obs_dict=obs_dict, state=state, avail_actions_dict=avail_actions,
  File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/agents/multi_agent_rl/mappo_agents.py", line 141, in action
    rnn_hidden_critic_new, values_out = self.policy.get_values(observation=critic_input,
  File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/policies/gaussian_marl.py", line 176, in get_values
    outputs = self.critic_representation[key](observation[key], *rnn_hidden[key])
  File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/representations/rnn.py", line 63, in forward
    output, hn = self.rnn(mlp_output, h)
  File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/torch/nn/modules/rnn.py", line 1117, in forward
    raise RuntimeError(
RuntimeError: For unbatched 2-D input, hx should also be 2-D but got 3-D tensor

It looks like a tensor-dimension problem. I checked the documentation but could not find anything about this, so I am not sure what else needs to change in my environment code. Below are my action, state, and observation spaces:

        self.state_space = Box(-np.inf, np.inf, shape=[7 * self.num_agents, ], dtype=np.float32)  # global state: 7 features per agent, concatenated
        self.observation_space = {agent: Box(-np.inf, np.inf, shape=[14, ], dtype=np.float32) for agent in self.agents}  # 14-dim local observation per agent
        self.action_space = {agent: Box(-1, 1, shape=[2, ], dtype=np.float32) for agent in self.agents}  # 2-dim continuous action per agent

What else needs to be adjusted? Thanks for your help!
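
For reference, the RuntimeError above comes from PyTorch's shape contract for recurrent layers: an unbatched 2-D input of shape (seq_len, input_size) must be paired with a 2-D hidden state of shape (num_layers, hidden_size). A minimal sketch reproducing the same error outside xuance (all sizes here are illustrative):

import torch
import torch.nn as nn

gru = nn.GRU(input_size=14, hidden_size=64, num_layers=1)
x = torch.zeros(5, 14)     # unbatched input: (seq_len, input_size)
h = torch.zeros(1, 3, 64)  # batched hidden state: (num_layers, batch, hidden_size)
gru(x, h)                  # RuntimeError: For unbatched 2-D input, hx should also be 2-D but got 3-D tensor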

wenzhangliu (Collaborator) commented:

Hi, to switch the representation to an RNN, you need to configure the following:

use_rnn: True
rnn: "GRU"
recurrent_layer_N: 1
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
N_recurrent_layers: 1
dropout: 0

HawkQ commented Mar 29, 2025

> Hi, to switch the representation to an RNN, you need to configure the following:
>
> use_rnn: True
> rnn: "GRU"
> recurrent_layer_N: 1
> fc_hidden_sizes: [64, ]
> recurrent_hidden_size: 64
> N_recurrent_layers: 1
> dropout: 0

Hi, after modifying as suggested, the error is as follows:

Traceback (most recent call last):
  File "/Users/hawkq/Desktop/frigatebird_multi/new_run.py", line 27, in <module>
    Agent = MAPPO_Agents(config=configs, envs=envs)  # Create a DDPG agent from XuanCe.
  File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/agents/multi_agent_rl/mappo_agents.py", line 24, in __init__
    super(MAPPO_Agents, self).__init__(config, envs)
  File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/agents/multi_agent_rl/ippo_agents.py", line 24, in __init__
    self.policy = self._build_policy()  # build policy
  File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/agents/multi_agent_rl/mappo_agents.py", line 38, in _build_policy
    A_representation = self._build_representation(self.config.representation, self.observation_space, self.config)
  File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/agents/base/agents_marl.py", line 217, in _build_representation
    representation[key] = REGISTRY_Representation[representation_key](**input_representations)
  File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/representations/mlp.py", line 40, in __init__
    self.output_shapes = {'state': (hidden_sizes[-1],)}
KeyError: -1

Could it be that my state data format needs to change?

wenzhangliu (Collaborator) commented:

You can refer to the parameter configuration here and check whether your format is consistent with it: https://github.com/agi-brain/xuance/blob/master/examples/mappo/mappo_mpe_configs/simple_spread_v3.yaml.

HawkQ commented Mar 29, 2025

My config file was adapted from simple_spread_v3.yaml, and using that yaml raises the same error:
RuntimeError: For unbatched 2-D input, hx should also be 2-D but got 3-D tensor
In fact, running the MPE test directly with simple_spread_v3.yaml also fails once the RNN-related settings are enabled:

Traceback (most recent call last):
  File "/Users/hawkq/Desktop/frigatebird_multi/testrun.py", line 13, in <module>
    runner.run()
  File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/runners/runner_marl.py", line 32, in run
    self.agents.train(n_train_steps)
  File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/agents/core/on_policy_marl.py", line 287, in train
    self.run_episodes(None, n_episodes=self.n_envs, test_mode=False)
  File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/agents/core/on_policy_marl.py", line 420, in run_episodes
    _, value_next = self.values_next(i_env=i, obs_dict=obs_dict[i], state=state[i],
TypeError: 'NoneType' object is not subscriptable
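
Judging from this traceback alone, state is None at the point where run_episodes indexes it per environment, which suggests the vectorized environment returned no global state here. A minimal reproduction of just that failure (not xuance's actual code):

state = None  # what run_episodes apparently received for the global state
state[0]      # TypeError: 'NoneType' object is not subscriptable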

wenzhangliu (Collaborator) commented:

Hi, have you tested this on algorithms such as VDN or MADDPG? Does the same problem occur there? I need this to determine at which step the problem arises.

HawkQ commented Apr 3, 2025

> Hi, have you tested this on algorithms such as VDN or MADDPG? Does the same problem occur there? I need this to determine at which step the problem arises.

Hi, since the config files provided on readthedocs are limited, I only modified the MADDPG config file, adding the following:

agent: "MADDPG"  # the learning algorithms_marl
env_name: "fb"
env_id: "fb_v0"
env_seed: 1
continuous_action: True
learner: "MADDPG_Learner"
policy: "MADDPG_Policy"
representation: "Basic_RNN"
vectorize: "DummyVecMultiAgentEnv"
runner: "MARL"
distributed_training: False

use_rnn: True
rnn: "GRU"
recurrent_layer_N: 1
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
N_recurrent_layers: 1
dropout: 0

representation_hidden_size: []  # the units for each hidden layer
actor_hidden_size: [64, ]
critic_hidden_size: [64, ]
activation: 'leaky_relu'
activation_action: 'sigmoid'
use_parameter_sharing: True
use_actions_mask: False

MADDPG runs, but even with a single environment there is heavy read/write activity; I am not sure whether that is inherent to the RNN.

For VDN, the key parts of the config file are as follows:

agent: "VDN"  
env_name: "fb"
env_id: "fb_v0"
env_seed: 1
continuous_action: True
learner: "VDN_Learner"
policy: "Mixing_Q_network"
representation: "Basic_MLP"
vectorize: "DummyVecMultiAgentEnv"
runner: "MARL"
distributed_training: False

use_rnn: True
rnn: "GRU"
recurrent_layer_N: 1
fc_hidden_sizes: [64, ]
recurrent_hidden_size: 64
N_recurrent_layers: 1
dropout: 0

representation_hidden_size: [64, ]
q_hidden_size: [64, ]  # the units for each hidden layer
activation: "relu"

This raises the following error:

Traceback (most recent call last):
  File "/Users/hawkq/Desktop/frigatebird_multi/new_run.py", line 29, in <module>
    Agent = VDN_Agents(config=configs, envs=envs)
  File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/agents/multi_agent_rl/vdn_agents.py", line 27, in __init__
    self.policy = self._build_policy()  # build policy
  File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/agents/multi_agent_rl/vdn_agents.py", line 44, in _build_policy
    representation = self._build_representation(self.config.representation, self.observation_space, self.config)
  File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/agents/base/agents_marl.py", line 217, in _build_representation
    representation[key] = REGISTRY_Representation[representation_key](**input_representations)
  File "/opt/anaconda3/envs/xuance_marl/lib/python3.8/site-packages/xuance/torch/representations/mlp.py", line 40, in __init__
    self.output_shapes = {'state': (hidden_sizes[-1],)}
KeyError: -1

This is the same error as reported above.

wenzhangliu (Collaborator) commented:

Hi, please confirm that the representation parameter is set to "Basic_RNN":

representation: "Basic_RNN"

HawkQ commented Apr 4, 2025

After changing VDN to representation: "Basic_RNN", it first errored out because my actions are continuous, but it runs after a simple switch to discrete actions (see the sketch below).
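
For completeness, a minimal sketch of that adjustment, assuming a gymnasium-style environment; the agent ids and the number of discrete actions are illustrative:

from gymnasium.spaces import Discrete

agents = ["agent_0", "agent_1"]  # illustrative agent ids
# A discrete action space in place of the continuous Box (5 actions is arbitrary):
action_space = {agent: Discrete(5) for agent in agents}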

wenzhangliu (Collaborator) commented Apr 4, 2025

VDN is designed for discrete actions.

HawkQ commented Apr 27, 2025

> Hi, have you tested this on algorithms such as VDN or MADDPG? Does the same problem occur there? I need this to determine at which step the problem arises.

After testing: VDN runs and MADDPG runs, but MAPPO raises the following error:

Traceback (most recent call last):
  File "C:\Users\HawkQ\Desktop\frigatebird_multi\new_run.py", line 30, in <module>
    Agent.train(configs.running_steps // configs.parallels)  # Train the model for numerous steps.
  File "D:\Software\Anaconda\envs\xuance_marl\lib\site-packages\xuance\torch\agents\core\on_policy_marl.py", line 287, in train
    self.run_episodes(None, n_episodes=self.n_envs, test_mode=False)
  File "D:\Software\Anaconda\envs\xuance_marl\lib\site-packages\xuance\torch\agents\core\on_policy_marl.py", line 384, in run_episodes
    policy_out = self.action(obs_dict=obs_dict, state=state, avail_actions_dict=avail_actions,
  File "D:\Software\Anaconda\envs\xuance_marl\lib\site-packages\xuance\torch\agents\multi_agent_rl\mappo_agents.py", line 141, in action
    rnn_hidden_critic_new, values_out = self.policy.get_values(observation=critic_input,
  File "D:\Software\Anaconda\envs\xuance_marl\lib\site-packages\xuance\torch\policies\gaussian_marl.py", line 176, in get_values
    outputs = self.critic_representation[key](observation[key], *rnn_hidden[key])
  File "D:\Software\Anaconda\envs\xuance_marl\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Software\Anaconda\envs\xuance_marl\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Software\Anaconda\envs\xuance_marl\lib\site-packages\xuance\torch\representations\rnn.py", line 63, in forward
    output, hn = self.rnn(mlp_output, h)
  File "D:\Software\Anaconda\envs\xuance_marl\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\Software\Anaconda\envs\xuance_marl\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Software\Anaconda\envs\xuance_marl\lib\site-packages\torch\nn\modules\rnn.py", line 1117, in forward
    raise RuntimeError(
RuntimeError: For unbatched 2-D input, hx should also be 2-D but got 3-D tensor
