
A Multi-Angle Comparative Analysis of DQN and PG, with Examples

2025-01-17 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/01 report

This article compares DQN and PG from several angles, with examples: the underlying principle, the network model each method requires in MATLAB, and the training process.

First, the principles. The goal of reinforcement learning is to train a good policy for a specific task, and the two methods get there differently. DQN is value-based: put simply, it first learns a value function and then derives the policy from that value function. PG is policy-based: it trains the policy directly by optimizing an objective function.
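To make the value-based versus policy-based distinction concrete, here is a minimal Python sketch on a hypothetical two-armed bandit (not the MATLAB environments used in this article). `value_based` learns action values first and reads the policy off them with argmax; `policy_based` skips the value function and adjusts softmax policy parameters directly with a REINFORCE-style gradient step. The arm means, learning rate, and step counts are all assumptions chosen for illustration.

```python
import math
import random

# Hypothetical 2-armed bandit: arm 1 pays more on average (assumption for illustration).
ARM_MEANS = [0.2, 0.8]


def pull(arm: int, rng: random.Random) -> float:
    """Noisy reward centred on the chosen arm's mean."""
    return ARM_MEANS[arm] + rng.gauss(0.0, 0.1)


def value_based(steps: int = 500, epsilon: float = 0.1, seed: int = 0) -> int:
    """Value-based: learn Q(a) first, then derive the policy as argmax Q."""
    rng = random.Random(seed)
    q = [0.0, 0.0]
    counts = [0, 0]
    for _ in range(steps):
        # Epsilon-greedy exploration while learning the values.
        if rng.random() < epsilon:
            arm = rng.randrange(2)
        else:
            arm = max(range(2), key=lambda a: q[a])
        r = pull(arm, rng)
        counts[arm] += 1
        q[arm] += (r - q[arm]) / counts[arm]  # incremental mean of observed rewards
    return max(range(2), key=lambda a: q[a])  # the policy is read off the value function


def softmax(prefs: list) -> list:
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def policy_based(steps: int = 500, lr: float = 0.1, seed: int = 0) -> list:
    """Policy-based: adjust policy parameters directly by gradient ascent on reward."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]  # softmax preferences, one per arm
    for _ in range(steps):
        probs = softmax(theta)
        arm = 0 if rng.random() < probs[0] else 1  # sample from the current policy
        r = pull(arm, rng)
        for a in range(2):
            # Gradient of log softmax w.r.t. theta[a]: 1{a == arm} - pi(a)
            grad_log = (1.0 if a == arm else 0.0) - probs[a]
            theta[a] += lr * r * grad_log
    return softmax(theta)  # final action probabilities


print(value_based())
print(policy_based())
```

Both variants end up preferring the better arm; the difference is what gets learned. The first stores value estimates and only implies a policy, while the second stores nothing but the policy itself.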

Next, the network models differ. Here is the model required by the DQN method in MATLAB.

The state at each step enters the network together with an action as input, and the network outputs the value of taking that action in that state, Q(s, a). The actions the model accepts correspond to the environment's action specification. For example, in the maze environment the action "up" is encoded as 1, and that encoded value is what the rlDQNAgent applies to the environment.
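The state-plus-action critic described above can be sketched in plain Python. This is an illustration under stated assumptions, not MATLAB's implementation: a linear model stands in for the critic network, only the "1 = up" encoding comes from the article (codes 2 to 4 are hypothetical), and the weights are made up. Because the critic scores one (state, action) pair per call, the agent evaluates every candidate action and picks the highest value, with epsilon-greedy exploration during training.

```python
import random

# Hypothetical grid-maze action encoding. Only "1 = up" comes from the article;
# the other codes are assumptions for illustration.
ACTIONS = {1: "up", 2: "down", 3: "left", 4: "right"}


def one_hot(action: int) -> list:
    """Encode a discrete action as a one-hot vector so it can be a network input."""
    return [1.0 if a == action else 0.0 for a in sorted(ACTIONS)]


def critic_q(state: list, action: int, weights: list) -> float:
    """Q(s, a): state and action go in together, one scalar value comes out.
    A linear model stands in for the critic network (assumption)."""
    features = state + one_hot(action)
    return sum(w * x for w, x in zip(weights, features))


def select_action(state: list, weights: list, epsilon: float = 0.0, rng=None) -> int:
    """Epsilon-greedy selection: score every candidate action with the critic,
    then return the highest-scoring one (or a random one with probability epsilon)."""
    if rng is None:
        rng = random.Random()
    if rng.random() < epsilon:
        return rng.choice(sorted(ACTIONS))
    return max(sorted(ACTIONS), key=lambda a: critic_q(state, a, weights))


# Usage: state = (x, y) position; the hypothetical weights favour "up".
state = [2.0, 3.0]
weights = [0.0, 0.0, 0.9, 0.1, -0.2, 0.3]
best = select_action(state, weights)
print(best, ACTIONS[best])  # 1 up
```

Note that the critic needs one forward pass per candidate action, which is manageable for a small discrete action set like the maze's four moves.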

Now look at the model used by the PG method.

Only the state is used as input. After the network's forward pass, the output corresponds to the next action and matches the model's actionInfo: for a discrete action set, the actor outputs a probability for each possible action. The rlPGAgent then extracts the action to perform from this output and applies it to the environment.
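For contrast with the DQN critic, here is a minimal Python sketch of a stochastic PG actor. It assumes a linear-softmax actor in place of the neural network, a hypothetical four-action set standing in for the elements of the environment's ActionInfo, and made-up weights. Only the state goes in; the output is a probability per action, and the action to execute is sampled from that distribution.

```python
import math
import random

# Hypothetical discrete action set, standing in for the elements of the
# environment's ActionInfo.
ACTIONS = [1, 2, 3, 4]


def actor_probs(state: list, weights: list) -> list:
    """pi(a | s): only the state goes in; one probability per action comes out.
    weights[i] is a hypothetical linear weight vector for ACTIONS[i]."""
    logits = [sum(w * x for w, x in zip(w_a, state)) for w_a in weights]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(state: list, weights: list, rng: random.Random) -> int:
    """Draw the action to execute from the actor's output distribution."""
    probs = actor_probs(state, weights)
    return rng.choices(ACTIONS, weights=probs, k=1)[0]


# Usage: the (hypothetical) weights make action 1 the most probable.
state = [1.0, 0.5]
weights = [[2.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
rng = random.Random(0)
print(actor_probs(state, weights))
print(sample_action(state, weights, rng))
```

Unlike the DQN sketch, a single forward pass covers all actions at once, and sampling (rather than argmax) keeps the policy stochastic, which is what the policy-gradient update relies on for exploration.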

Finally, compare the training process on the same simple balancing environment: PG needs more training episodes than DQN.
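The difference in training can be related to how each method updates its network. Assuming the standard textbook forms (DQN with a target network whose parameters are written $\theta^-$, and the REINFORCE-style Monte Carlo policy gradient that, as we read MATLAB's documentation, rlPGAgent is based on), the two updates are:

```latex
% DQN: regress Q(s,a;\theta) toward a one-step bootstrapped target
L(\theta) = \mathbb{E}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta)\right)^2\right]

% PG (REINFORCE): ascend the expected return using full-episode returns G_t
\nabla_\theta J(\theta) = \mathbb{E}\left[\sum_{t=0}^{T-1} G_t \, \nabla_\theta \log \pi(a_t \mid s_t; \theta)\right],
\qquad G_t = \sum_{k=t}^{T-1} \gamma^{k-t} r_{k+1}
```

DQN can update from individual transitions, while REINFORCE must wait for complete episodes and its return estimates $G_t$ tend to have high variance. This is one commonly cited reason policy-gradient methods can need more episodes, though the comparison in this article does not isolate it.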

This comparison only shows the difference intuitively; it may simply be that PG is not well suited to this environment. The main purpose here is to record the inputs and outputs of the two methods for reference when building a model next time:

The input of DQN is the state and an action, and the output is the value of that action.

The input of PG is the state, and the output is an action from the environment's ActionInfo.


© 2024 shulou.com SLNews company. All rights reserved.