cactorcritic.h
// Copyright (C) 2003
// Gerhard Neumann (gerhard@igi.tu-graz.ac.at)

//                
// This file is part of RL Toolbox.
// http://www.igi.tugraz.at/ril_toolbox
//
// All rights reserved.
// 
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions
// are met:
// 1. Redistributions of source code must retain the above copyright
//    notice, this list of conditions and the following disclaimer.
// 2. Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimer in the
//    documentation and/or other materials provided with the distribution.
// 3. The name of the author may not be used to endorse or promote products
//    derived from this software without specific prior written permission.
// 
// THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
// IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
// OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
// IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
// INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
// NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
// THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

#ifndef C_ACTORCRITIC_H
#define C_ACTORCRITIC_H

#include "cagentcontroller.h"
#include "cagentlistener.h"
#include "cqfunction.h"
#include "cvfunction.h"
#include "cvfunctionlearner.h"
#include "cqetraces.h"
#include "cparameters.h"
#include "ccontinuousactiongradientpolicy.h"
#include "ril_debug.h"

/// Interface for all Actors
/**Actors have to adapt their policies according to the critic values they receive. The class CActor provides an interface for sending a critic
value to the actor (receiveError(rlt_real critic, CStateCollection *oldState, CAction *action, CActionData *data)). This is all that the Actor classes have to implement; how the policy is obtained from
the actor is left to the Actor subclasses themselves. The class also maintains a learning rate beta, which can be used by the subclasses.
*/
class CActor : public CErrorListener
{
protected:
	
public:
	CActor();
	
/// interface function for the actors
/** The actor receives a critic value for a given state-action pair. It then has to adapt its policy according to that value. */
	virtual void receiveError(rlt_real critic, CStateCollection *oldState, CAction *Action, CActionData *data = NULL) = 0;


	rlt_real getLearningRate();
	void setLearningRate(rlt_real learningRate);
};

/// Actor that derives its policy from a Q-Function
/**CActorFromQFunction updates its Q-Function for the given state-action pair
according to the critic value it received for that pair. Since a Q-Function is used, the actor employs QETraces to speed up learning.
The actor's policy is usually a softmax policy over the Q-Function; this policy must be created separately by the user.
<p>
The Q-Function update for this actor is Q(s,a)_new = Q(s,a)_old + beta * td, where td is the value coming from the critic.
@see CActorFromQFunctionAndPolicy. */
class CActorFromQFunction : public CActor, public CSemiMDPListener
{
protected:
	/// The Q Function of the actor
	CAbstractQFunction *qFunction;
	/// The Etraces used for the QFunction
	CAbstractQETraces *eTraces;

public:
/// Creates an actor that uses the specified Q-Function to adapt its policy.
	CActorFromQFunction(CAbstractQFunction *qFunction);
	virtual ~CActorFromQFunction();

/// Updates the Q-Function
/**
The actor first updates the ETraces (i.e. multiplies all ETraces by gamma*lambda and then adds the current state-action pair to the ETraces).
Then the Q-Function is updated through the ETraces object with the value beta * critic.
@see CQETraces
*/
	virtual void receiveError(rlt_real critic, CStateCollection *oldState, CAction *Action, CActionData *data = NULL);
/// Returns the used Q-Function
	CAbstractQFunction *getQFunction();
/// Returns the used ETraces
	CAbstractQETraces *getETraces();

	/// resets etraces object
	virtual void newEpisode();

};

/// Actor that uses a Q-Function and its policy for the update
/** The only difference to CActorFromQFunction is the update of the Q-Function. The update is
Q(s_t,a_t)_new = Q(s_t,a_t)_old + beta * td * (1 - pi(s_t, a_t)), where pi(s_t, a_t) is the actor's softmax policy. This method is recommended by
Sutton and Barto.
*/
class CActorFromQFunctionAndPolicy : public CActorFromQFunction
{
protected:
	CStochasticPolicy *policy;
	rlt_real *actionValues;

public:
/// Creates the actor object; the policy has to choose its actions using the specified Q-Function.
	CActorFromQFunctionAndPolicy(CAbstractQFunction *qFunction, CStochasticPolicy *policy);
	virtual ~CActorFromQFunctionAndPolicy();

/// Updates the Q-Function
/** Does the following update: Q(s_t,a_t)_new = Q(s_t,a_t)_old + beta * td * (1 - pi(s_t, a_t))
*/
	virtual void receiveError(rlt_real critic, CStateCollection *state, CAction *Action, CActionData *data = NULL);

	CStochasticPolicy *getPolicy();

	
};

/// Actor class that can only decide between two different actions, depending on the action value of the current state
/** 
This is an implementation of the simple actor-critic algorithm used by Barto, Sutton, and Anderson in their cart-pole example. The actor can only decide between two actions. Which action is taken depends on the action value of the current state: if this value is negative, the first action is more likely to be chosen, and vice versa. The probability of choosing the first action is calculated as 1.0 / (1.0 + exp(actionvalue(s))).
The action weight value is represented by a V-Function; an ETraces object is used to update it. The current state is added to the ETraces with a positive factor if the second action was chosen, otherwise with a negative factor. When a new episode begins, the ETraces are reset.
This kind of algorithm usually needs a very high learning rate, so for this class 1000.0 is the default value of the "ActorLearningRate" parameter.
<p>
This class directly implements the CAgentController interface, so it can be used as a controller.
*/
class CActorFromActionValue : public CAgentController, public CActor, public CSemiMDPListener
{
protected:
	CAbstractVFunction *vFunction;
	CAbstractVETraces *eTraces;

public:
	CActorFromActionValue(CAbstractVFunction *vFunction, CAction *action1, CAction *action2);
	~CActorFromActionValue();

	/// Adopt the action values according to the critic
	virtual void receiveError(rlt_real critic, CStateCollection *oldState, CAction *Action, CActionData *data = NULL);

	virtual CAction *getNextAction(CStateCollection *state, CActionDataSet *data = NULL);
	/// resets etraces object
	virtual void newEpisode();
};

/// Actor that adapts a continuous-action gradient policy according to the critic
class CActorFromContinuousActionGradientPolicy : public CActor, public CSemiMDPListener
{
protected:
	CContinuousActionGradientPolicy *gradientPolicy;
	CGradientVETraces *gradientETraces;
	CFeatureList *gradientFeatureList;

	CContinuousActionData *policyDifference;
public:
	CActorFromContinuousActionGradientPolicy(CContinuousActionGradientPolicy *gradientPolicy);
	virtual ~CActorFromContinuousActionGradientPolicy();

	virtual void receiveError(rlt_real critic, CStateCollection *oldState, CAction *Action, CActionData *data = NULL);
	virtual void newEpisode();
};

/*
/// Class representing the actor-critic learning algorithm.
/**
Actor-critic learners have separate representations for the policy (the actor) and the value function (the critic).
The critic maintains the value function of the actor's policy and sends the TD error for each step to the actor. The actor adapts its policy according to the critic values it receives. The value function is not learned by the actor-critic learner class itself and has to be learned by a separate V-Function learner.
<p>
The TD error is calculated as td = r_t + gamma * V(s_{t+1}) - V(s_t). This value is similar to the temporal difference used by the TD learner. The critic learns the value function of the policy by updating
V(s_t) with td (combined with a learning rate). The adaptation of the actor's policy can, for example, be done with an ordinary Q-Function using a softmax policy as the acting policy, or with some other continuous policy.
<p>
The class also maintains a CActor object, to which the critic value (the TD error) is sent. For each step the TD value is calculated and sent to the actor, which uses it to update its policy. If a new episode starts, the actor's newEpisode method is called (for resetting the eTraces), so the actor does not need to be an agent listener itself.
<p>
The actor-critic learner itself has to be added to the agent listener list; it is recommended to add it before the V-Learner object, in order not to falsify the TD-error calculation.

class CActorCriticLearner : public CSemiMDPRewardListener
{
protected:
	/// assigned actor to the learning algorithm
	CActor *actor;
	/// V-Function serving as critic
	CAbstractVFunction *critic;

public:
	/// Creates an actor critic learning algorithm with the specified actor and critic.
	CActorCriticLearner(CRewardFunction *rewardFunction, CActor *actor, CAbstractVFunction *critic);

	virtual ~CActorCriticLearner() {};

	/** Calculates the TD-Error, updates the Value Function, and sends the TD-Error to the actor object.
	void nextStep(CStateCollection *oldState, CAction *action, rlt_real reward, CStateCollection *nextState);
	
	rlt_real getTemporalDifference(CStateCollection *oldState, CAction *action, rlt_real reward, CStateCollection *nextState);

	virtual void newEpisode();

	CAbstractVFunction *getCritic();
	CActor *getActor();
};
*/

#endif
