then you need to correct the AlphaGo team on this



Posted by: conner on 2016-03-13, 11:50:57:

In reply to: "That you cannot understand it is normal; the reverse would be abnormal." Posted by bluesea on 2016-03-13, 11:19:30:

Quote:
We first trained the policy network on 30 million moves from games played by human experts, until it could predict the human move 57% of the time (the previous record before AlphaGo was 44%). But our goal is to beat the best human players, not just mimic them. To do this, AlphaGo learned to discover new strategies for itself, by playing thousands of games between its neural networks, and gradually improving them using a trial-and-error process known as reinforcement learning. This approach led to much better policy networks, so strong in fact that the raw neural network (immediately, without any tree search at all) can defeat state-of-the-art Go programs that build enormous search trees.

http://googleresearch.blogspot.com/2016/01/alphago-mastering-ancient-game-of-go.html
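The passage describes AlphaGo's two training stages: first supervised learning on expert moves, then policy-gradient reinforcement learning through self-play. As a rough illustration of the second, "trial-and-error" stage only (my own toy sketch, not DeepMind's code): a lookup-table softmax policy stands in for the deep policy network, and a tiny take-the-last-stone game stands in for Go. Two copies of the same policy play each other, and the winner's moves are reinforced while the loser's are discouraged (plain REINFORCE). All names and hyperparameters below are illustrative.

import numpy as np

PILE = 10       # starting number of stones (toy stand-in for a Go position)
ACTIONS = 3     # each turn remove 1, 2, or 3 stones; taking the last stone wins

rng = np.random.default_rng(0)
# "Policy network": one row of logits per pile size (a table, not a deep net)
logits = np.zeros((PILE + 1, ACTIONS))

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def legal(pile):
    # actions that do not remove more stones than remain
    return [a for a in range(ACTIONS) if a + 1 <= pile]

def play_game(params):
    """Self-play one game; return each player's (state, action) list and the winner."""
    pile, player = PILE, 0
    history = {0: [], 1: []}
    while True:
        acts = legal(pile)
        probs = softmax(params[pile][acts])
        a = rng.choice(acts, p=probs)          # sample a move from the policy
        history[player].append((pile, a))
        pile -= a + 1
        if pile == 0:
            return history, player             # this player took the last stone
        player = 1 - player

def reinforce(params, episodes=20000, lr=0.05):
    """Trial-and-error improvement: +1 reward for the winner's moves, -1 for the loser's."""
    for _ in range(episodes):
        history, winner = play_game(params)
        for player, moves in history.items():
            reward = 1.0 if player == winner else -1.0
            for pile, a in moves:
                acts = legal(pile)
                probs = softmax(params[pile][acts])
                # gradient of log-prob of the chosen action w.r.t. the logits
                grad = -probs
                grad[acts.index(a)] += 1.0
                params[pile][acts] += lr * reward * grad
    return params

if __name__ == "__main__":
    reinforce(logits)
    # The greedy policy should tend toward leaving the opponent a multiple of 4 stones
    for pile in range(1, PILE + 1):
        best = int(np.argmax(logits[pile][:min(ACTIONS, pile)])) + 1
        print(f"pile={pile:2d}  learned move: take {best}")

In the real system the table is replaced by a deep convolutional network, self-play games are played against earlier frozen versions of the policy rather than only the current one, and the resulting policy is later combined with a value network and Monte Carlo tree search; none of that is shown in this sketch.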


