
project-2

[!PREREQUISITE]

explain

Q1 (4 pts): Reflex Agent

The docs remind us to pay attention to:

game.py -> Grid
class Grid:
    """
    A 2-dimensional array of objects backed by a list of lists.  Data is accessed via grid[x][y] where (x,y) are positions on a Pacman map with x horizontal, y vertical and the origin (0,0) in the bottom left corner.
    """
    def __init__(self, width, height, initialValue=False, bitRepresentation=None):
        if initialValue not in [False, True]:
            raise Exception('Grids can only contain booleans')
        self.CELLS_PER_INT = 30

        self.width = width
        self.height = height
        self.data = [[initialValue for y in range(
            height)] for x in range(width)]
        if bitRepresentation:
            self._unpackBits(bitRepresentation)
    ... # less relevant parts omitted
    def asList(self, key=True):
        list = []
        for x in range(self.width):
            for y in range(self.height):
                if self[x][y] == key:
                    list.append((x, y))
        return list
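
To make the connection to Q1 concrete, here is a tiny usage sketch (hypothetical values; it assumes Grid's omitted __getitem__ simply returns the column list, as in the full game.py). The grid returned by getFood() is exactly such a Grid, and asList() is what turns it into the (x, y) food coordinates we loop over below.

Grid usage sketch (hypothetical)
# Hypothetical sketch, not project code: how the food Grid is typically consumed.
food = Grid(3, 2)         # 3 wide, 2 tall, every cell initialised to False
food[1][0] = True         # __getitem__ returns a column list, so this marks (x=1, y=0)
food[2][1] = True
print(food.asList())      # [(1, 0), (2, 1)] -- positions whose value equals key (True)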

explore

For the evaluation we already used the Manhattan distance before, so we might as well keep using it here. The difference is that this time we should also take the ghosts' positions into account; moreover, the closer the food and the farther the ghosts, the higher the score should be. Following the hint in project 2, we use reciprocals:

evaluationFunction v1
def evaluationFunction(self, currentGameState: GameState, action):
    # Useful information you can extract from a GameState (pacman.py)
    successorGameState = currentGameState.generatePacmanSuccessor(action)
    newPos = successorGameState.getPacmanPosition()
    newFood = successorGameState.getFood()
    newGhostStates = successorGameState.getGhostStates()
    newScaredTimes = [ghostState.scaredTimer for ghostState in newGhostStates]

    # return successorGameState.getScore()
    "*** YOUR CODE HERE ***"
    # Get the food list and initial score
    foodList = newFood.asList()
    score = successorGameState.getScore()
    # Food evaluation
    for food in foodList:
        score += 1 / util.manhattanDistance(newPos, food)

    # Ghost evaluation
    for i, ghostState in enumerate(newGhostStates):
        ghostPos = ghostState.getPosition()
        distance = util.manhattanDistance(newPos, ghostPos)
        if newScaredTimes[i] > 0:
            # Ghost is scared, it's good to be close
            score += 2 / (distance + 1) # avoid ZeroDivisionError
        else:
            score -= 2 / (distance + 1)  

    return score

Q1 partially passed

You can see it died twice, though the average score is still quite high; but by our grading standard, not dying is what matters. So let's keep more distance from ghosts and hand out a heavy penalty whenever we get too close:

evaluationFunction v2
def evaluationFunction(self, currentGameState: GameState, action):
    # Useful information you can extract from a GameState (pacman.py)
    successorGameState = currentGameState.generatePacmanSuccessor(action)
    newPos = successorGameState.getPacmanPosition()
    newFood = successorGameState.getFood()
    newGhostStates = successorGameState.getGhostStates()
    newScaredTimes = [ghostState.scaredTimer for ghostState in newGhostStates]

    # return successorGameState.getScore()
    "*** YOUR CODE HERE ***"
    # Get the food list and initial score
    foodList = newFood.asList()
    score = successorGameState.getScore()
    for food in foodList:
        score += 1 / util.manhattanDistance(newPos, food)

    # Ghost evaluation
    for i, ghostState in enumerate(newGhostStates):
        ghostPos = ghostState.getPosition()
        distance = util.manhattanDistance(newPos, ghostPos)
        if newScaredTimes[i] > 0:
            # Ghost is scared, it's good to be close
            score += 2 / (distance + 1) # avoid ZeroDivisionError
        else:
            if distance < 2: # min_distance
                score -= 500  # Big penalty for being too close
            else:
                score -= 2 / (distance + 1)

    return score

Q1 passed

You can see that in the last game the score dropped below 1000 while dodging a ghost, but staying alive is what really counts. Of course there is randomness involved, so having every game above 1000 is possible too. With min_distance set to 1, the result is about the same as v1, probably because Pacman starts fleeing too late, haha.

Finally, let's test with python pacman.py --frameTime 0 -p ReflexAgent -k 2:

It passes this as well; with min_distance = 1 it fails.
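
For completeness, each question can also be checked with the bundled autograder; as far as I remember the handout, that is python autograder.py -q q1 (add --no-graphics to skip the display).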

Q2 (5 pts): Minimax

explore

project-2 shows a game tree for depth=2; I divide it into levels, with the topmost being level=0.

Concept diagram
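
To pin down the bookkeeping, here is a tiny sketch with a hypothetical layout of Pacman plus 2 ghosts (numAgents = 3) and self.depth = 2; it just prints which agent acts at each level and shows where the leaves sit.

level bookkeeping sketch (hypothetical)
# Hypothetical example: self.depth = 2 with Pacman + 2 ghosts (numAgents = 3).
numAgents, depth = 3, 2
for level in range(depth * numAgents):      # 6 levels of moves before the leaves
    agentIndex = level % numAgents          # 0 = Pacman (max node), 1..2 = ghosts (min nodes)
    print(level, "max" if agentIndex == 0 else "min")
# the evaluation function is applied once level == depth * numAgents == 6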

Idea: model getAction on the one from Q1.

MinimaxAgent -> getAction v1
def getAction(self, gameState: GameState):
    "*** YOUR CODE HERE ***"
    # Pacman is always agent 0, and the agents move in order of increasing agent index.
    legalActions = gameState.getLegalActions(0)
    scores = self.getNextLevel(gameState, 0, 0)
    bestScore = max(scores)
    bestIndices = [
        index for index in range(len(scores)) if scores[index] == bestScore
    ]
    chosenIndex = random.choice(bestIndices)  # Pick randomly among the best
    return legalActions[chosenIndex]
    util.raiseNotDefined()

# return a scores: [int]
def getNextLevel(self, gameState: GameState, agentIndex, level):
    agentIndex = agentIndex % gameState.getNumAgents()
    legalActions = gameState.getLegalActions(agentIndex)
    successors = [
        gameState.generateSuccessor(agentIndex, action) for action in legalActions
    ]
    return [
        self.getValue(successor, agentIndex + 1, level + 1)
        for successor in successors
    ]

# return a score
def getValue(self, gameState: GameState, agentIndex=0, level=0):
    if (
        gameState.isWin()
        or gameState.isLose()
        or level == self.depth * gameState.getNumAgents()
    ):
        return self.evaluationFunction(gameState)
    if agentIndex == 0:
        return max(self.getNextLevel(gameState, agentIndex, level))
    else:
        return min(self.getNextLevel(gameState, agentIndex, level))

right

MinimaxAgent -> getAction v2
def getAction(self, gameState: GameState):
    "*** YOUR CODE HERE ***"
    def getValue(state, agentIndex, depth):
        agentIndex = agentIndex % state.getNumAgents()
        if state.isWin() or state.isLose() or depth == 0:
            return self.evaluationFunction(state)
        elif agentIndex == 0:
            return max(
                getValue(
                    state.generateSuccessor(agentIndex, action),
                    agentIndex + 1,
                    depth - 1,
                )
                for action in state.getLegalActions(agentIndex)
            )
        else:
            return min(
                getValue(
                    state.generateSuccessor(agentIndex, action),
                    agentIndex + 1,
                    depth - 1,
                )
                for action in state.getLegalActions(agentIndex)
            )
    # Pacman is always agent 0, and the agents move in order of increasing agent index.
    legalActions = gameState.getLegalActions(0)
    scores = [
        getValue(
            gameState.generateSuccessor(0, action),
            1,
            self.depth * gameState.getNumAgents() - 1,
        )
        for action in legalActions
    ]
    bestScore = max(scores)
    bestIndices = [
        index for index in range(len(scores)) if scores[index] == bestScore
    ]
    chosenIndex = random.choice(bestIndices)  # Pick randomly among the best
    return legalActions[chosenIndex]

Q2 passed

Q3 (5 pts): Alpha-Beta Pruning

explore

We went over the pseudocode for Alpha-Beta Pruning in 05-Trees_Minimax_Pruning; starting from the previous question's code, all we need to do is swap in the corresponding operation for each agentIndex:

AlphaBetaAgent -> getAction v1
def getAction(self, gameState: GameState):
    def max_value(state, agentIndex, level, alpha, beta):
        v = float("-inf")
        for action in state.getLegalActions(agentIndex):
            v = max(
                v,
                getValue(
                    state.generateSuccessor(agentIndex, action),
                    agentIndex + 1,
                    level + 1,
                    alpha,
                    beta,
                ),
            )
            if v > beta:
                return v
            alpha = max(alpha, v)
        return v

    def min_value(state, agentIndex, level, alpha, beta):
        v = float("inf")
        for action in state.getLegalActions(agentIndex):
            v = min(
                v,
                getValue(
                    state.generateSuccessor(agentIndex, action),
                    agentIndex + 1,
                    level + 1,
                    alpha,
                    beta,
                ),
            )
            if v < alpha:
                return v
            beta = min(beta, v)
        return v

    def getValue(state, agentIndex, level, alpha, beta):
        agentIndex = agentIndex % state.getNumAgents()
        if (
            state.isWin()
            or state.isLose()
            or level == self.depth * gameState.getNumAgents() - 1
        ):
            return self.evaluationFunction(state)
        elif agentIndex == 0:
            return max_value(state, agentIndex, level, alpha, beta)
        else:
            return min_value(state, agentIndex, level, alpha, beta)

    # Pacman is always agent 0, and the agents move in order of increasing agent index.
    legalActions = gameState.getLegalActions(0)
    alpha = float("-inf")
    beta = float("inf")
    scores = [
        getValue(gameState.generateSuccessor(0, action), 1, 0, alpha, beta)
        for action in legalActions
    ]
    bestScore = max(scores)
    bestIndices = [
        index for index in range(len(scores)) if scores[index] == bestScore
    ]
    chosenIndex = random.choice(bestIndices)  # Pick randomly among the best
    return legalActions[chosenIndex]

But running it still fails; some tests do not pass.

right

Referring to szzxljr's code, I found the problem: when computing the root node's value at the end, I was still iterating over all of its children without any cutoff (and the autograder also checks how many game states are explored, as far as I can tell); we should keep pruning at the root as well:

AlphaBetaAgent -> getAction v2
def getAction(self, gameState: GameState):
    "*** YOUR CODE HERE ***"

    def getValue(state, agentIndex, level, alpha, beta):
        agentIndex = agentIndex % state.getNumAgents()
        if (
            state.isWin()
            or state.isLose()
            or level == self.depth * state.getNumAgents()
        ):
            return self.evaluationFunction(state)
        elif agentIndex == 0:
            return max_value(state, agentIndex, level, alpha, beta)
        else:
            return min_value(state, agentIndex, level, alpha, beta)

    def max_value(state, agentIndex, level, alpha, beta):
        v = float("-inf")
        for action in state.getLegalActions(agentIndex):
            v = max(
                v,
                getValue(
                    state.generateSuccessor(agentIndex, action),
                    agentIndex + 1,
                    level + 1,
                    alpha,
                    beta,
                ),
            )
            if v > beta:
                return v
            alpha = max(alpha, v)
        return v

    def min_value(state, agentIndex, level, alpha, beta):
        v = float("inf")
        for action in state.getLegalActions(agentIndex):
            v = min(
                v,
                getValue(
                    state.generateSuccessor(agentIndex, action),
                    agentIndex + 1,
                    level + 1,
                    alpha,
                    beta,
                ),
            )
            if v < alpha:
                return v
            beta = min(beta, v)
        return v

    # Pacman is always agent 0, and the agents move in order of increasing agent index.
    legalActions = gameState.getLegalActions(0)
    alpha = float("-inf")
    beta = float("inf")
    bestScore = float("-inf")
    bestAction = None

    for action in legalActions:
        score = getValue(gameState.generateSuccessor(0, action), 1, 1, alpha, beta)
        if score > bestScore:
            bestScore = score
            bestAction = action
        alpha = max(alpha, bestScore)

    return bestAction

Q3 passed

Q4 (5 pts): Expectimax

right

Why is there no explore for Q4? There is only one difference between expectimax and minimax: at the ghost levels we no longer take the minimum but the average, so tweaking Q2 is all it takes:

ExpectimaxAgent -> getAction
def getAction(self, gameState: GameState):
    "*** YOUR CODE HERE ***"
    def getValue(state, agentIndex, depth):
        agentIndex = agentIndex % state.getNumAgents()
        if state.isWin() or state.isLose() or depth == 0:
            return self.evaluationFunction(state)
        elif agentIndex == 0:
            return max(
                getValue(
                    state.generateSuccessor(agentIndex, action),
                    agentIndex + 1,
                    depth - 1,
                )
                for action in state.getLegalActions(agentIndex)
            )
        else: # the only change from Q2's minimax
            return sum(
                getValue(
                    state.generateSuccessor(agentIndex, action),
                    agentIndex + 1,
                    depth - 1,
                )
                for action in state.getLegalActions(agentIndex)
            ) / len(state.getLegalActions(agentIndex))

    # Pacman is always agent 0, and the agents move in order of increasing agent index.
    legalActions = gameState.getLegalActions(0)
    scores = [
        getValue(
            gameState.generateSuccessor(0, action),
            1,
            self.depth * gameState.getNumAgents() - 1,
        )
        for action in legalActions
    ]
    bestScore = max(scores)
    bestIndices = [
        index for index in range(len(scores)) if scores[index] == bestScore
    ]
    chosenIndex = random.choice(bestIndices)  # Pick randomly among the best
    return legalActions[chosenIndex]
    util.raiseNotDefined()
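
The sum(...) / len(...) above is just the expectation under the project's assumption that ghosts pick uniformly at random among their legal actions; a quick sanity check with a made-up toy chance node:

chance node sanity check (hypothetical)
# Hypothetical toy chance node: a ghost with 3 legal actions, chosen uniformly.
childValues = [3, 9, 12]
expected = sum(childValues) / len(childValues)
print(expected)   # 8.0 -- the uniform average, instead of min(childValues) == 3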

Q4 passed

ExpectimaxAgent wins about half the time, while AlphaBetaAgent always loses.
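
That line is the behaviour the handout describes on the trappedClassic layout; if I recall the commands correctly, it can be reproduced with python pacman.py -p AlphaBetaAgent -l trappedClassic -a depth=3 -q -n 10 and the same with -p ExpectimaxAgent (-q suppresses graphics, -n 10 plays 10 games).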

Q5 (6 pts): Evaluation Function

right

This is really much like Q1 (our implementation there was already pretty good); we just evaluate the current state instead.

betterEvaluationFunction
def betterEvaluationFunction(currentGameState: GameState):
    """
    DESCRIPTION: <write something here so we know what you did>
    Just as what we do in ReflexAgent, but now we evaluate currentGameState
    """
    "*** YOUR CODE HERE ***"
    Pos = currentGameState.getPacmanPosition()
    Food = currentGameState.getFood()
    GhostStates = currentGameState.getGhostStates()
    ScaredTimes = [ghostState.scaredTimer for ghostState in GhostStates]
    foodList = Food.asList()
    score = currentGameState.getScore()
    for food in foodList:
        score += 1 / util.manhattanDistance(Pos, food)
    # Ghost evaluation
    for i, ghostState in enumerate(GhostStates):
        ghostPos = ghostState.getPosition()
        distance = util.manhattanDistance(Pos, ghostPos)
        if ScaredTimes[i] > 0:
            # Ghost is scared, it's good to be close
            score += 2 / (distance + 1)
        else:
            # Ghost is not scared, avoid it
            if distance < 2:
                score -= 500  # Big penalty for being too close
            else:
                score -= 2 / (distance + 1)
    return score
    util.raiseNotDefined()

Q5 passed

pass

[!SUMMARY]

The algorithms themselves are not hard to understand; personally, I found the Python implementation to be the more headache-inducing part.
