Abstract
This thesis investigates whether transformer-based sequence modeling can serve as a practical, reusable reinforcement learning framework for trick-taking card games under partial observability. To address this question, a unified framework named DTCard is developed and evaluated on three trick-taking games: Hearts, Whist, and Spades. Rather than relying on conventional value-based reinforcement learning alone, the proposed approach reformulates decision making as a return-conditioned sequence prediction problem using Decision Transformers (DT). The framework is organized as a three-phase pipeline. First, offline datasets are generated through simulated gameplay with random and rule-based baseline agents. Second, a Decision Transformer is trained offline on the collected trajectories. Third, the pretrained model is refined through a controlled Selective Expert Iteration (SEI) stage to improve performance beyond the limits of the static dataset. In addition, legality constraints are handled by a design that avoids strict train-time masking and instead ensures safe inference through deterministic legal-action filtering. Experimental results show that the framework learns competitive policies across all three games and consistently performs above the parity baseline in a four-player setting. The refined DT+SEI variant further improves performance in most settings, with the strongest gains observed in Hearts. Additional analysis shows that the model achieves very low pre-filter illegal-move rates, indicating that legality-sensitive behavior can largely be learned from trajectory data. Overall, the findings demonstrate that transformer-based offline reinforcement learning can provide a viable and reusable framework for imperfect-information trick-taking card games.
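For concreteness, the minimal sketch below illustrates the inference-time legality handling described above: a Decision Transformer scores every action from its return-conditioned trajectory context, and a deterministic filter excludes illegal cards before the final choice. All names here (`dt_act`, `model`, the tensor layout, and the per-card action indexing) are illustrative assumptions for exposition, not the actual DTCard interfaces.

```python
import torch

@torch.no_grad()
def dt_act(model, states, actions, returns_to_go, timesteps, legal_actions):
    """One inference step of return-conditioned action selection with
    deterministic legal-action filtering.

    Assumed (hypothetical) interface: `model` maps the trajectory context
    (returns-to-go, states, actions, timesteps) to per-action logits of
    shape (1, T, num_actions); `legal_actions` is the list of card indices
    the rules currently permit.
    """
    logits = model(
        states=states,                  # (1, T, state_dim) observation history
        actions=actions,                # (1, T) past action indices
        returns_to_go=returns_to_go,    # (1, T, 1) target-return conditioning
        timesteps=timesteps,            # (1, T) positions within the game
    )[:, -1, :]                         # keep logits for the next action only

    # Post-hoc legality filter: illegal cards receive -inf and can never be
    # selected, without any legality mask being applied during training.
    mask = torch.full_like(logits, float("-inf"))
    mask[0, legal_actions] = 0.0
    return int(torch.argmax(logits + mask, dim=-1).item())
```

Because the filter acts only on the final logits, training supervision stays unmasked; this matches the design described above, in which legality-sensitive behavior is largely learned from trajectory data while illegal plays remain impossible at inference.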