Project structure of: shunzh/mcts-for-llm
__init__.py: DynaGym registers variants, introduces "LanguageEnv-v0" environment.
mcts.py: MCTS algorithm implementation for RL agent in dyna-gym environment.
my_random_agent.py: Random Agent class for action space selection.
uct.py: UCT agent for Dyna-Gym with MCTS and UCB.
default_policy.py: DefaultPolicy: Abstract class with abstract methods for sequence and top-k tokens.
hf_default_policy.py: Transformer-based Gym Decision Maker with Top k Tokens.
language_env.py: Gym language environment using MCTS.
__init__.py: Imports uct pipeline for HF Transformers in Dyna-Gym.
uct_for_hf_transformer.py: UCT agent for HuggingFace transformers, generate function.
benchmark.py: Multithreaded benchmark function for comparing agent performances.
distribution.py: Calculates 1-Wasserstein distances, generates distributions via linear programs.
tree_search_utils.py: Decision tree utilities for networkx.
utils.py: Dyna Gym utilities for data handling, comparison, and verification.
mcts_nscartpole_v0.py: Monte Carlo agent plays NSCartPole, 100 timesteps, optional verbose output.
random_nscartpole_v0.py: Random agent interacts with the NSCartPole-v0 environment.
uct_language_alignment.py: Language alignment model with MCTS for text generation.
uct_nscartpole_v0.py: UCT agent plays NSCartPole-v0 environment, 100 timesteps.
uct_nscartpole_v1.py: UCT agent runs 100 timesteps in NSCartPole-v1 environment.
uct_nscartpole_v2.py: UCT agent controls NSCartPole-v2 environment in Dyna-Gym, learning through episodes.
README.md: Monte-Carlo search, large language models, sentiment analysis examples.
setup.py: Install scripts for dyna_gym, dependencies included.
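
The uct.py entry above mentions UCT with MCTS and UCB. As a rough illustration of what a UCB-based selection step usually looks like, here is a minimal sketch. It is not taken from the repository: the function names, the `(mean_value, visits)` layout, and the exploration constant `c` are all assumptions made for this example.

```python
import math


def ucb1_score(mean_value, visits, parent_visits, c=1.0):
    """UCB1-style score: average value plus an exploration bonus.

    All parameter names are hypothetical, chosen for this sketch only.
    """
    if visits == 0:
        # Unvisited children get an infinite score so they are tried first.
        return math.inf
    return mean_value + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children, c=1.0):
    """Return the index of the child with the highest UCB1-style score.

    `children` is a list of (mean_value, visits) pairs (assumed layout).
    """
    parent_visits = sum(visits for _, visits in children)
    scores = [ucb1_score(m, v, parent_visits, c) for m, v in children]
    return max(range(len(children)), key=scores.__getitem__)
```

With `c=0` the selection is purely greedy on the average value; larger values of `c` favor children that have been visited less often.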