Project structure of: shunzh/mcts-for-llm
__init__.py
DynaGym registers variants, introduces "LanguageEnv-v0" environment.
mcts.py
MCTS algorithm implementation for RL agent in dyna-gym environment.
my_random_agent.py
Random Agent class for action space selection
uct.py
UCT agent for Dyna-Gym with MCTS and UCB
default_policy.py
DefaultPolicy: Abstract class with abstract methods for sequence and top-k tokens.
hf_default_policy.py
Transformer-based Gym Decision Maker with Top k Tokens
language_env.py
Gym language environment using MCTS
__init__.py
Imports uct pipeline for HF Transformers in Dyna-Gym.
uct_for_hf_transformer.py
UCT agent for HuggingFace transformers, generate function.
benchmark.py
Multithreaded benchmark function for comparing agent performances.
distribution.py
Calculates 1-Wasserstein distances, generates distributions via linear programs.
tree_search_utils.py
Decision tree utilities for networkx
utils.py
Dyna Gym utilities for data handling, comparison, and verification.
mcts_nscartpole_v0.py
Monte Carlo agent plays NSCartPole, 100 timesteps, optional verbose output.
random_nscartpole_v0.py
Random agent interacts with environment in random_nscartpole_v0.py
uct_language_alignment.py
Language alignment model with MCTS for text generation.
uct_nscartpole_v0.py
UCT agent plays NSCartPole-v0 environment, 100 timesteps.
uct_nscartpole_v1.py
UCT agent runs 100 timesteps in NSCartPole-v1 environment.
uct_nscartpole_v2.py
UCT agent controls NSCartPole-v2 environment in Dyna-Gym, learning through episodes.
README.md
Monte-Carlo search, large language models, sentiment analysis examples.
setup.py
Install scripts for dyna_gym, dependencies included
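
Several entries above mention a UCT agent that combines MCTS with UCB. As a rough illustration of that selection rule only, here is a minimal sketch of one common form of the UCB1 score. It is not taken from the repository: the function names, the (total_value, visits) layout, and the exploration constant c are all hypothetical.

```python
import math


def ucb_score(total_value, visits, parent_visits, c=1.0):
    """One common UCB1 form: mean value plus an exploration bonus.

    All parameter names are illustrative, not the repository's API.
    """
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    mean_value = total_value / visits
    bonus = c * math.sqrt(math.log(parent_visits) / visits)
    return mean_value + bonus


def select_child(children, c=1.0):
    """Return the index of the child with the highest UCB1 score.

    `children` is a list of (total_value, visits) pairs (assumed layout).
    """
    # Approximate the parent's visit count by the sum over its children.
    parent_visits = max(1, sum(n for _, n in children))
    scores = [ucb_score(q, n, parent_visits, c) for q, n in children]
    return max(range(len(children)), key=scores.__getitem__)
```

With c=0 the rule reduces to picking the highest mean value; larger c favours rarely visited children.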