https://www.novaspivack.com/business/the-four-levels-of-ai-steps-to-self-evolution
i want to create an agent that learns autoregressively on historical tokens (or on tokens close to them, not necessarily ones that actually appear in the history). given some previous tokens, the agent does not emit the next token itself; it has to send actions to the environment so that the target tokens are really observed there, and only then does it get reward. writing the token directly into the environment is not allowed, to prevent cheating. the agent is rewarded for successfully rebuilding the past, or for predicting and then building the future. in the future case the target token is generated by the agent itself instead of by an automatic history-replay bot, and the rest of the reward system works the same way as in history replay. a system like this might have some form of consciousness, and might therefore be AGI.
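a minimal sketch of the replay loop described above, under loud assumptions: `ToyEnv`, `replay_episode`, `policy` and `target_source` are hypothetical names, and the environment is reduced to a lookup table from actions to emitted tokens. the only point it illustrates is that reward comes from what the environment emits after an action, never from a token the agent writes directly.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEnv:
    """Hypothetical environment: the agent can only act, never write tokens."""
    effects: dict                      # assumed mapping: action -> emitted token
    observed: list = field(default_factory=list)

    def step(self, action):
        token = self.effects.get(action)   # unknown actions emit nothing (None)
        self.observed.append(token)
        return token


def replay_episode(env, policy, history, target_source=None):
    """Count how often the agent's action makes the env emit the target token.

    target_source(context, t) supplies the target. By default it is the
    history-replay bot (next historical token); passing the agent's own
    predictor instead gives the "build the future" mode with the same reward.
    """
    if target_source is None:
        target_source = lambda context, t: history[t]
    total = 0
    for t in range(1, len(history)):
        context = history[:t]
        target = target_source(context, t)
        action = policy(context, target)          # agent chooses an action
        total += int(env.step(action) == target)  # reward only if really observed
    return total


env = ToyEnv(effects={"press_a": "a", "press_b": "b"})
policy = lambda context, target: "press_" + target   # a perfect toy policy
reward = replay_episode(env, policy, list("abba"))
```

swapping `target_source` is the only change between rebuilding the past and building the future; the reward check stays identical.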
the main objective of AGI is to create another version of itself.
the verification system can be based on internal hidden tokens (feeling-based: the agent feels like it made it) or on similarity (time-series similarity or semantic similarity). there can also be external verification signals such as lifespan, disk usage, view count, popularity, total capital, etc.
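a rough sketch of the similarity-based option, assuming the target and the observed result are numeric sequences of equal length. cosine similarity and the `threshold` value are stand-ins picked for illustration; any time-series or embedding similarity could take their place.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0                     # treat an all-zero sequence as dissimilar
    return dot / (norm_a * norm_b)


def similarity_reward(target, observed, threshold=0.9):
    """Return the similarity as reward, or 0.0 when it is below the threshold.

    threshold is an assumed hyperparameter, not something from the note.
    """
    if len(target) != len(observed):
        raise ValueError("target and observed must have the same length")
    sim = cosine_similarity(target, observed)
    return sim if sim >= threshold else 0.0
```

for semantic similarity the same function would be applied to embedding vectors of the two texts instead of raw series.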
the main problem in making this work is how to train it in parallel. the real world can be replaced by a world model (say, a neural network) so that training can go back in time, or by very fast real-world evaluators, or by special evaluators that support time traversal, like virtual machine snapshots or web browsers (tab traversal). AlphaGo has this advantage because the game of Go is a very simple world model, while the real world is not.
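a small sketch of why time traversal helps, assuming the world state can be copied. `SnapshotWorld` and `evaluate_branches` are hypothetical names, and the "dynamics" just append the action to a token list. a VM snapshot or a world model would sit behind the same `snapshot` / `restore` interface.

```python
import copy


class SnapshotWorld:
    """Hypothetical world whose state can be saved and rolled back."""

    def __init__(self, state):
        self.state = state

    def snapshot(self):
        return copy.deepcopy(self.state)       # save a point in time

    def restore(self, snap):
        self.state = copy.deepcopy(snap)       # go back to that point

    def step(self, action):
        self.state["tokens"].append(action)    # toy dynamics, for illustration
        return self.state["tokens"][-1]


def evaluate_branches(world, actions, score):
    """Try each action from the same starting state and score the outcome."""
    snap = world.snapshot()
    results = {}
    for action in actions:
        world.restore(snap)                    # every branch starts identically
        world.step(action)
        results[action] = score(world.state)
    world.restore(snap)                        # leave the world untouched
    return results


world = SnapshotWorld({"tokens": ["a"]})
scores = evaluate_branches(world, ["b", "c"], lambda s: len(s["tokens"]))
```

the loop here is sequential, but since every branch starts from the same snapshot and does not depend on the others, each one could be handed to a separate worker.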
this could also form a hierarchy like: real world -> world model -> agent -> superagent -> …