Replicate internally uses Cog for packaging and serving large AI models in Docker containers. Cog currently supports only macOS and Linux.
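For context, here is a minimal sketch of what a Cog package looks like. The "model" is a trivial stand-in rather than real weights, and the accompanying `cog.yaml` would point its `predict` key at `predict.py:Predictor`:

```python
# predict.py — minimal Cog predictor sketch.
# A trivial "model" stands in for real weights; in practice setup()
# would load a checkpoint and predict() would run inference.
from cog import BasePredictor, Input


class Predictor(BasePredictor):
    def setup(self):
        # Runs once when the container starts.
        self.prefix = "echo: "

    def predict(self, prompt: str = Input(description="Text to echo back")) -> str:
        # Runs for every prediction request.
        return self.prefix + prompt
```

With `predict.py` and `cog.yaml` in place, `cog build -t my-model` produces the Docker image and `cog predict -i prompt="hello"` runs a prediction locally.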
According to the documentation, Cog offers much of the same functionality as Replicate, such as API calls and fine-tuning.
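As a sketch of the API side, a built Cog container exposes an HTTP prediction endpoint. Assuming the image from the previous example is running locally and publishing port 5000 (e.g. `docker run -p 5000:5000 my-model`), you could query it like this:

```python
# Query a locally running Cog container over its HTTP prediction API.
import requests

resp = requests.post(
    "http://localhost:5000/predictions",
    json={"input": {"prompt": "hello"}},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["output"])  # the value returned by Predictor.predict()
```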
You can connect your local LLM to VS Code using Continue, an open-source Copilot alternative.
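A hedged sketch of what that hookup can look like in Continue's `config.json` (the key names follow Continue's JSON config format and may differ in newer releases; the example assumes a local Ollama install serving a `llama3` model):

```json
{
  "models": [
    {
      "title": "Local Llama 3",
      "provider": "ollama",
      "model": "llama3"
    }
  ]
}
```

Once the model appears in Continue's model picker inside VS Code, chat and autocomplete requests go to the local server instead of a hosted API.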