focus on the person only: crop the video and keep only the human region:
https://github.com/ConceptCodes/portal-zoomer
focus/zoom on a given object using pytweening, a collection of easing/tweening functions.
note: pytweening was originally developed for pyautogui (by the same author, at least), probably to make automated mouse movement look human (evading bot detection, passing captchas and the like), but it can also be used for animation rendering.
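a minimal sketch of how an easing function could drive the zoom: interpolate the crop box from the full frame to the target box, one box per frame. the names `zoom_boxes` and `ease_in_out_quad` are hypothetical; the easing function is written by hand so the snippet has no dependencies, and it should have the same shape as `pytweening.easeInOutQuad`, which could be passed in instead.

```python
def ease_in_out_quad(t):
    """Hand-written ease-in-out quadratic: slow start, fast middle, slow end."""
    t = min(max(t, 0.0), 1.0)  # clamp progress to [0, 1]
    if t < 0.5:
        return 2 * t * t
    return 1 - (-2 * t + 2) ** 2 / 2


def zoom_boxes(frame_size, target_box, n_frames, ease=ease_in_out_quad):
    """Return one (x, y, w, h) crop box per frame.

    frame_size: (width, height) of the source video.
    target_box: (x, y, w, h) of the object to zoom in on (assumed to come
                from some detector; not computed here).
    ease:       any function mapping progress 0..1 to eased progress 0..1.
    """
    fw, fh = frame_size
    start = (0, 0, fw, fh)  # begin with the full frame
    boxes = []
    for i in range(n_frames):
        t = i / (n_frames - 1) if n_frames > 1 else 1.0
        k = ease(t)
        # linear interpolation between start and target, weighted by eased progress
        boxes.append(tuple(round(s + (e - s) * k) for s, e in zip(start, target_box)))
    return boxes


boxes = zoom_boxes((1920, 1080), (800, 300, 400, 225), n_frames=5)
```

each box can then be used to crop the matching frame and resize it back to the output resolution. keeping the target box at the same aspect ratio as the frame avoids distortion.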
or just use ffmpeg directly. you have to handcraft the easing formulas yourself anyway.
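a rough sketch of what a handcrafted formula might look like with ffmpeg's `zoompan` filter, built as a string from python. the helper name `zoompan_filter`, the smoothstep easing (`p*p*(3-2*p)`), the default size/fps and the file names are all assumptions for illustration; the filter string has not been tuned against real footage.

```python
def zoompan_filter(cx, cy, max_zoom, ramp_frames, size="1280x720", fps=30):
    """Build a zoompan filter string that eases the zoom in toward (cx, cy).

    cx, cy:      centre of the object in input pixel coordinates.
    max_zoom:    final zoom factor (1 = no zoom).
    ramp_frames: number of output frames the zoom-in takes.
    """
    # progress 0..1 based on the output frame counter `on`
    p = f"min(on/{ramp_frames},1)"
    # smoothstep easing written out by hand: p^2 * (3 - 2p)
    eased = f"({p})*({p})*(3-2*({p}))"
    z = f"1+({max_zoom}-1)*{eased}"
    # top-left corner of the visible window, clamped to stay inside the frame
    x = f"max(0,min({cx}-iw/zoom/2,iw-iw/zoom))"
    y = f"max(0,min({cy}-ih/zoom/2,ih-ih/zoom))"
    # d=1: emit one output frame per input frame (video input, not a still image)
    return f"zoompan=z='{z}':x='{x}':y='{y}':d=1:s={size}:fps={fps}"


vf = zoompan_filter(cx=960, cy=400, max_zoom=2, ramp_frames=60)
# hypothetical input/output names; run with subprocess.run(cmd, check=True)
cmd = ["ffmpeg", "-i", "input.mp4", "-vf", vf, "output.mp4"]
```

passing the command as a list avoids shell quoting problems with the commas and quotes inside the filter expression.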
do vidpy/mltframework and other libs support this? needs investigation.