Generate a DOCX Document From Python Docstrings

python
docstrings
pdoc3
pandoc
docxcompose
documentation
generation
This article details a method to generate DOCX documents from Python docstrings. It explains the process of installing and using pdoc3, pandoc, and docxcompose to convert, clean, and compose the required files. The step-by-step guide ensures easy understanding and implementation for developers.
Published: February 19, 2023


Install pdoc3 and use it to render the docstrings as HTML:

pip install pdoc3
pdoc --html [-o <output_dir>] <python_script_or_module_path> # the output directory defaults to `./html`

Install and use pandoc. Its homepage also lists some slideshow backends such as reveal.js, DZSlides, S5, Slideous and Slidy. These are alternatives to Microsoft PowerPoint and may help with rendering video, although LibreOffice or a dedicated video editing library such as moviepy might be a better fit for that.

# convert the HTML generated by pdoc3 to DOCX
pandoc -o <output_docx_filename> <input_html_path>
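
When several HTML files need converting, the pandoc call above can be wrapped in a small Python helper. This is only a sketch: the function names `build_pandoc_command` and `html_to_docx` are made up for illustration, and it assumes `pandoc` is available on PATH. The `-f html -t docx` flags just make the input and output formats explicit instead of letting pandoc guess them from the file extensions.

```python
import shutil
import subprocess
from pathlib import Path


def build_pandoc_command(input_html, output_docx):
    """Return the argument list for converting one HTML file to DOCX."""
    return [
        "pandoc",
        "-f", "html",            # input format
        "-t", "docx",            # output format
        "-o", str(output_docx),  # output file
        str(input_html),
    ]


def html_to_docx(input_html, output_docx):
    """Run pandoc on one file; raise if pandoc is missing or exits non-zero."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    subprocess.run(build_pandoc_command(input_html, output_docx), check=True)
    return Path(output_docx)
```

Keeping the command construction separate from the call makes it easy to print or log the exact command before running it.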

Remove unwanted parts from the HTML with BeautifulSoup, and split the index from the main content. The pieces can then be converted separately and the resulting DOCX files concatenated with docxcompose.
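
A rough sketch of the cleaning and splitting step with BeautifulSoup follows. It assumes the pdoc3 page keeps its index in `<nav id="sidebar">` and its main content in `<article id="content">`; check the generated HTML and adjust the selectors if your template differs. The helper name `split_pdoc_html` is hypothetical.

```python
from bs4 import BeautifulSoup


def _wrap(fragment):
    """Wrap an HTML fragment in a minimal document so pandoc can read it."""
    return f"<html><body>{fragment}</body></html>"


def split_pdoc_html(html_text):
    """Return (content_html, index_html) as two standalone HTML strings."""
    soup = BeautifulSoup(html_text, "html.parser")

    # Drop parts that make no sense in a Word document.
    for tag in soup.find_all(["script", "style", "footer"]):
        tag.decompose()

    # Assumed pdoc3 layout: the sidebar holds the index,
    # the article holds the documentation body.
    sidebar = soup.find("nav", id="sidebar")
    index_html = str(sidebar.extract()) if sidebar else ""

    content = soup.find("article", id="content")
    content_html = str(content) if content else str(soup)

    return _wrap(content_html), _wrap(index_html)
```

Each returned string can be written to its own `.html` file and passed to pandoc, which gives one DOCX for the index and one for the main content.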

To compose a DOCX by hand, use python-docx. For template-based DOCX generation, use docxtpl.

There are (maybe) two ways to insert a page break into the converted DOCX:

  1. Change the CSS in the original HTML code.

  2. Insert the page break while concatenating the DOCX files.