How to Replace Text in Docx Files in Python

Learn how to replace text in Word document files (.docx) using the python-docx library in Python.
4 min read · Updated Jun 2022 · General Python Tutorials


In this tutorial, we will make a simple command-line program that we can supply with a .docx file path and words that need replacing.

Imports

We start with the imports.

The re library is essential here because we can use its sub() function to replace certain expressions with other text in a given string.
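
As a quick illustration of what sub() does, here is a minimal sketch. The sample sentence is made up for this example and is not part of the program.

```python
import re

# Hypothetical sample text, used only for illustration
sentence = "TCP uses a SYN packet; the SYN flag opens the connection."

# re.sub(pattern, replacement, string) returns a new string in which
# every non-overlapping match of the pattern has been replaced
result = re.sub("SYN", "TEST", sentence)
print(result)
```

The original string is left untouched; sub() always returns a new one.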

We also need the sys module so we can get the command line arguments with sys.argv.

Last but not least, we import the Document class from docx so we can work with Word files. The package is not part of the standard library, so we have to install it first with:

$ pip install python-docx

Let's get started:

# Import re for regex functions
import re

# Import sys for getting the command line arguments
import sys

# Import docx to work with .docx files.
# Must be installed: pip install python-docx
from docx import Document

Checking Command Line Arguments

Next, we get to the command line arguments. We want to check if the inputs are valid.

Now if the sys.argv list is shorter than three items, we know that the user didn't provide enough information. The first argument is always the file path of the Python file itself. The second one should be the file path of the file where the text will be replaced.

The remaining arguments are pairs of the form text=replacewith, each telling us which text to replace and what to replace it with. The for loop checks that every pair contains exactly one = sign.

In the end, we also save the file path to a variable, so we don't have to type out sys.argv[1] every time.

# Check that a file path and at least one replacement were passed.
if len(sys.argv) < 3:
    print('Not enough arguments were supplied')
    # exit with a non-zero status to signal the error
    sys.exit(1)

# Check that every replacer has the form text=replacewith
for replaceArg in sys.argv[2:]:
    if len(replaceArg.split('=')) != 2:
        print('Faulty replace argument given')
        print('-> ', replaceArg)
        sys.exit(1)

# Store file path from CL Arguments.
file_path = sys.argv[1]
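
The same validation can also be wrapped in a small helper that returns the parsed pairs. This is only a sketch: parse_replacements is a hypothetical name, not something the script defines, and unlike the check above it also rejects an empty search text.

```python
def parse_replacements(args):
    """Turn ['old=new', ...] into [('old', 'new'), ...].

    Hypothetical helper for illustration; raises ValueError when an
    argument does not contain exactly one '=' or has no search text.
    """
    pairs = []
    for arg in args:
        parts = arg.split('=')
        if len(parts) != 2 or not parts[0]:
            raise ValueError(f'Faulty replace argument given: {arg}')
        pairs.append((parts[0], parts[1]))
    return pairs


# Example with the same kind of arguments the script expects
print(parse_replacements(['SYN=TEST', 'Linux=Windows']))
```

In the script, such a helper could be called with sys.argv[2:], and a ValueError could be turned into an error message followed by sys.exit(1).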

Docx Files

If the file path ends with .docx, we know we can open it with the Document class. We first create a Document object from the file path, then loop over the replacement arguments.

For each replacement, we loop through the document's paragraphs and, inside each paragraph, through its runs. A run is a span of text that shares the same formatting. We replace the text run by run and finally write the result with the save() method. Because matching happens per run, a word that Word has split across two runs will not be found.

if file_path.endswith('.docx'):
    doc = Document(file_path)
    # Number of changed runs per search word
    occurrences = {}
    # Loop through replacer arguments
    for replaceArgs in sys.argv[2:]:
        # split the word=replacedword into a list
        replaceArg = replaceArgs.split('=')
        # initialize the counter for this word to 0
        occurrences[replaceArg[0]] = 0
        # Loop through paragraphs
        for para in doc.paragraphs:
            # Loop through runs (style spans)
            for run in para.runs:
                # if there is text on this run, replace it
                if run.text:
                    # get the replacement text; the search word is treated as
                    # a regex pattern and every match in the run is replaced
                    replaced_text = re.sub(replaceArg[0], replaceArg[1], run.text)
                    if replaced_text != run.text:
                        # if the replaced text is not the same as the original,
                        # replace the text and increment the counter
                        # (once per changed run)
                        run.text = replaced_text
                        occurrences[replaceArg[0]] += 1

    # print the counter of each word
    for word, count in occurrences.items():
        print(f"The word {word} was found and replaced {count} times.")

    # make a new file name by adding "_new" before the .docx extension
    new_file_path = file_path[:-len(".docx")] + "_new.docx"
    # save the new docx file
    doc.save(new_file_path)
else:
    print('The file type is invalid, only .docx files are supported')
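
The counter above goes up once per changed run, even when a run contains the word several times. If you would rather count every single replacement, re.subn() returns both the new text and the number of substitutions. The sketch below assumes only that each paragraph exposes a runs list and each run a text attribute, as python-docx objects do. replace_in_paragraphs is a hypothetical helper name, and the stand-in objects exist purely so the example runs without a .docx file.

```python
import re
from types import SimpleNamespace


def replace_in_paragraphs(paragraphs, pattern, replacement):
    """Replace pattern in every run and return the number of replacements."""
    total = 0
    for para in paragraphs:
        for run in para.runs:
            if run.text:
                # subn returns (new_text, number_of_substitutions)
                new_text, count = re.subn(pattern, replacement, run.text)
                if count:
                    run.text = new_text
                    total += count
    return total


# Stand-in objects that mimic the .runs / .text attributes (illustration only)
fake_paragraphs = [
    SimpleNamespace(runs=[SimpleNamespace(text="SYN and SYN again"),
                          SimpleNamespace(text="no match here")]),
    SimpleNamespace(runs=[SimpleNamespace(text="one more SYN")]),
]

replaced = replace_in_paragraphs(fake_paragraphs, "SYN", "TEST")
print(replaced)
```

With a real document, you would pass doc.paragraphs instead of the stand-ins.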

Let's run it on this document file:

$ python docx_text_replacer.py doc.docx SYN=TEST Linux=Windows TCP=UDP
The word SYN was found and replaced 5 times.
The word Linux was found and replaced 1 times.
The word TCP was found and replaced 1 times. 

The goal was to replace "SYN" with "TEST", "Linux" with "Windows", and "TCP" with "UDP" in the document, and it worked! The modified document is saved next to the original as doc_new.docx.
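
One thing to keep in mind: the search text is handed to re.sub() as a regular expression, so characters such as . or + are treated as pattern syntax rather than literal text. If you want literal matching, re.escape() is one way to get it. The snippet below is a sketch with made-up sample text.

```python
import re

# Hypothetical sample text for illustration
text = "Version 1.0 replaces version 1x0."

# Unescaped: "." matches any character, so "1x0" is replaced too
as_pattern = re.sub("1.0", "2.0", text)

# Escaped: only the literal string "1.0" is replaced
as_literal = re.sub(re.escape("1.0"), "2.0", text)

print(as_pattern)
print(as_literal)
```

In the script, this would mean wrapping the search word in re.escape() before passing it to re.sub(), at the cost of no longer accepting regex patterns on the command line.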

Conclusion

Excellent! You have successfully created a text replacement program for .docx files using Python! See how you can add more features to this program, such as support for more file formats.

Get the complete code here.

Learn also: How to Convert PDF to Docx in Python.

Happy coding ♥
