Python docx: modifying a document

Contents

Efficient Word Document Management at Any Scale with Python Simplified
Reading a Word document with Python
Writing to a Word document with Python
Scanning a folder and applying changes to every document
Other common operations
Creating tables
Adding images
Applying formatting
Adding headers and footers
Creating lists
ivanbicalho/python-docx-replace

Efficient Word Document Management at Any Scale with Python Simplified

With python-docx you can read, create, and modify Word documents from Python. Whether you need to generate reports, fill in templates, or automate document management tasks, python-docx can streamline the workflow and save time. This article covers the basics of python-docx and a few more advanced techniques for working with Word documents in Python.

Reading a Word document with Python

    import docx

    # Load the Word document
    doc = docx.Document('my_document.docx')

    # Iterate through the paragraphs in the document
    for para in doc.paragraphs:
        print(para.text)

Writing to a Word document with Python

    import docx

    # Create a new Word document
    doc = docx.Document()

    # Add some text to the document
    doc.add_paragraph('Hello, World!')

    # Save the document
    doc.save('my_new_document.docx')

Scanning a folder and applying changes to every document

You can use the os module together with python-docx. Here is an example snippet:

    import os
    import docx

    # Set the path of the folder to scan
    folder_path = 'path/to/folder'

    # Iterate through all files in the folder
    for filename in os.listdir(folder_path):
        # Check if the file is a Word document
        if filename.endswith('.docx'):
            file_path = os.path.join(folder_path, filename)

            # Load the Word document
            doc = docx.Document(file_path)

            # Apply changes to the document
            # For example, add a new paragraph
            doc.add_paragraph('This document has been modified.')

            # Save the modified document, overwriting the original
            doc.save(file_path)

In this code, we first store the path of the folder to scan in the folder_path variable. We then iterate over all files in the folder with os.listdir() and use the endswith() method to check whether each file name ends with '.docx', that is, whether the file is a Word document.

For each Word document, we load the file with Document(), passing the path built with os.path.join(). We then apply whatever changes are needed, for example adding a new paragraph.

Finally, we save the modified document with the save() method and the same path, which overwrites the original file.

Note that this code modifies the original Word documents in place, so use it with care to avoid accidentally overwriting important documents. Back up the original files before running it.
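The backup advice above can be folded into the loop itself. The following is a minimal sketch that uses only the standard library; the function name backup_then_apply, the backup_dir argument, and the modify callback are hypothetical names chosen for illustration and are not part of python-docx.

```python
import shutil
from pathlib import Path


def backup_then_apply(folder, backup_dir, modify):
    """Copy every .docx in `folder` to `backup_dir`, then call `modify(path)` on it."""
    folder = Path(folder)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    processed = []
    for path in sorted(folder.glob("*.docx")):
        # Word creates temporary lock files named "~$..." while a document
        # is open; they are not real documents, so skip them.
        if path.name.startswith("~$"):
            continue
        shutil.copy2(path, backup_dir / path.name)  # keep an untouched copy first
        modify(path)                                # only then change the original
        processed.append(path.name)
    return processed
```

With python-docx installed, modify could be a small function that opens the path with docx.Document(), adds a paragraph, and saves back to the same path. Keeping the backup step separate from the modification means the same loop can be reused for any kind of change.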

Other common operations

The python-docx module provides a broad set of features for working with Word documents in Python. Here are some common operations you can perform with it:

Creating tables

You can create a table in a Word document with the add_table() method, which takes the number of rows and columns as arguments. You can then fill in the table by using the cell() method to access individual cells. Row and column indexes start at 0, so a table that uses row index 2 needs at least three rows.

    import docx

    # Create a new Word document
    doc = docx.Document()

    # Add a table with 3 rows and 2 columns
    table = doc.add_table(rows=3, cols=2)

    # Access individual cells and set their text
    table.cell(0, 0).text = 'Header 1'
    table.cell(0, 1).text = 'Header 2'
    table.cell(1, 0).text = 'Cell 1'
    table.cell(1, 1).text = 'Cell 2'
    table.cell(2, 0).text = 'Cell 3'
    table.cell(2, 1).text = 'Cell 4'

    # Save the document
    doc.save('path/to/document_with_table.docx')
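One way to avoid a mismatch between the table size and the cells being written is to size the table from the data. The sketch below assumes a hypothetical helper named fill_table; it relies only on the cell(row, col).text interface described above, so it runs without python-docx installed.

```python
def fill_table(table, rows):
    """Write a list of equal-length rows into an existing table-like object."""
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            table.cell(r, c).text = str(value)


data = [
    ("Header 1", "Header 2"),
    ("Cell 1", "Cell 2"),
    ("Cell 3", "Cell 4"),
]

# With python-docx installed, the table would be sized from the data:
#   doc = docx.Document()
#   table = doc.add_table(rows=len(data), cols=len(data[0]))
#   fill_table(table, data)
```

Because the row and column counts come from len(data) and len(data[0]), adding a row to the data cannot push a cell index past the end of the table.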

Adding images

You can add images to a Word document with the add_picture() method, which takes the path to the image file and, optionally, the width and height of the image in the document.

    import docx
    from docx.shared import Inches

    # Open an existing Word document
    doc = docx.Document('path/to/document.docx')

    # Add an image to the end of the document
    doc.add_picture('path/to/image.jpg', width=Inches(2), height=Inches(2))

    # Save the modified document
    doc.save('path/to/document_with_image.docx')

Applying formatting

You can apply formatting such as font size and style, bold, italic, and underline to text in a Word document. Bold, italic, and underline are set as properties of a run; font settings such as size and name are available through the run's font property (the Font class).

    import docx

    # Open an existing Word document
    doc = docx.Document('path/to/document.docx')

    # Access the first paragraph and append formatted runs
    first_para = doc.paragraphs[0]
    first_para.add_run(' This text is bold.').bold = True
    first_para.add_run(' This text is italic.').italic = True
    first_para.add_run(' This text is underlined.').underline = True

    # Save the modified document
    doc.save('path/to/document_with_formatting.docx')

Adding headers and footers

You can add headers and footers to a Word document through the document's sections property and the header and footer properties of the Section class.

    import docx

    # Open an existing Word document
    doc = docx.Document('path/to/document.docx')

    # Access the header and footer of the first section
    first_section = doc.sections[0]
    header = first_section.header
    footer = first_section.footer

    # Set the text of the header and footer
    header.paragraphs[0].add_run('This is the header text.')
    footer.paragraphs[0].add_run('This is the footer text.')

    # Save the modified document
    doc.save('path/to/document_with_header_footer.docx')

Creating lists

You can create bulleted and numbered lists in a Word document by calling add_paragraph() with the style argument set to 'List Bullet' or 'List Number' respectively.

    import docx

    # Create a new Word document
    doc = docx.Document()

    # Add a bulleted list
    doc.add_paragraph('This is the first item.', style='List Bullet')
    doc.add_paragraph('This is the second item.', style='List Bullet')

    # Add a numbered list
    doc.add_paragraph('This is the first item.', style='List Number')
    doc.add_paragraph('This is the second item.', style='List Number')
    doc.add_paragraph('This is the third item.', style='List Number')

    # Save the document
    doc.save('path/to/document_with_lists.docx')

ivanbicalho/python-docx-replace

Replace words inside a Word document without losing format

This library is built on top of python-docx. Its main purpose is to replace words inside a document without losing the formatting.

It also lets you define blocks in the Word document and choose whether each block is removed.

Replacing a word: docx_replace

You can define a key in your Word document and set the value that replaces it. This program requires the following key format: $<name>

Let’s explain the process behind the library:

First way, losing formatting

One way to replace a key inside a document is the code below. Does it work? Yes, but you lose all the paragraph's formatting.

    key = "$<name>"
    value = "Ivan"

    for p in get_all_paragraphs(doc):
        if key in p.text:
            p.text = p.text.replace(key, value)
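The snippet above calls get_all_paragraphs without defining it. One hypothetical way to write such a helper is sketched here; it is an assumption for illustration, not the library's own code. It relies only on the paragraphs, tables, rows, and cells attributes that python-docx documents, table rows, and cells expose, and it does not cover headers or footers.

```python
def get_all_paragraphs(container):
    """Yield paragraphs from a document-like object, descending into table cells."""
    # Paragraphs that sit directly in the body (or directly in a cell) come first.
    yield from container.paragraphs
    # Then the paragraphs inside each table, including nested tables.
    for table in container.tables:
        for row in table.rows:
            for cell in row.cells:
                yield from get_all_paragraphs(cell)
```

Called as get_all_paragraphs(doc), it yields body paragraphs followed by the paragraphs of every table cell. Merged cells may be reported more than once by row.cells, so a stricter version might need to deduplicate.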

In python-docx, each paragraph is made up of runs; a run is a proxy object wrapping a <w:r> element. We come back to runs later, and the python-docx documentation has more details.

You can try replacing the text inside the runs, and if it works, your job is done:

    key = "$<name>"
    value = "Ivan"

    for p in get_all_paragraphs(doc):
        for run in p.runs:
            if key in run.text:
                run.text = run.text.replace(key, value)

The problem is that a key can be split across more than one run, in which case this approach cannot replace it.

This works, because the whole key sits inside Run1:

    Word Paragraph: "Hello $<name>, welcome!"
    Run1: "Hello $<name>, w"
    Run2: "elcome!"

This does not, because the key is split between Run1 and Run2:

    Word Paragraph: "Hello $<name>, welcome!"
    Run1: "Hello $<na"
    Run2: "me>, welcome!"
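The failure can be reproduced without Word at all. In this small sketch, plain strings stand in for the text of each run, and the key is assumed to be written as $<name>.

```python
key = "$<name>"
value = "Ivan"

# Plain strings stand in for the text of each run in one paragraph.
runs = ["Hello $<na", "me>, welcome!"]

# The key is visible in the paragraph text as a whole...
paragraph_text = "".join(runs)
found_in_paragraph = key in paragraph_text

# ...but no single run contains it, so a run-by-run replace changes nothing.
replaced = [text.replace(key, value) for text in runs]

print(found_in_paragraph)  # True
print(replaced == runs)    # True
```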

You are probably wondering why the paragraph text is broken up this way. What is the purpose of a run?

Imagine a Word document with this format:

[Image: a Word paragraph in which the characters of the key carry different formatting]

Each run holds its own formatting. That is the purpose of runs.

With that in mind, what should the format be after this library replaces the key? Highlighted yellow? Bold and underlined? Red with another font? All of them?

The final format is the format of the $ character. The formats of the key's other characters are discarded. In the example above, the final format is highlighted yellow.

The solution adopted is quite simple. First we try to replace in the simplest way, as in the previous example. If it works, great, all done! If it does not, we build a table of indexes:

    key = "$<name>"
    value = "Ivan"

    Word Paragraph: "Hello $<name>, welcome!"
    Run1: "Hello $<na"
    Run2: "me>, welcome!"

    Word Paragraph:  'H' 'e' 'l' 'l' 'o' ' ' '$' '<' 'n' 'a' 'm' 'e' '>' ',' ' ' 'w' 'e' 'l' 'c' 'o' 'm' 'e' '!'
    Char Indexes:     0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22
    Run Index:        0   0   0   0   0   0   0   0   0   0   1   1   1   1   1   1   1   1   1   1   1   1   1
    Run Char Index:   0   1   2   3   4   5   6   7   8   9   0   1   2   3   4   5   6   7   8   9  10  11  12

Here we have the char index of each character in the paragraph, the index of the run that holds it, and its char index inside that run. A little confusing, right? With this table we can process and replace all the keys:

    # REPLACE PROCESS:
    Char Index 6  = p.runs[0].text = "Ivan"  # replace '$' by the value
    Char Index 7  = p.runs[0].text = ""      # clear all the other parts of the key
    Char Index 8  = p.runs[0].text = ""
    Char Index 9  = p.runs[0].text = ""
    Char Index 10 = p.runs[1].text = ""
    Char Index 11 = p.runs[1].text = ""
    Char Index 12 = p.runs[1].text = ""

After that, we have:

    Word Paragraph:  'H' 'e' 'l' 'l' 'o' ' ' 'Ivan' '' '' '' '' '' '' ',' ' ' 'w' 'e' 'l' 'c' 'o' 'm' 'e' '!'
    Indexes:          0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22
    Run Index:        0   0   0   0   0   0   0   0   0   0   1   1   1   1   1   1   1   1   1   1   1   1   1
    Run Char Index:   0   1   2   3   4   5   6   7   8   9   0   1   2   3   4   5   6   7   8   9  10  11  12

All done: every key in your Word document is now replaced and all the formatting is kept.
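The index-table idea can be sketched in plain Python, with a list of strings standing in for the run texts. This is an illustration of the technique described above under stated assumptions, not the library's actual implementation: the function names are hypothetical, the key is assumed to be written as $<name>, and only the first occurrence of the key is handled.

```python
def build_index_table(runs):
    """Map each paragraph character index to (run index, index inside that run)."""
    table = []
    for run_index, text in enumerate(runs):
        for run_char_index in range(len(text)):
            table.append((run_index, run_char_index))
    return table


def replace_across_runs(runs, key, value):
    """Replace the first occurrence of `key`, even if it spans several runs."""
    runs = list(runs)
    start = "".join(runs).find(key)
    if start == -1:
        return runs
    table = build_index_table(runs)
    # Split every run into single characters so positions can be edited one by one.
    chars = [list(text) for text in runs]
    for offset in range(len(key)):
        run_index, run_char_index = table[start + offset]
        # The first key character receives the value; the rest are blanked.
        chars[run_index][run_char_index] = value if offset == 0 else ""
    return ["".join(parts) for parts in chars]


runs = ["Hello $<na", "me>, welcome!"]
print(replace_across_runs(runs, "$<name>", "Ivan"))
# ['Hello Ivan', ', welcome!']
```

Because the value is written into the position of the $ character, it lands in the run that holds $, which is why the replaced text takes on that run's formatting.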

Get document keys: docx_get_keys

You can get all the keys present in the Word document by calling the function docx_get_keys:

    # Let's suppose the Word document has the keys: $<name> and $<phone>
    keys = docx_get_keys(doc)
    print(keys)  # ['name', 'phone']

Replace blocks: docx_blocks

You can define a block in your Word document and set whether it is removed. Block keys are written exactly like HTML tags, with an opening tag and a matching closing tag around the content.

Let's say you define a block like this:

    Contract

    Details of the contract

    <signature>
    Please, put your signature here: _________________
    </signature>

Setting signature to be kept

    docx_blocks(doc, signature=True)

    Contract

    Details of the contract

    Please, put your signature here: _________________

Setting signature to be removed

    docx_blocks(doc, signature=False)

    Contract

    Details of the contract

If a block that is set to be removed contains tables, those tables are not removed. Tables are separate objects in the python-docx library and are not part of the paragraph object.

You can use the function docx_remove_table to remove tables from the Word document by their index.

The table index works like any other index: removing a table shifts the indexes of the tables after it. For example, if you want to remove the first two tables, you cannot do it like this:

    docx_remove_table(doc, 0)
    docx_remove_table(doc, 1)  # targets what was originally the third table; raises an index error if there were only two

Instead, do it like this:

    docx_remove_table(doc, 0)
    docx_remove_table(doc, 0)
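The same index shift can be seen with an ordinary Python list, which stands in here for the document's tables. The sketch also shows an alternative that the text above does not mention: removing from the highest index down, so that earlier removals never move the items still to be removed.

```python
tables = ["table A", "table B", "table C"]

# Removing index 0 twice removes the first two tables, because "table B"
# moves to index 0 after the first removal.
remaining = list(tables)
del remaining[0]
del remaining[0]
print(remaining)  # ['table C']

# Alternative: remove from the highest index down, so the original
# index numbers stay valid for every removal.
remaining_desc = list(tables)
for index in sorted([0, 1], reverse=True):
    del remaining_desc[index]
print(remaining_desc)  # ['table C']
```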
