Basics of python-docx library
python-docx is an open-source Python library for creating and updating Microsoft Word (.docx) files. If you need to work with Word documents from Python code, it is one of the most widely used options.
It is free to use and simple to learn.
It covers many of the features that Microsoft Word offers, including paragraphs, styles, tables, and images.
It also works with current versions of Python 3.
Work through this tutorial to learn the basics of handling Word documents in Python.
Each common feature is explained with an easy-to-understand code sample.
Install python-docx
If Python is not already installed, download and install the latest version from the Python website. A virtual environment is not required, but using one is a best practice for Python projects. Install python-docx with the command pip install python-docx (the package is installed as python-docx but imported as docx).
Check that python-docx is installed
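A quick way to confirm the installation is to import the package and look up its installed version. This is a minimal sketch that assumes Python 3.8 or later, where the standard library provides importlib.metadata; the helper name python_docx_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def python_docx_version():
    """Return the installed python-docx version string, or None if it is missing."""
    try:
        import docx  # noqa: F401  (the import name is "docx", not "python-docx")
        return version("python-docx")  # distribution name used by pip
    except (ImportError, PackageNotFoundError):
        return None


installed = python_docx_version()
if installed:
    print(f"python-docx {installed} is installed")
else:
    print("python-docx is not installed; run: pip install python-docx")
```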
Create new document
Blank Word documents are easy to create with python-docx. Import Document from the docx package, and import the os standard library, which is used here to build the file path. Create a Document object, add the content you need, and save the file by calling its save() method.
```python
import os

from docx import Document

newDoc = Document()  # new document based on python-docx's default template
newDoc.add_paragraph("First Line")
# os.path.join avoids hard-coded Windows backslashes such as '\TestFile.docx'
filepath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'TestFile.docx')
newDoc.save(filepath)
```
Open and read a document
To open an existing Word file, call Document() with the file path as shown below. You can then loop through its paragraphs, headings, or other content to read the data.
```python
import os

from docx import Document

filepath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'TestFile.docx')
d = Document(filepath)  # open an existing .docx file
print(len(d.paragraphs))  # number of paragraphs in the document body
for p in d.paragraphs:
    print(p.text)  # print(p) would only show the Paragraph object, not its text
```
Add text content to a document
Use the add_paragraph() method to add text. New paragraphs use the 'Normal' style by default; pass the style argument to apply a different one, as below. To insert a blank line, add an empty paragraph.
```python
import os

from docx import Document

newDoc = Document()
newDoc.add_paragraph("Top Heading", style='Heading 1')  # heading
newDoc.add_paragraph("First Line")                      # 'Normal' style (default)
newDoc.add_paragraph("List 1", style='List Number')     # ordered (numbered) list
newDoc.add_paragraph("List 2", style='List Number')
newDoc.add_paragraph("List 1", style='List Bullet')     # unordered (bulleted) list
newDoc.add_paragraph("List 2", style='List Bullet')
filepath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'TestFile.docx')
newDoc.save(filepath)
```
Align content
```python
import os

from docx import Document
from docx.enum.text import WD_ALIGN_PARAGRAPH

newDoc = Document()
htop = newDoc.add_paragraph("Top Heading", style='Heading 1')
htop.alignment = WD_ALIGN_PARAGRAPH.CENTER  # centered heading
# '\n' becomes a line break inside the same paragraph
c = newDoc.add_paragraph("This is a justified paragraph.\nThis is a justified paragraph")
c.alignment = WD_ALIGN_PARAGRAPH.JUSTIFY  # lines stretch to both margins
filepath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'TestFile.docx')
newDoc.save(filepath)
```
Apply styles — add_run function
To apply formatting to part of a paragraph, use the add_run() method. It returns a Run object whose properties, such as bold, can then be set.
```python
import os

from docx import Document

newDoc = Document()
a1 = newDoc.add_paragraph().add_run('Bold Line')  # add_run() returns a Run
a1.bold = True
filepath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'TestFile.docx')
newDoc.save(filepath)
```
Bold, Italics, Underline
Text can be made bold, italic, or underlined by setting the run's bold, italic, or underline property, as shown below.
```python
import os

from docx import Document

newDoc = Document()
p = newDoc.add_paragraph()
p.add_run('Bold Content\n').bold = True  # bold is a Run property, not a Paragraph property
p.add_run('Italic Content\n').italic = True
p.add_run('Underline Content\n').underline = True
filepath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'TestFile.docx')
newDoc.save(filepath)
```
Font color, size and family
To change the font color, size, or family, import RGBColor and Pt from the docx.shared module of python-docx.
```python
import os

from docx import Document
from docx.shared import Pt, RGBColor

newDoc = Document()
run = newDoc.add_paragraph().add_run('Test Content')
run.font.color.rgb = RGBColor(115, 52, 100)  # red, green, blue components (0-255)
run.font.name = 'Calibri'                     # font family
run.font.size = Pt(20)                        # size in points
filepath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'TestFile.docx')
newDoc.save(filepath)
```
Tables
```python
import os

from docx import Document

newDoc = Document()
t = newDoc.add_table(rows=2, cols=2)
t.rows[0].cells[0].text = 'Cell 1'
t.rows[0].cells[1].text = 'Cell 2'
t.rows[1].cells[0].text = 'Cell 3'
t.rows[1].cells[1].text = 'Cell 4'
t.add_row()  # append a third row
t.rows[2].cells[0].text = 'Cell 5'
t.rows[2].cells[1].text = 'Cell 6'
filepath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'TestFile.docx')
newDoc.save(filepath)
```
Tabbed paragraph
```python
import os

from docx import Document

newDoc = Document()
p = newDoc.add_paragraph()
p.add_run().add_tab()  # tab character at the start of the paragraph
p.add_run('After Tab')
filepath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'TestFile.docx')
newDoc.save(filepath)
```
Add Image
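```python
import os

from docx import Document
from docx.shared import Inches

base_dir = os.path.dirname(os.path.abspath(__file__))
imgpath = os.path.join(base_dir, 'image.png')  # hypothetical: path to your own image file

newDoc = Document()
# Setting both height and width can distort the image; pass only one to keep the aspect ratio.
img = newDoc.add_picture(imgpath, height=Inches(3), width=Inches(2))
newDoc.save(os.path.join(base_dir, 'TestFile.docx'))
```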
Loop through paragraphs
A for loop can be used to walk through the contents of a document. Content can be accessed by type: document.paragraphs for paragraphs (including headings), document.tables for tables, and document.inline_shapes for inline images. Alternatively, give a group of paragraphs a common property, such as the same style, and select them by that property.
```python
import os

import docx as doc

filepath = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'TestFile.docx')
d = doc.Document(filepath)
print(len(d.paragraphs))
for p in d.paragraphs:
    print(p.text)  # paragraph text, not the Paragraph object
```
Font Color
Color, as a topic, extends beyond the Font object; font color is just the first place it’s come up. Accordingly, it bears a little deeper thought than usual since we’ll want to reuse the same objects and protocol to specify color in the other contexts; it makes sense to craft a general solution that will bear the expected reuse.
There are three historical sources to draw from for this API.
- The w:rPr/w:color element. This is used by default when applying color directly to text or when setting the text color of a style. This corresponds to the Font.Color property (undocumented, unfortunately). This element supports RGB colors, theme colors, and a tint or shade of a theme color.
- The w:rPr/w14:textFill element. This is used by Word for fancy text like gradient and shadow effects. This corresponds to the Font.Fill property.
- The PowerPoint font color UI. This seems like a reasonable compromise between the prior two, allowing direct-ish access to common color options while holding the door open for the Font.fill operations to be added later if required.
Candidate Protocol
```python
>>> from docx import Document
>>> from docx.text.run import Font, Run
>>> run = Document().add_paragraph().add_run()
>>> isinstance(run, Run)
True
>>> font = run.font
>>> isinstance(font, Font)
True
```
docx.text.run.Font has a read-only color property, returning a docx.dml.color.ColorFormat object:
```python
>>> from docx.dml.color import ColorFormat
>>> color = font.color
>>> isinstance(font.color, ColorFormat)
True
>>> font.color = 'anything'
AttributeError: can't set attribute
```
docx.dml.color.ColorFormat has a read-only type property and read/write rgb, theme_color, and brightness properties.
ColorFormat.type returns one of MSO_COLOR_TYPE.RGB, MSO_COLOR_TYPE.THEME, MSO_COLOR_TYPE.AUTO, or None, the latter indicating that the font has no directly-applied color.
ColorFormat.rgb returns an RGBColor object when type is MSO_COLOR_TYPE.RGB. It may also report an RGBColor value when type is MSO_COLOR_TYPE.THEME, since an RGB color may also be present in that case. According to the spec, the RGB color value is ignored when a theme color is specified, but Word writes the current RGB value of the theme color along with the theme color name (e.g. ‘accent1’) when assigning a theme color, perhaps as a convenient value for a file browser to use. The value of .type must be consulted to determine whether the RGB value is operative or a “best-guess”:
```python
>>> font.color.type
RGB (1)
>>> font.color.rgb
RGBColor(0x3f, 0x2c, 0x36)
```
Assigning an RGBColor value to ColorFormat.rgb causes ColorFormat.type to become MSO_COLOR_TYPE.RGB:
```python
>>> from docx.shared import RGBColor
>>> font.color.type
None
>>> font.color.rgb = RGBColor(0x3f, 0x2c, 0x36)
>>> font.color.type
RGB (1)
>>> font.color.rgb
RGBColor(0x3f, 0x2c, 0x36)
```
ColorFormat.theme_color returns a member of MSO_THEME_COLOR_INDEX when type is MSO_COLOR_TYPE.THEME:
```python
>>> font.color.type
THEME (2)
>>> font.color.theme_color
ACCENT_1 (5)
```
Assigning a member of MSO_THEME_COLOR_INDEX to ColorFormat.theme_color causes ColorFormat.type to become MSO_COLOR_TYPE.THEME:
```python
>>> from docx.enum.dml import MSO_THEME_COLOR
>>> font.color.type
RGB (1)
>>> font.color.theme_color = MSO_THEME_COLOR.ACCENT_2
>>> font.color.type
THEME (2)
>>> font.color.theme_color
ACCENT_2 (6)
```
The ColorFormat.brightness attribute can be used to select a tint or shade of a theme color. Assigning the value 0.1 produces a color 10% brighter (a tint); assigning -0.1 produces a color 10% darker (a shade):
```python
>>> font.color.type
None
>>> font.color.brightness
0.0
>>> font.color.brightness = 0.4
ValueError: not a theme color
>>> font.color.theme_color = MSO_THEME_COLOR.TEXT_1
>>> font.color.brightness = 0.4
>>> font.color.brightness
0.4
```
Specimen XML
Baseline paragraph with no font color:
Paragraph with directly-applied RGB color:
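```xml
<w:p>
  <w:pPr>
    <w:rPr>
      <w:color w:val="0000FF"/>
    </w:rPr>
  </w:pPr>
  <w:r>
    <w:rPr>
      <w:color w:val="0000FF"/>
    </w:rPr>
    <w:t>Directly-applied color Blue.</w:t>
  </w:r>
</w:p>
```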
Run with directly-applied theme color:
```xml
<w:r>
  <w:rPr>
    <w:color w:val="4F81BD" w:themeColor="accent1"/>
  </w:rPr>
  <w:t>Theme color Accent 1.</w:t>
</w:r>
```
Run with 40% tint of Text 2 theme color:
```xml
<w:r>
  <w:rPr>
    <w:color w:val="548DD4" w:themeColor="text2" w:themeTint="99"/>
  </w:rPr>
  <w:t>Theme color with 40% tint.</w:t>
</w:r>
```
Run with 25% shade of Accent 2 theme color:
```xml
<w:r>
  <w:rPr>
    <w:color w:val="943634" w:themeColor="accent2" w:themeShade="BF"/>
  </w:rPr>
  <w:t>Theme color with 25% shade.</w:t>
</w:r>
```
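The two tint and shade specimens suggest how a brightness value could map to the hex attribute: a 40% tint gives themeTint 0x99 (153 = 255 × 0.6) and a 25% shade gives themeShade 0xBF (191 ≈ 255 × 0.75). The helper below is a hypothetical sketch of that mapping, inferred from these two specimens rather than taken from the spec or from python-docx.

```python
def brightness_to_color_attrs(brightness):
    """Map a brightness in [-1.0, 1.0] to an (attribute, hex value) pair for w:color.

    Positive values are tints (lighter), negative values are shades (darker).
    The scaling 255 * (1 - |brightness|) is inferred from the specimen XML.
    """
    if not -1.0 <= brightness <= 1.0:
        raise ValueError("brightness must be between -1.0 and 1.0")
    if brightness == 0:
        return None  # plain theme color: no tint or shade attribute
    attr = "w:themeTint" if brightness > 0 else "w:themeShade"
    value = round(255 * (1 - abs(brightness)))
    return attr, f"{value:02X}"


print(brightness_to_color_attrs(0.4))    # ('w:themeTint', '99')
print(brightness_to_color_attrs(-0.25))  # ('w:themeShade', 'BF')
```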
Schema excerpt
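```xml
<xsd:complexType name="CT_RPr">
  <xsd:sequence>
    <xsd:choice minOccurs="0" maxOccurs="unbounded">
      <xsd:element name="rStyle" type="CT_String"/>
      <xsd:element name="rFonts" type="CT_Fonts"/>
      <xsd:element name="b" type="CT_OnOff"/>
      <xsd:element name="bCs" type="CT_OnOff"/>
      <xsd:element name="i" type="CT_OnOff"/>
      <xsd:element name="iCs" type="CT_OnOff"/>
      <xsd:element name="caps" type="CT_OnOff"/>
      <xsd:element name="smallCaps" type="CT_OnOff"/>
      <xsd:element name="strike" type="CT_OnOff"/>
      <xsd:element name="dstrike" type="CT_OnOff"/>
      <xsd:element name="outline" type="CT_OnOff"/>
      <xsd:element name="shadow" type="CT_OnOff"/>
      <xsd:element name="emboss" type="CT_OnOff"/>
      <xsd:element name="imprint" type="CT_OnOff"/>
      <xsd:element name="noProof" type="CT_OnOff"/>
      <xsd:element name="snapToGrid" type="CT_OnOff"/>
      <xsd:element name="vanish" type="CT_OnOff"/>
      <xsd:element name="webHidden" type="CT_OnOff"/>
      <xsd:element name="color" type="CT_Color"/>
      <xsd:element name="spacing" type="CT_SignedTwipsMeasure"/>
      <xsd:element name="w" type="CT_TextScale"/>
      <xsd:element name="kern" type="CT_HpsMeasure"/>
      <xsd:element name="position" type="CT_SignedHpsMeasure"/>
      <xsd:element name="sz" type="CT_HpsMeasure"/>
      <xsd:element name="szCs" type="CT_HpsMeasure"/>
      <xsd:element name="highlight" type="CT_Highlight"/>
      <xsd:element name="u" type="CT_Underline"/>
      <xsd:element name="effect" type="CT_TextEffect"/>
      <xsd:element name="bdr" type="CT_Border"/>
      <xsd:element name="shd" type="CT_Shd"/>
      <xsd:element name="fitText" type="CT_FitText"/>
      <xsd:element name="vertAlign" type="CT_VerticalAlignRun"/>
      <xsd:element name="rtl" type="CT_OnOff"/>
      <xsd:element name="cs" type="CT_OnOff"/>
      <xsd:element name="em" type="CT_Em"/>
      <xsd:element name="lang" type="CT_Language"/>
      <xsd:element name="eastAsianLayout" type="CT_EastAsianLayout"/>
      <xsd:element name="specVanish" type="CT_OnOff"/>
      <xsd:element name="oMath" type="CT_OnOff"/>
    </xsd:choice>
    <xsd:element name="rPrChange" type="CT_RPrChange" minOccurs="0"/>
  </xsd:sequence>
</xsd:complexType>

<xsd:complexType name="CT_Color">
  <xsd:attribute name="val" type="ST_HexColor" use="required"/>
  <xsd:attribute name="themeColor" type="ST_ThemeColor"/>
  <xsd:attribute name="themeTint" type="ST_UcharHexNumber"/>
  <xsd:attribute name="themeShade" type="ST_UcharHexNumber"/>
</xsd:complexType>

<xsd:simpleType name="ST_HexColor">
  <xsd:union memberTypes="ST_HexColorAuto s:ST_HexColorRGB"/>
</xsd:simpleType>

<xsd:simpleType name="ST_HexColorAuto">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="auto"/>
  </xsd:restriction>
</xsd:simpleType>

<xsd:simpleType name="ST_HexColorRGB">
  <xsd:restriction base="xsd:hexBinary">
    <xsd:length value="3" fixed="true"/>
  </xsd:restriction>
</xsd:simpleType>

<xsd:simpleType name="ST_ThemeColor">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="dark1"/>
    <xsd:enumeration value="light1"/>
    <xsd:enumeration value="dark2"/>
    <xsd:enumeration value="light2"/>
    <xsd:enumeration value="accent1"/>
    <xsd:enumeration value="accent2"/>
    <xsd:enumeration value="accent3"/>
    <xsd:enumeration value="accent4"/>
    <xsd:enumeration value="accent5"/>
    <xsd:enumeration value="accent6"/>
    <xsd:enumeration value="hyperlink"/>
    <xsd:enumeration value="followedHyperlink"/>
    <xsd:enumeration value="none"/>
    <xsd:enumeration value="background1"/>
    <xsd:enumeration value="text1"/>
    <xsd:enumeration value="background2"/>
    <xsd:enumeration value="text2"/>
  </xsd:restriction>
</xsd:simpleType>

<xsd:simpleType name="ST_UcharHexNumber">
  <xsd:restriction base="xsd:hexBinary">
    <xsd:length value="1"/>
  </xsd:restriction>
</xsd:simpleType>
```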