word-document

Create, read, and edit Microsoft Word (.docx) documents: generate reports, templates, and letters with the docx library, and extract or convert text with Pandoc.

MIT

Word Document (.docx) Processing

Create, read, and manipulate Microsoft Word (.docx) documents.

When to Use

  • Generate Word reports (weekly status, audit summaries) from structured data.
  • Produce standardized memos/letters/templates with consistent formatting.
  • Insert images, headers/footers, page numbers, page breaks, and table of contents.
  • Extract text from existing .docx files for indexing, review, or conversion to Markdown.
  • Create invoices, contracts, or agreements from templates.

Key Features

  • Create .docx documents using the docx (docx-js) Node.js library.
  • Read/extract .docx content via Pandoc conversion.
  • Explicit page setup (US Letter vs A4), margins, and landscape handling.
  • Style control (default font, overriding built-in Heading styles for TOC compatibility).
  • Proper list generation (bullets/numbering via numbering config).
  • Robust table rendering (DXA widths, column widths + cell widths, shading, borders).
  • Images with required metadata and transformations.
  • Page breaks, headers/footers, and page numbering.
  • Table of contents generation based on Word heading levels.

Dependencies

  • Node.js: >= 18
  • docx: ^9.0.0 (npm i docx)
  • pandoc: >= 2.0 (CLI tool for extracting/converting .docx content)
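
The dependency list above can be checked before running anything else. The following is a minimal sketch, not part of the skill itself: the helper names (nodeAtLeast, hasPackage, pandocVersion, checkEnvironment) are hypothetical, and it assumes pandoc is expected on PATH.

```javascript
const { spawnSync } = require("child_process");

// True when the Node.js major version is at least `min` (18 per the list above).
function nodeAtLeast(min, version = process.versions.node) {
  return Number(version.split(".")[0]) >= min;
}

// True when require.resolve can locate the package from the current directory.
function hasPackage(name) {
  try {
    require.resolve(name);
    return true;
  } catch {
    return false;
  }
}

// First line of `pandoc --version`, or null when pandoc cannot be run.
function pandocVersion() {
  const result = spawnSync("pandoc", ["--version"], { encoding: "utf8" });
  if (result.error || result.status !== 0) return null;
  return result.stdout.split("\n")[0].trim();
}

// Collects all three checks into one report object.
function checkEnvironment() {
  return {
    node: nodeAtLeast(18),
    docx: hasPackage("docx"),
    pandoc: pandocVersion(),
  };
}

console.log(checkEnvironment());
```

A false or null entry in the printed report points at the dependency to install. The sketch only reports the pandoc version string; comparing it against 2.0 is left out.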

Usage

1) Create a .docx (Node.js)

Install

npm i docx

create-doc.js

const fs = require("fs");
const {
  Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell,
  ImageRun, Header, Footer, AlignmentType, BorderStyle, WidthType,
  ShadingType, PageNumber, PageBreak, TableOfContents, HeadingLevel,
  LevelFormat,
} = require("docx");

const US_LETTER = { width: 12240, height: 15840 }; // DXA (1440 = 1 inch)
const MARGINS_1IN = { top: 1440, right: 1440, bottom: 1440, left: 1440 };
const CONTENT_WIDTH = US_LETTER.width - MARGINS_1IN.left - MARGINS_1IN.right; // 9360

const border = { style: BorderStyle.SINGLE, size: 1, color: "CCCCCC" };
const borders = { top: border, bottom: border, left: border, right: border };

const doc = new Document({
  styles: {
    default: {
      document: { run: { font: "Arial", size: 24 } }, // 12pt
    },
    paragraphStyles: [
      {
        id: "Heading1", name: "Heading 1", basedOn: "Normal", next: "Normal", quickFormat: true,
        run: { size: 32, bold: true, font: "Arial" },
        paragraph: { spacing: { before: 240, after: 240 }, outlineLevel: 0 },
      },
      {
        id: "Heading2", name: "Heading 2", basedOn: "Normal", next: "Normal", quickFormat: true,
        run: { size: 28, bold: true, font: "Arial" },
        paragraph: { spacing: { before: 180, after: 180 }, outlineLevel: 1 },
      },
    ],
  },
  numbering: {
    config: [
      {
        reference: "bullets", levels: [{
          level: 0, format: LevelFormat.BULLET, text: "•", alignment: AlignmentType.LEFT,
          style: { paragraph: { indent: { left: 720, hanging: 360 } } },
        }],
      },
      {
        reference: "numbers", levels: [{
          level: 0, format: LevelFormat.DECIMAL, text: "%1.", alignment: AlignmentType.LEFT,
          style: { paragraph: { indent: { left: 720, hanging: 360 } } },
        }],
      },
    ],
  },
  sections: [{
    properties: {
      page: { size: { width: US_LETTER.width, height: US_LETTER.height }, margin: MARGINS_1IN },
    },
    headers: {
      default: new Header({ children: [new Paragraph({ children: [new TextRun("Sample Header")] })] }),
    },
    footers: {
      default: new Footer({ children: [
        new Paragraph({ children: [new TextRun("Page "), new TextRun({ children: [PageNumber.CURRENT] })] }),
      ] }),
    },
    children: [
      new Paragraph({ heading: HeadingLevel.HEADING_1, children: [new TextRun("DOCX Creation Example")] }),
      new TableOfContents("Table of Contents", { hyperlink: true, headingStyleRange: "1-3" }),
      new Paragraph({ heading: HeadingLevel.HEADING_2, children: [new TextRun("Lists")] }),
      new Paragraph({ numbering: { reference: "bullets", level: 0 }, children: [new TextRun("Bullet item")] }),
      new Paragraph({ numbering: { reference: "numbers", level: 0 }, children: [new TextRun("Numbered item")] }),
      new Paragraph({ heading: HeadingLevel.HEADING_2, children: [new TextRun("Table")] }),
      new Table({
        width: { size: CONTENT_WIDTH, type: WidthType.DXA },
        columnWidths: [CONTENT_WIDTH / 2, CONTENT_WIDTH / 2],
        rows: [
          new TableRow({ children: [
            new TableCell({
              borders, width: { size: CONTENT_WIDTH / 2, type: WidthType.DXA },
              shading: { fill: "D5E8F0", type: ShadingType.CLEAR },
              margins: { top: 80, bottom: 80, left: 120, right: 120 },
              children: [new Paragraph("Left cell")],
            }),
            new TableCell({
              borders, width: { size: CONTENT_WIDTH / 2, type: WidthType.DXA },
              shading: { fill: "FFFFFF", type: ShadingType.CLEAR },
              margins: { top: 80, bottom: 80, left: 120, right: 120 },
              children: [new Paragraph("Right cell")],
            }),
          ] }),
        ],
      }),
      new Paragraph({ children: [new PageBreak()] }),
      new Paragraph({ heading: HeadingLevel.HEADING_2, children: [new TextRun("New Page Section")] }),
      new Paragraph({ children: [new TextRun("This is on a new page.")] }),
    ],
  }],
});

(async () => {
  const buffer = await Packer.toBuffer(doc);
  fs.writeFileSync("output.docx", buffer);
  console.log("Wrote output.docx");
})();

Run

node create-doc.js
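
Because a .docx file is a ZIP archive, one cheap sanity check after the script runs is to confirm that the output starts with the ZIP local file header signature (the bytes "PK", 0x03, 0x04). This is a rough sketch with hypothetical helper names (looksLikeZip, fileLooksLikeZip). It only confirms the container format and does not validate the XML parts inside.

```javascript
const fs = require("fs");

// ZIP local file header signature: "PK" followed by 0x03 0x04.
const ZIP_MAGIC = Buffer.from([0x50, 0x4b, 0x03, 0x04]);

// True when the buffer begins with the ZIP signature.
function looksLikeZip(buffer) {
  return buffer.length >= 4 && buffer.subarray(0, 4).equals(ZIP_MAGIC);
}

// Reads only the first four bytes of the file and checks them.
function fileLooksLikeZip(path) {
  const fd = fs.openSync(path, "r");
  try {
    const head = Buffer.alloc(4);
    const bytesRead = fs.readSync(fd, head, 0, 4, 0);
    return looksLikeZip(head.subarray(0, bytesRead));
  } finally {
    fs.closeSync(fd);
  }
}

// Example (assumes create-doc.js has already written output.docx):
// console.log(fileLooksLikeZip("output.docx"));
```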

2) Read/extract .docx text with Pandoc

pandoc document.docx -o output.md
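
The same conversion can be driven from Node.js when extraction is part of a larger script. The sketch below is one possible wrapper, not something the skill ships: buildPandocArgs and convertDocx are hypothetical names, and the defaults (GitHub-flavored Markdown output, no line wrapping) are assumptions. The flags themselves (-f, -t, -o, --wrap, --extract-media, --track-changes) are standard Pandoc options.

```javascript
const { execFileSync } = require("child_process");

// Builds the argument list for a docx -> text conversion.
// options.to: Pandoc output format, e.g. "gfm", "markdown", or "plain".
// options.mediaDir: directory for embedded images (adds --extract-media).
// options.trackChanges: "accept", "reject", or "all" (adds --track-changes).
function buildPandocArgs(input, output, options = {}) {
  const { to = "gfm", mediaDir, trackChanges } = options;
  const args = [input, "-f", "docx", "-t", to, "--wrap=none", "-o", output];
  if (mediaDir) args.push(`--extract-media=${mediaDir}`);
  if (trackChanges) args.push(`--track-changes=${trackChanges}`);
  return args;
}

// Runs pandoc; throws if pandoc is missing or exits with a non-zero status.
function convertDocx(input, output, options) {
  execFileSync("pandoc", buildPandocArgs(input, output, options), { stdio: "inherit" });
}

// Example: convertDocx("document.docx", "output.md", { mediaDir: "media" });
```

Passing arguments as an array to execFileSync avoids shell quoting problems with file names that contain spaces.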

3) Python alternative (python-docx)

pip install python-docx

from docx import Document

# Create
doc = Document()
doc.add_heading('Title', level=1)
doc.add_paragraph('Hello World')
doc.save('output.docx')

# Read
doc = Document('input.docx')
for para in doc.paragraphs:
    print(para.text)

Implementation Details

  • DOCX structure: .docx is a ZIP archive containing XML parts.
  • Page sizing (DXA): 1440 DXA = 1 inch. US Letter: 12240 x 15840 DXA. A4: 11906 x 16838 DXA.
  • Landscape: Provide portrait dimensions and set orientation: PageOrientation.LANDSCAPE.
  • Headings and TOC: TOC requires HeadingLevel.*. Override styles with exact IDs: "Heading1", "Heading2". Include outlineLevel.
  • Paragraphs: Do not embed \n in runs — create separate Paragraph elements.
  • Lists: Use numbering.config with LevelFormat.BULLET / LevelFormat.DECIMAL. Same reference continues numbering; different reference restarts.
  • Tables: Always use WidthType.DXA. Set table width and columnWidths (sum must equal table width). Set each cell width to match its column width.
  • Images: ImageRun requires valid type and altText fields.
  • Page breaks: PageBreak must be inside a Paragraph (or use pageBreakBefore: true).

Troubleshooting

| Problem | Fix |
|---------|-----|
| Cannot find module 'docx' | Run npm i docx in working directory |
| pandoc: command not found | Install Pandoc: brew install pandoc (macOS) or sudo apt-get install pandoc (Debian/Ubuntu) |
| TOC empty in Word | Open doc in Word and press Ctrl+A then F9 to update fields |
| Table widths broken in Google Docs | Use WidthType.DXA instead of percentages |
| python-docx not found | pip install python-docx |

Related Skills

google-workspace

Gmail, Calendar, Drive, Contacts, Sheets, and Docs integration: gws CLI + OAuth2

linear

Linear issue/project/team management: GraphQL API, API key authentication, works with curl alone

maps

nano-pdf

Edit PDFs with natural-language commands: fix text, correct typos, update titles