PDF Parser Benchmark - Hard Mode

Testing reading order, column handling, numeric extraction, table fidelity, and watermark filtering

6 documents, 17 quality tests

Overall Scorecard

Parser      Avg Speed            Avg Quality Score  Tests Run
PyMuPDF     207 ms (Good)        86.6% (Good)       17 tests
pdfplumber  383 ms (Good)        85.9% (Good)       17 tests
pypdf       95 ms (Excellent)    92.4% (Good)       17 tests
LiteParse   3187 ms (Very Slow)  84.3% (Fair)       17 tests
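The Avg Quality Score column appears to be the unweighted mean of the 17 per-test percentages listed in the per-document results. The sketch below recomputes it from those pass counts. The 5-point deduction per flagged item (the `!2` and `!4` markers in the watermark test) is an assumption: the report does not state it, but it happens to reproduce the published averages. The names `SCORES`, `FLAGS`, `PENALTY_PER_FLAG`, and `average_quality` are illustrative only.

```python
# (passed, total) pairs transcribed from the per-document tables, in document order.
SCORES = {
    "PyMuPDF": [(6, 6), (13, 13), (3, 3), (7, 7), (0, 5), (18, 18), (7, 7),
                (3, 3), (9, 9), (6, 6), (25, 25), (6, 7), (6, 6),
                (6, 7), (2, 2), (4, 4), (0, 1)],
    "pdfplumber": [(4, 6), (12, 13), (3, 3), (5, 5), (5, 5), (18, 18), (7, 7),
                   (3, 3), (9, 9), (6, 6), (10, 25), (6, 7), (6, 6),
                   (6, 7), (2, 2), (4, 4), (0, 1)],
    "pypdf": [(6, 6), (13, 13), (3, 3), (7, 7), (5, 5), (18, 18), (7, 7),
              (3, 3), (9, 9), (6, 6), (25, 25), (6, 7), (6, 6),
              (6, 7), (2, 2), (4, 4), (0, 1)],
    "LiteParse": [(6, 6), (13, 13), (2, 3), (7, 7), (5, 5), (18, 18), (7, 7),
                  (3, 3), (9, 9), (6, 6), (10, 25), (6, 7), (6, 6),
                  (6, 7), (2, 2), (3, 4), (0, 1)],
}

# Flag counts shown as "!2" and "!4" in the watermark separation column.
FLAGS = {"pdfplumber": 2, "LiteParse": 4}

# ASSUMPTION: percentage points deducted per flagged item (not stated in the report).
PENALTY_PER_FLAG = 5.0


def average_quality(parser):
    """Mean of per-test percentages, minus the assumed flag penalty, to one decimal."""
    tests = SCORES[parser]
    total = sum(100.0 * passed / count for passed, count in tests)
    total -= PENALTY_PER_FLAG * FLAGS.get(parser, 0)
    return round(total / len(tests), 1)


for name in SCORES:
    print(f"{name:<11}{average_quality(name):5.1f}%")
```

Without the assumed penalty, pdfplumber and LiteParse would average about 86.5% and 85.5%, so the displayed per-test percentages alone do not account for their scorecard values.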

[Chart: Parse Speed by Document (ms, log scale)]

[Chart: Quality Score by Document (%)]

Detailed Results by Document

Score colors in the original report: green >=90%, yellow >=70%, red <70%.

Watermarked Financial Projections

hard_watermark_financial.pdf - 9 KB

Parser      Speed    Words  Watermark vs Content Separation  Financial Numbers Intact  Sensitivity Table Rows  Reading Order (narrative flow)
PyMuPDF     56 ms    288    6/6 (100.0%)                     13/13 (100.0%)            3/3 (100.0%)            7/7 (100.0%)
pdfplumber  95 ms    295    4/6 !2 (66.7%)                   12/13 (92.3%)             3/3 (100.0%)            5/5 (100.0%)
pypdf       9 ms     288    6/6 (100.0%)                     13/13 (100.0%)            3/3 (100.0%)            7/7 (100.0%)
LiteParse   1988 ms  287    6/6 !4 (100.0%)                  13/13 (100.0%)            2/3 (66.7%)             7/7 (100.0%)

Clinical Trial Table

hard_clinical_table.pdf - 45 KB

Parser      Speed    Words  Table Row Integrity (values on same line)  All Statistical Values Present  Section Order
PyMuPDF     12 ms    236    0/5 (0.0%)                                 18/18 (100.0%)                  7/7 (100.0%)
pdfplumber  126 ms   236    5/5 (100.0%)                               18/18 (100.0%)                  7/7 (100.0%)
pypdf       19 ms    236    5/5 (100.0%)                               18/18 (100.0%)                  7/7 (100.0%)
LiteParse   1550 ms  236    5/5 (100.0%)                               18/18 (100.0%)                  7/7 (100.0%)

Tax Compliance Report

hard_compliance.pdf - 9 KB

Parser      Speed    Words  Treaty Table Rows  Formula & Calculation Values  Legal Citations Intact
PyMuPDF     12 ms    272    3/3 (100.0%)       9/9 (100.0%)                  6/6 (100.0%)
pdfplumber  49 ms    272    3/3 (100.0%)       9/9 (100.0%)                  6/6 (100.0%)
pypdf       7 ms     272    3/3 (100.0%)       9/9 (100.0%)                  6/6 (100.0%)
LiteParse   1762 ms  272    3/3 (100.0%)       9/9 (100.0%)                  6/6 (100.0%)

Two-Column Insurance Policy

hard_insurance.pdf - 18 KB

Parser      Speed    Words  Column Reading Order  Defined Terms Present  Footer Data Intact
PyMuPDF     15 ms    459    25/25 (100.0%)        6/7 (85.7%)            6/6 (100.0%)
pdfplumber  88 ms    459    10/25 (40.0%)         6/7 (85.7%)            6/6 (100.0%)
pypdf       14 ms    459    25/25 (100.0%)        6/7 (85.7%)            6/6 (100.0%)
LiteParse   1763 ms  459    10/25 (40.0%)         6/7 (85.7%)            6/6 (100.0%)

Fed Financial Stability Report (real)

hard_fed_report.pdf - 4880 KB

Parser      Speed    Words  Key Content Present  Reading Order
PyMuPDF     810 ms   1,654  6/7 (85.7%)          2/2 (100.0%)
pdfplumber  833 ms   1,331  6/7 (85.7%)          2/2 (100.0%)
pypdf       209 ms   1,295  6/7 (85.7%)          2/2 (100.0%)
LiteParse   4684 ms  1,662  6/7 (85.7%)          2/2 (100.0%)

Census Bureau Operational Plan (real)

hard_census.pdf - 3542 KB

Parser      Speed    Words  Key Content Present  Reading Order
PyMuPDF     336 ms   9,503  4/4 (100.0%)         0/1 (0.0%)
pdfplumber  1107 ms  2,062  4/4 (100.0%)         0/1 (0.0%)
pypdf       312 ms   2,127  4/4 (100.0%)         0/1 (0.0%)
LiteParse   7374 ms  8,776  3/4 (75.0%)          0/1 (0.0%)

Text Output Comparison

Watermarked Financial Projections - Text Output Preview

PyMuPDF

DRAFT · CONFIDENTIAL · DO NOT DIS
DRAFT · CONFIDENTIAL · DO NOT DIS
DRAFT · CONFIDENTIAL · DO NOT DIS
PRIVILEGED & CONFIDENTIAL · ATTORNEY WORK PRODUCT
Merger Agreement · Exhibit A: Financial Projections
Prepared by: Morrison & Foerster LLP  |  Date: November 15, 2025
PROJECTED CASH FLOWS (USD '000s)
                        FY2025A      FY2026E      FY2027E      FY2028E      FY2029E
Revenue       

pdfplumber

PRIVILEGED & CONFIDENTIAL · ATTORNEY WORK PRODUCT
Merger Agreement · Exhibit A: Financial Projections
PreDparedR by: MAorrisoFn & FToerst er· LL PC | DatOe: NovNembeFr 15, 2I0D25 ENTIAL · DO NOT DISTRIBUTE
PROJECTED CASH FLOWS (USD '000s)
FY2025A FY2026E FY2027E FY2028E FY2029E
Revenue $42,150 $51,823 $63,742 $78,403 $96,436
YoY Growth · 22.9% 23.0% 23.0% 23.0%
COGS ($16,860) ($19,693) ($22,897) (

pypdf

DRAFT · CONFIDENTIAL · DO NOT DISTRIBUTE
DRAFT · CONFIDENTIAL · DO NOT DISTRIBUTE
DRAFT · CONFIDENTIAL · DO NOT DISTRIBUTE
PRIVILEGED & CONFIDENTIAL · ATTORNEY WORK PRODUCT
Merger Agreement · Exhibit A: Financial Projections
Prepared by: Morrison & Foerster LLP  |  Date: November 15, 2025
PROJECTED CASH FLOWS (USD '000s)
                        FY2025A      FY2026E      FY2027E      FY2028E      F

LiteParse

PRIVILEGED & CONFIDENTIAL · ATTORNEY WORK PRODUCT
Merger Agreement · Exhibit A: Financial Projections
   DRAFT ·
Prepared by: Morrison & Foerster CONFIDENTIAL
     LLP
 PROJECTED CASH FLOWS (USD '000s) | Date: November 15, 2025        · DO NOT DIS

                         FY2025A      FY2026E      FY2027E      FY2028E      FY2029E
 Revenue                 $42,150      $51,823      $63,742      $7

Clinical Trial Table - Text Output Preview

PyMuPDF

CLINICAL TRIAL RESULTS · PHASE III RANDOMIZED CONTROLLED STUDY
Protocol ID: NCT-2025-0847  |  Sponsor: AcmePharma Inc.  |  IRB Approval: #2025-0392
Endpoint
Drug (n=485)
Placebo (n=482)
Delta
95% CI
p-value
NNT
PRIMARY ENDPOINTS
  Overall Response Rate
68.2%
31.4%
+36.8%
30.1-43.5
<0.001*
3
  Complete Response
22.1%
5.0%
+17.1%
12.4-21.8
<0.001*
6
  Median PFS (months)
14.2
6.8
+7.4
5.9-8.9
<0.001

pdfplumber

CLINICAL TRIAL RESULTS · PHASE III RANDOMIZED CONTROLLED STUDY
Protocol ID: NCT-2025-0847 | Sponsor: AcmePharma Inc. | IRB Approval: #2025-0392
Endpoint Drug (n=485) Placebo (n=482) Delta 95% CI p-value NNT
PRIMARY ENDPOINTS
Overall Response Rate 68.2% 31.4% +36.8% 30.1-43.5 <0.001* 3
Complete Response 22.1% 5.0% +17.1% 12.4-21.8 <0.001* 6
Median PFS (months) 14.2 6.8 +7.4 5.9-8.9 <0.001* -
Median

pypdf

CLINICAL TRIAL RESULTS · PHASE III RANDOMIZED CONTROLLED STUDY
Protocol ID: NCT-2025-0847  |  Sponsor: AcmePharma Inc.  |  IRB Approval: #2025-0392
Endpoint Drug (n=485) Placebo (n=482) Delta 95% CI p-value NNT
PRIMARY ENDPOINTS
  Overall Response Rate 68.2% 31.4% +36.8% 30.1-43.5 <0.001* 3
  Complete Response 22.1% 5.0% +17.1% 12.4-21.8 <0.001* 6
  Median PFS (months) 14.2 6.8 +7.4 5.9-8.9 <0.001

LiteParse

CLINICAL TRIAL RESULTS · PHASE III RANDOMIZED CONTROLLED STUDY
Protocol ID: NCT-2025-0847 | Sponsor: AcmePharma Inc. | IRB Approval: #2025-0392


      Endpoint             Drug (n=485)  Placebo (n=482)  Delta     95% CI   p-value   NNT

 PRIMARY ENDPOINTS
  Overall Response Rate       68.2%           31.4%       +36.8%  30.1-43.5  <0.001*    3

  Complete Response           22.1%            5.0% 

Tax Compliance Report - Text Output Preview

PyMuPDF

INTERNATIONAL COMPLIANCE REPORT
Multi-Jurisdictional Tax Treaty Analysis · Cross-Border Transactions
1. TREATY OVERVIEW
The following tax treaties are analyzed under the OECD Model Tax Convention (2024 Update):
  Treaty              Withholding Rates              Effective Date    PE Threshold
  US-Germany (DTAA)   Div: 5%/15%  Int: 0%  Roy: 0%   01-Jan-2007      183 days
  US-Japan (DTAA)     Div

pdfplumber

INTERNATIONAL COMPLIANCE REPORT
Multi-Jurisdictional Tax Treaty Analysis · Cross-Border Transactions
1. TREATY OVERVIEW
The following tax treaties are analyzed under the OECD Model Tax Convention (2024 Update):
Treaty Withholding Rates Effective Date PE Threshold
US-Germany (DTAA) Div: 5%/15% Int: 0% Roy: 0% 01-Jan-2007 183 days
US-Japan (DTAA) Div: 5%/10% Int: 10% Roy: 0% 01-Jan-2004 183 days
US-

pypdf

INTERNATIONAL COMPLIANCE REPORT
Multi-Jurisdictional Tax Treaty Analysis · Cross-Border Transactions
1. TREATY OVERVIEW
The following tax treaties are analyzed under the OECD Model Tax Convention (2024 Update):
  Treaty              Withholding Rates              Effective Date    PE Threshold
  US-Germany (DTAA)   Div: 5%/15%  Int: 0%  Roy: 0%   01-Jan-2007      183 days
  US-Japan (DTAA)     Div

LiteParse

 INTERNATIONAL COMPLIANCE REPORT
 Multi-Jurisdictional Tax Treaty Analysis · Cross-Border Transactions

1. TREATY OVERVIEW

The following tax treaties are analyzed under the OECD Model Tax Convention (2024 Update):

  Treaty              Withholding Rates              Effective Date    PE Threshold
  US-Germany (DTAA)   Div: 5%/15%  Int: 0%  Roy: 0%   01-Jan-2007      183 days
  US-Japan (DTAA)   

Two-Column Insurance Policy - Text Output Preview

PyMuPDF

COMMERCIAL GENERAL LIABILITY COVERAGE FORM
Policy No. CGL-2025-048721  |  Effective: 01/01/2026 to 01/01/2027
SECTION I -- COVERAGES
COVERAGE A -- BODILY INJURY AND
PROPERTY DAMAGE LIABILITY
1. Insuring Agreement
  a. We will pay those sums that the
  insured becomes legally obligated
  to pay as damages because of
  "bodily injury" or "property damage"
  to which this insurance applies. We
  will

pdfplumber

COMMERCIAL GENERAL LIABILITY COVERAGE FORM
Policy No. CGL-2025-048721 | Effective: 01/01/2026 to 01/01/2027
SECTION I -- COVERAGES b. Contractual Liability
"Bodily injury" or "property damage"
COVERAGE A -- BODILY INJURY AND for which the insured is obligated
PROPERTY DAMAGE LIABILITY to pay damages by reason of the
assumption of liability in a
1. Insuring Agreement contract or agreement. This
a. 

pypdf

COMMERCIAL GENERAL LIABILITY COVERAGE FORM
Policy No. CGL-2025-048721  |  Effective: 01/01/2026 to 01/01/2027
SECTION I -- COVERAGES
COVERAGE A -- BODILY INJURY AND
PROPERTY DAMAGE LIABILITY
1. Insuring Agreement
  a. We will pay those sums that the
  insured becomes legally obligated
  to pay as damages because of
  "bodily injury" or "property damage"
  to which this insurance applies. We
  will

LiteParse

    COMMERCIAL GENERAL LIABILITY COVERAGE FORM
    Policy No. CGL-2025-048721 | Effective: 01/01/2026 to 01/01/2027

SECTION I -- COVERAGES   b. Contractual Liability
          "Bodily injury" or "property damage"
COVERAGE A -- BODILY INJURY AND   for which the insured is obligated
PROPERTY DAMAGE LIABILITY   to pay damages by reason of the
          assumption of liability in a
1. Insuring Agreem

Fed Financial Stability Report (real) - Text Output Preview

PyMuPDF

Financial Stability Report
November 2024
BOARD OF GOVERNORS OF THE FEDERAL RESERVE SYSTEM

The Federal Reserve System is the central
bank of the United States. It performs five key
functions to promote the effective operation
of the U.S. economy and, more generally, the
public interest.
The Federal Reserve
■conducts the nation’s monetary policy to promote maximum employment
and stable prices in th

pdfplumber

Financial Stability Report
November 2024
BOARD OF GOVERNORS OF THE FEDERAL RESERVE SYSTEM
The Federal Reserve System is the central
bank of the United States. It performs five key
functions to promote the effective operation
of the U.S. economy and, more generally, the
public interest.
The Federal Reserve
■ conducts the nation’s monetary policyto promote maximum employment
and stable prices in the

pypdf

Financial Stability Report
November 2024
BOARD OF GO VERNORS OF THE FEDERAL RESER VE SYST EM
The Federal Reserve System is the centralbank of the United States. It performs five keyfunctions to promote the effective operationof the U.S. economy and, more generally, thepublic interest.The Federal Reserve■conducts the nation’s monetary policyto promote maximum employmentand stable prices in the U.S.

LiteParse

Financial Stability Report










|
|


59.102    A Il I
    BT 9
    C     3102

    41M


November 2024






BOARD OF GO VERNORS OF THE FEDERAL RESERVE SYST EM
                     The Federal Reserve System is the central
                     bank of the United States. It performs five key
                     functions to promote the effective operation
                     of the U.S. eco

Census Bureau Operational Plan (real) - Text Output Preview

PyMuPDF

Issued December 2018
Version 4.0 
A New Design for the 21st Century
2020 Census Operational Plan

Note to Reader: 
Please note that the 2020 Census Operational Plan v4.0 reflects the operational design for 
the 2020 Census as of October 31, 2018, unless noted otherwise. 

U.S. Census Bureau 	
 2020 Census Operational Plan—Version 4.0  i
TABLE OF CONTENTS
1.	 Introduction  .  .  .  .  .  .  .  .  .

pdfplumber

2020 Census Operational Plan
A New Design for the 21st Century
Issued December 2018
Version 4.0
Note to Reader:
Please note that the 2020 Census Operational Plan v4.0 reflects the operational design for
the 2020 Census as of October 31, 2018, unless noted otherwise.
TABLE OF CONTENTS
1. Introduction........................................................................... 1
1.1 Purpose ..........

pypdf

Issued December 2018
Version 4.0 
A New Design for the 21st Century
2020 Census Operational Plan
Note to Reader:  
Please note that the 2020 Census Operational Plan v4.0 reflects the operational design for 
the 2020 Census as of October 31, 2018, unless noted otherwise. 
U.S. Census Bureau   2020 Census Operational Plan—Version 4.0  i
TABLE OF CONTENTS
1. Introduction .............................

LiteParse

 2020 Census Operational Plan
 A New Design for the 21st Century


  Issued December 2018
  Version 4.0
      -     )         4  y

      ( A,            y

                      in
                      y y
      4
|

                          1
                          |



  q |
                              i

                      \    |   8
                      4       1









         

Test Methodology

Numeric Extraction: Checks if specific values (dollar amounts, percentages, citations) appear verbatim in extracted text.
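A verbatim check of this kind can be sketched as a plain substring test. This is a minimal illustration, not the benchmark's actual code: the function name `numeric_extraction_score` is hypothetical, and `$99,999` is a made-up value included only to show a miss.

```python
def numeric_extraction_score(text, expected_values):
    """Count expected values that appear verbatim in the extracted text."""
    missing = [value for value in expected_values if value not in text]
    found = len(expected_values) - len(missing)
    return found, len(expected_values), missing


# Two lines taken from the pdfplumber preview of the watermarked document.
sample = (
    "Revenue $42,150 $51,823 $63,742 $78,403 $96,436\n"
    "YoY Growth · 22.9% 23.0% 23.0% 23.0%\n"
)

# "$99,999" is a hypothetical value that is not in the sample.
found, total, missing = numeric_extraction_score(
    sample, ["$42,150", "$96,436", "22.9%", "$99,999"]
)
print(f"{found}/{total} ({100 * found / total:.1f}%), missing: {missing}")
```

Because the match is verbatim, a parser that inserts a space inside a number or drops a thousands separator fails the check even when the digits are all present.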

Table Row Integrity: Verifies that values belonging to the same table row appear on the same line in the output.
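One way to express this check is to require that some single output line contains every value of a row. This is a sketch under that assumption; `row_integrity_score` is a hypothetical name, and the samples are shortened from the clinical trial previews.

```python
def row_integrity_score(text, rows):
    """Count rows whose values all appear together on one output line."""
    lines = text.splitlines()
    intact = sum(
        1 for row in rows
        if any(all(value in line for value in row) for line in lines)
    )
    return intact, len(rows)


rows = [
    ["Overall Response Rate", "68.2%", "31.4%", "+36.8%"],
    ["Complete Response", "22.1%", "5.0%", "+17.1%"],
]

# Shape of the pdfplumber / pypdf output: one table row per line.
row_per_line = (
    "Overall Response Rate 68.2% 31.4% +36.8% 30.1-43.5 <0.001* 3\n"
    "Complete Response 22.1% 5.0% +17.1% 12.4-21.8 <0.001* 6\n"
)

# Shape of the PyMuPDF output: one table cell per line.
cell_per_line = "\n".join(
    ["Overall Response Rate", "68.2%", "31.4%", "+36.8%",
     "Complete Response", "22.1%", "5.0%", "+17.1%"]
)

print(row_integrity_score(row_per_line, rows))
print(row_integrity_score(cell_per_line, rows))
```

The cell-per-line sample contains every value yet scores zero rows, which is the pattern in the clinical table results where PyMuPDF has all statistical values present but 0/5 on row integrity.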

Reading Order: Tests that phrases appear in the correct sequential order (important for multi-column and complex layouts).
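A sequential-order check can be sketched as a forward scan: each phrase must be found after the end of the previous match. The scoring rule here (count of phrases found in order) is an assumption, `reading_order_score` is a hypothetical name, and both sample strings are shortened illustrations based on the insurance policy previews.

```python
def reading_order_score(text, phrases):
    """Count phrases that occur in the text after the previously matched phrase."""
    cursor = 0
    in_order = 0
    for phrase in phrases:
        index = text.find(phrase, cursor)
        if index != -1:
            in_order += 1
            cursor = index + len(phrase)  # later phrases must start after this match
    return in_order, len(phrases)


phrases = [
    "SECTION I -- COVERAGES",
    "1. Insuring Agreement",
    "b. Contractual Liability",
]

# Left column read in full before the right column.
column_first = (
    "SECTION I -- COVERAGES\n"
    "1. Insuring Agreement\n"
    "b. Contractual Liability\n"
)

# Columns merged line by line, as in the pdfplumber preview.
interleaved = (
    "SECTION I -- COVERAGES b. Contractual Liability\n"
    "1. Insuring Agreement contract or agreement. This\n"
)

print(reading_order_score(column_first, phrases))
print(reading_order_score(interleaved, phrases))
```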

Column Separation: For two-column documents, checks whether the left column is read in full before the right column begins (as opposed to the two columns being interleaved line by line).
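A possible form of this check counts how many left-column phrases appear before the first right-column phrase. This is only a sketch: the report does not say how its 25 column checks are defined, `column_separation_score` is a hypothetical name, and the samples are shortened from the insurance policy previews.

```python
def column_separation_score(text, left_phrases, right_phrases):
    """Count left-column phrases that precede every right-column phrase."""
    right_positions = [text.find(p) for p in right_phrases]
    right_found = [pos for pos in right_positions if pos != -1]
    # If no right-column phrase is present, treat the end of the text as the boundary.
    first_right = min(right_found) if right_found else len(text)
    before = sum(
        1 for phrase in left_phrases
        if text.find(phrase) != -1 and text.find(phrase) < first_right
    )
    return before, len(left_phrases)


left = ["SECTION I -- COVERAGES", "PROPERTY DAMAGE LIABILITY", "1. Insuring Agreement"]
right = ["b. Contractual Liability", "assumption of liability in a"]

# Left column first, then right column.
column_first = (
    "SECTION I -- COVERAGES\n"
    "COVERAGE A -- BODILY INJURY AND\n"
    "PROPERTY DAMAGE LIABILITY\n"
    "1. Insuring Agreement\n"
    "b. Contractual Liability\n"
    "assumption of liability in a\n"
)

# Line-by-line merge of both columns, as in the pdfplumber preview.
interleaved = (
    "SECTION I -- COVERAGES b. Contractual Liability\n"
    "COVERAGE A -- BODILY INJURY AND for which the insured is obligated\n"
    "PROPERTY DAMAGE LIABILITY to pay damages by reason of the\n"
    "assumption of liability in a\n"
    "1. Insuring Agreement contract or agreement. This\n"
)

print(column_separation_score(column_first, left, right))
print(column_separation_score(interleaved, left, right))
```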

Watermark Separation: Checks if watermark text is cleanly separated from content text, with penalties for interleaving.
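The separation check and the interleaving count can be sketched together. This assumes that a line is "interleaved" when it still contains a watermark fragment after the complete watermark string has been removed. The function name, the `markers` tuple, and that rule are all assumptions, not the benchmark's actual definition. The samples come from the pypdf and pdfplumber previews.

```python
WATERMARK = "DRAFT · CONFIDENTIAL · DO NOT DISTRIBUTE"


def watermark_separation(text, content_phrases, markers=("DRAFT", "DO NOT DISTRIBUTE")):
    """Return (clean content phrases, total phrases, interleaved line count)."""
    clean = sum(1 for phrase in content_phrases if phrase in text)
    interleaved = 0
    for line in text.splitlines():
        # Drop complete watermark occurrences; leftover fragments signal mixing.
        remainder = line.replace(WATERMARK, "")
        if any(marker in remainder for marker in markers):
            interleaved += 1
    return clean, len(content_phrases), interleaved


content = ["ATTORNEY WORK PRODUCT", "Prepared by: Morrison & Foerster LLP"]

# Watermark on its own line, content untouched (pypdf preview).
separated = (
    "DRAFT · CONFIDENTIAL · DO NOT DISTRIBUTE\n"
    "PRIVILEGED & CONFIDENTIAL · ATTORNEY WORK PRODUCT\n"
    "Prepared by: Morrison & Foerster LLP  |  Date: November 15, 2025\n"
)

# Watermark characters mixed into a content line (pdfplumber preview).
mixed = (
    "PRIVILEGED & CONFIDENTIAL · ATTORNEY WORK PRODUCT\n"
    "PreDparedR by: MAorrisoFn & FToerst er· LL PC | DatOe: NovNembeFr 15, "
    "2I0D25 ENTIAL · DO NOT DISTRIBUTE\n"
)

print(watermark_separation(separated, content))
print(watermark_separation(mixed, content))
```

The markers deliberately exclude the word CONFIDENTIAL, since it also appears in the legitimate header line "PRIVILEGED & CONFIDENTIAL" and would otherwise be miscounted as a watermark fragment.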

Benchmark run on 2026-04-10T16:05:27-0700

macOS-26.3-arm64-arm-64bit-Mach-O | Python 3.13.5

pymupdf 1.27.2.2, pdfplumber 0.11.9, pypdf 6.9.2, liteparse 1.2.1
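The report does not describe how the Speed and Words columns were measured. The sketch below shows one plausible approach, assuming wall-clock timing of a full-document text extraction and a whitespace-based word count. The helper names are hypothetical. The three extractors use each library's basic text-extraction call with default settings. LiteParse is omitted because its API is not shown in the report.

```python
import time


def extract_pymupdf(path):
    import pymupdf  # imported lazily so the harness loads without every parser installed
    with pymupdf.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


def extract_pdfplumber(path):
    import pdfplumber
    with pdfplumber.open(path) as pdf:
        # extract_text() can return None for pages without text, hence the fallback
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_pypdf(path):
    from pypdf import PdfReader
    return "\n".join(page.extract_text() for page in PdfReader(path).pages)


def time_extraction(extract, path):
    """Run one extractor and report elapsed milliseconds and a whitespace word count."""
    start = time.perf_counter()
    text = extract(path)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return {"ms": round(elapsed_ms), "words": len(text.split())}


# Example call, assuming the benchmark file is available locally:
# print(time_extraction(extract_pypdf, "hard_insurance.pdf"))
```

A single timed run is noisy. Repeating each extraction and reporting the median would give steadier numbers, though the report does not say whether it did so.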