Splitting a PDF File Into Multiple PDFs Using the Java iText API

Previously I wrote a tutorial about how to merge two or more PDF files. This tutorial does the opposite: I will show how to split a multi-page PDF into several smaller PDFs using the Java iText API from Lowagie. You need the iText library to run this program; you can download it from www.lowagie.com/iText/

The program takes two inputs, both defined inside the main method.

  • The first parameter is the full path of the PDF file to be split.
  • The second parameter is the number of pages each split should have.

For example, if you have a 15-page PDF, you might want to split it into files of 4 pages each. There will be 4 splits in total: the first three will have 4 pages each, while the last will have 3 pages (4+4+4+3=15).
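
The arithmetic in this example can be sketched in a few lines of plain Java. This is only an illustration under my own assumptions: the class SplitMath and its two methods are hypothetical names, not part of iText or of the program below.

```java
// Hypothetical helper (not part of iText): works out how many output files
// a split produces and how many pages end up in the last one.
final class SplitMath {

    // Number of output files: totalPages / pagesPerSplit, rounded up.
    static int splitCount(int totalPages, int pagesPerSplit) {
        if (totalPages < 0 || pagesPerSplit <= 0) {
            throw new IllegalArgumentException(
                    "totalPages must be >= 0 and pagesPerSplit must be > 0");
        }
        return (totalPages + pagesPerSplit - 1) / pagesPerSplit;
    }

    // Pages in the final output file. Equals pagesPerSplit when the
    // division is exact, and 0 when there are no pages at all.
    static int lastSplitSize(int totalPages, int pagesPerSplit) {
        if (splitCount(totalPages, pagesPerSplit) == 0) {
            return 0;
        }
        int remainder = totalPages % pagesPerSplit;
        return remainder == 0 ? pagesPerSplit : remainder;
    }

    public static void main(String[] args) {
        System.out.println(splitCount(15, 4));    // prints 4
        System.out.println(lastSplitSize(15, 4)); // prints 3
    }
}
```

Rounding up with integer arithmetic avoids floating-point division, and the remainder check covers the case where the page count divides evenly.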

To summarize, the following program demonstrates:

  • Using the iText API to read a PDF file
  • How to find the total number of pages in the PDF
  • How to use the PdfCopy and PdfImportedPage features
  • The PDF splitting logic
  • How to trigger the PDF file writing using PdfCopy

Fully Compiled and Tested Source Code:

package com.kushal.pdf;

/**
 * @author Kushal Paudyal
 * www.sanjaal.com/java
 * Last Modified On: 2009-11-04
 *
 * PDFSplitter.java
 * Split any PDF file into multiple PDFs
 */
import java.io.FileOutputStream;

import com.lowagie.text.Document;
import com.lowagie.text.pdf.PdfCopy;
import com.lowagie.text.pdf.PdfImportedPage;
import com.lowagie.text.pdf.PdfReader;

public class PDFSplitter {

	public static void main(String[] args) {

		// Location of the input file that is to be split.
		String fileToSplit = "C:/temp/general/MyWebReport.pdf";

		// Number of pages in each split file, e.g. 4 pages per split.
		int splittedPageSize = 4;

		// Call the split method with the file name and page size as parameters.
		splitPDFFile(fileToSplit, splittedPageSize);

	}

	/**
	 * @param fileName : PDF file that has to be split
	 * @param splittedPageSize : Number of pages in each split file
	 */
	public static void splitPDFFile(String fileName, int splittedPageSize) {
		try {
			/**
			 * Read the input PDF file
			 */
			PdfReader reader = new PdfReader(fileName);
			System.out.println("Successfully read input file: " + fileName
					+ "\n");
			int totalPages = reader.getNumberOfPages();
			System.out.println("There are total " + totalPages
					+ " pages in this input file\n");
			int split = 0;

			/**
			 * Note: Page numbers start from 1 to n (not 0 to n-1)
			 */
			for (int pageNum = 1; pageNum <= totalPages; pageNum += splittedPageSize) {
				split++;
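				// Use lastIndexOf so that a ".pdf" appearing earlier in the
				// path (e.g. in a folder name) does not truncate the name.
				String outFile = fileName
						.substring(0, fileName.lastIndexOf(".pdf"))
						+ "-split-" + split + ".pdf";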
				Document document = new Document(reader
						.getPageSizeWithRotation(1));
				PdfCopy writer = new PdfCopy(document, new FileOutputStream(
						outFile));
				document.open();
				/**
				 * Each split contains up to splittedPageSize pages.
				 *
				 * E.g. we are splitting a 15-page PDF into 4 pages each.
				 * In this example, the last split will have only 3 pages (4+4+4+3=15).
				 *
				 * Note the loop condition below, which handles the case where the
				 * number of pages left for a split is less than splittedPageSize.
				 * This can only happen in the last split.
				 *
				 * offset < splittedPageSize && (pageNum + offset) <= totalPages
				 */
				int tempPageCount = 0;
				for (int offset = 0; offset < splittedPageSize
						&& (pageNum + offset) <= totalPages; offset++) {
					PdfImportedPage page = writer.getImportedPage(reader,
							pageNum + offset);
					writer.addPage(page);
					tempPageCount++;
				}

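				// In iText 2.x, closing the document also closes the PdfCopy
				// writer, which finishes writing the output file. The explicit
				// writer.close() below is kept as a safeguard.
				document.close();
				writer.close();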

				System.out.println("Split: [" + tempPageCount + " page]: "
						+ outFile);

			}

			// Release the input file once every split has been written.
			reader.close();

		} catch (Exception e) {
			e.printStackTrace();
		}
	}
/*
	 * SANJAAL CORPS MAKES NO REPRESENTATIONS OR WARRANTIES ABOUT THE SUITABILITY OF 
	 * THE SOFTWARE, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED 
	 * TO THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 
	 * PARTICULAR PURPOSE, OR NON-INFRINGEMENT. SANJAAL CORPS SHALL NOT BE LIABLE FOR 
	 * ANY DAMAGES SUFFERED BY LICENSEE AS A RESULT OF USING, MODIFYING OR 
	 * DISTRIBUTING THIS SOFTWARE OR ITS DERIVATIVES. 
	 * 
	 * THIS SOFTWARE IS NOT DESIGNED OR INTENDED FOR USE OR RESALE AS ON-LINE 
	 * CONTROL EQUIPMENT IN HAZARDOUS ENVIRONMENTS REQUIRING FAIL-SAFE 
	 * PERFORMANCE, SUCH AS IN THE OPERATION OF NUCLEAR FACILITIES, AIRCRAFT 
	 * NAVIGATION OR COMMUNICATION SYSTEMS, AIR TRAFFIC CONTROL, DIRECT LIFE 
	 * SUPPORT MACHINES, OR WEAPONS SYSTEMS, IN WHICH THE FAILURE OF THE 
	 * SOFTWARE COULD LEAD DIRECTLY TO DEATH, PERSONAL INJURY, OR SEVERE 
	 * PHYSICAL OR ENVIRONMENTAL DAMAGE ("HIGH RISK ACTIVITIES"). SANJAAL CORPS 
	 * SPECIFICALLY DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY OF FITNESS FOR 
	 * HIGH RISK ACTIVITIES. 
	 */
}
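
The nested loops in splitPDFFile decide which pages go into which output file. A minimal sketch of just that logic, without any iText calls, might look like the following. PageRanges and plan are hypothetical names that I am introducing for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of iText): computes the page range that
// each output file would receive, without touching any PDF.
final class PageRanges {

    // Returns one {firstPage, lastPage} pair per output file.
    // Page numbers are 1-based, matching PdfReader's page numbering.
    static List<int[]> plan(int totalPages, int pagesPerSplit) {
        if (pagesPerSplit <= 0) {
            throw new IllegalArgumentException("pagesPerSplit must be > 0");
        }
        List<int[]> ranges = new ArrayList<>();
        for (int first = 1; first <= totalPages; first += pagesPerSplit) {
            // The last range is cut short when fewer pages remain.
            int last = Math.min(first + pagesPerSplit - 1, totalPages);
            ranges.add(new int[] {first, last});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Prints 1-4, 5-8, 9-12 and 13-15, one range per line.
        for (int[] range : plan(15, 4)) {
            System.out.println(range[0] + "-" + range[1]);
        }
    }
}
```

Working out the ranges separately makes the boundary case (a short final split) easy to check before any file is written.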

Output of this program:

Successfully read input file: C:/temp/general/MyWebReport.pdf

There are total 15 pages in this input file

Split: [4 page]: C:/temp/general/MyWebReport-split-1.pdf
Split: [4 page]: C:/temp/general/MyWebReport-split-2.pdf
Split: [4 page]: C:/temp/general/MyWebReport-split-3.pdf
Split: [3 page]: C:/temp/general/MyWebReport-split-4.pdf
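
The output names above are built by cutting the input path at ".pdf" and appending "-split-N.pdf". That relies on the path containing a lowercase ".pdf"; a path ending in ".PDF" would make the substring call fail. A slightly more defensive variant could look like this sketch, where SplitNames and outputName are hypothetical names of my own and not part of iText.

```java
// Hypothetical helper (not part of iText): builds the output file name
// for a given split number.
final class SplitNames {

    static String outputName(String inputPath, int splitNumber) {
        // Strip a trailing ".pdf" in any letter case. regionMatches
        // returns false when the path is shorter than 4 characters.
        boolean hasPdfExtension = inputPath.regionMatches(
                true, inputPath.length() - 4, ".pdf", 0, 4);
        String base = hasPdfExtension
                ? inputPath.substring(0, inputPath.length() - 4)
                : inputPath;
        return base + "-split-" + splitNumber + ".pdf";
    }

    public static void main(String[] args) {
        // prints C:/temp/general/MyWebReport-split-4.pdf
        System.out.println(outputName("C:/temp/general/MyWebReport.pdf", 4));
        // prints report-split-1.pdf
        System.out.println(outputName("report.PDF", 1));
    }
}
```

Only the extension at the very end of the path is removed, so a ".pdf" inside a folder name is left alone.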

[Screenshot: original and split PDFs in File Explorer]