
How To Read DOC file Using Java and Apache POI

Kushal Paudyal, December 31st, 2009

One of the visitors to my blog asked me to write about how to read a Word document file using Java. I wrote the following program to demonstrate how Apache POI can be used for this purpose.

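I used the following jar to write this program, together with the core POI jar from the same bundle (the core jar provides POIFSFileSystem and the HPSF summary classes). If you have downloaded Apache POI, you should find both jar files within the bundle.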

  • poi-scratchpad-3.2-FINAL-20081019.jar

The tutorial demonstrates the following features:

–How to read a simple Microsoft Word document file using Java and Apache POI (binary .doc only; .docx is not supported)
–This includes reading the total number of paragraphs and the paragraph content
–How to read the document headers
–How to read the document footers
–How to read the document summary
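HWPF, the part of POI used here, reads only the binary .doc format, so handing it a .docx typically fails with an exception rather than a friendly message. If you want to check a file up front, one option is to look at its first bytes: binary .doc files are OLE2 compound documents, while .docx files are ZIP packages. The following is a hand-rolled sketch using only the JDK; OfficeFormatSniffer, detect() and sniff() are hypothetical names, not part of the POI API. Note that a ZIP signature only shows the file is a ZIP archive, not that it is a valid .docx.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/**
 * Hypothetical helper (not a POI class): tells a binary .doc (OLE2 compound
 * file) apart from a .docx (ZIP package) by looking at the first bytes.
 */
final class OfficeFormatSniffer {

    enum Format { OLE2, ZIP, UNKNOWN }

    // Signature at the start of every OLE2 compound file (.doc, .xls, .ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    // "PK\3\4": local file header that starts a ZIP archive (.docx, .xlsx, ...).
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    /** Classifies a file from its leading bytes (8 are needed to recognise OLE2). */
    static Format detect(byte[] head) {
        if (head.length >= OLE2_MAGIC.length
                && Arrays.equals(Arrays.copyOf(head, OLE2_MAGIC.length), OLE2_MAGIC)) {
            return Format.OLE2;
        }
        if (head.length >= ZIP_MAGIC.length
                && Arrays.equals(Arrays.copyOf(head, ZIP_MAGIC.length), ZIP_MAGIC)) {
            return Format.ZIP;
        }
        return Format.UNKNOWN;
    }

    /** Reads up to 8 bytes from the stream (without closing it) and classifies them. */
    static Format sniff(InputStream in) throws IOException {
        byte[] head = new byte[OLE2_MAGIC.length];
        int read = 0;
        // read() may return fewer bytes than requested, so loop until full or EOF.
        while (read < head.length) {
            int n = in.read(head, read, head.length - read);
            if (n < 0) {
                break;
            }
            read += n;
        }
        return detect(Arrays.copyOf(head, read));
    }
}
```

Because sniff() consumes the bytes it reads, open a fresh FileInputStream for POI after the check.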

Apache POI is not robust yet. It still has a long way to go before it can handle complex document formats. Moreover, I found that classes move from one package to another between versions. So if you are using an older or newer version of POI and get compilation errors on the imports, try looking for the classes in other packages.

You can download the sample document that I read with the following program, as well as the source code for this application. You are free to use and distribute the code; it comes with no warranty at all. I will be honored if you link back to my blog as the source.

/**
 * @author Kushal Paudyal
 * www.sanjaal.com/java
 * Last Modified On: 03/23/2009
 */
package com.kushal.utils;

import org.apache.poi.poifs.filesystem.*;
import org.apache.poi.hpsf.DocumentSummaryInformation;
import org.apache.poi.hwpf.*;
import org.apache.poi.hwpf.extractor.*;
import org.apache.poi.hwpf.usermodel.HeaderStories;

import java.io.*;

public class ReadDocFileFromJava {

	public static void main(String[] args) {
		/**This is the document that you want to read using Java.
		 * A path passed on the command line takes precedence over the default.**/
		String fileName = (args.length > 0) ? args[0]
				: "C:\\Documents and Settings\\kushalp\\Desktop\\Test.doc";

		/**Method call to read the document (demonstrates some usage of POI)**/
		readMyDocument(fileName);

	}
	public static void readMyDocument(String fileName){
		InputStream in = null;
		try {
			in = new FileInputStream(fileName);
			POIFSFileSystem fs = new POIFSFileSystem(in);
			HWPFDocument doc = new HWPFDocument(fs);

			/** Read the content **/
			readParagraphs(doc);

			int pageNumber=1;

			/** We will try reading the header for page 1**/
			readHeader(doc, pageNumber);

			/** Let's try reading the footer for page 1**/
			readFooter(doc, pageNumber);

			/** Read the document summary**/
			readDocumentSummary(doc);

		} catch (Exception e) {
			e.printStackTrace();
		} finally {
			/** Always release the file handle, even if POI throws **/
			if (in != null) {
				try {
					in.close();
				} catch (IOException ignored) {
					// Nothing useful to do if closing fails
				}
			}
		}
	}

	public static void readParagraphs(HWPFDocument doc) throws Exception{
		WordExtractor we = new WordExtractor(doc);

		/**Get the total number of paragraphs**/
		String[] paragraphs = we.getParagraphText();
		System.out.println("Total Paragraphs: "+paragraphs.length);

		for (int i = 0; i < paragraphs.length; i++) {

			System.out.println("Length of paragraph "+(i +1)+": "+ paragraphs[i].length());
			System.out.println(paragraphs[i].toString());

		}

	}

	public static void readHeader(HWPFDocument doc, int pageNumber){
		HeaderStories headerStore = new HeaderStories( doc);
		String header = headerStore.getHeader(pageNumber);
		System.out.println("Header Is: "+header);

	}

	public static void readFooter(HWPFDocument doc, int pageNumber){
		HeaderStories headerStore = new HeaderStories( doc);
		String footer = headerStore.getFooter(pageNumber);
		System.out.println("Footer Is: "+footer);

	}

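	public static void readDocumentSummary(HWPFDocument doc) {
		DocumentSummaryInformation summaryInfo=doc.getDocumentSummaryInformation();

		/** Not every file carries a document summary stream; when it is missing
		 * the getter returns null, so stop here instead of throwing a NullPointerException.**/
		if (summaryInfo == null) {
			System.out.println("No document summary information found.");
			return;
		}

		String category = summaryInfo.getCategory();
		String company = summaryInfo.getCompany();
		int lineCount=summaryInfo.getLineCount();
		int sectionCount=summaryInfo.getSectionCount();
		int slideCount=summaryInfo.getSlideCount();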

		System.out.println("---------------------------");
		System.out.println("Category: "+category);
		System.out.println("Company: "+company);
		System.out.println("Line Count: "+lineCount);
		System.out.println("Section Count: "+sectionCount);
		System.out.println("Slide Count: "+slideCount);

	}

}
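A note on the paragraph output: in the POI versions I have looked at, WordExtractor.getParagraphText() appends a \n to any paragraph that ends with Word's \r paragraph mark, so each string usually ends with \r\n. The printed lengths therefore include those two characters, and empty paragraphs still appear in the list. The sketch below is a small post-processing step using only the JDK; ParagraphCleaner and clean() are hypothetical names, not part of POI.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: tidies the strings returned by WordExtractor.getParagraphText(). */
final class ParagraphCleaner {

    /**
     * Removes trailing carriage returns and line feeds from each paragraph and
     * skips paragraphs that contain nothing but whitespace.
     */
    static List<String> clean(String[] rawParagraphs) {
        List<String> result = new ArrayList<String>();
        for (String paragraph : rawParagraphs) {
            // \z anchors at the very end of the string, so only the trailing break is removed.
            String text = paragraph.replaceAll("[\\r\\n]+\\z", "");
            if (text.trim().length() > 0) {
                result.add(text);
            }
        }
        return result;
    }
}
```

You could call it from readParagraphs as `List<String> paragraphs = ParagraphCleaner.clean(we.getParagraphText());` (adding the java.util imports). The "Total Paragraphs" figure then counts only non-empty paragraphs.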


Originally posted 2009-03-23 19:43:54.


9 Responses to “How To Read DOC file Using Java and Apache POI”

  1. Saranya on 03 Jul 2009 at 3:40 am

    Too good!

    The code works perfectly.

    I was searching for POI code for the past 2 days, and at last I found your site; it worked fine.
    Thanks

  2. Bihag on 16 Jul 2009 at 9:13 pm

    Nice Tutorial …

    But is there any way to read Word comments and bookmarks using Java? Do you have sample code? Any help would be appreciated.

    Thanking you,
    Bihag Raval

  3. Darshan on 20 Jul 2009 at 10:22 pm

    Hi Kushal,
    Is there a way that I can read XML contents and create an Excel file from them? Does POI support this? I tried something like this, but it did not help much.

    import java.io.FileNotFoundException;
    import java.io.FileOutputStream;
    import java.io.IOException;

    import org.apache.poi.hslf.model.Sheet;
    import org.apache.poi.xssf.usermodel.XSSFRow;
    import org.apache.poi.xssf.usermodel.XSSFSheet;
    import org.apache.poi.xssf.usermodel.XSSFWorkbook;
    import org.w3c.dom.Element;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

    import com.sun.rowset.internal.Row;

    public class Workbook1 {
    public static void main(String[] args) throws IOException {
    XSSFWorkbook wb = new XSSFWorkbook();
    XSSFSheet invoice = wb.createSheet("Invoice");
    int rowNumber = 2;
    XSSFRow row = invoice.createRow(rowNumber);
    row.createCell(0).setCellValue("Product ID");
    row.createCell(0).setCellValue("Description");
    row.createCell(0).setCellValue("Price");
    row.createCell(0).setCellValue("Quantity");
    row.createCell(0).setCellValue("Size");
    row.createCell(0).setCellValue("Total");

    rowNumber = rowNumber + 1;
    XSSFRow dataRow = invoice.createRow(rowNumber);
    Object fstNode = null;
    Element fstElmnt = (Element) fstNode;
    NodeList pdIdElmntLst = fstElmnt.getElementsByTagName("Product ID");
    Element pdIdElmnt = (Element) pdIdElmntLst.item(0);
    NodeList pdId = pdIdElmnt.getChildNodes();
    String productId = ((Node) pdId.item(0)).getNodeValue();
    dataRow.createCell(0).setCellValue(Integer.parseInt(productId));
    FileOutputStream fileOut;
    try {
    fileOut = new FileOutputStream("C:/Test.xls");
    wb.write(fileOut);
    fileOut.close();
    } catch (FileNotFoundException e) {

    e.printStackTrace();
    }

    }

    }
    It would be a great help if you could guide me through this.

  4. Darshan on 21 Jul 2009 at 2:24 am

    Hi Kushal,
    Is there a way that I can read an XML document and, using its contents, create an Excel sheet with POI?
    Any help would be appreciated.

    Regards,
    Darshan

  5. kushalzone on 21 Jul 2009 at 1:52 pm

    Darshan,

    Use the following tutorial, which I wrote just today, to learn how to read XML files in Java.

    http://sanjaal.com/java/2009/07/21/read-xml-file-in-java-using-dom-a-simple-tutorial/

    See the notes at the end of that tutorial on how you can write the contents read from XML to Excel files. I hope these suggestions point you in the right direction.

    Since both are working programs, you should have no problem modifying them to suit your needs.
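For the XML half of Darshan's question, here is a small DOM sketch using only the JDK. XmlRows and readRows() are hypothetical names, and the <products>/<product> layout used in the usage line is an assumption. It also sidesteps two problems in the code from comment 3 that are unrelated to POI: fstNode is null, so the getElementsByTagName call on fstElmnt throws a NullPointerException, and "Product ID" contains a space, which XML element names cannot contain, so that lookup can never match anything.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: flattens repeated XML records into rows of strings. */
final class XmlRows {

    /**
     * Returns one String[] per recordTag element, holding the text of the
     * given child tags in order; a missing child becomes "".
     */
    static List<String[]> readRows(String xml, String recordTag, String... fieldTags) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) { // ParserConfigurationException, SAXException, IOException
            throw new IllegalArgumentException("Could not parse XML", e);
        }

        NodeList records = doc.getElementsByTagName(recordTag);
        List<String[]> rows = new ArrayList<String[]>();
        for (int i = 0; i < records.getLength(); i++) {
            Element record = (Element) records.item(i);
            String[] row = new String[fieldTags.length];
            for (int c = 0; c < fieldTags.length; c++) {
                NodeList matches = record.getElementsByTagName(fieldTags[c]);
                row[c] = matches.getLength() > 0 ? matches.item(0).getTextContent().trim() : "";
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Usage might look like `List<String[]> rows = XmlRows.readRows(xmlText, "product", "productId", "description", "price");`. Each String[] can then become one spreadsheet row: create row i + 1 under a header row and put row[c] into cell c. Calling createCell(0) for every header, as in comment 3, keeps writing into the first column, so the cell index has to change too.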

  6. Darshan on 22 Jul 2009 at 4:00 am

    Hi Kushal,
    Thanks for the help.
    I have one small doubt. I could create an Excel sheet using the code you suggested, though we are using a SAX parser instead of DOM. As mentioned in your post:

    The method:
    public static String [][] preapreDataToWriteToExcel(){
    String [][] excelData = new String [4][4];
    excelData[0][0]="First Name";
    excelData [0][1]="Last Name";
    excelData[0][2]="Telephone";
    excelData[0][3]="Address";

    excelData[1][0]="Kushal";
    excelData[1][1]="Paudyal";
    excelData[1][2]="000-000-0000";
    excelData[1][3]="IL,USA";

    excelData[2][0]="Randy";
    excelData[2][1]="Ram Robinson";
    excelData[2][2]="111-111-1111";
    excelData[2][3]="TX, USA";

    excelData[3][0]="Phil";
    excelData[3][1]="Collins";
    excelData[3][2]="222-222-2222";
    excelData[3][3]="NY, USA";

    return excelData;

    }
    Can we make String [][] excelData = new String [4][4]; more generic, so that it grows with the amount of data in the list? We are parsing an XML file with some 3000 records, and we cannot create the cell rows and columns manually.

    For instance, in our application we have something like this:
    public class XmlToExcel {
    public static void main(String[] args) {
    String fileName = "C:\\temp\\testPOIWrite.xls";
    writeDataToExcelFile(fileName);
    }

    private static void writeDataToExcelFile(String fileName) {

    String[][] excelData = preapreDataToWriteToExcel();

    HSSFWorkbook myWorkBook = new HSSFWorkbook();
    HSSFSheet mySheet = myWorkBook.createSheet();
    HSSFRow myRow = null;
    HSSFCell myCell = null;

    for (int rowNum = 0; rowNum < 12; rowNum++) {
    myRow = mySheet.createRow(rowNum);

    for (int cellNum = 0; cellNum < 12; cellNum++) {
    myCell = myRow.createCell(cellNum);
    myCell.setCellValue(excelData[rowNum][cellNum]);

    }
    }

    try {
    FileOutputStream out = new FileOutputStream(fileName);
    myWorkBook.write(out);
    out.close();
    } catch (Exception e) {
    System.err.print("In " + e.getMessage());
    }

    }

    /** Prepare some demo data as excel file content* */
    public static String[][] preapreDataToWriteToExcel() {
    try {
    Parser ps = new Parser();
    XMLReader xmlReader = null;
    SAXParserFactory spfactory = SAXParserFactory.newInstance();
    spfactory.setValidating(false);
    SAXParser saxParser = spfactory.newSAXParser();
    xmlReader = saxParser.getXMLReader();
    xmlReader.setContentHandler(ps);
    xmlReader.setErrorHandler(ps);
    InputSource source = new InputSource("C:/Women_RunningShoes.xml");

    xmlReader.parse(source);
    List<ShoeCatalogBean> sellersList = ps.getDetails();
    for (ShoeCatalogBean seller : sellersList) {
    System.out.println(sellersList.size());
    String[][] excelData = new String[12][12];
    excelData[0][0] = "CategoryGroup";
    excelData[0][1] = "Category";
    excelData[0][2] = "SubCategory";
    excelData[0][3] = "Gender";
    excelData[0][4] = "Brands";
    excelData[0][5] = "ProductName";
    excelData[0][6] = "StyleNumber";
    excelData[0][7] = "SelectedColor";
    excelData[0][8] = "Size";
    excelData[0][9] = "Width";
    excelData[0][10] = "Price";
    excelData[0][11] = "ProductDescription";

    excelData[1][0] = seller.getCategoryGroup();
    excelData[1][1] = seller.getCategory();
    excelData[1][2] = seller.getSubCategory();
    excelData[1][3] = seller.getGender();
    excelData[1][4] = seller.getBrands();
    excelData[1][5] = seller.getProductName();
    excelData[1][6] = seller.getStyleNumber();
    excelData[1][7] = seller.getSelectedColor();
    excelData[1][8] = seller.getSize();
    excelData[1][9] = seller.getWidth();
    excelData[1][10] = seller.getPrice();
    excelData[1][11] = seller.getProductDescription();

    excelData[2][0] = seller.getCategoryGroup();
    excelData[2][1] = seller.getCategory();
    excelData[2][2] = seller.getSubCategory();
    excelData[2][3] = seller.getGender();
    excelData[2][4] = seller.getBrands();
    excelData[2][5] = seller.getProductName();
    excelData[2][6] = seller.getStyleNumber();
    excelData[2][7] = seller.getSelectedColor();
    excelData[2][8] = seller.getSize();
    excelData[2][9] = seller.getWidth();
    excelData[2][10] = seller.getPrice();
    excelData[2][11] = seller.getProductDescription();
    ………
    return excelData;
    }

    } catch (Exception e) {
    System.err.println(e);
    System.exit(1);
    }
    return null;
    }

    }

    Any help would be appreciated.
    I feel I posted too big a comment :-)

    Regards,
    Darshan

  7. Darshan on 22 Jul 2009 at 9:36 pm

    Hi Kushal, I followed your documents and everything is working fine. Thanks for that.
    But I have one small doubt about the method:
    public static String [][] preapreDataToWriteToExcel(){
    String [][] excelData = new String [4][4];
    excelData[0][0]="First Name";
    excelData [0][1]="Last Name";
    excelData[0][2]="Telephone";
    excelData[0][3]="Address";

    excelData[1][0]="Kushal";
    excelData[1][1]="Paudyal";
    excelData[1][2]="000-000-0000";
    excelData[1][3]="IL,USA";

    excelData[2][0]="Randy";
    excelData[2][1]="Ram Robinson";
    excelData[2][2]="111-111-1111";
    excelData[2][3]="TX, USA";

    excelData[3][0]="Phil";
    excelData[3][1]="Collins";
    excelData[3][2]="222-222-2222";
    excelData[3][3]="NY, USA";

    return excelData;
    }
    How can I make it more generic? For instance, I have some 1000 records in my list after parsing, so I cannot enter those 1000 records manually. How do I make the cells grow dynamically?

    public static String[][] preapreDataToWriteToExcel() {
    try {
    Parser ps = new Parser();
    XMLReader xmlReader = null;
    SAXParserFactory spfactory = SAXParserFactory.newInstance();
    spfactory.setValidating(false);
    SAXParser saxParser = spfactory.newSAXParser();
    xmlReader = saxParser.getXMLReader();
    xmlReader.setContentHandler(ps);
    xmlReader.setErrorHandler(ps);
    InputSource source = new InputSource("C:/RunningShoes.xml");

    xmlReader.parse(source);
    List<ShoeCatalogBean> sellersList = ps.getDetails();
    for (ShoeCatalogBean seller : sellersList) {
    System.out.println(sellersList.size());
    String[][] excelData = new String[12][12];
    excelData[0][0] = "CategoryGroup";
    excelData[0][1] = "Category";
    excelData[0][2] = "SubCategory";
    excelData[0][3] = "Gender";
    excelData[0][4] = "Brands";
    excelData[0][5] = "ProductName";
    excelData[0][6] = "StyleNumber";
    excelData[0][7] = "SelectedColor";
    excelData[0][8] = "Size";
    excelData[0][9] = "Width";
    excelData[0][10] = "Price";
    excelData[0][11] = "ProductDescription";

    excelData[1][0] = seller.getCategoryGroup();
    excelData[1][1] = seller.getCategory();
    excelData[1][2] = seller.getSubCategory();
    excelData[1][3] = seller.getGender();
    excelData[1][4] = seller.getBrands();
    excelData[1][5] = seller.getProductName();
    excelData[1][6] = seller.getStyleNumber();
    excelData[1][7] = seller.getSelectedColor();
    excelData[1][8] = seller.getSize();
    excelData[1][9] = seller.getWidth();
    excelData[1][10] = seller.getPrice();
    excelData[1][11] = seller.getProductDescription();
    ……….
    This is my piece of code.

    Could you please help me with this?

    Regards,
    Darshan

  8. kushalzone on 23 Jul 2009 at 8:46 am

    You can use a vector, say vectOne, in which you store other vectors, say vectTwo. Each vectTwo holds the data for one individual row, and vectOne holds the collection of vectTwo rows.

    It’s a simple manipulation. You can do the vector to array and array to vector conversions easily, if required at any portion of the program.
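A minimal sketch of that idea, using ArrayList in place of Vector (Vector still works; ArrayList is simply the usual unsynchronized choice). ExcelDataBuilder and toExcelData() are hypothetical names: the outer list corresponds to vectOne, each inner list to vectTwo, and the resulting array is sized from the data rather than fixed in advance.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: turns a header plus a growable list of rows into String[][]. */
final class ExcelDataBuilder {

    /**
     * The outer list plays the role of vectOne and each inner list the role of
     * vectTwo; the array is sized from the data, not a fixed new String[4][4].
     */
    static String[][] toExcelData(List<String> header, List<List<String>> dataRows) {
        List<List<String>> allRows = new ArrayList<List<String>>();
        allRows.add(header);
        allRows.addAll(dataRows);

        String[][] excelData = new String[allRows.size()][];
        for (int r = 0; r < allRows.size(); r++) {
            // Each row gets exactly as many cells as it has values.
            excelData[r] = allRows.get(r).toArray(new String[0]);
        }
        return excelData;
    }
}
```

In Darshan's preapreDataToWriteToExcel(), the loop over sellersList would add one inner list per seller, for example `dataRows.add(Arrays.asList(seller.getCategoryGroup(), seller.getCategory(), seller.getPrice()));`, and the method would return after the loop; returning inside the loop, as in the posted code, stops after the first seller. The writing loop then uses `rowNum < excelData.length` and `cellNum < excelData[rowNum].length` instead of the hard-coded 12.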

  9. Anita on 08 Dec 2009 at 5:58 am

    Hi

    I need to copy an XSSFSheet from one XSSFWorkbook to another XSSFWorkbook. How can I do that? Can anyone help me, please?

    Thanks in advance
