Class XMLTextExtractor

java.lang.Object
  extended by org.apache.jackrabbit.extractor.AbstractTextExtractor
      extended by org.apache.jackrabbit.extractor.XMLTextExtractor
All Implemented Interfaces:

public class XMLTextExtractor
extends AbstractTextExtractor

Text extractor for XML documents. This class extracts the text content and attribute values from XML documents.

This class can handle any XML-based format (application/something+xml), not just the base XML content types reported by AbstractTextExtractor.getContentTypes(). However, a more specialized extractor that understands the specific content type will often produce better results.
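To illustrate what "any XML-based format" could mean in practice, here is a minimal sketch of a content type check. It is not part of Jackrabbit: the class name XmlContentTypes and the method isXmlBased are hypothetical, and the choice of text/xml and application/xml as the base types is an assumption rather than the documented return value of getContentTypes().

```java
import java.util.Locale;

/** Hypothetical helper, not part of the Jackrabbit API. */
final class XmlContentTypes {
    private XmlContentTypes() {}

    /**
     * Returns true for the assumed base XML types and for any type
     * that carries the "+xml" structured suffix.
     */
    static boolean isXmlBased(String type) {
        if (type == null) {
            return false;
        }
        // Drop parameters such as "; charset=UTF-8" and normalize case.
        int semicolon = type.indexOf(';');
        String base = (semicolon >= 0 ? type.substring(0, semicolon) : type)
                .trim()
                .toLowerCase(Locale.ROOT);
        return base.equals("text/xml")            // assumed base type
                || base.equals("application/xml") // assumed base type
                || base.endsWith("+xml");         // e.g. application/atom+xml
    }
}
```

A caller might use such a check to fall back to the generic XML extractor only when no more specialized extractor is registered for the type.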

Constructor Summary
XMLTextExtractor()
          Creates a new XMLTextExtractor instance.
Method Summary
 Reader extractText(InputStream stream, String type, String encoding)
          Returns a reader for the text content of the given XML document.
Methods inherited from class org.apache.jackrabbit.extractor.AbstractTextExtractor
getContentTypes
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail


public XMLTextExtractor()
Creates a new XMLTextExtractor instance.

Method Detail


public Reader extractText(InputStream stream,
                          String type,
                          String encoding)
                   throws IOException
Returns a reader for the text content of the given XML document. Returns an empty reader if the given encoding is not supported or if the XML document could not be parsed.

Parameters:
stream - XML document
type - XML content type
encoding - character encoding, or null
Returns:
reader for the text content of the given XML document, or an empty reader if the encoding is not supported or the document could not be parsed
Throws:
IOException - if the XML document stream cannot be closed
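
The contract described above can be sketched with the JDK's SAX parser alone. The following is an illustration under stated assumptions, not Jackrabbit's actual implementation: the class name XmlTextSketch is hypothetical, the type parameter is omitted because the sketch does not use it, and joining the extracted pieces with single spaces is an arbitrary choice. It collects attribute values and character data, returns an empty reader when parsing fails or the encoding is unsupported, and always closes the stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.io.UnsupportedEncodingException;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch, not the Jackrabbit implementation. */
final class XmlTextSketch {
    private XmlTextSketch() {}

    static Reader extractText(InputStream stream, String encoding)
            throws IOException {
        final StringBuilder text = new StringBuilder();
        try {
            InputSource source = new InputSource(stream);
            if (encoding != null) {
                source.setEncoding(encoding);
            }
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Limit entity expansion and similar risks on untrusted input.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            factory.newSAXParser().parse(source, new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    // Attribute values are part of the extracted text.
                    for (int i = 0; i < atts.getLength(); i++) {
                        text.append(atts.getValue(i)).append(' ');
                    }
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length).append(' ');
                }
            });
            return new StringReader(text.toString());
        } catch (ParserConfigurationException
                 | SAXException
                 | UnsupportedEncodingException e) {
            // Unparseable document or unsupported encoding: empty reader.
            return new StringReader("");
        } finally {
            // A failure here surfaces as the documented IOException.
            stream.close();
        }
    }
}
```

Returning an empty reader instead of throwing lets a caller that indexes many documents treat a malformed file as one with no text and continue with the rest.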

Copyright © 2004-2009 The Apache Software Foundation. All Rights Reserved.