Lucene 3.6.0 API

Apache Lucene is a high-performance, full-featured text search engine library.

Core
org.apache.lucene	Top-level package.
org.apache.lucene.analysis	API and code to convert text into indexable/searchable tokens.
org.apache.lucene.analysis.standard	Standards-based analyzers implemented with JFlex.
org.apache.lucene.analysis.standard.std31	Backwards-compatible implementation to match `Version.LUCENE_31`
org.apache.lucene.analysis.standard.std34	Backwards-compatible implementation to match `Version.LUCENE_34`
org.apache.lucene.analysis.tokenattributes	Useful `Attribute`s for text analysis.
org.apache.lucene.document	The logical representation of a `Document` for indexing and searching.
org.apache.lucene.index	Code to maintain and access indices.
org.apache.lucene.messages	For Native Language Support (NLS), a system for software internationalization.
org.apache.lucene.queryParser	A simple query parser implemented with JavaCC.
org.apache.lucene.search	Code to search indices.
org.apache.lucene.search.function	Programmatic control over document scores.
org.apache.lucene.search.payloads	The payloads package provides Query mechanisms for finding and using payloads.
org.apache.lucene.search.spans	The calculus of spans.
org.apache.lucene.store	Binary i/o API, used for all index data.
org.apache.lucene.util	Some utility classes.
org.apache.lucene.util.collections	Various optimized Collections implementations.
org.apache.lucene.util.encoding	Offers various encoders and decoders for integers, as well as the mechanisms to create new ones.
org.apache.lucene.util.fst	Finite state transducers
org.apache.lucene.util.packed	The packed package provides random access capable arrays of positive longs.

contrib: Analysis
org.apache.lucene.analysis.ar	Analyzer for Arabic.
org.apache.lucene.analysis.bg	Analyzer for Bulgarian.
org.apache.lucene.analysis.br	Analyzer for Brazilian Portuguese.
org.apache.lucene.analysis.ca	Analyzer for Catalan.
org.apache.lucene.analysis.charfilter	CharFilters: process text before the Tokenizer
org.apache.lucene.analysis.cjk	Analyzer for Chinese, Japanese, and Korean, which indexes bigrams (overlapping groups of two adjacent Han characters).
org.apache.lucene.analysis.cn	Analyzer for Chinese, which indexes unigrams (individual Chinese characters).
org.apache.lucene.analysis.cn.smart	Analyzer for Simplified Chinese, which indexes words.
org.apache.lucene.analysis.cn.smart.hhmm	SmartChineseAnalyzer Hidden Markov Model package.
org.apache.lucene.analysis.compound	A filter that decomposes compound words you find in many Germanic languages into the word parts.
org.apache.lucene.analysis.compound.hyphenation	The code for the compound word hyphenation is taken from the Apache FOP project.
org.apache.lucene.analysis.cz	Analyzer for Czech.
org.apache.lucene.analysis.da	Analyzer for Danish.
org.apache.lucene.analysis.de	Analyzer for German.
org.apache.lucene.analysis.el	Analyzer for Greek.
org.apache.lucene.analysis.en	Analyzer for English.
org.apache.lucene.analysis.es	Analyzer for Spanish.
org.apache.lucene.analysis.eu	Analyzer for Basque.
org.apache.lucene.analysis.fa	Analyzer for Persian.
org.apache.lucene.analysis.fi	Analyzer for Finnish.
org.apache.lucene.analysis.fr	Analyzer for French.
org.apache.lucene.analysis.ga	Analyzer for Irish.
org.apache.lucene.analysis.gl	Analyzer for Galician.
org.apache.lucene.analysis.hi	Analyzer for Hindi.
org.apache.lucene.analysis.hu	Analyzer for Hungarian.
org.apache.lucene.analysis.hunspell	Stemming TokenFilter using a Java implementation of the Hunspell stemming algorithm.
org.apache.lucene.analysis.hy	Analyzer for Armenian.
org.apache.lucene.analysis.icu	Analysis components based on ICU
org.apache.lucene.analysis.icu.segmentation	Tokenizer that breaks text into words with the Unicode Text Segmentation algorithm.
org.apache.lucene.analysis.icu.tokenattributes	Additional ICU-specific Attributes for text analysis.
org.apache.lucene.analysis.id	Analyzer for Indonesian.
org.apache.lucene.analysis.in	Analysis components for Indian languages.
org.apache.lucene.analysis.it	Analyzer for Italian.
org.apache.lucene.analysis.ja	Analyzer for Japanese.
org.apache.lucene.analysis.ja.dict	Kuromoji dictionary implementation.
org.apache.lucene.analysis.ja.tokenattributes	Additional Kuromoji-specific Attributes for text analysis.
org.apache.lucene.analysis.ja.util	Kuromoji utility classes.
org.apache.lucene.analysis.lv	Analyzer for Latvian.
org.apache.lucene.analysis.miscellaneous	Miscellaneous TokenStreams
org.apache.lucene.analysis.ngram	Character n-gram tokenizers and filters.
org.apache.lucene.analysis.nl	Analyzer for Dutch.
org.apache.lucene.analysis.no	Analyzer for Norwegian.
org.apache.lucene.analysis.path	Analysis components for path-like strings such as filenames.
org.apache.lucene.analysis.payloads	Provides various convenience classes for creating payloads on Tokens.
org.apache.lucene.analysis.phonetic	Analysis components for phonetic search.
org.apache.lucene.analysis.pl	Analyzer for Polish.
org.apache.lucene.analysis.position	Filter for assigning position increments.
org.apache.lucene.analysis.pt	Analyzer for Portuguese.
org.apache.lucene.analysis.query	Automatically filter high-frequency stopwords.
org.apache.lucene.analysis.reverse	Filter to reverse token text.
org.apache.lucene.analysis.ro	Analyzer for Romanian.
org.apache.lucene.analysis.ru	Analyzer for Russian.
org.apache.lucene.analysis.shingle	Word n-gram filters
org.apache.lucene.analysis.sinks	Implementations of the SinkTokenizer that might be useful.
org.apache.lucene.analysis.snowball	`TokenFilter` and `Analyzer` implementations that use Snowball stemmers.
org.apache.lucene.analysis.stempel	Stempel: Algorithmic Stemmer
org.apache.lucene.analysis.sv	Analyzer for Swedish.
org.apache.lucene.analysis.synonym	Analysis components for Synonyms.
org.apache.lucene.analysis.th	Analyzer for Thai.
org.apache.lucene.analysis.tr	Analyzer for Turkish.
org.apache.lucene.analysis.util	Utility functions for text analysis.
org.apache.lucene.analysis.wikipedia	Tokenizer that is aware of Wikipedia syntax.
org.egothor.stemmer	Egothor stemmer API.
org.tartarus.snowball	Snowball stemmer API.
org.tartarus.snowball.ext	Autogenerated snowball stemmer implementations.

contrib: Benchmark
org.apache.lucene.benchmark	The benchmark contribution contains tools for benchmarking Lucene using standard, freely available corpora.
org.apache.lucene.benchmark.byTask	Benchmarking Lucene By Tasks.
org.apache.lucene.benchmark.byTask.feeds	Sources for benchmark inputs: documents and queries.
org.apache.lucene.benchmark.byTask.feeds.demohtml	Example html parser based on JavaCC
org.apache.lucene.benchmark.byTask.programmatic	Sample performance test written programmatically - no algorithm file is needed here.
org.apache.lucene.benchmark.byTask.stats	Statistics maintained when running benchmark tasks.
org.apache.lucene.benchmark.byTask.tasks	Extendable benchmark tasks.
org.apache.lucene.benchmark.byTask.utils	Utilities used for the benchmark, and for the reports.
org.apache.lucene.benchmark.quality	Search Quality Benchmarking.
org.apache.lucene.benchmark.quality.trec	Utilities for Trec related quality benchmarking, feeding from Trec Topics and QRels inputs.
org.apache.lucene.benchmark.quality.utils	Miscellaneous utilities for search quality benchmarking: query parsing, submission reports.
org.apache.lucene.benchmark.utils	Benchmark Utility functions.

contrib: ICU
org.apache.lucene.collation	`CollationKeyFilter` converts each token into its binary `CollationKey` using the provided `Collator`, and then encodes the `CollationKey` as a String using `IndexableBinaryStringTools`, to allow it to be stored as an index term.

contrib: Demo
org.apache.lucene.demo	Demo applications for indexing and searching.

contrib: Facet
org.apache.lucene.facet	Provides faceted indexing and search capabilities.
org.apache.lucene.facet.enhancements	Enhanced category features
org.apache.lucene.facet.enhancements.association	Association category enhancements
org.apache.lucene.facet.enhancements.params	Enhanced category features
org.apache.lucene.facet.index	Indexing of document categories
org.apache.lucene.facet.index.attributes	Category attributes and their properties for indexing
org.apache.lucene.facet.index.categorypolicy	Policies for indexing categories
org.apache.lucene.facet.index.params	Indexing-time specifications for handling facets
org.apache.lucene.facet.index.streaming	Expert: attributes streaming definition for indexing facets
org.apache.lucene.facet.search	Faceted Search API
org.apache.lucene.facet.search.aggregator	Aggregating Facets during Faceted Search
org.apache.lucene.facet.search.aggregator.association	Association-based aggregators.
org.apache.lucene.facet.search.cache	Caching to speed up facets accumulation.
org.apache.lucene.facet.search.params	Parameters for Faceted Search
org.apache.lucene.facet.search.params.association	Association-based Parameters for Faceted Search.
org.apache.lucene.facet.search.results	Results of Faceted Search
org.apache.lucene.facet.search.sampling	Sampling for facets accumulation
org.apache.lucene.facet.taxonomy	Taxonomy of Categories
org.apache.lucene.facet.taxonomy.directory	Taxonomy implemented using a Lucene-Index
org.apache.lucene.facet.taxonomy.writercache	Improves indexing time by caching a map from each CategoryPath to its ordinal
org.apache.lucene.facet.taxonomy.writercache.cl2o	Category->Ordinal caching implementation using optimized data structures
org.apache.lucene.facet.taxonomy.writercache.lru	An LRU cache implementation for the CategoryPath to Ordinal map
org.apache.lucene.facet.util	Various utilities for faceted search

contrib: Grouping
org.apache.lucene.search.grouping	This module enables search result grouping with Lucene, where hits with the same value in the specified single-valued group field are grouped together.

contrib: Highlighter
org.apache.lucene.search.highlight	The highlight package contains classes to provide "keyword in context" features typically used to highlight search terms in the text of results pages.
org.apache.lucene.search.vectorhighlight	This is another highlighter implementation.

contrib: Instantiated
org.apache.lucene.store.instantiated	InstantiatedIndex, alternative RAM store for small corpora.

contrib: Join
org.apache.lucene.search.join	This module supports index-time and query-time joins.

contrib: Memory
org.apache.lucene.index.memory	High-performance single-document main memory Apache Lucene fulltext search index.

contrib: Misc
org.apache.lucene.misc	Miscellaneous index tools.

contrib: Pruning
org.apache.lucene.index.pruning	Static Index Pruning Tools

contrib: Queries
org.apache.lucene.search.regex	Regular expression Query.
org.apache.lucene.search.similar	Document similarity query generators.

contrib: Query Parser
org.apache.lucene.queryParser.analyzing	QueryParser that passes Fuzzy-, Prefix-, Range-, and WildcardQuerys through the given analyzer.
org.apache.lucene.queryParser.complexPhrase	QueryParser which permits complex phrase query syntax, e.g. "(john jon jonathan~) peters*"
org.apache.lucene.queryParser.core	Contains the core classes of the flexible query parser framework
org.apache.lucene.queryParser.core.builders	Contains the necessary classes to implement query builders
org.apache.lucene.queryParser.core.config	Contains the base classes used to configure the query processing
org.apache.lucene.queryParser.core.messages	Contains messages usually used by query parser implementations
org.apache.lucene.queryParser.core.nodes	Contains query nodes that are commonly used by query parser implementations
org.apache.lucene.queryParser.core.parser	Contains the necessary interfaces to implement text parsers
org.apache.lucene.queryParser.core.processors	Interfaces and implementations used by query node processors
org.apache.lucene.queryParser.core.util	Utility classes to be used with the Query Parser
org.apache.lucene.queryParser.ext	Extendable QueryParser provides a simple and flexible extension mechanism by overloading query field names.
org.apache.lucene.queryParser.precedence	This package contains the Precedence Query Parser Implementation
org.apache.lucene.queryParser.precedence.processors	This package contains the processors used by Precedence Query Parser
org.apache.lucene.queryParser.standard	Contains the implementation of the Lucene query parser using the flexible query parser framework
org.apache.lucene.queryParser.standard.builders	Standard Lucene Query Node Builders
org.apache.lucene.queryParser.standard.config	Standard Lucene Query Configuration
org.apache.lucene.queryParser.standard.nodes	Standard Lucene Query Nodes
org.apache.lucene.queryParser.standard.parser	Lucene Query Parser
org.apache.lucene.queryParser.standard.processors	Lucene Query Node Processors
org.apache.lucene.queryParser.surround.parser	This package contains the QueryParser.jj source file for the Surround parser.
org.apache.lucene.queryParser.surround.query	This package contains SrndQuery and its subclasses.

contrib: Spatial
org.apache.lucene.spatial	Support for geospatial search.
org.apache.lucene.spatial.geohash	Support for Geohash encoding, decoding, and filtering.
org.apache.lucene.spatial.geometry	Coordinate and distance representations.
org.apache.lucene.spatial.geometry.shape	Shape representations.
org.apache.lucene.spatial.tier	Support for filtering based upon geographic location.
org.apache.lucene.spatial.tier.projections	Spatial projections.

contrib: SpellChecker
org.apache.lucene.search.spell	Suggest alternate spellings for words.
org.apache.lucene.search.suggest	Support for Autocomplete/Autosuggest
org.apache.lucene.search.suggest.fst	Finite-state based autosuggest.
org.apache.lucene.search.suggest.jaspell	JaSpell-based autosuggest.
org.apache.lucene.search.suggest.tst	Ternary Search Tree based autosuggest.

contrib: XML Query Parser
org.apache.lucene.xmlparser	Parser that produces Lucene Query objects from XML streams.
org.apache.lucene.xmlparser.builders	Builders to support various Lucene queries.

Apache Lucene is a high-performance, full-featured text search engine library. Here is a simple example of how to use Lucene for indexing and searching (using JUnit to check that the results are what we expect):

    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);

    // Store the index in memory:
    Directory directory = new RAMDirectory();
    // To store an index on disk, use this instead:
    //Directory directory = FSDirectory.open(new File("/tmp/testindex"));
    IndexWriter iwriter = new IndexWriter(directory, analyzer, true,
                                          new IndexWriter.MaxFieldLength(25000));
    Document doc = new Document();
    String text = "This is the text to be indexed.";
    doc.add(new Field("fieldname", text, Field.Store.YES,
        Field.Index.ANALYZED));
    iwriter.addDocument(doc);
    iwriter.close();
    
    // Now search the index:
    IndexReader ireader = IndexReader.open(directory); // read-only=true
    IndexSearcher isearcher = new IndexSearcher(ireader);
    // Parse a simple query that searches for "text":
    QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "fieldname", analyzer);
    Query query = parser.parse("text");
    ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
    assertEquals(1, hits.length);
    // Iterate through the results:
    for (int i = 0; i < hits.length; i++) {
      Document hitDoc = isearcher.doc(hits[i].doc);
      assertEquals("This is the text to be indexed.", hitDoc.get("fieldname"));
    }
    isearcher.close();
    ireader.close();
    directory.close();

The Lucene API is divided into several packages:

org.apache.lucene.analysis defines an abstract Analyzer API for converting text from a java.io.Reader into a TokenStream, an enumeration of token Attributes. A TokenStream can be composed by applying TokenFilters to the output of a Tokenizer. Tokenizers and TokenFilters are strung together and applied with an Analyzer. A handful of Analyzer implementations are provided, including StopAnalyzer and the grammar-based StandardAnalyzer.
org.apache.lucene.document provides a simple Document class. A Document is simply a set of named Fields, whose values may be strings or instances of java.io.Reader.
org.apache.lucene.index provides two primary classes: IndexWriter, which creates and adds documents to indices; and IndexReader, which accesses the data in the index.
org.apache.lucene.search provides data structures to represent queries (e.g. TermQuery for individual words, PhraseQuery for phrases, and BooleanQuery for boolean combinations of queries) and the abstract Searcher, which turns queries into TopDocs. IndexSearcher implements search over a single IndexReader.
org.apache.lucene.queryParser uses JavaCC to implement a QueryParser.
org.apache.lucene.store defines an abstract class for storing persistent data, the Directory, which is a collection of named files written by an IndexOutput and read by an IndexInput. Multiple implementations are provided, including FSDirectory, which uses a file system directory to store files, and RAMDirectory which implements files as memory-resident data structures.
org.apache.lucene.util contains a few handy data structures and utility classes, e.g. BitVector and PriorityQueue.
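
As a rough illustration of the Tokenizer / TokenFilter / Analyzer composition described for org.apache.lucene.analysis, the following plain-Java sketch models the same pipeline shape without using any Lucene classes. The class and method names (AnalysisChainSketch, tokenize, lowerCase, removeStopWords, analyze) and the stop-word list are hypothetical and chosen only for this illustration; Lucene's own analyzers work on TokenStreams and Attributes rather than on lists of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Conceptual model only: NOT Lucene's API. All names here are hypothetical.
class AnalysisChainSketch {

    // Hypothetical stop-word list, for illustration only.
    static final Set<String> STOP_WORDS =
        new HashSet<String>(Arrays.asList("this", "is", "the", "to", "be"));

    // "Tokenizer" stage: split raw text on runs of non-letter, non-digit characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String piece : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // "TokenFilter" stage: lower-case every token.
    static List<String> lowerCase(List<String> in) {
        List<String> out = new ArrayList<String>();
        for (String token : in) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // "TokenFilter" stage: drop tokens that appear in the stop-word list.
    static List<String> removeStopWords(List<String> in) {
        List<String> out = new ArrayList<String>();
        for (String token : in) {
            if (!STOP_WORDS.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }

    // "Analyzer": strings the tokenizer and the filters together in a fixed order.
    static List<String> analyze(String text) {
        return removeStopWords(lowerCase(tokenize(text)));
    }

    public static void main(String[] args) {
        // prints [text, indexed]
        System.out.println(analyze("This is the text to be indexed."));
    }
}
```

The order of the stages matters: the stop-word filter sees lower-cased tokens only because lowerCase runs before it, which mirrors how the order of filters wrapped around a tokenizer determines what each later filter receives.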

To use Lucene, an application should:

Create Documents by adding Fields;
Create an IndexWriter and add documents to it with addDocument();
Call QueryParser.parse() to build a query from a string; and
Create an IndexSearcher and pass the query to its search() method.

Some simple examples of code which does this are:

IndexFiles.java creates an index for all the files contained in a directory.
SearchFiles.java prompts for queries and searches an index.

To demonstrate these, try something like:

> java -cp lucene.jar:lucene-demo.jar:lucene-analyzers-common.jar org.apache.lucene.demo.IndexFiles rec.food.recipes/soups
adding rec.food.recipes/soups/abalone-chowder
[ ... ]
> java -cp lucene.jar:lucene-demo.jar:lucene-analyzers-common.jar org.apache.lucene.demo.SearchFiles
Query: chowder
Searching for: chowder
34 total matching documents
1. rec.food.recipes/soups/spam-chowder
[ ... thirty-four documents contain the word "chowder" ... ]
Query: "clam chowder" AND Manhattan
Searching for: +"clam chowder" +manhattan
2 total matching documents
1. rec.food.recipes/soups/clam-chowder
[ ... two documents contain the phrase "clam chowder" and the word "manhattan" ... ]
[ Note: "+" and "-" are canonical, but "AND", "OR" and "NOT" may be used. ]
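
The "+" (required) and "-" (prohibited) clause markers shown in the session above can be pictured with a toy inverted index. The sketch below is a conceptual model in plain Java, not Lucene's implementation: the class BooleanClauseSketch and its methods are hypothetical, it handles single terms only (no phrases, no scoring), and it assumes query terms are already lower-cased.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Conceptual model only: NOT Lucene's API. All names here are hypothetical.
class BooleanClauseSketch {

    // Toy inverted index: term -> sorted ids of the documents containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<String, Set<Integer>>();
    private int nextDocId = 0;

    // Index one document (lower-cased, split on non-word characters) and return its id.
    int addDocument(String text) {
        int docId = nextDocId++;
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) {
                continue;
            }
            Set<Integer> docs = postings.get(term);
            if (docs == null) {
                docs = new TreeSet<Integer>();
                postings.put(term, docs);
            }
            docs.add(docId);
        }
        return docId;
    }

    private Set<Integer> docsFor(String term) {
        Set<Integer> docs = postings.get(term);
        return docs == null ? Collections.<Integer>emptySet() : docs;
    }

    // Documents containing every required ("+") term and no prohibited ("-") term.
    Set<Integer> search(List<String> required, List<String> prohibited) {
        if (required.isEmpty()) {
            // In this sketch a query without any "+" clause matches nothing.
            return new TreeSet<Integer>();
        }
        Set<Integer> result = new TreeSet<Integer>(docsFor(required.get(0)));
        for (String term : required) {
            result.retainAll(docsFor(term));   // intersect: each "+" term must be present
        }
        for (String term : prohibited) {
            result.removeAll(docsFor(term));   // subtract: each "-" term must be absent
        }
        return result;
    }

    public static void main(String[] args) {
        BooleanClauseSketch index = new BooleanClauseSketch();
        index.addDocument("Manhattan clam chowder");   // id 0
        index.addDocument("New England clam chowder"); // id 1
        index.addDocument("Spam chowder");             // id 2

        List<String> none = Collections.<String>emptyList();
        System.out.println(index.search(Arrays.asList("chowder"), none));            // prints [0, 1, 2]
        System.out.println(index.search(Arrays.asList("clam", "manhattan"), none));  // prints [0]
        System.out.println(index.search(Arrays.asList("chowder"), Arrays.asList("clam"))); // prints [2]
    }
}
```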
