Lucene 3.6.0 API

Apache Lucene is a high-performance, full-featured text search engine library.


Core
org.apache.lucene Top-level package.
org.apache.lucene.analysis API and code to convert text into indexable/searchable tokens.
org.apache.lucene.analysis.standard Standards-based analyzers implemented with JFlex.
org.apache.lucene.analysis.standard.std31 Backwards-compatible implementation to match Version.LUCENE_31
org.apache.lucene.analysis.standard.std34 Backwards-compatible implementation to match Version.LUCENE_34
org.apache.lucene.analysis.tokenattributes Useful Attributes for text analysis.
org.apache.lucene.document The logical representation of a Document for indexing and searching.
org.apache.lucene.index Code to maintain and access indices.
org.apache.lucene.messages Native Language Support (NLS), a system for software internationalization.
org.apache.lucene.queryParser A simple query parser implemented with JavaCC.
org.apache.lucene.search Code to search indices.
org.apache.lucene.search.function Programmatic control over document scores.
org.apache.lucene.search.payloads The payloads package provides Query mechanisms for finding and using payloads.
org.apache.lucene.search.spans The calculus of spans.
org.apache.lucene.store Binary i/o API, used for all index data.
org.apache.lucene.util Some utility classes.
org.apache.lucene.util.collections Various optimized Collections implementations.
org.apache.lucene.util.encoding Offers various encoders and decoders for integers, as well as the mechanisms to create new ones.
org.apache.lucene.util.fst Finite state transducers
org.apache.lucene.util.packed The packed package provides random access capable arrays of positive longs.
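
The org.apache.lucene.util.packed entry above describes random-access arrays of positive longs. As a rough illustration of the idea only, and not the package's actual implementation, the sketch below stores each value in a fixed number of bits inside a long[]. The class name PackedArraySketch and its methods are hypothetical.

```java
/** Hypothetical sketch, not Lucene code: fixed-width non-negative values packed into a long[]. */
final class PackedArraySketch {
    private final long[] blocks;
    private final int bitsPerValue;
    private final long mask;

    PackedArraySketch(int size, int bitsPerValue) {
        if (bitsPerValue < 1 || bitsPerValue > 63) {
            throw new IllegalArgumentException("bitsPerValue must be in 1..63");
        }
        this.bitsPerValue = bitsPerValue;
        this.mask = (1L << bitsPerValue) - 1;
        long totalBits = (long) size * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) / 64)];
    }

    /** Number of 64-bit blocks backing the array. */
    int blockCount() {
        return blocks.length;
    }

    void set(int index, long value) {
        if ((value & ~mask) != 0) {
            throw new IllegalArgumentException("value does not fit in " + bitsPerValue + " bits");
        }
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        // Clear the target bits in the first block, then write the low part of the value.
        blocks[block] = (blocks[block] & ~(mask << shift)) | (value << shift);
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            // The value straddles two blocks: write its high bits into the next block.
            int lowBits = bitsPerValue - spill;
            long highMask = mask >>> lowBits;
            blocks[block + 1] = (blocks[block + 1] & ~highMask) | (value >>> lowBits);
        }
    }

    long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            // Pull the remaining high bits from the next block.
            value |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return value & mask;
    }

    public static void main(String[] args) {
        PackedArraySketch packed = new PackedArraySketch(10, 10);
        for (int i = 0; i < 10; i++) {
            packed.set(i, i * 100 + 23);
        }
        System.out.println(packed.get(6)); // prints 623
    }
}
```

Ten 10-bit values fit in two longs here, compared with ten longs for a plain long[]; the price is the extra shifting and masking on every access.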

 

contrib: Analysis
org.apache.lucene.analysis.ar Analyzer for Arabic.
org.apache.lucene.analysis.bg Analyzer for Bulgarian.
org.apache.lucene.analysis.br Analyzer for Brazilian Portuguese.
org.apache.lucene.analysis.ca Analyzer for Catalan.
org.apache.lucene.analysis.charfilter CharFilters: process text before the Tokenizer
org.apache.lucene.analysis.cjk Analyzer for Chinese, Japanese, and Korean, which indexes bigrams (overlapping groups of two adjacent Han characters).
org.apache.lucene.analysis.cn Analyzer for Chinese, which indexes unigrams (individual Chinese characters).
org.apache.lucene.analysis.cn.smart Analyzer for Simplified Chinese, which indexes words.
org.apache.lucene.analysis.cn.smart.hhmm SmartChineseAnalyzer Hidden Markov Model package.
org.apache.lucene.analysis.compound A filter that decomposes compound words you find in many Germanic languages into the word parts.
org.apache.lucene.analysis.compound.hyphenation The code for the compound word hyphenation is taken from the Apache FOP project.
org.apache.lucene.analysis.cz Analyzer for Czech.
org.apache.lucene.analysis.da Analyzer for Danish.
org.apache.lucene.analysis.de Analyzer for German.
org.apache.lucene.analysis.el Analyzer for Greek.
org.apache.lucene.analysis.en Analyzer for English.
org.apache.lucene.analysis.es Analyzer for Spanish.
org.apache.lucene.analysis.eu Analyzer for Basque.
org.apache.lucene.analysis.fa Analyzer for Persian.
org.apache.lucene.analysis.fi Analyzer for Finnish.
org.apache.lucene.analysis.fr Analyzer for French.
org.apache.lucene.analysis.ga Analysis for Irish.
org.apache.lucene.analysis.gl Analyzer for Galician.
org.apache.lucene.analysis.hi Analyzer for Hindi.
org.apache.lucene.analysis.hu Analyzer for Hungarian.
org.apache.lucene.analysis.hunspell Stemming TokenFilter using a Java implementation of the Hunspell stemming algorithm.
org.apache.lucene.analysis.hy Analyzer for Armenian.
org.apache.lucene.analysis.icu Analysis components based on ICU
org.apache.lucene.analysis.icu.segmentation Tokenizer that breaks text into words with the Unicode Text Segmentation algorithm.
org.apache.lucene.analysis.icu.tokenattributes Additional ICU-specific Attributes for text analysis.
org.apache.lucene.analysis.id Analyzer for Indonesian.
org.apache.lucene.analysis.in Analysis components for Indian languages.
org.apache.lucene.analysis.it Analyzer for Italian.
org.apache.lucene.analysis.ja Analyzer for Japanese.
org.apache.lucene.analysis.ja.dict Kuromoji dictionary implementation.
org.apache.lucene.analysis.ja.tokenattributes Additional Kuromoji-specific Attributes for text analysis.
org.apache.lucene.analysis.ja.util Kuromoji utility classes.
org.apache.lucene.analysis.lv Analyzer for Latvian.
org.apache.lucene.analysis.miscellaneous Miscellaneous TokenStreams
org.apache.lucene.analysis.ngram Character n-gram tokenizers and filters.
org.apache.lucene.analysis.nl Analyzer for Dutch.
org.apache.lucene.analysis.no Analyzer for Norwegian.
org.apache.lucene.analysis.path Analysis components for path-like strings such as filenames.
org.apache.lucene.analysis.payloads Provides various convenience classes for creating payloads on Tokens.
org.apache.lucene.analysis.phonetic Analysis components for phonetic search.
org.apache.lucene.analysis.pl Analyzer for Polish.
org.apache.lucene.analysis.position Filter for assigning position increments.
org.apache.lucene.analysis.pt Analyzer for Portuguese.
org.apache.lucene.analysis.query Automatically filter high-frequency stopwords.
org.apache.lucene.analysis.reverse Filter to reverse token text.
org.apache.lucene.analysis.ro Analyzer for Romanian.
org.apache.lucene.analysis.ru Analyzer for Russian.
org.apache.lucene.analysis.shingle Word n-gram filters
org.apache.lucene.analysis.sinks Implementations of the SinkTokenizer that might be useful.
org.apache.lucene.analysis.snowball TokenFilter and Analyzer implementations that use Snowball stemmers.
org.apache.lucene.analysis.stempel Stempel: Algorithmic Stemmer
org.apache.lucene.analysis.sv Analyzer for Swedish.
org.apache.lucene.analysis.synonym Analysis components for Synonyms.
org.apache.lucene.analysis.th Analyzer for Thai.
org.apache.lucene.analysis.tr Analyzer for Turkish.
org.apache.lucene.analysis.util Utility functions for text analysis.
org.apache.lucene.analysis.wikipedia Tokenizer that is aware of Wikipedia syntax.
org.egothor.stemmer Egothor stemmer API.
org.tartarus.snowball Snowball stemmer API.
org.tartarus.snowball.ext Autogenerated snowball stemmer implementations.
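
The org.apache.lucene.analysis.cjk entry above speaks of indexing bigrams, that is, overlapping groups of two adjacent Han characters. The sketch below shows only that overlapping-pair idea in plain Java; it is not the analyzer's code, it ignores everything else a real tokenizer has to handle (non-Han text, offsets, single leftover characters), and the name BigramSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Lucene: overlapping character bigrams. */
final class BigramSketch {

    /** Returns every pair of adjacent code points in text, in order of appearance. */
    static List<String> bigrams(String text) {
        // Work on code points so that supplementary characters are not split.
        int[] codePoints = text.codePoints().toArray();
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 1 < codePoints.length; i++) {
            result.add(new String(codePoints, i, 2));
        }
        return result;
    }

    public static void main(String[] args) {
        // Three Han characters (written as Unicode escapes) yield two overlapping bigrams:
        // characters 1-2 and characters 2-3.
        System.out.println(bigrams("\u65E5\u672C\u8A9E"));
    }
}
```

A unigram scheme, as described for org.apache.lucene.analysis.cn, would instead emit each character on its own.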

 

contrib: Benchmark
org.apache.lucene.benchmark The benchmark contribution contains tools for benchmarking Lucene using standard, freely available corpora.
org.apache.lucene.benchmark.byTask Benchmarking Lucene By Tasks.
org.apache.lucene.benchmark.byTask.feeds Sources for benchmark inputs: documents and queries.
org.apache.lucene.benchmark.byTask.feeds.demohtml Example html parser based on JavaCC
org.apache.lucene.benchmark.byTask.programmatic Sample performance test written programmatically - no algorithm file is needed here.
org.apache.lucene.benchmark.byTask.stats Statistics maintained when running benchmark tasks.
org.apache.lucene.benchmark.byTask.tasks Extendable benchmark tasks.
org.apache.lucene.benchmark.byTask.utils Utilities used for the benchmark, and for the reports.
org.apache.lucene.benchmark.quality Search Quality Benchmarking.
org.apache.lucene.benchmark.quality.trec Utilities for Trec related quality benchmarking, feeding from Trec Topics and QRels inputs.
org.apache.lucene.benchmark.quality.utils Miscellaneous utilities for search quality benchmarking: query parsing, submission reports.
org.apache.lucene.benchmark.utils Benchmark Utility functions.

 

contrib: ICU
org.apache.lucene.collation CollationKeyFilter converts each token into its binary CollationKey using the provided Collator, and then encodes the CollationKey as a String using IndexableBinaryStringTools, to allow it to be stored as an index term.

 

contrib: Demo
org.apache.lucene.demo Demo applications for indexing and searching.

 

contrib: Facet
org.apache.lucene.facet Provides faceted indexing and search capabilities.
org.apache.lucene.facet.enhancements Enhanced category features
org.apache.lucene.facet.enhancements.association Association category enhancements
org.apache.lucene.facet.enhancements.params Enhanced category features
org.apache.lucene.facet.index Indexing of document categories
org.apache.lucene.facet.index.attributes Category attributes and their properties for indexing
org.apache.lucene.facet.index.categorypolicy Policies for indexing categories
org.apache.lucene.facet.index.params Indexing-time specifications for handling facets
org.apache.lucene.facet.index.streaming Expert: attributes streaming definition for indexing facets
org.apache.lucene.facet.search Faceted Search API
org.apache.lucene.facet.search.aggregator Aggregating Facets during Faceted Search
org.apache.lucene.facet.search.aggregator.association Association-based aggregators.
org.apache.lucene.facet.search.cache Caching to speed up facets accumulation.
org.apache.lucene.facet.search.params Parameters for Faceted Search
org.apache.lucene.facet.search.params.association Association-based Parameters for Faceted Search.
org.apache.lucene.facet.search.results Results of Faceted Search
org.apache.lucene.facet.search.sampling Sampling for facets accumulation
org.apache.lucene.facet.taxonomy Taxonomy of Categories
org.apache.lucene.facet.taxonomy.directory Taxonomy implemented using a Lucene-Index
org.apache.lucene.facet.taxonomy.writercache Improves indexing time by caching a map from each CategoryPath to its ordinal
org.apache.lucene.facet.taxonomy.writercache.cl2o Category->Ordinal caching implementation using optimized data structures
org.apache.lucene.facet.taxonomy.writercache.lru An LRU cache implementation for the CategoryPath to Ordinal map
org.apache.lucene.facet.util Various utilities for faceted search

 

contrib: Grouping
org.apache.lucene.search.grouping This module enables search result grouping with Lucene, where hits with the same value in the specified single-valued group field are grouped together.
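
To make the grouping description above concrete, the following sketch collects hits that share the same group value, keeping groups and hits in the order they first appear. It is plain Java and does not use the grouping module's classes; GroupingSketch, groupHits and the parallel-array input are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not Lucene code: group hits by a single-valued field. */
final class GroupingSketch {

    /**
     * docIds[i] is a hit and groupValues[i] is the value of its group field.
     * Returns group value -> document ids, in first-seen order.
     */
    static Map<String, List<Integer>> groupHits(int[] docIds, String[] groupValues) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (int i = 0; i < docIds.length; i++) {
            groups.computeIfAbsent(groupValues[i], key -> new ArrayList<>()).add(docIds[i]);
        }
        return groups;
    }

    public static void main(String[] args) {
        int[] docIds = {7, 3, 9, 4};
        String[] authors = {"a", "b", "a", "b"};
        System.out.println(groupHits(docIds, authors)); // prints {a=[7, 9], b=[3, 4]}
    }
}
```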

 

contrib: Highlighter
org.apache.lucene.search.highlight The highlight package contains classes to provide "keyword in context" features typically used to highlight search terms in the text of results pages.
org.apache.lucene.search.vectorhighlight This is another highlighter implementation.

 

contrib: Instantiated
org.apache.lucene.store.instantiated InstantiatedIndex, alternative RAM store for small corpora.

 

contrib: Join
org.apache.lucene.search.join This module supports index-time and query-time joins.

 

contrib: Memory
org.apache.lucene.index.memory High-performance single-document main memory Apache Lucene fulltext search index.

 

contrib: Misc
org.apache.lucene.misc Miscellaneous index tools.

 

contrib: Pruning
org.apache.lucene.index.pruning Static Index Pruning Tools

 

contrib: Queries
org.apache.lucene.search.regex Regular expression Query.
org.apache.lucene.search.similar Document similarity query generators.

 

contrib: Query Parser
org.apache.lucene.queryParser.analyzing QueryParser that passes Fuzzy-, Prefix-, Range-, and WildcardQuerys through the given analyzer.
org.apache.lucene.queryParser.complexPhrase QueryParser which permits complex phrase query syntax, e.g. "(john jon jonathan~) peters*"
org.apache.lucene.queryParser.core Contains the core classes of the flexible query parser framework
org.apache.lucene.queryParser.core.builders Contains the necessary classes to implement query builders
org.apache.lucene.queryParser.core.config Contains the base classes used to configure the query processing
org.apache.lucene.queryParser.core.messages Contains messages usually used by query parser implementations
org.apache.lucene.queryParser.core.nodes Contains query nodes that are commonly used by query parser implementations
org.apache.lucene.queryParser.core.parser Contains the necessary interfaces to implement text parsers
org.apache.lucene.queryParser.core.processors Interfaces and implementations used by query node processors
org.apache.lucene.queryParser.core.util Utility classes to be used with the Query Parser
org.apache.lucene.queryParser.ext Extendable QueryParser provides a simple and flexible extension mechanism by overloading query field names.
org.apache.lucene.queryParser.precedence This package contains the Precedence Query Parser Implementation
org.apache.lucene.queryParser.precedence.processors This package contains the processors used by Precedence Query Parser
org.apache.lucene.queryParser.standard Contains the implementation of the Lucene query parser using the flexible query parser framework
org.apache.lucene.queryParser.standard.builders Standard Lucene Query Node Builders
org.apache.lucene.queryParser.standard.config Standard Lucene Query Configuration
org.apache.lucene.queryParser.standard.nodes Standard Lucene Query Nodes
org.apache.lucene.queryParser.standard.parser Lucene Query Parser
org.apache.lucene.queryParser.standard.processors Lucene Query Node Processors
org.apache.lucene.queryParser.surround.parser This package contains the QueryParser.jj source file for the Surround parser.
org.apache.lucene.queryParser.surround.query This package contains SrndQuery and its subclasses.

 

contrib: Spatial
org.apache.lucene.spatial Support for geospatial search.
org.apache.lucene.spatial.geohash Support for Geohash encoding, decoding, and filtering.
org.apache.lucene.spatial.geometry Coordinate and distance representations.
org.apache.lucene.spatial.geometry.shape Shape representations.
org.apache.lucene.spatial.tier Support for filtering based upon geographic location.
org.apache.lucene.spatial.tier.projections Spatial projections.
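
The org.apache.lucene.spatial.geohash entry above mentions Geohash encoding. As a sketch of the general, publicly documented Geohash scheme (not the package's own implementation), the code below interleaves longitude and latitude bisection bits and emits one base-32 character per five bits. The class name GeohashSketch and the explicit length parameter are assumptions for illustration.

```java
/** Hypothetical sketch, not Lucene code: the standard Geohash encoding. */
final class GeohashSketch {
    // Geohash base-32 alphabet: digits plus lowercase letters without a, i, l, o.
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    static String encode(double latitude, double longitude, int length) {
        double latMin = -90.0, latMax = 90.0;
        double lonMin = -180.0, lonMax = 180.0;
        StringBuilder hash = new StringBuilder(length);
        boolean refineLongitude = true; // bits alternate, starting with longitude
        int bitCount = 0;
        int current = 0;
        while (hash.length() < length) {
            if (refineLongitude) {
                double mid = (lonMin + lonMax) / 2;
                if (longitude >= mid) {
                    current = (current << 1) | 1;
                    lonMin = mid;
                } else {
                    current <<= 1;
                    lonMax = mid;
                }
            } else {
                double mid = (latMin + latMax) / 2;
                if (latitude >= mid) {
                    current = (current << 1) | 1;
                    latMin = mid;
                } else {
                    current <<= 1;
                    latMax = mid;
                }
            }
            refineLongitude = !refineLongitude;
            if (++bitCount == 5) {
                // Five bits collected: emit one base-32 character.
                hash.append(BASE32.charAt(current));
                bitCount = 0;
                current = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(57.64911, 10.40744, 5)); // prints u4pru
    }
}
```

Because each extra character only narrows the cell further, nearby points tend to share a prefix, which is what makes prefix-based filtering possible.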

 

contrib: SpellChecker
org.apache.lucene.search.spell Suggest alternate spellings for words.
org.apache.lucene.search.suggest Support for Autocomplete/Autosuggest
org.apache.lucene.search.suggest.fst Finite-state based autosuggest.
org.apache.lucene.search.suggest.jaspell JaSpell-based autosuggest.
org.apache.lucene.search.suggest.tst Ternary Search Tree based autosuggest.

 

contrib: XML Query Parser
org.apache.lucene.xmlparser Parser that produces Lucene Query objects from XML streams.
org.apache.lucene.xmlparser.builders Builders to support various Lucene queries.

 

Apache Lucene is a high-performance, full-featured text search engine library. Here is a simple example of how to use Lucene for indexing and searching (using JUnit to check that the results are what we expect):

    // Classes used below live in org.apache.lucene.analysis(.standard), .document,
    // .index, .queryParser, .search, .store and .util; File is java.io.File.
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);

    // Store the index in memory:
    Directory directory = new RAMDirectory();
    // To store an index on disk, use this instead (FSDirectory.open takes a File):
    //Directory directory = FSDirectory.open(new File("/tmp/testindex"));
    // create=true starts a new index; at most 25000 terms are indexed per field.
    IndexWriter iwriter = new IndexWriter(directory, analyzer, true,
                                          new IndexWriter.MaxFieldLength(25000));
    Document doc = new Document();
    String text = "This is the text to be indexed.";
    doc.add(new Field("fieldname", text, Field.Store.YES,
        Field.Index.ANALYZED));
    iwriter.addDocument(doc);
    iwriter.close();

    // Now search the index:
    IndexReader ireader = IndexReader.open(directory); // opens a read-only reader
    IndexSearcher isearcher = new IndexSearcher(ireader);
    // Parse a simple query that searches for "text":
    QueryParser parser = new QueryParser(Version.LUCENE_36, "fieldname", analyzer);
    Query query = parser.parse("text");
    // No filter (null); collect at most the top 1000 hits.
    ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
    assertEquals(1, hits.length);
    // Iterate through the results:
    for (int i = 0; i < hits.length; i++) {
      Document hitDoc = isearcher.doc(hits[i].doc);
      assertEquals("This is the text to be indexed.", hitDoc.get("fieldname"));
    }
    isearcher.close();
    ireader.close();
    directory.close();

The Lucene API is divided into several packages; the core ones are listed under Core above.

To use Lucene, an application should:
  1. Create Documents by adding Fields;
  2. Create an IndexWriter and add documents to it with addDocument();
  3. Call QueryParser.parse() to build a query from a string; and
  4. Create an IndexSearcher and pass the query to its search() method.
Simple examples of code that does this are the demo classes org.apache.lucene.demo.IndexFiles and org.apache.lucene.demo.SearchFiles. To try them, run something like:
> java -cp lucene.jar:lucene-demo.jar:lucene-analyzers-common.jar org.apache.lucene.demo.IndexFiles rec.food.recipes/soups
adding rec.food.recipes/soups/abalone-chowder
  [ ... ]

> java -cp lucene.jar:lucene-demo.jar:lucene-analyzers-common.jar org.apache.lucene.demo.SearchFiles
Query: chowder
Searching for: chowder
34 total matching documents
1. rec.food.recipes/soups/spam-chowder
  [ ... thirty-four documents contain the word "chowder" ... ]

Query: "clam chowder" AND Manhattan
Searching for: +"clam chowder" +manhattan
2 total matching documents
1. rec.food.recipes/soups/clam-chowder
  [ ... two documents contain the phrase "clam chowder" and the word "manhattan" ... ]
    [ Note: "+" and "-" are canonical, but "AND", "OR" and "NOT" may be used. ]