Implementing Custom Sorting in Lucene (Java)

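Custom sorting in Lucene works much like custom sorting in the Java collections: in both cases you implement a comparison interface. In plain Java, implementing Comparable (or supplying a Comparator) is enough. In Lucene you implement two interfaces, SortComparatorSource and ScoreDocComparator. Before getting to the implementation, let's look at how these two interfaces are defined.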
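For comparison, the plain-Java side of this analogy can be sketched as follows. This is only an illustration: the Restaurant class, its rating field, and the sample ratings are made up for the example and have nothing to do with Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class CollectionSortSketch {
    // Hypothetical value class for illustration; not part of Lucene.
    static class Restaurant implements Comparable<Restaurant> {
        final String name;
        final int rating; // made-up attribute, used only to show a second ordering

        Restaurant(String name, int rating) {
            this.name = name;
            this.rating = rating;
        }

        // Natural ordering: alphabetical by name.
        @Override
        public int compareTo(Restaurant other) {
            return name.compareTo(other.name);
        }
    }

    static List<Restaurant> sample() {
        return Arrays.asList(
                new Restaurant("El Charro", 3),
                new Restaurant("Cafe Poca Cosa", 5),
                new Restaurant("Los Betos", 4));
    }

    // Sorts a copy using the natural ordering defined by compareTo.
    static List<String> namesByNaturalOrder(List<Restaurant> input) {
        List<Restaurant> copy = new ArrayList<Restaurant>(input);
        Collections.sort(copy);
        return names(copy);
    }

    // Sorts a copy with an external Comparator: highest rating first.
    static List<String> namesByRating(List<Restaurant> input) {
        List<Restaurant> copy = new ArrayList<Restaurant>(input);
        Collections.sort(copy, new Comparator<Restaurant>() {
            @Override
            public int compare(Restaurant a, Restaurant b) {
                return Integer.compare(b.rating, a.rating);
            }
        });
        return names(copy);
    }

    private static List<String> names(List<Restaurant> list) {
        List<String> result = new ArrayList<String>();
        for (Restaurant r : list) {
            result.add(r.name);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(namesByNaturalOrder(sample())); // [Cafe Poca Cosa, El Charro, Los Betos]
        System.out.println(namesByRating(sample()));       // [Cafe Poca Cosa, Los Betos, El Charro]
    }
}
```

Comparable fixes one ordering inside the class itself, while a Comparator lives outside it. Lucene's ScoreDocComparator plays the second role: it is an external object that decides the order of hits.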
SortComparatorSource returns a comparator that is used to sort ScoreDocs ("Expert: returns a comparator for sorting ScoreDocs"). The interface defines a single method:
Java code
/**
* Creates a comparator for the field in the given index.
* @param reader - Index to create comparator for.
* @param fieldname - Field to create comparator for.
* @return Comparator of ScoreDoc objects.
* @throws IOException - If an error occurs reading the index.
*/
public ScoreDocComparator newComparator(IndexReader reader, String fieldname) throws IOException;
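This method only creates the ScoreDocComparator instance that does the sorting, so we also have to implement the ScoreDocComparator interface. ScoreDocComparator compares two ScoreDoc objects for sorting ("Compares two ScoreDoc objects for sorting"). It declares two static instances that Lucene itself implements:
Java code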
//Special comparator for sorting hits according to computed relevance (document score).
public static final ScoreDocComparator RELEVANCE;
//Special comparator for sorting hits according to index order (document number).
public static final ScoreDocComparator INDEXORDER;
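What these two orderings mean can be sketched without Lucene on the classpath. In the sketch below, Hit is a hypothetical stand-in for ScoreDoc, and the two comparators only mirror the documented behavior (sort by score, sort by document number); they are not Lucene's own implementations, and the scores are made-up sample values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class BuiltInOrderSketch {
    // Hypothetical stand-in for Lucene's ScoreDoc: a document number plus a score.
    static class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    // Relevance order: the hit with the higher score sorts first.
    static final Comparator<Hit> RELEVANCE = new Comparator<Hit>() {
        @Override
        public int compare(Hit i, Hit j) {
            return Float.compare(j.score, i.score);
        }
    };

    // Index order: the hit with the lower document number sorts first.
    static final Comparator<Hit> INDEXORDER = new Comparator<Hit>() {
        @Override
        public int compare(Hit i, Hit j) {
            return Integer.compare(i.doc, j.doc);
        }
    };

    // Sorts three sample hits with the given ordering and returns their document numbers.
    static List<Integer> docsSortedBy(Comparator<Hit> order) {
        List<Hit> hits = new ArrayList<Hit>(Arrays.asList(
                new Hit(2, 0.4f), new Hit(0, 0.9f), new Hit(1, 0.1f)));
        Collections.sort(hits, order);
        List<Integer> docs = new ArrayList<Integer>();
        for (Hit h : hits) {
            docs.add(h.doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        System.out.println(docsSortedBy(RELEVANCE));  // [0, 2, 1]
        System.out.println(docsSortedBy(INDEXORDER)); // [0, 1, 2]
    }
}
```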
ScoreDocComparator has three sorting-related methods that we need to implement:
Java code
/**
* Compares two ScoreDoc objects and returns a result indicating their sort order.
* @param i First ScoreDoc
* @param j Second ScoreDoc
* @return -1 if i should come before j;
* 1 if i should come after j;
* 0 if they are equal
*/
public int compare(ScoreDoc i,ScoreDoc j);
/**
* Returns the value used to sort the given document. The object returned must implement the java.io.Serializable interface. This is used by multisearchers to determine how to collate results from their searchers.
* @param i Document
* @return Serializable object
*/
public Comparable sortValue(ScoreDoc i);
/**
* Returns the type of sort. Should return SortField.SCORE, SortField.DOC, SortField.STRING, SortField.INTEGER, SortField.FLOAT or SortField.CUSTOM. It is not valid to return SortField.AUTO. This is used by multisearchers to determine how to collate results from their searchers.
* @return One of the constants in SortField.
*/
public int sortType();
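The contract of these three methods can be illustrated with a small Lucene-free sketch. Everything here is an assumption made for illustration: Hit stands in for ScoreDoc, HitComparator mirrors the three method signatures, and HYPOTHETICAL_INT_TYPE is a made-up constant rather than a real SortField value. The point it tries to show is that compare() returns -1, 0 or 1 and should agree with the ordering of the values returned by sortValue().

```java
class ComparatorContractSketch {
    // Made-up type code; a real implementation would return a SortField constant.
    static final int HYPOTHETICAL_INT_TYPE = 1;

    // Hypothetical stand-in for Lucene's ScoreDoc.
    static class Hit {
        final int doc;

        Hit(int doc) {
            this.doc = doc;
        }
    }

    // Hypothetical mirror of the three ScoreDocComparator methods.
    interface HitComparator<V extends Comparable<V>> {
        int compare(Hit i, Hit j);

        V sortValue(Hit i);

        int sortType();
    }

    // Looks up a precomputed value per document number and compares those values.
    static class ArrayBackedComparator implements HitComparator<Integer> {
        private final int[] values;

        ArrayBackedComparator(int[] values) {
            this.values = values;
        }

        public int compare(Hit i, Hit j) {
            if (values[i.doc] < values[j.doc]) return -1; // i comes before j
            if (values[i.doc] > values[j.doc]) return 1;  // i comes after j
            return 0;                                     // equal
        }

        public Integer sortValue(Hit i) {
            return Integer.valueOf(values[i.doc]);
        }

        public int sortType() {
            return HYPOTHETICAL_INT_TYPE;
        }
    }

    // True when compare() has the same sign as comparing the two sort values.
    static <V extends Comparable<V>> boolean consistent(HitComparator<V> c, Hit i, Hit j) {
        int bySortValue = c.sortValue(i).compareTo(c.sortValue(j));
        return Integer.signum(c.compare(i, j)) == Integer.signum(bySortValue);
    }

    public static void main(String[] args) {
        ArrayBackedComparator c = new ArrayBackedComparator(new int[] {30, 10, 20});
        Hit a = new Hit(0);
        Hit b = new Hit(1);
        System.out.println(c.compare(a, b));    // 1, because 30 > 10
        System.out.println(c.compare(b, a));    // -1
        System.out.println(consistent(c, a, b)); // true
    }
}
```

Keeping compare() and sortValue() consistent matters because, as the Javadoc above says, multisearchers collate results from several searchers using the sort values rather than the comparator itself.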
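Let's look at an example.
It is an implementation from Lucene in Action that finds the names of the restaurants nearest to you. Restaurant coordinates are stored as "x,y" strings.
Java code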
package com.nikee.lucene;

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.ScoreDocComparator;
import org.apache.lucene.search.SortComparatorSource;
import org.apache.lucene.search.SortField;

// Sorts restaurants by their distance from a given point.
// Restaurant coordinates are stored as "x,y" strings.
// DistanceComparatorSource implements the SortComparatorSource interface.
public class DistanceComparatorSource implements SortComparatorSource {
    private static final long serialVersionUID = 1L;

    // x and y hold the coordinates of the reference point.
    private int x;
    private int y;

    public DistanceComparatorSource(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Returns the ScoreDocComparator that performs the sorting.
    public ScoreDocComparator newComparator(IndexReader reader, String fieldname) throws IOException {
        return new DistanceScoreDocLookupComparator(reader, fieldname, x, y);
    }

    // DistanceScoreDocLookupComparator implements ScoreDocComparator and does the actual sorting.
    private static class DistanceScoreDocLookupComparator implements ScoreDocComparator {
        private float[] distances; // distance from each restaurant to the reference point

        // The constructor does almost all of the preparation work.
        public DistanceScoreDocLookupComparator(IndexReader reader, String fieldname, int x, int y) throws IOException {
            System.out.println("fieldName2=" + fieldname);
            final TermEnum enumerator = reader.terms(new Term(fieldname, ""));

            System.out.println("maxDoc=" + reader.maxDoc());
            distances = new float[reader.maxDoc()]; // one slot per document number
            if (distances.length > 0) {
                TermDocs termDocs = reader.termDocs();
                try {
                    if (enumerator.term() == null) {
                        throw new RuntimeException("no terms in field " + fieldname);
                    }
                    int i = 0, j = 0;
                    do {
                        System.out.println("in do-while :" + i++);
                        Term term = enumerator.term(); // current term
                        // Terms are enumerated field by field, so stop at the first
                        // term that belongs to a different field.
                        // equals() is used instead of != so the check does not
                        // depend on the field names being interned.
                        if (!term.field().equals(fieldname))
                            break;

                        // Position termDocs on the documents that contain the current term
                        // (see the TermDocs Javadoc).
                        termDocs.seek(enumerator);
                        while (termDocs.next()) {
                            System.out.println(" in while :" + j++);
                            System.out.println(" in while ,Term :" + term.toString());

                            String[] xy = term.text().split(","); // extract x and y
                            int deltax = Integer.parseInt(xy[0]) - x;
                            int deltay = Integer.parseInt(xy[1]) - y;
                            // Compute the distance.
                            distances[termDocs.doc()] = (float) Math.sqrt(deltax * deltax + deltay * deltay);
                        }
                    } while (enumerator.next());
                } finally {
                    termDocs.close();
                }
            }
        }

        // With the distances precomputed by the constructor, comparing is simple.
        public int compare(ScoreDoc i, ScoreDoc j) {
            if (distances[i.doc] < distances[j.doc])
                return -1;
            if (distances[i.doc] > distances[j.doc])
                return 1;
            return 0;
        }

        // Returns the distance.
        public Comparable sortValue(ScoreDoc i) {
            return new Float(distances[i.doc]);
        }

        // Specifies the sort type.
        public int sortType() {
            return SortField.FLOAT;
        }
    }

    public String toString() {
        return "Distance from (" + x + "," + y + ")";
    }
}
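The core idea of the comparator above, computing every distance once and then comparing by array lookup, can be tried without an index. The sketch below is an assumption-laden simplification: the array position stands in for the Lucene document number, the class and method names are made up, and the four "x,y" strings are the restaurant locations used in the test that accompanies this article.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class DistanceOrderSketch {
    // Computes the distance from (x, y) to each "x,y" location string.
    static float[] distances(String[] locations, int x, int y) {
        float[] result = new float[locations.length];
        for (int doc = 0; doc < locations.length; doc++) {
            String[] xy = locations[doc].split(",");
            int dx = Integer.parseInt(xy[0]) - x;
            int dy = Integer.parseInt(xy[1]) - y;
            result[doc] = (float) Math.sqrt(dx * dx + dy * dy);
        }
        return result;
    }

    // Returns the document numbers ordered from nearest to farthest.
    static List<Integer> nearestFirst(String[] locations, int x, int y) {
        final float[] d = distances(locations, x, y); // precomputed once
        List<Integer> docs = new ArrayList<Integer>();
        for (int doc = 0; doc < d.length; doc++) {
            docs.add(doc);
        }
        Collections.sort(docs, new Comparator<Integer>() {
            @Override
            public int compare(Integer i, Integer j) {
                return Float.compare(d[i], d[j]); // cheap lookup per comparison
            }
        });
        return docs;
    }

    public static void main(String[] args) {
        // 0: El Charro, 1: Cafe Poca Cosa, 2: Los Betos, 3: Nico's Taco Shop
        String[] locations = {"1,2", "5,9", "9,6", "3,8"};
        System.out.println(nearestFirst(locations, 0, 0));   // [0, 3, 1, 2]
        System.out.println(nearestFirst(locations, 10, 10)); // [2, 1, 3, 0]
    }
}
```

From (0,0) the nearest entry is El Charro and the farthest is Los Betos; from (10,10) the nearest is Los Betos at a distance of sqrt(17).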
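These two classes implement the two interfaces described above, and the code is commented in detail. As you can see, custom sorting is not very hard. To check whether the implementation is correct, let's see whether the test code passes.
Java code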
package com.nikee.lucene.test;

import java.io.IOException;

import junit.framework.TestCase;

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FieldDoc;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopFieldDocs;
import org.apache.lucene.store.RAMDirectory;

import com.nikee.lucene.DistanceComparatorSource;

public class DistanceComparatorSourceTest extends TestCase {
    private RAMDirectory directory;
    private IndexSearcher searcher;
    private Query query;

    // Set up the test environment.
    protected void setUp() throws Exception {
        directory = new RAMDirectory();
        IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(), true);
        addPoint(writer, "El Charro", "restaurant", 1, 2);
        addPoint(writer, "Cafe Poca Cosa", "restaurant", 5, 9);
        addPoint(writer, "Los Betos", "restaurant", 9, 6);
        addPoint(writer, "Nico's Taco Shop", "restaurant", 3, 8);
        writer.close();
        searcher = new IndexSearcher(directory);
        query = new TermQuery(new Term("type", "restaurant"));
    }

    private void addPoint(IndexWriter writer, String name, String type, int x, int y) throws IOException {
        Document doc = new Document();
        doc.add(new Field("name", name, Field.Store.YES, Field.Index.TOKENIZED));
        doc.add(new Field("type", type, Field.Store.YES, Field.Index.TOKENIZED));
        doc.add(new Field("location", x + "," + y, Field.Store.YES, Field.Index.UN_TOKENIZED));
        writer.addDocument(doc);
    }

    public void testNearestRestaurantToHome() throws Exception {
        // Build a SortField from DistanceComparatorSource; home is at (0,0).
        Sort sort = new Sort(new SortField("location", new DistanceComparatorSource(0, 0)));
        Hits hits = searcher.search(query, sort); // search
        // Check the nearest and the farthest hit.
        assertEquals("closest", "El Charro", hits.doc(0).get("name"));
        assertEquals("furthest", "Los Betos", hits.doc(3).get("name"));
    }

    public void testNearestRestaurantToWork() throws Exception {
        Sort sort = new Sort(new SortField("location", new DistanceComparatorSource(10, 10))); // work is at (10,10)
        // The test above exercises the custom sort but cannot see the values it sorted by.
        // TopFieldDocs gives access to that extra information.
        TopFieldDocs docs = searcher.search(query, null, 3, sort);
        assertEquals(4, docs.totalHits);
        assertEquals(3, docs.scoreDocs.length);
        // A FieldDoc carries the sort values of a hit; see the FieldDoc Javadoc.
        FieldDoc fieldDoc = (FieldDoc) docs.scoreDocs[0];
        assertEquals("(10,10) -> (9,6) = sqrt(17)", new Float(Math.sqrt(17)), fieldDoc.fields[0]);
        Document document = searcher.doc(fieldDoc.doc);
        assertEquals("Los Betos", document.get("name"));
        dumpDocs(sort, docs); // print the sort details
    }

    // Prints information about the sort.
    private void dumpDocs(Sort sort, TopFieldDocs docs) throws IOException {
        System.out.println("Sorted by: " + sort);
        ScoreDoc[] scoreDocs = docs.scoreDocs;
        for (int i = 0; i < scoreDocs.length; i++) {
            FieldDoc fieldDoc = (FieldDoc) scoreDocs[i];
            Float distance = (Float) fieldDoc.fields[0];
            Document doc = searcher.doc(fieldDoc.doc);
            System.out.println(" " + doc.get("name") + " @ (" + doc.get("location") + ") -> " + distance);
        }
    }
}
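The second test asks for only the 3 best of 4 matching documents. One common way to keep just the n best hits is a bounded priority queue; the sketch below shows that technique in plain Java. It is an illustration under stated assumptions, not Lucene's own collector: the class and method names are made up, and the array index stands in for the document number.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class TopNSketch {
    // Returns the indices of the n smallest sort values, smallest first.
    static List<Integer> topN(final float[] sortValues, int n) {
        // Max-heap on the sort value: the head is the worst hit currently kept.
        PriorityQueue<Integer> worstFirst = new PriorityQueue<Integer>(n + 1, new Comparator<Integer>() {
            @Override
            public int compare(Integer i, Integer j) {
                return Float.compare(sortValues[j], sortValues[i]);
            }
        });
        for (int doc = 0; doc < sortValues.length; doc++) {
            worstFirst.add(doc);
            if (worstFirst.size() > n) {
                worstFirst.poll(); // drop the current worst hit
            }
        }
        // Put the survivors into ascending order of their sort value.
        List<Integer> result = new ArrayList<Integer>(worstFirst);
        Collections.sort(result, new Comparator<Integer>() {
            @Override
            public int compare(Integer i, Integer j) {
                return Float.compare(sortValues[i], sortValues[j]);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        // Distances from (10,10) to (1,2), (5,9), (9,6) and (3,8).
        float[] distances = {
            (float) Math.sqrt(145), (float) Math.sqrt(26),
            (float) Math.sqrt(17), (float) Math.sqrt(53)
        };
        System.out.println(topN(distances, 3)); // [2, 1, 3]
    }
}
```

All four documents are examined (matching totalHits == 4 in the test), but only three are returned, with the sqrt(17) entry first.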

Date: 2008-12-24
