1 Deep Pagination with Scroll
1.1 How Paginated Queries Work
ES limits from+size pagination: by default, from + size may not exceed 10,000 (the index.max_result_window index setting).
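The reason for this limit can be sketched with simple arithmetic. In a distributed search, each shard returns its own top from + size hits to the coordinating node, which sorts shards × (from + size) entries only to keep size of them. The class DeepPagingCost below is a hypothetical helper for illustration, not ES code; it assumes the default window of 10,000.

```java
/**
 * Hypothetical helper illustrating why deep from+size paging is expensive.
 * Not part of any Elasticsearch API.
 */
class DeepPagingCost {
    // Default value of index.max_result_window
    static final int MAX_RESULT_WINDOW = 10_000;

    /** Returns true if a from+size request stays within the default window. */
    static boolean withinWindow(int from, int size) {
        return from + size <= MAX_RESULT_WINDOW;
    }

    /**
     * Number of hits the coordinating node has to sort:
     * every shard returns its own top (from + size) hits.
     */
    static long hitsToSort(int shards, int from, int size) {
        return (long) shards * (from + size);
    }

    public static void main(String[] args) {
        // Page 500 with 20 hits per page on an index with 5 shards
        int from = 9_980, size = 20, shards = 5;
        System.out.println("within window: " + withinWindow(from, size));
        System.out.println("hits sorted: " + hitsToSort(shards, from, size));
        System.out.println("hits returned: " + size);
    }
}
```

Even at the edge of the window, the coordinating node sorts tens of thousands of entries to return 20.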
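Querying data with from+size in ES:
1. The analyzer tokenizes the keywords supplied by the user.
2. The resulting terms are looked up in the inverted index, yielding a set of document IDs.
3. The matching documents are fetched from each shard (slow).
4. The documents are sorted by score (slow).
5. Based on the value of from, the leading part of the results is discarded.
6. The results are returned.

Querying data with scroll+size in ES:
1. The analyzer tokenizes the keywords supplied by the user.
2. The resulting terms are looked up in the inverted index, yielding a set of document IDs.
3. These IDs are kept in an in-memory search context.
4. According to size, IDs are taken from the context, the corresponding documents are fetched from the shards, and those IDs are removed from the context. The documents come back in the order given by the request's sort (sorting by _doc is the cheapest order for scroll).
5. When the next page is needed, IDs continue to be taken from the context.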
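Scroll queries are not suitable for real-time search: the ID set built in memory stays in use until it expires,
so updates made to the ES data during that time are not visible to it.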
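The scroll behavior described above (snapshot the matching IDs, hand them out size at a time, ignore later changes) can be modeled in memory. This is a simplified sketch under that model; real ES keeps a point-in-time view of the index segments rather than a literal list of IDs, and ScrollContextSketch is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Simplified in-memory model of a scroll context (hypothetical, not ES code). */
class ScrollContextSketch {
    private final Deque<String> remainingIds;

    /** Snapshot (copy) the matching IDs when the scroll starts. */
    ScrollContextSketch(List<String> matchingIds) {
        this.remainingIds = new ArrayDeque<>(matchingIds);
    }

    /** Take up to size IDs from the context and remove them from it. */
    List<String> nextPage(int size) {
        List<String> page = new ArrayList<>();
        while (page.size() < size && !remainingIds.isEmpty()) {
            page.add(remainingIds.pollFirst());
        }
        return page;
    }

    public static void main(String[] args) {
        List<String> index = new ArrayList<>(List.of("1", "2", "3", "4", "5"));
        ScrollContextSketch ctx = new ScrollContextSketch(index);
        // A document added after the scroll started is not visible to it
        index.add("6");
        List<String> page;
        while (!(page = ctx.nextPage(2)).isEmpty()) {
            System.out.println(page);
        }
    }
}
```

Because the context is a copy taken at the start, document "6" never appears in any page.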
1.2 Implementing a Scroll Query
DSL:
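Fetch the first page of two documents, sorted by age in descending order (scroll=1m keeps the in-memory context for 1 minute; it expires and is deleted unless the next scroll request renews it):
POST /index_name/type_name/_search?scroll=1m
{
  "query": {
    "match_all": {}
  },
  "size": 2,
  "sort": [
    { "age": { "order": "desc" } }
  ]
}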
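The first scroll query returns a scroll_id. To fetch the next page, query with that scroll_id (always pass the scroll_id from the most recent response):
POST /_search/scroll
{
  "scroll_id": "xxxxx",
  "scroll": "1m"
}
Delete the scroll context from ES (replace scroll_id with the actual ID):
DELETE /_search/scroll/scroll_id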
Java:
@Test
public void queryByScroll() throws IOException {
    // client (RestHighLevelClient), index and type are assumed to be fields of the test class
    // Create the SearchRequest
    SearchRequest request = new SearchRequest(index);
    request.types(type);
    // Keep the scroll context alive for 1 minute
    request.scroll(TimeValue.timeValueMinutes(1L));
    // Query conditions: 4 documents per page, sorted by age descending
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.size(4);
    builder.sort("age", SortOrder.DESC);
    builder.query(QueryBuilders.matchAllQuery());
    request.source(builder);
    // Execute and get the scroll_id and the first page
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    String scrollId = response.getScrollId();
    System.out.println("------first page------");
    for (SearchHit hit : response.getHits().getHits()) {
        System.out.println(hit.getSourceAsMap());
    }
    // Keep paging until no more hits are returned
    while (true) {
        // Create a SearchScrollRequest with the latest scroll_id
        SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
        scrollRequest.scroll(TimeValue.timeValueMinutes(1L));
        // Execute and get the next page
        SearchResponse scrollResp = client.scroll(scrollRequest, RequestOptions.DEFAULT);
        // The scroll_id may change between requests, so always keep the newest one
        scrollId = scrollResp.getScrollId();
        SearchHit[] hits = scrollResp.getHits().getHits();
        if (hits != null && hits.length > 0) {
            System.out.println("------next page------");
            for (SearchHit hit : hits) {
                System.out.println(hit.getSourceAsMap());
            }
        } else {
            System.out.println("------end------");
            break;
        }
    }
    // Create a ClearScrollRequest with the scroll_id
    ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
    clearScrollRequest.addScrollId(scrollId);
    // Delete the scroll context from memory
    ClearScrollResponse clearScrollResponse = client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
    System.out.println("scroll cleared: " + clearScrollResponse.isSucceeded());
}
2 delete-by-query
delete-by-query deletes the documents that match a query.
It lets you delete large amounts of data with queries such as match and term.
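Conceptually, delete-by-query runs the search and then deletes the matching documents batch by batch; the ES documentation describes it as using scroll batches (1,000 documents by default, adjustable with scroll_size), which is why the Java API returns a BulkByScrollResponse. The sketch below models only that batching on an in-memory map; DeleteByQuerySketch and its predicate-based signature are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

/** Hypothetical in-memory model of delete-by-query: search first, then delete in batches. */
class DeleteByQuerySketch {
    /**
     * Deletes every document whose value matches the query, batchSize IDs at a time.
     * Returns the number of batches issued.
     */
    static int deleteByQuery(Map<String, Integer> docs, Predicate<Integer> query, int batchSize) {
        // 1. "Search": collect the IDs of all matching documents
        List<String> matching = new ArrayList<>();
        for (Map.Entry<String, Integer> e : docs.entrySet()) {
            if (query.test(e.getValue())) {
                matching.add(e.getKey());
            }
        }
        // 2. "Bulk delete": remove the matches one batch at a time
        int batches = 0;
        for (int start = 0; start < matching.size(); start += batchSize) {
            int end = Math.min(start + batchSize, matching.size());
            for (String id : matching.subList(start, end)) {
                docs.remove(id);
            }
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // Document ID -> age
        Map<String, Integer> docs = new TreeMap<>(Map.of("a", 15, "b", 19, "c", 25, "d", 12, "e", 40));
        int batches = deleteByQuery(docs, age -> age < 20, 2);
        System.out.println("batches: " + batches + ", remaining: " + docs);
    }
}
```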
DSL:
Delete documents whose age is less than 20:
POST /index_name/type_name/_delete_by_query
{
  "query": {
    "range": {
      "age": { "lt": 20 }
    }
  }
}
Java:
@Test
public void deleteByQuery() throws IOException {
    // Create the DeleteByQueryRequest
    DeleteByQueryRequest request = new DeleteByQueryRequest(index);
    request.types(type);
    // Query condition: age < 20, the same as in the DSL example
    request.setQuery(QueryBuilders.rangeQuery("age").lt(20));
    // Execute the deletion
    BulkByScrollResponse response = client.deleteByQuery(request, RequestOptions.DEFAULT);
    // Print the result
    System.out.println(response.toString());
}
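3 Compound Queries in ES
3.1 bool Query
A compound query that combines multiple query clauses with boolean logic:
must: every clause combined with must has to match, meaning AND
must_not: no clause combined with must_not may match, meaning NOT
should: clauses combined with should mean OR; when the bool query also has must or filter clauses, should clauses are optional (they only raise the score) unless minimum_should_match is set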
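The matching rules of must, must_not and should can be modeled as plain predicates. This sketch (hypothetical class BoolQuerySketch, documents as string maps) follows the documented default for minimum_should_match: 1 when there are should clauses but no must or filter clauses, otherwise 0. Scoring and filter clauses are left out.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical model of how a bool query decides whether a document matches (no scoring). */
class BoolQuerySketch {
    /** Exact-value clause, similar in spirit to a term query. */
    static Predicate<Map<String, String>> term(String field, String value) {
        return doc -> value.equals(doc.get(field));
    }

    /** Loose clause: the field contains the value (a stand-in for match). */
    static Predicate<Map<String, String>> contains(String field, String value) {
        return doc -> doc.getOrDefault(field, "").contains(value);
    }

    static boolean matches(Map<String, String> doc,
                           List<Predicate<Map<String, String>>> must,
                           List<Predicate<Map<String, String>>> mustNot,
                           List<Predicate<Map<String, String>>> should,
                           Integer minimumShouldMatch) {
        // must: every clause has to match (AND)
        for (Predicate<Map<String, String>> p : must) {
            if (!p.test(doc)) return false;
        }
        // must_not: no clause may match (NOT)
        for (Predicate<Map<String, String>> p : mustNot) {
            if (p.test(doc)) return false;
        }
        // should: OR; the default minimum is 1 only when there are no must clauses
        int required = minimumShouldMatch != null
                ? minimumShouldMatch
                : (!should.isEmpty() && must.isEmpty() ? 1 : 0);
        long matched = should.stream().filter(p -> p.test(doc)).count();
        return matched >= required;
    }

    public static void main(String[] args) {
        Map<String, String> doc = Map.of("province", "Guangzhou", "prof", "engineer",
                "skill", "computer,communications");
        List<Predicate<Map<String, String>>> must =
                List.of(contains("skill", "computer"), contains("skill", "communications"));
        List<Predicate<Map<String, String>>> mustNot = List.of(term("prof", "student"));
        List<Predicate<Map<String, String>>> should =
                List.of(term("province", "Beijing"), term("province", "Shanghai"));
        // With must present, should is optional unless minimum_should_match is set
        System.out.println(matches(doc, must, mustNot, should, null));
        System.out.println(matches(doc, must, mustNot, should, 1));
    }
}
```

The first call matches a document from neither province, which is why the DSL example needs minimum_should_match when it mixes must and should.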
DSL:
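Find people whose province is 北京 (Beijing) or 上海 (Shanghai),
who are not students (prof is not 学生),
and whose skills include both 计算机 (computer) and 通信 (communications). minimum_should_match is set to 1 because, with must clauses present, the should clauses would otherwise be optional:
POST /index_name/type_name/_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "province": { "value": "北京" } } },
        { "term": { "province": { "value": "上海" } } }
      ],
      "minimum_should_match": 1,
      "must_not": [
        { "term": { "prof": { "value": "学生" } } }
      ],
      "must": [
        { "match": { "skill": "计算机" } },
        { "match": { "skill": "通信" } }
      ]
    }
  }
}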
Java:
@Test
public void boolQuery() throws IOException {
    // Create the SearchRequest
    SearchRequest request = new SearchRequest(index);
    request.types(type);
    // Query conditions
    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    // province is 北京 or 上海; at least one should clause has to match
    boolQueryBuilder.should(QueryBuilders.termQuery("province", "北京"));
    boolQueryBuilder.should(QueryBuilders.termQuery("province", "上海"));
    boolQueryBuilder.minimumShouldMatch(1);
    // not a student
    boolQueryBuilder.mustNot(QueryBuilders.termQuery("prof", "学生"));
    // skills include both 计算机 and 通信
    boolQueryBuilder.must(QueryBuilders.matchQuery("skill", "计算机"));
    boolQueryBuilder.must(QueryBuilders.matchQuery("skill", "通信"));
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.query(boolQueryBuilder);
    request.source(builder);
    // Execute the query
    SearchResponse resp = client.search(request, RequestOptions.DEFAULT);
    // Print the results
    for (SearchHit hit : resp.getHits().getHits()) {
        System.out.println(hit.getSourceAsMap());
    }
}
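3.2 boosting Query
How the score is computed at query time:
1. The more often a search keyword appears in a document, the higher the score.
2. The shorter the field being matched, the higher the score.
3. The search keywords are analyzed as well; the more of the resulting terms match, the higher the score.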
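The three factors above are roughly what the BM25 similarity (the default since ES 5.x) encodes. Below is a minimal sketch of the classic BM25 formula, assuming the common defaults k1 = 1.2 and b = 0.75 and a precomputed idf; Lucene's actual implementation differs in details such as constant factors, so treat Bm25Sketch as an illustration only.

```java
/** Classic BM25 term scoring (illustrative sketch, not Lucene's exact code). */
class Bm25Sketch {
    static final double K1 = 1.2;   // term-frequency saturation
    static final double B = 0.75;   // strength of field-length normalization

    /**
     * Score contribution of one query term.
     * idf: inverse document frequency, tf: occurrences in the field,
     * fieldLength: field length in terms, avgLength: average field length.
     */
    static double termScore(double idf, int tf, int fieldLength, double avgLength) {
        double norm = K1 * (1 - B + B * fieldLength / avgLength);
        return idf * tf * (K1 + 1) / (tf + norm);
    }

    /** A document's score sums the contributions of the query terms it contains. */
    static double score(double[] idfs, int[] tfs, int fieldLength, double avgLength) {
        double total = 0;
        for (int i = 0; i < idfs.length; i++) {
            if (tfs[i] > 0) {
                total += termScore(idfs[i], tfs[i], fieldLength, avgLength);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Factor 1: higher term frequency -> higher score
        System.out.println(termScore(1.0, 3, 10, 10) > termScore(1.0, 1, 10, 10));
        // Factor 2: shorter field -> higher score
        System.out.println(termScore(1.0, 1, 5, 10) > termScore(1.0, 1, 20, 10));
        // Factor 3: matching more query terms -> higher score
        System.out.println(score(new double[]{1.0, 1.0}, new int[]{1, 1}, 10, 10)
                > score(new double[]{1.0, 1.0}, new int[]{1, 0}, 10, 10));
    }
}
```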
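The boosting query adjusts the score of the results:
positive: only documents matching the positive query are included in the results
negative: documents that match both positive and negative have their score lowered (they are not removed)
negative_boost: a coefficient between 0 and 1; the score of documents matching both positive and negative is multiplied by negative_boost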
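The effect of these three parameters on a single document can be summarized in a few lines. BoostingSketch is a hypothetical model, not ES code; rejecting values outside [0, 1] is a choice made by this sketch to mirror the documented range.

```java
/** Hypothetical model of how a boosting query adjusts one document's score. */
class BoostingSketch {
    /**
     * Returns the final score, or null when the document does not match
     * positive and is therefore excluded from the results.
     */
    static Double boostedScore(boolean matchesPositive, boolean matchesNegative,
                               double positiveScore, double negativeBoost) {
        // This sketch rejects values outside [0, 1]
        if (negativeBoost < 0 || negativeBoost > 1) {
            throw new IllegalArgumentException("negative_boost must be between 0 and 1");
        }
        if (!matchesPositive) {
            return null; // only positive matches are returned
        }
        // Matching negative as well demotes the document instead of removing it
        return matchesNegative ? positiveScore * negativeBoost : positiveScore;
    }

    public static void main(String[] args) {
        System.out.println(boostedScore(true, false, 2.0, 0.3)); // kept unchanged
        System.out.println(boostedScore(true, true, 2.0, 0.3));  // demoted
        System.out.println(boostedScore(false, true, 2.0, 0.3)); // excluded
    }
}
```

With negative_boost = 0.3 the demoted score is 30% of the original, i.e. a 70% reduction.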
DSL:
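Find people with 计算机 (computer) skills, multiplying the score of those whose power is B by 0.3 (that is, lowering it to 30% of the original):
POST /index_name/type_name/_search
{
  "query": {
    "boosting": {
      "positive": {
        "match": { "skill": "计算机" }
      },
      "negative": {
        "match": { "power": "B" }
      },
      "negative_boost": 0.3
    }
  }
}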
Java:
@Test
public void boostingQuery() throws IOException {
    // Create the SearchRequest
    SearchRequest request = new SearchRequest(index);
    request.types(type);
    // Query conditions: positive = skill matches 计算机, negative = power matches B
    SearchSourceBuilder builder = new SearchSourceBuilder();
    BoostingQueryBuilder boostingQueryBuilder = QueryBuilders.boostingQuery(
            QueryBuilders.matchQuery("skill", "计算机"),
            QueryBuilders.matchQuery("power", "B"))
            .negativeBoost(0.3f);
    builder.query(boostingQueryBuilder);
    request.source(builder);
    // Execute the query
    SearchResponse resp = client.search(request, RequestOptions.DEFAULT);
    // Print each hit with its score so the demotion is visible
    for (SearchHit hit : resp.getHits().getHits()) {
        System.out.println(hit.getScore() + " " + hit.getSourceAsMap());
    }
}
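4 filter Query
When the relevance score is not needed, a filter query is more efficient.
An ordinary query computes a relevance score for each document from the query conditions, sorts by that score, and does not cache the results.
A filter query finds documents by the conditions without computing scores, and it caches the results of frequently used filters.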
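The difference between scoring and cached filtering can be illustrated with a small memoizing filter. Real ES caches per-segment results in the node query cache, and only for filters it considers frequently used; the hypothetical FilterCacheSketch below simply caches every filter under a string key and computes no score at all.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

/** Hypothetical model of filter-context evaluation with a result cache. */
class FilterCacheSketch {
    private final int[] ages;                          // one field value per document ID
    private final Map<String, List<Integer>> cache = new HashMap<>();
    int evaluations = 0;                               // how often the data was actually scanned

    FilterCacheSketch(int[] ages) {
        this.ages = ages;
    }

    /** Returns the matching document IDs; each filter key is evaluated only once. */
    List<Integer> filter(String key, IntPredicate condition) {
        return cache.computeIfAbsent(key, k -> {
            evaluations++;
            List<Integer> ids = new ArrayList<>();
            for (int id = 0; id < ages.length; id++) {
                if (condition.test(ages[id])) {
                    ids.add(id);
                }
            }
            return ids; // no score is computed: every hit is equivalent (score 0)
        });
    }

    public static void main(String[] args) {
        FilterCacheSketch index = new FilterCacheSketch(new int[]{15, 30, 17, 45});
        System.out.println(index.filter("age<18", age -> age < 18));
        System.out.println(index.filter("age<18", age -> age < 18)); // served from the cache
        System.out.println("evaluations: " + index.evaluations);
    }
}
```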
DSL:
A filter query ignores score: every returned document has a score of 0, but the query runs faster.
POST /index_name/type_name/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "name": "北京" } },
        { "range": { "age": { "lt": 18 } } }
      ]
    }
  }
}
Java:
@Test
public void filter() throws IOException {
    // Create the SearchRequest
    SearchRequest request = new SearchRequest(index);
    request.types(type);
    // Filter conditions: name is 北京 and age < 18 (no scoring)
    SearchSourceBuilder builder = new SearchSourceBuilder();
    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    boolQueryBuilder.filter(QueryBuilders.termQuery("name", "北京"));
    boolQueryBuilder.filter(QueryBuilders.rangeQuery("age").lt(18));
    builder.query(boolQueryBuilder);
    request.source(builder);
    // Execute the query
    SearchResponse resp = client.search(request, RequestOptions.DEFAULT);
    // Print the results
    for (SearchHit hit : resp.getHits().getHits()) {
        System.out.println(hit.getSourceAsMap());
    }
}
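5 Highlight Query
Highlighting is applied to a field of the document: the highlighted fragments are returned per field in each hit's highlight section, separately from _source.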
DSL:
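POST /index_name/type_name/_search
{
  "query": {
    "match": { "name": "北京" }
  },
  "highlight": {
    "fields": {
      "name": {}
    },
    "pre_tags": ["<font color='red'>"],
    "post_tags": ["</font>"],
    "fragment_size": 100
  }
}
Here fields lists the fields to highlight, pre_tags and post_tags are the tags placed before and after each match, and fragment_size limits each highlighted fragment to about 100 characters.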
Java:
@Test
public void highlight() throws IOException {
    // Create the SearchRequest
    SearchRequest request = new SearchRequest(index);
    request.types(type);
    // Query condition
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.query(QueryBuilders.matchQuery("name", "北京"));
    // Highlight the name field with a fragment size of 100, as in the DSL example
    HighlightBuilder highlightBuilder = new HighlightBuilder();
    highlightBuilder.field("name", 100)
            .preTags("<font color='red'>")
            .postTags("</font>");
    builder.highlighter(highlightBuilder);
    request.source(builder);
    // Execute the query
    SearchResponse resp = client.search(request, RequestOptions.DEFAULT);
    // Print the highlighted name field of each hit
    for (SearchHit hit : resp.getHits().getHits()) {
        System.out.println(hit.getHighlightFields().get("name"));
    }
}
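To make pre_tags, post_tags and fragment_size concrete, here is a much simplified highlighter that cuts a window of about fragment_size characters near the first match and wraps every occurrence of the term in the given tags. HighlightSketch is hypothetical; ES's real highlighters (unified, plain, fvh) work on analyzed terms and choose fragments differently.

```java
/** Hypothetical, simplified highlighter: literal term matching, one fragment. */
class HighlightSketch {
    static String highlight(String text, String term, String preTag, String postTag, int fragmentSize) {
        int first = text.indexOf(term);
        if (first < 0) {
            return null; // no match, so nothing to highlight
        }
        // Take a window of at most fragmentSize characters starting a little before the first match
        int start = Math.max(0, first - fragmentSize / 4);
        int end = Math.min(text.length(), start + fragmentSize);
        String fragment = text.substring(start, end);
        // Wrap every occurrence of the term inside the fragment
        return fragment.replace(term, preTag + term + postTag);
    }

    public static void main(String[] args) {
        String name = "Beijing Capital International Airport, Beijing";
        System.out.println(highlight(name, "Beijing", "<font color='red'>", "</font>", 100));
        System.out.println(highlight("Beijing is the capital", "Beijing", "<em>", "</em>", 10));
    }
}
```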
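6 Aggregation Queries
ES aggregation syntax (agg is a name you choose for the aggregation; agg_type is the aggregation type, followed by its parameters):
POST /index_name/type_name/_search
{
  "aggs": {
    "agg": {
      "agg_type": {
        "parameter": "value"
      }
    }
  }
}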
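6.1 Distinct Count Query: cardinality
Counts the distinct values of a specified field among the matching documents, i.e. the number of documents left after deduplicating on that field. The count is approximate (ES uses the HyperLogLog++ algorithm), though close to exact for low cardinalities.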
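An exact distinct count is easy to express with a HashSet, as in the hypothetical DistinctCountSketch below. It shows the quantity being estimated; ES avoids this approach on large data because the set grows with the number of distinct values, whereas HyperLogLog++ keeps memory bounded.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Exact distinct count: the quantity ES's cardinality aggregation approximates. */
class DistinctCountSketch {
    static int distinctCount(List<String> values) {
        Set<String> seen = new HashSet<>(values); // duplicates collapse into one entry
        return seen.size();
    }

    public static void main(String[] args) {
        List<String> cities = List.of("Beijing", "Shanghai", "Beijing", "Shenzhen", "Shanghai");
        System.out.println(distinctCount(cities));
    }
}
```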
DSL:
Count how many distinct cities the index contains:
POST /index_name/type_name/_search
{
  "aggs": {
    "agg": {
      "cardinality": { "field": "city" }
    }
  }
}
Java:
@Test
public void cardinality() throws IOException {
    // Create the SearchRequest
    SearchRequest request = new SearchRequest(index);
    request.types(type);
    // Aggregation: distinct count of city, named "agg"
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.aggregation(AggregationBuilders.cardinality("agg").field("city"));
    request.source(builder);
    // Execute the query
    SearchResponse resp = client.search(request, RequestOptions.DEFAULT);
    // Read the aggregation result by its name
    Cardinality agg = resp.getAggregations().get("agg");
    System.out.println(agg.getValue());
}
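6.2 Range Statistics: range
Counts the documents that fall into given ranges, for example how many documents have a field value between 0 and 100. Each range includes its from value and excludes its to value.
Besides plain numbers (range), ranges can also be applied to dates (date_range) and IP addresses (ip_range).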
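The bucketing rule (from inclusive, to exclusive, either side optionally unbounded) can be sketched as a simple counter. RangeBucketSketch is a hypothetical helper, using null for an unbounded side.

```java
/** Hypothetical model of range-aggregation bucketing: from is inclusive, to is exclusive. */
class RangeBucketSketch {
    /** Counts values in [from, to); a null bound means unbounded on that side. */
    static long count(int[] values, Double from, Double to) {
        long n = 0;
        for (int v : values) {
            boolean aboveFrom = from == null || v >= from;
            boolean belowTo = to == null || v < to;
            if (aboveFrom && belowTo) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        int[] ages = {17, 18, 35, 59, 60, 65, 70, 82};
        System.out.println("[18, 60): " + count(ages, 18.0, 60.0));
        System.out.println("[70, +inf): " + count(ages, 70.0, null));
    }
}
```

Age 60 falls outside [18, 60) because to is exclusive, and 65 falls into neither bucket.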
DSL:
Count people aged in [18, 60) and in [70, +∞):
POST /index_name/type_name/_search
{
  "aggs": {
    "agg": {
      "range": {
        "field": "age",
        "ranges": [
          { "from": 18, "to": 60 },
          { "from": 70 }
        ]
      }
    }
  }
}
Java:
@Test
public void range() throws IOException {
    // Create the SearchRequest
    SearchRequest request = new SearchRequest(index);
    request.types(type);
    // Aggregation: age buckets [18, 60) and [70, +inf)
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.aggregation(AggregationBuilders.range("agg").field("age")
            .addRange(18, 60)
            .addUnboundedFrom(70));
    request.source(builder);
    // Execute the query
    SearchResponse resp = client.search(request, RequestOptions.DEFAULT);
    // Print each bucket's key, bounds and document count
    Range agg = resp.getAggregations().get("agg");
    for (Range.Bucket bucket : agg.getBuckets()) {
        String key = bucket.getKeyAsString();
        Object from = bucket.getFrom();
        Object to = bucket.getTo();
        long docCount = bucket.getDocCount();
        System.out.println(key + " from=" + from + " to=" + to + " count=" + docCount);
    }
}
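6.3 Statistics Aggregation: extended_stats
Returns statistics for the specified field: maximum, minimum, average, sum, and more (such as variance and standard deviation).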
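The main values in an extended_stats result can be computed in one pass over the data. ExtendedStatsSketch is a hypothetical illustration; it assumes population variance (mean of squares minus square of the mean), which is what ES reports as variance.

```java
/** Computes the kind of values extended_stats reports (sketch, population variance). */
class ExtendedStatsSketch {
    final long count;
    final double min, max, sum, avg, sumOfSquares, variance, stdDeviation;

    ExtendedStatsSketch(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("need at least one value");
        }
        double mn = Double.POSITIVE_INFINITY, mx = Double.NEGATIVE_INFINITY, s = 0, sq = 0;
        for (double v : values) {
            mn = Math.min(mn, v);
            mx = Math.max(mx, v);
            s += v;
            sq += v * v;
        }
        count = values.length;
        min = mn;
        max = mx;
        sum = s;
        avg = s / count;
        sumOfSquares = sq;
        // Population variance: mean of the squares minus the square of the mean
        variance = sq / count - avg * avg;
        stdDeviation = Math.sqrt(variance);
    }

    public static void main(String[] args) {
        ExtendedStatsSketch stats = new ExtendedStatsSketch(new double[]{20, 30, 40});
        System.out.println("max=" + stats.max + " min=" + stats.min + " avg=" + stats.avg + " sum=" + stats.sum);
        System.out.println("variance=" + stats.variance + " std_deviation=" + stats.stdDeviation);
    }
}
```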
DSL:
POST /index_name/type_name/_search
{
  "aggs": {
    "agg": {
      "extended_stats": { "field": "age" }
    }
  }
}
Java:
@Test
public void extendedStats() throws IOException {
    // Create the SearchRequest
    SearchRequest request = new SearchRequest(index);
    request.types(type);
    // Aggregation: extended statistics of age
    SearchSourceBuilder builder = new SearchSourceBuilder();
    builder.aggregation(AggregationBuilders.extendedStats("agg").field("age"));
    request.source(builder);
    // Execute the query
    SearchResponse resp = client.search(request, RequestOptions.DEFAULT);
    // Print the results
    ExtendedStats agg = resp.getAggregations().get("agg");
    System.out.println("max: " + agg.getMax());
    System.out.println("min: " + agg.getMin());
    System.out.println("avg: " + agg.getAvg());
    System.out.println("sum: " + agg.getSum());
}