elasticsearch-search

Middleware

Published: 2022-12-06

Key points about Elasticsearch search.

Analyzing search results

GET _search

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 21,
    "successful": 21,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": ".kibana",
        "_type": "config",
        "_id": "5.2.0",
        "_score": 1,
        "_source": {
          "buildNum": 14695
        }
      }
    ]
  }
}

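took: the whole search request took 4 milliseconds.

hits.total: the number of documents that matched, 1 in this case. (By default, Elasticsearch only lets a query page through the first 10,000 results of an index; the limit is the index.max_result_window setting.)

hits.max_score: the highest relevance score among all results of this search.

hits.hits: the matching documents with their full data; by default the top 10, sorted by _score in descending order.

_shards: the number of shards the query was routed to, and how many of them succeeded or failed. Each shard's part of the query can be served by either the primary shard or one of its replica shards.

timed_out: whether the search timed out. There is no timeout by default; you can specify one manually, in which case the query runs under the timeout mechanism.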
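The fields above can be read out of a response programmatically. The following is a minimal sketch, assuming the response has already been parsed into a Python dict and has the shape shown above (in particular, hits.total is a plain number). The helper name summarize_search_response is made up for illustration and is not part of any Elasticsearch client.

```python
def summarize_search_response(response):
    """Pull the headline fields out of a parsed search response dict."""
    shards = response["_shards"]
    hits = response["hits"]
    return {
        "took_ms": response["took"],          # time spent on the request
        "timed_out": response["timed_out"],   # True if a timeout cut it short
        "shards_total": shards["total"],
        "shards_failed": shards["failed"],
        "total_hits": hits["total"],          # number of matching documents
        "max_score": hits["max_score"],       # best relevance score
        "returned": len(hits["hits"]),        # documents actually in this page
    }


# A trimmed copy of the response shown above.
sample = {
    "took": 4,
    "timed_out": False,
    "_shards": {"total": 21, "successful": 21, "failed": 0},
    "hits": {
        "total": 1,
        "max_score": 1,
        "hits": [{"_index": ".kibana", "_type": "config", "_id": "5.2.0",
                  "_score": 1, "_source": {"buildNum": 14695}}],
    },
}

summary = summarize_search_response(sample)
print(summary)
```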

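The search timeout mechanism

Some search applications are very time-sensitive. Take an e-commerce site: you cannot make users wait 10 minutes for the result of one search request. They would leave long before that and buy nothing.

With a timeout, each shard returns to the client whatever it has found within the timeout window (possibly everything) instead of waiting until all matching data has been found.

For example, suppose there are 2 shards and each needs 1 minute to find its 1,000 matching documents. If we specify timeout=10ms, each shard may have found only 10 documents when the 10 ms are up. Without the timeout the request would return 2,000 documents after 1 minute; with it, the client gets 20 documents after 10 ms.

This lets a search request finish within roughly the specified timeout, which is what time-sensitive search applications need.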

Timeout syntax:

GET /_search?timeout=10ms

Timeout units: ms (milliseconds), s (seconds), m (minutes)
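The two-shard example can be played through with a toy model. This is only a sketch of the idea, not how Elasticsearch is implemented: each shard is represented by a list of the times (in ms) at which it would find each hit, and the rate of one hit per millisecond is an assumption chosen so that 10 ms yields 10 hits per shard. The function name search_with_timeout is hypothetical.

```python
def search_with_timeout(shard_hit_times_ms, timeout_ms=None):
    """Toy model of a per-shard timeout.

    shard_hit_times_ms: one list per shard, holding the time in ms
    at which that shard finds each of its hits (ascending).
    Returns (number_of_hits_collected, timed_out).
    """
    collected = 0
    timed_out = False
    for hit_times in shard_hit_times_ms:
        for found_at in hit_times:
            if timeout_ms is not None and found_at > timeout_ms:
                # This shard ran out of time: keep what it has so far.
                timed_out = True
                break
            collected += 1
    return collected, timed_out


# Two shards with 1,000 hits each, one hit found per millisecond (assumed rate).
shards = [list(range(1, 1001)), list(range(1, 1001))]

partial = search_with_timeout(shards, timeout_ms=10)
full = search_with_timeout(shards)
print(partial, full)
```

With the timeout, the client gets a partial result quickly and the timed-out flag tells it that the result may be incomplete.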

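Multi-index and multi-type search patterns

(1) /_search: search all data in all types of all indices
(2) /index1/_search: search all types in one specified index
(3) /index1,index2/_search: search two indices at once
(4) /*1,*2/_search: match multiple indices with wildcards
(5) /index1/type1/_search: search one specified type in one index
(6) /index1/type1,type2/_search: search several types in one index
(7) /index1,index2/type1,type2/_search: search several types across several indices
(8) /_all/type1,type2/_search: _all stands for all indices, so this searches the specified types in every index

Note: do not put spaces around the commas.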
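The path patterns listed above follow one rule: an optional comma-separated index segment, an optional comma-separated type segment, then _search. A small sketch of that rule follows; build_search_path is a hypothetical helper written for these notes, and it targets the typed URLs of older Elasticsearch versions such as the one used here.

```python
def build_search_path(indices=None, types=None):
    """Return a _search path for the given index and type names."""
    indices = list(indices or [])
    types = list(types or [])
    for name in indices + types:
        # Names are joined with bare commas, so spaces and commas are rejected.
        if " " in name or "," in name:
            raise ValueError(f"invalid name: {name!r}")
    parts = []
    if indices:
        parts.append(",".join(indices))
    elif types:
        # A type segment needs an index segment in front of it.
        parts.append("_all")
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    return "/" + "/".join(parts)


print(build_search_path())
print(build_search_path(["index1", "index2"]))
print(build_search_path(["index1"], ["type1", "type2"]))
print(build_search_path(types=["type1", "type2"]))
```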

GET /rrc,rrs/index1,user/_search

{
  "error": {
    "root_cause": [
      {
        "type": "index_not_found_exception",
        "reason": "no such index",
        "resource.type": "index_or_alias",
        "resource.id": "rrs",
        "index_uuid": "_na_",
        "index": "rrs"
      }
    ],
    "type": "index_not_found_exception",
    "reason": "no such index",
    "resource.type": "index_or_alias",
    "resource.id": "rrs",
    "index_uuid": "_na_",
    "index": "rrs"
  },
  "status": 404
}

# This differs a little from what the instructor said. Found via a web search: https://stackoverflow.com/questions/45461096/elasticsearch-returns-404-while-multi-index-multi-type-search
GET /rrc,index2/user,type2/_search?ignore_unavailable

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "rrc",
        "_type": "user",
        "_id": "9",
        "_score": 1,
        "_source": {
          "name": "tiedan",
          "price": 1764
        }
      }
    ]
  }
}

Pagination in ES

GET _search
"hits": {
    "total": 17,
    "max_score": 1
}

# Paginated query
GET _search?from=0&size=1
    
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 21,
    "successful": 21,
    "failed": 0
  },
  "hits": {
    "total": 17,
    "max_score": 1,
    "hits": [
      {
        "_index": ".kibana",
        "_type": "config",
        "_id": "5.2.0",
        "_score": 1,
        "_source": {
          "buildNum": 14695
        }
      }
    ]
  }
}

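Watch out for the deep-pagination problem: every shard has to return its top from + size hits, and the coordinating node merge-sorts all of them before it can cut out the requested page.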
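The merge step behind deep pagination can be sketched in a few lines. This is a simplified model under stated assumptions: each shard is a plain list of (score, doc_id) tuples, and paginate_across_shards is a made-up name. It shows why the cost grows with from + size: every shard ships that many hits even though only size of them end up on the page.

```python
import heapq


def paginate_across_shards(shard_results, from_, size):
    """Return (page, shipped) for per-shard lists of (score, doc_id)."""
    window = from_ + size
    # Each shard sorts locally and ships only its top `window` hits.
    per_shard_top = [
        sorted(hits, key=lambda hit: hit[0], reverse=True)[:window]
        for hits in shard_results
    ]
    # The coordinating node merges the already-sorted lists by descending score.
    merged = heapq.merge(*per_shard_top, key=lambda hit: hit[0], reverse=True)
    page = list(merged)[from_:window]
    # Total number of hits that had to travel to the coordinating node.
    shipped = sum(len(top) for top in per_shard_top)
    return page, shipped


shard_a = [(0.9, "a1"), (0.5, "a2"), (0.1, "a3")]
shard_b = [(0.8, "b1"), (0.7, "b2"), (0.2, "b3")]

page, shipped = paginate_across_shards([shard_a, shard_b], from_=1, size=1)
print(page, shipped)
```

Here a page of one document required four hits to be shipped; with from=10000 and many shards the same pattern becomes expensive.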

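Basic query string syntax

The basic form is q=field:search content. The other thing to learn is the meaning of + and -: + means the condition must match, and - means it must not match.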

GET /test_index/test_type/_search?q=test_field:test12

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "12",
        "_score": 0.2876821,
        "_source": {
          "test_field": "test12"
        }
      }
    ]
  }
}


GET /test_index/test_type/_search?q=+test_field:test12

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "12",
        "_score": 0.2876821,
        "_source": {
          "test_field": "test12"
        }
      }
    ]
  }
}

GET /test_index/test_type/_search?q=-test_field:test12
    
    
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "2",
        "_score": 1,
        "_source": {
          "test_field": "replaced test2"
        }
      }
    ]
  }
}
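The effect of the + and - prefixes in the three requests above can be imitated with a toy matcher. This is only an illustration of the must / must-not semantics for a single clause, assuming whitespace tokenization; it is not the real query string parser, and the function name matches is hypothetical.

```python
def matches(doc, clause):
    """Evaluate one `field:value` clause with an optional + or - prefix."""
    negate = clause.startswith("-")
    body = clause.lstrip("+-")
    field, _, value = body.partition(":")
    # Whitespace split stands in for the real analyzer.
    tokens = str(doc.get(field, "")).lower().split()
    found = value.lower() in tokens
    return not found if negate else found


docs = [
    {"_id": "12", "test_field": "test12"},
    {"_id": "2", "test_field": "replaced test2"},
]

plain = [d["_id"] for d in docs if matches(d, "test_field:test12")]
must = [d["_id"] for d in docs if matches(d, "+test_field:test12")]
must_not = [d["_id"] for d in docs if matches(d, "-test_field:test12")]
print(plain, must, must_not)
```

With a single clause, the plain and + forms select the same documents, which mirrors the identical responses above; the - form selects the complement.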

How the _all metadata field works and what it is for

GET /test_index/test_type/_search?q=test

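This searches every field at once: a document is returned if any of its fields contains the keyword. Does that mean the search runs against each field of the document one by one? No.

When we index a document containing several fields, ES automatically concatenates the values of all fields into one long string, stores it as the value of the _all metadata field, and indexes it as well.

Later, when a search does not name a field, it runs against the _all field by default, which contains the values of all fields.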

For example:

{
  "name": "jack",
  "age": 26,
  "email": "jack@sina.com",
  "address": "guangzhou"
}

"jack 26 jack@sina.com guangzhou" becomes the value of this document's _all field; it is analyzed into tokens and added to the inverted index.

This is not used in production environments.
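The concatenation described in this section can be sketched as follows. It is a simplified model: build_all_value is a hypothetical helper, and the whitespace split only stands in for the real analyzer, which may tokenize values such as the email address differently.

```python
def build_all_value(source):
    """Join all field values into one string, in field order."""
    return " ".join(str(value) for value in source.values())


doc = {
    "name": "jack",
    "age": 26,
    "email": "jack@sina.com",
    "address": "guangzhou",
}

all_value = build_all_value(doc)
# Whitespace tokenization as a stand-in for analysis before indexing.
all_tokens = all_value.lower().split()
print(all_value)
print(all_tokens)
```

A search without a field name then only has to look at these tokens, instead of checking every field separately.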

mapping

Prepare the data:

PUT /website/article/1
{
  "post_date": "2017-01-01",
  "title": "my first article",
  "content": "this is my first article in this website",
  "author_id": 11400
}

PUT /website/article/2
{
  "post_date": "2017-01-02",
  "title": "my second article",
  "content": "this is my second article in this website",
  "author_id": 11400
}

PUT /website/article/3
{
  "post_date": "2017-01-03",
  "title": "my third article",
  "content": "this is my third article in this website",
  "author_id": 11400
}

Try various searches:

GET /website/article/_search?q=2017    # 3 results
GET /website/article/_search?q=2017-01-01    # 3 results
GET /website/article/_search?q=post_date:2017-01-01    # 1 result
GET /website/article/_search?q=post_date:2017    # 1 result
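The difference between the first three searches can be imitated with a toy model: a search without a field name runs against tokenized text, while a search on post_date compares whole values. This sketch assumes a tokenizer that splits on anything that is not a letter or digit and an OR match between query tokens; the function names are made up, and the fourth query (post_date:2017) depends on how a partial date is parsed, which is not modeled here.

```python
import re


def tokenize(text):
    """Split on anything that is not a letter or digit (stand-in for an analyzer)."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


docs = {
    "1": {"post_date": "2017-01-01", "title": "my first article"},
    "2": {"post_date": "2017-01-02", "title": "my second article"},
    "3": {"post_date": "2017-01-03", "title": "my third article"},
}


def search_all(query):
    """Full-text search: a document matches if it shares any token with the query."""
    wanted = set(tokenize(query))
    return sorted(
        doc_id
        for doc_id, source in docs.items()
        if wanted & set(tokenize(" ".join(str(v) for v in source.values())))
    )


def search_exact(field, value):
    """Exact-value search: the whole field value must equal the query value."""
    return sorted(doc_id for doc_id, source in docs.items() if source.get(field) == value)


print(search_all("2017"))                        # every document contains the token 2017
print(search_all("2017-01-01"))                  # tokens 2017 and 01 match every document
print(search_exact("post_date", "2017-01-01"))   # only the first document
```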

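While the data above was being indexed, ES used dynamic mapping: it automatically created the index, the type, and the type's mapping. The mapping records the data type of each field along with settings such as how the field is analyzed. That is why the searches above return different counts: a query without a field name runs against the analyzed _all field, whereas post_date is mapped as date and is matched as a date value rather than as text tokens.
As covered later, we can also create the index and type, together with the type's mapping, by hand before inserting any data.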

GET /website/_mapping/article
    
{
  "website": {
    "mappings": {
      "article": {
        "properties": {
          "author_id": {
            "type": "long"
          },
          "content": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "post_date": {
            "type": "date"
          },
          "title": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}
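The mapping above can be approximated by a small type-inference routine. This is a rough sketch of the idea behind dynamic mapping, not the actual Elasticsearch logic: the date check only recognizes plain yyyy-MM-dd strings, and infer_field_mapping and infer_mapping are hypothetical names.

```python
import re

# Simplified stand-in for date detection: only plain yyyy-MM-dd strings.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def infer_field_mapping(value):
    """Guess a field mapping from one JSON value."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        if DATE_PATTERN.match(value):
            return {"type": "date"}
        # Other strings become analyzed text plus an exact-value keyword sub-field.
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    raise TypeError(f"unsupported value: {value!r}")


def infer_mapping(source):
    """Build a properties block for a whole document."""
    return {
        "properties": {
            name: infer_field_mapping(value) for name, value in sorted(source.items())
        }
    }


article = {
    "post_date": "2017-01-01",
    "title": "my first article",
    "content": "this is my first article in this website",
    "author_id": 11400,
}

mapping = infer_mapping(article)
print(mapping)
```

Run on the first article, this yields the same field types as the mapping returned above: long for author_id, date for post_date, and text with a keyword sub-field for title and content.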