鼎鼎小筑

Fixing an ES Data Type Error

2017/10/21

Sentry has recently been reporting the following error over and over:

TransportError(500, ‘SearchPhaseExecutionException[Failed to execute phase [query], all shards failed; shardFailures {[sTjRYf4YTxS9dwBIKOmpSQ][tale][0]: RemoteTransportException[[bj2-storm03][inet[/10.81.10.15:9300]][indices:data/read/search[phase/query]]]; nested: QueryPhaseExecutionException[[tale][0]: query[filtered(ConstantScore(BooleanFilter(+cache(is_frozen: \x01\x00\x00\x00\x00\x00\x00\x00\x00\x00) +cache(sea_type: \x01\x00\x00\x00\x00\x00\x00\x00\x00\x06))))->cache(_type:lead)],from[0],size[20],sort[<custom:”update_time”: org.elasticsearch.index.fielddata.fieldcomparator.LongValuesComparatorSource@4cb3edbd>!]: Query Failed [Failed to execute main query]]; nested: ElasticsearchException[java.lang.NumberFormatException: Invalid shift value in prefixCoded bytes (is encoded value really an INT?)]; nested: UncheckedExecutionException[java.lang.NumberFormatException: Invalid shift value in prefixCoded bytes (is encoded value really an INT?)]; nested: NumberFormatException[Invalid shift value in prefixCoded bytes (is encoded value really an INT?)]; }{[nzPbm4moSmafvJAPhBZxFg][tale][1]: QueryPhaseExecutionException[[tale][1]: query[filtered(ConstantScore(BooleanFilter(+cache(is_frozen: \x01\x00\x00\x00\x00\x00\x00\x00\x00\x00) +cache(sea_type: \x01\x00\x00\x00\x00\x00\x00\x00\x00\x06))))->cache(_type:lead)],from[0],size[20],sort[<custom:”update_time”: org.elasticsearch.index.fielddata.fieldcomparator.LongValuesComparatorSource@e7463d2>!]: Query Failed [Failed to execute main query]]; nested: ElasticsearchException[java.lang.NumberFormatException: Invalid shift value in prefixCoded bytes (is encoded value really an INT?)]; nested: UncheckedExecutionException[java.lang.NumberFormatException: Invalid shift value in prefixCoded bytes (is encoded value really an INT?)]; nested: NumberFormatException[Invalid shift value in prefixCoded bytes (is encoded value really an INT?)]; }{[sTjRYf4YTxS9dwBIKOmpSQ][tale][2]: 
RemoteTransportException[[bj2-storm03][inet[/10.81.10.15:9300]][indices:data/read/search[phase/query]]]; nested: QueryPhaseExecutionException[[tale][2]: query[filtered(ConstantScore(BooleanFilter(+cache(is_frozen: \x01\x00\x00\x00\x00\x00\x00\x00\x00\x00) +cache(sea_type: \x01\x00\x00\x00\x00\x00\x00\x00\x00\x06))))->cache(_type:lead)],from[0],size[20],sort[<custom:”update_time”: org.elasticsearch.index.fielddata.fieldcomparator.LongValuesComparatorSource@62817996>!]: Query Failed [Failed to execute main query]]; nested: ElasticsearchException[java.lang.NumberFormatException: Invalid shift value in prefixCoded bytes (is encoded value really an INT?)]; nested: UncheckedExecutionException[java.lang.NumberFormatException: Invalid shift value in prefixCoded bytes (is encoded value really an INT?)]; nested: NumberFormatException[Invalid shift value in prefixCoded bytes (is encoded value really an INT?)]; }{[sTjRYf4YTxS9dwBIKOmpSQ][tale][3]: RemoteTransportException[[bj2-storm03][inet[/10.81.10.15:9300]][indices:data/read/search[phase/query]]]; nested: QueryPhaseExecutionException[[tale][3]: query[filtered(ConstantScore(BooleanFilter(+cache(is_frozen: \x01\x00\x00\x00\x00\x00\x00\x00\x00\x00) +cache(sea_type: \x01\x00\x00\x00\x00\x00\x00\x00\x00\x06))))->cache(_type:lead)],from[0],size[20],sort[<custom:”update_time”: org.elasticsearch.index.fielddata.fieldcomparator.LongValuesComparatorSource@7de52fa4>!]: Query Failed [Failed to execute main query]]; nested: ElasticsearchException[java.lang.NumberFormatException: Invalid shift value in prefixCoded bytes (is encoded value really an INT?)]; nested: UncheckedExecutionException[java.lang.NumberFormatException: Invalid shift value in prefixCoded bytes (is encoded value really an INT?)]; nested: NumberFormatException[Invalid shift value in prefixCoded bytes (is encoded value really an INT?)]; }{[sTjRYf4YTxS9dwBIKOmpSQ][tale][4]: 
RemoteTransportException[[bj2-storm03][inet[/10.81.10.15:9300]][indices:data/read/search[phase/query]]]; nested: QueryPhaseExecutionException[[tale][4]: query[filtered(ConstantScore(BooleanFilter(+cache(is_frozen: \x01\x00\x00\x00\x00\x00\x00\x00\x00\x00…

Here is how it was resolved:

  1. When the error occurs:
    • Searching for lead and sorting by "most recently updated" or "least recently updated"
  2. Steps taken:

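    1. Judging from the error, the format or type of the update_time data is probably wrong

      • Checking the mapping showed that three doc types have an update_time field: two are integer and one is long, whereas ES requires fields with the same name to have the same type.
        Luckily, the long-typed field turned out not to be in actual use, so I expected an easy fix: just delete that field. Sadly, the same error kept coming.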
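    2. Deleting data in ES does not physically remove it; the data is only marked as deleted (found becomes false). The next search still touches the data marked this way, and ES merely strips it from the results it returns to us, so it makes sense that the same error kept appearing.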

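    3. ES stores its data in segments. In the background, ES periodically merges segments according to Lucene's merge policy, and data marked as deleted is only truly removed when its segment is merged. To get rid of the error right away, a segment merge has to be forced manually (curl -XPOST "http://localhost:9200/tale/_optimize?only_expunge_deletes=true&wait_for_completion=true").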

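    4. However, this operation is said to be very heavy on CPU and disk I/O, so I ran it after working hours. Load did rise to about 10 times its previous level, and the whole process took 22 minutes. That resolved the problem.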
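
The mapping check in step 1 can also be scripted. Below is a minimal sketch, assuming the mapping has the shape that ES 1.x returns from GET /tale/_mapping (index name, then "mappings", then doc type, then "properties"). The function name find_type_conflicts and the doc type names customer and legacy are made up for illustration; only tale, lead, update_time and is_frozen come from the error above. It looks at top-level fields only and does not descend into nested objects.

```python
from collections import defaultdict


def find_type_conflicts(mapping, index):
    """Return {field: {type: [doc_type, ...]}} for fields mapped with more than one type.

    `mapping` is assumed to look like the ES 1.x GET /<index>/_mapping response.
    Only top-level properties are inspected.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for doc_type, body in mapping[index]["mappings"].items():
        for field, spec in body.get("properties", {}).items():
            # Object fields carry no "type" key; label them "object".
            seen[field][spec.get("type", "object")].append(doc_type)
    return {
        field: {t: sorted(doc_types) for t, doc_types in by_type.items()}
        for field, by_type in seen.items()
        if len(by_type) > 1
    }


# Hypothetical mapping: two doc types map update_time as integer, one as long.
sample = {
    "tale": {
        "mappings": {
            "lead": {
                "properties": {
                    "update_time": {"type": "integer"},
                    "is_frozen": {"type": "boolean"},
                }
            },
            "customer": {"properties": {"update_time": {"type": "integer"}}},
            "legacy": {"properties": {"update_time": {"type": "long"}}},
        }
    }
}

conflicts = find_type_conflicts(sample, "tale")
for field, by_type in conflicts.items():
    print(field, by_type)
```

In practice the sample dict would be replaced by the parsed JSON of the real mapping; any field that shows up in the result is a candidate for the kind of sort failure described above.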

References:

https://www.elastic.co/guide/en/elasticsearch/reference/1.7/indices-optimize.html#optimize-parameters
https://github.com/elastic/elasticsearch/issues/9638
