Elasticsearch upgrade: 7.9 to 7.10

Useful links:

Changes in Elasticsearch 7.10

  • benchmarks have reported space reductions of up to 10% using a new stored field compression

  • search.max_buckets now defaults to 10,000

  • Cluster-wide shard soft limit

    • Clusters now have soft limits on the total number of open shards in the cluster based on the number of nodes and the cluster.max_shards_per_node cluster setting

    • we may need to reduce the number of shards in our index settings if we are running a single-node cluster

  • : is no longer allowed in index name

    • Due to cross-cluster search using : to separate a cluster and index name, index names may no longer contain :

  • Deprecated geo_shape parameters

    • The following type parameters are deprecated for the geo_shape field type: tree, precision, tree_levels, distance_error_pct, points_only, and strategy. They will be removed in a future version.

  • Point-in-time (PIT) queries:

    • lightweight deep pagination

    • Elasticsearch is moving away from the scroll API

      • no more dealing with 500 active scroll_id's at a time

    • a PIT query gives a view into the state of the data as of the moment the PIT was opened

      • refreshes will not affect the data

    • example requests:

        # must be opened explicitly
        POST /my-index-000001/_pit?keep_alive=1m

        # executing the query
        POST /_search
        {
          "size": 100,
          "query": {
            "match": { "title": "elasticsearch" }
          },
          "pit": {
            "id": "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
            "keep_alive": "1m"
          }
        }

        # deleting the PIT query
        DELETE /_pit
        {
          "id": "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
        }

  • PIT is Elasticsearch's planned approach to deep pagination, so we should change the query method in hysds_common's ElasticsearchUtility class

    • if version < 7.10, use the scroll API; if version >= 7.10, use the point-in-time API together with search_after

    • this keeps HySDS backwards compatible with different versions of ES
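
The PIT flow described above (open a PIT, page through results with search_after, then delete the PIT) could be sketched in Python roughly as follows. This is a minimal sketch, not the hysds_common implementation: the function name iterate_with_pit is hypothetical, and it assumes `es` behaves like an elasticsearch-py 7.10+ client exposing open_point_in_time, search, and close_point_in_time. The caller must supply a sort with a unique tiebreaker field, since search_after needs a deterministic order.

```python
from typing import Any, Dict, Iterator, List, Optional


def iterate_with_pit(
    es: Any,
    index: str,
    query: Dict[str, Any],
    sort: List[Dict[str, Any]],
    page_size: int = 100,
    keep_alive: str = "1m",
) -> Iterator[Dict[str, Any]]:
    """Yield every hit for `query`, paging with a PIT plus search_after.

    Hypothetical helper; `es` is assumed to look like an
    elasticsearch-py 7.10+ client.
    """
    # The PIT must be opened explicitly before searching.
    pit_id = es.open_point_in_time(index=index, keep_alive=keep_alive)["id"]
    search_after: Optional[List[Any]] = None
    try:
        while True:
            body: Dict[str, Any] = {
                "size": page_size,
                "query": query,
                "sort": sort,
                # Each request extends the PIT's lifetime by keep_alive.
                "pit": {"id": pit_id, "keep_alive": keep_alive},
            }
            if search_after is not None:
                body["search_after"] = search_after

            # No index is passed here: the PIT already identifies it.
            resp = es.search(body=body)

            # The response may carry an updated PIT id; prefer it if present.
            pit_id = resp.get("pit_id", pit_id)

            hits = resp["hits"]["hits"]
            if not hits:
                break
            yield from hits

            # Resume after the sort values of the last hit on this page.
            search_after = hits[-1]["sort"]
    finally:
        # Always release the PIT, even if the caller stops iterating early.
        es.close_point_in_time(body={"id": pit_id})
```

Writing this as a generator keeps memory use flat for large result sets, and the try/finally ensures the PIT is deleted rather than left to expire.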

Level-of-effort:

  • (MEDIUM) creating "snapshot" of ec2 instance with elasticsearch 7.10 + geonames index

    • we have a snapshot of geonames saved in s3, so restoring it will be fast and straightforward

    • no need to populate the geonames index manually, which can take 2+ days

    • coordinate with Susan to get this done

  • (MEDIUM, OPTIONAL) changes to hysds_common's query method:

    • if version < 7.10, use scroll API

    • if version >= 7.10, use point-in-time API
      • versions can be compared with the packaging library:

          from packaging import version
          version.parse("7.9.3") >= version.parse("7.10")  # False

  • (LOW) sanity testing NISAR & SWOT with Elasticsearch 7.10

    • NISAR: dev-e2e forward & reprocessing, real-pge, create & restore snapshot

    • SWOT: need to consult with @Michael Cayanan

  • (LOW) fix NISAR e2e test with AWS Elasticsearch
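
For the optional hysds_common change listed above, choosing between the scroll and point-in-time APIs might look something like the sketch below. The names parse_es_version and pagination_api are hypothetical. To avoid a third-party dependency, the sketch compares plain integer tuples instead of using packaging.version; packaging would work equally well. On a live cluster the version string could be read from es.info()["version"]["number"].

```python
from typing import Tuple

# First Elasticsearch release with the point-in-time API.
PIT_MIN_VERSION: Tuple[int, ...] = (7, 10)


def parse_es_version(number: str) -> Tuple[int, ...]:
    """Turn a version string such as "7.9.3" into a tuple of ints.

    Any suffix after "-" (for example "-SNAPSHOT") is ignored.
    """
    parts = []
    for piece in number.split("-")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def pagination_api(es_version: str) -> str:
    """Return "pit" for Elasticsearch >= 7.10, otherwise "scroll"."""
    # Tuples compare component by component, so (7, 9, 3) < (7, 10),
    # which avoids the string-comparison trap where "7.9" > "7.10".
    if parse_es_version(es_version) >= PIT_MIN_VERSION:
        return "pit"
    return "scroll"


if __name__ == "__main__":
    for v in ("6.8.23", "7.9.3", "7.10.0", "8.1.0-SNAPSHOT"):
        print(v, "->", pagination_api(v))
```

Keeping the decision in one small function means the query method only needs a single branch, and older clusters continue to use the existing scroll code path.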

 

Note: JPL employees can also get answers to HySDS questions at Stack Overflow Enterprise: