Useful links:

...

  • Benchmarks have reported space reductions of up to 10% with the new stored-field compression

  • search.max_buckets now defaults to 10,000

  • Cluster-wide shard soft limit

    • Clusters now have soft limits on the total number of open shards in the cluster based on the number of nodes and the cluster.max_shards_per_node cluster setting

    • on a single-node cluster we may need to reduce the number of shards in our index settings to stay under this limit

  • : is no longer allowed in index names

    • Due to cross-cluster search using : to separate a cluster and index name, index names may no longer contain :

  • Deprecated geo_shape parameters

    • The following type parameters are deprecated for the geo_shape field type: tree, precision, tree_levels, distance_error_pct, points_only, and strategy. They will be removed in a future version.

  • Point-in-time (PIT) queries:

    • lightweight deep pagination

    • Elasticsearch is moving away from the scroll API

      • no more dealing with 500 active scroll_ids at a time

    • a PIT query gives a view of the data as it was when the PIT was opened

      • refreshes will not affect the data

    • Example requests:
      # a PIT must be opened explicitly; the response contains its id
      POST /my-index-000001/_pit?keep_alive=1m
      
      # executing the query
      POST /_search 
      {
          "size": 100,
          "query": {
              "match" : {
                  "title" : "elasticsearch"
              }
          },
          "pit": {
              "id":  "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==", 
              "keep_alive": "1m"  
          }
      }
      
      # deleting the PIT once it is no longer needed
      DELETE /_pit
      {
          "id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
      }
  • PIT with search_after is Elasticsearch's planned direction for deep pagination, so we should change the query method in hysds_common's ElasticsearchUtility class
    - if the ES version is < 7.10, use the scroll API; if it is >= 7.10, use the search_after API with a PIT
    - this keeps HySDS backwards compatible with different versions of ES
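
The version switch described above could be sketched roughly as follows. This is a minimal illustration, not the actual hysds_common code: the names parse_version, choose_pagination, and build_pit_page_request are hypothetical, and the sort fields are whatever the caller supplies. One detail worth noting is that versions are compared as integer tuples, since a plain string comparison would rank "7.9" above "7.10".

```python
import re

# Assumption for this sketch: 7.10 is the cutoff named in the notes above.
PIT_MIN_VERSION = (7, 10)


def parse_version(version):
    """Return (major, minor) as integers from a string such as '7.10.2'."""
    match = re.match(r"^(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Elasticsearch version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def choose_pagination(version):
    """Pick 'search_after' (with a PIT) on 7.10+, otherwise 'scroll'."""
    if parse_version(version) >= PIT_MIN_VERSION:
        return "search_after"
    return "scroll"


def build_pit_page_request(query, pit_id, sort, search_after=None,
                           size=100, keep_alive="1m"):
    """Build the body for one page of a PIT + search_after query.

    The body is meant for POST /_search with no index in the path,
    because the PIT already identifies the target index.
    """
    body = {
        "size": size,
        "query": query,
        "sort": sort,
        "pit": {"id": pit_id, "keep_alive": keep_alive},
    }
    if search_after is not None:
        # Sort values of the last hit on the previous page.
        body["search_after"] = search_after
    return body
```

In practice the version string would presumably come from the cluster's root endpoint (GET / returns it under version.number). Each subsequent page passes the sort values of the previous page's last hit as search_after, and the PIT is deleted once the last page has been read.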

Level-of-effort:

  • (MEDIUM) creating "snapshot" of ec2 instance with elasticsearch 7.10 + geonames index

    • we have a snapshot of geonames saved in s3, so restoring it will be fast and straightforward

    • no need to populate the geonames index manually, which can take 2+ days

    • coordinate with Susan to get this done

  • (MEDIUM, OPTIONAL) changes to hysds_common's query method:

  • (LOW) sanity testing NISAR & SWOT with Elasticsearch 7.10

    • NISAR: dev-e2e forward & reprocessing, real-pge, create & restore snapshot

    • SWOT: need to consult Michael Cayanan

  • (LOW) fix NISAR e2e test with AWS Elasticsearch