Elasticsearch upgrade: 7.9 to 7.10
Useful links:
https://www.elastic.co/blog/whats-new-elasticsearch-7-10-0-searchable-snapshots-store-more-for-less
https://www.elastic.co/guide/en/elasticsearch/reference/7.10/breaking-changes-7.0.html
https://docs.aws.amazon.com/opensearch-service/latest/developerguide/what-is.html
Changes in Elasticsearch 7.10
benchmarks have reported space reductions of up to 10% using the new stored field compression
search.max_buckets now defaults to 10,000
- this will severely limit aggregation queries in PCM code
- we will have to switch to composite aggregations beforehand (NSDS-2245)
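A minimal sketch of what the switch to composite aggregations could look like, assuming we page through the terms of a single field. The aggregation name by_value, the source name value, and the injected search callable are placeholders for illustration, not existing PCM or hysds_common code.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def composite_terms_body(field: str, page_size: int = 1000,
                         after_key: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a search body that pages through the terms of `field`."""
    composite: Dict[str, Any] = {
        "size": page_size,  # buckets per page, kept well under search.max_buckets
        "sources": [{"value": {"terms": {"field": field}}}],
    }
    if after_key is not None:
        composite["after"] = after_key  # resume after the last bucket of the previous page
    return {"size": 0, "aggs": {"by_value": {"composite": composite}}}


def iter_buckets(search: Callable[[Dict[str, Any]], Dict[str, Any]],
                 field: str, page_size: int = 1000) -> Iterator[Dict[str, Any]]:
    """Yield every bucket, following after_key until the pages run out.

    `search` is any callable that sends a request body to Elasticsearch and
    returns the parsed JSON response (hypothetical; injected so the loop can
    be exercised without a cluster).
    """
    after_key = None
    while True:
        resp = search(composite_terms_body(field, page_size, after_key))
        agg = resp["aggregations"]["by_value"]
        yield from agg["buckets"]
        after_key = agg.get("after_key")
        if not agg["buckets"] or after_key is None:
            break
```

Each request returns at most page_size buckets, so the total number of buckets no longer has to fit under search.max_buckets in a single response.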
Cluster-wide shard soft limit
- Clusters now have soft limits on the total number of open shards in the cluster, based on the number of nodes and the cluster.max_shards_per_node cluster setting
- we may need to reduce the number of shards in our index settings in case we are using a single-node cluster
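The limit is simple arithmetic, sketched below. The default of 1000 for cluster.max_shards_per_node is the documented Elasticsearch default; the index counts in the usage note are made-up numbers, not measurements from our clusters.

```python
def max_open_shards(data_nodes: int, max_shards_per_node: int = 1000) -> int:
    """Cluster-wide soft limit: data nodes times cluster.max_shards_per_node."""
    return data_nodes * max_shards_per_node


def shards_needed(indices: int, primaries: int, replicas: int) -> int:
    """Shards requested by `indices` indices with the given per-index settings."""
    return indices * primaries * (1 + replicas)


def fits(data_nodes: int, indices: int, primaries: int, replicas: int) -> bool:
    """True if the requested shards stay within the cluster-wide limit."""
    return shards_needed(indices, primaries, replicas) <= max_open_shards(data_nodes)
```

For example, on a single node, 300 indices with 5 primaries and 1 replica would ask for 3000 shards against a limit of 1000, while the same indices with 1 primary and 0 replicas would ask for 300.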
: is no longer allowed in index names
- Because cross-cluster search uses : to separate a cluster name from an index name, index names may no longer contain :
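A quick pre-upgrade check could be as small as the sketch below. It assumes we already have the list of index names (for example from the cat indices API); the names in the usage note are invented for illustration.

```python
from typing import Iterable, List


def names_with_colon(index_names: Iterable[str]) -> List[str]:
    """Return the index names that contain ':' and would need renaming."""
    return [name for name in index_names if ":" in name]
```

Calling names_with_colon(["grq_v1", "grq:v1"]) would flag only the second name.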
Deprecated geo_shape parameters
- The following type parameters are deprecated for the geo_shape field type: tree, precision, tree_levels, distance_error_pct, points_only, and strategy. They will be removed in a future version.
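If any of our mappings still set these parameters, a helper along these lines could strip them before the mapping is used to create an index. This is only a sketch that operates on a plain mapping "properties" dict; the function name is hypothetical. Dropping the parameters changes how shapes are indexed, so it presumably only makes sense for new or reindexed indices.

```python
from typing import Any, Dict

# Deprecated geo_shape type parameters listed in the breaking changes.
DEPRECATED_GEO_SHAPE_PARAMS = {
    "tree", "precision", "tree_levels",
    "distance_error_pct", "points_only", "strategy",
}


def strip_deprecated_geo_shape(properties: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a mapping's `properties` without deprecated geo_shape params."""
    cleaned: Dict[str, Any] = {}
    for name, field in properties.items():
        field = dict(field)  # shallow copy so the input mapping is left untouched
        if field.get("type") == "geo_shape":
            field = {k: v for k, v in field.items()
                     if k not in DEPRECATED_GEO_SHAPE_PARAMS}
        if "properties" in field:  # recurse into object fields
            field["properties"] = strip_deprecated_geo_shape(field["properties"])
        cleaned[name] = field
    return cleaned
```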
Point-in-time (PIT) queries:
- lightweight deep pagination
- Elasticsearch is moving away from the scroll API
- no more dealing with 500 active scroll_ids at a time
- a PIT query gives a view into the state of the data as of when the PIT was opened
- refreshes will not affect the data
# must be opened explicitly
POST /my-index-000001/_pit?keep_alive=1m

# executing the query
POST /_search
{
  "size": 100,
  "query": {
    "match": { "title": "elasticsearch" }
  },
  "pit": {
    "id": "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
    "keep_alive": "1m"
  }
}

# deleting the PIT
DELETE /_pit
{
  "id": "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
this is Elasticsearch's future plan for deep pagination, so we should change the query method in hysds_common's ElasticsearchUtility class
- if version < 7.10, use the scroll API; if version >= 7.10, use the search_after API with a PIT
- this will allow for backwards compatibility between HySDS and different versions of ES
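A rough sketch of how the version switch and PIT pagination could fit together. The function names and the injected search callable are hypothetical and are not the current ElasticsearchUtility interface; the version parser only handles plain dotted versions such as 7.10.2. Opening and deleting the PIT is left to the caller.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '7.10.2' into (7, 10, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def supports_pit(es_version: str) -> bool:
    """True for Elasticsearch 7.10 and later, where the PIT API is available."""
    return parse_version(es_version) >= (7, 10)


def pit_pages(search: Callable[[Dict[str, Any]], Dict[str, Any]],
              pit_id: str, query: Dict[str, Any], sort: List[Any],
              size: int = 100, keep_alive: str = "1m") -> Iterator[List[Dict[str, Any]]]:
    """Yield pages of hits using search_after against an already opened PIT.

    `search` sends a body to POST /_search and returns the parsed response
    (hypothetical; injected so the loop can run without a cluster).
    """
    search_after = None
    while True:
        body: Dict[str, Any] = {
            "size": size,
            "query": query,
            "sort": sort,
            "pit": {"id": pit_id, "keep_alive": keep_alive},
        }
        if search_after is not None:
            body["search_after"] = search_after
        resp = search(body)
        hits = resp["hits"]["hits"]
        if not hits:
            return
        yield hits
        # Use the newest PIT id if the response carries one, then continue
        # from the sort values of the last hit on this page.
        pit_id = resp.get("pit_id", pit_id)
        search_after = hits[-1]["sort"]
```

The query method could then call supports_pit with the cluster's version and fall back to the existing scroll code path when it returns False.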
Level-of-effort:
(MEDIUM) creating a "snapshot" of an EC2 instance with Elasticsearch 7.10 + geonames index
- we have a snapshot of geonames saved in S3, so restoring it will be fast and straightforward
- no need to populate the geonames index manually, which can take 2+ days
- coordinate with Susan to get this done
(MEDIUM, OPTIONAL) changes to hysds_common's query method:
- if version < 7.10, use scroll API
- if version >= 7.10, use point-in-time API
- version comparison: https://packaging.pypa.io/en/latest/version.html#packaging.version.parse
  from packaging import version
  version.parse("7.9.3") >= version.parse("7.10")  # False
(LOW) sanity testing NISAR & SWOT with Elasticsearch 7.10
- NISAR: dev-e2e forward & reprocessing, real-pge, create & restore snapshot
- SWOT: need to consult with @Michael Cayanan
(LOW) fix NISAR e2e test with AWS Elasticsearch