Elasticsearch Upgrade: 1.7 to 7.1
Version: 1.7 → 7.1
Relevant Github Repos/Branches:
- Removal of Tosca/Figaro user interfaces in favor of hysds_ui
Big Changes
PLEASE LOOK AT AND USE THE NEW ELASTICSEARCH UTILITY CLASS: (SOURCE CODE)
- Only 1 type allowed in each index: _doc
- Need to manually enable all-field text searches
- Removal of filtered since ES 5.0
- Split string into text and keyword (see the mapping sketch below)
  - text allows for more searching capabilities (documentation)
  - keyword allows for aggregation, etc. (Documentation)
- fielddata: true in the mapping allows for sorting, but we'll sort on the keyword sub-field instead (Documentation)
- Support for a z coordinate in geo_shapes (documentation); it won't affect searches but adds more flexibility in location data
- _default_ mapping deprecated in ES 6.0.0 (Link); the workaround is using index templates (Documentation)
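A minimal sketch of the string → text/keyword split using the official elasticsearch-py client; the host, index name, and field name are assumptions for illustration:

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # assumed local node

# An old 1.7 "string" field is now mapped explicitly as text (full-text
# search) with a keyword sub-field (sorting/aggregations), so fielddata
# never needs to be enabled on the text field itself.
es.indices.create(
    index="example_index",  # hypothetical index name
    body={
        "mappings": {
            "properties": {
                "dataset": {
                    "type": "text",
                    "fields": {
                        "raw": {"type": "keyword", "ignore_above": 256}
                    }
                }
            }
        }
    }
)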
Changes in the geo coordinates query
Note: {"type": "geo_shape","tree": "quadtree","tree_levels": 26} makes uploading documents slow, specifically "tree_levels”: 2
{ "query": { "bool": { "filter": { "geo_shape": { "location": { "shape": { "type": "polygon", "coordinates": [[<coordinates>]] }, "relation": "within" } } } } } }
Changes in Percolator
- Removal of the .percolator type (Documentation); instead, a percolator field type must be configured prior to indexing percolator queries
- Complete overhaul of the percolate index mapping
- Removal of _all: { "enabled": true } in indices, so we can no longer search across all fields
  - the workaround is adding copy_to in the field mapping, especially in dynamic templating (see the sketch below)
  - copy_to does not work with multi-fields:

"random_field_name": {
  "type": "keyword",
  "ignore_above": 256,
  "copy_to": "all_text_fields",  # DOES WORK
  "fields": {
    "keyword": {
      "type": "text",
      "copy_to": "all_text_fields"  # DOES NOT WORK
    }
  }
}
Proper mapping with text fields
"random_field_name": { "type": "text", "copy_to": "all_text_fields" "fields": { "keyword": { # WE USE 'raw' instead of 'keyword' in our own indices "type": "keyword" # THIS IS NEEDED FOR AGGREGATION ON THE FACETS FOR THE UI "ignore_above": 256 } } }
- Need to add the copy_to destination field to the mapping:

"all_text_fields": { "type": "text" }
General changes to the mapping
- created an example mapping called grq_v1.1_s1-iw_slc
- copied example data into the new ES index, using the built-in dynamic mapping to build the initial mapping
- mapping changes:
  - metadata.context to {"type": "object", "enabled": false}
  - properties.location to {"type": "geo_shape", "tree": "quadtree"}
  - use type keyword to be able to use msearch; otherwise ES returns:
    "reason": "Fielddata is disabled on text fields by default. ... Alternatively use a keyword field instead."
Changes to query_string
- removal of escaping literal double quotes in query_string
- the old query_string from 1.7 would return S1B_IW_SLC__1SDV_20170812T010949_20170812T011016_006900_00C25E_B16D:
{ "query": { "query_string": { "query": "\"__1SDV_20170812T010949_20170812T011016_006900_00C25E_B16\"", "default_operator": "OR" } } }
- the new query_string returns the equivalent document but requires a wildcard * at the beginning and end of the string:
{ "query": { "query_string": { "default_field": "all_text_fields", "query": "*__1SDV_20170812T010949_20170812T011016_006900_00C25E_B16*", "default_operator": "OR" } } }
- date searches don't appear to have changed much:
{ "query": { "query_string": { "query": "starttime: [2019-01-01 TO 2019-01-31]", "default_operator": "OR" } } }
- can combine different fields as well
{ "query": { "query_string": { "fields": ["all_text_fields", "all_date_fields"], "query": "[2019-01-01 TO 2019-01-31] AND *__1SDV_20190109T020750_20190109T020817_014411*", "default_operator": "OR" } } }
Removal of search_type=scan
- https://www.elastic.co/guide/en/elasticsearch/reference/5.5/breaking_50_search_changes.html
- NOTE: must clear the _scroll_id after using the scroll API to pull data
- Will return an error if _scroll_ids are not cleared:
  "query_phase_execution_exception", "reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [11000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
- Requires changes in our HySDS code wherever it uses search_type=scan (see the scroll sketch after the curl example):
curl -X POST "http://localhost:9200/hysds_ios/_search?search_type=scan&scroll=10m&size=100"
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "No search type for [scan]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "No search type for [scan]"
  },
  "status": 400
}

# removing search_type=scan from the endpoint fixes this problem
curl -X POST "http://100.64.134.55:9200/user_rules/_search?scroll=10m&size=100"
{
  "_scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAAEWMUpVeFNzVXpTVktlUzFPc0NKa1dndw==",
  "took": 34,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 0, "relation": "eq" },
    "max_score": null,
    "hits": []
  }
}
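A sketch of the replacement pattern in Python: page with the scroll API and clear the scroll context afterwards (host, index, and page size are assumptions for illustration):

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # assumed local node

# First page: scroll=... replaces search_type=scan
resp = es.search(index="hysds_ios",
                 body={"query": {"match_all": {}}},
                 scroll="10m", size=100)
scroll_id = resp["_scroll_id"]
docs = resp["hits"]["hits"]

# Subsequent pages until the hits run out
while True:
    resp = es.scroll(scroll_id=scroll_id, scroll="10m")
    hits = resp["hits"]["hits"]
    if not hits:
        break
    docs.extend(hits)
    scroll_id = resp["_scroll_id"]

# Clear the scroll context when done, per the note above
es.clear_scroll(scroll_id=scroll_id)

Note that elasticsearch.helpers.scan wraps this loop and clears the scroll id for you by default.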
Removal of filtered: Link and Guide
- deprecated in version 5.x; move all logic into query and bool
- and, or, not changed to must, should, and must_not
- if using should, you will need to add minimum_should_match: 1

# from this:
{
  "filtered": {
    "filter": {
      "and": [
        { "match": { "tags": "ISL" } },
        { "range": { "metadata.ProductReceivedTime": { "gte": "2020-03-24T00:00:00.000000Z" } } },
        { "range": { "metadata.ProductReceivedTime": { "lte": "2020-03-24T23:59:59.999999Z" } } }
      ]
    }
  }
}

# change to this:
{
  "query": {
    "bool": {
      "must": [
        { "match": { "tags": "ISL" } }
      ],
      "filter": [
        { "range": { "metadata.ProductReceivedTime": { "gte": "2020-03-24T00:00:00.000000Z" } } },
        { "range": { "metadata.ProductReceivedTime": { "lte": "2020-03-24T23:59:59.999999Z" } } }
      ]
    }
  }
}
Changes to Logstash
- Mozart streams data to Elasticsearch through Logstash

Changes to Logstash 7.1
- time longs are read as int instead of as a date (epoch_millis)
  - need to convert and split the string, and remove the decimal places
- removal of flush_size (was originally set to 1)
- will need to set settings in logstash.yml (Link to forum)
- https://github.com/hysds/hysds/blob/develop-es7/configs/logstash/indexer.conf.mozart
input {
  redis {
    host => "{{ MOZART_REDIS_PVT_IP }}"
    {% if MOZART_REDIS_PASSWORD != "" %}password => "{{ MOZART_REDIS_PASSWORD }}"{% endif %}
    # these settings should match the output of the agent
    data_type => "list"
    key => "logstash"
    # We use the 'msgpack' codec here because we expect to read
    # msgpack events from redis.
    codec => msgpack
  }
}

filter {
  if [resource] in ["worker", "task"] {
    mutate {
      convert => {
        "[event][timestamp]" => "string"
        "[event][local_received]" => "string"
      }
      split => ["[event][timestamp]", "."]
      split => ["[event][local_received]", "."]
      add_field => [ "[event][timestamp_new]", "%{[event][timestamp][0]}" ]
      add_field => [ "[event][local_received_new]", "%{[event][local_received][0]}" ]
      remove_field => ["[event][timestamp]", "[event][local_received]"]
    }
    mutate {
      rename => { "[event][timestamp_new]" => "timestamp" }
      rename => { "[event][local_received_new]" => "local_received" }
    }
  }
}

output {
  #stdout { codec => rubydebug }
  if [resource] == "job" {
    elasticsearch {
      hosts => ["{{ MOZART_ES_PVT_IP }}:9200"]
      index => "job_status-current"
      document_id => "%{payload_id}"
      template => "{{ OPS_HOME }}/mozart/etc/job_status.template"
      template_name => "job_status"
    }
  } else if [resource] == "worker" {
    elasticsearch {
      hosts => ["{{ MOZART_ES_PVT_IP }}:9200"]
      index => "worker_status-current"
      document_id => "%{celery_hostname}"
      template => "{{ OPS_HOME }}/mozart/etc/worker_status.template"
      template_name => "worker_status"
    }
  } else if [resource] == "task" {
    elasticsearch {
      hosts => ["{{ MOZART_ES_PVT_IP }}:9200"]
      index => "task_status-current"
      document_id => "%{uuid}"
      template => "{{ OPS_HOME }}/mozart/etc/task_status.template"
      template_name => "task_status"
    }
  } else if [resource] == "event" {
    elasticsearch {
      hosts => ["{{ MOZART_ES_PVT_IP }}:9200"]
      index => "event_status-current"
      document_id => "%{uuid}"
      template => "{{ OPS_HOME }}/mozart/etc/event_status.template"
      template_name => "event_status"
    }
  } else {}
}
Running Elasticsearch 7 on EC2 instance
To properly expose Elasticsearch on 0.0.0.0, we need to edit the config/elasticsearch.yml file:
network.host: 0.0.0.0
cluster.name: grq_cluster
node.name: ESNODE_CYR
node.master: true
node.data: true
transport.host: localhost
transport.tcp.port: 9300
http.port: 9200
discovery.zen.minimum_master_nodes: 2

# allows UI to talk to elasticsearch (in production we would put the actual hostname of the UI)
http.cors.enabled: true
http.cors.allow-origin: "*"
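A quick reachability check once Elasticsearch is bound to 0.0.0.0 (the hostname is a placeholder for the EC2 instance's address):

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://<ec2-host>:9200"])  # placeholder host
print(es.ping())             # True if the node is reachable
print(es.cluster.health())   # should report cluster_name "grq_cluster"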
Running Kibana on EC2 instance
Install Kibana from the command line:
curl -O https://artifacts.elastic.co/downloads/kibana/kibana-7.1.1-darwin-x86_64.tar.gz
tar -xzf kibana-7.1.1-darwin-x86_64.tar.gz
cd kibana-7.1.1-darwin-x86_64/
Edit the config/kibana.yml file to expose host 0.0.0.0
server.host: 0.0.0.0
Index Template
Used so that every index created automatically follows this template for its mapping.
- grq2 _default_ mapping template (Link)
- python code to create the index template (Link)
- Documentation
{ "order": 0, "index_patterns": [ "{{ prefix }}_*" ], "settings": { "index.refresh_interval": "5s", "analysis": { "analyzer": { "default": { "filter": [ "standard", "lowercase", "word_delimiter" ], "tokenizer": "keyword" } } } }, "mappings": { "dynamic_templates": [ { "integers": { "match_mapping_type": "long", "mapping": { "type": "integer" } } }, { "strings": { "match_mapping_type": "string", "mapping": { "norms": false, "type": "text", "copy_to": "all_text_fields", "fields": { "raw": { "type": "keyword", "ignore_above": 256 } } }, "match": "*" } } ], "properties": { "browse_urls": { "type": "text", "copy_to": "all_text_fields" }, "urls": { "type": "text", "copy_to": "all_text_fields" }, "location": { "tree": "quadtree", "type": "geo_shape" }, "center": { "tree": "quadtree", "type": "geo_shape" }, "starttime": { "type": "date" }, "endtime": { "type": "date" }, "creation_timestamp": { "type": "date" }, "metadata": { "properties": { "context": { "type": "object", "enabled": false } } }, "prov": { "properties": { "wasDerivedFrom": { "type": "keyword" }, "wasGeneratedBy": { "type": "keyword" } } }, "all_text_fields": { "type": "text" } } }, "aliases": { "{{ alias }}": {} } }
Percolator
The percolator needs to be compatible with ES 7.1 (not applicable because HySDS uses its own version of the percolator).
User Rules (Documentation for user rules triggering)
Mapping added on the Mozart server: /home/ops/mozart/ops/tosca/configs/user_rules_dataset.mapping
- python code to create the user_rules index (Link)
- Mapping template for the user_rules index (Link):

# PUT user_rules
{
  "mappings": {
    "properties": {
      "creation_time": { "type": "date" },
      "enabled": { "type": "boolean", "null_value": true },
      "job_type": { "type": "keyword" },
      "kwargs": { "type": "keyword" },
      "modification_time": { "type": "date" },
      "modified_time": { "type": "date" },
      "passthru_query": { "type": "boolean" },
      "priority": { "type": "long" },
      "query": { "type": "object", "enabled": false },
      "query_all": { "type": "boolean" },
      "query_string": { "type": "text" },
      "queue": { "type": "text" },
      "rule_name": { "type": "keyword" },
      "username": { "type": "keyword" },
      "workflow": { "type": "keyword" }
    }
  }
}
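A sketch of creating the user_rules index from this mapping with elasticsearch-py (host assumed; the mapping is abbreviated here, the full version is above):

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # assumed local node

user_rules_mapping = {
    "mappings": {
        "properties": {
            "creation_time": {"type": "date"},
            "enabled": {"type": "boolean", "null_value": True},
            "job_type": {"type": "keyword"},
            # ... remaining fields as in the full mapping above
            "query": {"type": "object", "enabled": False},
            "rule_name": {"type": "keyword"},
            "username": {"type": "keyword"}
        }
    }
}

if not es.indices.exists(index="user_rules"):
    es.indices.create(index="user_rules", body=user_rules_mapping)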
hysds_ios Index
Github Link to template.json: Link
- Python code to create the hysds_ios index template (Link)
- Follow the HySDS and Job-Spec documentation for the Jenkins build (Link)
{ "order": 0, "template": "{{ index }}", "settings": { "index.refresh_interval": "5s", "analysis": { "analyzer": { "default": { "filter": [ "standard", "lowercase", "word_delimiter" ], "tokenizer": "keyword" } } } }, "mappings": { "dynamic_templates": [ { "integers": { "match_mapping_type": "long", "mapping": { "type": "integer" } } }, { "strings": { "match_mapping_type": "string", "mapping": { "norms": false, "type": "text", "copy_to": "all_text_fields", "fields": { "raw": { "type": "keyword", "ignore_above": 256 } } }, "match": "*" } } ], "properties": { "_timestamp": { "type": "date", "store": true } } } }