...
...
...
...
...
...
Version: 1.7 → 7.1
Relevant Github Repos/Branches:
- Removal of Tosca/Figaro user interfaces in favor of hysds_ui
Big Changes
PLEASE LOOK AT AND USE THE NEW ELASTICSEARCH UTILITY CLASS: (SOURCE CODE)
- Only 1 type allowed in each index: _doc
- Need to manually enable all field text searches
- Removal of filtered since ES 5.0
- Split string into text and keyword
  - text allows for more searching capabilities (documentation)
  - keyword allows for aggregation, etc. (Documentation)
  - fielddata: true in the mapping allows for sorting (but we'll sort on the keyword instead): Documentation
- Support for z coordinate in geoshapes: documentation
  - it won't affect searches but adds more flexibility in location data
- _default_ mapping deprecated in ES 6.0.0 (Link)
  - workaround is using index templates: (Documentation)
Changes in the geo coordinates query
Note: {"type": "geo_shape","tree": "quadtree","tree_levels": 26} makes uploading documents slow, specifically the "tree_levels" setting
Code Block
{
  "query": {
    "bool": {
      "filter": {
        "geo_shape": {
          "location": {
            "shape": {
              "type": "polygon",
              "coordinates": [[<coordinates>]]
            },
            "relation": "within"
          }
        }
      }
    }
  }
}
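A minimal sketch of building the geo_shape filter above from a list of coordinates. The function name geo_shape_within and its parameters are hypothetical and not part of the HySDS code; the only check it adds is the GeoJSON requirement that a polygon ring is closed (first point equals last point).

```python
def geo_shape_within(ring, field="location", relation="within"):
    """Build a bool/filter geo_shape query for a single polygon ring.

    `ring` is a list of [lon, lat] pairs. The helper name and defaults are
    illustrative assumptions, not an existing HySDS function.
    """
    # A GeoJSON linear ring needs at least 4 positions and must be closed.
    if len(ring) < 4 or ring[0] != ring[-1]:
        raise ValueError("polygon ring must be closed and have at least 4 points")
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_shape": {
                        field: {
                            "shape": {"type": "polygon", "coordinates": [ring]},
                            "relation": relation,
                        }
                    }
                }
            }
        }
    }


square = [[0, 0], [0, 1], [1, 1], [1, 0], [0, 0]]
query = geo_shape_within(square)
```

The resulting dictionary can be serialized with json.dumps and sent as the search body.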
Changes in Percolator
- Removal of .percolator type (Documentation)
  - Instead a percolator field type must be configured prior to indexing percolator queries
- Complete overhaul in the percolate index mapping
Removal of the _all: { "enabled": true } setting in indices, so we can no longer search across all fields
Workaround is adding copy_to in the field mapping, especially in dynamic templating
- copy_to does not work inside multi-fields
Code Block
"random_field_name": {
  "type": "keyword",
  "ignore_above": 256,
  "copy_to": "all_text_fields",  # DOES WORK
  "fields": {
    "keyword": {
      "type": "text",
      "copy_to": "all_text_fields"  # DOES NOT WORK
    }
  }
}
Proper mapping with text fields
Code Block
"random_field_name": {
  "type": "text",
  "copy_to": "all_text_fields",
  "fields": {
    "keyword": {  # WE USE 'raw' instead of 'keyword' in our own indices
      "type": "keyword",  # THIS IS NEEDED FOR AGGREGATION ON THE FACETS FOR THE UI
      "ignore_above": 256
    }
  }
}
- Need to add the copy_to field mapping
Code Block
"all_text_fields": {
  "type": "text"
}
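The copy_to rules above can be sketched as a small mapping builder. The helper names text_field_mapping and build_properties are hypothetical; the sketch only assumes what the notes state: copy_to goes on the top-level text field, the keyword sub-field (named raw here, as in our own indices) carries no copy_to, and the copy_to target must itself be mapped as text.

```python
def text_field_mapping(copy_to="all_text_fields", raw_name="raw", ignore_above=256):
    """Mapping for one searchable text field with a keyword sub-field.

    Hypothetical helper: copy_to is set on the parent field only, because
    copy_to inside a multi-field ("fields") does not work.
    """
    return {
        "type": "text",
        "copy_to": copy_to,
        "fields": {raw_name: {"type": "keyword", "ignore_above": ignore_above}},
    }


def build_properties(field_names, copy_to="all_text_fields"):
    """Build a "properties" block, including the copy_to target field."""
    properties = {name: text_field_mapping(copy_to) for name in field_names}
    # The target of copy_to must be mapped explicitly as a text field.
    properties[copy_to] = {"type": "text"}
    return properties


properties = build_properties(["random_field_name"])
```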
General changes to the mapping
create example mapping called grq_v1.1_s1-iw_slc
copied example data into new ES index, using built in dynamic mapping to build initial mapping
mapping changes:
metadata.context to {"type": "object", "enabled": false}
properties.location to {"type": "geo_shape","tree": "quadtree"}
use type keyword to be able to use msearch:
Code Block "reason": "Fielddata is disabled on text fields by default. ... Alternatively use a keyword field instead."
Changes to query_string
- removal of escaping literal double quotes in query_string
- old query_string from 1.7, would return S1B_IW_SLC__1SDV_20170812T010949_20170812T011016_006900_00C25E_B16D
Code Block
{
  "query": {
    "query_string": {
      "query": "\"__1SDV_20170812T010949_20170812T011016_006900_00C25E_B16\"",
      "default_operator": "OR"
    }
  }
}
- new query_string returns equivalent document, requires wildcard * at the beginning and end of string
Code Block
{
  "query": {
    "query_string": {
      "default_field": "all_text_fields",
      "query": "*__1SDV_20170812T010949_20170812T011016_006900_00C25E_B16*",
      "default_operator": "OR"
    }
  }
}
- date searches do not appear to have changed much
Code Block
{
  "query": {
    "query_string": {
      "query": "starttime: [2019-01-01 TO 2019-01-31]",
      "default_operator": "OR"
    }
  }
}
- can combine different fields as well
Code Block
{
  "query": {
    "query_string": {
      "fields": ["all_text_fields", "all_date_fields"],
      "query": "[2019-01-01 TO 2019-01-31] AND *__1SDV_20190109T020750_20190109T020817_014411*",
      "default_operator": "OR"
    }
  }
}
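A minimal sketch of generating the new-style query_string from a partial ID, wrapping it in leading and trailing wildcards as described above. The function name wildcard_query_string is hypothetical; the default field all_text_fields is the copy_to target used in these notes.

```python
def wildcard_query_string(fragment, default_field="all_text_fields"):
    """Build a query_string query that matches `fragment` anywhere in the field.

    Hypothetical helper: existing leading/trailing '*' are stripped first so
    the fragment is never double-wrapped.
    """
    core = fragment.strip("*")
    return {
        "query": {
            "query_string": {
                "default_field": default_field,
                "query": "*" + core + "*",
                "default_operator": "OR",
            }
        }
    }


query = wildcard_query_string("__1SDV_20170812T010949_20170812T011016_006900_00C25E_B16")
```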
Removal of search_type=scan
- https://www.elastic.co/guide/en/elasticsearch/reference/5.5/breaking_50_search_changes.html
- NOTE: must clear _scroll_id after using the scroll API to pull data
  - Will return an error if _scroll_id is not cleared
Code Block
query_phase_execution_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [11000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.
Requires changes in our HySDS code, wherever it uses search_type=scan
Code Block
curl -X POST "http://localhost:9200/hysds_ios/_search?search_type=scan&scroll=10m&size=100"
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "No search type for [scan]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "No search type for [scan]"
  },
  "status": 400
}

# removing search_type=scan from the endpoint fixes this problem
curl -X POST "http://100.64.134.55:9200/user_rules/_search?scroll=10m&size=100"
{
  "_scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAAEWMUpVeFNzVXpTVktlUzFPc0NKa1dndw==",
  "took": 34,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  }
}
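A sketch of how code that used search_type=scan might be rewritten around the scroll API while always clearing the _scroll_id. It is written against any client object exposing search, scroll and clear_scroll methods in the style of the elasticsearch-py 7.x client; that interface and the function name scroll_all are assumptions here, not the actual HySDS utility class.

```python
def scroll_all(client, index, query, scroll="10m", size=100):
    """Collect every hit for `query` and clear the scroll context afterwards.

    `client` is assumed to offer search/scroll/clear_scroll methods shaped like
    the elasticsearch-py 7.x client. Hypothetical helper for illustration.
    """
    hits = []
    scroll_ids = set()
    try:
        page = client.search(index=index, body=query, scroll=scroll, size=size)
        while True:
            scroll_id = page.get("_scroll_id")
            if scroll_id:
                scroll_ids.add(scroll_id)
            batch = page["hits"]["hits"]
            # An empty page means the scroll is exhausted.
            if not batch or not scroll_id:
                hits.extend(batch)
                break
            hits.extend(batch)
            page = client.scroll(scroll_id=scroll_id, scroll=scroll)
    finally:
        # Clear every scroll id we saw, even if a request failed midway.
        for sid in scroll_ids:
            client.clear_scroll(scroll_id=sid)
    return hits
```

Putting the cleanup in a finally block means the scroll context is released even when a page request raises.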
Removal of filtered: Link and Guide
- removed in version 5.x (deprecated since 2.0), move all logic to query and bool
- and, or, not changed to must, should and must_not
  - if using should, will need to add minimum_should_match: 1
Code Block
# from this:
{
  "filtered": {
    "filter": {
      "and": [
        { "match": { "tags": "ISL" } },
        { "range": { "metadata.ProductReceivedTime": {"gte": "2020-03-24T00:00:00.000000Z"} } },
        { "range": { "metadata.ProductReceivedTime": {"lte": "2020-03-24T23:59:59.999999Z"} } }
      ]
    }
  }
}

# change to this:
{
  "query": {
    "bool": {
      "must": [
        { "match": { "tags": "ISL" } }
      ],
      "filter": [
        { "range": { "metadata.ProductReceivedTime": {"gte": "2020-03-24T00:00:00.000000Z"} } },
        { "range": { "metadata.ProductReceivedTime": {"lte": "2020-03-24T23:59:59.999999Z"} } }
      ]
    }
  }
}
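The and/or/not to must/should/must_not rewrite can be sketched as a small recursive converter. The function names are hypothetical, and the sketch assumes the legacy and/or filters are given as plain lists. Unlike the hand-written example above, it keeps everything in filter context by nesting a bool inside filter, so the same documents match but the match clause does not contribute to scoring.

```python
def convert_filter(node):
    """Recursively rewrite legacy and/or/not filters as bool clauses.

    Hypothetical helper; assumes "and"/"or" hold a plain list of filters.
    """
    if not isinstance(node, dict) or len(node) != 1:
        return node
    (key, value), = node.items()
    if key == "and":
        return {"bool": {"must": [convert_filter(f) for f in value]}}
    if key == "or":
        # should clauses need minimum_should_match: 1 to behave like "or".
        return {"bool": {"should": [convert_filter(f) for f in value],
                         "minimum_should_match": 1}}
    if key == "not":
        # Legacy "not" accepted either the filter itself or {"filter": {...}}.
        inner = value.get("filter", value) if isinstance(value, dict) else value
        return {"bool": {"must_not": [convert_filter(inner)]}}
    return node


def convert_filtered(legacy):
    """Turn {"filtered": {"query": ..., "filter": ...}} into a bool query."""
    filtered = legacy["filtered"]
    bool_query = {}
    if "query" in filtered:
        bool_query["must"] = [filtered["query"]]
    if "filter" in filtered:
        bool_query["filter"] = [convert_filter(filtered["filter"])]
    return {"query": {"bool": bool_query}}
```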
Changes to Logstash
- Mozart streams data to elasticsearch through Logstash
Changes to logstash 7.1
- long time values are read as int instead of date_epoch_millis
  - need to convert the value to a string, split it, and remove the decimal places
- removal of flush_size, was originally set to 1
  - will need to set settings in logstash.yml
  - Link to forum
- https://github.com/hysds/hysds/blob/develop-es7/configs/logstash/indexer.conf.mozart
Code Block
input {
  redis {
    host => "{{ MOZART_REDIS_PVT_IP }}"
    {% if MOZART_REDIS_PASSWORD != "" %}password => "{{ MOZART_REDIS_PASSWORD }}"{% endif %}
    # these settings should match the output of the agent
    data_type => "list"
    key => "logstash"

    # We use the 'msgpack' codec here because we expect to read
    # msgpack events from redis.
    codec => msgpack
  }
}

filter {
  if [resource] in ["worker", "task"] {
    mutate {
      convert => {
        "[event][timestamp]" => "string"
        "[event][local_received]" => "string"
      }
      split => ["[event][timestamp]", "."]
      split => ["[event][local_received]", "."]
      add_field => [ "[event][timestamp_new]" , "%{[event][timestamp][0]}" ]
      add_field => [ "[event][local_received_new]" , "%{[event][local_received][0]}" ]
      remove_field => ["[event][timestamp]", "[event][local_received]"]
    }
    mutate {
      rename => { "[event][timestamp_new]" => "timestamp" }
      rename => { "[event][local_received_new]" => "local_received" }
    }
  }
}

output {
  #stdout { codec => rubydebug }
  if [resource] == "job" {
    elasticsearch {
      hosts => ["{{ MOZART_ES_PVT_IP }}:9200"]
      index => "job_status-current"
      document_id => "%{payload_id}"
      template => "{{ OPS_HOME }}/mozart/etc/job_status.template"
      template_name => "job_status"
    }
  } else if [resource] == "worker" {
    elasticsearch {
      hosts => ["{{ MOZART_ES_PVT_IP }}:9200"]
      index => "worker_status-current"
      document_id => "%{celery_hostname}"
      template => "{{ OPS_HOME }}/mozart/etc/worker_status.template"
      template_name => "worker_status"
    }
  } else if [resource] == "task" {
    elasticsearch {
      hosts => ["{{ MOZART_ES_PVT_IP }}:9200"]
      index => "task_status-current"
      document_id => "%{uuid}"
      template => "{{ OPS_HOME }}/mozart/etc/task_status.template"
      template_name => "task_status"
    }
  } else if [resource] == "event" {
    elasticsearch {
      hosts => ["{{ MOZART_ES_PVT_IP }}:9200"]
      index => "event_status-current"
      document_id => "%{uuid}"
      template => "{{ OPS_HOME }}/mozart/etc/event_status.template"
      template_name => "event_status"
    }
  } else {}
}
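To make the filter stage easier to follow, here is a rough Python equivalent of what the two mutate blocks do to a worker or task document: convert the event timestamps to strings, keep only the part before the decimal point, and move the results to top-level timestamp and local_received fields. The function names are hypothetical and this is only an illustration of the transformation, not code that runs inside Logstash.

```python
def drop_fraction(value):
    """Return the part of an epoch timestamp before the decimal point, as a string."""
    return str(value).split(".")[0]


def flatten_event_times(doc):
    """Mimic the mutate filters for documents whose resource is worker or task.

    Hypothetical helper: other resources are returned unchanged.
    """
    if doc.get("resource") not in ("worker", "task"):
        return doc
    out = {k: v for k, v in doc.items() if k != "event"}
    event = dict(doc.get("event", {}))
    for field in ("timestamp", "local_received"):
        if field in event:
            # Equivalent to convert -> split on "." -> take element [0] -> rename.
            out[field] = drop_fraction(event.pop(field))
    out["event"] = event
    return out


doc = {
    "resource": "task",
    "event": {"timestamp": 1585008000.123456, "local_received": 1585008001.5, "uuid": "x"},
}
flat = flatten_event_times(doc)
```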
Running Elasticsearch 7 on EC2 instance
In order to bind Elasticsearch to all network interfaces (0.0.0.0) so that it is reachable from outside the instance, we need to edit the config/elasticsearch.yml file
Code Block
network.host: 0.0.0.0
cluster.name: grq_cluster
node.name: ESNODE_CYR
node.master: true
node.data: true
transport.host: localhost
transport.tcp.port: 9300
http.port: 9200
discovery.zen.minimum_master_nodes: 2
# allows UI to talk to elasticsearch (in production we would put the actual hostname of the UI)
http.cors.enabled: true
http.cors.allow-origin: "*"
Running Kibana on EC2 instance
Install Kibana from the command line
...
Code Block
server.host: 0.0.0.0
Index Template
An index template ensures that every newly created index whose name matches the pattern automatically uses this mapping
...
Code Block
{
  "order": 0,
  "index_patterns": [
    "{{ prefix }}_*"
  ],
  "settings": {
    "index.refresh_interval": "5s",
    "analysis": {
      "analyzer": {
        "default": {
          "filter": [
            "standard",
            "lowercase",
            "word_delimiter"
          ],
          "tokenizer": "keyword"
        }
      }
    }
  },
  "mappings": {
    "dynamic_templates": [
      {
        "integers": {
          "match_mapping_type": "long",
          "mapping": {
            "type": "integer"
          }
        }
      },
      {
        "strings": {
          "match_mapping_type": "string",
          "mapping": {
            "norms": false,
            "type": "text",
            "copy_to": "all_text_fields",
            "fields": {
              "raw": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "match": "*"
        }
      }
    ],
    "properties": {
      "browse_urls": { "type": "text", "copy_to": "all_text_fields" },
      "urls": { "type": "text", "copy_to": "all_text_fields" },
      "location": { "tree": "quadtree", "type": "geo_shape" },
      "center": { "tree": "quadtree", "type": "geo_shape" },
      "starttime": { "type": "date" },
      "endtime": { "type": "date" },
      "creation_timestamp": { "type": "date" },
      "metadata": {
        "properties": {
          "context": { "type": "object", "enabled": false }
        }
      },
      "prov": {
        "properties": {
          "wasDerivedFrom": { "type": "keyword" },
          "wasGeneratedBy": { "type": "keyword" }
        }
      },
      "all_text_fields": { "type": "text" }
    }
  },
  "aliases": {
    "{{ alias }}": {}
  }
}
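A minimal sketch of filling in the {{ prefix }} and {{ alias }} placeholders and checking which index names the rendered pattern would cover. It uses plain string replacement rather than a real templating engine, works on an abbreviated copy of the template, and the names render_index_template and TEMPLATE_TEXT are hypothetical.

```python
import fnmatch
import json

# Abbreviated stand-in for the template above (illustration only).
TEMPLATE_TEXT = """
{
  "order": 0,
  "index_patterns": ["{{ prefix }}_*"],
  "mappings": {"properties": {"all_text_fields": {"type": "text"}}},
  "aliases": {"{{ alias }}": {}}
}
"""


def render_index_template(template_text, prefix, alias):
    """Substitute the two placeholders and parse the result as JSON."""
    rendered = template_text.replace("{{ prefix }}", prefix)
    rendered = rendered.replace("{{ alias }}", alias)
    return json.loads(rendered)


template = render_index_template(TEMPLATE_TEXT, prefix="grq", alias="grq")
pattern = template["index_patterns"][0]
# Simple wildcard check of which index names fall under the pattern.
covered = fnmatch.fnmatchcase("grq_v1.1_s1-iw_slc", pattern)
```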
Percolator
Percolator needs to be compatible with ES 7.1 (not applicable because HySDS uses its own version of percolator)
...
mapping added in mozart server
/home/ops/mozart/ops/tosca/configs/user_rules_dataset.mapping
- python code to create the user_rules index: Link
- Mapping template for user_rules index: Link
Code Block
# PUT user_rules
{
  "mappings": {
    "properties": {
      "creation_time": { "type": "date" },
      "enabled": { "type": "boolean", "null_value": true },
      "job_type": { "type": "keyword" },
      "kwargs": { "type": "keyword" },
      "modification_time": { "type": "date" },
      "modified_time": { "type": "date" },
      "passthru_query": { "type": "boolean" },
      "priority": { "type": "long" },
      "query": { "type": "object", "enabled": false },
      "query_all": { "type": "boolean" },
      "query_string": { "type": "text" },
      "queue": { "type": "text" },
      "rule_name": { "type": "keyword" },
      "username": { "type": "keyword" },
      "workflow": { "type": "keyword" }
    }
  }
}
hysds_ios Index
Github Link to template.json: Link
...