Version: 1.7 → 7.1

Relevant GitHub Repos/Branches:

Removal of Tosca/Figaro user interfaces in favor of hysds_ui

Big Changes

PLEASE LOOK AT AND USE THE NEW ELASTICSEARCH UTILITY CLASS: (SOURCE CODE)
Only one mapping type is allowed per index: _doc
Searching across all fields must be enabled manually, since the _all field is no longer available
The filtered query was removed in ES 5.0
- documentation
The string type is split into text and keyword
- text is analyzed and allows for more searching capabilities: documentation
- keyword is stored as-is and allows for aggregation, etc.: documentation
Setting fielddata: true in the mapping allows sorting on a text field (but we'll sort on the keyword instead): documentation
Support for a z coordinate in geoshapes: documentation
- it won't affect searches but adds more flexibility in location data
_default_ mapping deprecated in ES 6.0.0 (Link)
- the workaround is to use index templates: (Documentation)

Changes in the geo coordinates query

Note: {"type": "geo_shape","tree": "quadtree","tree_levels": 26} makes uploading documents slow, specifically the "tree_levels": 26 setting

{
  "query": {
    "bool": {
      "filter": {
        "geo_shape": {
          "location": {
            "shape": {
              "type": "polygon",
              "coordinates": [[<coordinates>]]
            },
            "relation": "within"
          }
        }
      }
    }
  }
}
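
As a rough illustration, the query above could be assembled in Python as follows. The helper name `build_geo_shape_query` and its default arguments are made up for this sketch and are not part of the HySDS code base; the only behavior added on top of the JSON above is closing the polygon ring when the caller leaves it open.

```python
import json


def build_geo_shape_query(ring, field="location", relation="within"):
    """Build a bool/filter geo_shape query for a single polygon ring.

    `ring` is a list of [lon, lat] (or [lon, lat, z]) positions. GeoJSON
    polygon rings must be closed, so the first position is appended again
    if the caller left the ring open.
    """
    if len(ring) < 3:
        raise ValueError("a polygon ring needs at least three positions")
    closed = [list(position) for position in ring]
    if closed[0] != closed[-1]:
        closed.append(list(closed[0]))
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_shape": {
                        field: {
                            "shape": {"type": "polygon", "coordinates": [closed]},
                            "relation": relation,
                        }
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical bounding box, used only to show the generated body.
    box = [[-120.0, 33.0], [-117.0, 33.0], [-117.0, 35.0], [-120.0, 35.0]]
    print(json.dumps(build_geo_shape_query(box), indent=2))
```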

Changes in Percolator
- Removal of .percolator type Documentation
  - Instead a percolator field type must be configured prior to indexing percolator queries
- Complete overhaul in the percolate index mapping
  - New version documentation VS. Old version documentation

Removal of the _all: { "enabled": true } setting in indices, so we can no longer search across all fields by default

The workaround is to add copy_to to the field mapping, especially in dynamic templates
Link to copy_to documentation

copy_to does not work inside multi-fields

"random_field_name": {
  "type": "keyword",
  "ignore_above": 256,
  "copy_to": "all_text_fields", # DOES WORK
  "fields": {
    "keyword": {
      "type": "text",
      "copy_to": "all_text_fields" # DOES NOT WORK
    }
  }
}

Proper mapping with text fields

"random_field_name": {
  "type": "text",
  "copy_to": "all_text_fields",
  "fields": {
    "keyword": { # WE USE 'raw' instead of 'keyword' in our own indices
      "type": "keyword", # THIS IS NEEDED FOR AGGREGATION ON THE FACETS FOR THE UI
      "ignore_above": 256
    }
  }
}

The copy_to target field itself also needs a mapping:
```
"all_text_fields": {
  "type": "text"
}
```
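
A minimal sketch of how these mapping fragments fit together, written in Python. The function names `text_field_mapping` and `build_properties` are hypothetical, and the `raw` sub-field name follows the comment above about our own indices.

```python
def text_field_mapping(copy_to="all_text_fields", subfield="raw", ignore_above=256):
    """Mapping for a string field: analyzed text plus an exact-match sub-field."""
    return {
        "type": "text",
        # copy_to sits on the parent field; per the notes above it does
        # not work when placed inside "fields".
        "copy_to": copy_to,
        "fields": {
            subfield: {"type": "keyword", "ignore_above": ignore_above}
        },
    }


def build_properties(field_names, copy_to="all_text_fields"):
    """Map every named field and declare the shared copy_to target itself."""
    properties = {name: text_field_mapping(copy_to) for name in field_names}
    properties[copy_to] = {"type": "text"}
    return properties
```

Calling `build_properties(["random_field_name"])` yields a `properties` object containing both the field mapping and the `all_text_fields` target.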

General changes to the mapping
- created an example mapping called grq_v1.1_s1-iw_slc
- copied example data into a new ES index, using the built-in dynamic mapping to build the initial mapping
- mapping changes:
  - metadata.context to {"type": "object", "enabled": false}
  - properties.location to {"type": "geo_shape","tree": "quadtree"}
  - use type keyword to be able to use msearch, otherwise ES returns:
    - "reason": "Fielddata is disabled on text fields by default. ... Alternatively use a keyword field instead."

Changes to query_string

Escaped literal double quotes are no longer used in query_string

The old query_string from 1.7 would return S1B_IW_SLC__1SDV_20170812T010949_20170812T011016_006900_00C25E_B16D

{
  "query": {
    "query_string": {
      "query": "\"__1SDV_20170812T010949_20170812T011016_006900_00C25E_B16\"",
      "default_operator": "OR"
    }
  }
}

The new query_string returns the equivalent document, but requires a wildcard * at the beginning and end of the string

{
  "query": {
    "query_string": {
      "default_field": "all_text_fields",
      "query": "*__1SDV_20170812T010949_20170812T011016_006900_00C25E_B16*",
      "default_operator": "OR"
    }
  }
}

Date searches do not appear to have changed much

{
  "query": {
    "query_string": {
      "query": "starttime: [2019-01-01 TO 2019-01-31]",
      "default_operator": "OR"
    }
  }
}

Different fields can be combined as well

{
  "query": {
    "query_string": {
      "fields": ["all_text_fields", "all_date_fields"],
      "query": "[2019-01-01 TO 2019-01-31] AND *__1SDV_20190109T020750_20190109T020817_014411*",
      "default_operator": "OR"
    }
  }
}
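
The query_string bodies above can be generated with a couple of small helpers. This is only a sketch: `wildcard_query_string` and `date_range_clause` are hypothetical names, and the search term is not escaped, so reserved query_string characters in it would still need handling.

```python
def wildcard_query_string(term, default_field="all_text_fields", operator="OR"):
    """Wrap a partial identifier in leading and trailing wildcards."""
    term = term.strip().strip("*")  # avoid doubling wildcards the caller added
    if not term:
        raise ValueError("empty search term")
    return {
        "query": {
            "query_string": {
                "default_field": default_field,
                "query": "*%s*" % term,
                "default_operator": operator,
            }
        }
    }


def date_range_clause(field, start, end):
    """Format an inclusive date range in query_string syntax."""
    return "%s: [%s TO %s]" % (field, start, end)
```

For example, `date_range_clause("starttime", "2019-01-01", "2019-01-31")` produces the same range expression used in the date search above.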

Removal of `search_type=scan`

https://www.elastic.co/guide/en/elasticsearch/reference/5.5/breaking_50_search_changes.html

NOTE: the _scroll_id must be cleared after using the scroll API to pull data

An error is returned if the _scroll_ids are not cleared

query_phase_execution_exception","reason":"Result window is too large, from + size must be less than or equal to: [10000] but was [11000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.

Requires changes in our HySDS code, wherever it uses search_type=scan

curl -X POST "http://localhost:9200/hysds_ios/_search?search_type=scan&scroll=10m&size=100"
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "No search type for [scan]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "No search type for [scan]"
  },
  "status": 400
}

# removing search_type=scan from the endpoint fixes this problem
curl -X POST "http://100.64.134.55:9200/user_rules/_search?scroll=10m&size=100"
{
  "_scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAAEWMUpVeFNzVXpTVktlUzFPc0NKa1dndw==",
  "took": 34,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  }
}
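
A sketch of a scroll loop that replaces search_type=scan and always clears the scroll context afterwards. It assumes a client object exposing `search()`, `scroll()` and `clear_scroll()` with these keyword arguments, as the elasticsearch-py 7.x client does; the function name `scroll_all` is made up for illustration and is not the HySDS utility class.

```python
def scroll_all(client, index, body, scroll="10m", size=100):
    """Yield every hit for `body`, then release the scroll context.

    `client` is assumed to behave like an elasticsearch-py 7.x client.
    """
    scroll_ids = set()
    page = client.search(index=index, body=body, scroll=scroll, size=size)
    try:
        while True:
            scroll_id = page.get("_scroll_id")
            if scroll_id:
                scroll_ids.add(scroll_id)
            hits = page["hits"]["hits"]
            if not hits:
                break  # an empty page means the scroll is exhausted
            yield from hits
            if not scroll_id:
                break
            page = client.scroll(scroll_id=scroll_id, scroll=scroll)
    finally:
        # Clear every scroll id we saw, even if the caller stops early
        # or an exception is raised while iterating.
        for sid in scroll_ids:
            client.clear_scroll(scroll_id=sid)
```

Because the cleanup lives in a `finally` block, the scroll context is released whether the generator is consumed fully, closed early, or interrupted by an error.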

Removal of filtered: Link and Guide

removed in version 5.0 (deprecated since 2.0); move all logic into query and bool
and, or, and not are replaced by must, should, and must_not
- if using should, you will need to add minimum_should_match: 1
- Link

# from this:
{
  "filtered": {
    "filter": {
      "and": [
        {
          "match": {
            "tags": "ISL"
          }
        },
        {
          "range": {
            "metadata.ProductReceivedTime": {"gte": "2020-03-24T00:00:00.000000Z"}
          }
        },
        {
          "range": {
            "metadata.ProductReceivedTime": {"lte": "2020-03-24T23:59:59.999999Z"}
          }
        }
      ]
    }
  }
}

# change to this:
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "tags": "ISL"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "metadata.ProductReceivedTime": {"gte": "2020-03-24T00:00:00.000000Z"}
          }
        },
        {
          "range": {
            "metadata.ProductReceivedTime": {"lte": "2020-03-24T23:59:59.999999Z"}
          }
        }
      ]
    }
  }
}
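
The rewrite rules above can be sketched as a small recursive converter. The names `convert_filter` and `convert_filtered` are hypothetical, and only the list form of and/or/not shown above is handled. Unlike the hand-written example, this sketch keeps the whole legacy filter in filter context (nested inside a bool), which matches the same documents but does not score the match clause.

```python
LEGACY_TO_BOOL = {"and": "must", "or": "should", "not": "must_not"}


def convert_filter(node):
    """Recursively rewrite legacy and/or/not filters as bool clauses."""
    if isinstance(node, dict) and len(node) == 1:
        key, value = next(iter(node.items()))
        if key in LEGACY_TO_BOOL:
            children = value if isinstance(value, list) else [value]
            bool_body = {LEGACY_TO_BOOL[key]: [convert_filter(c) for c in children]}
            if key == "or":
                # without this, should clauses may become optional
                bool_body["minimum_should_match"] = 1
            return {"bool": bool_body}
    return node  # leaf clauses (match, range, term, ...) pass through unchanged


def convert_filtered(legacy):
    """Turn {"filtered": {"query": ..., "filter": ...}} into a bool query."""
    filtered = legacy["filtered"]
    bool_body = {}
    if "query" in filtered:
        bool_body["must"] = [filtered["query"]]
    if "filter" in filtered:
        bool_body["filter"] = [convert_filter(filtered["filter"])]
    return {"query": {"bool": bool_body}}
```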

Changes to Logstash

Mozart streams data to Elasticsearch through Logstash
Changes in Logstash 7.1
- long timestamps are read as integers instead of epoch_millis dates
  - need to convert the value to a string, split it on the decimal point, and remove the decimal places
- flush_size was removed (it was originally set to 1)
  - the equivalent settings will need to be set in logstash.yml
  - Link to forum
https://github.com/hysds/hysds/blob/develop-es7/configs/logstash/indexer.conf.mozart

input {
  redis {
    host => "{{ MOZART_REDIS_PVT_IP }}"
    {% if MOZART_REDIS_PASSWORD != "" %}password => "{{ MOZART_REDIS_PASSWORD }}"{% endif %}
    # these settings should match the output of the agent
    data_type => "list"
    key => "logstash"

    # We use the 'msgpack' codec here because we expect to read
    # msgpack events from redis.
    codec => msgpack
  }
}

filter {
  if [resource] in ["worker", "task"] {
    mutate {
      convert => {
        "[event][timestamp]" => "string"
        "[event][local_received]" => "string"
      }

      split => ["[event][timestamp]", "."]
      split => ["[event][local_received]", "."]

      add_field => [ "[event][timestamp_new]" , "%{[event][timestamp][0]}" ]
      add_field => [ "[event][local_received_new]" , "%{[event][local_received][0]}" ]

      remove_field => ["[event][timestamp]", "[event][local_received]"]
    }

    mutate {
      rename => { "[event][timestamp_new]" => "timestamp" }
      rename => { "[event][local_received_new]" => "local_received" }
    }
  }
}

output {
  #stdout { codec => rubydebug }

  if [resource] == "job" {
    elasticsearch {
      hosts => ["{{ MOZART_ES_PVT_IP }}:9200"]
      index => "job_status-current"
      document_id => "%{payload_id}"
      template => "{{ OPS_HOME }}/mozart/etc/job_status.template"
      template_name => "job_status"
    }
  } else if [resource] == "worker" {
    elasticsearch {
      hosts => ["{{ MOZART_ES_PVT_IP }}:9200"]
      index => "worker_status-current"
      document_id => "%{celery_hostname}"
      template => "{{ OPS_HOME }}/mozart/etc/worker_status.template"
      template_name => "worker_status"
    }
  } else if [resource] == "task" {
    elasticsearch {
      hosts => ["{{ MOZART_ES_PVT_IP }}:9200"]
      index => "task_status-current"
      document_id => "%{uuid}"
      template => "{{ OPS_HOME }}/mozart/etc/task_status.template"
      template_name => "task_status"
    }
  } else if [resource] == "event" {
    elasticsearch {
      hosts => ["{{ MOZART_ES_PVT_IP }}:9200"]
      index => "event_status-current"
      document_id => "%{uuid}"
      template => "{{ OPS_HOME }}/mozart/etc/event_status.template"
      template_name => "event_status"
    }
  } else {}
}
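
To make the timestamp handling in the filter block easier to follow, here is the same transformation sketched in Python. This is illustrative only and is not code that runs in the pipeline; `truncate_epoch` and `normalize_event` are made-up names.

```python
def truncate_epoch(value):
    """Mimic the mutate filter: stringify, split on ".", keep the first piece."""
    return str(value).split(".")[0]


def normalize_event(doc):
    """Return a copy of `doc` with event timestamps moved to top-level fields.

    Only worker and task documents are touched, matching the condition
    in the Logstash filter block.
    """
    out = dict(doc)
    if out.get("resource") not in ("worker", "task"):
        return out
    event = dict(out.get("event") or {})
    for name in ("timestamp", "local_received"):
        if name in event:
            # [event][timestamp] -> timestamp, with the decimal part dropped
            out[name] = truncate_epoch(event.pop(name))
    out["event"] = event
    return out
```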

Running Elasticsearch 7 on EC2 instance

To make Elasticsearch listen on all network interfaces (0.0.0.0), we need to edit the config/elasticsearch.yml file

network.host: 0.0.0.0
cluster.name: grq_cluster
node.name: ESNODE_CYR
node.master: true
node.data: true
transport.host: localhost
transport.tcp.port: 9300
http.port: 9200
# note: this setting is accepted but ignored by ES 7.x nodes
discovery.zen.minimum_master_nodes: 2

# allows the UI to talk to Elasticsearch (in production we would put the actual hostname of the UI)
http.cors.enabled: true
http.cors.allow-origin: "*"

Running Kibana on EC2 instance

Install Kibana from the command line

curl -O https://artifacts.elastic.co/downloads/kibana/kibana-7.1.1-darwin-x86_64.tar.gz
tar -xzf kibana-7.1.1-darwin-x86_64.tar.gz
cd kibana-7.1.1-darwin-x86_64/

Edit the config/kibana.yml file so Kibana listens on host 0.0.0.0

server.host: 0.0.0.0

Index Template

An index template ensures that every newly created index whose name matches the pattern automatically uses this mapping

grq2 _default_ mapping template Link
Python code to create the index template: Link
Documentation

{
  "order": 0,
  "index_patterns": [
    "{{ prefix }}_*"
  ],
  "settings": {
    "index.refresh_interval": "5s",
    "analysis": {
      "analyzer": {
        "default": {
          "filter": [
            "standard",
            "lowercase",
            "word_delimiter"
          ],
          "tokenizer": "keyword"
        }
      }
    }
  },
  "mappings": {
    "dynamic_templates": [
      {
        "integers": {
          "match_mapping_type": "long",
          "mapping": {
            "type": "integer"
          }
        }
      },
      {
        "strings": {
          "match_mapping_type": "string",
          "mapping": {
            "norms": false,
            "type": "text",
            "copy_to": "all_text_fields",
            "fields": {
              "raw": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "match": "*"
        }
      }
    ],
    "properties": {
      "browse_urls": {
        "type": "text",
        "copy_to": "all_text_fields"
      },
      "urls": {
        "type": "text",
        "copy_to": "all_text_fields"
      },
      "location": {
        "tree": "quadtree",
        "type": "geo_shape"
      },
      "center": {
        "tree": "quadtree",
        "type": "geo_shape"
      },
      "starttime": {
        "type": "date"
      },
      "endtime": {
        "type": "date"
      },
      "creation_timestamp": {
        "type": "date"
      },
      "metadata": {
        "properties": {
          "context": {
            "type": "object",
            "enabled": false
          }
        }
      },
      "prov": {
        "properties": {
          "wasDerivedFrom": {
            "type": "keyword"
          },
          "wasGeneratedBy": {
            "type": "keyword"
          }
        }
      },
      "all_text_fields": {
        "type": "text"
      }
    }
  },
  "aliases": {
    "{{ alias }}": {}
  }
}
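
The `{{ prefix }}` and `{{ alias }}` placeholders have to be filled in before the template is sent to Elasticsearch. A minimal sketch of that step, assuming plain string substitution is sufficient (the linked Python code may use a real template engine instead); `render_index_template` is a hypothetical name.

```python
import json


def render_index_template(template_text, prefix, alias):
    """Substitute the {{ prefix }} and {{ alias }} placeholders and parse the JSON."""
    rendered = (
        template_text
        .replace("{{ prefix }}", prefix)
        .replace("{{ alias }}", alias)
    )
    # json.loads raises ValueError if the rendered text is not valid JSON,
    # which catches a broken template before it reaches the cluster.
    return json.loads(rendered)
```

The resulting dict could then be passed as the body of a PUT _template request, for example through `indices.put_template(name=..., body=...)` in the elasticsearch-py 7.x client.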

Percolator

Percolator needs to be compatible with ES 7.1 (not applicable because HySDS uses its own version of percolator)

User Rules (Documentation for user rules triggering)

Mapping added on the Mozart server at /home/ops/mozart/ops/tosca/configs/user_rules_dataset.mapping
Python code to create the user_rules index: Link
Mapping template for user_rules index Link

# PUT user_rules
{
  "mappings": {
    "properties": {
      "creation_time": {
        "type": "date"
      },
      "enabled": {
        "type": "boolean",
        "null_value": true
      },
      "job_type": {
        "type": "keyword"
      },
      "kwargs": {
        "type": "keyword"
      },
      "modification_time": {
        "type": "date"
      },
      "modified_time": {
        "type": "date"
      },
      "passthru_query": {
        "type": "boolean"
      },
      "priority": {
        "type": "long"
      },
      "query": {
        "type": "object",
        "enabled": false
      },
      "query_all": {
        "type": "boolean"
      },
      "query_string": {
        "type": "text"
      },
      "queue": {
        "type": "text"
      },
      "rule_name": {
        "type": "keyword"
      },
      "username": {
        "type": "keyword"
      },
      "workflow": {
        "type": "keyword"
      }
    }
  }
}

hysds_ios Index

GitHub link to template.json: Link

Python code to create hysds_ios index template: Link
Follow HySDS and Job-Spec documentation for Jenkins build Link

{
  "order": 0,
  "template": "{{ index }}",
  "settings": {
    "index.refresh_interval": "5s",
    "analysis": {
      "analyzer": {
        "default": {
          "filter": [
            "standard",
            "lowercase",
            "word_delimiter"
          ],
          "tokenizer": "keyword"
        }
      }
    }
  },
  "mappings": {
    "dynamic_templates": [
      {
        "integers": {
          "match_mapping_type": "long",
          "mapping": {
            "type": "integer"
          }
        }
      },
      {
        "strings": {
          "match_mapping_type": "string",
          "mapping": {
            "norms": false,
            "type": "text",
            "copy_to": "all_text_fields",
            "fields": {
              "raw": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "match": "*"
        }
      }
    ],
    "properties": {
      "_timestamp": {
        "type": "date",
        "store": true
      }
    }
  }
}

Elasticsearch Upgrade