...

Version: 1.7 → 7.1

Relevant Github Repos/Branches

...

Big Changes

  • PLEASE LOOK AT AND USE THE NEW ELASTICSEARCH UTILITY CLASS: (SOURCE CODE)
  • Removal of Tosca/Figaro user interfaces in favor of hysds_ui

  • Only 1 type allowed in each index: _doc
  • Need to manually enable all-field text searches
  • Removal of filtered since ES 5.0
  • Split string into text and keyword
  • fielddata: true in the mapping allows for sorting (but we'll sort on the keyword sub-field instead): Documentation
  • Support for z coordinate in geo_shapes (documentation)
    • it won't affect searches but adds more flexibility in location data
  • _default_ mapping deprecated in ES 6.0.0 (Link)

  • Changes in the geo coordinates query

    • Note: {"type": "geo_shape","tree": "quadtree","tree_levels": 26} makes uploading documents slow, specifically "tree_levels": 26


    • Code Block
      {
        "query": {
          "bool": {
            "filter": {
              "geo_shape": {
                "location": {
                  "shape": {
                    "type": "polygon",
                    "coordinates": [[<coordinates>]]
                  },
                  "relation": "within"
                }
              }
            }
          }
        }
      }
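For reference, the filter above can be generated with a small helper. This is an illustrative sketch, not HySDS code; `geo_shape_query`, the default field name `location`, and the sample ring are assumptions based on the query shown above.

```python
# Hypothetical helper that builds the ES 7.x geo_shape "within" filter
# shown in the Code Block above.
def geo_shape_query(coordinates, relation="within", field="location"):
    """coordinates: list of [lon, lat] pairs forming a closed polygon ring."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_shape": {
                        field: {
                            "shape": {"type": "polygon", "coordinates": [coordinates]},
                            "relation": relation,
                        }
                    }
                }
            }
        }
    }

# Example: a small box around (0, 0); the first and last points must match
# to close the ring, per GeoJSON polygon rules.
ring = [[-1, -1], [1, -1], [1, 1], [-1, 1], [-1, -1]]
body = geo_shape_query(ring)
```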


  • Changes in Percolator


  • Removal of   _all: { "enabled": true }   in indices, so we can no longer search across all fields at once

    • workaround is adding copy_to in field mapping, especially in dynamic templating

    • Link to copy_to documentation

    • Does not work with multi-fields

      • Code Block
        "random_field_name": {
          "type": "keyword",
          "ignore_above": 256,
          "copy_to": "all_text_fields", # DOES WORK
          "fields": {
            "keyword": {
              "type": "text"
              "copy_to": "all_text_fields" # DOES NOT WORK
            }
          }
        }


    • Proper mapping with text fields

      • Code Block
        "random_field_name": {
          "type": "text",
          "copy_to": "all_text_fields",
          "fields": {
            "keyword": { # WE USE 'raw' instead of 'keyword' in our own indices
              "type": "keyword", # THIS IS NEEDED FOR AGGREGATION ON THE FACETS FOR THE UI
              "ignore_above": 256
            }
          }
        }


    • Need to add the copy_to field mapping

      • Code Block
        "all_text_fields": {
          "type": "text"
        }

  • General changes to the mapping

    • create example mapping called grq_v1.1_s1-iw_slc

    • copied example data into a new ES index, using the built-in dynamic mapping to build the initial mapping

    • mapping changes:

      • metadata.context to {"type": "object", "enabled": false}

      • properties.location to {"type": "geo_shape", "tree": "quadtree"}

      • use type keyword to be able to use msearch:

        • Code Block
          "reason": "Fielddata is disabled on text fields by default. ... Alternatively use a keyword field instead."


  • Changes to query_string

    • removal of escaping literal double quotes in query_string
    • old query_string from 1.7, would return S1B_IW_SLC__1SDV_20170812T010949_20170812T011016_006900_00C25E_B16D

      • Code Block
        {
          "query": {
            "query_string": {
              "query": "\"__1SDV_20170812T010949_20170812T011016_006900_00C25E_B16\"",
              "default_operator": "OR"
            }
          }
        }


    • new query_string returns equivalent document, requires wildcard * at the beginning and end of string

      • Code Block
        {
          "query": {
            "query_string": {
              "default_field": "all_text_fields",
              "query": "*__1SDV_20170812T010949_20170812T011016_006900_00C25E_B16*",
              "default_operator": "OR"
            }
          }
        }

    • i don't think date searches really changed much

      • Code Block
        {
          "query": {
            "query_string": {
              "query": "starttime: [2019-01-01 TO 2019-01-31]",
              "default_operator": "OR"
            }
          }
        }

    • can combine different fields as well

      • Code Block
        {
          "query": {
            "query_string": {
              "fields": ["all_text_fields", "all_date_fields"],
              "query": "[2019-01-01 TO 2019-01-31] AND *__1SDV_20190109T020750_20190109T020817_014411*",
              "default_operator": "OR"
            }
          }
        }
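Since _all is gone, these searches all go through the copy_to field all_text_fields, and exact substrings now need leading/trailing wildcards. A small helper can build the request body; `query_string_body` is an illustrative name, not an existing HySDS function:

```python
# Hypothetical builder for the ES 7.x query_string bodies shown above.
def query_string_body(term, fields=("all_text_fields",), operator="OR"):
    return {
        "query": {
            "query_string": {
                "fields": list(fields),
                # wildcard * at the beginning and end of the string
                "query": f"*{term}*",
                "default_operator": operator,
            }
        }
    }

body = query_string_body("__1SDV_20170812T010949")
```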


  • Removal of search_type=scan

    • https://www.elastic.co/guide/en/elasticsearch/reference/5.5/breaking_50_search_changes.html
    • NOTE: must clear _scroll_id after using the scroll API to pull data
      • Will return an error if the _scroll_id's are not cleared

      • Code Block
        "type": "query_phase_execution_exception", "reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [11000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."


    • Requires changes in our HySDS code, wherever it uses search_type=scan

      Code Block
      curl -X POST "http://localhost:9200/hysds_ios/_search?search_type=scan&scroll=10m&size=100"
      {
        "error": {
          "root_cause": [
            {
              "type": "illegal_argument_exception",
              "reason": "No search type for [scan]"
            }
          ],
          "type": "illegal_argument_exception",
          "reason": "No search type for [scan]"
        },
        "status": 400
      }
      
      # removing search_type=scan from the endpoint fixes this problem
      curl -X POST "http://100.64.134.55:9200/user_rules/_search?scroll=10m&size=100"
      {
        "_scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAAEWMUpVeFNzVXpTVktlUzFPc0NKa1dndw==",
        "took": 34,
        "timed_out": false,
        "_shards": {
          "total": 1,
          "successful": 1,
          "skipped": 0,
          "failed": 0
        },
        "hits": {
          "total": {
            "value": 0,
            "relation": "eq"
          },
          "max_score": null,
          "hits": []
        }
      }
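The scroll-and-clear pattern above can be sketched as a generator. This is an assumption-laden sketch, not HySDS code: `scroll_all` and the injected `call` function are hypothetical, standing in for whatever HTTP wrapper the code uses against the Mozart/GRQ ES host.

```python
def scroll_all(call, index, query, page_size=100, scroll="10m"):
    """Yield every hit from `index`, then clear the scroll context.

    `call(method, path, body)` is any callable that performs the HTTP
    request and returns the parsed JSON response.
    """
    page = call("POST", f"/{index}/_search?scroll={scroll}&size={page_size}", query)
    scroll_id = page["_scroll_id"]
    try:
        while page["hits"]["hits"]:
            yield from page["hits"]["hits"]
            page = call("POST", "/_search/scroll",
                        {"scroll": scroll, "scroll_id": scroll_id})
            scroll_id = page["_scroll_id"]
    finally:
        # Clear the scroll context so ES does not keep it open until timeout.
        call("DELETE", "/_search/scroll", {"scroll_id": [scroll_id]})
```

The `finally` block guarantees the DELETE fires even if the caller stops iterating early, which is the part ES 7 punishes you for forgetting.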


  • Removal of filter and filtered: Link and Guide

    • deprecated in version 5.x, move all logic to query and bool
    • and, or and not changed to must, should and must_not
      • if using should, will need to add minimum_should_match: 1
      • Link


    • Code Block
      # from this:
      {
        "filtered": {
          "filter": {
            "and": [
              {
                "match": {
                  "tags": "ISL"
                }
              },
              {
                "range": {
                  "metadata.ProductReceivedTime": {"gte": "2020-03-24T00:00:00.000000Z"}
                }
              },
              {
                "range": {
                  "metadata.ProductReceivedTime": {"lte": "2020-03-24T23:59:59.999999Z"}
                }
              }
            ]
          }
        }
      }
      
      # change to this:
      {
        "query": {
          "bool": {
            "must": [
              {
                "match": {
                  "tags": "ISL"
                }
              }
            ],
            "filter": [
              {
                "range": {
                  "metadata.ProductReceivedTime": {"gte": "2020-03-24T00:00:00.000000Z"}
                }
              },
              {
                "range": {
                  "metadata.ProductReceivedTime": {"lte": "2020-03-24T23:59:59.999999Z"}
                }
              }
            ]
          }
        }
      }
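The rewrite above is mechanical, so it can be expressed as a converter. This is a hypothetical sketch for the simple `filtered`/`and` shape shown above (match clauses to `must`, range clauses to `filter`), not the actual HySDS migration code:

```python
# Illustrative converter: ES 1.x "filtered" query with an "and" filter
# becomes an ES 7.x bool query.
def upgrade_filtered(old):
    clauses = old["filtered"]["filter"]["and"]
    must = [c for c in clauses if "match" in c]      # scoring clauses
    filters = [c for c in clauses if "range" in c]   # non-scoring clauses
    return {"query": {"bool": {"must": must, "filter": filters}}}

old = {"filtered": {"filter": {"and": [
    {"match": {"tags": "ISL"}},
    {"range": {"metadata.ProductReceivedTime": {"gte": "2020-03-24T00:00:00.000000Z"}}},
    {"range": {"metadata.ProductReceivedTime": {"lte": "2020-03-24T23:59:59.999999Z"}}},
]}}}
new = upgrade_filtered(old)
```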


Changes to Logstash

  • Mozart streams data to elasticsearch through Logstash
  • Changes to logstash 7.1

    • long timestamps are read as int instead of date_epoch_millis
      • need to convert them to strings, split on the decimal point, and drop the decimal places
    • removal of flush_size (was originally set to 1)
  • https://github.com/hysds/hysds/blob/develop-es7/configs/logstash/indexer.conf.mozart

  • Code Block
    input {
      redis {
        host => "{{ MOZART_REDIS_PVT_IP }}"
        {% if MOZART_REDIS_PASSWORD != "" %}password => "{{ MOZART_REDIS_PASSWORD }}"{% endif %}
        # these settings should match the output of the agent
        data_type => "list"
        key => "logstash"
    
        # We use the 'msgpack' codec here because we expect to read
        # msgpack events from redis.
        codec => msgpack
      }
    }
    
    filter {
      if [resource] in ["worker", "task"] {
        mutate {
          convert => {
            "[event][timestamp]" => "string"
            "[event][local_received]" => "string"
          }
    
          split => ["[event][timestamp]", "."]
          split => ["[event][local_received]", "."]
    
          add_field => [ "[event][timestamp_new]" , "%{[event][timestamp][0]}" ]
          add_field => [ "[event][local_received_new]" , "%{[event][local_received][0]}" ]
    
          remove_field => ["[event][timestamp]", "[event][local_received]"]
        }
    
        mutate {
          rename => { "[event][timestamp_new]" => "timestamp" }
          rename => { "[event][local_received_new]" => "local_received" }
        }
      }
    }
    
    output {
      #stdout { codec => rubydebug }
    
      if [resource] == "job" {
        elasticsearch {
          hosts => ["{{ MOZART_ES_PVT_IP }}:9200"]
          index => "job_status-current"
          document_id => "%{payload_id}"
          template => "{{ OPS_HOME }}/mozart/etc/job_status.template"
          template_name => "job_status"
        }
      } else if [resource] == "worker" {
        elasticsearch {
          hosts => ["{{ MOZART_ES_PVT_IP }}:9200"]
          index => "worker_status-current"
          document_id => "%{celery_hostname}"
          template => "{{ OPS_HOME }}/mozart/etc/worker_status.template"
          template_name => "worker_status"
        }
      } else if [resource] == "task" {
        elasticsearch {
          hosts => ["{{ MOZART_ES_PVT_IP }}:9200"]
          index => "task_status-current"
          document_id => "%{uuid}"
          template => "{{ OPS_HOME }}/mozart/etc/task_status.template"
          template_name => "task_status"
        }
      } else if [resource] == "event" {
        elasticsearch {
          hosts => ["{{ MOZART_ES_PVT_IP }}:9200"]
          index => "event_status-current"
          document_id => "%{uuid}"
          template => "{{ OPS_HOME }}/mozart/etc/event_status.template"
          template_name => "event_status"
        }
      } else {}
    }
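The mutate/split/rename dance in the filter block above amounts to dropping the fractional part of an epoch-seconds value so ES reads it as a date. In Python terms (a one-line equivalent, for clarity only):

```python
# Equivalent of the Logstash convert -> split -> take-first-piece steps:
# stringify the epoch value and drop everything after the decimal point.
def strip_decimal(epoch):
    return str(epoch).split(".", 1)[0]
```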


Running Elasticsearch 7 on EC2 instance

In order to bind Elasticsearch to all interfaces (0.0.0.0) properly, we need to edit the config/elasticsearch.yml file

Code Block
network.host: 0.0.0.0
cluster.name: grq_cluster
node.name: ESNODE_CYR
node.master: true
node.data: true
transport.host: localhost
transport.tcp.port: 9300
http.port: 9200
discovery.zen.minimum_master_nodes: 2

# allows UI to talk to elasticsearch (in production we would put the actual hostname of the UI)
http.cors.enabled : true
http.cors.allow-origin: "*"


Running Kibana on EC2 instance

Install Kibana from the command line

...

Code Block
server.host: 0.0.0.0


Index Template

Every index created will automatically follow this template for its mapping

...

Code Block
{
  "order": 0,
  "index_patterns": [
    "{{ prefix }}_*"
  ],
  "settings": {
    "index.refresh_interval": "5s",
    "analysis": {
      "analyzer": {
        "default": {
          "filter": [
            "standard",
            "lowercase",
            "word_delimiter"
          ],
          "tokenizer": "keyword"
        }
      }
    }
  },
  "mappings": {
    "dynamic_templates": [
      {
        "integers": {
          "match_mapping_type": "long",
          "mapping": {
            "type": "integer"
          }
        }
      },
      {
        "strings": {
          "match_mapping_type": "string",
          "mapping": {
            "norms": false,
            "type": "text",
            "copy_to": "all_text_fields",
            "fields": {
              "raw": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "match": "*"
        }
      }
    ],
    "properties": {
      "browse_urls": {
        "type": "text",
        "copy_to": "all_text_fields"
      },
      "urls": {
        "type": "text",
        "copy_to": "all_text_fields"
      },
      "location": {
        "tree": "quadtree",
        "type": "geo_shape"
      },
      "center": {
        "tree": "quadtree",
        "type": "geo_shape"
      },
      "starttime": {
        "type": "date"
      },
      "endtime": {
        "type": "date"
      },
      "creation_timestamp": {
        "type": "date"
      },
      "metadata": {
        "properties": {
          "context": {
            "type": "object",
            "enabled": false
          }
        }
      },
      "prov": {
        "properties": {
          "wasDerivedFrom": {
            "type": "keyword"
          },
          "wasGeneratedBy": {
            "type": "keyword"
          }
        }
      },
      "all_text_fields": {
        "type": "text"
      }
    }
  },
  "aliases": {
    "{{ alias }}": {}
  }
}
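The `{{ prefix }}` / `{{ alias }}` placeholders get filled in at deploy time; a quick way to sanity-check which index names the rendered template would apply to is a wildcard match. This is a sketch only — `render_template` is not a HySDS function, and `fnmatch` merely approximates ES index-pattern matching for simple `*` patterns:

```python
import fnmatch
import json

# Fill in the template placeholders, then check an index name against
# the rendered index_patterns entry.
def render_template(template_text, prefix, alias):
    return json.loads(
        template_text.replace("{{ prefix }}", prefix).replace("{{ alias }}", alias)
    )

template = json.dumps({"index_patterns": ["{{ prefix }}_*"], "aliases": {"{{ alias }}": {}}})
tmpl = render_template(template, "grq_v1.1", "grq")
```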


Percolator

Percolator needs to be compatible with ES 7.1 (not applicable because HySDS uses its own version of percolator)

...

  • mapping added in mozart server /home/ops/mozart/ops/tosca/configs/user_rules_dataset.mapping

  • python code to create the user_rules index: Link
  • Mapping template for user_rules index Link

  • Code Block
    # PUT user_rules
    {
      "mappings": {
        "properties": {
          "creation_time": {
            "type": "date"
          },
          "enabled": {
            "type": "boolean",
            "null_value": true
          },
          "job_type": {
            "type": "keyword"
          },
          "kwargs": {
            "type": "keyword"
          },
          "modification_time": {
            "type": "date"
          },
          "modified_time": {
            "type": "date"
          },
          "passthru_query": {
            "type": "boolean"
          },
          "priority": {
            "type": "long"
          },
          "query": {
            "type": "object",
            "enabled": false
          },
          "query_all": {
            "type": "boolean"
          },
          "query_string": {
            "type": "text"
          },
          "queue": {
            "type": "text"
          },
          "rule_name": {
            "type": "keyword"
          },
          "username": {
            "type": "keyword"
          },
          "workflow": {
            "type": "keyword"
          }
        }
      }
    }
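Since dynamic mapping is off the table for user_rules, it can help to check new rule documents against the declared fields before indexing. An illustrative check only — the field set is copied from the mapping above, but `unknown_fields` itself is not HySDS code:

```python
# Field names declared in the user_rules mapping above.
USER_RULES_FIELDS = {
    "creation_time", "enabled", "job_type", "kwargs", "modification_time",
    "modified_time", "passthru_query", "priority", "query", "query_all",
    "query_string", "queue", "rule_name", "username", "workflow",
}

# Return any top-level keys in a rule document that the mapping
# does not declare.
def unknown_fields(doc):
    return sorted(set(doc) - USER_RULES_FIELDS)

rule = {"rule_name": "ingest-slc", "job_type": "job-ingest", "enabled": True}
```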


hysds_ios Index

Github Link to template.json: Link

...