...

Version: 1.7 → 7.1

Relevant Github Repos/Branches

...

Big Changes

  • PLEASE LOOK AT AND USE THE NEW ELASTICSEARCH UTILITY CLASS: (SOURCE CODE)
  • Removal of Tosca/Figaro user interfaces in favor of hysds_ui

  • Only 1 type allowed in each index: _doc
  • Need to manually enable all-field text searches
  • Removal of filtered since ES 5.0
  • Split string into text and keyword
  • fielddata: true in the mapping allows for sorting (but we'll sort on the keyword sub-field instead): Documentation
  • Support for z coordinate in geo_shapes (documentation)
    • it won't affect searches but adds more flexibility in location data
  • _default_ mapping deprecated in ES 6.0.0 (Link)

  • Changes in the geo coordinates query

    • Note: {"type": "geo_shape","tree": "quadtree","tree_levels": 26} makes uploading documents slow, specifically "tree_levels": 26


    • Code Block
      {
        "query": {
          "bool": {
            "filter": {
              "geo_shape": {
                "location": {
                  "shape": {
                    "type": "polygon",
                    "coordinates": [[<coordinates>]]
                  },
                  "relation": "within"
                }
              }
            }
          }
        }
      }
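For reference, the filter above can be generated with a small helper. This is an illustrative sketch, not HySDS code; `geo_shape_query`, the default field name `location`, and the sample ring are assumptions based on the query shown above.

```python
# Hypothetical helper that builds the ES 7.x geo_shape "within" filter
# shown in the Code Block above.
def geo_shape_query(coordinates, relation="within", field="location"):
    """coordinates: list of [lon, lat] pairs forming a closed polygon ring."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_shape": {
                        field: {
                            "shape": {"type": "polygon", "coordinates": [coordinates]},
                            "relation": relation,
                        }
                    }
                }
            }
        }
    }

# Example: a small box around (0, 0); the first and last points must match
# to close the ring, per GeoJSON polygon rules.
ring = [[-1, -1], [1, -1], [1, 1], [-1, 1], [-1, -1]]
body = geo_shape_query(ring)
```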


  • Changes in Percolator


  • Removal of   _all: { "enabled": true }   in indices, so we can no longer search across all fields at once

    • workaround is adding copy_to in field mapping, especially in dynamic templating

    • Link to copy_to documentation

    • Does not work with multi-fields

      • Code Block
        "random_field_name": {
          "type": "keyword",
          "ignore_above": 256,
          "copy_to": "all_text_fields", # DOES WORK
          "fields": {
            "keyword": {
              "type": "text"
              "copy_to": "all_text_fields" # DOES NOT WORK
            }
          }
        }


    • Proper mapping with text fields

      • Code Block
        "random_field_name": {
          "type": "text",
          "copy_to": "all_text_fields",
          "fields": {
            "keyword": { # WE USE 'raw' instead of 'keyword' in our own indices
              "type": "keyword", # THIS IS NEEDED FOR AGGREGATION ON THE FACETS FOR THE UI
              "ignore_above": 256
            }
          }
        }


    • Need to add the copy_to field mapping

      • Code Block
        "all_text_fields": {
          "type": "text"
        }

  • General changes to the mapping

    • create example mapping called grq_v1.1_s1-iw_slc

    • copied example data into a new ES index, using the built-in dynamic mapping to build the initial mapping

    • mapping changes:

      • metadata.context to {"type": "object", "enabled": false}

      • properties.location to {"type": "geo_shape", "tree": "quadtree"}

      • use type keyword to be able to use msearch:

        • Code Block
          "reason": "Fielddata is disabled on text fields by default. ... Alternatively use a keyword field instead."


  • Changes to query_string

    • removal of escaping literal double quotes in query_string
    • old query_string from 1.7, would return S1B_IW_SLC__1SDV_20170812T010949_20170812T011016_006900_00C25E_B16D

      • Code Block
        {
          "query": {
            "query_string": {
              "query": "\"__1SDV_20170812T010949_20170812T011016_006900_00C25E_B16\"",
              "default_operator": "OR"
            }
          }
        }


    • new query_string returns equivalent document, requires wildcard * at the beginning and end of string

      • Code Block
        {
          "query": {
            "query_string": {
              "default_field": "all_text_fields",
              "query": "*__1SDV_20170812T010949_20170812T011016_006900_00C25E_B16*",
              "default_operator": "OR"
            }
          }
        }

    • i don't think date searches really changed much

      • Code Block
        {
          "query": {
            "query_string": {
              "query": "starttime: [2019-01-01 TO 2019-01-31]",
              "default_operator": "OR"
            }
          }
        }

    • can combine different fields as well

      • Code Block
        {
          "query": {
            "query_string": {
              "fields": ["all_text_fields", "all_date_fields"],
              "query": "[2019-01-01 TO 2019-01-31] AND *__1SDV_20190109T020750_20190109T020817_014411*",
              "default_operator": "OR"
            }
          }
        }
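Since _all is gone, these searches all go through the copy_to field all_text_fields, and exact substrings now need leading/trailing wildcards. A small helper can build the request body; `query_string_body` is an illustrative name, not an existing HySDS function:

```python
# Hypothetical builder for the ES 7.x query_string bodies shown above.
def query_string_body(term, fields=("all_text_fields",), operator="OR"):
    return {
        "query": {
            "query_string": {
                "fields": list(fields),
                # wildcard * at the beginning and end of the string
                "query": f"*{term}*",
                "default_operator": operator,
            }
        }
    }

body = query_string_body("__1SDV_20170812T010949")
```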


  • Removal of search_type=scan

    • https://www.elastic.co/guide/en/elasticsearch/reference/5.5/breaking_50_search_changes.html
    • NOTE: must clear _scroll_id after using the scroll API to pull data
      • Will return an error if the _scroll_id's are not cleared

      • Code Block
        "type": "query_phase_execution_exception", "reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [11000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."


    • Requires changes in our HySDS code, wherever it uses search_type=scan

      Code Block
      curl -X POST "http://localhost:9200/hysds_ios/_search?search_type=scan&scroll=10m&size=100"
      {
        "error": {
          "root_cause": [
            {
              "type": "illegal_argument_exception",
              "reason": "No search type for [scan]"
            }
          ],
          "type": "illegal_argument_exception",
          "reason": "No search type for [scan]"
        },
        "status": 400
      }
      
      # removing search_type=scan from the endpoint fixes this problem
      curl -X POST "http://100.64.134.55:9200/user_rules/_search?scroll=10m&size=100"
      {
        "_scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAAEWMUpVeFNzVXpTVktlUzFPc0NKa1dndw==",
        "took": 34,
        "timed_out": false,
        "_shards": {
          "total": 1,
          "successful": 1,
          "skipped": 0,
          "failed": 0
        },
        "hits": {
          "total": {
            "value": 0,
            "relation": "eq"
          },
          "max_score": null,
          "hits": []
        }
      }
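The scroll-and-clear pattern above can be sketched as a generator. This is an assumption-laden sketch, not HySDS code: `scroll_all` and the injected `call` function are hypothetical, standing in for whatever HTTP wrapper the code uses against the Mozart/GRQ ES host.

```python
def scroll_all(call, index, query, page_size=100, scroll="10m"):
    """Yield every hit from `index`, then clear the scroll context.

    `call(method, path, body)` is any callable that performs the HTTP
    request and returns the parsed JSON response.
    """
    page = call("POST", f"/{index}/_search?scroll={scroll}&size={page_size}", query)
    scroll_id = page["_scroll_id"]
    try:
        while page["hits"]["hits"]:
            yield from page["hits"]["hits"]
            page = call("POST", "/_search/scroll",
                        {"scroll": scroll, "scroll_id": scroll_id})
            scroll_id = page["_scroll_id"]
    finally:
        # Clear the scroll context so ES does not keep it open until timeout.
        call("DELETE", "/_search/scroll", {"scroll_id": [scroll_id]})
```

The `finally` block guarantees the DELETE fires even if the caller stops iterating early, which is the part ES 7 punishes you for forgetting.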


  • Removal of filter and filtered: Link and Guide

    • deprecated in version 5.x, move all logic to query and bool
    • and, or and not changed to must, should and must_not
      • if using should, will need to add minimum_should_match: 1
      • Link


    • Code Block
      # from this:
      {
        "filtered": {
          "filter": {
            "and": [
              {
                "match": {
                  "tags": "ISL"
                }
              },
              {
                "range": {
                  "metadata.ProductReceivedTime": {"gte": "2020-03-24T00:00:00.000000Z"}
                }
              },
              {
                "range": {
                  "metadata.ProductReceivedTime": {"lte": "2020-03-24T23:59:59.999999Z"}
                }
              }
            ]
          }
        }
      }
      
      # change to this:
      {
        "query": {
          "bool": {
            "must": [
              {
                "match": {
                  "tags": "ISL"
                }
              }
            ],
            "filter": [
              {
                "range": {
                  "metadata.ProductReceivedTime": {"gte": "2020-03-24T00:00:00.000000Z"}
                }
              },
              {
                "range": {
                  "metadata.ProductReceivedTime": {"lte": "2020-03-24T23:59:59.999999Z"}
                }
              }
            ]
          }
        }
      }
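The rewrite above is mechanical, so it can be expressed as a converter. This is a hypothetical sketch for the simple `filtered`/`and` shape shown above (match clauses to `must`, range clauses to `filter`), not the actual HySDS migration code:

```python
# Illustrative converter: ES 1.x "filtered" query with an "and" filter
# becomes an ES 7.x bool query.
def upgrade_filtered(old):
    clauses = old["filtered"]["filter"]["and"]
    must = [c for c in clauses if "match" in c]      # scoring clauses
    filters = [c for c in clauses if "range" in c]   # non-scoring clauses
    return {"query": {"bool": {"must": must, "filter": filters}}}

old = {"filtered": {"filter": {"and": [
    {"match": {"tags": "ISL"}},
    {"range": {"metadata.ProductReceivedTime": {"gte": "2020-03-24T00:00:00.000000Z"}}},
    {"range": {"metadata.ProductReceivedTime": {"lte": "2020-03-24T23:59:59.999999Z"}}},
]}}}
new = upgrade_filtered(old)
```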


Changes to Logstash

  • Mozart streams data to elasticsearch through Logstash
  • Changes to logstash 7.1

    • long timestamps are read as int instead of date_epoch_millis
      • need to convert them to strings, split on the decimal point, and drop the decimal places
    • removal of flush_size (was originally set to 1)
  • https://github.com/hysds/hysds/blob/develop-es7/configs/logstash/indexer.conf.mozart

  • Code Block
    input {
      redis {
        host => "{{ MOZART_REDIS_PVT_IP }}"
        {% if MOZART_REDIS_PASSWORD != "" %}password => "{{ MOZART_REDIS_PASSWORD }}"{% endif %}
        # these settings should match the output of the agent
        data_type => "list"
        key => "logstash"
    
        # We use the 'msgpack' codec here because we expect to read
        # msgpack events from redis.
        codec => msgpack
      }
    }
    
    filter {
      if [resource] in ["worker", "task"] {
        mutate {
          convert => {
            "[event][timestamp]" => "string"
            "[event][local_received]" => "string"
          }
    
          split => ["[event][timestamp]", "."]
          split => ["[event][local_received]", "."]
    
          add_field => [ "[event][timestamp_new]" , "%{[event][timestamp][0]}" ]
          add_field => [ "[event][local_received_new]" , "%{[event][local_received][0]}" ]
    
          remove_field => ["[event][timestamp]", "[event][local_received]"]
        }
    
        mutate {
          rename => { "[event][timestamp_new]" => "timestamp" }
          rename => { "[event][local_received_new]" => "local_received" }
        }
      }
    }
    
    output {
      #stdout { codec => rubydebug }
    
      if [resource] == "job" {
        elasticsearch {
          hosts => ["{{ MOZART_ES_PVT_IP }}:9200"]
          index => "job_status-current"
          document_id => "%{payload_id}"
          template => "{{ OPS_HOME }}/mozart/etc/job_status.template"
          template_name => "job_status"
        }
      } else if [resource] == "worker" {
        elasticsearch {
          hosts => ["{{ MOZART_ES_PVT_IP }}:9200"]
          index => "worker_status-current"
          document_id => "%{celery_hostname}"
          template => "{{ OPS_HOME }}/mozart/etc/worker_status.template"
          template_name => "worker_status"
        }
      } else if [resource] == "task" {
        elasticsearch {
          hosts => ["{{ MOZART_ES_PVT_IP }}:9200"]
          index => "task_status-current"
          document_id => "%{uuid}"
          template => "{{ OPS_HOME }}/mozart/etc/task_status.template"
          template_name => "task_status"
        }
      } else if [resource] == "event" {
        elasticsearch {
          hosts => ["{{ MOZART_ES_PVT_IP }}:9200"]
          index => "event_status-current"
          document_id => "%{uuid}"
          template => "{{ OPS_HOME }}/mozart/etc/event_status.template"
          template_name => "event_status"
        }
      } else {}
    }
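The mutate/split/rename dance in the filter block above amounts to dropping the fractional part of an epoch-seconds value so ES reads it as a date. In Python terms (a one-line equivalent, for clarity only):

```python
# Equivalent of the Logstash convert -> split -> take-first-piece steps:
# stringify the epoch value and drop everything after the decimal point.
def strip_decimal(epoch):
    return str(epoch).split(".", 1)[0]
```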


Running Elasticsearch 7 on EC2 instance

In order to bind Elasticsearch to all interfaces (0.0.0.0) properly, we need to edit the config/elasticsearch.yml file

Code Block
network.host: 0.0.0.0
cluster.name: grq_cluster
node.name: ESNODE_CYR
node.master: true
node.data: true
transport.host: localhost
transport.tcp.port: 9300
http.port: 9200
discovery.zen.minimum_master_nodes: 2

# allows UI to talk to elasticsearch (in production we would put the actual hostname of the UI)
http.cors.enabled : true
http.cors.allow-origin: "*"


Running Kibana on EC2 instance

Install Kibana from the command line

...

Code Block
server.host: 0.0.0.0


Index Template

Every index created will automatically follow this template for its mapping

...

Code Block
{
  "order": 0,
  "index_patterns": [
    "{{ prefix }}_*"
  ],
  "settings": {
    "index.refresh_interval": "5s",
    "analysis": {
      "analyzer": {
        "default": {
          "filter": [
            "standard",
            "lowercase",
            "word_delimiter"
          ],
          "tokenizer": "keyword"
        }
      }
    }
  },
  "mappings": {
    "dynamic_templates": [
      {
        "integers": {
          "match_mapping_type": "long",
          "mapping": {
            "type": "integer"
          }
        }
      },
      {
        "strings": {
          "match_mapping_type": "string",
          "mapping": {
            "norms": false,
            "type": "text",
            "copy_to": "all_text_fields",
            "fields": {
              "raw": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "match": "*"
        }
      }
    ],
    "properties": {
      "browse_urls": {
        "type": "text",
        "copy_to": "all_text_fields"
      },
      "urls": {
        "type": "text",
        "copy_to": "all_text_fields"
      },
      "location": {
        "tree": "quadtree",
        "type": "geo_shape"
      },
      "center": {
        "tree": "quadtree",
        "type": "geo_shape"
      },
      "starttime": {
        "type": "date"
      },
      "endtime": {
        "type": "date"
      },
      "creation_timestamp": {
        "type": "date"
      },
      "metadata": {
        "properties": {
          "context": {
            "type": "object",
            "enabled": false
          }
        }
      },
      "prov": {
        "properties": {
          "wasDerivedFrom": {
            "type": "keyword"
          },
          "wasGeneratedBy": {
            "type": "keyword"
          }
        }
      },
      "all_text_fields": {
        "type": "text"
      }
    }
  },
  "aliases": {
    "{{ alias }}": {}
  }
}
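The `{{ prefix }}` / `{{ alias }}` placeholders get filled in at deploy time; a quick way to sanity-check which index names the rendered template would apply to is a wildcard match. This is a sketch only — `render_template` is not a HySDS function, and `fnmatch` merely approximates ES index-pattern matching for simple `*` patterns:

```python
import fnmatch
import json

# Fill in the template placeholders, then check an index name against
# the rendered index_patterns entry.
def render_template(template_text, prefix, alias):
    return json.loads(
        template_text.replace("{{ prefix }}", prefix).replace("{{ alias }}", alias)
    )

template = json.dumps({"index_patterns": ["{{ prefix }}_*"], "aliases": {"{{ alias }}": {}}})
tmpl = render_template(template, "grq_v1.1", "grq")
```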


Percolator

Percolator needs to be compatible with ES 7.1 (not applicable because HySDS uses its own version of percolator)

...

  • mapping added in mozart server /home/ops/mozart/ops/tosca/configs/user_rules_dataset.mapping

  • python code to create the user_rules index: Link
  • Mapping template for user_rules index Link

  • Code Block
    # PUT user_rules
    {
      "mappings": {
        "properties": {
          "creation_time": {
            "type": "date"
          },
          "enabled": {
            "type": "boolean",
            "null_value": true
          },
          "job_type": {
            "type": "keyword"
          },
          "kwargs": {
            "type": "keyword"
          },
          "modification_time": {
            "type": "date"
          },
          "modified_time": {
            "type": "date"
          },
          "passthru_query": {
            "type": "boolean"
          },
          "priority": {
            "type": "long"
          },
          "query": {
            "type": "object",
            "enabled": false
          },
          "query_all": {
            "type": "boolean"
          },
          "query_string": {
            "type": "text"
          },
          "queue": {
            "type": "text"
          },
          "rule_name": {
            "type": "keyword"
          },
          "username": {
            "type": "keyword"
          },
          "workflow": {
            "type": "keyword"
          }
        }
      }
    }
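Since dynamic mapping is off the table for user_rules, it can help to check new rule documents against the declared fields before indexing. An illustrative check only — the field set is copied from the mapping above, but `unknown_fields` itself is not HySDS code:

```python
# Field names declared in the user_rules mapping above.
USER_RULES_FIELDS = {
    "creation_time", "enabled", "job_type", "kwargs", "modification_time",
    "modified_time", "passthru_query", "priority", "query", "query_all",
    "query_string", "queue", "rule_name", "username", "workflow",
}

# Return any top-level keys in a rule document that the mapping
# does not declare.
def unknown_fields(doc):
    return sorted(set(doc) - USER_RULES_FIELDS)

rule = {"rule_name": "ingest-slc", "job_type": "job-ingest", "enabled": True}
```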


hysds_ios Index

Github Link to template.json: Link

...