Version: 1.7 → 7.1

...

  • PLEASE LOOK AT AND USE THE NEW ELASTICSEARCH UTILITY CLASS: (SOURCE CODE)

  • Only one mapping type is allowed per index: _doc
  • All-field text searches must now be enabled manually (the _all field is gone; see below)
  • The filtered query was removed in ES 5.0
  • The string type is split into text and keyword
  • fielddata: true in the mapping allows sorting on text fields (but we sort on the keyword sub-field instead): Documentation
  • The _default_ mapping was deprecated in ES 6.0.0 (Link)
  • Changes in the geo coordinates query

    • Note: {"type": "geo_shape", "tree": "quadtree", "tree_levels": 26} makes uploading documents slow; the culprit is specifically "tree_levels": 26


    • Code Block
      {
        "query": {
          "bool": {
            "filter": {
              "geo_shape": {
                "location": {
                  "shape": {
                    "type": "polygon",
                    "coordinates": [[<coordinates>]]
                  },
                  "relation": "within"
                }
              }
            }
          }
        }
      }
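    • A minimal Python sketch, assuming queries are built as plain dicts (as the elasticsearch-py client accepts); the helper name geo_within_polygon is hypothetical. It builds the query above from a list of [lon, lat] points and closes the ring, since GeoJSON polygons must start and end on the same point.

```python
def geo_within_polygon(field, ring):
    """Build a bool/filter geo_shape query for shapes lying within a polygon.

    `ring` is a list of [lon, lat] pairs (GeoJSON axis order). GeoJSON
    polygon rings must be closed, so the first point is appended at the
    end when the caller has not already done so.
    """
    points = [list(p) for p in ring]
    if points[0] != points[-1]:
        points.append(list(points[0]))  # close the linear ring
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_shape": {
                        field: {
                            "shape": {"type": "polygon", "coordinates": [points]},
                            "relation": "within",
                        }
                    }
                }
            }
        }
    }
```

      The returned dict can be passed as the body of a search request, e.g. es.search(index=..., body=geo_within_polygon("location", ring)) with an elasticsearch-py 7.x client. Using filter rather than must keeps the geo clause out of scoring.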


  • Changes in Percolator
  • Removal of the _all field (_all: { "enabled": true }), so we can no longer search across all fields by default

    • The workaround is to add copy_to to the field mappings, especially in dynamic templates

    • Link to copy_to documentation

    • copy_to does not work inside multi-fields

      • Code Block
        "random_field_name": {
          "type": "keyword",
          "ignore_above": 256,
          "copy_to": "all_text_fields", # DOES WORK
          "fields": {
            "keyword": {
              "type": "text",
              "copy_to": "all_text_fields" # DOES NOT WORK
            }
          }
        }


    • Proper mapping with text fields


      • Code Block
        "random_field_name": {
          "type": "text",
          "copy_to": "all_text_fields",
          "fields": {
            "keyword": { # WE USE 'raw' instead of 'keyword' in our own indices
              "type": "keyword", # THIS IS NEEDED FOR AGGREGATION ON THE FACETS FOR THE UI
              "ignore_above": 256
            }
          }
        }


    • The copy_to target field also needs its own mapping

      • Code Block
        "all_text_fields": {
          "type": "text"
        }
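    • A minimal Python sketch of the same pattern, generating the mapping for a list of string fields; the helper names text_field_with_copy_to and build_mapping are hypothetical. copy_to is placed only on the top-level text field, since it does not work inside multi-fields, and the catch-all target gets its own text mapping.

```python
def text_field_with_copy_to(copy_to="all_text_fields", raw_name="raw", ignore_above=256):
    """Mapping for one string field: analyzed `text` copied into the
    catch-all field, plus a `keyword` sub-field for sorting and facets.

    copy_to sits on the top-level field only, not on the multi-field.
    """
    return {
        "type": "text",
        "copy_to": copy_to,
        "fields": {raw_name: {"type": "keyword", "ignore_above": ignore_above}},
    }


def build_mapping(string_fields, copy_to="all_text_fields"):
    """Build an ES 7 `mappings` body (no mapping type name) for the fields."""
    properties = {name: text_field_with_copy_to(copy_to) for name in string_fields}
    # The copy_to target needs its own text mapping.
    properties[copy_to] = {"type": "text"}
    return {"mappings": {"properties": properties}}
```

      The sub-field is named raw here to match our own indices; pass raw_name="keyword" to reproduce the generic example above.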


  • General changes to the mapping
    • Created an example mapping called grq_v1.1_s1-iw_slc

    • Copied example data into a new ES index, using the built-in dynamic mapping to build the initial mapping

    • Mapping changes:

      • Set metadata.context to {"type": "object", "enabled": false}

      • Set properties.location to {"type": "geo_shape", "tree": "quadtree"}

      • Use type keyword to be able to use msearch; on text fields the request fails with:


        • Code Block
          "reason": "Fielddata is disabled on text fields by default. ... Alternatively use a keyword field instead."
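    • A minimal Python sketch of applying the mapping changes above to a dynamically generated mapping, assuming the mapping has already been fetched as a dict; the helper name apply_mapping_changes and the keyword_fields parameter are hypothetical.

```python
import copy


def apply_mapping_changes(dynamic_mapping, keyword_fields=()):
    """Return a copy of a dynamic mapping with the manual changes applied."""
    mapping = copy.deepcopy(dynamic_mapping)  # leave the input untouched
    props = mapping["mappings"]["properties"]

    # Keep metadata.context in _source, but do not index its contents.
    metadata = props.setdefault("metadata", {"properties": {}})
    metadata.setdefault("properties", {})["context"] = {"type": "object", "enabled": False}

    # Index the footprint as a geo_shape so geo_shape queries work.
    props["location"] = {"type": "geo_shape", "tree": "quadtree"}

    # Fields used for sorting/aggregations become keyword instead of text.
    for name in keyword_fields:
        props[name] = {"type": "keyword"}
    return mapping
```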


  • Changes to query_string
    • Escaped literal double quotes are no longer used in query_string
    • The old query_string from 1.7 would return S1B_IW_SLC__1SDV_20170812T010949_20170812T011016_006900_00C25E_B16D:

      • Code Block
        {
          "query": {
            "query_string": {
              "query": "\"__1SDV_20170812T010949_20170812T011016_006900_00C25E_B16\"",
              "default_operator": "OR"
            }
          }
        }


    • The new query_string returns the equivalent document, but requires a wildcard * at the beginning and end of the string

      • Code Block
        {
          "query": {
            "query_string": {
              "default_field": "all_text_fields",
              "query": "*__1SDV_20170812T010949_20170812T011016_006900_00C25E_B16*",
              "default_operator": "OR"
            }
          }
        }


    • I don't think date searches really changed much

      • Code Block
        {
          "query": {
            "query_string": {
              "query": "starttime: [2019-01-01 TO 2019-01-31]",
              "default_operator": "OR"
            }
          }
        }


    • Different fields can be combined as well

      • Code Block
        {
          "query": {
            "query_string": {
              "fields": ["all_text_fields", "all_date_fields"],
              "query": "[2019-01-01 TO 2019-01-31] AND *__1SDV_20190109T020750_20190109T020817_014411*",
              "default_operator": "OR"
            }
          }
        }
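    • A minimal Python sketch that builds the wildcard query_string bodies shown above; the helper name substring_query is hypothetical. A single field goes into default_field, several fields into fields. Leading wildcards force a scan of the term dictionary, so these queries can be slow on large indices.

```python
def substring_query(term, fields=("all_text_fields",), default_operator="OR"):
    """query_string body matching `term` anywhere in the given fields by
    wrapping it in leading/trailing wildcards (no escaped double quotes)."""
    body = {"query": "*%s*" % term, "default_operator": default_operator}
    if len(fields) == 1:
        body["default_field"] = fields[0]
    else:
        body["fields"] = list(fields)
    return {"query": {"query_string": body}}
```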


...

  • Removal of search_type=scan
    • https://www.elastic.co/guide/en/elasticsearch/reference/5.5/breaking_50_search_changes.html
    • Requires changes in our HySDS code, wherever it uses search_type=scan

    • Code Block
      curl -X POST "http://localhost:9200/hysds_ios/_search?search_type=scan&scroll=10m&size=100"
      {
        "error": {
          "root_cause": [
            {
              "type": "illegal_argument_exception",
              "reason": "No search type for [scan]"
            }
          ],
          "type": "illegal_argument_exception",
          "reason": "No search type for [scan]"
        },
        "status": 400
      }
      
      # removing search_type=scan from the URL fixes the error
      curl -X POST "http://100.64.134.55:9200/user_rules/_search?scroll=10m&size=100"
      {
        "_scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAAEWMUpVeFNzVXpTVktlUzFPc0NKa1dndw==",
        "took": 34,
        "timed_out": false,
        "_shards": {
          "total": 1,
          "successful": 1,
          "skipped": 0,
          "failed": 0
        },
        "hits": {
          "total": {
            "value": 0,
            "relation": "eq"
          },
          "max_score": null,
          "hits": []
        }
      }
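    • A minimal Python sketch of the scroll loop that replaces scan, assuming an elasticsearch-py 7.x client (es.search, es.scroll, es.clear_scroll); the helper name scroll_all is hypothetical. Unlike scan, the first search response already contains the first page of hits, so those must be consumed before calling scroll.

```python
def scroll_all(es, index, query=None, size=100, scroll="10m"):
    """Yield every hit in `index` using the scroll API (no search_type=scan)."""
    body = query or {"query": {"match_all": {}}}
    resp = es.search(index=index, body=body, scroll=scroll, size=size)
    scroll_id = resp.get("_scroll_id")
    try:
        # The initial response carries hits; keep going until a page is empty.
        while resp["hits"]["hits"]:
            for hit in resp["hits"]["hits"]:
                yield hit
            resp = es.scroll(scroll_id=scroll_id, scroll=scroll)
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Free the server-side scroll context once we are done.
        if scroll_id:
            es.clear_scroll(scroll_id=scroll_id)
```

      elasticsearch-py also ships elasticsearch.helpers.scan, which wraps the same loop and may be simpler wherever HySDS code used search_type=scan.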


  • Removal of filter and filtered: Link and Guide
    • Deprecated in 2.x and removed in 5.0; move all logic into query and bool
    • and, or, and not changed to must, should, and must_not
      • If using should, add minimum_should_match: 1 (otherwise, when combined with must or filter, the should clauses become optional)
      • Link


    • Code Block
      # from this:
      {
        "filtered": {
          "filter": {
            "and": [
              {
                "match": {
                  "tags": "ISL"
                }
              },
              {
                "range": {
                  "metadata.ProductReceivedTime": {"gte": "2020-03-24T00:00:00.000000Z"}
                }
              },
              {
                "range": {
                  "metadata.ProductReceivedTime": {"lte": "2020-03-24T23:59:59.999999Z"}
                }
              }
            ]
          }
        }
      }
      
      # change to this:
      {
        "query": {
          "bool": {
            "must": [
              {
                "match": {
                  "tags": "ISL"
                }
              }
            ],
            "filter": [
              {
                "range": {
                  "metadata.ProductReceivedTime": {"gte": "2020-03-24T00:00:00.000000Z"}
                }
              },
              {
                "range": {
                  "metadata.ProductReceivedTime": {"lte": "2020-03-24T23:59:59.999999Z"}
                }
              }
            ]
          }
        }
      }
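    • A minimal Python sketch of the rewrite above as a recursive converter; the helper name convert_legacy_filter is hypothetical, and it only handles the simple list forms ({"and": [...]}, {"or": [...]}, {"not": clause}, {"filtered": {...}}). Everything under the old filter stays in non-scoring filter context; scoring clauses such as the match on tags can then be moved into must by hand, as in the example.

```python
def convert_legacy_filter(node):
    """Rewrite ES 1.x and/or/not/filtered clauses into ES 7 bool queries.

    Leaf clauses (match, range, term, ...) are returned unchanged.
    """
    if not isinstance(node, dict) or len(node) != 1:
        return node
    key, value = next(iter(node.items()))
    if key == "and":
        return {"bool": {"filter": [convert_legacy_filter(c) for c in value]}}
    if key == "or":
        return {"bool": {"should": [convert_legacy_filter(c) for c in value],
                         "minimum_should_match": 1}}
    if key == "not":
        return {"bool": {"must_not": [convert_legacy_filter(value)]}}
    if key == "filtered":
        bool_query = {}
        if "query" in value:
            bool_query["must"] = [value["query"]]
        if "filter" in value:
            bool_query["filter"] = [convert_legacy_filter(value["filter"])]
        return {"bool": bool_query}
    return node
```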


Changes to Logstash

  • Mozart streams data to Elasticsearch through Logstash
  • Changes for Logstash 7.1:

    • Timestamps (longs) are read as integers instead of as epoch_millis dates
      • Need to convert them to strings, split on the decimal point, and drop the decimal places
    • flush_size was removed (it was originally set to 1)
  • https://github.com/hysds/hysds/blob/develop-es7/configs/logstash/indexer.conf.mozart

  • Code Block
    input {
      redis {
        host => "{{ MOZART_REDIS_PVT_IP }}"
        {% if MOZART_REDIS_PASSWORD != "" %}password => "{{ MOZART_REDIS_PASSWORD }}"{% endif %}
        # these settings should match the output of the agent
        data_type => "list"
        key => "logstash"
    
        # We use the 'msgpack' codec here because we expect to read
        # msgpack events from redis.
        codec => msgpack
      }
    }
    
    filter {
      if [resource] in ["worker", "task"] {
        mutate {
          convert => {
            "[event][timestamp]" => "string"
            "[event][local_received]" => "string"
          }
    
          split => ["[event][timestamp]", "."]
          split => ["[event][local_received]", "."]
    
          add_field => [ "[event][timestamp_new]" , "%{[event][timestamp][0]}" ]
          add_field => [ "[event][local_received_new]" , "%{[event][local_received][0]}" ]
    
          remove_field => ["[event][timestamp]", "[event][local_received]"]
        }
    
        mutate {
          rename => { "[event][timestamp_new]" => "timestamp" }
          rename => { "[event][local_received_new]" => "local_received" }
        }
      }
    }
    
    output {
      #stdout { codec => rubydebug }
    
      if [resource] == "job" {
        elasticsearch {
          hosts => ["{{ MOZART_ES_PVT_IP }}:9200"]
          index => "job_status-current"
          document_id => "%{payload_id}"
          template => "{{ OPS_HOME }}/mozart/etc/job_status.template"
          template_name => "job_status"
        }
      } else if [resource] == "worker" {
        elasticsearch {
          hosts => ["{{ MOZART_ES_PVT_IP }}:9200"]
          index => "worker_status-current"
          document_id => "%{celery_hostname}"
          template => "{{ OPS_HOME }}/mozart/etc/worker_status.template"
          template_name => "worker_status"
        }
      } else if [resource] == "task" {
        elasticsearch {
          hosts => ["{{ MOZART_ES_PVT_IP }}:9200"]
          index => "task_status-current"
          document_id => "%{uuid}"
          template => "{{ OPS_HOME }}/mozart/etc/task_status.template"
          template_name => "task_status"
        }
      } else if [resource] == "event" {
        elasticsearch {
          hosts => ["{{ MOZART_ES_PVT_IP }}:9200"]
          index => "event_status-current"
          document_id => "%{uuid}"
          template => "{{ OPS_HOME }}/mozart/etc/event_status.template"
          template_name => "event_status"
        }
      } else {}
    }
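  • A Python mirror of what the two mutate filters above do to a worker or task event, handy for checking what ends up in Elasticsearch; it is a sketch only, not part of the pipeline, and the helper name normalize_event_times is hypothetical. It assumes the msgpack-decoded event is a dict with numeric [event][timestamp] and [event][local_received] values.

```python
def normalize_event_times(doc):
    """Drop the fractional part of [event][timestamp] and
    [event][local_received] and move them to top-level fields,
    as the Logstash filter does for worker and task events."""
    if doc.get("resource") not in ("worker", "task"):
        return doc
    event = doc.get("event", {})
    for field in ("timestamp", "local_received"):
        if field in event:
            value = event.pop(field)
            # convert to string, split on ".", keep the integer part
            doc[field] = str(value).split(".")[0]
    return doc
```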


Running Elasticsearch 7 on an EC2 instance

...

Code Block
server.host: 0.0.0.0


Index Template

An index template is used so that every newly created index automatically follows the template's mapping

...

The percolator needs to be compatible with ES 7.1 (not applicable, because HySDS uses its own version of the percolator)

User Rules (Documentation for user rules triggering)

  • Mapping added on the Mozart server at /home/ops/mozart/ops/tosca/configs/user_rules_dataset.mapping

  • Python code to create the user_rules index: Link
  • Mapping template for user_rules index Link

  • Code Block
    # PUT user_rules
    {
      "mappings": {
        "properties": {
          "creation_time": {
            "type": "date"
          },
          "enabled": {
            "type": "boolean",
            "null_value": true
          },
          "job_type": {
            "type": "keyword"
          },
          "kwargs": {
            "type": "keyword"
          },
          "modification_time": {
            "type": "date"
          },
          "modified_time": {
            "type": "date"
          },
          "passthru_query": {
            "type": "boolean"
          },
          "priority": {
            "type": "long"
          },
          "query": {
            "type": "object",
            "enabled": false
          },
          "query_all": {
            "type": "boolean"
          },
          "query_string": {
            "type": "text"
          },
          "queue": {
            "type": "text"
          },
          "rule_name": {
            "type": "keyword"
          },
          "username": {
            "type": "keyword"
          },
          "workflow": {
            "type": "keyword"
          }
        }
      }
    }
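  • A minimal Python sketch of creating the index idempotently, assuming an elasticsearch-py 7.x client (es.indices.exists, es.indices.create); the helper name ensure_index is hypothetical and this is not the linked HySDS code. Because query is mapped as an object with "enabled": false, each rule's query is kept in _source but not indexed.

```python
def ensure_index(es, index, mappings):
    """Create `index` with the given mappings unless it already exists.

    Returns True if the index was created, False if it was already there.
    """
    if es.indices.exists(index=index):
        return False
    es.indices.create(index=index, body={"mappings": mappings})
    return True
```

  Usage: ensure_index(es, "user_rules", mapping["mappings"]["properties"] and friends) is not needed; pass the "mappings" value from the JSON above directly, e.g. ensure_index(es, "user_rules", user_rules_body["mappings"]).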


hysds_ios Index

GitHub link to template.json: Link

  • Python code to create hysds_ios index template: Link
  • Follow HySDS and Job-Spec documentation for Jenkins build Link

  • Code Block
    {
      "order": 0,
      "template": "{{ index }}",
      "settings": {
        "index.refresh_interval": "5s",
        "analysis": {
          "analyzer": {
            "default": {
              "filter": [
                "standard",
                "lowercase",
                "word_delimiter"
              ],
              "tokenizer": "keyword"
            }
          }
        }
      },
      "mappings": {
        "dynamic_templates": [
          {
            "integers": {
              "match_mapping_type": "long",
              "mapping": {
                "type": "integer"
              }
            }
          },
          {
            "strings": {
              "match_mapping_type": "string",
              "mapping": {
                "norms": false,
                "type": "text",
                "copy_to": "all_text_fields",
                "fields": {
                  "raw": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "match": "*"
            }
          }
        ],
        "properties": {
          "_timestamp": {
            "type": "date",
            "store": true
          }
        }
      }
    }
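  • A minimal Python sketch of rendering and uploading the template, assuming an elasticsearch-py 7.x client (es.indices.put_template) and a plain string substitution for the {{ index }} placeholder; the helper names render_template and install_template are hypothetical, and the linked HySDS code may render the template differently (e.g. with Jinja2).

```python
import json


def render_template(template_text, index):
    """Substitute the {{ index }} placeholder and parse the JSON body."""
    return json.loads(template_text.replace("{{ index }}", index))


def install_template(es, name, template_text, index_pattern):
    """PUT the rendered index template and return the body that was sent."""
    body = render_template(template_text, index_pattern)
    es.indices.put_template(name=name, body=body)
    return body
```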


...