Trinity Mode for Larger Scales
Note: A full breakout of the underlying components is documented in the SDD (see References). For the purposes of this discussion, we focus on the key components that drive the scaling of the total number of distributed workers.
Basic Deployment between Mozart resource manager and Verdi compute nodes
In a basic HySDS deployment, the Mozart resource manager handles the full load of the distributed Verdi compute nodes. Figure 1 shows this baseline deployment mode: with n workers, Mozart must support 3n persistent connections, because each compute node makes three connections back to the resource manager:
(redis) job status events - Verdi emits job state changes back to Mozart
(rabbitmq) job descriptors - Verdi gets the next job popped off the queue
(rabbitmq) control messages - Verdi gets control commands, such as a command to revoke a running job
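To make the three channels concrete, here is a minimal single-process sketch of one worker iteration. It assumes plain `queue.Queue` objects and a list in place of the real RabbitMQ and Redis connections; the class, function, and message field names (`WorkerChannels`, `run_one_job`, `job_id`, `action`) are hypothetical and are not the Verdi API, and the state strings are illustrative.

```python
import queue
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class WorkerChannels:
    """Hypothetical in-memory stand-ins for a worker's three connections."""
    # (rabbitmq) job descriptors: the worker pops its next job from here
    job_queue: queue.Queue = field(default_factory=queue.Queue)
    # (rabbitmq) control messages: e.g. a revoke command for a running job
    control_queue: queue.Queue = field(default_factory=queue.Queue)
    # (redis) job status events: state changes reported back toward Mozart
    status_events: List[Tuple[str, str]] = field(default_factory=list)


def run_one_job(ch: WorkerChannels) -> Optional[str]:
    """Pop one job, check for a revoke command, and emit its state changes.

    Returns the final state, or None if no job was queued.
    """
    try:
        job = ch.job_queue.get_nowait()
    except queue.Empty:
        return None
    job_id = job["job_id"]
    ch.status_events.append((job_id, "job-started"))
    # Drain pending control messages; in this sketch only a revoke that
    # targets the current job has any effect, and all others are dropped.
    revoked = False
    while True:
        try:
            msg = ch.control_queue.get_nowait()
        except queue.Empty:
            break
        if msg.get("action") == "revoke" and msg.get("job_id") == job_id:
            revoked = True
    # A real worker would execute the job here before reporting completion.
    state = "job-revoked" if revoked else "job-completed"
    ch.status_events.append((job_id, state))
    return state


if __name__ == "__main__":
    ch = WorkerChannels()
    ch.job_queue.put({"job_id": "job-1"})
    ch.control_queue.put({"action": "revoke", "job_id": "job-1"})
    print(run_one_job(ch))  # job-revoked
```

Each of the three stand-ins corresponds to one persistent connection per worker, which is where the 3n total comes from.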
When a job creates datasets to be ingested back into the GRQ datasets catalog, periodic calls are made at the end of that job to submit percolator jobs, which evaluate whether any production rules act on the dataset type just ingested.
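The percolator step inverts the usual search: rules are stored as queries, and each newly ingested dataset is tested against all of them. The sketch below assumes each rule is a hypothetical dict of required metadata values and uses simple equality matching in plain Python instead of an actual Elasticsearch/OpenSearch percolate query; the rule names and metadata fields are made up for illustration.

```python
from typing import Any, Dict, List

# Hypothetical production rules: each names an action and the metadata values
# a newly ingested dataset must carry for the rule to fire.
RULES: List[Dict[str, Any]] = [
    {"name": "generate-l1", "match": {"dataset_type": "L0"}},
    {"name": "qa-report", "match": {"dataset_type": "L1", "mission": "demo"}},
]


def matching_rules(dataset: Dict[str, Any],
                   rules: List[Dict[str, Any]] = RULES) -> List[str]:
    """Return the names of rules whose conditions all hold for the dataset."""
    return [
        rule["name"]
        for rule in rules
        if all(dataset.get(key) == value for key, value in rule["match"].items())
    ]


if __name__ == "__main__":
    print(matching_rules({"dataset_type": "L0", "mission": "demo"}))  # ['generate-l1']
```

In a deployment, each matched rule would translate into a new job submitted to the queue, which is why dataset ingest adds load on top of the worker connections themselves.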
Trinity Mode Deployment: move RabbitMQ and Elasticsearch out of the original Mozart
The next iteration breaks Mozart apart to enable further scaling. Moving RabbitMQ and Elasticsearch out into their own standalone services spreads the worker connections across more services. Figure 2 shows the rearranged network topology in this mode: n workers make 2n persistent connections to the RabbitMQ service alone, while the main Mozart component now needs only n connections for n workers (the Redis job status events).
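The connection arithmetic for the basic and Trinity Mode deployments can be checked with a small tally. This sketch assumes each worker opens exactly the three persistent connections listed for the basic deployment and ignores connections from anything other than workers; the channel and host labels are hypothetical.

```python
from collections import Counter
from typing import Dict

# The three persistent connections each Verdi worker opens.
WORKER_CHANNELS = ("redis_job_status", "rabbitmq_job_queue", "rabbitmq_control")

# Which host serves each channel in each deployment mode.
BASIC = {channel: "mozart" for channel in WORKER_CHANNELS}
TRINITY = {
    "redis_job_status": "mozart",      # Redis stays on Mozart in Trinity Mode
    "rabbitmq_job_queue": "rabbitmq",  # RabbitMQ is now a standalone service
    "rabbitmq_control": "rabbitmq",
}


def connections_per_host(n_workers: int, placement: Dict[str, str]) -> Dict[str, int]:
    """Count the persistent worker connections that land on each host."""
    per_worker = Counter(placement[channel] for channel in WORKER_CHANNELS)
    return {host: count * n_workers for host, count in per_worker.items()}


if __name__ == "__main__":
    print(connections_per_host(1000, BASIC))    # {'mozart': 3000}
    print(connections_per_host(1000, TRINITY))  # {'mozart': 1000, 'rabbitmq': 2000}
```

The total stays at 3n in both modes; what changes is that no single host carries all of it.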
In this approach, the Elasticsearch component can more easily be replaced with a managed service such as AWS OpenSearch, which is what the SWOT SDS PCM uses. Similarly, the RabbitMQ component can be upgraded to high availability (HA) mode or replaced with Amazon MQ.
Trinity+ Mode Deployment: move Redis, RabbitMQ, and Elasticsearch out of the original Mozart
A further iteration also moves Redis out into its own standalone service. This reduces Mozart to its core value-added components: task workers and Logstash for high-rate job management (Figure 3). As in Trinity Mode, the network topology is spread out, now with Redis as a standalone service as well. This allows Redis to be swapped out for managed service offerings such as Amazon ElastiCache or MemoryDB.
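One way to picture the difference between Trinity and Trinity+ Mode is as a set of per-service endpoints. In this sketch, `ServiceEndpoints` and all hostnames are hypothetical placeholders rather than actual HySDS configuration keys; a backing service counts as standalone whenever its endpoint differs from the Mozart host.

```python
from dataclasses import dataclass
from typing import List

BACKING_SERVICES = ("rabbitmq", "redis", "elasticsearch")


@dataclass(frozen=True)
class ServiceEndpoints:
    """Hypothetical endpoint settings for Mozart and its backing services."""
    mozart: str
    rabbitmq: str
    redis: str
    elasticsearch: str

    def standalone_services(self) -> List[str]:
        """Backing services whose endpoint is not the Mozart host."""
        return [s for s in BACKING_SERVICES if getattr(self, s) != self.mozart]


# Trinity Mode: Redis still runs on the Mozart host.
TRINITY = ServiceEndpoints(
    mozart="mozart.example.internal",
    rabbitmq="mq.example.internal",           # could be HA RabbitMQ or Amazon MQ
    redis="mozart.example.internal",
    elasticsearch="search.example.internal",  # could be AWS OpenSearch
)

# Trinity+ Mode: Redis also moves out, e.g. to ElastiCache or MemoryDB.
TRINITY_PLUS = ServiceEndpoints(
    mozart="mozart.example.internal",
    rabbitmq="mq.example.internal",
    redis="redis.example.internal",
    elasticsearch="search.example.internal",
)

if __name__ == "__main__":
    print(TRINITY.standalone_services())       # ['rabbitmq', 'elasticsearch']
    print(TRINITY_PLUS.standalone_services())  # ['rabbitmq', 'redis', 'elasticsearch']
```

Keeping each backing service behind its own endpoint is what lets any one of them be swapped for a managed offering without touching the others.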
References
Software Design Document (SDD), Environment View
Diagram Source (Google Slides)
RabbitMQ documentation, Connections: https://www.rabbitmq.com/connections.html
RabbitMQ documentation, Networking (open file handle limit): https://www.rabbitmq.com/networking.html#open-file-handle-limit