Amazon OpenSearch Service

Petabyte-scale analytics and reporting
a search engine that is also used for analytics and reporting
often used along with Kinesis for real-time big data
Applications:
- full-text search
- log analytics
- application monitoring
- security analytics
- clickstream analytics
3 entities:
- documents: text/ structured JSON; every document has a unique ID and a type
- types (deprecated): define the schema and mapping shared by documents
- indices: what is searched, like a database; an index contains all documents within a collection of types and is split into shards, each of which may be on a different node in a cluster
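To make the document/ shard relationship concrete, here is a minimal sketch of how a document ID could be mapped to a shard. The document, its ID, and the `pick_shard` helper are hypothetical, and CRC32 is only a stand-in for the hash the engine actually uses; the point is the "hash the routing key, modulo the primary shard count" idea.

```python
import json
import zlib

# Hypothetical document: a JSON body plus an ID that is unique within its index.
doc_id = "order-1001"
document = {"customer": "alice", "total": 42.5, "status": "shipped"}


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Map a routing key (the document ID by default) to a shard number.

    Assumption: CRC32 stands in for the real hash function; only the
    hash-then-modulo shape is meant to be illustrative.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


shard = pick_shard(doc_id, num_primary_shards=5)
print(json.dumps({"_id": doc_id, "_source": document, "shard": shard}))
```

Because the shard is derived from the primary shard count, that count is normally fixed when the index is created.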

Characteristics:

Fully-managed
scaling without downtime
pay for what you use
network isolation
AWS integration
- S3 via Lambda to Kinesis
- Kinesis Data Streams
- DynamoDB Streams
- CloudWatch/ CloudTrail
- zone awareness

Options:

dedicated master nodes (choice of count and instance types)
domains: a cluster with all configuration
snapshots to S3
zone awareness

Security

network isolation
resource-based policies
identity-based policies
IP-based policies
request signing
put the cluster in a VPC instead of exposing it publicly (harder to connect to; must be decided when the domain is created)
use Cognito to log in to the dashboard inside a VPC with enterprise identity providers such as Microsoft Active Directory via SAML
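As an illustration of a resource-based policy combined with an IP-based condition, the sketch below builds an access policy document as a Python dict. The account ID, region, domain name, and CIDR range are placeholders, and `ip_restricted_policy` is a hypothetical helper, not part of any AWS SDK.

```python
import json


def ip_restricted_policy(domain_arn: str, allowed_cidrs: list[str]) -> dict:
    """Build a domain access policy that allows HTTP calls only from given IP ranges."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": "es:ESHttp*",  # all HTTP methods against the domain
                "Resource": f"{domain_arn}/*",
                "Condition": {"IpAddress": {"aws:SourceIp": allowed_cidrs}},
            }
        ],
    }


# Placeholder ARN and documentation-only CIDR range.
policy = ip_restricted_policy(
    "arn:aws:es:us-east-1:123456789012:domain/example-domain",
    ["203.0.113.0/24"],
)
print(json.dumps(policy, indent=2))
```

IP-based conditions only make sense for publicly reachable domains; for a domain inside a VPC, access is usually controlled with security groups instead.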

Anti-patterns

OLTP
ad-hoc data querying (Athena is better)
remember OpenSearch is primarily for search and analytics

Storage:

hot (“standard”): instance stores/ EBS volumes, fastest performance
UltraWarm: uses S3 + caching, slower performance but much lower cost (requires dedicated master nodes)
cold storage: uses S3, even cheaper (requires dedicated master nodes; not compatible with T2/ T3 instance types)
indices can be migrated between storage types

Index State Management

automate index management with policies
examples:
- delete old indices after a period of time
- move indices to read-only after a period of time for compliance purposes
- move indices between storage types over time
- reduce replica count over time
- automate index snapshots
ISM policies run every 30-48 minutes
can send notifications when done
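A rough sketch of what such a policy could look like, written as a Python dict that would be serialized to JSON. The state names and age thresholds are made up for illustration; the action names (`warm_migration`, `delete`) and the `min_index_age` condition are believed to match the ISM policy format, but check them against the current documentation before use. `validate_transitions` is a hypothetical helper.

```python
import json

# Hypothetical lifecycle: hot for 7 days, then UltraWarm, then deleted at 90 days.
ism_policy = {
    "policy": {
        "description": "Example hot -> warm -> delete lifecycle",
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [],
                "transitions": [
                    {"state_name": "warm", "conditions": {"min_index_age": "7d"}}
                ],
            },
            {
                "name": "warm",
                "actions": [{"warm_migration": {}}],
                "transitions": [
                    {"state_name": "delete", "conditions": {"min_index_age": "90d"}}
                ],
            },
            {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
        ],
    }
}


def validate_transitions(policy: dict) -> bool:
    """Check that the default state and every transition target are defined."""
    body = policy["policy"]
    names = {state["name"] for state in body["states"]}
    targets = {t["state_name"] for s in body["states"] for t in s["transitions"]}
    return body["default_state"] in names and targets <= names


print(validate_transitions(ism_policy))
print(json.dumps(ism_policy, indent=2))
```

Since policies are only evaluated every 30-48 minutes, the age thresholds are lower bounds rather than exact times.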
index rollups
- can roll up old data into summarized indices for time-series
- saves storage costs
- new index may have fewer fields, coarser time buckets
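The rollup idea can be sketched in plain Python: fine-grained events are grouped into coarser time buckets and only a few summary fields are kept. This only illustrates the concept, not the rollup job API; the sample events, field names, and `roll_up_hourly` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw time-series events with more fields than the summary needs.
raw_events = [
    {"timestamp": "2024-01-01T10:05:00", "host": "web-1", "latency_ms": 100, "path": "/a"},
    {"timestamp": "2024-01-01T10:40:00", "host": "web-1", "latency_ms": 200, "path": "/b"},
    {"timestamp": "2024-01-01T11:15:00", "host": "web-1", "latency_ms": 300, "path": "/a"},
]


def roll_up_hourly(events):
    """Summarize events into one record per (hour, host), dropping other fields."""
    buckets = defaultdict(lambda: {"count": 0, "latency_sum": 0})
    for event in events:
        ts = datetime.fromisoformat(event["timestamp"])
        hour = ts.replace(minute=0, second=0, microsecond=0).isoformat()
        bucket = buckets[(hour, event["host"])]
        bucket["count"] += 1
        bucket["latency_sum"] += event["latency_ms"]
    return [
        {
            "hour": hour,
            "host": host,
            "count": b["count"],
            "avg_latency_ms": b["latency_sum"] / b["count"],
        }
        for (hour, host), b in sorted(buckets.items())
    ]


for row in roll_up_hourly(raw_events):
    print(row)
```

Three raw documents become two summary documents here, and the `path` field is gone, which is where the storage savings come from.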
Index transforms
- create a different view to analyze data differently
- reshape data with pivot, stats, group…
- grouping/ aggregations

Cross-cluster replication

replicate indices/ mappings/ metadata across domains
ensures high availability in an outage
replicate data geographically for better latency
leader-follower pattern
requires fine-grained access control and node-to-node encryption
“Remote Reindex” allows copying indices from one cluster to another on demand
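For the remote reindex option, the request body sent to the `_reindex` API on the destination cluster looks roughly like the dict below. The endpoint and index names are placeholders and `remote_reindex_body` is a hypothetical helper; treat this as a sketch of the request shape rather than a complete client.

```python
import json


def remote_reindex_body(remote_host: str, source_index: str, dest_index: str) -> dict:
    """Build a _reindex request body that pulls an index from a remote cluster."""
    return {
        "source": {
            "remote": {"host": remote_host},  # remote cluster endpoint, including port
            "index": source_index,
        },
        "dest": {"index": dest_index},
    }


# Placeholder endpoint and index names.
body = remote_reindex_body(
    "https://remote-domain.example.com:443", "logs-2023", "logs-2023-copy"
)
print(json.dumps(body, indent=2))
```

Unlike replication, this is a one-off copy: changes made to the source index afterwards are not carried over.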

Stability

3 dedicated master nodes is best; it avoids “split brain” (the cluster doesn’t know which half is authoritative)
make sure you don’t run out of disk space
choose a good number of shards; may need to limit the number of shards per node
choose the right instance types
- at least 3 nodes
- mostly about storage requirements, as OpenSearch is storage heavy
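A back-of-the-envelope sizing sketch for storage and shard count. The 1.45 multiplier, the 10% indexing overhead, and the 30 GiB target shard size are rule-of-thumb assumptions along the lines of AWS's published sizing guidance, not exact values; the function names are hypothetical.

```python
import math


def min_storage_gib(source_gib: float, replicas: int, overhead_factor: float = 1.45) -> float:
    """Estimate the minimum storage needed across the cluster.

    Assumption: overhead_factor bundles indexing overhead and reserved
    disk space into a single rule-of-thumb multiplier.
    """
    return source_gib * (1 + replicas) * overhead_factor


def approx_primary_shards(
    source_gib: float,
    growth_gib: float,
    target_shard_gib: float = 30.0,
    indexing_overhead: float = 0.10,
) -> int:
    """Estimate the primary shard count so each shard stays near the target size."""
    total = (source_gib + growth_gib) * (1 + indexing_overhead)
    return math.ceil(total / target_shard_gib)


# Hypothetical workload: 600 GiB today, 60 GiB of expected growth, 1 replica.
print(round(min_storage_gib(600, replicas=1)))
print(approx_primary_shards(600, 60))
```

Dividing the resulting shard count (primaries plus replicas) by the number of data nodes gives a quick check on shards per node.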

Performance (JVMMemoryPressure error)

unbalanced shard allocation/ too many shards puts pressure on memory
fewer shards can yield better performance; delete old/ unused indices to reduce the shard count

Stanley Chan's Note🧠
