• Petabyte-scale analytics and reporting
  • a search engine that also supports analytics and reporting
  • often used alongside Kinesis for real-time big data
  • applications
    • full-text search
    • log analytics
    • application monitoring
    • security analytics
    • clickstream analytics
  • 3 entities
    • documents: text or structured JSON; every document has a unique ID and a type
    • types (deprecated): define the schema and mapping shared by documents
    • indices: the unit being searched, comparable to a database; an index contains all documents within a collection of types and is split into shards, each of which may live on a different node in the cluster
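
As a rough illustration of how documents in an index end up spread across shards, the sketch below assigns document IDs to shards with a hash. The document IDs, the shard count, and the CRC32 hash are assumptions made for the example; OpenSearch uses its own routing hash internally.

```python
import zlib


def shard_for(doc_id, num_primary_shards):
    """Pick a primary shard for a document ID (illustrative hash only)."""
    # Stand-in hash: OpenSearch uses its own routing hash, not CRC32.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


# Hypothetical documents: each has a unique ID and a JSON-like body.
documents = {
    "order-1001": {"customer": "alice", "total": 42.5},
    "order-1002": {"customer": "bob", "total": 17.0},
    "order-1003": {"customer": "carol", "total": 99.9},
}

NUM_SHARDS = 3  # assumed shard count for the example index

# Group document IDs by the shard they would land on.
placement = {shard: [] for shard in range(NUM_SHARDS)}
for doc_id in documents:
    placement[shard_for(doc_id, NUM_SHARDS)].append(doc_id)

for shard, ids in placement.items():
    print(f"shard {shard}: {ids}")
```

The same ID always maps to the same shard, which is why the number of primary shards is normally fixed when an index is created.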

Characteristics:

  • Fully-managed
  • scaling without downtime
  • pay for what you use
  • network isolation
  • AWS integration
    • S3 (via Lambda to Kinesis)
    • Kinesis Data Streams
    • DynamoDB Streams
    • CloudWatch / CloudTrail
    • zone awareness

Options:

  • dedicated master nodes (choice of count and instance types)
  • domain: a cluster together with all of its configuration
  • snapshots to S3
  • zone awareness

Security

  • network isolation
  • Resource-based policies
  • identity-based policies
  • IP-based policies
  • request signing
  • place the cluster in a VPC instead of exposing it publicly (harder to connect to; must be decided when the domain is created)
  • use Cognito to log in to the dashboard within a VPC via enterprise identity providers such as Microsoft Active Directory using SAML
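
To make the policy types above more concrete, here is a minimal sketch of a resource-based access policy with an IP condition, built as a Python dictionary. The account ID, region, domain name, and CIDR range are placeholders invented for the example.

```python
import json


def ip_restricted_policy(domain_arn, allowed_cidrs):
    """Build a resource-based policy that allows HTTP access from given IP ranges."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": "es:ESHttp*",
                # Apply to every sub-resource (indices, APIs) of the domain.
                "Resource": f"{domain_arn}/*",
                "Condition": {"IpAddress": {"aws:SourceIp": allowed_cidrs}},
            }
        ],
    }


# Placeholder ARN and documentation-only CIDR range (assumptions).
policy = ip_restricted_policy(
    "arn:aws:es:us-east-1:123456789012:domain/example-domain",
    ["192.0.2.0/24"],
)
print(json.dumps(policy, indent=2))
```

IP-based conditions like this apply to public domains; for a domain inside a VPC, access is restricted with security groups instead.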

Anti-patterns

  • OLTP
  • ad-hoc data querying (Athena is better)
  • remember OpenSearch is primarily for search and analytics

Storage:

  • Hot (“standard”): instance store or EBS volumes; fastest performance
  • UltraWarm: uses S3 plus caching; slower performance but much lower cost (requires dedicated master nodes)
  • Cold storage: uses S3; even cheaper (requires dedicated master nodes and is not compatible with T2/T3 instance types)
  • indices can be migrated between storage types

Index State Management

  • Automate index management policies
  • examples
    • delete old indices after a period of time
    • make indices read-only after a period of time for compliance purposes
    • move indices between storage types over time
    • reduce replica count over time
    • automate index snapshots
  • ISM policies are run every 30-48 minutes
  • can send notifications when done
  • index rollups
    • can roll up old data into summarized indices for time-series
    • saves storage costs
    • new index may have fewer fields, coarser time buckets
  • Index transforms
    • create a different view to analyze data differently
    • reshape data with pivot, stats, group…
    • grouping/ aggregations
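
A minimal sketch of what an ISM policy for some of the examples above might look like, written as a Python dictionary in the shape of the ISM policy JSON. The state names and the 30-day and 365-day thresholds are assumptions for illustration; the small validator only checks that every transition points at a state that exists.

```python
import json

# Hypothetical lifecycle: stay hot, become read-only with fewer replicas, then delete.
ism_policy = {
    "policy": {
        "description": "Example lifecycle: hot, then read-only, then delete",
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [],
                "transitions": [
                    {"state_name": "read_only", "conditions": {"min_index_age": "30d"}}
                ],
            },
            {
                "name": "read_only",
                "actions": [
                    {"read_only": {}},
                    {"replica_count": {"number_of_replicas": 1}},
                ],
                "transitions": [
                    {"state_name": "delete", "conditions": {"min_index_age": "365d"}}
                ],
            },
            {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
        ],
    }
}


def undefined_transition_targets(policy_doc):
    """Return transition targets (and default_state) that name no defined state."""
    body = policy_doc["policy"]
    names = {state["name"] for state in body["states"]}
    targets = {t["state_name"] for s in body["states"] for t in s["transitions"]}
    targets.add(body["default_state"])
    return sorted(targets - names)


print(undefined_transition_targets(ism_policy))
print(json.dumps(ism_policy, indent=2))
```

A document like this would typically be submitted to the domain's ISM policy API and then attached to indices, for example through an index pattern.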

Cross-cluster replication

  • replicates indices, mappings, and metadata across domains
  • ensures high availability in an outage
  • replicates data geographically for better latency
  • leader-follower pattern
  • requires fine-grained access control and node-to-node encryption
  • “Remote Reindex” allows copying indices from one cluster to another on demand
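
For the remote reindex case, the request body sent to the destination cluster's reindex API names a remote source host plus the source and destination indices. The sketch below only builds that body; the host and index names are hypothetical, and authentication and the actual HTTP call are left out.

```python
import json


def remote_reindex_body(remote_host, source_index, dest_index):
    """Build a reindex request body that pulls an index from a remote cluster."""
    return {
        "source": {
            "remote": {"host": remote_host},  # endpoint of the cluster to copy from
            "index": source_index,
        },
        "dest": {"index": dest_index},  # index created or filled on the local cluster
    }


# Hypothetical endpoint and index names.
body = remote_reindex_body(
    "https://remote-domain.example.com:443", "logs-2023", "logs-2023-copy"
)
print(json.dumps(body, indent=2))
```

Unlike cross-cluster replication, this is a one-off copy: later changes on the source index are not carried over unless the reindex is run again.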

Stability

  • 3 dedicated master nodes is best, to avoid “split brain” (the cluster doesn’t know which half is authoritative)
  • make sure you do not run out of disk space
  • choose a sensible number of shards; you may need to limit the number of shards per node
  • choose instance types
    • at least 3 nodes
    • mostly driven by storage requirements, since OpenSearch is storage-heavy
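
The disk-space and shard-count points above can be turned into a back-of-the-envelope estimate. The sketch below follows the general shape of AWS's sizing guidance; the overhead percentages and the 30 GiB target shard size are assumed defaults for the example and should be adjusted to the actual workload.

```python
import math


def min_storage_gib(source_gib, replicas=1, indexing_overhead=0.10,
                    os_reserved=0.05, service_overhead=0.20):
    """Estimate the total storage needed for the source data plus replicas."""
    return (source_gib * (1 + replicas) * (1 + indexing_overhead)
            / (1 - os_reserved) / (1 - service_overhead))


def primary_shards(source_gib, growth_gib=0.0, indexing_overhead=0.10,
                   target_shard_gib=30.0):
    """Estimate a primary shard count that keeps each shard near the target size."""
    indexed_gib = (source_gib + growth_gib) * (1 + indexing_overhead)
    return max(1, math.ceil(indexed_gib / target_shard_gib))


# Example: 100 GiB of source data with one replica (assumed figures).
print(f"storage needed: {min_storage_gib(100):.1f} GiB")
print(f"primary shards: {primary_shards(100)}")
```

Treat the result as a starting point only; monitoring disk usage and shard sizes on the real domain is still needed.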

Performance (JVMMemoryPressure error)

  • unbalanced shard allocation or too many shards puts pressure on JVM memory
  • fewer shards can yield better performance; reduce the count by deleting old or unused indices