AWS CloudWatch
- public service - usable from AWS/ on-premises
- collects and manages operational data
- AWS integration - EC2, VPC, Lambda, CloudTrail, Route 53…
- CloudWatch agent is needed to collect logs (and OS-level metrics such as RAM usage) from inside EC2
- can generate metrics based on logs
3 Functions in one
- gathering/ collecting metrics from AWS products, apps, on-premises
- Logs from AWS products, apps, on-premises
- Events - services (Auto Scaling/ SNS) & schedules (CloudWatch Events is now Amazon EventBridge)
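A minimal sketch of the "events" function, assuming boto3's EventBridge (`events`) client and its PutRule/PutTargets parameter names. The rule names and target ARN are hypothetical placeholders, and nothing is sent to AWS: each function only builds the keyword arguments.

```python
import json


def build_schedule_rule(name: str, expression: str) -> dict:
    """Keyword arguments for EventBridge PutRule driven by a schedule."""
    # Schedules are written as rate(...) or cron(...) expressions.
    if not expression.startswith(("rate(", "cron(")):
        raise ValueError("schedule must be a rate(...) or cron(...) expression")
    return {"Name": name, "ScheduleExpression": expression, "State": "ENABLED"}


def build_service_event_rule(name: str, source: str) -> dict:
    """Keyword arguments for PutRule matching events emitted by one AWS service."""
    # EventPattern is passed as a JSON string, not as a dict.
    pattern = json.dumps({"source": [source]})
    return {"Name": name, "EventPattern": pattern, "State": "ENABLED"}


def build_targets(rule_name: str, target_arn: str) -> dict:
    """Keyword arguments for PutTargets: attach one target (e.g. an SNS topic)."""
    return {"Rule": rule_name, "Targets": [{"Id": "1", "Arn": target_arn}]}
```

With credentials configured, `events = boto3.client("events")` followed by `events.put_rule(**rule)` and `events.put_targets(**build_targets(...))` would create the rule; the target also needs permission to be invoked (e.g. an SNS topic policy).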
Concepts
Namespace
- as a container for monitoring data
- format: AWS/<service> (e.g. AWS/EC2)
Dimension
- name/value pairs that separate the data of a metric by perspective (e.g. per server/ instance)
- e.g. InstanceId, InstanceType
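To make namespace and dimensions concrete, here is a minimal sketch of one custom data point in the shape boto3's CloudWatch `put_metric_data` expects (parameter names assumed from the PutMetricData API). The namespace `MyApp/Hosts` and the instance ID are hypothetical; the `AWS/` check reflects that this prefix belongs to AWS service namespaces, and the 30-dimension check mirrors the per-metric limit.

```python
from datetime import datetime, timezone

MAX_DIMENSIONS = 30  # per-metric dimension limit


def build_put_metric_data(namespace, metric_name, value, unit, dimensions, timestamp=None):
    """Keyword arguments for PutMetricData carrying a single data point.

    `dimensions` maps dimension name -> value, e.g. {"InstanceId": "i-..."}.
    """
    if namespace.startswith("AWS/"):
        # AWS/ is used by AWS service namespaces; keep custom metrics elsewhere.
        raise ValueError("custom namespaces should not start with 'AWS/'")
    if len(dimensions) > MAX_DIMENSIONS:
        raise ValueError(f"at most {MAX_DIMENSIONS} dimensions per metric")
    return {
        "Namespace": namespace,
        "MetricData": [
            {
                "MetricName": metric_name,
                "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
                # Timezone-aware timestamp; defaults to "now" in UTC.
                "Timestamp": timestamp or datetime.now(timezone.utc),
                "Value": value,
                "Unit": unit,
            }
        ],
    }
```

Sending it would be `boto3.client("cloudwatch").put_metric_data(**req)`; each distinct combination of dimension values shows up as its own metric in the console.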
Metric
- AWS services publish metrics; each metric belongs to a namespace
- data points have timestamps
- up to 30 dimensions per metric
- can create custom metrics (e.g. RAM usage collected by the unified agent on EC2 instances/ on-premises servers)
- metrics streams:
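- near-real-time delivery to Kinesis Data Firehose
- through Firehose on to S3, other Amazon destinations (e.g. Redshift, OpenSearch), or 3rd-party services (e.g. Datadog, Splunk)
- option to filter metrics to stream only a subset of them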
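A minimal sketch of a filtered metric stream, assuming boto3's CloudWatch `put_metric_stream` parameter names. The stream name, Firehose delivery stream ARN, and IAM role ARN are hypothetical. Leaving out both include and exclude filters streams every metric, so the builder only adds `IncludeFilters` when namespaces are given.

```python
def build_metric_stream(name, firehose_arn, role_arn, namespaces, output_format="json"):
    """Keyword arguments for CloudWatch PutMetricStream.

    If `namespaces` is empty, no IncludeFilters key is sent and all metrics
    in the account/region are streamed.
    """
    req = {
        "Name": name,
        "FirehoseArn": firehose_arn,  # delivery stream that receives the metrics
        "RoleArn": role_arn,          # role CloudWatch assumes to write to Firehose
        "OutputFormat": output_format,
    }
    if namespaces:
        # Stream only the listed namespaces (the "subset" option).
        req["IncludeFilters"] = [{"Namespace": ns} for ns in namespaces]
    return req
```

Usage would be `boto3.client("cloudwatch").put_metric_stream(**req)`.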
Logs
- store application logs
- log groups: arbitrary name, usually representing an application
- log stream: instances within application/ log files/ containers
- log expiration policies: can be defined as never expire, 1d to 10y…
- can send logs to S3, Kinesis, Lambda
- logs are encrypted at rest by default (a KMS key can optionally be used)
- metric filters to create metrics from logs
- sources
- SDK, CloudWatch Logs agent, unified agent
- Elastic Beanstalk: logs from the application
- ECS: from containers
- AWS Lambda: function logs
- VPC Flow Logs
- API Gateway
- CloudTrail
- Route 53: DNS query logs
- Query and analyze logs with CloudWatch Logs Insights
- filter, aggregate, sort, and limit results
- Export
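- can export to S3 via the CreateExportTask API (not real time; log data can take up to 12 hours to become available for export)
- can stream to Kinesis Data Streams/ Firehose/ Lambda via subscription filters (real time); delivery to other accounts is possible with cross-account subscriptions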
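A minimal sketch of two common log-group settings, assuming boto3's CloudWatch Logs `put_retention_policy` and `put_metric_filter` parameter names. The log group, namespace, and metric names are hypothetical. The retention set is the list of accepted values as given in the API reference at the time of writing (check the current reference); a log group without a policy never expires.

```python
# Accepted retentionInDays values (days), per the API reference at time of writing.
ALLOWED_RETENTION_DAYS = {
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
    731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
}


def build_retention_policy(log_group, days):
    """Keyword arguments for PutRetentionPolicy (3653 days is roughly 10 years)."""
    if days not in ALLOWED_RETENTION_DAYS:
        raise ValueError(f"{days} is not an accepted retention period")
    return {"logGroupName": log_group, "retentionInDays": days}


def build_error_count_filter(log_group, namespace, metric_name, term="ERROR"):
    """Keyword arguments for PutMetricFilter that count events containing `term`.

    Each matching log event adds 1 to the metric; periods with no match report 0.
    """
    return {
        "logGroupName": log_group,
        "filterName": f"{metric_name}-filter",
        "filterPattern": term,  # a bare term matches events containing it
        "metricTransformations": [
            {
                "metricName": metric_name,
                "metricNamespace": namespace,
                "metricValue": "1",
                "defaultValue": 0.0,
            }
        ],
    }
```

The resulting metric (here `MyApp/ErrorCount`) can then back an alarm like any other metric.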
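A sketch of a Logs Insights query that filters, aggregates, sorts, and limits in one go, plus a helper that flattens the result rows. It assumes boto3's `start_query`/`get_query_results` parameter and response shapes; the log group name is hypothetical. Note the unit difference: StartQuery takes epoch seconds, while CreateExportTask takes epoch milliseconds.

```python
import time

# filter -> aggregate into 5-minute bins -> sort -> limit
ERRORS_PER_5_MIN = (
    "fields @timestamp, @message"
    " | filter @message like /ERROR/"
    " | stats count(*) as errors by bin(5m)"
    " | sort errors desc"
    " | limit 10"
)


def build_start_query(log_group, query, window_seconds, now=None):
    """Keyword arguments for StartQuery covering the last `window_seconds` (epoch seconds)."""
    end = int(now if now is not None else time.time())
    return {
        "logGroupName": log_group,
        "startTime": end - window_seconds,
        "endTime": end,
        "queryString": query,
    }


def rows_to_dicts(response):
    """Turn a GetQueryResults response into one dict per row.

    Returns [] while the query is not yet Complete (e.g. Scheduled or Running).
    """
    if response["status"] != "Complete":
        return []
    return [{cell["field"]: cell["value"] for cell in row} for row in response["results"]]
```

In practice `logs.start_query(**q)` returns a `queryId`, and `logs.get_query_results(queryId=...)` is polled until the status is `Complete`.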
Alarm
- watch a metric and switch the alarm state when a threshold condition is met; a state change can trigger a designated action
- States: OK, INSUFFICIENT_DATA, ALARM
- Period: length of time (in seconds) over which the metric is evaluated
- can be based on metrics created by CloudWatch Logs metric filters
- can manually set the alarm state from the CLI (set-alarm-state), e.g. to test the actions
- Targets:
- Stop, Terminate, Reboot, or Recover an EC2 Instance
- Trigger Auto Scaling Action
- Send notification to SNS (from which you can do pretty much anything)
- Composite Alarms
- monitor the states of multiple other alarms (combined with AND/OR conditions)
- reduce “alarm noise” by creating complex composite alarms
- EC2 instance recovery
- Status check:
- instance status
- system status (hardware)
- attached EBS status
- Recovery keeps the same private, public, and Elastic IP addresses, metadata, and placement group
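A minimal sketch of the EC2 recovery alarm described above, assuming boto3's `put_metric_alarm` and `set_alarm_state` parameter names and the built-in EC2 action ARN format `arn:aws:automate:<region>:ec2:recover` (stop, terminate, and reboot follow the same pattern). The instance ID and region are hypothetical.

```python
def build_ec2_recover_alarm(instance_id, region):
    """Keyword arguments for PutMetricAlarm: recover the instance when its system status check fails."""
    return {
        "AlarmName": f"recover-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,            # seconds per evaluation period
        "EvaluationPeriods": 2,  # 2 consecutive failing periods -> ALARM
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def build_test_state(alarm_name):
    """Keyword arguments for SetAlarmState: force ALARM to test the configured actions."""
    return {"AlarmName": alarm_name, "StateValue": "ALARM", "StateReason": "manual test"}
```

The CLI equivalent of the second call is `aws cloudwatch set-alarm-state --alarm-name <name> --state-value ALARM --state-reason "manual test"`; the alarm returns to its real state at the next evaluation.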
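A composite alarm is driven by a rule expression over other alarms, written with `ALARM("name")` clauses combined by AND, OR, and NOT. The sketch below builds such a rule and the `put_composite_alarm` keyword arguments (parameter names assumed from the PutCompositeAlarm API); the alarm names and SNS topic ARN are hypothetical. Suppressing the alarm while a deployment runs is one way composite alarms cut "alarm noise".

```python
def alarm_clause(name: str) -> str:
    """ALARM("name") clause; reject names that would break the rule's quoting."""
    if '"' in name:
        raise ValueError(f"alarm name contains a double quote: {name!r}")
    return f'ALARM("{name}")'


def all_in_alarm(*names: str) -> str:
    """True only when every listed alarm is in ALARM."""
    return "(" + " AND ".join(alarm_clause(n) for n in names) + ")"


def any_in_alarm(*names: str) -> str:
    """True when at least one listed alarm is in ALARM."""
    return "(" + " OR ".join(alarm_clause(n) for n in names) + ")"


def build_composite_alarm(name: str, rule: str, sns_topic_arn: str) -> dict:
    """Keyword arguments for PutCompositeAlarm that notify an SNS topic."""
    return {"AlarmName": name, "AlarmRule": rule, "AlarmActions": [sns_topic_arn]}
```

For example, `all_in_alarm("cpu-high", "network-high") + " AND NOT " + alarm_clause("deploy-in-progress")` fires only when both underlying alarms are in ALARM and no deployment alarm is active.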
CloudTrail
- Provides governance, compliance and audit for your AWS Account (enabled by default!)
- Get a log (history of events / API calls) made within your AWS Account by:
- Console
- SDK
- CLI
- AWS Services
- Can put logs from CloudTrail into CloudWatch Logs or S3
- A trail can apply to one region or all regions; events from global services (IAM, CloudFront) can also be logged (needs to be configured)
- e.g. who terminated an EC2 instance? → check CloudTrail
- Event history keeps 90 days by default (for longer retention, log to S3 and query the JSON files with Athena)
- Event types
- Management events - for control plane operation
- Operations that are performed on resources in your AWS account
- Can separate Read Events (that don’t modify resources) from Write Events (that may modify resources)
- e.g. create, terminate EC2
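- Data events - operations performed on or within a resource (not logged by default, high volume)
- can separate read and write events
- e.g. S3 object-level activity (GetObject, PutObject), Lambda function execution via the Invoke API
- CloudTrail Insights events (paid feature)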
- analyzes normal management events to create a baseline
- then detect unusual activity in your account via analyzing write events:
- inaccurate resource provisioning
- hitting service limits
- Bursts of AWS IAM actions
- Gaps in periodic maintenance activity
- can be integrated with CloudWatch logs
- Not real time
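The "who terminated the EC2 instance?" question can be answered from the 90-day event history. A minimal sketch, assuming boto3's CloudTrail `lookup_events` request and response shapes (`Events[].Username`, `Resources[].ResourceName`, and the full record as a JSON string in `CloudTrailEvent`); the instance ID and user are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone


def build_lookup(event_name, days=90, now=None):
    """Keyword arguments for LookupEvents filtered by API name (management events)."""
    end = now or datetime.now(timezone.utc)
    return {
        "LookupAttributes": [{"AttributeKey": "EventName", "AttributeValue": event_name}],
        "StartTime": end - timedelta(days=days),
        "EndTime": end,
    }


def who_did_it(response, resource_id):
    """(time, user, source IP) for each event in a LookupEvents page that touched `resource_id`."""
    hits = []
    for event in response.get("Events", []):
        names = {r.get("ResourceName") for r in event.get("Resources", [])}
        if resource_id in names:
            record = json.loads(event["CloudTrailEvent"])  # full event record as JSON
            hits.append((event["EventTime"], event.get("Username"), record.get("sourceIPAddress")))
    return hits
```

Results are paged, so in practice `boto3.client("cloudtrail").get_paginator("lookup_events")` would be iterated with `build_lookup("TerminateInstances")` and each page passed to `who_did_it`.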
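Separating read from write events and opting in to data events is done with event selectors on the trail. A minimal sketch of basic selectors, assuming boto3's `put_event_selectors` parameter names; the trail and bucket names are hypothetical. A bucket ARN ending in `/` selects every object in that bucket, and `ReadWriteType` applies to both the management and data events of the selector.

```python
def build_event_selectors(trail_name, bucket, write_only=True):
    """Keyword arguments for PutEventSelectors using basic event selectors."""
    return {
        "TrailName": trail_name,
        "EventSelectors": [
            {
                # "WriteOnly" drops read events (Describe*/Get*/List*), "All" keeps both.
                "ReadWriteType": "WriteOnly" if write_only else "All",
                "IncludeManagementEvents": True,
                # Data events are opt-in: here, S3 object-level events for one bucket.
                "DataResources": [
                    {"Type": "AWS::S3::Object", "Values": [f"arn:aws:s3:::{bucket}/"]}
                ],
            }
        ],
    }
```

Usage would be `boto3.client("cloudtrail").put_event_selectors(**build_event_selectors("org-trail", "my-bucket"))`; data events are billed separately, so scoping them to specific buckets keeps costs down.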
CloudTrail Lake
- Managed data lake for CloudTrail events
- Integrates collection, storage, preparation, and optimization for analysis & query
- Events are converted to ORC format
- Enables querying CloudTrail data with SQL
- Enable it with the “Create event data store” menu choice in the console
- Data is retained for up to 7 years
- Specify the event types you want to track (Management events/ Data events)
- Note KMS events add up fast and can make your costs blow up
- Basic event selectors can be selected in the UI
- Finer grained selection may be achieved with advanced event selectors
- What fields, what prefixes, event type, resources, event name…
- This can help control your ingestion and storage costs
- You can create “channels” to integrate with events outside of AWS
- Built-in support for Okta, LaunchDarkly, Clumio, and other CloudTrail partners
- Or custom integrations
- Querying
- Lake dashboards allow you to visualize events
- Roll your own SQL queries
- Start from sample queries in the CloudTrail Lake Editor
- Remember to bound your queries by eventTime to constrain costs
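The advanced event selectors mentioned above can, for example, keep management events but drop high-volume KMS events before they are ingested. A minimal sketch, assuming boto3's CloudTrail `create_event_data_store` parameter names and the `eventCategory`, `readOnly`, and `eventSource` selector fields; the data store name and retention value are hypothetical.

```python
def build_event_data_store(name, retention_days, write_only=False, exclude_kms=True):
    """Keyword arguments for CreateEventDataStore collecting filtered management events."""
    fields = [{"Field": "eventCategory", "Equals": ["Management"]}]
    if write_only:
        # Keep only events that may modify resources.
        fields.append({"Field": "readOnly", "Equals": ["false"]})
    if exclude_kms:
        # KMS events add up fast; excluding them limits ingestion and storage costs.
        fields.append({"Field": "eventSource", "NotEquals": ["kms.amazonaws.com"]})
    return {
        "Name": name,
        "RetentionPeriod": retention_days,  # in days
        "MultiRegionEnabled": True,
        "AdvancedEventSelectors": [
            {"Name": "Management events (filtered)", "FieldSelectors": fields}
        ],
    }
```

Usage would be `boto3.client("cloudtrail").create_event_data_store(**build_event_data_store("audit-lake", 365))`.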
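A sketch of a "roll your own" Lake query that follows the eventTime advice: the SQL names the event data store ID in the FROM clause and bounds the scan to a time window. The data store ID is a hypothetical placeholder, and the query would be submitted with boto3's CloudTrail `start_query(QueryStatement=sql)`, which returns a `QueryId` for polling.

```python
from datetime import datetime, timezone


def bounded_top_events_query(event_data_store_id, start, end, limit=10):
    """SQL for CloudTrail Lake: most frequent API calls with start <= eventTime < end."""
    if start >= end:
        raise ValueError("start must be before end")
    fmt = "%Y-%m-%d %H:%M:%S"  # timestamp literal format used in the WHERE clause
    return (
        "SELECT eventSource, eventName, COUNT(*) AS calls "
        f"FROM {event_data_store_id} "
        f"WHERE eventTime >= '{start.strftime(fmt)}' AND eventTime < '{end.strftime(fmt)}' "
        "GROUP BY eventSource, eventName "
        f"ORDER BY calls DESC LIMIT {int(limit)}"
    )
```

For example, `bounded_top_events_query("EXAMPLE-eds-id", datetime(2024, 1, 1, tzinfo=timezone.utc), datetime(2024, 1, 8, tzinfo=timezone.utc))` only scans one week of events.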