System Design for a Data-Driven Application
Real-time, fault-tolerant System Design
In this article we will try to cover three aspects: critical business operations, performance indicators, and real-time processing.
Business activity monitoring (BAM) consists of processes and technologies that enable the analysis of critical business performance indicators based on real-time data. It is a real use case for a Data-Driven Application.
As a bonus, we will also explore integrations and SLAs.
Let's go!
Data Ingestion comes first!
First of all, we need to consume and understand the company's data so that we can create key performance indicators as events occur.
- Data sources are consumed by the Data Ingestion layer.
- An API Gateway rate-limits requests and secures the internal services based on predefined rules. Rate limiting is very important because the layers behind the gateway can be configured to scale out horizontally and automatically. A wrong configuration can be a disaster, considerably increasing the size and, above all, the cost of the cluster in burst scenarios.
- Data Ingestion processes read the data and put it into a message queue.
- The message queue stores messages and delivers them to subscribers.
- Workers consume and process the messages on the queue: they create KPIs, aggregate data, and run Machine Learning jobs and Data Quality checks. Alerts are created here if the Anomaly Detector algorithm finds an anomaly. This data is stored on the Message Queue so that subscribers can integrate with external software such as ITSM, create and send e-mail notifications, or even push a message to a Mobile App.
- All processed data is stored in the Data Layer: relational data in MariaDB, and schemaless/fast data in ClickHouse, which serves as the Enterprise Data Warehouse.
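To make the worker step more concrete, here is a minimal sketch of a worker that drains a queue, keeps a rolling window per metric, and publishes an alert when a value looks anomalous. Everything in it is an assumption for illustration: the article does not say which anomaly detection algorithm is used, so a simple z-score check stands in for it, Python's in-process `queue.Queue` stands in for the real message queue, and the names `process_messages`, `window`, and `threshold` are hypothetical.

```python
import queue
import statistics
from collections import deque


def process_messages(inbox, alerts, window=20, threshold=3.0, min_samples=5):
    """Drain `inbox` and put an alert on `alerts` for each anomalous value.

    Assumption: a message is a dict like {"metric": "orders_per_min", "value": 10}.
    A value is flagged when it is more than `threshold` standard deviations
    away from the mean of the last `window` values of the same metric.
    """
    history = {}  # metric name -> deque with the most recent values
    while True:
        try:
            msg = inbox.get_nowait()
        except queue.Empty:
            break  # nothing left to process in this run
        values = history.setdefault(msg["metric"], deque(maxlen=window))
        if len(values) >= min_samples:
            mean = statistics.fmean(values)
            stdev = statistics.pstdev(values)
            if stdev > 0 and abs(msg["value"] - mean) / stdev > threshold:
                alerts.put({"metric": msg["metric"], "value": msg["value"], "mean": mean})
        values.append(msg["value"])


if __name__ == "__main__":
    inbox, alerts = queue.Queue(), queue.Queue()
    for value in [10, 11, 10, 9, 10, 11, 10, 50]:
        inbox.put({"metric": "orders_per_min", "value": value})
    process_messages(inbox, alerts)
    while not alerts.empty():
        print("ALERT:", alerts.get_nowait())
```

In a real deployment the alerts queue would be a topic on the message broker, so that the ITSM, e-mail, and mobile push subscribers can each consume the alerts independently.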
IAM (Identity and Access Management) secures all layers.
Integration with 3rd-Party Software
This data can be shared with 3rd-party software, e.g. ITSM, BI, etc.
- External software wants to read the aggregated data processed by the BAM Application, get Alerts, and integrate with your BI software for Data Scientists.
- It connects over HTTPS to the API Gateway, which secures access and enforces the Rate Limit.
- Finally, it reads the data from the Message Queue.
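The rate limit enforced by the API Gateway can be pictured with a token bucket, which is one common rate-limiting algorithm. The article does not say which algorithm the gateway actually uses, so treat this as a sketch under that assumption; the class name `TokenBucket` and its parameters `rate`, `capacity`, and `clock` are hypothetical.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate                # tokens added per second
        self.capacity = capacity        # maximum burst size
        self.tokens = float(capacity)   # start with a full bucket
        self.clock = clock              # injectable so the limiter can be tested
        self.last = clock()

    def allow(self, cost=1.0):
        """Return True and consume `cost` tokens if the request may pass."""
        now = self.clock()
        # Refill according to the elapsed time, never above the capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=5, capacity=10)  # 5 requests/s, bursts of 10
    accepted = sum(bucket.allow() for _ in range(25))
    print(f"accepted {accepted} of 25 back-to-back requests")
```

A gateway would typically keep one bucket per client or API key and answer rejected requests with HTTP 429, so a burst from one integration cannot trigger autoscaling for everybody.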
IAM (Identity and Access Management) secures all layers.
The Data-Driven Application
Finally, let's discuss the application that provides tools for users to visualize and interact with the data.
- A DNS server resolves the domain name to an IP address.
- A Content Delivery Network serves static content.
- Using the IP address, the request reaches the Load Balancer.
- The Load Balancer routes requests using rules to distribute the load and deal with server failures.
- The API Gateway reduces the attack surface, which increases security, and applies Rate Limiter rules to protect the servers from bursts.
- All session state is stored in shared in-memory storage for high throughput and low latency.
- All requests are secured by Identity and Access Management.
- The BAM application is accessed and processes the requests.
- Before accessing the BAM database, an in-memory data cache is used to improve performance.
- On a cache miss, the request gets the data from the Data Warehouse, updates the cache, and delivers the data to the requester.
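The last two steps describe what is usually called the cache-aside pattern. A minimal sketch, assuming a hypothetical `load_from_warehouse` function in place of the real Data Warehouse query and a plain dictionary in place of the shared in-memory cache; the expiry time (`ttl_seconds`) is also an assumption that the article does not mention.

```python
import time


class CacheAside:
    """Read through an in-memory cache and fall back to the warehouse on a miss."""

    def __init__(self, load_from_warehouse, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_warehouse  # hypothetical slow query function
        self._ttl = ttl_seconds           # how long a cached value stays valid
        self._clock = clock
        self._entries = {}                # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]               # cache hit
        value = self._load(key)           # cache miss: go to the warehouse
        self._entries[key] = (now + self._ttl, value)  # update the cache
        return value


if __name__ == "__main__":
    def slow_query(key):
        print(f"querying the warehouse for {key!r}")
        return {"kpi": key, "value": 42}

    cache = CacheAside(slow_query, ttl_seconds=30)
    cache.get("daily_revenue")  # miss: runs the query
    cache.get("daily_revenue")  # hit: served from memory
```

The expiry time is the knob that trades freshness for load: a shorter value keeps dashboards closer to real time, while a longer one protects the warehouse.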
IAM (Identity and Access Management) secures all layers.
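The Load Balancer step in the list above (distribute the load and deal with server failures) can be sketched as round-robin routing that skips servers marked as down. This is only one possible routing rule, chosen here for illustration; the server names and the `mark_down`/`mark_up` methods are hypothetical, and a real balancer would drive them from health checks.

```python
import itertools


class RoundRobinBalancer:
    """Rotate over the servers in order, skipping the ones marked as unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy server, or raise if none is available."""
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
    lb.mark_down("app-2")  # e.g. a failed health check
    print([lb.pick() for _ in range(4)])
```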
Increasing the nines on the Application SLA
To add nines to the SLA, for example 99.999%, we will have to distribute the infrastructure across different locations around the world, and maybe across different cloud providers, depending on how critical your operations are and how much money you have for this!
- Replicate the whole infrastructure in other regions.
- Put a Load Balancer in front to route each request to the best region based on the user's location.
- The request then follows the same path as above.
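To get a feeling for what each extra nine means, and why replicating regions helps, the arithmetic can be sketched in a few lines. The combined figure assumes that regions fail independently and that the global Load Balancer itself never fails, which is an optimistic simplification; the function names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability):
    """Allowed downtime per year for an availability such as 0.99999."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def combined_availability(per_region, regions):
    """Availability when the service is up as long as at least one region is up.

    Assumption: regions fail independently of each other.
    """
    return 1.0 - (1.0 - per_region) ** regions


if __name__ == "__main__":
    for sla in (0.99, 0.999, 0.9999, 0.99999):
        print(f"{sla:.3%} -> {downtime_minutes_per_year(sla):9.2f} minutes of downtime per year")
    print(f"two regions at 99.9% each -> {combined_availability(0.999, 2):.4%}")
```

Five nines leave a budget of roughly 5.26 minutes of downtime per year, which is why a single region is rarely enough: under the independence assumption, two regions at 99.9% each already reach about 99.9999% on paper.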
Next Steps:
- Implement a Data Lake to store RAW data.
- Plug Machine Learning algorithms into RPA to execute self-healing scripts.
- Connect CI/CD pipelines.
- Monitor logs, metrics, Application Performance, and User Experience.
- Gather continuous feedback and revalidate the model's efficiency.