设计模式

特定设计模式

Asynchronous Request-Reply Pattern

服务端马上返回客户端请求，并且异步处理任务，客户端可以稍后查询任务状态

何时使用

应用	功能	需要关注
CPU-bound	查询任务状态

需要关注

Issues and considerations

HTTP status code

There are a number of possible ways to implement this pattern over HTTP and not all upstream services have the same semantics. For example, most services won't return an HTTP 202 response back from a GET method when a remote process hasn't finished. Following pure REST semantics, they should return HTTP 404 (Not Found). This response makes sense when you consider the result of the call isn't present yet.

Perform long-running tasks with the webhook action pattern

Legacy clients might not support this pattern. In that case, you might need to place a facade over the asynchronous API to hide the asynchronous processing from the original client. For example, Azure Logic Apps supports this pattern natively can be used as an integration layer between an asynchronous API and a client that makes synchronous calls. See Perform long-running tasks with the webhook action pattern.

如何执行

具体策略请浏览：

Solution

Compensating Transaction pattern

使用事务补偿的方式使多步骤的事务保持最终一致性

何时使用

应用	功能	需要关注
Uber	叫车流程中出错（叫到车但是司机取消，自动重新叫车）还原司机以及乘客状态	Retry
File Sync	文件同步过程出错（文件夹名被占用，自动修改），还原文件状态	Retry

需要关注

Issues and considerations

Order

The order of the steps in the compensating transaction doesn't necessarily have to be the exact opposite of the steps in the original operation. For example, one data store might be more sensitive to inconsistencies than another, and so the steps in the compensating transaction that undo the changes to this store should occur first.

Fail

A compensating transaction is also an eventually consistent operation and it could also fail. The system should be able to resume the compensating transaction at the point of failure and continue. It might be necessary to repeat a step that's failed, so the steps in a compensating transaction should be defined as idempotent commands.

Retry logic

Only stop the operation and initiate a compensating transaction if a step fails repeatedly or cannot be recovered.

如何执行

具体策略请浏览：

Solution

Competing Consumers pattern

Event Sourcing pattern

使用 append-only 的方式来记录所有数据的变化，方便进行数据溯源以及保证了数据的一致性，也能提供事务补偿功能。

何时使用

应用	功能	需要关注
Shopping	购物的整个流程拆分成不同的事件
Ranking	每次用户使用该数据（Hashtag，听歌）则产生多个事件

需要关注

Issues and considerations

Eventually consistent

The system will only be eventually consistent when creating materialized views or generating projections of data by replaying events. There's some delay between an application adding events to the event store as the result of handling a request, the events being published, and consumers of the events handling them. During this period, new events that describe further changes to entities might have arrived at the event store.

Difficult to update

If the format (rather than the data) of the persisted events needs to change, perhaps during a migration, it can be difficult to combine existing events in the store with the new version.

Idempotent

Event publication might be at least once, and so consumers of the events must be idempotent.

如何执行

具体策略请浏览：

Solution

CQRS pattern

分开读和写的操作以及存储，通常和 Event Sourcing pattern 与 Materialized View pattern 一起使用

何时使用

应用	功能	需要关注
Shopping	购物的整个流程拆分成不同的事件
Ranking	每次用户使用该数据（Hashtag，听歌）则产生多个事件

需要关注

Issues and considerations

Messaging

Although CQRS does not require messaging, it's common to use messaging to process commands and publish update events. In that case, the application must handle message failures or duplicate messages. See the guidance on Priority Queues for dealing with commands having different priorities.

Eventual consistency

If you separate the read and write databases, the read data may be stale. The read model store must be updated to reflect changes to the write model store, and it can be difficult to detect when a user has issued a request based on stale read data.

如何执行

具体策略请浏览：

Solution

Gatekeeper pattern

使用 Gateway 去验证和处理请求，提高系统安全性

何时使用

应用	功能	需要关注
Shopping	限制流量以及服务降级
Video	流量分发以及负载均衡
Trading	流量验证以及安全防护

需要关注

Issues and considerations

Security

Ensure that the trusted hosts the gatekeeper passes requests to expose only internal or protected endpoints and connect only to the gatekeeper. The trusted hosts shouldn't expose any external endpoints or interfaces.

Single point of failure

The gatekeeper instance could be a single point of failure. To minimize the impact of a failure, consider deploying additional instances and using an autoscaling mechanism to ensure capacity to maintain availability.

如何执行

具体策略请浏览：

Solution

Geode pattern

根据地理位置将请求导向最优的服务器/数据库处理请求。

何时使用

应用	功能	需要关注
Live	减少浏览直播的延迟，多级缓存和大量重复请求处理	-
Video	减少浏览直播的延迟，多级缓存和大量重复请求处理	Cache

需要关注

Issues and considerations

Log Analytics

Consider how to track different components of the same request as they might execute asynchronously on different instances. A proper monitoring strategy is crucial.

如何执行

具体策略请浏览：

Solution

Rate Limiting pattern

节流模式用于减少流量以及增加带宽吞吐量，节流策略可以从每秒请求数，请求数据量，费用等维度进行配置。

何时使用

应用	功能	需要关注
Live	对流量进行限制	-
Video	对流量进行限制	-
Shopping	对流量进行限制	-

需要关注

Issues and considerations

Durable messaging system

In scenarios where your APIs can handle requests faster than any throttled ingestion services allow, you'll need to manage how quickly you can use the service. However, only treating the throttling as a data rate mismatch problem, and simply buffering your ingestion requests until the throttled service can catch up, is risky. If your application crashes in this scenario, you risk losing any of this buffered data.

Improve throughput

While not required, it's often recommended to send smaller amounts of records more frequently to improve throughput. So rather than trying to batch things up for a release once a second or once a minute, you can be more granular than that to keep your resource consumption (memory, CPU, network, etc.) flowing at a more even rate, preventing potential bottlenecks due to sudden bursts of requests. For example, if a service allows 100 operations per second, the implementation of a rate limiter may even out requests by releasing 20 operations every 200 milliseconds, as shown in the following graph.

Errors handle

While the rate limiting pattern can reduce the number of throttling errors, your application will still need to properly handle any throttling errors that may occur.

Integrate multiple workstreams into rate limiting strategy

If your application has multiple workstreams that access the same throttled service, you'll need to integrate all of them into your rate limiting strategy. For instance, you might support bulk loading records into a database but also querying for records in that same database. You can manage capacity by ensuring all workstreams are gated through the same rate limiting mechanism. Alternatively, you might reserve separate pools of capacity for each workstream.

如何执行

具体策略请浏览：

Solution

相关组件

名称	特点	阅读
Google Cloud Armor	Adaptive protection Bot management Rate limiting	Cloud Armor Key features
Amazon API Gateway	Support for RESTful APIs and WebSocket APIs API Operations Monitoring	Amazon API Gateway Features

Saga distributed transactions pattern

Saga 是一种故障处理 pattern，用于分布式应用程序中建立一致性，并协调多个微服务之间的事务以保持数据一致性。其使用 Choreography 和 Orchestration 实现 Compensating Transaction pattern

何时使用

应用	功能	需要关注
Uber	叫车流程中出错（叫到车但是司机取消，自动重新叫车）还原司机以及乘客状态	Retry
File Sync	文件同步过程出错（文件夹名被占用，自动修改），还原文件状态	Retry

需要关注

Issues and considerations

idempotence

The implementation must be capable of handling a set of potential transient failures, and provide idempotence for reducing side-effects and ensuring data consistency.

Lost updates, Dirty reads

Lost updates, when one saga writes without reading changes made by another saga. Dirty reads, when a transaction or a saga reads updates made by a saga that has not yet completed those updates.

如何执行

Choreography

Choreography is a way to coordinate sagas where participants exchange events without a centralized point of control. With choreography, each local transaction publishes domain events that trigger local transactions in other services.

Orchestration

Orchestration is a way to coordinate sagas where a centralized controller tells the saga participants what local transactions to execute. The saga orchestrator handles all the transactions and tells the participants which operation to perform based on events. The orchestrator executes saga requests, stores and interprets the states of each task, and handles failure recovery with compensating transactions.

具体策略请浏览：

Solution

Sharding pattern

对数据存储进行水平划分，提高系统的 scalability 和 availability

何时使用

应用	功能	需要关注
Newsfeed	对 news 进行 sharding
Live	对 video 进行 sharding
search	对查询内容进行 sharding

需要关注

Issues and considerations

Rebalance

Rebalancing can be an expensive operation. To reduce the necessity of rebalancing, plan for growth by ensuring that each shard contains sufficient free space to handle the expected volume of changes. You should also develop strategies and scripts you can use to quickly rebalance shards if this becomes necessary.

Use stable data for the shard key

avoid basing the shard key on potentially volatile information. Instead, look for attributes that are invariant or that naturally form a key.

如何执行

具体策略请浏览：

Valet Key pattern

使用 token 进行鉴权，使用户可以直接访问资源，无需每次都经过服务器

何时使用

应用	功能	需要关注
Newsfeed	上传图片和视频
Infrastructure	认证调用者，记录 quota

需要关注

Issues and considerations

Manage the validity status and period of the key.

A key can usually be revoked or disabled, depending on how it was issued. Server-side policies can be changed or, the server key it was signed with can be invalidated. Specify a short validity period to minimize the risk of allowing unauthorized operations to take place against the data store. However, if the validity period is too short, the client might not be able to complete the operation before the key expires.

Control the level of access the key will provide

Typically, the key should allow the user to only perform the actions necessary to complete the operation, such as read-only access if the client shouldn't be able to upload data to the data store.

Deliver the key securely

It can be embedded in a URL that the user activates in a web page, or it can be used in a server redirection operation so that the download occurs automatically. Always use HTTPS to deliver the key over a secure channel.

如何执行

具体策略请浏览：

Solution

valet-key-pattern

通用设计模式

Ambassador pattern

使用一个适配器来管理新系统与旧系统直接的数据交换

需要关注

Issues and considerations

Latency

The proxy adds some latency overhead. Consider whether a client library, invoked directly by the application, is a better approach.

Idempotent

Consider the possible impact of including generalized features in the proxy. For example, the ambassador could handle retries, but that might not be safe unless all operations are idempotent.

如何执行

具体策略请浏览：

Solution

ambassador

Anti-corruption Layer pattern

使用一个适配器来管理新系统与旧系统直接的数据交换

何时使用

应用	功能	需要关注
Infrastructure	框架迁移，新旧系统交互

需要关注

Issues and considerations

Latency

The anti-corruption layer may add latency to calls made between the two systems.

Transaction

Make sure transaction and data consistency are maintained and can be monitored.

如何执行

具体策略请浏览：

Solution

anti-corruption-layer

Cache-Aside pattern

使用缓存组件对数据进行分级存储，提高读取速度

何时使用

应用	功能	需要关注
Video	使用 CDN 和 POP 缓存热门视频	热门视频提前预热，定时更新缓存视频
Ranking	对排名结果进行缓存	定期计算排名并且更新缓存
Search	对搜索结果进行缓存	定期计算搜索结果并且更新缓存

需要关注

Issues and considerations

Lifetime of cached data

Don't make the expiration period too short because this can cause applications to continually retrieve data from the data store and add it to the cache. Similarly, don't make the expiration period so long that the cached data is likely to become stale.

Evicting data

Most caches adopt a least-recently-used policy for selecting items to evict, but this might be customizable.

Priming the cache

Many solutions prepopulate the cache with the data that an application is likely to need as part of the startup processing.

Consistency

Implementing the Cache-Aside pattern doesn't guarantee consistency between the data store and the cache.

Local (in-memory) caching

This data could quickly become inconsistent between caches, so it might be necessary to expire data held in a private cache and refresh it more frequently. In these scenarios, consider investigating the use of a shared or a distributed caching mechanism.

如何执行

读取数据：

检查缓存是否有数据，有的话从缓存返回
如果没有的话，从持久化存储中读取数据
将数据写入缓存

写数据：

将数据写入持久化存储
从缓存删除数据

相关组件

名称	特点	阅读
Redis	atomic operations in-memory dataset asynchronous replication	Introduction to Redis
Memorystore	supports both Redis and Memcached and is fully protocol compatible Provisioning, replication, failover, and patching are all automated Highly scalable & Highly available	Memorystore
Amazon ElastiCache	Support for two engines: Memcached and Redis Ease of management via the AWS Management Console Detailed monitoring statistics for the engine nodes at no extra cost via Amazon CloudWatch	Amazon ElastiCache features

Health Endpoint Monitoring pattern

定时请求服务器检查其健康状况

何时使用

应用	功能	需要关注

需要关注

Issues and considerations

How to validate the response.

While this provides the most basic measure of application availability, and is the minimum implementation of this pattern, it provides little information about the operations, trends, and possible upcoming issues in the application.

Security

To protect them from public access, which might expose the application to malicious attacks, risk the exposure of sensitive information, or attract denial of service (DoS) attacks.

如何执行

具体策略请浏览：

Solution

Retry pattern & Circuit Breaker pattern

使用代理策略对重试功能进行管理和限制，防止出现大量重试导致原本崩溃的组件无法恢复以及耗用过多重试资源。

何时使用

应用	功能	需要关注
News Feed	对消息发送失败进行重试	Exception Handling, Logging
IM	对消息发送失败进行重试	Exception Handling, Logging
File Sync	对业务操作进行重试（更新文件）	Retry after delay

需要关注

Retry Issues and considerations Circuit Breaker Issues and considerations

Fail fast

For some noncritical operations, it's better to fail fast rather than retry several times and impact the throughput of the application.

Exception Handling

A request to a service can fail for a variety of reasons raising different exceptions depending on the nature of the failure. Some exceptions indicate a failure that can be resolved quickly, while others indicate that the failure is longer lasting. It's useful for the retry policy to adjust the time between retry attempts based on the type of the exception.

Idempotent

retries could cause the operation to be executed more than once, with unintended side effects

Logging

It's important to log all connectivity failures that cause a retry so that underlying problems with the application, services, or resources can be identified.

如何执行

具体策略请浏览：

设计模式

特定设计模式​

通用设计模式​

Retry pattern & Circuit Breaker pattern​

特定设计模式

通用设计模式

Retry pattern & Circuit Breaker pattern