Title
Call Scheduling to Reduce Response Time of a FaaS System
Authors
Abstract
In an overloaded FaaS cluster, individual worker nodes strain under lengthening queues of requests. Although the cluster might eventually be horizontally scaled, adding a new node takes dozens of seconds. As serving applications are tuned for tail serving latencies, and these increase greatly under heavier loads, the current workaround is resource over-provisioning. In fact, even though a service could withstand a steady load of, e.g., 70% CPU utilization, the autoscaler is triggered at, e.g., 30-40% (thus the service uses twice as many nodes as needed). We propose an alternative: a worker-level method that handles heavy load without increasing the number of nodes. Unlike, e.g., text editors, FaaS executions are not interactive: end-users do not benefit from CPU being allocated to processes frequently but in short slices. Inspired by scheduling methods from High Performance Computing, we take the radical step of replacing classic OS preemption with (1) queuing requests based on their historical characteristics; and (2) once a request is being processed, setting its CPU limit to exactly one core (with no CPU oversubscription). We extend OpenWhisk and measure the efficiency of the proposed solution using the SeBS benchmark. In a loaded system, our method decreases the average response time by a factor of 4. The improvement is even greater for shorter requests, as the average stretch decreases by a factor of 18. This leads us to show that we can provide better response-time statistics with 3 machines than with a 4-machine baseline.
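The two-part idea in the abstract (order the queue by historical characteristics instead of preempting, then run each admitted request on exactly one dedicated core) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the class name, the use of a mean of past durations as the "historical characteristic", and the unknown-functions-last policy are all assumptions made for the example.

```python
import heapq

class HistoryAwareScheduler:
    """Sketch of a worker-level scheduler: queued requests are ordered by
    their expected (historical) duration, shortest first, and each
    dispatched request would occupy exactly one core with no
    oversubscription (e.g., via a cgroup CPU quota)."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        self.history = {}   # function name -> list of past durations (seconds)
        self.queue = []     # min-heap of (expected_duration, seq, function)
        self.seq = 0        # tie-breaker to keep heap entries comparable

    def record(self, fn, duration):
        """Store an observed execution time for a function."""
        self.history.setdefault(fn, []).append(duration)

    def expected_duration(self, fn):
        """Mean of past durations; functions never seen before go last
        (an assumed policy, one of several reasonable choices)."""
        past = self.history.get(fn)
        return sum(past) / len(past) if past else float("inf")

    def submit(self, fn):
        """Enqueue a request; its queue position depends only on history,
        not on arrival order (replacing OS preemption with ordering)."""
        self.seq += 1
        heapq.heappush(self.queue, (self.expected_duration(fn), self.seq, fn))

    def dispatch(self):
        """Pop at most num_cores requests: one core per request,
        so at most num_cores requests run concurrently."""
        batch = []
        while self.queue and len(batch) < self.num_cores:
            batch.append(heapq.heappop(self.queue)[2])
        return batch
```

Ordering by expected duration is what drives the large improvement in stretch for short requests: a short invocation no longer waits behind a long one, and because each running request owns a full core, its observed duration stays close to its historical one.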