Paper Title

Internet Traffic Volumes Are Not Gaussian -- They Are Log-Normal: An 18-Year Longitudinal Study With Implications for Modelling and Prediction (Complete Version)

Authors

Mohammed Alasmar, Richard Clegg, Nickolay Zakhleniuk, George Parisis

Abstract

Getting good statistical models of traffic on network links is a well-known, often-studied problem. A lot of attention has been given to correlation patterns and flow duration. The distribution of the amount of traffic per unit time is an equally important but less studied problem. We study a large number of traffic traces from many different networks, including academic, commercial and residential networks, using state-of-the-art statistical techniques. We show that traffic obeys the log-normal distribution, which is a better fit than the Gaussian distribution commonly claimed in the literature. We also investigate an alternative heavy-tailed distribution (the Weibull) and show that its performance is better than Gaussian but worse than log-normal. We examine anomalous traces which exhibit a poor fit for all distributions tried and show that this is often due to traffic outages or links that hit maximum capacity. We demonstrate that the data we look at is stationary if we consider samples that are 15 minutes or even 1 hour long. This gives confidence that we can use the distributions for estimation and modelling purposes. We demonstrate the utility of our findings in two contexts: predicting the proportion of time that traffic will exceed a given level (for service level agreement or link capacity estimation) and predicting 95th percentile pricing. We also show that the log-normal distribution is a better predictor than Gaussian or Weibull distributions in both contexts.
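The two use cases named in the abstract (the proportion of time traffic exceeds a given level, and the 95th percentile) can be illustrated with a minimal sketch. This is not the authors' fitting procedure: it fits each model from the sample mean and standard deviation only, and it runs on synthetic data drawn from a log-normal distribution rather than on real traces. The parameter values, the threshold of 60, and all function names are hypothetical and chosen purely for illustration.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def fit_gaussian(samples):
    """Gaussian model of the traffic volume itself."""
    return NormalDist(fmean(samples), stdev(samples))


def fit_lognormal(samples):
    """Log-normal model: a Gaussian fitted to log(traffic volume)."""
    logs = [math.log(x) for x in samples]
    return NormalDist(fmean(logs), stdev(logs))


def exceedance_gaussian(dist, level):
    """Predicted fraction of time the traffic exceeds `level`."""
    return 1.0 - dist.cdf(level)


def exceedance_lognormal(log_dist, level):
    return 1.0 - log_dist.cdf(math.log(level))


def percentile_gaussian(dist, p):
    return dist.inv_cdf(p)


def percentile_lognormal(log_dist, p):
    return math.exp(log_dist.inv_cdf(p))


def empirical_percentile(samples, p):
    ordered = sorted(samples)
    return ordered[min(len(ordered) - 1, int(p * len(ordered)))]


def empirical_exceedance(samples, level):
    return sum(1 for x in samples if x > level) / len(samples)


# Synthetic per-interval traffic volumes (assumption: log-normal, arbitrary units).
rng = random.Random(0)
traffic = [rng.lognormvariate(3.0, 0.5) for _ in range(20000)]

gauss = fit_gaussian(traffic)
lognorm = fit_lognormal(traffic)

LEVEL = 60.0  # hypothetical capacity / SLA threshold
emp_p95 = empirical_percentile(traffic, 0.95)
gauss_p95 = percentile_gaussian(gauss, 0.95)
lognorm_p95 = percentile_lognormal(lognorm, 0.95)

emp_exc = empirical_exceedance(traffic, LEVEL)
gauss_exc = exceedance_gaussian(gauss, LEVEL)
lognorm_exc = exceedance_lognormal(lognorm, LEVEL)

print("95th percentile  empirical / Gaussian / log-normal:",
      emp_p95, gauss_p95, lognorm_p95)
print("P(traffic > level) empirical / Gaussian / log-normal:",
      emp_exc, gauss_exc, lognorm_exc)
```

Because the synthetic data is log-normal by construction, the log-normal predictions land closer to the empirical values here; this shows only the mechanics of the comparison and says nothing about real traces, which is what the paper itself evaluates.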
