Paper Title
Past, Present and Future of Hadoop: A Survey
Paper Authors
Paper Abstract
In this paper, Hadoop, a technology for massive data storage and computing, is surveyed. A Hadoop cluster consists of heterogeneous computing devices such as regular PCs; it abstracts away the details of parallel processing so that developers can concentrate on their computational problems. A Hadoop cluster is made up of two parts: HDFS and MapReduce. The cluster uses HDFS for data management. HDFS provides storage for the input and output data of MapReduce jobs and is designed for high fault tolerance, high distribution capacity, and high throughput. It is also suitable for storing terabyte-scale data on clusters, and it runs on flexible, commodity hardware.
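The abstract's claim that developers "can just concentrate on their computational problem" follows from the MapReduce programming model: a job is expressed as a map function and a reduce function, while the framework handles partitioning, shuffling, and distribution. The following is a minimal single-machine sketch of that model in plain Python (not the Hadoop API itself), using the classic word-count example; the function names and the in-memory shuffle step are illustrative simplifications.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the input split.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group all intermediate values by key. In Hadoop this
    # grouping happens across the network between mappers and reducers.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the list of values for each key.
    return {key: sum(values) for key, values in groups.items()}

documents = ["hadoop stores data", "hadoop computes data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'computes': 1}
```

In a real Hadoop job the map and reduce functions run in parallel on many nodes, with input splits and intermediate data managed by HDFS and the framework, but the user-visible contract is the same two functions.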