Paper Title
Past, Present and Future of Hadoop: A Survey
Paper Authors
Paper Abstract
In this paper, Hadoop, a technology for massive data storage and computing, is surveyed. A Hadoop cluster consists of heterogeneous computing devices such as regular PCs; it abstracts away the details of parallel processing so that developers can concentrate on their computational problems. A Hadoop cluster is made up of two parts: HDFS and MapReduce. The cluster uses HDFS for data management. HDFS provides storage for the input and output data of MapReduce jobs and is designed for high fault tolerance, high distribution capacity, and high throughput. It is also suitable for storing terabyte-scale data on clusters, and it runs on flexible, commodity hardware.
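The abstract's claim that developers "can just concentrate on their computational problem" follows from the MapReduce programming model: a job is expressed as a map function and a reduce function, while the framework handles partitioning, shuffling, and distribution. The following is a minimal single-machine sketch of that model in plain Python (not the Hadoop API itself), using the classic word-count example; the function names and the in-memory shuffle step are illustrative simplifications.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the input split.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group all intermediate values by key. In Hadoop this
    # grouping happens across the network between mappers and reducers.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the list of values for each key.
    return {key: sum(values) for key, values in groups.items()}

documents = ["hadoop stores data", "hadoop computes data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'computes': 1}
```

In a real Hadoop job the map and reduce functions run in parallel on many nodes, with input splits and intermediate data managed by HDFS and the framework, but the user-visible contract is the same two functions.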