论文标题
InterSectX:用于图挖掘的有效加速器
IntersectX: An Efficient Accelerator for Graph Mining
论文作者
论文摘要
图模式挖掘应用程序尝试查找与特定模式匹配的所有嵌入。与传统的图表计算相比,图挖掘应用程序是计算密集型的。最先进的方法,模式枚举,构造了与模式相匹配的嵌入。两个边缘列表的关键操作 - 交点 - 对常规体系结构构成挑战,需要大量执行时间。在本文中,我们提出了InterSectX,这是一种垂直设计的加速器,用于基于传统处理器的流指令集扩展和体系结构支持。基于流的ISA可以被视为对标量值操作的传统指令的自然扩展。我们开发了由有效实现流ISA扩展的专业机制组成的InterSectX架构,包括:(1)流映射表(SMT)记录流ID和流寄存器之间的映射; (2)启用有效流数据移动的仅读取流缓存(S-CACHE); (3)跟踪与交叉特性的流之间的依赖关系; (4)实现稀疏价值计算的流价值处理单元(SVPU); (5)生成用于实现嵌套相交的微型OP序列的嵌套交叉转换器。我们在ZSIM上实现InterSectX ISA和体系结构,并使用七个流行的图表挖掘应用程序(三角/三链/尾部 - 跟踪计数,3-motif挖掘,4/5-clique Counting和FSM)在十个真实图上进行测试。我们开发自己的汽车实施(不舒服)。结果表明,InterSectX在CPU上的表现显着优于不壳不素,平均为10.7次,高达83.9次。以及基于详尽检查的最先进的图形挖掘加速器Gramer,平均40.1次,最高为181.8次。
Graph pattern mining applications try to find all embeddings that match specific patterns. Compared to the traditional graph computation, graph mining applications are computation-intensive. The state-of-the-art method, pattern enumeration, constructs the embeddings that match the pattern. The key operation -- intersection -- of two edge lists, poses challenges to conventional architectures and requires substantial execution time. In this paper, we propose IntersectX, a vertically designed accelerator for pattern enumeration with stream instruction set extension and architectural supports based on conventional processor. The stream based ISA can be considered as a natural extension to the traditional instructions that operate on scalar values. We develop the IntersectX architecture composed of specialized mechanisms that efficiently implement the stream ISA extensions, including: (1) Stream Mapping Table (SMT) that records the mapping between stream ID and stream register; (2) the read-only Stream Cache (S-Cache) that enables efficient stream data movements; (3) tracking the dependency between streams with a property of intersection; (4) Stream Value Processing Unit (SVPU) that implements sparse value computations; and (5) the nested intersection translator that generates micro-op sequences for implementing nested intersections. We implement IntersectX ISA and architecture on zSim, and test it with seven popular graph mining applications (triangle/three-chain/tailed-traingle counting, 3-motif mining, 4/5-clique counting, and FSM) on ten real graphs. We develop our own implementation of AutoMine (InHouseAutomine). The results show that IntersectX significantly outperforms InHouseAutomine on CPU, on average 10.7 times and up to 83.9 times ; and GRAMER, a state-of-the-art graph pattern mining accelerator, based on exhaustive check, on average 40.1 times and up to 181.8 times.