Title
Neuro-Symbolic Regex Synthesis Framework via Neural Example Splitting
Authors
Abstract
Due to the practical importance of regular expressions (regexes, for short), there has been extensive research on automatically generating regexes from positive and negative string examples. We tackle the problem of learning regexes faster from positive and negative strings by relying on a novel approach called "neural example splitting". Our approach essentially splits each example string into multiple parts using a neural network trained to group similar substrings from positive strings. This helps to learn a regex faster and, thus, more accurately, since we now learn from several shorter strings. We propose an effective regex synthesis framework called "SplitRegex" that synthesizes subregexes from "split" positive substrings and produces the final regex by concatenating the synthesized subregexes. For negative strings, we exploit pre-generated subregexes during the subregex synthesis process and perform matching against the negative strings, so that the final regex is consistent with all negative strings. SplitRegex is a divide-and-conquer framework for learning target regexes: it splits (divides) positive strings and infers partial regexes for the resulting parts, which is much more accurate than inferring a regex from whole strings, and then concatenates (conquers) the inferred regexes while satisfying the negative strings. We empirically demonstrate that the proposed SplitRegex framework substantially improves on previous regex synthesis approaches across four benchmark datasets.
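The abstract describes the divide-and-conquer pipeline only at a high level. The following is a minimal sketch of that idea under strong simplifying assumptions, not the authors' implementation. The trained neural splitter is replaced by a hypothetical rule that cuts each string wherever its ASCII character class changes. The pre-generated subregexes are a small fixed pool (`SUBREGEX_POOL`). The names `split_example`, `subregex_candidates`, and `synthesize` are illustrative only. Each part's subregex starts as specific as possible and is then greedily generalized only while the concatenated regex still rejects every negative string.

```python
import re
import string
from itertools import groupby
from typing import List, Optional

# Hypothetical stand-in for pre-generated subregexes, ordered specific -> general.
SUBREGEX_POOL = [r"[0-9]+", r"[a-z]+", r"[A-Z]+", r"[A-Za-z]+", r"[A-Za-z0-9]+", r".+"]


def char_class(c: str) -> str:
    """Coarse ASCII character class used by the stand-in splitter."""
    if c in string.digits:
        return "digit"
    if c in string.ascii_letters:
        return "letter"
    return "other"


def split_example(s: str) -> List[str]:
    """Hypothetical non-neural splitter: cut wherever the character class changes."""
    return ["".join(run) for _, run in groupby(s, key=char_class)]


def subregex_candidates(parts: List[str]) -> List[str]:
    """Subregexes that fully match every aligned part, most specific first.

    Assumes single-line examples, so ".+" always matches as a fallback.
    """
    candidates = []
    if len(set(parts)) == 1:
        candidates.append(re.escape(parts[0]))  # identical parts: keep the literal
    candidates += [r for r in SUBREGEX_POOL if all(re.fullmatch(r, p) for p in parts)]
    return candidates


def rejects_all(regex: str, negatives: List[str]) -> bool:
    """True if the regex fully matches none of the negative strings."""
    return not any(re.fullmatch(regex, n) for n in negatives)


def synthesize(positives: List[str], negatives: List[str]) -> Optional[str]:
    if not positives:
        raise ValueError("need at least one positive example")
    splits = [split_example(p) for p in positives]
    n_parts = len(splits[0])
    if any(len(parts) != n_parts for parts in splits):
        raise ValueError("stand-in splitter produced different part counts")
    # Divide: one candidate list per aligned part position.
    candidates = [subregex_candidates([parts[i] for parts in splits]) for i in range(n_parts)]
    # Start from the most specific subregex at every position.
    chosen = [cands[0] for cands in candidates]
    if not rejects_all("".join(chosen), negatives):
        return None  # even the tightest concatenation accepts a negative
    # Conquer: generalize each part while the concatenation rejects all negatives.
    for i, cands in enumerate(candidates):
        for cand in cands[1:]:
            trial = chosen[:i] + [cand] + chosen[i + 1:]
            if not rejects_all("".join(trial), negatives):
                break  # candidates form a chain, so more general ones fail too
            chosen = trial
    return "".join(chosen)
```

For example, `synthesize(["ab12", "cd345", "xyz7"], ["AB12", "ab"])` returns `[a-z]+[0-9]+`, because each attempted generalization would accept one of the two negatives. With only `"AB12"` as a negative, the second part generalizes all the way to `.+`. This shows how sparse negatives let a purely greedy search overgeneralize. A real synthesizer would need a better scoring criterion than this sketch uses.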