Paper Title

SQLFlow: A Bridge between SQL and Machine Learning

Authors

Wang, Yi; Yang, Yang; Zhu, Weiguo; Wu, Yi; Yan, Xu; Liu, Yongfeng; Wang, Yu; Xie, Liang; Gao, Ziyao; Zhu, Wenjing; Chen, Xiang; Yan, Wei; Tang, Mingjie; Tang, Yuan

Abstract

Industrial AI systems are mostly end-to-end machine learning (ML) workflows. A typical recommendation or business intelligence system includes many online microservices and offline jobs. We describe SQLFlow, a system for developing such workflows efficiently in SQL. SQL enables developers to write short programs that focus on the purpose (what) and ignore the procedure (how). Previous database systems extended their own SQL dialects to support ML. SQLFlow (https://sqlflow.org/sqlflow) takes a different strategy: it works as a bridge between various database systems, including MySQL, Apache Hive, and Alibaba MaxCompute, and ML engines such as TensorFlow, XGBoost, and scikit-learn. We extended the SQL syntax carefully so that the extension works with various SQL dialects, and we implemented the extension by inventing a collaborative parsing algorithm. SQLFlow is efficient and expressive for a wide variety of ML techniques: supervised and unsupervised learning, deep networks and tree models, visual model explanation in addition to training and prediction, and data processing and feature extraction in addition to ML. SQLFlow compiles a SQL program into a Kubernetes-native workflow for fault-tolerant execution and on-cloud deployment. Current industrial users include Ant Financial, DiDi, and Alibaba Group.
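
The idea of an extension that sits on top of many SQL dialects can be illustrated with a small sketch. The Python snippet below is not the collaborative parsing algorithm from the paper; it is a much simpler keyword-based split, assuming that the ML extension clause begins with `TO TRAIN`, `TO PREDICT`, or `TO EXPLAIN` and that everything before it is ordinary SQL that the underlying database can parse. The function name `split_statement` and the sample table and model names are hypothetical.

```python
import re

# Assumption for illustration: the ML extension clause starts with one of
# these keyword pairs. A real parser would not rely on a regular expression,
# since the same words could appear inside a string literal or a comment.
EXTENSION_START = re.compile(r"\bTO\s+(TRAIN|PREDICT|EXPLAIN)\b", re.IGNORECASE)


def split_statement(statement):
    """Split a statement into (standard_sql, extension_clause).

    The standard part would be handed to the database's own dialect parser;
    the extension part would be handled by the ML layer. If no extension
    keyword is found, the extension clause is an empty string.
    """
    match = EXTENSION_START.search(statement)
    if match is None:
        return statement.strip(), ""
    standard_sql = statement[:match.start()].strip()
    extension_clause = statement[match.start():].strip()
    return standard_sql, extension_clause


# Hypothetical statement: table, model, and column names are made up.
standard, extension = split_statement(
    "SELECT * FROM iris.train TO TRAIN DNNClassifier LABEL class INTO my_model;"
)
print(standard)
print(extension)
```

Keeping the standard part untouched is what lets one extension work across dialects in this sketch: the split never needs to understand MySQL, Hive, or MaxCompute syntax, only where the ML clause begins.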
