Parsing SQL queries in Python

Python SQL Parser and Transpiler

tobymao/sqlglot

SQLGlot is a no-dependency SQL parser, transpiler, optimizer, and engine. It can be used to format SQL or translate between 19 different dialects like DuckDB, Presto, Spark, Snowflake, and BigQuery. It aims to read a wide variety of SQL inputs and output syntactically correct SQL in the targeted dialects.

It is a very comprehensive generic SQL parser with a robust test suite. It is also quite performant, while being written purely in Python.

You can easily customize the parser, analyze queries, traverse expression trees, and programmatically build SQL.

Syntax errors are highlighted and dialect incompatibilities can warn or raise depending on configurations. However, it should be noted that SQL validation is not SQLGlot’s goal, so some syntax errors may go unnoticed.

Learn more about the SQLGlot API in the documentation.

Contributions are very welcome in SQLGlot; read the contribution guide to get started!

  • Install
  • Versioning
  • Get in Touch
  • Examples
    • Formatting and Transpiling
    • Metadata
    • Parser Errors
    • Unsupported Errors
    • Build and Modify SQL
    • SQL Optimizer
    • AST Introspection
    • AST Diff
    • Custom Dialects
    • SQL Execution

    Requirements for development (optional):

    Given a version number MAJOR.MINOR.PATCH, SQLGlot uses the following versioning strategy:

    • The PATCH version is incremented when there are backwards-compatible fixes or feature additions.
    • The MINOR version is incremented when there are backwards-incompatible fixes or feature additions.
    • The MAJOR version is incremented when there are significant backwards-incompatible fixes or feature additions.
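As a rough illustration of the strategy above, the sketch below classifies the difference between two version strings and flags the bumps that may be backwards-incompatible. The helper names (`parse_version`, `bump_kind`, `may_break`) and the version numbers are hypothetical and are not part of SQLGlot.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split a 'MAJOR.MINOR.PATCH' string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def bump_kind(old: str, new: str) -> str:
    """Name the most significant component that differs between two versions."""
    old_parts, new_parts = parse_version(old), parse_version(new)
    for name, before, after in zip(("major", "minor", "patch"), old_parts, new_parts):
        if before != after:
            return name
    return "none"


def may_break(old: str, new: str) -> bool:
    """True when the strategy above allows backwards-incompatible changes."""
    return bump_kind(old, new) in {"major", "minor"}


# The version numbers below are made up for illustration.
print(bump_kind("1.2.3", "1.2.4"), may_break("1.2.3", "1.2.4"))
print(bump_kind("1.2.3", "1.3.0"), may_break("1.2.3", "1.3.0"))
```

Under this reading, pinning a dependency to a fixed MAJOR.MINOR pair and allowing only PATCH upgrades is the conservative choice, since MINOR bumps are described as potentially backwards-incompatible.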

    We’d love to hear from you. Join our community Slack channel!

    Formatting and Transpiling

    Easily translate from one dialect to another. For example, date/time functions vary between dialects and can be hard to deal with:

    import sqlglot
    sqlglot.transpile("SELECT EPOCH_MS(1618088028295)", read="duckdb", write="hive")[0]

    'SELECT FROM_UNIXTIME(1618088028295 / 1000)'

    SQLGlot can even translate custom time formats:

    import sqlglot
    sqlglot.transpile("SELECT STRFTIME(x, '%y-%-m-%S')", read="duckdb", write="hive")[0]

    "SELECT DATE_FORMAT(x, 'yy-M-ss')"

    As another example, let’s suppose that we want to read in a SQL query that contains a CTE and a cast to REAL, and then transpile it to Spark, which uses backticks for identifiers and FLOAT instead of REAL:

    import sqlglot

    sql = """WITH baz AS (SELECT a, c FROM foo WHERE a = 1) SELECT f.a, b.b, baz.c, CAST("b"."a" AS REAL) d FROM foo f JOIN bar b ON f.a = b.a LEFT JOIN baz ON f.a = baz.a"""
    print(sqlglot.transpile(sql, write="spark", identify=True, pretty=True)[0])

    WITH `baz` AS (
      SELECT
        `a`,
        `c`
      FROM `foo`
      WHERE
        `a` = 1
    )
    SELECT
      `f`.`a`,
      `b`.`b`,
      `baz`.`c`,
      CAST(`b`.`a` AS FLOAT) AS `d`
    FROM `foo` AS `f`
    JOIN `bar` AS `b`
      ON `f`.`a` = `b`.`a`
    LEFT JOIN `baz`
      ON `f`.`a` = `baz`.`a`

    Comments are also preserved on a best-effort basis when transpiling SQL code:

    import sqlglot

    sql = """
    /* multi
       line
       comment
    */
    SELECT
      tbl.cola /* comment 1 */ + tbl.colb /* comment 2 */,
      CAST(x AS INT), # comment 3
      y -- comment 4
    FROM
      bar /* comment 5 */,
      tbl # comment 6
    """
    print(sqlglot.transpile(sql, read='mysql', pretty=True)[0])

    /* multi
       line
       comment
    */
    SELECT
      tbl.cola /* comment 1 */ + tbl.colb /* comment 2 */,
      CAST(x AS INT), /* comment 3 */
      y /* comment 4 */
    FROM bar /* comment 5 */, tbl /* comment 6 */

    You can explore SQL with expression helpers to do things like find columns and tables:

    from sqlglot import parse_one, exp

    # print all column references (a and b)
    for column in parse_one("SELECT a, b + 1 AS c FROM d").find_all(exp.Column):
        print(column.alias_or_name)

    # find all projections in select statements (a and c)
    for select in parse_one("SELECT a, b + 1 AS c FROM d").find_all(exp.Select):
        for projection in select.expressions:
            print(projection.alias_or_name)

    # find all tables (x, y, z)
    for table in parse_one("SELECT * FROM x JOIN y JOIN z").find_all(exp.Table):
        print(table.name)

    When the parser detects an error in the syntax, it raises a ParseError:

    import sqlglot
    sqlglot.transpile("SELECT foo( FROM bar")

    sqlglot.errors.ParseError: Expecting ). Line 1, Col: 13.
      select foo( FROM bar
                  ~~~~

    Structured syntax errors are accessible for programmatic use:

    import sqlglot
    try:
        sqlglot.transpile("SELECT foo( FROM bar")
    except sqlglot.errors.ParseError as e:
        print(e.errors)

    [{
      'description': 'Expecting )',
      'line': 1,
      'col': 16,
      'start_context': 'SELECT foo( ',
      'highlight': 'FROM',
      'end_context': ' bar',
      'into_expression': None,
    }]

    Presto APPROX_DISTINCT supports the accuracy argument which is not supported in Hive:

    import sqlglot
    sqlglot.transpile("SELECT APPROX_DISTINCT(a, 0.1) FROM foo", read="presto", write="hive")[0]

    APPROX_COUNT_DISTINCT does not support accuracy
    'SELECT APPROX_COUNT_DISTINCT(a) FROM foo'

    SQLGlot supports incrementally building SQL expressions:

    from sqlglot import select, condition

    where = condition("x=1").and_("y=1")
    select("*").from_("y").where(where).sql()

    'SELECT * FROM y WHERE x = 1 AND y = 1'

    You can also modify a parsed tree:

    from sqlglot import parse_one
    parse_one("SELECT x FROM y").from_("z").sql()

    There is also a way to recursively transform the parsed tree by applying a mapping function to each tree node:

    from sqlglot import exp, parse_one

    expression_tree = parse_one("SELECT a FROM x")

    def transformer(node):
        if isinstance(node, exp.Column) and node.name == "a":
            return parse_one("FUN(a)")
        return node

    transformed_tree = expression_tree.transform(transformer)
    transformed_tree.sql()

    SQLGlot can rewrite queries into an "optimized" form. It performs a variety of techniques to create a new canonical AST. This AST can be used to standardize queries or provide the foundations for implementing an actual engine. For example:

    import sqlglot
    from sqlglot.optimizer import optimize

    print(
        optimize(
            sqlglot.parse_one("""
                SELECT A OR (B OR (C AND D))
                FROM x
                WHERE Z = date '2021-01-01' + INTERVAL '1' month OR 1 = 0
            """),
            schema={"x": {"A": "INT", "B": "INT", "C": "INT", "D": "INT", "Z": "STRING"}}
        ).sql(pretty=True)
    )

    SELECT
      (
        "x"."a" <> 0 OR "x"."b" <> 0 OR "x"."c" <> 0
      )
      AND (
        "x"."a" <> 0 OR "x"."b" <> 0 OR "x"."d" <> 0
      ) AS "_col_0"
    FROM "x" AS "x"
    WHERE
      CAST("x"."z" AS DATE) = CAST('2021-02-01' AS DATE)

    You can see the AST version of the SQL by calling repr:

    from sqlglot import parse_one
    print(repr(parse_one("SELECT a + 1 AS z")))

    (SELECT expressions:
      (ALIAS this:
        (ADD this:
          (COLUMN this:
            (IDENTIFIER this: a, quoted: False)), expression:
          (LITERAL this: 1, is_string: False)), alias:
        (IDENTIFIER this: z, quoted: False)))

    SQLGlot can calculate the difference between two expressions and output the changes as a sequence of actions needed to transform the source expression into the target one:

    from sqlglot import diff, parse_one
    diff(parse_one("SELECT a + b, c, d"), parse_one("SELECT c, a - b, d"))

    [
      Remove(expression=(ADD this: (COLUMN this: (IDENTIFIER this: a, quoted: False)), expression: (COLUMN this: (IDENTIFIER this: b, quoted: False)))),
      Insert(expression=(SUB this: (COLUMN this: (IDENTIFIER this: a, quoted: False)), expression: (COLUMN this: (IDENTIFIER this: b, quoted: False)))),
      Move(expression=(COLUMN this: (IDENTIFIER this: c, quoted: False))),
      Keep(source=(IDENTIFIER this: b, quoted: False), target=(IDENTIFIER this: b, quoted: False)),
      ...
    ]

    Dialects can be added by subclassing Dialect:

    from sqlglot import exp
    from sqlglot.dialects.dialect import Dialect
    from sqlglot.generator import Generator
    from sqlglot.tokens import Tokenizer, TokenType


    class Custom(Dialect):
        class Tokenizer(Tokenizer):
            QUOTES = ["'", '"']
            IDENTIFIERS = ["`"]

            KEYWORDS = {
                **Tokenizer.KEYWORDS,
                "INT64": TokenType.BIGINT,
                "FLOAT64": TokenType.DOUBLE,
            }

        class Generator(Generator):
            TRANSFORMS = {exp.Array: lambda self, e: f"[{self.expressions(e)}]"}

            TYPE_MAPPING = {
                exp.DataType.Type.TINYINT: "INT64",
                exp.DataType.Type.SMALLINT: "INT64",
                exp.DataType.Type.INT: "INT64",
                exp.DataType.Type.BIGINT: "INT64",
                exp.DataType.Type.DECIMAL: "NUMERIC",
                exp.DataType.Type.FLOAT: "FLOAT64",
                exp.DataType.Type.DOUBLE: "FLOAT64",
                exp.DataType.Type.BOOLEAN: "BOOL",
                exp.DataType.Type.TEXT: "STRING",
            }

    print(Dialect["custom"])

    One can even interpret SQL queries using SQLGlot, where the tables are represented as Python dictionaries. Although the engine is not very fast (it’s not supposed to be) and is in a relatively early stage of development, it can be useful for unit testing and running SQL natively across Python objects. Additionally, the foundation can be easily integrated with fast compute kernels (arrow, pandas). Below is an example showcasing the execution of a SELECT expression that involves aggregations and JOINs:

    from sqlglot.executor import execute

    tables = {
        "sushi": [
            {"id": 1, "price": 1.0},
            {"id": 2, "price": 2.0},
            {"id": 3, "price": 3.0},
        ],
        "order_items": [
            {"sushi_id": 1, "order_id": 1},
            {"sushi_id": 1, "order_id": 1},
            {"sushi_id": 2, "order_id": 1},
            {"sushi_id": 3, "order_id": 2},
        ],
        "orders": [
            {"id": 1, "user_id": 1},
            {"id": 2, "user_id": 2},
        ],
    }

    execute(
        """
        SELECT
          o.user_id,
          SUM(s.price) AS price
        FROM orders o
        JOIN order_items i
          ON o.id = i.order_id
        JOIN sushi s
          ON i.sushi_id = s.id
        GROUP BY o.user_id
        """,
        tables=tables
    )

    A non-validating SQL parser module for Python

    andialbrecht/sqlparse

    python-sqlparse — Parse SQL statements

    sqlparse is a non-validating SQL parser for Python. It provides support for parsing, splitting and formatting SQL statements.

    The module is compatible with Python 3.6+ and released under the terms of the New BSD license.

    Visit the project page at https://github.com/andialbrecht/sqlparse for further information about this project.

    >>> import sqlparse

    >>> # Split a string containing two SQL statements:
    >>> raw = 'select * from foo; select * from bar;'
    >>> statements = sqlparse.split(raw)
    >>> statements
    ['select * from foo;', 'select * from bar;']

    >>> # Format the first statement and print it out:
    >>> first = statements[0]
    >>> print(sqlparse.format(first, reindent=True, keyword_case='upper'))
    SELECT *
    FROM foo;

    >>> # Parsing a SQL statement:
    >>> parsed = sqlparse.parse('select * from foo')[0]
    >>> parsed.tokens
    [<DML 'select' at 0x7f22c5e15368>, <Whitespace ' ' at 0x7f22c5e153b0>, <Wildcard '*' … ]
    >>>

    sqlparse is licensed under the BSD license.

    Parts of the code are based on pygments written by Georg Brandl and others. pygments-Homepage: http://pygments.org/
