Skip to content
This repository was archived by the owner on Aug 2, 2022. It is now read-only.

Support group by in new sql parser #724

Conversation

dai-chen
Copy link
Member

@dai-chen dai-chen commented Aug 31, 2020

Issue #, if available:

  1. chained functions in aggregation throw NullReferenceException #280: Chained function call or expression
  2. Group By * Query ERROR #401, Tableau query not working #430, "select column from table group by column order by column LIMIT 5" is not supported #643: Group by multi field
  3. Field name is converted to lowercase if present in GROUP BY #373, Function arguments missing if present in GROUP BY #389: function name changed to lower case
  4. aggregates over functions almost work #288: group by function
  5. Bool field gets integer value after aggregation #717: bool field returns integer

Description of changes: This PR is to change the grammar in new SQL parser and integrate with core engine. The basic idea is:

  1. Build Aggregation node with aggregators, so the output aggregation of SQL is same as PPL AST builder.
  2. Only perform "static" validation in SQL frontend and leave more analysis and validations in analyzer in core engine.

To make 1st item happen, the major change is adding a new domain class QuerySpecification and separate AST building process into 2 phases. Copy and pasting JavaDoc for your reference. Please see JavaDoc on QuerySpecification and AstAggregationBuilder class.

* AST aggregation builder that builds AST aggregation node for the following scenarios:
 * 1. Explicit GROUP BY
 *     1.1 Group by column name or scalar expression:
 *          SELECT ABS(age) FROM test GROUP BY ABS(age)
 *     1.2 Group by alias in SELECT AS clause:
 *          SELECT state AS s FROM test GROUP BY s
 *     1.3 Group by ordinal referring to select list:
 *          SELECT state FROM test GROUP BY 1
 *  2. Implicit GROUP BY
 *     2.1 No non-aggregated item (only aggregate functions):
 *          SELECT AVG(age), SUM(balance) FROM test
 *     2.2 Non-aggregated item exists:
 *          SELECT state, AVG(age) FROM test
 *
 *  For 1.1 and 2.1, Aggregation node is built with aggregators.
 *  For 1.2 and 1.3, alias and ordinal is replaced first and then
 *    Aggregation is built same as above.
 *  For 2.2, Exception thrown for now. We may support this by different SQL mode.

Documentation: No impact from user perspective so the old doc still up-to-date.

Testing: Add comparison test for basic cases and all bug fixes above. 3 legacy IT has to be skipped and the reason is listed in @Ignore annotation.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@dai-chen dai-chen added the SQL label Aug 31, 2020
@dai-chen dai-chen self-assigned this Sep 2, 2020
@dai-chen dai-chen requested review from chloe-zh and penghuo September 3, 2020 20:52
@dai-chen dai-chen marked this pull request as ready for review September 3, 2020 20:52
Copy link
Contributor

@penghuo penghuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the change.

@dai-chen dai-chen merged commit f40fed6 into opendistro-for-elasticsearch:develop Sep 8, 2020
@dai-chen dai-chen deleted the support-group-by-in-new-sql-parser branch September 8, 2020 21:04
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants