Subject Re: [ib-support] Select syntax
Author Ann W. Harrison
>At 12/21/2001 09:59 PM (Friday), Claudio Valderrama C. wrote:
> >I'm going to paraphrase Doug: unfortunately, our DSQL parser sucks. :-)
At 10:55 PM 12/21/2001 -0500, Doug Chamberlin wrote:

>What we are seeing here is typical of parsers which are generated via
>software. Since the parser can tell the delimiters are separate tokens it
>does not require the spaces. No matter that most people would consider this
>syntax wrong. I don't know anyone who, on a SQL syntax test, would consider
>"SELECT*" correct or who would intentionally write it that way!

Actually, the quirk is not in the parser but in the lexical analysis,
and it follows the SQL-92 standard: select*from is perfectly legal.


Regards,

Ann


Chapter and verse:

American National Standard X3.135-1992
5.2 <token> and <separator>

Function
Specify lexical units (tokens and separators) that participate in SQL language.
Format
<token> ::=
<nondelimiter token>
| <delimiter token>

<nondelimiter token> ::=
<regular identifier>
| <key word>
| <unsigned numeric literal>
| <national character string literal>
| <bit string literal>
| <hex string literal>

<regular identifier> ::= <identifier body>

<identifier body> ::=
<identifier start>
[ { <underscore> | <identifier part> }... ]

<identifier start> ::= !! See the Syntax Rules

<identifier part> ::=
<identifier start>
| <digit>

<delimited identifier> ::=
<double quote> <delimited identifier body>
<double quote>

<delimited identifier body> ::=
<delimited identifier part>...

<delimited identifier part> ::=
<nondoublequote character>
| <doublequote symbol>

<nondoublequote character> ::= !! See the Syntax Rules

<doublequote symbol> ::= <double quote><double quote>

<delimiter token> ::=
<character string literal>
| <date string>
| <time string>
| <timestamp string>
| <interval string>
| <delimited identifier>
| <SQL special character>
| <not equals operator>
| <greater than or equals operator>
| <less than or equals operator>
| <concatenation operator>
| <double period>
| <left bracket>
| <right bracket>

<not equals operator> ::= <>

<greater than or equals operator> ::= >=

<less than or equals operator> ::= <=

<concatenation operator> ::= ||

<double period> ::= ..

<separator> ::= { <comment> | <space> | <newline> }...

<comment> ::=
<comment introducer>
[ <comment character>... ] <newline>

<comment character> ::=
<nonquote character>
| <quote>

<comment introducer> ::= <minus sign><minus sign>
[<minus sign>...]

<newline> ::= !! implementation-defined end-of-line indicator

<key word> ::=
<reserved word>
| <non-reserved word>

<non-reserved word> ::= !! too long to quote

<reserved word> ::= !! too long to quote
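
As a rough illustration of why select*from lexes cleanly, here is a minimal Python sketch of a tokenizer built loosely on the Format above. It is not the DSQL lexer or any real implementation: the names TOKEN_SPEC, NONDELIMITER and tokenize are made up for this example, and only a handful of token classes are covered (no date/time strings, no national, bit or hex string literals, no exponents in numbers). The one check it makes is the adjacency rule from Syntax Rule 5: two nondelimiter tokens may not touch without a separator.

```python
import re

# Hypothetical, heavily simplified token classes; the real grammar has more.
TOKEN_SPEC = [
    ("separator", r"\s+|--[^\n]*"),          # <space>, <newline>, <comment>
    ("string",    r"'(?:[^']|'')*'"),        # <character string literal>
    ("delimited", r'"(?:[^"]|"")+"'),        # <delimited identifier>
    ("number",    r"\d+(?:\.\d+)?"),         # <unsigned numeric literal>, no exponent
    ("word",      r"[A-Za-z][A-Za-z0-9_]*"), # <regular identifier> or <key word>
    ("operator",  r"<>|>=|<=|\|\||\.\."),    # multi-character delimiter tokens
    ("special",   r"[-%&()*+,./:;<=>?\[\]]"),  # single-character delimiter tokens
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

NONDELIMITER = {"word", "number"}


def tokenize(sql):
    """Return (kind, text) pairs for sql, dropping separators."""
    tokens = []
    pos = 0
    prev_kind = None  # kind of the immediately preceding match, separators included
    while pos < len(sql):
        m = TOKEN_RE.match(sql, pos)
        if m is None:
            raise ValueError(f"unexpected character {sql[pos]!r} at offset {pos}")
        kind = m.lastgroup
        # Syntax Rule 5: a nondelimiter token must be followed by a
        # delimiter token or a separator, never by another nondelimiter token.
        if kind in NONDELIMITER and prev_kind in NONDELIMITER:
            raise ValueError(f"separator required before {m.group()!r} at offset {pos}")
        if kind != "separator":
            tokens.append((kind, m.group()))
        prev_kind = kind
        pos = m.end()
    return tokens


print(tokenize("select*from t"))
# [('word', 'select'), ('special', '*'), ('word', 'from'), ('word', 't')]
```

Because * is a delimiter token, it ends the word before it and the word after it starts fresh, so no spaces are needed. By contrast, tokenize("select 1from t") raises ValueError in this sketch, since the number 1 and the word from are both nondelimiter tokens.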


Syntax Rules:

1) An <identifier start> is one of:
a) A <simple Latin letter>; or
b) A character that is identified as a letter in the character repertoire
identified by the <module character set specification> or by the <character
set specification>; or
c) A character that is identified as a syllable in the character repertoire
identified by the <module character set specification> or by the <character
set specification>; or
d) A character that is identified as an ideograph in the character
repertoire identified by the <module character set specification> or by the
<character set specification>.

2) With the exception of the <space> character explicitly contained in
<timestamp string> and <interval string> and the permitted <separator>s in
<bit string literal>s and <hex string literal>s, a <token>, other than a
<character string literal>, a <national character string literal>, or a
<delimited identifier>, shall not include a <space> character or other
<separator>.

3) A <nondoublequote character> is one of:
a) Any <SQL language character> other than a <double quote>;
b) Any character other than a <double quote> in the character repertoire
identified by the <module character set specification>; or
c) Any character other than a <double quote> in the character repertoire
identified by the <character set specification>.

4) The two <doublequote>s contained in a <doublequote symbol> shall not be
separated by any <separator>.

5) Any <token> may be followed by a <separator>. A <nondelimiter token>
shall be followed by a <delimiter token> or a <separator>. If the Format
does not allow a <nondelimiter token> to be followed by a <delimiter
token>, then that <nondelimiter token> shall be followed by a <separator>.

6) There shall be no <space> nor <newline> separating the <minus sign>s of
a <comment introducer>.

7) SQL text containing one or more instances of <comment> is equivalent to
the same SQL text with the <comment> replaced with <newline>.

8) The sum of the number of <identifier start>s and the number of
<identifier part>s in a <regular identifier> shall not be greater than 128.

9) The <delimited identifier body> of a <delimited identifier> shall not
comprise more than 128 <delimited identifier part>s.

10) The <identifier body> of a <regular identifier> is equivalent to an
<identifier body> in which every letter that is a lower-case letter is
replaced by the equivalent upper-case letter or letters. This treatment
includes determination of equivalence, representation in the Information
and Definition Schemas, representation in the diagnostics area, and similar
uses.

11) The <identifier body> of a <regular identifier> (with every letter that
is a lower-case letter replaced by the equivalent upper-case letter or
letters), treated as the repetition of a <character string literal> that
specifies a <character set specification> of SQL-TEXT, shall not be equal,
according to the comparison rules in Subclause 8.2, "<comparison
predicate>", to any <reserved word> (with every letter that is a lower-case
letter replaced by the equivalent upper-case letter or letters), treated as
the repetition of a <character string literal> that specifies a <character
set specification> of SQL-TEXT.
Note: It is the intention that no <key word> specified in this American
Standard or revisions thereto shall end with an <underscore>.

12) Two <regular identifier>s are equivalent if their <identifier body>s,
considered as the repetition of a <character string literal> that specifies
a <character set specification> of SQL-TEXT, compare equally according to
the comparison rules in Subclause 8.2, "<comparison predicate>".

13) A <regular identifier> and a <delimited identifier> are equivalent if
the <identifier body> of the <regular identifier> (with every letter that is
a lower-case letter replaced by the equivalent upper-case letter or
letters) and the <delimited identifier body> of the <delimited identifier>
(with all occurrences of <quote> replaced by <quote symbol> and all
occurrences of <doublequote symbol> replaced by <double quote>),
considered as the repetition of a <character string literal> that specifies
a <character set specification> of SQL-TEXT and an implementation-defined
collation that is sensitive to case, compare equally according to the
comparison rules in Subclause 8.2, "<comparison predicate>".

14) Two <delimited identifier>s are equivalent if their <delimited
identifier body> (with all occurrences of <quote> replaced by <quote
symbol> and all occurrences of <doublequote symbol> replaced by
<doublequote>), considered as the repetition of a <character string
literal> that specifies a <character set specification> of SQL-TEXT and an
implementation-defined collation that is sensitive to case, compare equally
according to the comparison rules in Subclause 8.2, "<comparison predicate>".

15) For the purposes of identifying <key word>, any <simple Latin lower
case letter> contained in a candidate <key word> shall be effectively
treated as the corresponding <simple Latin upper case letter>.
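
Syntax Rules 10, 13 and 14 can be read as: fold a regular identifier to upper case, strip and undouble the quotes of a delimited identifier, then compare case-sensitively. A small Python sketch of that reading follows. The function names identifier_key and equivalent are hypothetical, and plain string equality with str.upper() is only a stand-in for the SQL-TEXT collation comparison the standard actually calls for.

```python
def identifier_key(ident):
    """Map an identifier as written to the string used for comparison."""
    if len(ident) >= 2 and ident[0] == '"' and ident[-1] == '"':
        # <delimited identifier>: drop the outer quotes, turn each
        # <doublequote symbol> ("") into one <double quote>, keep case.
        return ident[1:-1].replace('""', '"')
    # <regular identifier>: lower-case letters are replaced by their
    # upper-case equivalents (Syntax Rule 10). str.upper() is an assumption.
    return ident.upper()


def equivalent(a, b):
    """Compare two identifiers, regular or delimited, per the reading above."""
    return identifier_key(a) == identifier_key(b)


print(equivalent("employee", "EMPLOYEE"))    # True
print(equivalent("employee", '"EMPLOYEE"'))  # True
print(equivalent("employee", '"employee"'))  # False
```

The last case is the one that tends to surprise people: an unquoted name matches the quoted upper-case spelling, not the quoted lower-case one.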

Access Rules
None.

General Rules
None.

Leveling Rules
1) The following restrictions apply for Intermediate SQL:
a) No <identifier body> shall end in an <underscore>.
2) The following restrictions apply for Entry SQL in addition to any
Intermediate SQL restrictions:
a) No <regular identifier> or <delimited identifier body> shall contain
more than 18 <character representation>s.
b) An <identifier body> shall contain no <simple Latin lower case letter>.
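
The Leveling Rules above are simple enough to check mechanically. The following Python sketch shows one way to do it. The function leveling_violations and its parameters are invented for this example, and len() on a Python string is assumed to be a fair stand-in for counting <character representation>s.

```python
import string


def leveling_violations(body, delimited=False, level="entry"):
    """List the Leveling Rules broken by an identifier body.

    body      -- the identifier text, without surrounding double quotes
    delimited -- True if body came from a <delimited identifier>
    level     -- "intermediate" or "entry" (Entry includes Intermediate)
    """
    if level not in ("intermediate", "entry"):
        raise ValueError(f"unknown level {level!r}")
    problems = []
    # Intermediate SQL, 1a: applies to the <identifier body> of a regular identifier.
    if not delimited and body.endswith("_"):
        problems.append("identifier body ends in an underscore")
    if level == "entry":
        # Entry SQL, 2a: applies to regular and delimited identifiers alike.
        if len(body) > 18:
            problems.append("more than 18 characters")
        # Entry SQL, 2b: applies to the <identifier body> of a regular identifier.
        if not delimited and any(c in string.ascii_lowercase for c in body):
            problems.append("contains a simple Latin lower-case letter")
    return problems


print(leveling_violations("EMPLOYEE"))                   # []
print(leveling_violations("emp", level="intermediate"))  # []
print(leveling_violations("emp"))
# ['contains a simple Latin lower-case letter']
```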