ConceptLexerParser can now handle UnrecognizedTokens
+97
-1
@@ -19,7 +19,7 @@ For those you don't know this old cartoon, it's the Odyssey story from Homer,
ported in the 31st century. Ulysses has a spacecraft with an AI named Shyrka

I was a great fan of this cartoon when I was young. I thought that the idea of
- bringing the ancient story of Ulysses in the future was a bright.
+ bringing the ancient story of Ulysses in the future was bright.

Ever since then, Sheerka was my reference for any sophisticated computer. Unfortunately
for me, at that time there was no wikipedia to tell the correct spelling.
@@ -654,3 +654,99 @@ For the two questions, I will first try the simple implementations and see there
* the entry in sdp will not be all_number, but all_id_of_number. I will use the concept id instead of its name


2019-24-12
**********

Going back to the BNF implementation. As it's Christmas Eve today, I won't stay very long.

So, the implementation lies in the class ConceptLexerParser, as it's a lexer not for tokens, but for concepts.
The purpose of this class is to recognize a sequence of concepts.

So if we define the following concepts

::

    def concept foo from bnf one two three
    def concept bar from bnf four five

when you input

::

    one two three four five

the list :code:`[foo, bar]` will be returned by the parser (as return values).

How does it work?

As explained in the code, my implementation is highly inspired by the Arpeggio project. To define your grammar, you
use **ParsingExpressions**. There are several types:

* some are used to recognize tokens: StrMatch, ConceptMatch
* others are used to tell how to recognize: Sequence, OrderedChoice, Optional, OneOrMore, ZeroOrMore...

Some examples:

::

    to recognize 'foo' -> StrMatch("foo")
    to recognize 'foo bar' -> Sequence(StrMatch("foo"), StrMatch("bar"))
    to recognize 'foo' or 'bar' -> OrderedChoice(StrMatch("foo"), StrMatch("bar"))

and so on...

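To make the composition idea concrete, here is a minimal sketch of how such parsing expressions could fit together. It is not the actual Sheerka or Arpeggio code: the ``match(tokens, pos)`` method, its return convention (next position, or ``None`` on failure) and the use of plain strings as tokens are all assumptions made for the illustration.

```python
class StrMatch:
    """Recognizes one token equal to the given text."""

    def __init__(self, text):
        self.text = text

    def match(self, tokens, pos):
        # Return the position just after the match, or None on failure.
        if pos < len(tokens) and tokens[pos] == self.text:
            return pos + 1
        return None


class Sequence:
    """Recognizes all sub-expressions, one after the other."""

    def __init__(self, *exprs):
        self.exprs = exprs

    def match(self, tokens, pos):
        for expr in self.exprs:
            pos = expr.match(tokens, pos)
            if pos is None:
                return None
        return pos


class OrderedChoice:
    """Recognizes the first sub-expression that succeeds."""

    def __init__(self, *exprs):
        self.exprs = exprs

    def match(self, tokens, pos):
        for expr in self.exprs:
            end = expr.match(tokens, pos)
            if end is not None:
                return end
        return None


foo_bar = Sequence(StrMatch("foo"), StrMatch("bar"))
foo_or_bar = OrderedChoice(StrMatch("foo"), StrMatch("bar"))
```

With these stand-ins, ``foo_bar.match(["foo", "bar"], 0)`` consumes both tokens, while ``foo_or_bar`` accepts either one and tries the alternatives in the order they were declared.
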
So when a concept is defined using its BNF definition, I use the **BnfParser** to create the grammar, and then
I use the **ConceptLexerParser** to recognize the concepts.

The current implementation to recognize a concept is not very efficient. All the definitions are in a dictionary
and I go through the whole dictionary to see if some concepts are recognized. Once a concept is found, I loop again
over the whole dictionary to find the next concept.

| -> I need a btree to order the concepts
| -> I need a predictive algorithm to guess the next concept

But that is for later.

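The scan described above can be sketched roughly as follows. This is only an illustration of the loop, not the real code: ``recognize`` and ``grammars`` are hypothetical names, and each grammar is simplified to a plain list of expected tokens instead of a ParsingExpression.

```python
def recognize(tokens, grammars):
    """Return the names of the concepts found in tokens, in order.

    grammars maps a concept name to the list of tokens that defines it
    (a simplification standing in for a real parsing expression).
    """
    found = []
    pos = 0
    while pos < len(tokens):
        # The whole dictionary is walked again for every new position.
        for concept, expected in grammars.items():
            end = pos + len(expected)
            if expected and tokens[pos:end] == expected:
                found.append(concept)
                pos = end
                break
        else:
            # No concept starts at this position: stop scanning.
            break
    return found


grammars = {
    "foo": ["one", "two", "three"],
    "bar": ["four", "five"],
}
result = recognize("one two three four five".split(), grammars)
```

Every recognized concept costs one pass over all the definitions, which is why an ordered structure or a predictive step would help once the number of concepts grows.
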
So once the parsing succeeds, I return a **ConceptNode** object

.. code-block:: python

    class ConceptNode(LexerNode):
        """
        Returned by the ConceptLexerParser
        It represents a recognized concept
        """

        def __init__(self, concept, start, end, tokens=None, source=None, underlying=None):
            super().__init__(start, end, tokens, source)
            self.concept = concept
            self.underlying = underlying

            # When no source is given, rebuild it from the tokens
            if self.source is None:
                self.source = BaseParser.get_text_from_tokens(self.tokens)


concept
    | Remember that all grammars are listed in a dictionary of <Concept, ParsingExpression>.
    | So when a parsing expression is verified, it's easy to link it with the concept
start
    position of the first token
end
    position of the last token
tokens
    list of tokens that are recognized
underlying
    **NonTerminalNode** or **TerminalNode** that wraps the underlying **ParsingExpression** used to recognize the concept
source
    | The source is deduced from the tokens
    | But in the unit tests, it is given directly for speed and simplicity

What is the difference between the **[Non]TerminalNode** and the **ParsingExpression**?

The ParsingExpression
    defines how to recognize a concept

The [Non]TerminalNode
    represents what was found. So similarly to the ConceptNode, you will find the start, end and tokens attributes

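The distinction can be illustrated with a small sketch. The two classes below are simplified stand-ins that only borrow the names from the text, and the ``parse`` method is an assumption: the expression describes what to look for, while the node records what was actually found and where.

```python
from dataclasses import dataclass


@dataclass
class TerminalNode:
    """What was found: the expression that matched, plus its location."""

    expression: "StrMatch"
    start: int   # position of the first token
    end: int     # position of the last token
    tokens: list


class StrMatch:
    """How to recognize: one token equal to the given text."""

    def __init__(self, text):
        self.text = text

    def parse(self, tokens, pos):
        # Build a node describing the match, or return None on failure.
        if pos < len(tokens) and tokens[pos] == self.text:
            return TerminalNode(self, pos, pos, [tokens[pos]])
        return None


expr = StrMatch("two")
node = expr.parse(["one", "two", "three"], 1)
```

The same ``expr`` object can be reused on any input, whereas each ``node`` belongs to one specific match.
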
That's all for today!