.. toctree::
|
|
:maxdepth: 1
|
|
|
|
concepts
|
|
rules
|
|
parsers
|
|
persistence
|
|
|
|
|
|
2019-10-30
|
|
**********
|
|
|
|
What is Sheerka ?
|
|
"""""""""""""""""
|
|
|
|
Sheerka is a *communication* language,
|
|
as opposed to the traditional *programming* languages. Its
|
|
purpose is to ease the communication between the (wo)man and the machine,
|
|
ultimately using the voice. I will first use it to program faster, and maybe
|
|
more easily.
|
|
|
|
.. _ulysse31: https://fr.wikipedia.org/wiki/Ulysse_31
|
|
|
|
Where does the name Sheerka come from ?
|
|
"""""""""""""""""""""""""""""""""""""""
|
|
Sheerka is my misspelling of Shyrka, from my childhood anime ulysse31_.
|
|
For those who don't know this old cartoon, it's the Odyssey story from Homer,
|
|
transposed to the 31st century. Ulysses has a spacecraft with an AI named Shyrka.
|
|
|
|
I was a great fan of this cartoon when I was young. I thought that the idea of
|
|
bringing the ancient story of Ulysses into the future was brilliant.
|
|
|
|
Ever since then, Sheerka has been my reference for any sophisticated computer. Unfortunately
|
|
for me, at that time there was no Wikipedia to tell me the correct spelling.
|
|
|
|
Model v0
|
|
""""""""
|
|
In my view, everything begins with the **Events**. Basically, they are the commands (i.e. requests)
|
|
entered by the users.
|
|
|
|
The events are parsed, to understand what is required, so they produce a new **State**.
|
|
The state is like a big dictionary that holds everything that is known by the system.
|
|
|
|
Most of the elements saved in the **State** are the **Concepts**. In this first version,
|
|
it's a little bit complicated to define what a **Concept** is, as it can have several
|
|
usages. To make it simple, I will say that a **Concept** is an idea that can be
|
|
manipulated by the rest of the system.
|
|
I am pretty sure that their form and usage will evolve as I manipulate
|
|
them
|
|
|
|
- Each **State** has a reference to the event(s) that triggered this state
|
|
- Each **State** has a **history**
|
|
- Each **Concept** has a **history**
|
|
|
|
|
|
A **history** is a triplet of
|
|
|
|
- user name
|
|
- modification date
|
|
- digest of the parent
|
|
|
|
.. _git: https://git-scm.com/
|
|
|
|
Personally, I borrowed this way of tracking modifications from how it's done in git_;
|
|
I guess Linus Torvalds took it from somewhere.
|
|
|
|
|
|
|
|
|
|
About versioning of the information
|
|
"""""""""""""""""""""""""""""""""""""
|
|
As I said previously, I mimic how git_ versions its objects.
|
|
|
|
::
|
|
|
|
Obj v0 : parents = []
|
|
user name = <a name>
|
|
modification date = <a date>
|
|
digest = xxxxx
|
|
|
|
Obj v1: parents = [xxxxx]
|
|
user name = <a name>
|
|
modification date = <a date>
|
|
digest = yyyyy
|
|
|
|
Obj v2: parents = [yyyyy]
|
|
user name = <a name>
|
|
modification date = <a date>
|
|
digest = zzzzz
|
|
|
|
and so on...
|
|
|
|
I always keep a reference to the last version of the object, so I can navigate through
|
|
the versions using the :code:`parents` attribute of the object
|
|
|
|
In git_, there are basically two types of objects :
|
|
|
|
- **content** (file content, or directory structure)
|
|
- **reference** to content (commit or tags)
|
|
|
|
The hash of a **content** depends only on the content itself, while the hash of a **reference** also depends
|
|
on the user name, the modification date and the parents. In both cases, the hash is
|
|
computed on the whole object. So the hash can also be used to check the integrity
|
|
of an object.
|
|
|
|
For my objects, I need to decide how I compute the hash.
|
|
|
|
**Concepts** have a history. If I decide to include the history in the hash,
|
|
as the modification date is :code:`datetime.now()`, a new version will be created
|
|
even if the **Concept** has not changed. If I don't include it, the integrity of
|
|
what is saved is no longer guaranteed.
|
|
|
|
I choose to value identity over integrity. The hash code of a **Concept** does not depend
|
|
on its history. We will see what the future says about this.
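
To make the trade-off concrete, here is a minimal sketch of a digest that ignores the history. The names ``concept_digest`` and ``new_version``, the dictionary layout and the choice of SHA-256 are all assumptions made for the illustration; they are not Sheerka's actual code.

```python
import hashlib
import json
from datetime import datetime, timezone


def concept_digest(content):
    """Hash only the content of a concept, never its history."""
    # sort_keys gives a stable serialization: equal content, equal digest
    payload = json.dumps(content, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def new_version(content, user_name, parents):
    """Wrap the content with a history triplet that stays out of the digest."""
    return {
        "content": content,
        "history": {
            "user_name": user_name,
            "modification_date": datetime.now(timezone.utc).isoformat(),
            "parents": list(parents),
        },
        "digest": concept_digest(content),
    }


v0 = new_version({"name": "one", "body": "1"}, "alice", [])
v1 = new_version({"name": "one", "body": "1"}, "bob", [v0["digest"]])
# The content did not change, so no new identity is created
same_digest = v0["digest"] == v1["digest"]
```

The price is the one described above: a corrupted or tampered history would not be detected by checking the digest.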
|
|
|
|
2019-11-01
|
|
**********
|
|
|
|
Inspired by CodinGame
|
|
""""""""""""""""""""""
|
|
|
|
|
|
.. _codingame: https://www.codingame.com/home
|
|
|
|
I am trying to teach my little kid how to code. He is 12 years old and it was his very
|
|
first time.
|
|
|
|
Rather than trying a standard formal approach, we went on the codingame_ web site. There
|
|
are some pros and cons to using this platform, especially for absolute beginners, but
|
|
I like the visual output of the programs. It's really like coding a game !
|
|
|
|
What I hadn't noticed before is that (at least for the first programs) the solution
|
|
is given in human language.
|
|
|
|
For example, for the exercise called "The descent" you will find
|
|
|
|
::
|
|
|
|
For each round of play :
|
|
Reset the variables containing the index of the highest mountain and its height to 0
|
|
For each mountain index (from 0 to 7 included) :
|
|
Read the height of the mountain (variable 'mountainH') from stdin
|
|
If it's higher than the highest known mountain, save its index and height
|
|
Returns the index of the highest mountain on stdout
|
|
|
|
It would be great if Sheerka were able to produce some code from these instructions :-)
|
|
|
|
|
|
|
|
|
|
SheerkaDataProvider (sdp)
|
|
"""""""""""""""""""""""""
|
|
Not a great name, I confess. But who cares ?
|
|
|
|
What are the main design constraints?
|
|
|
|
::
|
|
|
|
1. No adherence with the filesystem.
|
|
We must not care about where the data are stored.
|
|
The first implementation will be file based, but it has to be extensible.
|
|
The final target will be to have a decentralized persistence system
|
|
2. CRUD operations are designed according to my needs
|
|
I don't want standard CRUD operations that I will have to tweak.
|
|
The direct consequence is that sdp won't fit any other purpose
|
|
3. History management for State and other objects for free.
|
|
|
|
|
|
sdp, like many modern database systems, is a dictionary. A big list of key-value pairs.
|
|
The key is a string, the value can be almost anything. Actually, for my needs, I guess
|
|
that I only need strings, numbers and lists (of strings and numbers :-)
|
|
|
|
JSON also provides true, false and null. So I guess that I will also need them.
|
|
|
|
I need at least one level of categorization. That means that my objects can be grouped.
|
|
The basic signature to add a new element is :code:`add(entry, obj)`
|
|
|
|
with
|
|
|
|
::
|
|
|
|
entry : is the group / category where I want to put the object
|
|
obj : the object to persist
|
|
|
|
With :code:`add("All_Concepts", "foo")` the database, let's call it **State** once and for all, will be updated like this:
|
|
|
|
.. code-block:: json
|
|
|
|
{"All_Concepts" : "foo"}
|
|
|
|
If I want to have another entry, I don't want to care about what was previously done. I
|
|
need the second call :code:`add("All_Concepts", "bar")` to produce
|
|
|
|
.. code-block:: json
|
|
|
|
{"All_Concepts" : ["foo", "bar"]}
|
|
|
|
|
|
So we are no longer in the usual way of implementing a CRUD.
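
The behaviour described above can be sketched in a few lines. This is only an illustration over a plain Python dict; the standalone ``add(state, entry, obj)`` function is an assumption, and the real sdp also handles persistence and history.

```python
def add(state, entry, obj):
    """Add obj under entry; a second add turns the stored value into a list."""
    if entry not in state:
        state[entry] = obj                    # first object: stored as is
    elif isinstance(state[entry], list):
        state[entry].append(obj)              # already a list: append
    else:
        state[entry] = [state[entry], obj]    # second object: promote to a list
    return state


state = {}
add(state, "All_Concepts", "foo")
first = dict(state)                # {"All_Concepts": "foo"}
add(state, "All_Concepts", "bar")  # {"All_Concepts": ["foo", "bar"]}
```

One limit of this naive version: if the stored object is itself a list, it cannot be told apart from a list of objects, so a real implementation would need to mark that case.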
|
|
|
|
|
|
|
|
2019-11-06
|
|
**********
|
|
|
|
Input processing
|
|
"""""""""""""""""
|
|
The basic processing flow should be
|
|
|
|
::
|
|
|
|
1. parsers
|
|
2. evaluators
|
|
3. printers
|
|
|
|
So, for each new input, all known parsers will try to recognize the input. Each parser will
|
|
return a triplet of :code:`(status, concept found (or node found), text message)`
|
|
|
|
This list of triplets is given to the evaluators. In the same way, there should be multiple
|
|
types of evaluators. Among them will be the rules, which will be introduced later.
|
|
|
|
All evaluators will provide a list (I guess it will be triplets as well) to the printers.
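
The flow above can be sketched as follows. Everything here is a toy under stated assumptions: the parser, evaluator and printer functions and their names are invented for the illustration, and only the shape of the triplets comes from the description.

```python
import types


def python_parser(text):
    """Parser: succeeds when the text compiles as a Python expression."""
    try:
        return (True, compile(text, "<input>", "eval"), "python expression")
    except SyntaxError as error:
        return (False, None, f"not python: {error.msg}")


def empty_parser(text):
    """Parser: succeeds on blank input."""
    if text.strip() == "":
        return (True, "", "empty input")
    return (False, None, "not empty")


def python_evaluator(parsed):
    """Evaluator: evaluates every successfully parsed Python node."""
    results = []
    for status, node, _ in parsed:
        if status and isinstance(node, types.CodeType):
            try:
                results.append((True, eval(node, {}), "evaluated"))
            except Exception as error:
                results.append((False, None, str(error)))
    return results


def printer(result):
    """Printer: turns one triplet into a line of text."""
    status, value, message = result
    return f"{message}: {value!r}" if status else f"error: {message}"


def process(text, parsers, evaluators):
    parsed = [parser(text) for parser in parsers]             # 1. parsers
    evaluated = [r for ev in evaluators for r in ev(parsed)]  # 2. evaluators
    return [printer(r) for r in evaluated]                    # 3. printers


output = process("1 + 1", [python_parser, empty_parser], [python_evaluator])
```

Every parser sees the input, every evaluator sees the full list of triplets, and the printers only see what the evaluators produced.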
|
|
|
|
Python processing
|
|
"""""""""""""""""
|
|
Sheerka natively understands Python. So it will be able to execute Python code.
|
|
I will manage later on the issues caused by the different versions of Python, or the fact
|
|
that some external modules must remain isolated (maybe using virtualenv)
|
|
|
|
My first problem is to correctly implement the :code:`eval / exec` function.
|
|
|
|
I don't know why, but Python has two similar functions to do nearly the same thing. One must use
|
|
eval to evaluate expression, or use exec to execute code. There must be an explanation but,
|
|
as for now, it seems to be a needless complication.
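
For the record, the difference is that :code:`eval` accepts a single expression and returns its value, while :code:`exec` accepts statements (assignments, imports, definitions) and returns :code:`None`. A minimal sketch of one way to hide the distinction is shown below; the helper name ``run_python`` is an assumption, not Sheerka's API.

```python
def run_python(source, namespace):
    """Evaluate an expression, or execute statements when it is not one."""
    try:
        # "eval" mode only accepts a single expression
        code = compile(source, "<input>", "eval")
    except SyntaxError:
        # Not an expression: run it as statements, there is no value to return
        exec(compile(source, "<input>", "exec"), namespace)
        return None
    return eval(code, namespace)


namespace = {}
run_python("x = 20 + 1", namespace)     # a statement, handled by exec
result = run_python("x * 2", namespace)  # an expression, handled by eval
```

Trying the expression form first keeps the return value whenever there is one.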
|
|
|
|
The next issue that I will have to tackle is that Sheerka is not a REPL. After the execution
|
|
of the input, the system stops. Nothing is kept in memory (i.e. RAM).
|
|
The whole idea is to make Sheerka 'remember', even something that happened a long time ago.
|
|
So I should find a way to 'freeze the time'
|
|
|
|
To better explain what I have in mind, let's say that I want to pretty print an object
|
|
|
|
.. code-block:: python
|
|
|
|
import pprint
|
|
pp = pprint.PrettyPrinter(indent=4)
|
|
pp.pprint(stuff)
|
|
|
|
I need three lines in order to be able to pretty print. I will first try dumping the
|
|
:code:`globals()` using pickle, and loading it back whenever needed.
|
|
|
|
If it does not work as expected, I can find a way to save the commands and exec everything
|
|
when needed. (first time, I exec import... second time I exec import + pp = ... and the last
|
|
time I exec the three statements).
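
The second option can be sketched as below. It is worth noting that pickling :code:`globals()` is likely to run into trouble here, because module objects such as ``pprint`` cannot be pickled. The class ``CommandLog`` is hypothetical; it only illustrates the idea of replaying the saved commands.

```python
class CommandLog:
    """Hypothetical log of executed statements, replayed to rebuild a namespace."""

    def __init__(self):
        self.commands = []  # plain strings, so they are trivial to persist

    def replay(self):
        """Rebuild a fresh namespace by executing every saved command again."""
        namespace = {}
        for command in self.commands:
            exec(command, namespace)
        return namespace

    def run(self, source):
        """Replay the past, execute the new command, then remember it."""
        namespace = self.replay()
        exec(source, namespace)
        self.commands.append(source)  # only reached when the command succeeded
        return namespace


log = CommandLog()
log.run("import pprint")
log.run("pp = pprint.PrettyPrinter(indent=4)")
final = log.run("text = pp.pformat({'a': 1})")
```

The obvious drawback is that every command is executed again on each input, so commands with side effects (writing a file, sending a request) would be repeated.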
|
|
|
|
2019-11-07
|
|
**********
|
|
|
|
Back on data persistence
|
|
"""""""""""""""""""""""""
|
|
Last time, I talked about how to add new entries to the **State**. I only need the name of
|
|
the category, and the object. If I add several objects under the same entry,
|
|
they don't override each other, they are kept as a list.
|
|
|
|
.. code-block:: python
|
|
|
|
add("All_Concepts", "foo")
|
|
add("All_Concepts", "bar")
|
|
|
|
will produce something like
|
|
|
|
.. code-block:: json
|
|
|
|
{"All_Concepts" : ["foo", "bar"]}
|
|
|
|
The reason behind this choice is that, in the human world, the same name can refer to
|
|
several concepts. The first obvious cases are the homonyms. Same word, but different
|
|
meaning. There are also some other cases where the meaning of the word depends on the context.
|
|
Rather than forcing the user to spend some time to find another way to express the concept,
|
|
(as the name already exists), I prefer to allow the storage under the same key.
|
|
The choice of the correct item to use in the list will be done on execution.
|
|
|
|
I also need sdp to manage the key of my object. So 'entry' will be used to group objects,
|
|
and the key will give quick access to each of them.
|
|
|
|
I don't want the signature :code:`add(entry, key, object)` because sometimes there is a key,
|
|
but keys are not mandatory. So I keep the signature :code:`add(entry, object)`
|
|
|
|
To manage the key, the object either is a key/value entry :code:`{key: value}` (Python dict) or
|
|
has an attribute :code:`key`, or has a method :code:`get_key()`
|
|
|
|
For example **Concepts** have a method :code:`get_key()`, so if the key of 'concept' is "foo",
|
|
the code
|
|
|
|
.. code-block:: python
|
|
|
|
add("All_Concepts", concept)
|
|
|
|
will produce something like
|
|
|
|
.. code-block:: python
|
|
|
|
{"All_Concepts" : {"foo" : concept}}
|
|
|
|
If I add another concept (concept2) which has the key "bar", I will have
|
|
|
|
.. code-block:: python
|
|
|
|
{"All_Concepts" : {"foo" : concept, "bar": concept2}}
|
|
|
|
and so on..
|
|
|
|
So under the 'All_Concepts' group, I have a quick access to the concept "foo"
|
|
|
|
Note that, if for some reason, I end up with several concepts with the same key, they will
|
|
just be stacked in a list. I don't lose information.
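
A minimal sketch of the key handling described above, over a plain dict. The helper names ``resolve_key`` and ``add_with_key`` and the stand-in ``Concept`` class are assumptions for the illustration; only the three ways of finding a key come from the text.

```python
class Concept:
    """Hypothetical stand-in for a Sheerka concept."""

    def __init__(self, name):
        self.name = name

    def get_key(self):
        return self.name


def resolve_key(obj):
    """Return (key, value) for obj, or (None, obj) when it carries no key."""
    if isinstance(obj, dict) and len(obj) == 1:
        key, value = next(iter(obj.items()))  # {key: value} entry
        return key, value
    get_key = getattr(obj, "get_key", None)
    if callable(get_key):
        return get_key(), obj                 # method get_key()
    if hasattr(obj, "key"):
        return obj.key, obj                   # attribute key
    return None, obj


def add_with_key(state, entry, obj):
    """Store obj under state[entry][key]; same-key objects are stacked in a list."""
    key, value = resolve_key(obj)
    if key is None:
        raise ValueError("this sketch only covers objects that have a key")
    bucket = state.setdefault(entry, {})
    if key not in bucket:
        bucket[key] = value
    elif isinstance(bucket[key], list):
        bucket[key].append(value)
    else:
        bucket[key] = [bucket[key], value]


state = {}
foo, bar, other_foo = Concept("foo"), Concept("bar"), Concept("foo")
for concept in (foo, bar, other_foo):
    add_with_key(state, "All_Concepts", concept)
```

After these three calls, ``state["All_Concepts"]["bar"]`` is the single concept, while ``state["All_Concepts"]["foo"]`` is a list holding both concepts that share the key.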
|
|
|
|
We will talk again about sdp later
|
|
|
|
Status
|
|
""""""
|
|
As of today, I have a first implementation of several main functionalities of the system
|
|
|
|
|
|
1. I have a good implementation of sdp
|
|
* When I say good, I talk about the coverage of the functionalities, not the efficiency of the code
|
|
* I can add objects to the state
|
|
* The objects can be saved as reference (will be explained later)
|
|
* I manage events
|
|
* I manage history
|
|
* I manage several types of serialisation
|
|
2. I have two parsers
|
|
* DefaultParser : to detect sheerka specific language (like def concept)
|
|
* PythonParser : to parse Python code.
|
|
* They are called for every new event.
|
|
3. I have a first version of the evaluators
|
|
* These are pieces of code that recognize a result and process it
|
|
* The current algo is not finished, but it works for simple cases
|
|
* I can create a new concept
|
|
* I can evaluate simple Python expressions
|
|
4. I don't have the printers, but it's ok, I just dump the result of processing
|
|
|
|
so I can type
|
|
|
|
::
|
|
|
|
def concept hello name as "hello" + name
|
|
1 + 1
|
|
sheerka.test()
|
|
|
|
I will now work on how to call an already defined concept.
|
|
|
|
|
|
2019-11-11
|
|
**********
|
|
|
|
Maintaining the blog
|
|
""""""""""""""""""""
|
|
It's not very easy to maintain this blog. Every time I have some time to work on **Sheerka**,
|
|
I must choose between expressing my ideas in this blog and coding.
|
|
|
|
I have plenty of ideas that I would like to express, sometimes just to put the idea down,
|
|
but I lack time. It would be great if I could find a tool that would allow me to just
|
|
dictate my words. I know that there are plenty out there, I need to spend some time to test
|
|
them and choose one.
|
|
|
|
2019-11-15
|
|
**********
|
|
|
|
Managing concepts resolutions
|
|
"""""""""""""""""""""""""""""
|
|
I am a little stuck on the algorithm I must use to derive (resolve) concepts. This is
|
|
one of those days when I strongly regret not having someone to discuss it with :-(
|
|
|
|
Let's write the problem down, sometimes, it helps figure out the best approach.
|
|
|
|
::
|
|
|
|
def concept one as 1
|
|
one
|
|
|
|
The concept is first defined (it returns the number 1), and then it's called.
|
|
During the call
|
|
|
|
1. During parsing,
|
|
Both Python parser and concept parser will recognize 'one'
|
|
2. During Evaluation,
|
|
* Python Evaluator will fail (one is not known by Python)
|
|
* Concept Evaluator will succeed. My question is what should it return ?
|
|
|
|
The two options are:
|
|
1. Return a Python node, to let the Python Evaluator do the work and return 1 in the next pass
|
|
2. Return '1' directly
|
|
|
|
As I write it down, it is obvious that it must return 1, since the purpose of any
|
|
evaluation is to give a result, not the path to find the result.
|
|
|
|
Plus, if I don't resolve the body in the Concept Evaluator, I will lose track of where the
|
|
'1' comes from.
|
|
|
|
I don't know if I was clear. I don't even know if I will be able to re-read myself.
|
|
But I think that I have my solution.
|
|
|
|
|
|
2019-11-16
|
|
**********
|
|
|
|
ExactConceptParser limitation
|
|
"""""""""""""""""""""""""""""
|
|
|
|
From the beginning, my simplest example is to show that addition can be simply
|
|
explained to Sheerka
|
|
|
|
::
|
|
|
|
def concept a plus b as a + b
|
|
def concept one as 1
|
|
def concept two as 2
|
|
one plus two
|
|
|
|
The :code:`one plus two` is perfectly recognized, and the result is 3.
|
|
:code:`two plus one` also works (with the correct response).
|
|
|
|
But I was quite surprised to see that :code:`one plus one` was not recognized !!
|
|
|
|
Indeed, the **ExactConceptParser** looks for :code:`__var0__ plus __var1__`. So
|
|
the first operand and the second have to be different.
|
|
|
|
It's unexpected :-(
|
|
|
|
Do I need to enhance the parser to recognize it, or do I need to build another parser ?
|
|
|
|
If I tell the parser that 'a' and 'b' may be the same in :code:`a plus b`, how do I handle the cases where 'a' and 'b'
|
|
MUST be different ? How do I handle the cases where they explicitly have to be the same ?
|
|
|
|
It seems that the purpose of the **ExactConceptParser** is to find exact matches.
|
|
I need another way to express that 'a' and 'b' can be the same.
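
One possible direction, sketched under assumptions: bind each variable to the token it meets, and make "the operands must be different" an explicit option instead of a side effect. The function ``match_pattern`` and its ``distinct`` flag are hypothetical; this is not how the **ExactConceptParser** is written.

```python
def match_pattern(pattern, tokens, variables, distinct=False):
    """Match tokens against a pattern; return the variable bindings or None."""
    if len(pattern) != len(tokens):
        return None
    bindings = {}
    for expected, actual in zip(pattern, tokens):
        if expected in variables:
            # A variable used twice in the pattern must bind to the same token
            if bindings.setdefault(expected, actual) != actual:
                return None
        elif expected != actual:
            return None  # literal word mismatch
    if distinct and len(set(bindings.values())) != len(bindings):
        return None      # caller explicitly asked for different values
    return bindings


pattern = ["a", "plus", "b"]
loose = match_pattern(pattern, ["one", "plus", "one"], {"a", "b"})
strict = match_pattern(pattern, ["one", "plus", "one"], {"a", "b"}, distinct=True)
```

With this approach, writing the same variable twice (``a plus a``) is the way to say that both operands explicitly have to be the same.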
|
|
|
|
2019-11-21
|
|
**********
|
|
|
|
MemoryFS, is it a joke ?
|
|
"""""""""""""""""""""""""""""
|
|
|
|
I spent this day working on improving the test performance. By default Sheerka
|
|
persists its data on the file system (even if I said that where the data is saved
is not important for the sdp module).
|
|
|
|
For each test, a folder is initialized to hold concepts information. And this folder
|
|
is destroyed after usage. For almost every single test !
|
|
|
|
So I decided to implement fs.MemoryFS. Information in memory is supposed to be
|
|
faster than on the disk !
|
|
|
|
I was very disappointed, after an afternoon of refactoring, to find that it is actually slower
|
|
than the native io implementation.
|
|
|
|
Even now that I am writing it, I just can't believe it. I must have implemented
|
|
it wrong. But the profiling shows that the time is lost in the under layers of the
|
|
FS library.
|
|
|
|
It's a shame !
|
|
|
|
2019-12-01
|
|
**********
|
|
|
|
Using BNF to define concept
|
|
"""""""""""""""""""""""""""""
|
|
|
|
I always knew that there will be several ways to define the body of a concept (same
|
|
goes for the 'pre', 'post' and 'where' parts). It can be defined as Python code,
|
|
or something that is related to concepts. It can even be a new language that I will
|
|
design. The important point is that, contrary to traditional programming languages,
|
|
Sheerka must remain extensible.
|
|
|
|
Same goes for the definition of the name.
|
|
|
|
The traditional form is:
|
|
|
|
::
|
|
|
|
def concept foo bar baz as ...
|
|
|
|
So the concept is defined by the sequence 'foo', then 'bar' then 'baz'. In this order.
|
|
|
|
Another way is
|
|
|
|
::
|
|
def concept a plus b where a,b as ...
|
|
|
|
In this form, a and b are supposed to be variables.
|
|
It will be matched against :code:`one plus two`.
|
|
|
|
The concept name is 'a plus b'. It is a quick way to declare a concept with variables,
|
|
but if someone defines another concept
|
|
|
|
::
|
|
|
|
def concept number1 plus number2 where number1,number2 as ...
|
|
|
|
This will produce another concept (with the same key, though). I guess that, at
|
|
some point, Sheerka will be able to detect that the concepts are the same, but
|
|
the name of the concept includes its variables. Which may be annoying in some
|
|
situations.
|
|
|
|
Plus, it's not possible to define precedence rules in this way. For example,
|
|
|
|
::
|
|
|
|
def concept a plus b as ...
|
|
def concept a times b as ...
|
|
|
|
How do you express that multiplications have a higher priority in, for example,
|
|
:code:`one plus two times three` ?
|
|
|
|
The only right answer, at least to me, is to implement something that is inspired
|
|
by the BNF definition of a grammar.
|
|
|
|
So the definition of the concept will look like
|
|
|
|
::
|
|
|
|
def concept term as factor (('+' | '-') term)?
|
|
def concept factor as number (('*' | '/') factor)?
|
|
def number where number in ['one', 'two', 'three'] as match(body, 'one', 1, 'two', 2, 'three', 3)
|
|
|
|
This form seems great, but in the definition of term and factor, there is no more
|
|
room for the real body. ie once the components are recognized, what do we do with them ?
|
|
|
|
So we can try
|
|
|
|
::
|
|
|
|
def concept factor (('+') factor)* as factor[0] + factor[i]
|
|
def concept number (('*') number)? as number[0] * number[i]
|
|
def number where number in ['one', 'two', 'three'] as match(body, 'one', 1, 'two', 2, 'three', 3)
|
|
|
|
The body is defined, but the name of the concept is too complicated, e.g. factor (('+') factor)*
|
|
It's quite impossible to reference a concept that is defined in this way.
|
|
|
|
So my last proposal, which marries the two ideas, is to introduce the two keywords 'using' and 'bnf'
|
|
|
|
.. _bnf : https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form
|
|
|
|
::
|
|
|
|
def concept term using bnf factor (('+' | '-') term)? as factor + (or -) term
|
|
def concept factor using bnf number (('*' | '/') factor)? as number * (or /) factor
|
|
def number where number in ['one', 'two', 'three'] as match(body, 'one', 1, 'two', 2, 'three', 3)
|
|
|
|
In my implementation:
|
|
|
|
* Terminals are between quotes
|
|
* Sequences are separated by whitespaces
|
|
* '|' (vertical bar) is used for alternatives
|
|
|
|
Like in regular expressions, you will also find
|
|
|
|
* '*' (star) is used to express zero or many
|
|
* '+' (plus) to express one or many
|
|
* '?' (question mark) to express zero or one
|
|
|
|
For those who don't know what BNF stands for, please have a look at the bnf_
|
|
wikipedia page.
|
|
|
|
I guess that I will need a complete chapter to explain how you retrieve what was parsed
|
|
|
|
2019-12-21
|
|
**********
|
|
|
|
Implementing Inheritance
|
|
""""""""""""""""""""""""
|
|
|
|
Except that it is not inheritance, at least not the way it is seen in modern programming languages.
|
|
|
|
I think that I should first express what I am trying to do. I guess that it will help me
|
|
have a better understanding myself.
|
|
|
|
::
|
|
|
|
def concept one as 1
|
|
def concept two as 2
|
|
one is a number
|
|
two is a number
|
|
|
|
When I enter :code:`one`, the result should be :code:`1`
|
|
|
|
But I should be able to express other concepts by using
|
|
|
|
::
|
|
|
|
def concept a plus b where a is a number and b is a number as a + b
|
|
|
|
Just by reading what I have just written, we can see that 'is a' has two separate meanings.
|
|
In the first usage, it's an affirmation, in the latter one, it's a question.
|
|
|
|
Should we consider them as the same concept, with two usages, or as two separate concepts,
|
|
which are somehow linked ?
|
|
|
|
As of now, there is only one usage to all concepts, which is the property 'BODY', but I have
|
|
prepared the property 'PRE' which can be used for that.
|
|
|
|
I am digressing a little. The original subject was how I can express that a
|
|
concept is an element of another concept. We may focus on the implementation later.
|
|
|
|
So saying that 'one' is a 'number' means that there is a set called 'number'
|
|
to which 'one' belongs.
|
|
|
|
The simple implementation will be to create an entry 'all_number' in sdp and to add 'one' in it.
|
|
The two issues that I foresee are:
|
|
|
|
* What about infinite sets ? (my set 'number' can never be completed if I put the items one by one)
|
|
* What if the same name refers to different sets (I don't have any example in mind, but I guess that homonyms of sets do exist)
|
|
|
|
|
|
For the two questions, I will first try the simple implementations and see where I go from there, i.e.:
|
|
|
|
* on top of the entry all_number, which lists the known numbers, you can define concepts like :code:`is a number`
|
|
that can also be used to detect that the concept is part of the set
|
|
* the entry in sdp will not be all_number, but all_id_of_number. I will use the concept id instead of its name
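
The two points above can be sketched together. The entry name ``all_id_of_number`` comes from the text; the functions ``declare_isa`` and ``isa``, the concept ids and the predicate table are assumptions made for the example.

```python
def declare_isa(state, concept_id, set_name):
    """'one is a number' as an affirmation: record the concept id in the set entry."""
    members = state.setdefault(f"all_id_of_{set_name}", [])
    if concept_id not in members:
        members.append(concept_id)


def isa(state, concept_id, value, set_name, predicates):
    """'a is a number' as a question: listed members first, then a predicate."""
    if concept_id in state.get(f"all_id_of_{set_name}", []):
        return True
    # The predicate covers infinite sets that cannot be listed item by item
    predicate = predicates.get(set_name)
    return predicate is not None and predicate(value)


state = {}
declare_isa(state, "id-of-one", "number")  # hypothetical concept id

predicates = {"number": lambda value: isinstance(value, (int, float))}
listed = isa(state, "id-of-one", None, "number", predicates)
by_rule = isa(state, "id-of-1000", 1000, "number", predicates)
unknown = isa(state, "id-of-hello", "hello", "number", predicates)
```

Storing ids rather than names means two concepts that share a name can still belong to different sets.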
|
|
|
|
|
|
2019-12-24
|
|
**********
|
|
|
|
Going back on BNF implementation. As it's Christmas eve today, I won't stay very long.
|
|
|
|
So, the implementation lies in the class BnfNodeParser, and it's a lexer not for tokens, but for concepts.
|
|
The purpose of this class is to recognize a sequence of Concepts.
|
|
|
|
So if we define the following concepts
|
|
|
|
::
|
|
|
|
def concept foo from bnf one two three
|
|
def concept bar from bnf four five
|
|
|
|
when you input
|
|
|
|
::
|
|
|
|
one two three four five
|
|
|
|
the list of :code:`[foo, bar]` will be returned by the BnfNodeParser (as return values)
|
|
|
|
How does it work ?
|
|
|
|
As explained in the code, my implementation is highly inspired by the Arpeggio project. To define your grammar, you
|
|
use **ParsingExpressions**. There are several types
|
|
|
|
* some are used to recognize tokens: StrMatch, ConceptExpression
|
|
* others are used to tell how to combine them: Sequence, OrderedChoice, Optional, OneOrMore, ZeroOrMore...
|
|
|
|
Some examples :
|
|
|
|
::
|
|
|
|
to recognize 'foo' -> StrMatch("foo")
|
|
to recognize 'foo bar' -> Sequence(StrMatch("foo"), StrMatch("bar"))
|
|
to recognize 'foo' or 'bar' -> OrderedChoice(StrMatch("foo"), StrMatch("bar"))
|
|
|
|
and so on...
|
|
|
|
So when a concept is defined using its bnf definition, I use the **BnfParser** to create the grammar, and then
|
|
I use the **BnfNodeParser** to recognize the concepts
|
|
|
|
The current implementation to recognize a concept is not very efficient. All the definitions are in a dictionary
|
|
and I go through the whole dictionary to see if some concepts are recognized. Once a concept is found, I loop again
|
|
on the whole dictionary to find the next concept.
|
|
|
|
| -> I need a btree to order the concept
|
|
| -> I need a predictive algorithm to guess the next concept
|
|
|
|
But it is for later.
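
To illustrate both the parsing expressions and the naive dictionary loop, here is a heavily simplified sketch. The class names mirror the ones listed above, but the ``parse(tokens, pos)`` signature and the ``recognize`` function are assumptions made for the example; they are neither Arpeggio's API nor the real **BnfNodeParser**.

```python
class StrMatch:
    """Recognizes one exact token."""

    def __init__(self, text):
        self.text = text

    def parse(self, tokens, pos):
        if pos < len(tokens) and tokens[pos] == self.text:
            return pos + 1  # position right after the match
        return None


class Sequence:
    """Recognizes all its elements, one after the other."""

    def __init__(self, *elements):
        self.elements = elements

    def parse(self, tokens, pos):
        for element in self.elements:
            pos = element.parse(tokens, pos)
            if pos is None:
                return None
        return pos


class OrderedChoice:
    """Recognizes the first alternative that matches."""

    def __init__(self, *alternatives):
        self.alternatives = alternatives

    def parse(self, tokens, pos):
        for alternative in self.alternatives:
            end = alternative.parse(tokens, pos)
            if end is not None:
                return end
        return None


def recognize(tokens, grammars):
    """Naive loop: scan the whole dictionary again after each recognized concept."""
    found, pos = [], 0
    while pos < len(tokens):
        for name, expression in grammars.items():
            end = expression.parse(tokens, pos)
            if end is not None and end > pos:
                found.append((name, pos, end - 1))  # concept, start, end
                pos = end
                break
        else:
            break  # nothing matches at this position
    return found


grammars = {
    "foo": Sequence(StrMatch("one"), StrMatch("two"), StrMatch("three")),
    "bar": Sequence(StrMatch("four"), StrMatch("five")),
}
nodes = recognize("one two three four five".split(), grammars)
```

The cost is visible in ``recognize``: each recognized concept triggers a new scan of every definition, which is what a btree or a predictive algorithm would avoid.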
|
|
|
|
So once the parsing has succeeded, I return a **ConceptNode** object
|
|
|
|
.. code-block:: python

    class ConceptNode(LexerNode):
        """
        Returned by the BnfNodeParser
        It represents a recognized concept
        """

        def __init__(self, concept, start, end, tokens=None, source=None, underlying=None):
            super().__init__(start, end, tokens, source)
            self.concept = concept
            self.underlying = underlying

            # When no source text is given, rebuild it from the tokens
            if self.source is None:
                self.source = BaseParser.get_text_from_tokens(self.tokens)
|
|
|
|
|
|
concept
|
|
| Remember that all grammars are listed in a dictionary of <Concept, ParsingExpression>.
|
|
| So when a parsing expression is verified, it's easy to link it with the concept
|
|
start
|
|
position of the first token
|
|
end
|
|
position of the last token
|
|
tokens
|
|
list of tokens that are recognized
|
|
underlying
|
|
**NonTerminalNode** or **TerminalNode** that wraps the underlying **ParsingExpression** used to recognize the concept
|
|
source
|
|
| The source is deduced from the tokens
|
|
| But in the unit tests, they are directly given for speed up and simplicity
|
|
|
|
What is the difference between the **[Non]TerminalNode** and the **ParsingExpression** ?
|
|
|
|
The ParsingExpression
|
|
defines how to recognize a concept
|
|
|
|
The [Non]TerminalNode
|
|
represents what was found. So similarly to the ConceptNode, you will find the start, end and token attributes
|
|
|
|
That's all for today !
|
|
|
|
2019-12-27
|
|
**********
|
|
|
|
How to manage variables resolutions
|
|
"""""""""""""""""""""""""""""""""""
|
|
|
|
I have to admit that I am a little bit stuck with how to manage variable resolution with PythonEvaluator.
|
|
What is expected by the expression depends on the expression itself.
|
|
|
|
Let's see an example
|
|
|
|
::
|
|
|
|
def concept one as 1
|
|
def concept two as 2
|
|
|
|
eval one + two
|
|
|
|
In this situation, I expect PythonEvaluator to resolve the concepts 'one' and 'two' and to return 1 + 2, hence 3
|
|
|
|
In this other situation
|
|
|
|
::
|
|
|
|
def concept one as 1
|
|
def concept desc a as sheerka.desc(a)
|
|
desc one
|
|
|
|
I expect Python evaluator NOT to resolve the concept one and to pass it straight to the function.
|
|
|
|
Unfortunately for me, in the current implementation, 'a' is resolved to the concept 'one', which is resolved to its
|
|
body "1". So the call fails, as there is no concept 1 (moreover, 1 is an integer, it's not even the string "1").
|
|
|
|
There may also be some cases where 'sheerka.desc()' expects the name of a concept (and the resolution of the concept
|
|
will be done inside the function). In this case, it's not the body nor the concept itself that is required, but the name
|
|
of the concept.
|
|
|
|
So here are three cases where the behaviour of PythonEvaluator is required to be different. I cannot hard code these
|
|
behaviours as they depend on the context.
|
|
|
|
The global idea to resolve this situation is to give Sheerka a memory. What I am currently working on is the possibility
|
|
**to create** and **to recognize** concepts. As a reminder :
|
|
|
|
You can create simple concepts
|
|
|
|
::
|
|
|
|
def concept one as 1
|
|
|
|
or concepts using bnf
|
|
|
|
::
|
|
|
|
def concept twenties from bnf twenty (one | two | three...)=unit as 20 + unit
|
|
|
|
|
|
Both can be recognised.
|
|
But if I define
|
|
|
|
::
|
|
|
|
def concept a plus b as a + b
|
|
|
|
|
|
:code:`one plus two` will be recognized, but :code:`twenty two plus one` is not correctly implemented yet.
|
|
|
|
To go back on my issue with the variables resolutions with PythonEvaluator, the idea is to implement rules that will
|
|
recognize the concept, so you will tell Sheerka if the value, the concept or the name is expected.
|
|
|
|
I am far from implementing the rules. To be honest, I don't even know yet what they will look like.
|
|
|
|
So I am going to introduce the keyword :code:`concept:name:` or :code:`c:name:`
|
|
|
|
It will mean that the concept is required.
|
|
|
|
If the name is required, you can use :code:`"'name'"` or :code:`'"name"'`.
|
|
It's already working. There is nothing to do for this one.
|
|
|
|
2020-01-07
|
|
**********
|
|
|
|
How do we perform the parsing ?
|
|
"""""""""""""""""""""""""""""""
|
|
|
|
The basic flow of an execution is :
|
|
|
|
* Parse the data -> Nodes
|
|
* Evaluate the nodes -> Concepts
|
|
* Display the results
|
|
|
|
The theory says that there can exist as many parsers as necessary. Each one of them will
|
|
be specialized to recognize a specific pattern. They will then send their information to
|
|
the evaluators.
|
|
|
|
As of now, I have implemented the following parsers:
|
|
|
|
* EmptyStringParser
|
|
To recognize empty strings and react accordingly
|
|
|
|
* PythonParser
|
|
To recognize Python source code
|
|
|
|
* ExactConceptParser
|
|
To recognize simple form of concepts
|
|
|
|
* DefaultParser (the name is not accurate)
|
|
To recognize builtin syntax (like 'def concept' or 'isa')
|
|
|
|
* BnfNodeParser
|
|
To recognize concept defined with BNF language
|
|
|
|
All these parsers are executed in a row (the order is not very important)
|
|
|
|
The first observation is that there is a lot of CPU waste. Most of the time (at least as of
now), when there is a match with one parser, the others fail. So there is no need to
|
|
execute them.
|
|
|
|
The second point is that there is no way for a parser to use the result of another.
|
|
My idea is to have parsers that can be chained, each one of them will do the little thing
|
|
it is capable of before leaving the rest to some more powerful parser.
|
|
|
|
I don't want to bring out the big guns for every single user input. And I certainly
|
|
don't want a massive and overly complex parser that will be capable (in theory) of everything
|
|
|
|
Why ?
|
|
|
|
| First of all, monolithic code is bad :-)
|
|
| Then I have to keep in mind that the process will be somehow distributed
|
|
| And last, but not least. I don't have (and I certainly will never have) the full completion
|
|
of all possible parsing situations. So what I need is a plug and play system where I can add
|
|
and remove and chain parsers, depending on the input.
|
|
|
|
So,
|
|
|
|
* I'll give all parsers a priority
|
|
* The parsers with the highest priority will be executed first
|
|
* The parsers with the same priority will be executed at the same time (the order does not matter)
|
|
* If, for a given priority there is a match, the parsers with a lower priority won't be executed
|
|
* A parser has access to the output of the parsers of higher priorities (which were executed before it)
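
The rules above can be sketched as follows. The ``(priority, parser)`` pairs, the ``parser(text, previous)`` signature and the three toy parsers are assumptions made for the illustration; the real parsers return richer objects.

```python
from itertools import groupby


def run_parsers(text, parsers):
    """Run (priority, parser) pairs level by level, highest priority first."""
    previous = []  # output of every parser already executed
    ordered = sorted(parsers, key=lambda item: item[0], reverse=True)
    for _, group in groupby(ordered, key=lambda item: item[0]):
        # Parsers of the same level all see the same earlier output
        seen = tuple(previous)
        results = [parser(text, seen) for _, parser in group]
        previous.extend(results)
        matches = [result for result in results if result[0]]
        if matches:
            return matches  # lower priorities are never executed
    return []


calls = []


def empty_parser(text, previous):
    calls.append("empty")
    return (text.strip() == "", "empty input")


def number_parser(text, previous):
    calls.append("number")
    return (text.isdigit(), int(text) if text.isdigit() else None)


def fallback_parser(text, previous):
    calls.append("fallback")
    # Has access to what the higher priority parsers returned
    return (True, f"raw text after {len(previous)} failed parsers")


parsers = [(10, fallback_parser), (90, empty_parser), (50, number_parser)]
matched = run_parsers("42", parsers)
calls_for_42 = list(calls)
```

For the input ``"42"`` the fallback parser is never called, which is where the CPU saving comes from.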
|
|
|
|
2020-01-11
|
|
**********
|
|
|
|
Status
|
|
""""""
|
|
|
|
Last status was back in November. At that time I could
|
|
|
|
::
|
|
|
|
def concept hello name as "hello" + name
|
|
1 + 1
|
|
sheerka.test()
|
|
|
|
1. I can evaluate concepts
|
|
|
|
::
|
|
|
|
def concept hello a where a
|
|
hello kodjo
|
|
|
|
2. I have worked on BNF definition of the concept
|
|
|
|
::
|
|
|
|
def concept twenties from bnf 'twenty' (one | two | three)=unit as 20 + unit
|
|
twenty one
|
|
eval twenty one
|
|
|
|
3. I can mix complex concepts (concepts with more than one word) and Python
|
|
|
|
::
|
|
|
|
twenty one + twenty two
|
|
twenty one + one does not work :-(
|
|
|
|
|
|
4. I have a basic implementation of logging, with control of the verbosity
|
|
|
|
5. The result of a user input evaluation is now persisted, along with the event
|
|
that was used for it.
|
|
|
|
|
|
|
|
2020-04-18
|
|
**********
|
|
|
|
Blog
|
|
""""""
|
|
|
|
It's been a (very) long time since I last wrote in this blog.
|
|
|
|
The main reason is that I found reStructuredText markup too complicated. I'm still not used to how directives are
|
|
supposed to work. There are so many ways to do the same thing !
|
|
|
|
I guess that it's also because I don't have the proper tool to write this doc.
|
|
I use PyCharm and, though I have the basic rendering, I cannot easily navigate between
|
|
the articles.
|
|
|
|
I need to install Sphinx. I want it in a Docker container. For sure it's not mandatory, but I must practice my
|
|
Docker skills if I don't want to forget everything.
|
|
|
|
Parsers
|
|
"""""""
|
|
As I keep repeating, parsing expressions is a very big part of what I want to achieve (alongside the
|
|
rule engine and the speech recognition).
|
|
It has to be very easy to express a new concept
|
|
|
|
::
|
|
|
|
def concept one as 1
|
|
def concept two as 2
|
|
|
|
That's it !
|
|
I should now be able to do
|
|
|
|
::
|
|
|
|
one + one
|
|
one + two
|
|
|
|
Now, I can decide that plus is also a concept
|
|
|
|
::
|
|
|
|
def concept a plus b as a + b
|
|
|
|
So basically, every time Sheerka parses something 'plus' something else, it will recognize the concept a plus b
|
|
|
|
::
|
|
|
|
one plus two
|
|
|
|
worked, but
|
|
|
|
::
|
|
|
|
one plus one
|
|
|
|
doesn't. Because 'a' and 'b' are two different letters, so it was looking for two different values. That was
|
|
an unexpected side effect of my first naive implementation.
|
|
|
|
Let's put that aside for the moment and carry on with our exercise of modelling the world.
|
|
|
|
After addition, it would be good to have multiplication. Easy
|
|
|
|
::
|
|
|
|
def concept a mult b as a * b
|
|
|
|
So I can try
|
|
|
|
::
|
|
|
|
one plus two mult three
|
|
|
|
Of course, this one does not work by magic. The precedence (priority ?) between addition and multiplication
|
|
was not respected.
|
|
|
|
.. _bnf: https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form
|
|
|
|
My first idea was the bnf_ parser, so as to be able to write something like
|
|
|
|
::
|
|
|
|
def concept plus from bnf mult ('plus' mult)*
|
|
def concept mult from bnf number ('mult' number)*
|
|
def concept number from bnf one|two
|
|
|
|
Expressing recursive concepts was simple. I was proud of this implementation: :code:`one plus two mult three`
|
|
was understood in the correct way.
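
Why the grammar gets the precedence right: :code:`plus` is defined in terms of :code:`mult`, so every multiplication is grouped before any addition is applied. The hand-written recursive-descent sketch below mirrors the three rules in Python. It is not Sheerka's BNF parser; the function names are hypothetical, and a :code:`three` entry is assumed so that the example sentence can be evaluated.

```python
# Assumption: 'three' is added to the numbers so the example sentence evaluates.
NUMBERS = {"one": 1, "two": 2, "three": 3}


def parse_number(tokens, pos):
    # number := one | two | three
    if pos >= len(tokens) or tokens[pos] not in NUMBERS:
        raise ValueError(f"number expected at position {pos}")
    return NUMBERS[tokens[pos]], pos + 1


def parse_mult(tokens, pos):
    # mult := number ('mult' number)*
    value, pos = parse_number(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "mult":
        right, pos = parse_number(tokens, pos + 1)
        value *= right
    return value, pos


def parse_plus(tokens, pos=0):
    # plus := mult ('plus' mult)*
    value, pos = parse_mult(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "plus":
        right, pos = parse_mult(tokens, pos + 1)
        value += right
    return value, pos


def evaluate_bnf(text):
    tokens = text.split()
    value, pos = parse_plus(tokens)
    if pos != len(tokens):
        raise ValueError(f"unexpected token at position {pos}")
    return value


print(evaluate_bnf("one plus two mult three"))  # 1 + (2 * 3) = 7
```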
|
|
|
|
|
|
But it started to become complicated when I wanted to define the body. In :code:`mult ('plus' mult)*`, where
|
|
is the left part? Where is the right part ?
|
|
|
|
Ok, let's try something like
|
|
|
|
::
|
|
|
|
def concept plus from bnf mult=a ('plus' mult)*=b
|
|
def concept mult from bnf number=a ('mult' number)*=b
|
|
def concept number from bnf one|two
|
|
|
|
We now have 'a' for the left part, and a potential list of 'b' for the right part.
|
|
The full definition of the concept :code:`plus` will look like
|
|
|
|
::
|
|
|
|
def concept plus from bnf mult=a ('plus' mult)*=b as:
|
|
res = a
|
|
for value in b:
|
|
res += value
|
|
return res
|
|
|
|
This should work fine. In my current implementation, 'a' is an instance of the concept 'mult', correctly
|
|
initialized with the concept one or the concept two, and likewise 'b' is a list of 'mult' concepts.
|
|
|
|
So it should work.
|
|
|
|
It's just that I never got this far in the tests. I just couldn't. THIS IS WAY TOO COMPLICATED
|
|
TO DEFINE A SIMPLE ADDITION !!!
|
|
|
|
**Note** that you must have quotes surrounding the 'plus' in the definition, to distinguish between
|
|
the concept and the literal. It's necessary, but when you start to do that, you start to narrow the usage
|
|
of your system to developers only. So, even if there is no other way, I didn't really like that.
|
|
|
|
.. _IronPython: https://ironpython.net/
|
|
.. _parsec: https://github.com/jparsec/jparsec
|
|
.. _Holy Grail: https://www.youtube.com/watch?v=YxG5mDItkGU
|
|
.. _one: https://en.wikipedia.org/wiki/Shunting-yard_algorithm
|
|
|
|
So I am done ? Is this the end ? There should be another way to express the priority (precedence ?) between the concepts.
|
|
|
|
Luckily for me, I remembered that I had once seen an implementation of the Python parser (IronPython_ I think) where they
|
|
used numbers to evaluate the precedence between additions and multiplications. And there was also something
|
|
like that when I used the parsec_ parser.
|
|
|
|
So I went back to the internet and found my `Holy Grail`_, well not this one, this one_.
|
|
|
|
**The Shunting Yard Algorithm**
|
|
|
|
It took me a few days to understand it and implement it in its basic form (which is already too long),
|
|
but it took me one entire month to adapt it to the concepts. I know, I am not quick :-)
|
|
|
|
As a matter of fact, the sya (Shunting Yard Algorithm) is designed for binary operators and functions where the number
|
|
of arguments is known. You can support unary operators, but nothing is explained about ternary operators and beyond.
|
|
Dealing with concepts that can be expressed as :code:`'foo a b'` (suffixed concept) or :code:`'a b bar'`
|
|
(prefixed concept) was an interesting challenge!
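
For reference, the basic form of the algorithm (binary, left-associative operators only) fits in a few lines of Python. This is a textbook-style sketch, not the adaptation to concepts described above; :code:`OPERATORS`, :code:`NUMBERS`, :code:`to_postfix` and :code:`evaluate_sya` are hypothetical names, and the precedence values are set by hand.

```python
import operator

# Hypothetical tables: operator word -> (precedence, function), word -> value.
OPERATORS = {"plus": (1, operator.add), "mult": (2, operator.mul)}
NUMBERS = {"one": 1, "two": 2, "three": 3}


def to_postfix(tokens):
    """Reorder infix tokens into postfix (reverse Polish) order."""
    output, stack = [], []
    for token in tokens:
        if token in OPERATORS:
            # Pop operators that bind at least as tightly (left associativity).
            while stack and OPERATORS[stack[-1]][0] >= OPERATORS[token][0]:
                output.append(stack.pop())
            stack.append(token)
        else:
            output.append(token)
    while stack:
        output.append(stack.pop())
    return output


def eval_postfix(postfix):
    values = []
    for token in postfix:
        if token in OPERATORS:
            right = values.pop()
            left = values.pop()
            values.append(OPERATORS[token][1](left, right))
        else:
            values.append(NUMBERS[token])
    return values[0]


def evaluate_sya(text):
    return eval_postfix(to_postfix(text.split()))


print(to_postfix("one plus two mult three".split()))
# ['one', 'two', 'three', 'mult', 'plus']
print(evaluate_sya("one plus two mult three"))  # 7
```

The only knowledge the algorithm needs about an operator is its precedence number, which is why the grammar no longer has to encode the ordering.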
|
|
|
|
Anyway, I am now in a position where I can simply define my addition and my multiplication
|
|
|
|
::
|
|
|
|
> def concept a plus b as a + b
|
|
> def concept a mult b as a * b
|
|
> eval one plus two mult three
|
|
> 7
|
|
|
|
That's it !
|
|
|
|
At least in theory. The definition and the parsing of the concepts are done and fully tested when you
|
|
programmatically set the precedences. I now need a way to define/express the priorities.
|
|
|
|
What I surely don't want is to write something like:
|
|
|
|
::
|
|
|
|
plus.precedence = 1
|
|
mult.precedence = 2
|
|
|
|
or
|
|
|
|
::
|
|
|
|
set_precedence(plus, 1)
|
|
set_precedence(mult, 2)
|
|
|
|
Any solution where you have to give the actual value of the precedence is a bad solution. I would like to
|
|
have something like
|
|
|
|
::
|
|
|
|
precedence mult > precedence plus
|
|
|
|
or
|
|
|
|
::
|
|
|
|
mult.precedence > plus.precedence
|
|
|
|
|
|
It means that I now have to implement a partitioning algorithm with simple constraints (<, >). I think that I will
|
|
include <=, >=, = and != as well, once and for all. Sorting things according to these constraints is something
|
|
humans naturally do.
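
One possible way to turn such constraints into numbers is a topological sort. The Python sketch below handles only the strict constraints (< and >) and relies on the standard :code:`graphlib` module (Python 3.9+). The function name :code:`assign_precedences` and the tuple format of the constraints are assumptions made for the illustration.

```python
from graphlib import CycleError, TopologicalSorter


def assign_precedences(constraints):
    """Derive precedence numbers from (left, op, right) constraints.

    Only '<' and '>' are supported in this sketch.
    """
    # Map each concept to the set of concepts that must rank strictly below it.
    lower = {}
    for left, op, right in constraints:
        if op == "<":
            left, right = right, left
        elif op != ">":
            raise ValueError(f"unsupported operator: {op}")
        lower.setdefault(left, set()).add(right)
        lower.setdefault(right, set())

    # static_order() yields a concept only after everything ranked below it,
    # so the levels of its dependencies are always known when it is visited.
    levels = {}
    for name in TopologicalSorter(lower).static_order():
        levels[name] = 1 + max((levels[dep] for dep in lower[name]), default=0)
    return levels


print(assign_precedences([("mult", ">", "plus")]))
# {'plus': 1, 'mult': 2}

try:
    assign_precedences([("mult", ">", "plus"), ("plus", ">", "mult")])
except CycleError:
    print("contradictory constraints")
```

Each concept gets the smallest number compatible with the constraints, and contradictory constraints surface as a :code:`CycleError` instead of a silent wrong ordering.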
|
|
|
|
|
|
2020-05-01
|
|
**********
|
|
|
|
Blog
|
|
""""""
|
|
Hi, I have the feeling that I am almost there with the parsers part. I have
|
|
|