.. toctree::
:maxdepth: 1
concepts
rules
parsers
persistence
2019-10-30
**********
What is Sheerka ?
"""""""""""""""""
Sheerka is a *communication* language,
as opposed to the traditional *programming* languages. Its
purpose is to ease the communication between the (wo)man and the machine,
ultimately using the voice. I will first use it to program faster, and maybe
more easily.
.. _ulysse31: https://fr.wikipedia.org/wiki/Ulysse_31
Where does the name Sheerka come from ?
"""""""""""""""""""""""""""""""""""""""
Sheerka is my misspelling of Shyrka, from my childhood anime ulysse31_.
For those who don't know this old cartoon, it's the Odyssey story from Homer,
transposed to the 31st century. Ulysses has a spacecraft with an AI named Shyrka.
I was a great fan of this cartoon when I was young. I thought that the idea of
bringing the ancient story of Ulysses into the future was brilliant.
Ever since then, Sheerka has been my reference for any sophisticated computer. Unfortunately
for me, at that time there was no Wikipedia to tell me the correct spelling.
Model v0
""""""""
In my view, everything begins with the **Events**. Basically, they are the commands (i.e. requests)
entered by the users.
The events are parsed, to understand what is required, so they produce a new **State**.
The state is like a big dictionary that holds everything that is known by the system.
Most of the elements saved in the **State** are the **Concepts**. In this first version,
it's a little bit complicated to define what a **Concept** is, as it can have several
usages. To keep it simple, I will say that a **Concept** is an idea that can be
manipulated by the rest of the system.
I am pretty sure that their form and usage will evolve as I manipulate
them.
- Each **State** has a reference to the event(s) that triggered this state
- Each **State** has a **history**
- Each **Concept** has a **history**
A **history** is a triplet of
- user name
- modification date
- digest of the parent
.. _git: https://git-scm.com/
Personally, I have taken this way of tracking modifications from how it's done in git_.
I guess Linus Torvalds took it from somewhere.
About versioning of the information
"""""""""""""""""""""""""""""""""""""
As I said previously, I mimic how git_ versions its objects.
::
Obj v0 : parents = []
user name =
modification date =
digest = xxxxx
Obj v1: parents = [xxxxx]
user name =
modification date =
digest = yyyyy
Obj v2: parents = [yyyyy]
user name =
modification date =
digest = zzzzz
and so on...
I always keep a reference to the last version of the object, so I can navigate through
the versions using the :code:`parents` attribute of the object.
In git_, there are basically two types of objects :
- **content** (file content, or directory structure)
- **reference** to content (commit or tags)
The hash of a **content** depends only on the content itself, while the hash of a **reference** also depends
on the user name, the modification date and the parents. In both cases, the hash is
computed on the whole object. So the hash can also be used to check the integrity
of an object.
For my objects, I need to decide how I compute the hash.
**Concepts** have a history. If I decide to include the history in the hash,
as the modification date is :code:`datetime.now()`, a new version will be created
even if the **Concept** has not changed. If I don't include it, the integrity of
what is saved is no longer guaranteed.
I choose to value identity over integrity. The hash code of a **Concept** does not depend
on its history. We will see what the future will say about this.
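
The trade-off can be illustrated with a rough sketch. This is not the actual Sheerka code: the function names, the JSON serialisation and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import json
from datetime import datetime


def content_digest(body: dict) -> str:
    """Hash only the content: same content, same digest (identity)."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def full_digest(body: dict, user: str, date: datetime, parents: list) -> str:
    """Hash the content together with its history (integrity)."""
    payload = json.dumps(
        {"body": body, "user": user, "date": date.isoformat(), "parents": parents},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hypothetical concept and user, saved twice without any change.
concept = {"name": "one", "body": "1"}
first = full_digest(concept, "alice", datetime(2019, 10, 30, 10, 0), [])
second = full_digest(concept, "alice", datetime(2019, 10, 30, 10, 5), [])

same_identity = content_digest(concept) == content_digest(dict(concept))
history_changes_hash = first != second
```

With the content-only digest, saving an unchanged concept twice gives the same hash. With the full digest, the modification date alone is enough to produce a new version.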
2019-11-01
**********
Inspired by CodinGame
""""""""""""""""""""""
.. _codingame: https://www.codingame.com/home
I am trying to teach my kid how to code. He is 12 years old and it was his very
first time.
Rather than trying a standard formal approach, we went to the codingame_ web site. There
are some pros and cons to using this platform, especially for absolute beginners, but
I like the visual output of the programs. It's really like coding a game !
What I hadn't noticed previously is that (at least for the first programs) the solution
is given in human language.
For example, for the exercise called "The descent" you will find
::
For each round of play :
Reset the variables containing the index of the highest mountain and its height to 0
For each mountain index (from 0 to 7 included) :
Read the height of the mountain (variable 'mountainH') from stdin
If it's higher than the highest known mountain, save its index and height
Returns the index of the highest mountain on stdout
It would be great if Sheerka were able to produce some code from these instructions :-)
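
For reference, here is one possible hand-written Python translation of those instructions, as an idea of what the generated code could look like. It is only a sketch: the heights are passed as a list instead of being read from stdin, and the function name is made up.

```python
def highest_mountain(heights):
    """Return the index of the highest mountain for one round of play."""
    # Reset the variables containing the index of the highest mountain and its height
    best_index = 0
    best_height = 0
    for index in range(len(heights)):   # 0 to 7 included in the exercise
        mountain_h = heights[index]     # 'mountainH' in the instructions
        # If it's higher than the highest known mountain, save its index and height
        if mountain_h > best_height:
            best_index = index
            best_height = mountain_h
    return best_index


target = highest_mountain([9, 8, 7, 3, 6, 5, 2, 4])
```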
SheerkaDataProvider (sdp)
"""""""""""""""""""""""""
Not a great name, I confess. But who cares ?
What are the main design constraints?
::
1. No coupling to the filesystem.
We must not care about where the data is stored.
The first implementation will be file based, but it has to be extensible.
The final target will be to have a decentralized persistence system
2. CRUD operations are designed according to my needs
I don't want standard CRUD operations that I will have to tweak.
The direct consequence is that sdp won't fit any other purpose
3. History management for State and other objects for free.
sdp, like many modern database systems, is a dictionary: a big list of key-value pairs.
The key is a string, the value can be almost anything. Actually, for my needs, I guess
that I only need strings, numbers and lists (of strings and numbers :-)
JSON also provides true, false and null. So I guess that I will also need them.
I need at least one level of categorization. That means that my objects can be grouped.
The basic signature to add a new element is :code:`add(entry, obj)`,
with
::
entry : is the group / category where I want to put the object
obj : object to persist
With :code:`add("All_Concepts", "foo")` the database, let's call it **State** once and for all, will be updated like this:
.. code-block:: json
{"All_Concepts" : "foo"}
If I want to have another entry, I don't want to care about what was previously done. I
need the second call :code:`add("All_Concepts", "bar")` to produce
.. code-block:: json
{"All_Concepts" : ["foo", "bar"]}
So this is no longer the usual way of implementing CRUD.
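
A minimal in-memory sketch of this behaviour, assuming a plain Python dict stands in for the persisted **State** (the real sdp is file based and also handles history, which is left out here):

```python
state = {}  # stand-in for the persisted State


def add(entry, obj):
    """Store obj under entry; a second value turns the entry into a list."""
    if entry not in state:
        state[entry] = obj
    elif isinstance(state[entry], list):
        state[entry].append(obj)
    else:
        state[entry] = [state[entry], obj]


add("All_Concepts", "foo")
after_first = dict(state)    # {"All_Concepts": "foo"}
add("All_Concepts", "bar")   # state is now {"All_Concepts": ["foo", "bar"]}
```

One limit of this naive version: if the stored object is itself a list, it cannot be told apart from a list of several objects.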
2019-11-06
**********
Input processing
"""""""""""""""""
The basic processing flow should be
::
1. parsers
2. evaluators
3. printers
So, for each new input, all known parsers will try to recognize the input. Each parser will
return a triplet of :code:`(status, concept found (or node found), text message)`.
This list of triplets is given to the evaluators. In the same way, there should be multiple
types of evaluators. Among them will be the rules, which will be introduced later.
All evaluators will provide a list (I guess it will be triplets as well) to the printers.
Python processing
"""""""""""""""""
Sheerka natively understands Python. So it will be able to execute Python code.
Later on, I will deal with the issues caused by the different versions of Python, or the fact
that some external modules must remain isolated (maybe using virtualenv).
My first problem is to correctly implement the :code:`eval / exec` function.
I don't know why, but Python has two similar functions to do the same thing. One must use
eval to evaluate an expression, or use exec to execute code. There must be an explanation but,
as of now, it seems to be a complication for nothing.
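
For what it's worth, the split follows Python's distinction between expressions (which have a value) and statements (which don't). A short demonstration using only the builtins:

```python
namespace = {}

# eval() accepts a single expression and returns its value.
three = eval("1 + 2", namespace)

# exec() accepts statements (assignments, imports, defs...) and always returns None.
result_of_exec = exec("x = 1 + 2", namespace)

# An assignment is a statement, not an expression, so eval() rejects it.
try:
    eval("y = 1", namespace)
    eval_accepts_statement = True
except SyntaxError:
    eval_accepts_statement = False
```

A common workaround is to try :code:`eval` first and to fall back to :code:`exec` on :code:`SyntaxError`.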
The next issue that I will have to tackle is that Sheerka is not a REPL. After the execution
of the input, the system stops. Nothing is kept in memory (i.e. RAM).
The whole idea is to make Sheerka 'remember', even something that happened a long time ago.
So I should find a way to 'freeze the time'.
To better explain what I have in mind, let's say that I want to pretty print an object
.. code-block:: python
import pprint
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(stuff)
I need three lines in order to be able to pretty print. I will first try dumping the
globals() using pickle and loading them back whenever needed.
If it does not work as expected, I can find a way to save the commands and exec everything
when needed (the first time, I exec the import; the second time, I exec the import and the :code:`pp =` assignment; and the last
time, I exec the three statements).
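
The fallback idea can be sketched as follows. Note that pickling :code:`globals()` is likely to fail as soon as it holds a module such as :code:`pprint`, since module objects cannot be pickled. The :code:`CommandLog` class below is hypothetical and keeps the commands in memory, where the real system would persist them.

```python
class CommandLog:
    """Hypothetical helper: remembers the source of every command and replays it."""

    def __init__(self):
        self.commands = []

    def run(self, source):
        """Replay all previous commands in a fresh namespace, then the new one."""
        namespace = {}
        for previous in self.commands:
            exec(previous, namespace)
        exec(source, namespace)
        self.commands.append(source)
        return namespace


log = CommandLog()
log.run("import pprint")                            # first time: the import only
log.run("pp = pprint.PrettyPrinter(indent=4)")      # second time: import + pp
namespace = log.run("text = pp.pformat({'a': 1})")  # last time: the three statements
```

The obvious drawback is that commands with side effects (writing a file, sending a request) are repeated on every replay.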
2019-11-07
**********
Back on data persistence
"""""""""""""""""""""""""
Last time, I talked about how to add new entries in the **State**. I only need the name of
the category and the object. If I add several objects under the same entry,
they don't override each other, they are kept as a list.
.. code-block:: python
add("All_Concepts", "foo")
add("All_Concepts", "bar")
will produce something like
.. code-block:: json
{"All_Concepts" : ["foo", "bar"]}
The reason behind this choice is that, in the human world, the same name can refer to
several concepts. The first obvious cases are the homonyms: same word, but different
meaning. There are also some other cases where the meaning of the word depends on the context.
Rather than forcing the user to spend some time finding another way to express the concept
(as the name already exists), I prefer to allow the storage under the same key.
The choice of the correct item to use in the list will be done at execution time.
I also need sdp to manage the key of my object. So 'entry' will be used to group objects,
and the key will give quick access to them.
I don't want the signature :code:`add(entry, key, object)` because sometimes there is a key,
but keys are not mandatory. So I keep the signature :code:`add(entry, object)`
To manage the key, the object either is a key/value entry :code:`{key: value}` (Python dict) or
has an attribute :code:`key`, or has a method :code:`get_key()`
For example **Concepts** have a method :code:`get_key()`, so if the key of 'concept' is "foo",
the code
.. code-block:: python
add("All_Concepts", concept)
will produce something like
.. code-block:: python
{"All_Concepts" : {"foo" : concept}}
If I add another concept (concept2) which has the key "bar", I will have
.. code-block:: python
{"All_Concepts" : {"foo" : concept, "bar": concept2}}
and so on...
So under the 'All_Concepts' group, I have quick access to the concept "foo".
Note that if, for some reason, I end up with several concepts with the same key, they will
just be stacked as a list. I don't lose information.
We will talk again about sdp later.
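
The three ways of providing a key can be sketched as below. The helper :code:`resolve_key`, the explicit :code:`state` argument and the toy :code:`Concept` class are assumptions made for the example, not the real sdp API. As a simplification, objects without a key always go into a list, and mixing keyed and unkeyed objects under the same entry is not handled.

```python
def resolve_key(obj):
    """Return (key, value) for obj, or (None, obj) when it carries no key."""
    if isinstance(obj, dict) and len(obj) == 1:
        return next(iter(obj.items()))   # {key: value} entry
    if hasattr(obj, "get_key"):
        return obj.get_key(), obj        # method get_key()
    if hasattr(obj, "key"):
        return obj.key, obj              # attribute key
    return None, obj


def add(state, entry, obj):
    """Group obj under entry; keyed objects go into a dict, collisions into a list."""
    key, value = resolve_key(obj)
    if key is None:
        state.setdefault(entry, []).append(value)
        return
    bucket = state.setdefault(entry, {})
    if key not in bucket:
        bucket[key] = value
    elif isinstance(bucket[key], list):
        bucket[key].append(value)
    else:
        bucket[key] = [bucket[key], value]


class Concept:
    """Toy concept whose key is simply its name."""

    def __init__(self, name):
        self.name = name

    def get_key(self):
        return self.name


state = {}
foo, bar, other_foo = Concept("foo"), Concept("bar"), Concept("foo")
for concept in (foo, bar, other_foo):
    add(state, "All_Concepts", concept)
```

After the loop, :code:`state["All_Concepts"]["bar"]` is the single concept, while :code:`state["All_Concepts"]["foo"]` is a list holding the two concepts that share the key.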
Status
""""""
As of today, I have a first implementation of several main functionalities of the system
1. I have a good implementation of sdp
* When I say good, I talk about the coverage of the functionalities, not the efficiency of the code
* I can add objects to the state
* The objects can be saved as reference (will be explained later)
* I manage events
* I manage history
* I manage several types of serialisation
2. I have two parsers
* DefaultParser : to detect sheerka specific language (like def concept)
* PythonParser : to parse Python code.
* They are called for every new event.
3. I have a first version of the evaluators
* These are pieces of code that recognize a result and process it
* The current algo is not finished, but it works for simple cases
* I can create a new concept
* I can evaluate simple Python expressions
4. I don't have the printers, but it's ok, I just dump the result of processing
so I can type
::
def concept hello name as "hello" + name
1 + 1
sheerka.test()
I will now work on how to call an already defined concept.
2019-11-11
**********
Maintaining the blog
""""""""""""""""""""
It's not very easy to maintain this blog. Every time I have some time to work on **Sheerka**,
I must choose between expressing my ideas in this blog and coding.
I have plenty of ideas that I would like to express, sometimes just to put the idea down,
but I lack time. It would be great if I could find a tool that would allow me to just
dictate my words. I know that there are plenty out there, I need to spend some time to test
them and choose one.
2019-11-15
**********
Managing concepts resolutions
"""""""""""""""""""""""""""""
I am a little stuck on the algorithm I must use to derive (resolve) concepts. This is
one of those days when I strongly regret not having someone to discuss it with :-(
Let's write the problem down. Sometimes it helps to figure out the best approach.
::
def concept one as 1
one
The concept is first defined (it returns the number 1), and then it's called.
During the call
1. During parsing,
Both Python parser and concept parser will recognize 'one'
2. During Evaluation,
* Python Evaluator will fail (one is not known by Python)
* Concept Evaluator will succeed. My question is what should it return ?
The two options are:
1. Return a Python node, to let the Python Evaluator work and return 1 in the next pass
2. Return '1' directly
As I write it down, it is obvious that it must return 1, since the purpose of any
evaluation is to give a result, not the path to find the result.
Plus, if I don't resolve the body in the Concept Evaluator, I will lose track of where the
'1' comes from.
I don't know if I was clear. I don't even know if I will be able to re-read myself.
But I think that I have my solution.
2019-11-16
**********
ExactConceptParser limitation
"""""""""""""""""""""""""""""
From the beginning, my simplest example is to show that addition can be simply
explained to Sheerka
::
def concept a plus b as a + b
def concept one as 1
def concept two as 2
one plus two
The :code:`one plus two` is perfectly recognized, and the result is 3.
:code:`two plus one` also works (with the correct response).
But I was quite surprised to see that :code:`one plus one` was not recognized !!
Indeed, the **ExactConceptParser** looks for :code:`__var0__ plus __var1__`. So
the first operand and the second have to be different.
It's unexpected :-(
Do I need to enhance the parser to recognize it, or do I need to build another parser ?
If I tell the parser :code:`a plus b`, how do I handle the cases where 'a' and 'b'
MUST be different ? How do I handle the cases where they explicitly have to be the same ?
It seems that the purpose of the **ExactConceptParser** is to find exact matches.
I need another way to express that 'a' and 'b' can be the same.
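
One possible way out, sketched under my own assumptions (this is not how the **ExactConceptParser** works): bind the variables by position, so that two different variables may receive the same value, while reusing the same variable name expresses "must be the same".

```python
def match(pattern, variables, tokens):
    """Match tokens against pattern; return the variable bindings or None."""
    if len(pattern) != len(tokens):
        return None
    bindings = {}
    for expected, actual in zip(pattern, tokens):
        if expected in variables:
            # A variable binds to any token, but must stay consistent if reused.
            if bindings.setdefault(expected, actual) != actual:
                return None
        elif expected != actual:
            return None  # literal words must match exactly
    return bindings


same_operands = match(["a", "plus", "b"], {"a", "b"}, ["one", "plus", "one"])
must_be_equal = match(["a", "plus", "a"], {"a"}, ["one", "plus", "two"])
```

Here :code:`same_operands` binds both 'a' and 'b' to 'one', and :code:`must_be_equal` is :code:`None` because the repeated variable received two different values. The "MUST be different" case is not covered and would need an extra constraint.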
2019-11-21
**********
MemoryFS, is it a joke ?
"""""""""""""""""""""""""""""
I spent this day working on improving the test performance. By default Sheerka
persists its data on the file system (even if I said that where the data is saved
is not important for the sdp module).
For each test, a folder is initialized to hold concept information. And this folder
is destroyed after usage. For almost every single test !
So I decided to implement fs.MemoryFS. Information in memory is supposed to be
faster than on the disk !
I was very disappointed, after an afternoon of refactoring, to find that it is actually slower
than the native io implementation.
Even now that I am writing it, I just can't believe it. I must have implemented
it wrong. But the profiling shows that the time is lost in the lower layers of the
FS library.
It's a shame !
2019-12-01
**********
Using BNF to define concept
"""""""""""""""""""""""""""""
I always knew that there would be several ways to define the body of a concept (same
goes for the 'pre', 'post' and 'where' parts). It can be defined as Python code,
or something that is related to concepts. It can even be a new language that I will
design. The important point is that, contrary to traditional programming languages,
Sheerka must remain extensible.
Same goes for the definition of the name.
The traditional form is:
::
def concept foo bar baz as ...
So the concept is defined by the sequence 'foo', then 'bar' then 'baz'. In this order.
Another way is
::
def concept a plus b where a,b as ...
In this form, a and b are supposed to be variables.
It will be matched against :code:`one plus two`.
The concept name is 'a plus b'. It is a quick way to declare a concept with variables,
but if someone defines another concept
::
def concept number1 plus number2 where number1,number2 as ...
This will produce another concept (with the same key, though). I guess that, at
some point, Sheerka will be able to detect that the concepts are the same, but
the name of the concept includes its variables, which may be annoying in some
situations.
Plus, it's not possible to define precedence rules in this way. For example,
::
def concept a plus b as ...
def concept a times b as ...
How do you express that multiplications have a higher priority in, for example,
:code:`one plus two times three` ?
The only right answer, at least to me, is to implement something that is inspired
by the BNF definition of a grammar.
So the definition of the concept will look like
::
def concept term as factor (('+' | '-') term)?
def concept factor as number (('*' | '/') factor)?
def number where number in ['one', 'two', 'three'] as match(body, 'one', 1, 'two', 2, 'three', 3)
This form seems great, but in the definition of term and factor, there is no more
room for the real body, i.e. once the components are recognized, what do we do with them ?
So we can try
::
def concept factor (('+') factor)* as factor[0] + factor[i]
def concept number (('*') number)? as number[0] * number[i]
def number where number in ['one', 'two', 'three'] as match(body, 'one', 1, 'two', 2, 'three', 3)
The body is defined, but the name of the concept is too complicated, e.g. factor (('+') factor)*
It's nearly impossible to reference a concept that is defined in this way.
So my last proposal, which marries the two ideas, is to introduce the two keywords 'using' 'bnf'
.. _bnf : https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form
::
def concept term using bnf factor (('+' | '-') term)? as factor + (or -) term
def concept factor using bnf number (('*' | '/') factor)? as number * (or /) factor
def number where number in ['one', 'two', 'three'] as match(body, 'one', 1, 'two', 2, 'three', 3)
In my implementation:
* Terminals are between quotes
* Sequences are separated by whitespaces
* '|' (vertical bar) is used for alternatives
Like in regular expressions, you will also find
* '*' (star) is used to express zero or many
* '+' (plus) to express one or many
* '?' (question mark) to express zero or one
For those who don't know what BNF stands for, please have a look at the bnf_
wikipedia page.
I guess that I will need a complete chapter to explain how you retrieve what was parsed.
2019-12-21
**********
Implementing Inheritance
""""""""""""""""""""""""
Except that it is not inheritance, at least not the way it is seen in modern programming languages.
I think that I should first express what I am trying to do. I guess that it will help me
have a better understanding myself.
::
def concept one as 1
def concept two as 2
one is a number
two is a number
When I enter :code:`one`, the result should be :code:`1`
But I should be able to express other concepts by using
::
def concept a plus b where a is a number and b is a number as a + b
Just by reading what I have just written, we can see that 'is a' has two separate meanings.
In the first usage, it's an affirmation, in the latter one, it's a question.
Should we consider them as the same concept, with two usages, or as two separate concepts,
which are somehow linked ?
As of now, there is only one usage to all concepts, which is the property 'BODY', but I have
prepared the property 'PRE' which can be used for that.
I am digressing a little bit. The original subject was how I can express that a
concept is an element of another concept. We may focus on the implementation later.
So saying that 'one' is a 'number' means that there is a set called 'number'
to which 'one' belongs.
The simple implementation will be to create an entry 'all_number' in sdp and to add 'one' to it.
The two issues that I foresee are:
* What about infinite sets ? (my set 'number' can never be completed if I add the items one by one)
* What if the same name refers to different sets ? (I don't have any example in mind, but I guess that homonyms of sets do exist)
For the two questions, I will first try the simple implementations and see where I go from there, i.e. :
* on top of the entry all_number which lists the known numbers, you can define concepts :code:`is a number`
that can also be used to detect that the concept is part of the set
* the entry in sdp will not be all_number, but all_id_of_number. I will use the concept id instead of its name
2019-12-24
**********
Going back to the BNF implementation. As it's Christmas eve today, I won't stay very long.
So, the implementation lies in the class BnfNodeParser, and it's a lexer not for tokens, but for concepts.
The purpose of this class is to recognize a sequence of Concepts.
So if we define the following concepts
::
def concept foo from bnf one two three
def concept bar from bnf four five
when you input
::
one two three four five
the list of :code:`[foo, bar]` will be returned by the BnfNodeParser (as return values).
How does it work ?
As explained in the code, my implementation is highly inspired by the Arpeggio project. To define your grammar, you
use **ParsingExpressions**. There are several types
* some are used to recognize tokens: StrMatch, ConceptExpression
* others are used to tell how to recognize them: Sequence, OrderedChoice, Optional, OneOrMore, ZeroOrMore...
Some examples :
::
to recognize 'foo' -> StrMatch("foo")
to recognize 'foo bar' -> Sequence(StrMatch("foo"), StrMatch("bar"))
to recognize 'foo' or 'bar' -> OrderedChoice(StrMatch("foo"), StrMatch("bar"))
and so on...
So when a concept is defined using its bnf definition, I use the **BnfDefinitionParser** to create the grammar, and then
I use the **BnfNodeParser** to recognize the concepts.
The current implementation to recognize a concept is not very efficient. All the definitions are in a dictionary
and I go through the whole dictionary to see if some concepts are recognized. Once a concept is found, I loop again
over the whole dictionary to find the next concept.
| -> I need a btree to order the concept
| -> I need a predictive algorithm to guess the next concept
But it is for later.
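
To make the mechanism more concrete, here is a toy version of the idea. The class names mirror the ones listed above, but these are simplified stand-ins written for this example: they are neither Arpeggio's API nor the real Sheerka classes, and they work on a plain list of words.

```python
class StrMatch:
    """Terminal: matches one literal token."""

    def __init__(self, text):
        self.text = text

    def parse(self, tokens, pos):
        if pos < len(tokens) and tokens[pos] == self.text:
            return pos + 1
        return None


class Sequence:
    """Matches every sub-expression, one after the other."""

    def __init__(self, *exprs):
        self.exprs = exprs

    def parse(self, tokens, pos):
        for expr in self.exprs:
            pos = expr.parse(tokens, pos)
            if pos is None:
                return None
        return pos


class OrderedChoice:
    """Matches the first sub-expression that succeeds."""

    def __init__(self, *exprs):
        self.exprs = exprs

    def parse(self, tokens, pos):
        for expr in self.exprs:
            end = expr.parse(tokens, pos)
            if end is not None:
                return end
        return None


# One grammar per concept, all stored in a dictionary.
grammars = {
    "foo": Sequence(StrMatch("one"), StrMatch("two"), StrMatch("three")),
    "bar": Sequence(StrMatch("four"), StrMatch("five")),
}


def recognize(tokens):
    """Scan the whole dictionary at each position, as described above."""
    found, pos = [], 0
    while pos < len(tokens):
        for name, grammar in grammars.items():
            end = grammar.parse(tokens, pos)
            if end is not None:
                found.append((name, pos, end - 1))  # (concept, start, end)
                pos = end
                break
        else:
            return None  # no concept matches at this position
    return found


nodes = recognize("one two three four five".split())
```

:code:`nodes` contains 'foo' for tokens 0 to 2, then 'bar' for tokens 3 to 4. The nested loop is exactly the inefficiency mentioned above.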
Once the parsing succeeds, I return a **ConceptNode** object

.. code-block:: python

    class ConceptNode(LexerNode):
        """
        Returned by the BnfNodeParser
        It represents a recognized concept
        """

        def __init__(self, concept, start, end, tokens=None, source=None, underlying=None):
            super().__init__(start, end, tokens, source)
            self.concept = concept
            self.underlying = underlying

            if self.source is None:
                self.source = BaseParser.get_text_from_tokens(self.tokens)

concept
    | Remember that all grammars are listed in a dictionary of .
    | So when a parsing expression is verified, it's easy to link it with the concept

start
    position of the first token

end
    position of the last token

tokens
    list of tokens that are recognized

underlying
    **NonTerminalNode** or **TerminalNode** that wraps the underlying **ParsingExpression** used to recognize the concept

source
    | The source is deduced from the tokens
    | But in the unit tests, it is given directly, for speed and simplicity

What is the difference between the **[Non]TerminalNode** and the **ParsingExpression** ?

The ParsingExpression
    defines how to recognize a concept

The [Non]TerminalNode
    represents what was found. So, similarly to the ConceptNode, you will find the start, end and tokens attributes

That's all for today !

2019-12-27
**********
How to manage variables resolutions
"""""""""""""""""""""""""""""""""""
I have to admit that I am a little bit stuck with how to manage variable resolution with PythonEvaluator.
What is expected by the expression depends on the expression itself.
Let's see an example
::
def concept one as 1
def concept two as 2
eval one + two
In this situation, I expect PythonEvaluator to resolve the concepts 'one' and 'two' and to return 1 + 2, hence 3
In this other situation
::
def concept one as 1
def concept desc a as sheerka.desc(a)
desc one
I expect PythonEvaluator NOT to resolve the concept one and to pass it straight to the function.
Unfortunately for me, in the current implementation, 'a' is resolved to the concept 'one', which is resolved to its
body "1". So the call fails, as there is no concept 1 (moreover, 1 is an integer, it's not even the string "1").
There may also be some cases where 'sheerka.desc()' expects the name of a concept (and the resolution of the concept
will be done inside the function). In this case, it's neither the body nor the concept itself that is required, but the name
of the concept.
So here are three cases where the behaviour of PythonEvaluator is required to be different. I cannot hard code these
behaviours as they depend on the context.
The global idea to resolve this situation is to give Sheerka a memory. What I am currently working on is the possibility
**to create** and **to recognize** concepts. As a reminder :
You can create simple concepts
::
def concept one as 1
or concepts using bnf
::
def concept twenties from bnf twenty (one | two | three...)=unit as 20 + unit
Both can be recognised.
But if I define
::
def concept a plus b as a + b
:code:`one plus two` will be recognized, but :code:`twenty two plus one` is not correctly implemented yet.
To go back to my issue with variable resolution in PythonEvaluator, the idea is to implement rules that will
recognize the concept, so you will tell Sheerka whether the value, the concept or the name is expected.
I am far from implementing the rules. To be honest, I don't even know yet what they will look like.
So I am going to introduce the keyword :code:`concept:name:` or :code:`c:name:`
It will mean that the concept is required.
If the name is required, you can use :code:`"'name'"` or :code:`'"name"'`.
It's already working. There is nothing to do for this one.
2020-01-07
**********
How do we perform the parsing ?
"""""""""""""""""""""""""""""""
The basic flow of an execution is :
* Parse the data -> Nodes
* Evaluate the nodes -> Concepts
* Display the results
The theory says that there can exist as many parsers as necessary. Each one of them will
be specialized to recognize a specific pattern. They will then send their information to
the evaluators.
As of now, I have implemented the following parsers:
* EmptyStringParser
To recognize empty strings and react accordingly
* PythonParser
To recognize Python source code
* ExactConceptParser
To recognize simple forms of concepts
* DefaultParser (the name is not accurate)
To recognize builtin syntax (like 'def concept' or 'isa')
* BnfNodeParser
To recognize concepts defined with the BNF language
All these parsers are executed in a row (the order is not very important).
The first observation is that there is a lot of CPU waste. Most of the time (at least as of
now), when there is a match with one parser, the others fail. So there is no need to
execute them.
The second point is that there is no way for a parser to use the result of another.
My idea is to have parsers that can be chained, each one of them will do the little thing
it is capable of before leaving the rest to some more powerful parser.
I don't want to bring out the big guns for every single user input. And I certainly
don't want a massive and overly complex parser that will be capable (in theory) of everything.
Why ?
| First of all, monolithic code is bad :-)
| Then I have to keep in mind that the process will be somehow distributed
| And last, but not least, I don't have (and I certainly will never have) complete coverage
of all possible parsing situations. So what I need is a plug and play system where I can add,
remove and chain parsers, depending on the input.
So,
* I'll give all parsers a priority
* The parsers with the highest priority will be executed first
* The parsers with the same priority will be executed at the same time (the order does not matter)
* If, for a given priority, there is a match, the parsers with a lower priority won't be executed
* A parser has access to the output of the parsers of higher priorities (which were executed before it)
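
The rules above can be sketched as follows. Everything here is an assumption made for the example: a parser is a plain function returning :code:`(success, output)`, the three toy parsers are invented, and parsers of the same priority are simply run one after the other instead of in parallel.

```python
from itertools import groupby


def run_parsers(parsers, text):
    """parsers: list of (priority, name, parse); parse(text, outputs) -> (success, output)."""
    ordered = sorted(parsers, key=lambda p: p[0], reverse=True)
    outputs = {}  # name -> output of the parsers of higher priorities
    for _, group in groupby(ordered, key=lambda p: p[0]):
        matched, group_outputs = [], {}
        for _, name, parse in group:
            success, output = parse(text, outputs)
            group_outputs[name] = output
            if success:
                matched.append((name, output))
        if matched:
            return matched             # lower priorities are not executed
        outputs.update(group_outputs)  # visible to the next priority only
    return []


def empty_parser(text, outputs):
    return text.strip() == "", None


def tokenizer(text, outputs):
    return False, text.split()  # never a final match, it only prepares the tokens


def number_parser(text, outputs):
    tokens = outputs["tokenizer"]  # reuse the work of a higher priority parser
    return all(token.isdigit() for token in tokens), tokens


parsers = [
    (10, "tokenizer", tokenizer),
    (10, "empty", empty_parser),
    (5, "number", number_parser),
]

numbers = run_parsers(parsers, "1 2 3")
blank = run_parsers(parsers, "   ")
```

For :code:`"1 2 3"`, nothing matches at priority 10, so the number parser runs and reuses the tokens. For the blank input, the empty parser matches and the number parser is never called.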
2020-01-11
**********
Status
""""""
Last status was back in November. At that time I could
::
def concept hello name as "hello" + name
1 + 1
sheerka.test()
1. I can evaluate concepts
::
def concept hello a where a
hello kodjo
2. I have worked on BNF definition of the concept
::
def concept twenties from bnf 'twenty' (one | two | three)=unit as 20 + unit
twenty one
eval twenty one
3. I can mix complex concepts (concepts with more than one word) and Python
::
twenty one + twenty two
twenty one + one does not work :-(
4. I have a basic implementation for logging. With control of the verbosity
5. The result of a user input evaluation is now persisted, alongside with the event
that was used for it.
2020-04-18
**********
Blog
""""""
It's been a (very) long time since I have written in this blog.
The main reason is that I found reStructuredText markup too complicated. I'm still not used to how directives are
supposed to work. There are so many ways to do the same thing !
I guess that it's also because I don't have the proper tool to write this doc.
I use PyCharm and, though I have the basic rendering, I cannot easily navigate between
the articles.
I need to install Sphinx. I want it in a docker. For sure it's not mandatory, but I must practice my
docker skills if I don't want to forget everything.
Parsers
"""""""
As I keep repeating, parsing expressions is a very big part of what I want to achieve (alongside the
rule engine and the speech recognition).
It has to be very easy to express a new concept
::
def concept one as 1
def concept two as 2
That's it !
I should now be able to do
::
one + one
one + two
Now, I can decide that plus is also a concept
::
def concept a plus b as a + b
So basically, every time Sheerka parses something 'plus' something else, it will recognize the concept :code:`a plus b`
::
one plus two
worked, but
::
one plus one
doesn't. Because 'a' and 'b' are two different letters, so it was looking for two different values. That was
an unexpected side effect of my first naive implementation.
Let's put that aside for the moment and keep on our exercise to model the world.
After addition, it would be good to have multiplication. Easy
::
def concept a mult b as a * b
So I can try
::
one plus two mult three
Of course, this one does not work by magic. The precedence (priority ?) between addition and multiplication
was not respected.
.. _bnf: https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form
The first idea was the bnf_ parser in order to be able to write something like
::
def concept plus from bnf mult ('plus' mult)*
def concept mult from bnf number ('mult' number)*
def concept number from bnf one|two
Expressing recursive concepts was simple. I was proud of this implementation: :code:`one plus two mult three`
was understood in the correct way.
But it started to become complicated when I wanted to define the body. In ':code:`mult (plus mult)*`', where
is the left part ? Where is the right part ?
Ok, let's try something like
::
def concept plus from bnf mult=a ('plus' mult)*=b
def concept mult from bnf number=a ('mult' number)*=b
def concept number from bnf one|two
We now have 'a' for the left part, and a potential list of 'b' for the right part.
The full definition of the concept :code:`plus` will look like

::

    def concept plus from bnf mult=a ('plus' mult)*=b as:
        res = a
        for value in b:
            res += value
        return res

This should work fine. In my current implementation, 'a' is an instance of the concept 'mult', correctly
initialized with the concept one or the concept two, and likewise 'b' is a list of 'mult' concepts.
So it should work.
It's just that I have never gone this far in the tests. I just couldn't. THIS IS WAY TOO COMPLICATED
TO DEFINE A SIMPLE ADDITION !!!
**Note** that you must have quotes surrounding the 'plus' in the definition, to distinguish between
the concept and the literal. It's necessary, but when you start to do that, you start to narrow the usage
of your system to developers only. So, even if there is no other way, I didn't really like that.
.. _IronPython: https://ironpython.net/
.. _parsec: https://github.com/jparsec/jparsec
.. _Holy Grail: https://www.youtube.com/watch?v=YxG5mDItkGU
.. _one: https://en.wikipedia.org/wiki/Shunting-yard_algorithm
So am I done ? Is this the end ? There should be another way to express the priority (precedence ?) between the concepts.
Luckily for me, I remembered that I had once seen an implementation of the Python parser (IronPython_ I think) where they
used numbers to evaluate the precedence between additions and multiplications. And there was also something
like that when I used the parsec_ parser.
So I went back to the internet and found my `Holy Grail`_, well not this one, this one_.
**The Shunting Yard Algorithm**
It took me a few days to understand it and implement it in its basic form (which is already too long),
but it took me one entire month to adapt it to the concepts. I know, I am not quick :-)
As a matter of fact, the sya (Shunting Yard Algorithm) is designed for binary operators and functions where the number
of arguments is known. You can support unary operators, but nothing is explained for ternary operators and beyond.
Dealing with concepts that can be expressed as :code:`'foo a b'` (suffixed concept) or :code:`'a b bar'`
(prefixed concept) was an interesting challenge!
Anyway, I am now in a position where I can simply define my addition and my multiplication
::
> def concept a plus b as a + b
> def concept a mult b as a * b
> eval one plus two mult three
> 7
That's it !
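
For readers who don't know the algorithm, here is its basic form, reduced to left-associative binary operators. It is a sketch written for this example, not the version adapted to concepts: the three tables are hard coded assumptions (including the precedence values).

```python
PRECEDENCE = {"plus": 1, "mult": 2}
OPERATIONS = {"plus": lambda a, b: a + b, "mult": lambda a, b: a * b}
VALUES = {"one": 1, "two": 2, "three": 3}


def to_postfix(tokens):
    """Shunting yard, for left-associative binary operators only."""
    output, stack = [], []
    for token in tokens:
        if token in PRECEDENCE:
            # Pop the operators that bind at least as tightly as the new one.
            while stack and PRECEDENCE[stack[-1]] >= PRECEDENCE[token]:
                output.append(stack.pop())
            stack.append(token)
        else:
            output.append(token)
    while stack:
        output.append(stack.pop())
    return output


def evaluate(postfix):
    """Evaluate the postfix form with a simple value stack."""
    stack = []
    for token in postfix:
        if token in OPERATIONS:
            right, left = stack.pop(), stack.pop()
            stack.append(OPERATIONS[token](left, right))
        else:
            stack.append(VALUES[token])
    return stack[0]


postfix = to_postfix("one plus two mult three".split())
result = evaluate(postfix)
```

:code:`postfix` is :code:`['one', 'two', 'three', 'mult', 'plus']` and :code:`result` is 7.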
At least in theory. The definition and the parsing of the concepts are done and fully tested when you
programmatically set the precedences. I now need a way to define/express the priorities.
What I surely don't want is to write something like:
::
plus.precedence = 1
mult.precedence = 2
or
::
set_precedence(plus, 1)
set_precedence(mult, 2)
Any solution where you have to give the actual value of the precedence is a bad solution. I would like to
have something like
::
precedence mult > precedence plus
or
::
mult.precedence > plus.precedence
It means that I now have to implement a partitioning algorithm with simple constraints (<, >). I think that I will
include <=, >=, = and != as well, once and for all. Sorting things according to these constraints is something
humans naturally do.
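
For the strict constraints only, a topological sort already gives usable precedence values. The sketch below relies on :code:`graphlib` from the standard library (Python 3.9+). The function name and the 'power' concept are made up for the example, and the other operators (<=, >=, =, !=) are not handled.

```python
from graphlib import TopologicalSorter


def rank(constraints):
    """constraints: iterable of (higher, lower) pairs, meaning higher > lower."""
    graph = {}
    for higher, lower in constraints:
        graph.setdefault(higher, set()).add(lower)  # 'higher' comes after 'lower'
        graph.setdefault(lower, set())
    levels = {}
    # static_order() yields a name only after everything it must be greater than.
    for name in TopologicalSorter(graph).static_order():
        levels[name] = 1 + max((levels[low] for low in graph[name]), default=0)
    return levels


levels = rank([("mult", "plus"), ("power", "mult")])
```

:code:`levels` maps 'plus' to 1, 'mult' to 2 and 'power' to 3, without anybody having to type the numbers. Contradictory constraints (a > b and b > a) make :code:`static_order()` raise :code:`graphlib.CycleError`.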
2020-05-01
**********
Blog
""""""
Hi, I have the feeling that I am almost there with the parsers part. I have