
2019-10-30
**********

What is Sheerka?
""""""""""""""""

Sheerka is a *communication* language,
as opposed to the traditional *programming* languages. Its
purpose is to ease the communication between the (wo)man and the machine,
ultimately using the voice. I will first use it to program faster, and maybe
more easily.

.. _ulysse31: https://fr.wikipedia.org/wiki/Ulysse_31

Where does the name Sheerka come from?
""""""""""""""""""""""""""""""""""""""
Sheerka is my misspelling of Shyrka, from my childhood anime ulysse31_.
For those who don't know this old cartoon, it's Homer's Odyssey
transposed to the 31st century. Ulysses has a spacecraft with an AI named Shyrka.

I was a great fan of this cartoon when I was young. I thought that the idea of
bringing the ancient story of Ulysses into the future was brilliant.

Ever since then, Sheerka has been my reference for any sophisticated computer. Unfortunately
for me, at that time there was no Wikipedia to tell me the correct spelling.

Model v0
""""""""
In my view, everything begins with the **Events**. Basically, they are the commands (i.e. requests)
entered by the users.

The events are parsed to understand what is required, and they produce a new **State**.
The state is like a big dictionary that holds everything that is known by the system.

Most of the elements saved in the **State** are **Concepts**. In this first version,
it's a little bit complicated to define what a **Concept** is, as it can have several
usages. To keep it simple, I will say that a **Concept** is an idea that can be
manipulated by the rest of the system.
I am pretty sure that its form and usage will evolve as I manipulate
them.

- Each **State** has a reference to the event(s) that triggered this state
- Each **State** has a **history**
- Each **Concept** has a **history**


A **history** is a triplet of

- user name
- modification date
- digest of the parent

.. _git: https://git-scm.com/

Personally, I have taken this way of tracking modifications from how it's done in git_.
I guess Linus Torvalds took it from somewhere.


2019-10-31
**********

More on Concepts
""""""""""""""""
To define a new concept

::

    def concept hello a as "hello" + a


Note that the traditional quotes that would surround 'hello' and 'a' are not necessary.
In this example 'a' is a variable, as it appears as a variable in the 'as' section (while "hello"
appears there as a string).

So, you could call the concept by

::

    hello kodjo
    hello my friend

They will produce the strings "hello kodjo" and "hello my friend".

About versioning
""""""""""""""""
As I said previously, I mimic how git_ versions its objects.

::

    Obj v0: parents = []
            user name = <a name>
            modification date = <a date>
            digest = xxxxx

    Obj v1: parents = [xxxxx]
            user name = <a name>
            modification date = <a date>
            digest = yyyyy

    Obj v2: parents = [yyyyy]
            user name = <a name>
            modification date = <a date>
            digest = zzzzz

    and so on...

I always keep a reference to the last version of the object, so I can navigate through
the versions using the :code:`parents` attribute of the object.
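
The chain above can be sketched in a few lines of Python. This is only an
illustration under assumptions: the names :code:`Version`, :code:`save`,
:code:`store` and :code:`walk_history` are hypothetical, and SHA-1 over a JSON dump
is just one possible way to compute a digest.

.. code-block:: python

    import hashlib
    import json
    from dataclasses import dataclass, field
    from datetime import datetime


    @dataclass
    class Version:
        """One version of an object (hypothetical structure, for illustration)."""
        content: str
        user_name: str
        modification_date: str
        parents: list = field(default_factory=list)

        def digest(self):
            # Like a git commit, this digest covers the whole record,
            # history fields included.
            payload = json.dumps(
                [self.content, self.user_name, self.modification_date, self.parents]
            )
            return hashlib.sha1(payload.encode("utf-8")).hexdigest()


    store = {}  # digest -> Version


    def save(content, user_name, parent=None):
        """Store a new version and return its digest."""
        version = Version(
            content=content,
            user_name=user_name,
            modification_date=datetime.now().isoformat(),
            parents=[parent] if parent else [],
        )
        key = version.digest()
        store[key] = version
        return key


    def walk_history(last_digest):
        """Yield versions from newest to oldest, following the first parent."""
        current = last_digest
        while current is not None:
            version = store[current]
            yield version
            current = version.parents[0] if version.parents else None


    v0 = save("hello", "a_user")
    v1 = save("hello world", "a_user", parent=v0)
    v2 = save("hello world!", "a_user", parent=v1)
    contents = [version.content for version in walk_history(v2)]

Only the digest of the last version (:code:`v2` here) has to be remembered; the older
versions are reached through :code:`parents`.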

In git_, there are basically two types of objects:

- **content** (file content, or directory structure)
- **reference** to content (commit or tags)

The hash of a **content** object depends only on the content itself, while the hash of a
**reference** also depends on the user name, the modification date and the parents. In both
cases, the hash is computed on the whole object, so it can also be used to check the integrity
of an object.

For my objects, I need to decide how I compute the hash.

**Concepts** have a history. If I decide to include the history in the hash,
then, as the modification date is :code:`datetime.now()`, a new version will be created
even if the **Concept** has not changed. If I don't include it, the integrity of
what is saved is no longer guaranteed.

I choose to value identity over integrity: the hash code of a **Concept** does not depend
on its history. We will see what the future says about this.
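
The trade-off can be illustrated with two digest functions. This is a sketch under
assumptions: the dictionary layout and the names :code:`content_digest`,
:code:`full_digest` and :code:`make_concept` are hypothetical, not the actual format
of a **Concept**.

.. code-block:: python

    import hashlib
    import json
    from datetime import datetime


    def content_digest(concept):
        """Digest that ignores the history: same content, same digest."""
        payload = json.dumps(concept["content"], sort_keys=True)
        return hashlib.sha1(payload.encode("utf-8")).hexdigest()


    def full_digest(concept):
        """Digest over the whole record, history included."""
        payload = json.dumps(concept, sort_keys=True)
        return hashlib.sha1(payload.encode("utf-8")).hexdigest()


    def make_concept(name, body, date):
        """Build a hypothetical concept record with its history."""
        return {
            "content": {"name": name, "body": body},
            "history": {
                "user_name": "a_user",
                "modification_date": date,
                "parent": None,
            },
        }


    # The same concept saved twice, five minutes apart.
    first = make_concept("hello", '"hello" + a', datetime(2019, 10, 31, 10, 0).isoformat())
    second = make_concept("hello", '"hello" + a', datetime(2019, 10, 31, 10, 5).isoformat())

    same_identity = content_digest(first) == content_digest(second)
    same_full = full_digest(first) == full_digest(second)

With the content-only digest the two saves are recognized as the same concept, while
the full digest differs only because of the date. The price is that a change in the
history fields alone would go undetected.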

2019-11-01
**********

Inspired by CodinGame
"""""""""""""""""""""


.. _codingame: https://www.codingame.com/home

I am trying to teach my little kid how to code. He is 12 years old and it was his very
first time.

Rather than trying a standard formal approach, we went to the codingame_ web site. There
are pros and cons to using this platform, especially for complete beginners, but
I like the visual output of the programs. It's really like coding a game!

What I hadn't noticed before is that (at least for the first programs) the solution
is given in human language.

For example, for the exercise called "The descent" you will find

::

    For each round of play :
        Reset the variables containing the index of the highest mountain and its height to 0
        For each mountain index (from 0 to 7 included) :
            Read the height of the mountain (variable 'mountainH') from stdin
            If it's higher than the highest known mountain, save its index and height
        Returns the index of the highest mountain on stdout

It would be great if Sheerka were able to produce some code from these instructions :-)
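
For reference, here is a hand-written Python translation of the instructions for one
round of play. It is not something produced by Sheerka, and it makes one assumption
for the sake of a self-contained example: the heights are passed as a list instead of
being read from stdin, and the function name :code:`highest_mountain` is made up.

.. code-block:: python

    def highest_mountain(heights):
        """Return the index of the highest mountain for one round of play."""
        # Reset the variables containing the index of the highest mountain
        # and its height to 0
        highest_index = 0
        highest_height = 0
        # For each mountain index (from 0 to 7 included in the exercise)
        for index, mountain_h in enumerate(heights):
            # If it's higher than the highest known mountain,
            # save its index and height
            if mountain_h > highest_height:
                highest_index = index
                highest_height = mountain_h
        # The exercise writes this index on stdout; here it is returned
        return highest_index


    target = highest_mountain([9, 8, 7, 3, 6, 5, 2, 4])

Each sentence of the human-language solution maps to one or two lines of code, which
is what makes this kind of text an interesting input.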

Some words on data persistence
"""""""""""""""""""""""""""""""""""""""""
As I previously said (or not), the main difference between Sheerka and other languages
is that Sheerka has a memory of its (her? :-) previous interactions with the users.

The **Concepts**, as well as the **Events** and the **Rules**, are persisted. Because of
that, I think that the more Sheerka is used, the easier it will be to use.

So my first focus was to decide which database to use.

There are tons of different databases already on the market. Unfortunately for me, I'm not
a database expert. But I already knew that I was not looking for a traditional
relational database (RDBMS), as the structure will evolve and I didn't want to spend
my time redesigning the schemas and the constraints.

As I was learning Python, it could have been a good idea to also start looking at an
existing NoSQL database. I started to look at MongoDB, but I got lazy. I knew that
the top feature I needed was the management of history (the way git does it),
and it was not provided by Mongo, or I didn't notice it in my first readings on the subject.

So I decided to design and implement my own database.


SheerkaDataProvider (sdp)
"""""""""""""""""""""""""
Not a great name, I confess. But who cares?

What are the main design constraints?

::

    1. No coupling to the filesystem.
        We must not care about where the data is stored.
        The first implementation will be file based, but it has to be extensible.
        The final target is a decentralized persistence system.
    2. CRUD operations are designed according to my needs.
        I don't want standard CRUD operations that I will have to tweak.
        The direct consequence is that sdp won't fit any other purpose.
    3. History management for State and other objects for free.


sdp, like many modern database systems, is a dictionary: a big list of key-value pairs.
The key is a string, the value can be almost anything. Actually, for my needs, I guess
that I only need strings, numbers and lists (of strings and numbers :-)

JSON also provides true, false and null, so I guess that I will also need them.

I need at least one level of categorization, meaning that my objects can be grouped.
The basic signature to add a new element is :code:`add(entry, obj)`.

with

::

    entry : the group / category where I want to put the object
    obj   : the object to persist

With :code:`add("All_Concepts", "foo")` the database, let's call it **State** once and for all, will be updated like this:

.. code-block:: json

    {"All_Concepts" : "foo"}

If I want to have another entry, I don't want to care about what was previously done. I
need the second call :code:`add("All_Concepts", "bar")` to produce

.. code-block:: json

    {"All_Concepts" : ["foo", "bar"]}


So we are no longer in the usual way of implementing a CRUD.
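
A minimal sketch of this behaviour, assuming the **State** is a plain in-memory
Python dict (the real sdp also handles persistence and history, which are left out
here):

.. code-block:: python

    state = {}


    def add(entry, obj):
        """Add obj under entry without overriding what is already there."""
        if entry not in state:
            state[entry] = obj                   # first object: stored as is
        elif isinstance(state[entry], list):
            state[entry].append(obj)             # already a list: append
        else:
            state[entry] = [state[entry], obj]   # second object: promote to a list


    add("All_Concepts", "foo")
    after_first = dict(state)   # snapshot: {"All_Concepts": "foo"}
    add("All_Concepts", "bar")  # state is now {"All_Concepts": ["foo", "bar"]}

One limit of this naive version: if the stored object is itself a list, it cannot be
told apart from a list of stacked objects.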


2019-11-06
**********

Input processing
"""""""""""""""""
The basic processing flow should be

::

    1. parsers
    2. evaluators
    3. printers

So, for each new input, all known parsers will try to recognize the input. Each parser will
return a triplet of :code:`(status, concept found (or node found), text message)`

This list of triplets is given to the evaluators. In the same way, there should be multiple
types of evaluators. There will also be the rules, which will be introduced later.

All evaluators will provide a list (I guess it will be triplets as well) to the printers.
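
The flow can be sketched as follows. Everything here is hypothetical and much simpler
than the real thing: the function names, the naive :code:`def concept` detection and
the fact that the "printer" is just :code:`str`.

.. code-block:: python

    import ast


    def python_parser(text):
        """Try to recognize the input as a Python expression."""
        try:
            node = ast.parse(text, mode="eval")
            return (True, node, "python expression")
        except SyntaxError as error:
            return (False, None, str(error))


    def concept_parser(text):
        """Try to recognize a 'def concept ...' input (very naive)."""
        prefix = "def concept "
        if text.startswith(prefix):
            return (True, text[len(prefix):], "concept definition")
        return (False, None, "not a concept definition")


    def python_evaluator(results):
        """Evaluate every successfully parsed Python expression."""
        evaluated = []
        for status, node, message in results:
            if status and isinstance(node, ast.Expression):
                value = eval(compile(node, "<input>", "eval"))
                evaluated.append((True, value, message))
        return evaluated


    def process(text, parsers, evaluators):
        """Run the three steps: parsers, evaluators, printers."""
        parsed = [parser(text) for parser in parsers]        # 1. parsers
        evaluated = []
        for evaluator in evaluators:                         # 2. evaluators
            evaluated.extend(evaluator(parsed))
        return [str(value) for _, value, _ in evaluated]     # 3. printers


    output = process("1 + 1", [concept_parser, python_parser], [python_evaluator])

Every parser sees every input, and each evaluator picks from the list of triplets the
results it knows how to handle.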

Python processing
"""""""""""""""""
Sheerka natively understands Python, so it will be able to execute Python code.
I will deal later with the issues caused by the different versions of Python, or the fact
that some external modules must remain isolated (maybe using virtualenv).

My first problem is to correctly implement the :code:`eval / exec` function.

I don't know why, but Python has two similar functions to do the same thing. One must use
eval to evaluate an expression, or exec to execute code. There must be an explanation but,
as for now, it seems to be a complication for nothing.
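
For what it's worth, the split mirrors Python's distinction between expressions
(which have a value) and statements (which don't). A small sketch of the difference,
using a throwaway :code:`namespace` dict:

.. code-block:: python

    namespace = {}

    # eval() takes a single expression and returns its value.
    value = eval("1 + 1", namespace)

    # exec() takes statements (assignments, imports, defs...) and returns None;
    # its effects are visible in the namespace it was given.
    result = exec("x = 1 + 1", namespace)

    # An assignment is a statement, not an expression, so eval() rejects it.
    try:
        eval("x = 1 + 1", namespace)
        rejected = False
    except SyntaxError:
        rejected = True

So an input like :code:`1 + 1` needs :code:`eval` to get a result back, while
:code:`import pprint` can only go through :code:`exec`.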

The next issue that I will have to tackle is that Sheerka is not a REPL. After the execution
of the input, the system stops. Nothing is kept in memory (i.e. RAM).
The whole idea is to make Sheerka 'remember', even something that happened a long time ago.
So I should find a way to 'freeze the time'.

To better explain what I have in mind, let's say that I want to pretty print an object:

.. code-block:: python

    import pprint
    pp = pprint.PrettyPrinter(indent=4)
    pp.pprint(stuff)

I need three lines in order to be able to pretty print. I will first try dumping the
globals() using pickle, and loading it back whenever needed.

If it does not work as expected, I can find a way to save the commands and exec everything
when needed (the first time, I exec the import; the second time, I exec the import +
:code:`pp = ...`; and the last time, I exec the three statements).
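
The replay approach could look like the sketch below. The names :code:`history` and
:code:`run` are hypothetical, and the history is kept in a plain list where Sheerka
would persist it. One thing to keep in mind for the first approach: pickle cannot
serialize module objects, so a globals() that contains an imported module such as
:code:`pprint` will probably not dump as is.

.. code-block:: python

    import io
    from contextlib import redirect_stdout

    history = []  # commands seen so far; would be persisted between runs


    def run(command):
        """Replay all previous commands in a fresh namespace, then run the new one."""
        namespace = {}
        for previous in history:
            exec(previous, namespace)
        exec(command, namespace)
        history.append(command)
        return namespace


    run("import pprint")
    run("pp = pprint.PrettyPrinter(indent=4)")

    # Capture stdout only to make the result of the last command inspectable.
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        run("pp.pprint({'a': 1})")
    printed = buffer.getvalue()

The drawback is that every replay also repeats the side effects of the old commands
(prints, file writes...), so this would need some filtering in practice.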

2019-11-07
**********

Back on data persistence
"""""""""""""""""""""""""
Last time, I talked about how to add new entries in the **State**. I only need the name of
the category and the object. If I add several objects under the same entry,
they don't override each other; they are kept as a list.

.. code-block:: python

    add("All_Concepts", "foo")
    add("All_Concepts", "bar")

will produce something like

.. code-block:: json

    {"All_Concepts" : ["foo", "bar"]}

The reason behind this choice is that, in the human world, the same name can refer to
several concepts. The first obvious cases are the homonyms: same word, but different
meanings. There are also other cases where the meaning of the word depends on the context.
Rather than forcing the user to spend time finding another way to express the concept
(as the name already exists), I prefer to allow storage under the same key.
The choice of the correct item to use in the list will be made at execution time.

I also need sdp to manage the keys of my objects. So 'entry' will be used to group objects,
and the key will give quick access to each of them.

I don't want the signature :code:`add(entry, key, object)` because sometimes there is a key,
but keys are not mandatory. So I keep the signature :code:`add(entry, object)`

To manage the key, the object either is a key/value entry :code:`{key: value}` (Python dict) or
has an attribute :code:`key`, or has a method :code:`get_key()`

For example **Concepts** have a method :code:`get_key()`, so if the key of 'concept' is "foo",
the code

.. code-block:: python

    add("All_Concepts", concept)

will produce something like

.. code-block:: python

    {"All_Concepts" : {"foo" : concept}}

If I add another concept (concept2) which has the key "bar", I will have

.. code-block:: python

    {"All_Concepts" : {"foo" : concept, "bar": concept2}}

and so on.

So under the 'All_Concepts' group, I have a quick access to the concept "foo"

Note that if, for some reason, I end up with several concepts with the same key, they will
just be stacked as a list. I don't lose information.
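
A possible sketch of the key handling, assuming an in-memory dict as **State**. The
helper :code:`find_key` and the tiny :code:`Concept` class are hypothetical, and
objects without a key are deliberately left out to keep the example short.

.. code-block:: python

    state = {}


    def find_key(obj):
        """Return the key of obj, or None when it has no key."""
        if hasattr(obj, "get_key"):
            return obj.get_key()
        if hasattr(obj, "key"):
            return obj.key
        return None


    def add(entry, obj):
        """Add obj under entry, indexed by its key."""
        if isinstance(obj, dict) and len(obj) == 1:
            key, value = next(iter(obj.items()))    # {key: value} form
        else:
            key, value = find_key(obj), obj
        if key is None:
            raise ValueError("keyless objects are left out of this sketch")
        group = state.setdefault(entry, {})
        if key not in group:
            group[key] = value
        elif isinstance(group[key], list):
            group[key].append(value)
        else:
            group[key] = [group[key], value]        # same key: stack as a list


    class Concept:
        """Minimal stand-in for a real Concept."""

        def __init__(self, name):
            self.name = name

        def get_key(self):
            return self.name


    foo, bar, other_foo = Concept("foo"), Concept("bar"), Concept("foo")
    add("All_Concepts", foo)
    add("All_Concepts", bar)
    add("All_Concepts", other_foo)   # second "foo": both are kept
    add("All_Concepts", {"baz": 1})  # {key: value} form

After these calls, :code:`state["All_Concepts"]["bar"]` gives direct access to
:code:`bar`, while :code:`state["All_Concepts"]["foo"]` holds the two concepts that
share the key "foo".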

We will talk again about sdp later

Status
""""""
As of today, I have a first implementation of several main functionalities of the system


1. I have a good implementation of sdp
    * When I say good, I talk about the coverage of the functionalities, not the efficiency of the code
    * I can add objects to the state
    * The objects can be saved as references (will be explained later)
    * I manage events
    * I manage history
    * I manage several types of serialisation
2. I have two parsers
    * DefaultParser : to detect Sheerka specific language (like def concept)
    * PythonParser : to parse Python code.
    * They are called for every new event.
3. I have a first version of the evaluators
    * These are pieces of code that recognize a result and process it
    * The current algo is not finished, but it works for simple cases
    * I can create a new concept
    * I can evaluate simple Python expressions
4. I don't have the printers, but it's ok, I just dump the result of the processing

So I can type

::

    def concept hello name as "hello" + name
    1 + 1
    sheerka.test()

I will now work on how to call an already defined concept.