
Data Persistence
=================


The basic idea
""""""""""""""

Everything starts with a basic and simple idea.

My simple idea for persistence is that **everything** should be persisted.
The main difference between a human being and a computer is that we have the
ability to remember almost everything (at least everything that we have not forgotten).

By contrast, we only allow computers to remember the specific things that we think (as of
today) will be relevant in the future.

There are two major issues with that:

1. The obvious one is that we don't know what will be needed in the future.
2. We prevent the computer from remembering things the way a human being would.

I think I will come back to the second point some day, as it's more subtle than it looks
(at least to me).

Anyway, I need:

1. A persistence mechanism that can save my main objects (**Concepts**, as well as **Events** and **Rules**), but also the current state of the system.
2. The ability to go back in time, to see what the values of these objects were in the past.
3. Traceability on these objects, e.g. the ability to prove that the data was neither altered nor corrupted.


There are tons of different databases already on the market. Unfortunately for me, I'm not
a database expert. But I already knew that I was not looking for a traditional
relational database (RDBMS), as the structure will evolve and I didn't want to spend
my time redesigning schemas and constraints.

.. _git: https://git-scm.com/

As I was learning Python, it could have been a good idea to also start looking at an
existing NoSQL database. I started to look at MongoDB, but I got lazy. I knew that
the top feature I needed was history management (the way git_ does it),
and it was not provided by Mongo, or I didn't notice it in my first readings on the subject.

So I decided to design and implement my own database.


Versioning the information
"""""""""""""""""""""""""""""""""""""
As I said previously, I want a system that mimics how git_ versions its objects.

::

    Obj v0: parents = []
            user name = <a name>
            modification date = <a date>
            digest = xxxxx

    Obj v1: parents = [xxxxx]
            user name = <a name>
            modification date = <a date>
            digest = yyyyy

    Obj v2: parents = [yyyyy]
            user name = <a name>
            modification date = <a date>
            digest = zzzzz

    and so on...

I always keep a reference to the last version of the object, so I can navigate through
the versions using the ``parents`` attribute of the object.
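
The navigation described above can be sketched as follows. This is only a minimal
illustration: the ``Version`` class, the ``history`` function, the in-memory ``store``
dictionary and the user name are hypothetical names chosen for the example, and the walk
follows the first parent only.

.. code-block:: python

    from dataclasses import dataclass
    from datetime import datetime, timezone
    from typing import Dict, Iterator, List, Optional


    @dataclass(frozen=True)
    class Version:
        """Hypothetical record mirroring the fields shown above."""
        digest: str
        parents: List[str]
        user_name: str
        modification_date: datetime


    def history(store: Dict[str, Version], head: str) -> Iterator[Version]:
        """Yield versions from the latest one back to the first (first parent only)."""
        current: Optional[str] = head
        while current is not None:
            version = store[current]
            yield version
            current = version.parents[0] if version.parents else None


    now = datetime.now(timezone.utc)
    store = {
        "xxxxx": Version("xxxxx", [], "alice", now),
        "yyyyy": Version("yyyyy", ["xxxxx"], "alice", now),
        "zzzzz": Version("zzzzz", ["yyyyy"], "alice", now),
    }
    head = "zzzzz"  # the reference to the last version

    print([v.digest for v in history(store, head)])
    # prints ['zzzzz', 'yyyyy', 'xxxxx']

Only the reference to the last version has to be stored separately; every older version
is reachable from it.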

In git_, there are basically two types of objects:

- **content** (file content, or directory structure)
- **reference** to content (commit or tags)

The hash of a **content** object depends only on the content itself, while the hash of a
**reference** also depends on the user name, the modification date and the parents.
In both cases, the hash is computed on the whole object, so it can also be used to check
the integrity of an object.
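
To make the difference concrete, here is a rough sketch. ``blob_digest`` follows the way
git hashes file content (SHA-1 over a short header plus the raw bytes).
``commit_like_digest`` is a deliberately simplified, hypothetical stand-in for a commit:
real git commits have more fields and a stricter layout.

.. code-block:: python

    import hashlib
    from typing import List


    def blob_digest(content: bytes) -> str:
        """Hash of a git blob: SHA-1 over a small header followed by the content."""
        header = b"blob " + str(len(content)).encode("ascii") + b"\0"
        return hashlib.sha1(header + content).hexdigest()


    def commit_like_digest(tree: str, parents: List[str], user_name: str,
                           modification_date: str) -> str:
        """Simplified commit-style hash: content reference plus history metadata."""
        lines = ["tree " + tree]
        lines += ["parent " + p for p in parents]
        lines.append("author " + user_name + " " + modification_date)
        body = "\n".join(lines).encode("utf-8")
        header = b"commit " + str(len(body)).encode("ascii") + b"\0"
        return hashlib.sha1(header + body).hexdigest()


    content = blob_digest(b"hello world\n")
    first = commit_like_digest(content, [], "alice", "2020-01-01T10:00:00")
    second = commit_like_digest(content, [], "alice", "2020-01-02T10:00:00")

    print(blob_digest(b"hello world\n") == content)  # prints True
    print(first == second)                           # prints False

The same content always produces the same content hash, whereas two references to that
content differ as soon as the date, the user name or the parents differ.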

For my objects, I need to decide how I compute the hash.

**Concepts** have a history. If I decide to include the history in the hash,
a new version will be created every time a **Concept** is saved, even if it has not
changed, because the modification date is :code:`datetime.now()`. If I don't include it,
the integrity of what is saved is no longer guaranteed.

I chose to value identity over integrity: the hash of a **Concept** does not depend
on its history. We will see what the future says about this.
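
The trade-off can be sketched like this. Everything here is an assumption made for
illustration: the concept fields, the use of JSON and SHA-256, and the function names
``identity_digest`` and ``integrity_digest``. The two fixed dates stand in for two
successive calls to :code:`datetime.now()`.

.. code-block:: python

    import hashlib
    import json
    from datetime import datetime, timezone
    from typing import Any, Dict, List


    def _sha256(data: Dict[str, Any]) -> str:
        """Hash a dictionary through a deterministic JSON serialization."""
        payload = json.dumps(data, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


    def identity_digest(body: Dict[str, Any]) -> str:
        """Hash of the body only: stable as long as the concept is unchanged."""
        return _sha256(body)


    def integrity_digest(body: Dict[str, Any], parents: List[str],
                         user_name: str, modification_date: datetime) -> str:
        """Hash of body plus history: changes on every save because of the date."""
        return _sha256({
            "body": body,
            "parents": parents,
            "user_name": user_name,
            "modification_date": modification_date.isoformat(),
        })


    concept = {"name": "greeting", "definition": "hello"}  # hypothetical fields
    first_save = datetime(2020, 1, 1, tzinfo=timezone.utc)
    second_save = datetime(2020, 1, 2, tzinfo=timezone.utc)

    print(identity_digest(concept) == identity_digest(dict(concept)))
    # prints True
    print(integrity_digest(concept, [], "alice", first_save)
          == integrity_digest(concept, [], "alice", second_save))
    # prints False

With the identity-style hash, saving an unchanged **Concept** twice yields the same
digest, so no new version is needed. The price is that the digest says nothing about
the history fields.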