Data Persistence
=================

The basic idea
""""""""""""""

Everything starts with a basic and simple idea.

My simple idea for persistence is that **everything** should be persisted.
The main difference between a human being and a computer is that we have the
ability to remember almost everything (at least everything that we have not forgotten).

In contrast, we only allow computers to remember the specific things that we think (as of
today) will be relevant in the future.

There are two major issues with that:

1. The obvious one is that we don't know what will be needed in the future.
2. We prevent the computer from remembering things the way a human being would.

I think I will come back to the second point some day, as it's more subtle than it looks
(at least to me).

Anyway, I need:

1. A persistence mechanism that can save my main objects (**Concepts**, as well as **Events** and **Rules**), but also the current state of the system.
2. The ability to go back in time, to see what the values of these objects were in the past.
3. Traceability on these objects, e.g. the ability to prove that the data was neither altered nor corrupted.

There are tons of different databases already on the market. Unfortunately for me, I'm not
a database expert. But I already knew that I was not looking for a traditional
relational database (RDBMS), as the structure will evolve and I didn't want to spend
my time redesigning the schemas and the constraints.

.. _git: https://git-scm.com/

As I was learning Python, it could have been a good idea to also start looking at an
existing NoSQL database. I started to look at MongoDB, but I got lazy. I knew that
the top feature I needed was the management of history (the way git_ does it),
and it was not provided by Mongo, or I didn't notice it in my first readings on the subject.

So I decided to design and implement my own database.

Versioning the information
""""""""""""""""""""""""""

As I said previously, I want a system that mimics how git_ versions its objects.

::

    Obj v0: parents = []
            user name = <a name>
            modification date = <a date>
            digest = xxxxx

    Obj v1: parents = [xxxxx]
            user name = <a name>
            modification date = <a date>
            digest = yyyyy

    Obj v2: parents = [yyyyy]
            user name = <a name>
            modification date = <a date>
            digest = zzzzz

    and so on...

I always keep a reference to the last version of the object, so I can navigate through
the versions using the ``parents`` attribute of the object.

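Walking back through the versions can be sketched in Python as follows. This is only
an illustration: the ``Version`` class, the ``store`` dictionary and the ``history``
function are hypothetical names chosen for the example, not the actual implementation.

.. code-block:: python

    from dataclasses import dataclass
    from datetime import datetime
    from typing import Dict, Iterator, List, Optional


    @dataclass
    class Version:
        """One saved version of an object (hypothetical structure)."""
        digest: str
        parents: List[str]
        user_name: str
        modification_date: datetime


    def history(store: Dict[str, Version], last_digest: str) -> Iterator[Version]:
        """Yield versions from newest to oldest, following the first parent."""
        digest: Optional[str] = last_digest
        while digest is not None:
            version = store[digest]
            yield version
            # An empty ``parents`` list marks the very first version.
            digest = version.parents[0] if version.parents else None


    when = datetime(2020, 1, 1)
    store = {
        "xxxxx": Version("xxxxx", [], "a name", when),
        "yyyyy": Version("yyyyy", ["xxxxx"], "a name", when),
        "zzzzz": Version("zzzzz", ["yyyyy"], "a name", when),
    }

    # Only the digest of the last version needs to be remembered.
    digests = [version.digest for version in history(store, "zzzzz")]

Starting from the last digest (``zzzzz``), the loop visits ``yyyyy`` and then ``xxxxx``,
and stops when it reaches a version without parents.
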
In git_, there are basically two types of objects:

- **content** (file content, or directory structure)
- **reference** to content (commits or tags)

The hash of a **content** depends only on the content itself, while the hash of a
**reference** also depends on the user name, the modification date and the parents.
In both cases, the hash is computed on the whole object, so it can also be used to
check the integrity of an object.

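To illustrate the **content** case: in its default SHA-1 mode, git hashes a file content
(a *blob*) by prefixing the raw bytes with a small header and hashing the result. The
sketch below reproduces this with ``hashlib``; the function name ``git_blob_hash`` is
made up for the example.

.. code-block:: python

    import hashlib


    def git_blob_hash(content: bytes) -> str:
        """Return the SHA-1 that git assigns to a blob with this content."""
        # The header is "blob <size in bytes>" followed by a NUL byte.
        header = b"blob " + str(len(content)).encode("ascii") + b"\0"
        return hashlib.sha1(header + content).hexdigest()


    empty = git_blob_hash(b"")
    hello = git_blob_hash(b"hello world\n")

The same bytes always give the same hash, whoever saves them and whenever they do it.
A commit, on the other hand, embeds the author, the date and the parent hashes in the
hashed text, so two commits of the same content still get different hashes.
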
For my objects, I need to decide how I compute the hash.

**Concepts** have a history. If I decide to include the history in the hash,
then because the modification date is :code:`datetime.now()`, a new version will be created
even if the **Concept** has not changed. If I don't include it, the integrity of
what is saved is no longer guaranteed.

I chose to value identity over integrity: the hash code of a **Concept** does not depend
on its history. We will see what the future will say about this.

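A minimal sketch of this choice, under several assumptions that are not part of the
design above: the body of a concept is a JSON-serialisable ``dict``, the digest is a
SHA-256, and ``concept_digest`` and ``save`` are hypothetical helper names.

.. code-block:: python

    import hashlib
    import json
    from datetime import datetime


    def concept_digest(body: dict) -> str:
        """Hash only the body of a concept, never its history."""
        # sort_keys makes the digest independent of key order.
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


    def save(store: dict, body: dict, user_name: str) -> str:
        """Store the concept unless an identical body is already saved."""
        digest = concept_digest(body)
        if digest not in store:
            store[digest] = {
                "body": body,
                # History lives next to the body and is not part of the digest.
                "user name": user_name,
                "modification date": datetime.now().isoformat(),
            }
        return digest


    store = {}
    first = save(store, {"name": "one", "value": 1}, "alice")
    again = save(store, {"value": 1, "name": "one"}, "bob")
    changed = save(store, {"name": "one", "value": 2}, "bob")

Saving an unchanged concept a second time returns the same digest and creates no new
version, which is the identity part. The price is the one stated above: the user name and
the modification date are not covered by the digest, so altering them cannot be detected
from the digest alone.
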