Bi-temporal data modelling or filtering or fast-forwarding and rewinding in 2 time dimensions: Validity and acquisition time
It is not uncommon to want to historicise the data in an application by creating a new line instead of simply overwriting the corresponding line when a change is made. As a rule, the application should then simply display the current status. Sometimes, however, the user wants to see the data as it was at a point in time in the past or as it will be in the future. This places special demands on the data modelling in the database and also on the programming of the application.
If data is to be historicised, 2 temporal aspects come into play: Firstly, the validity of the data.
Example: Change of address. A person moves house and informs their bank in advance of their new address, valid from the first day of the month in two months' time. Ideally, the bank records the new address with the valid-from date, but keeps the previous address. Correspondence will be sent to the old address until the new expiry date is reached. From then on, correspondence is sent to the new address. The application should then either show the data that is valid on a set date - usually today - or show an overview with the history of the data with regard to its validity. In the first case, the user can set the desired valid-per-date in the application from the present to the past or the future and then see the data filtered accordingly.
The second temporal aspect is the time of recording. The aim is to record who entered which change and when - for the same validity date. This is therefore also referred to as the audit time dimension. Here too, the application generally shows the current data, but allows you to list the changes made for a specific validity full stop if required. For example, the house number in an address valid as of 1 November was subsequently corrected. It should then be possible to see what the address looked like before and after (always valid as of 1 November) and who entered the data and when.
So how do you implement this? If you only want to historicise one of the two temporal aspects, this is less difficult. If, for example, you only want to write away data corrections with their time of entry, this is implemented in the database with a so-called trigger for changes and deletions in the corresponding tables. The trigger intercepts any such changes and writes the row with the previously existing data values to a shadow or journal table in addition to the desired action. In addition to the table with the current values, there is also a journal table with historical values in the database. The following example uses a person table with date of birth and customer category:
This approach has the advantage that the actual table is not enlarged by historical entries and the application does not always have to filter these out - as long as you only want to see the current values, which is usually the case.
On the other hand, if you only want to record the other time dimension - that of validity - you can simply introduce a further table column with the valid-from date and possibly a valid-to date. A person table then contains one to several entries for different persons with different validity periods for changing customer categories (bronze, silver, gold), which are adjusted according to turnover.
The application then displays either all personal entries or only those valid as of today. It is also able to display the entries valid at a different - selectable - point in time. Filtering to a specific point in time is carried out either by the application or by the database, as the application always accesses the corresponding table via a view. The view filters the table entries and only shows the entries valid for the selected time. The application must be able to tell the database which validity date it should work with. This is usually recorded in a so-called session variable in the database and applied in the views each time the application accesses data. However, we have ignored one difficulty, namely that in relational data modelling, entries in one table refer to entries in another table via so-called foreign keys: Person 1 has the address 28.
How should such ForeignKey references work if the entries in both the person and address tables can change and the numbers in the Id column are incremented? For example, if person 37 no longer has the address 523 but 524 due to a change of address? - This is solved by splitting the table columns into 2 separate tables, a key table with the unchangeable attributes with in particular the functional key - in the case of a person as a customer, this will be the customer number - and a status table with the variable attributes. There are 1 to several status entries for a key entry. The key entry for a person can then refer to the address key entry, which always remains the same.
The data from the key table is then merged with the data from the status table via a view and filtered via the set validity time in the session variable. The application does not work against the tables, but against the views, and also performs inserts, updates and deletes against the views. The above-mentioned trigger is not created on the table, but on the view. The trigger splits changes into those on the key table (inserts only) and on the status table. Entries are never mutated or deleted in the status table either, but a new entry is always created with the new status data for the purpose of historisation (with a delete flag in the case of a deletion).
If you now want to historicise both time dimensions - validity and time of entry - at the same time, you proceed in the same way by splitting the tables into key and status tables, which are merged again via a view for the application. In this case, however, the view must not only filter for the desired validity time, but also for the entry time via a second session variable!
iRIX has successfully implemented bi-temporal data modelling in various customer projects (pension funds, health authorities) using different database technologies (Oracle, PostgreSQL, SQL Server). It was also important that the response times of the application always remained short for the user despite the views with the temporal filters, even for large tables.