These are some scripts to implement materialized views in PostgreSQL.

PROJECT GOALS

In a nutshell, we want to implement materialized views in PostgreSQL. That
basically means one of two things.

1) The "SNAPSHOT" materialized view is like a picture taken at a point in
time, namely the last time the materialized view was refreshed or altered.
This provides an easy way to cache data that may take a lot of work to get.

This is really easy to implement.

2) The "AUTO UPDATING" materialized view is like a working view, except that
it has a constantly updated cache. Whenever data that affects the view is
changed, the data in the view is updated too. There are three ways to go about
this:

    A) The eager way. Every change that could possibly affect the
    materialized view is propagated to it immediately. The data is always
    accurate.

    This method is slow if you do a lot of data modifications compared to the
    number of selects, but your data is always accurate. If the data changes
    often and you still need the view to be accurate, you should question
    whether you really want a materialized view at all.

    B) The lazy way. Don't update the materialized view until you are done
    with your transaction. Therefore, within a transaction, there is no
    guarantee that the data in the materialized view is accurate.

    This method is a little cheaper than (A), because the work can be batched
    at the end of each transaction, but it still suffers from the same problems.

    C) The very lazy way. Updating the materialized view is put off until
    someone actually tries to read the data in the view. At that point,
    perhaps the entire data set is updated, or only the rows that are
    actually accessed, or something to that effect.

    This method is probably only good if you do a lot of updates, then a lot
    of selects, but not both at the same time.
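
To make the trade-offs above concrete, here is a minimal Python sketch. It is a
hypothetical in-memory model, not how these scripts work: a plain list of
(customer, amount) tuples stands in for the base table, a per-customer sum
stands in for the view's query, and compute(), the class names, and commit()
are all made up for illustration. Only inserts are modeled.

```python
from collections import defaultdict


def compute(base):
    """The expensive query behind the view: total amount per customer."""
    totals = defaultdict(int)
    for customer, amount in base:
        totals[customer] += amount
    return dict(totals)


class SnapshotView:
    """(1) SNAPSHOT: cache the result; it only changes when refresh() runs."""

    def __init__(self, base):
        self.base = base
        self.refresh()

    def refresh(self):
        self.data = compute(self.base)

    def read(self):
        return self.data


class EagerView(SnapshotView):
    """(2A) Eager: every change is applied to the cache immediately."""

    def insert(self, customer, amount):
        self.base.append((customer, amount))
        self.data[customer] = self.data.get(customer, 0) + amount


class LazyView(SnapshotView):
    """(2B) Lazy: queue changes and apply them when the transaction commits."""

    def __init__(self, base):
        super().__init__(base)
        self.pending = []

    def insert(self, customer, amount):
        self.base.append((customer, amount))
        self.pending.append((customer, amount))  # cache is stale until commit()

    def commit(self):
        for customer, amount in self.pending:
            self.data[customer] = self.data.get(customer, 0) + amount
        self.pending.clear()


class VeryLazyView(SnapshotView):
    """(2C) Very lazy: mark the cache dirty and rebuild it on the next read."""

    def __init__(self, base):
        super().__init__(base)
        self.dirty = False

    def insert(self, customer, amount):
        self.base.append((customer, amount))
        self.dirty = True

    def read(self):
        if self.dirty:
            self.refresh()  # this sketch rebuilds the entire data set
            self.dirty = False
        return self.data
```

In the lazy model, a read inside the transaction still sees the old total:
after v = LazyView([("alice", 10)]) and v.insert("alice", 5), v.read() still
reports 10 for alice until v.commit() is called.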

Functions that are not IMMUTABLE, such as now(), can return different outputs
for the same input, and they will pose tremendous hurdles for method (2). It
may be impossible to get it quite right. However, we may be able to fudge now()
and say something like "Sure, we know now() changes every instant, but we'll
only update the table due to changes to now() once a day."
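
One way to read the "once a day" fudge, sketched in Python: treat now() as
today's date and recompute the time-dependent part of the view only when the
date changes. The DailyNowView class, its "overdue" query, and the injectable
clock parameter are hypothetical; changes to the underlying rows are
deliberately ignored here, since those would be handled by one of the methods
described in (2).

```python
import datetime


class DailyNowView:
    """Cache a now()-dependent result and recompute it at most once per day
    because of the passage of time (a hypothetical policy)."""

    def __init__(self, rows, clock=datetime.date.today):
        self.rows = rows      # list of (name, due_date) pairs
        self.clock = clock    # injectable so the policy is easy to test
        self.as_of = None     # the date the cached result was computed for
        self.data = []

    def read(self):
        today = self.clock()
        if self.as_of != today:
            # "now" is frozen at today's date: anything due before it is overdue.
            self.data = [name for name, due in self.rows if due < today]
            self.as_of = today
        return self.data
```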

(2) is a lot harder than (1) to implement, but the benefits can be tremendous.

IDEAS

Use an optimizer to find the best way to update a materialized view when the
underlying data changes. For small data sets, it may be better to just drop all
the data and recreate it. For larger, more complicated sets, it may make sense
to delete a range of tuples and refresh them. We'll need multiple methods of
performing materialized view updates, and should keep all of them around, since
any one of them might be the most effective in some corner case.

One trick is to check whether most of a column holds a single value. If so,
set all the rows to that value first, and then let the update work only on
the rows that should have a different value.
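
The trick can be sketched in Python as a two-step plan: one bulk assignment of
the most common value, followed by individual updates for the exceptions. The
function names are hypothetical, and a plain list stands in for the column; in
SQL, the bulk step would correspond to a single UPDATE without a WHERE clause.

```python
from collections import Counter


def plan_column_update(new_values):
    """Split a column update into one bulk value plus a dict of exceptions.
    Assumes new_values is non-empty."""
    common, _count = Counter(new_values).most_common(1)[0]
    exceptions = {row: value for row, value in enumerate(new_values) if value != common}
    return common, exceptions


def apply_plan(num_rows, common, exceptions):
    column = [common] * num_rows            # bulk step: every row gets the common value
    for row, value in exceptions.items():   # targeted step: only the rows that differ
        column[row] = value
    return column
```

For a column of eight "ok" values followed by "late" and "lost", the plan is
one bulk write of "ok" plus just two individual updates.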
    
INSTALLATION

1) Run the installation script install.sql. This will set up the tables and
triggers needed.
    
    $ psql [your options here] -f install.sql

Report any errors you see.

2) Configure your environment for Perl's DBI module by setting DB_DSN:

    $ export DB_DSN=dbi:Pg:dbname=<dbname>

3) Experiment with creating, altering, dropping, and refreshing materialized
views using the 'create', 'alter', 'drop', and 'refresh' scripts.

TODO
- Documentation of materialized views, what they are, and how they work is
  necessary. Preferably, we'd use the same documentation system as PostgreSQL
  itself, so that if we get included, we will have excellent documentation to
  go with it. This includes front-end documentation, typical use cases with
  plenty of examples, and back-end documentation.

- Relevant research papers should be collected. They should be translated
  into plain language so regular old programmers like me can understand them.
  
- Right now, auto-updating materialized views are not supported. However, this
  is one of the major project goals. First, we will implement simple updates,
  and then we will apply all the know-how we can find on the internet or come
  up with ourselves to implement more sophisticated and more efficient updates.

- The tables created by materialized views should have restrictions so that
  only the update triggers and the refresh script can modify them.

- The refresh script should only work on materialized views that are not
  auto-updating.
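
The last two TODO items amount to two guards, sketched here in Python with
hypothetical names (MatView, write_rows, refresh, and the writer labels are all
made up). Inside PostgreSQL itself, restricting who may modify a table would
more naturally be done with table privileges via GRANT and REVOKE.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the only writers allowed to modify a view's table.
ALLOWED_WRITERS = {"update_trigger", "refresh_script"}


@dataclass
class MatView:
    name: str
    auto_update: bool
    rows: list = field(default_factory=list)


def write_rows(view: MatView, writer: str, rows: list) -> None:
    """Replace the view's rows, but only for the permitted writers."""
    if writer not in ALLOWED_WRITERS:
        raise PermissionError(f"{writer} may not modify {view.name}")
    view.rows = list(rows)


def refresh(view: MatView, recompute) -> None:
    """Refresh a snapshot view; auto-updating views maintain themselves."""
    if view.auto_update:
        raise ValueError(f"{view.name} is auto-updating; refresh does not apply")
    write_rows(view, "refresh_script", recompute())
```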

CONTRIBUTING

Feel free to email me with questions, suggestions, and thoughts. Links to
relevant information, photocopies of sections of good textbooks, and such are
warmly welcomed.

If you have something better, I am willing to ditch this and help out.

AUTHOR

Jonathan Gardner <jgardner@jonathangardner.net>

COPYRIGHT

Copyright (c) 2003 Jonathan Gardner

LICENSE

The Perl scripts are tentatively licensed under the same terms as Perl itself.
I haven't decided on an appropriate license for everything else. Personally,
I'm a big fan of the GPL, but I know a lot of people prefer BSD-style licenses,
especially in the PostgreSQL community.

