August 21, 2014

Using VIEW to reduce number of tables used

Many Open Source software solutions use database per user (or set of tables per user) which starts to cause problems if it is used on massive scale (blog hosting, forum hosting etc), resulting of hundreds of thousands if not millions of tables per server which can become really inefficient.

It is especially inefficient with Innodb tables both in terms of space (some tables would keep only couple of small rows, but require at least 16K page in Innodb), keeping all tables open in Innodb dictionary and number of other challenges in IO management and recovery. For MyISAM it works better but still overhead can get significant because table_cache can’t be made large enough and so a lot of table reopens needs to happen which requires table header modification, which is costly.

Of course if you can simply rewrite software to store multiple users per table it is best way to go, however quite typically this is way too much work and also requires constant patches as new software versions come out.

It is very tempting to use VIEWs to reduce number of tables dramatically – merging say 1000 of users to the same table as VIEWs are significantly less expensive and cheap to “open”.

How this could work ?

Say you have “post” tables which keeps blog posts and contains id, title, body columns (to keep it simple). There are one table per user so we have post123 table which keeps posts for user number 123 etc.

Converting it to the views we can have “post” table which has user_id,id, title,body columns and create post123 as view:

This would give us post123 containing only posts for users for user=123 which is exactly what we’re looking for. WITH CHECK OPTION is used to ensure we do not insert data in the view which will be invisible.

The VIEW approach works for SELECT, UPDATE, DELETE queries but not for INSERT:

This happens because as we do not pass user_id value to underlying table MySQL tries to set it to default, which does not work as it has to be different for each of the views. You could think MySQL would look at the WHERE clause to figure out which value should be used but unfortunately it does not work this way.

What would be good to have is to specify different default fields for different views, though I would not expect it to happen as I do not think standard defines anything like it.

The other alternative would be to allow triggers on views, specially INSTEAD OF triggers so you could write triggers to perform insert inserts to base tables instead of views. BEFORE INSERT triggers would not help because the column user_id is not part of the view so would not be available even if MySQL would support triggers on VIEWs (which it does not)

But OK. That is all ideas – what can you do now ?

The obvious option is to go over the code and change inserts so they go in the base table – generally there should be only few places when insert happens so it should not be big problem.

Technically if you can’t make inserts to the base table but can have them going to the different table (typically would be very easy change) say “insertpost123″ instead of “post123″ you can create these insert tables as blackhole and define a triggers on them to update base tables, which will make rows available in the views. As the storage engine is BlackHole BEFORE triggers will act the same as INSTEAD OF triggers.

A bit ugly but can be helpful.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. pabloj says:

    I’d rather think that those kind of apps don’t even exist, it UGGGGLYYYYY

  2. Jeffrey Gilbert says:

    Wouldn’t the expense you save in not opening a table be lost in the fact that you can’t use indexes through a view?

  3. peter says:

    Pabloj – There are a lot of these applications both open source and in house, just written for few users but used at massive scale.

    WordPress is good example. basically any forum applications – PHPBB etc, if you’re to run isolated installations.

  4. peter says:

    Jeffrey,

    For simple views like this indexes will be used just fine because simply where clause will be appended to all the queries you run on this view using “merge” algorithm.

    Indeed I did not note about indexes – if you go with such solution you also redo indexes on the base table to include user_id as a prefix as it is now always specified in your queries.

  5. dormando says:

    I like the trigger idea for writebacks to the base table!

    A developer tried this one a development machine I managed a few months ago. We ended up with over 80,000 views in a single instance and MySQL was not exactly happy. It soaked an extra 600 megabytes without having them in use, and querying the INFORMATION_SCHEMA at all became uselessly slow. The DB was not under high query load, but each developer had an “instance” of several thousand views emulating production, with sometimes several instances.

    If I had more time I’d test these methods, wondering if they hammer MySQL less hard?

  6. peter says:

    Oh yes. You need to forget about a lot of data in INFORMATION SCHEMA if you have a lot of objects if it is table or views.

  7. peter says:

    Dormando,

    Also I just checked by creating 500.000 views – and using all of them – there does not seems to be any memory hog related to that.

    May be besides views you had Innodb tables ? These consume memory if created in numbers.

  8. dormando says:

    Can’t get an apples to apples here… I’ll ask…

    There was a ratio of 100 views per innodb table. The system _used_ to be 40,000+ innodb and/or myisam tables, which then changed to 80,000+ views then a few hundred innodb tables. The views used 600 megabytes over that bit.

    It would bloat as the views were first accessed, then the memory never went down. Temp tables/files?

  9. peter says:

    Dormando,

    If it is create the test case and submit bug to MySQL :)
    I expect it is something different.

    I did access all views created in my test.

  10. peter,

    what about using session variables to achive the effect with inserts and reduce the number of views?

    At the start of the script you do:

    [CODE]SET @owner_id = 12345[/CODE]

    Then use a BEFORE INSERT trigger for the shared table (owner_id field should be in the view as well):

    [CODE]CREATE TRIGGER post_insert BEFORE INSERT ON posts AS
    FOR EACH ROW BEGIN
    NEW.owner_id = COALESCE(NEW.owner_id, @owner_id);
    END;[/CODE]

    Also, you don’t even need an own set of views for each user. You just need a one set of views. Think of doing:

    [CODE]CREATE VIEW owner_posts AS
    SELECT * FROM posts
    WHERE owner_id = @owner_id;[/CODE]

    But, yes, variables can’t be used inside a view definition. I have [URL=http://omelnyk.net/blog/2007/06/18/using-views-with-variables-in-mysql/]outlined[/URL] a simple workaround for that. Basically, create a function:

    [CODE]CREATE FUNCTION owner_id
    RETURNS INTEGER AS
    RETURN @owner_id;[/CODE]

    That way, you get two benefits:

    - Most of INSERTs inside an application can remain unchanged. INSERTS that have to be changed are INSERTs without explicitly specified field list. But it is not the case, except for maybe several exceptions, for the applications you have mentioned above, as they have been designed to allow possible table structure extensions;
    - You don’t have to use a huge set of views, one set of views for all users would work just fine;

  11. peter says:

    Thanks Alexey,

    This is indeed good idea and a bit different approach – instead of using different set of tables you patch application a bit so it can use same “table” for multiple users at the same time.

    and indeed such views can be used for Inserts unless you do not have columns specified.

  12. A brief description about Blackhole – MySQL Storage Engine

    http://kathyravan.blogspot.com/2008/12/blackhole-mysql-storage-engine.html

Speak Your Mind

*