Flexviews - part 3 - improving query performance using materialized views

Combating “data drift”
In my first post in this series, I described materialized views (MVs). An MV is essentially a cached result set at one point in time. The contents of the MV will become incorrect (out of sync) when the underlying data changes. This loss of synchronization is sometimes called drift. This is conceptually similar to a replication slave that is behind. Until it catches up, the view of the data on the slave is “behind” the changes on the master. An important difference is that each MV could have drifted by a different length of time.
A view which has drifted out of sync must be refreshed. Since an MV drifts over time from the “base tables” (those tables on which the view was built) there must be a process to bring them up-to-date. Flexviews includes two different refresh methods. Each method is named after the way in which the contents of the view are updated. The two methods are complete refresh and incremental refresh. Â The view type/refresh method are interchangeable since a refresh method is selected at the time of view creation and may not be changed. If I talk about a ‘complete MV’, I mean a ‘MV for which the complete refresh method was selected at creation’.Â Also, when I say ‘view’ I mean MV.
I will fully describe each method below, but the important thing to know right now is that there are two significant differences between the methods.

A view which uses the complete refresh method must be rebuilt from scratch each time the view is refreshed. This is similar to storing the results of a query in Memcache. When the cache “expires”, the contents must be fully recalculated..

An incrementally refreshable view (that is, one which uses the incremental method) can be updated much more efficiently, by examining a history of the rows which have changed since the view was either created, or last refeshed.
The incremental method is somewhat conceptually similar to using mysql binary logs for point-in-time recovery after a restoring a backup. Since the backup is a essentially a ‘snapshot’ of the data at the time of the backup, upon restoration the backup may be ‘out of date’ since changes to the database may have happened since the backup was taken. Replaying the binary logs brings the database up-to-date by replaying the changes.
An MV is also like a snapshot. The incremental refresh method uses the changelogs collected by FlexCDC to update the snapshot to reflect the changes that have happened in the database, instead of replaying binary logs directly.

Selecting a refresh method

The refresh method you select is determined by the following two major factors:

The cost (in terms of execution time) of the query on which the view is based.
The SQL features used in the query on which the view is based.

The cost of the query

The complete method completely rebuilds an MV from scratch at each refresh. This means that the minimum amount of drift for this type of MV is the amount of time it takes to execute the query on which the view is based.Â An incrementally refreshable view, on the other hand, can usually be refreshed very quickly, depending on the amount of data which has changed since the last refresh. This is because the incremental refresh method must examine only those rows related to the ones which changed since the last time the view was refreshed.

The incremental refresh method can be many orders of magnitude faster than the original query execution time. It may be possible, therefore, to refresh an MV based on a query that takes 45 minutes to execute with a greater frequency than 45 minutes, even as frequently as every minute, perhaps.

The refresh method you select effectively controls the minimum amount of time it takes to refresh the view, and therefore the minimum amount of drift it will encounter. Keep in mind that you can build complete refresh views on top of incrementally refreshable view, which can give you a lot of flexibility to get the results you want, but more quickly. More on this later.

SQL features used in the query

Incremental refresh doesn’t support all of the SQL features available to SELECT statements.

Refresh type	Aggregation	Outer join	All SQL functions	Built using SQL	Requires FlexCDC
COMPLETE	Y	Y	Y	Y	N
INCREMENTAL	Y*	N	N**	N***	Y

*All aggregate functions supported, except GROUP_CONCAT and AVG(distinct).
**non-deterministic functions like RAND(),NOW(), etc, are not supported.
***There is a script to convert SQL to the Flexviews API (see convert.php below)

Refresh methods

The incremental refresh method

The incremental refresh method uses table changelogs which are created by FlexCDC. The refresh algorithm computes changes in multiple transactions. After each transaction it must wait for FlexCDC to process changes from that transaction. In practice, this means that it takes a minimum of a few seconds to incrementally refresh a view, even when the number of changes is very small.

The Flexviews SQL_API

Unlike views which use the complete refresh method, incrementally refreshable views are not built directly from SQL. Instead, Flexviews includes a MySQL stored procedure API, called the SQL_API which is used to define the view. These stored procedures maintain the Flexviews data dictionary. The data dictionary is used by the incremental refresh algorithm to build and maintain the view. You should always use the stored procedures to modify the dictionary. Do not modify it directly.

convert.php
You are probably unfamiliar with the SQL_API. In order to make working with the Flexviews easier, it includes a script called ‘convert.php’. This script makes it easy to create incrementally refreshable views from SQL statements. It reads one or more “CREATE TABLE db.schema … AS SELECT” and/or “INSERT INTO db.schema … AS SELECT” statements from standard input, and outputs the SQL_API statements representing the original SQL statements. Each MV name will be taken from the table name specified in each statement. You should try each statement for correctness before you attempt to convert it. You can easily check that it parses by running the SELECT portion, adding LIMIT 0 to the SELECT part of the query.

convert.php
This script takes as one argument the default database. Notice in the following example, the schema is not specified.

The MV will be placed into whatever schema is specified as the first argument on the command line:

$ cat sales.sql
create table dashboard_customer_sales AS
select customer_id,
customer_name,
DATE_FORMAT(order_date,'%Y%m') as sale_when,
sum(quantity) as total_items,
sum(price * quantity) as total_price,
count(*) as total_lines
from orders o
join customers c using(customer_id)
join order_lines ol using(order_id)
group by customer_id,
customer_name, sale_when;

$ php convert.php demo < sales.sql
CALL flexviews.create('demo', 'dashboard_customer_sales', 'INCREMENTAL');
SET @mvid := LAST_INSERT_ID();
CALL flexviews.add_expr(@mvid,'GROUP','customer_id','customer_id');
CALL flexviews.add_expr(@mvid,'GROUP','customer_name','customer_name');
CALL flexviews.add_expr(@mvid,'GROUP','DATE_FORMAT(order_date,"%Y%m")','sale_when');
CALL flexviews.add_expr(@mvid,'SUM','quantity','total_items');
CALL flexviews.add_expr(@mvid,'SUM','price * quantity','total_price');
CALL flexviews.add_expr(@mvid,'COUNT','*','total_lines');

CALL flexviews.add_table(@mvid,'demo','orders','o',NULL);
CALL flexviews.add_table(@mvid,'demo','customers','c','USING (customer_id) ');
CALL flexviews.add_table(@mvid,'demo','order_lines','ol','USING (order_id) ');
CALL flexviews.enable(@mvid);

$ cat sales.sql

create table dashboard_customer_sales AS

select customer_id,

customer_name,

DATE_FORMAT(order_date,'%Y%m') as sale_when,

sum(quantity) as total_items,

sum(price * quantity) as total_price,

count(*) as total_lines

from orders o

join customers c using(customer_id)

join order_lines ol using(order_id)

group by customer_id,

customer_name, sale_when;

$ php convert.php demo < sales.sql

CALL flexviews.create('demo', 'dashboard_customer_sales', 'INCREMENTAL');

SET @mvid := LAST_INSERT_ID();

CALL flexviews.add_expr(@mvid,'GROUP','customer_id','customer_id');

CALL flexviews.add_expr(@mvid,'GROUP','customer_name','customer_name');

CALL flexviews.add_expr(@mvid,'GROUP','DATE_FORMAT(order_date,"%Y%m")','sale_when');

CALL flexviews.add_expr(@mvid,'SUM','quantity','total_items');

CALL flexviews.add_expr(@mvid,'SUM','price * quantity','total_price');

CALL flexviews.add_expr(@mvid,'COUNT','*','total_lines');

CALL flexviews.add_table(@mvid,'demo','orders','o',NULL);

CALL flexviews.add_table(@mvid,'demo','customers','c','USING (customer_id) ');

CALL flexviews.add_table(@mvid,'demo','order_lines','ol','USING (order_id) ');

CALL flexviews.enable(@mvid);

The script (convert.php) supports basic queries which use select/group by/join/where. You may not use sub-queries or any non-deterministic functions like NOW() or RAND(). HAVING clauses, ORDER BY clauses, etc, are not supported. You can work around many of these limitations. I will discuss one of the workarounds below, in the ‘complete refresh’ section. I’ll discuss others in future blog posts.

Enable the view to use it

The SQL_API call flexviews.enable() is used to actually build the contents of the view, making it available for querying:

mysql> call flexviews.enable(
-> flexviews.get_id('demo','dashboard_customer_sales'));
Query OK, 0 rows affected (41 min 52.04 sec)

mysql> call flexviews.enable(

-> flexviews.get_id('demo','dashboard_customer_sales'));

Query OK, 0 rows affected (41 min 52.04 sec)

Data dictionary

Here is a quick example of the list of tables used by the above view, as stored in the data dictionary:

mysql> select *
from flexviews.mview_table
where mview_id=
flexviews.get_id('demo','dashboard_customer_sales')G
*************************** 1. row ***************************
      mview_table_id: 27
            mview_id: 21
    mview_table_name: orders
  mview_table_schema: demo
   mview_table_alias: o
mview_join_condition: NULL
    mview_join_order: 999
*************************** 2. row ***************************
      mview_table_id: 28
            mview_id: 21
    mview_table_name: customers
  mview_table_schema: demo
   mview_table_alias: c
mview_join_condition: USING (customer_id)
    mview_join_order: 999
*************************** 3. row ***************************
      mview_table_id: 29
            mview_id: 21
    mview_table_name: order_lines
  mview_table_schema: demo
   mview_table_alias: ol
mview_join_condition: USING (order_id)
    mview_join_order: 999
3 rows in set (0.00 sec)

mysql> select *

from flexviews.mview_table

where mview_id=

flexviews.get_id('demo','dashboard_customer_sales')G

*************************** 1. row ***************************

mview_table_id: 27

mview_id: 21

mview_table_name: orders

mview_table_schema: demo

mview_table_alias: o

mview_join_condition: NULL

mview_join_order: 999

*************************** 2. row ***************************

mview_table_id: 28

mview_id: 21

mview_table_name: customers

mview_table_schema: demo

mview_table_alias: c

mview_join_condition: USING (customer_id)

mview_join_order: 999

*************************** 3. row ***************************

mview_table_id: 29

mview_id: 21

mview_table_name: order_lines

mview_table_schema: demo

mview_table_alias: ol

mview_join_condition: USING (order_id)

mview_join_order: 999

3 rows in set (0.00 sec)

Using the dictionary

The following SQL is dynamically generated by the flexviews.get_sql() function call. This function reads the meta-data stored in the Flexviews data dictionary. It returns the SQL which represents the data stored in the view. This is much more convenient than trying to read from the dictionary directly to determine the contents of the view.

mysql> select flexviews.get_sql(
-> flexviews.get_id('demo','dashboard_customer_sales'))
-> as 'SQL' G
*************************** 1. row ***************************
SQL:
SELECT NULL as mview$pk,
(customer_id) as `customer_id`,
(customer_name) as `customer_name`,
(DATE_FORMAT(order_date,'%Y%m')) as `sale_when`,
SUM(price * quantity) as `total_price`,
COUNT(*) as `total_lines`
FROM  demo.orders as o
JOIN demo.customers as c USING (customer_id)
JOIN demo.order_lines as ol USING (order_id)
GROUP BY (customer_id),
 (customer_name),
(DATE_FORMAT(order_date,'%Y%m'))
1 row in set, 1 warning (0.00 sec)

mysql> select flexviews.get_sql(

-> flexviews.get_id('demo','dashboard_customer_sales'))

-> as 'SQL' G

*************************** 1. row ***************************

SQL:

SELECT NULL as mview$pk,

(customer_id) as `customer_id`,

(customer_name) as `customer_name`,

(DATE_FORMAT(order_date,'%Y%m')) as `sale_when`,

SUM(price * quantity) as `total_price`,

COUNT(*) as `total_lines`

FROM demo.orders as o

JOIN demo.customers as c USING (customer_id)

JOIN demo.order_lines as ol USING (order_id)

GROUP BY (customer_id),

(customer_name),

(DATE_FORMAT(order_date,'%Y%m'))

1 row in set, 1 warning (0.00 sec)

The complete refresh method

This method is actually quite simple in its operation. It will completely rebuild a view based on the SQL which defines it, each time the view is refreshed.

This is not much different from building a table with CREATE TABLE .. AS SELECT. In fact, this is actually part of what the COMPLETE refresh method does during its operation. It also takes care of atomically* replacing the new contents of the view with the old contents (using RENAME TABLE) which means that the view remains available for querying, even during refresh.

A complete refresh type view can use non-deterministic functions like NOW(). In fact any valid SELECT statement can be materialized. The definition of the MV is stored internally, and represents the actual SQL that defines the view. A simple API call is made to associate the SQL definition with the view, but the definition itself is a SQL statement, similar to a regular view. If you choose to use only the complete refresh method, then FlexCDC is not required to use Flexviews.

In the following example, the total sales amounts are calculated (for all time) from a monthly summary. The monthly summary is actually another MV. This saves a significant amount of time. If this MV did not access the other MV, then the SQL necessary to compute this list would take over 40 minutes to execute, because that is how long it takes to create the monthly summary from scratch.

Here is an example of a complete refresh MV which computes the total sales for all customers. It computes this from the dashboard_top_customers MV, ordering them in descending order:

mysql> call flexviews.create(
-> 'demo','dashboard_top_customers','COMPLETE');
Query OK, 1 row affected (0.00 sec)

mysql> call flexviews.set_definition(
->flexviews.get_id('demo','dashboard_top_customers'),
    -> 'select customer_id,
    '> sum(total_price) total_price,
    '> sum(total_lines) total_lines
    '> from demo.dashboard_customer_sales dsc
    '> group by customer_id
    '> order by total_price desc;
    '> ');
Query OK, 1 row affected (0.00 sec)

mysql> call flexviews.create(

-> 'demo','dashboard_top_customers','COMPLETE');

Query OK, 1 row affected (0.00 sec)

mysql> call flexviews.set_definition(

->flexviews.get_id('demo','dashboard_top_customers'),

-> 'select customer_id,

'> sum(total_price) total_price,

'> sum(total_lines) total_lines

'> from demo.dashboard_customer_sales dsc

'> group by customer_id

'> order by total_price desc;

'> ');

Query OK, 1 row affected (0.00 sec)

When the view is ‘enabled’, the contents are created:

mysql> call flexviews.enable(
-> flexviews.get_id('demo','dashboard_top_customers'));
Query OK, 0 rows affected (5.73 sec)

mysql> call flexviews.enable(

-> flexviews.get_id('demo','dashboard_top_customers'));

Query OK, 0 rows affected (5.73 sec)

It only took about six seconds to calculate the total sales for all customers from the dashboard_customer_sales table. If you were paying attention to the incremental example, the dashboard_customer_sales table just happens to be the view we created above.

I picked this example, because it is important to understand that you can build complete refresh views on top of incremental ones. This will allow you to create complete refresh views that refresh much more quickly.

If this had to actually access the orders, order_lines and customers tables directly (the tables on which dashboard_customer_sales is built), then the query would take significantly longer (40+ minutes):

mysql> SELECT 
-> customer_id as `customer_id`,
-> SUM(price * quantity) as `total_price`,
-> COUNT(*) as `total_lines`
-> FROM  demo.orders as o
-> JOIN demo.customers as c USING (customer_id)
-> JOIN demo.order_lines as ol USING (order_id)
-> GROUP BY (customer_id)
-> ORDER BY total_price desc
-> LIMIT 10;
+-------------+-------------+-------------+
| customer_id | total_price | total_lines |
+-------------+-------------+-------------+
|         689 |      770793 |        3811 |
|        6543 |      754138 |        3740 |
|        5337 |      742034 |        3674 |
|        5825 |      738420 |        3593 |
|        5803 |      733495 |        3670 |
|        1579 |      732507 |        3666 |
|        9316 |      731091 |        3610 |
|        2046 |      722631 |        3531 |
|        6319 |      720100 |        3572 |
|        6019 |      718031 |        3475 |
+-------------+-------------+-------------+

10 rows in set (43 min 10.11 sec)

mysql> SELECT

-> customer_id as `customer_id`,

-> SUM(price * quantity) as `total_price`,

-> COUNT(*) as `total_lines`

-> FROM demo.orders as o

-> JOIN demo.customers as c USING (customer_id)

-> JOIN demo.order_lines as ol USING (order_id)

-> GROUP BY (customer_id)

-> ORDER BY total_price desc

-> LIMIT 10;

+-------------+-------------+-------------+

| customer_id | total_price | total_lines |

+-------------+-------------+-------------+

| 689 | 770793 | 3811 |

| 6543 | 754138 | 3740 |

| 5337 | 742034 | 3674 |

| 5825 | 738420 | 3593 |

| 5803 | 733495 | 3670 |

| 1579 | 732507 | 3666 |

| 9316 | 731091 | 3610 |

| 2046 | 722631 | 3531 |

| 6319 | 720100 | 3572 |

| 6019 | 718031 | 3475 |

+-------------+-------------+-------------+

10 rows in set (43 min 10.11 sec)

As an added benefit, if you build more than one complete refresh view from an incrementally refreshable one, then you can keep the complete views transactionally consistent with each other, as long as you refresh all of the complete views after the incremental one.

We can use the new MV (dashboard_top_customers) to list the top 10 customers nearly instantly. Note that every MV gets a special column mview$pk which is an auto_increment BIGINT surrogate key for the table. For complete refresh views which are ordered with ORDER BY, such as this one, this creates a ranking function automatically. In cases where this is not useful, simply ignore this column. It is used to prevent wide innodb primary keys on the MV.

mysql> select mview$pk as rank,
 customer_id,
 total_price,
 total_lines
 from demo.dashboard_top_customers
limit 10;
+------+-------------+-------------+-------------+
| rank | customer_id | total_price | total_lines |
+------+-------------+-------------+-------------+
|    1 |         689 |      770793 |        3811 |
|    2 |        6543 |      754138 |        3740 |
|    3 |        5337 |      742034 |        3674 |
|    4 |        5825 |      738420 |        3593 |
|    5 |        5803 |      733495 |        3670 |
|    6 |        1579 |      732507 |        3666 |
|    7 |        9316 |      731091 |        3610 |
|    8 |        2046 |      722631 |        3531 |
|    9 |        6319 |      720100 |        3572 |
|   10 |        6019 |      718031 |        3475 |
+------+-------------+-------------+-------------+
10 rows in set (0.00 sec)

mysql> select mview$pk as rank,

customer_id,

total_price,

total_lines

from demo.dashboard_top_customers

limit 10;

+------+-------------+-------------+-------------+

+------+-------------+-------------+-------------+

| 1 | 689 | 770793 | 3811 |

| 2 | 6543 | 754138 | 3740 |

| 3 | 5337 | 742034 | 3674 |

| 4 | 5825 | 738420 | 3593 |

| 5 | 5803 | 733495 | 3670 |

| 6 | 1579 | 732507 | 3666 |

| 7 | 9316 | 731091 | 3610 |

| 8 | 2046 | 722631 | 3531 |

| 9 | 6319 | 720100 | 3572 |

| 10 | 6019 | 718031 | 3475 |

+------+-------------+-------------+-------------+

10 rows in set (0.00 sec)

You can also use this MV to display the total sales amount for any particular customer.

Notice how you can use this summary to calculate the number of lines in the order_lines table, three orders of magnitude more quickly than COUNT(*).

mysql> select count(*) cnt from order_linesG
*************************** 1. row ***************************
cnt: 155187034
1 row in set (32.03 sec)

mysql> select sum(total_lines) cnt from dashboard_top_customersG
*************************** 1. row ***************************
cnt: 155187034
1 row in set (0.03 sec)

mysql> select count(*) cnt from order_linesG

*************************** 1. row ***************************

cnt: 155187034

1 row in set (32.03 sec)

mysql> select sum(total_lines) cnt from dashboard_top_customersG

*************************** 1. row ***************************

cnt: 155187034

1 row in set (0.03 sec)

Notice the difference in response time.

Refresh method performance comparison

For demonstration purposes, I did the following:

Created one view of each type. The view is the same as the one specified above.
Both views took about 40 minutes to build
Deleted 100 order lines.
Refreshed both views.
Compared the refresh performance and query results.

The view type does not usually* affect the time it takes to build the MVs the first time. Both of these views build in about the same about of time:

mysql> call flexviews.enable(
-> flexviews.get_id('demo','complete_example2'));
Query OK, 0 rows affected (42 min 42.14 sec)

mysql> call flexviews.enable(
-> flexviews.get_id('demo','dashboard_customer_sales'));
Query OK, 0 rows affected (41 min 52.04 sec)

mysql> call flexviews.enable(

-> flexviews.get_id('demo','complete_example2'));

Query OK, 0 rows affected (42 min 42.14 sec)

mysql> call flexviews.enable(

-> flexviews.get_id('demo','dashboard_customer_sales'));

Query OK, 0 rows affected (41 min 52.04 sec)

*If you use MIN/MAX/COUNT_DISTINCT, a secondary view will be built in the flexviews schema, transparently, to manage the distinct values for each group in the view. This will increases the time it takes to build, and the refresh will be more expensive. This is an optimization which is required to efficiently refresh MV which use those aggregate functions.

Now I delete some line items from orders:

mysql> delete
-> from order_lines
-> where order_id
-> between 1 and 100
-> limit 500;
Query OK, 484 rows affected (0.27 sec)

mysql> delete

-> from order_lines

-> where order_id

-> between 1 and 100

-> limit 500;

Query OK, 484 rows affected (0.27 sec)

Since the base table has changed, the views now exhibit drift. There should be 484 less lines reflected in the count in our MVs but they are out-of-date and must be refreshed:

mysql> select sum(total_lines) from complete_example2;
+------------------+
| sum(total_lines) |
+------------------+
|        155187034 | <-- too high
+------------------+
1 row in set (0.68 sec)

mysql> select sum(total_lines) from  dashboard_customer_sales ;
+------------------+
| sum(total_lines) |
+------------------+
|        155187034 | <-- too high
+------------------+
1 row in set (0.61 sec)

mysql> select sum(total_lines) from complete_example2;

+------------------+

| sum(total_lines) |

+------------------+

| 155187034 | <-- too high

+------------------+

1 row in set (0.68 sec)

mysql> select sum(total_lines) from dashboard_customer_sales ;

+------------------+

| sum(total_lines) |

+------------------+

| 155187034 | <-- too high

+------------------+

1 row in set (0.61 sec)

Refreshing the MVs

The view which uses the complete refresh method takes a very long time to refresh:

mysql> call flexviews.refresh( 
-> flexviews.get_id('demo','complete_example2'),'BOTH',NULL); 
Query OK, 0 rows affected (42 min 42.14 sec)

mysql> call flexviews.refresh(

-> flexviews.get_id('demo','complete_example2'),'BOTH',NULL);

Query OK, 0 rows affected (42 min 42.14 sec)

But the incrementally refreshable version does not take long to refresh:

mysql> call flexviews.refresh( 
-> flexviews.get_id('demo', 
-> 'dashboard_customer_sales'),'BOTH',NULL); 
Query OK, 0 rows affected (7.01 sec)

mysql> call flexviews.refresh(

-> flexviews.get_id('demo',

-> 'dashboard_customer_sales'),'BOTH',NULL);

Query OK, 0 rows affected (7.01 sec)

The second one is 365x faster because it examines only the rows that changed. This exceptional ability to look at only what changed, even for MVs with aggregation and joins, is the value proposition for Flexviews.

The flexviews.refresh() stored procedure

The first parameter to flexviews.refresh() is the materialized view id. Each view has an identifier which can be obtained with flexviews.get_id(‘schema’,’table’).

The second parameter reflects the refresh type. This parameter can take the options: ‘BOTH’,’COMPUTE’,’APPLY’, or ‘COMPLETE’. I specified ‘BOTH’ for the second parameter in both examples. ‘BOTH’ means compute the changes to the view, and also apply them. This is a combination of the ‘COMPUTE’ and ‘APPLY’ options. When you specify ‘COMPUTE’ then Flexviews will compute the row changes for the view, but not apply them. If their are changes which have not been applied, then you can apply those changes by passing the ‘APPLY’ option.

In both examples, I pass NULL as the last parameter. Materialized views which based on the complete refresh method will always take NULL for this parameter. For incrementally refreshable views, the last parameter is a transaction id number, which is usually obtained with flexviews.get_uow_id_from_datetime() function. If you pass a NULL value, then it refreshes the view up to the latest changes which have been collected. You can use this to refresh multiple incremental refresh views to the same transactional point in time.

Also note that you can ‘COMPUTE’ or ‘APPLY’ changes to a particular transaction id. Logically, you can not apply changes past the transaction id to which you have computed them.

And then confirm they contain the same results

mysql> select sum(total_lines) from dashboard_customer_sales ; 
+------------------+ 
| sum(total_lines) | 
+------------------+ 
| 155186550        | 
+------------------+ 
1 row in set (0.64 sec) 

mysql> select sum(total_lines) from complete_example2 ; 
+------------------+ 
| sum(total_lines) |
+------------------+ 
| 155186550        | 
+------------------+ 
1 row in set (0.68 sec)

mysql> select sum(total_lines) from dashboard_customer_sales ;

+------------------+

| sum(total_lines) |

+------------------+

| 155186550 |

+------------------+

1 row in set (0.64 sec)

mysql> select sum(total_lines) from complete_example2 ;

+------------------+

| sum(total_lines) |

+------------------+

| 155186550 |

+------------------+

1 row in set (0.68 sec)

Conclusion

Flexviews supports two refresh methods, the complete method and the incremental method. I think you will agree that the incremental method has significant advantages over the complete method. In this example the former method was over 350x faster than the latter.

I also showed how you can combine both types of views together. The complete method examples show how to create a complete refresh view which reads from an incrementally refreshable one. This allows the use of SQL features not available with the incremental method, like ORDER BY or use of NOW(), it but still provides improved performance during refresh by accessing summarized data.

I hope this helps you understand how Flexviews can help you ensure fast response times in your application by making access to summary data efficient.

28 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Jonathan

13 years ago

That’s a really good post about Flexviews.
I also didn’t know about the convert.php part.
With that, it couldn’t be more simple to use.

Justin Swanhart

Author

13 years ago

Jonathan,

Yes, convert.php makes things easier. Previously it was only available on the website, and that version didn’t work as well. The version included with Flexviews SVN uses a PHP based SQL parser that I maintain, and works much better than the old version did.

Please report any bugs on the issue page. I’ve mostly tested it with the output of the Pentaho Aggregation Designer to build aggregate tables for Mondrian.

Frank Xu

13 years ago

Thanks for your detailed explanation, and i’m looking forward to new improved version of flexviews,
thanks for your great work.

Justin Swanhart

Author

13 years ago

Hi,

The flexviews.mview_signal table is table that Flexviews uses to determine the uow_id of a transaction, after that transaction commits. The refresh mechanism doesn’t currently detect if there are no changes to examine. It will query the changelogs even when they contain no changes, and do an insertion into the flexviews.mview_signal table.

I need to make some changes Flexviews to avoid doing work when there are no changes to the database.

I also need to add a clean-up stored procedure to remove the rows from changelog tables, the mview_uow table, the signal table and other tables, using the oldest change required by any incremental view.

I’m working on Shard-Query improvements right now, but will have some time next week to work on Flexviews. I’ll implement the following as an automated script.

In the meantime, here is kind of a roadmap you can follow:

First, you need to detect the oldest uow_id needed by any view:
select min(least(incremental_hwm,refreshed_to_uow_id)
from flexviews.mview
into @oldest_needed_uow_id;

And then delete the signal rows with the following:
set sql_log_bin=0;
delete from flexviews.mview_signal;

Don’t do the following if you have any views that you have enabled but not yet refreshed! If you delete from flexviews.flexviews_mview_signal, the refresh method may not be able to find the uow_id at which it was created! Simply refresh a view once, immediately after creating it, to ensure there is no problem.

delete from flexviews.mview_signal_mvlog where uow_id < @oldest_needed_uow_id;

Finally, iterate over the tables in flexviews.mvlogs, removing any rows where uow_id < @oldest_needed_uow_id.

Frank Xu

13 years ago

Hi,
I’m using the great flexviews, and I have a question about the mview_signal table.
When i refresh an mview, I notice that the uow_execute procedure will insert a new row into mview_signal, then the newly inserted row will be captured by Flexcdc and insert into flexviews_mview_signal. Is this right?

The question is when no data change happened in my own database, the table mview_signal table grows every time i call flexviews.refresh, how could I fix this?
thank you.

Justin Swanhart

Author

13 years ago

Jonathan,

Please report any bugs on the issue page. I’ve mostly tested it with the output of the Pentaho Aggregation Designer to build aggregate tables for Mondrian.

Jonathan

13 years ago

That’s a really good post about Flexviews.
I also didn’t know about the convert.php part.
With that, it couldn’t be more simple to use.

aivan

12 years ago

Hi I tried the refresh procedure but I got this error.

“Unknown column ‘ERROR: XYZ UNSUPPORTED REFRESH METHOD’ in ‘field list'”

I need help regarding this error.

aivan

12 years ago

Hi what is the cause of this error :

Unknown column ‘ERROR: ERROR: SIGNAL ID NOT FOUND (FlexCDC consumer is likely behind)’ in ‘field list’

Atrevido

12 years ago

“ERROR: ERROR: SIGNAL ID NOT FOUND”
This error keeps coming up when I try to refresh the incremental views. I realized that this is because flexviews_mview_signal table is empty, but no matter what I do, nothing is inserted into this table. What do I do??
Your suggestion would greatly help.

Thanks in advance

Atrevido

12 years ago

Hi aivan
There seems to be some problem with the auto_changelog setting in the consumer.ini file. The SIGNAL ID NOT FOUND error keeps coming when we try to refresh the incremental refresh. Try with the auto_changelog setting as ‘false’. It worked for me.

Cheers

Kairen.Huang

11 years ago

Dear Justin,
How to refresh the MV that created by using “call flexviews.create(‘test’,’mv1′,’INCREMENTAL’);” automatically? thanks.

Kairen.Huang

11 years ago

When I set “auto_changelog=true” in the consumer.ini, flexviews can’t work and throws “Restarting FlexCDC!”(can’t stop the Terminal until close the window)

Justin Swanhart

Author

11 years ago

Karien:
You can automatically refresh an INCREMENTAL refresh view using MySQL events. There are sample events included in the distribution. You’ll have to edit (or just use) the event that refreshes all materialized views, and only refresh the one you want.

I haven’t tested auto_changelog=true recently and it appears to be broken. Don’t use automatic change-logging (I think there is a note in the distribution about that) with Flexviews.

Kairen.Huang

11 years ago

Thanks Justin so much. The success code as :

CREATE EVENT if not exists e_test
on schedule every 15 second
on completion preserve
do call flexviews.refresh(14,’BOTH’,NULL);

set global event_scheduler=1;

Kairen.Huang

11 years ago

Dear Justin,

How can I use “[source]” and “[dest]” in “consumer.ini” file to refresh the MV data in another database(different host)?
Thank you.

guojhq

11 years ago

Hi what is the cause of this error :

Unknown column ‘ERROR: ERROR: SIGNAL ID NOT FOUND (FlexCDC consumer is likely behind)’ in ‘field list’

Atrevido

11 years ago

The “ERROR: ERROR: SIGNAL ID NOT FOUND (FlexCDC consumer is likely behind)” is usually confronted when the flexviews_mview_signal table gets stale and is not logging the changes on the original table ‘mview_signal’ (Both in the flexviews database). This is probably because either your consumer has stopped running or it has stopped logging the table. Check in the mvlogs table if the flexviews_mview_signal has an entry and is active.

Everett

11 years ago

Very interesting.. thank you! 🙂

Dhairya

10 years ago

Hello

I m just getting confused with the table and materialized view concept. what is the difference if we are to process about 40 million records and crunching the records using group by queries to make it to about 4 million records and saving it to another table. and if we do the same process as described above and use materialized view instead of table to save those 4 million records ..
Please let me know where in the process does it make the difference between the two.

Vadim

9 years ago

Hello,

I have one problem – everything is wokring – just the refreshing doesn’t succeed – error mesage:
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ‘t.uow_id >37 AND t.uow_id <=41' at line 3

I'm using the incremental type – FlexCDC is working – i see new netries in the table mvlog_….
I try refrishing with the following SQL:
CALL flexviews.refresh(flexviews.get_id('views','transaction_view'),'BOTH',NULL);

Justin Swanhart

9 years ago

Vadim,

Are you using the latest version of Flexviews from http://github.com/greenlion/swanhart-tools?

Vadim Sacharow

9 years ago

Yes – i have downloaded the latest version on sunday.
Will it be enough to copy all entries from the mvlog_ table to my view and delete all entries from the mvlog_ table?
I can do it with simple sql.

The logic with the transaction ids is not needed for me – i want all updates since last refreshing

Justin Swanhart

9 years ago

Vadim,

If you have only single table views, then you can merge the changes from the log to the table. If you have views with more than one table, that won’t work properly.

Also, as a debug measure, can you try to refresh your view and when it fails do:
select @v_sql;

That should print out the SQL that failed and I can debug the problem.

Vadim

9 years ago

Here is the SQL – thee is no WHERE … may be it is the problem

INSERT INTO vadim_views.transaction_view_delta
SELECT (t.dml_type * 1) as dml_type,t.uow_id as uow_id,t.fv$gsn as fv$gsn ,(t.id) as id, (site.name) as siteName, (c.name) as categoryName, (a.shopId) as shopId, (a.title) as title, (a.startPrice) as startPrice, (a.listingType) as listingType, (t.price) as price, (t.shopDate) as shopDate, (b.name) as buyerName, (s.name) as sellerName, (a.url) as url, (a.id) as auctionId, (s.id) as sellerId, (b.id) as buyerId, (crc32(concat(t.id,site.name,c.name,a.shopId,a.title,a.startPrice,a.listingType,t.price,t.shopDate,b.name,s.name,a.url,a.id,s.id,b.id))) as mview$hash
FROM flexviews.mvlog_66e71d03820fed5312b77f14ca205f83 as t
JOIN vadim.shop_category as c ON a.shopCategoryId = c.id
JOIN vadim.shop_seller as s ON a.shopSellerId = s.id
JOIN vadim.shop_site as site ON a.shopSiteId = site.id
JOIN vadim.auction as a ON t.auctionId = a.id
JOIN vadim.shop_buyer as b ON t.shopBuyerId = b.id
t.uow_id >37 AND t.uow_id <=42

Justin Swanhart

9 years ago

Interesting. I will check the code to see how the missing WHERE could happen. I’ll let you know when I have a fix.

Justin Swanhart

9 years ago

I pushed a bug fix for your issue into git. You will need to pull a new version and run
cd location of Flexviews (./swanhart-tools/flexviews by default)
mysql> use flexviews;
mysql> \.procs\delta.sql

Try to refresh your view.

Please let me know how it works.

sudheer

6 years ago

it is not showing consistence result when compare with MV and direct queries.

Ex: I have created flexviews with the following query and checked for one account count it is mismatching.

select count total_count ,account_id from tickets group by account_id ;

for checking I have used the following command in mysql.

select count(1) from helpkit.helpdesk_tickets where account_id = 281917; call flexviews.refresh(6,’BOTH’,null); select * from helpkit.account_count_3 order by total_count desc limit 1;

I got the following output.

mysql> select count(1) from helpkit.helpdesk_tickets where account_id = 281917; call flexviews.refresh(6,’BOTH’,null); select * from helpkit.account_count_3 order by total_count desc limit 1;
+———-+
| count(1) |
+———-+
| 907765 |
+———-+
1 row in set (0.25 sec)

Query OK, 2 rows affected (0.15 sec)

+———-+————-+————+
| mview$pk | total_count | account_id |
+———-+————-+————+
| 13217 | 907753 | 281917 |
+———-+————-+————+
1 row in set (0.00 sec)

original count is 907765 but MV is showing 907753

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Flexviews – part 3 – improving query performance using materialized views

Selecting a refresh method

The cost of the query

SQL features used in the query

Refresh methods

The incremental refresh method

The Flexviews SQL_API

Enable the view to use it

Data dictionary

Using the dictionary

The complete refresh method

Refresh method performance comparison

Refreshing the MVs

The flexviews.refresh() stored procedure

And then confirm they contain the same results

Conclusion

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

Valkey/Redis: Configuration Best Practices

Valkey/Redis: The Hash Datatype

Valkey/Redis Replication and Auto-Failover With Sentinel Service

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation