In my last post, we talked about SQL views. Yes, I did convince you that they are really cool – but let's face it, do they improve performance? Not really, and even when they do, the gain is tiny. IMHO views are just a convenience: they make our lives easier and they look prettier – and we like pretty things, don't we :smile: ? At the end of my last post I mentioned materialized views, and that's the topic of this one!
I want you to think of a materialized view as a file: a file in which the result of our query is saved, with the records we've chosen. The database fetches those records directly from that file, so it does its job in a single query. The file comparison is just an analogy – in reality the mechanism is a lot more advanced! In our case, which was implementing an advanced search across 5,000,000 records, we got roughly a 200% performance boost, and I'm pretty sure that number can go a lot higher.
The biggest flaw of materialized views is that they do not refresh automatically, so we have to trigger the refresh ourselves. Additionally, we cannot update the view partially, which means that after each change in our database we have to rebuild the whole materialized view. Since PostgreSQL 9.4 we have the option to refresh a view concurrently, which is just amazing, because before that change the view was unavailable for reads each time it was being refreshed. That was quite an obstacle when the view took long to refresh; in our case the whole refresh takes about 500 ms, and thanks to concurrent refreshes we do not have any delays in usage.
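The two refresh modes can be sketched like this – the helper module and its names below are my own, hypothetical; only the SQL statements themselves come from PostgreSQL:

```ruby
# Hypothetical helper illustrating the two refresh modes of a materialized view.
module MaterializedViewRefresher
  # Builds the refresh statement. With concurrently: true (PostgreSQL 9.4+)
  # the view stays readable during the refresh, but this requires a unique
  # index on the view.
  def self.refresh_sql(view_name, concurrently: false)
    "REFRESH MATERIALIZED VIEW #{'CONCURRENTLY ' if concurrently}#{view_name}"
  end

  # Executes the refresh through ActiveRecord's raw connection.
  def self.refresh!(view_name, concurrently: false)
    ActiveRecord::Base.connection.execute(
      refresh_sql(view_name, concurrently: concurrently)
    )
  end
end
```

Without `CONCURRENTLY`, PostgreSQL takes a lock that blocks reads for the duration of the refresh – which is exactly the delay we wanted to avoid.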
OK, let's do some coding. First, let's create a migration with up and down!
```ruby
class AddBasicInformationView < ActiveRecord::Migration
  def self.up
    execute <<-SQL
      CREATE MATERIALIZED VIEW post_basic_informations AS
        SELECT
          row_number() OVER () AS unique_id,
          posts.id,
          posts.name,
          posts.actualization_status,
          comments.id AS comment_id,
          comments.text AS comment_text,
          users.id AS users_id,
          users.name AS users_name,
          images.id AS images_id,
          images.name AS images_name
        FROM posts
        LEFT JOIN comments ON comments.post_id = posts.id
        LEFT JOIN users ON users.post_id = posts.id
        LEFT JOIN images ON images.post_id = posts.id;

      CREATE UNIQUE INDEX unique_post_basic_informations
        ON post_basic_informations (unique_id);
    SQL
  end

  def self.down
    execute <<-SQL
      DROP MATERIALIZED VIEW IF EXISTS post_basic_informations;
    SQL
  end
end
```
What's the difference compared to a normal view:
- we have to add the MATERIALIZED keyword in the first line,
- we have to create a column with a unique id – `row_number() OVER () AS unique_id` – and add a unique index on it: `CREATE UNIQUE INDEX unique_post_basic_informations ON post_basic_informations (unique_id);`
You are probably wondering: why do we need the unique_id column? We only need a column with a unique id if we will be using concurrent refreshing. It is used during the refresh process to build a duplicate of the table and keep the view usable for users while the refresh runs.
Now run rake db:migrate. Boom – it's done! Don't worry if the migration takes a long time – it's a materialized view, it can take a while!
Now let's create the ActiveRecord class!
```ruby
class PostBasicInformation < ActiveRecord::Base
  self.table_name = 'post_basic_informations'

  def readonly?
    true
  end

  def self.refresh
    ActiveRecord::Base.connection.execute('REFRESH MATERIALIZED VIEW CONCURRENTLY post_basic_informations')
  end
end
```
OK, as I said before – now we have to trigger the refresh. My reasoning went like this:
- if the row count of any of my tables changed, the view has to be refreshed – so after each create/destroy in the tables posts/comments/users/images,
- if any of my tables got updated, I have to run a check on whether any of the columns used in the materialized view changed – so if id/name/actualization_status in posts, id/text in comments, or id/name in users/images changed, I have to refresh the view.
I will briefly describe how I've done it, but everyone can do it their own way.
We use microservices to create/update records, so:
- in each create/destroy service for posts/users/comments/images we call PostBasicInformation.refresh at the end of the process method – and that's it, because after each create/destroy the view MUST be refreshed,
- in each update service, before the record is updated, I save the values of the tracked columns in a hash and then compare them in a PORO class with the values after the update; if they do not match, we execute PostBasicInformation.refresh.
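That PORO could look something like this – a minimal sketch, with class and method names of my own invention, since the post does not show the actual implementation:

```ruby
# Hypothetical change-tracking PORO: given before/after snapshots of a record,
# decides whether any column used by the materialized view changed.
class TrackedColumnsChange
  # Columns from each table that appear in post_basic_informations.
  TRACKED = {
    posts:    %i[id name actualization_status],
    comments: %i[id text],
    users:    %i[id name],
    images:   %i[id name]
  }.freeze

  def initialize(table, before, after)
    @columns = TRACKED.fetch(table)
    @before = before
    @after = after
  end

  # true when at least one tracked column has a different value after the update
  def refresh_needed?
    @columns.any? { |col| @before[col] != @after[col] }
  end
end
```

The update service would then end with something along the lines of `PostBasicInformation.refresh if TrackedColumnsChange.new(:posts, before, after).refresh_needed?`, so the view is only rebuilt when the change is actually visible in it.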