A technical viewpoint on WordPress and its performance.
WordPress was born 13 years ago and it is still the king of the hill, but what is WordPress? In a few words, it is a platform where anybody can create a dynamic website in a snap. There are tons of plugins, and it moves a considerable market of plugins, themes, designers, installers and so on.
It is a HUGE market, and some developers and designers work exclusively with WordPress.
13 years ago, an average server featured 32MB-128MB of RAM, but things have changed considerably. My phone (a cheap Chinese phone) has 2GB of RAM (roughly 20 times the capacity of a 13-year-old server), so technology has moved a lot. However, there are still shared-hosting plans that offer 256MB/512MB of RAM.
Also, the market has changed. WordPress was born as a blogging system, and at heart it is still a blog, but how many sites still use it just as a blog?
In the past, a personal blog consisted of fewer than 100 entries. That was the past. Now it is not rare to find a WordPress site featuring 1,000 blog entries, 500 pages, and 30k products. WordPress wasn't developed with such volume in mind.
Well, not really. Some plugins cache the information, which increases performance considerably by working around WordPress's limits. However, it has a price: real-time data. If the information is cached, then it is not read in real time. I will come back to this later.
It's the database model. The WordPress model is built around flexibility; an editor can add a new post, field, or comment without changing the schema. It is easy to edit, but it hurts WordPress's performance. WordPress has another nasty feature, the use of binary fields, but flexibility is its worst enemy.
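To make it concrete, here is a minimal sketch (with a made-up post ID and a hypothetical 'price' key) of what that flexibility looks like in the database: a custom field is not a new column but a new row in the wp_postmeta key/value table, so the schema never changes, but every read becomes one more lookup.

-- hypothetical example: attaching a "price" field to post 123
-- no alter table is needed; the field is just another row in wp_postmeta
insert into wp_postmeta (post_id, meta_key, meta_value)
values (123, 'price', '19.90');

-- reading it back costs an extra lookup per post and per key
select meta_value from wp_postmeta
where post_id=123
and meta_key='price';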
For example, let's take a plain WordPress install (without plugins, with just two posts on the front page).
This is what WordPress asks the database for (over-simplified):
select * from wp_posts
where post_status='publish'
and post_type='post'
order by post_date desc
Funny enough, the columns post_type and post_status are indexed together with post_date, so this query is well served by a single index.
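For reference, this is the composite index that, to the best of my knowledge, ships with the default wp_posts schema; it is the reason the simplified query above stays cheap:

-- index defined on wp_posts in a default WordPress install
key type_status_date (post_type, post_status, post_date, ID)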
The previous query runs fast, and there is nothing wrong with it. However, it is not the real, complete query. Let's say that we need to add two tags to the post.
The previous query reads only the post; it doesn't read the tags (stored in the term and taxonomy tables).
Now, the real query could look like:
select * from wp_posts wp
inner join wp_term_relationships wtr
on wp.id=wtr.object_id
inner join wp_term_taxonomy wtt
on wtr.term_taxonomy_id=wtt.term_taxonomy_id
inner join wp_terms wt
on wtt.term_id=wt.term_id
where wp.post_status='publish'
and wp.post_type='post'
order by wp.post_date desc
We are working with 4 tables, no less.
In plain words, this query does the following steps:
- reads the published posts from wp_posts;
- for each post, looks up its rows in wp_term_relationships;
- for each relationship, looks up the entry in wp_term_taxonomy;
- for each taxonomy entry, looks up the term itself (the tag) in wp_terms;
- finally, sorts everything by post_date.
Those lookups are fast because they use indexes. However, an index is not magic; it has a cost, and its performance is affected by the size of the tables. A lookup against a table with 100 rows is not the same as a lookup against a table with 1 million rows (even if we use an index).
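A quick way to see this (a sketch, assuming MySQL and a default WordPress schema) is to ask the optimizer for its plan:

explain
select * from wp_posts
where post_status='publish'
and post_type='post'
order by post_date desc;
-- the key column should report the type_status_date index,
-- and the rows column estimates how many rows will be examined;
-- that estimate grows with the table, index or not.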
How many blog owners have said: "my site ran perfectly, but now it's dog slow"? It is because performance is inversely proportional to the size of the tables (and the number of concurrent users).
Every time we read a post, we not only read the post and its terms (the tags) but also:
- the post meta (custom fields) in wp_postmeta;
- the author in wp_users (and wp_usermeta);
- the comments in wp_comments;
- the site configuration in wp_options.
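A sketch of the kind of extra queries behind a single post page (simplified, assuming a hypothetical post 123 written by user 1; the queries WordPress actually generates are more elaborate):

select option_name, option_value from wp_options where autoload='yes';
select * from wp_postmeta where post_id=123;
select * from wp_users where id=1;
select * from wp_comments where comment_post_id=123 and comment_approved='1';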
So the real set of queries can be quite long and use a lot of resources, but let's stick with the posts and their meta.
And WordPress also uses binary columns (tip: they are painfully slow).
Because of its flexibility, and flexibility has a cost. What's the cost? Performance.
Well, yes and no. There are a lot of plugins for WordPress that work as a cache. WordPress can't live without a cache, but again, a cache is not magic; it has a cost (why does everything have a cost? why?).
A cache works this way: when a visitor requests a page, the cache is checked first for a stored copy; if one exists, it is returned immediately; if not, the page is built from the database, stored in the cache, and then served.
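For example, without an external object cache, WordPress keeps its transients (one of its built-in caching mechanisms) as ordinary rows in wp_options. A simplified sketch, using a hypothetical 'front_page_posts' key:

-- the cached value and its expiration are just option rows
select option_value from wp_options
where option_name='_transient_front_page_posts';

select option_value from wp_options
where option_name='_transient_timeout_front_page_posts';
-- if the timeout (a unix timestamp) is in the past, the cached copy is discarded
-- and the slow query has to run again.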
However, what if the cache is not updated? Then the cache returns outdated information. Usually, it is not a big deal, but it's something to consider.
There is another problem: we can't cache everything, so the cache is in constant flux, invalidating and reloading entries.
And there is a third problem: cost. We can cache using memory, the database, or the file system; the goal of a cache is speed, and memory is the fastest, but memory is always limited.
Let's say we want to rent a dedicated server. The power of the server is measured not only by the CPU but also by the amount of memory.
For example, a 16GB server could cost 150 per month; 32GB adds about 20 more (around 13%), 64GB adds about 50 more, and so on up to 128GB (the limit is commonly 64GB or 128GB). Memory is scarce and expensive.
There are more caches, for example at the level of the web server (such as Varnish); they work marvels, but with the same drawbacks. MySQL also has its own cache, and PHP too (OPcache), but OPcache caches the compiled code (instead of the information); still, it is highly recommended.
But sadly, a cache is only a patch, not a real solution. It is a must-have, but it is not viable for every situation, it doesn't always work (it fails when the information wasn't cached or is explicitly excluded from the cache), and it isn't free. It's not uncommon to find WordPress sites that take 3-5 seconds to load a single page (even if the site uses a cache). A real solution is to build or buy a custom system, but that increases the cost tenfold.
I don't blame WordPress; it happens with practically every flexible system. There are worse cases, such as SAP. Some companies are biting the bullet because of the slowness of SAP. For example, SAP HANA requires at least 64GB of RAM (for a basic, empty system with no concurrent users), but it is usually run on 1TB servers (and they are really expensive).
Also published on Medium https://medium.com/cook-php/blues-for-wordpress-98713836f9d4