Installation and configuration of the Magento Mage_Cache_Backend_File in versions prior to EE 1.13 / CE 1.8

The latest Magento releases (Community Edition 1.8 alpha and Enterprise Edition 1.13) come with a new cache backend, Mage_Cache_Backend_File.

This backend reduces inode usage when saving cached content.

Are you on an earlier version but want to use it? No problem, but there are a few things to fix first.


The Magento Mage_Cache_Backend_File backend gives your inodes a holiday

Magento lets you customize the way your website manages its cache by choosing the appropriate cache backend. I’ve provided an example of such an update for the full page cache here.

By default, cached content is saved into files: both Magento Community and Enterprise editions ship with a cache backend based upon Zend_Cache_Backend_File, which stores cached content on disk.

Magento EE 1.13 comes with a new cache storage: Mage_Cache_Backend_File. This backend is an extension of the native Zend_Cache_Backend_File backend with some performance improvements. Let’s see what is good inside and whether we can already use it.


Support for cache tags in all cache engines in Magento Enterprise version 1.13

If you follow the Magento news, you have probably heard that Magento Enterprise version 1.13 completely moved the indexers’ logic to the database server.

This certainly leads to a significant performance increase.

But are the indexers the only explanation for the better performance? No: a few other things are involved in this upgrade, and one of them is support for cache tags in all cache backends.


High traffic websites under Magento: the Teleshopping example

Téléshopping is one of the oldest Magento projects in France. At €41M, the website represents a third of the company’s turnover. Let’s see which technical choices have been made to deliver a high-performance website.

Hello Gilles, maybe everyone doesn’t know you yet, could you please introduce yourself?

I’m in charge of the production center at Teleshopping: operations, administration and system architecture for the ERP and the websites, both for Teleshopping and for its subsidiaries (Euroshopping, Place des tendances and, partly, Direct Optic).

Two of your websites, Teleshopping and Euroshopping, are Magento-developed projects. Could you please tell us more about them?

Sure, Teleshopping’s website is among the first French Magento-developed websites; it was deployed in April 2009 and has been regularly updated since. There are approximately 1M unique visitors per month.

It allows its users to order products they have seen during the TF1 TV show. Ordering products from the catalog is done through another website.

Euroshopping’s website was deployed a little later and has about 100,000 unique visitors per month. It allows people to order products they have seen during TV shows on the various TNT channels or on cable.

Could you describe the general architecture within these two websites?

For Teleshopping, we have 10 front-end servers available for visitors, and one more for back-office access.

A Varnish proxy cache sits in front of each of the servers. This may not be the best solution, but it allows us to scale the number of front-ends without changing the configuration or asking ourselves too many questions.

Concerning the application, the source code is installed on every front-end server; there’s no replication and deployments are done by hand.

But we share the media and skin directories on an NFS resource.

We use a memcached server with one port for cache data and another port for sessions; this way we can flush the cache without any impact on users’ sessions.
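In Magento 1, this two-port setup is typically declared in app/etc/local.xml. A minimal sketch (hosts and ports below are illustrative, not Teleshopping’s actual values):

```xml
<!-- app/etc/local.xml (sketch; hosts/ports are illustrative) -->
<config>
    <global>
        <!-- sessions go to one memcached port... -->
        <session_save><![CDATA[memcache]]></session_save>
        <session_save_path><![CDATA[tcp://127.0.0.1:11212?persistent=1&timeout=10&retry_interval=10]]></session_save_path>
        <!-- ...cache data goes to another: flushing it never touches sessions -->
        <cache>
            <backend>memcached</backend>
            <memcached>
                <servers>
                    <server>
                        <host><![CDATA[127.0.0.1]]></host>
                        <port><![CDATA[11211]]></port>
                    </server>
                </servers>
            </memcached>
        </cache>
    </global>
</config>
```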

Concerning the database, we use a MySQL server which is replicated on a secondary server. All accesses are done on the main server.

Last element: the CDN. Media files are served by a CDN, which offloads our front-ends, even during TV campaigns.

Source images are put on an NFS resource available on every front-end.

For Euroshopping it’s much the same, except there’s no Varnish; the 4 available servers are more than enough to handle the load.

How often do you deploy new versions?

On average, we do about one production deployment per month; sometimes more.

Could you tell us about your last deployment?

Every developer has his own development environment on a virtualized server, which is strictly identical to the production one.

A testing environment is available to validate the package before it’s installed on a pre-production environment. This delivery is done by my team.

Once the project manager has tested and approved this package, it is delivered to the production environment, again by my team.

You share the front-end servers’ content on a NAS. What exactly do you share?

We share the media and skin directories on our NAS, a NetApp filer.

What made you choose this setup?

This sharing process performs almost as well as local disk access, and its advantage is that we do not have to duplicate the media files.

NFS and PHP tuning allow us to manage access to the shared files with several cache levels.

We can grow the storage volume without any outage.

Could you tell us about the main problems you encountered while you were setting up this model?

We didn’t have any particular problem; only some NFS tuning was needed:

  • We mount the NFS shared folder with the following options: tcp, rw, bg, hard, suid, noatime, nodiratime, nointr, timeo=600, retrans=2, rsize=32768, wsize=32768, actimeo=30
  • We have increased the PHP realpath cache to a size of 1M and a TTL of one day
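Concretely, the two tunings above look like this (only the option values come from the interview; hostname and paths are illustrative):

```
# /etc/fstab entry for the NFS share (hostname and paths are examples)
nas.example.com:/vol/magento  /var/www/magento/media  nfs  tcp,rw,bg,hard,suid,noatime,nodiratime,nointr,timeo=600,retrans=2,rsize=32768,wsize=32768,actimeo=30  0 0

# php.ini: realpath cache of 1M kept for one day (86400 s)
realpath_cache_size = 1M
realpath_cache_ttl = 86400
```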

What benefits do you get from sharing the sources between the different front-ends on a NAS?

Source sharing lets us deliver the source code only once; it is then available on all of the front-ends, and we do not need a “master” server from which to replicate the files.

Presently, the websites’ sources don’t allow us to share the whole source code, partly because of the /var/reports and /var/logs directories. For future versions, if the source code allows it, we intend to share the whole source code across every front-end server.

Making a clean sweep of common Magento cache_lifetime workarounds

As I mentioned before, the Magento cache policy API is based upon the Zend Framework Zend_Cache component. In this cache policy, there are three elements:

  • The cache key, which uniquely identifies the data we want to cache
  • Cache tags, which allow grouping our cached data
  • The cache lifetime, which defines how long we consider the stored data valid and not in need of being rebuilt

My current topic is this last point: the cache lifetime. Its usage is quite simple: we define a duration during which data is stored in cache; during this period, the data is always considered valid and is therefore not rebuilt. After that, even if the data exists in cache, it is rebuilt because it is considered expired. Simple, no?

But we hear so many things about cache lifetime usage that it seems worth clarifying some specific points about its usage in Magento.

What happens when the Magento cache lifetime is null?

When is the Magento cache lifetime considered null?

Apart from the case where you have defined a getCacheLifetime method that returns null, this occurs when you do not define the cache lifetime on an object which extends Varien_Object:

  • When rendering the block content, Magento calls the block’s getCacheLifetime method
  • If you have not defined a getCacheLifetime method, Magento uses the Varien_Object __call method to retrieve cache_lifetime from the _data attribute
  • If this index does not exist in _data, __call returns null
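The fallback described above can be sketched with a stripped-down stand-in for Varien_Object (a simplified illustration, not the actual Magento source):

```php
<?php
// Minimal sketch of how a magic getCacheLifetime() call falls back to the
// _data array and returns null when the key was never set.
class FakeVarienObject
{
    protected $_data = array();

    public function __call($method, $args)
    {
        if (substr($method, 0, 3) == 'get') {
            // getCacheLifetime -> cache_lifetime
            $key = strtolower(preg_replace('/(.)([A-Z])/', '$1_$2', substr($method, 3)));
            return isset($this->_data[$key]) ? $this->_data[$key] : null;
        }
        return null;
    }
}

$block = new FakeVarienObject();
var_dump($block->getCacheLifetime()); // NULL: no cache_lifetime was ever set
```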

How does Magento handle a null cache lifetime for block data?

All blocks extend Mage_Core_Block_Abstract. This class provides the following cache management:
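The relevant guard logic can be sketched as follows (a simplified stand-in, not the verbatim Magento source):

```php
<?php
// Sketch of the guard in Mage_Core_Block_Abstract::_loadCache()/_saveCache():
// a null lifetime short-circuits both the cache read and the cache write.
class BlockCacheSketch
{
    private $lifetime;
    private $storage = array();

    public function __construct($lifetime) { $this->lifetime = $lifetime; }

    public function loadCache($key)
    {
        if (is_null($this->lifetime)) {
            return false; // never read from cache
        }
        return isset($this->storage[$key]) ? $this->storage[$key] : false;
    }

    public function saveCache($key, $content)
    {
        if (is_null($this->lifetime)) {
            return false; // never write to cache
        }
        $this->storage[$key] = $content;
        return true;
    }
}
```

With a null lifetime, loadCache always misses and saveCache never stores anything, so the block content is rebuilt on every request.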

If the lifetime is null, the Magento cache API never fetches anything from the cache, so the block content is rebuilt every time. Nothing is stored in the cache either: the _saveCache method also checks for a non-null lifetime before saving.

If you do not set up a cache lifetime for a block’s content, it will never be cached.

How does Magento handle a null cache lifetime for non-block data?

The cache API can also be used for things other than block output; this is the case for the configuration cache. Here, the check made by Mage_Core_Block_Abstract does not exist, and the data will be cached. As we’ll see, if the cache lifetime is null for something other than a block, the default lifetime is used, just as if you had set false as the cache lifetime value: 7200 seconds (2 hours).

What happens when the cache lifetime is set to 0?

Sometimes we hear that setting 0 as the cache_lifetime duration disables the cache. No, and for the following reason: 0 is a valid cache_lifetime duration.

Magento computes the timestamp until which your block content is considered valid with the following rule: timestamp at data generation + cache_lifetime; so with 0, the computed expiration time is the generation time itself. The next time you check whether the data is in cache and valid, you compare the current timestamp with the expiration timestamp, and the current timestamp will almost certainly be later.

Setting 0 as the cache lifetime renders your cached block content always expired.

This is one of the worst cases: each time you build the content you save it in cache, then invalidate it the next time you check it.
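The arithmetic can be sketched in a few lines of plain PHP (no Magento required):

```php
<?php
// Sketch of the expiration rule: expiration = generation time + lifetime.
// With a lifetime of 0, the content expires at the very second it is built.
$lifetime = 0;
$generatedAt = time();
$expiresAt = $generatedAt + $lifetime;

// On the next request, even one second later, the validity check fails:
$nextRequestAt = $generatedAt + 1;
$isValid = ($nextRequestAt < $expiresAt);
var_dump($isValid); // bool(false): always rebuilt, always re-saved
```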

What happens when the cache lifetime is set to false?

How the Zend Framework manages a false lifetime

Every cache backend used in Magento inherits from the Zend_Cache_Backend class.

When saving content, the Zend_Cache_Backend class computes the expiration timestamp through the Zend_Cache_Backend::getLifetime method. If the specified lifetime === false, the backend uses a default duration stored in its attributes.

If you check the Zend_Cache_Backend class, you’ll see that this value is hard-coded to 3600 seconds (one hour).
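A simplified stand-in for that method (not the verbatim Zend Framework source) behaves like this:

```php
<?php
// Sketch of Zend_Cache_Backend::getLifetime() behaviour: a false lifetime
// falls back to the backend directive, hard-coded to 3600 in Zend Framework.
class BackendLifetimeSketch
{
    private $directives = array('lifetime' => 3600); // Zend's hard-coded default

    public function getLifetime($specificLifetime)
    {
        if ($specificLifetime === false) {
            return $this->directives['lifetime'];
        }
        return $specificLifetime;
    }
}

$backend = new BackendLifetimeSketch();
var_dump($backend->getLifetime(false)); // int(3600)
var_dump($backend->getLifetime(600));   // int(600)
```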

But our content is stored longer. Why?

Magento forces a default lifetime in its frontend

When the Magento cache API initializes the cache frontend object in the Varien_Cache_Core model, it merges the cache options with the directive options.

This is the case for the cache lifetime: the default cache lifetime specified on Varien_Cache_Core is merged with the directive options.

But the Varien_Cache_Core class extends Zend_Cache_Core. So when the backend is attached to the frontend, the native Zend Framework behaviour applies, and the frontend options are merged in as backend directives.

So for all backend usage, because Magento forces the use of a Varien_Cache_Core frontend, the default lifetime is set to 7200 seconds.

If your lifetime is set to false, your block will by default be considered valid for 7200 seconds.

Zend_Cache component usage in Magento

Magento performance is a hot topic. One element of Magento performance is the block cache policy, which allows storing part of the page content in cache. You have probably all read the tutorial available on the Magento wiki, written by my former colleague Laurent BOURREL, about the block cache policy available in Magento.

All concepts of this policy are based upon the Zend_Cache component. Let’s see how Magento uses it.

Reminder about Zend_Cache component

Zend_Cache module structure

The Zend_Cache component is divided into two parts: frontend and backend models.

  • Frontend components define what is covered by the cache policy:
    • Files
    • Output rendering
    • Method calls
    • Classes
  • Backend components define where the cached data will be stored:
    • In files
    • In memcached
    • In APC
    • …

This model makes it easy to dissociate which data will be cached from where it will be cached.
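As an illustration of this split (the class names below are invented for the example; this is not the Zend_Cache API): the frontend decides what to cache, the backend decides where the bytes go, and either side can be swapped independently.

```php
<?php
// Illustrative frontend/backend split. Swapping ArrayBackendSketch for a
// file- or memcached-based implementation would not touch the frontend.
interface BackendSketch
{
    public function save($id, $data);
    public function load($id); // returns false on miss
}

class ArrayBackendSketch implements BackendSketch
{
    private $store = array();
    public function save($id, $data) { $this->store[$id] = $data; }
    public function load($id) { return isset($this->store[$id]) ? $this->store[$id] : false; }
}

class OutputFrontendSketch
{
    private $backend;
    public function __construct(BackendSketch $backend) { $this->backend = $backend; }

    // Cache the output of a callable under $id.
    public function call($id, $producer)
    {
        $cached = $this->backend->load($id);
        if ($cached !== false) {
            return $cached; // cache hit: the producer is not called again
        }
        $data = call_user_func($producer);
        $this->backend->save($id, $data);
        return $data;
    }
}
```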

Zend_Cache usage in Magento

Zend_Cache backends embedded in Magento

If we check the Zend Framework version shipped from CE 1.3.2.4 to CE 1.7.0.0 (and the related Enterprise versions), we should be able to use the following backends within Magento:

The columns of the original table were the candidate backends: Core, Apc, Blackhole, File, Libmemcached, Memcached, Sqlite, Static, Test, Twolevels, Xcache, Zend_Platform, Zend_Server disk and Zend_Server shared memory. For each Magento version, the table flagged every backend as either loaded from Zend Framework, or added or updated by the Magento sources but still available. Each Magento release embeds the following Zend Framework version:

  • CE 1.3.2.4: ZF 1.7.2
  • CE 1.3.3.0: ZF 1.7.2
  • CE 1.4.0.1: ZF 1.9.6
  • CE 1.4.1.0: ZF 1.9.6
  • CE 1.4.1.1: ZF 1.9.6
  • CE 1.4.2.0: ZF 1.10.8
  • CE 1.5.0.1: ZF 1.11.1
  • CE 1.5.1.0: ZF 1.11.1
  • CE 1.6.0.1: ZF 1.11.1
  • CE 1.6.1.0: ZF 1.11.1
  • CE 1.6.2.0: ZF 1.11.1
  • CE 1.7.0.0: ZF 1.11.1
  • EE 1.6.0.0: ZF 1.9.3PL1
  • EE 1.7.0.0: ZF 1.9.6
  • EE 1.8.0.0: ZF 1.9.6
  • EE 1.9.0.0: ZF 1.10.5
  • EE 1.9.1.1: ZF 1.10.8
  • EE 1.10.0.1: ZF 1.11.1
  • EE 1.10.0.2: ZF 1.11.1
  • EE 1.11.1.1: ZF 1.11.1

How does Magento use Zend_Cache? The Mage_Core_Model_Cache factory

Before CE 1.4, the cache factory logic lived in the Mage_Core_Model_App model.

Since Magento CE 1.4, all cache management is done through the Mage_Core_Model_Cache class; this factory model loads the Zend_Cache frontend and backends.

Magento Zend_Cache frontend

In Mage_Core_Model_Cache, the frontend class is hard-coded and restricted to Varien_Cache_Core, which inherits from the frontend class Zend_Cache_Core.

So we cannot natively use frontends other than Zend_Cache_Core.

Magento Zend_Cache backend

Why are some backends available in Magento that do not exist in the Zend_Cache component?

This factory method also provides the ability to use other backend storages that do not exist in Zend_Cache; see the following examples:

  • Database
  • Eaccelerator

These backends were written by the Varien core team.

In conclusion, the Zend_Cache backends available in Magento are the following:
This second table adds the two Magento-specific backends, Eaccelerator and Database, to the Zend_Cache ones (Core, Apc, Blackhole, File, Libmemcached, Memcached, Sqlite, Static, Test, Twolevels, Xcache, Zend_Platform, Zend_Server disk and Zend_Server shared memory); as before, each backend was flagged as either loaded from Zend Framework, or added or updated by the Magento sources but still available. The mapping of Magento releases to embedded Zend Framework versions is the same:

  • CE 1.3.2.4: ZF 1.7.2
  • CE 1.3.3.0: ZF 1.7.2
  • CE 1.4.0.1: ZF 1.9.6
  • CE 1.4.1.0: ZF 1.9.6
  • CE 1.4.1.1: ZF 1.9.6
  • CE 1.4.2.0: ZF 1.10.8
  • CE 1.5.0.1: ZF 1.11.1
  • CE 1.5.1.0: ZF 1.11.1
  • CE 1.6.0.1: ZF 1.11.1
  • CE 1.6.1.0: ZF 1.11.1
  • CE 1.6.2.0: ZF 1.11.1
  • CE 1.7.0.0: ZF 1.11.1
  • EE 1.6.0.0: ZF 1.9.3PL1
  • EE 1.7.0.0: ZF 1.9.6
  • EE 1.8.0.0: ZF 1.9.6
  • EE 1.9.0.0: ZF 1.10.5
  • EE 1.9.1.1: ZF 1.10.8
  • EE 1.10.0.1: ZF 1.11.1
  • EE 1.10.0.2: ZF 1.11.1
  • EE 1.11.1.1: ZF 1.11.1


And what if I want to use a backend that Varien has not thought of?

Perhaps you want to use the Blackhole, Test, or Zend_Platform backend, for example. Is it really difficult? No.

Since the Mage_Core_Model_Cache model exists (CE 1.4.0.1 and higher, and the related EE versions), the factory method has also been designed so that you can use another backend:

The value defined in the local.xml node <config><global><cache><backend> can be a class name that implements the Zend_Cache_Backend_Interface.

And what about the backend configuration? To pass parameters to this new backend, you can define a node <config><global><cache><backend_options>, whose contents will be passed as arguments when constructing the backend model.

Other ways to use a custom backend model are the following:

  • Define your own backend class that sets up the expected configuration values
  • Overload Mage_Core_Model_Cache to provide a different way of loading the configuration

Here’s an example of a specific configuration that works:
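As an illustration, here is a sketch of what such a configuration can look like; the class name My_Cache_Backend_Custom and its option are hypothetical:

```xml
<!-- app/etc/local.xml (sketch; class and option names are hypothetical) -->
<config>
    <global>
        <cache>
            <!-- any class implementing Zend_Cache_Backend_Interface -->
            <backend>My_Cache_Backend_Custom</backend>
            <backend_options>
                <!-- children are passed to the backend constructor -->
                <some_option><![CDATA[some_value]]></some_option>
            </backend_options>
        </cache>
    </global>
</config>
```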

Conclusion about Zend_Cache usage in Magento

Cache tags, cache lifetime and cache keys are concepts brought by the Zend_Cache component.

Even though there is no documentation about how to use a custom backend in Magento, it’s not really difficult: just specify the full backend class name in the local.xml backend node, and pass arguments through the backend_options node.

The only problem I can see with this method, which can load any backend class, is the check for whether the backend exists: it uses class_exists, which, in the Magento context, triggers a fatal error if the class cannot be found. It would be nice, when the configured backend class does not exist, for the default one (File) to be used instead of throwing a fatal error.

Updating the Zend_Cache frontend class is much more difficult because it depends on how the PHP code was written. So for now, even in a Magento 4 or 5, I’m not sure we will be able to use the buffer frontend.

High availability in Magento farm: the session problem

Big Magento websites sometimes need to handle high traffic: there are many websites with high load, and sometimes we must set up a Magento farm behind a load balancer to be able to serve all visits. But in this case, we have a choice to make about the load balancing: how is the load dispatched between the Magento webservers?

And this choice will have an impact on how sessions are managed.

Depending on how the load balancers are configured, we may need to share sessions between the Magento front webservers.

Impact of load balancers configuration on the session storage management

Load balancers sit in front of our webservers to manage the webserver load.

There are two ways for the load balancer to share the load between the front servers when a customer is visiting the website:

  • He is always sent to the same webserver
  • He is sent to the least loaded webserver, according to the current load

This load balancer configuration choice has an impact on the session management:

In the first case, we are always redirected to the same webserver. But if the load of this server changes, the end-user experience will not be as good as expected. In the worst case, if this webserver becomes unavailable, we won’t be able to access the website anymore, and we’ll consequently lose some sales…

In the other case, we don’t have these drawbacks, because the load balancer detects the load and redirects us to the most available server; but then we have to share our sessions between the front servers to be able to retrieve our personal information.

To ensure the highest availability for our website, we’ll choose the second option and see what solutions we have to share sessions between the front webservers.

Reminder on PHP session storage

By default, PHP uses a session storage based on files: every time you start a session, a file with the session content is written on the webserver. That’s nice, but it has the disadvantage of depending on which webserver received the request. If we are redirected to another webserver, the session file does not exist there, and so you’ve lost your cart… How can we solve that?

PHP allows redefining the session storage model and how sessions are managed. Nice; we’ll see how it can help us with our webserver farm and our session sharing problem.
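A custom storage follows a simple read/write contract. Here is a stripped-down sketch (sessions are kept in an array only to show the contract; a real handler would implement SessionHandlerInterface, write to shared storage such as a database or memcached, and be registered via session_set_save_handler):

```php
<?php
// Sketch of PHP's pluggable session storage contract. A real handler would
// declare `implements SessionHandlerInterface` and be registered with
// session_set_save_handler(new Handler(), true) before session_start().
class ArraySessionHandlerSketch
{
    private $sessions = array();

    public function open($savePath, $name) { return true; }
    public function close() { return true; }

    // read() must return the serialized session data, or '' if unknown
    public function read($id)
    {
        return isset($this->sessions[$id]) ? $this->sessions[$id] : '';
    }

    public function write($id, $data) { $this->sessions[$id] = $data; return true; }
    public function destroy($id) { unset($this->sessions[$id]); return true; }
    public function gc($maxLifetime) { return true; }
}
```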

The possible ways to share sessions in a Magento farm

The latest Magento release (CE 1.6.2, and the related Enterprise one) embeds several session storage options:

  • Files, PHP’s default
  • Database
  • Memcached
  • Other choices?

What advantages does each one have?

Sharing Magento sessions in a Magento farm with database(s)

In this case, we replace the native PHP session storage with a database: all sessions are stored on our database server(s).
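In Magento 1, this switch is a one-line change in app/etc/local.xml (a minimal sketch of a standard install):

```xml
<!-- app/etc/local.xml: store PHP sessions in the database instead of files -->
<config>
    <global>
        <session_save><![CDATA[db]]></session_save>
    </global>
</config>
```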

This solution is quite nice because each of our front webservers can read and write to the database, so there is no problem sharing sessions: the visitor is initially redirected to a front webserver, initializes his session there, and his customer session is saved in the database. On the next page he requests, the load balancer may redirect him to another front webserver, but with his session id he’ll retrieve the session data from the database and keep the same information. Nice.

But the problem in this situation is that we add more load on the database server. Each visitor issues reads and writes on the database, and depending on our traffic, this can be a very heavy load. Furthermore, even with a very good cache policy, if our database server goes down, our website won’t be available anymore.

The Magento database table where session data is stored is backed by the file system, so if our server goes down, we’ll be able to restore the session data. If Magento database availability is critical, there are solutions such as MySQL Cluster to ensure MySQL is always available, so we can provide maximum service and a failure-tolerant model. But this model puts a significant load on the database, and it is worth checking the other available solutions.

Sharing Magento sessions in a Magento farm with Memcached

Memcached is a memory server.

It can be use to:

  • Share cache data
  • Share sessions

One of the main advantages of memcached is that it’s an in-memory server, so reads and writes are faster than with a file system.

But there are two disadvantages to using memcached to store sessions:

  • Data integrity: writes to memory are not verified the way they are on a file system, so our data can be compromised and we can lose part of it.
  • We have the same problem as with the database: memcached is not failure-tolerant, so you’ll lose everything if it goes down. Memcached can be clustered for cache storage, but for session sharing we can only provide a single server URI. So if the server managing session storage hangs, no session service is available…

So for now, memcached’s only advantage is sharing our sessions faster; it is not failure-tolerant and cannot be clustered for sessions, so it adds another risk to our architecture.

Sharing sessions using a network shared folder and PHP’s file storage

To avoid the data integrity problem, we could keep the file-based session storage but share the files between the front servers through a shared folder: in that case, the session storage location is known by every server, so any webserver can use the session. Data is also verified on write by the file system, which is nice.

But this model has the disadvantage of imposing network load every time a session is accessed. Depending on the website load, this additional network load could become critical.

Sharing Magento sessions with Zend server

Based upon the PHP file session storage model, Zend Server embeds session clustering between the Zend Server instances of the front webservers: every session that starts is copied to each Zend Server, so whenever someone requests a page from any of our webservers, it can access the session file. Nice.

Furthermore, when data is written to the session, Zend Server knows which file is the main session storage file and writes to it.

If a front webserver goes down, another session copy is used as the main session file, and so there is no disruption.

For now, Zend Server session clustering is the only way to guarantee that our website won’t go down while providing data integrity through file system write control. One disadvantage is that you need a Zend Server on each Magento front-office server, which increases hosting costs.

Conclusion about the high availability storage management

Sales, television advertising, or simply regular traffic growth can lead you to a dedicated Magento farm to be able to answer all visitor requests. In this case, you should make sure you can always answer all requests, not only for your orders but also for your branding.

Of course, you can configure the load balancing to always redirect a customer to the same server. This is the easiest way to configure load management, but it is not the most reliable model and can inconvenience our clients, possibly leaving requests unanswered.

Otherwise, the only way to ensure high availability in a Magento farm is to dispatch the load across the front webservers depending on their load. This raises the problem of sharing sessions between the webservers, but for now the only safe way is Zend Server session clustering. This model gives us a secure way to share sessions, guarantees access and write control, and does not add a critical point to our architecture.

Furthermore, it seems that the choice of session storage has little performance impact. This is a first test, and we don’t know how it was run, but in the case mentioned, the session storage seems to have little or no influence on performance.

Magento customer collection loading: how to fetch names attributes values

I know, I’m late publishing new topics: one per week is difficult when your evenings are busy…

But today, I’m (temporarily) setting aside my other projects to share something I discovered today: loading a customer collection in Magento can be redundant in some cases if you use the addNameToSelect method. Let me explain:

Case study: loading a collection

I found this topic by running a script like this one on a Magento 1.6 install:

$customerCollection = Mage::getModel('customer/customer')->getCollection();
/* @var $customerCollection Mage_Customer_Model_Entity_Customer_Collection */
$customerCollection->addAttributeToSelect(array(
    'dob', 'firstname', 'lastname', 'email'
));
foreach ($customerCollection as $customer) {
    /* @var $customer Mage_Customer_Model_Customer */
    echo $customer->getFirstname() . ' ' . $customer->getLastname() . PHP_EOL;
} 

Queries issued when fetching a customer collection

Loading starts with a first query that fetches the matching entities:

SELECT `e`.* FROM `customer_entity` AS `e` WHERE (e.entity_type_id = '1');

The customer data is then hydrated by a dedicated query that fetches the attributes:

SELECT 
     `customer_entity_datetime`.`entity_id`, 
     `customer_entity_datetime`.`attribute_id`, 
     `customer_entity_datetime`.`value` 
FROM `customer_entity_datetime` 
WHERE (entity_type_id=1) 
AND (entity_id in (324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 337, 339, 340, 341, 342, 343, 344, 345)) AND (attribute_id in ('11')) 

UNION 

SELECT 
`customer_entity_varchar`.`entity_id`, 
`customer_entity_varchar`.`attribute_id`, 
`customer_entity_varchar`.`value` 
FROM `customer_entity_varchar` 
WHERE (entity_type_id=1) 
AND (entity_id in (324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 337, 339, 340, 341, 342, 343, 344, 345)) AND (attribute_id in ('5', '7'))

Customer collection loaded with addNameToSelect method

The addNameToSelect method changes how the query is built. We can see the resulting queries with the following test code:

$customerCollection = Mage::getModel('customer/customer')->getCollection();
/* @var $customerCollection Mage_Customer_Model_Entity_Customer_Collection */
$customerCollection->addNameToSelect();
$customerCollection->addAttributeToSelect(array(
    'dob', 'firstname', 'lastname', 'email'
));
foreach ($customerCollection as $customer) {
    /* @var $customer Mage_Customer_Model_Customer */
    echo $customer->getFirstname() . ' ' . $customer->getLastname() . PHP_EOL;
} 

With the same script, the following queries are issued:

SELECT `e`.*, 
`_table_prefix`.`value` AS `prefix`, 
`_table_firstname`.`value` AS `firstname`, 
`_table_middlename`.`value` AS `middlename`, 
`_table_lastname`.`value` AS `lastname`, 
`_table_suffix`.`value` AS `suffix`, 
CONCAT(IF(_table_prefix.value IS NOT NULL AND _table_prefix.value != "", CONCAT(TRIM(_table_prefix.value)," "), ""),TRIM(_table_firstname.value),IF(_table_middlename.value IS NOT NULL AND _table_middlename.value != "", CONCAT(" ",TRIM(_table_middlename.value)), "")," ",TRIM(_table_lastname.value),IF(_table_suffix.value IS NOT NULL AND _table_suffix.value != "", CONCAT(" ",TRIM(_table_suffix.value)), "")) AS `name` 

FROM `customer_entity` AS `e`
 LEFT JOIN `customer_entity_varchar` AS `_table_prefix` ON (`_table_prefix`.`entity_id` = `e`.`entity_id`) AND (`_table_prefix`.`attribute_id` = '4')
 LEFT JOIN `customer_entity_varchar` AS `_table_firstname` ON (`_table_firstname`.`entity_id` = `e`.`entity_id`) AND (`_table_firstname`.`attribute_id` = '5')
 LEFT JOIN `customer_entity_varchar` AS `_table_middlename` ON (`_table_middlename`.`entity_id` = `e`.`entity_id`) AND (`_table_middlename`.`attribute_id` = '6')
 LEFT JOIN `customer_entity_varchar` AS `_table_lastname` ON (`_table_lastname`.`entity_id` = `e`.`entity_id`) AND (`_table_lastname`.`attribute_id` = '7')
 LEFT JOIN `customer_entity_varchar` AS `_table_suffix` ON (`_table_suffix`.`entity_id` = `e`.`entity_id`) AND (`_table_suffix`.`attribute_id` = '8') WHERE (e.entity_type_id = '1');

SELECT `customer_entity_datetime`.`entity_id`, 
`customer_entity_datetime`.`attribute_id`, 
`customer_entity_datetime`.`value` 
FROM `customer_entity_datetime` 
WHERE (entity_type_id=1) AND (entity_id in (324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 337, 339, 340, 341, 342, 343, 344, 345)) AND (attribute_id in ('11')) 

UNION 

SELECT `customer_entity_varchar`.`entity_id`, 
`customer_entity_varchar`.`attribute_id`, 
`customer_entity_varchar`.`value` 
FROM `customer_entity_varchar` 

WHERE (entity_type_id=1) AND (entity_id in (324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 337, 339, 340, 341, 342, 343, 344, 345)) AND (attribute_id in ('5', '7'));	

In this case, the customer name values are fetched twice: once through the joins, and a second time through the attribute query, so the work is done twice… 🙁

Conclusion on addNameToSelect method

addNameToSelect should be used only if you do not fetch any other data from the varchar backend storage. Otherwise, you should not use this method to fetch the name attributes, because the work will be done twice. In that case, it may be more interesting to build the “name” value yourself from the customer_entity_varchar attribute query results.
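A possible sketch of that alternative, assuming the prefix/firstname/middlename/lastname/suffix values have already been fetched via addAttributeToSelect (the helper below is hypothetical, not a Magento method):

```php
<?php
// Rebuild the "name" value in PHP from attribute values that are already
// loaded, instead of paying for the extra LEFT JOINs of addNameToSelect().
function composeCustomerName(array $attributes)
{
    $parts = array();
    foreach (array('prefix', 'firstname', 'middlename', 'lastname', 'suffix') as $code) {
        if (!empty($attributes[$code])) {
            $parts[] = trim($attributes[$code]);
        }
    }
    return implode(' ', $parts);
}

echo composeCustomerName(array('firstname' => 'John', 'lastname' => 'DOE')); // John DOE
```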

This test was made on Magento 1.6.

Magento2 source code available on github

Last December 3rd, Yoav, Magento’s CTO, posted this tweet:
“Magento 2 source code now available on Github https://github.com/magento/magento2”

Yes, you can now download the Magento 2 source code from GitHub.

This is a development release, but we can start from this source code to see (and imagine) what we will find in it.

First impressions:

  • Seems faster
  • The bootstrap seems updated
  • Some web folders have been moved
  • Kudos for the common embedded translations
  • A new interface to update our local copy
  • Still seems locked to the default channels
  • …

Wait and see, and enjoy download 🙂

Decrease time to search customers in Magento back-office

Yes, it’s possible!

I know: you have many customer accounts in your Magento installation, and your customer service is begging you to reduce the time it takes to search for a customer: “10, 15 seconds, we are fed up. Can you do something, please???” (says one desperate agent)

Why is the customer search so slow in the back-office?

The SQL query generated by the grid to search customers is to blame! In the back-office, when you filter fields in a grid, the filters are by default applied with a “%value_to_filter%” LIKE pattern.

Here’s an example of a generated query searching for John DOE:

SELECT `e`.*,
`_table_prefix`.`value` AS `prefix`,
`_table_firstname`.`value` AS `firstname`,
`_table_middlename`.`value` AS `middlename`,
`_table_lastname`.`value` AS `lastname`,
`_table_suffix`.`value` AS `suffix`,
CONCAT(IF(_table_prefix.value IS NOT NULL AND _table_prefix.value != "", CONCAT(TRIM(_table_prefix.value)," "), ""),TRIM(_table_firstname.value),IF(_table_middlename.value IS NOT NULL AND _table_middlename.value != "", CONCAT(" ",TRIM(_table_middlename.value)), "")," ",TRIM(_table_lastname.value),IF(_table_suffix.value IS NOT NULL AND _table_suffix.value != "", CONCAT(" ",TRIM(_table_suffix.value)), "")) AS `name`,
`_table_default_billing`.`value` AS `default_billing`,
`_table_billing_postcode`.`value` AS `billing_postcode`,
`_table_billing_city`.`value` AS `billing_city`,
`_table_billing_telephone`.`value` AS `billing_telephone`,
`_table_billing_region`.`value` AS `billing_region`,
`_table_billing_country_id`.`value` AS `billing_country_id`
FROM `customer_entity` AS `e`
LEFT JOIN `customer_entity_varchar` AS `_table_prefix` ON (_table_prefix.entity_id = e.entity_id) AND (_table_prefix.attribute_id='4')
LEFT JOIN `customer_entity_varchar` AS `_table_firstname` ON (_table_firstname.entity_id = e.entity_id) AND (_table_firstname.attribute_id='5')
LEFT JOIN `customer_entity_varchar` AS `_table_middlename` ON (_table_middlename.entity_id = e.entity_id) AND (_table_middlename.attribute_id='6')
LEFT JOIN `customer_entity_varchar` AS `_table_lastname` ON (_table_lastname.entity_id = e.entity_id) AND (_table_lastname.attribute_id='7')
LEFT JOIN `customer_entity_varchar` AS `_table_suffix` ON (_table_suffix.entity_id = e.entity_id) AND (_table_suffix.attribute_id='8')
LEFT JOIN `customer_entity_int` AS `_table_default_billing` ON (_table_default_billing.entity_id = e.entity_id) AND (_table_default_billing.attribute_id='13')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_postcode` ON (_table_billing_postcode.entity_id = _table_default_billing.value) AND (_table_billing_postcode.attribute_id='29')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_city` ON (_table_billing_city.entity_id = _table_default_billing.value) AND (_table_billing_city.attribute_id='25')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_telephone` ON (_table_billing_telephone.entity_id = _table_default_billing.value) AND (_table_billing_telephone.attribute_id='30')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_region` ON (_table_billing_region.entity_id = _table_default_billing.value) AND (_table_billing_region.attribute_id='27')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_country_id` ON (_table_billing_country_id.entity_id = _table_default_billing.value) AND (_table_billing_country_id.attribute_id='26')
WHERE (e.entity_type_id = '1') AND
(CONCAT(IF(_table_prefix.value IS NOT NULL AND _table_prefix.value != "", CONCAT(TRIM(_table_prefix.value)," "), ""),TRIM(_table_firstname.value),IF(_table_middlename.value IS NOT NULL AND _table_middlename.value != "", CONCAT(" ",TRIM(_table_middlename.value)), "")," ",TRIM(_table_lastname.value),IF(_table_suffix.value IS NOT NULL AND _table_suffix.value != "", CONCAT(" ",TRIM(_table_suffix.value)), "")) like '%John Doe%')
ORDER BY `e`.`entity_id` desc, `e`.`entity_id` desc
LIMIT 20

As you can see, the customer lastname, middlename and firstname are concatenated with the MySQL CONCAT function to generate the name value.

And yes, it does a full scan of your customer data… If you have 350k customer accounts, you have at least 1.8 million records in the customer_entity_varchar table, and each of those records is analyzed. This is the first reason why searching for John Doe in the back-office is so slow.

“But mister, how can we reduce the time of this query?”

Preamble: keep the native search on the concatenated name field, or create new filter columns for firstname and lastname?

Well, for those who know me, you know that I don’t like to break existing behavior: OK, there is a filter field you don’t like, but is that a reason to destroy this existing functionality? If Varien changes this function in future releases, it is more interesting to still be able to use it, rather than having to remove the overload we have done, no?

So the following article studies the case of creating two new filter fields in the customer grid, lastname and firstname, instead of updating the name filter. We will see in the conclusion that MySQL also prefers this solution.

And it’s fine: the girl from the customer service agrees with that 🙂

First step: avoid a full table scan of the customer_entity_varchar table

After creating our new columns, we will define our own filter, applied when the query is built, and thus control how the customer_entity_varchar table is queried. We will see that this is not the only solution, but it is a first step to reduce our load.

How can we do that?

Add our lastname and firstname columns to the grid

I’m sure you know how we can do that 🙂

Change how the customer name filter box works

This is very easy with the native Magento API: in a grid, when you add a column, you can also define how the search on this field is managed, through the filter data. The filter model defined there is the model used to build the SQL condition applied when you enter something in the filter box.

How can we set up our own filter model for our new columns?

The filter is one of the parameters available when you define a column in the grid’s _prepareColumns() method. We just need to provide our own filter for our new columns. With a grid overload, we can add our filter.

It could be something like this:
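Here is a minimal sketch of such an overload (the MyCompany module, class names and the grouped filter alias are assumptions to adapt to your own extension; the grid rewrite itself is declared the usual way in your module’s config.xml):

```php
<?php
/**
 * Sketch of a customer grid overload: we add two new columns,
 * lastname and firstname, and plug our custom filter block on
 * each of them through the 'filter' parameter.
 */
class MyCompany_Customer_Block_Adminhtml_Customer_Grid
    extends Mage_Adminhtml_Block_Customer_Grid
{
    protected function _prepareColumns()
    {
        $this->addColumn('lastname', array(
            'header' => Mage::helper('customer')->__('Last Name'),
            'index'  => 'lastname',
            // grouped class name of our custom filter block (assumption)
            'filter' => 'mycompany_customer/adminhtml_customer_grid_filter_name',
        ));

        $this->addColumn('firstname', array(
            'header' => Mage::helper('customer')->__('First Name'),
            'index'  => 'firstname',
            'filter' => 'mycompany_customer/adminhtml_customer_grid_filter_name',
        ));

        return parent::_prepareColumns();
    }
}
```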

OK, with a complete overload, we can load our filter. But how should it be designed?

Design our own Magento customer name filter

The customer name filter should extend Mage_Adminhtml_Block_Widget_Grid_Column_Filter_Abstract.

If you look at this class, you will see an interesting method, getCondition(): this is the method called to define the filter condition applied by our filter.

So we can design our filter like this:
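A minimal sketch of such a filter (the class name is an assumption; it extends Mage_Adminhtml_Block_Widget_Grid_Column_Filter_Text, a concrete child of the abstract class mentioned above, to keep the native text input rendering):

```php
<?php
/**
 * Sketch of a custom grid column filter for the name columns.
 */
class MyCompany_Customer_Block_Adminhtml_Customer_Grid_Filter_Name
    extends Mage_Adminhtml_Block_Widget_Grid_Column_Filter_Text
{
    /**
     * Build the SQL condition applied by the grid collection.
     * We return a right-anchored LIKE ('John%') instead of the
     * native '%John%': with a fixed prefix, MySQL can use an
     * index on the filtered column.
     */
    public function getCondition()
    {
        return array('like' => $this->getValue() . '%');
    }
}
```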

OK, is everything fine now?

Now the generated query is the following one:

SELECT `e`.*,
`_table_prefix`.`value` AS `prefix`,
`_table_firstname`.`value` AS `firstname`,
`_table_middlename`.`value` AS `middlename`,
`_table_lastname`.`value` AS `lastname`,
`_table_suffix`.`value` AS `suffix`,
CONCAT(IF(_table_prefix.value IS NOT NULL AND _table_prefix.value != "", CONCAT(TRIM(_table_prefix.value)," "), ""),TRIM(_table_firstname.value),IF(_table_middlename.value IS NOT NULL AND _table_middlename.value != "", CONCAT(" ",TRIM(_table_middlename.value)), "")," ",TRIM(_table_lastname.value),IF(_table_suffix.value IS NOT NULL AND _table_suffix.value != "", CONCAT(" ",TRIM(_table_suffix.value)), "")) AS `name`,
`_table_default_billing`.`value` AS `default_billing`,
`_table_billing_postcode`.`value` AS `billing_postcode`,
`_table_billing_city`.`value` AS `billing_city`,
`_table_billing_telephone`.`value` AS `billing_telephone`,
`_table_billing_region`.`value` AS `billing_region`,
`_table_billing_country_id`.`value` AS `billing_country_id`
FROM `customer_entity` AS `e`
LEFT JOIN `customer_entity_varchar` AS `_table_prefix` ON (_table_prefix.entity_id = e.entity_id) AND (_table_prefix.attribute_id='4')
LEFT JOIN `customer_entity_varchar` AS `_table_firstname` ON (_table_firstname.entity_id = e.entity_id) AND (_table_firstname.attribute_id='5')
LEFT JOIN `customer_entity_varchar` AS `_table_middlename` ON (_table_middlename.entity_id = e.entity_id) AND (_table_middlename.attribute_id='6')
LEFT JOIN `customer_entity_varchar` AS `_table_lastname` ON (_table_lastname.entity_id = e.entity_id) AND (_table_lastname.attribute_id='7')
LEFT JOIN `customer_entity_varchar` AS `_table_suffix` ON (_table_suffix.entity_id = e.entity_id) AND (_table_suffix.attribute_id='8')
LEFT JOIN `customer_entity_int` AS `_table_default_billing` ON (_table_default_billing.entity_id = e.entity_id) AND (_table_default_billing.attribute_id='13')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_postcode` ON (_table_billing_postcode.entity_id = _table_default_billing.value) AND (_table_billing_postcode.attribute_id='29')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_city` ON (_table_billing_city.entity_id = _table_default_billing.value) AND (_table_billing_city.attribute_id='25')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_telephone` ON (_table_billing_telephone.entity_id = _table_default_billing.value) AND (_table_billing_telephone.attribute_id='30')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_region` ON (_table_billing_region.entity_id = _table_default_billing.value) AND (_table_billing_region.attribute_id='27')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_country_id` ON (_table_billing_country_id.entity_id = _table_default_billing.value) AND (_table_billing_country_id.attribute_id='26')
WHERE
(e.entity_type_id = '1') AND
(_table_firstname.value like 'John%') AND
(_table_lastname.value like 'Doe%')
ORDER BY `e`.`entity_id` desc, `e`.`entity_id` desc
LIMIT 20

“But I don’t understand, mister: you told me that this solution was THE solution, and I don’t gain any time… (she still seems desperate)”

First, this syntax requires less time to read the data, so it does save time, but OK, it is not significant yet. And OK, read the remaining step before concluding that we have done this overload for nothing.

Second step: move the attributes to the entity model

Well, OK, the database experts have surely noticed the problem already: I should be clearer, reducing the SQL full scan is interesting only on indexed fields. And for now, the value field of the customer_entity_varchar table is only part of a composite key.

OK, so we must add an index on this “value” field?!

Don’t change the customer_entity_varchar index model

Sorry, it’s not that simple: the value field in customer_entity_varchar is already part of a composite index, but in the last position. For those who know how composite indexes work, you know that in our context this index won’t be used.

So we have to add a dedicated index on this field?

Here too, it’s not the solution: if we do that, the values of every customer attribute stored in customer_entity_varchar will be indexed, not only the names, so the index loses its selectivity and its interest.

So we don’t change the index model of the customer_entity_varchar table; the solution is to move the searched fields to the customer_entity table.

Change the storage model of the lastname and firstname fields

Oh my god, he plans to change the database storage model! (This is not the girl from the customer service speaking here.)

Hmm, yes, it’s possible… We have to do a few things:

Create the two new fields (firstname, lastname) in the customer_entity table

Do you really need my help for this point? Don’t forget to index them.
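For the record, a sketch of the DDL (column sizes follow the value field of customer_entity_varchar; the index names are assumptions):

```sql
-- Add the two static columns to customer_entity and index them,
-- so that right-anchored LIKE filters ('John%') can use the indexes
ALTER TABLE customer_entity
    ADD COLUMN firstname VARCHAR(255) DEFAULT NULL,
    ADD COLUMN lastname VARCHAR(255) DEFAULT NULL,
    ADD INDEX `IDX_CUSTOMER_FIRSTNAME` (firstname),
    ADD INDEX `IDX_CUSTOMER_LASTNAME` (lastname);
```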

Change the attribute storage model in the eav_attribute table

With that, we inform Magento that it should no longer look for lastname and firstname in the customer_entity_varchar table, but in the customer_entity table.

You should end up with both attributes having a static backend type.
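The update can be sketched like this (‘static’ is the backend type Magento uses for attributes stored directly on the entity table; entity_type_id 1 is the customer entity, as in the queries above, but verify it on your install):

```sql
-- Tell Magento to read firstname and lastname from customer_entity
UPDATE eav_attribute
SET backend_type = 'static'
WHERE entity_type_id = 1
  AND attribute_code IN ('firstname', 'lastname');
```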

Copy the existing lastname and firstname values into the new fields

DROP TABLE IF EXISTS customer_entity_temp_name;
CREATE TABLE IF NOT EXISTS `customer_entity_temp_name` (
`entity_id` int(10) unsigned NOT NULL,
`lastname` varchar(255) NOT NULL,
`firstname` varchar(255) NOT NULL,
UNIQUE KEY `entity_id` (`entity_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
INSERT INTO customer_entity_temp_name
SELECT `e`.entity_id,
`_table_lastname`.`value` AS `lastname`,
`_table_firstname`.`value` AS `firstname`
FROM `customer_entity` AS `e`
LEFT JOIN `customer_entity_varchar` AS `_table_firstname` ON (_table_firstname.entity_id = e.entity_id) AND (_table_firstname.attribute_id='5')
LEFT JOIN `customer_entity_varchar` AS `_table_lastname` ON (_table_lastname.entity_id = e.entity_id) AND (_table_lastname.attribute_id='7');

UPDATE customer_entity AS e, customer_entity_temp_name AS n
SET e.firstname = n.firstname, e.lastname = n.lastname
WHERE e.entity_id = n.entity_id;

DROP TABLE customer_entity_temp_name;

Conclusion

Our query now looks like this:

SELECT `e`.*,
`_table_prefix`.`value` AS `prefix`,
`_table_middlename`.`value` AS `middlename`,
`_table_suffix`.`value` AS `suffix`,
CONCAT(IF(_table_prefix.value IS NOT NULL AND _table_prefix.value != "", CONCAT(_table_prefix.value," "), ""),e.firstname,IF(_table_middlename.value IS NOT NULL AND _table_middlename.value != "", CONCAT(" ",_table_middlename.value), "")," ",e.lastname,IF(_table_suffix.value IS NOT NULL AND _table_suffix.value != "", CONCAT(" ",_table_suffix.value), "")) AS `name`,
`_table_default_billing`.`value` AS `default_billing`,
`_table_billing_postcode`.`value` AS `billing_postcode`,
`_table_billing_city`.`value` AS `billing_city`,
`_table_billing_telephone`.`value` AS `billing_telephone`,
`_table_billing_region`.`value` AS `billing_region`,
`_table_billing_country_id`.`value` AS `billing_country_id`
FROM `customer_entity` AS `e`
LEFT JOIN `customer_entity_varchar` AS `_table_prefix` ON (_table_prefix.entity_id = e.entity_id) AND (_table_prefix.attribute_id='4')
LEFT JOIN `customer_entity_varchar` AS `_table_middlename` ON (_table_middlename.entity_id = e.entity_id) AND (_table_middlename.attribute_id='6')
LEFT JOIN `customer_entity_varchar` AS `_table_suffix` ON (_table_suffix.entity_id = e.entity_id) AND (_table_suffix.attribute_id='8')
LEFT JOIN `customer_entity_int` AS `_table_default_billing` ON (_table_default_billing.entity_id = e.entity_id) AND (_table_default_billing.attribute_id='13')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_postcode` ON (_table_billing_postcode.entity_id = _table_default_billing.value) AND (_table_billing_postcode.attribute_id='28')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_city` ON (_table_billing_city.entity_id = _table_default_billing.value) AND (_table_billing_city.attribute_id='24')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_telephone` ON (_table_billing_telephone.entity_id = _table_default_billing.value) AND (_table_billing_telephone.attribute_id='29')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_region` ON (_table_billing_region.entity_id = _table_default_billing.value) AND (_table_billing_region.attribute_id='26')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_country_id` ON (_table_billing_country_id.entity_id = _table_default_billing.value) AND (_table_billing_country_id.attribute_id='25')
WHERE (e.entity_type_id = '1') AND
(e.firstname like 'John%') AND
(e.lastname like 'Doe%')
ORDER BY `e`.`entity_id` desc, `e`.`entity_id` desc
LIMIT 20

On a customer database containing about 900k records, we reduced the number of rows analyzed to 39, whereas the native search analyzed 440k records: results are now displayed immediately.

Well, we have saved many seconds scanning our customer database, and probably earned a thank you!!! (a phone number???) from the pretty girl of the customer service 🙂

This optimization is only interesting on a big customer database; I have no stats to provide, but if you feel that you are losing time when you search for a customer, perhaps this solution can help you.

And if you are good and able to capitalize on it, you can earn many phone numbers from different customer services 🙂