Magento customer collection loading: how to fetch names attributes values

I know, I’m late publishing new topics: one per week is difficult when your evenings are busy…

But today, I’m leaving (temporary) my others stuffs to share with you something I’ve discover today: loading customer collection in Magento can be redundant in some cases if you use the addNameToSelect method. I’ll explain:

Case study: loading a collection

I’ll found this topic by doing a SQL query like this one in a Magento 1.6:

$customerCollection = Mage::getModel('customer/customer')->getCollection();
/* @var $customerCollection Mage_Customer_Model_Entity_Customer_Collection */
$customerCollection->addAttributeToSelect(array(
    'dob', 'firstname', 'lastname', 'email'
));
foreach ($customerCollection as $customer) {
    /* @var $customer Mage_Customer_Model_Customer */
    echo $customer->getFirstame() . ' ' . $customer->getLastname() . PHP_EOL;
} 

Query made by fetching a customer collection

Query is made with a first query which fetch matching entities:

SELECT `e`.* FROM `customer_entity` AS `e` WHERE (e.entity_type_id = '1');

customer data are hydrated by a dedicated query to fetch attributes:

SELECT 
     `customer_entity_datetime`.`entity_id`, 
     `customer_entity_datetime`.`attribute_id`, 
     `customer_entity_datetime`.`value` 
FROM `customer_entity_datetime` 
WHERE (entity_type_id=1) 
AND (entity_id in (324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 337, 339, 340, 341, 342, 343, 344, 345)) AND (attribute_id in ('11')) 

UNION 

SELECT 
`customer_entity_varchar`.`entity_id`, 
`customer_entity_varchar`.`attribute_id`, 
`customer_entity_varchar`.`value` 
FROM `customer_entity_varchar` 
WHERE (entity_type_id=1) 
AND (entity_id in (324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 337, 339, 340, 341, 342, 343, 344, 345)) AND (attribute_id in ('5', '7'))

Customer collection loaded with addNameToSelect method

It seems that addNameToSelect method change how is builded the query. We can test the expected query with the following test code:

$customerCollection = Mage::getModel('customer/customer')->getCollection();
/* @var $customerCollection Mage_Customer_Model_Entity_Customer_Collection */
$customerCollection->addNameToSelect();
$customerCollection->addAttributeToSelect(array(
    'dob', 'firstname', 'lastname', 'email'
));
foreach ($customerCollection as $customer) {
    /* @var $customer Mage_Customer_Model_Customer */
    echo $customer->getFirstame() . ' ' . $customer->getLastname() . PHP_EOL;
} 

With the same script, we have the following queries mades:

SELECT `e`.*, 
`_table_prefix`.`value` AS `prefix`, 
`_table_firstname`.`value` AS `firstname`, 
`_table_middlename`.`value` AS `middlename`, 
`_table_lastname`.`value` AS `lastname`, 
`_table_suffix`.`value` AS `suffix`, 
CONCAT(IF(_table_prefix.value IS NOT NULL AND _table_prefix.value != "", CONCAT(TRIM(_table_prefix.value)," "), ""),TRIM(_table_firstname.value),IF(_table_middlename.value IS NOT NULL AND _table_middlename.value != "", CONCAT(" ",TRIM(_table_middlename.value)), "")," ",TRIM(_table_lastname.value),IF(_table_suffix.value IS NOT NULL AND _table_suffix.value != "", CONCAT(" ",TRIM(_table_suffix.value)), "")) AS `name` 

FROM `customer_entity` AS `e`
 LEFT JOIN `customer_entity_varchar` AS `_table_prefix` ON (`_table_prefix`.`entity_id` = `e`.`entity_id`) AND (`_table_prefix`.`attribute_id` = '4')
 LEFT JOIN `customer_entity_varchar` AS `_table_firstname` ON (`_table_firstname`.`entity_id` = `e`.`entity_id`) AND (`_table_firstname`.`attribute_id` = '5')
 LEFT JOIN `customer_entity_varchar` AS `_table_middlename` ON (`_table_middlename`.`entity_id` = `e`.`entity_id`) AND (`_table_middlename`.`attribute_id` = '6')
 LEFT JOIN `customer_entity_varchar` AS `_table_lastname` ON (`_table_lastname`.`entity_id` = `e`.`entity_id`) AND (`_table_lastname`.`attribute_id` = '7')
 LEFT JOIN `customer_entity_varchar` AS `_table_suffix` ON (`_table_suffix`.`entity_id` = `e`.`entity_id`) AND (`_table_suffix`.`attribute_id` = '8') WHERE (e.entity_type_id = '1');

SELECT `customer_entity_datetime`.`entity_id`, 
`customer_entity_datetime`.`attribute_id`, 
`customer_entity_datetime`.`value` 
FROM `customer_entity_datetime` 
WHERE (entity_type_id=1) AND (entity_id in (324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 337, 339, 340, 341, 342, 343, 344, 345)) AND (attribute_id in ('11')) 

UNION 

SELECT `customer_entity_varchar`.`entity_id`, 
`customer_entity_varchar`.`attribute_id`, 
`customer_entity_varchar`.`value` 
FROM `customer_entity_varchar` 

WHERE (entity_type_id=1) AND (entity_id in (324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 337, 339, 340, 341, 342, 343, 344, 345)) AND (attribute_id in ('5', '7'));	

In this case, customer name values are fetched two times: one time with a join method, a second time with the attribute query, so we do job two times… 🙁

Conclusion on addNameToSelect method

addNameToSelect should be used only if you do not fetch any data from varchar backend storage. if not, you should not use these method for fetching names attributes because you’ll do the job two times: Perhaps in this case it could be more interesting to build “name” value, based upon on the customer_entity_varchar attribute scan query

This test has been made on 1.6

Decrease time to search customers in Magento back-office

Yes, it’s possible!

I know, you have many customer accounts in your Magento and your customer service claim that you do something to reduce duration to search a customer in: “10, 15s, we are bored. Can you do something please???” (one girl says, desperated)

Why customer search is so long in back-office?

SQL query generated in grid to search customer is involved! In back-office, when you filter some fields in grid, by default filters are applyed with a “%value_to_filter%”

Here’s an example of a genered query to search John DOE:

SELECT `e`.*,
`_table_prefix`.`value` AS `prefix`,
`_table_firstname`.`value` AS `firstname`,
`_table_middlename`.`value` AS `middlename`,
`_table_lastname`.`value` AS `lastname`,
`_table_suffix`.`value` AS `suffix`,
CONCAT(IF(_table_prefix.value IS NOT NULL AND _table_prefix.value != "", CONCAT(TRIM(_table_prefix.value)," "), ""),TRIM(_table_firstname.value),IF(_table_middlename.value IS NOT NULL AND _table_middlename.value != "", CONCAT(" ",TRIM(_table_middlename.value)), "")," ",TRIM(_table_lastname.value),IF(_table_suffix.value IS NOT NULL AND _table_suffix.value != "", CONCAT(" ",TRIM(_table_suffix.value)), "")) AS `name`,
`_table_default_billing`.`value` AS `default_billing`,
`_table_billing_postcode`.`value` AS `billing_postcode`,
`_table_billing_city`.`value` AS `billing_city`,
`_table_billing_telephone`.`value` AS `billing_telephone`,
`_table_billing_region`.`value` AS `billing_region`,
`_table_billing_country_id`.`value` AS `billing_country_id`
FROM `customer_entity` AS `e`
LEFT JOIN `customer_entity_varchar` AS `_table_prefix` ON (_table_prefix.entity_id = e.entity_id) AND (_table_prefix.attribute_id='4')
LEFT JOIN `customer_entity_varchar` AS `_table_firstname` ON (_table_firstname.entity_id = e.entity_id) AND (_table_firstname.attribute_id='5')
LEFT JOIN `customer_entity_varchar` AS `_table_middlename` ON (_table_middlename.entity_id = e.entity_id) AND (_table_middlename.attribute_id='6')
LEFT JOIN `customer_entity_varchar` AS `_table_lastname` ON (_table_lastname.entity_id = e.entity_id) AND (_table_lastname.attribute_id='7')
LEFT JOIN `customer_entity_varchar` AS `_table_suffix` ON (_table_suffix.entity_id = e.entity_id) AND (_table_suffix.attribute_id='8')
LEFT JOIN `customer_entity_int` AS `_table_default_billing` ON (_table_default_billing.entity_id = e.entity_id) AND (_table_default_billing.attribute_id='13')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_postcode` ON (_table_billing_postcode.entity_id = _table_default_billing.value) AND (_table_billing_postcode.attribute_id='29')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_city` ON (_table_billing_city.entity_id = _table_default_billing.value) AND (_table_billing_city.attribute_id='25')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_telephone` ON (_table_billing_telephone.entity_id = _table_default_billing.value) AND (_table_billing_telephone.attribute_id='30')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_region` ON (_table_billing_region.entity_id = _table_default_billing.value) AND (_table_billing_region.attribute_id='27')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_country_id` ON (_table_billing_country_id.entity_id = _table_default_billing.value) AND (_table_billing_country_id.attribute_id='26')
WHERE (e.entity_type_id = '1') AND
(CONCAT(IF(_table_prefix.value IS NOT NULL AND _table_prefix.value != "", CONCAT(TRIM(_table_prefix.value)," "), ""),TRIM(_table_firstname.value),IF(_table_middlename.value IS NOT NULL AND _table_middlename.value != "", CONCAT(" ",TRIM(_table_middlename.value)), "")," ",TRIM(_table_lastname.value),IF(_table_suffix.value IS NOT NULL AND _table_suffix.value != "", CONCAT(" ",TRIM(_table_suffix.value)), "")) like '%John Doe%')
ORDER BY `e`.`entity_id` desc, `e`.`entity_id` desc
LIMIT 20

As you can see, customer lastname, middlename and firstname are concatenated with Mysql CONCAT method to generate the name value

And Yes, it does a full scan in your customer data… If you have 350k customer acccount, you should have at least 1,8G records in customer_entity_varchar table and so, each record is analyzed. This is the first reason why when you search John Doe in back-office, it’s so long.

“But mister, How can we reduce time for this query?”

Preamble: keep native search on all concatenated fields or create new columns for firstname and lastname?

Well, for those who know me, you know that I don’t like to break existing usage: ok, there is a filter field you don’t like, is it a reason to destroy this existing functionnality? If Varien change its function in futures releases, it’s more interesting to be able to also use it, instead of remove the overload we have done, no?

So the following article will study the case of creating two new filter fields in customer grid, name and firstname, instead of update the name filter. We’ll see in conclusion that MySQL also prefer this solution

And it’s fine, the girl of the customer service agrees with that 🙂

First step: avoid full scan table of customer_entity_varchar table

After creating our new columns, we will define our own filter applied when query is made and so, how is requested customer_entity_varchar table. We will see that it’s not the only solution, but a first step to reduce our load.

How can we do that?

Add our name and firstname column in grid

I’m sure you know how we can do that 🙂

Change how customer name filter box works

This is very easy with Magento native API: in a grid, when you add our column, you can also define how is managed the search for this field: with filter data. Filter model defined is the model used to build the SQL condition to apply when you enter something in the filter box:

How to set up our own filter model for our name column?

Filter is one parameter when you define your column in grid _prepareColumn method. We just need to provide our own filter for our new columns. With a grid overload we can add our filter

It could be something like this

Ok, with a complete overload, we can load our filter. But how can it be designed?

Design our own Magento customer name filter

Customer name filter should extend Mage_Adminhtml_Block_Widget_Grid_Column_Filter_Abstract

if you look at this class, you’ll see an interesting method, getCondition: This is this method which is called to define the filter condition to apply on our filter

So we can design our filter like this:

Ok it’s fine?

Now our query made is the following one:

SELECT `e`.*,
`_table_prefix`.`value` AS `prefix`,
`_table_firstname`.`value` AS `firstname`,
`_table_middlename`.`value` AS `middlename`,
`_table_lastname`.`value` AS `lastname`,
`_table_suffix`.`value` AS `suffix`,
CONCAT(IF(_table_prefix.value IS NOT NULL AND _table_prefix.value != "", CONCAT(TRIM(_table_prefix.value)," "), ""),TRIM(_table_firstname.value),IF(_table_middlename.value IS NOT NULL AND _table_middlename.value != "", CONCAT(" ",TRIM(_table_middlename.value)), "")," ",TRIM(_table_lastname.value),IF(_table_suffix.value IS NOT NULL AND _table_suffix.value != "", CONCAT(" ",TRIM(_table_suffix.value)), "")) AS `name`,
`_table_default_billing`.`value` AS `default_billing`,
`_table_billing_postcode`.`value` AS `billing_postcode`,
`_table_billing_city`.`value` AS `billing_city`,
`_table_billing_telephone`.`value` AS `billing_telephone`,
`_table_billing_region`.`value` AS `billing_region`,
`_table_billing_country_id`.`value` AS `billing_country_id`
FROM `customer_entity` AS `e`
LEFT JOIN `customer_entity_varchar` AS `_table_prefix` ON (_table_prefix.entity_id = e.entity_id) AND (_table_prefix.attribute_id='4')
LEFT JOIN `customer_entity_varchar` AS `_table_firstname` ON (_table_firstname.entity_id = e.entity_id) AND (_table_firstname.attribute_id='5')
LEFT JOIN `customer_entity_varchar` AS `_table_middlename` ON (_table_middlename.entity_id = e.entity_id) AND (_table_middlename.attribute_id='6')
LEFT JOIN `customer_entity_varchar` AS `_table_lastname` ON (_table_lastname.entity_id = e.entity_id) AND (_table_lastname.attribute_id='7')
LEFT JOIN `customer_entity_varchar` AS `_table_suffix` ON (_table_suffix.entity_id = e.entity_id) AND (_table_suffix.attribute_id='8')
LEFT JOIN `customer_entity_int` AS `_table_default_billing` ON (_table_default_billing.entity_id = e.entity_id) AND (_table_default_billing.attribute_id='13')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_postcode` ON (_table_billing_postcode.entity_id = _table_default_billing.value) AND (_table_billing_postcode.attribute_id='29')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_city` ON (_table_billing_city.entity_id = _table_default_billing.value) AND (_table_billing_city.attribute_id='25')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_telephone` ON (_table_billing_telephone.entity_id = _table_default_billing.value) AND (_table_billing_telephone.attribute_id='30')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_region` ON (_table_billing_region.entity_id = _table_default_billing.value) AND (_table_billing_region.attribute_id='27')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_country_id` ON (_table_billing_country_id.entity_id = _table_default_billing.value) AND (_table_billing_country_id.attribute_id='26')
WHERE
(e.entity_type_id = '1') AND
(_table_firstname.value like 'John%') AND
(_table_lastname.value like 'Doe%')
ORDER BY `e`.`entity_id` desc, `e`.`entity_id` desc
LIMIT 20

“But I don’t understand mister, you told me that this solution was THE solution and I don’t earn any time… (she still seems desperate)”

First using this syntax require less time to read data, so it earns time, but ok, this is not significant. And ok, read the remaining step before you think that we have done the overload for anything

Second step: move attributes to entity model

Well, ok, Database expert has surely noticed the problem: I should be more clear, reduce SQL full scan is interesting only on key fields. And for now, value field in customer_entity_varchar table is only part of a multiple key

Ok, we must add an index on this “value” field?!

Don’t change the customer_entity_varchar index model

Sorry, no it’s not that simple: value field in customer_entity_varchar is already part of a multiple index, but in the last position. For those who know how multiple indexes works, you know that in our context, index won’t be used

So we have to add an unique index on this field?

Also in this case, it’s not the solution: if we do that, all customer_entity_varchar fields will be indexes, and so, we loose the interest of using index

So, we don’t change index mode for customer_entity_varchar table; the solution is to move the search field on customer_entity table

Change storage model for name and firstname fields

Oh my god, he plans to change the database storage model (This is not the girl of the customer service who speaks here)

Hum, yes, it’s possible…. We have to do four things:

Create the two new fields (firstname, lastname) in customer_entity table

Do you really require my help for this point? Don’t forget to index them

Change attribute storage model in eav_attribute table

With that, we inform Magento that it should not search for name and firstname in customer_entity_varchar table but in customer_entity table

You must have the following result

Copy the exiting name and firstname values in the new fields

DROP TABLE IF EXISTS customer_entity_temp_name;
CREATE TABLE IF NOT EXISTS `customer_entity_temp_name` (
`entity_id` int(10) unsigned NOT NULL,
`lastname` varchar(255) NOT NULL,
`firstname` varchar(255) NOT NULL,
UNIQUE KEY `entity_id` (`entity_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
INSERT INTO customer_entity_temp_name
SELECT `e`.entity_id,
`_table_lastname`.`value` AS `lastname`,
`_table_firstname`.`value` AS `firstname`
FROM `customer_entity` AS `e`
LEFT JOIN `customer_entity_varchar` AS `_table_firstname` ON (_table_firstname.entity_id = e.entity_id) AND (_table_firstname.attribute_id='5')
LEFT JOIN `customer_entity_varchar` AS `_table_lastname` ON (_table_lastname.entity_id = e.entity_id) AND (_table_lastname.attribute_id='7');UPDATE customer_entity AS e, customer_entity_temp_name AS n
SET e.firstname = n.firstname, e.lastname = n.lastname
WHERE e.entity_id = n.entity_id;DROP TABLE customer_entity_temp_name;

Conclusion

Our request look like that now:

SELECT `e`.*, `_table_prefix`.`value` AS `prefix`, `_table_middlename`.`value` AS `middlename`, `_table_suffix`.`value` AS `suffix`, CONCAT(IF(_table_prefix.value IS NOT NULL AND _table_prefix.value != "", CONCAT(_table_prefix.value," "), ""),e.firstname,IF(_table_middlename.value IS NOT NULL AND _table_middlename.value != "", CONCAT(" ",_table_middlename.value), "")," ",e.lastname,IF(_table_suffix.value IS NOT NULL AND _table_suffix.value != "", CONCAT(" ",_table_suffix.value), "")) AS `name`, `_table_default_billing`.`value` AS `default_billing`, `_table_billing_postcode`.`value` AS `billing_postcode`, `_table_billing_city`.`value` AS `billing_city`, `_table_billing_telephone`.`value` AS `billing_telephone`, `_table_billing_region`.`value` AS `billing_region`, `_table_billing_country_id`.`value` AS `billing_country_id` FROM `customer_entity` AS `e`
LEFT JOIN `customer_entity_varchar` AS `_table_prefix` ON (_table_prefix.entity_id = e.entity_id) AND (_table_prefix.attribute_id='4')
LEFT JOIN `customer_entity_varchar` AS `_table_middlename` ON (_table_middlename.entity_id = e.entity_id) AND (_table_middlename.attribute_id='6')
LEFT JOIN `customer_entity_varchar` AS `_table_suffix` ON (_table_suffix.entity_id = e.entity_id) AND (_table_suffix.attribute_id='8')
LEFT JOIN `customer_entity_int` AS `_table_default_billing` ON (_table_default_billing.entity_id = e.entity_id) AND (_table_default_billing.attribute_id='13')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_postcode` ON (_table_billing_postcode.entity_id = _table_default_billing.value) AND (_table_billing_postcode.attribute_id='28')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_city` ON (_table_billing_city.entity_id = _table_default_billing.value) AND (_table_billing_city.attribute_id='24')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_telephone` ON (_table_billing_telephone.entity_id = _table_default_billing.value) AND (_table_billing_telephone.attribute_id='29')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_region` ON (_table_billing_region.entity_id = _table_default_billing.value) AND (_table_billing_region.attribute_id='26')
LEFT JOIN `customer_address_entity_varchar` AS `_table_billing_country_id` ON (_table_billing_country_id.entity_id = _table_default_billing.value) AND (_table_billing_country_id.attribute_id='25') WHERE (e.entity_type_id = '1') AND (e.firstname like 'Doe%') AND (e.lastname like 'John%') ORDER BY `e`.`entity_id` desc, `e`.`entity_id` desc LIMIT 20

On a customer database which countains above 900k records, we reduce to 39 rows analyzed whereas we analyzed with native search 440k records: Results are now displayed immediatly

Well, we have earned many seconds to scan our customer database and probably a thank you!!! (phone number???) from the pretty girl of the customer service 🙂

This option has only the interest on a big customer database; I have no stats to provide, but if you think that when you search a customer, you are loosing time, perhaps this solution could help you

And if you are good and be able to capitalize, you can earn many phone number in different customer services 🙂