1 Introduction

Consumer purchase behavior is one of the main research areas in marketing. In recent years, many kinds of data on consumer behavior and activities have become available in the field of marketing, e.g., ID-POS data, which record the purchases of each customer in a store, and customer attributes. Hence, many stores and retail companies want to use these data to make their marketing activities more effective and efficient.

For a retail store, it is important to grasp its trade area, because a store that knows its trade area can advertise efficiently within it and understand the needs of its main trade area. For Japanese retail stores in particular, e.g., supermarkets and department stores, flyers folded into newspapers are a very popular advertising tool, so grasping the trade area is very important. Moreover, when a manager plans to open a new store, analyzing its potential sales intensity is a very important issue. As these topics show, the trade area is one of the important factors in store management.

In this study, we analyze the trade areas of the stores of a supermarket chain in Japan. Based on the purchase records of each store, we calculate the radius of its trade area and analyze several variables that affect the size of the trade area.

2 Related Studies and Objective of Our Study

One of the best-known models for analyzing trade areas is the Huff model (Huff 1963). The Huff model is a probabilistic choice model based on the attractiveness of a store and the distance from the customer's address to the store. The distance parameter expresses how strongly distance deters customers from accessing the store. The Huff model has been used in various settings to analyze trade areas and store power.
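To make the model's data requirements concrete, the sketch below computes choice probabilities under one commonly used form of the Huff model, \( P_{ij} = \frac{S_{j}^{\alpha} D_{ij}^{-\lambda}}{\sum_{k} S_{k}^{\alpha} D_{ik}^{-\lambda}} \), where \( S_{j} \) is the attractiveness of store \( j \) (often its sales floor area) and \( D_{ij} \) is the distance from customer \( i \) to store \( j \). The function name and the exponents \( \alpha \) and \( \lambda \) are illustrative assumptions, not values from this study. Note that the denominator sums over every store the customer might choose, including competitors.

```python
def huff_probabilities(attractiveness, distances, alpha, lam):
    """Choice probabilities over stores for one customer (a common Huff-model form).

    attractiveness[k]: attractiveness of store k (e.g. floor area; assumed measure)
    distances[k]: distance from the customer to store k (must be positive)
    alpha, lam: illustrative attractiveness and distance-decay exponents
    """
    if len(attractiveness) != len(distances):
        raise ValueError("one attractiveness value per distance is required")
    if any(d <= 0 for d in distances):
        raise ValueError("distances must be positive")
    # Utility of each store: attractiveness^alpha divided by distance^lam
    utilities = [s ** alpha / d ** lam for s, d in zip(attractiveness, distances)]
    total = sum(utilities)  # requires data on *all* candidate stores
    return [u / total for u in utilities]


# Two equally attractive stores, one twice as far away as the other.
print(huff_probabilities([1.0, 1.0], [1.0, 2.0], alpha=1.0, lam=2.0))
```

With \( \lambda = 2 \), doubling the distance cuts the far store's utility to one quarter, so the nearer store receives a probability of 0.8.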

However, the Huff model requires data not only on one's own store but also on the competing stores around it. Thus, when the manager of a chain wants to calculate the trade areas of the chain's stores from its own data alone, the Huff model cannot be used for this purpose. Many other attraction models, such as the MNL and MCI models, also require competing stores' data. In real business situations, however, it is very difficult to obtain data on other stores. Thus, analyzing the trade area or selling power of one's own stores using only data that are already available is an important issue.

Yamazaki (1996) analyzed the trade area of a sports club using its transaction data and displayed the result on a geographic information system (GIS). By observing the display, the manager could grasp the club's trade area and plan effective advertising, such as newspaper flyers.

Yokoyama et al. (1996) proposed a trade area and purchase model, based on the Huff model, that predicts sales from consumers' preferences. They used conjoint analysis to analyze consumer preferences and then a regression model to predict sales. From their results, they pointed out that accessibility is one of the most attractive factors in choosing a store.

In this study, we use the POS data with customer identification data of a supermarket chain (i.e., ID-POS data), together with publicly available statistical data and geographic information system (GIS) data such as Google Maps. The chain's stores are located in a regional urban area of Japan, and there are over 50 stores in neighboring prefectures. Some of them are located on plains, while others are located in mountainous areas. The numbers of residents and competing stores around each store differ, so the competitive situation also differs. We therefore need to consider these store-specific conditions when estimating the trade area.

3 Data

3.1 About Supermarket Chain

In this study, we focus on a Japanese supermarket chain. This chain has 53 stores in the same region, which we name S1 to S53. Some of them are located in urban or suburban areas, while others are located in the country. The largest store is about 2,000 m², whereas the smallest is only 300 m². These stores mainly carry all categories of food.

3.2 Data Summary

The chain has introduced a membership card system (frequent shopper program: FSP) and a POS (point of sale) system. ID-POS data (POS data with customer identification numbers) contain the purchase date, time, receipt number, purchased items, quantity, and price, together with the customer ID. Thus, the manager can obtain a detailed purchase record for each customer. We use three months of records (April to June 2015); the summary statistics are shown in Table 1.

Table 1. Chain summary

The store size categories and location categories are shown in Tables 2 and 3, respectively.

Table 2. Store size categories
Table 3. Location categories

Figure 1 shows the sales trend over the analysis period. As shown in Fig. 1, sales on most weekends are higher than on weekdays, and sales on Mondays and Fridays in particular are typically lower. In early May, sales remained high because of the long holiday period (in Japan, called Golden Week, from late April to early May).

Fig. 1. Sales transition

At the level of individual stores, however, sales do not necessarily follow this common pattern. Figure 2 shows the sales trend of one store (S1). This store is small and located in an urban area. We presume that many of its customers are office workers and residents near the store, so its sales are not high on weekends. In fact, this store emphasizes lunch boxes and side dishes in its assortment rather than fresh items such as vegetables, meat, or fish. As this example shows, sales patterns differ among stores.

Fig. 2. Sales transition of S1

Figure 3 shows scatter plots of the number of visits, the number of purchased items, and the purchase amount for each customer at each store. The correlation between the number of purchased items and the purchase amount is high, whereas the correlations between the number of visits and the other variables are not. This may indicate that customers use each store differently.

Fig. 3. Scatter plot of purchase data

4 Analysis and Discussions

The outline of our analysis is shown in Fig. 4. First, we aggregate the purchase data for each customer and each store; second, we calculate the radius at several percentile distances for each store; and third, we investigate the factors that determine the radius.
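The first step of this outline, aggregating ID-POS line items per customer and store, might look like the following minimal sketch. The record layout, the class and function names, and two conventions are our assumptions rather than the chain's actual schema: each distinct receipt number counts as one visit (receipt numbers assumed unique within a store over the period), and `price` is a unit price, so the amount is quantity times price.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IdPosRecord:
    """One ID-POS line item (hypothetical field layout)."""
    customer_id: str
    store_id: str
    receipt_no: str
    item: str
    quantity: int
    price: float  # assumed to be a unit price


def aggregate(records):
    """Return {(customer_id, store_id): (visits, purchased_items, amount)}."""
    receipts = defaultdict(set)   # distinct receipts per customer-store pair
    items = defaultdict(int)
    amount = defaultdict(float)
    for r in records:
        key = (r.customer_id, r.store_id)
        receipts[key].add(r.receipt_no)  # assumption: one receipt = one visit
        items[key] += r.quantity
        amount[key] += r.quantity * r.price
    return {k: (len(receipts[k]), items[k], amount[k]) for k in receipts}
```

The resulting per-customer totals are what the later steps sort by distance and accumulate.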

Fig. 4. Outline of our analysis

The details of our model are explained in the following subsections.

4.1 Distance Between Customer and Store Address

To calculate the distance between each customer's address and the store address, we first assign a longitude and latitude to every address using Google's geocoding service. Next, the distance is calculated using Hubeny's formula, which accounts for the curvature of the Earth (modeled as an ellipsoid) when determining the distance between two coordinates. The equation is given as follows:

$$ d = \sqrt {\left( {d_{y} R} \right)^{2} + \left( {d_{x} N{ \cos }\mu_{y} } \right)^{2} } $$
(1)

where \( d_{y} \) is the latitude difference between the two points, \( d_{x} \) is the longitude difference (both in radians), \( \mu_{y} \) is the average latitude of the two points, \( R \) is the radius of curvature of the meridian, and \( N \) is the transverse radius of curvature. The coordinates are referenced to the WGS84 datum, so we use its ellipsoid parameters: 6,378,137 m for the semi-major axis \( a \) and 6,356,752 m for the semi-minor axis \( b \). Moreover, \( R \) and \( N \) are defined as follows.

$$ R = \frac{{a\left( {1 - e^{2} } \right)}}{{W^{3} }} $$
(2)
$$ N = \frac{a}{W} $$
(3)
$$ W = \sqrt {1 - e^{2} \sin^{2} \mu_{y} } $$
(4)
$$ e = \sqrt {\frac{{a^{2} - b^{2} }}{{a^{2} }}} $$
(5)

\( R \) and \( e \) are called the "meridian radius of curvature" and the "first eccentricity", respectively.
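Equations (1) to (5) translate directly into code. The following is a minimal sketch, assuming the geocoded coordinates arrive in decimal degrees; it uses the WGS84 semi-minor axis to more decimal places than quoted above, and the function name `hubeny_distance` is ours.

```python
import math

# WGS84 ellipsoid parameters in metres
A = 6_378_137.0          # semi-major axis a
B = 6_356_752.314245     # semi-minor axis b
E2 = (A**2 - B**2) / A**2  # e^2 from Eq. (5)


def hubeny_distance(lat1, lon1, lat2, lon2):
    """Distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dy = phi1 - phi2                               # latitude difference (rad)
    dx = math.radians(lon1) - math.radians(lon2)   # longitude difference (rad)
    mu = (phi1 + phi2) / 2.0                       # average latitude
    w = math.sqrt(1.0 - E2 * math.sin(mu) ** 2)    # Eq. (4)
    r = A * (1.0 - E2) / w**3                      # Eq. (2), meridian radius
    n = A / w                                      # Eq. (3), transverse radius
    return math.sqrt((dy * r) ** 2 + (dx * n * math.cos(mu)) ** 2)  # Eq. (1)
```

As a sanity check, one degree of longitude along the equator gives \( a \pi / 180 \approx 111{,}319 \) m, and one degree of latitude near the equator gives about 110,574 m, the known length of that meridian arc.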

4.2 Calculate Radius of Trade Area

Next, we calculate the radius of the trade area of each store.

To do this, we first extract the purchase data of each store from the database. Second, we sum the number of visits, the number of purchased items, and the purchase amount for each customer, and then sort the customers by their distance from the store. Third, we find the distances at which the cumulative ratios reach given percentiles. Table 4 shows an example. The first column is the distance from the store to the customer's home address, sorted in ascending order. The second to fourth columns are the number of visits to the store, the number of purchased items, and the purchase amount, and the fifth to seventh columns are the cumulative ratios of these variables. As this table shows, the 50th-percentile distance (radius) for the number of visits is 5. Thus, we can interpret that 50% of the visits come from customers living within a circle of radius 5 centered on the store. Similarly, the 50th-percentile distances for the number of purchased items and the purchase amount are 4 and 3, respectively.
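The third step amounts to a weighted percentile of distance. The minimal sketch below sorts customers by distance and returns the first distance at which the cumulative share of a chosen weight (visits, items, or amount) reaches the target ratio. The function name is hypothetical, and returning the distance of the customer who crosses the threshold, without interpolation, is our assumption about how Table 4 is read.

```python
def percentile_radius(distances, weights, p):
    """Smallest customer distance at which the cumulative weight share reaches p.

    distances[i]: distance from the store to customer i
    weights[i]: customer i's total visits, purchased items, or purchase amount
    p: target cumulative ratio, 0 < p <= 1 (e.g. 0.5 or 0.8)
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    pairs = sorted(zip(distances, weights))  # order customers by distance
    total = sum(w for _, w in pairs)
    if total <= 0:
        raise ValueError("total weight must be positive")
    cumulative = 0.0
    for d, w in pairs:
        cumulative += w
        if cumulative / total >= p:
            return d
    return pairs[-1][0]  # guard against floating-point shortfall at p == 1


# Six customers with one visit each: half of all visits lie within distance 3.
print(percentile_radius([5, 1, 3, 2, 4, 6], [1, 1, 1, 1, 1, 1], 0.5))
```

Calling it once per weight column reproduces the three radii (visits, items, amount) for a given percentile.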

Table 4. An example to calculate radius

4.3 Result of Analysis

Using the method described in the previous subsection, we calculate the radius for each cumulative ratio. Table 5 shows the summary statistics of the radius for several percentiles.

Table 5. Summary statistics of radius (meter)

As Table 5 shows, the radii vary widely; for example, the 50th-percentile radius for the number of purchased items ranges from about 300 m to about 4,000 m. Figure 5 shows the histogram of the 80th-percentile radius for the number of visits. The radii differ considerably among stores, because their locations and customers differ.

Fig. 5. Histogram of the 80% trade area radius for the number of purchased items

Next, we analyze the effects of various factors on the size of the trade area. To do this, we use multiplicative regression analysis; the model is shown in Eq. (6).

$$ y_{i} = \beta_{0} \mathop \prod \limits_{j = 1}^{p} \beta_{j}^{{x_{ij} }} \varepsilon_{i} $$
(6)

where \( \beta_{0} , \beta_{1} , \ldots ,\beta_{p} \) are the intercept and slope parameters and \( \varepsilon_{i} \) is the multiplicative error term. Taking the logarithm of both sides turns Eq. (6) into the linear model \( \log y_{i} = \log \beta_{0} + \sum\nolimits_{j = 1}^{p} x_{ij} \log \beta_{j} + \log \varepsilon_{i} \), which can be estimated by ordinary least squares. We adopt the multiplicative model for the following reasons. First, the distance is never negative, and the multiplicative model yields only positive predicted values. Second, the effects of some variables appear to be exponential. Third, the variables vary widely, so with a linear model the residuals may not be homoscedastic.
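The log-linearization can be sketched with NumPy as follows: regress \( \log y_{i} \) on the explanatory variables plus an intercept column, then exponentiate the coefficients to recover \( \beta_{0}, \ldots, \beta_{p} \) on the multiplicative scale. This is a minimal sketch with hypothetical function names; it does not reproduce the variable selection step used in the paper. A fitted \( \beta_{j} > 1 \) means that a one-unit increase in \( x_{ij} \) multiplies the predicted radius by \( \beta_{j} \), and \( \beta_{j} < 1 \) means it shrinks the radius.

```python
import numpy as np


def fit_multiplicative(X, y):
    """Fit y_i = b0 * prod_j b_j**x_ij * eps_i by OLS on log(y).

    X: (n, p) array of explanatory variables; y: (n,) positive responses.
    Returns the array (b0, b1, ..., bp) on the multiplicative scale.
    """
    X = np.asarray(X, dtype=float)
    log_y = np.log(np.asarray(y, dtype=float))
    design = np.column_stack([np.ones(len(log_y)), X])  # intercept column first
    coef, *_ = np.linalg.lstsq(design, log_y, rcond=None)  # log-scale estimates
    return np.exp(coef)                                    # b_j = exp(log b_j)


def predict_multiplicative(params, X):
    """Predicted y_i = b0 * prod_j b_j**x_ij (always positive)."""
    X = np.asarray(X, dtype=float)
    return params[0] * np.prod(params[1:] ** X, axis=1)


# Noise-free toy data generated from y = 2 * 0.5**x
X = [[0.0], [1.0], [2.0], [3.0]]
y = [2.0, 1.0, 0.5, 0.25]
params = fit_multiplicative(X, y)
print(params)  # approximately [2.0, 0.5]
```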

In this study, we use “No. of parking,” “square root of parking,” “cube root of parking,” “sales area,” “location,” “No. of items,” “population around store (in 1 km radius of store address),” “No. of household (in 1 km radius of store address)” and “No. of household size” as the explanatory variables of our regression model. We include three forms of the number of parking spaces because its effect does not appear to be linear, so combining these variables may capture a non-linear effect. The variable “location” is a factor with three levels (urban, suburb, and country); the other variables are continuous. In the analysis, the “urban” level is omitted as the reference level. The population and household variables are obtained from jSTAT MAP, provided by the National Statistics Center.
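The variable coding described above might be assembled into one regression row per store as in the sketch below. The dictionary keys and the function name are hypothetical, and the paper does not state how the continuous variables are scaled before fitting. With “urban” omitted, a suburban or country store enters Eq. (6) through a 0/1 exponent, so its location multiplies the radius by the corresponding parameter.

```python
LOCATION_LEVELS = ("urban", "suburb", "country")  # "urban" is the reference level


def design_row(store):
    """Build one row of explanatory variables from a dict of store attributes.

    All keys are hypothetical names for the variables listed in the text.
    """
    location = store["location"]
    if location not in LOCATION_LEVELS:
        raise ValueError(f"unknown location level: {location!r}")
    parking = float(store["parking"])
    return [
        parking,                                # No. of parking
        parking ** 0.5,                         # square root of parking
        parking ** (1.0 / 3.0),                 # cube root of parking
        float(store["sales_area"]),             # sales area
        1.0 if location == "suburb" else 0.0,   # dummy: suburb vs. urban
        1.0 if location == "country" else 0.0,  # dummy: country vs. urban
        float(store["n_items"]),                # No. of items
        float(store["population_1km"]),         # population within 1 km
        float(store["households_1km"]),         # No. of household within 1 km
        float(store["household_size"]),         # No. of household size
    ]


example = {"parking": 64, "sales_area": 1200, "location": "country",
           "n_items": 8000, "population_1km": 5000,
           "households_1km": 2000, "household_size": 2.5}
print(design_row(example))
```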

The response variables are the radii at the 80% cumulative ratio of “No. of visiting,” “No. of purchase items” and “purchase amount.”

In the regression analysis, we apply variable selection to choose statistically significant variables. Table 6 shows the selected variables and the estimated parameters for the three 80%-radius models. The results of all our models are summarized in the tables in the appendix.

Table 6. Results of the multiplicative regression models (values are shown in exponential notation; for example, E−01 means \( 10^{ - 1} \)).

All three models in Table 6 selected common explanatory variables. The lower the population, the larger the trade area. We interpret this result as follows: a store located in the country needs customers who live far from the store, and as a result its trade area becomes wider.

The multiple correlation coefficients are about 0.870 for all models. Figure 6 shows the scatter plot of the actual versus predicted trade area radius for the number of purchased items. Some points are not close to the 45° line, so further analysis may be needed. However, most stores are well predicted, so the results are generally appropriate. All scatter plots are shown in the appendix. For higher percentiles, the multiple correlation coefficients are higher, outliers are fewer, and the predictions are more accurate. For especially large distances, however, additional variables may be needed.
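The two diagnostics used here can be sketched briefly. The multiple correlation coefficient \( R \) is the Pearson correlation between actual and fitted values; for an OLS fit with an intercept it equals the square root of \( R^{2} \) on the scale the model was fitted on (here the log scale), and the paper does not state which scale its 0.870 refers to. The second helper flags stores far from the 45° line; its name and the 30% relative tolerance are hypothetical choices for illustration.

```python
import numpy as np


def multiple_correlation(actual, predicted):
    """Pearson correlation between actual and fitted values."""
    return float(np.corrcoef(actual, predicted)[0, 1])


def off_diagonal(store_ids, actual, predicted, tolerance=0.3):
    """Stores whose relative deviation |pred - actual| / actual exceeds tolerance.

    tolerance=0.3 is an illustrative threshold, not a value from the paper.
    """
    return [s for s, a, p in zip(store_ids, actual, predicted)
            if abs(p - a) / a > tolerance]


print(off_diagonal(["S1", "S2"], [1000.0, 2000.0], [1100.0, 3000.0]))  # ['S2']
```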

Fig. 6. Actual vs. predicted values (No. of purchase items)

Regarding percentiles, models for higher percentiles are predicted better. In particular, the 25% cumulative probability models are not well predicted. One possible reason is that the core customers of a store live not only within the 25% region but across a broader area. Moreover, for larger radii, the fit of the predicted values is poor. Most of these stores are located in the country, where the population around the store is small, so they must draw customers from a wider area; consequently, various sources of noise (e.g., the number of competing stores or differences in residents' lifestyles) may be ignored by our models. Nevertheless, our models predict well on the whole.

5 Concluding Remarks

In this study, we focused on consumer behavior, especially store choice. From the viewpoint of the store manager, we used ID-POS data to analyze the area from which a store draws its customers, that is, its trade area. We then analyzed the trade area radius using a regression model. Our results show that the trade area radius can be predicted well. Because the model estimates the trade area, it can be used to optimize area marketing strategies or to plan the opening of new stores.

In this study, we only calculated the radius for each store and did not consider the items in each store's assortment. To match the needs of each store's customers, we need to know the true needs of core customers or heavy users. In future work, we will consider item-level needs and incorporate them into our analysis. In addition, we did not consider the competing stores around each store. When there are more competitors, competition is more severe, which may affect the trade area radius. Moreover, we did not consider the coexistence of multiple stores of the same chain. Some customers may use different stores for different purposes; for example, they may buy lunch boxes near their office but most of their groceries near home. These issues are also left for future work.