Academy of Marketing Science.
------------------------------------------------------------------------------------------------------------------
Combining information from a web-based survey and a telephone survey
Ingvar Tjøstheim, Norwegian Computing Center, Norway
Ivar Solheim, Norwegian Computing Center, Norway
Magne Aldrin, Norwegian Computing Center, Norway
An on-line survey on the Internet, a web survey, is a new way of collecting data from the users of a web site. However, if answering is voluntary, a low response rate can be expected, and an analysis based on the observed data alone may give biased and misleading results due to self-selection. The research question we address in this paper is: what solutions can be recommended to correct for the possible bias due to non-response? Our suggestion is to perform an additional survey of better quality with respect to response rate and self-selection bias, and to combine the two surveys by statistical modelling. This approach is used to study a specific web site, where data were collected both by web and by telephone surveys.
1 Introduction
In marketing research there are a number of data-collection tools available. One of the most popular is computer-assisted telephone interviewing (CATI). The Internet revolution has created opportunities for other ways of collecting survey data. For instance, email-based surveys have been in use for some time. In these surveys, either a questionnaire is sent by email in text format, or an invitation to participate in a survey is sent by email. With the growth of the World Wide Web over the last three to four years, the question of how to collect data through the web has been raised. An example is Peterson, Balasubramanian and Bronnenberg (1997), who write "A second aspect of the Internet that has attracted attention is its potential in the marketing research arena". However, these authors do not refer to any empirical studies.
In contrast to well-known methods such as postal and telephone surveys, Internet data-collection methods have so far received little testing or study (Smith 1997; Frost 1998). However, they appear to be attracting growing attention. For instance, in January 1998 the European Society of Opinion and Marketing Research held a seminar on the Internet and market research. At this seminar one of the speakers ended his presentation by saying "It is only a matter of time before on-line surveys will be commonplace. The market research industry has very little choice in this. Either we find a way to conduct statistically reliable and predictable results from on-line surveys or our clients will move to on-line research without us." (Gates and Helton 1998).
In this paper, we consider web-based surveys. A web-based survey is here defined as an HTML questionnaire, that is, a questionnaire on the web which pops up on a particular web site. This definition does not include links to a questionnaire on a web site, or banners on a web site with an invitation to participate in a survey. A web-based survey may potentially be very useful when the purpose is to investigate something closely related to the web site, including user characteristics. Some advantages of web-based surveys are that the sample is selected (drawn) directly from a sub-population of the real users, that these users are interviewed while using the web site, and that such surveys are an inexpensive way of collecting data. An important feature of such surveys is that the respondents are sampled proportionally to how often they use the web site. However, these sampling probabilities are not easily available. When someone logs onto a web site, the IP address of the visitor's computer is registered, but this cannot be used to uniquely identify visitors or their frequency of using the web site (see Pitkow 1997 for a discussion). In future research we plan to use so-called cookies to obtain more detailed information on individual usage frequency, but such information was not available in the present study.
It is important that the web survey does not disturb the regular use of the web site. Therefore it is desirable that a visitor has the option to use the web site without filling out the questionnaire. However, this may lead to a very low response rate, and the sample of respondents may be biased due to self-selection. Hence, the question of how to deal with the potential bias in such surveys becomes an important issue. Our solution in this paper is to perform an additional survey, for instance by telephone, with a higher response rate and a lower degree of self-selection, and to combine the two surveys by statistical modelling.
This approach is used in a study where data were collected both by web and by telephone. The purpose of the study was to investigate the users' opinions of a specific web site called ODIN. A web-based survey seemed to be a very useful way of collecting information about the web users' needs, since the sample could be selected directly from the sub-population who use the web site. However, as many as 88% of the web sample were non-respondents. There was no available information about the non-respondents, and one could not exclude the possibility that their opinions differed completely from those of the respondents. In particular, the willingness to respond may be higher among satisfied users of ODIN than among dissatisfied users. It was therefore dangerous to draw conclusions from the observed web data alone. To study this potential bias, a telephone survey was also carried out. In this data set, non-response was far less problematic (see Section 2 for a discussion), but only 5% (55 people) of the sample had used the ODIN web site. Hence, telephone interviewing alone is a rather inefficient and expensive way of collecting information from the users of a particular web site, even if the web site is quite popular.
It may be possible to obtain a higher response rate than 12%, for instance by using incentives, but we believe that a low response rate should be expected in many web-based surveys. It takes only one click not to participate, a reminder normally cannot be sent to the visitor, and answering is voluntary. We have also experienced similarly low response rates on two other occasions when no incentives were used.
This article is organized as follows. Section 2 gives an overview of the two surveys, including results from a separate analysis of each survey. In Section 3 we discuss how the two surveys are related to each other as a consequence of their sampling schemes. Section 4 introduces a statistical model which allows the probability of non-response to depend on the respondents' opinion of the web site. The resulting estimates are presented in Section 5, and Section 6 gives concluding remarks.
2 A pilot study with a web-based survey and a telephone survey
The Norwegian Government's official web site is named ODIN (http://odin.dep.no/html/english/). In 1997 the use of this web site was analysed in several surveys, two of which are relevant for this paper: a web-based survey, in which a pop-up questionnaire was presented to every 20th visitor to ODIN, and a nation-wide telephone survey of the Norwegian population.
The telephone data were collected by Gallup Norway, according to their ordinary sampling routines. There are non-respondents in these data too (about 30%). However, compared to the web survey, it is much less likely that the response rate is directly related to the users' opinion about ODIN. The response rate may vary systematically with certain variables such as sex and age, but the distributions of these variables are known exactly for the total Norwegian population. One may therefore correct for bias related to these variables by down-weighting groups which are over-represented in the sample compared to the population (see for instance Särndal, Swensson and Wretman 1992), and the missing observations may then be ignored. Gallup Norway usually performs such weighting according to sex, age and place of residence. However, this had little practical importance for the present sample. Hence, for simplicity we have ignored the weighting and assumed that the respondents were randomly chosen from the Norwegian population.
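The down-weighting described above (post-stratification) can be sketched as follows. This is only an illustration of the general idea, not Gallup Norway's actual routine; the strata and all numbers are hypothetical. Each respondent in a stratum receives the weight (population share)/(sample share), so over-represented strata are down-weighted.

```python
from collections import Counter


def poststratification_weights(sample_strata, population_shares):
    """Weight per stratum = known population share / observed sample share."""
    counts = Counter(sample_strata)
    n = len(sample_strata)
    return {s: population_shares[s] / (c / n) for s, c in counts.items()}


# Hypothetical sample in which women are over-represented (60% vs 50%).
sample = ["female"] * 60 + ["male"] * 40
population = {"female": 0.5, "male": 0.5}
weights = poststratification_weights(sample, population)

# After weighting, the sample reproduces the population shares.
total = sum(weights[s] for s in sample)
female_share = sum(weights[s] for s in sample if s == "female") / total
print(weights, round(female_share, 3))
```

In the paper this correction was judged to have little practical effect, which is why the telephone respondents are treated as a simple random sample.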
The non-response in the web data is much more serious and cannot be ignored. First, it is reasonable to assume that the response rate is systematically related to the questions under investigation, namely the users' opinion about the web site and their frequency of using it. Second, the response rate is very low. Third, there is no exact information at the population level with which to correct for bias. However, the telephone data contain relevant, but uncertain, information, which calls for a combined analysis of the two data sets.
In both surveys, the respondents were asked how often they visited the web site. In the telephone survey they had these alternatives: "never", "once", "few", "periodically", "monthly", "weekly" or "daily". The respondents of the web survey were given only the last six alternatives, since by definition they had used ODIN at least once. People who had never used ODIN are hereafter called NON-ODIN users and are said to belong to category 0. An ODIN user is defined as a person who had used ODIN at least once.
The ODIN users were further asked how useful ODIN was to them, with alternatives "not useful", "little usefulness", "some usefulness", or "very useful". For the present analysis, the classifications are simplified, such that we only distinguish between satisfied and less satisfied users, and between frequent and less frequent users. A satisfied user is here defined as one who has found the web site "very useful", whereas all others are classified as less satisfied. A frequent user is defined as one who uses ODIN "daily" or "weekly". The ODIN users are then cross-classified into the following four categories:
category 1: less frequent (LF) and less satisfied (LS) users
category 2: frequent (F) and less satisfied (LS) users
category 3: less frequent (LF) and satisfied (S) users
category 4: frequent (F) and satisfied (S) users.
Table 1 Number of people within each category in the two data sets.

| Category | 0 | 1 | 2 | 3 | 4 | Missing |
|---|---|---|---|---|---|---|
| Telephone survey | | 34 | 10 | 8 | 3 | |
| Web survey | | | | | | |
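The classification rules above can be written as a small function. The answer strings are the questionnaire alternatives quoted in the text; the function name and the encoding 1 + F + 2S (F, S = 1 for frequent or satisfied) are our own illustrative choices, not part of the original study.

```python
FREQUENCY_ANSWERS = ("never", "once", "few", "periodically",
                     "monthly", "weekly", "daily")
FREQUENT_ANSWERS = {"weekly", "daily"}


def category(frequency, usefulness=None):
    """Return the category 0-4 for one respondent.

    frequency: one of FREQUENCY_ANSWERS.
    usefulness: "not useful", "little usefulness", "some usefulness" or
        "very useful"; ignored for NON-ODIN users (frequency == "never").
    """
    if frequency not in FREQUENCY_ANSWERS:
        raise ValueError(f"unexpected frequency answer: {frequency!r}")
    if frequency == "never":
        return 0                                  # NON-ODIN user
    frequent = frequency in FREQUENT_ANSWERS      # F vs LF
    satisfied = usefulness == "very useful"       # S vs LS
    return 1 + int(frequent) + 2 * int(satisfied)


print(category("never"), category("monthly", "some usefulness"),
      category("daily", "little usefulness"), category("once", "very useful"),
      category("weekly", "very useful"))  # 0 1 2 3 4
```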
3 Relating the two data sets - un-weighted and weighted proportions
In this section we discuss the different sampling properties of the two surveys. From this we define un-weighted and weighted proportions of satisfied users and of frequent users.
Consider first the people reached by the telephone survey. Let $p_i$, $i = 0, \dots, 4$, be the probability that such a person belongs to the $i$-th category, i.e.

$$p_i = P(\text{person reached by phone belongs to category } i), \qquad i = 0, \dots, 4. \qquad (1)$$
Next, consider the web survey. Let $q_i$ denote the corresponding probabilities

$$q_i = P(\text{person reached by web belongs to category } i), \qquad i = 1, \dots, 4. \qquad (2)$$
One could also define $q_0 = P(\text{person reached by web belongs to category } 0)$, but this probability is exactly zero. The rationale behind the use of different probabilities for the two data sets is that the sampling procedure differs between the two surveys. In both surveys, people are sampled from the total Norwegian population. In the telephone survey, each person is sampled with equal probability, while in the web survey each person is sampled with probability proportional to their frequency of using ODIN. If a person uses ODIN daily, for example, the probability of receiving the web questionnaire is about 30 times higher than if they use ODIN monthly.
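The effect of sampling visits rather than people can be illustrated with a small simulation. This is a sketch under stated assumptions: we model the pop-up as an independent 1/20 chance per visit (an approximation of "every 20th visitor"), and the population sizes and visit counts (30 per month for daily users, 1 for monthly users) are hypothetical. Daily users make up 10% of the simulated users but receive most of the questionnaires, in proportion to their share of visits.

```python
import random


def share_of_prompts(n_daily, n_monthly, visits_daily=30, visits_monthly=1,
                     prompt_prob=1 / 20, seed=1):
    """Simulate one month of visits; each visit shows the questionnaire with
    probability prompt_prob. Return the share of prompts shown to daily users."""
    rng = random.Random(seed)
    daily = sum(rng.random() < prompt_prob
                for _ in range(n_daily * visits_daily))
    monthly = sum(rng.random() < prompt_prob
                  for _ in range(n_monthly * visits_monthly))
    return daily / (daily + monthly)


n_daily, n_monthly = 1_000, 9_000            # daily users are 10% of users
simulated = share_of_prompts(n_daily, n_monthly)
expected = 30 * n_daily / (30 * n_daily + 1 * n_monthly)   # share of visits
print(round(simulated, 3), round(expected, 3))
```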
The next step in the modelling is to introduce dependency between $p_i$ and $q_i$, which are of course related parameters. Consider the total population, and let $n_i$ denote the number of people in the $i$-th category. Let $w_{ij}$, $j = 1, \dots, n_i$, be weights proportional to the frequency of using ODIN by the $j$-th person of category $i$. The sampling schemes then link the p- and q-probabilities together through

$$q_i = w_i\, p_i, \qquad i = 1, \dots, 4, \qquad (3)$$

where $w_i \propto \frac{1}{n_i}\sum_{j=1}^{n_i} w_{ij}$ is proportional to the average frequency within category $i$. The $w_i$'s are normalized such that $\sum_{i=1}^{4} w_i\, p_i = 1$.
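Relation (3) can be sketched as a function that turns population probabilities and average usage frequencies into web-sample probabilities. The inputs below are hypothetical; the only point is that the $w_i$ are the average frequencies rescaled so that $\sum_i w_i p_i = 1$, which makes the $q_i$ sum to one.

```python
def web_probabilities(p, avg_freq):
    """Eq. (3): q_i = w_i * p_i for the ODIN-user categories i = 1..4.

    p: dict category -> p_i (category 0 is left out: no usage, so q_0 = 0).
    avg_freq: dict category -> average usage frequency in that category.
    The w_i are avg_freq rescaled so that sum_i w_i * p_i = 1.
    """
    scale = sum(avg_freq[i] * p[i] for i in p)
    w = {i: avg_freq[i] / scale for i in p}
    q = {i: w[i] * p[i] for i in p}
    return w, q


# Hypothetical inputs: frequent categories (2, 4) visit 10 times as often.
p = {1: 0.03, 2: 0.01, 3: 0.01, 4: 0.005}
avg_freq = {1: 2.0, 2: 20.0, 3: 2.0, 4: 20.0}
w, q = web_probabilities(p, avg_freq)
print({i: round(v, 3) for i, v in q.items()})
```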
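The definitions of $p_i$ and $q_i$ in (1) and (2) follow from the sampling schemes. However, they also have alternative interpretations: $p_i$ is the proportion of the Norwegian population belonging to category $i$, and $q_i$ is the proportion of all visits to ODIN that are made by people in category $i$.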
We are interested in the proportion of people within each category, and especially some marginals: those of frequent users and of satisfied users among all ODIN users. The un-weighted proportion of satisfied users is defined by

$$P_S = \frac{p_3 + p_4}{1 - p_0}. \qquad (4)$$

The corresponding weighted proportion of satisfied users is defined by

$$Q_S = q_3 + q_4 = w_3\, p_3 + w_4\, p_4. \qquad (5)$$
The un-weighted and weighted proportions of frequent users are defined similarly.
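Definitions (4) and (5) can be sketched directly in code. The shares and relative frequencies below are hypothetical; dividing by the sum over categories 1 to 4 is the same as dividing by $1 - p_0$, and the weighted version also divides by $\sum_i w_i p_i$ so that unnormalized weights can be passed in.

```python
SATISFIED = (3, 4)
FREQUENT = (2, 4)


def unweighted_proportion(p, cats):
    """Eq. (4)-type proportion: share of `cats` among ODIN users (/ (1 - p_0))."""
    return sum(p[i] for i in cats) / sum(p[i] for i in range(1, 5))


def weighted_proportion(p, w, cats):
    """Eq. (5)-type proportion: sum of q_i = w_i * p_i over `cats`."""
    norm = sum(w[i] * p[i] for i in range(1, 5))   # equals 1 for normalized w_i
    return sum(w[i] * p[i] for i in cats) / norm


# Hypothetical shares among ODIN users and relative visit frequencies.
p = {1: 0.4, 2: 0.1, 3: 0.3, 4: 0.2}
w = {1: 1.0, 2: 5.0, 3: 1.0, 4: 5.0}
print(unweighted_proportion(p, SATISFIED),
      round(weighted_proportion(p, w, SATISFIED), 3))
```

With equal weights the two definitions coincide; the weighted proportion moves towards the categories that visit most often.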
Both un-weighted and weighted proportions are interesting quantities. If the purpose is to improve a web site, it may be more important to satisfy frequent users than occasional "surfers", and weighted proportions are then meaningful. On the other hand, less frequent users may become frequent users in the future if the web site is improved according to their needs, so the un-weighted proportions are also of interest.
Since people in the telephone survey are sampled with equal probability, the un-weighted proportion of satisfied users may be estimated from the telephone data set alone, by applying the sample version of (4) to the telephone data. The estimated proportion of frequent users is (10+3)/55, or approximately 24%, and the estimated proportion of satisfied users is (8+3)/55 = 20%. The estimates and their 95% confidence intervals are shown in Table 2.
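Table 2 Estimated proportion of frequent and satisfied users, based on the telephone data only. 95% confidence intervals are given in parentheses.

| | Estimate | 95% confidence interval (%) |
|---|---|---|
| Frequent users | 24% | (14-36) |
| Satisfied users | 20% | (11-32) |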
The confidence intervals are calculated by bootstrapping (Efron and Tibshirani 1993): An artificial data set is created by random draws of 55 observations with replacement from the original data set, and a new estimate is calculated from this artificial data set. This is repeated 4000 times, yielding 4000 artificial estimates which will vary over the data sets. These estimates are used to calculate the confidence intervals by the bias corrected percentile method (Efron 1982).
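The bootstrap procedure above can be sketched with the Python standard library. The 0/1 data encode the 13 frequent users among the 55 ODIN users in the telephone sample, as given by (10+3)/55; the seed, function names and tie handling are our assumptions, so the resulting interval need not match Table 2 exactly. The bias correction follows the usual bias-corrected percentile recipe: $z_0 = \Phi^{-1}$(share of bootstrap estimates below the original estimate), and the interval endpoints are the bootstrap percentiles at $\Phi(2z_0 \mp 1.96)$.

```python
import random
from statistics import NormalDist


def mean(xs):
    return sum(xs) / len(xs)


def bootstrap_bc_interval(data, stat=mean, n_boot=4000, level=0.95, seed=1):
    """Bias-corrected percentile bootstrap interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    estimate = stat(data)
    # Artificial data sets: n draws with replacement from the original data.
    boots = sorted(stat([rng.choice(data) for _ in range(n)])
                   for _ in range(n_boot))
    # Bias correction z0 from the share of bootstrap estimates below estimate.
    below = sum(b < estimate for b in boots) / n_boot
    below = min(max(below, 1 / n_boot), 1 - 1 / n_boot)   # keep inv_cdf finite
    nd = NormalDist()
    z0 = nd.inv_cdf(below)
    z = nd.inv_cdf(0.5 + level / 2)
    lower_q, upper_q = nd.cdf(2 * z0 - z), nd.cdf(2 * z0 + z)
    lower = boots[min(int(lower_q * n_boot), n_boot - 1)]
    upper = boots[min(int(upper_q * n_boot), n_boot - 1)]
    return estimate, lower, upper


# 55 telephone-sample ODIN users, 10 + 3 = 13 of them frequent users.
frequent = [1] * 13 + [0] * 42
est, lo, hi = bootstrap_bc_interval(frequent)
print(f"{est:.0%} ({lo:.0%}-{hi:.0%})")
```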
Consider then the web-based survey, and assume for the moment that it is completely random who answered and who did not. Then the weighted proportion of satisfied users can be estimated simply by applying the sample version of (5) to the web data, without knowledge of the individual frequencies $w_{ij}$. This is because the ODIN users in the web-based survey are sampled proportionally to their frequencies $w_{ij}$. The result is presented in Table 3. The weighted proportion of frequent users is 38%. This is higher than the un-weighted proportion of 24% in Table 2, and reflects the fact that in the web-based survey frequent users are by design sampled more often than less frequent ones. However, this is a relatively small increase. Suppose that frequent users log onto ODIN 10 times more often than less frequent users. An un-weighted proportion of 24% frequent users would then correspond to a weighted proportion of (10 × 0.24) / (1 × 0.76 + 10 × 0.24) ≈ 76%. This raises the question of whether frequent users are under-represented among those who answered the web questionnaire.
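The arithmetic above can be sketched as two small functions: one maps an un-weighted share and a frequency ratio to a weighted share, and the other inverts this relation. The inversion is our own illustration, not part of the original analysis: if non-response were ignorable, the 24% and 38% figures together would imply that frequent users visit ODIN only about twice as often as less frequent users.

```python
def weighted_share(unweighted, ratio):
    """Weighted share of frequent users when they visit `ratio` times as often."""
    return ratio * unweighted / ((1 - unweighted) + ratio * unweighted)


def implied_ratio(unweighted, weighted):
    """Frequency ratio that turns an un-weighted share into a weighted one."""
    return (weighted / (1 - weighted)) / (unweighted / (1 - unweighted))


print(round(weighted_share(0.24, 10), 2))   # 0.76, as in the text
print(round(implied_ratio(0.24, 0.38), 2))  # 1.94
```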
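Table 3 Estimated weighted proportion of frequent and satisfied users, based on the observed web data only. 95% confidence intervals are given in parentheses.

| | Estimate | 95% confidence interval (%) |
|---|---|---|
| Frequent users | 38% | (30-47) |
| Satisfied users | 48% | (39-56) |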
In general, inference based only on the observed answers could be very misleading, precisely because the probability of a user not responding may depend on which category they belong to, and especially on whether the user is satisfied or not. Hence, to get reliable results for the weighted proportions, it is essential to perform a combined analysis of the two data sets. This is the theme of the next section.
4 A statistical model for combining the surveys, allowing for non-ignorable non-response
In this section, we will drop the assumption that non-response was completely random, and allow the response rate in the web survey to differ systematically between satisfied and less satisfied users, and between frequent and less frequent users.
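However, in order to reduce the number of parameters to be estimated, we will first assume some more structure in the relationship between the two data sets as defined in (3). Members of categories 1 and 3 are less frequent users, and it is reasonable to assume that the average use is approximately the same in the two categories. To be parsimonious we therefore assume $w_1 = w_3 = w_{LF}$, and for the same reason we assume $w_2 = w_4 = w_F$. By definition, we have that $\sum_{i=1}^{4} q_i = \sum_{i=1}^{4} w_i\, p_i = 1$, or equivalently $w_{LF}(p_1 + p_3) + w_F(p_2 + p_4) = 1$. Now, $p_i$ and $q_i$ are linked together through

$$q_1 = w_{LF}\, p_1, \quad q_2 = w_F\, p_2, \quad q_3 = w_{LF}\, p_3, \quad q_4 = w_F\, p_4. \qquad (6)$$

Since both the $p_i$ and $q_i$ probabilities add to one, there are now 5 free parameters to be estimated. It is for instance sufficient to know $p_1, \dots, p_4$ and the ratio $\lambda = w_F / w_{LF}$. Then $p_0$ is given by $p_0 = 1 - p_1 - p_2 - p_3 - p_4$, whereas $w_{LF}$ and $w_F$ are given by

$$w_{LF} = \frac{1}{p_1 + p_3 + \lambda\,(p_2 + p_4)} \quad \text{and} \quad w_F = \lambda\, w_{LF},$$

which follows from (6) and from $\sum_{i=1}^{4} q_i = 1$.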
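The computation of $w_{LF}$, $w_F$ and the $q_i$ from $p_1, \dots, p_4$ and the ratio $w_F/w_{LF}$ can be sketched as follows. As a check we plug in the rounded point estimates later reported in Table 4; the function name is our own.

```python
def q_from_p(p, ratio):
    """Eq. (6): weights and web probabilities from p_1..p_4 and ratio = w_F/w_LF.

    Categories 1 and 3 share the weight w_LF, categories 2 and 4 the weight w_F.
    """
    w_lf = 1 / (p[1] + p[3] + ratio * (p[2] + p[4]))   # makes sum of q_i equal 1
    w_f = ratio * w_lf
    q = {1: w_lf * p[1], 2: w_f * p[2], 3: w_lf * p[3], 4: w_f * p[4]}
    return w_lf, w_f, q


# Rounded point estimates from Table 4: p_1..p_4 and w_F / w_LF = 10.1.
w_lf, w_f, q = q_from_p({1: 0.034, 2: 0.010, 3: 0.008, 4: 0.003}, 10.1)
print(round(w_lf, 2), round(w_f, 2), {i: round(v, 3) for i, v in q.items()})
```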
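We now introduce a logistic regression model for non-response in the web sample, where the probability of response may depend on the category. A model with one specific probability of response for each category would be impossible to identify. Instead, we postulate a logistic regression model where the linear predictor includes main effects, but no interactions. Let $r_i$ be the probability of response for a user in the $i$-th category, and let the indicators $F_i$ and $S_i$ equal 1 if category $i$ consists of frequent users ($i = 2, 4$) or of satisfied users ($i = 3, 4$), respectively, and 0 otherwise. Our response model is then defined by

$$\log\frac{r_i}{1 - r_i} = \alpha + \beta_F F_i + \beta_S S_i, \qquad i = 1, \dots, 4. \qquad (7)$$

The parameter $\beta_F$ can be expressed as

$$\beta_F = \log\frac{r_2}{1 - r_2} - \log\frac{r_1}{1 - r_1},$$

which is the log odds ratio of the probability of response between frequent and less frequent users. The parameter $\beta_S$ has a similar interpretation as the log odds ratio of the probability of response between satisfied and less satisfied users. The response probabilities may also be expressed directly, for instance as

$$r_4 = \frac{\exp(\alpha + \beta_F + \beta_S)}{1 + \exp(\alpha + \beta_F + \beta_S)}.$$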
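The response model (7) can be sketched as below. Plugging in the point estimates of $\alpha$, $\beta_F$ and $\beta_S$ from Table 4 should reproduce, up to rounding, the derived response probabilities listed in the same table; the code also checks that $\beta_F$ is the difference of log odds between categories 2 and 1. Function and variable names are illustrative.

```python
import math

# (F_i, S_i) indicators: categories 2 and 4 are frequent, 3 and 4 satisfied.
FLAGS = {1: (0, 0), 2: (1, 0), 3: (0, 1), 4: (1, 1)}


def response_probs(alpha, beta_f, beta_s):
    """Eq. (7): logit(r_i) = alpha + beta_f * F_i + beta_s * S_i."""
    return {i: 1 / (1 + math.exp(-(alpha + beta_f * f + beta_s * s)))
            for i, (f, s) in FLAGS.items()}


def logit(x):
    return math.log(x / (1 - x))


r = response_probs(-1.29, -2.17, 1.74)   # point estimates from Table 4
print({i: f"{v:.1%}" for i, v in r.items()})
```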
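The full model has 8 free parameters to be estimated. Denote these parameters by $\theta = (p_1, p_2, p_3, p_4, w_F/w_{LF}, \alpha, \beta_F, \beta_S)$, where $p_0 = 1 - p_1 - p_2 - p_3 - p_4$.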
The telephone data are multinomial with probabilities $p_0, \dots, p_4$, and are thus directly related to $\theta$. It remains to establish the relationship between $\theta$ and the web data. The web data are multinomial with 5 categories. Let $\pi_i$, $i = 1, \dots, 4$, denote the probability of a person responding and belonging to the $i$-th category, and let $\pi_5$ be the probability of not responding (missing). These probabilities are given by

$$\pi_1 = q_1 r_1, \quad \pi_2 = q_2 r_2, \quad \pi_3 = q_3 r_3, \quad \pi_4 = q_4 r_4, \quad \pi_5 = 1 - \sum_{i=1}^{4} q_i r_i. \qquad (8)$$
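Chaining (6), (7) and (8) gives the five cell probabilities of the web data. The sketch below evaluates them at the rounded point estimates of Table 4; as a rough consistency check (our own, not reported in the paper), the implied response rate comes out close to the observed 12%, and the implied shares of frequent and satisfied users among respondents are close to the web-only figures in Table 3.

```python
import math


def web_cell_probs(p, ratio, alpha, beta_f, beta_s):
    """Eqs. (6)-(8): probabilities of the five web-data cells.

    Returns pi_1..pi_4 (responded, category i) and pi_5 (did not respond).
    """
    w_lf = 1 / (p[1] + p[3] + ratio * (p[2] + p[4]))
    w = {1: w_lf, 2: ratio * w_lf, 3: w_lf, 4: ratio * w_lf}          # (6)
    flags = {1: (0, 0), 2: (1, 0), 3: (0, 1), 4: (1, 1)}             # (F_i, S_i)
    pi = {}
    for i, (f, s) in flags.items():
        r_i = 1 / (1 + math.exp(-(alpha + beta_f * f + beta_s * s)))  # (7)
        pi[i] = w[i] * p[i] * r_i                                      # q_i * r_i
    pi[5] = 1 - sum(pi.values())                                       # (8)
    return pi


# Rounded point estimates from Table 4.
pi = web_cell_probs({1: 0.034, 2: 0.010, 3: 0.008, 4: 0.003},
                    10.1, -1.29, -2.17, 1.74)
responded = 1 - pi[5]
print(f"response rate {responded:.1%}")
print(f"frequent among respondents {(pi[2] + pi[4]) / responded:.0%}")
print(f"satisfied among respondents {(pi[3] + pi[4]) / responded:.0%}")
```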
The model is now fully specified. Let $y_i$, $i = 0, \dots, 4$, and $z_i$, $i = 1, \dots, 5$, denote the observed numbers within each category in the telephone and web data respectively (from Table 1). Except for a constant independent of $\theta$, the log likelihood is

$$\ell(\theta) = \sum_{i=0}^{4} y_i \log p_i + \sum_{i=1}^{5} z_i \log \pi_i, \qquad (9)$$

since the two data sets are independent and multinomial. The maximum likelihood estimate of $\theta$ is found by maximising (9) by a numerical optimisation procedure. Several constraints apply to the various single parameters in $\theta$, so in order to simplify the numerical optimisation the parameters in $\theta$ are transformed to 8 unconstrained parameters. The transformations are shown in Appendix A.
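The log likelihood (9) can be sketched as a plain function of the 8 parameters. The telephone counts for categories 1 to 4 follow the (10+3)/55 and (8+3)/55 figures in the text, but the category-0 count and all web counts below are hypothetical, since Table 1 is incomplete here. To obtain maximum likelihood estimates one could, for example, pass the negative of this function to a general-purpose optimiser such as `scipy.optimize.minimize`, preferably after a reparameterization like the one in Appendix A; that step is not shown.

```python
import math


def log_likelihood(theta, y, z):
    """Log likelihood (9), up to an additive constant.

    theta: (p1, p2, p3, p4, ratio, alpha, beta_f, beta_s), ratio = w_F / w_LF
    y: telephone counts for categories 0, 1, 2, 3, 4
    z: web counts for categories 1, 2, 3, 4 and for non-response (missing)
    """
    p1, p2, p3, p4, ratio, alpha, beta_f, beta_s = theta
    p = [1 - (p1 + p2 + p3 + p4), p1, p2, p3, p4]
    w_lf = 1 / (p1 + p3 + ratio * (p2 + p4))          # from (6), sum q_i = 1
    w = [0.0, w_lf, ratio * w_lf, w_lf, ratio * w_lf]
    flags = [None, (0, 0), (1, 0), (0, 1), (1, 1)]    # (F_i, S_i)
    pi = []
    for i in range(1, 5):
        f, s = flags[i]
        r_i = 1 / (1 + math.exp(-(alpha + beta_f * f + beta_s * s)))  # (7)
        pi.append(w[i] * p[i] * r_i)                                   # (8)
    pi.append(1 - sum(pi))                            # missing
    ll = sum(n * math.log(pr) for n, pr in zip(y, p) if n > 0)
    ll += sum(n * math.log(pr) for n, pr in zip(z, pi) if n > 0)
    return ll


y = [964, 34, 10, 8, 3]          # category-0 count is hypothetical
z = [30, 15, 20, 20, 700]        # hypothetical web counts
ll = log_likelihood((0.034, 0.010, 0.008, 0.003, 10.1, -1.29, -2.17, 1.74), y, z)
print(round(ll, 1))
```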
Confidence intervals are found by bootstrapping as in Section 3. New artificial data sets are constructed in pairs by independent random sampling with replacement from both the telephone and web data sets.
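The paired resampling can be sketched as follows: each artificial pair consists of a telephone data set and a web data set, each drawn with replacement from its own observed data, independently of the other. Each pair would then be passed to the maximisation of (9), which is not repeated here. The counts are hypothetical and the function names are ours.

```python
import random


def resample_counts(counts, rng):
    """Draw sum(counts) observations with replacement from the observed data
    (given as category counts) and return the resampled category counts."""
    observations = [k for k, c in enumerate(counts) for _ in range(c)]
    new_counts = [0] * len(counts)
    for _ in range(len(observations)):
        new_counts[rng.choice(observations)] += 1
    return new_counts


def paired_bootstrap(y, z, n_boot, seed=0):
    """Yield n_boot pairs of artificial (telephone, web) data sets, each data
    set resampled independently of the other."""
    rng = random.Random(seed)
    for _ in range(n_boot):
        yield resample_counts(y, rng), resample_counts(z, rng)


y = [964, 34, 10, 8, 3]          # hypothetical telephone counts
z = [30, 15, 20, 20, 700]        # hypothetical web counts
pairs = list(paired_bootstrap(y, z, n_boot=20))
print(pairs[0])
```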
5 Estimation results assuming non-ignorable non-response
We now return to the model defined in the last section, which is based jointly on both samples. The estimated parameters are given in Table 4, including some derived parameters. We see that satisfied users are significantly more willing to answer the web questionnaire than less satisfied users (the estimated log odds ratio $\beta_S$ is significantly positive), which seems very reasonable. The estimate of $\beta_F$ is negative, which is interpreted as a lower probability of response among frequent users than among less frequent users, but this effect is not significant. The estimate of the ratio $w_F/w_{LF}$ is about 10, which is interpreted as frequent users using ODIN on average ten times more often than less frequent users. This is a reasonable result, but the uncertainty is large, and the estimate is not significantly different from 1. Hopefully, we will be able to obtain more precise estimates in later studies, because we plan to collect more detailed information on individual usage frequency by using cookies, and we also plan to increase the sample sizes of both types of data.
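Table 4 Estimated parameters with 95% confidence intervals, based on both samples.

| Parameter | Estimate | 95% CI, lower | 95% CI, upper |
|---|---|---|---|
| $p_0$ | 94.6% | 93.2% | 95.8% |
| $p_1$ | 3.4% | 2.3% | 4.5% |
| $p_2$ | 1.0% | 0.5% | 1.7% |
| $p_3$ | 0.8% | 0.4% | 1.4% |
| $p_4$ | 0.3% | 0.0% | 0.6% |
| $w_F/w_{LF}$ | 10.1 | 1.0 | 44.8 |
| $\alpha$ | -1.29 | -2.87 | 1.22 |
| $\beta_F$ | -2.17 | -4.92 | 1.19 |
| $\beta_S$ | 1.74 | 0.52 | 3.62 |
| $r_1$ (response probability, category 1) | 21.6% | 5.4% | 77.3% |
| $r_2$ (response probability, category 2) | 3.0% | 1.5% | 17.2% |
| $r_3$ (response probability, category 3) | 61.1% | 14.4% | 100.0% |
| $r_4$ (response probability, category 4) | 15.2% | 3.7% | 31.7% |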
The new estimates of the un-weighted proportions are given in Table 5, which should be compared to Table 2. The point estimates and confidence intervals are unchanged (within the accuracy shown in the tables). Thus, combining the data sets has not given more precise estimates of the un-weighted proportions. The reason is that there are too many missing observations in the web sample for it to influence the estimates of the un-weighted proportions.
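Table 5 Estimated proportion of frequent and satisfied users, based on a combined analysis of both samples. 95% confidence intervals in parentheses.

| | Estimate | 95% confidence interval (%) |
|---|---|---|
| Frequent users | 24% | (14-36) |
| Satisfied users | 20% | (11-32) |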
Table 6 gives the estimates for the weighted proportions, using both data sets. These results are quite different from those obtained from the observed web data only (Table 3). The estimated weighted proportion of frequent users has changed from 38% in Table 3 to 76% in Table 6. The estimated weighted proportion of satisfied users is now 22%, compared to 48% in Table 3. This gives a much less positive impression of the usefulness of ODIN than the numbers in Table 3, which were calculated under unrealistic assumptions. The confidence intervals are wider in Table 6 than in Table 3 because they are based on more realistic assumptions: our statistical model allows the probability of answering to depend on the users' frequency and satisfaction.
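Table 6 Estimated weighted proportion of frequent and satisfied users, based on a combined analysis of both samples. 95% confidence intervals in parentheses.

| | Estimate | 95% confidence interval (%) |
|---|---|---|
| Frequent users | 76% | (17-92) |
| Satisfied users | 22% | (11-41) |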
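The point estimates in Tables 5 and 6 follow from the parameter estimates in Table 4 through (4), (5) and (6). The sketch below recomputes them from the rounded Table 4 values; because of that rounding, this is only an approximate consistency check, and the function name is ours.

```python
def proportions(p, ratio):
    """Un-weighted (4) and weighted (5) proportions from p_1..p_4 and
    ratio = w_F / w_LF, using the structure of eq. (6)."""
    users = p[1] + p[2] + p[3] + p[4]                 # = 1 - p_0
    denom = p[1] + p[3] + ratio * (p[2] + p[4])       # = 1 / w_LF
    return {
        "un-weighted frequent": (p[2] + p[4]) / users,
        "un-weighted satisfied": (p[3] + p[4]) / users,
        "weighted frequent": ratio * (p[2] + p[4]) / denom,
        "weighted satisfied": (p[3] + ratio * p[4]) / denom,
    }


# Rounded point estimates from Table 4.
est = proportions({1: 0.034, 2: 0.010, 3: 0.008, 4: 0.003}, 10.1)
for name, value in est.items():
    print(f"{name}: {value:.0%}")
```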
6 Concluding remarks
The Internet as a medium presents certain unique problems for surveying. Since there is no central registry, and it is neither practical nor affordable to contact all users of the Internet, a basic challenge is to develop methodologies and research strategies that may produce statistically reliable data based on a selection of subsets of users. Data from web-based surveys must be interpreted with care; often their statistical value is limited.
Our research strategy has been to combine data from a web-based survey with data from another, smaller survey with better control of the response rate and self-selection. Our initial web-based survey had a predictably low response rate, and although the results from this survey gave valuable information about the actual web site, we saw the need for a more rigorous statistical analysis. Combining the data set from the web-based survey with data from a nation-wide, representative survey about the same site gave more reliable results. However, the current study is based on a simple schematization of realistic surveys. Firstly, we have assumed that the telephone survey was an ideal representative random sample, but in reality such surveys may have defects that require more sophisticated modelling. Further, we have considered only two response variables of interest, each with only two levels. With more variables, more categories and perhaps continuous but non-Gaussian variables, the model would become much more complex and difficult to estimate.
Some conclusions may be drawn from our analysis. Firstly, web-based surveys can become a valuable tool in market research, but the methods and techniques must be further developed in order to obtain more valid information about the users of a web site. There is little doubt that this kind of survey offers unique time and cost benefits for market researchers. Our approach, with a pop-up questionnaire shown to every 20th visitor and responses sent immediately to our server and stored in our database ready for analysis, has demonstrated some of these benefits. Secondly, since the response rate is often quite low and there is a lack of information about the non-respondents, we recommend that web-based surveys be combined with parallel surveys in order to validate the data sets and get better control of responses and respondents. The results from our analysis indicate that the biases in web surveys can be substantial, and that some kind of alternative or complementary strategy is therefore not only recommended, but probably necessary. Moreover, our strategy has proved fruitful in reducing the biases due to a high non-response rate in the web-based survey.
Lastly, and contrary to recent research, our results show a statistically significant positive relationship between satisfaction and response rate. Peterson and Wilson (1992) conclude that satisfaction percentages are not related to response rate percentages. The basis for their conclusion is a correlation test on 15 independent studies reporting both satisfaction percentages and response rate percentages. Our approach is different. We have two samples from the same population, and this gives us a unique opportunity to analyse user attitudes more accurately. As also stated by Peterson and Wilson, the concept of satisfaction is difficult to analyse due to a lack of standardisation. It may also reflect the Hawthorne effect: attempts to measure customer satisfaction will often, in and of themselves, increase satisfaction, regardless of the product or service being investigated. Here our approach, which combines different data sets from the same population, may present an interesting alternative methodological strategy to simple correlation tests across quite different and often incompatible issues, populations and concepts of satisfaction.
Acknowledgements
We thank Arnoldo Frigessi, David Hirst and two referees for very helpful comments on drafts of this paper. The work was funded by the Norwegian Research Council.
Appendix A Reparameterization
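Here we reparameterize the 8 constrained parameters of $\theta$ into 8 unconstrained parameters, to simplify the numerical optimization. Let $\eta_i$, $i = 1, \dots, 4$, be unconstrained parameters, and let $p_i$, $i = 1, \dots, 4$, be defined by

$$p_i = \frac{\exp(\eta_i)}{1 + \sum_{k=1}^{4} \exp(\eta_k)}. \qquad (10)$$

To ensure that the ratio $w_F/w_{LF}$ is greater than 1, it is reparameterized as

$$w_F/w_{LF} = 1 + \exp(\gamma). \qquad (11)$$

Now, estimates of $\theta$ can be found by estimating the unconstrained parameters $(\eta_1, \dots, \eta_4, \gamma, \alpha, \beta_F, \beta_S)$.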
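A reparameterization of this kind can be sketched as a pair of maps between constrained and unconstrained parameters. We assume the multinomial-logit (softmax) form with $p_0$ as reference category and the form $1 + \exp(\gamma)$ for the ratio; these are standard choices, but the exact transformation used in the original computations is an assumption here. $\alpha$, $\beta_F$ and $\beta_S$ are already unconstrained and need no mapping.

```python
import math


def to_constrained(eta, gamma):
    """Map unconstrained (eta_1..eta_4, gamma) to (p_0..p_4, ratio)."""
    denom = 1 + sum(math.exp(e) for e in eta)
    p = [1 / denom] + [math.exp(e) / denom for e in eta]   # p_0 is reference
    ratio = 1 + math.exp(gamma)                            # always > 1
    return p, ratio


def to_unconstrained(p, ratio):
    """Inverse map: eta_i = log(p_i / p_0), gamma = log(ratio - 1)."""
    eta = [math.log(p_i / p[0]) for p_i in p[1:]]
    return eta, math.log(ratio - 1)


# Round trip for hypothetical values (p must sum to 1, ratio must exceed 1).
p_start = [0.9, 0.05, 0.02, 0.02, 0.01]
eta, gamma = to_unconstrained(p_start, 10.1)
p_back, ratio_back = to_constrained(eta, gamma)
print([round(v, 6) for v in p_back], round(ratio_back, 6))
```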
References
Efron, Bradley and Robert Tibshirani, 1993. An Introduction to the Bootstrap. Chapman & Hall, New York.
Frost, Fraser, 1998. "Electronic Surveys - New methods of primary data collection." In Proceedings - 27th EMAC Conference. Ed. Per Andersson, Stockholm, 213-232.
Gates, Roger and Alecia Helton, 1998. "The Newest Mousetrap: What does it Catch?" In the ESOMAR Seminar on the Internet and Market Research, ESOMAR Publication Series, Volume 220, Paris, 28-30 January 1998, 75-84.
Peterson, Robert A. and William R. Wilson, 1992. "Measuring Customer Satisfaction: Fact and Artifact," Journal of the Academy of Marketing Science 2; 61-71.
Peterson, Robert A., Sridhar Balasubramanian and Bart J. Bronnenberg, 1997. "Exploring the Implications of the Internet for Consumer Marketing," Journal of the Academy of Marketing Science 25; 329-346.
Pitkow, James, 1997. "In Search of Reliable Usage Data on the WWW." In Proceedings - The Sixth International World Wide Web Conference. Eds. Michael R. Genesereth and Anna Patterson, Santa Clara, California, 451-464.
Särndal, Carl-Erik, Bengt Swensson and Jan Wretman, 1992. Model Assisted Survey Sampling. Springer, New York.
Smith, Christine B., 1997. "Casting the Net: Surveying an Internet Population," Journal of Computer-Mediated Communication 3 (http://jcmc.mscc.huji.ac.il/vol3/issue1/smith.html).