Academy of Marketing Science.

Combining information from a web based survey and a telephone survey
Ingvar Tjøstheim, Norwegian Computing Center, Norway
Ivar Solheim, Norwegian Computing Center, Norway
Magne Aldrin, Norwegian Computing Center, Norway
An online survey on the Internet,
a web survey, is a new way of collecting data from the users of a web site.
However, if answering is voluntary, a low response rate can be expected,
and an analysis based on the observed data alone may give biased and misleading
results due to self selection. The research question we address in this
paper is: what solutions can be recommended to correct for the possible
bias due to nonresponse? Our suggestion is to perform an additional survey,
of better quality with respect to response rate and self selection
bias, and to combine the two surveys by statistical modelling. This approach
is used to study a specific web site, where data were collected both by
web and telephone surveys.
1 Introduction
In marketing research there are a number of data collection tools available. One of the most popular is computer-assisted telephone interviewing (CATI). The Internet revolution has opened up other ways of collecting survey data. For instance, e-mail based surveys have been used for a while. These are surveys where either a questionnaire is sent by e-mail in text format, or an invitation to participate in a survey is sent by e-mail. With the growth of the WWW over the last 3-4 years, the question of how to collect data through the web has been raised. An example is Peterson, Balasubramanian and Bronnenberg (1997), who write "A second aspect of the Internet that has attracted attention is its potential in the marketing research arena". However, these authors do not refer to any empirical studies.
In contrast to well-known methods such as postal and telephone surveys, Internet data collection methods have not been much tested or studied so far (Smith 1997; Frost 1998). However, they seem to be getting more and more attention. For instance, in January 1998 the European Society of Opinion and Marketing Research held a seminar on Internet and market research. At this seminar one of the speakers ended his presentation by saying "It is only a matter of time before online surveys will be commonplace. The market research industry has very little choice in this. Either we find a way to conduct statistically reliable and predictable results from online surveys or our clients will move to online research without us." (Gates and Helton 1998).
In this paper, we consider web-based surveys. A web-based survey is here defined as an HTML questionnaire, a questionnaire on the web which pops up on a particular web site. This definition does not include links to a questionnaire on a web site or banners on a web site with an invitation to participate in a survey. A web-based survey may potentially be very useful when the purpose is to investigate something closely related to the web site, including user characteristics. Some advantages of web-based surveys are: the sample is selected (drawn) directly from a subpopulation of the real users, the respondents are interviewed while using the web site, and the survey is an inexpensive way of collecting data. An important feature of such surveys is the fact that the respondents are sampled proportionally to how often they use the web site. However, these sampling probabilities are not easily available. When someone logs onto a web site, the IP address of the visitor's computer is registered, but this cannot be used to uniquely identify visitors or their frequency of using the web site (see Pitkow 1997 for a discussion). In future research we plan to use so-called cookies to get more detailed information on individual user frequency, but such information was not available in the present study.
It is important that the web survey does not disturb the regular use of the web site. Therefore it is desirable that a visitor has the option to use the web site without filling out the questionnaire. However, this may lead to a very low response rate, and the sample of respondents may be biased due to self selection. Hence, the question of how to deal with the potential bias in such surveys becomes an important issue. Our solution in this paper is to perform an additional survey, for instance by telephone, with a higher response rate and a lower degree of self selection, and to combine the two surveys by statistical modelling.
This approach is used in a study where data were collected both by web and by telephone. The purpose of the study was to investigate the users' opinions of a specific web site called ODIN. A web-based survey seemed to be a very useful way of collecting information about the web users' needs, since the sample could be selected directly from the subpopulation who use the web site. However, there were as many as 88% nonrespondents in the web sample. There was no available information about the nonrespondents, and one could not exclude the possibility that the nonrespondents had a totally different opinion from the respondents. In particular, the willingness to respond may be higher among satisfied users of ODIN than among dissatisfied users. Therefore, it was dangerous to draw conclusions from the observed web data only. For the purpose of studying this potential bias, a telephone survey was also carried out. In this data set, the number of nonrespondents was negligible (see Section 2 for a discussion), but only 5% (55 people) of the sample had used the ODIN web site. Hence, telephone interviewing alone is a quite inefficient and expensive way of collecting information from the users of a particular web site, even if the web site is quite popular.
It may be possible to get a higher response rate than 12%, for instance by using incentives, but we believe that a low response rate should be expected in many web-based surveys. It only takes one click not to participate, normally a reminder cannot be sent to the visitor, and answering is voluntary. We have also experienced a similarly low response rate on two other occasions when not using any incentives.
This article is organized in the following
way: Section 2 gives an overview of the two surveys, including results
from a separate analysis of each survey. In Section 3 we discuss how the
two surveys are related to each other as a consequence of their sampling
schemes. Section 4 introduces a statistical model which allows the probability
for nonresponse to depend on the respondents' opinion of the web site.
Then the results, the estimates generated by the model, are presented (Section
5).
2 A pilot study with a web-based survey and a telephone survey
The Norwegian Government's official web site is named ODIN (http://odin.dep.no/html/english/). In 1997 the use of this web site was analysed. One of the surveys was webbased. The other relevant survey for this paper was a telephone survey. The two surveys are:
The telephone data were collected by Gallup Norway, according to their ordinary sampling routines. There are nonrespondents in these data too (about 30%). However, compared to the web survey, it is much less likely that the response rate is directly related to the users' opinion about ODIN. The response rate may vary systematically with certain criteria such as sex and age, but the population averages of these criteria are known exactly for the total Norwegian population. One may therefore correct for bias related to these criteria by down-weighting groups which are over-represented in the sample compared to the population total (see for instance Särndal, Swensson and Wretman 1992), and the missing observations may be ignored. Usually, Gallup Norway performs such weighting according to sex, age and place of residence. However, this had little practical importance for the present sample. Hence, for simplicity we have ignored the weighting, and assume that the respondents were randomly chosen from the Norwegian population.
The nonresponse in the web data is much more serious, and cannot be ignored. First, it is reasonable to assume that the response rate is systematically related to the questions under investigation, namely the users' opinion about the web site and their frequency of using it. Second, the response rate is very low. Third, there is no exact information on population level to correct for bias. However, the telephone data contain relevant, but uncertain information, which calls for a combined analysis of the two data sets.
In both surveys, the respondents were asked how often they visited the web site. In the telephone survey they had these alternatives: "never", "once", "few", "periodically", "monthly", "weekly" or "daily". The respondents of the web survey were given only the last six alternatives, since by definition they had used ODIN at least once. People who had never used ODIN are in the following defined as NON-ODIN users. These are said to belong to category 0. An ODIN user is defined as a person who had used ODIN at least once.
The ODIN users were further asked how useful ODIN was to them, with alternatives "not useful", "little usefulness", "some usefulness", or "very useful". For the present analysis, the classifications are simplified, such that we only distinguish between satisfied and less satisfied users, and between frequent and less frequent users. A satisfied user is here defined as one who has found the web site "very useful", whereas all others are classified as less satisfied. A frequent user is defined as one who uses ODIN "daily" or "weekly", whereas all others are classified as less frequent. The ODIN users are then cross-classified into the following four categories:
category 1: less frequent (LF) and less satisfied (LS) users
category 2: frequent (F) and less satisfied (LS) users
category 3: less frequent (LF) and satisfied (S) users
category 4: frequent (F) and satisfied (S) users.
Table 1 Number of people within each category in the two data sets.
Category         | 0 | 1  | 2  | 3 | 4 | 5 (missing)
Telephone survey |   | 34 | 10 | 8 | 3 |
Web survey       |   |    |    |   |   |
3 Relating the two data sets: unweighted and weighted proportions
In this section we discuss the different sampling properties of the two surveys. From this we define unweighted and weighted proportions of satisfied users and of frequent users.
Consider first the people reached by the telephone survey. Let $p_i$, i=0, ... , 4 be the probability that such a person belongs to the ith category, i.e.
$$p_i = P(\text{person reached by phone belongs to category } i), \quad i = 0, \ldots, 4. \qquad (1)$$
Next, consider the web survey. Let $q_i$ denote the corresponding probabilities
$$q_i = P(\text{person reached by web belongs to category } i), \quad i = 1, \ldots, 4. \qquad (2)$$
One could also define $q_0$ = P(person reached by web belongs to category 0), but this probability is exactly zero. The rationale behind the use of different probabilities for the two data sets is that the sampling procedure differs between the two surveys. In both surveys, the people are sampled from the total Norwegian population. In the telephone survey, each person is sampled with equal probability, while in the web survey each person is sampled proportionally to their frequency of using ODIN. If a person uses ODIN daily, for example, the probability of getting the web questionnaire would be about 30 times higher than if they accessed ODIN on a monthly basis.
The next step in the modelling is to introduce dependency between $p_i$ and $q_i$, which are of course related parameters. Consider the total population, and let $n_i$ denote the number of people in the ith category. Let $w_{ij}$, j=1, ... , $n_i$ be weights proportional to the frequency of using ODIN by the jth person of category i. The sampling schemes then link the p and q-probabilities together through
$$q_i = w_i\, p_i, \quad i = 0, \ldots, 4, \qquad (3)$$
where $w_i$ is proportional to the average frequency $\frac{1}{n_i}\sum_{j=1}^{n_i} w_{ij}$ within category i. The $w_i$'s are normalized such that $\sum_{i=0}^{4} w_i p_i = 1$.
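The multiplicative form of this link can be motivated by a short derivation. It is only a sketch: it assumes that a person is reached by phone with equal probability and by web with probability proportional to the individual weight $w_{ij}$, and the symbols $N$ (population size) and $\bar{w}_i$ (average weight in category i) are introduced here for illustration.

```latex
\begin{align*}
p_i &= \frac{n_i}{N}, \qquad N = \sum_{k=0}^{4} n_k, \qquad
\bar{w}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} w_{ij}, \\
q_i &= \frac{\sum_{j=1}^{n_i} w_{ij}}{\sum_{k=0}^{4}\sum_{j=1}^{n_k} w_{kj}}
     = \frac{n_i\,\bar{w}_i}{\sum_{k=0}^{4} n_k\,\bar{w}_k}
     = \frac{\bar{w}_i}{\sum_{k=0}^{4} \bar{w}_k\, p_k}\; p_i .
\end{align*}
```

Writing $w_i = \bar{w}_i / \sum_k \bar{w}_k p_k$ then gives $q_i = w_i p_i$ with $\sum_i w_i p_i = 1$. Category 0 consists of people who never use ODIN, so $\bar{w}_0 = 0$ and hence $q_0 = 0$.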
The definitions of $p_i$ and $q_i$ in (1) and (2) follow from the sampling schemes. However, they have also alternative interpretations:
We are interested in the proportion of people within each category, and especially some marginals: those of frequent users and of satisfied users among all ODIN users. The unweighted proportion of satisfied users is defined by
$$P_S = \frac{p_3 + p_4}{p_1 + p_2 + p_3 + p_4}. \qquad (4)$$
The corresponding weighted proportion of satisfied users is defined by
$$P_S^{w} = q_3 + q_4 = \frac{w_3 p_3 + w_4 p_4}{\sum_{i=1}^{4} w_i p_i}. \qquad (5)$$
The unweighted and weighted proportions of frequent users are defined similarly.
Both unweighted and weighted proportions are interesting quantities. If the purpose is to improve a web site, it may be more important to satisfy frequent users than "surfers", and weighted proportions are then meaningful. On the other hand, less frequent users may become frequent users in the future if the web site is improved according to their needs; therefore the unweighted proportions are also of interest.
Since people in the telephone survey are sampled with equal probability, the unweighted proportion of satisfied users may be estimated from the telephone data set alone, by applying a sample version of (4) to the telephone data. The estimated proportion of frequent users is (10+3)/55 = 24%, and the estimated proportion of satisfied users is (8+3)/55 = 20%. The estimates and their 95% confidence intervals are shown in Table 2.
Table 2 Estimated proportion of frequent and satisfied users, based on the telephone data only. 95% confidence intervals are given in parentheses.
Frequent users    24% (14-36)
Satisfied users   20% (11-32)
The confidence intervals are calculated by bootstrapping (Efron and Tibshirani 1993): An artificial data set is created by random draws of 55 observations with replacement from the original data set, and a new estimate is calculated from this artificial data set. This is repeated 4000 times, yielding 4000 artificial estimates which vary over the data sets. These estimates are used to calculate the confidence intervals by the bias-corrected percentile method (Efron 1982).
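The resampling scheme described above can be sketched in a few lines of Python. Note that this sketch uses the plain percentile method, which is simpler than the bias-corrected percentile method used in the paper, so its limits need not coincide with those reported. The function name, the seed and the 0/1 coding of the 55 ODIN users (13 frequent, 42 less frequent) are assumptions made for illustration.

```python
import random


def bootstrap_percentile_ci(data, n_boot=4000, level=0.95, seed=0):
    """Plain percentile bootstrap interval for the mean of 0/1 data."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Draw n observations with replacement and re-estimate the proportion.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(sum(resample) / n)
    estimates.sort()
    alpha = (1.0 - level) / 2.0
    lower = estimates[int(alpha * n_boot)]
    upper = estimates[int((1.0 - alpha) * n_boot) - 1]
    return lower, upper


# 55 ODIN users in the telephone sample, of whom 10 + 3 = 13 are frequent users.
frequent = [1] * 13 + [0] * 42
point_estimate = sum(frequent) / len(frequent)
lower, upper = bootstrap_percentile_ci(frequent)
print(f"{point_estimate:.2f} ({lower:.2f}, {upper:.2f})")
```

A fixed seed is used only to make the sketch reproducible; a bias-corrected interval would additionally shift the percentiles according to the share of bootstrap estimates below the point estimate.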
Consider then the web-based survey, and assume for the moment that it is completely random who answered and who did not. Then the weighted proportion of satisfied users can be estimated simply by using the sample version of (5) on the web data, without knowledge of the individual frequencies $w_{ij}$. This is because the ODIN users in the web-based survey are sampled proportionally to their frequencies $w_{ij}$. The result is presented in Table 3. The weighted proportion of frequent users is 38%. This is higher than the unweighted proportion of 24% in Table 2, and reflects the fact that in the web-based survey the frequent users are by design sampled more often than the less frequent ones. However, this is a relatively small increase. Suppose that the frequent users log onto ODIN 10 times more often than the less frequent users. An unweighted proportion of 24% frequent users would then correspond to a weighted proportion of (10 x 0.24) / (1 x 0.76 + 10 x 0.24) = 76%. Hence, one can wonder if the frequent users are under-represented among those who answered the web questionnaire.
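The arithmetic in this example can be checked with a small helper. The function name is hypothetical; the inputs 0.24 and 10 are the values used in the text.

```python
def weighted_proportion(unweighted, ratio):
    """Share of web contacts due to frequent users.

    unweighted: share of frequent users among all users.
    ratio: how many times more often a frequent user visits the site
           than a less frequent user.
    """
    frequent_mass = ratio * unweighted
    other_mass = 1.0 * (1.0 - unweighted)
    return frequent_mass / (frequent_mass + other_mass)


share = weighted_proportion(0.24, 10)
print(round(share, 2))  # 0.76
```

Setting the ratio to 1 returns the unweighted proportion, which is a quick sanity check on the formula.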
Table 3 Estimated weighted proportion of frequent and satisfied users, based on the observed web data only. 95% confidence intervals are given in parentheses.
Frequent users    38% (30-47)
Satisfied users   48% (39-56)
In general, inference based only on the observed answers could be very misleading exactly because the probability of a user not responding may depend on which of the categories they belong to, and especially if the user is satisfied or not. Hence, to get reliable results for the weighted proportions, it is essential to perform a combined analysis of the two data sets. This will be the theme of the next section.
4 A statistical model for combining the surveys, allowing for nonignorable nonresponse
In this section, we will drop the assumption that nonresponse was completely random, and allow the response rate in the web survey to differ systematically between satisfied and less satisfied users, and between frequent and less frequent users.
However, in order to reduce the number of parameters to be estimated, we will first assume some more structure in the relationship between the two data sets as defined in (3). Members of categories 1 and 3 are less frequent users, and it is reasonable to assume that the average use is approximately the same in the two categories. To be parsimonious we therefore assume $w_1 = w_3 = w_{LF}$, and for the same reason we assume $w_2 = w_4 = w_F$. By definition, we have that $w_0 = 0$ or equivalently $q_0 = 0$. Now, $p_i$ and $q_i$ are linked together through
$$q_1 = w_{LF}\, p_1, \quad q_2 = w_F\, p_2, \quad q_3 = w_{LF}\, p_3, \quad q_4 = w_F\, p_4. \qquad (6)$$
Since both the $p_i$ and $q_i$ probabilities add to one, there are now 5 free parameters to be estimated. It is for instance sufficient to know $p_1, \ldots, p_4$ and the ratio $w_F / w_{LF}$. Then $p_0$ is given by $p_0 = 1 - \sum_{i=1}^{4} p_i$, whereas $w_{LF}$ and $w_F$ are given by $w_{LF} = 1 / \{p_1 + p_3 + (w_F/w_{LF})(p_2 + p_4)\}$ and $w_F = (w_F/w_{LF})\, w_{LF}$, which follows from (6) and from $\sum_{i=1}^{4} q_i = 1$.
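As a rough illustration of how the weights and the web probabilities follow from the telephone probabilities and the ratio, the sketch below assumes the link $q_i = w_{LF} p_i$ for the less frequent categories (1 and 3) and $q_i = w_F p_i$ for the frequent ones (2 and 4), together with the constraint that the $q_i$ add to one. The function name is hypothetical, and the inputs are illustrative values of the same size as the estimates reported in Section 5.

```python
def web_probabilities(p, ratio):
    """Return (w_LF, w_F, q) from p = {1: p1, ..., 4: p4} and ratio = w_F / w_LF."""
    # The q_i must add to one: w_LF * (p1 + p3) + w_F * (p2 + p4) = 1.
    w_lf = 1.0 / (p[1] + p[3] + ratio * (p[2] + p[4]))
    w_f = ratio * w_lf
    q = {1: w_lf * p[1], 2: w_f * p[2], 3: w_lf * p[3], 4: w_f * p[4]}
    return w_lf, w_f, q


# Illustrative inputs (assumed for this sketch).
p = {1: 0.034, 2: 0.010, 3: 0.008, 4: 0.003}
w_lf, w_f, q = web_probabilities(p, ratio=10.1)
weighted_frequent = q[2] + q[4]
weighted_satisfied = q[3] + q[4]
print(round(weighted_frequent, 2), round(weighted_satisfied, 2))
```

With these inputs the weighted proportions come out at roughly 0.76 for frequent users and 0.22 for satisfied users.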
We now introduce a logistic regression model for nonresponse in the web sample, where the probability of response may depend on the category. A model with one specific probability of response for each category would be impossible to identify. Instead, we postulate a logistic regression model where the linear predictor includes main effects, but no interactions. Let $r_i$ be the probability of response for a user in the ith category. Our response model is then defined by
$$\log\frac{r_i}{1-r_i} = \beta_0 + \beta_F\, I(i \in \{2,4\}) + \beta_S\, I(i \in \{3,4\}), \quad i = 1, \ldots, 4, \qquad (7)$$
where $I(\cdot)$ is the indicator function. The parameter $\beta_F$ can be expressed as
$$\beta_F = \log\frac{r_2/(1-r_2)}{r_1/(1-r_1)},$$
which is the log odds ratio of the probability of response between frequent and less frequent users. The parameter $\beta_S$ has a similar interpretation as the log odds ratio of the probability of response between satisfied and less satisfied users. The response probabilities may also be expressed directly, for instance as $r_1 = \exp(\beta_0)/\{1+\exp(\beta_0)\}$.
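A minimal sketch of such a main-effects response model is given below. It assumes an intercept plus one effect for frequent users (categories 2 and 4) and one for satisfied users (categories 3 and 4); the function name and the coefficient values are assumptions chosen for illustration.

```python
import math


def response_probability(category, beta0, beta_f, beta_s):
    """Response probability for categories 1-4 under a main-effects logit model."""
    frequent = category in (2, 4)
    satisfied = category in (3, 4)
    # Booleans act as 0/1 indicators in the linear predictor.
    eta = beta0 + beta_f * frequent + beta_s * satisfied
    return 1.0 / (1.0 + math.exp(-eta))


# Illustrative coefficients (assumed values, not a fitted result of this sketch).
beta0, beta_f, beta_s = -1.29, -2.17, 1.74
probs = {i: response_probability(i, beta0, beta_f, beta_s) for i in (1, 2, 3, 4)}
for i, r in probs.items():
    print(i, round(r, 3))

# The frequency effect is the log odds ratio between categories 2 and 1.
log_odds_ratio = math.log((probs[2] / (1 - probs[2])) / (probs[1] / (1 - probs[1])))
```

Because there is no interaction term, four response probabilities are described by three coefficients, which is what makes the model identifiable from the available data.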
The full model has 8 free parameters to be estimated. Denote these parameters by $\theta$, where $\theta = (p_1, p_2, p_3, p_4, w_F/w_{LF}, \beta_0, \beta_F, \beta_S)$.
The telephone data are multinomial with probabilities $p_0, \ldots, p_4$, and are thus directly related to $\theta$. It remains to establish the relationship between $\theta$ and the web data. The web data are multinomial with 5 categories. Let $\pi_i$, i=1, ...,4 denote the probability of a person responding and belonging to the ith category, and let $\pi_5$ be the probability of not responding (missing). With $r_i$ denoting the probability of response in category i, these probabilities are given by
$$\pi_1 = q_1 r_1, \quad \pi_2 = q_2 r_2, \quad \pi_3 = q_3 r_3, \quad \pi_4 = q_4 r_4, \quad \pi_5 = 1 - \sum_{i=1}^{4} \pi_i. \qquad (8)$$
The model is now fully specified. Let $x_i$, i=0, ..., 4 and $y_i$, i=1, ..., 5 denote the observed numbers within each category in the telephone and web data respectively (from Table 1). Except for a constant independent of $\theta$, the log likelihood is $l(\theta)$ where
$$l(\theta) = \sum_{i=0}^{4} x_i \log p_i + \sum_{i=1}^{5} y_i \log \pi_i, \qquad (9)$$
since the two data sets are independent and multinomial. The maximum likelihood estimate of $\theta$ is found by maximising (9) by a numerical optimisation procedure. Several constraints apply to the various single parameters in $\theta$, so in order to simplify the numerical optimisation the parameters in $\theta$ are transformed to 8 unconstrained parameters. The transformations are shown in Appendix A.
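The joint log likelihood of two independent multinomial samples can be sketched as follows. The code assumes the structure described in this section: web probabilities proportional to telephone probabilities with one weight for less frequent and one for frequent users, and a main-effects logistic response model. All function names are hypothetical, and the counts are placeholder values, not the study's data. No optimiser is included; in practice the function would be handed to a numerical maximisation routine.

```python
import math


def web_cell_probabilities(p, ratio, beta0, beta_f, beta_s):
    """Return [pi_1, ..., pi_4, pi_5]; p = [p_0, ..., p_4], ratio = w_F / w_LF."""
    w_lf = 1.0 / (p[1] + p[3] + ratio * (p[2] + p[4]))
    w = {1: w_lf, 2: ratio * w_lf, 3: w_lf, 4: ratio * w_lf}
    cells = []
    for i in (1, 2, 3, 4):
        eta = beta0 + beta_f * (i in (2, 4)) + beta_s * (i in (3, 4))
        r_i = 1.0 / (1.0 + math.exp(-eta))   # response probability
        cells.append(w[i] * p[i] * r_i)      # respond and belong to category i
    cells.append(1.0 - sum(cells))           # nonresponse (missing)
    return cells


def log_likelihood(p, ratio, beta0, beta_f, beta_s, phone_counts, web_counts):
    """Sum of the two multinomial log likelihoods, up to an additive constant."""
    cells = web_cell_probabilities(p, ratio, beta0, beta_f, beta_s)
    phone_part = sum(x * math.log(p_i) for x, p_i in zip(phone_counts, p))
    web_part = sum(y * math.log(c) for y, c in zip(web_counts, cells))
    return phone_part + web_part


# Placeholder counts and parameter values (hypothetical, for illustration only).
phone_counts = [945, 34, 10, 8, 3]    # categories 0-4
web_counts = [40, 15, 35, 30, 880]    # categories 1-4 and missing
p = [0.945, 0.034, 0.010, 0.008, 0.003]
cells = web_cell_probabilities(p, 10.0, -1.3, -2.2, 1.7)
value = log_likelihood(p, 10.0, -1.3, -2.2, 1.7, phone_counts, web_counts)
print(value)
```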
Confidence intervals are found by bootstrapping as in Section 3. New artificial data sets are constructed in pairs by independent random sampling with replacement from both the telephone and web data sets.
5 Estimation results assuming nonignorable nonresponse
We now return to the model defined in the last section that is based jointly on both samples. The estimated parameters are given in Table 4, including some derived parameters. We see that satisfied users are significantly more willing to answer the web questionnaire than less satisfied users (the estimated log odds ratio for satisfaction, $\beta_S$, is significantly positive), which seems to be very reasonable. The estimate of the frequency effect $\beta_F$ is negative, which is interpreted as a lower probability of response among frequent users than among less frequent users, but this effect is not significant. The estimate of the ratio $w_F/w_{LF}$ is 10, which is interpreted as frequent users using ODIN on average ten times more often than less frequent users. This is a reasonable result, but the uncertainty is large, and the estimate is not significantly different from 1. Hopefully, we will be able to get more precise estimates in later studies, because we plan to collect more detailed information on individual user frequency by using cookies, and we also plan to increase the sample sizes of both types of data.
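Table 4 Estimated parameters with 95% confidence intervals, based on both samples.
Parameter                          Estimate   Lower limit   Upper limit
$p_0$                              94.6%      93.2%         95.8%
$p_1$                              3.4%       2.3%          4.5%
$p_2$                              1.0%       0.5%          1.7%
$p_3$                              0.8%       0.4%          1.4%
$p_4$                              0.3%       0.0%          0.6%
$w_F/w_{LF}$ (frequency ratio)     10.1       1.0           44.8
$\beta_0$ (intercept)              -1.29      -2.87         1.22
$\beta_F$ (frequent)               -2.17      -4.92         1.19
$\beta_S$ (satisfied)              1.74       0.52          3.62
$r_1$ (response, category 1)       21.6%      5.4%          77.3%
$r_2$ (response, category 2)       3.0%       1.5%          17.2%
$r_3$ (response, category 3)       61.1%      14.4%         100.0%
$r_4$ (response, category 4)       15.2%      3.7%          31.7%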
The new estimates of the unweighted proportions are given in Table 5, which should be compared to Table 2. The point estimates are unchanged (within the accuracy shown in the tables). Thus, combining the data sets has not given more precise estimates of the unweighted proportions. The reason is that there are too many missing observations in the web sample for it to influence the estimates of the unweighted proportions.
Table 5 Estimated proportion of frequent and satisfied users, based on a combined analysis of both samples. 95% confidence intervals in parentheses.
Frequent users    24% (14-36)
Satisfied users   20% (11-32)
Table 6 gives the estimates for the weighted proportions, using both data sets. These results are quite different from those we got from the observed web data only (Table 3). The estimated weighted proportion of frequent users has changed from 38% in Table 3 to 76% in Table 6. The estimate of the weighted proportion of satisfied users is now 22%, compared to 48% in Table 3. This gives a much less positive impression of the usefulness of ODIN than the numbers in Table 3, which were calculated under unrealistic assumptions. The confidence intervals are wider in Table 6 than in Table 3, because they are based on more realistic assumptions: our statistical model allows the probability of answering to depend on the users' frequency and satisfaction.
Table 6 Estimated weighted proportion of frequent and satisfied users, based on a combined analysis of both samples. 95% confidence intervals in parentheses.
Frequent users    76% (17-92)
Satisfied users   22% (11-41)
6 Concluding remarks
The Internet as a medium presents certain unique problems for surveying. Since there is no central registry and it is neither practical nor affordable to contact all users of the Internet, a basic challenge is to develop methodologies and research strategies that may produce statistically reliable data based on a selection of subsets of users. Data from web-based surveys must be interpreted with care; often the statistical value is limited.
Our research strategy has been to combine data from a web-based survey with data from another survey with fewer data, but with better control of the response rate and self selection. Our initial web-based survey had a predictably low response rate, and although the results from this survey gave valuable information about the actual web site, we have seen the need for more rigorous statistical analysis. The combination of the data set from the web-based survey with data from a nationwide, representative survey of the same site gave more reliable results. However, the current study is based on a simple schematization of realistic surveys. Firstly, we have assumed that the telephone survey was an ideal representative random sample, but in reality such surveys may have defects that require more sophisticated modelling. Further, we have considered only two response variables of interest, each one with only two levels. With more variables, more categories and perhaps continuous, but non-Gaussian variables, the model would become much more complex and difficult to estimate.
Some conclusions may be drawn from our analysis. Firstly, web-based surveys can become a valuable tool in market research, but the methods and techniques must be further developed in order to get more valid information about the users of a web site. There is little doubt that this kind of survey offers unique time and cost benefits for market researchers. Our approach with a pop-up questionnaire to every 20th visitor, where the responses are immediately sent back to our server and stored in our database ready for the analysis, has demonstrated some of these benefits. Secondly, since the response rate is often quite low and there is a lack of information about the nonrespondents, we recommend that web-based surveys should be combined with parallel surveys in order to validate the data sets and get better control of responses and respondents. The results from our analysis indicate that the biases in web surveys can be substantial, and that some kind of alternative or complementary strategy is therefore not only recommended, but probably necessary. Moreover, our strategy has been shown to be fruitful in reducing the biases due to a high nonresponse rate in the web-based survey.
Lastly, and contrary to recent research,
our results show a statistically significant positive correlation between
satisfaction and response rate. Peterson & Wilson (1992) conclude that
satisfaction percentages are not related to response rate percentages.
The basis for their conclusion is a correlation test of 15 independent
studies containing both satisfaction percentages and response rate percentages.
Our approach is different. We have two samples from the same population,
and this gives us a unique opportunity to analyse user attitudes more accurately.
As also stated by Peterson and Wilson, the concept of satisfaction is difficult
to analyse due to lack of standardisation. Also it may reflect the Hawthorne
effect: attempts to measure customer satisfaction will often, in and of
themselves, increase satisfaction, regardless of the product or service
being investigated. Here our approach, which combines different data sets
of the same population, may present an interesting alternative methodological
strategy to simple correlation tests of quite different and often incompatible
issues, populations and concepts of satisfaction.
Acknowledgements
We thank Arnoldo Frigessi, David Hirst
and two referees for very helpful comments on drafts of this paper. The
work was funded by the Norwegian Research Council.
Appendix A Reparameterization
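Here we reparameterize the 8 constrained parameters of $\theta$ into 8 unconstrained parameters, to simplify the numerical optimization. Let $a_i$, i=1, ..., 4 be unconstrained parameters, and let $p_i$, i=1, ..., 4 be defined by
$$p_i = \frac{\exp(a_i)}{1 + \sum_{k=1}^{4} \exp(a_k)}. \qquad (10)$$
To assure that the ratio $w_F/w_{LF}$ is greater than 1, it is reparameterized as
$$w_F / w_{LF} = 1 + \exp(a_5). \qquad (11)$$
Now, estimates of $\theta$ can
be found by estimating the unconstrained parameters $(a_1, \ldots, a_5, \beta_0, \beta_F, \beta_S)$.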
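One way to implement such a reparameterization is sketched below. The specific transformations are assumptions of this sketch: a multinomial-logit map for the category probabilities and a shifted exponential for the ratio, chosen because they respect the stated constraints (probabilities between 0 and 1 that add to at most one, and a ratio greater than 1). The function name is hypothetical.

```python
import math


def to_constrained(a, beta):
    """Map unconstrained reals to model parameters.

    a: five unconstrained values (four for the probabilities, one for the ratio).
    beta: (beta0, beta_f, beta_s), which need no transformation.
    Returns (p, ratio, beta) with p = [p_0, ..., p_4].
    """
    denom = 1.0 + sum(math.exp(v) for v in a[:4])
    # p_1..p_4 are positive and add to less than one; p_0 takes the remainder.
    p = [1.0 / denom] + [math.exp(v) / denom for v in a[:4]]
    # A shifted exponential keeps the ratio strictly above 1.
    ratio = 1.0 + math.exp(a[4])
    return p, ratio, beta


p, ratio, beta = to_constrained([0.0, 0.0, 0.0, 0.0, 0.0], (-1.0, 0.5, 0.5))
print(p, ratio)
```

An optimiser can then search freely over the eight unconstrained values, and the estimates are mapped back with this function.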
References
Efron, Bradley and Robert Tibshirani, 1993. An Introduction to the Bootstrap. Chapman & Hall, New York.
Frost, Fraser, 1998. "Electronic Surveys - New methods of primary data collection." In Proceedings - 27th EMAC Conference. Ed. Per Andersson, Stockholm, 213-232.
Gates, Roger and Alecia Helton, 1998. "The Newest Mousetrap: What does it Catch?" In the ESOMAR Seminar on the Internet and Market Research, ESOMAR Publication Series - Volume 220, Paris, 28-30 January 1998, 75-84.
Peterson, Robert A. and William R. Wilson, 1992. "Measuring Customer Satisfaction: Fact and Artifact," Journal of the Academy of Marketing Science 20; 61-71.
Peterson, Robert A., Sridhar Balasubramanian and Bart J. Bronnenberg, 1997. "Exploring the Implications of the Internet for Consumer Marketing," Journal of the Academy of Marketing Science 25; 329-346.
Pitkow, James, 1997. "In Search of Reliable Usage Data on the WWW." In Proceedings - The Sixth International World Wide Web Conference. Eds. Michael R. Genesereth and Anna Patterson, Santa Clara, California, 451-464.
Särndal, Carl-Erik, Bengt Swensson and Jan Wretman, 1992. Model Assisted Survey Sampling. Springer, New York.
Smith, Christine B., 1997. "Casting the Net: Surveying an Internet Population," Journal of Computer-Mediated Communication 3; (http://jcmc.mscc.huji.ac.il/vol3/issue1/smith.html).