Contents:
1. “Secondary Analysis” - Using old data to test new ideas
2. The potential of secondary analysis for teaching
3. Adding to the informational value of data
4. MetaDater: Concepts and tools for survey documentation
1. “Secondary Analysis” - Using old data to test new ideas
Using old data for new ideas would be the colloquial description of what
has been defined in technical terms as secondary analysis. The situation
is best characterised by the fact that the researcher who is going to
analyse the data is not familiar with (all) the phases of data
collection for the respective data set. To illustrate this idea with an
example, think about the question: “How often do you talk to your
neighbour about politics?” In the concept of the principal investigator,
this question can be an indicator of political interest. Collecting
face-to-face answers to that single question in a cross-national
representative sample survey will cost between one and four thousand
US $. The complete survey, which would be addressed to some two or three
thousand people and cover about one hundred questions altogether, would
cost between 120,000 US $ and 400,000 US $. Data collected at this
expense carry tremendous informational value. Now, think about the
chances of reactivating this information for other purposes. The
indicator of political interest quoted above could be used for a
completely different purpose. Another researcher who is interested in
the concept of integration into the neighbourhood may be very happy to
come across this question, which was originally asked to measure
political interest. It would also fit into the new frame of reference of
measuring neighbourhood integration. Of course, a well-trained
researcher will not rely on just this one indicator, but it would
certainly help to meet the researcher's data needs without requiring new
field research.
This simple case shows that old data can be reanalysed not only under
the same hypothesis as that of the principal investigator. It also
illustrates how old data may be used to answer new research questions
and test new frames of reference. This adds enormously to the
informational value of the original data, in particular if they are well
documented and prepared for further analysis.
For what different purposes can data from archives be used? The first
and simplest case would be for descriptive purposes. In our example, we
would no longer look at the data under the concept of political
behaviour, but rather under the concept of communication behaviour in
neighbourhoods, and we would reinterpret the question originally
intended to measure political interest under our new concept of
neighbourhood integration.
Data archives can make a particular contribution to comparative
research, both across nations and over time. As early as some 40 years
ago, in the early years of data archives when secondary analysis was not
yet a popular research strategy, the idea of comparative research based
on archival data was promoted at conferences. The idea was that surveys
not originally designed for comparative purposes could be fitted into
comparative research designs a posteriori. The precondition for this
would be that corresponding questions could be identified in other
surveys, be they surveys earlier in time and/or surveys in other
communities or nations. In the first case this would allow for
comparative analysis over time, in the second for comparative analysis
across societies or nations. Comparability, however, is hard to
establish ex post. Therefore, the design of comparative surveys is
crucial for making empirical knowledge cumulative over space and time.
A number of methodological and technical requirements have to be
observed and should be implemented rigorously. To mention just the most
important: some methodologists require that the questions be
functionally equivalent, whereas others claim that the question texts
must be phrased identically. Frequently, it is not linguistic identity
that matters. Sometimes it is much more important whether the questions
are understood by the respondents in the same way. Thus, a thermometer
or scale used as a representation of the intensity of attitudes in the
more developed societies may be replaced by a ladder in less developed
societies. Both thermometer and ladder would still measure the same
dimension in the conceptual world of the respective respondents. A
second requirement would be comparability of samples. A cross-national
representative random sample, for instance, would be hard to compare
with a local quota sample from one community in a different nation.
Several other factors have to be controlled as well, in particular
contextual influences at the time of field work, such as political or
environmental events that are related to the topic of the research.
Prominent examples of an international effort to make individual
studies available for comparative research are the collections of
electoral data. Several projects in the archival world were coordinated
to collect the most important election studies of the past decades.
Zentralarchiv, for instance, has compiled a pool of prominent studies on
the national elections (Bundestagswahlen) from 1949 onwards. Similar
projects have been undertaken in other countries as well. The studies
from Germany were made available to other European archives, and they
were incorporated into the holdings of the Inter-University Consortium
for Political and Social Research (ICPSR) archives in Ann Arbor, USA.
These data sets were used intensively: in one year, for example, more
than 700 data sets from the election studies were distributed all over
the United States. Other examples of studies from different fields could
be quoted as well. This shows that the potential of secondary analysis
is not only available in principle, but is actually being used on a
large scale. The archival networks are helping to make national data
resources available internationally. In this way, they are enabling the
international science community to share available data and to
contribute to the accumulation of knowledge by contrasting data from
different sources.
Equally
important are longitudinal studies which can be compiled ex post. In a
research project on "Attitudes towards Technology" it is of crucial
importance to include data collected in the fifties and sixties in
order to answer the research question whether potential threats from
new technologies have decreased the level of technology acceptance or
whether tendencies to reject new developments concentrate on particular
technologies only, and if so, under what circumstances.
Now imagine that we could get hold of a good collection of surveys taken
in earlier years, detailed studies about the changes going on in this
phase and, hopefully, additional studies in the years to come. Analyzing
this data base over time could give us a good picture of what changes
have actually taken place in the orientation of the population and of
the extent to which new technical concepts have had an impact on
subgroups of the population.
Furthermore, data archives can help to prepare studies on change over
time by monitoring which questions have been asked in earlier years and
alerting principal investigators to important questions which should be
repeated in planned research projects. Indeed, data archives should
consider including funds in their budgets which allow them to collect
data for relevant questions in order to avoid interruptions in important
time series.
2. The potential of secondary analysis for teaching
Normally, the lecturer or the individual student cannot afford to pay a
large amount of money for collecting the data to test their ideas. Here,
data archives can provide real data for training purposes. In many
cases, subsets of fully fledged surveys will be sufficient for gaining
hands-on experience. Training seminars like the Summer Schools of the
Inter-University Consortium for Political and Social Research (ICPSR)
or of the European Consortium for Political Research (ECPR) or the
Zentralarchiv Spring Seminars employ specially prepared data sets for
the practical work complementing the lectures and theoretical
introductions to data analysis. The underlying concept is learning by
doing.
A particularly valuable approach to teaching social research has been
the replication of classical studies. In this way, students were
confronted with classical research by prominent scholars and they could
critically analyse the original data sets used by the principal
investigators. Certainly, there is no guarantee that students arrive at
the same results as the principal investigators did, but the assumption
that the results of the reanalysis might be correct in cases where they
differ from the original findings has some empirical grounding, too.
3. Adding to the informational value of data
In traditional approaches to data analysis the single survey has been
considered the natural unit of data analysis. Frequently, this
orientation can impose serious limitations on secondary analysis. This
is in particular the case if the researcher is interested in the social
behaviour of subgroups of the population which are not represented in a
large enough proportion in the respective survey to lend themselves to
further statistical analysis. If, by contrast, the individual interview
is taken as the natural unit, interviews drawn from the same population
can, under certain constraints (functional equivalence of indicators,
i.e. comparable or identical question formulation, contextual effects
etc.), be accumulated across various surveys. In this way a higher
representation of the respective groups can be achieved in the data
base.
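The accumulation of interviews across surveys can be sketched in a few
lines of Python. This is only a minimal illustration under stated
assumptions: the survey names, the variable names ("group", "answer")
and all values are invented, and the question is simply assumed to be
functionally equivalent in all three surveys.

```python
from collections import Counter

def pool_surveys(surveys):
    """Stack respondent records from several surveys into one list.

    Each record keeps a 'survey' key so that contextual effects of the
    individual survey can still be controlled for in later analysis.
    """
    pooled = []
    for survey_id, records in surveys.items():
        for record in records:
            pooled.append({**record, "survey": survey_id})
    return pooled

def subgroup_sizes(records, key="group"):
    """Count how many respondents fall into each subgroup."""
    return Counter(record[key] for record in records)

# Invented toy data: the subgroup "farmer" is rare in every single survey.
surveys = {
    "survey_1": [
        {"group": "farmer", "answer": 3},
        {"group": "other", "answer": 1},
        {"group": "other", "answer": 2},
    ],
    "survey_2": [
        {"group": "farmer", "answer": 2},
        {"group": "other", "answer": 4},
    ],
    "survey_3": [
        {"group": "farmer", "answer": 4},
        {"group": "other", "answer": 1},
        {"group": "other", "answer": 3},
    ],
}

pooled = pool_surveys(surveys)
sizes = subgroup_sizes(pooled)
print(sizes["farmer"], "farmers in the pooled file,",
      "one per single survey")
```

Keeping the survey identifier on every record is a deliberate choice:
the pooled file gains cases for the small subgroup, while the origin of
each interview remains available for checks on comparability.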
Likewise,
the informational value of the archived studies can be enriched by
combining survey data with aggregate data from other sources like the
statistical offices. The precondition here is that linkage variables,
i.e. the same identifiers in all data sets to be linked, are available.
Multilevel analysis can be made feasible by additional archival
operations, provided that relevant data can be made available and that
these data sets can be merged without violating data protection of the
individual.
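How a linkage variable ties survey records to aggregate data can be
shown with a small Python sketch. It rests on assumptions that are not
part of the text: the linkage variable is a hypothetical region code,
the aggregate indicator ("unemployment_rate") and all values are
invented, and data protection is not addressed by the code at all.

```python
def link_aggregates(respondents, aggregates, key="region"):
    """Attach aggregate context data to each respondent record.

    'aggregates' maps a value of the linkage variable to a dict of
    context variables. Respondents whose identifier has no match are
    returned separately instead of being dropped silently.
    """
    linked, unmatched = [], []
    for respondent in respondents:
        context = aggregates.get(respondent[key])
        if context is None:
            unmatched.append(respondent)
        else:
            linked.append({**respondent, **context})
    return linked, unmatched

# Invented survey records carrying the linkage variable "region".
respondents = [
    {"id": 1, "region": "R01", "answer": 2},
    {"id": 2, "region": "R02", "answer": 4},
    {"id": 3, "region": "R99", "answer": 1},  # no aggregate data available
]

# Invented aggregate data, e.g. as a statistical office might publish it.
aggregates = {
    "R01": {"unemployment_rate": 5.0},
    "R02": {"unemployment_rate": 9.5},
}

linked, unmatched = link_aggregates(respondents, aggregates)
print(len(linked), "records linked,", len(unmatched), "without a match")
```

The linked records combine the individual level and the regional level
in one file, which is the form of data that multilevel analysis needs.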
Even more important is the documentation of changes in the political
system or basic shifts in value orientations of societies. This
information is needed as context information to support correct
interpretation of historical data which are ambiguous when analysed
without considering contextual information.
4. MetaDater: Concepts and tools for survey documentation
The
rapidly growing database for empirical social research requires data and
metadata management instruments that make the preparation of data files
for access and further analysis more efficient. To this end, the
MetaDater project has developed data models and standards for the
description of surveys in cooperation with the Data Documentation
Initiative (DDI). The Project-Study-Description (PSD) editor is designed
to document surveys as a whole, providing information about field work,
objectives of the study and topics covered, as well as information on
the sample. The Question-Variable (QV) editor is designed to create
metadata on the variable level (for further information see
http://www.metadater.org/publications.htm).