Data Archiving and the Uses of Secondary Analysis

Ekkehard Mochmann
Central Archives for Empirical Social Research, University of Cologne (GESIS-ZA)

 
 
Contents:
1. “Secondary Analysis” - Using old data to test new ideas
2. The potential of secondary analysis for teaching
3. Adding to the informational value of data
4. MetaDater: Concepts and tools for survey documentation
 

1. “Secondary Analysis” - Using old data to test new ideas

"Using old data to test new ideas" is the colloquial description of what is technically termed secondary analysis. Its defining characteristic is that the researcher analysing the data is not familiar with (all) the phases of data collection for the respective data set.

To illustrate this with an example, consider the question: "How often do you talk to your neighbour about politics?" In the principal investigator's conceptual framework, this question can serve as an indicator of political interest. Collecting face-to-face answers to this single question in a cross-national representative sample survey costs between one and four thousand US $. A complete survey addressed to some two or three thousand people and covering about one hundred questions altogether would cost between 120,000 US $ and 400,000 US $. Such a survey therefore represents a tremendous informational value. Now consider the opportunities to reuse this information for other purposes. The indicator of political interest quoted above could serve a completely different purpose. A researcher interested in the concept of integration into the neighbourhood may be very happy to come across this question, originally asked to measure political interest, because it also fits into the new frame of reference of measuring neighbourhood integration. Of course, a well-trained researcher will not rely on this one indicator alone, but it would certainly help meet his data needs without requiring new field research.

This simple case shows that old data can not only be reanalysed under the same hypothesis as that of the principal investigator, but can also be used to answer new research questions and to test new frames of reference. This adds enormously to the informational value of the original data, in particular if they are well documented and prepared for further analysis.

For what different purposes can data from archives be used? The first and simplest use is descriptive. In our example, we would no longer look at the data under the concept of political behaviour, but under the concept of communication behaviour in neighbourhoods, reinterpreting the question originally intended to measure political interest within our new concept of neighbourhood integration.

Data archives can make a particular contribution to comparative research, both across nations and over time. The idea of comparative research based on archival data was already promoted at conferences some 40 years ago, in the early years of data archives, when secondary analysis was not yet a popular research strategy. The idea was that surveys not originally designed for comparative purposes could be fitted into comparative research designs a posteriori. The precondition is that corresponding questions can be identified in other surveys, whether conducted earlier in time and/or in other communities or nations. In the first case this allows comparative analysis over time, in the second comparative analysis across societies or nations. Comparability, however, is hard to establish ex post. Therefore, the design of comparative surveys is crucial for making empirical knowledge cumulative over space and time.

A number of methodological and technical requirements have to be observed and implemented rigorously. To mention only the most important: some methodologists require that the questions be functionally equivalent, whereas others claim that the question texts must be phrased identically. Frequently, linguistic identity is not what matters; it is often much more important whether respondents understand the questions in the same way. Thus, a thermometer or scale used to represent the intensity of attitudes in more developed societies may be replaced by a ladder in less developed societies. Both thermometer and ladder would still measure the same dimension in the conceptual world of the respective respondents. A second requirement is the comparability of samples: a cross-national representative random sample would be hard to compare with a local quota sample drawn in a single community in another nation. Several other factors must be controlled as well, in particular contextual influences at the time of fieldwork, such as political or environmental events related to the topic of the research.

Collections of electoral data are prominent examples of international efforts to make individual studies available for comparative research. Several projects in the archival world were coordinated to collect the most important election studies of the past decades. The Zentralarchiv, for instance, has compiled a pool of prominent studies of the national elections (Bundestagswahlen) from 1949 to the present. Similar projects have been undertaken in other countries as well. The studies from Germany were made available to other European archives and incorporated into the holdings of the Inter-University Consortium for Political and Social Research (ICPSR) in Ann Arbor, USA. These data sets were used intensively: in one year alone, more than 700 data sets from the election studies were distributed across the United States. Examples of studies from other fields could be quoted as well. This shows that the potential of secondary analysis is not only available in principle, but is actually being used on a large scale. The archival networks are helping to make national data resources available internationally. In this way, they enable the international scientific community to share available data and to contribute to the accumulation of knowledge by contrasting data from different sources.

Equally important are longitudinal studies, which can be compiled ex post. In a research project on "Attitudes towards Technology", it is of crucial importance to include data collected in the fifties and sixties in order to answer the research question of whether potential threats from new technologies have decreased the level of technology acceptance, or whether tendencies to reject new developments concentrate on particular technologies only, and if so, under what circumstances.

Now imagine that we could obtain a good collection of surveys conducted in earlier years, detailed studies of the changes taking place in this period, and, hopefully, additional studies in the years to come. Analysing this database over time could give us a good picture of the changes that have actually taken place in the orientations of the population and of the extent to which new technical concepts have had an impact on subgroups of the population.

Furthermore, data archives can help to prepare studies of change over time by monitoring which questions have been asked in earlier years and alerting principal investigators to important questions that should be repeated in planned research projects. Indeed, data archives should consider including funds in their budgets that allow them to collect data on relevant questions in order to avoid interruptions in important time series.

 

2. The potential of secondary analysis for teaching

Normally, neither the lecturer nor the individual student can afford to spend a large amount of money on collecting data to test their ideas. Here, data archives can provide real data for training purposes. In many cases, subsets of fully fledged surveys will be sufficient for gaining hands-on experience. Training seminars such as the Summer Schools of the Inter-University Consortium for Political and Social Research (ICPSR) or of the European Consortium for Political Research (ECPR), or the Zentralarchiv Spring Seminars, use specially prepared data sets for practical work that complements the lectures and theoretical introductions to data analysis. The underlying concept is learning by doing.

A particularly valuable approach to teaching social research has been the replication of classic studies. In this way, students were confronted with classic research by prominent scholars and could critically analyse the original data sets used by the principal investigators. Of course, there is no guarantee that students will arrive at the same results as the principal investigators did; but where the results of the reanalysis differ from the original findings, the assumption that the reanalysis might be correct has some empirical grounding, too.

 

3. Adding to the informational value of data

In traditional approaches to data analysis, the single survey has been considered the natural unit of analysis. This orientation can frequently impose serious limitations on secondary analysis. This is particularly the case if the researcher is interested in the social behaviour of subgroups of the population that are not represented in the respective survey in large enough numbers to allow further statistical analysis. If, on the other hand, the individual interviews are taken from the same population, the interview rather than the survey can be treated as the natural unit and, under certain constraints (functional equivalence of indicators, i.e. comparable or identical question wording, contextual effects, etc.), accumulated across several surveys. In this way the respective groups can be represented by a larger number of cases in the database.
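The pooling idea can be illustrated with a short Python sketch. It is a minimal illustration, not an archive tool: the survey names, the harmonized `group` variable, the subgroup sizes and the threshold `MIN_CASES` are all hypothetical, and the sketch assumes that the constraints named above (same population, functionally equivalent indicators) have already been checked.

```python
from collections import Counter

MIN_CASES = 30  # hypothetical minimum number of cases for a subgroup analysis


def make_survey(n_group, n_other, group="farmer"):
    """Build hypothetical respondent records with a harmonized 'group' variable."""
    return [{"group": group} for _ in range(n_group)] + [
        {"group": "other"} for _ in range(n_other)
    ]


def pool_surveys(surveys):
    """Treat the interview as the unit: concatenate records, tagging each with its source survey."""
    return [
        {**record, "source": name}
        for name, records in surveys.items()
        for record in records
    ]


def group_counts(records, var="group"):
    """Count respondents per value of a harmonized variable."""
    return Counter(record[var] for record in records)


# Three hypothetical surveys of the same population, each with too few farmers on its own.
surveys = {
    "survey_a": make_survey(12, 988),
    "survey_b": make_survey(10, 990),
    "survey_c": make_survey(14, 986),
}

for name, records in surveys.items():
    print(name, group_counts(records)["farmer"])  # 12, 10, 14: each below MIN_CASES

pooled = pool_surveys(surveys)
print("pooled", group_counts(pooled)["farmer"])  # 36: reaches MIN_CASES
```

Tagging each record with its `source` keeps the survey of origin available, so survey-specific contextual effects can still be controlled for in the pooled analysis.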

Likewise, the informational value of the archived studies can be enriched by combining survey data with aggregate data from other sources such as the statistical offices. The precondition here is that linkage variables, i.e. identical identifiers in all data sets to be linked, are available. Multilevel analysis can be made feasible by additional archival operations, provided that the relevant data can be made available and that these data sets can be merged without violating the data protection rights of individuals.
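A linkage of this kind can be sketched as follows. The sketch assumes a hypothetical region code as the linkage variable and invented region-level figures standing in for statistical-office data; it only attaches aggregate context to each respondent, so no personal identifiers are carried into the aggregate data.

```python
# Hypothetical survey records; "region" is the linkage variable.
survey = [
    {"resp_id": 1, "region": "R01", "trust": 3},
    {"resp_id": 2, "region": "R02", "trust": 5},
    {"resp_id": 3, "region": "R01", "trust": 2},
    {"resp_id": 4, "region": "R09", "trust": 4},
]

# Hypothetical aggregate data (e.g. from a statistical office), keyed by the same region code.
aggregate = {
    "R01": {"unemployment_rate": 7.5},
    "R02": {"unemployment_rate": 4.1},
}


def link(survey, aggregate, key="region"):
    """Attach aggregate context variables to each respondent via the linkage key.

    Respondents whose key has no match in the aggregate data are returned
    separately, so failed linkages stay visible instead of being silently dropped.
    """
    linked, unmatched = [], []
    for record in survey:
        context = aggregate.get(record[key])
        if context is None:
            unmatched.append(record)
        else:
            linked.append({**record, **context})
    return linked, unmatched


linked, unmatched = link(survey, aggregate)
print(len(linked), "linked;", len(unmatched), "unmatched")
```

Returning unmatched records separately, rather than discarding them, makes gaps in the linkage variable visible before any multilevel analysis is run.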

Even more important is the documentation of changes in the political system or of basic shifts in the value orientations of societies. This information is needed as context to support the correct interpretation of historical data, which can be ambiguous when analysed without considering contextual information.

 

4. MetaDater: Concepts and tools for survey documentation

The rapidly growing database for empirical social research requires data and metadata management instruments that make the preparation of data files for access and further analysis more efficient. To this end, the MetaDater project has developed data models and standards for the description of surveys in cooperation with the Data Documentation Initiative (DDI). The Project-Study-Description (PSD) editor is designed to document surveys as a whole, providing information about the fieldwork, the objectives of the study and the topics covered, as well as about the sample. The Question-Variable (QV) editor was developed to create metadata at the variable level (for further information see http://www.metadater.org/publications.htm).
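The two documentation levels described above, study level and variable level, can be pictured as a nested data structure. The following Python sketch is purely illustrative: the class names, field names and example values are hypothetical and do not reproduce the actual MetaDater or DDI schema. It also hints at why variable-level metadata matters for secondary analysis, since searching question texts is one way to locate corresponding questions across surveys.

```python
from dataclasses import dataclass, field


@dataclass
class VariableDescription:
    """Hypothetical variable-level record (in the spirit of a Question-Variable entry)."""
    name: str
    question_text: str
    categories: dict = field(default_factory=dict)  # code -> label


@dataclass
class StudyDescription:
    """Hypothetical study-level record (in the spirit of a Project-Study-Description entry)."""
    title: str
    objectives: str
    topics: list
    sample: str
    fieldwork: str
    variables: list = field(default_factory=list)  # VariableDescription items

    def find_questions(self, fragment):
        """Return the variables whose question text contains `fragment` (case-insensitive)."""
        fragment = fragment.lower()
        return [v for v in self.variables if fragment in v.question_text.lower()]


study = StudyDescription(
    title="Hypothetical political attitudes survey",
    objectives="Measure political interest",
    topics=["politics", "neighbourhood"],
    sample="Hypothetical: national random sample",
    fieldwork="Hypothetical: face-to-face interviews",
    variables=[
        VariableDescription(
            "v12",
            "How often do you talk to your neighbour about politics?",
            {1: "often", 2: "sometimes", 3: "never"},
        ),
        VariableDescription(
            "v13",
            "How interested are you in politics?",
            {1: "very", 2: "somewhat", 3: "not at all"},
        ),
    ],
)

# A researcher studying neighbourhood integration searches the question texts.
print([v.name for v in study.find_questions("neighbour")])  # ['v12']
```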