Retention of Raw Data: A Voice from Behind the Ocean

Nearly half a century has passed since Leroy Wolins authorized one of his students to write to thirty-seven authors of research studies, asking them to submit the raw data on which their findings were based. The student intended to make practical use of the data in a study of their own. Contrary to expectations, only nine researchers (i.e., 24% of the initial group) proved willing to make their data available. Wolins, surprised by the scientists’ responses, recounted the experience in the American Psychologist, thus triggering a heated scientific debate over the availability of raw data. That debate continues to this day.

In 1973, Craig and Reese decided to replicate Wolins’s inadvertent research. Their study was thoroughly planned and was designed to check whether there had been any improvement in the availability of raw data since Wolins’s attempt roughly a decade earlier. They obtained better results: 38% of the authors contacted made their data available. Even more optimistic were the results obtained by Eaton in 1984. His success rate of 73.5% (25 of 34) was much better than those of Wolins and of Craig and Reese. However, in 2006, Wicherts, Borsboom, Kats and Molenaar obtained results almost as negative as Wolins’s had been in 1962. In an attempt to reanalyze data sets to assess the robustness of the research findings to outliers, they contacted 141 authors of 249 studies. The result was 38 positive reactions and the actual data sets from 64 studies. The latter figure represents 25.7% of the total of 249 data sets; 73% of the authors did not share their data.
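The sharing rates quoted above can be rechecked from the reported counts. The following is a minimal Python sketch; the function name `share_rate` and the dictionary layout are illustrative assumptions, and Craig and Reese’s 38% is left out because the text gives no underlying counts for it.

```python
# Counts as reported in the text: (data shared, total requested).
reported_counts = {
    "Wolins 1962": (9, 37),
    "Eaton 1984": (25, 34),
    "Wicherts et al. 2006 (authors)": (38, 141),
    "Wicherts et al. 2006 (data sets)": (64, 249),
}


def share_rate(shared: int, requested: int) -> float:
    """Return the percentage of requests that produced data, to one decimal."""
    return round(100 * shared / requested, 1)


# Map each study label to its recomputed sharing rate.
rates = {study: share_rate(*counts) for study, counts in reported_counts.items()}

for study, rate in rates.items():
    print(f"{study}: {rate}%")
```

The author-level rate for Wicherts et al. (38 of 141) is the complement of the 73% of authors who did not share their data, while the 25.7% figure is computed per data set (64 of 249).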

While writing a book on abuses of scientific practice and the research code of ethics, I reported on numerous problems related to the retention and retrieval of raw data. Inspired by the examples of Wolins and other researchers, I decided to repeat their studies on my own. To this end, I randomly selected 50 empirical studies from the PsycINFO database, all of which had been published at least a year before. Using the e-mail addresses given with the articles, I contacted the authors, requesting them to share the raw data from their studies. I explained that I was conducting similar research and that my intention was to compare outcomes. The random sample included articles written by American and European authors, as well as researchers from Israel and Australia; together they represented a wide range of psychological fields of study and a variety of scientific journals. My only criterion for inclusion in the sample was that a work had to be an empirical study based on quantitative data. I excluded case studies, literature reviews and meta-analyses.

Soon after sending out my 50 requests, I received 27 replies, which accounted for 54% of the entire sample. None of my requests was returned due to technical problems or mail server malfunctions, and I therefore assumed that they had reached all the addressees, especially since the authors had been contacted just a few months after the articles had gone to press; i.e., the e-mail addresses must have been valid. Of the replies, 7 had attached files with the requested raw data, sorted and described in a manner consistent with the respective articles. A further 8 responses I labeled as “willing to cooperate”. Some of these researchers asked for more information about the variables I wished to reanalyze, and others pointed out that the results were written in, for instance, Hebrew or Flemish and that it would take quite some time for them to be translated into English. Yet others inquired as to the nature of my alleged research, my job position, etc. I found such replies to be a valid response to my request, as these authors clearly had the right to inquire about the reason for and the nature of my study. I assumed that those researchers would be quite willing to share their data once I had provided the requested information in detail. At this stage, therefore, my success rate was 30% of the total sample (15 of 50), approximating that of Wolins in 1962 and of Wicherts et al. in 2006.

Four further replies required additional steps. These authors all indicated that a third party was the actual data holder, so I re-sent my requests to those third parties, with the following results: the total number of data sets I obtained rose to 9, the number of responses categorized as “willing to cooperate” increased by 1, and my request was denied by 1 of the third parties contacted.

In total, there were only 4 explicitly negative responses (3 from authors and 1 from a third party). Two explained that the data were still in the process of analysis. I felt that the odds were against my receiving the requested data any time in the near future. The third response invoked regulations of the institution sponsoring the entire research program, which prohibited any disclosure of data to third parties. The fourth negative response cited ethical principles applicable to all scientists that prohibited this type of information gathering.

I labeled 5 other responses as “ambiguous or evasive”. These authors insisted on the necessity of contacting fellow researchers involved and obtaining their permission, which might, as they suggested, be extremely difficult, if not impossible. Among these were also replies stating that the present condition or format of the existing data sets precluded any further analysis.

The table below shows all the steps taken in the course of my study and their respective outcomes.


Action                                                Number    %
Requests for data sent                                    50  100
Returns                                                    0    0
Responses received                                        27   54
Unanswered requests                                       23   46
Data sets received                                         7   14
Willing to cooperate on certain conditions                 8   16
Responses requiring further contact                        4    8
Ambiguous or evasive responses                             5   10
Explicitly negative responses                              3    6
Data received as a result of further contact               2    4
Willing to cooperate following further contact             1    2
Explicitly negative reply following further contact        1    2
Total positive responses                                  18   36
Total ambiguous responses                                  5   10
Total negative responses                                   4    8
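
As a cross-check, the totals in the table can be recomputed from its individual rows. The sketch below is a minimal Python illustration; the variable and function names are assumptions introduced here, and the counts are copied from the table.

```python
REQUESTS_SENT = 50

# First-round replies, by category (counts copied from the table).
first_round = {
    "data_received": 7,
    "willing_to_cooperate": 8,
    "further_contact_needed": 4,
    "ambiguous_or_evasive": 5,
    "explicitly_negative": 3,
}

# Outcomes of the requests re-sent to third-party data holders.
follow_up = {
    "data_received": 2,
    "willing_to_cooperate": 1,
    "explicitly_negative": 1,
}


def pct(count: int) -> float:
    """Express a count as a percentage of all requests sent."""
    return 100 * count / REQUESTS_SENT


responses = sum(first_round.values())
unanswered = REQUESTS_SENT - responses
total_positive = (
    first_round["data_received"]
    + first_round["willing_to_cooperate"]
    + follow_up["data_received"]
    + follow_up["willing_to_cooperate"]
)
total_ambiguous = first_round["ambiguous_or_evasive"]
total_negative = first_round["explicitly_negative"] + follow_up["explicitly_negative"]

# Every re-sent request should be accounted for by exactly one outcome.
assert sum(follow_up.values()) == first_round["further_contact_needed"]

summary = [
    ("Responses received", responses),
    ("Unanswered requests", unanswered),
    ("Total positive responses", total_positive),
    ("Total ambiguous responses", total_ambiguous),
    ("Total negative responses", total_negative),
]
for label, count in summary:
    print(f"{label}: {count} ({pct(count):.0f}%)")
```

The three totals (positive, ambiguous and negative) add up to the number of responses received, since each of the four replies that needed further contact ended in exactly one of those outcomes.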

To complete this procedure, I disclosed my true intention by sending debriefing information to all the authors contacted. I also explained that there was absolutely no need for them to process the data in preparation for reanalysis. Furthermore, I assured them of my willingness to share my results after the study had been completed.

Set alongside the findings reported by Wolins in 1962, Craig & Reese in 1973, Eaton in 1984, and Wicherts et al. in 2006, my own observations seem to justify a mood of cautious optimism. On the one hand, of all the attempts made, mine had the lowest overall response rate: only about half of those contacted replied. On the other hand, willingness to cooperate among those who did reply was high: more than half of the respondents (18 of 27) agreed to provide the requested data. I was impressed by how quickly some of the authors offered their data, sometimes as soon as two or three days after receiving my request. I was also impressed by their eagerness to offer practically unconditional assistance with my research.

One could conceivably ascribe the relatively low number of responses to the plethora of information being sent out over the Internet. Every day mailboxes are loaded with messages; some of the contacted scientists may well have relegated my request to the trash in an attempt to avoid unnecessary communication. It is also possible that they may have regarded my request as spam, despite all my efforts for it not to be perceived as such. Lastly, I believe that scientists might be more inclined to respond to a request from a scientist in their own country of origin than from an unknown scientist in Eastern Europe.

And yet, whatever the merit of these explanations, in Wolins’s sample, among the respondents contacted by Craig and Reese, and in my own research, there were those who deliberately chose not to disclose their data. How many felt the data would not stand up to scientific scrutiny due to internal flaws? Based on the body of data obtained in this manner, it is a question that will probably remain unanswered. It is for this reason that we need to seek solutions that can effectively deal with such uncertainty and speculation.

A number of recommendations have already been put on the table in this respect. A possible solution was proposed by Johnson in 1964, and Wicherts et al. (2006) discussed such solutions as well. The technical means available today allow them to be put into practice almost effortlessly. Unless they are, progress in the integration of scientific research may be seriously hindered, affecting, for example, the replication of major studies, meta-analyses, etc. And this would, beyond a doubt, aggravate psychology’s most significant weaknesses: the fragmentation of knowledge and the failure to synthesize the research output of generations of scientists.


