Re: Draft Policy for Self-Archiving University Research Output from Stevan Harnad on 2003-04-10 (American-Scientist-Open-Access-Forum)

From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>
Date: Thu, 10 Apr 2003 13:13:20 +0100

I hope the following exchange will be helpful to those universities
that are currently drafting and implementing institutional/departmental
self-archiving policies: http://www.ecs.soton.ac.uk/~lac/archpol.html
It concerns the degree to which the metadata of deposits are checked
before they appear publicly in the eprint archive.

---------- Forwarded message ----------
On Thu, 10 Apr 2003, [identity deleted] wrote:

> Entering data on a particular day [does] not result in that data
> becoming immediately available. There appears to be a serious problem
> with the data processing.
>
> According to my "user area homepage", I have three items pending - one
> from 13th March and 2 from 28th March! Why is it necessary for 4 or more
> weeks to elapse before entries that I make can be added to the database?
> This *is* a *software* problem. If the reason is that I have not
> completed the entry properly, that is still a *software* problem because
> I don't know what I haven't done correctly - there are no error messages.

This is most definitely *not* a software problem but a human factor
problem! The delay in the appearance of your data is 100% a function of
the fact that the vetting of the deposits is not being done promptly --
by a designated human being.

I know this for a fact. I have been performing, myself, that vetting
function for CogPrints -- a public central archive rather than
a local departmental/institutional archive -- for 6 years now, as
the designated vettor. As soon as a paper is deposited, it is in the
submission buffer. I, as vettor, can immediately review the metadata,
and then OK the deposit, within 1 minute, if I am at the helm. With an
average of 5 deposits per week, this has been no problem. (If the load
ever gets bigger, I can recruit additional designated vettors, but the
OAI and distributed institutional archiving have evolved since the
founding of CogPrints, and that is likely to distribute the load more
sensibly than central archiving, once self-archiving picks up
momentum, with each researcher self-archiving in his own departmental
archive.)

Either (1) the (human) resources for prompt, careful, group-based vetting
of the metadata by designated vettors in each research group, or (2) a
decision to skip vetting and accept deposits automatically,
*must* be part of any departmental self-archiving policy. Without it,
discouraging delays and misunderstandings of the kind you describe
are inevitable. But they have nothing whatsoever to do with either the
software or the principle (and benefits) of departmental self-archiving
of all refereed research output.

Just as the deposit of a single paper is only a matter of a few
keystrokes and a few minutes of time (meaning that the self-archiving of
*all* the research output [including the retrospective legacy output]
of even the most prolific of departmental researchers represents no more
than a few man-hours -- a tiny investment for a huge return, especially
with the help of the "cloning" feature that automatically repeats all
metadata that are common to all or many papers, making redundant re-entry
unnecessary), so the vetting of each single paper is a matter of still
fewer keystrokes and minutes of time. All that is needed is a designated
vettor available to reliably vet that day's deposits -- plus a one-time,
start-up corps of vettors who will process the legacy data.

The calculation of the number of man-hours required, both for any
department's legacy data and for the ongoing future daily research output
per group can easily be done, and it will be found to be ludicrously
small, especially for the size of the benefits it will confer on us all:
http://www.neci.nec.com/~lawrence/papers/online-nature01/
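
As a rough illustration of that calculation, the sketch below multiplies
a deposit count by a per-deposit vetting time. Only the CogPrints figures
(5 deposits per week, about 1 minute each) come from this message; the
department's legacy count, monthly output and 2-minute vetting time are
hypothetical placeholders to be replaced with local numbers.

```python
def vetting_hours(deposits, minutes_per_deposit):
    """Return the person-hours needed to vet `deposits` items."""
    return deposits * minutes_per_deposit / 60.0


# Figures quoted in this message: CogPrints averages 5 deposits per week,
# and one deposit can be vetted in about 1 minute.
cogprints_weekly_hours = vetting_hours(5, 1)

# Hypothetical department: every number below is an assumption made
# purely for illustration, not a measured value.
legacy_papers = 2000      # assumed size of the retrospective backlog
papers_per_month = 30     # assumed ongoing monthly research output
minutes_per_vet = 2       # assumed time to check one deposit's metadata

legacy_hours = vetting_hours(legacy_papers, minutes_per_vet)
monthly_hours = vetting_hours(papers_per_month, minutes_per_vet)

print("CogPrints vetting, minutes per week:", cogprints_weekly_hours * 60)
print("One-time legacy vetting, hours:", round(legacy_hours, 1))
print("Ongoing vetting, hours per month:", monthly_hours)
```

Dividing the legacy figure among a start-up corps of vettors gives the
one-time cost per person; the monthly figure is the ongoing cost.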

But that calculation must be done, as an essential part of any
departmental self-archiving policy. And a decision has to be made as
to whether the department or institution will (1) resource rigorous
vetting per group or (2) have deposits appear immediately and
automatically.

(Option (2) is not a great risk, as the Eprints software itself makes
sure that certain obligatory fields are filled, the depositor himself
can review his own data, and if/when later metadata errors are discovered, the
depositor can correct them. The vetting capability we provided with the
Eprints software was originally modelled on that of the Physics ArXiv,
which receives 3500 deposits per month, from all over the world, in
one central archive in which no individual or institutional interests
are vested. But any local departmental archive -- once the legacy data
are in there -- will have monthly deposit frequencies equal to that
department's monthly output in research papers. I think one vettor per
research group could easily set aside the few minutes per day that it
would take to keep up with checking the metadata for his group's daily
deposits [option (1)], but if that resource is not available I suggest
having the deposit accepted automatically [option (2)] as a far preferable
(and not very risky) alternative to having it sit for a month in a
submission buffer with no designated vettor to check and accept it.)
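
The difference between the two options can be sketched as a toy model.
This is not the Eprints software's code or API: the `Archive` class, the
`Policy` names and the list of obligatory fields are all hypothetical,
chosen only to show where a deposit sits under each policy.

```python
from dataclasses import dataclass, field
from enum import Enum


class Policy(Enum):
    VET = 1          # option (1): a designated vettor approves each deposit
    AUTO_ACCEPT = 2  # option (2): deposits appear publicly at once


# Assumed set of obligatory metadata fields (illustrative only).
REQUIRED_FIELDS = ("title", "authors", "year")


@dataclass
class Archive:
    policy: Policy
    buffer: list = field(default_factory=list)  # awaiting a vettor
    public: list = field(default_factory=list)  # publicly visible

    def deposit(self, metadata):
        """Check obligatory fields, then route the deposit by policy."""
        missing = [f for f in REQUIRED_FIELDS if not metadata.get(f)]
        if missing:
            # Tell the depositor exactly what is wrong instead of
            # leaving the deposit pending without explanation.
            raise ValueError("missing obligatory fields: " + ", ".join(missing))
        if self.policy is Policy.AUTO_ACCEPT:
            self.public.append(metadata)
        else:
            self.buffer.append(metadata)

    def vet_all(self):
        """Designated vettor accepts everything in the buffer."""
        accepted = len(self.buffer)
        self.public.extend(self.buffer)
        self.buffer.clear()
        return accepted


paper = {"title": "Example paper", "authors": ["A. Author"], "year": 2003}

vetted = Archive(Policy.VET)
vetted.deposit(paper)   # stays in the buffer until vet_all() is called

automatic = Archive(Policy.AUTO_ACCEPT)
automatic.deposit(paper)  # public immediately
```

Under `Policy.VET` nothing becomes public until someone calls
`vet_all()`, which is exactly the month-long delay described above when
no vettor is designated.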

To repeat, this is a departmental archive policy matter, not an archive
software matter. It is regrettable that in this case the practice seems
to have been allowed to precede thinking the policy through and choosing
between (1) and (2), thereby creating needless misunderstandings about
the software and the principle, but this can easily be remedied now,
and all researchers alerted. Such are the advantages of implementing
a research archive at departmental scale -- and of the small (indeed
trivial) nature of the policy problem in question.

Stevan Harnad
Received on Thu Apr 10 2003 - 13:13:20 BST
