Known-Item vs. Open-Ended Search, OA, OAI and Google

From: Stevan Harnad <harnad_at_ecs.soton.ac.uk>
Date: Fri, 27 Oct 2006 13:06:04 +0100

On Fri, 27 Oct 2006, Andy Powell wrote:

> I completely agree that this is an interesting development. Like you,
> it seems to me to raise some pretty fundamental questions about the way
> in which repositories are integrated into the fabric of the Web. I've
> tried to capture my thoughts (such as they are) here
>
> http://efoundations.typepad.com/efoundations/2006/10/pushing_an_open.html
>
> I share some of your conclusions but not all.

Andy's informal test is a good demonstration that Google is *already* just about good
enough for "known-item searching" (i.e., where I know the reference I want, and am just
looking for an OA version of it on the web).

But the real question is: What proportion of a researcher's searching and search-needs
consists of known-item searching? What about open-ended searches on topics, keywords,
Boolean text items, etc.?

One solution is obviously to use an existing index database that contains only the
metadata, but contains all and only your target literature (e.g., PubMed, Web of
Knowledge, Scopus, etc.) and then add a patch that does a Google search on each of the
retrieved items (or on the retrieved items you tag). That reduces an open-ended search
to a set of known-item searches.
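
The patch step can be sketched roughly as follows. This is only an illustration, not an
existing tool: the record fields ("title", "first_author") and the sample records are
hypothetical placeholders for whatever the index database actually returns, and the
function name is my own.

```python
from urllib.parse import urlencode


def known_item_query(record):
    """Build a Google search URL for one bibliographic record.

    `record` is a hypothetical dict with "title" and "first_author" keys,
    standing in for one hit from an index database.
    """
    # Quote the title so it is searched as an exact phrase,
    # then add the first author's surname to narrow the match.
    terms = '"{}" {}'.format(record["title"], record["first_author"])
    return "https://www.google.com/search?" + urlencode({"q": terms})


# Placeholder records, as a topic search on an index database might return them.
records = [
    {"title": "An Example Article Title", "first_author": "Smith"},
    {"title": "Another Example Title", "first_author": "Jones"},
]

# One open-ended search becomes one known-item search per retrieved record.
urls = [known_item_query(r) for r in records]
for url in urls:
    print(url)
```

Each URL is then an ordinary known-item search; whether the top hits include an OA
version still has to be checked by the user or by a further script.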

But it does leave you dependent on non-OA databases, and we may not want to do that --
nor do we need to, since all the requisite information (and incomparably more, e.g.,
full text!) will be in the OA corpus. So there is still much to be said for
capitalising on that OA data-goldmine more fully, by using the OAI tags and sorting it
for separate harvesting, navigation, analysis and consumption.
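
As a rough sketch of what "sorting by OAI tags" could look like, assuming the harvested
records carry Dublin Core dc:title and dc:type elements: the sample XML below is a
simplified, made-up stand-in (a real OAI-PMH response has additional wrapper elements
and namespaces), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Dublin Core element namespace, in ElementTree's {uri}tag notation.
DC = "{http://purl.org/dc/elements/1.1/}"

# Simplified placeholder data, not a real harvest.
SAMPLE = """<records xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record><dc:title>Placeholder refereed article</dc:title><dc:type>article</dc:type></record>
  <record><dc:title>Placeholder workshop paper</dc:title><dc:type>conference paper</dc:type></record>
  <record><dc:title>Placeholder with no type</dc:title></record>
</records>"""


def sort_by_type(xml_text):
    """Group record titles by their dc:type value."""
    groups = defaultdict(list)
    for rec in ET.fromstring(xml_text).iter("record"):
        title = rec.findtext(DC + "title", default="(untitled)")
        # Records without a dc:type fall into "unknown"; these are the ones
        # a harvester would otherwise have to guess about.
        doc_type = rec.findtext(DC + "type", default="unknown")
        groups[doc_type].append(title)
    return dict(groups)


groups = sort_by_type(SAMPLE)
for doc_type, titles in groups.items():
    print(doc_type, titles)
```

The grouping is only as good as the tags the repositories supply; untagged records end
up in the "unknown" bucket.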

Stevan Harnad

------------
> Andy
> --
> Head of Development, Eduserv Foundation
> http://www.eduserv.org.uk/foundation/
> http://efoundations.typepad.com/
> andy.powell_at_eduserv.org.uk
> +44 (0)1225 474319
>
> > -----Original Message-----
> > From: Repositories discussion list
> > [mailto:JISC-REPOSITORIES_at_JISCMAIL.AC.UK] On Behalf Of Leslie Carr
> > Sent: 27 October 2006 10:16
> > To: JISC-REPOSITORIES_at_JISCMAIL.AC.UK
> > Subject: Re: OpenDOAR Search
> >
> > On 26 Oct 2006, at 19:00, Hubbard Bill wrote:
> >
> > > Please find below an announcement from OpenDOAR for a search
> > > facility based on OpenDOAR holdings.
> >
> > This is a very interesting service!
> >
> > There was a discussion on this list at the beginning of
> > August about "Search Engines for Repositories Only". There
> > were several attempts to define constrained searches using
> > RollYO or similar, but they all suffered from one defect or
> > another (too few sites, or logins required etc). The Google
> > Custom Search that OpenDOAR have set up seems much more
> > suitable to the repository community needs. Further, it would
> > seem to be fairly simple to set up Country-specific searches
> > (a la UKOLN's EPrints UK) by providing location-identifying
> > annotations for each repository.
> >
> > I have had a go with this, and created a ROAR-based
> > Repository Search Engine at
> > http://google.com/coop/cse?cx=009118135948994945300%3Agvogitng0da
> > You can search all the ROAR repositories for a keyword and
> > then Derek Law can click on 'Scottish Research' to reduce the
> > set of results to those coming from the Scottish repositories
> > (the "small and smart" ones, according to his recent keynote
> > at Open Scholarship :-)
> >
> > There is a serious point that this opens up: why would we
> > bother with OAI-based repositories, if you can do it all with
> > Google? The advantage that OAI provided us was "metadata",
> > i.e. the possibility of providing more accurate resource
> > identification. The advantage of repositories was that they
> > provided an identifiable source of (well-maintained) research
> > material. Of course, the one can be
> > simulated by the other, and if Google could support a simple
> > quality control "refereed material" tag then we could get by
> > without OAI and without repositories.
> >
> > Well, it doesn't, and so OAI still seems our best hope.
> > However, even with five years of OAI our repositories are not
> > doing a very good job of sharing metadata that helps a
> > service to comprehend the status of the holdings that it
> > harvests (is this a published, refereed journal article or
> > equivalent? Is this a paper from an unrefereed workshop?
> > Is this a chemical data file?). Too much is still down to
> > interpretation and subsequent data mining of the web pages.
> > The Eprints Application Profile
> > (http://www.ukoln.ac.uk/repositories/digirep/index/Eprints_Application_Profile)
> > seems to be doing
> > a good job in achieving consensus in the use of Dublin Core,
> > but there is an urgent need for it to be implemented by all
> > repositories!
> >
> > We've spent a lot of time and effort on advocacy and policies
> > over the last couple of years, but I think it's time that we
> > went back to some of the technical fundamentals and made sure
> > that our information interoperability is up to scratch,
> > otherwise we'll find ourselves in a universe where the only
> > thing you can do is a keyword search!
> > --
> > Les
> > (just my opinion)
> >
>
Received on Fri Oct 27 2006 - 14:23:28 BST
