First Author*



On this page First Author explains and reviews new software, Internet tools, and databases designed to facilitate data retrieval and organisation. Follow the links on the right-hand side for articles, which can be read on this page or downloaded free of charge by clicking on the PDF icons next to each title. For updates on new web tools and technology by month, see Debates.

*Virtual Public Networks

First Author talks to Nature Publishing Group’s Timo Hannay about the expansion of Nature's networking sites and Nature’s adventures in the virtual world of Second Life.

Social networking

The idea of networking is nothing new: love it or hate it, making contacts has always been a vital part of business, academia and, of course, social life. Transferred to the web, however, social networking is beginning to revolutionize the workings of the Internet itself, with ‘friendship’ links between pages and sites changing the way information is processed and the way business and research are conducted. Although online networking sites existed previously, it was in the early years of this century, aided by new web tools such as wikis, blogging and tagging, that social sites like Friendster, later MySpace and, most recently, YouTube and Bebo took off as a way to contact friends, meet their friends, share photos, music or video, and send reminders of birthdays and events. The music world was forced to take MySpace seriously when bands like the Arctic Monkeys reached the pop charts without having signed to a record label or released a single in the conventional way. Communications companies are now vying to strike deals with these services. For example, Skype recently agreed to provide calling services from within Bebo.

Social networking sites have now begun to be used to extend traditional media: an example is the Sun Online’s attempt to construct its own networking site. Businesses have also seen their potential: sites such as LinkedIn and Doostang collect details of users’ employment sector and educational history and use these data to establish targeted business networks for recruitment and deals. The way the site works, through introductions by colleagues or former classmates, is intended to build the trust of users in the contacts they make online.

Academics have also begun to realize the potential of social networking. Like their business counterparts, sites like Academi ask members to enter details of their education, experience and interests. Users can then form groups and build discussions based on these interests. This has led to the formation of a wide range of mini networks involving academics and those in related professions from all over the world. Topics range from obscure, scholarly discussions to practical advice for PhD students to communications between publishers and authors.

Nature’s Networks

Nature Publishing Group has taken on the concepts of social networking with Nature Network Boston (NNB), launched in summer 2006 and soon to receive an upgrade adding discussion and message boards to the existing infrastructure for the creation of groups. NNB is also due to be joined by a sister site, Nature Network London, enabling a unique combination of local and international networking depending on context within the same network. First Author talked to Nature’s Director of Web Publishing, Timo Hannay, about the new directions for the site.

FA: I noticed a comment from the editor of Nature Network Boston saying that the service is about to be re-launched as a global networking service, Nature Network. How will this operate? Do you see it as a network of local sites or a networking service without geographical boundaries?

TH: Both! It depends on what you are looking for. There will be a London-based site launching during 2007. We are working to build a more generic application that will make it easier to roll out new sites for different locations. The upgrades to NNB that are coming out in February next year will include messaging and discussion services and these will allow researchers to form groups that can be either local, for example when discussing an event, or global, such as a debate on the avian flu pandemic.

FA: Who are the main users of NNB so far? Are they drawn more from the academic or commercial sectors?

TH: My impression is that our users tend to be academics. Like other social networking sites, NNB has attracted a younger audience than traditional journals.

FA: The new-look Network Boston will have a messaging service and discussion boards. How will this change the service?

TH: The social networking tools on NNB are a first stage at the moment. You can form a group, but the new services will allow these groups to operate as forums for the exchange of ideas.

FA: Do you think scientists will be willing to share ideas freely in a networking space like NNB, or are researchers usually too protective of their data to want to share in such an unrestricted environment?

TH: NNB allows three levels of group formation. The first is an open, public group. It has an administrator, but there is no system of approval: anyone can join. The second is called a formal group and, although the content is open, participation is at the discretion of the administrator. Finally, a private group is completely closed, with those not invited to participate not even being aware of the group's existence. This can be used by labs that want to have an internal discussion about work that is not ready to be made public.

Second Life is Second Nature

The success of many networking sites has been aided by the ability to add multimedia content. A genre of social networking site that takes this further involves the creation of an entirely new environment, built by members of the network, in which they then interact. The best known example of this is Second Life (SL), a 3D environment created by members using a simple scripting language and inhabited by their ‘avatars’, computer-generated animations that move around the world, create buildings and other content, and interact with other users. Famous for having a GDP and a carbon count higher than some real countries, the virtual world created by Linden Labs recently counted over two million users. Many are beginning to agree with Linden Labs’ community and education manager John Lester that ‘Second Life is no more a game than the Web is a game. It's a platform’ (1). Understandably, businesses have regarded their potential avatar customers with interest. IBM recently announced plans to populate twelve islands to be used for conferences, training and commerce (2), while Reuters pledged to regularly cover developments in-world through their ‘atrium’ (3). Other initiatives that hope to bridge the divide between real and virtual worlds include Sim Teach’s Second Life Education, whose wiki includes discussion of issues ranging from how to host a conference in the real world and Second Life simultaneously to the ethics of conducting social research in-world. Meanwhile, some charities have managed to raise over 40,000 (real) dollars from virtual campaigns (1). Further, in a reversal of the traditional concerns over the detrimental effects of computer games on the development of social skills in children, there are already reports of therapists using SL to teach communication skills to autistic children (4).

The rather extreme example of a psychiatry professor who inflicted hallucinations on the avatars of his students in order to improve their understanding of schizophrenia reveals how much visualization can aid comprehension (4). This is also true in other scientific disciplines: in structural biology, for example, 3D models that provide details about binding are crucial for designing appropriate experiments to test interactions between molecules (6). Visualization becomes increasingly important, and simultaneously more difficult to achieve using conventional methods, when examining interactions in large complexes. Electron tomography allows the creation of three-dimensional models, which can be invaluable in cases where the entire structure of an interaction network is not known but can be homology modelled on the basis of structures determined in related species (6). Making these models more accurate representations of the physical world involves the addition of data such as expression patterns. Placing such models online therefore has the added advantage of allowing constant communication with datasets containing such information.

NPG has been swift to realise the potential of virtual worlds both for interaction and visualisation: Nascent recently announced the creation of a Second Life island, Second Nature. I sent my avatar to visit. Second Nature was newly created and a fairly barren place on my first visit, but it was already beginning to be inhabited by some interesting little gadgets. One of these was the Magical Molecular Model Maker (M4), created by SL experts the Electric Sheep Company, the development of which was overseen by Joanna ‘Wombat’ (in first life, Joanna Scott, a member of the NPG team). Resembling its familiar school bench predecessor, it creates 3D models, which can be requested simply by entering the name of the desired molecule.

On my second visit, further fascinating creations had begun to converge on the island: a miniaturized version of the real world had appeared in the shape of a globe, near to a strangely flat, though simultaneously 3D, model of the universe. Both maps enable the user to point to a part of the sky map or to direct a ‘virtual telescope’ at the SL sky and zoom in to view that section of the world or universe in detail. Floating overhead was a collection of objects, notably an Escheresque knot created by avatar Matt Basiat (programmer Matt Biddulph), which constantly formed and reformed its tubular limbs to create ‘ideal knots’: curves that maximise the scale-invariant ratio of thickness to length. I returned to Timo to learn more about what was behind these creations and the aims for Nature’s virtual space.

FA: Are you expecting to attract different audiences for Second Nature and Nature Network Boston?

TH: Although the two services are similar in that both are social tools that improve the more people use them, Second Nature is much more experimental than NNB. I’m convinced that someone will make social networking work for science, simply because science is all about collaboration; this is shown by the way papers are co-authored and research and development are carried out in teams. So Second Life is a way of ‘playing’ with some of the possibilities offered by the format, particularly visualization and collaboration.

One example of the visualization of data retrieved remotely is the Magical Molecular Model Maker (M4). While this was initially based on hard-coded atom coordinate data, the latest version under development retrieves structural data remotely from the PubChem database. The universe models similarly make use of astronomical survey data from the Sloan Digital Sky Survey. The built-in scripting tools of SL (the Linden Scripting Language) mean that this sort of thing is far easier to display than it would be if we were to build it as a web application. The ideal knots, which have wide relevance in disciplines as diverse as DNA biology and quantum physics, are an example of the use of Virtual Reality Modeling Language (VRML) data sets to create SL objects. Another use of VRML currently under development utilises electron tomography data sets of bacterial cells from EMBO data (6).

In terms of collaboration, we have already held some team meetings on Second Life and we are constructing some venues on the island to house meetings as well as an induction area to explain the concept of Second Nature to new visitors. There are certainly instances when it is useful to be able to demonstrate a concept visually during an online meeting, and we have already had a request to host a lecture for PhD students on the island.
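
As a rough illustration of the kind of remote structure retrieval Timo describes for the M4, the sketch below fetches 3D atom coordinates for a named molecule and prints them as simple atom records. It is an assumption for illustration only: it uses PubChem's present-day PUG REST interface and the Python standard library, not the actual M4 integration.

import urllib.parse
import urllib.request

def fetch_3d_atoms(name):
    # PUG REST: /compound/name/<name>/SDF with record_type=3d returns a 3D SDF record
    url = ("https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/"
           + urllib.parse.quote(name) + "/SDF?record_type=3d")
    sdf = urllib.request.urlopen(url).read().decode("utf-8").splitlines()
    natoms = int(sdf[3][0:3])            # SDF counts line: first field is the atom count
    atoms = []
    for line in sdf[4:4 + natoms]:       # atom block lines: x, y, z coordinates and element symbol
        x, y, z, element = line.split()[:4]
        atoms.append((element, float(x), float(y), float(z)))
    return atoms

if __name__ == "__main__":
    # A model builder (in-world or elsewhere) would place one primitive per atom
    for element, x, y, z in fetch_3d_atoms("caffeine"):
        print(element, x, y, z)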

FA: So it sounds like Nature is planning to explore various types of social interaction online. You did a feature about Perplex City on Nascent recently. Is this type of online game another possible forum for Nature?

TH: Perplex City is an innovative alternate reality game by MindCandy. Some of their team, who happen to have scientific backgrounds, came to Nature recently to tell us about their work. Nature has never got directly involved in education, although of course NPG has produced textbooks, but something like Perplex City could be very interesting from that point of view. Already several of the questions are quite related to science or maths as well as popular culture, so it could prove a useful educational tool, and I think that if Nature did get involved in education, an imaginative online game would be a great way to do so.

Conclusions

So will social networking become an established tool for academic communication? Networking was always an uncertain venture, fraught with the perils of calling someone the wrong name or being ditched for someone more important. Naturally, there are also drawbacks, albeit different ones, to networking with a collection of strangers represented only by a username or avatar. Second Life has already faced a host of legal problems, including lawsuits over the sale of virtual property. For any significant advance in scientific collaboration in the context of social networks, there will have to be clear means of identification and protection of the new types of intellectual property that are emerging.

There is also the question of the longevity of networking sites: MySpace, desperately cool six months ago, is already losing out to Bebo. The retention rate of users in Second Life is also surprisingly low, at around 10% (5). This acts as a disincentive to investing time in creating content and building up webs of connections, only to enjoy a brief period of pay-back before having to repeat the process on a new platform. Nevertheless, it is only through participation that social networks can evolve to suit the changing demands and requirements of real life research. A graphic representation of a social network reveals the power of multiple overlapping connections: such a web grows stronger the larger it gets, allowing all users more opportunities for networking, whether to find an answer to a particular question, exchange content, or find the perfect partner for specialist collaboration.

The examples of in-world collaborative creation in Second Life are beginning to suggest that the participation of scientists in such networks could go beyond traditional networking and discussion, towards the provision of new platforms that transform the way research is conducted and communicated. Perhaps most important, however, is the potential of virtual worlds for visualisation based on ongoing communication with datasets. This capacity of online networks may be key to the comprehension of biological networks and chemical structures. It is hard to dismiss the view of Jaron Lanier, a veteran of virtual-reality experiments, that Second Life ‘unquestionably has the potential to improve life outside’ (4).

References

(1) ‘Leading a double life’. The Boston Globe. 25 October 2006. URL: http://www.boston.com/news/globe/living/articles/2006/10/25/leading_a_double_life/?page=2 (accessed 19/12/06).
(2) Shankland, Stephen. ‘IBM to give birth to 'Second Life' business group’. CNET News.com. 12 December 2006. URL: http://news.com.com/IBM+to+give+birth+to+Second+Life+business+group/2100-1014_3-6143175.html (accessed 19/12/06).
(3) Reuters Second Life News Centre. URL: http://secondlife.reuters.com/.
(4) ‘Virtual Online Worlds: Living a Second Life’. The Economist. 28 September 2006. URL: http://www.economist.com/displaystory.cfm?story_id=7963538 (accessed 19/12/06).
(5) Au, Wagner James. ‘Second Life: Hype vs. Anti-Hype vs. Anti-Anti-Hype’. GigaGamez. 18 December 2006. URL: http://gigagamez.com/2006/12/18/second-life-hype-vs-anti-hype-vs-anti-anti-hype/ (accessed 19/12/06).
(6) Aloy, Patrick and Russell, Robert B. ‘Structural systems biology: modelling protein interactions’. Nature Reviews Molecular Cell Biology. Vol. 7, March 2006.


*BMC branches out: an interview with Matthew Cockerill and Chris Leonard

BioMed Central (BMC) is one of the longest running and most successful open access publishers. As well as publishing 170 journals across the life sciences, it offers a range of services to institutions or organizations wishing to set up their own journal or institutional repository. Many BMC journals were pioneering in implementing systems of open peer review, in which the signed review is offered online alongside the article, and in offering authors the chance to retain copyright over their work and distribute it under Creative Commons licenses as well as having it automatically archived in PubMed Central. BioMed Central has also been swift to adopt new methods of content delivery, offering customization options, RSS feeds of journal content, and full-text delivery to mobile devices via AvantGo.

Recent years have seen the open access model enter the mainstream. Huge numbers of open access journals have been launched, including the high-impact publications of the Public Library of Science (PLoS), and academic institutions across the world have set up their own institutional repositories. Major academic bodies, including the US National Institutes of Health (NIH), the Wellcome Trust and several of the UK’s Research Councils, now require authors whose research they have funded to deposit their papers in an open access repository shortly after publication. Major publishers, including Nature and Oxford University Press, have also begun to take note of the growing trend towards open access, and some of their journals now offer an option to pay to have work freely available in PubMed Central immediately. Governments and publishers are coming under increased pressure from organizations such as SPARC and SciDev.Net to enforce open access policies.

BMC has also continued to expand its range of services, most notably by launching two new portals that apply the BioMed Central model to the physical and chemical sciences. Chemistry Central was launched in October this year and hosts its own generalist Chemistry Central Journal as well as seven other open access publications. It features some interesting articles about the unique impact of open access in chemistry research, including a piece entitled Open Source Research, the Power of Us, which discusses the application of open source collaboration to specific biochemical problems. Its eagerly awaited sister site, PhysMath Central, is currently in discussions with the academic community in these areas to develop new open access journals. Nevertheless, open access continues to face strong opposition from traditional publishers and others who argue that quality journals cannot be produced using the open access model without relying on philanthropy.

So, how has this revolution in open access changed the practices of its prophets, what are their predictions for the future, and what will be unique about the new services for the physical and chemical sciences amid their host of rivals? First Author spoke to BMC’s publisher, Matthew Cockerill and to the head of the new PhysMath Central service, Chris Leonard, to find out.

FA: A recent press release described BioMed Central as leading the way in open access: could you discuss how you view the future of the open access movement? How do you see BioMed Central changing as it expands its remit?

MC: Open access publishing took off first in the biomedical sciences. This may partly be because open resources like PubMed and GenBank alerted biomedical researchers to the benefits of open access. Other fields have different starting points - for example, many areas of physics have close-knit communities within which preprints play a key role in scientific communication, whereas the most critical resources in chemistry are accessible only to subscribers. But there has been increasing recognition that the benefits of open access for the publication of original research apply in all fields, although the most appropriate funding model may depend on the field.

For example, rather than leaving it to individual authors to find funds to pay publication charges, CERN is working to create a consortium of research labs that will collectively fund open access publication for all particle physics papers.
http://public.web.cern.ch/press/PressReleases/Releases2006/PR16.06E.html

As open access publishing continues to grow in scale, both within biomedicine and in other fields, we imagine it is likely that such models will proliferate. Open access can be funded in many ways, all of which are compatible with the underlying goals of open access as long as they do not depend on restricting access.

FA: As open access goes mainstream, will the ‘author pays’ model be sufficient to cover the costs of high quality publishing, especially given exceptions for authors in less wealthy countries and/or institutions?

MC: In general, the cost of open access publication (around $1500 in a typical BioMed Central journal) is very reasonable compared to the amount that libraries spend on journal subscriptions. For example, OUP recently quoted figures showing that for every article it published in Nucleic Acids Research in 2003, it received $4224 of subscription revenue. (http://www.oxfordjournals.org/news/Presentation%20slides)

Such comparisons suggest that the costs charged by open access publishers such as BioMed Central are very reasonable. They are also realistic and sustainable. We have worked hard to develop efficient online systems for running online journals and managing peer review, and as a result we can offer a high quality service at a very competitive price. We expect to break even within the next 12 months, and overall we believe that the efficiencies introduced by open access publishers such as BioMed Central have the potential to save the research community significant sums of money that are currently spent on over-priced subscription journals. Reports from both the House of Commons Science & Technology Committee and the European Commission have expressed the concern that the scientific journals system does not appear to function as an effective market. Open access journals, with publication fees, can improve transparency and competitiveness in this area. http://www.publications.parliament.uk/pa/cm200304/cmselect/cmsctech/399/39902.htm
http://europa.eu.int/rapid/pressReleasesAction.do?reference=IP/06/414


In terms of low-income countries - just as many publishers currently subsidize access for those in low-income countries, so it is equally possible for open access publishers to subsidize publication for authors in those countries. BioMed Central routinely provides waivers for authors in low-income countries, and this has not proved to be an obstacle to creating a sustainable business model.

It is also important to recognize that whereas authors in low-income countries previously had to get their work published in rich-country journals for that work to be read and cited, open access means that it is now feasible for local journals to achieve wider readership and impact. Brazil and India are both leading the way in this area, with many local journals now operating on an open access model (typically with central institutional support rather than depending on author-fees), and achieving improved Impact Factors as a result.

See for example:

Salvador Declaration on Open Access
http://www.icml9.org/channel.php?lang=en&channel=91&content=439
Bangalore Declaration on Open Access
https://mx2.arl.org/Lists/SPARC-OAForum/Message/3479.html

FA: How would you respond to Nature's recent claims that PLoS has proved the model to be unsustainable without philanthropy? Do you have an opinion on hybrid models (in which the author can choose to pay to have their article freely available immediately), such as that piloted by the Royal Society?

MC: Starting new journals and making them profitable is hard work - that's true for subscription journals as much as for open access journals: subscription journals also tend to take several years before they break even.

PLoS's initial approach was to start high-end journals which publish relatively few articles and are expensive to run, but it is now broadening its remit with PLoS One, which should improve its finances.

BioMed Central has taken the broader approach right from the start. We have some journals which are highly editorially selective (Genome Biology and Journal of Biology, for example), but we have other titles such as the BMC-series which aim to publish all sound research, while highlighting the best. This has allowed us to create a business model which offers authors low publication charges, while also allowing us a realistic prospect of making a profit.

FA: Your websites, like those of some other science publishers, have an increasingly ‘portal’ feel - a focus on thematic rather than volume/issue-based delivery. There is also a growing movement towards legislating to require authors to deposit their work in open access repositories shortly after publication. Will these tendencies eventually break down the distinction between journals and subject-based repositories?

MC: Journals are not just distributors and organizers of content. They are trusted brands that convey kudos and authority. That role is independent of the medium. Journals can convey this 'badge of quality' in print, or online, or via an RSS feed, or through a podcast. As the web evolves, there are certainly going to continue to be changes in how journals operate. Integration of published articles with datasets is likely to become more important, as are computer-readable semantic representations of the content of articles. For example, BioMed Central is developing the Journal of Medical Case Reports – at one level, this will simply consist of short case report articles, published in high volume through a streamlined publication system. But more importantly, these case reports will be integrated into a database that can be searched in a structured way in order to identify, for example, all case reports involving teenage patients taking antidepressant drugs in North America.
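
To make the structured searching Matthew describes concrete, here is a hypothetical sketch: the schema and field names are invented for illustration and are not BioMed Central's actual database, but they show how a query such as 'all case reports involving teenage patients taking antidepressant drugs in North America' becomes a structured search rather than a keyword hunt.

import sqlite3

# Invented schema for illustration; not BioMed Central's actual database.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE case_reports (
    id INTEGER PRIMARY KEY,
    patient_age INTEGER,
    drug_class TEXT,
    region TEXT,
    title TEXT)""")
conn.execute("INSERT INTO case_reports VALUES (1, 16, 'antidepressant', 'North America', "
             "'Example adolescent case report')")

# 'All case reports involving teenage patients taking antidepressant drugs
# in North America', expressed as a structured query:
rows = conn.execute("""SELECT title FROM case_reports
    WHERE patient_age BETWEEN 13 AND 19
      AND drug_class = 'antidepressant'
      AND region = 'North America'""").fetchall()
print(rows)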

First Author spoke next to the head of the forthcoming PhysMath Central, Chris Leonard, about how the new service will meet the unique needs of the maths and physics community and learn from the experience of a technologically adept target audience.

FA: How do you think the research needs and/or interests of physicists and mathematicians differ from those of biomedical researchers? How will your service cater for these needs?

CL: In most respects they are very similar, but physicists (and latterly mathematicians) were very prescient in seeing the benefits that the Internet offers in terms of dissemination of research material, which is no real surprise given the origins of the Web. However, what is missing from arXiv.org is the validation and quality branding that a rigorous peer review process brings. This is why arXiv and traditional journals enjoyed a symbiotic relationship for many years. What we are hearing now from scientists is that once this peer-review process has taken place, they want those results available for free to everyone and not 'locked up' in subscription journals. This is where open access comes in. With a history of supporting OA in the biosciences over many years, BioMed Central was well placed to expand its reach into the physical sciences.

Physicists and mathematicians do have their own habits, which differ from those of the biosciences, and we will be accommodating these habits with our journals. What this means in practice is that authors can submit articles in TeX format, submit directly from arXiv and even submit to PhysMath Central and arXiv simultaneously. We will also link to the main databases in physics and provide support for multi-author uploads (where there are tens or hundreds of authors) and specialist publishing entities such as astronomical objects. We will also be adopting the standard PACS and MSC codes for physics and mathematics classifications.

FA: The physics and maths academic communities were pioneering in their adoption of open access, notably, as you mentioned, with the founding of arXiv. You also have experience in the commercial sector. How will you work with and borrow from the experience of both these sectors?

CL: We are a commercial company providing an open access service. From a commercial standpoint open access makes sense. Scientists are demanding it, and in some fields it is almost seen as unethical to publish results in a subscription journal. It is difficult to see the future of subscription journals as rosy.

But open access does not necessarily imply 'free'. If we are based on a sound financial footing, that bodes well for the long-term future of open access. We are not dependent on grants or philanthropy and will be able to grow with the growing interest in open access in the future.

FA: You recently promised to take advantage of new technologies to communicate research findings clearly and to meet the challenges of the future. Can you give some examples of these technologies and how you believe they will change the ways scientists research, collaborate, and publish?

CL: Sure - this is one of the most exciting parts of working in open access. Not only can we develop tools and services around our data, but anyone can. All articles are available, for free, to anyone in fully formed XML, so we hope to see a suite of services like 'Google Labs' develop around this data.

However, for our part we intend to use new technology to support the scientific process in many ways. Apart from the tight arXiv integration already mentioned, we are also going to use wikis with the editorial board members to refine the scope of the journals, journal blogs to inform everyone of editorial developments, OAI-PMH to update A&I services, RSS for journal content updates, multimedia to support the online text, and comments from readers on each article, and we are very keen to work on ways to further structure and open up our data to other services. Other developments, such as 'tagging' of articles and refining the peer-review process, will be considered if there is an appetite for them from the community we serve.

There is also an increasing drive to make raw data of experimental results available alongside the article itself. For particle collision data, for example, this would be problematic given the sheer volume of data - but this barrier will come down with time and for some fields it is already possible to publish raw data, so we will be investigating this option in the coming weeks.
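
As a small illustration of the OAI-PMH updating of abstracting and indexing services that Chris mentions, the sketch below harvests recent records from a repository using the standard ListRecords verb and simple Dublin Core metadata. The repository URL is a placeholder, not a real PhysMath Central endpoint.

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE = "https://repository.example.org/oai"   # hypothetical endpoint, for illustration only
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

params = {
    "verb": "ListRecords",
    "metadataPrefix": "oai_dc",     # simple Dublin Core metadata
    "from": "2007-01-01",           # only records added since this date
}
url = BASE + "?" + urllib.parse.urlencode(params)
tree = ET.parse(urllib.request.urlopen(url))

# Print title and identifier for each harvested record
for record in tree.iter(OAI + "record"):
    title = record.find(".//" + DC + "title")
    identifier = record.find(".//" + DC + "identifier")
    if title is not None and identifier is not None:
        print(title.text, "->", identifier.text)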

Both Matthew and Chris make a forceful case for open access, as well as lighting its way forward. That BMC is about to break even without philanthropy is a strong argument against the claim of some skeptics that open access is an unsustainable business model, and vital encouragement for the development of local journals that challenge the traditional domination of English-language titles published in the West. Semantic representations of data and subject portals will change the way data are arranged and accessed. Finally, among the most exciting ideas mentioned are the tools for communication - wikis, blogs and tagging - that PhysMath Central aims to include. Although these ‘Web 2.0’ tools emerged alongside open access as part of the effort to ‘reclaim’ the web from corporate domination, the two have rarely converged: to see an open access service make use of these tools for communication could help revolutionise pre- and post-publication methods of scholarly discussion and collaboration. Clearly, BioMed Central and its sister sites are not content to rest on the laurels of their previous successes, but are set to continue to lead the way in open access.


*Open access: Scientific publishing and the developing world

Advances in science, medicine, technology and agriculture have the potential to contribute to the reduction of disease and poverty worldwide. Information and communications technology (ICT) has enabled collaboration and dissemination of scientific research on a global scale. In the words of UN Secretary-General Kofi Annan, ‘we are fortunate to live in an age that offers new opportunities for involving all nations in science and technology’ (1). However, scientists in the developing world are severely restricted in their access to current research. The open access (OA) model of publishing has often been suggested as a means of mitigating some of the restrictions faced by scientists in low-income countries, and has made significant progress in improving free access to research. However, as it emerges into the mainstream, the OA model must also face questions concerning its implications for the global distribution of intellectual property, widespread integration, and financial viability.

Open access in theory and practice
The basic philosophy of OA is that the publicly funded research emerging from universities and research institutions should be freely available to researchers working to benefit the public, rather than subject to fees imposed by publishers. OA refers primarily to material distributed in electronic form on the Internet (e-prints). This can include publications in OA journals, which make published articles available upon initial publication, or before (pre-prints) or after (post-prints) publication elsewhere. OA repositories are searchable collections of freely accessible material that can include OA journal articles, pre-prints, or post-prints.

Initiatives and organizations promoting OA emerged alongside the Internet from the early 1990s, and gathered force in the first years of this century as a reaction to the restriction of scientific research under subscription or 'pay-per-view' models of online publishing. The specialist open-access publisher BioMed Central (BMC) and the more recent arrival Public Library of Science (PLoS) continue to expand their journals and content. All content is automatically archived in PubMed Central and several other OA archives. BMC also assists institutions in developing their own research repositories, as does the Scholarly Publishing and Academic Resources Coalition (SPARC). Repositories are often developed using open source software; two of the best-known options are DSpace, developed at MIT in the US, and EPrints, from the University of Southampton in the UK.

In recent years many publishers, funding bodies, and renowned scientists have provided support to the OA movement, and thus bolstered the scientific prominence of freely available science. In the past year, the US National Institutes of Health (NIH), the UK Research Councils, and the Wellcome Trust have mandated that all work they fund be deposited in an open access archive within a short period after publication. Further, established journals operating on a traditional paid subscription model have shifted policy to accommodate the demand for OA science. The Nature Publishing Group (NPG), for example, has announced that starting in January 2007, all content published more than 4 years prior will be made available online. Finally, 25 provosts from leading research institutions in the US have written a letter in support of a US Senate Bill that would require all governmental funding bodies with budgets in excess of $100 million to follow the lead of the NIH and publish all resultant research in an OA repository. Active support for the OA movement from such high-profile entities is encouraging across the scientific community and has the potential to make a significant impact on science, including in developing countries.


Funding quality OA research
Despite these developments, relatively few OA journals have gained high rankings in indices such as those of the Institute for Scientific Information (ISI), which deters many researchers from publishing in them. One important exception is PLoS, which was designed to prove that the open access model can be viable for publishing high quality research; its journals PLoS Biology and PLoS Medicine are high-impact general science publications thought to aim at competing with top journals such as Nature and Science.

Publishing top quality research obviously requires the costs of editorial input, peer review, and electronic infrastructure and tools to be covered. Several alternatives have been explored to gain income from sources other than subscription fees. The most commonly used is the ‘author pays’ system, in which authors pay a fee towards journal publication costs. The assumption is that the cost of publication will be borne by the author’s institution rather than the individual, and advocates of this model argue that the expenses involved will be more than compensated for by the money saved in journal subscription costs. However, the example of PLoS also raises the question of whether the author pays model is capable of sustained financial viability. Nature recently published a news item claiming that PLoS had failed to achieve its stated aim of breaking even, losing almost $1 million in 2005, and had consequently been forced to increase its author charges from US$1,500 per article to as much as $2,500, as well as being reliant on philanthropy to cover its costs (2).

Most open-access journals agree to waive author fees in the case of authors based at institutions in developing countries. However, there remains concern that the need to raise revenues could lead to a financial bias in editorial decisions against those authors who would qualify for exemption from the charges.

An alternative to the full author-pays model is the ‘sponsored article’ model, which has recently been adopted by Elsevier journals and by the Royal Society under its EXiS Open Choice scheme. In this case, after the peer review process, authors are offered the option of paying a fee to ensure their articles are freely available. However, as there is no plan to waive author charges in this model, cash-strapped institutions in the developing world are unlikely to take up this option. In fact, this could lead to an increase in the current trend for research from authors in industrialised countries to be more widely read, as it is more likely to be freely available.

Open access and developing countries
Research in low-income countries is compromised by multiple factors: resources may be limited, equipment less than optimal, and basic infrastructure, such as electricity supplies, unreliable. Among these barriers is the issue of access to current research. While the number of specialist academic journals continues to rise, the average price of a science journal has risen four times faster than inflation for the past two decades, resulting in an 'access crisis' in which libraries are forced to cancel journal subscriptions (3). This worldwide problem is magnified in low-income countries; even state institutions are often unable to meet the rising costs of journal subscriptions. Although the Internet has largely overcome the problems, including delays and theft, associated with physical distribution of journals, the price barrier remains insurmountable in many cases. It is therefore widely thought that open access will be particularly beneficial to researchers in less developed countries.

OA initiatives that target less wealthy nations or regions can be broadly divided into those that aim to increase the access to resources, those aiming to increase the visibility of work of authors from these areas and those which aim to increase knowledge of the available resources.

Access
Diverse initiatives targeted at improving access to science, technology, and medical research in the developing world have arisen from the 1990s onwards. Two of the most high profile international initiatives are HINARI and AGORA.

In January 2002 the World Health Organisation (WHO) launched HINARI (Health InterNetwork Access to Research Initiative) as part of a wider scheme to improve communication between researchers (3). HINARI provides free or reduced-rate access to over 2000 medical, biomedical and social sciences journals for researchers working in designated countries, via an interface with publishers' websites. Access is limited to state institutions and does not encompass non-governmental organizations (NGOs) or smaller hospitals, and the qualifying criteria are stringent. Furthermore, despite qualifying on the basis of a nominal cut-off point of $3000 GNP per capita, India and other transitional states are not eligible. Nevertheless, positive feedback from users in Asia, Africa, South America, and Eastern Europe also testifies to the value of this resource. A sister UN program, AGORA (Access to Global Online Research in Agriculture) operates in a similar fashion for agricultural research publications.

Several publishers have also taken steps to increase the ability of researchers in developing countries to both access and contribute to academic literature by offering free or reduced price access to journals and/or waiving author charges. To review and develop these moves will be an important task for organisations such as the Task Force on Science Journals, Poverty, and Human Development set up in 2005 by the Council for Science Editors as a forum for journal editors.

Publishing
Alternative OA journals and repositories, which focus on research emerging from the developing world, are becoming increasingly prominent. Some notable examples include Bioline International, which hosts electronic OA versions of developing country journals, SciELO, containing journals published in Latin American countries and Spain (2), the Indian Medlars Centre, and African Journals Online. Their integration into mainstream archives like PubMed seems far off, however. Such integration will be vital if the OA model is to seriously challenge the gap between access to cutting-edge scientific knowledge in wealthy and less wealthy countries.

Information and collaboration
The provision of resources is not sufficient to improve access. Awareness of OA remains low in both the developed and developing world. As the drive towards open access to science in developing countries gathers momentum, encouraging the best use of available resources is the next logical step. Since its inception in 1992, the International Network for the Availability of Scientific Publications (INASP), a UK-based charity, has worked with partners across the world to facilitate access to online publications through workshops, training, library capacity building, and skills development. INASP also recognizes the importance of outreach programs to rural communities, particularly with regard to agricultural research and health interventions. Additionally, INASP fosters closer associations between partners in different countries. Transfer of knowledge and expertise between developing countries has to date been extremely limited; however, 'South-South' research collaborations are poised for progress over the next decade, with renewed trade and development contacts, for example, between Africa and China.

Collaboration between researchers in wealthy nations and less developed regions is an informal way to improve scientific communication outside the traditional model of journal publishing. However, such links can be constrained by institutional regulations. Dr Maria Sanchez of the University of California points out that currently, overseas collaborators are not usually granted access privileges, but that 'one way of facilitating access to journal resources would be for institutions in high-income countries to grant this kind of service to collaborators in low-income countries, or in general to institutions in low-income countries with which they have relevant ties'. In technical terms, such collaboration could be facilitated by integrating the various 'end-user identities' used by universities and research institutions to allow controlled access from other institutions.

Conclusions
There is little doubt that open access initiatives have greatly improved the potential access of authors in low-income countries to scientific research. Nevertheless, in the case of agreements which allow open access to findings that are usually restricted under subscription or pay-per-view models, there are strong arguments that the scope of institutions, as well as the range of countries that are granted open access, should be enlarged. Further, there is a need for the provision of information about open access in an accessible form (and language), and for the training of information professionals and scientists in less industrialised nations, to ensure that those who could benefit from open access resources are aware of them and able to use them. The question of whether open access allows the work of authors from less developed countries to gain more exposure is less straightforward. The author pays model presents obvious problems for less affluent institutions, as well as the more subtle issue of editorial decision making where charges are waived. Great care needs to be taken that some aspects of this model, and especially of ‘hybrid’ models where charges to allow open access are an option, do not act to reinforce the dominance of the industrialised countries over the scientific literature rather than challenging it. Finally, while gateways and repositories focusing on journals from a specific country or region are useful, the development of subject-specific resources containing the work of authors from both wealthy and less wealthy nations in a range of languages is vital to prevent the development of a ‘two-tier’ system of open access publishing and archiving.

Acknowledgements
With thanks to Maria Alice da Silva Telles, Dr Maria Sanchez, and Dr Ruth McNerney for their constructive comments via the HDNet Stop-TB eForum.

References
1) Annan, K. (2004). Science for all nations. Science 303(5660): 925.

2) Butler, Declan. Open-access journal hits rocky times: Financial analysis reveals dependence on philanthropy. Nature News Published online: 20 June 2006; doi:10.1038/441914a, URL: http://www.nature.com/news/2006/060619/full/441914a.html. See also the ensuing discussion on the Nature Newsblog: http://blogs.nature.com/news/blog/2006/06/openaccess_journal_hits_rocky.html

3) Suber, Peter and Arunachalam, Subbiah, (2005). Open Access to Science in the Developing World. World-Information City, October 17.

4) Papin-Ramcharan, Jennifer and Dawe, Richard A. Open Access: A Developing Country View. First Monday, volume 11, number 6 (June 2006), URL: http://firstmonday.org/issues/issue11_6/papin/index.html.


*Text Mining: Science Digs Deeper


Many scientists trying to unearth nuggets of information from the vast online deposits have probably wished for an intelligent tool to automatically answer complex queries, such as “How does protein X affect disease Y?” Existing keyword searches largely ignore the relationships between words, so is this scenario just a pipedream? Perhaps not, as several research groups and companies are now pioneering innovative text-mining technologies that might ultimately allow personally relevant semantic searching.

Intelligent digging
Most electronic information is stored as unstructured text in the form of e-mails, news articles, academic papers, commercial reports and so on.(1) Keyword-based search engines, be they specialized for science like Entrez or more general like Google, pay little heed to the textual context or interrelationships between words.(2)


Text-mining programs aim to dig deeper, using natural-language processing to discover hidden patterns and associations, and providing visual maps to direct users down previously uncharted trails.(1) This essentially semantic approach uses statistical and linguistic rules, hand-crafted for specific types of text, to probe the underlying meanings. The result could be technologies capable of answering sophisticated questions and performing automated text searches with an element of intelligence.(3)


Labour-intensive manual text-mining approaches first surfaced in the mid-1980s, but technological advances have enabled the field to progress in leaps and bounds during the past decade.(4) Electronic text-mining programs are now beginning to use artificial intelligence techniques to search text for entities (qualities or characteristics) and concepts (such as the relationship between two entities).(4) The four key stages of this process are retrieving relevant documents, extracting lists of entities or relationships among entities, answering questions about the content, and delivering facts to the user in response to specific natural-language queries.
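
A deliberately naive sketch of the second stage, extracting entities and a relationship between them with a hand-written pattern, may help make the idea concrete. Real text-mining systems use far richer statistical and linguistic models; the sentences and pattern below are invented for illustration.

import re

# Invented example sentences; not drawn from any real corpus.
SENTENCES = [
    "BRCA1 interacts with BARD1.",
    "Mutations in CFTR cause cystic fibrosis.",
]

# Hand-crafted pattern of the form "<entity> interacts with / causes <entity>"
PATTERN = re.compile(
    r"(?P<a>[A-Z][A-Za-z0-9]+)\s+(?P<rel>interacts with|causes?)\s+(?P<b>[A-Za-z][A-Za-z0-9 ]*)\."
)

for sentence in SENTENCES:
    match = PATTERN.search(sentence)
    if match:
        # Deliver the extracted relationship as a simple fact
        print(match.group("a"), "--[" + match.group("rel") + "]-->", match.group("b"))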

Fool’s gold?
So just how realistic is this goal? It is true that language-processing software tools have been successfully applied to non-scientific text, such as news content.(5) Yet the task facing science text-mining comes closer to the ultimate challenge of comprehending human languages.(5)


One of the major hurdles to be overcome is interpreting the dense layers of jargon that permeate science writing — this is known as the ‘ontology problem’.(5) Another obstacle to the progress of text mining is the thorny issue of access; many journals do not make their full-text content publicly available, so search engines often scan abstracts alone. Further complications stem from the lack of standardization; few journals use the same format, and even within an article different sections (such as the methods and the discussion) might need to be assigned different weightings depending upon the precise nature of the search.

Lighting the way
Several groups of academics have begun addressing these concerns, and are developing text-mining software to scan open-access publications and meet the advanced needs of researchers in their particular fields. Examples from the life sciences include the Arrowsmith software and the EBIMed retrieval engine, which perform sophisticated searches of Medline text focusing on the causes of disease and on protein interactions, respectively.(7) One of the newest such tools is Textpresso, launched by WormBase in February 2006 to serve the Caenorhabditis elegans research community; this resource relies on human ‘taggers’ to manually mark up its corpus of text, and outputs responses to complex queries in the form of citations, abstracts or paragraphs from relevant papers.(5) An information resource with links to various similar projects is provided by Biomedical Literature and (text) Mining Publications (BLIMP).


An exciting new initiative to aid the academic onslaught against the ‘data deluge’ was launched in March 2006.(2) The National Centre for Text Mining (NaCTeM) is a collaborative effort between the Universities of Manchester, Liverpool and Salford, funded by the Joint Information Systems Committee (JISC), the Biotechnology and Biological Sciences Research Council (BBSRC), and the Engineering and Physical Sciences Research Council (EPSRC). NaCTeM aims to provide tools, carry out research and offer advice to the academic community, with an initial focus on text mining in the biological and biomedical sciences.(2)

Pioneering publishers
Although science publishers will have a strong impact on the success of text-mining efforts, they have yet to develop a standard annotation of their content that will allow full-text access to computers.(7) One route would be for each publisher to license its own electronic tools for searching its content, although such a system would still necessitate multiple separate searches. A more appealing prospect might be the adoption of a common format in which all publishers could issue content for text mining and indexing. Nature Publishing Group (NPG) recently proposed an initiative, known as the Open Text Mining Interface (OTMI), in an attempt to kick-start the debate on how publishers should respond to requests for machine-readable copies of content.(8,9) Their suggested approach is to establish a common format in which coded content could be made freely available for text mining and indexing while maintaining publishers’ restrictions on human access. This aim would be achieved by labelling the different sections of the paper and converting text into ‘word vectors’ and ‘snippets’, which give some indication of the content and structure of a piece. They propose that “If all publishers were to adopt this or some similar standard, the entire literature would become accessible for mining.”(7)
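
The following sketch conveys the general idea behind word vectors and snippets, reducing an article to term counts and short fragments that can support mining and indexing without reproducing the readable running text. It follows the description above only loosely and is not the actual OTMI specification.

import re
from collections import Counter

article = (
    "Text mining uses natural-language processing to discover hidden patterns. "
    "Hidden patterns and associations can direct users down uncharted trails."
)

# 'Word vector': term frequencies, with word order discarded
words = re.findall(r"[a-z]+", article.lower())
word_vector = Counter(words)

# 'Snippets': short fragments rather than the full readable text
snippet_len = 6
snippets = [" ".join(words[i:i + snippet_len]) for i in range(0, len(words), snippet_len)]

print(word_vector.most_common(5))
print(snippets)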


Some argue that converting the text after publication is a mistaken way to approach the issue of adding meaning to new academic texts, and that the process should begin much earlier, with the initial formatting of the journal article by the authors or publishers. This will perhaps be the eventual result of the approach taken by the National Institutes of Health (NIH) with their initiative to encourage publishers to adopt a common Journal Publishing Document Type Definition (DTD), providing a standard method of XML tagging for journal content that could convey semantic meaning without the need for syntax scrambling or laborious manual tagging. Open access publishers, who do not face the problem of conveying content to machines while shielding the readable text, also tend to take different approaches, discussed on BioMed Central’s text mining page and by the Open Archives Initiative.
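
A simplified example of the kind of structural tagging at issue: the element names below are loosely modelled on journal-article XML but are not the actual NIH Journal Publishing DTD, and the weighting comment only illustrates how explicit section types could be exploited by a text miner.

import xml.etree.ElementTree as ET

# Simplified, invented markup for illustration; not the real Journal Publishing DTD.
xml_source = """
<article>
  <front>
    <article-title>Text mining in the life sciences</article-title>
    <abstract>We survey approaches to mining the literature.</abstract>
  </front>
  <body>
    <sec sec-type="methods"><p>Corpus construction and annotation.</p></sec>
    <sec sec-type="discussion"><p>Implications for semantic search.</p></sec>
  </body>
</article>
"""

root = ET.fromstring(xml_source)
title = root.findtext("front/article-title")
# Because sections carry explicit types, a miner could weight them differently:
methods = [sec.findtext("p") for sec in root.findall(".//sec[@sec-type='methods']")]
print(title)
print(methods)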

Future prospects
Clearly, the ability of software to interpret text depends upon the knowledge and abilities of its human programmers and users. Recent document-retrieval studies have reported just 5–10% improvements in accuracy using existing text-mining technologies compared with standard keyword searches.(3) Thus, while we could be poised on the threshold of the era of text mining, the technology is still in its infancy and much remains to be achieved. It is too early to predict whether text mining will ultimately strike gold, hit rock bottom, or be surpassed by newer forms of industry as semantic markup of scientific texts becomes the norm.

References
1. Guernsey, L. (2003) Digging for nuggets of wisdom. New York Times 16 October http://tech2.nytimes.com/mem/technology/techreview.html?res=950CE5DD173EF935A25753C1A9659C8B63.
2. Joint Information Systems Committee (2005) Press Release: World’s first text mining service to benefit British academics 17 March http://www.jisc.ac.uk/index.cfm?name=pr_text_mining_170305.
3. Abrams, W. (2003) Text mining: the next gold rush. Second Moment http://www.secondmoment.org/articles/textmining.php.
4. Nightingale, J. (2006) Digging for data that can change our world. The Guardian 10 January http://education.guardian.co.uk/elearning/story/0,,1682496,00.html.
5. Dickman, S. (2003) Tough mining: the challenges of searching the scientific literature. PLoS Biol. 1(2): e48 http://biology.plosjournals.org/perlserv/?request=get-document&doi=10.1371%2Fjournal.pbio.0000048.
6. Timmer, J. (2006) Mining scientific publishing. Ars Technica 3 May http://arstechnica.com/journals/science.ars/2006/5/3/3827.
7. Editorial (2006) Machine readability. Nature 440: 1090 http://www.nature.com/nature/journal/v440/n7088/full/4401090a.html
8. Hannay, T. (2006) Open text mining interface. Nascent 24 April http://blogs.nature.com/wp/nascent/2006/04/open_text_mining_interface_1.html.
9. Lynch, C. (2006) Open Computation: Beyond Human-Reader-Centric Views of Scholarly Literatures: http://www.cni.org/staff/cliffpubs/OpenComputation.htm.


*Mobilizing Scholars: using mobile devices in scientific research

A recent First Monday article (1) coined the phrase 'the mobile scholar', noting that those scholars who are able to make effective use of a range of information and communications technology work faster and more accurately than those who do not. The authors noted the use of personal digital assistants (PDAs) in medicine and engineering and their potential for greater use in other academic fields. A range of new mobile devices that combine the capacities of USB flash drives and self-contained servers with those of PDAs, coupled with new online formats designed for mobile devices and the potential of the ‘mobile web 2.0’, could allow the development of innovative medical and scientific uses of mobile devices.

In medicine, PDAs have been shown to assist medical staff in diagnosis and drug selection (2), and studies have shown that their use by patients to record symptoms improves the effectiveness of communication with hospitals during follow-up (3). Take-up rates have been high among medical professionals, and a range of relevant resources has been developed to cater for this, including Epocrates and the ABX Guide, which supply drug databases, treatment information and relevant news in formats designed for mobile devices, and MedNet, which provides product reviews of a range of mobile devices and a forum for medical professionals to discuss related issues. Services such as AvantGo translate medical journals into readable formats and provide updates from journals, while software like WardWatch organizes medical records to remind doctors making ward rounds of information such as the treatment regimens of patients. Finally, Pendragon provides tools for conducting research on mobile devices, with a connection back to a central server allowing the user to enter data into a centralized database from their mobile device, which makes it relevant to the biological sciences as well as medicine (4).

In other scientific disciplines, the use of PDAs has so far been less widespread. However, discussion of new systems intended to monitor changes in the physical world has encompassed the use of mobile devices as part of larger systems. These include the development of ‘motes’, or mini-computers that record information about the environment, whose data are aggregated using distributed computing to form ‘sensor webs’ or ‘sensor nets’ (5). A working paper that emerged from Intel’s collaboration with the University of California, Berkeley, to explore the scientific applications of sensor web technology suggested the use of PDAs to visually monitor the locations and status of the motes (6).

Sensor web technology is also relevant to medicine, as tiny motes could be used to monitor patients’ physical conditions. A recent conference run by MIT’s fascinating, if occasionally fanciful, ‘Things that Think’ team focused on wearable bodily sensors that monitor ongoing conditions like diabetes and epilepsy, alerting medical staff or the patients themselves to the treatment required. The conference also discussed integrating this type of sensor network with those monitoring the physical environment.

Despite the proven usefulness of mobile devices for these applications, their use in academic research remains limited. This is due firstly to the restrictions imposed by their physical size and secondly to their incompatibility with computer operating systems and with the format of much of the information available on the Internet. A variety of solutions have been suggested to combat these problems.

Tim Berners-Lee’s World Wide Web Consortium has launched a Mobile Web Initiative designed to ‘make browsing the Web from mobile devices a reality’. The initiative’s main goal is to develop a common device descriptions repository, allowing Internet providers to identify the type of device that is receiving the information and modify its format accordingly. New tools such as Macromedia’s Flash Lite enable the production of user interfaces customized for mobile devices. In any case, with the increasing movement away from website-based content towards delivery via RSS and other formats in which content is divorced from presentation, the issue of microcontent becomes less of a problem, as the device rather than the content provider specifies how the content is displayed. Communication with computers still has some remaining problems, such as the PocketPC’s lack of compatibility with the Mac, but these should soon be resolved given the current trend towards cross-platform compatibility (1).
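As a rough sketch of the kind of device-dependent formatting the initiative envisages, the hypothetical Python example below chooses between two invented page templates according to the User-Agent header of the incoming request; a real deployment would consult a shared device descriptions repository rather than the hard-coded keyword list used here.

    # Minimal sketch of device-dependent content formatting (hypothetical).
    # A real service would query a device descriptions repository instead of
    # the hard-coded MOBILE_KEYWORDS list below.
    from wsgiref.simple_server import make_server

    MOBILE_KEYWORDS = ("windows ce", "palm", "symbian", "blackberry")

    FULL_PAGE = "<html><body><h1>Article</h1><p>Full-width layout...</p></body></html>"
    MOBILE_PAGE = "<html><body><p>Compact layout...</p></body></html>"

    def app(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        is_mobile = any(word in agent for word in MOBILE_KEYWORDS)
        body = MOBILE_PAGE if is_mobile else FULL_PAGE
        start_response("200 OK", [("Content-Type", "text/html")])
        return [body.encode("utf-8")]

    if __name__ == "__main__":
        make_server("", 8000, app).serve_forever()  # serves http://localhost:8000/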

Meanwhile, investigations into transforming the relationship between mobile devices, computers and the web are producing some interesting results. A variety of systems are being developed that combine mobile devices with USB flash drives and portable web servers. One example is indi, a Ruby-based software package that runs a ‘personal web server’ and website and can be downloaded onto a flash drive. When the USB drive is connected to a computer, it runs its own operating environment rather than relying on that of the computer, meaning that it is compatible with both Mac and PC. All data are also retained only on the indi drive rather than being transferred onto the host computer. A similar service is provided by WOS, or ‘webserver on stick’, a USB drive that transforms any computer into an Apache webserver with PHP and MySQL, and by the SVG terminal, a JavaScript Firefox plugin that allows mobile devices to project information via wi-fi. Finally, Linux-based servers that can be combined with a PDA-like hand-held device have been developed, including Black Dog and Intel’s Personal Server.

These devices allow more complex information to be stored confidentially on a PDA and displayed when necessary, rather than being copied between various computers and the mobile device. The data on the PDA can also be protected against theft using features such as the biometric signature authentication included in Black Dog. Security is obviously a major issue for doctors, for whom entering confidential information, such as patients’ medical records, onto their PDAs is contentious (as a MedNet post comments). Combining this type of independent and secure operating system with the new applications of PDAs in the sensor web technology discussed above would assure doctors that they can monitor their patients securely.

Since the explosion of ‘Web 2.0’ applications over the last few years, some have been discussing how this technology can be applied to mobile devices. One interesting example is Ajit Jaokar’s Open Gardens blog and his earlier article (7), which suggests an adapted version of del.icio.us and flickr for mobile devices. This proposal takes into account the different situation of the mobile web user and how this would affect tagging and sharing data, suggesting that tags for a visual image could be added at the point when the image is captured, based on physical location, time, and data from other users. Sharing data between mobile devices, for example using Bluetooth, would also depend on physical location: in fact data could be fixed to particular locations, a practice known as ‘air graffiti’ or ‘splash messaging’ (for an example see Milestone 5) and enabled by a combination of spatial information and mapping feeds. So far, there is not much information available about this idea, but it could have interesting applications for field-based research, such as leaving information about the location of the sensor webs discussed above, or comments on and images of the physical environment, for co-workers or for your own later use. Skeptics have pointed out the downsides of transferring Web 2.0 technology to mobile devices, in particular the difficulty of translating the concept of open standards. However, Web 2.0 has never been a concrete set of specifications (8), and concepts like those pioneered by Jaokar show that mobile devices have the potential to expand the range of technologies and concepts encompassed by the term as they add new capacities such as imaging and, most importantly, portability.
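To make the tagging idea concrete, here is a minimal, hypothetical Python sketch: the coordinates, the tag store and the notion of ‘nearby’ tags are all invented for illustration, but it shows how tags could be generated automatically from the position and time of capture and from tags left near the same spot by other users.

    # Hypothetical sketch of location- and time-based auto-tagging for a
    # captured image, in the spirit of the 'mobile Web 2.0' proposal above.
    import math
    from datetime import datetime

    # Tags previously attached near known coordinates by other users (invented data).
    NEARBY_TAGS = {
        (55.8721, -4.2882): ["fieldwork", "sensor-web", "site-3"],
    }

    def distance_km(a, b):
        # Rough equirectangular approximation; adequate over short distances.
        lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
        x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
        y = lat2 - lat1
        return 6371 * math.hypot(x, y)

    def auto_tags(position, when, radius_km=0.5):
        tags = {"lat:%.4f" % position[0], "lon:%.4f" % position[1],
                when.strftime("date:%Y-%m-%d"), when.strftime("hour:%H")}
        for place, shared in NEARBY_TAGS.items():
            if distance_km(position, place) <= radius_km:
                tags.update(shared)            # inherit tags left nearby by others
        return sorted(tags)

    print(auto_tags((55.8723, -4.2880), datetime(2006, 5, 12, 14, 30)))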

Mobile devices have great potential for scientific application, in terms of fieldwork, on-the-spot access to information and communication with colleagues. The disparate technologies and concepts behind the development of PDAs, mobile phones, and flash devices are beginning to come together to produce a new generation of devices that combine these various functions. Features of the devices themselves also have the capacity to extend, rather than limit, the development of web technology.

(1) David B. Bills, Stephanie Holliman, Laura Lowe, J. Evans Ochola, Su–Euk Park, Eric J. Reed, Christine Wolfe, and Laura Thudium Zieglowsky. The new mobile scholar and the effective use of information and communication technology. First Monday, volume 11, number 4 (April 2006).

(2) Rudkin SE, Langdorf MI, Macias D, Oman JA, Kazzi AA. Personal digital assistants change management more often than paper texts and foster patient confidence. Eur J Emerg Med. 2006 Apr; 13(2): 92-6.

(3) Kearney N, Kidd L, Miller M, Sage M, Khorrami J, McGee M, Cassidy J, Niven K, Gray P. Utilising handheld computers to monitor and support patients receiving chemotherapy: results of a UK-based feasibility study. Support Care Cancer. 2006 Mar 9.

(4) Sandra Fischer, MD, Thomas E. Stewart, MD, FRCPC, Sangeeta Mehta, MD, FRCPC, Randy Wax, MD, FRCPC, and Stephen E. Lapinsky, MB, BCh, FRCPC. Handheld Computing in Medicine. J Am Med Inform Assoc. 2003 Mar–Apr; 10(2): 139–149. doi: 10.1197/jamia.M1180.

(5) Declan Butler. 2020 computing: Everything, everywhere. Nature 440, 402-405 (23 March 2006) doi:10.1038/440402a.

(6) W. Steven Conner, Jasmeet Chhabra, Mark Yarvis and Lakshman Krishnamurthy. Experimental Evaluation of Topology Control and Synchronization for In-Building Sensor Network Applications. Intel Research & Development, August 2003. See also Sheng-Tun Li, Li-Yen Shue and Huang-Chih Hsieh. The development of a PDA-based communication architecture for surveillance services. Int. J. of Internet Protocol Technology 2005, Vol. 1, No. 1, pp. 3–9.

(7) Ajit Jaokar. Mobile web 2.0: Web 2.0 and its impact on mobility and digital convergence. December 25 2005.

(8) Tim O'Reilly. What Is Web 2.0? Design Patterns and Business Models for the Next Generation of Software. 09/30/2005.

(top)

*Writing on the Web (2.0)? PDF

In most scientific disciplines, the majority of academic papers are written collaboratively. They also tend to undergo several rounds of revision, with new content often being added after peer review and style and format reworked for target journals. Currently, this tends to involve emailing versions of the document between authors, or storing versions on shared drives. However, a new breed of online applications that mimic the functions of desktop applications could change the process of producing a scientific paper.

Ajax and the writable web
The traditional problem with using web applications to perform functions similar to those carried out by personal computers is the time it takes for the client to communicate with the server, which means there is a delay in performing the command given by the user. Recently, Google Maps demonstrated that this barrier had been removed: moving the mouse on your personal computer results in instantaneous panning or zooming onscreen. Google Suggest is another example: with every keystroke the suggestions in the drop-down box are updated (1).

The secret to the instant responses of this new generation of web applications, many of which offer WYSIWYG (‘what you see is what you get’) editing, is the use of Ajax. Ajax stands for Asynchronous JavaScript + XML and, in these applications, is used to form an extra layer between the server and the client, simultaneously creating the visible interface that the user sees and maintaining continual contact with the server. This allows the user’s interaction with the application to occur asynchronously, giving instant onscreen results (2). The extra layer, also referred to as middleware, can also provide security functions, including the authentication and authorization of users.
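The exchange is easiest to see from the server’s side. The following minimal sketch, written in Python purely for illustration (the word list is invented, and a real service such as Google Suggest is far more sophisticated), implements the kind of endpoint that a page would query in the background on each keystroke via JavaScript’s XMLHttpRequest object, inserting the returned completions into the drop-down box without reloading the page.

    # Hypothetical sketch of the server side of an Ajax-style suggestion service.
    # A page would request /suggest?q=pre on each keystroke via XMLHttpRequest
    # and insert the returned completions into a drop-down without a page reload.
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import urlparse, parse_qs

    WORDS = ["genome", "genetics", "genomics", "gene expression", "genotype"]  # invented

    class SuggestHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            query = parse_qs(urlparse(self.path).query).get("q", [""])[0].lower()
            matches = [w for w in WORDS if w.startswith(query)][:5]
            body = "\n".join(matches).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("", 8001), SuggestHandler).serve_forever()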

WYSIWYG applications
This technology can also be applied to functions traditionally performed on a single personal computer, such as writing, editing, and creating spreadsheets. One example, expected to attract more attention after its recent acquisition by Google, is Writely. Writely is a WYSIWYG word processing application that runs within a web browser. It mimics many of the functions of Microsoft Word, and it is possible to load documents authored in Word and other word processing programs into Writely and vice versa. Writely also possesses several advantages over traditional word processing programs. The fact that documents are stored online (currently individual files must be under 500k, but there is no limit on users’ total storage space) makes it possible to access them from any computer with an Internet connection. The documents are protected by a layered security architecture, and can only be viewed or modified by those the creator invites to do so via email. The revision function stores all previous versions of the document, so that it is clear which changes have been made by which authors and at what time, and it is possible to revert to any earlier version at any stage.
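The revision model itself is simple; the following generic Python sketch (not Writely’s actual code, just an illustration of the idea) keeps every saved version along with its author and timestamp, so any earlier state can be inspected or restored.

    # Minimal, generic sketch of a revision history of the kind online
    # word processors expose: every save is kept, and any version can be restored.
    from datetime import datetime

    class Document:
        def __init__(self):
            self.versions = []                      # list of (author, timestamp, text)

        def save(self, author, text):
            self.versions.append((author, datetime.now(), text))

        def current(self):
            return self.versions[-1][2] if self.versions else ""

        def history(self):
            return [(a, t.isoformat()) for a, t, _ in self.versions]

        def revert_to(self, index, author):
            # Reverting is itself recorded as a new version, so nothing is lost.
            self.save(author, self.versions[index][2])

    doc = Document()
    doc.save("alice", "Draft abstract.")
    doc.save("bob", "Draft abstract, with revised methods.")
    doc.revert_to(0, "alice")
    print(doc.current())        # -> "Draft abstract."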

Other examples of online word processing applications include AjaxWrite, a more basic program that works only with Firefox and lacks the advanced editing features of Writely. Although it is compatible with Word, it also lacks several of Word’s features (for example, while editing the draft version of this document in AjaxWrite, I wasn't able to add the hyperlink to the web address). Another is Zoho Writer, which has more advanced features but does not yet seem entirely compatible with Word (it made a draft of this document crash and refuse to reopen) and appeared to lack security features, saving my document directly into a public folder.

Other applications have also been developed to mimic Microsoft’s desktop applications in an online setting. These include iRows, an Excel-like web application, and Thumbstacks, which performs similar functions to PowerPoint. Finally, an entire web-based office package is offered by gOffice. This service concentrates on formatting, outputting documents in PDF. It is less compatible with Word than the online word processor programs, asking that text be pasted from text-only or HTML files. Like most of the other free services, it offers unlimited storage space and additional perks such as free fax and postal services. Another way to enhance communication over the web is to integrate existing systems such as mobile phones, personal digital assistants (PDAs) and email and messenger services. One service that offers this is Remember the Milk, an online organizer program. Remember the Milk also allows collaboration with groups set up for various tasks and integrates with Google Calendar, Apple iCal and Mozilla Sunbird.

Online organisation
While desktop applications are usually delivered as a package, are largely interoperable, and produce documents that are stored in the file system of a local computer, online applications clearly have different requirements when it comes to integration and storage. Writely provides a novel solution to the problem of storage and retrieval of documents. Like other ‘Web 2.0’ applications, Writely makes use of Technorati tags: keywords that a user can attach to their documents and then use to retrieve all documents carrying that keyword.
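In effect the tags form an inverted index from keywords to documents; a minimal, generic sketch of the idea in Python (not tied to Writely or Technorati) looks like this.

    # Minimal, generic sketch of tag-based storage and retrieval:
    # an inverted index mapping each tag to the set of documents carrying it.
    from collections import defaultdict

    class TagIndex:
        def __init__(self):
            self.by_tag = defaultdict(set)

        def tag(self, document_id, *tags):
            for t in tags:
                self.by_tag[t.lower()].add(document_id)

        def find(self, tag):
            return sorted(self.by_tag.get(tag.lower(), set()))

    index = TagIndex()
    index.tag("draft-paper.doc", "malaria", "fieldwork")
    index.tag("grant-outline.doc", "malaria", "funding")
    print(index.find("malaria"))   # -> ['draft-paper.doc', 'grant-outline.doc']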

Writely is also designed to be integrated into Netvibes, an example of another element of the move from personal computers to online applications. Netvibes is an online ‘desktop’ that can be personalised by the user.

So far, Netvibes and competitors including Google Desktop and Pageflakes essentially work as feed aggregators, in which the display of RSS feeds can be altered by the user. Netvibes also allows the user to integrate applications that use OPML (Outline Processor Markup Language) to develop more complex XML-based services. A new service from Microsoft, Microsoft Live, performs a similar function, with the additional capacity to save searches from its academic search feature directly onto the desktop. An alternative way to integrate continuously updated searches into one of these services is to run a search in HubMed and then feed the resulting RSS into any of the rival online desktops. As well as RSS feeds and OPML, Microsoft Live’s desktop function also supports Gadgets, another breed of XML-based applications that perform simple calculations, such as currency conversion or weather reporting, based on continuously updated information. The Microsoft Live blog gives Windows users instructions on how to design their own Gadgets or select from those designed and tagged by previous users. Apple’s Widgets provide a similar service for Mac users.
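Under the surface, these desktops share the same plumbing: a list of feed subscriptions, often exchanged as OPML, and a loop that pulls the latest items from each RSS feed. The sketch below is a generic Python illustration using only the standard library; the OPML filename and feed contents are placeholders rather than any particular service.

    # Generic sketch of the feed-aggregator pattern behind online 'desktops':
    # read an OPML subscription list, fetch each RSS feed, print the latest titles.
    # The OPML file and feed URLs are placeholders, not a real service.
    import urllib.request
    import xml.etree.ElementTree as ET

    def feeds_from_opml(opml_path):
        tree = ET.parse(opml_path)
        # OPML stores one feed per <outline xmlUrl="..."> element.
        return [node.get("xmlUrl") for node in tree.iter("outline") if node.get("xmlUrl")]

    def latest_titles(feed_url, limit=3):
        with urllib.request.urlopen(feed_url) as response:
            rss = ET.parse(response)
        # RSS 2.0 layout: <rss><channel><item><title>...</title></item>...</channel></rss>
        return [item.findtext("title", "") for item in rss.iter("item")][:limit]

    if __name__ == "__main__":
        for url in feeds_from_opml("subscriptions.opml"):
            print(url)
            for title in latest_titles(url):
                print("  -", title)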

Critics and alternatives
There are some critics of the movement towards transferring software online. One of the most vocal is Liam Breck. As he points out in his blog, ‘Web 2.5’, there are some very real issues of security surrounding the storage of data online: the record of search engines for respecting the privacy of users is arguably patchy. More fundamentally, he contends that Ajax, like earlier experiments with ASP, is concerned with ‘pushing personal computing up to the web, rather than bringing the web down to personal computers’ (3). Breck’s personal vision is a web model that reverses the client-server relationship altogether, utilising millions of personal ‘mini servers’. A prototype of one of these servers is airWRX. The idea is that the program is downloaded to a USB flash device and works as mobile webspace, using XML and Flash to communicate with authenticated client computers. The services offered are quite similar to those boasted by the new online office programs: instant WYSIWYG formatting, keyword searches and ease of collaboration. Unlike services like Writely that mimic Microsoft Office layouts, however, the airWRX layout is described as analogous to a spiral-bound notepad for a single user - displaying a single page but easy to flip forwards or backwards - and to a whiteboard for multiple users viewing the same screen. Breck also proposes alternatives to web-based services such as the ‘Pocket Wiki’, although, as one comment pointed out, the problem with offline alterations that are uploaded in stages is the potential for more than one version of a document to exist, each having been modified differently by different users.

While Breck’s services resolve some of the security issues surrounding creating and storing data online, even skeptics are coming to accept that offline services lack the immediacy and potential for collaboration of Web 2.0, while personal computers lack the storage space, computing power, and organizational tools that it affords (4). The research that goes into the creation of a scientific paper is increasingly taking place online: databases are used to find drug candidates or test binding interactions, online bibliographic services to identify references, and bookmarking services to store those references online. When a paper is produced, it is increasingly read and referenced online. It would seem logical that the next stage in this evolution is a move towards creating research articles online. However, the remaining issues surrounding security must be resolved before confidential data can be entrusted to online storage, and the questions of long-term data storage and of continuity in web services must also be addressed: as a BBC commentator puts it, ‘If Web 2.0 is the first stage in a revolution, we need to make sure it's a permanent revolution’ (4).

References
(1) For an in-depth analysis see: Chris Justus, “Google Suggest Dissected ”. 04/12/04 Server Side Guy.
(2) Jesse James Garrett. “Ajax: A New Approach to Web Applications”
(3) Liam Breck. “Web 2.5 is the personal web”, Web 2.5 blog, 05/12/05.
(4) Bill Thompson. “Learning to Love Web 2.0”, BBC technology website 27/03/06.

(top)

*The Wide World of Wikis PDF


For many, the wiki has been a surprise hit among the up-and-coming online publishing technologies. The radical concept of a website that anyone and everyone can edit was initially greeted by some with scepticism, but has rapidly proved itself to be a winner. The huge success of the online encyclopaedia Wikipedia (http://www.wikipedia.org) and the highly-publicised wrangles over its merits have propelled wikis into the media spotlight in recent months.

What’s a wiki?
A wiki is an open-access website that all-comers can view and edit, often without needing to register. The essential features are unrestricted editing by users, cumulative revision of articles rather than previous versions being deleted, and rapid quality checking (1). The basic philosophy of wikis involves harnessing the collective brain power of experts from around the world to continuously update and refine their content. Wiki systems encourage users to closely monitor changes, and present a forum for discussing inevitable clashes of opinion as and when they arise.


Box 1: Selected examples of science and technology wikis
*Cosmopedia (http://www.cosmowiki.org/index.php/Main_Page): a physical science resource and encyclopaedia that started in December 2005.
*EvoWiki (http://wiki.cotch.net/index.php/Main_Page): a free reader-built encyclopaedia of evolution, biology and origins, which aims to promote general evolution education, and to provide mainstream scientific responses to the arguments of creationism and other antievolutionists.
*Quantiki (http://cam.qubit.org/wiki/index.php/Main_Page): a free-content resource in quantum information science.
*Qwiki (http://qwiki.caltech.edu/wiki/Main_Page): a quantum physics wiki devoted to the collective creation of content that is technical and useful to practicing scientists in subjects including, but not limited to, quantum optics, quantum metrology, quantum control, quantum information and quantum computation.
*Wikiomics (http://wikiomics.org/wiki/Main_Page): a wiki for the bioinformatics community.

Some supporters have gone a step further when defining wikis, pointing out similarities with science itself. From this perspective, a wiki can be seen as a collaborative journey of a community of individuals with a shared passion, which self-corrects by peer review, and ultimately aims to explore and explain the world.

Wikis form part of the new generation of Internet technologies sometimes described collectively as ‘Web 2.0’. Central to the concept of Web 2.0 is user participation and the ‘radical trust’ required to entrust the production and control of information to the online community at large. Skeptics point out that such trust stands in for the traditional markers of credibility, which are in decline: the anonymity of the creators of the content of the new web, and its fluidity, mean that verifying the source of the material, establishing the date of its creation and assessing its objectivity become virtually impossible (2).

A brief history of wikis
Since the first wiki appeared in 1995, the technology has inspired an ever-growing body of private and public online knowledge bases (3). Wikipedia, which is probably the largest and best known, has mushroomed since its launch in 2001, and now includes almost 4 million entries in over 200 languages. The English-language version alone has more than 45,000 registered users, and up to 1,500 new articles were added daily during late 2005 (4).

The non-profit Wikimedia Foundation that hosts Wikipedia promotes many other wiki-based projects. These include the collaborative English-language dictionary Wiktionary (http://en.wiktionary.org/wiki/Main_Page), the Wikispecies (http://species.wikipedia.org/wiki/Main_Page) directory of life, and the Wikibooks (http://en.wikibooks.org/wiki/Main_Page) textbook collection. There is even an annual international wiki conference — with the next session to be held in Cambridge, Massachusetts, on 4–6 August (http://wikimania2006.wikimedia.org/wiki/Main_Page) — focusing on issues surrounding open-source software, free-knowledge initiatives and other wiki projects worldwide. Examples of further science and medicine wikis are listed in Box 1.

Wicked wikis
Despite the advantages of the wiki approach, it is clearly vulnerable to electronic ‘vandalism’ and problems with misleading content. Most wikis focus on the rapid correction of mistakes rather than their prevention, which allows users to introduce errors — albeit transiently — either by mistake or for their own dubious purposes (3).

Fighting wiki vandalism is an ongoing battle, and without adequate protection sites can easily become overwhelmed. In June 2005, the Los Angeles Times launched an innovative new online section, named the Wikitorial project, which allowed readers to rewrite its editorial column (5). The site was flooded with inappropriate material faster than the editors could remove it, and was shut down within days. Likewise, although most vandalism to Wikipedia is reportedly corrected within minutes (3), the site was forced to introduce a registration process for editors in December 2005 after detecting malicious changes to a biography (6) — a move seen by some as at odds with the basic wiki principle. Although new defences against vandalism are constantly evolving, deliberately inserted subtle errors continue to be the most problematic and insidious form of attack.

The thorny question of the accuracy of wikis recently hit the headlines when a bitter row erupted between two publishing giants, Nature Publishing Group and Britannica. An investigation carried out by Nature in December 2005 claimed that the scientific accuracy of Wikipedia did not substantially differ from that of the ‘gold standard’ reference work Encyclopaedia Britannica — clearly a great endorsement of the wiki principle (4). However, Britannica dismissed the Nature study as “misleading” and “completely without merit”, and called upon the journal to retract the “fatally flawed” report (7). Despite the surprising vehemence of this rebuttal, Nature has continued to stand by both its data and the conclusions of its report (8).

The issue of accuracy is clearly far from settled, but this clash illuminates the underlying concern that wikis could eventually supersede traditional publishing formats. In reality, however, they are just one of many online technologies that are challenging existing publishing models. The concern over the accuracy of wikis is also part of a wider debate about the reliability of information on the Internet, an issue that has existed since the inception of the web but that has grown more pressing with the potential for abuse of the radical trust invested in Web 2.0 participants. The Center for the Digital Future, which studies trends in Internet use in the US, has reported declining rates of accuracy in the information available on the web over the last three years, paralleled by an increasing tendency for users to mistrust information available online (9). Case studies of other participatory web technologies have also uncovered significant breaches of reliability: for example, a study of Amazon’s product reviews found that a large number had been copied wholesale between products (10).

This example raises another important issue: how participation in the new interactive web is controlled and manipulated. Despite Wikipedia’s open standards, the project is overseen by a number of staff editors who are responsible for removing inaccuracies and inappropriate entries. These editors are not required to identify themselves, and some argue that their control of the project extends beyond removing errors to enforcing ideological control by removing entries that they disagree with, or even perpetrating slander against those who oppose them (11). The question of the boundary between fact and opinion is a difficult one, and one that is bound to arise in some of the scientific wikis listed above: EvoWiki’s statement that it is intended to counter creationism will no doubt come under fire from those who argue that the latter should be accorded respect as an explanation for the origin of mankind. While information can be accumulated as a collaborative venture, one of the problems of wikis is that they obscure differences of opinion (11). Science is about the cumulative production of information, but it is born from a process of ongoing debate, which can be lost in the homogeneity of the wiki format.

In their favour, wikis are free, have almost unlimited scope, are instantly updatable and carry interactive links to numerous other sources. Yet, at present, traditional books and journals are still perceived by most to be more authoritative and reliable.

A Wiki World?
Wikis clearly have the makings of high-quality global resources if the issues surrounding vandalism and accuracy can be settled. The calibre of the individuals contributing to and monitoring wiki entries will also remain of paramount importance, as recognized in Nature’s call for researchers to contribute their expertise in order “to push forward the grand experiment that is Wikipedia, and to see how much it can improve” (12).

The scope of wikis is almost limitless and doubtless much of their potential remains untapped at present. However, during its short history, Wikipedia has achieved massive popularity as an online information resource — ranking 17th among the global top 500 most-visited websites according to the Alexa web-ranking service (http://www.alexa.com/site/ds/top_500) in April 2006. Time will tell whether wikis will co-exist alongside established information resources or whether the future will see a truly wiki world.

References
1. Guest, D. G. (2003) Four futures for scientific and medical publishing. It’s a wiki wiki world. British Medical Journal 326: 932.
2. Shaker, Lee. (2006) In Google we trust. First Monday, 11 (4) URL: http://firstmonday.org/issues/issue11_4/shaker/index.html
3. Wikipedia. Entry for Wiki (last modified 11 April 2006) http://en.wikipedia.org/wiki/Wiki.
4. Giles, J. (2005) News. Special Report. Internet encyclopaedias go head to head.
Nature 438: 900–901 (doi:10.1038/438900a) http://www.nature.com/nature/journal/v438/n7070/full/438900a.html.
5. Glaister, D. (22 June 2005) LA Times ‘wikitorial’ gives editors red faces. Guardian Unlimited http://technology.guardian.co.uk/online/news/0,12597,1511810,00.html.
6. Associated Press (5 December 2005) Wikipedia tightens the reins. Wired News http://www.wired.com/news/technology/0,1282,69759,00.html
7. Britannica (March 2006). Fatally flawed: refuting the recent study on encyclopedic accuracy by the journal Nature. http://corporate.britannica.com/britannica_nature_response.pdf
8. Editorial (30 March 2006) Britannica attacks... and we respond. Nature 440: 582 (doi:10.1038/440582b) http://www.nature.com/nature/journal/v440/n7084/full/440582b.html
9. Center for the Digital Future. 2005 Report. University of Southern California. http://www.digitalcenter.org/pages/current_report.asp?intGlobalId=19
10.David, Shay and Pinch, Trevor (2006). Six degrees of reputation: The use and abuse of online review and recommendation systems. First Monday, 11(3). http://www.firstmonday.org/issues/issue11_3/david/index.html
11. Orlowski, A. (13 April 2006) A thirst for knowledge. Guardian Unlimited http://technology.guardian.co.uk/weekly/story/0,,1752257,00.html
12. Editorial (14 December 2005) Wiki’s wild world. Nature (doi:10.1038/438890a) http://www.nature.com/nature/journal/v438/n7070/full/438890a.html

(top)

*P2P Networks 4 Science PDF


A peer-to-peer (or P2P) computer network is based on the concept of pooling the computing power and bandwidth of the participants in the network, eliminating the distinction between servers and clients. Whereas transferring large files to multiple users puts a strain on central servers, pooling the storage space, bandwidth, and computing power of the network’s computers means that the capacity of the network increases as more members join. Such networks are particularly useful for sharing files, and have attracted controversy for copyright violations, most famously involving the Napster network, set up to share music files.

After the legal problems encountered by Napster, many file-sharing networks converted to a fully decentralized system, intended to prevent the failure of any single node from inducing the collapse of the entire network. The best-known example of this type of file-sharing network is Gnutella, which makes use of servers only to connect peers. Other peer networks, such as OpenNap, have partially adopted the decentralized model while retaining some centralized features, using servers for search functions. New systems of copyright have also begun to be developed to fit the new concept of file sharing. These include the Creative Commons licenses, designed to cover a range of intellectual property, and the GNU license, which allows users to make modifications to open source computer programs. P2P networks are organized in a variety of ways, but many enforce some type of protocol on members, governing the extent of sharing required or rating members on their sharing record and limiting or extending their download privileges accordingly. An alternative approach is to adapt the P2P model to a closed network for a specific group of people, for example to replace a central office server. An example of this is OnSystems, which sells software enabling the creation of restricted peer networks.
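To illustrate how a fully decentralized network can answer a query without any central index, the toy Python simulation below floods a keyword search from peer to peer with a time-to-live counter, in the spirit of Gnutella-style query propagation; it is an in-memory illustration, not the real protocol.

    # Toy simulation of decentralized, Gnutella-style query flooding.
    # Each peer only knows its neighbours; a query hops outwards until its
    # time-to-live (TTL) runs out. This is an illustration, not the real protocol.
    class Peer:
        def __init__(self, name, files):
            self.name = name
            self.files = set(files)
            self.neighbours = []

        def connect(self, other):
            self.neighbours.append(other)
            other.neighbours.append(self)

        def query(self, keyword, ttl=3, seen=None):
            seen = set() if seen is None else seen
            if self.name in seen or ttl < 0:
                return []
            seen.add(self.name)
            hits = [(self.name, f) for f in self.files if keyword in f]
            for peer in self.neighbours:                 # forward the query onwards
                hits.extend(peer.query(keyword, ttl - 1, seen))
            return hits

    a, b, c = Peer("a", ["genome.fasta"]), Peer("b", []), Peer("c", ["genome_notes.txt"])
    a.connect(b)
    b.connect(c)
    print(a.query("genome"))   # -> [('a', 'genome.fasta'), ('c', 'genome_notes.txt')]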

Box 1: Peer-to-Peer Network Resources

The P2P Foundation maintains a wiki-based site, blog and newsletter with up-to-date news on the progress of P2P as well as some interesting reflective pieces concerned with the implications of P2P for Internet governance.

P2P Science, a site focusing on the applications of P2P networks to science, contains links to software, blogs, discussion forums and events.

The Science Creative Commons supplies open source licenses designed to allow scientists to share their work while preventing copyright violations.

CacheLogic, a commercial provider of P2P network solutions, has some useful background and interesting research into P2P issues, including recent surveys showing a shift towards legitimate usage of P2P networks.

Despite these developments, peer-to-peer networks remain controversial, and opinion is still divided between those who advocate them as a form of liberation from the centralized control of the Internet and those who regard them as facilitating theft of intellectual property. The controversy surrounding copyright has perhaps delayed the application of P2P principles outside the entertainment business. However, file-sharing protocols such as BitTorrent, which can speed up the transfer of large data sets such as those involved in the human genome and phenome projects, are evidently applicable to academic collaboration (1).

P2P networks have already been harnessed to identify drug candidates through projects such as Think, begun in 2001 as a collaboration between Oxford University’s Centre for Computational Drug Discovery, the National Foundation for Cancer Research, and two software companies, in which a program running as a screen saver tested the binding interactions of proteins against a bank of small-molecule drug candidates (2). This project led to a variety of similar projects, now coordinated by Grid.Org.

As well as involving large data sets, bioinformatics is a discipline in which the proliferation of software for analysing those data often makes it difficult for researchers to keep up with the latest tools or to select the most appropriate one. Chinook, a P2P project sponsored by Genome Canada, aims to unify access to alignment software and facilitate comparisons. To this end, Chinook is freely available software designed for self-administered scientific communities of various sizes to run and compare the various programs available for computational biology in a P2P setting. Chinook uses a Java client or a Perl engine, but is also available as a web application, Chinook Online (3).

Box 2: Peer-to-Peer Software

Open P2P provides free downloads and documentation for a wide range of P2P software and a host of useful links.

Linux P2P provides a list of P2P software compatible with Linux and GNU operating systems.

FolderShare allows the creation of a searchable file-sharing network. Acquired by Microsoft in 2005, it is now available as a freely downloadable plugin.

Google Desktop provides a similar service, although critics have warned that it will compromise users’ privacy, as documents retrieved during searches of local computers will be stored on Google’s own servers to enable the search to be run (4).

AllPeers is soon-to-be-released free file-sharing software that has been attracting some attention from the P2P community.

Digital Bicycle is another interesting project in the development stages that plans to combine BitTorrent file sharing protocols with an open-source content management system, Drupal, and RSS syndication.

The utility of file sharing for collating academic information is being increasingly acknowledged. A UN declaration from the Civil Society Science Information Working Group stated that peer-to-peer networks should be promoted as a means ‘to share scientific knowledge and pre-prints and reprints written by scientific authors who have waived their right to payment’.

One of the first attempts to put this declaration into practice on a multi-institutional level comes from LionShare. LionShare is a project of the Pennsylvania State University, which is developing its software in collaboration with the open source authentication project Shibboleth, MIT’s Open Knowledge Project, researchers at Simon Fraser University and the Peer-to-Peer Working Group. LionShare includes personal servers to store individuals’ files and networking to support peer-to-peer file sharing. To avoid the obvious disadvantage of P2P networks, that the files available depend on the particular users connected at a given time, LionShare uses ‘peer servers’ to aggregate documents from small and large groups, and to provide a persistent mirror for files to prevent blockages due to the disconnection of personal computers. The project also involves building on Gnutella’s protocol to develop more advanced search functions. The most ambitious part of the project, however, is to facilitate collaboration between separate academic institutions, using Shibboleth to overcome the traditional barrier of differing end-user identities. Unlike most P2P networks, LionShare requires users to identify themselves, making violations of copyright standards less likely.

The goal of LionShare’s Connecting Learning Object Repositories work parallels previous initiatives, in particular the work of the Scholarly Publishing and Academic Resources Coalition (SPARC), which promotes the concept of institutional repositories, in which academics are encouraged to archive their scholarly output. Like institutional repositories, LionShare’s protocol shifts the burden of archiving and indexing work towards individual members of the institution. Both systems allow various levels of groups with different authentication privileges, and both have the eventual aim of enabling access to resources between institutions. LionShare’s 2004 white paper proposes that users will be able to access documents stored in repositories, as well as those on peer servers and on the user’s own computer, using a single search function, the Secure eduSource Communication Layer (ECL). It will be interesting to see whether the two models can converge, perhaps by evolving short- and longer-term storage and retrieval solutions for electronic material.

References

1) Church, G. M. “The Personal Genome Project”. Editorial, Molecular Systems Biology doi:10.1038/msb4100040
Published online: 13 December 2005.

2) MacFarlane, John. “PCs enlisted to cure cancer”. Nature Medicine 7, 517 (2001) doi:10.1038/87813.

3) Montgomery, Stewart et al. "An application of peer-to-peer technology to the discovery, use and assessment of bioinformatics programs". Nat Meth 2 (8), 563 (Aug 2005).
doi:10.1038/nmeth0805-563

4) Steward, Sid. "FolderShare remote computer search: better privacy than Google Desktop?" Posted to the O'Reilly Weblog, Feb. 15, 2006.

(top)

*Broadening Scientific Horizons with the Semantic Web PDF

Scientists are heavily reliant upon current Web technology for researching topics of interest, retrieving information from databases and ordering scientific goods. Online versions of journals communicate research in a format that is readily and rapidly accessed, while preprints ensure that the very latest results are made available within weeks of submission.

However, the vast array of Web-based information can be a hindrance, particularly when researching an unfamiliar area. Although search engines provide some guidance, the unwary scientist has no way of identifying from the thousands of possibilities those sites that describe a scientific theory, a case study, or even a link to purchase a particular product. The Web is also currently limited in its ability to integrate information from multiple sites and to search for nontextual information.

What are the advantages of the Semantic Web?

The Semantic Web has been designed to overcome these difficulties and improve communication between people using different terminologies. It aims to enable interactions with multimedia and to enhance collaborative and interdisciplinary work. To do this, it will effectively make the content of the Web machine-friendly so that the power of computers can be harnessed to search, integrate and act on information in a standardised way.

The Semantic Web will use new Web languages to enhance machine comprehension. Currently, most of the Web is written in HTML, which is appropriate for describing and presenting structured text and images but less useful in classifying items or distinguishing discrete pieces of information.

The new languages are based on RDF (Resource Description Framework) and utilise the document-tagging abilities of XML (Extensible Markup Language) and the descriptive properties of OWL (Web Ontology Language), which enables the identification of characteristics such as symmetry and of how objects are related to each other. More than a mere filing system, the Semantic Web will classify Web documents according to their content.
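As a small, hypothetical illustration of what such machine-readable statements look like, the sketch below uses the third-party rdflib library for Python to describe an experiment, its protocol and the location of a result image as RDF triples; the ex: vocabulary and the URIs are invented rather than taken from any published ontology.

    # Hypothetical RDF description of an experiment, built with rdflib
    # (a third-party Python library). The ex: vocabulary and URIs are invented.
    from rdflib import Graph, Literal, Namespace, URIRef

    EX = Namespace("http://example.org/science/")

    g = Graph()
    g.bind("ex", EX)

    experiment = URIRef("http://example.org/experiments/42")
    g.add((experiment, EX.usesProtocol, EX.WesternBlot))
    g.add((experiment, EX.usesReagent, Literal("anti-GFP antibody")))
    g.add((experiment, EX.hasResultImage, URIRef("http://example.org/images/blot42.png")))

    # Serialize the triples; a semantic search engine could ingest this directly.
    print(g.serialize(format="turtle"))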

The major implication of enhanced meaning is improved retrieval of information by search engines. These employ a number of methods to retrieve data, including searching metatags and HTML headings: as Brooks (2004) points out, this process often becomes a contest between web authors, who wish to see their websites featured high up in a page of search results, and the search engine’s goal of providing relevant information. Another method of ranking websites for relevance is based on the number of other sites that link to them, creating a page-rank system that works on a similar premise to that used by ISI to rank the impact of scientific papers. Various other systems have developed from linking this concept to the identity of the user: for example, Amazon’s use of recommendations to create personalised pages based on customers’ profiles. Social bookmarking tools such as Connotea or CiteULike also link users directly with one another through the sharing of tags added to resources. Yahoo, which recently took over another social bookmarking site, del.icio.us, has already developed a basic semantic search based on a continuum between shopping and research within which websites are categorised.
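The link-based ranking idea can be sketched in a few lines. The example below is a simplified, generic PageRank-style iteration over an invented link graph, not the actual algorithm of any search engine, but it shows how pages that attract more inbound links float to the top.

    # Simplified, generic link-based ranking (PageRank-style power iteration)
    # over an invented link graph; real search engines use far more signals.
    def rank(links, damping=0.85, iterations=50):
        pages = set(links) | {p for targets in links.values() for p in targets}
        score = {p: 1.0 / len(pages) for p in pages}
        for _ in range(iterations):
            new = {p: (1 - damping) / len(pages) for p in pages}
            for page, targets in links.items():
                if targets:
                    share = damping * score[page] / len(targets)
                    for t in targets:
                        new[t] += share
                else:
                    for t in pages:                      # dangling page: spread evenly
                        new[t] += damping * score[page] / len(pages)
            score = new
        return sorted(score.items(), key=lambda kv: -kv[1])

    links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    print(rank(links))   # page "c", with the most inbound links, ranks highest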

Currently, the first barrier to the further development of semantic searching is search engines’ lack of access to material held within databases, which creates a distinction between documents and data. This problem can be solved by the emerging XML databases, which allow documents held within the databases to be tagged and searched in the same way as data displayed on web pages. This approach also allows sources to be combined in innovative ways, creating ‘mashups’: new types of service that combine two or more services to achieve innovative results. There is, of course, some resistance to the idea of opening databases to the crawling of search engines, and especially to allowing one site’s information and resources to be used alongside those of competitors. However, as search engines shift tactics towards semantics, the majority of web developers and publishers may have little choice but to follow.


The second major issue in the transition to the semantic web is the translation of current web resources into the new languages. Some have argued that this will be a difficult task: as most current web languages do not contain semantic markers, human intervention will be necessary to attach meaning. However, meaning can be inferred from the structure of certain existing markers, such as BibTeX and some JavaScript. The Simile Project is developing ‘RDFizers’, open source tools that translate specific types of information, for example weather reports for US zip codes, into RDF. Efforts like these could ease a potentially troublesome transition between the web of today and its semantic successor.

What effect will the Semantic Web have on scientific publishing?

Scientists and medical professionals are already coming to see the advantages of the semantic web for marking up, retrieving, and sharing their work. A coalition, the HCLSIG, has been formed to promote the use of the semantic web to improve scientific communication.

Although XML already enables users to inform others that a document describes, say, an experiment, the terminology of the Semantic Web will enable the authors of scientific papers to use markup languages to state that the experiment was carried out using a particular protocol, with specific reagents, and that images of the results are available at a distinct location on the Web. All these terms will relate to each other in a manner that is understood by the computer, so that improved search engines will readily be able to find them. Using these details, findings can then be grouped together in innovative ways that are more useful than traditional presentation within a journal. For example, BioDASH, a prototype of a drug development dashboard, uses the concept of the ‘therapeutic topic model’ to search for and group together information on the progression, components, molecular biology, and pathway knowledge of a disease.

The final stage of data organisation in the Semantic Web is storing or marking information so that it can be retrieved as required. This is something that can be achieved to some extent by the user, regardless of whether the web information originally accessed is formatted for the semantic web, by using Piggy Bank, a new plug-in for Firefox. Piggy Bank works by extracting or translating web scripts into RDF information and storing it on the user’s computer. This information can then be retrieved independently of the original context and used in other contexts, for example by displaying it with Google Maps. Piggy Bank works with a new service, Semantic Bank, which combines the idea of tagging information with the new web languages.

Semantic web resources

BioDASH: http://www.w3.org/2005/04/swls/BioDash/Demo/

Piggy Bank: http://simile.mit.edu/piggy-bank

RDFizers: http://simile.mit.edu/RDFizers/

Simile Project homepage:
http://simile.mit.edu/

Semantic Bank: http://simile.mit.edu/bank/

Semantic Web for Health Care and Life Sciences Interest Group (HCLSIG): http://www.w3.org/2001/sw/hcls/

World Wide Web Consortium Semantic Web resources: http://www.w3.org/2001/sw/

Conclusions

Researchers will benefit from the closer interdisciplinary links provided by the unifying technology of the Semantic Web. Communicative difficulties will be resolved by the ‘translation’ of terms and concepts, allowing one group of scientists to interact with data produced by another group. Traditional concerns about sharing data could perhaps be allayed by the development of new ways of permanently protecting the copyright of electronic data, by including encrypted information and digital ‘signatures’ to verify its authenticity (Berners-Lee et al. 2001). One study (Aleman-Meza et al. 2005) has even suggested the use of Semantic Web data to improve scientific integrity by detecting conflicts of interest when selecting peer reviewers.


The Semantic Web will provide a more universal and accessible web of resources. Although many scientists are unaware of these developments, their input would undoubtedly help channel the evolution of these technologies towards their future needs.

References

Aleman-Meza et al. (2005) Semantic Analytics on Social Networks: Experiences in Addressing the Problem of Conflict of Interest Detection. [Draft copy available at http://ebiquity.umbc.edu/_file_directory_/papers/237.pdf, accessed 23/01/06]

Berners-Lee, T. and Hendler, J. (2001) Scientific publishing on the semantic web. Nature 410: 1023-1024. [Available at http://www.nature.com/nature/debates/e-access/Articles/bernerslee.htm]

Berners-Lee, T., Hendler, J., and Lassila, O. (2001) The Semantic Web: A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities. Scientific American. [Available at http://www.ryerson.ca/~dgrimsha/courses/cps720_02/resources/Scientific%20American%20The%20Semantic%20Web.htm, accessed 23/01/06]

Brooks, T.A. (2004) The nature of meaning in the Age of Google. Information Research 9(3), paper 180. [Available at http://InformationR.net/ir/9-3/paper180.html, accessed 23/01/06]

Hendler, J. (2003) Enhanced: Science and the Semantic Web. Science 299: 520-521.

(top)

*The Monster Mash-Up PDF

The term ‘mash-up’ has come to refer to websites that collate information, presenting it as links on a single page or in a visual form (1). This is achieved by accessing information contained in selected databases via public interfaces, application programming interfaces (APIs), or RSS feeds. Various software packages are available for building mash-ups, including some open source (2), and an RSS feed service already provides updates on new mash-ups and ranks existing services (3). Mash-ups have so far been deployed to provide content on various topics, including news stories, business directories, and online shopping. A slightly different concept links a database into Google Maps to display the data. Some in the scientific community have also begun to embrace these new ways of displaying information.
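The basic pattern is simple to sketch: pull records from one source and join them with data from another before display. The Python example below is generic and hypothetical; the feed URL is a placeholder and the second ‘source’ is just a hard-coded lookup table standing in for whatever public API a real mash-up would call.

    # Generic sketch of the mash-up pattern: pull items from one source (an RSS
    # feed) and combine each with data from a second source before display.
    # The feed URL and the LOCATIONS lookup are placeholders, not a real service.
    import urllib.request
    import xml.etree.ElementTree as ET

    FEED_URL = "http://example.org/outbreaks.rss"                # placeholder feed
    LOCATIONS = {"H5N1": (30.0, 114.0), "H7N9": (31.2, 121.5)}   # invented second source

    def fetch_items(url):
        with urllib.request.urlopen(url) as response:
            tree = ET.parse(response)
        return [(item.findtext("title", ""), item.findtext("link", ""))
                for item in tree.iter("item")]

    def mash_up():
        rows = []
        for title, link in fetch_items(FEED_URL):
            for keyword, (lat, lon) in LOCATIONS.items():
                if keyword in title:                             # join the two sources
                    rows.append((title, link, lat, lon))
        return rows

    if __name__ == "__main__":
        for title, link, lat, lon in mash_up():
            print("%s (%.1f, %.1f) %s" % (title, lat, lon, link))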

iSpecies is an experimental resource developed by Roderic Page of the University of Glasgow, intended to explore the application of the mash-up concept to science. As a Nature review by Declan Butler (4) points out, this is part of a general movement towards developing interfaces that provide collation services, which includes Google’s maps database and, in scientific information, GenBank and UniProt. Initiatives to make information more readily available have been launched by the World Health Organisation (WHO) and the Global Biodiversity Information Facility (GBIF).

iSpecies already includes an impressive array of species, searchable by their scientific or common names (as a review in ResearchBuzz! (5) points out, these retrieve different results, so it is best to try both). The search retrieves sequence data from NCBI, images from Yahoo Images, and articles from Google Scholar. Another interesting feature of iSpecies is that it makes use of the Touchgraph feature developed by HubMed (see our review of HubMed). This represents related articles as a spider diagram, with nodes labelled with the names of the papers and connected to the central query by their descriptive tags. Clicking on a node expands the graph to reveal more results that share tags with that node. The full text or abstracts of articles can be accessed via a Google Scholar search. Here, it seems that the inclusion of some of HubMed’s other features could be useful, for example those that allow the user to access an article where they have an agreement with the publisher, or to order it from a library. As some of the links to the retrieved articles are already broken, another useful feature would be to include the DOI and a link to a CrossRef search to retrieve those articles that have been moved.

Page runs his own blog on iSpecies (6), with updates on the site and related technology and links to reviews. His suggestions for further development include using EXIF tags on images to hold metadata, which would certainly be a useful innovation both for those wishing to publish images online and for those who wish to make use of them, as copyright information would be easily accessible. Some way of widening the image search and then searching within the results could also be useful, as currently the search returns only five fairly generic images of the animal, which would not be of much use to a researcher looking for specific data such as maps of distribution or cellular-level images. This could lead towards ranking results from different search engines or databases according to user criteria: potentially a more useful tool than simple collation, given the differences between the results retrieved by different search engines.

Nature has also developed an experimental service to collate data on the avian flu outbreak using Google Earth. A similar service had been pioneered by the California Academy of Sciences for mapping the distribution of different species of ants. Such services certainly provide an interesting new way to retrieve and display data. However, Nature’s Declan Butler, who developed the avian flu service, also highlights the drawbacks of the system: the data used by Nature to construct the database are not freely available, but had to be requested from different sources and then transferred into the avian flu database (4). (In fact this means that, like the other services that use Google Maps, it is not strictly a ‘mash-up’ in the sense of collating more than one service, as the sources are collected in the database before the user runs a search.)
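Mapping a dataset onto Google Earth essentially means writing it out as KML, an XML dialect of placemarks. The sketch below, with invented records and only a minimal subset of the format, produces a file that Google Earth can open; real services would generate it from the kind of collated databases discussed above.

    # Hypothetical sketch: write outbreak records as a minimal KML file of
    # placemarks, which Google Earth (or Google Maps) can then display.
    # The records are invented; real data would come from the sources discussed above.
    import xml.etree.ElementTree as ET

    RECORDS = [("Reported outbreak, site A", 105.8, 21.0),
               ("Reported outbreak, site B", 100.5, 13.8)]   # (name, lon, lat)

    kml = ET.Element("kml", xmlns="http://earth.google.com/kml/2.1")
    doc = ET.SubElement(kml, "Document")
    for name, lon, lat in RECORDS:
        placemark = ET.SubElement(doc, "Placemark")
        ET.SubElement(placemark, "name").text = name
        point = ET.SubElement(placemark, "Point")
        # KML coordinates are written as longitude,latitude[,altitude]
        ET.SubElement(point, "coordinates").text = "%f,%f,0" % (lon, lat)

    ET.ElementTree(kml).write("outbreaks.kml", encoding="utf-8", xml_declaration=True)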

Butler points out that if barriers to accessing data were lifted and the data were placed in public databases, the information could be accessed directly by the search functions underpinning the mash-up system and the map updated automatically. This may be a slightly hypocritical view coming from Nature, which restricts much of its own information to subscribers, but restricted access is nevertheless a drawback for the mash-up concept, and it is easy to understand the concern of many web operators at their content being presented alongside that of rivals. However, the growing popularity of mash-ups could overcome this: as a Business Week article (7) points out, Yahoo capitulated to the use of its traffic data with Google Maps, a service it had initially blocked, and Amazon has gone further, freeing up access to its data as far as possible. The mash-up format certainly has benefits for those who wish their data to be accessed as well as for those accessing it, and this will allow scope for future expansion.

References

(1) For more information and links see the Wikipedia definition. URL: http://en.wikipedia.org/wiki/Mashup_(web_application_hybrid) (accessed 17/01/06).

(2) Open Source GIS. URL: http://www.opensourcegis.org/ (accessed 17/01/06).

(3) Programmable Web (2006). Mash-up Feed. URL: http://www.mashupfeed.com/

(4) Butler, Declan (2006). Mashups mix data into global service. Nature 439 (5). URL: http://www.nature.com/nature/journal/v439/n7072/full/439006a.html

(5) ResearchBuzz (10/01/06). iSpecies, Living Thing Roundup Engine. URL: http://www.researchbuzz.org/2006/01/ispecies_living_thing_roundup.shtml (accessed 17/01/06).

(6) Page, Rod (2006). iSpecies Blog. URL: http://ispecies.blogspot.com/ (accessed 17/01/06).

(7) Hof, Robert D (25/07/05). Mix, Match, And Mutate. Business Week.

(top)

*Will science blast into the blogosphere? PDF


Interactive online sites known as weblogs are one of a number of personal publishing tools that are currently propelling science communication towards uncharted territory. Weblogs allow researchers to share and debate data both before and after publication, and can reach a wider readership than many specialist journals. Blogging is part of a wider movement in which the Internet is changing from an online library into a highly interactive ‘social web’.(1)

The big bang
Despite their relatively recent emergence, weblogs are fast becoming an established part of online culture and their popularity is skyrocketing. During 2004, weblog readership increased by 58% in the United States.(2) By mid-2005, a new weblog was being created every 7.4 seconds(3) and the current total is thought to exceed 20 million sites.(1) A few pioneering scientists are championing personal weblogs as a means to communicate rapidly and collaboratively with a diverse audience, spark debate, promote a sense of unity among researchers and increase general scientific awareness,(4) but the community as a whole is approaching this technology with caution.


The power of weblogs is that millions of individuals can easily publish their ideas, allowing millions more to comment on them.(2) Entries are generally connected to other relevant posts and online resources, and are followed by a ‘comment’ button inviting responses. Weblogs created by individuals or groups can easily be crosslinked to create online communities.(5) RSS feeds can even deliver details of the latest postings direct to subscribers’ home computers and handheld devices.

To blog or not to blog?
So why are there so few scientific bloggers? Individual scientists might be reluctant to jump on the blogging bandwagon for fear of damaging their credibility and career prospects, or being scooped by rivals. Indeed, weblogs are viewed by some as distractions from real work — the online equivalent of coffee-room chatter.(1) Against this background, many scientists are currently blogging anonymously.


Yet scientists who frequent the so-called blogosphere argue that weblogs are a great way of keeping up to date with hot topics in science, and offer a forum for real-time discussion that can run alongside traditional peer-reviewed journals. Some academic researchers are experimenting with weblogs as a way of sharing laboratory data, and more sophisticated tools are becoming available to help organize this information.

The impact on science publishers
Although peer-reviewed publication undoubtedly remains the ‘gold standard’ in scientific communication, blogging has the potential to give a boost to traditional journals. For example, posting research findings on a weblog can give free access to a diverse audience and allow instant feedback.

Several publishers are now beginning to explore the possibilities of this technology. In most cases, weblogs are written by a journal’s staff, with contributions from published authors and experts in the field. The weblog format allows editors and readers to collaborate in discussing hot topics and putting new research into a broader context. In addition, authors can follow up on points that could not be included in their published manuscripts.


The first weblogs to be launched by science publishers appeared in autumn 2004. These included the companion weblog to Science Magazine’s Functional Genomics website, SFGblog, which carries molecular biology and genomics postings. Hot on its heels was The American Journal of Bioethics' companion weblog, which has been credited with allowing the journal to respond faster to public controversies, and influencing the reporting of mainstream media on ethical issues.(1)


This year saw the launch of OUPblog by Oxford University Press, which provides daily commentary on a wide spectrum of subjects, including science. In addition, three Nature Publishing Group weblogs went live in November 2005: Nascent, written by the Web Publishing Team, with the stated mission of “apply[ing] web technologies in new ways that promote the discovery and dissemination of scientific knowledge”; Free Association, written by the editors of Nature Genetics; and Action Potential, written by the editors of Nature Neuroscience (see our New Publications page for a brief review).

The blogosphere and beyond
The academic status of weblog postings remains unclear, and questions abound as to how they should be cited and archived. Implementing a formal peer-review process might help the scientific community to accept weblogs as supplements to traditional forms of academic discourse.
So will the future see a weblog accompanying every published scientific paper, with scientists regularly debating the latest research in real time on journal homepages?(3) Only time will tell. What is clear is that the experiences of the current pioneers are sure to influence whether science weblogs fizzle out or take off with a bang in the coming year.

Blog search engines
Given the proliferation of blogs, it is often hard to keep track of all those that may be relevant to a particular topic. The search engines listed below work on the same principle as the major web search engines, ranking blogs according to the number of sites that link to them. Some also incorporate tags that can be added by users. Once you have identified relevant blogs, most can be subscribed to, so that updates appear in your RSS reader.

*BlogPulse: This search engine focuses on ‘trends’: searching on an unrestricted keyword yields a graph mapping the discussion that has taken place around that theme over the period specified in the search query. Clicking on any part of the graph takes the user to the discussion taking place at that point. Another useful tool, for those concerned about the origins of blog-derived data, allows readers to find out more about the identity of the authors of a particular blog.

*Blogz: A search engine that finds posts according to keyword. It also ranks blogs according to popularity.

*Bloglines: Bloglines, an aggregator as well as a search engine, also incorporates some of the new social bookmarking technology. By creating a (free) profile, you can choose to share your subscriptions with others or to keep them private.

*Google Blog Search: Google extends its usual format to blogs, allowing the user to narrow the search down to words in a blog or post title, as well as to posts within a specific date range.

*PubSub: As well as storing blogs in your account, which can be keyworded for retrieval, PubSub offers the option of a sidebar for which the user specifies topics. Links to articles matching these topics then automatically appear in this bar as they are published.

*Technorati: Technorati searches on tags as well as keywords, which can be useful to avoid retrieving throwaway references to your search terms.

References
1. Butler, D. (2005) News Feature. Science in the Web Age: Joint Efforts. Nature 438: 548–549 (doi:10.1038/438548a).
2. Rainie, L. (2005) The State of Blogging. Pew Internet and American Life Project [www.pewinternet.org/pdfs/PIP_blogging_data.pdf].
3. Secko, D. (2005) The Power of the Blog. The Scientist 19: 37.
4. Gallagher, R. (2005) On Your Mark, Get Set, Blog! The Scientist 19: 6 [www.the-scientist.com/2005/8/1/6/1].
5. Godwin-Jones, R. (2003) Emerging Technologies. Blogs and Wikis: Environments for On-line Collaboration. Language Learning & Technology 7: 12–16.

(top)

*RSS Feeds: The Hunger for Science PDF

The Internet has revolutionized our ability to access the latest science news and research. Yet users can still end up wading through outdated or irrelevant information as they struggle to keep up to date. RSS is one of several new web-based technologies that are now delivering cutting edge science directly to subscribers.

RSS is variously defined as Rich Site Summary, Resource Description Framework Site Summary or Really Simple Syndication. This confusion reflects the fact that RSS is not a single technology, but rather a family of loosely related specifications developed by separate groups.(1) Since its inception in the mid-1990s, RSS has attracted interest as a simple way for users to track changes to their favourite websites.(2) Roughly 5% of Internet users in the United States actively received RSS feeds in 2004,(3) and approximately 4% of Internet users worldwide are currently estimated to make use of this technology.(4)

Box 1. Examples of science publishers providing RSS feeds

RSS feeds provide an automatic snapshot of the current state of a web page direct to subscribers.(2) In effect this information is ‘pulled’ towards the user, rather than being ‘pushed’ by e-mail. News and media websites worldwide have been quick to exploit this technology, but the scientific community has lagged behind. However, the past few years have seen a growing uptake of RSS among science publishers, with the distinctive orange icon becoming an increasingly common sight on the home pages of many major journals.(5)


*What are RSS feeds?
Put simply, RSS presents a structure for packaging brief headlines and links, and delivering them directly to the user. This information is then sent out to subscribers as an Extensible Markup Language (XML) file, commonly known as an RSS feed. Most are created according to the current standards, so they can easily be read by the majority of the available software clients, which are known as RSS feed readers or aggregators. These programs automatically check for new content on subscribed sites at user-determined intervals, and provide a consolidated list of updates in a single browser display or desktop application. RSS readers can be downloaded onto users’ home computers and an expanding range of handheld devices, including mobile phones, iPods and personal digital assistants.
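To make the mechanics concrete, the sketch below shows roughly what an aggregator does on each polling cycle, using the third-party Python library feedparser. The feed URL and the hourly interval are illustrative assumptions, not taken from any particular service.

```python
# A minimal sketch of an aggregator's polling loop.
# Requires the third-party 'feedparser' library (pip install feedparser).
# The feed URL below is purely illustrative.
import time
import feedparser

FEED_URL = "http://www.example.org/journal/rss.xml"  # hypothetical feed
seen_links = set()

def poll(url):
    """Fetch the feed and report any entries not seen before."""
    parsed = feedparser.parse(url)
    for entry in parsed.entries:
        if entry.link not in seen_links:
            seen_links.add(entry.link)
            print(entry.get("published", "no date"), "-", entry.title)

while True:
    poll(FEED_URL)
    time.sleep(3600)  # user-determined interval: check hourly
```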


*Why use RSS feeds?
The key feature of RSS feeds is that they notify subscribers immediately when material is added to a website of interest, so there is no longer the need to visit numerous websites to check for updates. Although this technology currently tends to run alongside traditional e-mail alerts, RSS feeds have numerous advantages. For example, users can easily subscribe to, and unsubscribe from, RSS feeds without having to log in to a website, so privacy is maintained and there is no password to remember. RSS feeds do not compete with spam e-mails or clog up in-boxes. They are also easy to manage, as RSS feed readers can display information from multiple sources on a single page, and filter and sort it in a variety of user-defined ways.


From a publisher’s perspective, RSS feeds effectively project their presence onto the desktop and beyond. They also make it easier for other websites to link to a journal’s content, as webmasters can automatically embed the latest headlines from RSS feeds into their own web pages.
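For the webmaster’s side, here is a minimal sketch of how the latest headlines from a feed might be turned into an HTML fragment for embedding in another page; again the feed URL is hypothetical and the feedparser library is assumed.

```python
# Sketch: turn the latest headlines from a journal's RSS feed into an
# HTML fragment that can be dropped into another site's page.
import feedparser
import html

def headline_fragment(feed_url, limit=5):
    parsed = feedparser.parse(feed_url)
    items = []
    for entry in parsed.entries[:limit]:
        items.append('<li><a href="%s">%s</a></li>'
                     % (html.escape(entry.link), html.escape(entry.title)))
    return "<ul>\n%s\n</ul>" % "\n".join(items)

# Hypothetical feed URL:
print(headline_fragment("http://www.example.org/journal/rss.xml"))
```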


What do science RSS feeds have to offer?
In addition to the obvious applications, such as alerting users to new content, many leading science publishers (see Box 1 for examples) are now offering syndicated RSS feeds for a range of additional services. These include news highlights, summaries of the latest research, lists of most viewed articles and even postings from science jobs databases.

RSS feeds have also recently begun to be exploited for transmitting scientific data sets. Murray-Rust and colleagues (6) at the Unilever Centre for Molecular Informatics, University of Cambridge, have led the way by creating a metadata-based alerting service for molecular content, known as Chemical Markup Language RSS (CMLRSS). RSS is likely to be increasingly used in this way as researchers become more familiar with the technology.
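As a loose illustration of the idea, and not a reproduction of the published CMLRSS schema, the sketch below shows how an RSS item might carry namespaced chemical metadata that an aware aggregator could pick out. The namespace URI and element names are placeholders invented for this example.

```python
# Illustrative only: the general idea is that an RSS <item> can carry
# namespaced chemical metadata alongside the usual title and link. The
# namespace URI and element names below are placeholders, not the real
# CMLRSS vocabulary.
import xml.etree.ElementTree as ET

FEED = """<rss version="2.0" xmlns:cml="http://example.org/cml-placeholder">
  <channel>
    <title>New structures</title>
    <item>
      <title>Caffeine</title>
      <link>http://www.example.org/structures/caffeine</link>
      <cml:molecule formula="C8H10N4O2"/>
    </item>
  </channel>
</rss>"""

NS = {"cml": "http://example.org/cml-placeholder"}
root = ET.fromstring(FEED)
for item in root.iter("item"):
    title = item.findtext("title")
    mol = item.find("cml:molecule", NS)
    print(title, "-", mol.get("formula"))
```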

Future perspectives
RSS feeds can flow into numerous products and services, and are closely associated with additional new technologies, such as weblogs and podcasts, which together are bringing a new immediacy to online science communication.

Although RSS feeds have so far tended to be free of advertisements, they could represent a valuable source of future revenue, particularly for open-access publishers. However, some see advertising as contrary to the basic ‘pull’ nature of RSS feeds, so this is by no means a clear-cut issue.


RSS feeds are just beginning to gain a foothold in science publishing, and it remains to be seen what overall penetration they will achieve. However, as online information continues to proliferate, more and more users can be expected to turn to RSS feeds to provide them with the latest science updates in handy bite-sized pieces.

Box 2: RSS Aggregators


The first step when subscribing to feeds is to select an aggregator, and feeds can be delivered in various ways. Desktop aggregators work through software installed on a computer that manages subscriptions; RSS feeds can be displayed either in a list view, similar to emails, or through a browser-based interface resembling a web-based RSS service but hosted on the local system. A list of desktop aggregators is available at RSS Specifications. These range from freely downloadable open-source aggregators such as FeedReader and Awasu, which simply deliver feeds and alert you to updates, to top-of-the-range readers such as NewzCrawler, which includes features such as voice synthesis, and IntraVnews, a state-of-the-art aggregator that turns Microsoft Outlook into an RSS reader. IntraVnews is free for individual and not-for-profit use. Software available for Macs includes NetNewsWire, and for GNOME, Straw.

Web-based aggregators are mostly free, although some, like NewsIsFree, offer subscription-based premium services. Popular choices are those that run alongside email services, such as My Yahoo! or My MSN. Other recommended sites include Bloglines and NewsGator. A third option is to download add-on software that allows you to view feeds in your web browser; an example is AmphetaDesk. A list of these types of readers is also available from RSS Specifications.

Finally, 3G mobile phones are gaining the technology to receive feeds. Mobile RSS readers can be set by the user to check for updates; the more advanced models are linked to a web server, where subscriptions are cached until they are required and then downloaded to the phone.

For more information, read Steve Shaw’s ‘15 ways to read an RSS feed’ or Wikipedia’s article on aggregators.

References
1. Hammond, T., Hannay, T. & Lund, B. (2004) The Role of RSS in Science Publishing. Syndication and Annotation on the Web. D-Lib Magazine (doi:10.1045/december2004-hammond).
2. Hammond, T. (2003) Why Choose RSS 1.0? [http://www.xml.com/lpt/a/2003/07/23/rssone.html].
3. Rainie, L. (2005) The State of Blogging. Pew Internet and American Life Project [http://www.pewinternet.org/pdfs/PIP_blogging_data.pdf].
4. Grossnickle, J. (2005) RSS—Crossing into the Mainstream [http://publisher.yahoo.com/rss/RSS_whitePaper1004.pdf].
5. Science and Engineering Library. RSS Feeds for Science and Engineering – Journals [http://scilib.ucsd.edu/webfeeds/journals.html].
6. Murray-Rust, P., Rzepa, H. S., Williamson, M. J. & Willighagen, E. L. (2004) Chemical Markup, XML, and the World Wide Web. 5. Applications of Chemical Metadata in RSS Aggregators. Journal of Chemical Information and Computer Sciences 44: 462–469.

(top)

*Podcasts PDF

Podcasting is a means of distributing audiovisual material on a subscription basis via the Internet that has rapidly gained popularity since 2004. The system employs RSS ‘feeds’, to which users subscribe using an aggregator or ‘podcatcher’ such as iTunes. Material is usually delivered in MP3 format and can be played on any digital audio player or computer. It is possible to subscribe to a podcast by pasting its URL into your aggregator, but most can be found by running a search within the aggregator. iTunes allows users to build up a library of subscriptions, in which new episodes appear as they are released. Many services are free, and those that are subscription-based offer various rates for institutions and individuals.

Podcasting is beginning to take off as a means of scientific communication. A handful of journals, including Nature, have launched their own services, there are a few quality services delivering scientific news and discussion, and there is some commercial use of the technology in medicine. Medicine was perhaps the first discipline to realise the potential applications of podcasting: a commercial service, McGraw-Hill’s Access Medicine, delivers updates and advice from top physicians on the basis of individual or institutional subscriptions, while a free alternative is provided by Critical Care Medicine.

The New England Journal of Medicine has also caught on to the podcast movement. Beginning in July with Dr Kenneth Arrow on making antimalarials available in Africa, the NEJM has used podcasting to make available interviews with key figures on subjects such as the ethics of drug patenting, the improvement of emergency medical care after the London bombings and, most recently, ‘genetic discrimination’. The approach of using podcasts to examine ethical issues has also been taken by some science news podcasts. One of the best of these is Science Friday, which uses its service to discuss issues such as intelligent design, stem cell research, and the US response to Hurricane Katrina.

Nature, which began podcasting at the beginning of October, uses its service to provide a summary of the highlights of the latest issue of the journal. Hosted by Dr Chris Smith, from Cambridge, the 20-minute weekly show covers the top stories and papers in the issue, with interviews with the corresponding authors or the editors responsible for the relevant sections of the journal.

The American Society for Microbiology has taken yet another approach, launching MicrobeWorld Radio, a daily 90-second discussion of microbiological news. The focus here is again fairly basic: for example, the latest discussion on slime molds (‘they’re here and they’re weird’) is a very populist take on the ‘blobs’, with a very brief interview and an overview. IT Conversations has some interesting non-technical podcasts, including interviews with figures at the forefront of the new Internet technology, such as Google’s co-founder Sergey Brin, JBoss’ Marc Fleury on using open source models in business, and a member of Yahoo’s research lab discussing their new contextual search facility.

Podcasting is a convenient new way to access resources, and the short overview, interview or discursive approach that most services have taken so far allows easy listening: as one scientist commented on the discussion forum Nobel Intent, podcasts can enliven routine lab tasks such as setting up assays. There is undoubtedly much room for development of the concept of podcasting.
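A minimal sketch of what a ‘podcatcher’ does behind the scenes, written in Python with the feedparser library, is given below. The feed URL is hypothetical, and real aggregators such as iTunes add subscription management and libraries on top of this basic loop.

```python
# Sketch of a bare-bones podcatcher: read a podcast RSS feed and download
# the MP3 enclosures for episodes not already saved locally.
# Requires the third-party 'feedparser' library; the feed URL is hypothetical.
import os
import urllib.request
import feedparser

FEED_URL = "http://www.example.org/podcast/rss.xml"  # hypothetical feed

parsed = feedparser.parse(FEED_URL)
for entry in parsed.entries:
    for enclosure in entry.get("enclosures", []):
        filename = os.path.basename(enclosure.href)
        if not os.path.exists(filename):
            print("Downloading", entry.title, "->", filename)
            urllib.request.urlretrieve(enclosure.href, filename)
```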
The concept of ‘learncasting’ has already appeared, in which audio material is supplemented by video and text to provide academic instruction or support; it was the subject of a recent seminar at London University’s London Knowledge Lab. Applications of ‘learncasting’ to science could include running instructions and demonstrations of methods, and author commentary on important papers. For more information and links see:
Wikipedia: http://en.wikipedia.org/wiki/Podcasting

(top)

*Connotea and Social Bookmarking PDF

Connotea is a free online bibliographic tool developed by Nature. Inspired by ‘social bookmarks managers’ such as del.icio.us, Connotea is a resource that allows scientists to collate and share their references online. The site, begun in late 2004, describes itself as ‘experimental’, and its innovative use of the concepts of open source technology, direct linking between references, and recent developments such as geotagging has already won it many followers. It was recently announced that Connotea had been awarded the Association of Learned and Professional Society Publishers (ALPSP) Award for Publishing Innovation. A similar service is provided by CiteULike, and the two services are working together to ensure compatibility.

Registration is quick and simple and does not require you to enter any personal details. Once you have registered, the entry page contains a link to your own site and a list of recent updates, links added by all visitors who have elected to make their choices visible. The abstracts of these can be viewed and the links copied into your own library, which is displayed as a link on the left of the screen.

To create your personal library, drag the appropriate Connotea ‘bookmarklet’ (these vary slightly between Internet Explorer, Safari and Firefox) into your bookmarks or favourites bar. This link can then be clicked when you are browsing the web to bring up a new window containing a form that should already be filled in with the address and name of the web page. It is compulsory here to add ‘tags’, unrestricted keywords that are used to retrieve references, and you also have the option of adding a personal description or note and specifying whether you are one of the authors of the piece. It is possible to bookmark most web pages in Connotea, but bibliographic information will as yet only be retrieved from Nature, PubMed, Amazon, and a few other sites.


An alternative method of adding an article is to retrieve the Connotea form and add the digital object identifier (doi) for the article. The information should then be retrieved automatically using CrossRef, the official doi registration agency. This function means that it is possible to quickly retrieve the reference for a print article that has an electronic counterpart with a doi.
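Connotea’s own lookup mechanism is not documented here, but the general idea can be sketched against CrossRef’s current public REST API (api.crossref.org), which resolves a doi to its bibliographic record. Treat the endpoint and field names as an assumption about one way of doing this, not as a description of Connotea’s implementation.

```python
# Hedged sketch: resolve a doi to basic bibliographic details via CrossRef's
# public REST API. This is one way to do a doi lookup, not Connotea's own code.
import json
import urllib.parse
import urllib.request

def crossref_lookup(doi):
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi)
    with urllib.request.urlopen(url) as response:
        record = json.load(response)["message"]
    return {
        "title": (record.get("title") or [""])[0],
        "journal": (record.get("container-title") or [""])[0],
        "authors": [a.get("family", "") for a in record.get("author", [])],
    }

# Example, using the doi of the Nature article cited elsewhere on this page:
print(crossref_lookup("10.1038/438548a"))
```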

Once you have added several items to your library, they are displayed as a list of links under ‘my library’. Clicking on a title will display the original page that you added to Connotea; the authors, PubMed ID and doi are listed in abbreviated form beneath the title, and clicking ‘info’ gives more in-depth information, such as the names of all authors. Your tags are listed to the left of the screen, where clicking any of them will display the associated links. When you select the same reference as someone else, that person’s username will appear as a ‘related user’ towards the bottom left of the screen, and you can then browse their library. The idea of sharing references provides an interesting alternative to sometimes endless literature searches and can channel your reading in unexpected ways, as other users will take different angles on the same areas.

One of the most useful features of Connotea is the capacity to export references in RIS format to a citation manager program, such as EndNote or Reference Manager. This means that you can save references when not working on a computer with such bibliographic software installed and import them for citing later. A possible future improvement would be to give Connotea similar functions to these bibliographic programs, allowing users to cite references directly from their libraries, perhaps hyperlinking back to the source or to Connotea.
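For readers unfamiliar with the format, the sketch below writes an illustrative RIS record of the kind a citation manager such as EndNote can import. The field values are taken from the D-Lib article cited at the end of this piece; the file name is arbitrary.

```python
# Illustrative only: a minimal RIS record, the plain-text format Connotea
# exports for citation managers. Each line is a two-letter tag, two spaces,
# a hyphen, a space and the value; 'ER' closes the record.
ris_record = "\n".join([
    "TY  - JOUR",                        # reference type: journal article
    "AU  - Lund, B.",
    "TI  - Social Bookmarking Tools (II): A Case Study - Connotea",
    "JO  - D-Lib Magazine",
    "PY  - 2005",
    "DO  - 10.1045/april2005-lund",
    "UR  - http://www.dlib.org/dlib/april05/lund/04lund.html",
    "ER  - ",                            # end of record
])

with open("connotea_export.ris", "w") as f:   # arbitrary file name
    f.write(ris_record + "\n")
```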

Connotea is working on the question of integrating other bibliographic systems by offering OpenURLs, through which users can access their institutions’ library databases to view online availability or print holdings of references. Additional features include a geotagging function, which allows you to use special tags to associate latitude and longitude coordinates with your articles and then view their geographical distribution using Google Earth (unfortunately this will not work on Apple Mac computers at present).
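Connotea’s exact geotag syntax is not reproduced here, but the sketch below shows the final step of the idea: turning a handful of latitude/longitude pairs into a KML file that Google Earth can open. The titles and coordinates are made up for illustration.

```python
# Sketch: write a KML file from geotagged references so that their
# geographical distribution can be viewed in Google Earth.
# The entries below are invented; the tag-to-coordinate step is assumed.
COORDS = [
    ("Example paper on coral reefs", -18.3, 147.7),   # (title, lat, lon)
    ("Example paper on Arctic ice", 78.9, 11.9),
]

placemarks = "".join(
    "  <Placemark><name>%s</name>"
    "<Point><coordinates>%f,%f</coordinates></Point></Placemark>\n"
    % (name, lon, lat)                    # KML coordinates are lon,lat
    for name, lat, lon in COORDS
)

kml = ('<?xml version="1.0" encoding="UTF-8"?>\n'
       '<kml xmlns="http://www.opengis.net/kml/2.2">\n<Document>\n'
       + placemarks + "</Document>\n</kml>\n")

with open("library.kml", "w") as f:       # open this file in Google Earth
    f.write(kml)
```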


Critics of the open source movement should also be appeased by Connotea’s policies, which combine freedom of information with the protection of intellectual property. Although the default option is to make links visible to all users, it is also possible to keep them private. The original source of all articles is acknowledged, and the use of the doi, a valuable means of identifying work on the Internet and protecting it from plagiarism, will hopefully encourage its adoption to become more widespread. As Connotea is a collection of links rather than a host for copies of papers, only references are stored: the full text remains in the original database, so an article can only be viewed by individuals who have access rights granted by the publisher or their institution.


In the ever-proliferating world of the Internet, the greatest problem users face is not finding information, but sorting, extracting and organising useful data from the mass of possible references. In doing this, the ability to communicate with peers is essential. Connotea is a useful resource that fulfils both functions and combines new Internet technology with heightened protection of information.

For more information see:

Lund, B. et al. (April 2005) Social Bookmarking Tools (II): A Case Study - Connotea. D-Lib Magazine 11 (4) (doi:10.1045/april2005-lund). URL: http://www.dlib.org/dlib/april05/lund/04lund.html

(top)

 

*HubMed PDF


A new way to search the PubMed database: HubMed labels itself ‘PubMed rewired’ and has been described as ‘PubMed on steroids’ and ‘the Swiss Army knife of PubMed interfaces’. Such labels are apt: the new interface is a simple but powerful research tool. Here we provide a brief guide to some of HubMed’s innovative features.

The search function appears very basic, with no advanced options. However, it is possible to search on titles, authors, keywords or the digital object identifier of an article, to restrict phrases using quotation marks or expand terms using the wildcard *, and to use the Boolean operators AND, NOT and OR. The only obviously lacking function is the ability to limit the search to a particular date range. Features such as PubMed’s journals database are also absent: the search concentrates on retrieving articles. However, there are some new and interesting options for storing, managing and exchanging comments on what is retrieved.
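Since HubMed searches the PubMed dataset, the same Boolean syntax can be illustrated against PubMed itself. The sketch below uses NCBI’s public E-utilities ‘esearch’ service rather than HubMed’s own interface, purely so that the example is runnable; the query string is arbitrary.

```python
# Hedged sketch: run a Boolean query of the kind described above against
# PubMed via NCBI's E-utilities (not HubMed's own interface).
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

def pubmed_search(term, retmax=10):
    base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    url = base + "?" + urllib.parse.urlencode(
        {"db": "pubmed", "term": term, "retmax": retmax})
    with urllib.request.urlopen(url) as response:
        root = ET.parse(response).getroot()
    return [elem.text for elem in root.findall("./IdList/Id")]

# Quoted phrase, wildcard and Boolean operators, as described above:
print(pubmed_search('"stem cell*" AND cardiac NOT review'))
```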


The menu displayed when the abstract of an article has been retrieved includes linking options for obtaining the full text through the publisher’s website, ordering it through Infotrieve, or linking to SFX by Ex Libris, which tells you whether the article is freely available anywhere online and links out to library databases, automatically running a search for the item. The selection of libraries is as yet small but important, including the Union Catalogue of British and Irish Libraries (COPAC), the Library of Congress, and the union catalogues of Canada and Sweden. By clicking ‘advanced’ here it is possible to arrange an inter-library loan or to download the citation into Reference Manager, EndNote, ProCite or RefWorks. ‘Citation’ and ‘BibTeX’ provide extra options for downloading the reference into bibliographic software, respectively exporting the citation as an RIS file or displaying it as a BibTeX record in a new window for use with LaTeX.

Some innovative options for viewing related articles are also available. ‘TouchGraph’ is a potentially very interesting feature that displays related articles as a web of links: clicking on individual nodes expands the graph to show articles related to them, and the ‘info’ button can be used to read abstracts and link back into HubMed. Unfortunately, none of the articles I tried this out on appeared to link to anything, indicating that, as good as the ideas behind the site are, their development will take time and input from researchers and publishers as well as the site’s developers.

Perhaps the most interesting feature of HubMed is that it allows users to customise retrieved information by adding ‘tags’, or keywords, to a piece, which will be retained. This is also useful when returning to the site from a different computer, as it is possible to retrieve tagged papers. It does require login, but this is free and only requires a username and password. ‘References’ is an interesting feature that allows you to post a comment on an article using ‘TrackBack’, software designed to promote ongoing peer discussion of articles by allowing readers to post comments on what they have read. To use this feature to initiate an ongoing discussion about a piece, you need to install the TrackBack bookmarklet (a Movable Type bookmarklet). This allows you to keep track of discussions going on across various different sites: you will be sent a ‘ping’ alert each time another reader responds to your comment. A good overview of this type of feature is given by Sam Ruby.
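For the curious, here is a hedged sketch of what sending a TrackBack ping involves, following the Movable Type convention of an HTTP POST with a few form-encoded fields. The URLs are hypothetical and HubMed’s own endpoints are not shown.

```python
# Hedged sketch of a TrackBack ping in the Movable Type style: an HTTP POST
# of form-encoded fields to the target's TrackBack URL. All URLs below are
# hypothetical placeholders.
import urllib.parse
import urllib.request

def send_trackback(ping_url, entry_url, title, excerpt, blog_name):
    data = urllib.parse.urlencode({
        "url": entry_url,        # where your comment or post lives
        "title": title,
        "excerpt": excerpt,
        "blog_name": blog_name,
    }).encode("utf-8")
    request = urllib.request.Request(
        ping_url, data=data,
        headers={"Content-Type": "application/x-www-form-urlencoded"})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")  # XML reply with an <error> code

# Example (hypothetical URLs):
# send_trackback("http://www.example.org/trackback/12345",
#                "http://myblog.example.org/2006/01/comment.html",
#                "A comment on this paper", "Brief excerpt...", "My lab blog")
```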

A final addition of note is that of ‘feeds’. A ‘feed’ is read with a news-reader or aggregator program, such as NewzCrawler (PC), AmphetaDesk (cross-platform), Radio UserLand (PC or Mac), RSSOwl (Windows) or Liferea (Linux). These use XML to check sites for RSS (Rich Site Summary) content. This means that you can sign up to receive news on a particular topic from any number of different websites, all of which will be displayed as links from a single browser window. An article explaining the concept of feeds in more depth is available from J.D. Lasica.

For more information see:
Yensen, J. A. P. (June 2005) Editorial: Almost Everything About HubMed. Online Journal of Nursing Informatics (OJNI) 9 (2) [Online]. URL: http://eaa-knowledge.com/ojni/ni/9_2/yensen.htm


(top)

 

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.
