- Open Access
A cross disciplinary study of link decay and the effectiveness of mitigation techniques
© Hennessey and Ge; licensee BioMed Central Ltd. 2013
- Published: 9 October 2013
The dynamic, decentralized World Wide Web has become an essential part of scientific research and communication. Researchers create thousands of web sites every year to share software, data and services, yet these valuable resources tend to disappear over time. The problem has been documented in many subject areas. Our goal is to conduct a cross-disciplinary investigation of the problem and test the effectiveness of existing remedies.
We accessed 14,489 unique web pages found in abstracts indexed in Thomson Reuters' Web of Science that were published between 1996 and 2010, and found that the median lifespan of these web pages was 9.3 years, with 62% of them archived. Survival analysis and logistic regression were used to find significant predictors of URL lifespan. The availability of a web page depends most on the time it was published and its top-level domain name. Similar statistical analysis revealed biases in current solutions: the Internet Archive favors web pages with fewer layers in the Uniform Resource Locator (URL), while WebCite is significantly influenced by the source of publication. We also created a prototype for a process to submit web pages to the archives, increasing coverage of our list of scientific web pages in the Internet Archive and WebCite by 22% and 255%, respectively.
Our results show that link decay continues to be a problem across different disciplines and that current solutions for static web pages are helping and can be improved.
Keywords: Optical Character Recognition, Uniform Resource Locator, Internet Archive, Naming Authority, Survival Regression Model
Link decay has been studied for several years in specific subject areas.
[Table flattened in extraction; it listed prior studies by the URL collections they examined, with columns for subject area (e.g. Biology & Medicine) and year(s) of the URLs (e.g. 2000, 2003, 2005); the pairings were not recoverable. The collections included:]
- Science curriculum web links
- Full text of 3 dermatology journals
- Sample of bibliographies published on PubMed
- References made in the Annals of Emergency Medicine
- References in 5 biomedical informatics journals
- MEDLINE titles & abstracts
- Internet citations in 5 health care management journals from 2002-2004
- Citations appearing in research articles in 6 leading communications journals
- URLs appearing in the full text of 4 Ecological Society of America journals
- Samples from a collection of born-digital law- and policy-related reports and documents
- Citations appearing in 3 leading Information Science journals
- Sample of citations appearing in library and information science journals
- URLs appearing in the full text of 2 well-respected historical journals
- Citations from articles in the Chinese Social Sciences Index
- Random collection of web URLs
- Citations in 3 highly circulated journals
- Supplementary information published in 6 top-cited journals
- Citations from conference articles
Some solutions have been proposed that attack the problem from different angles. The Internet Archive (IA) and WebCite (WC) address the issue by archiving web pages, though their mechanisms for acquiring those pages differ. The IA, which began with a partnership with the Alexa search engine, employs an algorithm that crawls the Internet at large, storing snapshots of the pages it encounters along the way. In contrast, WebCite archives only those pages which are submitted to it, and it is geared toward the scientific community. Both methods, however, can capture only information visible from the client; logic and data housed on the server are frequently unavailable.
Other tools, like the Digital Object Identifier (DOI) System and Persistent Uniform Resource Locator (PURL), provide solutions for when a web resource is moved to a different URL but is still available. The DOI System was created by an international consortium of organizations wishing to assign unique identifiers to items such as movies, television shows, books, journal articles, web sites and data sets. It encompasses several thousand "Naming Authorities" organized under a few "Registration Agencies" that have considerable flexibility in their business models. Perhaps 30-60% of link rot could be solved using DOIs and PURLs [11, 12]. However, they are not without pitfalls. One is that a researcher or company could stop caring about a particular tool for various reasons and thus not be interested in updating its permanent identifier. Another is that the party wanting the permanent URL (the publishing author) is frequently not the one administering the site itself over the long term, creating an imbalance of desire versus responsibility between the two. A third, in the case of the DOI System, is that registering an organization may carry a cost in money and time that could be prohibitive to authors who don't already have access to a Naming Authority. One example of a DOI System business model is that of the California Digital Library's EZID service, which charges a flat rate (currently $2,500 for a research institution) for up to 1 million DOIs per year.
In this study, we ask two questions: what are the problem's characteristics in scientific literature as a whole, and how is it being addressed? To assess progress in combating the problem, we evaluate the effectiveness of the two most prevalent preservation engines, the Internet Archive and WebCite, and examine the effectiveness of one prototyped solution. If a URL is published in the abstract, it is assumed to play a prominent role within that paper, similar to the rationale proposed by Wren.
Comparison of certain statistics based on the subject of a given URL.
[Table flattened in extraction; its columns included "# Alive (%)" and "Median survival with 95% CI in years" for the subjects Biochemistry & Molecular Biology, Biotechnology & Applied Microbiology, Biochemical Research Methods, Mathematical & Computational Biology, Genetics & Heredity, Statistics & Probability, and Astronomy & Astrophysics; cell values were not recoverable.]
How common are published, scholarly online resources? For WOS, both the percentage of published items containing a URL and their absolute number have increased steadily since 1996, as seen in Figure 1. Simple linear fits put the former's annual increase at a conservative 0.010% per year with an R2 of 0.98, and the latter's at 174 papers per year with an R2 of 0.97.
A total of 189 (167 unique) DOI URLs were identified, comprising 1% of the total, while 9 PURLs (8 unique) were identified. Due to cost, it is likely that DOIs will remain useful for tracking commercially published content, but not for scholarly online items independent of those publishers.
Results of fitting a parametric survival regression using the logistic distribution to the unique URLs.
[Table flattened in extraction; its rows included the predictors Log2(TimesCited + 1) and the presence of funding text, plus journal indicators such as Comp. Physics Comm. and Nucleic Acids Research; coefficient values were not recoverable.]
Predictors of availability
For live web availability, the most deviance was explained by the last year a URL was published (42%) followed by the domain (26%). That these two predictors are very important agrees with much of the published literature thus far. For the Internet Archive, by far the most important predictor was the URL depth at 45%. Based on this, it stands to reason that the Internet Archive either prefers more popular URLs which happen to be at lower depths or employs an algorithm that prioritizes breadth over depth. Similar to the IA, WC had a single predictor that accounted for much of the explained deviance, with the publishing journal representing 49% of the explained deviance. This may reflect WC's efforts to work with publishers as the model shows one of the announced early adopters, BioMed Central , as having the two measured journals (BMC Bioinformatics and BMC Genomics) with the highest retention rates. Therefore, WC is biased towards a publication's source (journals).
Archive site performance
Another way to measure the effectiveness of the current solutions to link decay is to count "saved" URLs: those no longer accessible on the live web but available through an archival engine. Of the 31% of URLs (33% of the unique) which were not accessible on the live web, 49% (47% of the unique) were available in at least one of the two engines, with IA holding 47% (46% unique) and WC 7% (6% unique). WC's comparatively lower coverage can likely be attributed to a combination of its requirement for human interaction and its still-growing adoption.
To address this discrepancy, URLs that were still alive but missing from one or both of the archives were submitted programmatically, using the information gleaned from probing the sites as well as the archives. This meant submitting 2,662 URLs to the Wayback Machine and 7,477 to WebCite, of which 2,080 and 6,348 were successful, respectively.
Submission of missing URLs to archives
Each of the archival engines had its own nuances when it came to archiving missing URLs. For the Internet Archive, the lack of a practical documented way of submitting URLs (see http://faq.web.archive.org/my-sites-not-archived-how-can-i-add-it/) necessitated trusting a message shown by the Wayback Machine when one finds a URL that isn't archived and clicks the "Latest" button. In this instance, the user is sent to the URL "http://liveweb.archive.org/<url>", which has a banner proclaiming that the page "will become part of the permanent archive in the next few months". Interestingly, as witnessed by requests for a web page hosted on a server whose logs the authors could monitor, only the items requested by the client were downloaded. This meant that if only a page's text were fetched, supporting items such as images and CSS files would not be archived. To archive the supporting items and avoid duplicating work, wget's "--page-requisites" option was used instead of a custom parser.
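The submission approach described above can be sketched in Python. This is a hedged illustration, not the study's actual script: the helper names and all wget flags other than "--page-requisites" are assumptions.

```python
import subprocess

# Requesting a page through the liveweb proxy causes it to be queued for
# the permanent archive (per the banner described in the text).
LIVEWEB_PREFIX = "http://liveweb.archive.org/"

def archive_command(url):
    """Build a wget invocation that fetches `url` via the liveweb proxy,
    including supporting items (images, CSS) so they get archived too."""
    return [
        "wget",
        "--page-requisites",   # fetch images, CSS, etc., not just the HTML
        "--quiet",
        "--delete-after",      # the proxy only needs to *see* the requests
        LIVEWEB_PREFIX + url,
    ]

def submit_to_internet_archive(url):
    # Note: wget's exit status is not a reliable success indicator here;
    # the study judged success by a later survey of the archive instead.
    return subprocess.call(archive_command(url))
```

The command builder is separated from the subprocess call so the construction logic can be checked without touching the network.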
WebCite has an easy-to-use API for submitting URLs, though limitations encountered while submitting our dataset presented some issues. The biggest was WebCite's abuse detection, which would flag the robot after it had made a certain number of requests. To account for this, and to be polite users generally, we added logic to ensure a minimum delay between archival requests submitted to both the IA and WC. Exponential delay logic was implemented for WC when encountering general timeouts, other failures (such as MySQL error messages) or the abuse logic. Eventually, we learned that certain URLs would cause WC's crawler to time out indefinitely, requiring the implementation of a maximum retry count (and a failure status) when the error wasn't caused by the abuse logic.
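A minimal Python sketch of the delay and exponential-backoff behavior described above; the delay constant, the retry limit and the function names are illustrative assumptions, not values from the study.

```python
import time

MIN_DELAY = 5      # seconds between archival requests (assumed value)
MAX_RETRIES = 5    # give up and record a failure after this many tries

def submit_with_backoff(submit, url, sleep=time.sleep):
    """Call submit(url) until it reports success, doubling the wait after
    each failure (timeout, server error, or abuse flag).  Returns False if
    MAX_RETRIES attempts all fail.  `sleep` is injectable for testing."""
    delay = MIN_DELAY
    for attempt in range(MAX_RETRIES):
        if attempt:
            sleep(delay)
            delay *= 2        # exponential backoff between retries
        if submit(url):
            return True
    return False
```

In the real pipeline `submit` would wrap the WebCite API call; here it is any callable returning a success flag.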
To estimate what impact we had on the archives' coverage of the study URLs, we compared a URL survey done directly prior to our submission process to one done afterwards; a period of about 3.5 months. It was assumed that the contribution due to unrelated processes would not be very large given that there was only a modest increase in coverage, 5% for IA and 1% for WC, over the previous period of just under a year and a half.
Each of the two archival engines exhibited behaviors that required judging successful submission of a URL by whether it was archived as of a subsequent survey, rather than by the statuses the engines returned. For the Internet Archive, an error did not always indicate failure: there were 872 URLs for which wget returned an error but which were successfully archived. Conversely, WebCite returned an asynchronous status, so even a successful return did not guarantee archival; this was the case for 955 out of a total of 7,285.
Submitting the 2,662 URLs to IA took a little less than a day, whereas submitting 7,285 to WC took over 2 months. This likely reflects IA's large server capacity, funding and platform maturity due to its age.
Generating the list of unique URLs
Converting some of the potential predictors from the list of published URLs to the list of unique URLs presented its own issues. While those based on the URL itself (domain, depth, whether alive or in an archive) converted straightforwardly, those which depended on a publishing article (number of times the URL was published, number of times an article was cited, publishing journal, presence of funding text) were estimated by collating the data from each publication. Only a small fraction, 8%, of the unique URLs appeared more than once, and among the measured variables pertaining to the publication there was little variety: amongst repeatedly-published URLs, 43% appeared in only one journal, and the presence of funding text was the same 76% of the time. For calculating the number of times a paper was published, multiple appearances of a URL within a given title/abstract were counted as one. Thus, while efforts were made to provide a representative collated value where appropriate, different methods would likely not have produced significantly different results.
Additional sources of error
Even though WOS's index appears to have better quality Optical Character Recognition (OCR) than PubMed, it still has OCR artifacts. To compensate for this, the URL extraction script tried to use some heuristics to detect the most common sources of error and correct them. Some of the biggest sources of error were: randomly inserted spaces in URLs, "similar to" being substituted for the tilde character, periods being replaced with commas and extra punctuation being appended to the URL (sometimes due to the logic added to address the first issue).
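The corrections listed above might look like the following Python sketch. The exact rules and their ordering in extract_urls.py are not reproduced here; note how aggressive fixes (e.g. turning every comma into a period) can themselves introduce the extra-punctuation errors the text mentions.

```python
import re

def clean_ocr_url(url):
    """Heuristic cleanup of common OCR artifacts in extracted URLs
    (an illustrative approximation of the study's heuristics)."""
    url = url.replace("similar to", "~")  # OCR renders '~' as "similar to"
    url = re.sub(r"\s+", "", url)         # randomly inserted spaces
    url = url.replace(",", ".")           # commas substituted for periods
    return url.rstrip(".,;)")             # trailing punctuation from prose
```

The "similar to" replacement must run before space removal, since the phrase itself contains a space.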
Likely the largest contributors to false negatives are errors in OCR and the attempts to compensate for them. In assessing the effectiveness of our submissions to IA, it is possible that the estimate could be understated due to URLs that had been submitted but not yet made available within the Wayback Machine.
Dynamic websites with interactive content, if only present via an archiving engine, would be a source of false positives, as the person accessing the resource would presumably want to use it rather than view the design work of its landing page. If a published web site goes away and another is installed in its place (especially likely if a .com or .net domain is allowed to expire), the program cannot tell the difference, since it sees a valid (though irrelevant) web site. In addition, though page contents can change and lose relevance from their original use, dates of archival were not compared to the publication date.
Another source of false positives would be uncaught OCR artifacts that insert spaces within URLs, where the artifact truncated the path but left the correct host intact. The result would be a higher probability of the URL appearing as a higher-level index page, which is generally more likely to function than pages at lower levels [11, 12].
Web of Science was chosen because, compared to PubMed, it was more cross-sectional and had better OCR quality based on a small sampling. Many of the other evaluation criteria were similar between PubMed and WOS, as both contain scholarly work and have an interface to download bibliographic data. Interestingly, due to the continued presence of OCR issues in newer articles, it appears that bibliographic information for some journals is not yet passed electronically.
Based on the data gathered in this and other studies, it is apparent that irretrievable scholarly research on the Internet remains a problem. We found that roughly 50% of URLs published 11 years prior to the survey (in 2000) were still standing. Interestingly, the rate of decay for recently published URLs (within the past 11 years) appears to be higher than for older ones, lending credence to Koehler's suggestion of eventual decay-rate stabilization. Survival rates for living URLs published between 1996 and 1999, inclusive, vary by only 2.4% (1.5% for unique URLs) and have poor linear fits (R2 of .51, and .18 for unique), whereas the years 2000-2010 have a linear slope of 0.031 with R2 of .90 (slope 0.036 and R2 .95 for unique URLs using the first published year). In other words, availability across years is much more stable for older URLs, while more recent online resources follow a linear trend with a predictable loss rate. Overall, 84% of URLs (82% of the unique) were available in some manner: via the live web, IA or WC.
To address the control issue for redirection solutions (DOI, PURL) mentioned in the introduction, those who administer cited tools could begin to maintain and publish a permanent URL on the web site itself. An even more radical step would be for these existing tools, or some new tool, to take a Wikipedia approach and allow end-users to update and search a database of permanent URLs. Considering that studies have shown at least 30% of dead URLs to be locatable using web search engines [3, 18], such a peer-maintained system could be effective and efficient, though spam could be an issue if not properly addressed.
For dynamic websites, the current solutions are more technically involved, potentially expensive and less feasible. These include mirroring (hosting a website on another server, possibly at another institution) and providing access to the source code, both of which require time and effort. Once the source is acquired, it can sometimes take considerable expertise to make use of it as there may be complex libraries or framework configuration, local assumptions hard-coded into the software or it could be written for a different platform (GPU, Unix, Windows, etc.). The efforts to have reproducible research, where the underlying logic and data behind the results of a publication are made available to the greater community, have stated many of the same requirements as preserving dynamic websites [19, 20]. Innovation in this area could thus have multiple benefits beyond just the archival.
Data preparation and analysis
The then-current year (2011) was excluded to eliminate bias from certain journals being indexed sooner than others. For analysis and statistical modeling, the R program and its "survival" library were used (scripts included in Additional file 1).
Wherever possible, statistics are presented in 2 forms: one representing the raw list of URLs extracted from abstracts and the other representing a deduplicated set of those URLs. The former is most appropriate when thinking about what a researcher would encounter when trying to use a published URL in an article of interest and also serves as a way to give weight to multiply-published URLs. The latter is more appropriate when contemplating scholarly URLs as a whole or when using statistical models that assume independence between samples.
URLs that were not the goal of this study, such as journal promotions and invalid URLs, were excluded using computational methods as much as possible in order to minimize subjective bias. The first method, which removed 943 (26 unique), looked for identical URLs comprising a large percentage of a journal's published collection within a given year; upon manual examination, a decision was then made whether to eliminate them. The second method, which identified 18 invalid URLs (all unique), consisted of checking for WebCitation's "UnexpectedXML" error. These URLs were corrupted to the point that they interfered with XML interpretation of the request, due either to an error in our parsing or to the OCR.
DOI sites were identified by the presence of "http://dx.doi.org" in the URL, and PURL sites by the presence of "http://purl.". Interestingly, 3 PURL servers were identified through this mechanism: http://purl.oclc.org, http://purl.org and http://purl.access.gpo.gov.
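These substring tests are simple enough to state directly in Python; the function name is illustrative.

```python
def persistent_id_type(url):
    """Classify a URL as a DOI link, a PURL link, or neither, using the
    same substring tests described in the text."""
    if "http://dx.doi.org" in url:
        return "DOI"
    if "http://purl." in url:
        return "PURL"
    return None
```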
To make the results more comparable to prior work and the analysis easier to interpret, a URL was considered available if it successfully responded to at least 90% of the requests and unavailable otherwise. This is similar to the method used by Wren, and differs from Ducut's in not using a "variable availability" category defined as being available > 0% and < 90% of the time. Our results show that 466 unique URLs (3.2%) would have fallen into this middle category, a number quite similar to Wren's and Ducut's (3.4% and 3.2%, respectively). Being such a small percentage of the total, their treatment is not likely to affect the analysis much regardless of how they are interpreted, and having binary data also eases interpretation of the statistical models. In addition, due to the low URL counts for 1994 (3) and 1995 (22), these years were excluded from analysis.
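The two coding schemes can be contrasted in a few lines of Python; the function names are illustrative, and with 3 probes a day for 30 days each URL was requested roughly 90 times.

```python
def binary_availability(successes, attempts):
    """Coding used in this study (after Wren): available iff at least
    90% of the probes succeeded."""
    return "available" if successes / attempts >= 0.9 else "unavailable"

def ducut_availability(successes, attempts):
    """Ducut's three-way coding, for comparison: a 'variable' middle
    category covers URLs available > 0% and < 90% of the time."""
    if successes == 0:
        return "unavailable"
    if successes / attempts < 0.9:
        return "variable"
    return "available"
```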
Survival analysis was chosen to analyze living URLs due to its natural fit: like people, URLs have lifetimes, and we are interested in what causes them to be longer or shorter and by how much. Lifetimes were calculated by assuming URLs were alive each time they were published, which is a potential source of error. Data was coded as either right- or left-censored: right-censored since living URLs presumably would die at an unknown time in the future, and left-censored because it was unknown when a non-responding URL had died. Ages were coded in months rather than years in order to increase accuracy and precision.
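A small Python sketch of this coding, assuming (year, month) pairs for the first publication date and the survey date; the field names are illustrative.

```python
def code_observation(first_pub, survey, alive):
    """Produce a survival-analysis record for one URL.

    first_pub, survey -- (year, month) tuples
    alive             -- whether the URL still responded at survey time

    Living URLs are right-censored (they will die at some unknown future
    time); dead URLs are left-censored (they died at some unknown time
    before the survey).  Age is in months for extra precision."""
    months = (survey[0] - first_pub[0]) * 12 + (survey[1] - first_pub[1])
    return {"months": months, "censoring": "right" if alive else "left"}
```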
Parametric survival regression models were constructed using R's survreg(). In selecting the distribution, all of those available were tried, with the logistic distribution showing the best overall fit based on Akaike Information Criterion (AIC) score. Better fits for two of the numeric predictors (number of citations to a publishing paper and number of times a URL was published) were obtained by taking the base-2 logarithm. Collinearity was checked by calculating the variance inflation factor against a logistic regression fit to the web outcome variable. Overall lifetime estimates were made using the survfit() function from R's survival library.
Extracting and testing URLs
To prepare a list of URLs (and their associated data), a collection of bibliographic data was compiled by searching WOS for "http" in the title or abstract, downloading the results (500 at a time), then finally collating them into a single file. A custom program (extract_urls.py in Additional file 1) was then used to extract the URLs and associated metadata from these, after which 5 positive and 2 negative controls were added. A particular URL was only included once per paper.
With the extracted URLs in hand, another custom program (check_urls_web.py in Additional file 1) was used to test the availability of the URLs 3 times a day over the course of 30 days, starting April 16, 2011. The run times were generated randomly by scheduler.py (included in Additional file 1), with the algorithm guaranteeing that no consecutive runs were closer than 2 hours. A given URL was only visited once per run even if it was published multiple times, saving load on the server and speeding up the total runtime (which averaged about 25 minutes due to the use of parallelism). Failure was defined as anything that caused an exception in Python's "urllib2" package (which includes error statuses, like 404), with the exception reason being recorded for later analysis.
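A Python 3 sketch of the per-probe test (the study's script used Python 2's urllib2, whose functionality now lives in urllib.request and urllib.error):

```python
import urllib.request

def check_url(url, timeout=30):
    """One availability probe.  Any exception -- including HTTP error
    statuses such as 404, which urllib raises as HTTPError -- counts as a
    failure, and the exception text is recorded for later analysis."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True, None
    except Exception as exc:
        return False, repr(exc)
```

Recording the exception reason (rather than just a boolean) is what later allowed the authors to notice patterns such as the User-Agent blocking described below.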
While investigating some of the failed fetches, a curious thing was noted: there were URLs that would consistently work with a web browser but not with the Python program or other command line downloaders like wget. After some investigation, it was realized that the web server was denying access to unrecognized User Agent strings. In response, the Python program adopted the User Agent of a regular browser and subsequently reduced the number of failed URLs.
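In Python 3 terms, the workaround amounts to attaching a browser-like User-Agent header to each request; the specific string below is an assumption, not necessarily the one the study used.

```python
import urllib.request

# An illustrative desktop-browser identity; servers that deny unknown
# User-Agent strings will generally accept something of this shape.
BROWSER_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/24.0 Safari/537.36")

def browser_request(url):
    """Build a request that identifies itself as a regular browser."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})
```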
At the end of the live web testing period, a custom program (check_urls_archived.py in Additional file 1) was used to query the archive engines programmatically on May 23, 2011. For the Internet Archive's Wayback Machine, this was done using an HTTP HEAD request (which saves resources vs. GET) on the URL formed by "http://web.archive.org/web/*/" + <the url>. Status was judged by the resulting HTTP status code: 200 meaning success, 404 not archived, 403 a page blocked due to robots.txt, and 503 a server too busy. Because there were a number of 503 codes, the script would make up to 4 attempts to access each URL, with increasing back-off delays to keep from overloading IA's servers. The end result still contained 18 such URLs, which were counted as not archived for analysis. For WebCite, the documented API was used, which supports returning XML, a format well suited to automated parsing. For sites reporting multiple statuses, any successful archiving was taken as a success.
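The Wayback query and its 503 retry loop might be sketched as follows in Python 3. The status mapping follows the text; the backoff schedule and parameter names are assumptions, and the opener is injectable so the logic can be exercised without network access.

```python
import time
import urllib.request
from urllib.error import HTTPError

WAYBACK_PREFIX = "http://web.archive.org/web/*/"

def wayback_status(url, opener=urllib.request.urlopen,
                   max_attempts=4, sleep=time.sleep):
    """HEAD-request the Wayback Machine and map the HTTP status code to an
    archival state, retrying with growing delays on 503 (server busy)."""
    request = urllib.request.Request(WAYBACK_PREFIX + url, method="HEAD")
    for attempt in range(max_attempts):
        try:
            opener(request)
            return "archived"                 # HTTP 200
        except HTTPError as err:
            if err.code == 404:
                return "not archived"
            if err.code == 403:
                return "blocked by robots.txt"
            if err.code != 503:
                raise                         # unexpected status
            sleep(2 ** attempt)               # back off while IA is busy
    return "not archived"                     # persistent 503s, per the text
```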
The authors would like to thank the South Dakota State University departments of Mathematics & Statistics and Biology & Microbiology for their valuable feedback.
Publication of this article was funded by the National Institutes of Health [GM083226 to SXG].
This article has been published as part of BMC Bioinformatics Volume 14 Supplement 14, 2013: Proceedings of the Tenth Annual MCBIOS Conference. Discovery in a sea of data. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/14/S14.
- Ducut E, Liu F, Fontelo P: An update on Uniform Resource Locator (URL) decay in MEDLINE abstracts and measures for its mitigation. BMC Med Inform Decis Mak. 2008, 8.
- Aronsky D, Madani S, Carnevale RJ, Duda S, Feyder MT: The prevalence and inaccessibility of Internet references in the biomedical literature at the time of publication. J Am Med Inform Assn. 2007, 14: 232-234. 10.1197/jamia.M2243.
- Wren JD: URL decay in MEDLINE - a 4-year follow-up study. Bioinformatics. 2008, 24: 1381-1385. 10.1093/bioinformatics/btn127.
- Wren JD: 404 not found: the stability and persistence of URLs published in MEDLINE. Bioinformatics. 2004, 20: 668-672. 10.1093/bioinformatics/btg465.
- Yang SL, Qiu JP, Xiong ZY: An empirical study on the utilization of web academic resources in humanities and social sciences based on web citations. Scientometrics. 2010, 84: 1-19. 10.1007/s11192-009-0142-7.
- The Internet Archive. [http://www.archive.org/web/web.php]
- Eysenbach G, Trudell M: Going, going, still there: Using the WebCite service to permanently archive cited web pages. Journal of Medical Internet Research. 2005, 7: 2-6. 10.2196/jmir.7.1.e2.
- The DOI System. [http://www.doi.org/]
- PURL Home Page. [http://purl.org]
- Key Facts on Digital Object Identifier System. [http://www.doi.org/factsheets/DOIKeyFacts.html]
- Wren JD, Johnson KR, Crockett DM, Heilig LF, Schilling LM, Dellavalle RP: Uniform resource locator decay in dermatology journals - Author attitudes and preservation practices. Arch Dermatol. 2006, 142: 1147-1152. 10.1001/archderm.142.9.1147.
- Casserly MF, Bird JE: Web citation availability: Analysis and implications for scholarship. College & Research Libraries. 2003, 64: 300-317. 10.5860/crl.64.4.300.
- EZID: Pricing. [http://n2t.net/ezid/home/pricing]
- Wagner C, Gebremichael MD, Taylor MK, Soltys MJ: Disappearing act: decay of uniform resource locators in health care management journals. J Med Libr Assoc. 2009, 97: 122-130. 10.3163/1536-5050.97.2.009.
- Koehler W: An analysis of Web page and Web site constancy and permanence. J Am Soc Inf Sci. 1999, 50: 162-180. 10.1002/(SICI)1097-4571(1999)50:2<162::AID-ASI7>3.0.CO;2-B.
- Bar-Ilan J, Peritz BC: Evolution, continuity, and disappearance of documents on a specific topic on the web: A longitudinal study of "informetrics". Journal of the American Society for Information Science and Technology. 2004, 55: 980-990. 10.1002/asi.20049.
- Koehler W: A longitudinal study of Web pages continued: a consideration of document persistence. Information Research. 2004, 9.
- Casserly MF, Bird JE: Web citation availability - A follow-up study. Libr Resour Tech Ser. 2008, 52: 42-53. 10.5860/lrts.52n1.42.
- Peng RD: Reproducible research and Biostatistics. Biostatistics. 2009, 10: 405-408. 10.1093/biostatistics/kxp014.
- Ince DC, Hatton L, Graham-Cumming J: The case for open computer programs. Nature. 2012, 482: 485-488. 10.1038/nature10836.
- R Development Core Team: R: A Language and Environment for Statistical Computing. 2011, R Foundation for Statistical Computing.
- Therneau T: A Package for Survival Analysis in S. 2012, R package version 2.36-12.
- WebCite Technical Background and Best Practices Guide. [http://www.webcitation.org/doc/WebCiteBestPracticesGuide.pdf]
- Markwell J, Brooks DW: "Link rot" limits the usefulness of web-based educational materials in biochemistry and molecular biology. Biochemistry and Molecular Biology Education. 2003, 31: 69-72. 10.1002/bmb.2003.494031010165.
- Thorp AW, Brown L: Accessibility of internet references in Annals of Emergency Medicine: Is it time to require archiving?. Ann Emerg Med. 2007, 50: 188-192. 10.1016/j.annemergmed.2006.11.019.
- Carnevale RJ, Aronsky D: The life and death of URLs in five biomedical informatics journals. International Journal of Medical Informatics. 2007, 76: 269-273. 10.1016/j.ijmedinf.2005.12.001.
- Dimitrova DV, Bugeja M: Consider the source: Predictors of online citation permanence in communication journals. Portal-Libraries and the Academy. 2006, 6: 269-283. 10.1353/pla.2006.0032.
- Duda JJ, Camp RJ: Ecology in the information age: patterns of use and attrition rates of internet-based citations in ESA journals, 1997-2005. Frontiers in Ecology and the Environment. 2008, 6: 145-151. 10.1890/070022.
- Rhodes S: Breaking Down Link Rot: The Chesapeake Project Legal Information Archive's Examination of URL Stability. Law Library Journal. 2010, 102: 581-597.
- Goh DHL, Ng PK: Link decay in leading information science journals. Journal of the American Society for Information Science and Technology. 2007, 58: 15-24. 10.1002/asi.20513.
- Russell E, Kane J: The missing link - Assessing the reliability of Internet citations in history journals. Technology and Culture. 2008, 49: 420-429. 10.1353/tech.0.0028.
- Dellavalle RP, Hester EJ, Heilig LF, Drake AL, Kuntzman JW, Graber M, Schilling LM: Information science - Going, going, gone: Lost Internet references. Science. 2003, 302: 787-788. 10.1126/science.1088234.
- Evangelou E, Trikalinos TA, Ioannidis JPA: Unavailability of online supplementary scientific information from articles published in major journals. FASEB Journal. 2005, 19: 1943-1944. 10.1096/fj.05-4784lsf.
- Sellitto C: The impact of impermanent web-located citations: A study of 123 scholarly conference publications. Journal of the American Society for Information Science and Technology. 2005, 56: 695-703. 10.1002/asi.20159.
- Bar-Ilan J, Peritz B: The lifespan of "informetrics" on the Web: An eight year study (1998-2006). Scientometrics. 2009, 79: 7-25. 10.1007/s11192-009-0401-7.
- Gomes D, Silva MJ: Modelling Information Persistence on the Web. 2006.
- Markwell J, Brooks DW: Evaluating web-based information: Access and accuracy. Journal of Chemical Education. 2008, 85: 458-459. 10.1021/ed085p458.
- Wu ZQ: An empirical study of the accessibility of web references in two Chinese academic journals. Scientometrics. 2009, 78: 481-503. 10.1007/s11192-007-1951-1.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.