Robin's Blog

Encouraging citation of software – introducing CITATION files

Summary: Put a plaintext file named CITATION in the root directory of your code, and put information in it about how to cite your software. Go on, do it now – it’ll only take two minutes!

Software is very important in science, but writing good software takes time and effort that could be spent on other work. I believe this work is important, but to make it worthwhile people need to get credit for it, and in academia that means citations. However, it is often very difficult to find out how to cite a piece of software: sometimes the information is hidden away in the manual or on the web page, and often you have to email the author to ask how they want it cited. Because of this effort, many people don’t bother to cite the software they use, and so the authors don’t get the credit they need. We need to change this, so that software, which underlies a huge amount of important scientific work, gets the recognition it deserves.

As with many things relating to software sustainability in science, the R project does this very well. To find out how to cite R itself, you simply run the command:

citation()

If you want to find out how to cite a package you simply run:

citation('PACKAGENAME')

For example:

> citation('ggplot2')

To cite ggplot2 in publications, please use:

  H. Wickham. ggplot2: elegant graphics for data analysis. Springer New York,
  2009.

A BibTeX entry for LaTeX users is

  @Book{,
    author = {Hadley Wickham},
    title = {ggplot2: elegant graphics for data analysis},
    publisher = {Springer New York},
    year = {2009},
    isbn = {978-0-387-98140-6},
    url = {http://had.co.nz/ggplot2/book},
  }

In this case the citation was provided by the author of the package, as R code, in a file called (surprise, surprise) CITATION inside the package directory. R can even generate a citation automatically from the package metadata if the author hasn’t provided one (and will do this far better if you use the person class in your DESCRIPTION file). Note also that the function provides a handy BibTeX entry for those who use LaTeX, which makes the citation even easier to use and so reduces the effort involved in citing software properly.

I think the R approach is wonderful, but the exact methods are rather specific to R (it is all based around a citEntry object or a bibentry object, and the CITATION file contains actual R code). That’s fine for R code, but what about Python code, Ruby code, C code, Perl code and so on? I’d like to suggest a simpler, slightly more flexible approach for use broadly across scientific software:

Create a CITATION file in the root directory of your project and put something in there that tells people how to cite it

In most cases this will probably be some plain text which gives the citation, and possibly a BibTeX entry for it. It could instead be some code (in the language your project uses) which prints an appropriate citation when run. R users should, of course, stick to the standard way of writing CITATION files for R packages; this proposal is really for users of other languages. The CITATION file will be one of a number of ALL CAPITALS files in your project’s directory: it will sit alongside your README file (you do have one, don’t you?), your LICENCE file (you must have one – see my explanation for why) and possibly also your INSTALL file and so on.
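As a rough illustration of the "code that prints a citation" option, here is a minimal Python sketch loosely modelled on the output of R's citation(). Everything in it is a placeholder assumption: the project name mytool, the author, the journal and the BibTeX key are invented for the example and do not refer to a real publication.

```python
"""Print citation information for a hypothetical project called 'mytool'."""

# All bibliographic details below are placeholders, not a real publication.
CITATION_TEXT = (
    "A. Author (2012). mytool: an example scientific package. "
    "Journal of Placeholder Results, 1(1), 1-10."
)

BIBTEX = """@article{author2012mytool,
  author  = {Author, A.},
  title   = {mytool: an example scientific package},
  journal = {Journal of Placeholder Results},
  year    = {2012},
  volume  = {1},
  number  = {1},
  pages   = {1--10},
}"""


def citation() -> str:
    """Return a plain-text citation followed by a BibTeX entry."""
    return (
        "To cite mytool in publications, please use:\n\n"
        f"  {CITATION_TEXT}\n\n"
        "A BibTeX entry for LaTeX users is\n\n"
        f"{BIBTEX}\n"
    )


if __name__ == "__main__":
    # Running the file directly prints the citation to the terminal.
    print(citation())
```

Even if a project ships something like this, keeping a plain-text CITATION file alongside it is probably still worthwhile, since a reader can open that without running anything.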

I know this approach isn’t perfect (machine readability of citations is a problem with this method, but then again machine readability of citations is a big problem generally…), but I think it is a start, and hopefully it’ll reduce the effort required to cite software and thus encourage software citation.
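The machine-readability problem can be softened a little if the file follows a predictable layout. The sketch below assumes the file is named CITATION (with an optional extension such as .txt, .md or .rst) and contains some explanatory text plus one or more BibTeX entries. It finds the file in a project directory and pulls out each entry by matching braces. The function names are made up for this example, and the brace matching is deliberately naive: it would be confused by unbalanced braces inside the explanatory text.

```python
import re
from pathlib import Path

# A BibTeX entry starts with "@type{", e.g. "@Book{" or "@article{".
ENTRY_START = re.compile(r"@[A-Za-z]+\s*\{")


def find_citation_file(root):
    """Return the first file in `root` named CITATION (any extension), or None."""
    for path in sorted(Path(root).iterdir()):
        if path.is_file() and path.stem.upper() == "CITATION":
            return path
    return None


def extract_bibtex(text):
    """Return a list of the BibTeX entries found in `text`."""
    entries = []
    pos = 0
    while (match := ENTRY_START.search(text, pos)) is not None:
        depth = 0
        end = None
        # Start at the opening brace and walk forward until it is closed.
        for i in range(match.end() - 1, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    end = i + 1
                    break
        if end is None:
            break  # unbalanced braces: give up rather than guess
        entries.append(text[match.start():end])
        pos = end
    return entries
```

A caller would typically combine the two, for example `extract_bibtex(find_citation_file(".").read_text())` after checking that a file was actually found.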

So, go on – go and write a few CITATION files now!


This post originally appeared on Robin's Blog.


Categorised as: Academic, C, GIS, IDL, Programming, Python, R, Remote Sensing


16 Comments

  1. Great idea! Adding for GPULib now…

  2. Yihui says:

    Before everybody starts following the simpler approach you proposed, I’d like to point out that CRAN has done this for us. If a package contains a CITATION file, CRAN will convert it to a human-readable page with BibTeX entries, e.g. http://cran.r-project.org/web/packages/knitr/citation.html

    I very much agree that software citations should gain more attention, so thanks for the post!

  3. Thanks for the nice blog post. Yes, communicating the recommended citation for any kind of software project is a good idea. If the software runs its own web page, then putting the citation information there is also a good idea. (That’s what CRAN does automatically if CITATION files are provided for R packages.)

    A note on the recommended use for R: citEntry() has been completely reimplemented using the newer bibentry() infrastructure. Also, the automatically generated citation can be improved by using person() in the DESCRIPTION. See http://journal.R-project.org/archive/2012-1/RJournal_2012-1_Hornik~et~al.pdf for details.

  4. Robin Wilson says:

    Ah that’s great – I didn’t realise CRAN did that. Unfortunately PyPI, CPAN, Rubygems and various other package hosting sites for other languages don’t do that – as far as I am aware at least – so the simpler approach is likely to be useful for other languages (I’ll update the post to make it more explicit that I wasn’t suggesting the simpler approach for R, but for other languages that don’t have the nice citation functionality that R has).

  5. Robin Wilson says:

    Thanks for the link to the new citation infrastructure for R – I’ll update the post to mention that.

  6. Peter Cock says:

    Any thoughts about if an extension is permitted? e.g. CITATION.txt would work far better than just CITATION under windows (double click would automatically open it in a text editor), while CITATION.md or CITATION.rst would look much better on BitBucket or GitHub using markdown or reStructuredText markup. Currently all of the above seem to be used for README files.

  7. Robin Wilson says:

    Yeah, an extension is definitely permitted! As long as the file is named CITATION then it can be easily identified – so .txt, .md, .rst etc are all great.

  8. […] In a post on his personal weblog, published at the end of August and last commented on in October, Robin Wilson has made a concrete proposal for including so-called “CITATION […]

  9. […] use Madagascar in their research and wish to reference it in scientific publications. Following a recommendation of Robin Wilson, a file called CITATION.txt is placed in the top Madagascar directory to provide […]

  10. […] but actually there is a paper on Py6S which has seven citations on Google Scholar. My idea for CITATION files could help them to link up packages and papers – so that’s another reason to add a […]

  11. Hey Robin! So, I’m finally ready to add a CITATION file to my project, but I’m facing the Paradox of Choice: what’s the best way to format it? What have you seen out there? Here are the options I’m contemplating:

    – a valid .tex file with some free-form text at the top using @Comment{}, followed by one or more @article/book/whatever.
    – a plaintext file with a BibTeX entry at the end, that users would have to manually select and copy-paste. (This is my least-preferred option but the one that you seem to advocate.)
    – a plaintext file *referring* to an adjacent .tex file with the bibtex entry. So there would be both a Citation.markdown and Citation.tex.

    Thoughts?

  12. Robin Wilson says:

    Hi – nice to hear from you!

    Great to see you’re going to add a CITATION file 🙂 Of the options you mention, I’ve only ever seen the 2nd one ‘in the wild’ (I did some work with Depsy – http://www.depsy.org – to have a look at how many packages had CITATION files, and what their contents were). Most of them seem to be in plain text format with some explanatory text followed by a BibTeX entry (or, sometimes a formatted citation – which is less good, but still better than nothing).

    I understand why this may be your least-preferred option, but I think it gets a good balance between human and computer usage. It is fairly easy to extract a BibTeX entry from a text file automatically (I did this for the packages on Depsy), and it makes it nice and easy for humans to read too.

    What do you think?

  13. […] proposal for CITATION files for scientific software has gained a remarkable amount of traction, and was cited in the Force […]

  14. […] proposal for CITATION files for scientific software has gained a remarkable amount of traction, and was cited in the Force 11 […]

  15. Katrin Leinweber says:

    In parallel to spreading the use of CITATION files, a logical second step would be to ensure that reference managers (EndNote, Mendeley, etc.) also extract the info.

    My impression of using Zotero on GitHub, GitLab and CRAN is that (at best!) a small part of the metadata / citation info is extracted.

    I’d be happy to receive pointers where to find better tools or more newcomer information about this topic. I already read peerj.com/articles/cs-1 😉
