PyCon UK 2015

Getting meaning from scientific articles

Eleonore Mayola

The bibliography process means every scientist regularly has to go through a lot of published articles in parallel to her/his research. The aim is 1) to know what other researchers are doing (they might be ahead of you, or they might have proven that your project is a dead end) and 2) to get some context to interpret your research results. Using specialised search engines can be inefficient if you don't use the "right" keywords. Researchers also tend to find bibliography work boring, so it would be interesting to automate part of the process!

In my talk I'll answer the following question: can Python machine learning libraries (nltk, scikit-learn) be used to determine whether a research article is worth reading? I'll use Natural Language Processing to identify article topics and train a classifier to distinguish between relevant and non-relevant articles depending on someone's area of research.
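
To give a flavour of the idea, here is a minimal sketch of a relevance classifier built with scikit-learn. It is not the method presented in the talk: the toy abstracts, the labels, the helper name train_relevance_classifier and the choice of TF-IDF features with a naive Bayes model are all assumptions made for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy abstracts, invented for illustration only.
# Label 1 = relevant to a (made-up) cell biologist, 0 = not relevant.
abstracts = [
    "protein folding dynamics in yeast cells",
    "gene expression and protein signalling in cells",
    "membrane protein transport in yeast",
    "galaxy cluster formation and dark matter",
    "stellar spectra of distant galaxy populations",
    "dark matter halo simulations",
]
labels = [1, 1, 1, 0, 0, 0]


def train_relevance_classifier(texts, labels):
    """Fit a TF-IDF + multinomial naive Bayes pipeline on labelled texts."""
    model = make_pipeline(
        TfidfVectorizer(stop_words="english"),  # bag-of-words features, weighted by TF-IDF
        MultinomialNB(),                        # simple probabilistic text classifier
    )
    model.fit(texts, labels)
    return model


model = train_relevance_classifier(abstracts, labels)

# Classify two unseen (also invented) abstracts.
new_abstracts = [
    "protein transport in yeast cells",
    "dark matter in a galaxy cluster",
]
predictions = model.predict(new_abstracts)
print(list(zip(new_abstracts, predictions)))
```

A real experiment would need far more labelled articles than this, plus a held-out test set to check how well the classifier generalises to a given researcher's field.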