Deakin University

Large expert-curated database for benchmarking document similarity detection in biomedical literature search

Version 4 2024-10-20, 00:14
Version 3 2024-06-14, 06:33
Version 2 2024-06-06, 09:32
Version 1 2019-11-25, 13:42
journal contribution
posted on 2024-10-20, 00:14 authored by Peter Brown, Yaoqi Zhou, Aik Choon Tan, Mohamed A El-Esawi, Thomas Liehr, Oliver Blanck, Douglas P Gladue, Gabriel MF Almeida, Tomislav Cernava, Carlos O Sorzano, Andy WK Yeung, Michael S Engel, Arun R Chandrasekaran, Thilo Muth, Martin S Staege, Swapna V Daulatabad, Darius Widera, Junpeng Zhang, Adrian Meule, Ken Honjo, Olivier Pourret, Cong Cong Yin, Zhongheng Zhang, Marco Cascella, Willy A Flegel, Carl S Goodyear, Mark J van Raaij, Zuzanna Bukowy-Bieryllo, Luca G Campana, Nicholas A Kurniawan, David Lalaouna, Felix J Hüttner, Brooke A Ammerman, Felix Ehret, Paul A Cobine, Ene Choo Tan, Hyemin Han, Wenfeng Xia, Christopher McCrum, Ruud PM Dings, Francesco Marinello, Henrik Nilsson, Brett Nixon, Konstantinos Voskarides, Long Yang, Vincent D Costa, Johan Bengtsson-Palme, William Bradshaw, Dominik G Grimm, Nitin Kumar, Elvis Martis, Daniel Prieto, Sandeep C Sabnis, Said EDR Amer, Alan WC Liew, Paul Perco, Farid Rahimi, Giuseppe Riva, Chongxing Zhang, Hari P Devkota, Koichi Ogami, Zarrin Basharat, Walter Fierz, Robert Siebers, Kok H Tan, Karen A Boehme, Peter Brenneisen, James AL Brown, Brian P Dalrymple, David J Harvey, Grace Ng, Sebastiaan Werten, Mark Bleackley, Zhanwu Dai, Raman Dhariwal, Yael Gelfer, Marcus D Hartmann, Pawel Miotla, Radu Tamaian, Pragashnie Govender, Oliver J Gurney-Champion, Joonas H Kauppila, Xiaolei Zhang, Natalia Echeverría, Santhilal Subhash, Hannes Sallmon, Marco Tofani, Taeok Bae, Oliver Bosch, Páraic O Cuív, Antoine Danchin, Barthelemy Diouf, Tuomas Eerola, Evangelos Evangelou, Fabian Filipp, Hannes Klump, Lukasz Kurgan, Simon S Smith, Olivier Terrier, Neil Tuttle, David B Ascher, Sarath C Janga, Leon N Schulte, Daniel Becker, Christopher Browngardt, Stephen J Bush, Guillaume Gaullier, Kazuki Ide, Clement Meseko, Gijsbert DA Werner, Jan Zaucha, Abd A Al-Farha, Noah F Greenwald, Segun I Popoola, Shaifur Rahman, Jialin Xu, Sunny Y Yang, Noboru Hiroi, Ozgul M Alper, Chris I Baker, Michael Bitzer, George Chacko, Birgit Debrabant, Ray Dixon, Evelyne Forano, Matthew Gilliham, Sarah Kelly, Karl Heinz Klempnauer, Brett A Lidbury, Michael Z Lin, Iseult Lynch, Wujun Ma, Edward W Maibach, Diane E Mather, Kutty S Nandakumar, Robert S Ohgami, Piero Parchi, Patrizio Tressoldi, Yu Xue, Charles Armitage, Pierre Barraud, Stella Chatzitheochari, Luis P Coelho, Jiajie Diao, Andrew C Doxey, Angélique Gobet, Pingzhao Hu, Stefan Kaiser, Kate M Mitchell, Mohamed F Salama, Ivan G Shabalin, Haijun Song, Dejan Stevanovic, Ali Yadollahpour, Erliang Zeng, Katharina Zinke, CG Alimba, Tariku J Beyene, Zehong Cao, Sherwin S Chan, Michael Gatchell, Andreas Kleppe, Marcin Piotrowski, Gonzalo Torga, Adugna A Woldesemayat, Mehmet I Cosacak, Scott Haston, Stephanie A Ross, Richard Williams, Alvin Wong, Matthew K Abramowitz, Andem Effiong, Senhong Lee, Muhammad B Abid, Cyrus Agarabi, Cedric Alaux, Dirk R Albrecht, Gerald J Atkins, Charles R Beck, AMJJ Bonvin, Emer Bourke, Thomas Brand, Ralf J Braun, James A Bull, Pedro Cardoso, Dee Carter, Robin M Delahay, Bernard Ducommun, Pascal HG Duijf, Trevor Epp, Eeva Liisa Eskelinen, Mazyar Fallah, Debora B Farber, Jose Fernandez-Triana, Frank Feyerabend, Tullio Florio, Michael Friebe, Saori Furuta, Mads Gabrielsen, Jens Gruber, Malgorzata Grybos, Qian Han, Michael Heinrich, Heikki Helanterä, Michael Huber, Albert Jeltsch, Fan Jiang, Claire Josse, Giuseppe Jurman, Haruyuki Kamiya, Kim de Keersmaecker, Erik Kristiansson, Frank Erik de Leeuw, Jiuyong Li, Shide Liang, Jose A Lopez-Escamez, Francisco J Lopez-Ruiz, Kevin J Marchbank, Rolf Marschalek, Carmen S Martín, Adriana E Miele, Xavier Montagutelli, Esteban Morcillo, Rosario Nicoletti, Monika Niehof, Ronan O'Toole, Toshihiko Ohtomo, Henrik Oster, Jose Alberto Palma, Russell Paterson, Mark Peifer, Maribel Portilla, MC Portillo, Antonia L Pritchard, Stefan Pusch, Gajendra PS Raghava, Nicola J Roberts, Kehinde Ross, Birgitt Schuele, Kjell Sergeant, Jun Shen, Alessandro Stella, Olga Sukocheva, Vladimir N Uversky, Sven Vanneste, Martin H Villet, Miguel Viveiros, Julia A Vorholt, Christof Weinstock, Masayuki Yamato, Ioannis Zabetakis, Xin Zhao, Andreas Ziegler, Wan M Aizat, Lauren Atlas, Kristina M Bridges, Sayan Chakraborty, Mieke Deschodt, Helena S Domingues, Shabnam S Esfahlani, Sebastian Falk, JL Guisado, Nolan C Kane, Gray Kueberuwa, Colleen L Lau, Dai Liang, Enwu Liu, Andreas M Luu, Chuang Ma, Lisong Ma, Robert Moyer, Adam D Norris, Suresh Panthee, Jerod R Parsons, Yousong Peng, Ines M Pinto, Cristina R Reschke, Elina Sillanpää, Christopher J Stewart, Florian Uhle, Hui Yang, Kai Zhou, Shu Zhu, Mohamed Ashry, Niels Bergsland, Maximilian Berthold, Chang Er Chen, Vito Colella, Maarten Cuypers, Evan A Eskew, Xiao Fan, Maksymilian Gajda, Rayner González-Prendes, Amie Goodin, Emily B Graham, Ewout JN Groen, Alba Gutiérrez-Sacristán, Mohamad Habes, Enrico Heffler, Daniel B Higginbottom, Thijs Janzen, Jayakumar Jayaraman, Lindsay A Jibb, Stefan Jongen, Timothy Kinyanjui, Rositsa G Koleva-Kolarova, Zhixiu Li, Yu Peng Liu, Bjarte A Lund, Alexandre A Lussier, Liping Ma, Pablo Mier, Matthew D Moore, Katja Nagler, Mark W Orme, James A Pearson, Anilkumar S Prajapati, Yu Saito, Simon E Tröder, Florence Uchendu, Niklas Verloh, Denitza D Voutchkova, Ahmed Abu-Zaid, Joaira Bakkach, Philipp Baumert, Marcos Dono, Jack Hanson, Sandrine Herbelet, Emma Hobbs, Ameya Kulkarni, Narendra Kumar, Siqi Liu, Nikolai D Loft, Tristan Reddan, Thomas Senghore, Howard Vindin, Haotian Xu, Ross Bannon, Branson Chen, Johnny TK Cheung, Jeffrey Cooper, Ashwini K Esnakula, Karine A Feghali, Emilia Ghelardi, Agostino Gnasso, Jeffrey Horbar, Hei M Lai, Jian Li, Lan Ma, Ruiyan Ma, Zihang Pan, Marco A Peres, Raymond Pranata, Esmond Seow, Matthew Sydes, Ines Testoni, Anna L Westermair, Yongliang Yang, Masoud Afnan, Joan Albiol, Lucia G Albuquerque, Shimon Amir, Eisuke Amiya, Rogerio M Amorim, Qianli An, Stig U Andersen, John D Aplin, Christos Argyropoulos, Yan W Asmann, Abdulaziz M Assaeed, Atanas G Atanasov, David A Atchison, Simon V Avery, Paul Avillach, Peter D Baade, Lars Backman, Christophe Badie, Alfonso Baldi, Elizabeth Ball, Olivier Bardot, Adrian G Barnett, Mathias Basner, Jyotsna Batra, OM Bazanova, Andrew Beale, Travis Beddoe, Melanie L Bell, Eugene Berezikov, Sue Berners-Price, Peter Bernhardt, Edward Berry, Theolis B Bessa, Craig Billington, John Birch, Randy D Blakely, Mark AT Blaskovich, Robert Blum, Marleen Boelaert, Dimitrios Bogdanos, Carles Bosch, Thierry Bourgoin, Daniel Bouvard, Laura M Boykin, Graeme Bradley, Daniel Braun, Jeremy Brownlie, Albert Brühl, Austin Burt, Lisa M Butler, Siddappa N Byrareddy, Hugh J Byrne, Stephanie Cabantous, Sara Calatayud, Eva Candal, Kimberly Carlson, Sònia Casillas, Valter Castelvetro, Patrick T Caswell, Giacomo Cavalli, Vaclav Cerovsky, Monica Chagoyen, Chang Shi Chen, Dong F Chen, Hao Chen, Hui Chen, Jui Tung Chen, Yinglong Chen, Changxiu Cheng, Jianlin Cheng, Mai Chinapaw, Christos Chinopoulos, William CS Cho, Lillian Chong, Debashish Chowdhury, Andre Chwalibog, A Ciresi, Shamshad Cockcroft, Ana Conesa, Penny A Cook, David N Cooper, Olivier Coqueret, Enoka M Corea, Antonio Costa, Elisio Costa, Carol Coupland, Stephanie Y Crawford, Aparecido D Cruz, Huijuan Cui, Qiang Cui, David C Culver, Amedeo D'Angiulli, Tanya ES Dahms, France Daigle, Raymond Dalgleish, Håvard E Danielsen, Sébastien Darras, Sean M Davidson, David A Day, Volkan Degirmenci, Luc Demaison, Koenraad Devriendt, Jiandong Ding, Yunus Dogan, XC Dong, Claudio F Donner, Walter Dressick, Christian A Drevon, Huiling Duan, Christian Ducho, Nicolas Dumaz, Bilikere S Dwarakanath, Mark H Ebell, Steffen Eisenhardt, Naser Elkum, Nadja Engel, Timothy B Erickson, Michael Fairhead, Marty J Faville, Marlena S Fejzo, Fernanda Festa, Antonio Feteira, Patrick Flood-Page, John Forsayeth, Simon A Fox, Steven J Franks, Francesca D Frentiu, Mikko J Frilander, Xinmiao Fu, Satoshi Fujita, Ian Galea, Luca Galluzzi, Federica Gani, Arvind P Ganpule, Antonio García-Alix, Kristene Gedye, Maurizio Giordano, Cecilia Giunta, Paul A Gleeson, Cyrille Goarant, Haipeng Gong
Abstract

Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency–Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.

Journal

Database

Volume

2019

Article number

ARTN baz085

Pagination

1-67

Location

England

Open access

  • Yes

ISSN

1758-0463

eISSN

1758-0463

Language

English

Notes

In Press

Publication classification

C1 Refereed article in a scholarly journal

Publisher

OXFORD UNIV PRESS