Summarizing ETDs with deep learning

Authors

  • William A. Ingram, Virginia Tech
  • Bipasha Banerjee, Virginia Tech
  • Edward A. Fox, Virginia Tech

DOI:

https://doi.org/10.48798/cadernosbad.2014

Abstract

Inspired by the millions of Electronic Theses and Dissertations (ETDs) openly available online, we describe a novel use of ETDs as data for text summarization. Using a collection of over 30,000 doctoral dissertations and master's theses, we evaluate how well state-of-the-art deep learning techniques generate abstractive summaries when applied to an ETD corpus. Deep learning requires a large set of training data to produce satisfactory results, and finding suitable training data is especially difficult because ETDs make widespread use of domain-specific jargon and span a wide range of subject matter. To overcome this limitation, we demonstrate the potential of transfer learning for automatically summarizing ETD chapters. We apply several combinations of deep learning models and training data to the ETD chapter summarization task and compare the outputs of the top performers.

Author Biography

William A. Ingram, Virginia Tech

Assistant Dean, University Libraries

Published

31 March 2020

How to Cite

Ingram, W. A., Banerjee, B., & Fox, E. A. (2020). Summarizing ETDs with deep learning. Cadernos BAD, (1), 46–52. https://doi.org/10.48798/cadernosbad.2014

Section

Communications