Summarizing ETDs with deep learning

Authors

  • William A. Ingram, Virginia Tech
  • Bipasha Banerjee, Virginia Tech
  • Edward A. Fox, Virginia Tech

DOI:

https://doi.org/10.48798/cadernosbad.2014

Abstract

Inspired by the millions of Electronic Theses and Dissertations (ETDs) openly available online, we describe a novel use of ETDs as data for text summarization. Using a collection of over 30,000 doctoral dissertations and master’s theses, we evaluate state-of-the-art deep learning techniques for generating abstractive summaries. Deep learning requires a large set of training data to produce satisfactory results. Finding suitable training data for ETDs is especially difficult because of their domain-specific jargon and the breadth of subject matter in an ETD corpus. To overcome this limitation, we demonstrate the potential of transfer learning for automatic summarization of ETD chapters. We apply several combinations of deep learning models and training data to the ETD chapter summarization task and compare the outputs of the top performers.

Author Biography

William A. Ingram, Virginia Tech

Assistant Dean, University Libraries

Published

2020-03-31

How to Cite

Ingram, W. A., Banerjee, B., & Fox, E. A. (2020). Summarizing ETDs with deep learning. Cadernos BAD, (1), 46–52. https://doi.org/10.48798/cadernosbad.2014