Creating and Digitizing Language Corpora 2016: Databases for Public Engagement Volume 3
This book unites a range of approaches to the collection and digitization of diverse language corpora. Its specific focus is on best practices identified in the exploitation of these resources in landmark impact initiatives across different parts of the globe. The development of increasingly accessible digital corpora has coincided with improvements in the standards governing the collection, encoding and archiving of 'Big Data'. Less attention has been paid to the importance of developing standards for enriching and preserving other types of corpus data, such as that which captures the nuances of regional dialects, for example. This book takes these best practices another step forward by addressing innovative methods for enhancing and exploiting specialized corpora so that they become accessible to wider audiences beyond the academy.