08/06/2022

Digitally searching, collecting and analysing texts?

The text on this page was automatically translated and hence may differ from the original. No rights can be derived from this translation.

The KB, the national library of the Netherlands, provides access to historical text collections such as digitized newspapers and books through its online services. The KB identified a gap between user-friendly search services like Delpher and DBNL on the one hand, and Dataservices, its service for advanced digital research, on the other. Dataservices is inaccessible to many users because it requires them to store the requested data themselves and write their own programs to analyse it. At the same time, the KB sees increasing possibilities for digital research on its text collections. Against this backdrop, the KB commissioned Dialogic to explore whether its users need an analysis platform, a so-called "text suite," in which multiple collections (from the KB as well as external sources) can be analysed.

To map out how a text suite can support users in their research on (historical) text collections, we used a literature review to develop a schema of the different research phases and the needs that can arise in each phase. This schema is displayed in the figure below. To determine where a text suite can add value, we tested various potential functional needs through interviews and a survey of 873 users of KB services.

We conclude that there is no clear need for advanced features in the Analyse phase. Although this was the starting point of the exploration, interviewees and survey respondents show limited interest in such features and expect to make little use of them if they were offered. Three main arguments underlie this. Firstly, due to the great heterogeneity of source material from the KB and beyond, researchers prefer to bring everything together on their own computer for analysis. The alternative, a text suite that allows users to import sources, raises questions about the sustainable preservation of the compiled collections. Secondly, given the rapid developments in quantitative analysis tools in particular, interviewees see a risk that the KB would offer tools that quickly become outdated, especially if usage is too low to justify the extensive effort of continuous development. Finally, we note that where existing platforms do offer analysis functionality (e.g., the n-gram viewer in DBNL or frequency analysis in Nederlab), this does not appear to lead to wide recognition or to broad application to new research questions. The latent demand for such functionality appears limited.

Conversely, we find a clear need for more advanced features in the Discover and Select phases. We therefore recommend that the KB position a text suite as a user-friendly tool with which users and researchers can select data and export that selection for analysis with their own tools.

You can download the full report from https://doi.org/10.5281/zenodo.6591571. Based on our findings, the KB has decided to develop a service that supports advanced discovery and selection capabilities.

Want to learn more about this research? Contact Max Kemman.