ISSN 1938-4122
DHQ: Digital Humanities Quarterly
2020 14.2
Editorials
[en] Remembering Stéfan Sinclair
DHQ editorial team, Association for Computers and the Humanities
Abstract
[en]
An obituary and remembrance of Stéfan Sinclair, a member of the founding editorial team of Digital Humanities Quarterly.
Portuguese Language Special Issue
Editors: Cecily Raynor, Luis Ferla
Front Matter
[pt] Apresentação - Edição especial da DHQ em português
[en] Introduction: A Portuguese-language Special Issue of DHQ
Luis Ferla, Universidade Federal de São Paulo; Cecily Raynor, Universidade McGill
Abstract
[pt][en]
Após as edições em espanhol e em francês, vem agora à luz a edição especial
em português da Digital Humanities Quarterly. A premissa é a de que as humanidades digitais só podem
realizar plenamente a identidade que gostam de assumir, qual seja, a da
valorização do compartilhamento do conhecimento e da liberdade para
produzi-lo e fazê-lo circular, se efetivamente questionarem as atuais
geopolíticas do mundo acadêmico e científico que disciplinam as práticas das
comunidades envolvidas.
Following the Spanish and French editions, we present the special edition of
Digital Humanities Quarterly in Portuguese. The premise is that the digital humanities can only
fully assume their desired identity, namely that of the valuation of knowledge
sharing and the freedom to produce and circulate that knowledge, if they
effectively question the current geopolitics of the academic and scientific
world which dictate the practices of the communities they encompass.
Articles
[pt] Geovisualização de dados e ciência aberta e cidadã - a experiência da Plataforma LindaGeo
[en] Data Geovisualization and Open and Citizen Science - the LindaGeo Platform Prototype
Sarita Albagli, Instituto Brasileiro de Informação em Ciência e Tecnologia (IBICT); Hesley Py, Instituto Brasileiro de Geografia e Estatística (IBGE); Allan Yu Iwama, Universidad de Los Lagos - Centro de Estudios del Desarrollo Regional y Políticas Públicas (CEDER)
Abstract
[pt][en]
O trabalho discute as possibilidades e limites das novas infraestruturas
de geovisualização de dados e informações para o compartilhamento e a
coprodução de conhecimentos, bem como para instrumentalizar a
intervenção social sobre o ordenamento e o desenvolvimento territorial.
Faz uma resenha crítica das principais definições, conceitos-chave e
questões em debate sobre o tema, apresentando em seguida reflexões
derivadas dos resultados do desenvolvimento de um protótipo de
plataforma de dados abertos geoespaciais, como parte de uma
pesquisa-ação de ciência aberta realizada no município de Ubatuba, no
litoral norte do Estado de São Paulo, Brasil.
The paper discusses the possibilities and limits of new infrastructures
of data and information geovisualization for sharing and coproduction of
knowledge, as well as to instrumentalize social intervention on
territorial management and development. It presents a critical review of
the literature, systematizing the key definitions, concepts, and issues
under discussion on the subject. Then it presents reflections derived
from the development of a prototype of a geospatial open data platform
as part of an open science action-research project in the municipality of
Ubatuba, on the north coast of the State of São Paulo, Brazil.
[pt] Aproximações ao cenário das humanidades digitais no Brasil
[en] Approaches to the Digital Humanities Scene in Brazil
Cláudio José Silva Ribeiro, Universidade Federal do Estado do Rio de Janeiro — Unirio; Suemi Higuchi, Fundação Getulio Vargas; Luis Antonio Coelho Ferla, Universidade Federal de São Paulo
Abstract
[pt][en]
As Humanidades Digitais incorporam os métodos e as questões legadas pelas
ciências humanas e sociais, ao mesmo tempo em que mobilizam as ferramentas e
perspectivas singulares abertas pela tecnologia digital. A partir dessa
concepção geral, o artigo apresenta uma visão panorâmica de algumas das
iniciativas de HDs no Brasil, apontando para o seu potencial de desenvolvimento.
Além disso, o artigo destaca certos princípios norteadores para a área, e
relaciona importantes desafios e oportunidades para o estabelecimento do campo
no país. Como suporte para a análise, o artigo relata a experiência do I
Congresso Internacional em Humanidades Digitais, realizado no Rio de Janeiro, em
abril de 2018.
Digital Humanities incorporate the methods and questions inherited from the human and
social sciences, while mobilizing the tools and perspectives opened up by digital
technology. From this general conception, the paper presents a panoramic view
of some of the Digital Humanities initiatives in Brazil, pointing to their
potential for development. It highlights certain guiding principles for the
area and sets out important challenges and opportunities for the
establishment of the field in the country. In support of the analysis, the
article reports on the experience of the I International Congress on Digital
Humanities, held in Rio de Janeiro in April 2018.
[pt] Avanços no estudo das redes de itinerários da Península Ibérica no século XVI. Aplicando os SIGH para estudar a história da arquitetura
[en] Advances in the Study of 16th-century Road Networks in the Iberian Peninsula: Applying SIGH to the History of Architecture
Patricia Ferreira-Lopes, Escuela Técnica Superior de Arquitectura de la Universidad de Sevilla
Abstract
[pt][en]
Pesquisas recentes estão demonstrando a potencialidade da aplicação dos Sistemas de
Informação Geográfica (SIG) nos estudos culturais e históricos. Esses estudos
comprovaram os benefícios, a flexibilidade e, também, as dificuldades que o uso dessa
tecnologia da informação traz para a área de humanidades. Nas pesquisas desenvolvidas
no campo da documentação do patrimônio, observou-se que grande parte delas defendem e
afirmam as vantagens para realizar análises com os SIG mas não conseguem alcançá-las
porque ficam restringidas a usar a tecnologia apenas para geolocalizar uma
determinada informação. Esse problema merece especial atenção porque limita a geração
do conhecimento a partir dos dados coletados e tratados. O presente artigo propõe um
exemplo de estudo que contempla a fase de digitalização e análise de uma importante
fonte documental, cartográfica e literária da Espanha: o Repertório de todos os
caminhos de Espanha no ano de graça de 1546. A rede de caminhos existente no século
XVI na Península Ibérica foi um dos principais fatores que favoreceram a consolidação
das atividades comerciais, o fluxo do conhecimento e das inovações técnicas
construtivas nos novos centros urbanos criados. Propomos uma análise do impacto
dessas redes de comunicação, através de mapas temáticos, visualização por
justaposição, cálculo de densidades e cálculo do caminho de menor custo para estudar
as conexões terrestres na Península. O resultado é a criação de um modelo de dados
espacial histórico capaz de inter-relacionar dados alfanuméricos e/ou
físico-geográficos que possibilita uma nova perspectiva e fonte de informação para os
pesquisadores e profissionais de diversas áreas.
Recent research shows the potential of applying Geographic Information Systems
(GIS) in cultural and historical studies. These studies have demonstrated the benefits,
the flexibility, and also the difficulties that the use of this technology brings to the
humanities. In research on heritage documentation,
most studies assert the advantages of carrying out
analyses with GIS but fail to realize them because they use the
technology only to geolocate particular items of information. This problem deserves special
attention because it limits the generation of knowledge from the data collected and
processed. This article proposes an example of a study that covers the
digitization and analysis phases of an important documentary, cartographic, and literary source
from Spain: The Repertoire of all the ways of Spain in
the year of grace of 1546. The network of roads existing in the 16th
century in the Iberian Peninsula was one of the main factors that helped the
consolidation of commercial activities, the flow of knowledge and the constructive
technical innovations in the new urban centres created. This paper proposes an
analysis of the impact of these communication networks, through thematic maps,
juxtaposition visualization, calculation of densities and calculation of the lowest
cost path to study the terrestrial connections in the Peninsula. The result is a
historical spatial data model capable of interrelating alphanumeric and/or
physical-geographic data that enables a new perspective and source of information for
researchers and professionals in different fields.
[pt] Ler a prosa do mundo hoje
[en] Reading Prose in this Day and Age
Maria Clara Paixão de Sousa, Universidade de São Paulo
Abstract
[pt][en]
As tecnologias digitais de difusão da informação transformaram profundamente o
trabalho das Humanidades, que está hoje inscrito na lógica digital
de um modo muito mais profundo do que o rótulo Humanidades
Digitais pode fazer crer – pois não estamos diante do surgimento de
uma nova tendência, nem de uma nova linha de pesquisas, nem de um novo campo de
estudo no interior das Humanidades: estamos de fato diante de outras
Humanidades. Na raiz dessa transformação está uma nova materialidade
do texto, que, como já discutiram Pédauque (2004, 2006, 2007), Crane et al.
(2008), Gradmann & Meister (2008), Chaudiron et al. (2008), Baumann &
Crane (2010), e Paixão de Sousa (2013), modifica a condição histórica do
documento – e, assim, instaura novas formas de leitura e novas
formas de ordenação da leitura. Nesse sentido, sendo tarefas das Humanidades a
interpretação do texto e a organização da sua transmissão social, essa nova
materialidade interpela diretamente os métodos, os horizontes epistemológicos, e
a conformação discursiva do campo. Neste artigo discutirei alguns dos efeitos
desse processo, fundada em uma abordagem conceitual da materialidade do texto
digital já proposta em Paixão de Sousa (2013), e inspirada na reflexão de
Pêcheux (1992[1982]) sobre a leitura do arquivo e nas ideias de
Unsworth (2006) sobre as formas de atenção.
This paper discusses the material conditions of digital text diffusion and its
effects for the Humanities, particularly as regards new forms of reading and new
forms in the ordination of reading. Observing the characteristic dispersion of
text in the digital medium, I investigate the forms of production and ordering
of digital texts from a material perspective, pointing out the inclusion of an
artificial logical stage in the text diffusion process as the fundamental
singularity of this medium (based on proposals already outlined in Paixão de
Sousa, 2017). This contingency allows us to consider digital text diffusion as a
radically new stage in the history of writing and reading, in a similar sense as
pointed out by Pédauque (2004, 2006, 2007), Crane et al. (2008), Gradmann &
Meister (2008), Chaudiron et al. (2008), and Baumann & Crane (2010). Drawing on
the concept of “forms of attention” in Unsworth (2006), I also suggest that, in
shaping new forms of reading, this change in the material character of the text
profoundly alters traditional work in the Humanities, and establishes a new
discursive conformation for the field. Finally, I argue that this new condition
deserves critical consideration in all different areas of the Humanities, but
particularly in those that take the “text” as their central object of
attention.
[pt] Curadoria Digital e Custos – Exploração de abordagens e perceções
[en] Digital Curation and its Costs: A Study of Practices and Insights
Luís Corujo, Universidade de Lisboa, Faculdade de Letras, Centro de Estudos Clássicos; Jorge Revez, Universidade de Lisboa, Faculdade de Letras, Centro de Estudos Clássicos; Carlos Guardado da Silva, Universidade de Lisboa, Faculdade de Letras, Centro de Estudos Clássicos
Abstract
[pt][en]
Introdução – No âmbito das preocupações que estão na origem da curadoria digital,
toma-se como exemplo o contexto da produção de grandes volumes de informação
científica, que requer abordagens que garantam a sua manutenção, reutilização e
valorização, dado o seu elevado custo.
Objetivos – Pretende-se conhecer o pensamento existente referente aos custos da
curadoria digital e desenvolver uma proposta de modelo de enquadramento para a
análise de custos em curadoria digital, inclusivamente projetos de carácter mais
prático. Tal implica abordar a definição deste conceito e a problemática dos custos,
baseando-nos nos estudos referentes a modelos de custos.
Metodologia - Procedeu-se a uma revisão da literatura, contextualizando a questão dos
custos no seio da curadoria digital, delimitada pelos dados recolhidos nas fontes de
pesquisa, a Biblioteca do Conhecimento Online (B-On) e o
Repositórios Científicos de Acesso Aberto de Portugal
(RCAAP). Em seguida, desenvolveu-se um modelo de enquadramento esquematizado com base
nas categorias identificadas por via do Método da Comparação Constante, algumas
relacionadas com o Modelo de Referência Open Archival
Information System (OAIS) e o ciclo de vida Digital
Curation Centre (DCC). Tal permitiu desenvolver uma análise de conteúdo
das perceções recolhidas em diversos autores, resultando em memorandos, de que este
texto é um resumo.
Resultados/Conclusão – Propõe-se um esquema de sistematização das problemáticas da
curadoria digital, que constitui um modelo de enquadramento que se considera
pertinente para a análise de custos em curadoria digital, inclusivamente projetos de
carácter mais prático, e que interliga a visão do ciclo de vida da curadoria do
objeto digital do DCC e a abordagem do modelo de referência OAIS, numa lógica
transversal apreendida pelos Modelos de Custos e Plano/Políticas de gestão dos dados.
Conclui-se que se deteta uma mudança de paradigma, de uma visão de
black-box para uma abordagem de identificação dos custos e de
tentativa de sistematização de modelos de previsão para utilização institucional,
como forma de incentivar a transparência e a accountability e de captar
o interesse de potenciais financiadores.
Introduction - Regarding the concerns that prompted the emergence of digital
curation, this study takes as an example the context of the production of large
volumes of scientific information, which requires approaches that ensure its
maintenance, reuse and valorization, considering its cost.
Objectives - The study aims to discuss the relevant issues regarding the costs of
digital curation and to develop a proposed framework model for the analysis of costs
in digital curation, including projects of a more practical nature. This means
addressing the definition of the concept and the issue of costs, drawing on studies of
cost models.
Methodology - A literature review was carried out, contextualizing the issue of costs
within digital curation, with data collected from Biblioteca do
Conhecimento Online – B-On (Online Knowledge Library) and Repositórios Científicos de Acesso Aberto de Portugal –
RCAAP (Scientific Open Access Repository of Portugal) as research literature sources.
Then, a framework model scheme was developed based on the categories identified using
the Constant Comparative Method, some related to the Open
Archival Information System (OAIS) Reference Model and to the Digital Curation Centre (DCC) Life Cycle. This allowed the
development of a content analysis of the perceptions collected from different
authors, resulting in memos, of which this text is a summary.
Results / Conclusion - A systematization of digital curation issues is proposed,
which constitutes a framework model considered relevant for the analysis of costs in
digital curation, including practical projects. It connects the DCC Life
Cycle view of digital object curation to the OAIS Reference Model approach, in a
cross-cutting logic captured by cost models and data management plans and policies. The study
concludes that a paradigm shift is occurring from a black-box perspective to
one that identifies costs and attempts to standardise predictive models for
institutional use, with the aim of promoting transparency and accountability and
capturing the interest of potential funders.
[pt] Reconstruir histórias da conservação da natureza na Califórnia: 1850 – 2010
[en] Reconstructing Histories of Nature Conservation in California: 1850-2010
Maria J. Ferreira dos Santos, Department of Geography, University of Zurich
Abstract
[pt][en]
Este artigo descreve a experiência da autora em criar uma plataforma digital para
revelar, levantar hipóteses e transmitir conhecimento histórico sobre a conservação
da natureza na Califórnia desde 1848, estado norte-americano pioneiro nas ideias da
conservação da natureza. A riqueza natural da Califórnia aliada com empreendedorismo,
levaram a que a conservação da natureza tenha estado, desde muito cedo, enraizada na
legislação e na forma de agir do governo do estado e dos seus habitantes. No entanto,
estas ideias têm sido desafiadas por processos intrinsecamente associados ao
progresso, especialmente o crescimento exponencial da população e do tecido urbano, e
a necessidade cada vez maior da implementação de outros usos do território e de
utilização de recursos naturais. A ideia por detrás do projeto é realçar a história
da conservação da natureza no estado, visualizar o seu progresso através do tempo, e
analisar os fatores socio-económicos e ambientais determinantes para o seu desenrolar
durante os últimos 160 anos. A relevância do artigo está em contribuir para o
aprofundamento da discussão decorrente do desafio posto pelo crescimento demográfico
e urbano. Nesse sentido, a plataforma digital proposta, ao coletar, armazenar e
distribuir dados, bem como explorá-los de modo inovador, tem o mérito de estimular a
atividade intelectual e levantar novas hipóteses para enriquecer o nosso conhecimento
multi-disciplinar sobre a intersecção da história com muitas outras disciplinas.
This paper addresses the experience of the author when developing a digital platform
on the history of nature conservation in California from 1848. This digital platform
is designed to showcase and explore this history, transfer knowledge, raise new
questions, and pose new hypotheses about conservation in a state that
pioneered these ideas. California's natural wealth, together with
entrepreneurship, allowed for an early start of nature conservation in the state.
These ideas of conservation soon were brought into legislation, governmental action
and processes, and population awareness and behavior. However, the development of the
conservation history was not without barriers, as nature conservation has been
challenged by development processes, namely exponential population growth and urban
fabric, and the growing need for land use management that met the needs from
different sectors. The Project herein described is based on the need to showcase the
history of conservation in California, visualize its development over time, and
analyze socio-economic and environmental drivers of its development over the last
160 years. This article is therefore of importance as it contributes to broadening and
deepening the discussion along the challenges brought about by urban and population
development. The digital platform described herein is therefore innovative and
fundamental to further stimulate new questions at the interface of history and many
other disciplines.
Articles
[en] Digital Editions and Version Numbering
Paul A. Broyles, North Carolina State University
Abstract
[en]
Digital editions are easily modified after they are first published — a state of
affairs that poses challenges both for long-term scholarly reference and for
various forms of electronic distribution and analysis. This article argues that
producers of digital editions should assign meaningful version numbers to their
editions and update those version numbers with each change, allowing both humans
and computers to know when resources have been modified and how significant the
changes are. As an examination of versioning practices in the software industry
reveals, version numbers are not neutral descriptors but social products
intended for use in specific contexts, and the producers of digital editions
must consider how version numbers will be used in developing numbering schemes.
It may be beneficial to version different parts of an edition separately, and in
particular to version the data objects or content of an edition independently
from the environment in which it is displayed. The article concludes with a case
study of the development of a versioning policy for the Piers Plowman Electronic Archive, and includes an appendix
surveying how a selection of digital editions handle the problem of recording
and communicating changes.
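As a rough illustration of how a version number can tell a computer how significant a change is, here is a minimal Python sketch. It assumes a three-part MAJOR.MINOR.PATCH scheme of the kind common in the software industry; the abstract does not say which scheme the article or the Piers Plowman Electronic Archive adopts, and the names `Version`, `parse_version`, and `change_significance` are hypothetical.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A three-part version number (an assumed scheme, not the article's)."""
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse a string such as '1.2.3' into a Version."""
    parts = text.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {text!r}")
    return Version(*(int(part) for part in parts))


def change_significance(old: str, new: str) -> str:
    """Report which component changed between two versions of an edition."""
    before, after = parse_version(old), parse_version(new)
    if after == before:
        return "unchanged"
    if after < before:  # tuples compare component by component
        raise ValueError("new version is older than the old version")
    if after.major != before.major:
        return "major"
    if after.minor != before.minor:
        return "minor"
    return "patch"
```

A citation that records the version alongside the edition's title would let a later reader call `change_significance` on the cited and current versions to judge whether the cited text may have changed substantially.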
[en] Crowdsourcing Image Extraction and Annotation: Software Development and Case Study
Ana Jofre, SUNY Polytechnic; Vincent Berardi, Chapman University; Kathleen P.J. Brennan, University of Queensland; Aisha Cornejo, Chapman University; Carl Bennett, SUNY Polytechnic; John Harlan, SUNY Polytechnic
Abstract
[en]
We describe the development of web-based software that facilitates large-scale,
crowdsourced image extraction and annotation within image-heavy corpora that are
of interest to the digital humanities. An application of this software is then
detailed and evaluated through a case study where it was deployed within Amazon
Mechanical Turk to extract and annotate faces from the archives of Time magazine. Annotation labels included categories
such as age, gender, and race that were subsequently used to train machine
learning models. The systemization of our crowdsourced data collection and
worker quality verification procedures is detailed within this case study. We
outline a data verification methodology that used validation images and required
only two annotations per image to produce high-fidelity data that has comparable
results to methods using five annotations per image. Finally, we provide
instructions for customizing our software to meet the needs for other studies,
with the goal of offering this resource to researchers undertaking the analysis
of objects within other image-heavy archives.
[en] Digital Humanities and Natural Language Processing: “Je t’aime... Moi non plus”
Barbara McGillivray, The Alan Turing Institute and University of Cambridge; Thierry Poibeau, CNRS / Ecole normale supérieure — PSL/ Université Sorbonne nouvelle; Pablo Ruiz Fabo, Laboratoire LiLPa, Université de Strasbourg
Abstract
[en]
In spite of the increasingly large textual datasets humanities researchers are
confronted with, and the need for automatic tools to extract information from
them, we observe a lack of communication and diverging goals between the
communities of Natural Language Processing (NLP) and Digital Humanities (DH).
This contrasts with the wealth of potential opportunities that could arise from
closer collaborations. We argue that more efforts are needed to make NLP tools
work for DH datasets so that NLP research applied to humanities data
receives more attention, leading to the development of evaluation approaches
tailored towards relevant research questions. This has the potential to bring
methodological advances to NLP, while at the same time confronting DH datasets
with powerful state-of-the-art techniques.
[en] Reading Chicago Reading: Quantitative Analysis of a Repeating Literary Program
John Shanahan, DePaul University; Robin Burke, University of Colorado — Boulder; Ana Lučić, Beckman Institute, University of Illinois
Abstract
[en]
This essay presents quantitative capture and predictive modeling for one of the
largest and longest running mass reading programs of the past two decades: “One
Book One Chicago” (OBOC) sponsored by the Chicago Public Library (CPL). The
Reading Chicago Reading project uses data associated with OBOC as a probe into
city-scale library usage and, by extension, as a window onto contemporary
reading behavior. The first half of the essay explains why CPL’s OBOC program is
conducive to modeling, and the second half documents the creation of
our models, their underlying data, and the results.
[en] Tracking the Consumption Junction: Temporal Dependencies between Articles and Advertisements in Dutch Newspapers
Melvin Wevers, Digital Humanities Lab, KNAW Humanities Cluster, Netherlands; Jianbo Gao, Center for Geodata and Analysis, Faculty of Geographical Science, Beijing Normal University, Beijing, China; Kristoffer L. Nielbo, Center for Humanities Computing Aarhus, Aarhus University, Denmark
Abstract
[en]
Historians have regularly debated whether advertisements can be used as a viable source
to study the past. One of their main concerns centered on the question of agency.
Were advertisements a reflection of historical events and societal debates, or were
ad makers instrumental in shaping society and the ways people interacted with
consumer goods? Using techniques from econometrics (Granger causality test) and
complexity science (Adaptive Fractal Analysis), this paper analyzes to what extent
advertisements shaped or reflected society. We found
evidence that indicates a fundamental difference between the dynamic behavior of word
use in articles and advertisements published in a century of Dutch newspapers.
Articles exhibit persistent trends. Contrary to this, advertisements have a more
irregular behavior characterized by short bursts and fast decay, which, in part,
mirrors the dynamic through which advertisers introduced terms into public discourse.
On the issue of whether advertisements shaped or reflected society, we found
particular product types that seemed to be collectively driven by a Granger causality
going from advertisements to articles. Generally, we found support for a complex
interaction pattern, analogous to Cowan’s concept of the consumption junction.
Finally, we discovered noteworthy patterns in terms of Granger causality and
long-range dependencies for specific product groups. All in all, this study shows how
methods from econometrics and complexity science can be applied to humanities data to
improve our understanding of complex cultural-historical phenomena such as the role
of advertising in society.
[en] Calamari − A High-Performance Tensorflow-based Deep Learning Package for Optical Character Recognition
Christoph Wick, Universität Würzburg, Chair of Computer Science VI; Christian Reul, Centre for Philology and Digitality, University of Würzburg; Frank Puppe, Universität Würzburg, Chair of Computer Science VI
Abstract
[en]
Optical Character Recognition (OCR) on contemporary and historical data is still
a focus of many researchers. Historical prints in particular require book-specific
trained OCR models to achieve usable results. To reduce the human effort of manually annotating
ground truth (GT), various techniques such as voting and pretraining have been shown
to be very efficient. Calamari is a new open source OCR line recognition
software that both uses state-of-the-art Deep Neural Networks (DNNs) implemented
in Tensorflow and gives native support for techniques such as pretraining and
voting. The customizable network architectures constructed of Convolutional
Neural Networks (CNNs) and Long Short-Term Memory (LSTM) layers are trained by
the so-called Connectionist Temporal Classification (CTC) algorithm of Graves et
al. (2006). Optional usage of a GPU drastically reduces the computation times
for both training and prediction. We use two different datasets to compare the
performance of Calamari to OCRopy, OCRopus3, and Tesseract 4. Calamari reaches a
Character Error Rate (CER) of 0.11% on the UW3 dataset written in modern English
and 0.18% on the DTA19 dataset written in German Fraktur, which considerably
outperforms the results of the existing software.
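For readers unfamiliar with the metric, the sketch below shows one common way to compute a Character Error Rate: the Levenshtein edit distance between recognized and ground-truth text, divided by the number of ground-truth characters. This is the usual textbook definition and is offered only as an assumption; the abstract does not state exactly how the authors normalize or aggregate their CER, and the function names are hypothetical rather than part of Calamari.

```python
from typing import Sequence


def edit_distance(reference: str, prediction: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(prediction) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, pred_char in enumerate(prediction, start=1):
            substitution = previous[j - 1] + (ref_char != pred_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(references: Sequence[str],
                         predictions: Sequence[str]) -> float:
    """Total edit distance over all lines divided by total reference characters."""
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")
    total_chars = sum(len(reference) for reference in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_errors = sum(edit_distance(reference, prediction)
                       for reference, prediction in zip(references, predictions))
    return total_errors / total_chars
```

Under this definition, a CER of 0.11% corresponds to roughly one character error per 900 ground-truth characters.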
[en] Open Data in Cultural Heritage Institutions: Can We Be Better Than Data Brokers?
S.L. Ziegler, Louisiana State University Libraries
Abstract
[en]
Treating collections in cultural institutions as data encourages novel approaches
to the use of historic collections. To reframe collections as data is to focus
on how digitized collection material, collection metadata, and transcriptions
can be used and reused for various types of computational analysis. Scholars
active in the field of digital humanities have long taken advantage of
computational data. This paper focuses on the work of cultural heritage
institutions, which are increasingly offering collections as data. This paper
outlines the collections as data project and examines specific examples of
cultural institutions active in this space. The paper then details the practices
of data brokers, and explores how the data broker model can frame the use of
data in cultural heritage institutions. In closing, a number of experiments are
described that might help mitigate the harm that data in cultural institutions
might cause. As we create and share data, can we be sure we are better than data
brokers?
[en] A Prosopography as Linked Open Data: Some Implications from DPRR
John Douglas Bradley, King's Digital Lab, King's College London
Abstract
[en]
The Digital Prosopography of the Roman Republic (DPRR) project has created a
freely available structured prosopography of people from the Roman Republic. As
a part of this work the materials that were produced by the project have been
made available as Linked Open Data (LOD): translated into RDF, and served
through an RDF Server. This article explains what it means to present the
material as Linked Open Data by means of working, interactive examples. DPRR
didn't do some of the work which has been conventionally associated with Linked
Open Data. However, by considering the two conceptions of the Semantic Web and
Linked Open Data as proposed by Tim Berners-Lee one can see how DPRR's RDF
Server fits best into the LOD picture, including how it might serve to
facilitate new ways to explore its material. The article gives several examples
of ways of exploiting DPRR's RDF dataset, and other similarly structured
materials, to enable new research approaches.
URL: http://www.digitalhumanities.org/dhq/vol/14/2/index.html
Comments: [email protected]
Published by: The Alliance of Digital Humanities Organizations and The Association for Computers and the Humanities
Affiliated with: Digital Scholarship in the Humanities
DHQ has been made possible in part by the National Endowment for the Humanities.
Copyright © 2005 -
Unless otherwise noted, the DHQ web site and all DHQ published content are published under a Creative Commons Attribution-NoDerivatives 4.0 International License. Individual articles may carry a more permissive license, as described in the footer for the individual article, and in the article’s metadata.