A Ten-Year Summary of a SOA-based Micro-services Infrastructure for Linguistic Services.

M Büchler, T Eckart, G Franzini, E Franzini - DH, 2017 - dh2017.adho.org
In the mid-1990s, the Natural Language Processing Group at the University of Leipzig began work on the Wortschatz project, which aims to provide corpora in hundreds of languages and in different normalised sizes, be that 100K, 300K or 1M sentences. As the resources grew in size, so did the number of requests for the data. In the early stages of the project, a specific database dump was created, parts of which even came with a small user interface. The dump was shared with interested researchers and partners in the business sector. After some time, however, the personnel costs of this kind of collaboration became unsustainable. For this reason, a new plan was put into motion in 2004: the development of a SOAP-based API, the Leipzig Linguistic Services (LLS), that enabled any interested person to access the data of the Wortschatz databases in any provided language (Quasthoff et al. 2006, Eckart et al. 2012). Overall, 20 services were provided, delivering specific information such as baseforms, category classifications, and thesaurus data. The aim of the LLS was to establish a Service Oriented Architecture (SOA) for linguistic resources based on small, atomic micro-services that users could combine for their particular needs. Users were then able not only to browse the Wortschatz website, but also to integrate those services into their own existing digital ecosystems. In 2005 these services were made publicly available, and by September 2006 all requests were systematically logged. In July 2014 the number of logged requests reached nearly one billion. While use was initially limited to academia, over time the services were increasingly used by the private and business sectors as well.