Pedro Ortiz Suarez

Pedro Javier Ortiz Suárez


2020 – today


2024
[c14]
- electronic edition @ aclanthology.org (open access)
- dblp key: conf/coling/Palomar-GinerSE24
- persistent URL: https://dblp.org/rec/conf/coling/Palomar-GinerSE24
Jorge Palomar-Giner, José Javier Saiz, Ferran Espuña, Mario Mina, Severino Da Dalt, Joan Llop, Malte Ostendorff, Pedro Ortiz Suarez, Georg Rehm, Aitor Gonzalez-Agirre, Marta Villegas:
A CURATEd CATalog: Rethinking the Extraction of Pretraining Corpora for Mid-Resourced Languages. LREC/COLING 2024: 335-349
[c13]
- electronic edition via DOI (open access)
- dblp key: conf/naacl/AliFTRLLKEDBJWJAJSOWSKF24
- persistent URL: https://dblp.org/rec/conf/naacl/AliFTRLLKEDBJWJAJSOWSKF24
Mehdi Ali, Michael Fromm, Klaudia Thellmann, Richard Rutmann, Max Lübbering, Johannes Leveling, Katrin Klug, Jan Ebert, Niclas Doll, Jasper Schulze Buschhoff, Charvi Jain, Alexander Arno Weber, Lena Jurkschat, Hammam Abdelwahab, Chelsea John, Pedro Ortiz Suarez, Malte Ostendorff, Samuel Weinbach, Rafet Sifa, Stefan Kesselheim, Nicolas Flores-Herr:
Tokenizer Choice For LLM Training: Negligible or Crucial? NAACL-HLT (Findings) 2024: 3907-3924
[i15]
- electronic edition via DOI (open access)
- dblp key: journals/corr/abs-2406-08707
- persistent URL: https://dblp.org/rec/journals/corr/abs-2406-08707
Matthieu Futeral, Armel Zebaze, Pedro Ortiz Suarez, Julien Abadji, Rémi Lacroix, Cordelia Schmid, Rachel Bawden, Benoît Sagot:
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus. CoRR abs/2406.08707 (2024)
[i14]
- electronic edition via DOI (open access)
- dblp key: journals/corr/abs-2408-04554
- persistent URL: https://dblp.org/rec/journals/corr/abs-2408-04554
Rasul Dent, Juliette Janes, Thibault Clérice, Pedro Ortiz Suarez, Benoît Sagot:
Molyé: A Corpus-based Approach to Language Contact in Colonial France. CoRR abs/2408.04554 (2024)
2023
[i13]
- electronic edition via DOI (open access)
- dblp key: journals/corr/abs-2303-03915
- persistent URL: https://dblp.org/rec/journals/corr/abs-2303-03915
Hugo Laurençon, Lucile Saulnier, Thomas Wang, Christopher Akiki, Albert Villanova del Moral, Teven Le Scao, Leandro von Werra, Chenghao Mou, Eduardo González Ponferrada, Huu Nguyen, Jörg Frohberg, Mario Sasko, Quentin Lhoest, Angelina McMillan-Major, Gérard Dupont, Stella Biderman, Anna Rogers, Loubna Ben Allal, Francesco De Toni, Giada Pistilli, Olivier Nguyen, Somaieh Nikpoor, Maraim Masoud, Pierre Colombo, Javier de la Rosa, Paulo Villegas, Tristan Thrush, Shayne Longpre, Sebastian Nagel, Leon Weber, Manuel Muñoz, Jian Zhu, Daniel van Strien, Zaid Alyafeai, Khalid Almubarak, Minh Chien Vu, Itziar Gonzalez-Dios, Aitor Soroa, Kyle Lo, Manan Dey, Pedro Ortiz Suarez, Aaron Gokaslan, Shamik Bose, David Ifeoluwa Adelani, Long Phan, Hieu Tran, Ian Yu, Suhas Pai, Jenny Chim, Violette Lepercq, Suzana Ilic, Margaret Mitchell, Sasha Luccioni, Yacine Jernite:
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset. CoRR abs/2303.03915 (2023)
[i12]
- electronic edition via DOI (open access)
- dblp key: journals/corr/abs-2309-10923
- persistent URL: https://dblp.org/rec/journals/corr/abs-2309-10923
Luca Foppiano, Tomoya Mato, Kensei Terashima, Pedro Ortiz Suarez, Taku Tou, Chikako Sakai, Wei-Sheng Wang, Toshiyuki Amagasa, Yoshihiko Takano, Masashi Ishii:
Semi-automatic staging area for high-quality structured data extraction from scientific literature. CoRR abs/2309.10923 (2023)
[i11]
- electronic edition via DOI (open access)
- dblp key: journals/corr/abs-2310-08754
- persistent URL: https://dblp.org/rec/journals/corr/abs-2310-08754
Mehdi Ali, Michael Fromm, Klaudia Thellmann, Richard Rutmann, Max Lübbering, Johannes Leveling, Katrin Klug, Jan Ebert, Niclas Doll, Jasper Schulze Buschhoff, Charvi Jain, Alexander Arno Weber, Lena Jurkschat, Hammam Abdelwahab, Chelsea John, Pedro Ortiz Suarez, Malte Ostendorff, Samuel Weinbach, Rafet Sifa, Stefan Kesselheim, Nicolas Flores-Herr:
Tokenizer Choice For LLM Training: Negligible or Crucial? CoRR abs/2310.08754 (2023)
2022
[b1]
- electronic edition @ archives-ouvertes.fr
- dblp key: phd/hal/Suarez22
- persistent URL: https://dblp.org/rec/phd/hal/Suarez22
Pedro Javier Ortiz Suárez:
A Data-driven Approach to Natural Language Processing for Contemporary and Historical French. (Une approche basée sur les données pour le traitement automatique du langage naturel en français contemporain et historique). Sorbonne University, Paris, France, 2022
[j1]
- electronic edition via DOI (open access)
- dblp key: journals/tacl/KreutzerCWWEUTS22
- persistent URL: https://dblp.org/rec/journals/tacl/KreutzerCWWEUTS22
Julia Kreutzer, Isaac Caswell, Lisa Wang, Ahsan Wahab, Daan van Esch, Nasanbayar Ulzii-Orshikh, Allahsera Tapo, Nishant Subramani, Artem Sokolov, Claytone Sikasote, Monang Setyawan, Supheakmungkol Sarin, Sokhar Samb, Benoît Sagot, Clara Rivera, Annette Rios, Isabel Papadimitriou, Salomey Osei, Pedro Javier Ortiz Suárez, Iroro Orife, Kelechi Ogueji, Andre Niyongabo Rubungo, Toan Q. Nguyen, Mathias Müller, André Müller, Shamsuddeen Hassan Muhammad, Nanda Muhammad, Ayanda Mnyakeni, Jamshidbek Mirzakhalov, Tapiwanashe Matangira, Colin Leong, Nze Lawson, Sneha Kudugunta, Yacine Jernite, Mathias Jenny, Orhan Firat, Bonaventure F. P. Dossou, Sakhile Dlamini, Nisansa de Silva, Sakine Çabuk Balli, Stella Biderman, Alessia Battisti, Ahmed Baruwa, Ankur Bapna, Pallavi Baljekar, Israel Abebe Azime, Ayodele Awokoya, Duygu Ataman, Orevaoghene Ahia, Oghenefego Ahia, Sweta Agrawal, Mofetoluwa Adeyemi:
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets. Trans. Assoc. Comput. Linguistics 10: 50-72 (2022)
[c12]
- electronic edition @ aclanthology.org (open access)
- dblp key: conf/coling/SuarezG22
- persistent URL: https://dblp.org/rec/conf/coling/SuarezG22
Pedro Ortiz Suarez, Simon Gabay:
A Data-driven Approach to Named Entity Recognition for Early Modern French. COLING 2022: 3722-3730
[c11]
- electronic edition @ aclanthology.org (open access)
- dblp key: conf/lrec/GrobolRSSRC22
- persistent URL: https://dblp.org/rec/conf/lrec/GrobolRSSRC22
Loïc Grobol, Mathilde Regnault, Pedro Javier Ortiz Suárez, Benoît Sagot, Laurent Romary, Benoît Crabbé:
BERTrade: Using Contextual Embeddings to Parse Old French. LREC 2022: 1104-1113
[c10]
- electronic edition @ aclanthology.org (open access)
- dblp key: conf/lrec/GabaySBCBGS22
- persistent URL: https://dblp.org/rec/conf/lrec/GabaySBCBGS22
Simon Gabay, Pedro Ortiz Suarez, Alexandre Bartz, Alix Chagué, Rachel Bawden, Philippe Gambette, Benoît Sagot:
From FreEM to D'AlemBERT: a Large Corpus and a Language Model for Early Modern French. LREC 2022: 3367-3374
[c9]
- electronic edition @ aclanthology.org (open access)
- dblp key: conf/lrec/AbadjiSRS22
- persistent URL: https://dblp.org/rec/conf/lrec/AbadjiSRS22
Julien Abadji, Pedro Javier Ortiz Suárez, Laurent Romary, Benoît Sagot:
Towards a Cleaner Document-Oriented Multilingual Crawled Corpus. LREC 2022: 4344-4355
[c8]
- electronic edition @ nips.cc (open access)
- dblp key: conf/nips/LaurenconSWAMSW22
- persistent URL: https://dblp.org/rec/conf/nips/LaurenconSWAMSW22
Hugo Laurençon, Lucile Saulnier, Thomas Wang, Christopher Akiki, Albert Villanova del Moral, Teven Le Scao, Leandro von Werra, Chenghao Mou, Eduardo González Ponferrada, Huu Nguyen, Jörg Frohberg, Mario Sasko, Quentin Lhoest, Angelina McMillan-Major, Gérard Dupont, Stella Biderman, Anna Rogers, Loubna Ben Allal, Francesco De Toni, Giada Pistilli, Olivier Nguyen, Somaieh Nikpoor, Maraim Masoud, Pierre Colombo, Javier de la Rosa, Paulo Villegas, Tristan Thrush, Shayne Longpre, Sebastian Nagel, Leon Weber, Manuel Muñoz, Jian Zhu, Daniel van Strien, Zaid Alyafeai, Khalid Almubarak, Minh Chien Vu, Itziar Gonzalez-Dios, Aitor Soroa, Kyle Lo, Manan Dey, Pedro Ortiz Suarez, Aaron Gokaslan, Shamik Bose, David Ifeoluwa Adelani, Long Phan, Hieu Tran, Ian Yu, Suhas Pai, Jenny Chim, Violette Lepercq, Suzana Ilic, Margaret Mitchell, Alexandra Sasha Luccioni, Yacine Jernite:
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset. NeurIPS 2022
[c7]
- electronic edition @ aclanthology.org (open access)
- dblp key: conf/taln/GabaySBBGS22
- persistent URL: https://dblp.org/rec/conf/taln/GabaySBBGS22
Simon Gabay, Pedro Javier Ortiz Suárez, Rachel Bawden, Alexandre Bartz, Philippe Gambette, Benoît Sagot:
Le projet FREEM : ressources, outils et enjeux pour l'étude du français d'Ancien Régime (The FreEM project: Resources, tools and challenges for the study of Ancien Régime French). TALN-RECITAL 2022: 154-165
[i10]
- electronic edition @ arxiv.org (open access)
- dblp key: journals/corr/abs-2201-06642
- persistent URL: https://dblp.org/rec/journals/corr/abs-2201-06642
Julien Abadji, Pedro Javier Ortiz Suárez, Laurent Romary, Benoît Sagot:
Towards a Cleaner Document-Oriented Multilingual Crawled Corpus. CoRR abs/2201.06642 (2022)
[i9]
- electronic edition @ arxiv.org (open access)
- dblp key: journals/corr/abs-2201-10066
- persistent URL: https://dblp.org/rec/journals/corr/abs-2201-10066
Angelina McMillan-Major, Zaid Alyafeai, Stella Biderman, Kimbo Chen, Francesco De Toni, Gérard Dupont, Hady Elsahar, Chris Emezue, Alham Fikri Aji, Suzana Ilic, Nurulaqilla Khamis, Colin Leong, Maraim Masoud, Aitor Soroa, Pedro Javier Ortiz Suárez, Zeerak Talat, Daniel van Strien, Yacine Jernite:
Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources. CoRR abs/2201.10066 (2022)
[i8]
- electronic edition @ arxiv.org (open access)
- dblp key: journals/corr/abs-2202-09452
- persistent URL: https://dblp.org/rec/journals/corr/abs-2202-09452
Simon Gabay, Pedro Javier Ortiz Suárez, Alexandre Bartz, Alix Chagué, Rachel Bawden, Philippe Gambette, Benoît Sagot:
From FreEM to D'AlemBERT: a Large Corpus and a Language Model for Early Modern French. CoRR abs/2202.09452 (2022)
[i7]
- electronic edition via DOI (open access)
- dblp key: journals/corr/abs-2210-15600
- persistent URL: https://dblp.org/rec/journals/corr/abs-2210-15600
Luca Foppiano, Pedro Baptista de Castro, Pedro Ortiz Suarez, Kensei Terashima, Yoshihiko Takano, Masashi Ishii:
Automatic Extraction of Materials and Properties from Superconductors Scientific Literature. CoRR abs/2210.15600 (2022)
[i6]
- electronic edition via DOI (open access)
- dblp key: journals/corr/abs-2211-05100
- persistent URL: https://dblp.org/rec/journals/corr/abs-2211-05100
Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilic, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, Iz Beltagy, Huu Nguyen, Lucile Saulnier, Samson Tan, Pedro Ortiz Suarez, Victor Sanh, Hugo Laurençon, Yacine Jernite, Julien Launay, Margaret Mitchell, Colin Raffel, Aaron Gokaslan, Adi Simhi, Aitor Soroa, Alham Fikri Aji, Amit Alfassy, Anna Rogers, Ariel Kreisberg Nitzav, Canwen Xu, Chenghao Mou, Chris Emezue, Christopher Klamm, Colin Leong, Daniel van Strien, David Ifeoluwa Adelani, et al.:
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. CoRR abs/2211.05100 (2022)
[i5]
- electronic edition via DOI (open access)
- dblp key: journals/corr/abs-2212-10440
- persistent URL: https://dblp.org/rec/journals/corr/abs-2212-10440
Tim Jansen, Yangling Tong, Victoria Zevallos, Pedro Ortiz Suarez:
Perplexed by Quality: A Perplexity-based Method for Adult and Harmful Content Detection in Multilingual Heterogeneous Web Data. CoRR abs/2212.10440 (2022)
2021
[i4]
- electronic edition @ arxiv.org (open access)
- dblp key: journals/corr/abs-2103-12028
- persistent URL: https://dblp.org/rec/journals/corr/abs-2103-12028
Isaac Caswell, Julia Kreutzer, Lisa Wang, Ahsan Wahab, Daan van Esch, Nasanbayar Ulzii-Orshikh, Allahsera Tapo, Nishant Subramani, Artem Sokolov, Claytone Sikasote, Monang Setyawan, Supheakmungkol Sarin, Sokhar Samb, Benoît Sagot, Clara Rivera, Annette Rios, Isabel Papadimitriou, Salomey Osei, Pedro Javier Ortiz Suárez, Iroro Orife, Kelechi Ogueji, Rubungo Andre Niyongabo, Toan Q. Nguyen, Mathias Müller, André Müller, Shamsuddeen Hassan Muhammad, Nanda Muhammad, Ayanda Mnyakeni, Jamshidbek Mirzakhalov, Tapiwanashe Matangira, Colin Leong, Nze Lawson, Sneha Kudugunta, Yacine Jernite, Mathias Jenny, Orhan Firat, Bonaventure F. P. Dossou, Sakhile Dlamini, Nisansa de Silva, Sakine Çabuk Balli, Stella Biderman, Alessia Battisti, Ahmed Baruwa, Ankur Bapna, Pallavi Baljekar, Israel Abebe Azime, Ayodele Awokoya, Duygu Ataman, Orevaoghene Ahia, Oghenefego Ahia, Sweta Agrawal, Mofetoluwa Adeyemi:
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets. AfricaNLP 2021
2020
[c6]
- dblp key: conf/acl/SeddahEFFMSSS20
- persistent URL: https://dblp.org/rec/conf/acl/SeddahEFFMSSS20
Djamé Seddah, Farah Essaidi, Amal Fethi, Matthieu Futeral, Benjamin Muller, Pedro Javier Ortiz Suárez, Benoît Sagot, Abhishek Srivastava:
Building a User-Generated Content North-African Arabizi Treebank: Tackling Hell. ACL 2020: 1139-1150
[c5]
- dblp key: conf/acl/SuarezRS20
- persistent URL: https://dblp.org/rec/conf/acl/SuarezRS20
Pedro Javier Ortiz Suárez, Laurent Romary, Benoît Sagot:
A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages. ACL 2020: 1703-1714
[c4]
- dblp key: conf/acl/MartinMSDRCSS20
- persistent URL: https://dblp.org/rec/conf/acl/MartinMSDRCSS20
Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric de la Clergerie, Djamé Seddah, Benoît Sagot:
CamemBERT: a Tasty French Language Model. ACL 2020: 7203-7219
[c3]
- electronic edition @ ceur-ws.org (open access)
- dblp key: conf/clef/SuarezDLT20
- persistent URL: https://dblp.org/rec/conf/clef/SuarezDLT20
Pedro Javier Ortiz Suárez, Yoann Dupont, Gaël Lejeune, Tian Tian:
SinNer@Clef-Hipe2020 : Sinful adaptation of SotA models for Named Entity Recognition in French and German. CLEF (Working Notes) 2020
[c2]
- electronic edition @ aclanthology.org (open access)
- dblp key: conf/lrec/SuarezDMRS20
- persistent URL: https://dblp.org/rec/conf/lrec/SuarezDMRS20
Pedro Javier Ortiz Suárez, Yoann Dupont, Benjamin Muller, Laurent Romary, Benoît Sagot:
Establishing a New State-of-the-Art for French Named Entity Recognition. LREC 2020: 4631-4638
[c1]
- electronic edition @ aclanthology.org (open access)
- dblp key: conf/taln/MartinMSDRCSS20
- persistent URL: https://dblp.org/rec/conf/taln/MartinMSDRCSS20
Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Benoît Sagot, Djamé Seddah:
Les modèles de langue contextuels Camembert pour le français : impact de la taille et de l'hétérogénéité des données d'entrainement (CamemBERT Contextual Language Models for French: Impact of Training Data Size and Heterogeneity). JEP-TALN-RECITAL (2) 2020: 54-65
[i3]
- electronic edition @ arxiv.org (open access)
- dblp key: journals/corr/abs-2005-13236
- persistent URL: https://dblp.org/rec/journals/corr/abs-2005-13236
Pedro Javier Ortiz Suárez, Yoann Dupont, Benjamin Muller, Laurent Romary, Benoît Sagot:
Establishing a New State-of-the-Art for French Named Entity Recognition. CoRR abs/2005.13236 (2020)
[i2]
- electronic edition @ arxiv.org (open access)
- dblp key: journals/corr/abs-2006-06202
- persistent URL: https://dblp.org/rec/journals/corr/abs-2006-06202
Pedro Javier Ortiz Suárez, Laurent Romary, Benoît Sagot:
A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages. CoRR abs/2006.06202 (2020)

2010 – 2019


2019
[i1]
- electronic edition @ arxiv.org (open access)
- dblp key: journals/corr/abs-1911-03894
- persistent URL: https://dblp.org/rec/journals/corr/abs-1911-03894
Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah, Benoît Sagot:
CamemBERT: a Tasty French Language Model. CoRR abs/1911.03894 (2019)
