| Project | Year | Sizes available | What it is | What it advanced | arXiv paper | arXiv HTML | Hugging Face model / dataset | Hugging Face org | GitHub repo |
|---|---|---|---|---|---|---|---|---|---|
| Albertina PT family | 2023 | 100M, 900M | DeBERTa-based Portuguese foundation encoder with pt-PT and pt-BR variants. | Established the strongest open pt-PT encoder baseline and treated European Portuguese as its own target. | 2305.06721 | HTML | Model; GLUE-PTPT dataset | PORTULAN | Not verified |
| Albertina PT family expansion | 2024 | 100M, 900M, 1.5B | Expanded family of open Portuguese encoders at multiple sizes. | Turned Albertina into an ecosystem rather than a single model. | 2403.01897 | HTML | 1.5B pt-PT model | PORTULAN | Not verified |
| Gervásio PT family | 2024 | 7B, 8B, 70B | Fully open instruction-tuned decoder models for Portuguese, with pt-PT and pt-BR variants. | One of the earliest open Portuguese decoder models developed in Portugal. | 2402.18766 | HTML | 7B pt-PT model | PORTULAN | Not verified |
| GlórIA | 2024 | 1.3B, 2.7B | Open generative LLM for Portuguese with strong European Portuguese orientation. | Pushed pt-PT into the decoder / LLM era and introduced CALAME-PT. | 2402.12969 | HTML | Model; CALAME-PT dataset | NOVA-vision-language | rvlopes/GlorIA |
| MediAlbertina | 2024 | 900M, 1.5B | Domain-adapted European Portuguese medical language model built on Albertina. | Brought pt-PT modeling into the medical domain. | Not verified | Not verified | HF model | portugueseNLP | Not verified |
| AMALIA | 2026 | 9B | Fully open pt-PT-first LLM paired with native pt-PT evaluation. | Current flagship for European Portuguese LLM work. | 2603.26511 | HTML | Not verified | Not verified | AMALIA-LLM/AMALIA |

| Project | Year | What it is | What it advanced | arXiv paper | arXiv HTML | Hugging Face model / dataset | Hugging Face org | GitHub repo |
|---|---|---|---|---|---|---|---|---|
| From Brazilian Portuguese to European Portuguese | 2024 | pt-BR → pt-PT translation study with a manually curated gold test set. | Created reusable native evaluation material for pt-BR → pt-PT conversion. | 2408.07457 | HTML | Not verified | Not verified | Not verified |
| PtBrVarId | 2025 | Cross-domain dataset for distinguishing European and Brazilian Portuguese. | Improved the curation pipeline needed to separate pt-PT from pt-BR across datasets and models. | 2502.14394 | HTML | Dataset; Model | liaad | LIAAD/portuguese_vid |
| Tradutor / PTradutor | 2025 | Open European Portuguese translation model plus dedicated parallel dataset. | Made pt-BR → pt-PT translation an open, reproducible research problem. | 2502.14385 | HTML | PTradutor dataset | hugosousa | hmosousa/tradutor; hmosousa/ptradutor |
| CitiLink-Minutes | 2026 | Multilayer pt-PT dataset of municipal meeting minutes. | Opened a practical civic-language dataset for European Portuguese. | 2602.12137 | HTML | Representative HF model | inesctec | INESCTEC/citilink-dataset |
| CitiLink-Summ | 2026 | pt-PT summarization dataset for discussion subjects in municipal meeting minutes. | Made summarization on real public-administration text possible in pt-PT. | 2602.16607 | HTML | HF summarization model | inesctec | INESCTEC/citilink-summ |
| ClaimPT | 2026 | European Portuguese dataset for claim detection in news. | Gave pt-PT fact-checking a proper research base using licensed news content. | 2601.19490 | HTML | HF model | lfcc | LIAAD/ClaimPT |

| Project | Year | What it is | What it advanced | arXiv paper | arXiv HTML | Hugging Face model / dataset | Hugging Face org | GitHub repo |
|---|---|---|---|---|---|---|---|---|
| DSL-TL / Language Variety Identification with True Labels | 2023 | Human-annotated benchmark for language variety identification, including pt-PT vs pt-BR. | Fixed a common evaluation flaw: assuming the source of a text reveals its variety. | 2303.01490 | Not verified | Not verified | Not verified | LanguageTechnologyLab/DSL-TL |
| CALAME-PT | 2024 | Portuguese zero-shot language-model benchmark introduced with GlórIA. | Gave Portuguese decoder models a shared evaluation surface. | GlórIA paper | HTML | HF dataset | NOVA-vision-language | rvlopes/GlorIA |
| CAMÕES | 2025 | Open ASR benchmark covering European Portuguese and other Portuguese varieties. | Gave European Portuguese speech recognition a shared open benchmark. | 2508.19721 | HTML | HF dataset | inesc-id | Not verified |
| ALBA | 2026 | Linguistically grounded pt-PT benchmark for generative LLMs across eight linguistic dimensions. | Moved pt-PT evaluation toward native linguistic and cultural fidelity instead of translated proxies. | 2603.26516 | HTML | Not verified | Not verified | AMALIA-LLM/alba-benchmark |
| CLARIN-PT-LDB | 2026 | Open pt-PT LLM leaderboard focused on language, culture, and civility. | Made pt-PT LLM evaluation public and reproducible. | 2603.12872 | HTML | HF Space | PORTULAN | Not verified |
| AMALIA Eval Suite | 2026 | Native pt-PT evaluation suite released with the AMALIA technical report. | Strengthened the move from translated benchmarks to native pt-PT evaluation. | 2603.26511 | HTML | Not verified | Not verified | AMALIA-LLM/AMALIA |
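
The tables list bare arXiv identifiers and GitHub `owner/repo` slugs rather than full links. A minimal sketch of how those cells could be turned into URLs is shown here, assuming the standard `arxiv.org/abs/`, `arxiv.org/html/`, and `github.com/` URL patterns. The function names `arxiv_links` and `github_links` are hypothetical helpers, not part of any project listed above, and an HTML rendering may not exist on arXiv for every paper.

```python
import re

# New-style arXiv identifiers look like YYMM.NNNNN, optionally with a version suffix.
ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")
# GitHub slugs as written in the tables: owner/repo, with no spaces.
GITHUB_SLUG = re.compile(r"^[\w.-]+/[\w.-]+$")


def arxiv_links(cell):
    """Return abstract and HTML URLs for an arXiv ID cell, or None if the cell is not an ID."""
    cell = cell.strip()
    if not ARXIV_ID.match(cell):
        # Covers cells such as "Not verified" or "GlórIA paper".
        return None
    return {
        "abs": f"https://arxiv.org/abs/{cell}",
        "html": f"https://arxiv.org/html/{cell}",  # assumed pattern; the page may be missing
    }


def github_links(cell):
    """Return one URL per owner/repo slug; a cell may hold several slugs separated by ';'."""
    slugs = [part.strip() for part in cell.split(";")]
    return [f"https://github.com/{slug}" for slug in slugs if GITHUB_SLUG.match(slug)]


if __name__ == "__main__":
    print(arxiv_links("2305.06721"))
    print(github_links("hmosousa/tradutor; hmosousa/ptradutor"))
```

Cells marked "Not verified" fall through both patterns, so they yield `None` or an empty list instead of a guessed link.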