dumitrescustefan/README.md

About me

I'm a Machine Learning Engineer, working on cool projects at the intersection of NLP and CV. I finished my PhD in 2011, worked as a Research Scientist at the Research Institute for AI (Romanian Academy) for 7 years, then switched to applied ML as an ML engineer at Sustainalytics (2017-2019) and now at Adobe.

I'm active in open source, especially on Romanian NLP. Over the years I've published, taught and coded, all while having fun. I like to build stuff.

Showcase on HuggingFace:


Projects I'm proud of

Under development:

  • Romanian Text Corpus (joint project with Mihai Ilie)
  • Word Sense Disambiguation Corpus & Models for Romanian (large scale, long running project)
  • NLI Corpus for Romanian
  • Sentence segmentation for Romanian (because current Romanian tools fail miserably for anything but clean text)

2023

  • May Appeared on live TV discussing AI (#1, #2)
  • Apr Participated in WE Smart Diaspora conference in Timisoara, Romania, presenting "The Impact of Large Language Models"

2022

2021

2020

  • Aug Led the development of the first ML leaderboard, named LiRo Benchmark, together with Viorica Patraucean and other amazing RomaniaAI volunteers.
  • Jun Proposed and led the development of the Romanian Semantic Textual Similarity dataset. It's a 1:1 high-quality human translation of the English STS dataset.
  • Apr Trained and released the first monolingual Romanian BERT model, which became the most used BERT model in Romania, with thousands of monthly downloads.

2019 and before

  • RoWordNet pip package providing quick access to the Romanian WordNet. After all these years it's still the only Python plug-and-play package for Romanian - seems to be working well :)
  • Developed NLP-Cube with Tiberiu Boros (lead). Started as an entry in the 2018 CoNLL competition and evolved into a multilingual toolkit providing Tokenization, Sentence Segmentation, Lemmatization, POS tagging and Dependency parsing, trained on the Universal Dependencies dataset.

Selected publications

Google Scholar profile, h-index: 9


Pinned

  1. roner Public

    Named Entity Recognition for Romanian, based on transformer models

    Python 12

  2. Romanian-Transformers Public

    This repo is the home of Romanian Transformers.

    93 6

  3. adobe/NLP-Cube Public

    Natural Language Processing Pipeline - Sentence Splitting, Tokenization, Lemmatization, Part-of-speech Tagging and Dependency Parsing

    HTML 555 93

  4. ronec Public

    Romanian Named Entity Corpus (RONEC) version 2.0

    Python 60 16

  5. RoWordNet Public

    Romanian WordNet (Data + API for Python)

    Python 49 18