Hugging Face
Marc Olejak (MarcGrumpyOlejak)
6 followers · 46 following
AI & ML interests
Working at the practical, low-cost end of ML: playing around with German bureaucratic language, and still using Levenshtein distance.
Recent Activity
- Liked a model about 1 month ago: EuropeanParliament/EuroLLM-22B-EU-legislative
- Reacted with 🔥 to tomaarsen's post 5 months ago:
🐦🔥 I've just published Sentence Transformers v5.2.0! It introduces multi-processing for CrossEncoder (rerankers), multilingual NanoBEIR evaluators, similarity score outputs in mine_hard_negatives, Transformers v5 support, and more.

Details:
- CrossEncoder multi-processing: Similar to SentenceTransformer and SparseEncoder, you can now use multi-processing with CrossEncoder rerankers. Useful for multi-GPU and CPU settings, and simple to configure: just `device=["cuda:0", "cuda:1"]` or `device=["cpu"]*4` on the `model.predict` or `model.rank` calls.
- Multilingual NanoBEIR support: You can now use community translations of the tiny NanoBEIR retrieval benchmark instead of only the English one, by passing `dataset_id`, e.g. `dataset_id="lightonai/NanoBEIR-de"` for the German benchmark.
- Similarity scores in hard negatives mining: When mining for hard negatives to create a strong training dataset, you can now pass `output_scores=True` to get similarity scores returned. This can be useful for some distillation losses!
- Transformers v5: This release works with both Transformers v4 and the upcoming v5. In the future, Sentence Transformers will only work with Transformers v5, but not yet!
- Python 3.9 deprecation: Now that Python 3.9 has lost security support, Sentence Transformers no longer supports it.

Check out the full changelog for more details: https://github.com/huggingface/sentence-transformers/releases/tag/v5.2.0

I'm quite excited about what's coming. There's a huge draft PR with a notable refactor in the works that should bring some exciting support. Specifically, better multimodality, rerankers, and perhaps some late interaction in the future!
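The three API additions described in the post can be sketched as small wrappers. This is a minimal sketch, assuming sentence-transformers >= 5.2.0 is installed; the function names and the CrossEncoder model name are illustrative assumptions, not taken from the release notes.

```python
def rerank_multi_process(query, passages):
    """Rerank passages with a CrossEncoder across 4 CPU workers (v5.2.0+)."""
    from sentence_transformers import CrossEncoder  # imported lazily so the sketch stands alone

    # Model name is an assumed example, not from the post.
    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")
    # New in v5.2.0: a device list on predict/rank enables multi-processing,
    # e.g. ["cuda:0", "cuda:1"] for multi-GPU or ["cpu"] * 4 for CPU workers.
    return model.rank(query, passages, device=["cpu"] * 4)


def german_nanobeir_evaluator():
    """Build a NanoBEIR evaluator on the German community translation (v5.2.0+)."""
    from sentence_transformers.evaluation import NanoBEIREvaluator

    # New in v5.2.0: dataset_id selects a community translation of NanoBEIR.
    return NanoBEIREvaluator(dataset_id="lightonai/NanoBEIR-de")


def mine_with_scores(dataset, embedder):
    """Mine hard negatives and also return similarity scores (v5.2.0+)."""
    from sentence_transformers.util import mine_hard_negatives

    # New in v5.2.0: output_scores=True returns the similarity scores
    # alongside the mined negatives, useful for distillation losses.
    return mine_hard_negatives(dataset, embedder, output_scores=True)
```

The imports sit inside the functions so the sketch can be read (and defined) without the library present; in real code they would live at module level.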
- Liked a model 5 months ago: dbmdz/bert-base-german-uncased
Organizations
None yet
MarcGrumpyOlejak's datasets (12, sorted by recently updated)
- MarcGrumpyOlejak/gooaq_mt_german — Viewer • Updated Nov 24, 2025 • 3.01M • 20
- MarcGrumpyOlejak/LCC_deu_news_1M_bt — Viewer • Updated Aug 6, 2025 • 17.4M • 381
- MarcGrumpyOlejak/gooaq_mt_german_0_hard_negatives — Viewer • Updated Jul 30, 2025 • 623k • 59
- MarcGrumpyOlejak/gooaq_mt_german_5_hard_negatives — Viewer • Updated Jul 30, 2025 • 2.08M • 223
- MarcGrumpyOlejak/mmarco-de-distilled-scored — Viewer • Updated Jun 13, 2025 • 315k • 12
- MarcGrumpyOlejak/germanrag-scored — Viewer • Updated Jun 10, 2025 • 3.36k • 672
- MarcGrumpyOlejak/german-oasst1-qa-format-scored — Viewer • Updated Jun 10, 2025 • 10.4k • 9
- MarcGrumpyOlejak/swim-ir-monolingual-de-scored — Viewer • Updated Jun 2, 2025 • 447k • 10
- MarcGrumpyOlejak/slimorca_dedup_german_experimental-scored — Viewer • Updated Jun 2, 2025 • 322k • 27
- MarcGrumpyOlejak/gpt-4-self-instruct-german-scored — Viewer • Updated Jun 2, 2025 • 10k • 51
- MarcGrumpyOlejak/ultradistil-intel-orca-dpo-de-scored — Viewer • Updated Jun 2, 2025 • 5.92k • 12
- MarcGrumpyOlejak/alpaca-gpt4_de-scored — Viewer • Updated Jun 2, 2025 • 50k • 7