I find it hilarious how all these comparative LLM benchmarks show models getting better and better, but when you open the benchmarks' source data, it's basically assessors saying the same thing over and over:
"Model A's output is complete bullshit, 2 stars out of 5."
"Model B's output is not even relevant, 1 star out of 5."