• impure9435@kbin.run
    5 points · 25 days ago

    The thing I find funniest about this post is that you call this Italian.

    • ChanchoManco@lemm.ee
      1 point · 24 days ago (edited)

      Typical AI behavior

      Edit: and then it will gaslight you if you say the answer is the same.

      • driving_crooner@lemmy.eco.br
        2 points · 24 days ago

        Fucking hate when they do that.

        You are repeating the same mistake.

        I’m sorry for repeating the same mistake, here’s a new solution with corrections. *proceeds to write exactly the thing I already told it was wrong*

  • stingpie@lemmy.world
    2 points · 24 days ago

    This might be happening because of the ‘elegant’ (incredibly hacky) way OpenAI encodes multiple languages into its models. Instead of using all character sets, they apply a modulo operator to each character so that all Unicode characters are represented by a small range of values. On the back end, it somehow detects which language is being spoken and uses that character set for the response. Seeing as the last line seems to be the same mathematical expression as what you asked, my guess is that your equation just happened to perfectly match some sentence that would make sense in the weird language.

        • crispy_kilt@feddit.de
          1 point · 22 days ago

          Seriously? Python for massive amounts of data? It’s a nice scripting language, but it’s excruciatingly slow.

          • stingpie@lemmy.world
            2 points · 22 days ago

            There are bindings in Java and C++, but Python is the industry standard for AI. The machine learning libraries are actually written in C++ and expose Python language bindings. Python doesn’t tend to slow things down, since machine learning is GPU-bound anyway. There are also library-specific programming languages that urge the user to write Pythonic code that can be compiled into C++.

    • NeatNit@discuss.tchncs.de
      0 points · 24 days ago

      I suppose it’s conceivable that there’s a bug in converting between different representations of Unicode, but I’m not buying any of this “detects which language is being spoken” nonsense or the use of character sets. It would just use Unicode.

      The modulo idea makes absolutely no sense, as LLMs use tokens, not characters, and there’s soooooo many tokens. It would make no sense to make those tokens ambiguous.

      • stingpie@lemmy.world
        1 point · 23 days ago

        I completely agree that it’s a stupid way of doing things, but it is how OpenAI reduced the vocab size of GPT-2 and GPT-3. As far as I know (I have only read the comments in the source code), the conversion is done as a preprocessing step. Here’s the code for GPT-2: https://github.com/openai/gpt-2/blob/master/src/encoder.py I did apparently make a mistake, though: the vocab reduction is done through a lookup table (LUT) rather than a simple mod.
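        For reference, the lookup table in the linked encoder.py is built by a helper named `bytes_to_unicode`. The sketch below is a re-implementation written for illustration, not copied from the repository, and assumes the same construction: each of the 256 possible byte values is mapped to a printable character, and text is encoded to UTF-8 before the table is applied. Under that assumption the table operates on bytes rather than on whole characters, so a character outside Latin-1 turns into several byte symbols. `to_byte_symbols` is a hypothetical helper name.

```python
def bytes_to_unicode() -> dict[int, str]:
    # Byte values that are already printable keep their own character.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Remaining bytes (control chars, space, soft hyphen, ...) are shifted
    # to code points above 255 so every byte gets a visible symbol.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


BYTE_ENCODER = bytes_to_unicode()


def to_byte_symbols(text: str) -> str:
    """Hypothetical helper: UTF-8 encode, then map each byte through the table."""
    return "".join(BYTE_ENCODER[b] for b in text.encode("utf-8"))


print(len(BYTE_ENCODER))          # 256
print(to_byte_symbols("a b"))     # aĠb  (space, byte 32, maps to U+0120)
print(len(to_byte_symbols("Ⰰ")))  # 3    (U+2C00 is three bytes in UTF-8)
```

        The BPE merges then run over these byte symbols, which is why the table only ever needs 256 entries no matter which script the input is written in.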

    • 82cb5abccd918e03@lemmygrad.ml
      1 point · 25 days ago

      I found it! It’s the Glagolitic script, used in the 9th century before Cyrillic took over:

      ⰀⰁⰂⰃⰄⰅⰆⰇⰈⰉⰊⰋⰌⰍⰎⰏⰐⰑⰒⰓⰔⰕⰖⰗⰘⰙⰚⰛⰜⰝⰞⰟⰠⰡⰢⰣⰤⰥⰦⰧⰨⰩⰪⰫⰬⰭⰮⰰⰱⰲⰳⰴⰵⰶⰷⰸⰹⰺⰻⰼⰽⰾⰿⱀⱁⱂⱃⱄⱅⱆⱇⱈⱉⱊⱋⱌⱍⱎⱏⱐⱑⱒⱓⱔⱕⱖⱗⱘⱙⱚⱛⱜⱝⱞ
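      One way to identify unfamiliar characters like these is to look up their Unicode names. Below is a minimal sketch using Python’s standard `unicodedata` module; `describe` and `is_glagolitic` are hypothetical helper names, and the range check assumes only the main Glagolitic block, U+2C00 to U+2C5F (the separate Glagolitic Supplement block is not covered).

```python
import unicodedata

# Main Glagolitic block only; the Glagolitic Supplement block is not included.
GLAGOLITIC_START, GLAGOLITIC_END = 0x2C00, 0x2C5F


def describe(text: str) -> list[tuple[str, str, str]]:
    """Return (character, code point, Unicode name) for each character in text."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
            for ch in text]


def is_glagolitic(ch: str) -> bool:
    """True if ch falls inside the main Glagolitic block."""
    return GLAGOLITIC_START <= ord(ch) <= GLAGOLITIC_END


for ch, cp, name in describe("ⰀⰁⰰ"):
    print(ch, cp, name, is_glagolitic(ch))
```

      Each printed name starts with "GLAGOLITIC", which is a quick way to tell this script apart from lookalikes such as Georgian.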
      
  • Redex@lemmy.world
    1 point · 25 days ago

    Damn, wild Glagolitic script found. I didn’t even realise it was in the Unicode standard.

  • QuazarOmega@lemy.lol
    1 point · 23 days ago

    You may not understand, but we do.
    This secret will remain jealously guarded by the Italic lineage. ◉‿◉

  • Vitaly@feddit.uk
    0 points · 25 days ago

    Kind of looks like the Georgian writing system, but I’m not sure.

    • Allero@lemmy.today
      1 point · 25 days ago

      No, this is the Glagolitic script, an alternative to Cyrillic. It was mostly used in old Slavic scriptures and was later replaced by Cyrillic and Latin.

      Most Slavs themselves don’t know how to read this.