• 0 Posts
  • 4 Comments
Joined 2 years ago
Cake day: June 10, 2023

  • Not a silly question at all!

    Compilers are already really smart and do a lot of heavy lifting, but they’re restricted to what you write and they err on the side of safety. They will do things like inline member functions when those are simple enough and not virtual, which reduces the number of indirections. They won’t reorder your classes or rewrite your code. In my experience compilers don’t do a good job of magically vectorizing code (using SIMD registers to their fullest extent), so maybe that could be improved by a super smart compiler.

    I would say it’s possible to have a linter let you know if you’re making structs which are cache unfriendly.
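
    As a rough idea of what such a linter could flag, here is a minimal sketch. The struct and member names are hypothetical, and the exact sizes depend on your ABI (on a typical 64-bit target the first struct is usually 24 bytes and the second 16).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical entity: small members on both sides of a large one force padding.
struct PaddedEntity {
    std::uint8_t team;      // 1 byte, then padding so `health` is aligned
    double       health;    // 8 bytes
    std::uint8_t is_alive;  // 1 byte, then tail padding
};

// Same members, largest first: the two small members share one padded slot.
struct ReorderedEntity {
    double       health;
    std::uint8_t team;
    std::uint8_t is_alive;
};

// Bytes of padding in a struct, given the summed size of its members.
// A linter could compute this and warn above some threshold.
constexpr std::size_t padding_bytes(std::size_t struct_size,
                                    std::size_t payload_size) {
    return struct_size - payload_size;
}
```

    Less padding means more entities per cache line, which is the kind of thing a tool can check mechanically.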

    There are also runtime tools like Intel’s VTune or perf on Linux. Those tools are very powerful, but the learning curve is steep. In my experience you need to know a lot about optimization to understand the results.

    Today’s generative AI can give you the broad strokes of refactoring some code to data-oriented design (DOD), and I’m sure in a few years it could do something similar for whole projects.

    Oftentimes safety comes at the cost of performance with compilers if you don’t give them enough details, such as restrict/noalias, packing, alignment, noexcept, assume/unreachable, and memory barriers. Rust is able to be performant and safe because it is a very verbose and restrictive language to write. C++ gives you all the tools, but they tend to be off by default. In my experience game devs like to stick to C++ despite the lack of safety guardrails because it’s faster to write efficient code, along with a “we’re not making medical equipment” sentiment.
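
    To make a few of those hints concrete, here is a small sketch. The function and struct names are made up for illustration, and `__restrict` is a compiler extension (GCC, Clang and MSVC accept it), not standard C++.

```cpp
#include <cassert>
#include <cstddef>

// `__restrict` promises the compiler that dst and src never overlap, so it
// may vectorize the loop without worrying about aliasing. Breaking that
// promise is undefined behavior: this is the "safety off" trade.
// `noexcept` tells callers this function never throws.
void scale_into(float* __restrict dst, const float* __restrict src,
                std::size_t count, float factor) noexcept {
    assert(dst != nullptr && src != nullptr);
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = src[i] * factor;
    }
}

// alignas(64) requests 64-byte alignment, a common cache line size
// (an assumption about the target CPU).
struct alignas(64) Batch {
    float values[16];
};
```

    None of these are on by default; you have to opt in and you are responsible for the promises being true.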



  • If you want your code to be performant you need to think about how you lay out your data for your CPU to manipulate it. This case might work well for one player, but what if you have 100, or 10 000?

    When you call player->move (assuming polymorphism), you’re doing three indirections: load the player data at the address held in player, load that player’s virtual function table, then load the address of the move function from that table.
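
    Here is a minimal sketch of that call pattern. The class names are hypothetical, and the vtable is how mainstream compilers implement virtual calls rather than something the C++ standard requires.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Entity {
    float x = 0.0f;
    float y = 0.0f;
    virtual ~Entity() = default;
    virtual void move(float dx, float dy) = 0;
};

struct Player : Entity {
    void move(float dx, float dy) override { x += dx; y += dy; }
};

// Every entity lives in its own heap allocation, so consecutive iterations
// can touch unrelated parts of memory.
void move_entities(const std::vector<std::unique_ptr<Entity>>& entities,
                   float dx, float dy) {
    for (const auto& entity : entities) {
        assert(entity != nullptr);
        // 1) follow the pointer to the object,
        // 2) load the object's vtable pointer,
        // 3) load the address of move() from the vtable, then call it.
        entity->move(dx, dy);
    }
}
```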

    Each indirection can be a cache miss. A cache miss means your CPU is going to be waiting for the memory controller to provide the data. While the CPU can hide some of this latency with pipelining and speculative execution, there are two problems: the memory layout limits how much it can do, and a fetch from main memory is still orders of magnitude slower than a CPU instruction.

    If you think that’s bad, it gets worse. You now have the address of the function and can move your player. Your CPU does a few floating point operations on 3D or 4D vectors using SIMD instructions. Great! But did you know that those SIMD registers can be 512 bits wide (with AVX-512)? A 4D vector of 32-bit floats is 128 bits, so that’s 25% occupancy, meaning you could in principle be running 4x as fast.
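
    The occupancy arithmetic, as a tiny sketch. The function name is made up, and it assumes 32-bit floats and one vector per register.

```cpp
// Fraction of a SIMD register filled by one vector of 32-bit floats.
// The register width defaults to 512 bits (AVX-512).
constexpr double simd_occupancy(int components, int register_bits = 512) {
    return static_cast<double>(components * 32) / register_bits;
}

// A 4D vector fills 128 of 512 bits; a 2D vector fills 64 of 512 bits.
static_assert(simd_occupancy(4) == 0.25, "4 floats fill a quarter of 512 bits");
static_assert(simd_occupancy(2) == 0.125, "2 floats fill an eighth of 512 bits");
```

    The inverse of the occupancy is the theoretical best-case speedup from filling the register, not a guarantee.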

    In games, especially for movement, you should ditch object-oriented design (array of structs) and use data-oriented design (struct of arrays).

    Don’t do

    // Array of structs: each player's fields are interleaved in memory.
    struct Player { float x; float y; float rotation; vec3 color; Sprite* head; };
    Player players[NUM];
    

    Instead do

    // Struct of arrays: each field is contiguous across all players.
    struct Players {
        vec2 positions[NUM];
        float rotations[NUM];
        vec4 colors[NUM];
        Sprite* heads[NUM];
    };
    

    You will have to write your code differently and rethink your abstractions, but your CPU will thank you for it: fewer indirections, operations that work on data in the same cache lines, loops your compiler can vectorize, and even better use of the instruction cache.
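
    For what the movement update could look like with that layout, here is a rough sketch. It goes one step further and splits positions into separate x and y arrays, and the velocity arrays are an assumed field that is not in the struct above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: one contiguous array per component.
struct Players {
    std::vector<float> xs, ys;    // positions
    std::vector<float> vxs, vys;  // velocities (assumed for this example)
};

// One tight loop over contiguous floats: no pointer chasing, no virtual
// calls, and a shape that compilers can usually auto-vectorize.
void move_players(Players& players, float dt) {
    const std::size_t count = players.xs.size();
    assert(players.ys.size() == count);
    assert(players.vxs.size() == count);
    assert(players.vys.size() == count);

    float* xs = players.xs.data();
    float* ys = players.ys.data();
    const float* vxs = players.vxs.data();
    const float* vys = players.vys.data();

    for (std::size_t i = 0; i < count; ++i) {
        xs[i] += vxs[i] * dt;
        ys[i] += vys[i] * dt;
    }
}
```

    Whether the loop actually gets vectorized depends on your compiler and flags, so check the generated assembly or the compiler’s optimization report.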

    Edit 1: formatting

    Edit 2: just saw you’re doing 2D instead of 3D. This means your occupancy is 12.5%. That operation could be up to 8 times as fast! Even faster without the indirection and with better cache locality.