Moving/copying containers of containers crashes the MM

Hey @danngreen , just had to implement an interesting workaround for the MM which I thought was worth bringing to your attention. It seems that containers of containers have some issues when being copied or moved. I have a function which used std::move to transfer a sample from one buffer to another, this works perfectly on the desktop, but on the MM it was crashing 100% of the time. It seems to be because the data structure holding the samples is a (disgusting) vector<vector<vector<float>>> and there’s some issue with moving nested structures like that. The function is pretty simple so it wasn’t too hard to see that this was the culprit, and once I implemented a manual copy it’s not crashed once, so I’m pretty sure this is the cause.

 void menuClearExceptCurrent()
    {
        if (!lockSampleSelect.load())
        {
            lockSampleSelect.store(true);
            auto writeBuff = activeSampleBuffer == &sampleBuffer ? &sampleBufferB : &sampleBuffer;
            writeBuff->clear();
            writeBuff->swap(*writeBuff);
#if defined(METAMODULE)
            // Manually copy the data from activeSampleBuffer to writeBuff
            const auto &source = activeSampleBuffer->at(playBufferIdx);
            writeBuff->emplace_back(); // Add a new empty vector to writeBuff
            auto &destination = writeBuff->back();

            destination.resize(source.size());
            for (size_t i = 0; i < source.size(); ++i)
            {
                destination[i].resize(source[i].size());
                for (size_t j = 0; j < source[i].size(); ++j)
                {
                    destination[i][j] = source[i][j];
                }
            }
#else
            writeBuff->push_back(std::move(activeSampleBuffer->at(playBufferIdx)));
#endif
            //  Get path to this sample
            auto path = samplePaths.at(playBufferIdx);
            // clear the paths
            samplePaths.clear();
            // insert the path
            samplePaths.push_back(path);
            activeSampleBuffer = activeSampleBuffer == &sampleBuffer ? &sampleBufferB : &sampleBuffer;
            // mark dirty for ui update
            dirty.store(true);
            lockSampleSelect.store(false);
        }
    }

It’s hard to know exactly what’s going on here, but your Metamodule logic is fundamentally different than your original logic. Copy vs move. Is there a chance you access the vector after it has been moved from? That could result in undefined behavior or some sort of exception depending on what you try to do. I think perhaps the Metamodule build may have exposed undefined behavior that was going unnoticed in the original build. But maybe not, I’m just guessing here.

One thing that stands out to me is at the end of this function activeSampleBuffer should have a size of 1. And playBufferIdx could equal anything. So if this function were called again after your double buffer is swapped, then activeSampleBuffer->at(playBufferIdx) could throw.

Also, what’s the purpose of writeBuff->swap(*writeBuff);?

You’re right, but the same thing happens if you copy.

I’m super careful about this, the atomics in that function are all to make sure none of that stuff is accessed while these operations are going on, so Im pretty positive this isn’t the case. But I’ll definitely look through it again just incase.

What that value is possible to equal is handled in the audio thread and it’s done in such a way that it’s not possible to exceed the size of the vector. At one point I was explicitly setting it here too but I found it wasn’t necessary because of the way it’s implemented.

There was some talk in another thread that shrink_to_fit doesn’t work on the mm and this is a way to effectively achieve the same thing.

After staring at this for a bit longer I think you almost certainly have a race condition at this line if the function isn’t being called in the audio process function.

writeBuff->push_back(std::move(activeSampleBuffer->at(playBufferIdx)));

The std::move modifies the activeSampleBuffer which I assume the audio processing function is reading from. And your atomic won’t protect the audio process from interrupting this function at any arbitrary time

And I assume your fix works because the copy doesn’t modify the vector it’s copying from

Ok but as I said the exact same crash happens if you do a std::copy which isn’t modifying the active sample buffer. So although you might be right that it’s not great to move the vector between buffers here I still don’t think it explains the issue I’m seeing completely.

std::copy won’t resize your vector. So if the destination size < the size of the source range std::copy isn’t the right tool

Does this crash?
writeBuff->push_back(activeSampleBuffer->at(playBufferIdx));

I’ll give that a try when I’m at the computer and let you know. Thanks very much for all the help. I’m not a cpp developer by trade (though I am a developer) and there’s a lot of nuance which is not really obvious unless you use it all the time, so your help here is super useful!

1 Like

You’re welcome! Are the samples relatively large? Because having duplicate samples in memory, even briefly, can definitely have negative consequences when we have so little ram to play with relative to a PC.
I think in the future to avoid the duplicates, you could probably maintain one container that holds all the necessary samples in memory. And then something like your double buffer system, but instead of actually holding the samples they hold a reference/pointer to the sample.
Then when neither buffer holds a reference to the sample, you know you can remove that sample from the sample container.

I agree. Keep in mind that the entire audio process might run multiple times while this line of code is being executed, and there’s no guarantees in what order the compiler will perform the move.

A few other things come to mind that could be useful:

  • On a desktop, there is tons of RAM. So a bug that causes a write to a bad address (e.g. out of bounds of the vector’s data or writing to moved-from data) is statistically much less likely to overwrite an important address in memory than on a small system with less RAM. Last week I found two such bugs in a popular plugin – it seemed to “work” on desktop but it quickly crashed on MetaModule. Except in reality, if I ran it long enough on my computer under the right conditions it would eventually crash, but I only had to run it for a short while on the MetaModule to see it fail. So you could try creating some hard-hitting tests for the desktop version and see if you can make it fail.

  • Concurrency on the MetaModule is completely different than on the desktop computer. I see the function name is “menuClearExceptCurrent()” so I’m assuming this is called from the GUI context (e.g. it’s a right-click menu)? So as mentioned above, that means the audio thread is possibly running at the same time as this function a different core, or the audio thread is interrupting this function at any moment, for as long as it needs to, and as many times as it needs to. If there’s an AsyncThread also accessing this data, it gets a lot more complicated… so it’s hard to say without knowing all the various contexts that are accessing the sample data.

  • Memory allocation on the MetaModule is strictly limited. Except for some specific times (constructors of Modules and ModuleWidgets, or during plugin startup/initialization), only the GUI thread and AsyncThreads are allowed to allocate memory, and a crash is inevitable if any other context attempts to allocate or de-allocate.

Also, allocating memory in multiple AsyncThreads (that is, multiple modules having an AsyncThread) simultaneously is something I haven’t tested well. I don’t have any evidence of there being a bug but the fact that I haven’t tested it (and haven’t figured out a way to write useful concurrency tests) means I can’t rule out some issue there. So if this code is running in an AsyncThread and the crash only happens when another module that allocates in an AsyncThread is running in the same patch, then we can pursue this.

shrink_to_fit works the same as it does on a desktop: the issue is that the name is a bit misleading. According the the c++ standard, this function might do nothing. See here:

void shrink_to_fit();
Requests the removal of unused capacity.
It is a non-binding request to reduce capacity() to size(). It depends on the implementation whether the request is fulfilled.

So, on any platform (desktop, embedded, windows, mac, linux…) you can never be sure if calling this will do anything. It’s no different on the MetaModule.

The swap idiom allocates a new temporary vector, copies the contents of the old vector into it, sets the old vector’s pointers to the new temporary vector, and then de-allocates the old vector’s data. So, it’s a much heavier operation.