Memory Management
Managing memory on edge devices is crucial for performance, efficiency, and crash prevention. The SDK does its best to manage memory automatically; however, you should also be cautious about allocations made outside of the SDK.
Tip:
If you're using the Swift SDK, GPU memory is not reported in Xcode's "Debug Navigator" memory usage graph. Instead, the SDK reports the available GPU memory in the console logs before each major operation.
Automatic Management
Here's what the SDK manages for you:
- Real-time monitoring: The SDK automatically reacts to memory pressure warnings from the OS and progressively unloads AI models as needed.
- Highlander mode: On memory-limited devices (iPhones and iPads), the SDK allows only one AI model to be loaded at a time. This adds some overhead when switching models, but prevents out-of-memory crashes and allows larger context windows.
- Automatic context window caching: The SDK automatically caches the context window to disk at the end of each messageThreadRun to ensure that subsequent runs are fast. We call this a chat cache.
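If you want to unload models even earlier than the SDK's automatic handling, you can listen for the OS memory warning yourself and call the unload API described below. This is a minimal sketch using UIKit's standard memory warning notification; the observer class and its unload policy are assumptions, not part of the SDK.

```swift
import UIKit

// A sketch of app-level memory handling that complements the SDK's automatic
// behavior. The SDK already reacts to OS memory pressure on its own; this
// simply unloads the default model as soon as the warning fires.
final class ModelMemoryObserver {
    private var observer: NSObjectProtocol?

    func start() {
        observer = NotificationCenter.default.addObserver(
            forName: UIApplication.didReceiveMemoryWarningNotification,
            object: nil,
            queue: .main
        ) { _ in
            // Unload the default model when the OS signals memory pressure.
            FreeToken.shared.unloadModel()
        }
    }

    deinit {
        if let observer { NotificationCenter.default.removeObserver(observer) }
    }
}
```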
Manual Management
The best way to ensure that your app does not run out of memory is to unload the model from memory when it is not in use.
```swift
// Unload the default model that was specified by the AI Agent in the console:
FreeToken.shared.unloadModel()

// Unload a specific model:
FreeToken.shared.unloadModel(modelCode: "gemma3n_e2b_it")
```
Removing Models from Disk
You or your users may decide that an AI model (or all AI models) should no longer be stored on the device. You can remove downloaded models from disk using:
```swift
// Remove a specific model from disk:
await FreeToken.shared.deleteAIModelCache(modelCode: "gemma3n_e2b_it")

// Remove all models from disk:
await FreeToken.shared.deleteAIModelCache()
```
Removing Chat Caches
A chat cache is a context window cache that is automatically saved to disk at the end of each messageThreadRun. This allows subsequent runs of the same thread to be much faster, as the context window does not need to be re-processed.
You can remove all chat caches by calling:
```swift
// Remove all chat caches from disk:
await FreeToken.shared.resetChatCache()
```
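Taken together, the calls above are enough to build a "clear all AI data" action, e.g. for a settings screen. This sketch only combines the calls documented in this section; the function name, ordering, and lack of error handling are assumptions.

```swift
// Hypothetical "Clear AI data" action combining the documented cleanup calls.
func clearAllAIData() async {
    // Unload anything currently held in memory first.
    FreeToken.shared.unloadModel()

    // Then remove all downloaded models and all chat caches from disk.
    await FreeToken.shared.deleteAIModelCache()
    await FreeToken.shared.resetChatCache()
}
```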