MultiCortex Linux Release Notes
What's new
MultiCortex 1.0.6 : 26 February 2025
OLLAMA 0.5.12
What's Changed
- The OpenAI-compatible API will now return tool_calls if the model called a tool (see the sketch after this list)
- Performance on certain Intel Xeon processors should now be restored
- Fixed permission denied issues after installing Ollama on Linux
- Fixed issue where additional CPU libraries were included in the arm64 Linux install
- The progress bar will no longer flicker when running ollama pull
- Fixed issue where running a model would fail on Linux if Ollama was installed in a path with UTF-8 characters
- X-Stainless-Timeout will now be accepted as a header in the OpenAI API endpoints
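A minimal sketch of the restored tool_calls behavior through the OpenAI-compatible endpoint. The model name, tool schema, and prompt are illustrative assumptions; the base URL and placeholder API key follow Ollama's documented OpenAI compatibility layer.

```python
# Sketch: read tool_calls back from Ollama's OpenAI-compatible API.
# The tool schema and model name ("llama3.2") are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="llama3.2",  # any tool-capable local model
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# As of this release, tool calls are surfaced here rather than dropped.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```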
New models
- Perplexity R1 1776: A version of the DeepSeek-R1 model that has been post-trained to remove its refusal to respond to some sensitive topics.
MultiCortex 1.0.5 : 12 February 2025
OLLAMA 0.5.11
What's Changed
- Fixed "The system cannot find the path specified" errors when running models in some cases on Windows
- Fixed issue where running ollama serve on Intel Macs would not use CPU acceleration
- Fixed issue on multi-GPU Windows and Linux machines where memory estimations would be incorrect
- Ollama will now use AVX-512 instructions where available for additional CPU acceleration
- NVIDIA and AMD GPUs can now be used with CPUs without AVX instructions
- Ollama will now use AVX2 instructions with NVIDIA and AMD GPUs
- New ollama-darwin.tgz package for macOS that replaces the previous ollama-darwin standalone binary.
- Fixed indexing error that would occur when downloading a model with ollama run or ollama pull
- Fixed cases where download progress would reverse
- Fixed an issue when using two FROM commands in a Modelfile
- Support importing Command R and Command R+ architectures from safetensors
- Fixed errors that would occur when running ollama create on Windows and when using absolute paths
- The /api/create API endpoint that powers ollama create has been changed to improve conversion time and to accept a JSON object (see the sketch after this list). Note: this change is not backwards compatible. If importing models, make sure you're using version 0.5.5 or later for both Ollama and the ollama CLI when running ollama create. If using ollama.create in the Python or JavaScript libraries, make sure to update to the latest version.
- Fixed runtime error that would occur when filling the model's context window
- Fixed crash that would occur when quotes were used in /save
- Fixed errors that would occur when sending x-stainless headers from OpenAI clients
- Fixed issue where providing null to format would result in an error
- Fixed runtime errors on older Intel Macs
- Fixed issue where setting the format field to "" would cause an error
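A minimal sketch of the reworked /api/create endpoint, which now takes a JSON object. The model names and system prompt are illustrative assumptions; the endpoint streams one JSON status object per line.

```python
# Sketch: create a model via the JSON-based /api/create endpoint.
# "mario" and "llama3.2" are hypothetical model names.
import json

import requests

payload = {
    "model": "mario",    # name for the new model
    "from": "llama3.2",  # existing base model (assumed already pulled)
    "system": "You are Mario from Super Mario Bros.",
}

with requests.post(
    "http://localhost:11434/api/create", json=payload, stream=True
) as resp:
    for line in resp.iter_lines():
        if line:  # each non-empty line is a JSON status object
            print(json.loads(line).get("status"))
```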
New models
- DeepScaleR: A fine-tuned version of Deepseek-R1-Distilled-Qwen-1.5B that surpasses the performance of OpenAI’s o1-preview with just 1.5B parameters on popular math evaluations.
- OpenThinker: A fully open-source family of reasoning models built using a dataset derived by distilling DeepSeek-R1.
- Phi-4: Phi 4 is a 14B parameter, state-of-the-art open model from Microsoft.
- Command R7B: the smallest model in Cohere's R series delivers top-tier speed, efficiency, and quality to build powerful AI applications on commodity GPUs and edge devices.
- DeepSeek-V3: A strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.
- OLMo 2: a new family of 7B and 13B models trained on up to 5T tokens. These models are on par with or better than equivalently sized fully open models, and competitive with open-weight models such as Llama 3.1 on English academic benchmarks.
- Dolphin 3: the next generation of the Dolphin series of instruct-tuned models designed to be the ultimate general purpose local model, enabling coding, math, agentic, function calling, and general use cases.
- SmallThinker: A new small reasoning model fine-tuned from the Qwen 2.5 3B Instruct model.
- Granite 3.1 Dense: 2B and 8B text-only dense LLMs trained on over 12 trillion tokens of data, demonstrating significant improvements over their predecessors in performance and speed in IBM’s initial testing.
- Granite 3.1 MoE: 1B and 3B long-context mixture of experts (MoE) Granite models from IBM designed for low latency usage.
- Falcon3: A family of efficient AI models under 10B parameters performant in science, math, and coding through innovative training techniques.
OpenVINO 2025.0.0
More GenAI coverage and framework integrations to minimize code changes
- New models supported: Qwen 2.5, DeepSeek-R1-Distill-Llama-8B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-1.5B, FLUX.1 Schnell, and FLUX.1 Dev
- Whisper Model: Improved performance on CPUs, built-in GPUs, and discrete GPUs with GenAI API.
- Preview: Introducing NPU support for torch.compile, giving developers the ability to use the OpenVINO backend to run the PyTorch API on NPUs (see the sketch after this list). 300+ deep learning models enabled from the TorchVision, Timm, and TorchBench repositories.
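A minimal sketch of the preview torch.compile path on NPUs. The ResNet-18 model is an illustrative choice; importing openvino.torch to register the backend and passing the device through options follows OpenVINO's documented torch.compile integration.

```python
# Sketch: run a TorchVision model on an NPU through torch.compile.
import openvino.torch  # noqa: F401  (registers the "openvino" backend)
import torch
import torchvision.models as models

model = models.resnet18(weights=None).eval()

# Route compilation through OpenVINO and target the NPU device.
compiled = torch.compile(model, backend="openvino", options={"device": "NPU"})

with torch.no_grad():
    out = compiled(torch.randn(1, 3, 224, 224))
print(out.shape)
```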
Broader Large Language Model (LLM) support and more model compression techniques
- Preview: Addition of Prompt Lookup to the GenAI API improves 2nd token latency for LLMs by effectively utilizing predefined prompts that match the intended use case.
- Preview: The GenAI API now offers image-to-image inpainting functionality (see the sketch after this list). This feature enables models to generate realistic content by inpainting specified modifications and seamlessly integrating them with the original image.
- Asymmetric KV Cache compression is now enabled for INT8 on CPUs, resulting in lower memory consumption and improved 2nd token latency, especially when dealing with long prompts that require significant memory. The option should be explicitly specified by the user.
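A minimal sketch of the new inpainting functionality, modeled on the GenAI samples. The exported model directory, image files, and read_image helper are assumptions.

```python
# Sketch: repaint a masked region of an image with the GenAI API.
# "./flux1-inpainting" is a hypothetical exported model directory.
import numpy as np
import openvino as ov
import openvino_genai
from PIL import Image


def read_image(path: str) -> ov.Tensor:
    """Load an RGB image as the [1, H, W, 3] uint8 tensor the pipeline expects."""
    return ov.Tensor(np.asarray(Image.open(path).convert("RGB"))[None])


pipe = openvino_genai.InpaintingPipeline("./flux1-inpainting", "CPU")

result = pipe.generate(
    "a red brick wall",      # prompt describing the repainted region
    read_image("base.png"),  # original image
    read_image("mask.png"),  # white pixels mark the area to repaint
)
Image.fromarray(result.data[0]).save("result.png")
```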
More portability and performance to run AI at the edge, in the cloud, or locally
- Support for the latest Intel® Core™ Ultra 200H series processors (formerly codenamed Arrow Lake-H)
- Integration of the OpenVINO™ backend with the Triton Inference Server allows developers to utilize the Triton server for enhanced model serving performance when deploying on Intel CPUs.
- Preview: A new OpenVINO™ backend integration allows developers to leverage OpenVINO performance optimizations directly within Keras 3 workflows for faster AI inference on CPUs, built-in GPUs, discrete GPUs, and NPUs (see the sketch after this list). This feature is available with the latest Keras 3.8 release.
- The OpenVINO Model Server now supports native Windows Server deployments, allowing developers to leverage better performance by eliminating container overhead and simplifying GPU deployment.
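A minimal sketch of selecting the OpenVINO backend in Keras 3.8. The toy model is illustrative; the backend must be chosen before keras is imported, and it accelerates inference calls such as predict().

```python
# Sketch: run Keras 3 inference through the OpenVINO backend.
import os

os.environ["KERAS_BACKEND"] = "openvino"  # must be set before importing keras

import keras
import numpy as np

model = keras.Sequential([
    keras.layers.Input(shape=(4,)),
    keras.layers.Dense(2, activation="softmax"),
])

# predict() now runs through OpenVINO's optimizations.
print(model.predict(np.random.rand(1, 4).astype("float32")))
```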
Support Change and Deprecation Notices
- Now deprecated:
- Legacy prefixes l_, w_, and m_ have been removed from OpenVINO archive names.
- The runtime namespace for the Python API has been marked as deprecated and is designated for removal in 2026.0. The new namespace structure has been delivered, and migration is possible immediately. Details will be communicated through warnings and via documentation.
MultiCortex 1.0.4 : 20 December 2024
OLLAMA 0.5.1
What's Changed
- Fixed issue where Ollama's API would generate JSON output when specifying "format": null (see the sketch at the end of this section)
- Fixed issue where passing --format json to ollama run would cause an error
- Fixed error importing model vocabulary files
- Experimental: new flag to set KV cache quantization to 4-bit (q4_0), 8-bit (q8_0), or 16-bit (f16). This reduces VRAM requirements for longer context windows.
New models
- Llama 3.3: a new state-of-the-art 70B model. Llama 3.3 70B offers similar performance to the Llama 3.1 405B model.
- Snowflake Arctic Embed 2: Snowflake's frontier embedding model. Arctic Embed 2.0 adds multilingual support without sacrificing English performance or scalability.
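A minimal sketch of the format fix: passing "format": null to /api/generate now returns plain text instead of erroneously constrained JSON. The model name is an assumption; the endpoint and fields follow Ollama's documented generate API.

```python
# Sketch: contrast "format": null with "format": "json" on /api/generate.
import requests


def generate(fmt):
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.2",  # any locally pulled model (assumption)
            "prompt": "List three colors.",
            "format": fmt,        # None -> plain text, "json" -> JSON mode
            "stream": False,
        },
    )
    return resp.json()["response"]


print(generate(None))    # plain-text answer (previously came back as JSON)
print(generate("json"))  # JSON-constrained answer
```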
OpenVINO 2024.6.0
Summary of major features and improvements
- The OpenVINO 2024.6 release includes updates for enhanced stability and improved LLM performance.
- Introduced support for Intel Arc B-Series Graphics (formerly known as Battlemage).
- Implemented optimizations to improve the inference time and LLM performance on NPUs.
- Improved LLM performance with GenAI API optimizations and bug fixes.
Support Change and Deprecation Notices
- Using deprecated features and components is not advised. They are available to enable a smooth transition to new solutions and will be discontinued in the future. To keep using discontinued features, you will have to revert to the last LTS OpenVINO version supporting them. For more details, refer to the OpenVINO Legacy Features and Components page.
- Discontinued in 2024.0:
- Runtime components:
- Intel Gaussian & Neural Accelerator (Intel GNA). Consider using the Neural Processing Unit (NPU) for low-powered systems like Intel Core™ Ultra or 14th generation and beyond.
- OpenVINO C++/C/Python 1.0 APIs (see 2023.3 API transition guide for reference).
- All ONNX Frontend legacy API (known as ONNX_IMPORTER_API)
- 'PerformanceMode.UNDEFINED' property as part of the OpenVINO Python API
- Tools:
- Deployment Manager. See installation and deployment guides for current distribution options.
- Accuracy Checker.
- Post-Training Optimization Tool (POT). Neural Network Compression Framework (NNCF) should be used instead.
- A Git patch for NNCF integration with huggingface/transformers. The recommended approach is to use huggingface/optimum-intel for applying NNCF optimization on top of models from Hugging Face.
- Support for Apache MXNet, Caffe, and Kaldi model formats. Conversion to ONNX may be used as a solution.
- Deprecated and to be removed in the future:
- The macOS x86_64 debug bins will no longer be provided with the OpenVINO toolkit, starting with OpenVINO 2024.5.
- Python 3.8 is no longer supported, starting with OpenVINO 2024.5.
- As MXNet doesn’t support Python versions higher than 3.8, according to the MXNet PyPI project, it is no longer supported by OpenVINO, either.
- Discrete Keem Bay is no longer supported, starting with OpenVINO 2024.5.
- Support for discrete devices (formerly codenamed Raptor Lake) is no longer available for NPU.