MultiCortex Linux Release Notes
What's new
MultiCortex 1.0.6 : 26 February 2025
OLLAMA 0.5.12
What's Changed
- The OpenAI-compatible API will now return tool_calls if the model called a tool (see the sketch after this list)
- Performance on certain Intel Xeon processors should now be restored
- Fixed permission denied issues after installing Ollama on Linux
- Fixed issue where additional CPU libraries were included in the arm64 Linux install
- The progress bar will no longer flicker when running ollama pull
- Fixed issue where running a model would fail on Linux if Ollama was installed in a path with UTF-8 characters
- X-Stainless-Timeout will now be accepted as a header in the OpenAI API endpoints
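A minimal sketch of the restored tool_calls behavior through the OpenAI-compatible endpoint. The model name, tool schema, and prompt are illustrative assumptions; the base URL and placeholder API key follow Ollama's documented OpenAI compatibility layer.

```python
# Sketch: read tool_calls back from Ollama's OpenAI-compatible API.
# The tool schema and model name ("llama3.2") are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="llama3.2",  # any tool-capable local model
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# As of this release, tool calls are surfaced here rather than dropped.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```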
New models
- Perplexity R1 1776: A version of the DeepSeek-R1 model that has been post-trained to remove its refusal to respond to some sensitive topics.
MultiCortex 1.0.5 : 12 February 2025
OLLAMA 0.5.11
What's Changed
- Fixed "The system cannot find the path specified" errors when running models in some cases on Windows
- Fixed issue where running ollama serve on Intel Macs would not use CPU acceleration
- Fixed issue on multi-GPU Windows and Linux machines where memory estimations would be incorrect
- Ollama will now use AVX-512 instructions where available for additional CPU acceleration
- NVIDIA and AMD GPUs can now be used with CPUs without AVX instructions
- Ollama will now use AVX2 instructions with NVIDIA and AMD GPUs
- New ollama-darwin.tgz package for macOS that replaces the previous ollama-darwin standalone binary.
- Fixed indexing error that would occur when downloading a model with ollama run or ollama pull
- Fixed cases where download progress would reverse
- Fixed an issue when using two FROM commands in a Modelfile
- Support importing Command R and Command R+ architectures from safetensors
- Fixed errors that would occur when running ollama create on Windows and when using absolute paths
- The /api/create API endpoint that powers ollama create has been changed to improve conversion time and to accept a JSON object (see the sketch after this list). Note: this change is not backwards compatible. If importing models, make sure you're using version 0.5.5 or later for both Ollama and the ollama CLI when running ollama create. If using ollama.create in the Python or JavaScript libraries, make sure to update to the latest version.
- Fixed runtime error that would occur when filling the model's context window
- Fixed crash that would occur when quotes were used in /save
- Fixed errors that would occur when sending x-stainless headers from OpenAI clients
- Fixed issue where providing null to format would result in an error
- Fixed runtime errors on older Intel Macs
- Fixed issue where setting the format field to "" would cause an error
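A minimal sketch of the reworked /api/create endpoint, which now takes a JSON object. The model names and system prompt are illustrative assumptions; the endpoint streams one JSON status object per line.

```python
# Sketch: create a model via the JSON-based /api/create endpoint.
# "mario" and "llama3.2" are hypothetical model names.
import json

import requests

payload = {
    "model": "mario",    # name for the new model
    "from": "llama3.2",  # existing base model (assumed already pulled)
    "system": "You are Mario from Super Mario Bros.",
}

with requests.post(
    "http://localhost:11434/api/create", json=payload, stream=True
) as resp:
    for line in resp.iter_lines():
        if line:  # each non-empty line is a JSON status object
            print(json.loads(line).get("status"))
```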
New models
- DeepScaleR: A fine-tuned version of Deepseek-R1-Distilled-Qwen-1.5B that surpasses the performance of OpenAI’s o1-preview with just 1.5B parameters on popular math evaluations.
- OpenThinker: A fully open-source family of reasoning models built using a dataset derived by distilling DeepSeek-R1.
- Phi-4: Phi 4 is a 14B parameter, state-of-the-art open model from Microsoft.
- Command R7B: the smallest model in Cohere's R series delivers top-tier speed, efficiency, and quality to build powerful AI applications on commodity GPUs and edge devices.
- DeepSeek-V3: A strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.
- OLMo 2: a new family of 7B and 13B models trained on up to 5T tokens. These models are on par with or better than equivalently sized fully open models, and competitive with open-weight models such as Llama 3.1 on English academic benchmarks.
- Dolphin 3: the next generation of the Dolphin series of instruct-tuned models designed to be the ultimate general purpose local model, enabling coding, math, agentic, function calling, and general use cases.
- SmallThinker: A new small reasoning model fine-tuned from the Qwen 2.5 3B Instruct model.
- Granite 3.1 Dense: 2B and 8B text-only dense LLMs trained on over 12 trillion tokens of data, demonstrating significant improvements over their predecessors in performance and speed in IBM’s initial testing.
- Granite 3.1 MoE: 1B and 3B long-context mixture of experts (MoE) Granite models from IBM designed for low latency usage.
- Falcon3: A family of efficient AI models under 10B parameters performant in science, math, and coding through innovative training techniques.
OpenVINO 2025.0.0
More GenAI coverage and framework integrations to minimize code changes
- New models supported: Qwen 2.5, DeepSeek-R1-Distill-Llama-8B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-1.5B, FLUX.1 Schnell, and FLUX.1 Dev
- Whisper Model: Improved performance on CPUs, built-in GPUs, and discrete GPUs with GenAI API.
- Preview: Introducing NPU support for torch.compile, giving developers the ability to use the OpenVINO backend to run the PyTorch API on NPUs (see the sketch after this list). 300+ deep learning models enabled from the TorchVision, Timm, and TorchBench repositories.
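A minimal sketch of the preview torch.compile path on NPUs. The ResNet-18 model is an illustrative choice; importing openvino.torch to register the backend and passing the device through options follows OpenVINO's documented torch.compile integration.

```python
# Sketch: run a TorchVision model on an NPU through torch.compile.
import openvino.torch  # noqa: F401  (registers the "openvino" backend)
import torch
import torchvision.models as models

model = models.resnet18(weights=None).eval()

# Route compilation through OpenVINO and target the NPU device.
compiled = torch.compile(model, backend="openvino", options={"device": "NPU"})

with torch.no_grad():
    out = compiled(torch.randn(1, 3, 224, 224))
print(out.shape)
```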
Broader Large Language Model (LLM) support and more model compression techniques
- Preview: Addition of Prompt Lookup to the GenAI API improves 2nd token latency for LLMs by effectively utilizing predefined prompts that match the intended use case.
- Preview: The GenAI API now offers image-to-image inpainting functionality (see the sketch after this list). This feature enables models to generate realistic content by inpainting specified modifications and seamlessly integrating them with the original image.
- Asymmetric KV Cache compression is now enabled for INT8 on CPUs, resulting in lower memory consumption and improved 2nd token latency, especially when dealing with long prompts that require significant memory. The option should be explicitly specified by the user.
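A minimal sketch of the new inpainting functionality, modeled on the GenAI samples. The exported model directory, image files, and read_image helper are assumptions.

```python
# Sketch: repaint a masked region of an image with the GenAI API.
# "./flux1-inpainting" is a hypothetical exported model directory.
import numpy as np
import openvino as ov
import openvino_genai
from PIL import Image


def read_image(path: str) -> ov.Tensor:
    """Load an RGB image as the [1, H, W, 3] uint8 tensor the pipeline expects."""
    return ov.Tensor(np.asarray(Image.open(path).convert("RGB"))[None])


pipe = openvino_genai.InpaintingPipeline("./flux1-inpainting", "CPU")

result = pipe.generate(
    "a red brick wall",      # prompt describing the repainted region
    read_image("base.png"),  # original image
    read_image("mask.png"),  # white pixels mark the area to repaint
)
Image.fromarray(result.data[0]).save("result.png")
```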
More portability and performance to run AI at the edge, in the cloud, or locally
- Support for the latest Intel® Core™ Ultra 200H series processors (formerly codenamed Arrow Lake-H)
- Integration of the OpenVINO™ backend with the Triton Inference Server allows developers to utilize the Triton server for enhanced model serving performance when deploying on Intel CPUs.
- Preview: A new OpenVINO™ backend integration allows developers to leverage OpenVINO performance optimizations directly within Keras 3 workflows for faster AI inference on CPUs, built-in GPUs, discrete GPUs, and NPUs (see the sketch after this list). This feature is available with the latest Keras 3.8 release.
- The OpenVINO Model Server now supports native Windows Server deployments, allowing developers to leverage better performance by eliminating container overhead and simplifying GPU deployment.
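A minimal sketch of selecting the OpenVINO backend in Keras 3.8. The toy model is illustrative; the backend must be chosen before keras is imported, and it accelerates inference calls such as predict().

```python
# Sketch: run Keras 3 inference through the OpenVINO backend.
import os

os.environ["KERAS_BACKEND"] = "openvino"  # must be set before importing keras

import keras
import numpy as np

model = keras.Sequential([
    keras.layers.Input(shape=(4,)),
    keras.layers.Dense(2, activation="softmax"),
])

# predict() now runs through OpenVINO's optimizations.
print(model.predict(np.random.rand(1, 4).astype("float32")))
```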
Support Change and Deprecation Notices
- Now deprecated:
- Legacy prefixes l_, w_, and m_ have been removed from OpenVINO archive names.
- The runtime namespace for the Python API has been marked as deprecated and is designated for removal in 2026.0. The new namespace structure has been delivered, and migration is possible immediately. Details will be communicated through warnings and via documentation.
MultiCortex 1.0.4 : 20 December 2024
OLLAMA 0.5.1
What's Changed
- Fixed issue where Ollama's API would generate JSON output when specifying "format": null (see the sketch at the end of this section)
- Fixed issue where passing --format json to ollama run would cause an error
- Fixed error importing model vocabulary files
- Experimental: new flag to set KV cache quantization to 4-bit (q4_0), 8-bit (q8_0), or 16-bit (f16). This reduces VRAM requirements for longer context windows.
New models
- Llama 3.3: a new state-of-the-art 70B model. Llama 3.3 70B offers similar performance to the Llama 3.1 405B model.
- Snowflake Arctic Embed 2: Snowflake's frontier embedding model. Arctic Embed 2.0 adds multilingual support without sacrificing English performance or scalability.
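A minimal sketch of the format fix: passing "format": null to /api/generate now returns plain text instead of erroneously constrained JSON. The model name is an assumption; the endpoint and fields follow Ollama's documented generate API.

```python
# Sketch: contrast "format": null with "format": "json" on /api/generate.
import requests


def generate(fmt):
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.2",  # any locally pulled model (assumption)
            "prompt": "List three colors.",
            "format": fmt,        # None -> plain text, "json" -> JSON mode
            "stream": False,
        },
    )
    return resp.json()["response"]


print(generate(None))    # plain-text answer (previously came back as JSON)
print(generate("json"))  # JSON-constrained answer
```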
OpenVINO 2024.6.0
Summary of major features and improvements
- The OpenVINO 2024.6 release includes updates for enhanced stability and improved LLM performance.
- Introduced support for Intel Arc B-Series Graphics (formerly known as Battlemage).
- Implemented optimizations to improve the inference time and LLM performance on NPUs.
- Improved LLM performance with GenAI API optimizations and bug fixes.
Support Change and Deprecation Notices
- Using deprecated features and components is not advised. They are available to enable a smooth transition to new solutions and will be discontinued in the future. To keep using discontinued features, you will have to revert to the last LTS OpenVINO version supporting them. For more details, refer to the OpenVINO Legacy Features and Components page.
- Discontinued in 2024.0:
- Runtime components:
- Intel Gaussian & Neural Accelerator (Intel GNA). Consider using the Neural Processing Unit (NPU) for low-powered systems like Intel Core™ Ultra or 14th generation and beyond.
- OpenVINO C++/C/Python 1.0 APIs (see 2023.3 API transition guide for reference).
- All ONNX Frontend legacy API (known as ONNX_IMPORTER_API)
- 'PerformanceMode.UNDEFINED' property as part of the OpenVINO Python API
- Tools:
- Deployment Manager. See installation and deployment guides for current distribution options.
- Accuracy Checker.
- Post-Training Optimization Tool (POT). Neural Network Compression Framework (NNCF) should be used instead.
- A Git patch for NNCF integration with huggingface/transformers. The recommended approach is to use huggingface/optimum-intel for applying NNCF optimization on top of models from Hugging Face.
- Support for Apache MXNet, Caffe, and Kaldi model formats. Conversion to ONNX may be used as a solution.
- Deprecated and to be removed in the future:
- The macOS x86_64 debug bins will no longer be provided with the OpenVINO toolkit, starting with OpenVINO 2024.5.
- Python 3.8 is no longer supported, starting with OpenVINO 2024.5.
- As MXNet doesn’t support Python versions higher than 3.8, according to the MXNet PyPI project, it is no longer supported by OpenVINO, either.
- Discrete Keem Bay is no longer supported, starting with OpenVINO 2024.5.
- Support for discrete devices (formerly codenamed Raptor Lake) is no longer available for NPU.