Apple Unveils LazyLLM for Efficient AI and Open-Source Model DCLM-Baseline 7B

Apple has launched LazyLLM, a groundbreaking technique for improving LLM inference efficiency by selectively computing key-value (KV) pairs only for the tokens most important to the next prediction during the pre-filling and decoding stages. Detailed in a recent research paper by Qichen Fu, Thomas Merth, Sachin Mehta, and Mahyar Najibi from Apple, and Mohammad Rastegari from Meta, LazyLLM accelerates response generation in transformer-based models, particularly in long-context scenarios, without sacrificing accuracy. By deferring the computation of less relevant tokens, and reviving them in later steps only if they become relevant, the method reduces the pre-filling stage's computational load and shortens the time to the first generated token.
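
To illustrate the core idea at a glance, the sketch below shows deferred KV computation in a toy setting. It is not Apple's implementation: the toy layer sizes, the use of the last query position's attention as a relevance signal, and the keep_ratio parameter are illustrative assumptions, and in the paper the relevance scores are taken from the preceding layer's attention maps rather than recomputed.

```python
# Minimal, self-contained sketch of deferred KV computation (LazyLLM-style).
# NOT Apple's code: shapes, the relevance heuristic, and keep_ratio are assumptions.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

d_model, n_tokens, keep_ratio = 64, 512, 0.25  # toy sizes, assumed

# Toy projection matrices standing in for one attention layer's weights.
W_q = torch.randn(d_model, d_model)
W_k = torch.randn(d_model, d_model)
W_v = torch.randn(d_model, d_model)

# Prompt hidden states entering the layer during pre-filling.
hidden = torch.randn(n_tokens, d_model)

# 1. Score token relevance cheaply: attention weights of the last query
#    position over the prompt (a stand-in for the prior layer's attention map).
q_last = hidden[-1:] @ W_q                                   # (1, d_model)
k_all = hidden @ W_k                                         # (n_tokens, d_model)
scores = F.softmax(q_last @ k_all.T / d_model ** 0.5, dim=-1).squeeze(0)

# 2. Keep only the most relevant tokens for this layer's heavy computation.
n_keep = max(1, int(n_tokens * keep_ratio))
keep_idx = scores.topk(n_keep).indices.sort().values         # preserve token order

# 3. Compute values (and downstream work) only for the kept tokens; the rest
#    are deferred and can be revived in later decoding steps if needed.
v_kept = hidden[keep_idx] @ W_v
print(f"computed values for {n_keep}/{n_tokens} prompt tokens in this layer")
```

The key point of the design is that the expensive per-token work during pre-filling is performed only for the subset of prompt tokens the model currently attends to, while the remaining tokens are deferred rather than discarded.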

Additionally, Apple introduced DCLM-Baseline 7B, an open-source LLM with 7 billion parameters trained on 2.5 trillion tokens, primarily in English. Available on Hugging Face and compatible with the Transformers library, the model is released under the Apple Sample Code License. Trained with PyTorch and OpenLM, it rivals the performance of models trained on closed datasets, such as Mistral. The release follows the introduction of Apple Intelligence at WWDC 2024, aimed at enhancing Siri with generative AI capabilities.
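
For readers who want to try the model, a minimal loading sketch with the Transformers library is shown below. The repository identifier "apple/DCLM-7B" and the generation settings are assumptions; consult the model card on Hugging Face for the exact identifier, any extra dependencies (such as OpenLM), and the license terms.

```python
# Hedged usage sketch: loading DCLM-Baseline 7B via Hugging Face Transformers.
# The repository id below is an assumption; check the model card for specifics.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "apple/DCLM-7B"  # assumed Hugging Face repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Machine learning is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```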