Core42, a G42 company, announced the launch of Jais 30B, the newest version of its open-source Arabic Large Language Model (LLM). The 30-billion-parameter model follows the 13-billion-parameter model released in August 2023 and serves the more than 400 million Arabic speakers worldwide.
Jais was born from a collaboration between Inception (now merged into Core42), Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), the world’s first graduate research university dedicated to AI, and Cerebras Systems. The model was trained on Condor Galaxy-1 (CG-1), an AI supercomputer with 4 exaFLOPS of training compute, 54 million cores, and 64 nodes, built by G42 in partnership with Cerebras Systems. Jais 13B went from concept to a fine-tuned, leading open-source model in less than four months. Notably, the production training run for Jais 13B was completed in 21 days on CG-1.
The new Jais 30B model was trained on a substantially larger dataset than its predecessor, comprising 126 billion Arabic tokens, 251 billion English tokens, and 50 billion code tokens, and shows improved performance across all key indicators. Its answers are 160% longer and more detailed in Arabic and 233% longer and more detailed in English, reflecting significant improvements in language generation. The model also performs better in summarization (53% in Arabic and 85% in English) and formatting (130% in Arabic and 134% in English). Jais 30B's performance is now on par with monolingual English models, and it outperforms most open-source models in Foundation Model evaluations.
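To put the dataset figures in proportion, the short sketch below adds up the three token counts quoted above and works out each source's share of that mix. The variable names are illustrative, and the calculation assumes the three quoted figures are the whole mix being compared; it is not taken from the Jais training code.

```python
# Token counts (in billions) as quoted in the announcement.
tokens_billion = {"Arabic": 126, "English": 251, "code": 50}

# Sum of the three quoted figures.
total_billion = sum(tokens_billion.values())

# Share of each source, as a fraction of the quoted total.
shares = {name: count / total_billion for name, count in tokens_billion.items()}

print(f"Total: {total_billion} billion tokens")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

On these figures the mix totals 427 billion tokens, with English making up a little under three fifths and Arabic a little under three tenths.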
Jais 30B’s enhancements have been tested and validated using heuristic, cross-model comparison, and human evaluations, which show that the responses of Jais 30B iterations outperform those of Jais 13B 96% of the time in Arabic and 97% in English. The development team has also strengthened its processes and policies to guard against bias and the production of hateful or harmful content by the model, a process made easier by its open-source release.
Jais’s capabilities in the Arabic language domain have already shown promise in applications across various sectors including telecommunications, energy, education, and healthcare as well as innovative solutions for the marketing communications industry.
Dr. Andrew Jackson, EVP, Chief AI Officer, Core42, said, "The launch of Jais 30B marks another significant milestone for Core42 and represents a giant leap forward for the Arabic-speaking world in harnessing the potential of generative AI. This release underscores the powerful synergy between Core42's technological leadership, our extensive partner ecosystem, and our shared dedication to pushing the boundaries of what's possible in the field of AI. I eagerly anticipate close collaboration with our customers and partners to explore new applications and continually enhance the model's capabilities, as we intensify our efforts to create top-quality LLMs for various other languages."
Andrew Feldman, CEO and co-founder, Cerebras Systems, said, “Less than eight weeks after we introduced Jais 13B to the global Arabic-speaking community, the Core42 and Cerebras teams have delivered a new state-of-the-art LLM that is more than double in size. Jais 30B leverages the incredible, massive compute of Condor Galaxy 1 to set another record in bilingual performance and impressively fast training time.”
To learn more about how Jais 30B was trained and how it benchmarks against other models, read the model’s blog post on G42’s website: https://www.g42.ai/resources/publications/Jais-30B