Pakistan to Develop Urdu LLM for Generative AI
The National University of Sciences and Technology (NUST), the National Information Technology Board (NITB), and telecom network operator Jazz have signed a Memorandum of Understanding (MOU) to develop Pakistan’s first indigenous Large Language Model (LLM), with a focus on Urdu and with datasets for the Pashto and Punjabi languages. The model is aimed at empowering individuals, businesses, and organizations with advanced AI tools in their native languages. The envisioned LLM is expected to drive innovation in Generative AI applications, boosting productivity and accessibility in critical sectors like healthcare, education, and agriculture.
[Figure: GPT-4 Accuracy Scores. Source: The Economist]
Generative AI tools such as ChatGPT are powered by large language models, or LLMs. These models need to be trained on vast amounts of data in a specific language to be useful in that language. Unfortunately, Urdu accounts for less than 0.1% of the content on the Internet. This will present a challenge for the developers of Urdu LLMs.
[Figure: Online Content of Various Languages. Source: W3Techs]
The lack of Urdu content available for training ChatGPT reduces the accuracy of its results for Urdu-language users. For example, GPT-4's accuracy score in question-answer tests in Urdu is just over 70%, compared with an 85% accuracy score in English, according to data from OpenAI. Other South Asian languages, including Hindi, Bengali, Punjabi, Marathi and Telugu, suffer from the same problem.
It's not just a South Asian problem. These challenges exist across the developing world. Non-European languages are generally poorly represented online. This is a major obstacle for non-European nations seeking to develop their own generative artificial intelligence (AI) models, which rely on vast amounts of training data. Generative AI can also produce biased results due to a number of factors, including the data it's trained on, the algorithms used, and how it's deployed.
The use of AI in developing nations such as Pakistan will remain limited to a small number of people proficient in the use of the English language. Broadening the adoption of AI applications will require LLMs trained on local language content. The absence of this development could cost Pakistan the opportunity to take full advantage of the AI Revolution.
Related Links:
2021: A Banner Year For Tech Startups in Pakistan
Algorithm: Origins of Artificial Intelligence in Islamic Age
Digital Pakistan 2022: Broadband Penetration Soars to 90% of 15+ Population
STEM Enrollment in Pakistan Exceeds One Million
Digital Public Infrastructure in Pakistan
Generative AI Buzz in Pakistan
Is Pakistan Ready for the AI Revolution?
Growing Presence of Pakistani Women in Science and Technology
Riaz Haq's Youtube Channel
Comments
https://www.globenewswire.com/news-release/2024/10/24/2968536/0/en/VEON-s-Jazz-Launches-FikrFree-An-AI-Powered-Digital-Marketplace-to-Unlock-Affordable-Insurance-and-Healthcare-in-Pakistan.html
VEON Ltd. (Nasdaq: VEON, Euronext Amsterdam: VEON), a global digital operator (“VEON” or the “Company”), today announces that Jazz, its digital operator in Pakistan, has launched FikrFree, a new AI-powered digital marketplace for insurance and healthcare. The platform aims to bridge a significant gap in Pakistan, where insurance sector penetration is less than 1% of GDP according to the Securities and Exchange Commission of Pakistan, and millions lack access to essential healthcare. In comparison, insurance penetration in other countries is significantly higher (over 7% of GDP in the US and more than 9% of GDP in the UK, according to the World Bank). FikrFree helps users find accessible and affordable coverage through personalized insurance plans and healthcare services.
FikrFree aims to reach the underserved healthcare market in Pakistan through an innovative platform that seamlessly integrates insurance, healthcare, and financial services all in one mobile app. FikrFree also leverages artificial intelligence to recommend personalized insurance plans for customers. The new digital service builds on VEON’s commitment to creating innovative digital solutions as part of its Digital Operator 1440 strategy, offering customers a portfolio of connected services that are relevant for each of the 1,440 minutes in a day. In 2Q24, direct digital revenues represented over 10% of VEON Group’s total revenues.
"Access to affordable healthcare is a fundamental need. In Pakistan, where millions struggle to find suitable insurance coverage and healthcare services, VEON is addressing this challenge with connected digital services. With the launch of FikrFree, we are empowering customers to access personalized insurance plans, specialist doctors, and on-demand medicine delivery—all in one seamless platform. Our digital operator strategy focuses on investing in services that enhance lives, and with FikrFree, we aim to make affordable healthcare accessible to all Pakistanis," says Kaan Terzioglu, CEO of VEON Group.
https://www.unodc.org/copak/en/Stories/SP4/unodc-pakistan-provided-law-enforcement-with-cutting-edge-training-on-crime-analytics-and-ai-models-to-counter-terrorism.html
28 September 2024, Islamabad - UNODC Pakistan organized a comprehensive workshop aimed at building the capacity of National Counter Terrorism Authority analysts in using advanced crime analytics and artificial intelligence (AI) to combat terrorism. The workshop covered a wide range of critical topics, equipping participants with the skills and knowledge needed to analyze data and counter terrorism through innovative AI techniques. In total, 25 analysts, including 7 women, participated in the training session.
The participants were introduced to the fundamentals of intelligence gathering, the intelligence cycle, and the development of intelligence products. Practical discussions were held around strategic intelligence and its pivotal role in decision-making. Participants also reviewed products developed in earlier training sessions on i2 Analyst's Notebook and Power BI, enabling them to grasp how past learnings integrate with the current focus on terrorism prevention. The workshop covered data analysis, beginning with an introduction to various data forms and their relevance in crime intelligence. Sessions covered both qualitative and quantitative data, with participants learning how to distinguish between structured and unstructured data and their real-world applications in intelligence work.
The hands-on segment included Textalyser, an online tool used to analyze qualitative data, especially for sentiment analysis, allowing participants to experiment with real-world examples. Participants were engaged through thought-provoking case studies, including analyses of social media sentiment and notable incidents such as the Al Qaeda network and the Sialkot lynching case. These examples highlighted the practical value of AI tools like Voyant in unraveling criminal networks and understanding public sentiment related to terrorist activities.
The overall workshop was dedicated to hands-on sessions with low-code and no-code AI platforms, empowering participants to leverage AI without the need for extensive programming knowledge. Practical exercises included case studies using Google Teachable Machines for image classification and Google Cloud AutoML for predictive crime analytics, both of which offer powerful tools for identifying criminal patterns and behaviors in complex datasets.
The workshop concluded with a closing session that recapped the key learnings and allowed participants to discuss the next steps in their professional development.
https://arxiv.org/html/2407.04459v1
In this paper, we compare general-purpose pretrained models, (OpenAI's) GPT-4-Turbo and (Meta/Facebook) Llama-3-8b-Instruct with special-purpose models fine-tuned on specific tasks, XLM-Roberta-large, mT5-large, and Llama-3-8b-Instruct. We focus on seven classification and six generation tasks to evaluate the performance of these models on Urdu language. Urdu has 70 million native speakers, yet it remains underrepresented in Natural Language Processing (NLP). Despite the frequent advancements in Large Language Models (LLMs), their performance in low-resource languages, including Urdu, still needs to be explored. We also conduct a human evaluation for the generation tasks and compare the results with the evaluations performed by GPT-4-Turbo and Llama-3-8b-Instruct. We find that special-purpose models consistently outperform general-purpose models across various tasks. We also find that the evaluation done by GPT-4-Turbo for generation tasks aligns more closely with human evaluation compared to the evaluation by Llama-3-8b-Instruct. This paper contributes to the NLP community by providing insights into the effectiveness of general and specific-purpose LLMs for low-resource languages.
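To make the comparison described in this abstract concrete, here is a minimal sketch of how two models' outputs on a classification task can be scored against gold labels. The use of accuracy and macro-F1 is an assumption (the abstract does not name its metrics), and the labels and predictions below are invented toy values, not results from the paper.

```python
# Hypothetical scoring sketch: compares two sets of predictions against gold labels.
# All data below is made up for illustration; none of it comes from the paper.

def accuracy(gold, pred):
    """Fraction of examples where the prediction equals the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 scores."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy sentiment labels for six hypothetical Urdu sentences.
gold = ["pos", "neg", "pos", "neg", "pos", "neg"]
general_purpose = ["pos", "pos", "pos", "neg", "neg", "neg"]   # invented predictions
special_purpose = ["pos", "neg", "pos", "neg", "pos", "pos"]   # invented predictions

for name, pred in [("general", general_purpose), ("special", special_purpose)]:
    print(f"{name}: accuracy={accuracy(gold, pred):.2f}, macro-F1={macro_f1(gold, pred):.2f}")
```

Macro-F1 weights every class equally, which matters when a task's classes are imbalanced; plain accuracy can look high on such data even when a minority class is mostly missed.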
https://www.cbsnews.com/news/labelers-training-ai-say-theyre-overworked-underpaid-and-exploited-60-minutes-transcript/
Naftali Wambalo: I did labeling for videos and images.
Naftali and digital workers like him spent eight hours a day in front of a screen studying photos and videos, drawing boxes around objects and labeling them, teaching the AI algorithms to recognize them.
Naftali Wambalo: You'd label, let's say, furniture in a house. And you say "This is a TV. This is a microwave." So you are teaching the AI to identify these items. And then there was one for faces of people. The color of the face. "If it looks like this, this is white. If it looks like this, it's Black. This is Asian." You're teaching the AI to identify them automatically.
Humans tag cars and pedestrians to teach autonomous vehicles not to hit them. Humans circle abnormalities to teach AI to recognize diseases. Even as AI is getting smarter, humans in the loop will always be needed because there will always be new devices and inventions that'll need labeling.
Lesley Stahl: You find these humans in the loop not only here in Kenya but in other countries thousands of miles from Silicon Valley. In India, the Philippines, Venezuela - often countries with large low wage populations - well educated but unemployed.
Nerima Wako-Ojiwa: Honestly, it's like modern-day slavery. Because it's cheap labor–
Lesley Stahl: Whoa. What do you –
Nerima Wako-Ojiwa: It's cheap labor.
Like modern day slavery, says Nerima Wako-Ojiwa, a Kenyan civil rights activist, because big American tech companies come here and advertise the jobs as a ticket to the future. But really, she says, it's exploitation.
Nerima Wako-Ojiwa: What we're seeing is an inequality.
Lesley Stahl: It sounds so good. An AI job! Is there any job security?
Nerima Wako-Ojiwa: The contracts that we see are very short-term. And I've seen people who have contracts that are monthly, some of them weekly, some of them days. Which is ridiculous.
She calls the workspaces AI sweatshops with computers instead of sewing machines.
Nerima Wako-Ojiwa: I think that we're so concerned with "creating opportunities," but we're not asking, "Are they good opportunities?"
Because every year a million young people enter the job market, the government has been courting tech giants like Microsoft, Google, Apple, and Intel to come here, promoting Kenya's reputation as the Silicon Savannah: tech savvy and digitally connected.
Nerima Wako-Ojiwa: The president has been really pushing for opportunities in AI –
Lesley Stahl: President?
Nerima Wako-Ojiwa: Yes.
--------------
Fasica: I was basically reviewing content which are very graphic, very disturbing contents. I was watching dismembered bodies or drone attack victims. You name it. You know, whenever I talk about this, I still have flashbacks.
Lesley Stahl: Are any of you a different person than they were before you had this job?
Fasica: Yeah. I find it hard now to even have conversations with people. It's just that I find it easier to cry than to speak.
Nathan: You continue isolating you-- yourself from people. You don't want to socialize with others. It's you and it's you alone.
Lesley Stahl: Are you a different person?
Naftali Wambalo: Yeah. I'm a different person. I used to enjoy my marriage, especially when it comes to bedroom fireworks. But after the job I hate sex.
Lesley Stahl: You hated sex?
---------
These three and nearly 200 other digital workers are suing SAMA and Meta over "unreasonable working conditions" that caused psychiatric problems.
@globaltimesnews
AI is rapidly transforming various industries in China, creating numerous job opportunities, including in the field of data labeling. Recently, Global Times reporters visited the Ningxia Artificial Intelligence Industrial Park in Wuzhong, the Ningxia Hui Autonomous Region in Northwest China, to explore how AI, as a new driving force in productivity, is generating not only new employment opportunities but also new challenges and trends. At a local data labeling base, young annotators can be seen busily identifying specific words in text or speech, outlining objects in images or videos, and tagging them on their computers.
https://x.com/globaltimesnews/status/1869594668180369511
-------------------
Inside the Ningxia Data Labeling Industrial Base in NW China
https://www.globaltimes.cn/galleries/5598.html
https://www.cnbc.com/2025/01/24/how-chinas-new-ai-model-deepseek-is-threatening-us-dominance.html
A little-known AI lab out of China has ignited panic throughout Silicon Valley after releasing AI models that can outperform America's best despite being built more cheaply and with less-powerful chips.
DeepSeek, as the lab is called, unveiled a free, open-source large-language model in late December that it says took only two months and less than $6 million to build, using reduced-capability chips from Nvidia called H800s.
------------------
China’s cheap, open AI model DeepSeek thrills scientists
https://www.nature.com/articles/d41586-025-00229-6
A Chinese-built large language model called DeepSeek-R1 is thrilling scientists as an affordable and open rival to ‘reasoning’ models such as OpenAI’s o1.
These models generate responses step-by-step, in a process analogous to human reasoning. This makes them more adept than earlier language models at solving scientific problems and could make them useful in research. Initial tests of R1, released on 20 January, show that its performance on certain tasks in chemistry, mathematics and coding is on par with that of o1 — which wowed researchers when it was released by OpenAI in September.
“This is wild and totally unexpected,” Elvis Saravia, an AI researcher and co-founder of the UK-based AI consulting firm DAIR.AI, wrote on X.
R1 stands out for another reason. DeepSeek, the start-up in Hangzhou that built the model, has released it as ‘open-weight’, meaning that researchers can study and build on the algorithm. Published under an MIT licence, the model can be freely reused but is not considered fully open source, because its training data has not been made available.
“The openness of DeepSeek is quite remarkable,” says Mario Krenn, leader of the Artificial Scientist Lab at the Max Planck Institute for the Science of Light in Erlangen, Germany. By comparison, o1 and other models built by OpenAI in San Francisco, California, including its latest effort o3 are “essentially black boxes”, he says.
--------------
China’s AI industry has almost caught up with America’s. And it is more open and more efficient, too
https://www.economist.com/briefing/2025/01/23/chinas-ai-industry-has-almost-caught-up-with-americas
The world’s first “reasoning model”, an advanced form of artificial intelligence, was released in September by OpenAI, an American firm. o1, as it is called, uses a “chain of thought” to answer difficult questions in science and mathematics, breaking down problems to their constituent steps and testing various approaches to the task behind the scenes before presenting a conclusion to the user. Its unveiling set off a race to copy this method. Google came up with a reasoning model called “Gemini Flash Thinking” in December. OpenAI responded with o3, an update of o1, a few days later.
https://propakistani.pk/2025/03/22/pakistan-launches-its-first-homegrown-ai-chatbot-tailored-for-local-needs/
Pakistan has achieved a notable milestone in the tech sector with the beta launch of Zahanat AI, the nation’s first locally developed artificial intelligence chatbot. Spearheaded by entrepreneur Mehwish Salman Ali, co-founder and CEO of Data Vault and Zahanat AI, the platform promises to address Pakistan’s unique challenges with culturally sensitive and locally relevant solutions.
Zahanat AI is the culmination of years of development, operating from a dedicated data center in Karachi since 2022. This means that the AI model’s data stays in Pakistan for processing and doesn’t go anywhere else. This data center is connected with high-speed internet and has robust DDoS protection.
The system utilizes a mixed GPU architecture, leveraging both Nvidia GPUs and chips, initially incorporating used gaming GPUs to build its computational power. It was a relatively low-cost development project. The owners said that DeepSeek cost $5 million to make, but Zahanat AI cost less, without specifying exactly how much.
Mehwish Salman Ali said:
Our goal was to create an AI that understands and responds to the specific needs of Pakistan. Zahanat AI is trained on a massive dataset of 2 billion parameters, all processed and stored within Pakistan, ensuring cultural awareness and relevance.
A key distinguishing feature of Zahanat AI is its focus on ethical and responsible AI development. The platform is specifically trained to censor sensitive topics and avoid discussions about particular individuals.
The beta launch of Zahanat AI is currently on an invitational basis, requiring interested users to submit their email and personal details, including their profession, work email, and social media accounts. This selective access aims to gather valuable feedback and refine the platform before a wider public release.
ALSO READ
Baidu’s New AI Model “Rivals Deepseek At Half the Price”
Ali further explained:
By training Zahanat AI on Pakistan’s data, we are building a tool that can provide Pakistan-focused solutions to the diverse problems faced by our communities. We believe this technology has the potential to transform various sectors, from education and healthcare to business and governance.
For now, the beta launch has introduced Zahanat AI’s initial Z1 model, with plans to improve and expand further in the future with Z2 and so on. The Z2 model will introduce multilingual support for all languages in Pakistan and will get voice input as well.
https://www.arabnews.com/node/2594353/pakistan
Zahanat AI is a text-based generative AI model that enables users to engage in human-like conversations, answer queries, and assist in various domains
Its key differentiator is its hosting and local training on Pakistani culture and localized issues, which makes it equipped to address regional challenges
---------------
Meet the woman who made Pakistan's first AI chatbot
https://theprint.in/go-to-pakistan/woman-pakistans-first-ai-chatbot-mehwish-salman-ali/2567308/
The platform is designed to empower Pakistani citizens, particularly in sectors like education and healthcare. One of Zahanat’s most anticipated developments is the upcoming Z2 model, which will support Urdu and multiple regional languages. This is a game-changer for more than half of Pakistan’s population, a large part of which struggles with English or even Urdu.
Ali imagines the platform being used by a rural student to access world-class education in their native Sindhi or Pashto. She dreams of Zahanat helping an elderly woman receive her healthcare diagnosis in Balochi.
“We’re not just enabling access to AI, we’re redefining who gets to be part of the ecosystem. We’re moving from a digital divide to digital empowerment. This isn’t just tech progress. It’s social progress,” Ali said.
Zahanat is a personal mission for Ali to break the gender barriers that persist in tech. She has faced bias in the male-dominated industry, both spoken and unspoken.
“When I lead a project like Zahanat, it’s not just innovation, it’s disruption. It’s proof that women can lead tech.”
------------
Kineto: K-Electric Becomes Pakistan’s First Power Utility to launch Generative AI Chatbot To Enhance Customer Experience
https://propakistani.pk/2025/03/26/kineto-k-electric-becomes-pakistans-first-power-utility-to-launch-generative-ai-chatbot-to-enhance-customer-experience/
“Users of the KE Live App have grown by 21% annually over the last 5 years and now stand at 1.3 million digitally connected customers. This is over one-third of KE’s total customer base and conveys our digital-savvy population. We then heralded another innovation when we launched the WhatsApp platform back in 2021, and now this platform caters to over 2.0 million people. Additionally, nearly another half a million subscribe to our e-billing feature, a step that helps save Pakistan paper and reduce its import bill.”
“Now, leading the way with digital transformation in customer engagement, Kineto was just the next logical step forward reflecting our investment in future-ready digital platforms, further transforming the way Karachi’s customers interact with its power utility.”
The chatbot has been developed in collaboration with Convex Interactive, KE’s technology partner for this initiative.
“This partnership with K-Electric aligns with our mission to revolutionize customer engagement through AI,” said Aamir Irfan Siddiqui, CEO & Founder of Convex Interactive. “By leveraging generative AI, we’re making customer interactions faster, smarter, and more intuitive.”
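The growth figures quoted above for the KE Live App can be checked with a little arithmetic. This is a rough sketch that assumes the 21% figure is a compound annual growth rate and reads "over one-third" as a share strictly above 1/3; the function names are hypothetical and the results are implied estimates, not numbers reported by K-Electric.

```python
# Back-of-the-envelope check of the quoted KE Live App figures.
# Assumption: 21% is a compound annual growth rate sustained over 5 years.

def implied_starting_users(current_users, annual_growth, years):
    """User base that would grow to current_users at the given compound rate."""
    return current_users / (1 + annual_growth) ** years


def max_total_customers(app_users, min_share):
    """Upper bound on the total customer base if app users exceed min_share of it."""
    return app_users / min_share


app_users = 1_300_000          # "1.3 million digitally connected customers"
start = implied_starting_users(app_users, 0.21, 5)
ceiling = max_total_customers(app_users, 1 / 3)

print(f"Implied app users five years ago: about {start:,.0f}")
print(f"Implied upper bound on total customers: about {ceiling:,.0f}")
```

Under these assumptions the app would have started from roughly half a million users, and KE's total customer base would be somewhat under 3.9 million.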
https://www.thenews.com.pk/print/1296015-tcf-set-to-bring-ai-powered-learning-to-teachers-with-khanmigo
The Citizens Foundation (TCF) and Khan Academy Pakistan have announced an innovative AI-powered collaboration to support teachers and enhance classroom learning in selected TCF schools.
This pilot initiative aims to empower teachers by enhancing their lesson delivery, fostering critical thinking, and improving classroom engagement for students in Grades 6-8. Under this collaboration, Khanmigo will be integrated into selected TCF schools to enhance mathematics and science instruction.
Unlike traditional AI, Khanmigo acts as an interactive teaching assistant, helping educators enhance their knowledge, craft lesson hooks, develop quizzes, and foster deeper student engagement.
The pilot programme will equip teachers with AI-driven teacher tools, provide structured prompts to guide teachers to develop learning material relevant to their students, and offer bilingual support in English and Urdu.
Additionally, Khan Academy Pakistan will train school leaders on effective AI integration, offering guidance on best practices for using Khanmigo in classrooms. This initiative will empower TCF teachers to refine their teaching methods, personalise learning experiences, and drive meaningful classroom discussions, making AI-driven learning more accessible, structured, and engaging for students. “At TCF, we want to ensure that technology serves as a bridge to better learning opportunities rather than a barrier,” shared Syed Asaad Ayub Ahmad, the president and CEO of TCF.
“We are hopeful that Khanmigo will be useful in serving as a thinking partner for TCF teachers in the classroom and a transformative step towards making high-quality education accessible and engaging.”
One of Khanmigo’s most promising features is its bilingual support, allowing teachers to instruct in both English and Urdu. This ensures that educators from diverse backgrounds can fully engage with the content. As the programme progresses, regional language support will be explored, further broadening its accessibility.
“Khanmigo aims to give every child in Pakistan access to world-class education,” said Zeeshan Hasan, CEO of Khan Academy Pakistan. “By empowering teachers, we are ensuring that AI becomes a tool for empowerment rather than a shortcut. This partnership with TCF is a step forward towards transforming how education is delivered in classrooms.”
“TCF strongly believes in the power of good teachers, and there is an undeniable social aspect of learning from a teacher. We are hopeful that KhanMigo will augment teacher skills to make classroom experience fun, engaging, and meaningful for the students,” shared Shazia Kamal, executive vice president, Outcomes at TCF.
With Pakistan facing a critical education crisis and a shortage of trained teachers, AI-powered solutions like Khanmigo offer a scalable and cost-effective way to enhance teaching quality.
While this initiative is currently in its pilot phase, TCF and Khan Academy Pakistan envision expanding the programme to more schools.
As AI continues to reshape global education, this partnership reaffirms TCF’s commitment to equipping teachers with the best tools to inspire and educate the next generation of Pakistan’s changemakers.
TCF is a non-profit organisation set up in 1995 by a group of citizens who wanted to bring about positive social change through education.
The 30-year-old organisation is among Pakistan’s leading organisations in the field of education, educating 301,000 students across 2,033 school units in the country.
https://propakistani.pk/2025/04/07/pakistani-developer-builds-first-ai-voice-tool-for-sindhi-users/
A young Pakistani developer has created the first AI tools for the Sindhi language. These tools enable text-to-speech (TTS) and speech-to-text (STT) in Sindhi for the first time.
The 23-year-old software developer from Hyderabad, Fahad Maqsood Qazi, began work last year on an AI-based dubbing system for his company, Flis Technologies. During development, he realized there were no basic text-to-speech (TTS) or speech-to-text (STT) tools for Sindhi—a language spoken by nearly 40 million people worldwide.
Starting from Scratch
In August 2023, Qazi began gathering and transcribing hours of Sindhi audio from various sources, including YouTube videos, audiobooks, and news reports, to build a training dataset. Around the same time, he came across Mozilla’s Common Voice project, where Google employee Asad Memon had added Sindhi support.
Qazi merged that data with his own and began training AI models. By January 2024, he had built initial working versions of Sindhi TTS and STT systems. He also developed a tokenizer, a necessary tool for processing language in machine learning models, since one was not previously available for Sindhi.
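For readers unfamiliar with the term, a tokenizer splits raw text into units and maps each unit to an integer ID that a model can process. The following is a minimal word-level sketch, not Qazi's tokenizer; production systems typically use subword methods such as byte-pair encoding. The function names, the sample phrase, and the `unk_id` convention are illustrative assumptions.

```python
import re

# Words are runs of Unicode word characters; any other non-space character
# (punctuation) becomes its own token. Python's \w covers Arabic-script letters.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


def build_vocab(tokens):
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab


def encode(text, vocab, unk_id=-1):
    """Convert text to a list of IDs; tokens missing from vocab map to unk_id."""
    return [vocab.get(token, unk_id) for token in tokenize(text)]


sample = "سنڌي ٻولي."          # illustrative phrase meaning "Sindhi language."
tokens = tokenize(sample)
vocab = build_vocab(tokens)
ids = encode(sample, vocab)
print(tokens, ids)
```

One caveat with this simple pattern: combining marks such as Arabic vowel diacritics are not matched by `\w`, so fully vocalized text may be split mid-word. A real Sindhi tokenizer would need to handle such marks, along with script-specific normalization.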
Supporting Language Access
Sindhi is not formally taught in many countries where Sindhi-speaking communities live, which can result in younger generations being less familiar with the language. Qazi hopes his tools will make it easier for people to read, write, and speak Sindhi through digital platforms.
Qazi told Arab News:
My goal is to help them stay connected to it through speech and text tools. In many diaspora communities, younger Sindhis grow up without learning to read or write in their language.
In March, he uploaded his models to HuggingFace, which is essentially the GitHub for AI models, allowing developers and researchers access to his work.
Everyday Use and Accessibility
Qazi’s models could help Sindhi speakers send messages using speech input or listen to written text read aloud in Sindhi. These tools may also assist older adults and people with limited formal education in using the language in everyday communication.
Qazi said:
A person who can’t read Sindhi could use the TTS model to hear written stories. Or someone who never learned to write could still search for information and get answers by speaking.
Long-Term Potential
Qazi believes that the addition of Sindhi to tools like TTS and STT is necessary for the language to remain relevant in digital communication and technology.
“Without access to tools like these, Sindhi could be excluded from digital spaces,” he said. “Now it can be part of systems like voice interfaces, educational resources, and translation tools.”
By addressing a basic gap in language technology, Qazi’s work gives others a foundation to build further tools for Sindhi users, ensuring better access and usability in an increasingly digital world.