Pakistan to Develop Urdu LLM for Generative AI

The National University of Sciences and Technology (NUST), the National Information Technology Board (NITB) and telecom network operator Jazz have signed a Memorandum of Understanding (MOU) to develop Pakistan’s first indigenous Large Language Model (LLM) with a focus on Urdu, including datasets for the Pashto and Punjabi languages. It is aimed at empowering individuals, businesses, and organizations with advanced AI tools in their native languages. The envisioned LLM is expected to drive innovation in Generative AI applications, boosting productivity and accessibility in critical sectors like healthcare, education, and agriculture.

GPT-4 Accuracy Scores. Source: The Economist


Generative AI tools such as ChatGPT are powered by large language models, or LLMs. These models need to be trained on vast amounts of data in specific languages to be useful. Unfortunately, Urdu accounts for less than 0.1% of the content on the Internet. This will present a challenge for the developers of Urdu LLMs.

Online Content of Various Languages. Source: W3Techs 


The lack of Urdu content available for training ChatGPT reduces the accuracy of its results for Urdu-language users. For example, the GPT-4 accuracy score in question-answer tests in Urdu is just over 70%, compared with an 85% accuracy score in English, according to data from OpenAI. Other South Asian languages, including Hindi, Bengali, Punjabi, Marathi and Telugu, suffer from the same problem.
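To put the two accuracy figures in perspective, here is a rough back-of-the-envelope calculation. It assumes "just over 70%" can be rounded to 0.70; the function and variable names are only illustrative. Under that assumption, the 15-point accuracy gap means GPT-4 gives wrong answers in Urdu roughly twice as often as in English.

```python
def error_rate(accuracy: float) -> float:
    """Share of questions answered incorrectly, given accuracy in [0, 1]."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return 1.0 - accuracy


# Approximate figures quoted in the post; 0.70 is a rounding assumption.
urdu_accuracy = 0.70
english_accuracy = 0.85

# Absolute gap in percentage points, and the ratio of the two error rates.
gap_points = (english_accuracy - urdu_accuracy) * 100
error_ratio = error_rate(urdu_accuracy) / error_rate(english_accuracy)

print(f"Accuracy gap: {gap_points:.0f} percentage points")
print(f"Urdu error rate is about {error_ratio:.1f}x the English error rate")
```

Comparing error rates rather than accuracy scores shows the gap from the user's side: what matters in practice is how often the answer is wrong.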

It's not just a South Asian problem. These challenges exist across the developing world. Non-European languages are generally poorly represented online. This is a major obstacle for non-European nations in developing their own generative artificial-intelligence (AI) models, which rely on vast amounts of training data. Generative AI can also produce biased results due to a number of factors, including the data it's trained on, the algorithms used, and how it's deployed.

The use of AI in developing nations such as Pakistan will remain limited to the small number of people proficient in English. Broadening the adoption of AI applications will require LLMs trained on local-language content. Without such models, Pakistan could lose the opportunity to take full advantage of the AI revolution.


Comments

Riaz Haq said…
VEON’s Jazz Launches FikrFree: An AI-Powered Digital Marketplace to Unlock Affordable Insurance and Healthcare in Pakistan


https://www.globenewswire.com/news-release/2024/10/24/2968536/0/en/VEON-s-Jazz-Launches-FikrFree-An-AI-Powered-Digital-Marketplace-to-Unlock-Affordable-Insurance-and-Healthcare-in-Pakistan.html

VEON Ltd. (Nasdaq: VEON, Euronext Amsterdam: VEON), a global digital operator (“VEON” or the “Company”), today announces that Jazz, its digital operator in Pakistan, has launched FikrFree, a new AI-powered digital marketplace for insurance and healthcare. The platform aims to bridge a significant gap in Pakistan, where insurance sector penetration is less than 1% of GDP according to the Securities and Exchange Commission of Pakistan, and millions lack access to essential healthcare. In comparison, insurance penetration in other countries is significantly higher (over 7% of GDP in the US and more than 9% of GDP in the UK, according to the World Bank). FikrFree helps users find accessible and affordable coverage through personalized insurance plans and healthcare services.

FikrFree aims to reach the underserved healthcare market in Pakistan through an innovative platform that seamlessly integrates insurance, healthcare, and financial services all in one mobile app. FikrFree also leverages artificial intelligence to recommend personalized insurance plans for customers. The new digital service builds on VEON’s commitment to creating innovative digital solutions as part of its Digital Operator 1440 strategy, offering customers a portfolio of connected services that are relevant for each of the 1,440 minutes in a day. In 2Q24, direct digital revenues represented over 10% of VEON Group’s total revenues.

"Access to affordable healthcare is a fundamental need. In Pakistan, where millions struggle to find suitable insurance coverage and healthcare services, VEON is addressing this challenge with connected digital services. With the launch of FikrFree, we are empowering customers to access personalized insurance plans, specialist doctors, and on-demand medicine delivery—all in one seamless platform. Our digital operator strategy focuses on investing in services that enhance lives, and with FikrFree, we aim to make affordable healthcare accessible to all Pakistanis," says Kaan Terzioglu, CEO of VEON Group.
Riaz Haq said…
UNODC Pakistan provided Law Enforcement with Cutting-Edge Training on Crime Analytics and AI Models to Counter Terrorism


https://www.unodc.org/copak/en/Stories/SP4/unodc-pakistan-provided-law-enforcement-with-cutting-edge-training-on-crime-analytics-and-ai-models-to-counter-terrorism.html


28 September 2024, Islamabad - UNODC Pakistan organized a comprehensive workshop aimed at building the capacity of National Counter Terrorism Authority analysts in using advanced crime analytics and artificial intelligence (AI) to combat terrorism. The workshop covered a wide range of critical topics, equipping participants with the skills and knowledge needed to analyze data and counter terrorism through innovative AI techniques. In total, 25 analysts, including 7 women, participated in the training session.

The participants were introduced to the fundamentals of intelligence gathering, the intelligence cycle, and the development of intelligence products. Practical discussions were held around strategic intelligence and its pivotal role in decision-making. Participants also reviewed products developed in earlier training sessions on i2 Analyst's Notebook and Power BI, enabling them to grasp how past learnings integrate with the current focus on terrorism prevention. The workshop covered data analysis, beginning with an introduction to various data forms and their relevance in crime intelligence. Sessions covered both qualitative and quantitative data, with participants learning how to distinguish between structured and unstructured data and their real-world applications in intelligence work.

The hands-on segment included Textalyser, an online tool used to analyze qualitative data, especially for sentiment analysis, allowing participants to experiment with real-world examples. Participants were engaged through thought-provoking case studies, including analyses of social media sentiment and notable incidents such as the Al Qaeda network and the Sialkot lynching case. These examples highlighted the practical value of AI tools like Voyant in unraveling criminal networks and understanding public sentiment related to terrorist activities.

The overall workshop was dedicated to hands-on sessions with low-code and no-code AI platforms, empowering participants to leverage AI without the need for extensive programming knowledge. Practical exercises included case studies using Google Teachable Machines for image classification and Google Cloud AutoML for predictive crime analytics, both of which offer powerful tools for identifying criminal patterns and behaviors in complex datasets.

The workshop concluded with a closing session that recapped the key learnings and allowed participants to discuss the next steps in their professional development.
Riaz Haq said…
Generalists vs. Specialists: Evaluating Large Language Models for Urdu


https://arxiv.org/html/2407.04459v1

In this paper, we compare general-purpose pretrained models, (OpenAI's) GPT-4-Turbo and (Meta/Facebook) Llama-3-8b-Instruct with special-purpose models fine-tuned on specific tasks, XLM-Roberta-large, mT5-large, and Llama-3-8b-Instruct. We focus on seven classification and six generation tasks to evaluate the performance of these models on Urdu language. Urdu has 70 million native speakers, yet it remains underrepresented in Natural Language Processing (NLP). Despite the frequent advancements in Large Language Models (LLMs), their performance in low-resource languages, including Urdu, still needs to be explored. We also conduct a human evaluation for the generation tasks and compare the results with the evaluations performed by GPT-4-Turbo and Llama-3-8b-Instruct. We find that special-purpose models consistently outperform general-purpose models across various tasks. We also find that the evaluation done by GPT-4-Turbo for generation tasks aligns more closely with human evaluation compared to the evaluation by Llama-3-8b-Instruct. This paper contributes to the NLP community by providing insights into the effectiveness of general and specific-purpose LLMs for low-resource languages.
Riaz Haq said…
Labelers training AI say they're overworked, underpaid and exploited by big American tech companies - CBS News

https://www.cbsnews.com/news/labelers-training-ai-say-theyre-overworked-underpaid-and-exploited-60-minutes-transcript/

Naftali Wambalo: I did labeling for videos and images.

Naftali and digital workers like him spent eight hours a day in front of a screen studying photos and videos, drawing boxes around objects and labeling them, teaching the AI algorithms to recognize them.

Naftali Wambalo: You'd label, let's say, furniture in a house. And you say "This is a TV. This is a microwave." So you are teaching the AI to identify these items. And then there was one for faces of people. The color of the face. "If it looks like this, this is white. If it looks like this, it's Black. This is Asian." You're teaching the AI to identify them automatically.

Humans tag cars and pedestrians to teach autonomous vehicles not to hit them. Humans circle abnormalities to teach AI to recognize diseases. Even as AI is getting smarter, humans in the loop will always be needed because there will always be new devices and inventions that'll need labeling.

Lesley Stahl: You find these humans in the loop not only here in Kenya but in other countries thousands of miles from Silicon Valley. In India, the Philippines, Venezuela - often countries with large low wage populations - well educated but unemployed.

Nerima Wako-Ojiwa: Honestly, it's like modern-day slavery. Because it's cheap labor–

Lesley Stahl: Whoa. What do you –

Nerima Wako-Ojiwa: It's cheap labor.

Like modern day slavery, says Nerima Wako-Ojiwa, a Kenyan civil rights activist, because big American tech companies come here and advertise the jobs as a ticket to the future. But really, she says, it's exploitation.

Nerima Wako-Ojiwa: What we're seeing is an inequality.

Lesley Stahl: It sounds so good. An AI job! Is there any job security?

Nerima Wako-Ojiwa: The contracts that we see are very short-term. And I've seen people who have contracts that are monthly, some of them weekly, some of them days. Which is ridiculous.

She calls the workspaces AI sweatshops with computers instead of sewing machines.

Nerima Wako-Ojiwa: I think that we're so concerned with "creating opportunities," but we're not asking, "Are they good opportunities?"

Because every year a million young people enter the job market, the government has been courting tech giants like Microsoft, Google, Apple, and Intel to come here, promoting Kenya's reputation as the Silicon Savannah: tech savvy and digitally connected.

Nerima Wako-Ojiwa: The president has been really pushing for opportunities in AI –

Lesley Stahl: President?

Nerima Wako-Ojiwa: Yes.

--------------

Fasica: I was basically reviewing content which are very graphic, very disturbing contents. I was watching dismembered bodies or drone attack victims. You name it. You know, whenever I talk about this, I still have flashbacks.

Lesley Stahl: Are any of you a different person than they were before you had this job?

Fasica: Yeah. I find it hard now to even have conversations with people. It's just that I find it easier to cry than to speak.

Nathan: You continue isolating you-- yourself from people. You don't want to socialize with others. It's you and it's you alone.

Lesley Stahl: Are you a different person?

Naftali Wambalo: Yeah. I'm a different person. I used to enjoy my marriage, especially when it comes to bedroom fireworks. But after the job I hate sex.

Lesley Stahl: You hated sex?

---------
These three and nearly 200 other digital workers are suing SAMA and Meta over "unreasonable working conditions" that caused psychiatric problems.

Riaz Haq said…
Global Times
@globaltimesnews
AI is rapidly transforming various industries in China, creating numerous job opportunities, including in the field of data labeling. Recently, Global Times reporters visited the Ningxia Artificial Intelligence Industrial Park in Wuzhong, the Ningxia Hui Autonomous Region in Northwest China, to explore how AI, as a new driving force in productivity, is generating not only new employment opportunities but also new challenges and trends. At a local data labeling base, young annotators can be seen busily identifying specific words in text or speech, outlining objects in images or videos, and tagging them on their computers.

https://x.com/globaltimesnews/status/1869594668180369511

-------------------
Inside the Ningxia Data Labeling Industrial Base in NW China

https://www.globaltimes.cn/galleries/5598.html

(Photos: Chen Tao/GT)
Riaz Haq said…
How China’s new AI model DeepSeek is threatening U.S. dominance


https://www.cnbc.com/2025/01/24/how-chinas-new-ai-model-deepseek-is-threatening-us-dominance.html

A little-known AI lab out of China has ignited panic throughout Silicon Valley after releasing AI models that can outperform America's best despite being built more cheaply and with less-powerful chips.

DeepSeek, as the lab is called, unveiled a free, open-source large-language model in late December that it says took only two months and less than $6 million to build, using reduced-capability chips from Nvidia called H800s.

------------------
China’s cheap, open AI model DeepSeek thrills scientists


https://www.nature.com/articles/d41586-025-00229-6


A Chinese-built large language model called DeepSeek-R1 is thrilling scientists as an affordable and open rival to ‘reasoning’ models such as OpenAI’s o1.

These models generate responses step-by-step, in a process analogous to human reasoning. This makes them more adept than earlier language models at solving scientific problems and could make them useful in research. Initial tests of R1, released on 20 January, show that its performance on certain tasks in chemistry, mathematics and coding is on par with that of o1 — which wowed researchers when it was released by OpenAI in September.

“This is wild and totally unexpected,” Elvis Saravia, an AI researcher and co-founder of the UK-based AI consulting firm DAIR.AI, wrote on X.

R1 stands out for another reason. DeepSeek, the start-up in Hangzhou that built the model, has released it as ‘open-weight’, meaning that researchers can study and build on the algorithm. Published under an MIT licence, the model can be freely reused but is not considered fully open source, because its training data has not been made available.

“The openness of DeepSeek is quite remarkable,” says Mario Krenn, leader of the Artificial Scientist Lab at the Max Planck Institute for the Science of Light in Erlangen, Germany. By comparison, o1 and other models built by OpenAI in San Francisco, California, including its latest effort o3 are “essentially black boxes”, he says.

--------------

China’s AI industry has almost caught up with America’s. And it is more open and more efficient, too

https://www.economist.com/briefing/2025/01/23/chinas-ai-industry-has-almost-caught-up-with-americas

The world’s first “reasoning model”, an advanced form of artificial intelligence, was released in September by OpenAI, an American firm. o1, as it is called, uses a “chain of thought” to answer difficult questions in science and mathematics, breaking down problems to their constituent steps and testing various approaches to the task behind the scenes before presenting a conclusion to the user. Its unveiling set off a race to copy this method. Google came up with a reasoning model called “Gemini Flash Thinking” in December. OpenAI responded with o3, an update of o1, a few days later.
Riaz Haq said…
Pakistan Launches Its First Homegrown AI Chatbot Zahanat AI Tailored for Local Needs



https://propakistani.pk/2025/03/22/pakistan-launches-its-first-homegrown-ai-chatbot-tailored-for-local-needs/


Pakistan has achieved a notable milestone in the tech sector with the beta launch of Zahanat AI, the nation’s first locally developed artificial intelligence chatbot. Spearheaded by entrepreneur Mehwish Salman Ali, co-founder and CEO of Data Vault and Zahanat AI, the platform promises to address Pakistan’s unique challenges with culturally sensitive and locally relevant solutions.

Zahanat AI is the culmination of years of development, operating from a dedicated data center in Karachi since 2022. This means that the AI model’s data stays in Pakistan for processing and doesn’t go anywhere else. This data center is connected with high-speed internet and has robust DDoS protection.

The system utilizes a mixed GPU architecture, leveraging both Nvidia GPUs and chips, initially incorporating used gaming GPUs to build its computational power. It was a relatively low-cost development project. The owners said that DeepSeek cost $5 million to make, but Zahanat AI cost less, without specifying exactly how much.



Mehwish Salman Ali said:

Our goal was to create an AI that understands and responds to the specific needs of Pakistan. Zahanat AI is trained on a massive dataset of 2 billion parameters, all processed and stored within Pakistan, ensuring cultural awareness and relevance.

A key distinguishing feature of Zahanat AI is its focus on ethical and responsible AI development. The platform is specifically trained to censor sensitive topics and avoid discussions about particular individuals.

The beta launch of Zahanat AI is currently on an invitational basis, requiring interested users to submit their email and personal details, including their profession, work email, and social media accounts. This selective access aims to gather valuable feedback and refine the platform before a wider public release.

Ali further explained:

By training Zahanat AI on Pakistan’s data, we are building a tool that can provide Pakistan-focused solutions to the diverse problems faced by our communities. We believe this technology has the potential to transform various sectors, from education and healthcare to business and governance.

For now, the beta launch has introduced Zahanat AI’s initial Z1 model, with plans to improve and expand further in the future with Z2 and so on. The Z2 model will introduce multilingual support for all languages in Pakistan and will get voice input as well.
