The Oraqu Project
In 2022, a team of artists, scholars and software engineers began an experiment that became The Oraqu, an interactive experience that gives users unique access to a sonic archive based on the mechanics and architecture of the Ifá divination system of the Yorùbá people. Ifá was invented thousands of years ago by Orunmila, the Yorùbá father of philosophy, as a standardized binary-based protocol for organizing, processing, and retrieving the vast information held within a dynamic oral knowledge database. Ifá is both an encyclopedia and a divination technique that relies on the analysis of a byte of binary patterns, interpreted by a diviner (an Ifá priest, or Babalawo). It consists of 16 major Odus (pot), each of which is 16 chapters deep, making 16 × 16 = 256 Odus (Oyebisi, 2019), and each Odu has an unknowable number of verses (Ẹsẹ Ifá). Each of the 256 Odus is mapped to a specific divination pattern on a divining chain. Babalawos memorize the innumerable verses connected to each of these 256 patterns, or signatures, and interpret them for the querent.
The Ifá divination system uses a divining chain with 8 nuts, each with an “on/off” position. When twirled and thrown to the ground, the chain forms a specific signature of on and off codes that is mapped to one of the 256 Odus. This method of encoding information as on/off states, or 0s and 1s, rests on the same binary principle that modern computers use: eight two-state positions yield 2^8 = 256 distinct signatures, the same number of values that one byte can hold. As in the Ifá divination system, computers use the binary counting system to control states and store large amounts of data. The current Oraqu app attempts to simulate the experience of Ifá. Like Ifá divination, it hosts a series of 256 Odus that are played to users as they access the app through either a playlist or a divination mode. These sounds are stored in the app, each mapped to a specific divination “signature”, as an intuitive repository of oral wisdom: a curated archive of philosophical thought, science fiction, and spirituality; various cosmologies and mythologies; speeches from diverse historical figures and accounts of past events; poetry, music, and soundscapes. It is a fine collection of thoughts that collectively offer a doorway for insight into epistemes of the visible and invisible worlds.
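The arithmetic behind these signatures can be sketched in a few lines of Python. This is a minimal illustration, not the Oraqu app's actual code: the function names, the bit ordering, and the zero-based numbering are assumptions made for the example, and they do not reflect the traditional ordering or naming of the Odus.

```python
from itertools import product

CHAIN_LENGTH = 8  # the divining chain has 8 two-state positions


def signature_to_index(signature):
    """Map a sequence of 8 on/off values to an integer in 0..255.

    The first position is treated as the most significant bit; this
    ordering is an assumption made for illustration only.
    """
    if len(signature) != CHAIN_LENGTH:
        raise ValueError("a signature needs exactly 8 positions")
    index = 0
    for position in signature:
        index = (index << 1) | (1 if position else 0)
    return index


def split_index(index):
    """Split an index in 0..255 into two numbers in 0..15.

    This mirrors the 16 x 16 = 256 structure described above.
    """
    if not 0 <= index < 256:
        raise ValueError("index must be in 0..255")
    return divmod(index, 16)


# Enumerate every possible throw and confirm each gets a unique index.
all_signatures = list(product([0, 1], repeat=CHAIN_LENGTH))
unique_indices = {signature_to_index(s) for s in all_signatures}
print(len(all_signatures), len(unique_indices))
```

An app built along these lines could use the resulting index as a key into its table of 256 stored recordings.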
As a basic introduction, the Yorùbá people are currently one of the largest ethnic groups in Africa, with a sizable Diaspora community around the world, and a wealth of cultural heritage and knowledge systems that have been passed down from generation to generation. Excavated materials from Yorùbáland indicate a wealthy, sophisticated society with an established monarchy. Radiocarbon dating places the “classical” period of Yorùbá art at about AD 1000-1400. The bulk of the Yorùbás currently live in southwest Nigeria, as well as in other West African countries, such as the Rep. of Benin, Togo, Liberia, and Sierra Leone. Outside Africa, Yorùbás are present in Cuba, Brazil, Haiti, Peru, Trinidad and Tobago, Jamaica, Puerto Rico, and the USA (Usman, Falola 2019).
The legacy of colonialism and its modernist project – along with the rise of globalization and the hegemony of Western epistemology – has led to the lack of adequate preservation efforts of this rich heritage and knowledge system. The Oraqu 2.0 aims to address this challenge by harnessing the power of AI to restore and revitalize Yorùbá culture and knowledge systems. The goal of the Oraqu app is to create a project that not merely brings Yorùbá culture to contemporary consciousness, but also helps to render Yorùbá indigenous culture more visible through a technological lens, to cease to see it as a series of incomprehensible ancient traditions, and to re-present its appropriated epistemology as a fundamental part of our modern structures.
The central research question of this project is: Can we use AI/Natural Language Processing (NLP) technology to support ongoing efforts to restitute, preserve, promote, and revitalize Yorùbá cultural heritage and knowledge systems? To answer this question, we will explore the following sub-questions:
1. Who are the current custodians of Yorùbá knowledge systems globally, where can we find them, and how do we ethically get them to collaborate in a participatory research project involving native communities, scholars and software engineers?
2. How much of the Yorùbá labeled and unlabeled dataset can we archive and make available as reference data that researchers may subsequently build upon?
3. How can cutting-edge technologies be harnessed to create innovative and interactive educational materials that teach Yorùbá worldview to a wider audience?
The project comprises four overlapping phases:
Data Collection and Labeling: We will collect a large and diverse set of Yorùbá oral and written literature by consulting with experts in Yorùbá culture, studying existing written materials on the subject, and conducting fieldwork to gather additional data, including recordings of conversations, interviews, and cultural events. The collected data will then be transcribed, translated into English, and labeled, with each audio file tagged with relevant metadata, to develop AI-ready datasets that can serve as training data for Yorùbá Natural Language Processing (YNLP). (June 2024 - May 2026)
Training the AI model: The labeled data is used to train the AI model. The model needs to learn how to recognize and understand different accents, dialects, and variations of the Yorùbá language. The AI model also needs to be trained to accurately transcribe, translate, and summarize Yorùbá audio data. This includes the development of an AI/machine learning pipeline to model the dataset and train the model, and to provide new ways of engaging with Yorùbá Cultural Heritage and knowledge systems. The model will be rigorously tested and validated to ensure that it meets the necessary accuracy, reliability, and usability standards. (June 2025 - May 2027)
System Deployment: This is the continuous development of Oraqu 2.0 into an AI-powered mobile app still based on the Ifá divination system, as well as the release of the training dataset to YNLP researchers. The main objective is to increase access to Yorùbá language, culture and knowledge, and to enhance educational, artistic, scientific, and economic opportunities for the more than 40 million Yorùbá speakers globally, as well as for non-native speakers who have such needs. The product will be deployed to the market, and we will ensure ongoing maintenance and support. This includes updating the product regularly to improve its performance and adding new features based on user feedback. (June 2025 - May 2028)
Dissemination and Advocacy: We will work closely with indigenous communities to ensure that the end product is returned to the communities in a meaningful way, and that they have a say in how the knowledge is disseminated and used. This will involve developing community engagement strategies; building partnerships with local artists to engage with the technology; sampling it in workshops, conferences and presentations; distributing the app/software to schools, cultural institutions, and entrepreneurs; and leveraging social media to advocate for the restitution of intangible Yorùbá archives in academic libraries, while simultaneously providing ongoing support for the AI training and for the revitalization project of teaching and learning Yorùbá language and worldview. (June 2027 - Dec. 2029)
In “Ritual Archives” (2017), Toyin Falola critically deploys the term “archives” in relation to rituals as a means of challenging the conventions of Western archives, namely, what is deemed worthy of preservation and organization as data, whether or not it is interpreted at any given moment. The Yorùbá conception of archive is an accumulation of words, texts, sounds, ideas, symbols, signs, images, performances, and objects that document as well as speak to mythical, historical, cultural, religious, and artistic experiences and practices, in ways that allow us to understand the Yorùbá worldview through various fields. These archives are huge, for they store tremendous amounts of data: Indigenous productions, memories, legacies, philosophies, ethics, literatures, arts, sciences, technologies and histories. They lead us to reimagine and envision the cosmos that the Yorùbás inhabit, in ways different from, but not unlike, what modern science does, as in poems, dances, drums, textiles, songs, sculptures, architecture, and painting. The main proposition here is that while components of these archives can be isolated, it is how they combine into a body of resonating, interlocking ideas and disciplines, encoding wisdom and re/membering, that is necessary to all development.
This body of knowledge, however, has hardly found its way into the so-called “postcolonial” national consciousness. The inherited archives that the Western academy literally rams into our brains as knowledge, while simultaneously prohibiting indigenous languages (Abiodun, 2014), do not acknowledge or allow the capacity of indigenous epistemologies to provide useful templates for the future. This imposed neo-colonial archive has been given prominence over the ancestral archives, leading to the degradation of indigenous perspectives. While these postcolonial archives have probably served us well in a number of ways, Falola argues that they have mostly proven to be severely limited and limiting, in terms of both intellectual possibilities and scope. They have also served as the bureaucratic technologies that have framed our subjectivities and objectivities and how we can pursue them. As the postcolony struggled for its economic, political, and cultural survival in the late 20th century, the wisdom of centuries, from the Stone Age to the 19th century, was rendered irrelevant and has been either left to rot or ignored and, in the process, lost for the most part. But the ritual archives are still intact. We hypothesize that they can be documented, revived, and re-activated through oral history, once serious work is done to awaken them from their various reserves.
Falola posits further that these two competing, but not necessarily complementary, archives have created a knowledge divide within contemporary Yorùbá societies: the colonial archive is always aligned with official, state-sanctioned power, while the indigenous archive is aligned with marginality. How then do we retrieve and peacefully move these indigenous archives into the public to reinstate their grandeur? Whether as texts, objects, symbols, or performances, ritual archives face serious dangers, ranging from ridicule and marginalization to erasure and extinction. A major problem is that of intellectual inequality, where all externally derived knowledge systems, both Islamic and Western epistemologies, are deemed to be superior to the indigenous (Olupona, 1991). The connection between global knowledge and global power is clear-cut, and global knowledge has been problematic in a continent like Africa. Our way of challenging this problem in this research is not just to rethink the archives we use, but to search for the very center of the epistemologies.
On the Restitution of Intangible Cultural Assets:
The recent report by Felwine Sarr and Bénédicte Savoy, commissioned by President Emmanuel Macron and titled "The Restitution of African Cultural Heritage. Toward a New Relational Ethics" (Sarr and Savoy, 2018), calls for a new ethical approach to the ownership and display of African cultural heritage that centers on the recognition of the rights and agency of African communities. This issue of the restitution of artworks and cultural assets that were taken without consent is part of a broader ongoing global discourse. Significant work is currently being done, with major Western museums hastily returning historical pieces and indigenous artworks to their places of origin since the report. The concern of our research, however, is to move away from the iconic to the invisible reality of knowledge, from the tangible to the intangible: the wide range of imaginations and thought systems these objects have inspired over time. In the case of Yorùbá cultural heritage, the process of restitution may involve not only the return of objects taken from Yorùbáland, but also the return of knowledge and intellectual property that has been appropriated and incorporated into Western knowledge capital. This epistemicide accounts for the violence of the colonial encounter and the need to return to the drawing board before it is entirely too late. The concern of this research project is therefore to seek newer methods of contending with this grave omission by restituting the intangible cultural assets that communicate messages, which can be used to reconstruct the past and understand ideas about the world and the future.
The logistics that imperial Europe built on the African continent were mostly extractive: strategies that allowed for all sorts of capture, whether to extract enslaved bodies, to mine resources and grab objects of curiosity or prestige, or to collect intangible data captured as intelligence, from archaeological and cultural data to ethnographic and sociological data. The economy of colonialism has mostly been explained to us through these mined resources that emigrated to Europe. The Sarr/Savoy report also analyzed the immense value generated from the artworks that were captured and distributed to museums all over Europe, but even greater is the value of the intangible resources that went with them. If knowledge is power, intelligence is powerful; and if a collective intelligence in the possession of unethical caretakers can generate an immense and immeasurable amount of embezzled wealth, then we may begin to imagine accurately the amount of questionable wealth, both symbolic and material, that colonialism has generated for the colonizers in the past and continues to generate in the present. All over the world, from one generation to the next, these objects continue to traverse temporalities and the preoccupations of mortals. They contain within them a power of germination, which is a force in itself, and by interacting with them, new generations create new things, actualize new ideas and shepherd into the world new forms that had not existed until then. Over time, therefore, Europe created scientific and cultural technologies to perfect the art of appropriating both tangible and intangible assets that are now domiciled in the West as native intelligence.
In the extractivist nature of colonial capitalism, the indigenous archive by and large feeds the colonial archive as data and raw materials, existing permanently in its shadow and under its dominance. In the academy, the indigenous archives are assessed within the realm of folklore or myths; analyses of the colonial archives, however, are assessed in the categories of originality and validity in the same academy. Westernized scholars now write about the sociology of knowledge without necessarily acknowledging that they are building on indigenous inheritances, appropriating indigenous voices into a chaotic global discourse and writing about indigenous cultures without joining their worship, except as totally disconnected, so-called “objective” observers. This archive of colonial and neo-colonial privileged methodologies has gone a long way towards advancing the project of imperial superiority, to say nothing of economic and political dominance. We will certainly be rethinking the outcome of our research. We will assume that the power of social agency doesn’t simply lie in the academy, and will instead strive to give social agency to those who produce other forms of knowledge. The assumption that knowledge generated from within the academy is the most important is misleading. Indigenous intellectuals, artists and researchers connect with their organic community more directly, generating knowledge and ways of knowing.
The contention that the past cannot be recovered is also a mythical assertion based on the coloniality of knowledge, which unfortunately incapacitated the use and transformation of our collective memories. Falola and Abiodun postulate that the Yorùbá understanding of orality is extensive, comprising parables, proverbs, tales, allegories, divination, incantations, invocations, chants, litanies, stories, drums, and musical compositions, all laced with venerations, sacrifices, rituals, and elaborate ritual speeches and actions connected with kinship and indigenous power systems. As we attempt to record the complexities of these vastly dense archives in the Oraqu project, we will disaggregate the oral archive into many component units on a wide range of issues, including but not limited to cultural cognition, ideas and idea formation, semiotics, sensiotics, and education, and into modern disciplines such as literature, music, dance, drama, design, psychology, anthropology, philosophy, linguistics, magic, medicine, ecology, pharmacology, botany, geology, geometry, chemistry, physics, engineering, and many more. In the academy, with its disciplinary compartments, the archive becomes disconnected from the multilayered and intricately connected indigenous epistemology that produces it, in favor of the concerns of the disciplines. If the integrated trans-disciplinary approach that we are proposing is properly achieved, we will have succeeded not only in overcoming the fragmentation of the many components of the Yorùbá body of knowledge, but also in creating a unified field theory infused with the vitality and performative power (or àṣẹ) that is fundamental to Yorùbá philosophy.
By invoking new readings of oral and written archives, we are automatically generating an immeasurable amount of data. The challenge, therefore, will be how to define and analyze the content of these immense archives, their metadata, and their multilayered meanings. AI technology now gives us an astounding capacity to assemble and articulate huge amounts of data, and the new knowledge that the Oraqu Project brings to the improvement of education is the use of the latest developments in AI, including machine learning for generative models and Natural Language Processing, to create innovative solutions to indigenous problems. AI technology also provides us with an opportunity to rethink the inheritances in various ways and to test a set of new ideas outside of imperial power structures and postcolonial corruption: to recapture and restitute, to reformulate and disseminate the research on Yorùbá intellectual traditions that has mostly been published in the colonizers’ languages, and to localize it in far more innovative ways than our predecessors have done. AI technology will help make Yorùbá wisdom accessible beyond the elitism inherited from the Western academy, so that artists, tech creatives, researchers, entrepreneurs, policy makers and ordinary people can engage with Yorùbá epistemology and bring indigenous ways of knowing back into the realm of contemporary knowledge, as we begin to teach a new generation a new body of ideas.
Within AI, Natural Language Processing (NLP) is a multidisciplinary field combining mathematics, linguistics and computer science, with the goal of getting computers to do useful things (such as translation, summarization, question answering, speech recognition, classification, assisted writing and more) with natural language data, including the languages that we read, write, and speak. In their research into the state and fate of linguistic diversity and inclusion in the NLP world, Joshi et al. noted that the handful of languages on which NLP systems are trained and tested are often related, from the same geography, and drawn from a few dominant language families, leading to a typological echo-chamber (Joshi et al., 2021). In this typological echo-chamber, English dominates. According to them, not only do some languages lack a large corpus of labeled data that could serve as training data, but many languages never see any NLP systems at all. They sorted languages into six classes by availability of resources: the left-behinds, the scraping-bys, the hopefuls, the rising stars, the underdogs, and the winners. While the winners are in the lead and enjoy vast investment of resources and technologies, which gives them an advantage in the NLP race, the left-behinds have been and are still being ignored in language technologies. The latter are so exceptionally short of resources that, even if they could be fed into existing NLP systems, the dearth of labeled or unlabeled data makes their future extremely precarious.
Yorùbá language, with its 40 million speakers and an enormous body of available material, still sits toward the lower end of this spectrum, where it is merely scraping by. There are no extensive online or readily ingestible resources that could serve NLP researchers who wish to work on Yorùbá language, and there doesn't appear to be any plan to create one. The Oraqu project looks to move Yorùbá language from merely scraping by to hopeful, and possibly to rising star, by making such a resource open and available. To train machine learning models for accurate analysis, we need a significant volume of accurately and manually labeled Yorùbá data across a wide range of categories. Our solution is to train different machine learning algorithms through supervised learning on the extensive Odu Ifá divination literature, on existing Yorùbá films, proverbs, and music archives, and on fieldwork. The trained model will then be tested on existing unlabeled data online, after the testing data has been validated by data annotators who are expert speakers of Yorùbá and who will label the named entities in their respective categories. Data collected in the upstream part of the project is evaluated and subsequently stored on cloud platforms that make it globally accessible to researchers who want to work with it.
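To make the annotation step concrete, the sketch below shows one common way that named-entity labels can be stored: as character spans over the text. It is offered as an assumption-laden illustration rather than the project's chosen format. The class and function names, the category labels PERSON and KNOWLEDGE_SYSTEM, and the sample string are all hypothetical placeholders, not project data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntitySpan:
    start: int  # index of the first character of the entity
    end: int    # index one past the last character
    label: str  # category chosen by the annotator (hypothetical names)


def annotate(text, surface, label):
    """Create a span for the first occurrence of `surface` in `text`."""
    start = text.index(surface)  # raises ValueError if absent
    return EntitySpan(start, start + len(surface), label)


def spans_are_valid(text, spans):
    """Check that spans lie inside the text and do not overlap."""
    ordered = sorted(spans, key=lambda s: s.start)
    for span in ordered:
        if not (0 <= span.start < span.end <= len(text)):
            return False
    for first, second in zip(ordered, ordered[1:]):
        if second.start < first.end:
            return False
    return True


# Placeholder text for illustration only.
sample = "Orunmila and Ifá"
spans = [
    annotate(sample, "Orunmila", "PERSON"),
    annotate(sample, "Ifá", "KNOWLEDGE_SYSTEM"),
]
print(spans_are_valid(sample, spans))
```

Storing offsets rather than copies of the entity text keeps each label tied to the exact characters the annotator selected, which matters for a language written with tone marks and under-dots.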
Data Collection and Evaluation:
Because there has been no large-scale data collection effort to improve the processing of Yorùbá culture and knowledge systems, the Oraqu project will take a mixed-methods approach that combines quantitative and qualitative data collection and analysis. We will use surveys, interviews, and focus groups to collect data. The sample will draw on the support of Yorùbá cultural experts, linguists, and members of Yorùbá-speaking communities both in West Africa and in the Diaspora, to ensure a comprehensive overview of the retrievable data. That data will then be turned into fresh content for the Oraqu app, as well as into an AI-ready dataset, which will be cleaned, transformed (transcribed and translated) and tagged for use in the machine learning modelling. Other transformational ways of dignifying these archives include revalidating them by involving practitioners of indigenous religion, priests, elders, sages, scholars, linguists, computer scientists, data analysts, machine learning engineers and artists in the research and its dissemination, and formulating evaluation mechanisms through which indigenous knowledge is authenticated by those who communicate with it, using data-driven methods to decode its epistemology and create fluid intelligence and other forms of knowledge.
Evaluating the collected data is an important step in the overall process of delivering reliable results. Here, data evaluation includes the process of examining and analyzing the collected data to establish its quality, accuracy, relevance and usefulness. Continuous reviews will be conducted, with different components of the collected data assessed for properties such as completeness, consistency, validity and reliability. Expert input may also be required at this stage. However, it is more important to filter the data for excerpts which may help in achieving the ultimate objective of knowledge discovery. As a critical step in the process, data evaluation helps to ensure that the data to be used is appropriate and trustworthy, while also assisting with identifying potential errors which may affect the validity or correctness of future analysis. This may be an exhaustive process for which advanced Yorùbá speakers and researchers will be employed to conduct activities such as data profiling, cleansing, quality assessment and data validation, many of which should help in realizing the desired objective of identifying and correcting any issues in the data before it is forwarded for onward ingestion by the designed AI/ML models.
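A small part of such an evaluation could be automated before expert review. The following is a minimal sketch under stated assumptions: the field names (audio_path, transcript_yo, translation_en, tags) and the two checks shown, completeness and duplicate detection, are hypothetical choices for illustration and are not a specification of the project's pipeline.

```python
# Hypothetical field names for one collected recording.
REQUIRED_FIELDS = ("audio_path", "transcript_yo", "translation_en", "tags")


def evaluate(records):
    """Return a list of (record_number, problem) pairs.

    Checks completeness (every required field is present and
    non-empty) and consistency (no audio file appears twice).
    """
    problems = []
    seen_paths = set()
    for number, record in enumerate(records):
        for field in REQUIRED_FIELDS:
            value = record.get(field)
            if isinstance(value, str):
                value = value.strip()
            if not value:
                problems.append((number, f"missing {field}"))
        path = record.get("audio_path")
        if path and path in seen_paths:
            problems.append((number, "duplicate audio_path"))
        if path:
            seen_paths.add(path)
    return problems


# Placeholder records for illustration only.
records = [
    {"audio_path": "a.wav", "transcript_yo": "placeholder",
     "translation_en": "placeholder", "tags": ["proverb"]},
    {"audio_path": "b.wav", "transcript_yo": "placeholder",
     "translation_en": "   ", "tags": ["interview"]},
    {"audio_path": "a.wav", "transcript_yo": "placeholder",
     "translation_en": "placeholder", "tags": ["proverb"]},
]
for number, problem in evaluate(records):
    print(number, problem)
```

Checks of this kind only flag records for attention; judging accuracy, relevance and cultural validity remains the work of the Yorùbá speakers and researchers described above.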
During data evaluation, we will consider ethical issues related to the use of AI technology in the restoration of indigenous knowledge, and shall consider issues such as data privacy, ownership, and control, while ensuring that the communities from which the data is collected are aware of and consent to its use in the AI development process. We will develop a data management, data use, and security plan as part of the agreement to be executed between us and our researchers, field workers and engineers. The plan will include, but not be limited to the following criteria:
Justification for data format and volume in relation to storage, backup, and access.
Process for refining and implementing a Data Use Agreement that addresses data preservation, sharing, ethics, and legal compliance.
Data storage capacity and responsibilities related to backup and recovery. Specifications regarding storage and management.
Description of the type of data to be retained, how it will be retained and preserved, and if, when, and how it will be destroyed in accordance with contractual, legal and regulatory requirements.
Protocols on how the data will be shared, with whom and under what conditions.
Metadata information needed and standards to be used, how metadata will be captured, and documentation to be created to assist with accessing and understanding the data.
Delineation of the parties responsible for implementing, reviewing, and monitoring compliance with the data management plan, and an outline of all activities related to data quality, storage, backup, archival and sharing.
Knowledge Discovery and Educational Content:
One of the final goals of the Oraqu project is to find bits of new, undiscovered knowledge and to help with the creation of finer educational content for Yorùbá speakers, students, entrepreneurs, creatives and researchers. Here, knowledge discovery is the process of extracting valuable insights and knowledge from large datasets by using advanced algorithms and techniques to analyze and interpret data, with the aim of identifying patterns, trends, and associations that can be used to make informed decisions and predictions. As such, the Oraqu project will not only engender immense data which will last for decades as new resources on which researchers can build, it will also apply AI technology on these indigenous archives and other oral traditions to provide new educational opportunities.
There are now AI solutions for processing and analyzing audio data based on deep learning techniques. For example, Google Cloud, IBM Watson, NVIDIA, Amazon Web Services (AWS) and Microsoft Azure all offer AI-powered services including speech-to-text transcription, text-to-speech conversion, speech recognition, audio indexing, language translation, sentiment analysis, speaker diarization, speaker recognition, and emotion analysis. These services are designed to help independent developers build speech-enabled applications and improve accessibility for users. The underlying technology in these solutions can be used as our mode of data analyses. Audio processing AI technology can equally be applied to these indigenous archives and other oral traditions by providing new educational opportunities. The Oraqu app will help ensure that these important Yorùbá cultural traditions are passed down and preserved for future generations in the following ways:
Vocalizing text translations: The Oraqu can be used to automatically translate written texts on Yorùbá epistemology into spoken Yorùbá, making these texts more accessible to people who may not be able to read or understand the written language.
Interactive audio lessons: The Oraqu can also be used to create interactive audio lessons on Yorùbá language, epistemology and other aspects of Yorùbá culture.
Audio books: The Oraqu can be used to create audio versions of books and other written materials on Yorùbá worldview.
Audio summaries: The Oraqu can be used to automatically summarize written texts on Yorùbá philosophy into shorter, more digestible summaries.
Audio-based cultural experiences: The Oraqu can be used to create immersive cultural experiences based on aspects of Yorùbá culture.
Preservation and transcription: Many oral traditions have been passed down through generations via oral storytelling. The Oraqu can be used to preserve these stories by digitizing and transcribing them into written text in any language.
Analysis and interpretation: Oral traditions often contain rich cultural and historical information, but analyzing and interpreting them can be challenging due to their complex structure and metaphorical language. The Oraqu can be used to analyze the structure and language of these stories, allowing users to better understand their cultural and historical significance.
Education and accessibility: The Oraqu can also be used to make these oral traditions more accessible for educational purposes, allowing students to learn and engage with these stories in new and innovative ways.
Community Engagement: The Oraqu will enable community members to record and upload recordings of oral traditions and cultural practices in their various localities in West Africa as well as in the Diaspora for participation and continuous training of the system.
Once the Oraqu software, dataset, ML model and mobile app are developed, we will work closely with indigenous communities to ensure that the end product is returned to the communities in a meaningful way. This will involve developing community engagement strategies; building partnerships with local artists to engage with the technology; presenting demos at conferences; distributing the app to schools, universities, cultural institutions, and entrepreneurs; and leveraging social media to advocate for the restitution of intangible Yorùbá archives in academic libraries, while simultaneously providing ongoing support for the continuous scaling of the model and for the revitalization project of teaching and learning Yorùbá language and worldview. Our dissemination plan aims to make the Oraqu available in both formal and informal settings, and in both rural and urban settings, by making it an offline application. Here are a few other possibilities we shall explore:
Collaborating with Community Organizations to disseminate the software, dataset and mobile app to a wider audience, partnering with local NGOs, community centers, libraries, and cultural organizations to promote the use of the technology.
Building Local Partnerships with schools, traditional associations, and cultural institutions to promote the Oraqu app in formal and informal education settings. This will include working with teachers, students, religious leaders, and cultural groups to integrate the technology into their instruction materials and curricula.
Leveraging Social Media platforms such as Facebook, Twitter, Instagram, TikTok and YouTube to disseminate the Oraqu app to a wider audience. Creating engaging and informative content that highlights the benefits of the technology and its potential applications to attract users and generate interest.
Organizing Workshops and Training Sessions to introduce the Oraqu project to users and provide them with the skills and knowledge needed to use the technology effectively. These sessions will be organized both in rural and urban settings, and will target specific user groups such as teachers, students, community leaders, and cultural practitioners.
Using Offline Strategies in areas where internet access is limited. Offline strategies such as distributing USB drives, iPads, mobile phones or other storage devices containing the software and mobile app can help to make the technology more accessible. These strategies can be especially useful in rural settings where internet access is limited or unreliable.
Using a Multi-sensory Approach to learning that incorporates different forms of arts, such as Dance, Music, Poetry, Visual art, Storytelling, and Film as a way to further disseminate the content of the app and to promote the use of the technology.