Regarding the application for the “Taiwan Sovereign AI Training Corpus” launched by the Ministry of Digital Affairs, please refer to the attached documents for detailed application guidelines and terms of use.
1. To promote the development of Taiwan’s sovereign AI, the Ministry of Digital Affairs has established the Taiwan Sovereign AI Training Corpus (hereinafter referred to as the “Corpus”). The Corpus aggregates high-quality Traditional Chinese language data to support the training of AI models that better reflect Taiwan’s language, culture, and everyday contexts, thereby strengthening the models’ recognition of local context and their semantic understanding.
2. To date, more than 200 government agencies have contributed to the Corpus, publishing over 3,000 datasets. The Corpus exceeds 1 billion tokens and continues to expand. Its content consists of high-quality datasets with Taiwanese cultural characteristics, covering fields such as language, culture, transportation, education, biology, and the geographical environment.
3. Government agencies, public and private organizations, research institutions, schools, unincorporated groups, and individuals with AI model training needs are welcome to apply for access. Use Taiwan’s data to build AI that understands Taiwan.
4. For inquiries related to applications, please contact the Corpus operations and management team via email at tsaitc@moda.gov.tw.