數位發展部有關「臺灣主權 AI 訓練語料庫」申請事宜,申請須知及使用規範詳如附件。
一、為推動臺灣主權AI發展,數位發展部打造「臺灣主權AI訓練語料庫」(下稱語料庫),廣納高品質正體中文語料,支援AI模型訓練更貼近臺灣語言、文化與生活情境,促進AI模型具備更高的本土辨識力與語意理解能力。
二、語料庫目前已逾200個政府機關投入,上架累計超過3,000筆資料集,語料規模超過10億詞元(token)並持續擴充,收錄內容為各機關具臺灣文化特色之高品質資料集,涵蓋語言、文化、交通、教育、生物、地理環境等領域。
三、歡迎有AI模型訓練需求之機關(構)、公私法人、研究機構、學校、非法人團體或自然人申請使用,用臺灣的語料,打造理解臺灣的AI!
四、若有申請相關問題,請洽語料庫維運管理單位客服信箱:tsaitc@moda.gov.tw
The Ministry of Digital Affairs is accepting applications to use the “Taiwan Sovereign AI Training Corpus.” Please refer to the attached documents for the application guidelines and terms of use.
1. To promote the development of Taiwan’s sovereign AI, the Ministry of Digital Affairs has established the Taiwan Sovereign AI Training Corpus (hereinafter referred to as the “Corpus”). The Corpus aggregates high-quality Traditional Chinese language data so that AI model training better reflects Taiwan’s language, culture, and everyday contexts, thereby strengthening AI models’ recognition of local content and their semantic understanding.
2. To date, more than 200 government agencies have contributed to the Corpus, publishing over 3,000 datasets in total. The Corpus exceeds 1 billion tokens and continues to expand. Its content consists of high-quality datasets from these agencies that reflect Taiwan’s cultural characteristics, covering fields such as language, culture, transportation, education, biology, and the geographical environment.
3. Government agencies, public and private organizations, research institutions, schools, unincorporated groups, and individuals with AI model training needs are welcome to apply for access. Use Taiwan’s data to build AI that understands Taiwan.
4. For questions about applications, please email the Corpus operations and management team at tsaitc@moda.gov.tw