How to build a language translator?
by mervyn
So I want to build a language translator that can handle foreign languages using little time and memory. The translation should be precise, and it should handle large inputs about local education.
| Application Development Bullet Points | |
|---|---|
| Purpose | Assist automatic subtitling for international online education, such as MOOCs |
| Requirement | The translator handles both everyday words and academic terminology |
| Dataset | Contemporary literature, newspapers, and textbooks |
| Competency | Basic, emergency translation between English and the local language |
Sequence-to-sequence learning based on recurrent neural networks (RNNs) caught my interest, so I will try to develop it for paragraph-to-paragraph translation between two languages, not limited to word-to-word or sentence-to-sentence translation. Below is the algorithm I have in mind:
- Use news articles and textbooks as the dataset for the sentence-to-sentence matching system.
- Use novels and journals for the paragraph-to-paragraph matching system.
- The word dataset originates from elementary through high school textbooks.
- Programming languages: C++, Java, Python (more may be added)
- Academic background: Data Structures, Machine Learning, Natural Language Processing, RNN, AI (more may be added)
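The dataset steps above could be sketched roughly as follows. This is only a minimal sketch, assuming the source and target documents are already aligned paragraph by paragraph. The names `make_pairs`, `build_vocab`, and `encode`, the character-level tokens, and the sample sentences are all hypothetical choices made for illustration. It prepares the integer sequences that an RNN encoder-decoder would consume and does not include the model itself.

```python
# Special tokens marking sequence start, sequence end, and unknown characters.
SOS, EOS, UNK = "<sos>", "<eos>", "<unk>"


def make_pairs(src_doc, tgt_doc):
    """Build sentence-level and paragraph-level training pairs.

    Each document is a list of paragraphs, and each paragraph is a list of
    sentences. Paragraphs are assumed to be aligned one-to-one.
    """
    sentence_pairs, paragraph_pairs = [], []
    for src_par, tgt_par in zip(src_doc, tgt_doc):
        # Paragraph pair: sentences joined with a space for simplicity.
        paragraph_pairs.append((" ".join(src_par), " ".join(tgt_par)))
        # Only trust sentence alignment when both sides have the same count.
        if len(src_par) == len(tgt_par):
            sentence_pairs.extend(zip(src_par, tgt_par))
    return sentence_pairs, paragraph_pairs


def build_vocab(texts):
    """Map every character seen in `texts` to an integer id."""
    vocab = {SOS: 0, EOS: 1, UNK: 2}
    for text in texts:
        for ch in text:
            vocab.setdefault(ch, len(vocab))
    return vocab


def encode(text, vocab):
    """Turn a string into ids wrapped in start and end markers."""
    ids = [vocab.get(ch, vocab[UNK]) for ch in text]
    return [vocab[SOS]] + ids + [vocab[EOS]]


# Hypothetical toy corpus: two aligned paragraphs in Korean and Japanese.
korean_doc = [["안녕하세요.", "저는 학생입니다."], ["감사합니다."]]
japanese_doc = [["こんにちは。", "私は学生です。"], ["ありがとうございます。"]]

sentence_pairs, paragraph_pairs = make_pairs(korean_doc, japanese_doc)
src_vocab = build_vocab(src for src, _ in sentence_pairs)
tgt_vocab = build_vocab(tgt for _, tgt in sentence_pairs)
```

Character-level tokens are used here only because they need no external tokenizer for Korean or Japanese. A word or subword tokenizer could be swapped in without changing the pairing logic.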
We first need to decide which languages to translate between. I chose Korean and Japanese, which are quite similar to each other.
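With this pair, the system may also need to tell Korean input from Japanese input before translating. Below is a minimal sketch, assuming that counting characters by Unicode block is good enough for that. The function name `guess_language` and the tie-breaking rule are hypothetical. Text written only in Chinese characters (kanji or hanja) cannot be classified this way and comes back as "unknown".

```python
def guess_language(text):
    """Guess "ko" or "ja" from the scripts used in `text`.

    Counts Hangul characters against Hiragana/Katakana characters and
    returns "unknown" when neither script dominates.
    """
    hangul = kana = 0
    for ch in text:
        cp = ord(ch)
        if (0xAC00 <= cp <= 0xD7A3        # Hangul Syllables
                or 0x1100 <= cp <= 0x11FF  # Hangul Jamo
                or 0x3130 <= cp <= 0x318F):  # Hangul Compatibility Jamo
            hangul += 1
        elif 0x3040 <= cp <= 0x30FF:       # Hiragana and Katakana
            kana += 1
    if hangul == kana:
        # Covers both "no Korean or Japanese script found" and exact ties.
        return "unknown"
    return "ko" if hangul > kana else "ja"


print(guess_language("안녕하세요"))
print(guess_language("こんにちは"))
```

A real system would probably need something more robust for mixed or very short inputs, but a script check like this is cheap in both time and memory.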