Data Journey

International Studies Grad. racing for AI Engineer

15 February 2021

How to build a language translator?

by mervyn

I want to build a language translator that can handle foreign languages quickly and with a small memory footprint. The translations should be precise, and the translator should cope with large inputs about local education.

Application Development Bullet Points
Purpose: Assist automatic subtitling of international online education, such as MOOCs
Requirement: The translator handles both everyday words and academic terminology
Dataset: Contemporary literature, newspapers, and textbooks
Competency: Basic, emergency translation between English and the local language

Recurrent neural network (RNN) based sequence-to-sequence learning caught my interest, so I will try to use it for paragraph-to-paragraph translation between two languages, rather than only word-to-word or sentence-to-sentence translation. Below is the plan I have in mind:

  1. Use news articles and textbooks as the dataset for the sentence-to-sentence matching system.
  2. Use novels and journals for the paragraph-to-paragraph matching system.
  3. Take the word dataset from elementary through high school textbooks.
  4. Programming languages: C++, Java, Python (more to be added)
  5. Academic background: Data Structures, Machine Learning, Natural Language Processing, RNN, AI (more to be added)
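
To make the sequence-to-sequence idea a bit more concrete, here is a minimal sketch of an RNN encoder-decoder forward pass in plain Python. It is only an illustration under my own assumptions: the class name `TinySeq2Seq`, the hidden size, the token ids, and the `bos`/`eos` conventions are all made up for this example, the weights are random (nothing is trained), and a real translator would use a proper library and learned parameters. The point is just the data flow: the encoder folds the source tokens into one hidden state, and the decoder unrolls that state into target tokens one at a time.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_step(w_in, w_h, x, h):
    """One simple (Elman) RNN step: h' = tanh(W_in x + W_h h)."""
    a = matvec(w_in, x)
    b = matvec(w_h, h)
    return [math.tanh(ai + bi) for ai, bi in zip(a, b)]


class TinySeq2Seq:
    """Hypothetical, untrained encoder-decoder used only to show the data flow."""

    def __init__(self, src_vocab, tgt_vocab, hidden=8, seed=0):
        rng = random.Random(seed)  # fixed seed so runs are repeatable
        self.hidden = hidden
        self.tgt_vocab = tgt_vocab
        self.src_emb = make_matrix(src_vocab, hidden, rng)  # one row per source token
        self.tgt_emb = make_matrix(tgt_vocab, hidden, rng)  # one row per target token
        self.enc_in = make_matrix(hidden, hidden, rng)
        self.enc_h = make_matrix(hidden, hidden, rng)
        self.dec_in = make_matrix(hidden, hidden, rng)
        self.dec_h = make_matrix(hidden, hidden, rng)
        self.out = make_matrix(tgt_vocab, hidden, rng)  # hidden state -> token scores

    def encode(self, src_ids):
        """Read the source tokens in order and return the final hidden state."""
        h = [0.0] * self.hidden
        for tok in src_ids:
            h = rnn_step(self.enc_in, self.enc_h, self.src_emb[tok], h)
        return h

    def decode(self, h, bos=0, eos=1, max_len=10):
        """Greedy decoding: feed the previous token back in until eos or max_len."""
        result, tok = [], bos
        for _ in range(max_len):
            h = rnn_step(self.dec_in, self.dec_h, self.tgt_emb[tok], h)
            scores = matvec(self.out, h)
            tok = max(range(self.tgt_vocab), key=scores.__getitem__)
            if tok == eos:
                break
            result.append(tok)
        return result

    def translate(self, src_ids, max_len=10):
        return self.decode(self.encode(src_ids), max_len=max_len)


# Example call with made-up token ids; the output is meaningless until trained.
model = TinySeq2Seq(src_vocab=20, tgt_vocab=15)
print(model.translate([3, 7, 2, 9], max_len=6))
```

Because the input and output are just token sequences, the same structure applies whether one "sequence" is a sentence or a whole paragraph. Longer inputs mainly make it harder for a single hidden state to carry everything, which is one reason to start with sentence pairs.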

We first need to settle on the languages we will translate. I chose Korean and Japanese, which are quite similar to each other.
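
Since the input could arrive in either language, one simple way to tell Korean from Japanese text is to count characters by Unicode block. The sketch below is an assumption on my part, not a finished language detector: `detect_script` is a hypothetical helper, and it only looks at Hangul and kana, because kanji/hanja characters are shared between the two languages and cannot decide on their own.

```python
def detect_script(text):
    """Guess 'ko', 'ja', or 'unknown' from Hangul vs. kana character counts."""
    hangul = 0
    kana = 0
    for ch in text:
        cp = ord(ch)
        # Hangul Syllables, Hangul Jamo, Hangul Compatibility Jamo
        if 0xAC00 <= cp <= 0xD7A3 or 0x1100 <= cp <= 0x11FF or 0x3130 <= cp <= 0x318F:
            hangul += 1
        # Hiragana and Katakana
        elif 0x3040 <= cp <= 0x30FF:
            kana += 1
    if hangul == 0 and kana == 0:
        return "unknown"  # e.g. Latin text, or kanji/hanja only
    # Ties go to Korean here; that choice is arbitrary.
    return "ko" if hangul >= kana else "ja"


for sample in ["안녕하세요", "こんにちは", "hello"]:
    print(sample, detect_script(sample))
```

A check like this could run before translation to pick the source language automatically, with a fallback to asking the user when the result is `unknown`.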
