I am an MA Translation, Korean | English student at the Middlebury Institute of International Studies at Monterey (May 2016 Candidate). This CAT portfolio shows the projects I conducted during my studies at MIIS. It mainly focuses on the projects completed as part of my Advanced Computer-Assisted Translation course, which taught me how to use a variety of CAT tools as a translator. The course covered technology for translation work along with practical hands-on use of the tools. I learned a new way of approaching translation, one that reflects the rapid changes happening in the field.
2. MT Pilot Project Files
o Proposal/SOW
Statement of Work
This Statement of Work (SOW) is by and between Adam Wooten and Korbleu.
Korbleu Team:
Heather Ahn
Naomi Kim
Jin Heui Kwon
Joohyun Lee
Robin Park
Eunah Sung
PROJECT OBJECTIVES
This pilot project aims to form a basis for estimating the work involved in training a machine translation (MT) engine for tax laws from Korean into English. Post-edited machine translations (PEMT) from this engine will meet the following goals for efficiency, cost savings, and quality:
● Efficiency: To make PEMT 20% faster than HT
● Cost Savings: To make PEMT 20% cheaper than HT
● Quality: To produce PEMT translations with an acceptable score under the LISA QA model, assessed as follows:
Error Type | Minor | Major | Critical
Mistranslation | 1 | 5 | 10
Accuracy | 1 | 5 | 10
Terminology | 1 | 5 | 10
Language | 1 | 5 | 10
Style | 1 | 5 | 10
Country | 1 | 5 | 10
Consistency | 1 | 5 | 10
Scorecard adapted from the LISA QA model by SDL:
http://producthelp.sdl.com/SDL_TMS_2011/en/Creating_and_Maintaining_Organizations/Managing_QA_Models/LISA_QA_Model.htm
● PEMT will be evaluated by a human reviewer in accordance with the guidelines provided above
● To pass the quality check, the final output should contain no critical errors in the areas above, and the total weighted error score will be limited to 10
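To make the scoring concrete, here is a minimal sketch of how the scorecard above can be tallied, assuming the weights from the table and the 10-point limit per 2,000 words stated in this SOW (the function and variable names are illustrative, not part of any tool):

```python
# Weighted LISA-style QA tally: each error contributes
# occurrences x severity weight; the total is extrapolated
# to a 2,000-word basis and checked against the limit.
SEVERITY_WEIGHTS = {"Minor": 1, "Major": 5, "Critical": 10}
PASS_THRESHOLD = 10   # max weighted points per 2,000 words
BASIS_WORDS = 2000

def qa_check(errors, sample_words):
    """errors: list of (error_type, occurrences, severity) tuples."""
    total = sum(occ * SEVERITY_WEIGHTS[sev] for _, occ, sev in errors)
    per_2000 = total * BASIS_WORDS / sample_words
    no_critical = all(sev != "Critical" for _, _, sev in errors)
    return total, per_2000, no_critical and per_2000 <= PASS_THRESHOLD

# Sample 1 from Appendix 2: a 400-word sample with three minor error types
errors = [("Style", 3, "Minor"),
          ("Language", 1, "Minor"),
          ("Country/Terminology", 1, "Minor")]
print(qa_check(errors, 400))  # (5, 25.0, False) -> fail
```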
TIMELINE AND COSTS
Project begins: March 31, 2016
Send deliverables: April 21, 2016
● Following the kickoff meeting on Thursday, March 31, the pilot project will be carried out over a period of three weeks
● One to two rounds of training will be completed on weekdays, with at least six rounds in total so that many different methods can be tried
● Data collected from the pilot engine training, post-editing, and QA will be used to calculate time and cost savings and quality estimates, which will be presented in an updated project proposal to be delivered on April 21.
Task | Est. Hours | Quantity | Hourly Rate | Subtotal
Round of MT Training | 0.5 | 8 documents | $30 | $120
Document Alignment | 2.5 | 4 documents | $30 | $300
Dictionary Creation | 1 | 1 | $30 | $30
Post-editing | 0.5 | 2 persons | $30 | $30
QA | 0.5 | 2 persons | $30 | $30
Total | | | | $510
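Each subtotal is simply estimated hours x quantity x hourly rate; a quick check of the figures above (a sketch, with illustrative names):

```python
# Pilot budget check: subtotal = est. hours x quantity x hourly rate.
RATE = 30  # USD per hour, per the table above
tasks = [("Round of MT Training", 0.5, 8),
         ("Document Alignment", 2.5, 4),
         ("Dictionary Creation", 1.0, 1),
         ("Post-editing", 0.5, 2),
         ("QA", 0.5, 2)]
subtotals = {name: hours * qty * RATE for name, hours, qty in tasks}
print(subtotals)                 # {'Round of MT Training': 120.0, ...}
print(sum(subtotals.values()))   # 510.0
```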
WORKFLOW
1. Data Extraction
a. Compile a list of tax laws to be extracted: 8 documents in total, with an average of 1,500 segments per document
b. Find both the EN and KO versions of the laws using the Korea Legislation Research Institute website
c. Clean up the documents (formatting, line breaks, punctuation, etc.)
2. Alignment Preparation
a. Edit the documents to avoid segmentation and alignment problems
3. Alignment
a. Align the documents using Trados Studio; the alignment does not have to be perfect before running the test (to see how worthwhile alignment is)
b. Fix the alignment so it is perfect
c. Remove problematic segments
4. MT Training
a. Add the aligned segments to the Microsoft Translator Hub SMT engine
b. Get a BLEU score (see the sketch after this list)
5. Improve BLEU Score
a. Improve the BLEU score with different strategies, such as adding more segments, cleaning up documents, adding a TMX alignment, and adding a dictionary
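Microsoft Translator Hub reports the BLEU score itself after each training round; for reference, a minimal sketch of computing BLEU offline with the sacreBLEU library (the segment strings here are made-up placeholders):

```python
# Corpus-level BLEU between MT output and reference translations,
# on the same 0-100 scale Translator Hub reports.
import sacrebleu  # pip install sacrebleu

hypotheses = ["The local tax is imposed by the head of the local government."]
references = ["Local taxes shall be imposed by the head of each local government."]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(round(bleu.score, 2))
```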
DELIVERABLES
● An updated proposal for a full-scale MT training project
● A progress chart detailing the changes made to the SMT engine and the resulting BLEU scores
● A report evaluating the achievability of the goals outlined in this statement and detailing recommendations on how the training can be scaled and on the potential value of training the SMT engine
o Updated Proposal
Updated Statement of Work
This Statement of Work (SOW) is by and between Adam Wooten and Korbleu.
Korbleu Team:
Heather Ahn
Naomi Kim
Jin Heui Kwon
Joohyun Lee
Robin Park
Eunah Sung
SUMMARY OF PILOT PROJECT OUTCOMES
There was a total of 11 rounds of training over the two-week statistical machine translation (SMT) engine training pilot project. The first round of training achieved a BLEU score of 1.93; by the end of the pilot project, this score had increased to 11.34, an improvement of 9.41 points.
In each round, a different strategy was adopted to determine which tactics have the greatest potential to improve the quality of the SMT engine. Positive results were achieved by cleaning up the bi-texts to ensure accurate alignment of information, manually aligning segments for maximum accuracy, and adding bi-texts to the training/tuning data.
Another attempt we made was adding a 7,000-term dictionary of generic legal terms. This had no impact on the BLEU score, but it could be reattempted in the future, once there are more segments in the system.
After the final round of training was complete, the system with the highest BLEU score was deployed. Testing segments were post-edited by two linguists for 30 minutes each and then cross-checked for quality using the LISA-based QA scorecard established in the original pilot project proposal. Details of these HT versus PEMT tests are provided in Appendix 2.
PROJECT OBJECTIVES
As stated, the targets for PEMT were 20% time and cost savings over human translation (HT) and an acceptable LISA QA score. The results of the pilot test in each of these categories are detailed below.
Efficiency: The following table shows the time breakdown for each post-editor, their average, and the usual standards for HT and review.
Metric | Post-Editor 1 | Post-Editor 2 | Average
PEMT words/30 min. | 157 words | 139 words | 148 words
PEMT words/hour | 314 words | 278 words | 296 words
PEMT time needed for 2,000 words | 6.37 hours | 7.19 hours | 6.76 hours
HT words/hour (standard) | 250 words
HT time needed for 2,000 words | 8 hours
Review words/hour (standard) | 1,000 words
Review time needed for 2,000 words | 2 hours
PEMT + Review, total time for 2,000 words | 8.37 hours | 9.19 hours | 8.76 hours
HT + Review, total time for 2,000 words | 10 hours
Difference in total time for 2,000 words | 1.24 hours
PEMT overall time savings | 12.4%
This breakdown reveals time savings of 12.4% with PEMT, calculated using the average speed of the two post-editors. This is well below the stated goal of 20% time savings, meaning that the efficiency of PEMT is not satisfactory. The goal will remain 20% for the continued training project.
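The arithmetic behind these figures can be reproduced directly from the table (a sketch; variable names are illustrative):

```python
# PEMT vs. HT time for a 2,000-word sample, using the averaged
# post-editor throughput and the standard HT/review speeds above.
WORDS = 2000
pemt_words_per_hour = (157 + 139) / 2 * 2   # avg of 30-min counts -> 296 words/hour
ht_words_per_hour = 250                      # standard HT speed
review_hours = WORDS / 1000                  # standard review speed: 1,000 words/hour

pemt_total = WORDS / pemt_words_per_hour + review_hours   # ~8.76 hours
ht_total = WORDS / ht_words_per_hour + review_hours       # 10 hours
savings = (ht_total - pemt_total) / ht_total
print(f"{pemt_total:.2f} h vs. {ht_total:.2f} h -> {savings:.1%} saved")  # ~12.4%
```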
Cost savings: The following table shows the cost breakdown of PEMT versus HT, based on a sample of 2,000 words and using the rates established in the pilot project proposal.
Scenario | File Prep | Translation/PE Rate/Word | T/PE Subtotal | Review Rate | Hours | Review Subtotal | Total
HT | $0 | $0.10 | $200.00 | $30/hr | 2 hours | $60.00 | $260.00
PEMT | $15 | $0.08 | $160.00 | $30/hr | 2 hours | $60.00 | $235.00
PEMT Savings: $25.00 (9.6%)
This breakdown reveals cost savings of 9.6% with PEMT. This is well below the stated goal of 20% cost savings, meaning that the cost savings of PEMT are not satisfactory. The goal will remain 20% for the continued training project.
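The same comparison in code form, using the rates from the table (a sketch):

```python
# PEMT vs. HT cost for a 2,000-word sample.
WORDS = 2000
REVIEW_COST = 2 * 30                        # 2 hours of review at $30/hr

ht_cost = 0 + WORDS * 0.10 + REVIEW_COST    # $260.00
pemt_cost = 15 + WORDS * 0.08 + REVIEW_COST # $235.00, incl. $15 file prep
savings = ht_cost - pemt_cost
print(f"${savings:.2f} saved, {savings / ht_cost:.1%}")  # $25.00, ~9.6%
```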
Quality: Appendix 2 presents QA scorecards for each reviewer. The results show that the two post-editing samples achieved an average score of 20.5. Considering the strict quality standards of legal documents, continued training of the engine is needed rather than a change of objectives. However, since quality is highly subjective, it is difficult to estimate with any certainty how much the engine needs to improve to achieve the stated quality objectives.
RECOMMENDATIONS
a) Targeting a narrow subject area: This pilot project used texts related to different local tax laws. Currently, the BLEU score is very low even with bi-texts from similar subject areas; therefore, focusing on a specific subject area (e.g., income tax law) until a reasonably high BLEU score is achieved would work better than adding bi-texts from different areas of tax law.
b) Adding bi-texts: Adding bi-texts, to both the training and tuning data, has shown improvements in the BLEU score. The more texts there are, the higher the likelihood of matching words and segments. Essentially, the amount of data in the SMT engine heavily impacts the level of accuracy, especially given the highly repetitive nature of legal texts.
c) Clean-up and formatting: Thorough clean-up and formatting of texts are essential for accurate alignment. Due to the vast differences between the Korean and English texts (e.g., numbering methods, line breaks, the presence of Chinese characters in Korean texts), the majority of texts first had to go through manual clean-up to ensure accurate alignment.
d) Alignment: Alignment of the documents was one of the most important parts of the work. For example, the number of sentences in the source text did not always match the number of sentences in the target text. Adjusting the number of sentences in both languages was a necessary step in achieving alignment. In addition, splitting sentences into two parts was needed to match the number of sentences in the translated text. Therefore, it is necessary to go through each sentence to check its composition.
e) Cooperation with LSPs: In the pilot project, it became clear that proper alignment was the most time-consuming aspect of the project, requiring approximately 1 hour of work to align 200 bi-text segments. Thus, the full-scale project proposes cooperating with an LSP already engaged in legal translation. This way, TM creation will occur naturally through standard legal translation work, without the need for alignment, reducing time and cost.
f) Testing: Periodic testing once every six months is recommended in order to gauge progress. A test would include at least 20 translators translating a minimum of 5,000 Korean characters from separate legal texts into English. The average speed of each translator will be calculated, and the average across all translators will provide a clear picture of the SMT engine's progress. Within each six-month interval, progress will be gauged through the BLEU scores provided by Microsoft Translator Hub.
RECOMMENDED WORKFLOW
1. Gather more bi-texts
2. Ensure proper alignment of bi-texts for training/tuning data
3. MT Training
4. Check BLEU score
5. Periodic testing
PROJECT TIMELINE AND COSTS
Due to the nature of the project, it is difficult to estimate the exact time and cost required to fully train the machine translation system. We currently have 4,086 segments in the system, added over the course of three weeks. Excluding initial errors and trials, we spent roughly a full week on the training. This includes formatting the documents, aligning segments, creating dictionaries, etc. It is impossible to measure precisely how much time it takes to format documents or create dictionaries. However, we did keep track of how long it took to align segments: 200 segments per hour. Therefore, training toward the optimal number of 100,000 segments will require a minimum of 479.57 hours, which equals about 60 working days (the sketch following the table below reproduces this estimate).
# of Full-time Workers | 2
Estimated segments in current dataset | 4,086
Number of docs in current dataset | 4
Estimated number of documents needed to achieve 100,000 segments | 96
Time to convert/align | 479.57 hours / 60 days
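A minimal reproduction of this estimate, assuming an 8-hour working day and the measured alignment rate of 200 segments per hour (names are illustrative):

```python
# Hours, working days, and cost to align enough segments to reach
# the 100,000-segment target at the rate measured in the pilot.
TARGET_SEGMENTS = 100_000
current_segments = 4_086
SEGMENTS_PER_HOUR = 200   # measured alignment rate
HOURLY_RATE = 30          # USD, from the SOW
HOURS_PER_DAY = 8         # assumption: standard working day

hours = (TARGET_SEGMENTS - current_segments) / SEGMENTS_PER_HOUR
print(f"{hours:.2f} hours")                  # 479.57 hours
print(f"{hours / HOURS_PER_DAY:.0f} days")   # ~60 working days
print(f"${hours * HOURLY_RATE:,.0f}")        # ~$14,387 conversion/alignment cost
```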
However, this calculation is based on only one of the strategies we employed: adding bi-texts. To achieve efficiency, multiple attempts with different methods would be needed. Given the complex nature of the process, it is highly likely that the overall process will take much longer than anticipated.
As a result, it is difficult to estimate the overall timeline for achieving the goal stated in the original SOW. At a rate of $30.00 per hour, two full-time employees would incur $14,387 in conversion/alignment costs. If the project period lengthens, this could incur significant additional cost. Whether this is worth the investment should be determined after careful consideration.
RECOMMENDATION FOR IMPLEMENTATION IN CAT TOOLS
The following includes all applicable CAT tool settings, including, but not limited to, QA.
Segment Verification
● Check whether the target segment is shorter than the source by: 50%

Exclude Repetitions
● Exclude the following segments:
○ Chinese characters
○ Tables

Exclude Confirmed Translations
● Since confirmed translations have already been subject to review, they should be excluded from QA.

Exclude Locked Segments
● Since locked segments have already been subject to review, they should be excluded from QA.

Punctuation/Segmentation
● Add a rule exception so that a number followed by a period remains part of one sentence, not two separate segments
● Add a rule exception so that a full stop within a parenthesis does not create separate segments
● The Korean floating dot used for listing words (·) should be recognized as a comma

Ignore Tags
● Ignore {1> and 1<}
● Ignore 「」
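These exceptions would normally be configured in the CAT tool's segmentation settings; purely as an illustration of the logic (not the tool's actual rule syntax), a regex-based sketch:

```python
# Illustrative splitter honoring the exceptions above: no break after a
# number followed by a period, no break inside parentheses, and the
# Korean listing dot treated as a comma.
import re

# Split on whitespace after a period, unless the period follows a digit
# or a closing parenthesis lies ahead with no opening one before it.
SPLIT = re.compile(r"(?<=\.)(?<!\d\.)\s+(?![^()]*\))")

def segment(text: str) -> list[str]:
    text = text.replace("\u00B7", ", ")  # Korean listing dot -> comma
    return [s for s in SPLIT.split(text) if s]

sample = "1. Acquisition tax (local tax. see below) applies. Registration tax follows."
print(segment(sample))
# ['1. Acquisition tax (local tax. see below) applies.', 'Registration tax follows.']
```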
PROPOSED DELIVERABLES
● A progress chart detailing the changes made to the SMT engine and the resulting BLEU scores
● A report evaluating the achievability of the goals outlined in this statement and detailing recommendations on how the training can be scaled and on the potential value of training the SMT engine
ANTICIPATED RESULTS
Based on the pilot test, it is questionable whether it would be possible to achieve the initial goal of 20% cheaper and faster PEMT with an acceptable level of quality. The BLEU score did increase by 9.41 points after manual formatting and alignment, so it is reasonable to assume that, by cleaning up the documents in their entirety, the score may keep improving.
However, as the pilot test shows, adding a few hundred segments does not make a big difference. We assume that, for changes to be reflected in the BLEU score, it is necessary to add at least a few thousand segments each time, which will require more time and budget. Also, because legal documents require a high level of specificity, human editing will always be necessary. Therefore, although it is not impossible to bring the score up to the desired level, it will take considerable time, cost, and effort.
Therefore, whether or not the project can benefit from machine translation should be determined after a careful analysis of the volume, importance, and available budget.
APPENDIX 1
System | BLEU Score | Training (Segments) | Tuning (Segments) | Testing (Segments) | Comments
1 | 1.93 | 1,004 | 1,551 | 1,255 |
2 | 1.93 | 1,290 | 1,551 | 1,255 | Clone of System #1; cleared formatting, aligned approximately 50 segments in the training documents
3 | 5.55 | 97 | 500 | 500 | Clone of System #2
4 | FAILED | | | | Clone of System #3
5 | 8.29 | 1,090 | 1,937 | 601 | Clone of System #4; formatted 40% of the document; aligned 20% of the training document, 10% of the tuning document
6 | 3.11 | 1,034 | 1,127 | 784 | Clone of System #5; added more texts to training and testing
7 | 11.34 | 1,268 | 1,076 | 1,082 | Clone of System #6; cleaned all training, tuning, and testing documents
8 | 11.26 | 1,510 | 1,076 | 1,082 | Clone of System #7; added a 1,700-term dictionary
9 | 11.34 | 1,805 | 1,076 | 1,082 | Clone of System #8; added more bi-texts to training
10 | 11.34 | 1,805 | 1,358 | 1,082 | Clone of System #9; added more bi-texts to testing
11 | 11.34 | 1,646 | 1,358 | 1,082 | Clone of System #10; formatted first bi-texts in training
APPENDIX 2
SAMPLE 1: 400 Words
Estimated words/hour: 800 words
Error | Type | Occurrences | Severity | Weight | Score
Inconsistent use of punctuation marks | Style | 3 | Minor | 1 | 3
Wrong use of prepositions | Language | 1 | Minor | 1 | 1
"County" should be translated as Si/Gun/Do | Country/Terminology | 1 | Minor | 1 | 1
Total: 5
Est. total per 2,000 words: 25
Pass/fail?: Fail

SAMPLE 2: 500 Words
Estimated words/hour: 1,000 words
Error | Type | Occurrences | Severity | Weight | Score
"e-Government Act" should be "Electronic Government Act" | Mistranslation | 1 | Minor | 1 | 1
"Chairman" should be written as the head of the government | Country | 2 | Major | 1 | 2
Mix-up in singular and plural forms of nouns | Language | 1 | Minor | 1 | 1
Total: 4
Est. total per 2,000 words: 16
Pass/fail?: Fail
o Presentation on Lessons Learned
3. Custom Filter Assignment
The website pseudol10n.wordpress.com needed to be translated into Simplified Chinese (PRC). Before translating the website, the client needed to confirm that I could create the appropriate custom filter to handle its website content.
4. Article on the Future of Translation Technology