CAT Portfolio

1.    Introduction

I am an MA Translation (Korean | English) student at the Middlebury Institute of International Studies at Monterey (May 2016 candidate). This CAT portfolio presents the projects I completed during my studies at MIIS, focusing mainly on the projects from my Advanced Computer-Assisted Translation course, which taught me how to use a variety of CAT tools as a working translator. The course combined translation technology with practical, hands-on work in the tools, and it showed me a new way of approaching translation that reflects the rapid changes happening in the field.

2.    MT Pilot Project Files

o    Proposal/SOW

Statement of Work

 

This Statement of Work (SOW) is by and between Adam Wooten and Korbleu.


Korbleu Team:

Heather Ahn

Naomi Kim

Jin Heui Kwon

Joohyun Lee

Robin Park

Eunah Sung
 

PROJECT OBJECTIVES

This pilot project aims to form a basis for estimating the work involved in training a machine translation (MT) engine for tax laws from Korean into English.

Post-edited machine translations (PEMT) from this engine will meet the following goals for efficiency, cost savings, and quality:

•    Efficiency: Make PEMT 20% faster than human translation (HT)

•    Cost Savings: Make PEMT 20% cheaper than HT

•    Quality: Produce PEMT translations with an acceptable score under the LISA QA model, assessed as follows:

 

Error Type        Minor   Major   Critical
Mistranslation      1       5       10
Accuracy            1       5       10
Terminology         1       5       10
Language            1       5       10
Style               1       5       10
Country             1       5       10
Consistency         1       5       10

Scorecard adapted from LISA QA model by SDL:

http://producthelp.sdl.com/SDL_TMS_2011/en/Creating_and_Maintaining_Organizations/Managing_QA_Models/LISA_QA_Model.htm


•    PEMT will be evaluated by a human reviewer in accordance with the guidelines provided above.

•    To pass the quality check, the final output must contain no critical errors in the areas above, and the total error score must not exceed 10.
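
As a rough illustration of the scorecard arithmetic (the helper below is hypothetical, not part of the project's tooling), each error is weighted by severity, extrapolated to a 2,000-word basis, and checked against the pass criteria above:

```python
# Illustrative sketch of the LISA-based scorecard arithmetic described above.
# The severity weights and pass criteria come from this SOW; the helper itself
# is hypothetical and was not part of the project's tooling.

SEVERITY_WEIGHTS = {"Minor": 1, "Major": 5, "Critical": 10}

def qa_score(errors, sample_words, basis_words=2000, max_score=10):
    """errors: list of (severity, occurrences) pairs found in one reviewed sample."""
    raw = sum(SEVERITY_WEIGHTS[sev] * count for sev, count in errors)
    est_total = raw * basis_words / sample_words   # extrapolate to 2,000 words
    has_critical = any(sev == "Critical" for sev, _ in errors)
    return est_total, (not has_critical) and est_total <= max_score

# Sample 1 from Appendix 2 of the updated SOW: three minor errors in 400 words.
score, passed = qa_score([("Minor", 3), ("Minor", 1), ("Minor", 1)], sample_words=400)
print(score, passed)  # 25.0 False -- matches the "Fail" verdict on that sample
```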


TIMELINE AND COSTS

Project begins: March 31, 2016

Send Deliverables: April 21, 2016

•    Following the kickoff meeting on Thursday, March 31, the pilot project will be carried out over a period of three weeks.

•    One to two rounds of training will be completed on weekdays, for a total of at least six rounds, trying a variety of different training methods.

•    Data collected from the pilot engine training, post-editing, and QA will be used to calculate time and cost savings and quality estimates, which will be presented in an updated project proposal to be delivered on April 21.

 

Task                   Est. Hours   Quantity      Hourly Rate   Subtotal
Round of MT Training   0.5          8 documents   $30           $120
Document Alignment     2.5          4 documents   $30           $300
Dictionary Creation    1            1 dictionary  $30           $30
Post-editing           0.5          2 persons     $30           $30
QA                     0.5          2 persons     $30           $30
Total                                                           $510


WORKFLOW

1. Data Extraction

a.       Compile a list of tax laws to be extracted: 8 documents in total, with an average of 1,500 segments per document

b.      Find both the EN and KO versions of the laws on the Korea Legislation Research Institute website

c.       Clean up the documents (formatting, line breaks, punctuation, etc.)

2. Alignment Preparation

a.       Edit documents to avoid segmentation and alignment problems

3. Alignment

a.       Align documents using Trados Studio; alignment does not have to be perfect before running the first test (to gauge how worthwhile alignment is)

b.      Fix alignment so it is perfectly aligned

c.       Remove problematic segments

4. MT Training

a.       Add the aligned segments to the Microsoft Translator Hub SMT engine

b.      Obtain a BLEU score (see the sketch below for what BLEU measures)

5. Improve the BLEU Score

a.       Improve the BLEU score using different strategies, such as adding more segments, cleaning up the documents, adding a TMX alignment, and adding a dictionary
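
BLEU compares the engine's output against reference translations using overlapping n-grams. The sketch below is a simplified, illustrative implementation of the idea; it is not the exact formula Microsoft Translator Hub computes, and the function name is mine:

```python
import math
from collections import Counter

def simple_bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: modified n-gram precision + brevity penalty.
    Illustrative only -- real BLEU is computed corpus-level, with smoothing."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        log_precisions.append(math.log(max(overlap, 1e-9) / total))
    # Brevity penalty discourages output shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return 100 * bp * math.exp(sum(log_precisions) / max_n)

print(round(simple_bleu("the tax shall be imposed",
                        "the tax shall be levied"), 2))  # 66.87 for this near-match pair
```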



DELIVERABLES

•    An updated proposal for a full-scale MT training project

•    A progress chart detailing the changes made to the SMT engine and the resulting BLEU scores

•    A report evaluating the achievability of the goals outlined in this statement, with recommendations on how the training can be scaled and on the potential value of training the SMT engine

 

o    Updated Proposal

Updated Statement of Work


This Statement of Work (SOW) is by and between Adam Wooten and Korbleu.


Korbleu Team:

Heather Ahn

Naomi Kim

Jin Heui Kwon

Joohyun Lee

Robin Park

Eunah Sung


SUMMARY OF PILOT PROJECT OUTCOMES

There were 11 rounds of training in total over the two-week statistical machine translation (SMT) engine training pilot project. The first round of training achieved a BLEU score of 1.93; by the end of the pilot project, this score had increased to 11.34, an improvement of 9.41 points.

In each round, a different strategy was adopted to determine which tactics had the greatest potential to improve the quality of the SMT engine. Positive results were achieved by cleaning up the bi-texts to ensure accurate alignment of information, manually aligning segments for maximum accuracy, and adding bi-texts to the training and tuning data.

We also tried adding a 7,000-term dictionary of generic legal terms. This had no impact on the BLEU score, but it could be worth reattempting once there are more segments in the system.

After the final round of training was complete, the system with the highest BLEU score was deployed. Testing segments were post-edited by two linguists for 30 minutes each and then cross-checked for quality using the LISA-based QA scorecard established in the original pilot project proposal. Details of these HT versus PEMT tests are provided in Appendix 2.

PROJECT OBJECTIVES

As stated, the targets for PEMT were 20% time and cost savings over human translation (HT) and an acceptable LISA QA score. The results of the pilot test in each of these categories are detailed below.

Efficiency: The following table shows the time breakdown for each post-editor, their average, and the standard benchmarks for HT and review.


 
                                            Post-Editor 1   Post-Editor 2   Average
PEMT words/30 min.                          157 words       139 words       148 words
PEMT words/hour                             314 words       278 words       296 words
PEMT time needed for 2,000 words            6.37 hours      7.19 hours      6.76 hours
HT words/hour (standard)                    --              --              250 words
HT time needed for 2,000 words              --              --              8 hours
Review words/hour (standard)                --              --              1,000 words
Review time needed for 2,000 words          --              --              2 hours
PEMT + review, total time for 2,000 words   8.37 hours      9.19 hours      8.76 hours
HT + review, total time for 2,000 words     --              --              10 hours
Difference in total time for 2,000 words    --              --              1.24 hours
PEMT overall time savings                   --              --              12.4%


This breakdown reveals time savings of 12.4% with PEMT, calculated using the average speed of the two post-editors. This is well below the stated goal of 20% time savings, meaning that the efficiency of PEMT is not satisfactory. The goal will remain 20% for the continued training project.
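
The 12.4% figure follows directly from the table; the sketch below (illustrative only, using the standard HT and review rates stated above) reproduces the arithmetic:

```python
# Reproduces the PEMT time-savings arithmetic from the table above (illustrative).
WORDS = 2000
pe_speeds = [314, 278]                       # measured PEMT words/hour per post-editor
avg_speed = sum(pe_speeds) / len(pe_speeds)  # 296 words/hour
pemt_hours = WORDS / avg_speed               # ~6.76 hours
review_hours = WORDS / 1000                  # standard review: 1,000 words/hour -> 2 hours
ht_hours = WORDS / 250                       # standard HT: 250 words/hour -> 8 hours

pemt_total = pemt_hours + review_hours       # ~8.76 hours
ht_total = ht_hours + review_hours           # 10 hours
savings = (ht_total - pemt_total) / ht_total
print(f"{savings:.1%}")                      # 12.4%
```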

Cost savings: The following table shows the cost breakdown of PEMT versus HT based on a sample of 2,000 words and using rates established in the pilot project proposal.


 
        File Prep   Translation/PE Rate per Word   T/PE Subtotal   Review Rate   Review Hours   Review Subtotal   Total
HT      $0          $0.10                          $200.00         $30/hr        2 hours        $60.00            $260.00
PEMT    $15         $0.08                          $160.00         $30/hr        2 hours        $60.00            $235.00

PEMT savings: $25.00 (9.6%)

 
This breakdown reveals cost savings of 9.6% with PEMT ($25.00 on a $260.00 HT baseline). This is well below the stated goal of 20% cost savings, meaning that the cost savings of PEMT are not satisfactory. The goal will remain 20% for the continued training project.
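
The cost comparison can be reproduced the same way; the sketch below (illustrative only) computes both totals and the resulting savings:

```python
# Reproduces the PEMT vs. HT cost arithmetic from the table above (illustrative).
WORDS = 2000
REVIEW = 2 * 30                          # 2 hours of review at $30/hr = $60 for both workflows

ht_total = 0 + WORDS * 0.10 + REVIEW     # no file prep + $0.10/word + review = $260
pemt_total = 15 + WORDS * 0.08 + REVIEW  # $15 file prep + $0.08/word + review = $235

savings = ht_total - pemt_total          # $25.00
print(f"${savings:.2f}", f"{savings / ht_total:.1%}")  # $25.00 9.6%
```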

Quality: Appendix 2 presents the QA scorecards for each reviewer. The results show that the two post-editing samples received an average score of 20.5. Considering the strict quality standards for legal documents, continued training of the engine is needed rather than a change to the objectives. However, since quality is highly subjective, it is difficult to estimate with any certainty how much the engine must improve to achieve the stated quality objectives.


RECOMMENDATIONS

a)      Targeting a narrow subject area: This pilot project used texts related to different local tax laws. The BLEU score is currently very low even with bi-texts from similar subject areas; therefore, focusing on one specific subject area (e.g., income tax law) until a reasonably high BLEU score is achieved would work better than adding bi-texts from different areas of tax law.


b)      Adding Bi-texts: Adding bi-texts, in both training and tuning data, has shown improvements in the BLEU score. The more texts there are, the higher the likelihood of matching words and segments. Essentially, the amount of data in the SMT heavily impacts the level of accuracy, especially due to the highly repetitive nature of legal texts.


c)      Clean-up and Formatting: Thorough clean-up and formatting of texts are essential for accurate alignment. Due to the vast differences between the Korean and English texts (e.g., numbering methods, line breaks, the presence of Chinese characters in Korean texts), the majority of texts must first go through manual clean-up to ensure accurate alignment.


d)     Alignment: Alignment of the documents was one of the most important parts of the work. For example, the number of sentences in the source text did not always match the number of sentences in the target text, so adjusting the sentence counts in both languages was a necessary step in achieving alignment. In addition, splitting sentences into two parts was sometimes needed to match the number of sentences in the translated text. It is therefore necessary to go through each sentence to check its composition.


e)      Cooperation with LSPs: In the pilot project, it became clear that proper alignment was the most time-consuming aspect of the work, requiring approximately 1 hour to align 200 bi-text segments. The full-scale project therefore proposes that an LSP already engaged in legal translation take on or cooperate with the project. This way, TM creation will occur naturally as a by-product of standard legal translation work, without the need for separate alignment, reducing time and cost.


f)       Testing: Periodic testing once every six months is recommended in order to gauge progress. Each test would include at least 20 translators, each translating a minimum of 5,000 Korean characters from separate legal texts into English. The average speed of each translator will be calculated, and the average across all translators will provide a clear picture of the SMT engine's progress. Within each six-month interval, progress will be gauged through the BLEU scores provided by Microsoft Translator Hub.


RECOMMENDED WORKFLOW

1.      Gather more bi-texts

2.      Ensure proper alignment of bi-texts for training/tuning data

3.      MT Training

4.      Check BLEU score

5.      Periodic testing


PROJECT TIMELINE AND COSTS

Due to the nature of the project, it is difficult to estimate the exact time and cost required to fully train the machine translation system. We currently have 4,086 segments in the system, added over the course of three weeks. Excluding initial errors and trials, we spent roughly one full week on the training itself, including formatting documents, aligning segments, and creating dictionaries. It is impractical to measure exactly how much time formatting documents or creating dictionaries takes, but we did track how long it took to align segments: 200 segments per hour. At that rate, reaching the optimal volume of 100,000 segments will require a minimum of 479.57 hours, or about 60 eight-hour working days.


# of full-time workers                                   2
Estimated segments in current dataset                    4,086
Number of documents in current dataset                   4
Estimated additional documents needed to reach 100,000   96
Time to convert/align                                    479.57 hours (60 days)


However, this calculation is based on only one of the strategies we employed: adding bi-texts. Achieving efficiency will require multiple attempts with different methods, and given the complex nature of the process, it is highly likely that the overall effort will take much longer than anticipated.

As a result, it is difficult to estimate the overall timeline for achieving the goal stated in the original SOW. At a rate of $30.00 per hour, the two full-time employees would incur $14,387 in conversion/alignment costs. If the project period lengthens, this could become a significant expense. Whether this is worth the investment should be determined after careful consideration.
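
The hours, days, and cost above follow from the measured alignment rate; the sketch below (illustrative only) reproduces the arithmetic:

```python
# Reproduces the full-scale training estimate (illustrative).
TARGET_SEGMENTS = 100_000
current_segments = 4_086
align_rate = 200          # measured: bi-text segments aligned per hour
hourly_rate = 30.0        # USD, from the pilot SOW

remaining = TARGET_SEGMENTS - current_segments   # 95,914 segments
hours = remaining / align_rate                   # 479.57 hours
days = hours / 8                                 # ~60 eight-hour working days
cost = hours * hourly_rate                       # ~$14,387 regardless of team split

print(f"{hours:.2f} h, {days:.0f} days, ${cost:,.0f}")
```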


RECOMMENDATION FOR IMPLEMENTATION IN CAT TOOLS

The following covers recommended CAT tool settings, including, but not limited to, QA settings.

Segment verification
•    Flag target segments that are shorter than the source segment by 50% or more

Exclude repetitions
•    Exclude the following segments from repetition detection:
     - Segments consisting of Chinese characters
     - Segments inside tables

Exclude confirmed translations
•    Once confirmed, a translation has already been through review and should be excluded from QA.

Exclude locked segments
•    Once locked, a segment has already been through review and should be excluded from QA.

Punctuation/segmentation (illustrated in the sketch below)
•    Add a rule exception so that a number followed by a period remains part of one sentence rather than being split into two segments
•    Add a rule exception so that a full stop inside a parenthesis does not create separate segments
•    The Korean middle dot used for listing words (·) should be recognized as a comma

Ignore tags
•    Ignore {1> and 1<}
•    Ignore 「」
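
As a rough illustration of the segmentation exceptions above, the sketch below implements the number-plus-period rule and the middle-dot normalization with regular expressions. These are hypothetical stand-ins: real CAT tools express such exceptions as SRX-style segmentation settings, not code:

```python
import re

# Hypothetical sketch of the segmentation exceptions described above.
ABBREV_NUMBER = re.compile(r"\d+\.$")  # e.g. "1." at the end of a candidate segment

def split_segments(text):
    """Split on sentence-final periods, but keep '1.'-style enumerators attached."""
    parts, segments = re.split(r"(?<=\.)\s+", text), []
    for part in parts:
        # Exception: a bare "number + period" is an enumerator, not a sentence end,
        # so merge the following text into it instead of starting a new segment.
        if segments and ABBREV_NUMBER.search(segments[-1]):
            segments[-1] += " " + part
        else:
            segments.append(part)
    return segments

def normalize_korean_list_dots(text):
    """Treat the Korean middle dot used in lists (e.g. 시·군·구) as a comma."""
    return text.replace("\u00B7", ", ")

print(split_segments("1. This Act shall apply. 2. Definitions follow."))
# ['1. This Act shall apply.', '2. Definitions follow.']
print(normalize_korean_list_dots("시·군·구"))  # 시, 군, 구
```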


PROPOSED DELIVERABLES

•    A progress chart detailing the changes made to the SMT engine and the resulting BLEU scores

•    A report evaluating the achievability of the goals outlined in this statement, with recommendations on how the training can be scaled and on the potential value of training the SMT engine


ANTICIPATED RESULTS

Based on the pilot test, it is questionable whether the initial goal of 20% cheaper and faster PEMT with an acceptable level of quality can be achieved. The BLEU score did increase by 9.41 points after manual formatting and alignment, so it is reasonable to assume that, by cleaning up all of the documents, the score may continue to improve.

However, as the pilot test shows, adding a few hundred segments does not make a big difference. We assume that, for changes to be reflected in the BLEU score, at least a few thousand segments must be added each time, which will require more time and budget. Also, because legal documents require a high level of specificity, human editing will always be necessary. Therefore, although it is not impossible to bring the score up to the desired level, it will take considerable time, cost, and effort.

Therefore, whether or not the project can benefit from machine translation should be determined after a careful analysis of the volume, importance, and available budget.
 

APPENDIX 1

Training, tuning, and testing figures are segment counts.

System   BLEU Score   Training   Tuning   Testing   Comments
1        1.93         1,004      1,551    1,255
2        1.93         1,290      1,551    1,255     Clone of System 1; cleared formatting, aligned approximately 50 segments in the training documents
3        5.55         97         500      500       Clone of System 2
4        FAILED       --         --       --        Clone of System 3
5        8.29         1,090      1,937    601       Clone of System 4; formatted 40% of the document; aligned 20% of the training document, 10% of the tuning document
6        3.11         1,034      1,127    784       Clone of System 5; added more texts to training and testing
7        11.34        1,268      1,076    1,082     Clone of System 6; cleaned all training, tuning, and testing documents
8        11.26        1,510      1,076    1,082     Clone of System 7; added a 1,700-term dictionary
9        11.34        1,805      1,076    1,082     Clone of System 8; added more bi-texts to training
10       11.34        1,805      1,358    1,082     Clone of System 9; added more bi-texts to tuning
11       11.34        1,646      1,358    1,082     Clone of System 10; formatted first bi-texts in training


APPENDIX 2


SAMPLE 1: 400 Words

Estimated words/hour: 800 Words

Error                                         Type                  Occurrences   Severity   Weight   Score
Inconsistent use of punctuation marks         Style                 3             Minor      1        3
Wrong use of prepositions                     Language              1             Minor      1        1
"County" should be translated as Si/Gun/Do    Country/Terminology   1             Minor      1        1

Total:                        5
Est. total per 2,000 words:   25
Pass/fail?:                   Fail


SAMPLE 2: 500 Words

Estimated words/hour: 1000 Words

Error                                                        Type             Occurrences   Severity   Weight   Score
"e-Government Act" should be "Electronic Government Act"     Mistranslation   1             Minor      1        1
"Chairman" should be written as the head of the government   Country          2             Major      1        2
Mix-up in singular and plural forms of nouns                 Language         1             Minor      1        1

Total:                        4
Est. total per 2,000 words:   16
Pass/fail?:                   Fail


o    Presentation on Lessons Learned
            KorBleu - MT vs HT



3.    Custom Filter Assignment

The website pseudol10n.wordpress.com needed to be translated into Simplified Chinese (PRC). Before the translation began, the client needed to confirm that I could create an appropriate custom file-type filter to correctly process the website's content.
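
As a rough illustration of what such a filter does (the actual assignment was configured inside a CAT tool; the helper below is hypothetical), a custom filter exposes translatable text runs while hiding markup, so placeholders and tags survive translation:

```python
import re

# Hypothetical sketch of a custom file-type filter's job: show the translator
# only translatable text, hiding markup. Real CAT tools configure this with
# regex-based filter settings rather than code.

TAG = re.compile(r"<[^>]+>")  # HTML/XML tags should be hidden from the translator

def extract_translatable(html):
    """Yield the text runs a translator should see; tags are stripped out, while
    inline placeholders like {name} remain visible so they can be preserved."""
    for run in TAG.split(html):
        if run.strip():
            yield run.strip()

print(list(extract_translatable("<p>Welcome back, <b>{name}</b>!</p>")))
# ['Welcome back,', '{name}', '!']
```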
 
4.    Article on the Future of Translation Technology
 

 
