Transcription Guiding Principles: Difficult Utterances
This document summarizes the guiding principles for certain types of phrases that are
transcribed in ways that conflict with the standard grammar rules of the transcription
language. For language-specific transcription, follow the language-specific written
domain convention.
Difficult utterances
Everything relating to problematic utterances (background noise, false starts, etc.) or
different language varieties.
Skipping a prompt
● If you can't understand part of the audio, transcribe only the part you can
understand. For the part you cannot understand, create a separate speaker
segment and add the Unintelligible label.
● For utterances that contain speech that is user-generated, pre-recorded, or
synthesized, transcribe all of it.
● If a very small part of a word (at most one syllable) has been cut off, and you
know what the word is supposed to be, transcribe the entire word. If you are not
sure what the word should be, do not transcribe the word at all. Do not put
punctuation after words that have been cut off.
● If a quotation is cut off in the middle, use an end quotation mark anyway.
● Transcribe only the numbers that you hear, even if the speaker didn't finish saying the
entire number.
Accents
● If you hear a word with non-standard pronunciation, transcribe the word using the
standard spelling.
○ Example Audio: where is dat
○ Correct: Where is that?
Agreed spelling
Spelling conventions for proper names and for words that have several possible
spellings.
Spelling out
● If a word is spelled out, write it with spaces in between. This rule does not apply
to acronyms, URLs, or email addresses.
○ Example Audio: how do you get to c o s t c o
○ Correct: How do you get to c o s t c o?
Interjections
Proper names
● For proper names, always use the official spelling and punctuation.
○ Example Audio: will i am
○ Correct: will.i.am
● If a personal name could have multiple spellings and context does not help
choose a spelling, use the spelling that yields the most Google search hits when
you search for the name followed by the word "name" (without quotation marks)
(e.g. "Anna name").
○ Example Audio: mcdonald
○ Correct: MacDonald
Media title
● Write media titles as they are most commonly written.
○ Example Audio: screenshots of call of duty black ops two
○ Correct: screenshots of Call of Duty: Black Ops 2
Multiple spellings
● If you hear a word that does not sound like a standard word of your language
because there is a small sound change (e.g. accent, speech errors, speech
impairment), transcribe the intended word.
○ Example Audio: where is the nearest liberry
○ Correct: Where is the nearest library?
● If you hear a word that does not sound like a standard word of your language, but
it is obviously based on real words, suffixes, prefixes, infixes or circumfixes,
transcribe as is.
○ Correct: interpretate
● If you hear a word that does not sound like a standard word of your language
because it appears to be nonsense, first search for the word in Google. If there is
a clear candidate, transcribe that word. If there is not a clear candidate, but it is
easy to spell and articulated clearly, transcribe it anyway. If there is no clear
candidate and it is not easy to spell, create a separate speaker segment and add
the Unintelligible label.
○ Correct: Souk Abdali
○ Explanation: User says Souk Abdali. Transcriber searches “sukabdali”,
finds correct results. Transcribe Souk Abdali.
Punctuation
Follow the punctuation regulations of your locale. Additional conventions are outlined in
this section.
Commas
● Only use commas if they are required according to language grammar.
Other symbols
● Apart from standard letters, you should not use any other symbol than: 0-9
äâàæÆçÇéèëêïîñÑôöŒœüûùμÿÄÂÀÉÈËÊÏÎÔÖÜÛÙŸ²³,?!'"_°:.()<>{}[]√/@#$€£₹+=
%*&-.;
● When two opposing teams are mentioned, include a hyphen between their
names.
○ Example Audio: are you going to the saints bears game
○ Correct: Are you going to the Saints-Bears game?
● Include a hyphen between locations in flight itineraries.
○ Example Audio: rome london flight
○ Correct: Rome-London flight
Spoken punctuation
● For sentence-level spoken punctuation, write out the full word or words between
curly brackets. Do not add punctuation symbols after spoken punctuation. Be
careful with homonyms.
○ Example Audio: okay dot dot dot
○ Correct: Okay {dot} {dot} {dot}
Format
Number
● Cardinals and ordinals from 0 to 9 are written with letters (except for measures
and currency - see Currency and Unit). Use digits for cardinals and ordinals 10
and above, even if they are coordinated with numbers under 10. Transcribe all
decimal numbers as digits.
○ Correct: I have six dogs and 12 parakeets.
● In math expressions or units & measures, transcribe fraction words using
numerals and slashes. Be careful not to use pre-combined fractions like "1⁄4".
○ Example Audio: in three fourths of a mile turn right
○ Correct: In 3/4 of a mi, turn right.
● For mixed numbers in math expressions and units & measures, transcribe them
using numerals.
○ Example Audio: the koala weighed twelve and a third pounds yesterday
○ Correct: The koala weighed 12 and 1/3 lb yesterday.
● When referring to items (not units or measures), write fractions out in words.
With mixed numbers, write the whole number part out in words if it is under ten,
otherwise write it with numerals.
○ Example Audio: twelve and a half pumpkin pies were made
○ Correct: 12 and a half pumpkin pies were made.
● Transcribe percentages using numerals followed by the "%" sign. In the unlikely
case that you encounter a number of a million or greater used as a percentage,
spell it out.
○ Example Audio: two percent milk
○ Correct: 2% milk
● Use Roman numerals only when part of an official name or title.
○ Example Audio: king henry the eighth
○ Correct: King Henry VIII
● Transcribe seasons and episodes of television shows with numerals.
○ Example Audio: season three episode two
○ Correct: season 3 episode 2
● Transcribe phone numbers using the most common format(s) in the
transcription language.
● Transcribe phone numbers as you would write them down in their natural groups.
When applicable, the STD code should be surrounded by spaces.
● Math expressions should be transcribed with numerals and math symbols with
spaces in between them.
○ Example Audio: five times six to the third
○ Correct: 5 * 6 ^ 3
● Use the common form for transcribing dates and times as used in transcription
language.
● Write times in hh:mm format whenever possible, unless it would look unnatural
to do so.
○ Example Audio: a few minutes after three
○ Correct: a few minutes after 3:00
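The first rule in this list (cardinals from 0 to 9 in words, 10 and above in digits, with measures and currency always in digits) can be sketched as below. This is only an illustration: the function name `format_cardinal` and the `is_measure_or_currency` flag are hypothetical, the word list is English-only, and ordinals, fractions, and decimals are not handled.

```python
# Hypothetical helper illustrating the 0-9 vs. 10+ rule for cardinals.
WORDS = ["zero", "one", "two", "three", "four",
         "five", "six", "seven", "eight", "nine"]


def format_cardinal(n: int, is_measure_or_currency: bool = False) -> str:
    """Spell out 0-9 unless the number is a measure or currency amount."""
    if 0 <= n <= 9 and not is_measure_or_currency:
        return WORDS[n]
    # 10 and above, and all measure/currency values, are written as digits.
    return str(n)


print(f"I have {format_cardinal(6)} dogs and {format_cardinal(12)} parakeets.")
```

The flag is passed in by the caller because deciding whether a number is a measure or a currency amount depends on context that a simple formatter cannot see.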
Address
● Write out the full names of locations, roads, states, etc. Only use abbreviations
when explicitly spoken.
○ Correct: 751 Jefferson Street, New York City
● Transcribe entities and locations by using a comma between them "ENTITY,
LOCATION"
○ Correct: McDonald's, Castro Street
Web
● Write URLs, email addresses, and Twitter hashtags as they are spoken and don't
capitalize them.
○ Example Audio: im so hashtag hungry i could eat a whole pizza
○ Correct: I'm so #hungry I could eat a whole pizza.
● Do not correct speaker errors. For example, do not transcribe a slash when the
speaker actually says "backslash".
○ Example Audio: h t t p colon backslash backslash mail dot yahoo dot com
○ Correct: http:\\mail.yahoo.com
● If the speaker drops a "w" or dots and it's an obvious URL, you should correct
these errors. If the speaker doesn't say the "w"s at all, do not add them.
○ Example Audio: w w facebook dot com
○ Correct: www.facebook.com
● If a URL is spelled out in individual letters, transcribe without spaces between
individual letters.
○ Example Audio: w w w dot t a r g e t dot c o m
○ Correct: www.target.com
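As a rough sketch of the URL rules above, the snippet below joins spelled-out letters without spaces, lowercases them, and maps spoken symbol words to their characters exactly as spoken (a spoken "backslash" stays a backslash). The name `assemble_web_token` and the symbol table are assumptions made for illustration. The sketch deliberately does not try to repair a dropped "w" or dot, since that requires judging whether the utterance is an obvious URL.

```python
# Hypothetical mapping from spoken symbol words to characters.
SPOKEN_SYMBOLS = {
    "dot": ".",
    "colon": ":",
    "slash": "/",
    "backslash": "\\",  # kept as spoken, never corrected to "/"
}


def assemble_web_token(spoken: str) -> str:
    """Join spoken URL pieces into one lowercase token without spaces."""
    pieces = [SPOKEN_SYMBOLS.get(tok, tok) for tok in spoken.lower().split()]
    return "".join(pieces)


print(assemble_web_token("w w w dot t a r g e t dot c o m"))
```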
Abbreviation
Unintelligible or foreign or singing
If you hear speech that is unintelligible or in a foreign language, create a speaker
segment that covers only the audio range with that speech. Select either the
Unintelligible or Foreign Speech Label and assign to the appropriate speaker.
If the entire audio is unintelligible or in a foreign language, create a speaker segment
that covers the entire audio range and select either the Unintelligible or Foreign Speech
Label.
If you hear audio that is singing, transcribe the lyrics, assign to the appropriate speaker,
and add the Singing label. If the singing is in a foreign language, select the Foreign
Speech label.
Segmentation
All speaker segment boundaries should be accurate to within 100 ms.
Speaker turns should not contain pauses in speech that are longer than 0.5 seconds. If
a speaker does pause their speech for longer than 0.5 seconds, end the speaker turn
before the pause then create a new turn for when the speaker resumes talking.
Speaker turns should not be longer than 30 seconds. If a single speaker talks for more
than 30 consecutive seconds without taking a 0.5 second pause, then end the turn at
the 30 second mark and begin a new turn.
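The two segmentation limits above (no pause longer than 0.5 seconds inside a turn, no turn longer than 30 seconds) can be sketched as follows. The input format, a list of `(start, end)` times in seconds for one speaker's stretches of speech, and the name `split_turns` are assumptions for illustration; real tools may represent timing differently.

```python
from typing import List, Tuple

MAX_PAUSE_S = 0.5   # a longer pause ends the current turn
MAX_TURN_S = 30.0   # a turn may not run longer than this

Interval = Tuple[float, float]


def split_turns(speech: List[Interval]) -> List[Interval]:
    """Group one speaker's (start, end) speech intervals into turns."""
    turns: List[Interval] = []
    for start, end in sorted(speech):
        if turns and start - turns[-1][1] <= MAX_PAUSE_S:
            # Pause is short enough: extend the current turn.
            turns[-1] = (turns[-1][0], max(turns[-1][1], end))
        else:
            # First interval, or pause longer than 0.5 s: start a new turn.
            turns.append((start, end))

    capped: List[Interval] = []
    for start, end in turns:
        # Cut at every 30-second mark until the remainder fits.
        while end - start > MAX_TURN_S:
            capped.append((start, start + MAX_TURN_S))
            start += MAX_TURN_S
        capped.append((start, end))
    return capped


print(split_turns([(0.0, 1.0), (1.2, 2.0), (3.0, 4.0)]))
```

Note that cutting exactly at the 30-second mark can fall in the middle of a word; the sketch follows the rule literally and leaves that judgment to the transcriber.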
Speaker labeling
All speaker labels should be consistently formatted. Speaker labels should always be in
all lowercase (apart from speaker names, see below), be spelled correctly, and contain
no underscores or hyphens. Only transcribe up to the 20th unique speaker.
Correct: speaker 1
Incorrect: Speaker 1
● 'speaker #' Used for different speakers in the audio. Includes a number that
corresponds to each different speaker.
● 'pre recorded speaker #' Used when there is speech coming from a machine.
Includes a number that corresponds to each different pre recorded speaker.
● 'unidentifiable speaker' Used when you cannot identify who the speaker is. Does
not ever include numbers.
● 'speaker Tom' Used when the name of a speaker becomes known. The names of
speakers should always be capitalized. You can use first and last names. (Note:
adding speaker names will be allowed for some projects but not others. In-tool
validators will indicate whether or not you can submit a speaker name.)
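The label formats listed above could be checked with a small validator along these lines. This is a sketch, not the in-tool validator: the function name `is_valid_label`, the `allow_names` switch, the ASCII-only name pattern, and the reading of "up to the 20th unique speaker" as a cap of 20 on the speaker number are all assumptions.

```python
import re

MAX_SPEAKERS = 20  # assumed cap, from "up to the 20th unique speaker"

# "speaker 3" or "pre recorded speaker 3": lowercase, spaces only.
_NUMBERED = re.compile(r"(?:pre recorded )?speaker ([1-9][0-9]*)")
# "speaker Tom" or "speaker Tom Hanks": capitalized first and optional last name.
_NAMED = re.compile(r"speaker [A-Z][a-z]+(?: [A-Z][a-z]+)?")


def is_valid_label(label: str, allow_names: bool = False) -> bool:
    """Return True if the label matches one of the documented formats."""
    if label == "unidentifiable speaker":
        return True  # this label never carries a number
    numbered = _NUMBERED.fullmatch(label)
    if numbered:
        return int(numbered.group(1)) <= MAX_SPEAKERS
    # Named speakers are only accepted when the project allows them.
    return allow_names and _NAMED.fullmatch(label) is not None


for candidate in ["speaker 1", "Speaker 1", "speaker_1", "speaker Tom"]:
    print(candidate, is_valid_label(candidate))
```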
Audio labels
For instances with music and lyrics, create separate labels: one for Music and one for
Singing. Transcribe the singing speech and assign to the appropriate speaker.
Below is a list of labels that may be available for you to choose from. Each entry
describes when to use the label.
● Applause: Use this label if you hear one or more people clapping or cheering.
● DTMF: Stands for 'dual-tone multi-frequency.' This is the sound you hear when a
number is pressed during a phone call. For example, 'Press one to speak to a
representative' DTMF.
● Foreign Speech: Use this label if the speaker is not talking in the target language.
● Laughter: Use this label when you hear laughter.
● Music: Use this label when you hear music.
● Noise: Use this label for instances of miscellaneous noise events.
● PII: Use this label when you hear Personally Identifiable Information (PII). For
more information see the PII section of this guideline.
● Ring Tone: Use this label when you hear a ring-tone.
● Singing: Use this label to indicate that the speaker is singing their speech.
● Unintelligible: Use this label to indicate that you cannot understand what the
speaker is saying.
● Unknown: Use this label for audio events that are not classified to any of the
above labels.
PII
PII stands for Personally Identifiable Information. PII is information that is not publicly
available, but can help you or Google identify an individual person.
PII should never be transcribed. When PII is heard, create a new speaker segment that
captures the audio range of the PII speech. Add the PII label and assign to the
appropriate speaker. Do not transcribe PII.
If applicable, select the appropriate PII subcategory which is nested underneath the
parent 'PII' label. If the appropriate PII subcategory is not available then select the
parent 'PII' label to cover all other cases. If there are no PII subcategories available to
choose from then use the 'PII' label for all cases of PII.
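The fallback rule above amounts to a single lookup: use the subcategory when the tool offers it, otherwise use the parent label. A minimal sketch, in which the function name `choose_pii_label` and the idea of passing the available subcategories as a collection are assumptions:

```python
from typing import Collection

PARENT_LABEL = "PII"


def choose_pii_label(kind: str, available_subcategories: Collection[str]) -> str:
    """Pick the PII subcategory if offered, else fall back to the parent label."""
    if kind in available_subcategories:
        return kind
    # Covers both "subcategory not offered" and "no subcategories at all".
    return PARENT_LABEL


print(choose_pii_label("EMAIL", {"EMAIL", "NAME"}))
print(choose_pii_label("PASSPORT", {"EMAIL", "NAME"}))
```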
See the list below for all valid PII items:
● NAME: First and/or Last name
● CREDIT_CARD_NUMBER
● EMAIL
● PHONE_NUMBER
● SOCIAL_INSURANCE_NUMBER
● DRIVER_LICENSE_NUMBER
● NATIONAL_HEALTH_SERVICE_NUMBER
● SOCIAL_SECURITY_NUMBER
● PASSPORT
● TAX_FILE_NUMBER: A tax file number (TFN) is a unique identifier issued by the
Australian Taxation Office (ATO) to each taxpaying entity
● LOCATION_STREET: If the street name is heard, mark as PII. Other locations
such as State, City, County, zip code are all OK to transcribe.
● LOCATION_STREET_NUMBER: If the street number is heard, mark as PII. Other
locations such as State, City, County, zip code are all OK to transcribe.
● MRN (medical record number)
● BANKERS_CUSIP_ID: CUSIP stands for Committee on Uniform Securities
Identification Procedures. A CUSIP number identifies most financial instruments,
including: stocks of all registered U.S. and Canadian companies, commercial
paper, and U.S. government and municipal bonds.
● BC_PHN: Each B.C. resident enrolled with the Medical Services Plan (MSP) is
given a unique lifetime identifier for health care called a Personal Health Number
(PHN)
● OHIP: Ontario Health Insurance Plan
● QUEBEC_HIN: Quebec Health Insurance Number
● CNI NIR: The French national identity card (French: Carte nationale d'identite or
CNI) is an official identity document consisting of a laminated plastic card
bearing a photograph, name and address.
● IBAN_CODE: The International Bank Account Number (IBAN) is an internationally
agreed system of identifying bank accounts
● SWIFT_CODE: A SWIFT code is an international bank code that identifies
particular banks worldwide. It's also known as a Bank Identifier Code (BIC).
● BANK_ROUTING_MICR: The numbers located on the bottom of a check are called
a MICR line. MICR stands for Magnetic Ink Character Recognition. The MICR line
is made up of three sets of numbers. The first set is called the ABA Bank Routing
Number or routing transit number (RTN)
● DEA_NUMBER: A DEA number (DEA Registration Number) is an identifier
assigned to a health care provider (such as a physician, optometrist, dentist, or
veterinarian) by the United States Drug Enforcement Administration
● HEALTHCARE_NPI: A National Provider Identifier or NPI is a unique 10-digit
identification number issued to health care providers in the United States by the
Centers for Medicare and Medicaid Services (CMS).
● MEDICARE_NUMBER
● NIE_NUMBER: The NIE is the identification number, also used for tax purposes, issued to foreigners in Spain
● CPF_NUMBER: The CPF (Cadastro de Pessoas Fisicas or Natural Persons
Register) is a number assigned by the Brazilian revenue agency to both Brazilians
and resident aliens who are subject to taxes in Brazil
● PAN_INDIVIDUAL: Permanent Account Number (PAN) is a code that acts as an
identification for individuals, families and corporates (Indian or Foreign),
especially those who pay Income Tax
● BSN_NUMBER: Netherlands: The citizen service number (BSN) is a unique
personal number allocated to everyone registered in the Personal Records
Database (BRP).
● ICD_CODE: International Statistical Classification of Diseases and Related Health
Problems (ICD), a medical classification list by the World Health Organization
(WHO). It contains codes for diseases, signs and symptoms, abnormal findings,
complaints, social circumstances, and external causes of injury or diseases.
● FDA_CODE: Prescription drug
● NIF: Tax Identification Number in Spain
http://www.investinspain.org/guidetobusiness/en/2/art_2_3.html
● TAXPAYER_REFERENCE
● CURP_NUMBER: CURP is the abbreviation for Clave Unica de Registro de
Poblacion (translated into English as Unique Population Registry Code or else as
Personal ID Code Number). It is a unique identity code for both citizens and
residents of Mexico.
● RRN: Receiver Registration Number (RRN) is a 10-character alphanumeric can be
to a bank account, a credit/debit card, mobile wallet, or home delivery.
Information that does not fit the above PII rules should NOT be considered PII. Some
examples of things that are not PII are:
Correct: Commonly known names like Taylor Swift, Tom Hanks, Michael Jordan
Typo
A typo results in the unintentional creation of a non-word.
Correct: နလငဘ Explanation: Please make sure to type correctly.
Incorrect: နလူငဘ
Avoid making any typographical errors. Carefully check your work before marking items
as "complete".
Correct: ရပစနဆင တ ကမယ။ Explanation: Certain words are spelled differently from how they are pronounced. Please make sure you're using the right spelling.
Incorrect: ရပဇနဆင တ ကမယ။
Context error
A context error occurs when a real word is used incorrectly or when the incorrect form
of a word is used. This includes homophones and punctuation, among other things.
Correct: ငက ပ ရက တ သတလက။
Incorrect: ငက ပ ရက တ သပလက။
Do not transcribe words that are not spoken, even if they are obviously intended by the
speaker. Avoid putting words in the speaker's mouth. However, do transcribe implied
times and units of currency.
Correct: ပန က ဝတမန သ ကညရ အ င။ Example audio: " ပန က ဝတမန သ ကညရ အ င "
Incorrect: ပန က ဝတမန ပရင သ ကညရ အ င။ Explanation: Do not add the omitted " ပရင".
Incorrect: Kyat င သ င င ထ င ဆတ
သ က လ တစဗ အတက ဈ မ က ဘ လ ။
Transcribe all words spoken, even if they are not intended by the speaker. For
interjections and non-speech vocalizations, refer to Agreed Spelling > Interjections and
Difficult Utterances > Hesitations and Truncations.
Correct: ဇ တက ထမ ဇတလက အ လ ဇ တလကလပတ ဘယသမ လ? Explanation: Speaker clearly corrected themselves after mistakenly saying "ဇတလက".
Correct: စ မ ပ ပ ရင ဟ လ စတတ သ ကညမယ န ။ Explanation: Speaker thinks out loud by saying "ဟ လ".
Substitution
Spacing
Use only one space between words and sentences.
Correct: မနကဖနနနက ဘယသ ခ ငသလ။
Correct: ရတဂက သ ပ န က တ
ပည ထ ငစရပသ မ စ မယ။
Incorrect: ရတဂက သ ပ _ န က တ
ပည ထ ငစရပသ မ စ မယ။
For most types of punctuation, do not put a space between the preceding word and the
punctuation.
Correct: န နတ လ ။
Incorrect: န နတ လ ။
Correct: မမ ဝ၊ မ တ တ က ပ န ။
Incorrect: မမ ဝ ၊ မ တ တ က ပ န ။
For quotation marks and similar punctuation, put a space before the opening
punctuation, but not necessarily after the closing punctuation.
Punctuation
Follow the punctuation regulations of your locale. Additional conventions are outlined in
this section.
Add punctuation where needed, but err on the side of keeping it minimal.
Sometimes a phrase which is not obviously grammatically a sentence should
nevertheless be treated as a sentence because of its context, e.g. if it's an answer to a
specific question, or if it's an example where dropping the subject sounds completely
natural as a complete sentence.
Correct: ဘယသ ပ နတ လ။ ဟဘ က ။ Explanation: Two speakers. "ဟဘ က ။" is an
answer to a specific question.
Interjections, greetings, and farewells said in isolation should be considered complete
sentences and punctuated as such.
Correct: သဟ။ Explanation: interjection
Add end punctuation to sentence fragments that sound like the end of a sentence. For
fragments that do not clearly sound like the end of a sentence, leave out punctuation.
Note that sentence fragments may be a result of cut-off audio samples.
Correct: ဘယလလပမလ။ တကယလမ Explanation: Sentence-initial fragment ends
mid-stream.
Correct: ဆ တ ပခက နရင သ မလပန ။ Explanation: Audio was cut off at the beginning.
Correct: လကဘကရညဆင သ လကဦ မယ။ ခဏ လ ဆရ လကဦ မလ ။ Explanation: Do not use punctuation, a hyphen, or an ellipsis after a fragment, even if another sentence follows.
Correct: ဘယက န ဘ ဖစလ ဘယလ က င Explanation: Both sound like beginnings of
sentences.
If an utterance is not clearly a sentence according to the above rules and examples, do
not punctuate it as a sentence.
Commas
Only use downward stroke "၊" where required. Err on the side of minimal punctuation. Do
not rely on intonation.
Correct: ဓ တဆ ဖညဖ နရ က ဘယန မ လ။ Explanation: Even if the speaker uses long pauses, do not use commas to show those pauses. There are places where commas are allowed or required, but this example contains neither.
Incorrect: ဓ တဆ ဖညဖ၊ နရ က၊ ဘယန မ လ။
Use a downward stroke "၊" when a sentence begins with a discourse word, interjection,
or yes/no word. If there is a long pause between a discourse word, interjection, or
yes/no word and a full sentence that follows it, treat that initial word as a separate
sentence.
Correct: သ ဓကယ၊ က န မ ခ မ သ ပ စ။ Explanation: Discourse word. Other examples of
discourse words in Burmese include "ခငဗ ",
"တငပ ", "မနလပ ", and "ဟငအင ".
Correct: ဇ ကကရယ၊ ပစတင ထ ငရယ၊ ဖ ဝ ပရယ အကနဝယခ။ Explanation: Items in a series should be separated by the little section sign "၊".
Except in greetings, sentence-initial and sentence-final addressees should be separated
by downward stroke "၊".
Correct: မ မ၊ သ အမစ ပ လ တယ။
The phrase "Ok Google" in isolation is transcribed without a comma or end punctuation.
When the phrase appears before longer utterances, place a comma after "Google".
Correct: Ok Google
Incorrect: အ က၊ ဂဂ။
Correct: Ok Google, သလငပင ပပ Explanation: Always use a comma "," after
Google even when followed by an utterance in
Burmese.
Intonation marks
Questions should be punctuated as sentences. In the case of a question in another
language (English for instance), the sentence should be capitalized and punctuated with
a question mark.
Correct: သခ တယလ ။ Explanation: "လ " is a question word in
Burmese.
Correct: မနက ၃:၀၀ လ ။ Explanation: Regardless of rising tone, it is
most likely a question when an utterance ends
with "လ ".
Exclamation marks should not be used in Burmese script. Use a double downward
stroke "။" even if the speaker uses an exclamatory intonation.
Correct: ဟ ။ Explanation: Speaker sounds enthusiastic.
Do not put punctuation between reported speech verbs and direct quotations. Do not
put punctuation within quotation marks unless the punctuation belongs to the reported
speech.
Correct: သဇ က " ပညက သ လညမယ" တ။ Explanation: The word "တ" is the most common reported speech particle in Burmese, but other words such as "ဆပ", " ပ တယ", and "လ" can also be used for reported speech. There is no need to use a comma or section sign before the quotation.
Incorrect: သဇ က၊ " ပညက သ လညမယ" တ။
Incorrect: သဇ က " ပညက သ လညမယ။" တ။
Incorrect: သဇ က - " ပညက သ လညမယ" တ။
If the text in quotation marks qualifies as a sentence, punctuate as if it were its own
utterance. Do not alter its end punctuation even if the quote is within a sentence. Do not
add excess punctuation after end quotation marks.
Correct: အဘက ဆသညမ "သသရ ငမ စသတည ။" Explanation: Text in quotation marks qualifies
as a sentence.
Incorrect: အဘက ဆသညမ "သသရ
ငမ စသတည ။"။
Use a hyphen in quotative voice actions when the quote follows the command. Use
quotation marks when the quote is in the middle of the sentence.
Correct: ပငသစဘ သ ပနရန - န က င လ ။ Explanation: The quote follows the command,
so use a hyphen not quotation marks.
Correct: " န က င လ " က ပငသစဘ သ သ ပနပ ။ Explanation: The quote is in the middle of a
sentence, so use quotation marks not a
hyphen.
Correct: ပငသစလ "ခ စတယ" လ ဘယလ ပ မလ။ Explanation: Do not use a hyphen after " ပ "
verbs in translation requests.
Incorrect: ပငသစလ - "ခ စတယ" လ ဘယလ ပ မလ။
Correct: [email protected] သ - ဟ ၊
န က င ကလ ။
Do not use quotation marks for metalinguistic uses of words or phrases. These uses
include defining the word, talking about the spelling of the word, or any other type of
reference to the word itself as a thing.
Other symbols
Apart from the Burmese letters and the Latin letters a through z, you should not use any
other symbol than: 0-9 ၀-၉
äâàáāçčćđéèéëêēïîíīñóôöüőōšûùúűūÿȳžÄÂÀÁĀÇČĆĐÉÈÉËÊĒÏÎÍĪÑÓÔÖŌŠÜÛÙÚŪŸȲŽ²³,?
!~^\'"_°:.()<>{}[]√/@#$€£+=%*&-.;
Spoken punctuation
For sentence-level spoken punctuation, write out the full word or words between curly
brackets. Do not add punctuation symbols after spoken punctuation. Be careful with
homonyms. (See exceptions in the next rule.)
Correct: Okay {dot} {dot} {dot} Example audio: " okay dot dot dot "
Incorrect: Okay...
Correct: ဇဇဝ {ပဒက လ } ခတ {ပဒက လ } ပန ညက ဝယလ ပ {ပဒမ} Example audio: " ဇဇဝ ပဒက လ ခတ ပဒက လ ပန ညက ဝယလ ပ ပဒမ "
Incorrect: :-)
Don't spell out internal punctuation like hyphens in web pages, email addresses,
addresses, phone numbers, or other word-level punctuation.
Correct: အခန နပ တ ၉၈/၉၉ Example audio: " အခန နပ တ က ဆယရစ မ စ င
က ဆယက "
Incorrect: အခန နပ တ ၉၈ မ စ င ၉၉
If a word that can refer to a punctuation mark is spoken in isolation, it should be written
out between curly brackets.
Correct: {ပဒမ}
Correct: {မ စ င }
Format
Transcribe numbers, abbreviations etc. following the formatting conventions in this
section.
Number
Use Burmese numerals and not Arabic numerals.
Correct: ၁ ၂ ၃ ၄ ၅ Example audio: " တစ စ သ လ င "
Incorrect: 1 2 3 4 5
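When numbers are first typed with Arabic digits, the conversion to Burmese numerals is a one-to-one character mapping, sketched below. The function name `to_burmese_digits` is hypothetical. The mapping relies on the Myanmar digits occupying the consecutive Unicode code points U+1040 to U+1049, and it should only be applied to Burmese text, not to URLs or other Latin-script material.

```python
# Map ASCII digits 0-9 to Myanmar digits U+1040..U+1049.
_TO_BURMESE = str.maketrans({str(i): chr(0x1040 + i) for i in range(10)})


def to_burmese_digits(text: str) -> str:
    """Replace every ASCII digit in text with the matching Burmese numeral."""
    return text.translate(_TO_BURMESE)


print(to_burmese_digits("1 2 3 4 5"))
print(to_burmese_digits("5:30"))
```

Building the table from code points avoids confusing the digit zero (U+1040) with the visually similar letter wa (U+101D).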
Cardinals and ordinals from 0 to 9 are written with letters (except for measures and
currency - see Currency and Unit). Use digits for cardinals and ordinals 10 and above,
even if they are coordinated with numbers under 10. Transcribe all decimal numbers as
digits.
Correct: အတန ထမ က င သ က ယ က ရတယ။ Explanation: Numbers less than 10 are written
with letters.
If a large number consists of only a number followed by "သန ", "မလယ", "ဘလယ",
"ထရလယ", or higher, then transcribe as a numeral plus word. Otherwise, transcribe as
numerals.
Correct: ဘငန က င ရ ၅ သန Example audio: " ဘငန က င ရ င သန "
Write lists of numbers with digits and without commas.
Correct: ၀ ၁ ၂ ၃ ၅ ၇ ၁၁ ၁၃ Example audio: " သည တစ စ သ င ခန ဆယတစ
ဆယသ "
In math expressions or units & measures, transcribe fraction words using numerals and
slashes.
Correct: သ က ၁/၃ ပ လမယ။ Example audio: " သ က သ ပတစပပ လမယ "
Incorrect: ၈ မငခက လ ခပ ။
Correct: က န တ တ ၁/၂ လကမ သစ ပ လမယ။ Example audio: " က န တ တ လကမဝက သစ ပ
လမယ "
For mixed numbers that represent currency amounts, always use decimals.
Correct: လတင မ Kyat ၂.၅ ရတယ။ Example audio: " လတင မ စက ပခ ရတယ "
Correct: အမဝယတန က Kyat ၅.၅ သန ပ ခရတ ။ Example audio: " အမဝယတန က င သန ခ ပ ခရတ
"
Transcribe percentages using numerals and the % sign. (In the unlikely case that you
encounter a number of a million or greater used as a percentage, spell it out.)
Correct: ၂%
Correct: ၁ မလယ ရ ခင န
If a number appears in a context which calls for a certain formatting in your language,
use that formatting. Otherwise, default to the general rule for transcribing numbers.
Transcribe phone numbers using the most common format in the transcription
language.
Correct: ၀၁-၂၀၂၈၁၈ Example audio: " သည တစ စ သည စ ရစ တစ ရစ "
If it really sounds like a math expression, then transcribe it with numbers and symbols,
with spaces in between.
Correct: ၅ / ၆ ^ ၃ Example audio: " င အစ ခ က သ ထပ "
Incorrect: င အစ ခ က ထပ န ကန သ
Transcribe currencies as commonly written in the transcription language.
Correct: Kyat င ခ ကနခတယ။ Example audio: " က ပ င ခ ကနခတယ "
Incorrect: ¥၂၀၀
For degrees, use the ° symbol.
Correct: အပခ န ၃၁° စငတဂရတ
Correct: အ ရ ၉၇°၃၀' လ ငဂ က ဟ
မနမ စ တ ခ နမ ပ ။
Correct: ပတ အမ ရခမတ အ က အ တင ရတယ။ Example audio: " ပတ အမ ရခမတ အ က အ တင
ရတယ "
Incorrect: ပတ အမ ရခမတ အ က -၅ ရတယ။
For all the units that follow numeric values, please use full Burmese words.
Correct: အမမ ၂ လတ ပလင ရတယ။ Example audio: " အမမ စလတ ပလင ရတယ "
Transcribe all numeric values preceding units in numeral form, even if under 10.
If it is clear from context that a number or number sequence refers to currency or time,
format it as such.
Correct: မနက ၅:၃၀ စက ပ ထ ပ ။ Example audio: " မနကင ခ စက ပ ထ ပ "
Incorrect: မနက ၅ ခ စက ပ ထ ပ ။
Use the natural form for transcribing dates.
Correct: ဇလင ၁၂ ရက၊ ၁၉၆၄ ခ စ Example audio: " ဇလငလ တစ စ ရက
ထ ငက ရ ခ ကဆယ လ ခ စ "
Correct: ဗဒဟ န၊ မတလ ၆ ရက Example audio: " ဗဒဟ န မတလ ခြ ကရက "
Write times in hh:mm format whenever possible, unless it would look unnatural to do
so.
Correct: ၃:၀၀ န ရ Example audio: " သ န ရ "
Correct: ၆:၃၀ အ ရ က ပနလ မယ။ Example audio: " ခ ကခအမ ပနလ မယ "
Use "နနက၊", " နလည၊", "ည န၊", or "ည၊" if spoken.
Correct: နလည ၁၂:၀၀ န ရတတ လ ခမယ။ Example audio: " နလည ဆယ စန ရတတ လ ခမယ
"
Address
Favor full spellings over abbreviations where natural, but use abbreviations when
explicitly spoken.
Use the section sign "ပဒက လ " for ENTITY၊ LOCATION
Correct: ဝင က ကအ၊ က ကတတ
Correct: ရ က င ကယ၊ ၅၁ လမ
Correct: အမအမတ ၃၁၊ ၄၉ လမ
Correct: မ င အ က ဖ ၊ ဗလ မတထန လမ င
ရ က လမ ထ င
Correct: ရနကန မ တ ခန မ၊ မ လ မ
Web
Write URLs, email addresses, and Twitter hashtags as they are spoken and don't
capitalize them.
Correct: www.google.com.mm Example audio: " ဒဘလ သ လ ဒ ဂဂ ဒ ကန ဒ
အမအမ "
If the speaker drops a "w" or dots and it's an obvious URL, you should correct these
errors. If the speaker doesn't say the "w"s at all, do not add them.
Correct: www.google.com.mm Example audio: " ဒဘလ ဒဘလ ဒ ဂဂ ဒ ကန
ဒ အမအမ "
Abbreviation
Do not abbreviate unless the speaker says an abbreviated form.
Correct: မက န သ သမဟတ က က တ Example audio: " မက န သ သမဟတ က က တ "
Incorrect: မက န သ (သ) က က တ
Incorrect: မက န သ သ က က တ
Incorrect: စ ပ ရ င က သန ရ င ဝယ ရ
ဝန က ဌ န
Capitalize and abbreviate English titles when accompanied by proper English names.
Correct: Dr. Schuster
In Burmese, transcribe titles as spoken.
Correct: ပ ရ ဖကဆ ဒ နလ သန
In acronyms, do not use periods between letters.
Correct: NASA, NASCAR, AAMCO, ZIP code
Agreed spelling
Spelling conventions for proper names and for words that have several possible
spellings.
Spelling out
If a word is spelled or obvious pauses are made between letters, spell it into letters as it
is said (often done for foreign names or businesses, for example). Use lowercase letters
for the spelled-out portion. This rule does not apply to acronyms or initialisms, or to
spelled-out web or email addresses.
Correct: သ ဝထ မရရစ မ Example audio: " သ ဝထ မရရစ မ "
Incorrect: ဗအငပ တ
Explanation: When it is plural, add the
plural-indicator as pronounced.
Interjections
Transcribe words representing laughter or other non-speech vocalizations with up to
three syllables, but no more.
Correct: ဟ ဟ ဟ ဟ
Proper names
Use official spelling and punctuation for proper names. Google them and pay attention
to the correct format. Official format and spelling of a proper name may supersede the
usual written transcription conventions detailed in this document.
If a personal name could have multiple spellings and context does not help choose a
spelling, use the spelling that yields the most Google search hits when you search for
the name followed by the word "name" (without quotation marks) (e.g. "Anna name").
Correct: သန လင Explanation: Although "သန ", "သဏ " and "သမ " have the same pronunciation, "သန " is the most common spelling.
Incorrect: သဏ လင
Correct: ဦ က ကလ Explanation: "ဦ က ကလ " is the most common spelling.
Incorrect: ဦ က ကလ
Please follow the standard spelling when you transliterate to Burmese. For names
consisting of a first name and a last name, there shouldn't be a space between the first
name and the surname. Look them up to check their standard spelling.
Correct: အဒလ Explanation: If the name is only one word (in
this example: Adele), transliterate to Burmese.
Please follow the standard spelling.
If a speaker makes a small mistake in a proper name, transcribe it anyway as long as
the difference is minimal. "Minimal differences" refers to adding or dropping articles,
possessives, and plurals.
Correct: ညမ လ က အခ စက က ကယတယတလ Example audio: " ညမ လ က အခ စက
က ကယတယတလ "
Incorrect: ညမ လ က အခ စက က ကယသတလ
Correct: The Lord of the Ring Explanation: Actual name is "The Lord of the
Rings".
Incorrect: သမ ငအ မမ
Correct: ယက
Correct: ဂ သန
Incorrect: ခ သန
Correct: LEGO
Incorrect: Lego
The phrase "Ok Google", as well as possible derivatives such as "Ok Google Now" and
"Ok Glass", require their own particular spelling of "okay". This spelling is unique to these
cases.
Correct: Ok Google
Correct: အ က
Correct: အ က၊ ဆကလပ။
Media title
Refer to the Google Play Store for official spellings of media titles. For film/television,
IMDb is also available. If an utterance is ambiguous between a media title and a
sentence or web search, use your judgment for which is more likely; if truly unclear,
default to media title.
Do not use quotation marks for media titles.
Correct: သန ဖ မငရ အ ရ က နဝန ထကသညပမ
စ အပ
Sometimes, media titles can include numbers or digits. Please transcribe as full words.
Correct: သတ ခနစတန Example audio: " သတ ခနစတန "
Correct: ခ က ဒသမ သညတစ စ အပ Example audio: " ခ က ဒသမ သညတစ စ အပ "
Incorrect: ၆.၀၁ စ အပ Explanation: For media titles, the whole title
should be transcribed including numbers.
Multiple spellings
When multiple spellings are attested, use the first spelling used in the reference
dictionary for your language. If there is no entry, Google the word and use the form with
the most hits.
Correct: ဖည ဖည
Incorrect: ဖ ဖ
Explanation: " ဖည ဖည " is preferred by the Myanmar Language Commission (MLC)
Dictionary.
Transcribe slang and colloquialisms as spoken according to the appendix on this page.
Do not alter non-standard speech that the speaker probably wouldn't want corrected.
Correct: မသဘ Example audio: " မသဘ "
Incorrect: မသဘ
Incorrect: ခဏ လ က က
Incorrect: က ဖ သ ကဦ မယ
Write commonly accepted contractions as usual. Transcribe contractions when you
hear them spoken.
Correct: ပနလ မယ မတလ Example audio: " ပနလ မယ မတလ "
Incorrect: ပနလ မယ မဟတလ
Explanation: The speaker said "မတလ ", which actually means "မဟတလ ". Write the
utterance as pronounced.
Correct: ဟတကလ Example audio: " ဟတကလ "
Use standard spelling for reductions that commonly occur in normal running speech,
like "သ တ မလ" and "စ မလ ", for "သ တ မ" and "စ မ " respectively.
Correct: ဘယသတန Example audio: " ဘသတန "
If you hear a word that does not sound like a standard word of your language, but it is
obviously based on real words, suffixes, or prefixes, transcribe as is.
Correct: ဥ ရဩ Explanation: Speaker meant "ဥဩ" but added
one extra middle-infix " ရ".
Transcribe onomatopoeia when clearly spoken. Otherwise, use the Unintelligible label
as instructed in: Longform generic rules > Unintelligible or foreign or singing.
Difficult utterances
Everything relating to problematic utterances (background noise, false starts, etc.) or
different language varieties.
Skipping a prompt
For Loft 1.0: If you can't understand part of the audio, transcribe only the part you can
understand. For the part you cannot understand, create a separate speaker segment
and add the Unintelligible label as instructed in: Longform generic rules > Unintelligible
or foreign or singing.
For Loft 2.0: Below is a list of reasons for skipping the audio that may be available for
you to choose from. Each includes a description of when it is appropriate to use it.
● No Audio: The audio doesn't load.
● No Sound: The waveform indicates there is audio but I can't hear anything.
● Other Locale: All of the speech is in a different language.
● Silent Audio: The entire utterance is silent.
● Noisy Audio: The entire utterance is too noisy.
● Other: Other reason (Please explain).
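The skip reasons listed above can be written down as an enumeration with a small selection helper. This is a minimal sketch, not part of any Loft tool: the names `SkipReason` and `choose_skip_reason`, the boolean flags, and the order of the checks are all assumptions. In particular, the way it separates No Sound from Silent Audio (by whether the waveform shows a signal) is one reading of the descriptions above.

```python
from enum import Enum


class SkipReason(Enum):
    NO_AUDIO = "The audio doesn't load."
    NO_SOUND = "The waveform indicates there is audio but I can't hear anything."
    OTHER_LOCALE = "All of the speech is in a different language."
    SILENT_AUDIO = "The entire utterance is silent."
    NOISY_AUDIO = "The entire utterance is too noisy."
    OTHER = "Other reason (Please explain)."


def choose_skip_reason(audio_loads, waveform_has_signal=True, can_hear=True,
                       all_other_language=False, too_noisy=False):
    """Return a SkipReason, or None if the audio should be transcribed.

    All flags are hypothetical judgments made by the transcriber.
    SkipReason.OTHER is never returned here; it needs a written
    explanation and is left to the transcriber.
    """
    if not audio_loads:
        return SkipReason.NO_AUDIO
    if not can_hear:
        # Assumed distinction: a visible waveform with nothing audible is
        # "No Sound"; a flat, silent utterance is "Silent Audio".
        if waveform_has_signal:
            return SkipReason.NO_SOUND
        return SkipReason.SILENT_AUDIO
    if all_other_language:
        return SkipReason.OTHER_LOCALE
    if too_noisy:
        return SkipReason.NOISY_AUDIO
    return None


reason = choose_skip_reason(audio_loads=True, can_hear=False)
print(reason.name, "-", reason.value)
```

Note that the reasons apply to the whole utterance. Audio that is only partly foreign or partly noisy is not skipped under this sketch.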
For utterances that contain speech that is user-generated, pre-recorded, or synthesized,
transcribe all of it.
If a speaker says only the beginning part of the word, transcribe it if it can be considered
a word on its own. Otherwise do not transcribe the false start.
Correct: အလ မ ခ ကန Example audio: " အ အလ မ ခ ကန "
Sometimes, twisted words are used in Burmese. The speaker may or may not have
actual apraxia; sometimes the speaker deliberately swaps vowels or syllables within a
word. English example: "kitchen" - "chicken" or "tea pot" - "top pea". If the
speaker uses a twisted word, do not correct it. Transcribe it as spoken.
Correct: သဘ တ ပ သလ ။ Example audio: " သဘ တ ပ သလ "
Transcribe any filler words that are applicable and used in the target language. Below
are examples of filler words in the English language. These may or may not be
applicable in the target language. Again, only transcribe filler words that exist in and are
used in the target language.
● um
● uh
● right
● you know
● so
● like
Correct: I need to get a new um telephone.
Below is a list of filler words that should be transcribed only when they are
affirmations or answers to a question.
● ah
● er
● mhm
Foreign language
Only transcribe foreign words if they are common in your language or if speakers of
your language would understand them. If a word is foreign and speakers of your language
would not know it, use the Foreign Speech label as instructed in: Longform generic
rules > Unintelligible or foreign or singing.
Accents
Correct non-standard pronunciations to their standard ones. Non-standard
pronunciations could be from speakers of regional dialects, language learners, or
speakers from different countries.
Correct: အ က က သယရတ Example audio: " အ က က သယရသ "
Incorrect: အ က က သယရသ Explanation: Speaker said "သ " instead of "တ ",
but it should be spelled as standard.