File categories and languages in Smart projects


File categories

To ensure the correct file flow throughout the localization process, every file uploaded to a Smart project needs to be assigned one of the following categories:

Bilingual Document

A document with content in both source and target language.

CAT Analysis

The result of a statistical analysis performed by CAT tool software on source documents or bilingual documents using selected translation memories. The analysis shows how many segments of the specified document are covered by the translation memories, including repetitions and fuzzy matches.

CAT Package

A package file created by CAT tool software, e.g., sdlppx (Trados) or mqout (memoQ).

CAT Package (Return)

A return package file created by CAT tool software, e.g., sdlrpx (Trados) or mqback (memoQ).

Filtering Rules

A configuration file for CAT tool software. It defines how and which segments should be translated.

Formatted Document

A document with translated content that underwent formatting changes.

memoQ Light Resource

A configuration file for CAT tool software.

Other

Documents that can't be assigned to other categories.

QA Report

The result of a quality assurance analysis performed by CAT tool software on the translated document.

Reference File

An additional file that may help PMs or vendors perform their jobs.

Segmentation Rules

A configuration file for CAT tool software. It defines how the source document should be split into segments.

Source Document

A file to be translated. It can be delivered by a client or obtained from the File Preparation step (for documents that require preparation before being translated).

Source to Be Prepared

A document that requires preparation before being translated, e.g., PDF or image files that contain text that needs to be extracted. Usually the preparation is performed in the File Preparation step.

Terminology

A database consisting of terminological entries, usually in multilingual format, where each term has its corresponding term in another language.

Terminology is also referred to as a termbase, term base, or terminology base.

Translated Document

A document in the target language, in the original format (i.e., same as the corresponding source document). It can be created manually or exported from a CAT tool as a clean document. Usually, a translated document is verified by an editor or proofreader who can suggest corrections and make notes about the content.

Translation Memory

A database that stores "segments," which can be sentences, paragraphs, or sentence-like units that have previously been translated, to aid translators. The translation memory stores the source text and its corresponding translation in language pairs called "translation units." Translation memories are usually in bilingual format.

File categories cannot be modified.


File auto-categorization

XTRF automatically categorizes some of the Smart project files according to the following rules:

  1. Source Document - the default fallback category. Files with the extensions .pdf, .png, .jpg, .jpeg, .gif, .tmp, .zip, .tar, .gz, and .xlsx files larger than 10 MB are always categorized as Source Document.

  2. CAT Package - recognized extensions: .sdlppx, .mqout.

  3. CAT Package (Return) - recognized extensions: .sdlrpx, .mqback.

  4. Terminology - recognized extensions: .tbx, .sdtbx, .mqtbx.

  5. Translation Memory - recognized extensions: .tmx, .sqtmx, .mqtmx, .sdtmx, .sdltm.

  6. CAT Analysis - recognized extensions: .csv, .htm, .html, .james, .log, .mht, .mhtml, .rep, .rtf, .txt, .xlf, .xls, .xlsx, .xml, .zip. Files larger than 1 MB won't be categorized as CAT Analysis.

  7. Bilingual Document - recognized extensions: .rtf, .sdlxliff, .mqxlz, .mqxliff.

  8. Segmentation Rules - content must include the following attribute: resourcetype="SegRules".

  9. Filtering Rules - content must include the following attribute: resourcetype="FilterConfigs".

  10. QA Report - .xlsx files smaller than 500 KB.

  11. memoQ Light Resource - content must include the following attribute: resourcetype="*".

Other categories need to be selected manually upon uploading the files.
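As a rough illustration of how rules like these could be evaluated, here is a minimal Python sketch. It is not XTRF's implementation. It covers only the extensions that appear in exactly one rule, plus the resourcetype checks. The function name guess_category, the order of the checks, and the reading of resourcetype="*" as "any other value" are all assumptions made for the example. Extensions that appear in more than one rule (such as .rtf, .zip, or .xlsx) and the size limits are left out on purpose, because the precedence between those rules is not stated here.

```python
import re
from pathlib import Path

# Extensions that appear in exactly one rule of the list above.
EXTENSION_CATEGORIES = {
    ".sdlppx": "CAT Package",
    ".mqout": "CAT Package",
    ".sdlrpx": "CAT Package (Return)",
    ".mqback": "CAT Package (Return)",
    ".tbx": "Terminology",
    ".sdtbx": "Terminology",
    ".mqtbx": "Terminology",
    ".tmx": "Translation Memory",
    ".sqtmx": "Translation Memory",
    ".mqtmx": "Translation Memory",
    ".sdtmx": "Translation Memory",
    ".sdltm": "Translation Memory",
    ".sdlxliff": "Bilingual Document",
    ".mqxlz": "Bilingual Document",
    ".mqxliff": "Bilingual Document",
}

# Content-based rules: the value of the resourcetype attribute.
RESOURCE_CATEGORIES = {
    "SegRules": "Segmentation Rules",
    "FilterConfigs": "Filtering Rules",
}

# Tolerates optional whitespace around "=".
RESOURCE_TYPE = re.compile(r'resourcetype\s*=\s*"([^"]*)"')


def guess_category(filename, content=""):
    """Return a category name for a file (hypothetical helper)."""
    # Assumption: unambiguous extensions are checked first.
    extension = Path(filename).suffix.lower()
    if extension in EXTENSION_CATEGORIES:
        return EXTENSION_CATEGORIES[extension]

    # Assumption: any other resourcetype value means memoQ Light Resource.
    match = RESOURCE_TYPE.search(content)
    if match:
        return RESOURCE_CATEGORIES.get(match.group(1), "memoQ Light Resource")

    # Source Document is the documented fallback category.
    return "Source Document"
```

For example, guess_category("project.sdlppx") falls into the CAT Package rule, while a file whose name and content match none of the rules falls back to Source Document.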


Language detection

Language detection works for files uploaded on the Home and Vendor Portals.

XTRF automatically detects and assigns languages to the uploaded files using the following data:

  • project or quote source and target languages

  • job languages

  • content parsing of XLIFF, TM, and TB files.
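To give an idea of what content parsing can look like for one of these formats, the sketch below reads the language pair from an XLIFF 1.2 document, where the file element carries the source-language and target-language attributes. How XTRF parses files internally is not described here. The function name xliff_languages is hypothetical, and other formats and versions (for example XLIFF 2.0, TMX, or TBX) store their languages differently and are not handled.

```python
import xml.etree.ElementTree as ET


def xliff_languages(xml_text):
    """Return (source, target) language codes from an XLIFF 1.2 string.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Drop a "{namespace}" prefix, if any, before comparing tag names.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "file":
            return (
                element.get("source-language"),
                element.get("target-language"),
            )
    # No <file> element found: nothing to detect.
    return None, None


sample = (
    '<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">'
    '<file original="a.txt" source-language="en-US" '
    'target-language="de-DE" datatype="plaintext"><body/></file>'
    "</xliff>"
)
languages = xliff_languages(sample)
```

Here languages holds the pair declared on the first file element of the sample. A document without a file element yields (None, None), in which case the project or job languages would be the remaining sources of information.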

Files coming from integrated CAT tools contain language assignments defined in the CAT tool. The same languages are automatically assigned in XTRF. This applies to the following file categories:

  1. Bilingual Document

  2. Translation Memory

  3. Terminology

  4. CAT Analysis.


How to change the work files' language or category after upload

Work files are either source documents the client provides for translation or translated documents that will be delivered to the client once the project is finished.

If uploaded work files have been categorized incorrectly, you can modify their category or language at any point during project execution.

  1. Go to the Files card and select the checkbox next to the file in question.

  2. Click the File Actions button and select the Edit Properties option.

     

For source documents

In the Edit File Properties pop-up, select the correct category and source language and click the Save File Properties button.

For translated documents

In the Edit File Properties pop-up, change the category to Translated Document and select the target language. Then click the Save File Properties button.