File Format Reference

This page documents the Supervertaler Project Data Format (Universal Data Exchange Standard).

If you prefer the repository copy, see: https://github.com/Supervertaler/Supervertaler-Workbench/blob/main/docs/specifications/SUPERVERTALER_DATA_FORMAT.md


Supervertaler Project Data Format

Universal Data Exchange Standard

Date: October 14, 2025
Version: 1.0
Status: Implemented


Overview

The Supervertaler Project Data Format is a single six-column segment table that can be stored as either a DOCX or a TSV file. It serves as the standard for data exchange, archiving, and specialized workflows.


Core Format Specification

Column Structure (Version 1.0)

| Column | Type | Description | Required |
|---|---|---|---|
| ID | Integer | Segment identifier | ✅ Yes |
| Status | String | Translation status (untranslated, draft, translated, approved, locked) | ✅ Yes |
| Source | String | Source text | ✅ Yes |
| Target | String | Target text | ✅ Yes |
| Paragraph | Integer | Original paragraph ID | ✅ Yes |
| Notes | String | Translator/proofreader notes | ⚪ Optional |
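As an illustration, one row of this structure could be modeled as below. This is a minimal sketch, not Supervertaler's own code: the `Segment` class and `VALID_STATUSES` set are hypothetical names, and rejecting unknown status values is an assumption about how strictly the format is meant to be read.

```python
from dataclasses import dataclass

# Status values listed in the column description (hypothetical constant name).
VALID_STATUSES = {"untranslated", "draft", "translated", "approved", "locked"}


@dataclass
class Segment:
    """One row of the six-column structure (hypothetical class)."""
    id: int
    status: str
    source: str
    target: str
    paragraph: int
    notes: str = ""  # the only optional column

    def __post_init__(self):
        # Assumption: an unknown status is treated as an error.
        if self.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")


seg = Segment(id=1, status="draft", source="Hello", target="Hallo", paragraph=1)
```

Giving `notes` an empty-string default mirrors the table: every other column must be supplied.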

File Formats

1. DOCX Format

  • Structure: Word table with 6 columns

  • Style: "Light Grid Accent 1"

  • Headers: Bold text

  • Use case: Review, printing, Word-based workflows

2. TSV Format

  • Structure: Tab-separated values

  • Encoding: UTF-8

  • Header row: Column names

  • Use case: Excel analysis, scripting, data processing
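For scripting, a file with this layout can be written and read with Python's standard `csv` module. The sketch below assumes the header row uses the column names exactly as listed in the column structure; the helper names `write_tsv` and `read_tsv` are hypothetical, and how Supervertaler itself escapes tabs or line breaks inside a cell is not specified here.

```python
import csv
import io

# Assumed header names, taken from the column structure.
COLUMNS = ["ID", "Status", "Source", "Target", "Paragraph", "Notes"]


def write_tsv(rows, stream):
    """Write a header row followed by one tab-separated line per segment."""
    writer = csv.DictWriter(
        stream, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


def read_tsv(stream):
    """Read segments back as dictionaries; every value comes back as a string."""
    return list(csv.DictReader(stream, delimiter="\t"))


rows = [
    {"ID": 1, "Status": "translated", "Source": "Hello",
     "Target": "Hallo", "Paragraph": 1, "Notes": ""},
]
buffer = io.StringIO()  # stands in for a file on disk
write_tsv(rows, buffer)
buffer.seek(0)
loaded = read_tsv(buffer)
```

With a real file, open it with `encoding="utf-8"` and `newline=""` so the `csv` module controls line endings.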


Current Implementation

Export

Menu: Export > Supervertaler project data (DOCX/TSV)

Methods:

  • export_supervertaler_data() - Main entry point with format dialog

  • export_bilingual_docx_full() - DOCX export with all 6 columns

  • export_tsv() - TSV export with all 6 columns

Import

Status: Not yet implemented (planned feature)

Planned functionality:

  • Import DOCX or TSV files in Supervertaler project data format

  • Auto-detect format based on file extension

  • Validate column structure

  • Reimport with full metadata preservation
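Since import is not implemented yet, the following is only a sketch of how the first two checks might look. The function names `detect_format` and `validate_header` are hypothetical, and treating any column outside the six documented ones as a problem is an assumption.

```python
from pathlib import Path

REQUIRED = ["ID", "Status", "Source", "Target", "Paragraph"]
OPTIONAL = ["Notes"]


def detect_format(path):
    """Pick the format from the file extension, ignoring case."""
    suffix = Path(path).suffix.lower()
    if suffix == ".docx":
        return "docx"
    if suffix == ".tsv":
        return "tsv"
    raise ValueError(f"unsupported file extension: {suffix!r}")


def validate_header(header):
    """Return a list of problems; an empty list means the header is acceptable."""
    problems = [
        f"missing required column: {name}"
        for name in REQUIRED if name not in header
    ]
    problems += [
        f"unexpected column: {name}"
        for name in header if name not in REQUIRED + OPTIONAL
    ]
    return problems
```

Returning a list of problems rather than raising on the first one lets an import dialog show everything that is wrong with a file at once.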


Future Extensions

1. Proofreading Workflow 📝

Concept: Round-trip translation → proofreading → reimport

Export for Proofreading:

Import from Proofreading:

2. Termbase Integration 📚

Concept: Export segments as termbase entries

3. QA Workflow 🔍

Concept: Export for quality assurance checks

4. Segment Filtering & Partial Export 🎯

Concept: Export subsets based on filters
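One way such a filter could work is sketched below. It assumes segments are held as dictionaries keyed by the column names; `filter_segments` and its parameters are hypothetical, not an existing Supervertaler function.

```python
def filter_segments(segments, statuses=None, paragraphs=None):
    """Return the segments that match every filter that was given."""
    selected = []
    for seg in segments:
        if statuses is not None and seg["Status"] not in statuses:
            continue
        if paragraphs is not None and seg["Paragraph"] not in paragraphs:
            continue
        selected.append(seg)
    return selected


segments = [
    {"ID": 1, "Status": "draft", "Paragraph": 1},
    {"ID": 2, "Status": "approved", "Paragraph": 1},
    {"ID": 3, "Status": "untranslated", "Paragraph": 2},
]
# Everything that still needs translator attention.
needs_work = filter_segments(segments, statuses={"untranslated", "draft"})
```

The filtered list can then be passed to whichever export routine is in use, so a partial export needs no separate file format.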

5. Version Comparison 🔄

Concept: Compare two versions of same document
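Because every segment carries an ID, two exports of the same document can be matched row by row. The sketch below assumes IDs are stable between versions and that only the Target and Status columns matter for the comparison; `compare_versions` is a hypothetical helper.

```python
def compare_versions(old, new):
    """Return (segment ID, change kind) pairs describing how `new` differs from `old`."""
    old_by_id = {seg["ID"]: seg for seg in old}
    new_ids = {seg["ID"] for seg in new}
    changes = []
    for seg in new:
        before = old_by_id.get(seg["ID"])
        if before is None:
            changes.append((seg["ID"], "added"))
        elif (before["Target"], before["Status"]) != (seg["Target"], seg["Status"]):
            changes.append((seg["ID"], "changed"))
    # Segments present in the old version only.
    changes += [(seg["ID"], "removed") for seg in old if seg["ID"] not in new_ids]
    return changes


old = [
    {"ID": 1, "Status": "draft", "Target": "Hallo"},
    {"ID": 2, "Status": "draft", "Target": "Welt"},
]
new = [
    {"ID": 1, "Status": "approved", "Target": "Hallo"},
    {"ID": 3, "Status": "draft", "Target": "Neu"},
]
diff = compare_versions(old, new)
```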

6. Collaborative Translation 👥

Concept: Split project among multiple translators
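A split could use the Paragraph column so that no paragraph is divided between two people. The following is a sketch under that assumption: `split_by_paragraph` is a hypothetical helper, and it hands out contiguous blocks of paragraphs rather than balancing by word count.

```python
def split_by_paragraph(segments, translators):
    """Assign whole paragraphs to translators in contiguous blocks."""
    paragraphs = sorted({seg["Paragraph"] for seg in segments})
    # Paragraphs per translator, rounded up.
    per_translator = -(-len(paragraphs) // len(translators))
    position = {para: index for index, para in enumerate(paragraphs)}
    assignment = {name: [] for name in translators}
    for seg in segments:
        owner = translators[position[seg["Paragraph"]] // per_translator]
        assignment[owner].append(seg)
    return assignment


segments = [
    {"ID": 1, "Paragraph": 1},
    {"ID": 2, "Paragraph": 1},
    {"ID": 3, "Paragraph": 2},
    {"ID": 4, "Paragraph": 3},
]
parts = split_by_paragraph(segments, ["anna", "ben"])
```

Each part keeps the original segment IDs, so the separately translated files can later be merged back by ID.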
