Latest changes

04/09/2015
Version 0.5
Segments where source = translation can now be recorded

04/12/2014
Version 0.4.10
Fixed one incorrect Swedish abbreviation

05/04/2012
Version 0.4.9
Improvement of Swedish abbreviations

01/04/2012
Version 0.4.8
Improvement of Swedish abbreviations

24/03/2012
Version 0.4.7
Swedish abbreviations provided by Per Tunedal

08/03/2011
Version 0.4.6
Source and target languages can be entered interactively.
Thanks to Yves Moy for contributing the Python dialog.

25/02/2011
Version 0.4.5
Bug fix: A BOM could be inserted at the beginning of the first aligned segments.
Thanks to Yves Moy for reporting it.
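
One way to keep a byte order mark out of the first segment is to drop it while decoding. This is a minimal sketch, assuming UTF-8 input; the helper name strip_bom is hypothetical and is not taken from bligner.py.

```python
import codecs

def strip_bom(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading byte order mark if present."""
    # The "utf-8-sig" codec skips a leading BOM and otherwise behaves like "utf-8".
    return data.decode("utf-8-sig")

with_bom = codecs.BOM_UTF8 + "First sentence.".encode("utf-8")
without_bom = "First sentence.".encode("utf-8")
print(strip_bom(with_bom) == strip_bom(without_bom))  # True
```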

05/11/2006
Version 0.4.4
Correction of a bug introduced in the previous version:
< and > were not transformed properly.

30/10/2006
Version 0.4.3
Better conformity with XML/TMX standard:
& is transformed to &amp;
" is transformed to &quot;
Illegal characters x00-x08, x0B, x0C, x0E-x1F are removed
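
The clean-up described above can be sketched as follows. The function name clean_for_tmx is an assumption made for illustration, and the handling of < and > is added on the basis of the 0.4.4 entry; this is not the code from bligner.py.

```python
import re

# Control characters that XML 1.0 does not allow (tab, LF and CR are legal).
ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F]")

def clean_for_tmx(text: str) -> str:
    """Remove illegal control characters and escape XML special characters."""
    text = ILLEGAL_XML_CHARS.sub("", text)
    # Escape "&" first so the entities produced below are not escaped twice.
    text = text.replace("&", "&amp;")
    text = text.replace('"', "&quot;")
    text = text.replace("<", "&lt;").replace(">", "&gt;")
    return text

print(clean_for_tmx('Tom & "Jerry"\x07 <b>'))
# Tom &amp; &quot;Jerry&quot; &lt;b&gt;
```

The order matters: if "&" were escaped last, the "&" of every entity already written would be escaped a second time.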

Short documentation (in English and in French) is included in the package.

19/09/2006
A syntax error prevented bligner.py from running.
Thanks to Martine for reporting it.

05/07/2006
Version 0.4.1
Segment exceptions containing "\s" (e.g., "z\.\sB\.") were not working.

14/04/2006
Version 0.4
When segmentation is sentence-based, the loop now runs up to the larger of the source and target sentence counts instead of over the source sentences only.
As a result, when source and target contain different numbers of sentences, all the sentences are used.
If a sentence does not exist (in either source or target), the last existing one is used instead.
The "sentence number" (e.g., [10]) is added at the end of the sentence, and is repeated for each re-use when the sentence is used several times (e.g., [10][11][12]).
Based on an enhancement request from Jean-Christophe Helary.
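
The pairing rule can be illustrated with a short sketch. The function name pair_sentences is hypothetical, and the sketch leaves out the sentence-number suffix; it only shows the loop over the longer list and the re-use of the last existing sentence.

```python
def pair_sentences(source, target):
    """Pair sentences by index, re-using the last sentence of the shorter list."""
    pairs = []
    if not source or not target:
        return pairs  # nothing to align
    for i in range(max(len(source), len(target))):
        # Fall back to the last existing sentence when the index runs past the end.
        src = source[min(i, len(source) - 1)]
        tgt = target[min(i, len(target) - 1)]
        pairs.append((src, tgt))
    return pairs

print(pair_sentences(["One.", "Two.", "Three."], ["Un."]))
# [('One.', 'Un.'), ('Two.', 'Un.'), ('Three.', 'Un.')]
```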

Correction of a bug in the Python version:
When segmentation was sentence-based, and a sentence existed several times in a document, the split pattern was duplicated increasingly for the subsequent target sentences.
As a result, the split pattern remained repeatedly in the subsequent target sentences.

13/03/2006
Version 0.3.1
When the "Pattern after" of a break expression was not "\s", the text it matched was lost from the second sentence after the split.
Added a specific handling when the pattern is not "\s".

03/03/2006
Version 0.3
"SRX-like" segmentation, more or less compatible
with OmegaT: "\s" is not hard-coded anymore as the "Pattern after".
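
As an illustration of a rule made of a "Pattern before" and a "Pattern after", here is a minimal sketch. The function split_on_rule and its parameter names are assumptions, not the code of bligner.py; a lookahead is used so that the text matched by the pattern after stays in the following segment.

```python
import re

def split_on_rule(text, pattern_before, pattern_after):
    """Split text wherever pattern_before is immediately followed by pattern_after."""
    # The break point sits between the two patterns: the text matched by
    # pattern_before ends one segment, and the text matched by pattern_after
    # (checked with a lookahead, so it is not consumed) starts the next one.
    rule = re.compile("(?:%s)(?=%s)" % (pattern_before, pattern_after))
    segments = []
    start = 0
    for match in rule.finditer(text):
        segments.append(text[start:match.end()])
        start = match.end()
    if start < len(text):
        segments.append(text[start:])
    return segments

print(split_on_rule("One. Two? Three", r"[.?!]", r"\s"))
# ['One.', ' Two?', ' Three']
print(split_on_rule("a.B.c", r"\.", r"[A-Z]"))
# ['a.', 'B.c']
```

Joining the returned segments gives back the original text, so nothing matched by either pattern is lost.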

14/02/2006
Enhanced "algorithm" for segmentation: faster and much shorter.

Various small changes in the Perl version

creationtoolversion="0.2"
in TMX header

18/01/2006
Perl port: bligner.pl

creationtool="bligner.py"
creationtoolversion="0.1.3"
in TMX header

Removed pipes ("|") in segment.
They shouldn't have been there, since the expression used to segment is "[segment]\s".
As a consequence, "\|" was no longer needed in nosegment.

16/01/2006
creationtoolversion="0.1.2"
in TMX header

Corrected typos in nosegment expression.

28/12/2005
Corrected a typo in a comment.

15/12/2005

Modified:
creationtoolversion="0.1.1"
in TMX header

cleanline = cleanline.replace('*', '\\*') was missing for the source text, which caused an error message when segmenting a source text containing "*".
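
The fix above escapes one character at a time. As an alternative, the standard library function re.escape escapes every regular-expression metacharacter in one call. The sketch below only illustrates that alternative; the helper find_literal is hypothetical and is not part of bligner.py.

```python
import re

def find_literal(needle, haystack):
    """Return the index of needle in haystack, using a regex built from literal text."""
    # re.escape backslash-escapes every regex metacharacter ("*", "|", ".", ...),
    # so the text is matched literally instead of being parsed as a pattern.
    match = re.search(re.escape(needle), haystack)
    return match.start() if match else -1

print(find_literal("2 * 3", "Compute 2 * 3 | 4."))  # 8
```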

03/12/2005

Modified:
creationtool="bligner"
creationtoolversion="0.1"
in TMX header

Added "sequence" or "paragraph" as the segtype value in TMX header

Reverted to Source and Target language in Uppercase, for compatibility with project_save.tmx

01/12/2005

Added version number

Added Inc. The "\" escapes were missing for a.m. and p.m., preventing segmentation in some cases.

nosegment += "|Inc\.|U\.S\.|i\.\e\.|Tel\.|a\.m\.|p\.m\.|\|"
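
To show why such exceptions are needed, here is a minimal sketch of sentence splitting with a no-break list. The names NOSEGMENT, BREAK and segment, the break rule and the shortened exception list are all assumptions made for illustration; bligner.py may apply its nosegment expression differently.

```python
import re

# Hypothetical exception list: abbreviations after which no break should occur.
# Every "." is escaped so that it matches a literal full stop only.
NOSEGMENT = re.compile(r"(?:Inc\.|U\.S\.|i\.e\.|Tel\.|a\.m\.|p\.m\.)$")
# Assumed break rule: sentence-ending punctuation followed by whitespace.
BREAK = re.compile(r"[.?!](?=\s)")

def segment(text):
    """Split text into sentences, skipping breaks that follow an exception."""
    segments, start = [], 0
    for match in BREAK.finditer(text):
        candidate = text[start:match.end()]
        if NOSEGMENT.search(candidate):
            continue  # the break follows an abbreviation: keep going
        segments.append(candidate.strip())
        start = match.end()
    rest = text[start:].strip()
    if rest:
        segments.append(rest)
    return segments

print(segment("We open at 9 a.m. on Monday. Call Acme Inc. for details."))
# ['We open at 9 a.m. on Monday.', 'Call Acme Inc. for details.']
```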

Added cleanline = cleanline.replace('|', '\\|') to the list of cleans
Otherwise, there was an error message when trying to segment a text containing "|"
