[Lazarus] Unicode RTL for FPC

Michael Van Canneyt michael at freepascal.org
Thu Jan 12 10:24:42 CET 2023

On Thu, 12 Jan 2023, Rolf Wetjen via lazarus wrote:

> Hi Michael,
> I'm interested in this but I expect that I'm not the expert you are 
> looking for. Some time ago ( Lazarus 1.8 or even earlier) I made a 
> directory sync program for my own use for Windows which is aware of 
> Unicode names in the file system.

Free Pascal is already aware of these names. If you use UnicodeString, all
file system routines will use the native Windows Unicode APIs.

You don't need the unicode RTL for that. However, you will need to convert
the names to UTF-8 for display in the Lazarus GUI, as it uses UTF-8.

> I tried to follow your instruction but I failed as git is a pain for me:
>  - Update your git clone
>     git pull https://gitlab.com/freepascal.org/lazarus/lazarus.git 
> lazarus. "lazarus" is my target folder.
> - switch to branch unicodertl
>     git branch --list gives only one branch: main
> Can you please show in detail what to do?

Lazarus itself has not been adapted at this point.
The instructions were meant for Free Pascal itself, not Lazarus.

If you nonetheless want to try FPC:
- Create .fpc-unicodertl.cfg as per instructions in my first mail.

- Update/clone FPC:

git clone https://gitlab.com/freepascal.org/fpc/source.git fpc

- Switch to the unicodertl branch (inside the clone):

cd fpc
git switch unicodertl

- Create a unicode-RTL-capable FPC compiler:

make all

- Optionally, install this compiler:

make install

- Use the compiler to create the unicode RTL:

cd rtl

make clean all SUB_TARGET=unicodertl PP=/path/to/newly/compiled/compiler
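
A side note on the branch step: in a fresh clone, "git branch --list" shows
only local branches, so it will normally list just the default branch even
when the remote has more; remote branches show up with "git branch -r".
Below is a small sketch of that behaviour. It uses a throwaway local
repository as a stand-in for the real one, so the paths, the demo identity
and the empty commit are all made up for illustration.

```shell
set -e
work=$(mktemp -d)

# Hypothetical stand-in for the upstream repository: a "main" branch
# plus a "unicodertl" branch, both pointing at one empty commit.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/main
git -c user.name=demo -c user.email=demo@example.invalid \
    commit -q --allow-empty -m "initial commit"
git branch unicodertl

# A fresh clone only creates a local branch for the default branch.
git clone -q "$work/upstream" "$work/clone"
cd "$work/clone"
local_branches=$(git branch --list --format='%(refname:short)')
remote_branches=$(git branch -r --format='%(refname:short)')
echo "local:  $local_branches"
echo "remote: $remote_branches"

# "git switch" creates the local branch from origin/unicodertl on demand.
git switch -q unicodertl
current=$(git branch --show-current)
echo "now on: $current"
```

So "git switch unicodertl" can work even though "git branch --list" did not
show the branch beforehand, as long as the remote you cloned from has it.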

> Do you plan a full Unicode (up to four bytes per codepoint as far as I 
> remember) or a DBCS (double byte character set) version? I don't know 
> what Windows uses and what Delphi does.

I'm just leveraging the existing UnicodeString support of FPC, 
which mimics the Delphi/Windows support.

I'm not that much of a Unicode expert, but as far as I understand it is UTF-16,
meaning that a codepoint takes either two or four bytes. A full Unicode
codepoint can therefore take up to 2 Pascal characters (a surrogate pair):
a single Pascal 'Char' can never represent all Unicode codepoints.
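
To make the "up to 2 characters" point concrete, here is a minimal shell
sketch of the standard UTF-16 encoding rule. It is not FPC code, and the
function name utf16_units is made up for illustration; it does no
validation of its input.

```shell
# Print the UTF-16 code unit(s), in hex, for one Unicode codepoint.
# Codepoints below 0x10000 fit in a single 16-bit unit; anything above
# is split into a high and a low surrogate (two 16-bit units, four bytes).
utf16_units() {
    cp=$(($1))
    if [ "$cp" -lt 65536 ]; then
        printf '%04X\n' "$cp"
    else
        v=$((cp - 0x10000))
        printf '%04X %04X\n' \
            $((0xD800 + (v >> 10))) \
            $((0xDC00 + (v & 0x3FF)))
    fi
}

utf16_units 0x20AC     # euro sign: one code unit
utf16_units 0x1F600    # an emoji outside the BMP: two code units
```

The first call prints 20AC and the second prints D83D DE00, so the second
codepoint would occupy two 16-bit characters in a UTF-16 string.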

