<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    On 03/05/17 17:53, Sven Barth via Lazarus wrote:<br>

    <blockquote

cite="mid:CAFMUeB-Pxwp_G-JKD12Yf1EmMdQ=p0+7v4JkD4GZyUUU3L4jfg@mail.gmail.com"

      type="cite">

      <p>On 03.05.2017 14:37, "Tony Whyman via Lazarus" &lt;<a

          moz-do-not-send="true"

          href="mailto:lazarus@lists.lazarus-ide.org">lazarus@lists.lazarus-ide.org</a>&gt; wrote:<br>

        &gt; On the other hand, AnsiString and UnicodeString are still

        separate types. Why? Why should there not be a single unified

        string type with (e.g.) ASCII, UTF8 and UTF-16 (or MS Unicode)

        being just another code page?</p>

      <p>Because indexed access to the string data would slow down quite

        a bit as the RTL would need to determine whether the string is a

        1-byte, 2-byte, 4-byte or multi-byte string. Yes, the compiler

        could do optimizations for this inside loops, but it would

        definitely slow down -O- code.</p>

      <p>Regards,<br>

        Sven</p>


    </blockquote>

    <br>

    <p>I don't believe that string indexing even works per character for
      UTF-8 strings at present, at least not in a simple s[i] way: s[i]
      returns the i-th byte, not the i-th character.</p>
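    <p>As a small illustration of that distinction (in Python rather than
      Pascal, purely because it is easy to run; the sample word is my own
      choice), indexing the UTF-8 bytes yields a code unit, while indexing
      the decoded text yields a character:</p>

```python
# Illustration only: byte indexing versus character indexing of UTF-8 data.
text = "na\u00efve"              # "naive" with a diaeresis on the i: 5 characters
data = text.encode("utf-8")      # 6 bytes, because U+00EF takes two bytes

third_byte = data[2]             # 0xC3, only the first half of the character
third_char = text[2]             # the whole character U+00EF
```

    <p>Note that Python indexes from 0, whereas Pascal strings index
      from 1.</p>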

    <p>Is it really that much overhead to have a simple code page check
      before calling the correct function to index a string? The obvious
      optimisation would be to check for UTF-8 first, then UTF-16, then
      the default code page, and then the rest, or perhaps UTF-16 first
      on Windows. In register-level code this amounts to very few actual
      machine-level operations.<br>

    </p>
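    <p>A minimal sketch of the kind of dispatch I have in mind, written in
      Python only so that it can be run as is. TaggedString and char_at are
      hypothetical names, not RTL identifiers; the code page numbers are the
      usual Windows identifiers; and every code page other than UTF-8 and
      UTF-16 is assumed here to be single-byte Latin-1 for brevity:</p>

```python
# Sketch only: a string tagged with its code page, indexed by dispatching on the tag.
# TaggedString and char_at are hypothetical names, not FPC RTL identifiers.
from dataclasses import dataclass

CP_UTF8 = 65001    # Windows code page identifier for UTF-8
CP_UTF16 = 1200    # Windows code page identifier for UTF-16LE


@dataclass
class TaggedString:
    data: bytes
    codepage: int


def char_at(s: TaggedString, i: int) -> str:
    """Return the character at 1-based position i, as Pascal's s[i] would."""
    if i < 1:
        raise IndexError(i)
    if s.codepage == CP_UTF8:
        # Variable width: walk the bytes, counting lead bytes
        # (any byte that is not a 10xxxxxx continuation byte).
        seen = 0
        start = None
        for pos, b in enumerate(s.data):
            if (b & 0xC0) != 0x80:
                seen += 1
                if seen == i:
                    start = pos
                elif seen == i + 1:
                    return s.data[start:pos].decode("utf-8")
        if start is None:
            raise IndexError(i)
        return s.data[start:].decode("utf-8")
    if s.codepage == CP_UTF16:
        # Fixed two-byte units (surrogate pairs are ignored in this sketch).
        unit = s.data[2 * (i - 1):2 * i]
        if len(unit) != 2:
            raise IndexError(i)
        return unit.decode("utf-16-le")
    # Assumption: everything else is a single-byte code page (Latin-1 here).
    if i > len(s.data):
        raise IndexError(i)
    return s.data[i - 1:i].decode("latin-1")
```

    <p>The check itself is one or two integer comparisons per access; the
      real cost in the UTF-8 branch is the walk, not the dispatch.</p>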

    <p>To me, a unified string type would have the advantage that:</p>

    <p>- You would have only one managed string type, "string" (and hence
      avoid the confusion that exists today).<br>

    </p>

    <p>- You would have standard string byte length and string character
      length functions (and yes, the latter would need a code page check
      as above).</p>

    <p>- String indexing could be standardised as always returning the
      character at position 'i' (including for UTF-8 strings, albeit
      after having to "walk" the string).<br>

    </p>

    <p>- Automatic transliteration on string compare (with a code page
      check, of course), and perhaps with the option to specify a
      non-standard collation.<br>

    </p>

    <p>- Readily portable code.</p>

    <p>- The only time that a programmer has to think about the

      character encoding is when writing code that interacts directly

      with an external interface.</p>
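    <p>To make the length and comparison points above a little more
      concrete, here is a rough sketch, again in Python so that it runs as
      is. UString, byte_length, char_length and same_text are hypothetical
      names, the code page table is a small assumed subset, and collation
      and Unicode normalisation are deliberately left out:</p>

```python
# Sketch only: length and comparison helpers for a code-page-tagged string.
# UString, byte_length, char_length and same_text are hypothetical names.
from dataclasses import dataclass

# Code page identifiers mapped to Python codec names (a small, assumed subset).
CODECS = {65001: "utf-8", 1200: "utf-16-le", 1252: "cp1252", 28591: "latin-1"}


@dataclass
class UString:
    data: bytes
    codepage: int


def byte_length(s: UString) -> int:
    """Storage size in bytes; no code page check is needed."""
    return len(s.data)


def char_length(s: UString) -> int:
    """Number of code points, which does depend on the code page."""
    return len(s.data.decode(CODECS[s.codepage]))


def same_text(a: UString, b: UString) -> bool:
    """Compare two strings, converting only when the code pages differ."""
    if a.codepage == b.codepage:
        return a.data == b.data          # fast path: plain byte comparison
    return a.data.decode(CODECS[a.codepage]) == b.data.decode(CODECS[b.codepage])
```

    <p>With this shape the caller never names an encoding; only the code
      that constructs a UString from an external source has to.</p>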

    <p>How often would that extra lookup be significant compared with
      the benefits that unified string handling would bring? And there
      is no reason why you could not retain the UnicodeString type for
      cases where you really need to optimise UTF-16 handling.<br>

    </p>

    <p>I see the unified string type as a further extension of
      AnsiString to include UTF-16 and UCS-2 code pages, together with
      appropriate function support.<br>

    </p>

    <p>Tony<br>

    </p>

    <br>

  </body>

</html>