UTF8
is a simple UTF-8 package for Carp. This nascent string replacement type lets you use many of the functions you know from Carp strings while working on Unicode runes rather than raw bytes.
Installation
You can obtain this library like so:
(load "git@github.com:carpentry-org/utf8.carp@0.0.5")
Usage
First, let’s define a UTF-8 string to work with!
(let [s (UTF8.from-string "hεllö")]
That’s a cute, short string! Hm, I wonder how long it is!
(length s) ; => 5
That’s surprising! So the length is the actual number of runes, and not the number of bytes, you say? Most curious!
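If you want to see the difference for yourself, a small program along these lines should do it. It is only a sketch under a few assumptions: that the library loads as shown in the installation section, that length is reachable as UTF8.length and takes a reference, and that Carp's built-in String.length reports bytes rather than runes.

```carp
(load "git@github.com:carpentry-org/utf8.carp@0.0.5")

(defn main []
  (let [raw "hεllö"
        s (UTF8.from-string raw)]
    (do
      ; number of runes, as in the example above
      (println* "runes: " (UTF8.length &s))
      ; assumption: String.length counts bytes, so ε and ö weigh two bytes each
      (println* "bytes: " (String.length raw)))))
```

If those assumptions hold, the second number should come out larger than the first, since two of the five runes need more than one byte.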
You know what, I want to see this second character there up close! It somehow looks all Greek to me!
(nth s 1) ; => ε
Hm, so this is what that looks like, huh? Interesting. And what’s its type?
(type UTF8.nth) ; => UTF8.nth : (λ [(Ref UTF8 a), Int] (Maybe Rune))
So, it’s called a Rune, huh? Hm, they don’t seem to be super interesting, but I seem to be able to compare them and stringify them, and even take their length in bytes! Quite delicious!
I wonder what else I can do with these functions?
ends-with?
(Fn [(Ref UTF8 a), (Ref UTF8 b)] Bool)
(ends-with? u sub)
checks if the string u ends with the string sub.
from-string
(Fn [(Ref String a)] UTF8)
(from-string s)
creates a UTF-8 string from a regular string.
nth
(Fn [(Ref UTF8 a), Int] (Maybe Rune))
(nth u n)
returns the nth rune from a UTF-8-encoded string.
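Since the signature returns a Maybe, callers have to deal with the case where no rune comes back. The following is a minimal sketch of that, assuming that an out-of-range index yields Nothing (the signature suggests this, but it is not stated here) and using Maybe.just? from Carp's core library to inspect the result.

```carp
(load "git@github.com:carpentry-org/utf8.carp@0.0.5")

(defn main []
  (let [s (UTF8.from-string "hεllö")]
    (do
      ; index 1 exists, so this should be a Just
      (println* "index 1 present? " (Maybe.just? &(UTF8.nth &s 1)))
      ; index 10 is past the end; assumption: nth answers Nothing here
      (println* "index 10 present? " (Maybe.just? &(UTF8.nth &s 10))))))
```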
prefix
(Fn [(Ref UTF8 a), Int] UTF8)
(prefix u n)
returns the first n characters of the string u.
set-runes!
(Fn [(Ref UTF8 a), (Array Rune)] ())
sets the runes property of a UTF8 in place.
slice
(Fn [(Ref UTF8 a), Int, Int] UTF8)
(slice u a b)
returns a substring of the string from the index a to the index b.
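The description leaves open whether the rune at index b is part of the result. One quick way to find out is to count the runes that come back. This sketch assumes UTF8.length exists as used in the usage section, and it deliberately prints the count rather than predicting it.

```carp
(load "git@github.com:carpentry-org/utf8.carp@0.0.5")

(defn main []
  (let [s (UTF8.from-string "hεllö")
        mid (UTF8.slice &s 1 3)]
    ; 2 runes would mean b is exclusive, 3 would mean it is inclusive
    (println* "runes in slice: " (UTF8.length &mid))))
```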
starts-with?
(Fn [(Ref UTF8 a), (Ref UTF8 b)] Bool)
(starts-with? u sub)
checks if the string u begins with the string sub.
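Both starts-with? and ends-with? take two UTF8 references, so the substring has to be converted with from-string first. A minimal sketch, assuming the functions are called through the UTF8 module prefix:

```carp
(load "git@github.com:carpentry-org/utf8.carp@0.0.5")

(defn main []
  (let [s (UTF8.from-string "hεllö")
        head (UTF8.from-string "hε")
        tail (UTF8.from-string "lö")]
    (do
      ; both arguments are passed by reference, as the signatures require
      (println* "starts with hε? " (UTF8.starts-with? &s &head))
      (println* "ends with lö? " (UTF8.ends-with? &s &tail)))))
```

If the functions behave as described, both checks should come out true.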
suffix
(Fn [(Ref UTF8 a), Int] UTF8)
(suffix u n)
returns the last n characters of the string u.
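prefix and suffix count in runes rather than bytes, which the following sketch tries to make visible. It assumes UTF8.length is available as in the usage section, and it feeds the results back into starts-with? and ends-with? as a sanity check.

```carp
(load "git@github.com:carpentry-org/utf8.carp@0.0.5")

(defn main []
  (let [s (UTF8.from-string "hεllö")
        front (UTF8.prefix &s 2)
        back (UTF8.suffix &s 2)]
    (do
      ; each piece should hold two runes, whatever their size in bytes
      (println* "prefix runes: " (UTF8.length &front))
      (println* "suffix runes: " (UTF8.length &back))
      ; a prefix of s should start s, and a suffix of s should end it
      (println* "prefix matches? " (UTF8.starts-with? &s &front))
      (println* "suffix matches? " (UTF8.ends-with? &s &back)))))
```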
unsafe-nth
(Fn [(Ref UTF8 a), Int] Rune)
(unsafe-nth u n)
returns the nth rune from a UTF-8-encoded string unsafely.
update-runes
(Fn [UTF8, (Ref (Fn [(Array Rune)] (Array Rune) a) b)] UTF8)
updates the runes property of a UTF8 using a function f.