» CodepointIterator
…is a small C++ library for traversing Unicode codepoints in a UTF-8-encoded string.
It provides a class derived from std::iterator that implements the std::bidirectional_iterator_tag category.
The source code is available on both my Github profile and cgit.
For readers versed in German, a blog article describing the implementation in more detail is available.
Current features
- Bidirectional iteration through Unicode codepoints
- The class itself does not rely on any external libraries
- Dereferencing an instance of the iterator yields the codepoint as char32_t
- Unit tests based on GoogleTest
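To make the first and third feature more concrete, here is a minimal sketch of the byte-level logic that bidirectional codepoint traversal typically relies on. It is not taken from the library's source: the helper names is_continuation, decode_at, next_pos and prev_pos are hypothetical, and the sketch assumes the input is already valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration only; not part of CodepointIterator.

// True if the byte is a UTF-8 continuation byte (bit pattern 10xxxxxx).
inline bool is_continuation(unsigned char byte) {
    return (byte & 0xC0) == 0x80;
}

// Decodes the codepoint whose lead byte sits at s[pos].
// Assumes s is valid UTF-8 and pos points at a lead byte.
inline char32_t decode_at(const std::string& s, std::size_t pos) {
    assert(pos < s.size());
    const unsigned char lead = static_cast<unsigned char>(s[pos]);

    std::size_t extra = 0;  // number of continuation bytes that follow
    char32_t cp = lead;     // single-byte (ASCII) case
    if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }

    assert(pos + extra < s.size());
    for (std::size_t i = 1; i <= extra; ++i) {
        // Each continuation byte contributes its low six bits.
        cp = (cp << 6) | (static_cast<unsigned char>(s[pos + i]) & 0x3F);
    }
    return cp;
}

// Forward step: index of the next lead byte (or s.size() at the end).
inline std::size_t next_pos(const std::string& s, std::size_t pos) {
    assert(pos < s.size());
    ++pos;
    while (pos < s.size() && is_continuation(static_cast<unsigned char>(s[pos]))) {
        ++pos;
    }
    return pos;
}

// Backward step: index of the lead byte of the preceding codepoint.
inline std::size_t prev_pos(const std::string& s, std::size_t pos) {
    assert(pos > 0 && pos <= s.size());
    --pos;
    while (pos > 0 && is_continuation(static_cast<unsigned char>(s[pos]))) {
        --pos;
    }
    return pos;
}
```

Stepping backwards works because continuation bytes are distinguishable from lead bytes, so a decrement only has to skip bytes of the form 10xxxxxx until it reaches a lead byte.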
Usage example
While all features of this class are demonstrated by the GoogleTest-based unit tests, the following snippet shows a basic usage example of UTF8::CodepointIterator. The example text is written in Old Norse runes.
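    // Note: since C++20, u8 literals have type char8_t, so this line
    // compiles as written only with C++11/14/17.
    std::string test(u8"ᛖᚴ ᚷᛖᛏ ᛖᛏᛁ ᚧ ᚷᛚᛖᚱ ᛘᚾ ᚦᛖᛋᛋ ᚨᚧ ᚡᛖ ᚱᚧᚨ ᛋᚨᚱ");

    // Walk the string codepoint by codepoint and print each one.
    for ( UTF8::CodepointIterator iter(test.cbegin());
          iter != test.cend();
          ++iter ) {
        std::wcout << static_cast<wchar_t>(*iter);
    }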
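The snippet above prints through std::wcout, which may require a suitable locale to be set up, and wchar_t is only 16 bits wide on some platforms. As an alternative, each char32_t can be re-encoded as UTF-8 and written to std::cout. The following is a minimal sketch of such a helper; to_utf8 is a hypothetical name, not part of the library, and the function assumes the codepoint is a valid Unicode scalar value.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper for illustration only; not part of CodepointIterator.
// Encodes a single codepoint as a UTF-8 byte sequence.
// Assumes cp is a valid Unicode scalar value (no surrogates).
inline std::string to_utf8(char32_t cp) {
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {
        // One byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Inside the loop of the usage example, the output statement could then read std::cout << to_utf8(*iter); instead of the std::wcout line.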