Strings
How text is represented, and the common patterns used to manipulate strings efficiently.
A string is essentially an array of characters. However, because we use strings so often, many languages treat them specially, often making them immutable. Understanding the performance implications of how strings are stored and modified is key to writing efficient code.
Immutability and the Concat Trap
In many high-level languages (like Python, Java, and JavaScript), strings are immutable. Once created, they cannot be changed in place.
When you do s += "!", the language doesn’t just append a character. It:
- Allocates a new buffer large enough for the result.
- Copies the entire old string into it.
- Appends the new character.
If you build a string of length by adding one character at a time in a loop,
you perform copies, which is total work. To
avoid this, always use a mutable buffer (like StringBuilder or an array of
characters) and join at the very end.
Internal Representation
- Contiguous Memory: Like arrays, characters in a string are stored next to
each other. This makes indexing
s[i]an operation. - Encoding: A character isn’t always one byte. ASCII uses 1 byte, but UTF-8 (the web standard) uses 1 to 4 bytes per character. This means “length” can refer to bytes or to visible characters.
Common Operations
- Search (substring): Finding “cat” in a long text. The naive approach is , but algorithms like KMP or Boyer-Moore can do it in .
- Reverse: Usually done in by swapping characters from both ends toward the middle.
Takeaways
- Strings are character arrays, but often immutable in modern languages.
- Repeated concatenation on immutable strings leads to performance.
- Indexing is , but substring operations require more careful algorithms.