C++: splitting strings

2015/11/08

Tags: c++, string operations, split strings, strings, tokenize, split

There are multiple ways of splitting or tokenizing strings in C++. Below I’ll list the four approaches that seem to be the most used and/or useful.

The C-style strtok

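#include <cstdio>
#include <cstring>

int main()
{
  // strtok writes '\0' into the buffer it scans, so str must be a writable array
  char str[] = "The quick brown fox jumps over the lazy dog";
  char* pch = strtok(str, " ");
  while (pch != NULL)
  {
    printf("%s\n", pch);
    // passing NULL continues from where the previous call stopped
    pch = strtok(NULL, " ");
  }
  return 0;
}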

Using the C++ std::stringstream class

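#include <iostream>
#include <sstream>
#include <string>

int main() {
    std::string input = "The quick brown fox jumps over the lazy dog";
    std::stringstream ss(input);
    std::string item;
    // getline reads up to the next ' ' and discards the delimiter
    while (std::getline(ss, item, ' ')) {
        std::cout << item << std::endl;
    }
    return 0;
}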

Using std::string methods only

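#include <iostream>
#include <string>

int main() {
    std::string input = "The quick brown fox jumps over the lazy dog";
    std::string strSplit = " ";
    size_t pos = 0;
    size_t start = 0;
    std::string subStr;
    // find each delimiter and cut out the text between it and the previous one
    while ((pos = input.find(strSplit, start)) != std::string::npos) {
        subStr = input.substr(start, pos - start);
        start = pos + strSplit.size();
        std::cout << subStr << std::endl;
    }
    // whatever remains after the last delimiter is the final token
    subStr = input.substr(start);
    std::cout << subStr << std::endl;
    return 0;
}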
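
The loop above is easy to wrap into a reusable function. Here is a minimal sketch of how that might look; the name split and the choice to return a std::vector are my own assumptions for illustration, not something from a library.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (name and signature are assumptions): splits `input` on
// every occurrence of `delimiter` and returns the pieces in order.
std::vector<std::string> split(const std::string& input, const std::string& delimiter) {
    std::vector<std::string> tokens;
    // find("") matches at every position, so guard against looping forever
    if (delimiter.empty()) {
        tokens.push_back(input);
        return tokens;
    }
    std::size_t start = 0;
    std::size_t pos = 0;
    while ((pos = input.find(delimiter, start)) != std::string::npos) {
        tokens.push_back(input.substr(start, pos - start));
        start = pos + delimiter.size();
    }
    // the piece after the last delimiter (possibly empty)
    tokens.push_back(input.substr(start));
    return tokens;
}
```

Calling split("a, b, c", ", ") should give the three tokens a, b and c. Note that two delimiters in a row produce an empty token, same as in the loop above.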

Using the boost libraries

#include <iostream>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>
#include <boost/algorithm/string/iter_find.hpp>

int main() {
    std::string input = "The quick brown fox jumps over the lazy dog";
    std::string strSplit = " ";
    std::vector<std::string> stringVector;
    // first_finder locates each occurrence of strSplit; iter_split stores the text in between
    boost::iter_split(stringVector, input, boost::first_finder(strSplit));
    for (const auto& item : stringVector) {
        std::cout << item << std::endl;
    }
    return 0;
}

All these methods do the same job, each having its pros and cons (which won’t be explained here - Google knows best :) ).
The first two methods accept only single characters as delimiters (strtok takes a set of them, std::getline exactly one), while the last two can also split on whole words.
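
To make the single-character limitation a bit more concrete, here is a rough sketch that wraps the first two methods in functions. The names splitWithStrtok and splitWithGetline are made up for this example. It also shows one difference worth knowing: strtok treats every character of its delimiter string as a separate delimiter and skips empty tokens, while std::getline keeps an empty token between two consecutive delimiters.

```cpp
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes with strtok. The input is copied into a
// writable, null-terminated buffer because strtok overwrites delimiters with '\0'.
std::vector<std::string> splitWithStrtok(const std::string& input, const char* delimiters) {
    std::vector<char> buffer(input.begin(), input.end());
    buffer.push_back('\0');
    std::vector<std::string> tokens;
    // each character in `delimiters` counts as a delimiter on its own
    for (char* tok = std::strtok(buffer.data(), delimiters); tok != NULL;
         tok = std::strtok(NULL, delimiters)) {
        tokens.push_back(tok);
    }
    return tokens;
}

// Hypothetical helper: tokenizes with std::getline, which accepts exactly one
// delimiter character.
std::vector<std::string> splitWithGetline(const std::string& input, char delimiter) {
    std::vector<std::string> tokens;
    std::stringstream ss(input);
    std::string item;
    while (std::getline(ss, item, delimiter)) {
        tokens.push_back(item);
    }
    return tokens;
}
```

With these, splitWithStrtok("a, b,,c", ", ") should return a, b and c (the ", " is read as "comma or space", not as the two-character word), while splitWithGetline("a,,b", ',') should return a, an empty string, and b.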