Splitting text in Python

Split strings in Python (delimiter, line break, regex, etc.)

This article explains how to split strings by delimiters, line breaks, regular expressions, and the number of characters in Python.

Split by delimiter: split()

Use the split() method to split by delimiter.

If the argument is omitted, it splits by whitespace (spaces, newlines \n, tabs \t, etc.) and treats consecutive whitespace as a single separator.

A list of the words is returned.

    s_blank = 'one two three\nfour\tfive'
    print(s_blank)
    # one two three
    # four	five

    print(s_blank.split())
    # ['one', 'two', 'three', 'four', 'five']

    print(type(s_blank.split()))
    # <class 'list'>
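Two details of the no-argument form are easy to miss: leading and trailing whitespace is ignored, and an empty or whitespace-only string gives an empty list. The short comparison below (the variable name s_padded is only for illustration) contrasts it with splitting on a single space.

```python
s_padded = '  one  two  '

# No argument: runs of whitespace act as one separator,
# and leading/trailing whitespace produces no empty strings
print(s_padded.split())
# ['one', 'two']

# An explicit single space: every space is a separator
print(s_padded.split(' '))
# ['', '', 'one', '', 'two', '', '']

# Empty or whitespace-only input gives an empty list
print(''.split())
# []
print('   '.split())
# []
```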

Use join(), described below, to concatenate a list into a string.

Specify the delimiter: sep

Specify a delimiter for the first parameter, sep.

    s_comma = 'one,two,three,four,five'

    print(s_comma.split(','))
    # ['one', 'two', 'three', 'four', 'five']

    print(s_comma.split('three'))
    # ['one,two,', ',four,five']
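Unlike the no-argument form, an explicit sep keeps the empty fields between adjacent delimiters, and it cannot be an empty string. A small sketch (the sample string s_csv is made up for illustration):

```python
s_csv = 'one,,three,'

# Adjacent and trailing delimiters produce empty strings
print(s_csv.split(','))
# ['one', '', 'three', '']

# An empty string with an explicit sep still returns one element
print(''.split(','))
# ['']

# An empty separator is rejected with ValueError
try:
    'abc'.split('')
except ValueError as e:
    print('ValueError:', e)
```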

To specify multiple delimiters, use regular expressions as described later.

Specify the maximum number of splits: maxsplit

Specify the maximum number of splits for the second parameter, maxsplit.

If maxsplit is given, at most maxsplit splits are done (thus, the returned list will have at most maxsplit + 1 elements).

    s_comma = 'one,two,three,four,five'

    print(s_comma.split(',', 2))
    # ['one', 'two', 'three,four,five']
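The limit also behaves predictably at the edges: maxsplit=0 performs no split at all, and the default of -1 (any negative value) means no limit. maxsplit can also be passed by keyword. A quick check with the same sample string:

```python
s_comma = 'one,two,three,four,five'

# maxsplit=0: no splitting, the whole string is the only element
print(s_comma.split(',', maxsplit=0))
# ['one,two,three,four,five']

# Negative maxsplit (the default is -1): no limit
print(s_comma.split(',', maxsplit=-1))
# ['one', 'two', 'three', 'four', 'five']

# The length never exceeds maxsplit + 1 (there are only 4 commas here)
print([len(s_comma.split(',', n)) for n in range(6)])
# [1, 2, 3, 4, 5, 5]
```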

For example, maxsplit is helpful for removing the first line from a string.

If you specify sep='\n' and maxsplit=1, you get a list of strings split at the first newline character \n. The second element, [1], of this list is the string without the first line. Since it is the last element, it can also be accessed with [-1].

    s_lines = 'one\ntwo\nthree\nfour'
    print(s_lines)
    # one
    # two
    # three
    # four

    print(s_lines.split('\n', 1))
    # ['one', 'two\nthree\nfour']

    print(s_lines.split('\n', 1)[0])
    # one

    print(s_lines.split('\n', 1)[1])
    # two
    # three
    # four

    print(s_lines.split('\n', 1)[-1])
    # two
    # three
    # four

Similarly, to delete the first two lines:

    print(s_lines.split('\n', 2)[-1])
    # three
    # four
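One caveat with the [-1] form: if the string has fewer line breaks than maxsplit, the last element is simply the last line, so nothing gets removed. The helper drop_first_lines() below is a hypothetical name used only for this sketch; it returns an empty string when every line would be dropped.

```python
def drop_first_lines(text, n):
    """Return text without its first n lines (hypothetical helper)."""
    parts = text.split('\n', n)
    # parts has n + 1 elements only if text contains at least n newlines
    return parts[n] if len(parts) > n else ''

s_lines = 'one\ntwo\nthree\nfour'
print(drop_first_lines(s_lines, 2))
# three
# four

# The plain [-1] form leaves a single-line string unchanged
print('one'.split('\n', 1)[-1])
# one
print(repr(drop_first_lines('one', 1)))
# ''
```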

Split from right by delimiter: rsplit()

rsplit() splits from the right of the string.

The result differs from split() only when the maxsplit parameter is provided.
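A quick way to see this: without maxsplit both methods return the same list, while with maxsplit=1 they leave the unsplit remainder on opposite sides. Splitting a file name at its last dot is a typical case (the file name here is just sample data).

```python
s = 'archive.tar.gz'

# No maxsplit: identical results
print(s.split('.') == s.rsplit('.'))
# True

# maxsplit=1: split() cuts at the first dot, rsplit() at the last
print(s.split('.', 1))
# ['archive', 'tar.gz']
print(s.rsplit('.', 1))
# ['archive.tar', 'gz']
```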

Just as split() can remove the first line, rsplit() can remove the last line.

    s_lines = 'one\ntwo\nthree\nfour'

    print(s_lines.rsplit('\n', 1))
    # ['one\ntwo\nthree', 'four']

    print(s_lines.rsplit('\n', 1)[0])
    # one
    # two
    # three

    print(s_lines.rsplit('\n', 1)[1])
    # four

To delete the last two lines:

    print(s_lines.rsplit('\n', 2)[0])
    # one
    # two

Split by line break: splitlines()

There is also splitlines(), which splits a string at line boundaries.

As shown in the previous examples, split() and rsplit() split the string by whitespace, including line breaks, by default. You can also specify line breaks explicitly using the sep parameter.

However, using splitlines() is often more suitable.

For example, consider a string that contains both \n (LF, used on Unix-like systems, including macOS) and \r\n (CR + LF, used on Windows).

    s_lines_multi = '1 one\n2 two\r\n3 three\n'
    print(s_lines_multi)
    # 1 one
    # 2 two
    # 3 three

When split() is called without arguments, it splits not only at line breaks but also at spaces.

    print(s_lines_multi.split())
    # ['1', 'one', '2', 'two', '3', 'three']

Because sep accepts only a single separator string, split() may not work as expected when the string mixes newline styles: here, \r is left at the end of '2 two'. A trailing newline also produces an empty string at the end of the list.

    print(s_lines_multi.split('\n'))
    # ['1 one', '2 two\r', '3 three', '']

splitlines() splits at the various line-break characters but not at other whitespace.

    print(s_lines_multi.splitlines())
    # ['1 one', '2 two', '3 three']

If the first argument, keepends, is set to True, each line in the result keeps its line-break character(s).

    print(s_lines_multi.splitlines(True))
    # ['1 one\n', '2 two\r\n', '3 three\n']
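splitlines() also recognizes line boundaries beyond \n and \r\n, such as a lone \r, the vertical tab \v and the form feed \f, and it does not add an empty string for a trailing line break. A comparison with split('\n') on a made-up string:

```python
s = 'a\rb\vc\fd\n'

print(s.splitlines())
# ['a', 'b', 'c', 'd']

# split('\n') sees only \n, so the other boundaries stay in the text
print(s.split('\n'))
# ['a\rb\x0bc\x0cd', '']

# Empty input: splitlines() gives [], split('\n') gives ['']
print(''.splitlines(), ''.split('\n'))
# [] ['']
```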

Split by regex: re.split()

split() and rsplit() split only where sep matches exactly.

To split wherever a regular expression (regex) matches instead of at an exact string, use split() from the re module.

In re.split(), specify the regex pattern as the first parameter and the target string as the second.

Here's an example of splitting a string at runs of digits:

    import re

    s_nums = 'one1two22three333four'

    # A raw string (r'...') keeps the backslash in \d intact
    print(re.split(r'\d+', s_nums))
    # ['one', 'two', 'three', 'four']

The maximum number of splits can be specified with the third parameter, maxsplit. Passing it positionally is deprecated since Python 3.13, so pass it as a keyword argument.

    print(re.split(r'\d+', s_nums, maxsplit=2))
    # ['one', 'two', 'three333four']
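Two behaviors of re.split() are worth knowing. If the pattern contains a capturing group, the matched delimiters are included in the result, and a match at the very start or end of the string yields an empty string there. A small sketch with sample strings:

```python
import re

s_nums = 'one1two22three333four'

# Capturing group: the delimiters are kept in the result
print(re.split(r'(\d+)', s_nums))
# ['one', '1', 'two', '22', 'three', '333', 'four']

# A match at the start or end produces an empty string
print(re.split(r'\d+', '1one2'))
# ['', 'one', '']
```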

Split by multiple different delimiters

These two examples are helpful to remember, even if you are not familiar with regex:

Enclosing characters in [] matches any single one of them, so you can split a string by several different characters.

    s_marks = 'one-two+three#four'

    print(re.split('[-+#]', s_marks))
    # ['one', 'two', 'three', 'four']
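If delimiters can appear back to back, adding + after the character class treats a run of them as one separator, much like split() does for whitespace, and \s can be added to cover whitespace too. The sample string s_mixed is made up for this sketch:

```python
import re

s_mixed = 'one--two+#three four'

# Without +, adjacent delimiters leave empty strings
print(re.split('[-+#]', 'one--two'))
# ['one', '', 'two']

# With + (and \s for whitespace), each run counts as one separator
print(re.split(r'[-+#\s]+', s_mixed))
# ['one', 'two', 'three', 'four']
```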

Separating patterns with | matches any one of them. Each pattern may use regex special characters, but plain strings also work as they are, so you can split by several different strings.

    s_strs = 'oneXXXtwoYYYthreeZZZfour'

    print(re.split('XXX|YYY|ZZZ', s_strs))
    # ['one', 'two', 'three', 'four']
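Plain strings work as they are only if they contain no regex special characters such as ., | or +. When the delimiters come from a list, one approach is to escape each with re.escape() and join them with |. split_by_strings() is a hypothetical helper for this sketch; it also sorts delimiters longest first, because alternatives are tried left to right and a shorter delimiter could otherwise match the start of a longer one.

```python
import re

def split_by_strings(text, delimiters):
    """Split text on any of the given literal strings (hypothetical helper).

    Assumes delimiters is a non-empty list of non-empty strings.
    """
    # Longest first, so '--' is tried before '-'
    ordered = sorted(delimiters, key=len, reverse=True)
    # re.escape() makes characters such as '.', '|', '+' match literally
    pattern = '|'.join(re.escape(d) for d in ordered)
    return re.split(pattern, text)

print(split_by_strings('one.two|three+four', ['.', '|', '+']))
# ['one', 'two', 'three', 'four']

print(split_by_strings('a--b-c', ['-', '--']))
# ['a', 'b', 'c']
```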

Concatenate a list of strings

The previous examples split a string into a list.

To concatenate a list of strings into one string, use the string method join().

Call join() on the separator string and pass the list of strings to concatenate.

    l = ['one', 'two', 'three']

    print(','.join(l))
    # one,two,three

    print('\n'.join(l))
    # one
    # two
    # three

    print(''.join(l))
    # onetwothree
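join() accepts only strings, so a list containing numbers raises TypeError; converting the items with str() first, for example via map(), avoids that. Joining with the same separator also reverses split(). A sketch with sample values:

```python
nums = [1, 2, 3]

# ','.join(nums) would raise TypeError because the items are ints
print(','.join(map(str, nums)))
# 1,2,3

# split() followed by join() with the same separator restores the string
s = 'one,two,three'
print(','.join(s.split(',')) == s)
# True
```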

Split based on the number of characters: slice

Use slicing to split strings based on the number of characters.

    s = 'abcdefghij'

    print(s[:5])
    # abcde

    print(s[5:])
    # fghij

The split results can be obtained as a tuple or assigned to individual variables.

    s_tuple = s[:5], s[5:]

    print(s_tuple)
    # ('abcde', 'fghij')

    print(type(s_tuple))
    # <class 'tuple'>

    s_first, s_last = s[:5], s[5:]

    print(s_first)
    # abcde

    print(s_last)
    # fghij

    s_first, s_second, s_last = s[:3], s[3:6], s[6:]

    print(s_first)
    # abc

    print(s_second)
    # def

    print(s_last)
    # ghij

The number of characters can be obtained with the built-in function len(). You can also use it to split a string in half.

    half = len(s) // 2
    print(half)
    # 5

    s_first, s_last = s[:half], s[half:]

    print(s_first)
    # abcde

    print(s_last)
    # fghij
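The same slicing idea extends to fixed-width chunks: step through the string with range() and slice n characters at a time; the last chunk is shorter when the length is not a multiple of n. split_every() is a hypothetical helper name used for this sketch.

```python
def split_every(s, n):
    """Split s into chunks of n characters (hypothetical helper); n must be positive."""
    return [s[i:i + n] for i in range(0, len(s), n)]

print(split_every('abcdefghij', 3))
# ['abc', 'def', 'ghi', 'j']

# With an odd length, len(s) // 2 puts the extra character in the second half
s_odd = 'abcde'
half = len(s_odd) // 2
print(s_odd[:half], s_odd[half:])
# ab cde
```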

If you want to concatenate strings, use the + operator.

    print(s_first + s_last)
    # abcdefghij

Python .split() – Splitting a String in Python

Dionysia Lemonaki


In this article, you will learn how to split a string in Python.

Firstly, I’ll introduce you to the syntax of the .split() method. After that, you will see how to use the .split() method with and without arguments, using code examples along the way.

What Is The .split() Method in Python? .split() Method Syntax Breakdown

You use the .split() method for splitting a string into a list.

The general syntax for the .split() method looks something like the following:

    string.split(separator, maxsplit)
  • string is the string you want to split. This is the string on which you call the .split() method.
  • The .split() method accepts two arguments, both optional.
  • The first optional argument is separator, which specifies what to split the string on. If it is not provided (or is None), the string is split on any run of whitespace, and leading and trailing whitespace is ignored.
  • The second optional argument is maxsplit, which specifies the maximum number of splits the .split() method should perform. If it is not provided, the default value is -1, meaning there is no limit and .split() splits the string at every occurrence of separator.
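Note that separator is a descriptive name used in this article: the actual keyword parameters of str.split() are sep and maxsplit, and a keyword named separator is rejected. A short sketch:

```python
s = 'a b c'

# Keyword arguments use the names sep and maxsplit
print(s.split(sep=' ', maxsplit=1))
# ['a', 'b c']

# sep=None is the same as omitting it (split on any whitespace)
print(s.split(sep=None) == s.split())
# True

# The descriptive name is not a valid keyword
try:
    s.split(separator=' ')
except TypeError as e:
    print('TypeError:', e)
```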

The .split() method returns a new list of substrings, and the original string is not modified in any way.

How Does The .split() Method Work Without Any Arguments?

Here is how you would split a string into a list using the .split() method without any arguments:

    coding_journey = "I am learning to code for free with freeCodecamp!"

    # split string into a list and save result into a new variable
    coding_journey_split = coding_journey.split()

    print(coding_journey)
    print(coding_journey_split)

    # check the data type of coding_journey_split by using the type() function
    print(type(coding_journey_split))

    # output
    # I am learning to code for free with freeCodecamp!
    # ['I', 'am', 'learning', 'to', 'code', 'for', 'free', 'with', 'freeCodecamp!']
    # <class 'list'>

The output shows that each word that makes up the string is now a list item, and the original string is preserved.

When you don’t pass either of the two arguments that the .split() method accepts, then by default, it will split the string every time it encounters whitespace until the string comes to an end.

What happens when you don’t pass any arguments to the .split() method, and it encounters several consecutive whitespace characters instead of just one?

    coding_journey = "I love   coding"

    coding_journey_split = coding_journey.split()

    print(coding_journey_split)

    # output
    # ['I', 'love', 'coding']

In the example above, I added several consecutive spaces between the word love and the word coding. When this is the case, the .split() method treats the run of consecutive whitespace as a single separator.

How Does The .split() Method Work With The separator Argument?

As you saw earlier, when there is no separator argument, the default value for it is whitespace. That said, you can set a different separator .

The separator will break and divide the string wherever it encounters the separator you specify, which can be a single character or a longer string, and will return a list of substrings.

For example, you could make it so that a string splits whenever the .split() method encounters a dot (.):

    fave_website = "www.freecodecamp.org"

    fave_website_split = fave_website.split(".")

    print(fave_website_split)

    # output
    # ['www', 'freecodecamp', 'org']

In the example above, the string splits whenever .split() encounters a dot.

Keep in mind that I didn’t specify a dot followed by a space. That wouldn’t work since the string doesn’t contain a dot followed by a space:

    fave_website = "www.freecodecamp.org"

    fave_website_split = fave_website.split(". ")

    print(fave_website_split)

    # output
    # ['www.freecodecamp.org']

Now, let’s revisit the last example from the previous section.

When there was no separator argument, consecutive whitespace characters were treated as a single separator.

However, when you specify a single space as the separator , then the string splits every time it encounters a single space character:

    coding_journey = "I love   coding"

    coding_journey_split = coding_journey.split(" ")

    print(coding_journey_split)

    # output
    # ['I', 'love', '', '', 'coding']

In the example above, .split() split the string at every single space character. Because there are three spaces between love and coding, the empty strings between adjacent spaces were added as list items.
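If you need an explicit single-space separator but don't want the empty strings, a list comprehension can filter them out. This is not quite the same as calling .split() with no arguments, which also splits on tabs and newlines. A sketch using the same sample sentence plus a variant with a tab:

```python
coding_journey = "I love   coding"

# Keep only the non-empty pieces
words = [w for w in coding_journey.split(" ") if w]
print(words)
# ['I', 'love', 'coding']

# A tab is not a space, so it stays inside a piece here...
print([w for w in "I\tlove coding".split(" ") if w])
# ['I\tlove', 'coding']

# ...while .split() with no arguments treats it as whitespace
print("I\tlove coding".split())
# ['I', 'love', 'coding']
```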

How Does The .split() Method Work With The maxsplit Argument?

When there is no maxsplit argument, there is no specified limit for when the splitting should stop.

In the first example of the previous section, .split() split the string each and every time it encountered the separator until it reached the end of the string.

However, you can limit how many splits are performed.

For example, you could specify that only one split happens, at the first dot:

    fave_website = "www.freecodecamp.org"

    fave_website_split = fave_website.split(".", 1)

    print(fave_website_split)

    # output
    # ['www', 'freecodecamp.org']

In the example above, I set the maxsplit to 1 , and a list was created with two list items.

Because maxsplit is 1, .split() split the string at the first dot only. After that one split, the operation ended, and the rest of the string became a list item on its own.
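A common use of maxsplit=1 is splitting a key=value pair where the value itself may contain the separator; the two-element result can be unpacked directly. str.partition() is a related built-in method that always returns a 3-tuple, even when the separator is missing. The setting string below is only sample data.

```python
setting = "greeting=Hello=World"

# Split only at the first '=' and unpack the two parts
key, value = setting.split("=", 1)
print(key)
# greeting
print(value)
# Hello=World

# partition() returns (before, separator, after)
print(setting.partition("="))
# ('greeting', '=', 'Hello=World')

# If the separator is absent, partition() still returns three items
print("no_equals".partition("="))
# ('no_equals', '', '')
```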

Conclusion

And there you have it! You now know how to split a string in Python using the .split() method.

I hope you found this tutorial helpful.

To learn more about the Python programming language, check out freeCodeCamp’s Python certification.

You’ll start from the basics and learn in an interactive and beginner-friendly way. You’ll also build five projects at the end to put into practice and help reinforce what you’ve learned.

Thank you for reading, and happy coding!
