Count the number of words in Python

3 ways in Python to count the words and characters of a string

In this Python programming tutorial, we will learn how to count the total number of words and characters in a given string. The program will take the string as input from the user, count the words and characters, and then it will print out the result.

Method 1: By splitting the string:

With this approach, we will split the string into words to count the total number of words. The program uses the following algorithm:

  1. Ask the user to enter a string.
  2. Split the string into words and save it in a variable.
  3. Calculate the total number of words, i.e. the total count of words in the string.
  4. Iterate over the words of the string one by one. For each word, count the number of characters and add that value to a final character counter variable. We need to initialize this variable as 0 at the start of the program.
  5. Print out the calculated total count of words and characters to the user.

Python program to count the words and characters in a string by splitting the string:

    # 1
    word_count = 0
    char_count = 0

    # 2
    usr_input = input("Enter a string : ")

    # 3
    split_string = usr_input.split()

    # 4
    word_count = len(split_string)

    # 5
    for word in split_string:
        # 6
        char_count += len(word)

    # 7
    print("Total words : {}".format(word_count))
    print("Total characters : {}".format(char_count))


The commented numbers in the above program denote the step numbers below:

  1. Create two variables to store the final word count and the character count. The variables are word_count and char_count. Initialize these variables as 0.
  2. Ask the user to enter a string. Read and assign the string to the usr_input variable.
  3. Split the user-provided string with the split() method. It will split the string into words and return one list with these words.
  4. Count the total number of words in the list. Assign this to the variable word_count.
  5. By using a for loop, iterate over the words in split_string.
  6. While iterating over the list, add the length of each word to the variable char_count. We are using the built-in len() function to find the length of each word.
  7. The word count is assigned to the word_count variable, and the character count is assigned to the char_count variable. Print out these values to the user.

Sample output:

    Enter a string : Hello world
    Total words : 2
    Total characters : 10

    Enter a string : The quick brown fox jumps over the lazy dog
    Total words : 9
    Total characters : 35

    Enter a string : abc def gh ij k l mno pqr stu vwx yz
    Total words : 11
    Total characters : 26

    Enter a string : a b c d e f g h i j k l m n o p q r s t u v w x y z
    Total words : 26
    Total characters : 26
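
The steps above can also be wrapped in a function, which makes them easier to reuse and to check without typing input by hand. This is a minimal sketch, not part of the original program; the function name count_words_and_chars is a hypothetical name chosen for illustration.

```python
def count_words_and_chars(text):
    """Return (word_count, char_count) for text, splitting on whitespace.

    Hypothetical helper: the name and the tuple return value are
    assumptions made for this sketch.
    """
    words = text.split()          # split the string into words
    word_count = len(words)       # number of words
    char_count = 0
    for word in words:            # visit each word
        char_count += len(word)   # add the length of the word
    return word_count, char_count


print(count_words_and_chars("Hello world"))
# (2, 10)
print(count_words_and_chars("The quick brown fox jumps over the lazy dog"))
# (9, 35)
```

Note that the character count only covers the characters inside words; the whitespace between them is not counted.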


Method 2: With list comprehension:


The above example is using a for loop to calculate the length of the words. With list comprehension, we can calculate the character count of each word in a simpler and more concise way. Let’s write the program:

    word_count = 0
    char_count = 0

    usr_input = input("Enter a string : ")

    # Split on whitespace, then count the words
    word_list = usr_input.split()
    word_count = len(word_list)

    # Sum the length of every word with a list comprehension
    char_count = sum([len(c) for c in word_list])

    print("Total words : {}".format(word_count))
    print("Total characters : {}".format(char_count))

Here, we calculate the word count by using the split() method and the len() function. The character count is calculated by adding the length of each word in the list. The word count and character count values are assigned to the word_count and char_count variables respectively. If you run this program, it will print output similar to the first method:

    Enter a string : a bc de
    Total words : 3
    Total characters : 5
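
It may help to see exactly which characters this method counts. The sketch below uses an assumed sample string with extra whitespace and compares the sum of word lengths with the length of the whole string. It also uses a generator expression instead of a list comprehension, which gives the same sum without building a temporary list.

```python
text = "a  bc\tde"   # two spaces and one tab between the words

words = text.split()

# Characters inside words only (whitespace is ignored)
chars_in_words = sum(len(word) for word in words)

# Every character in the string, whitespace included
chars_total = len(text)

print(words)            # ['a', 'bc', 'de']
print(chars_in_words)   # 5
print(chars_total)      # 8
```

If whitespace should be part of the character count, len() on the original string is the simpler choice.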

Method 3: By using a regular expression:

With a regular expression, we can filter out the words of a string and get one list of all the words. The total number of words and the total number of characters can be calculated easily if we filter out the words. For example:

    import re

    word_count = 0
    char_count = 0

    usr_input = input("Enter a string : ")

    # Find every run of word characters (letters, digits, underscore)
    word_list = re.findall(r"\w+", usr_input)
    word_count = len(word_list)
    char_count = sum([len(c) for c in word_list])

    print("Total words : {}".format(word_count))
    print("Total characters : {}".format(char_count))

  • The program imports the re module to work with regular expressions.
  • The findall function finds all the words of the string that match the regular expression. It returns one list of words, which is assigned to the word_list variable.
  • The total word count is equal to the length of word_list. The len() function is used to find the length of the word list.
  • The character count is calculated by finding the sum of the lengths of the words in the list. It is assigned to the char_count variable.

If you run this program, it will print similar output.

    Enter a string : a bc fl lk
    Total words : 4
    Total characters : 7
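
The split-based and regex-based methods agree on simple input, but they can differ once punctuation is involved. The following sketch, using an assumed sample sentence, shows the difference: split() keeps punctuation attached to the words, while \w+ drops it and also breaks words at apostrophes.

```python
import re

text = "Hello, world! It's 2024."

split_words = text.split()
regex_words = re.findall(r"\w+", text)

print(split_words)   # ['Hello,', 'world!', "It's", '2024.']
print(regex_words)   # ['Hello', 'world', 'It', 's', '2024']

# Word count and character count for each method
print(len(split_words), sum(len(w) for w in split_words))   # 4 21
print(len(regex_words), sum(len(w) for w in regex_words))   # 5 17
```

Which result is "right" depends on what should count as a word in your text, so it is worth checking both on a sample of real input.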

In this tutorial, we have learned how to find the count of words and characters in a string in Python. Counting words is a common first step in text-processing programs, in Python or any other programming language. You can use a similar approach to count the total number of words and characters in a file.
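
As a rough sketch of how the same approach could be applied to a file, the example below reads a file line by line and accumulates both counts. The function name count_file, the UTF-8 encoding, and the temporary demo file are assumptions made for illustration.

```python
import os
import tempfile


def count_file(path):
    """Return (word_count, char_count) for a text file, read line by line.

    Hypothetical helper; assumes the file is UTF-8 encoded text.
    """
    word_count = 0
    char_count = 0
    with open(path, encoding="utf-8") as file:
        for line in file:
            words = line.split()
            word_count += len(words)
            char_count += sum(len(word) for word in words)
    return word_count, char_count


# Demo with a throwaway file so the sketch runs anywhere
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("Hello world\nThe quick brown fox\n")
    demo_path = tmp.name

print(count_file(demo_path))   # (6, 26)
os.remove(demo_path)
```

Reading line by line keeps memory use low for large files, since only one line is held at a time.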


Python: Count Words in a String or File


In this tutorial, you’ll learn how to use Python to count the number of words and word frequencies in both a string and a text file. Being able to count words and word frequencies is a useful skill. For example, knowing how to do this can be important in text classification machine learning algorithms.

By the end of this tutorial, you’ll have learned:

  • How to count the number of words in a string
  • How to count the number of words in a text file
  • How to calculate word frequencies using Python

Reading a Text File in Python

The processes for counting words and calculating word frequencies shown below are the same whether you are working with a string or an entire text file. Because of this, this section will briefly describe how to read a text file in Python.

If you want a more in-depth guide on how to read a text file in Python, check out this tutorial here. Here is a quick piece of code that you can use to load the contents of a text file into a Python string:

    # Reading a Text File in Python
    file_path = '/Users/datagy/Desktop/sample_text.txt'

    with open(file_path) as file:
        text = file.read()

I encourage you to check out the tutorial to learn why and how this approach works. However, if you’re in a hurry, just know that the process opens the file, reads its contents, and then closes the file again.
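
To make the open, read, and close sequence concrete, here is a small sketch that first writes a sample file to a temporary directory (an assumption, so the example does not depend on a real path) and then reads it back. The explicit encoding="utf-8" argument is also an addition of this sketch.

```python
import os
import tempfile

# Create a small sample file in a temporary directory
sample_path = os.path.join(tempfile.mkdtemp(), "sample_text.txt")
with open(sample_path, "w", encoding="utf-8") as file:
    file.write("Welcome to datagy!\nHere you will learn Python.")

# Read it back: the with block closes the file automatically on exit
with open(sample_path, encoding="utf-8") as file:
    text = file.read()

print(file.closed)   # True
print(text)
```

After the with block ends, the file object is closed, but the text variable still holds the contents as a regular string.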

Count Number of Words In Python Using split()

One of the simplest ways to count the number of words in a Python string is by using the split() method. The method's signature looks like this:

    # Understanding the split() method
    str.split(
        sep=None,     # The delimiter to split on (None means any whitespace)
        maxsplit=-1   # The maximum number of splits (-1 means no limit)
    )

By default, Python will consider runs of consecutive whitespace to be a single separator. This means that if our string had multiple spaces, they’d only be considered a single delimiter. Let’s see what this method returns:

    # Splitting a string with .split()
    text = 'Welcome to datagy! Here you will learn Python and data science.'
    print(text.split())

    # Returns: ['Welcome', 'to', 'datagy!', 'Here', 'you', 'will', 'learn', 'Python', 'and', 'data', 'science.']

We can see that the method now returns a list of items. Because we can use the len() function to count the number of items in a list, we’re able to generate a word count. Let’s see what this looks like:

    # Counting words with .split()
    text = 'Welcome to datagy! Here you will learn Python and data science.'
    print(len(text.split()))

    # Returns: 11
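
The default whitespace handling described earlier is what makes split() reliable for word counts. The sketch below, using an assumed sample string with irregular whitespace, contrasts the default with an explicit single-space separator and shows the effect of maxsplit.

```python
text = "Welcome  to \n datagy!"   # two spaces, then space + newline + space

# Default: any run of whitespace is a single separator
print(text.split())             # ['Welcome', 'to', 'datagy!']

# Explicit ' ' separator: every single space splits, leaving empty strings
print(text.split(" "))          # ['Welcome', '', 'to', '\n', 'datagy!']

# maxsplit limits the number of splits; the rest stays in one piece
print(text.split(maxsplit=1))   # ['Welcome', 'to \n datagy!']
```

For word counting, calling split() with no arguments is usually the safer choice, because an explicit " " separator would count the empty strings and the newline as words.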

Count Number of Words In Python Using Regex

Another simple way to count the number of words in a Python string is to use the regular expressions library, re. The library comes with a function, findall(), which returns every match of a pattern in a string.

Because we use regular expressions to search for patterns, we must first define our pattern. In this case, we want runs of alphanumeric characters.

For this, we can use the pattern \w+, where \w matches any word character (a letter, a digit, or the underscore) and the + denotes one or more occurrences. A match ends at the first character that is not a word character, such as whitespace or punctuation.

Let’s see how we can use this method to generate a word count using the regular expressions library, re:

    # Counting words with regular expressions
    import re

    text = 'Welcome to datagy! Here you will learn Python and data science.'
    print(len(re.findall(r'\w+', text)))

    # Returns: 11
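
The \w+ pattern has a few edge cases worth knowing about. The sketch below uses an assumed sample string to show that underscores and digits are treated as word characters, while an apostrophe splits a word in two. The second pattern, [\w']+, is one possible variation (not from the original tutorial) that keeps contractions together.

```python
import re

text = "snake_case, v2 and don't"

# \w+ keeps underscores and digits, but splits at the apostrophe
print(re.findall(r"\w+", text))      # ['snake_case', 'v2', 'and', 'don', 't']

# Adding the apostrophe to the character class keeps "don't" whole
print(re.findall(r"[\w']+", text))   # ['snake_case', 'v2', 'and', "don't"]
```

The variation is not perfect either: it would also match a stray apostrophe or quoted text such as 'hello' with its quotes attached, so the right pattern depends on the text being processed.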

Calculating Word Frequencies in Python

In order to calculate word frequencies, we can use either the defaultdict class or the Counter class. Word frequencies represent how often a given word appears in a piece of text.

Using defaultdict To Calculate Word Frequencies in Python

Let’s see how we can use defaultdict to calculate word frequencies in Python. The defaultdict class extends the regular Python dictionary by automatically creating a default value for any missing key.

Because of this, we can loop over a piece of text and count the occurrences of each word. Let’s see how we can use it to create word frequencies for a given string:

    # Creating word frequencies with defaultdict
    from collections import defaultdict
    import re

    text = 'welcome to datagy! datagy will teach data. data is fun. data data data!'

    counts = defaultdict(int)
    for word in re.findall(r'\w+', text):
        counts[word] += 1

    print(counts)

    # Returns:
    # defaultdict(<class 'int'>, {'welcome': 1, 'to': 1, 'datagy': 2, 'will': 1, 'teach': 1, 'data': 5, 'is': 1, 'fun': 1})

Let’s break down what we did here:

  1. We imported both the defaultdict class and the re library
  2. We loaded some text and instantiated a defaultdict using the int factory function
  3. We then looped over each word in the word list and added one for each time it occurred
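
To see what defaultdict saves, it can help to compare it with a plain dictionary. In the sketch below, the plain-dict version has to handle missing keys by hand with .get(), while defaultdict(int) starts every missing key at 0 automatically. The variable names are assumptions chosen for illustration.

```python
import re
from collections import defaultdict

text = 'welcome to datagy! datagy will teach data. data is fun. data data data!'
words = re.findall(r'\w+', text)

# Plain dict: the missing-key case is handled explicitly with .get()
plain_counts = {}
for word in words:
    plain_counts[word] = plain_counts.get(word, 0) + 1

# defaultdict(int): a missing key starts at int(), which is 0
default_counts = defaultdict(int)
for word in words:
    default_counts[word] += 1

print(plain_counts == default_counts)   # True
print(default_counts['data'])           # 5
```

Both versions produce the same counts; defaultdict simply removes the need to think about the first occurrence of each word.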

Using Counter to Create Word Frequencies in Python

Another way to do this is to use the Counter class. The benefit of this approach is that we can easily identify the most frequent word. Let’s see how we can use this approach:

    # Creating word frequencies with Counter
    from collections import Counter
    import re

    text = 'welcome to datagy! datagy will teach data. data is fun. data data data!'

    counts = Counter(re.findall(r'\w+', text))
    print(counts)

    # Returns:
    # Counter({'data': 5, 'datagy': 2, 'welcome': 1, 'to': 1, 'will': 1, 'teach': 1, 'is': 1, 'fun': 1})

Let’s break down what we did here:

  1. We imported our required libraries and classes
  2. We passed the resulting list from the findall() function into the Counter class
  3. We printed the result of this class

One of the perks of this is that we can easily find the most common word by using the .most_common() method. The method returns a list of tuples, ordered from most common to least common. Because of this, we can simply access index 0 to find the most common word:

    # Finding the Most Common Word
    from collections import Counter
    import re

    text = 'welcome to datagy! datagy will teach data. data is fun. data data data!'

    counts = Counter(re.findall(r'\w+', text))
    print(counts.most_common()[0])

    # Returns:
    # ('data', 5)
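
Two small refinements are often useful here. The sketch below, using an assumed sample string with mixed capitalization, lowercases the text before counting so that different spellings of the same word are merged, and passes a number to .most_common() to get only the top entries.

```python
from collections import Counter
import re

text = 'Data is fun. data is everywhere. DATA!'

# Lowercase first so that 'Data', 'data', and 'DATA' count as one word
counts = Counter(re.findall(r'\w+', text.lower()))

# Ask for the two most common words only
print(counts.most_common(2))   # [('data', 3), ('is', 2)]
```

Whether to lowercase depends on the task: for most frequency counts it is helpful, but it would hide distinctions such as proper nouns.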

Conclusion

In this tutorial, you learned how to generate word counts and word frequencies using Python. You learned a number of different ways to count words, including using the .split() method and the re library. Then, you learned different ways to generate word frequencies using defaultdict and Counter. Using the Counter class, you were able to find the most frequent word in a string.
