Python is a very flexible language. Beyond that flexibility, it has many built-in functions and libraries that can accomplish in one or two lines of code a task that would otherwise take several. Because there are usually several ways to accomplish a task, I thought it would be a good idea to illustrate this by going over four ways that a programmer can count the number of words in a sentence.
The simplest way to count the words in a sentence is to use Python's built-in string method, split(). With no arguments, split() breaks a string on whitespace and returns the pieces as a list.
The pseudocode for the first function that can count the number of words in a string is:
- Define a string, S, that will be used in the program.
- Define the function, count_words, that will count the words in the string.
- The function uses Python's built-in split() method, which splits a string into a list.
- Define the variable, num_words, which uses split() to break the string into a list and then counts the number of elements in that list.
- When the function is completed, it returns the variable, num_words.
- The original string is printed out.
- The number of words, as indicated by the variable, num_words, is also printed out.
The code for this algorithm can be found in the screenshot below:
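As a minimal sketch of the steps above, the function might look like the following. The sample string and the wording of the printed output are assumptions made for illustration.

```python
# Sample sentence (an assumption chosen for illustration).
S = "Python is a very flexible language"


def count_words(sentence):
    """Return the number of whitespace-separated tokens in sentence."""
    # split() with no arguments splits on any run of whitespace
    # and returns the pieces as a list; len() counts them.
    num_words = len(sentence.split())
    return num_words


print(S)
print("Number of words:", count_words(S))
```

Because split() only looks at whitespace, a string such as "Hello , world !" is counted as four words by this function.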
The algorithm above does not exclude special characters from the word count: a stand-alone symbol such as "&" or "-" is counted as a word. If a string has special characters that should not be included in the word count, another algorithm will need to be used.
The next algorithm, which counts the number of words in a sentence while excluding special characters, utilises a Python library, re. Regular expressions (called REs, or regexes, or regex patterns) are essentially a tiny, highly specialised programming language embedded inside Python and made available through the re module.
The pseudocode for the algorithm below is as follows:
- Import the library, re.
- Define the string, S.
- Define the function, count_words_re, which will serve the purpose of counting the words in the string.
- The function re.findall returns a list of all the pieces of the string that match a pattern. Python's len function counts the length of that list and stores this information in the variable, num_words.
- When the function is complete, it returns the variable, num_words.
- Print the original sentence.
- Print the number of words in the original sentence.
The code for the solution using the library, re, can be found in the screenshot below:
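A minimal sketch of these steps is shown here. The pattern r"\w+" (one or more letters, digits, or underscores) and the sample string are assumptions; other patterns would work too.

```python
import re

# Sample sentence containing punctuation (an assumption for illustration).
S = "Hello, world! This is - a test."


def count_words_re(sentence):
    """Count runs of word characters, ignoring punctuation."""
    # re.findall returns a list of every non-overlapping match of the pattern.
    words = re.findall(r"\w+", sentence)
    num_words = len(words)
    return num_words


print(S)
print("Number of words:", count_words_re(S))
```

With this pattern the stand-alone "-" is not counted, so the sample sentence yields 6 rather than the 7 that split() alone would report. One trade-off is that an apostrophe also ends a match, so "don't" would be counted as two words.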
Another way to deal with special characters is to use string, a module in Python's standard library that provides string constants such as string.punctuation.
The pseudocode for this method is as follows:
- Import the library, string.
- Define the string, S.
- Define the function, count_words_string, which serves the purpose of counting the words in the string.
- Define the variable, num_words, which sums up the number of elements in a list, excluding special characters, after they have been split.
- When the function is completed, it returns the variable, num_words.
- Print the original sentence.
- Print the number of words in the sentence by calling up the function.
The code for the script can be found in the screenshot below:
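One way these steps could be written is sketched here. It assumes that "special characters" means the characters in string.punctuation, and the sample string is also an assumption.

```python
import string

# Sample sentence containing punctuation (an assumption for illustration).
S = "Hello, world! This is - a test."


def count_words_string(sentence):
    """Count tokens that still contain something after punctuation is stripped."""
    # Split on whitespace, strip leading and trailing punctuation from each
    # token, and count only the tokens that are not left empty.
    num_words = sum(
        1 for token in sentence.split() if token.strip(string.punctuation)
    )
    return num_words


print(S)
print("Number of words:", count_words_string(S))
```

Since only the ends of each token are stripped, a word with an internal apostrophe such as "don't" still counts as a single word.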
Although Python has a number of functions that can be used to count the words in a sentence, it might be nice to try the solution out using a for loop.
The pseudocode for solving this problem using a for loop is as follows:
- Import the library, re.
- Define the variable, S, which will be the string.
- Define the function, count_words_loop, which takes the string as its input and will count the number of words in a string.
- In the function, remove the extra spaces in the string by using the function, re.sub.
- Print out the original sentence.
- Define the variable, num_words, and set it to 0.
- Loop over each character in the string. If the character is a space, new line, tab, or full stop, it is deemed to mark the boundary of a word, so num_words is incremented by 1.
- When the iterations are complete, the function finishes and returns the variable, num_words.
- Print out the number of words in the sentence.
The code for the pseudocode above can be found in the screenshot below:
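A minimal sketch of the loop described above follows. The sample string is an assumption, and the sketch assumes a single sentence that ends in a full stop, since it counts separators rather than words.

```python
import re

# Sample sentence with extra spaces (an assumption for illustration).
S = "Python   is a very  flexible language."


def count_words_loop(sentence):
    """Count words by counting the separator characters in sentence."""
    # Collapse runs of spaces into a single space and trim both ends.
    sentence = re.sub(r" +", " ", sentence).strip()
    num_words = 0
    for char in sentence:
        # A space, new line, tab, or full stop is treated as a word boundary.
        if char in " \n\t.":
            num_words += 1
    return num_words


print(S)
print("Number of words:", count_words_loop(S))
```

Because every separator is counted, text made up of several sentences, or text containing a decimal such as 3.14, would be over-counted by this sketch.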
The example of the for loop that was mentioned above does not remove any special characters, although it does remove extra spaces. If special characters need to be removed from the sentence before the words are counted, then the code below needs to be used for this purpose:
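A sketch of how that variant might look is given here. The function name count_words_loop_clean, the sample string, and the choice of which characters count as "special" (anything other than letters, digits, underscores, whitespace, and the full stop) are all assumptions.

```python
import re

# Sample sentence containing special characters (an assumption for illustration).
S = "Hello, world! This is - a test."


def count_words_loop_clean(sentence):
    """Remove special characters, then count words by counting separators."""
    # Drop everything except word characters, whitespace, and the full stop.
    # The full stop is kept because the loop counts it as the final boundary.
    sentence = re.sub(r"[^\w\s.]", "", sentence)
    # Collapse the runs of spaces left behind and trim both ends.
    sentence = re.sub(r" +", " ", sentence).strip()
    num_words = 0
    for char in sentence:
        if char in " \n\t.":
            num_words += 1
    return num_words


print(S)
print("Number of words:", count_words_loop_clean(S))
```

As with the previous loop, this assumes one sentence ending in a full stop.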
To summarise, I have included five pieces of code that can count the words in a sentence and even remove special characters if needed. I know that there are more ways to count words, and I may even receive a message reflecting this, but I have given enough code to give the prospective programmer a quick start.
I did receive a message discussing Big O notation, which describes how the running time of a function or program grows as its input grows. Perhaps one day soon I will post something on Big O notation because that too is something that interviewers look for. Not only do they want an applicant to solve the problem, but they also want the program to execute quickly, which is where Big O notation comes in.