- String count with overlapping occurrences [closed]
- 25 Answers 25
- Look-ahead regular expressions
- Generate all substrings
- How to find a pattern in another string with overlapping
- Метод string count() в Python
- Параметры
- Возвращаемое значение
- Пример 1: Подсчитать количество вхождений данной подстроки
- Пример 2: Подсчитать количество появлений данной подстроки, используя начало и конец
String count with overlapping occurrences [closed]
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
What’s the best way to count the number of occurrences of a given string, including overlap in Python? This is one way:
def function(string, str_to_search_for): count = 0 for x in xrange(len(string) - len(str_to_search_for) + 1): if string[x:x+len(str_to_search_for)] == str_to_search_for: count += 1 return count function('1011101111','11')
The canonical is Count number of occurrences of a substring in a string, which covers how to count both overlapping and non-overlapping occurrences.
25 Answers 25
Well, this might be faster since it does the comparing in C:
def occurrences(string, sub): count = start = 0 while True: start = string.find(sub, start) + 1 if start > 0: count+=1 else: return count
>>> import re >>> text = '1011101111' >>> len(re.findall('(?=11)', text)) 5
If you didn’t want to load the whole list of matches into memory, which would never be a problem! you could do this if you really wanted:
>>> sum(1 for _ in re.finditer('(?=11)', text)) 5
As a function ( re.escape makes sure the substring doesn’t interfere with the regex):
def occurrences(text, sub): return len(re.findall('(?=)'.format(re.escape(sub)), text))
Could you clarify why it would never be a problem? In practice, if there were a lot of matches or the matched substrings were very large, it could cause a lot of memory usage, no?
@wjandrea I think maybe I should have said «which would probably never be a problem» because I provided both the iter and non iter solution. But in this case the text is already in memory so I thought that getting the list would be fine
You can also try using the new Python regex module, which supports overlapping matches.
import regex as re def count_overlapping(text, search_for): return len(re.findall(search_for, text, overlapped=True)) count_overlapping('1011101111','11') # 5
There’s nothing wrong with import regex — documentation shows that approach. regex , however, has all the same components as the standard library re , so I prefer writing re.compile , etc. in way that is familiar and concise. Also, most of the time I end up using regex I started with re and then found some use case I want to rely on regex . I can then update import re to import regex as re at the top of the file and not have to make other changes.
Python’s str.count counts non-overlapping substrings:
In [3]: "ababa".count("aba") Out[3]: 1
Here are a few ways to count overlapping sequences, I’m sure there are many more 🙂
Look-ahead regular expressions
In [10]: re.findall("a(?=ba)", "ababa") Out[10]: ['a', 'a']
Generate all substrings
In [11]: data = "ababa" In [17]: sum(1 for i in range(len(data)) if data.startswith("aba", i)) Out[17]: 2
def count_substring(string, sub_string): count = 0 for pos in range(len(string)): if string[pos:].startswith(sub_string): count += 1 return count
This could be the easiest way.
A fairly pythonic way would be to use list comprehension here, although it probably wouldn’t be the most efficient.
sequence = 'abaaadcaaaa' substr = 'aa' counts = sum([ sequence.startswith(substr, i) for i in range(len(sequence)) ]) print(counts) # 5
The list would be [False, False, True, False, False, False, True, True, False, False] as it checks all indexes through the string, and because int(True) == 1 , sum gives us the total number of matches.
s = "bobobob" sub = "bob" ln = len(sub) print(sum(sub == s[i:i+ln] for i in xrange(len(s)-(ln-1))))
How to find a pattern in another string with overlapping
This function (another solution!) receive a pattern and a text. Returns a list with all the substring located in the and their positions.
def occurrences(pattern, text): """ input: search a pattern (regular expression) in a text returns: a list of substrings and their positions """ p = re.compile('(?=())'.format(pattern)) matches = re.finditer(p, text) return [(match.group(1), match.start()) for match in matches] print (occurrences('ana', 'banana')) print (occurrences('.ana', 'Banana-fana fo-fana'))
My answer, to the bob question on the course:
s = 'azcbobobegghaklbob' total = 0 for i in range(len(s)-2): if s[i:i+3] == 'bob': total += 1 print 'number of times bob occurs is: ', total
Here is my edX MIT «find bob»* solution (*find number of «bob» occurences in a string named s), which basicaly counts overlapping occurrences of a given substing:
s = 'azcbobobegghakl' count = 0 while 'bob' in s: count += 1 s = s[(s.find('bob') + 2):] print "Number of times bob occurs is: <>".format(count)
An alternative very close to the accepted answer but using while as the if test instead of including if inside the loop:
def countSubstr(string, sub): count = 0 while sub in string: count += 1 string = string[string.find(sub) + 1:] return count;
This avoids while True: and is a little cleaner in my opinion
If strings are large, you want to use Rabin-Karp, in summary:
- a rolling window of substring size, moving over a string
- a hash with O(1) overhead for adding and removing (i.e. move by 1 char)
- implemented in C or relying on pypy
That can be solved using regex.
import re def function(string, sub_string): match = re.findall('(?='+sub_string+')',string) return len(match)
def count_substring(string, sub_string): counter = 0 for i in range(len(string)): if string[i:].startswith(sub_string): counter = counter + 1 return counter
Above code simply loops throughout the string once and keeps checking if any string is starting with the particular substring that is being counted.
>>> import re >>> re.subn('(?=11)', '', '1011101111')[1] 5
Solution with replaced parts of the string
s = 'lolololol' t = 0 t += s.count('lol') s = s.replace('lol', 'lo1') t += s.count('1ol') print("Number of times lol occurs is:", t)
def count_overlaps (string, look_for): start = 0 matches = 0 while True: start = string.find (look_for, start) if start < 0: break start += 1 matches += 1 return matches print count_overlaps ('abrabra', 'abra')
Function that takes as input two strings and counts how many times sub occurs in string, including overlaps. To check whether sub is a substring, I used the in operator.
def count_Occurrences(string, sub): count=0 for i in range(0, len(string)-len(sub)+1): if sub in string[i:i+len(sub)]: count=count+1 print 'Number of times sub occurs in string (including overlaps): ', count
For a duplicated question i've decided to count it 3 by 3 and comparing the string e.g.
counted = 0 for i in range(len(string)): if string[i*3:(i+1)*3] == 'xox': counted = counted +1 print counted
This is another example of using str.find() but a lot of the answers make it more complicated than necessary:
def occurrences(text, sub): c, n = 0, text.find(sub) while n != -1: c += 1 n = text.find(sub, n+1) return c In []: occurrences('1011101111', '11') Out[]: 5
sequence = '1011101111' sub = "11"
sum(x == tuple(sub) for x in zip(sequence, sequence[1:])) # 5
windows = zip(*([sequence[i:] for i, _ in enumerate(sequence)][:len(sub)])) sum(x == tuple(sub) for x in windows) # 5
import itertools as it iter_ = (sequence[i:] for i, _ in enumerate(sequence)) windows = zip(*(it.islice(iter_, None, len(sub)))) sum(x == tuple(sub) for x in windows)
Alternative
import more_itertools as mit len(list(mit.locate(sequence, pred=lambda *args: args == tuple(sub), window_size=len(sub)))) # 5
A simple way to count substring occurrence is to use count() :
You can use replace () to find overlapping strings if you know which part will be overlap:
>>> s = 'bobob' >>> s.replace('b', 'bb').count('bob') 2
Note that besides being static, there are other limitations:
>>> s = 'aaa' >>> count('aa') # there must be two occurrences 1 >>> s.replace('a', 'aa').count('aa') 3
def occurance_of_pattern(text, pattern): text_len , pattern_len = len(text), len(pattern) return sum(1 for idx in range(text_len - pattern_len + 1) if text[idx: idx+pattern_len] == pattern)
I wanted to see if the number of input of same prefix char is same postfix, e.g., "foo" and """foo"" but fail on """bar"" :
from itertools import count, takewhile from operator import eq # From https://stackoverflow.com/a/15112059 def count_iter_items(iterable): """ Consume an iterable not reading it into memory; return the number of items. :param iterable: An iterable :type iterable: ```Iterable``` :return: Number of items in iterable :rtype: ```int``` """ counter = count() deque(zip(iterable, counter), maxlen=0) return next(counter) def begin_matches_end(s): """ Checks if the begin matches the end of the string :param s: Input string of length > 0 :type s: ```str``` :return: Whether the beginning matches the end (checks first match chars :rtype: ```bool``` """ return (count_iter_items(takewhile(partial(eq, s[0]), s)) == count_iter_items(takewhile(partial(eq, s[0]), s[::-1])))
Метод string count() в Python
Метод count() строки возвращает количество вхождений подстроки в заданной строке.
Проще говоря, метод ищет подстроку в заданной строке и возвращает, сколько раз подстрока присутствует в ней.
Также требуются необязательные параметры start и end, чтобы указать соответственно начальную и конечную позиции в строке.
string.count(substring, start=. end=. )
Параметры
- substring ‒ строка, количество которой нужно найти.
- start (необязательно) ‒ начальный индекс в строке, с которой начинается поиск.
- end (необязательно) ‒ конечный индекс в строке, где заканчивается поиск.
Примечание: Индекс в Python начинается с 0, а не с 1.
Возвращаемое значение
Команда возвращает количество вхождений подстроки в заданной строке.
Пример 1: Подсчитать количество вхождений данной подстроки
# define string string = "Python is awesome, isn't it?" substring = "is" count = string.count(substring) # print count print("The count is:", count)
Пример 2: Подсчитать количество появлений данной подстроки, используя начало и конец
# define string string = "Python is awesome, isn't it?" substring = "i" # count after first 'i' and before the last 'i' count = string.count(substring, 8, 25) # print count print("The count is:", count)
Здесь подсчет начинается после того, как будет обнаружен первый i, то есть седьмая позиция индекса. И он заканчивается перед последним i, то есть 25-й позицией индекса.