Suffix trees in python

Saved searches

Use saved searches to filter your results more quickly

You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. You switched accounts on another tab or window. Reload to refresh your session.

Suffix tree for string searching

License

kvh/Python-Suffix-Tree

This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?

Sign In Required

Please sign in to use Codespaces.

Launching GitHub Desktop

If nothing happens, download GitHub Desktop and try again.

Читайте также:  Change array in function php

Launching GitHub Desktop

If nothing happens, download GitHub Desktop and try again.

Launching Xcode

If nothing happens, download Xcode and try again.

Launching Visual Studio Code

Your codespace will open once ready.

There was a problem preparing your codespace, please try again.

Latest commit

Git stats

Files

Failed to load latest commit information.

README.md

Based off of Mark Nelson’s C++ implementation of Ukkonen’s algorithm. Ukkonen’s algorithm gives a O(n) + O(k) contruction time for a suffix tree, where n is the length of the string and k is the size of the alphabet of that string. Ukkonen’s is an online algorithm, processing the input sequentially and producing a valid suffix tree at each character.

string = "I need to be searched!" tree = SuffixTree(string) index_of_need = tree.find_substring("need") 

This library is mostly an academic exercise. If you need an efficient library I would recommend a python-wrapped c implementation, such as this one.

About

Suffix tree for string searching

Источник

Saved searches

Use saved searches to filter your results more quickly

You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. You switched accounts on another tab or window. Reload to refresh your session.

A Generalized Suffix Tree for any Python iterable using Ukkonen’s algorithm, with Lowest Common Ancestor retrieval.

License

cceh/suffix-tree

This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?

Sign In Required

Please sign in to use Codespaces.

Launching GitHub Desktop

If nothing happens, download GitHub Desktop and try again.

Launching GitHub Desktop

If nothing happens, download GitHub Desktop and try again.

Launching Xcode

If nothing happens, download Xcode and try again.

Launching Visual Studio Code

Your codespace will open once ready.

There was a problem preparing your codespace, please try again.

Latest commit

Git stats

Files

Failed to load latest commit information.

README.rst

A Generalized Suffix Tree

A Generalized Suffix Tree for any Python sequence, with Lowest Common Ancestor retrieval.

>>> from suffix_tree import Tree >>> tree = Tree("A": "xabxac">) >>> tree.find("abx") True >>> tree.find("abc") False
  • works with any Python sequence, not just strings, if the items are hashable,
  • is a generalized suffix tree for sets of sequences,
  • is implemented in pure Python,
  • builds the tree in time proportional to the length of the input,
  • does constant-time Lowest Common Ancestor retrieval.

Being implemented in Python this tree is not very fast nor memory efficient. The building of the tree takes time proportional to the length of the string of symbols. The query time is proportional to the length of the query string.

To get the best performance turn the python optimizer on: python -O.

>>> from suffix_tree import Tree >>> tree = Tree() >>> tree.add(1, "xabxac") >>> tree.add(2, "awyawxawxz") >>> tree.find("abx") True >>> tree.find("awx") True >>> tree.find("abc") False
>>> tree = Tree("A": "xabxac", "B": "awyawxawxz">) >>> tree.find_id("A", "abx") True >>> tree.find_id("B", "abx") False >>> tree.find_id("B", "awx") True
>>> tree = Tree( . < . "A": "sandollar", . "B": "sandlot", . "C": "handler", . "D": "grand", . "E": "pantry", . > . ) >>> for k, length, path in tree.common_substrings(): . print(k, length, path) . 2 4 s a n d 3 3 a n d 4 3 a n d 5 2 a n
>>> tree = Tree("A": "xabxac", "B": "awyawxawxz">) >>> for C, path in sorted(tree.maximal_repeats()): . print(C, path) . 1 a w 1 a w x 2 a 2 x 2 x a

About

A Generalized Suffix Tree for any Python iterable using Ukkonen’s algorithm, with Lowest Common Ancestor retrieval.

Источник

Saved searches

Use saved searches to filter your results more quickly

You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. You switched accounts on another tab or window. Reload to refresh your session.

A suffix tree implementation in Python

XLuyu/suffix_tree

This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?

Sign In Required

Please sign in to use Codespaces.

Launching GitHub Desktop

If nothing happens, download GitHub Desktop and try again.

Launching GitHub Desktop

If nothing happens, download GitHub Desktop and try again.

Launching Xcode

If nothing happens, download Xcode and try again.

Launching Visual Studio Code

Your codespace will open once ready.

There was a problem preparing your codespace, please try again.

Latest commit

Git stats

Files

Failed to load latest commit information.

README.md

A suffix tree implementation in Python

Ukkonen E. On-line construction of suffix trees[J]. Algorithmica, 1995, 14(3): 249-260. https://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf

Notice: In this paper, the template use 1-based indexing. But here I use 0-based indexing.

st = SuffixTree() # create an empty suffix tree st.append('AAATGATCATCAACC') # feed the tree a string to construct st.append('ACAACAGCCAGG') # feed more if you like print st.match_pattern_suffix('CATCAACCACAACAGCCAGGTTGTAGGCGA')

This code utilizes graphviz to visualize constructed tree and matching process.

  1. Install graphviz and make sure it is in PATH
    (type «dot -version» in command line/shell to check whether it prints version info)
  2. Install graphviz python binding by pip install graphviz
  3. In the code, you can call st.plot_tree() to visualize it after construction.
    *The generated .png files are in the visualization subdirectory»
  4. You can also match_pattern_suffix(pattern,visual=True) to generate image files for each step in matching. The red node in image indicates current state, the string in the red node indicates matched part on some edge from this node

Источник

Оцените статью