- Saved searches
- Use saved searches to filter your results more quickly
- License
- kvh/Python-Suffix-Tree
- Name already in use
- Sign In Required
- Launching GitHub Desktop
- Launching GitHub Desktop
- Launching Xcode
- Launching Visual Studio Code
- Latest commit
- Git stats
- Files
- README.md
- About
- Saved searches
- Use saved searches to filter your results more quickly
- License
- cceh/suffix-tree
- Name already in use
- Sign In Required
- Launching GitHub Desktop
- Launching GitHub Desktop
- Launching Xcode
- Launching Visual Studio Code
- Latest commit
- Git stats
- Files
- README.rst
- About
- Saved searches
- Use saved searches to filter your results more quickly
- XLuyu/suffix_tree
- Name already in use
- Sign In Required
- Launching GitHub Desktop
- Launching GitHub Desktop
- Launching Xcode
- Launching Visual Studio Code
- Latest commit
- Git stats
- Files
- README.md
Saved searches
Use saved searches to filter your results more quickly
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. You switched accounts on another tab or window. Reload to refresh your session.
Suffix tree for string searching
License
kvh/Python-Suffix-Tree
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Name already in use
A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Sign In Required
Please sign in to use Codespaces.
Launching GitHub Desktop
If nothing happens, download GitHub Desktop and try again.
Launching GitHub Desktop
If nothing happens, download GitHub Desktop and try again.
Launching Xcode
If nothing happens, download Xcode and try again.
Launching Visual Studio Code
Your codespace will open once ready.
There was a problem preparing your codespace, please try again.
Latest commit
Git stats
Files
Failed to load latest commit information.
README.md
Based off of Mark Nelson’s C++ implementation of Ukkonen’s algorithm. Ukkonen’s algorithm gives a O(n) + O(k) contruction time for a suffix tree, where n is the length of the string and k is the size of the alphabet of that string. Ukkonen’s is an online algorithm, processing the input sequentially and producing a valid suffix tree at each character.
string = "I need to be searched!" tree = SuffixTree(string) index_of_need = tree.find_substring("need")
This library is mostly an academic exercise. If you need an efficient library I would recommend a python-wrapped c implementation, such as this one.
About
Suffix tree for string searching
Saved searches
Use saved searches to filter your results more quickly
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. You switched accounts on another tab or window. Reload to refresh your session.
A Generalized Suffix Tree for any Python iterable using Ukkonen’s algorithm, with Lowest Common Ancestor retrieval.
License
cceh/suffix-tree
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Name already in use
A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Sign In Required
Please sign in to use Codespaces.
Launching GitHub Desktop
If nothing happens, download GitHub Desktop and try again.
Launching GitHub Desktop
If nothing happens, download GitHub Desktop and try again.
Launching Xcode
If nothing happens, download Xcode and try again.
Launching Visual Studio Code
Your codespace will open once ready.
There was a problem preparing your codespace, please try again.
Latest commit
Git stats
Files
Failed to load latest commit information.
README.rst
A Generalized Suffix Tree
A Generalized Suffix Tree for any Python sequence, with Lowest Common Ancestor retrieval.
>>> from suffix_tree import Tree >>> tree = Tree("A": "xabxac">) >>> tree.find("abx") True >>> tree.find("abc") False
- works with any Python sequence, not just strings, if the items are hashable,
- is a generalized suffix tree for sets of sequences,
- is implemented in pure Python,
- builds the tree in time proportional to the length of the input,
- does constant-time Lowest Common Ancestor retrieval.
Being implemented in Python this tree is not very fast nor memory efficient. The building of the tree takes time proportional to the length of the string of symbols. The query time is proportional to the length of the query string.
To get the best performance turn the python optimizer on: python -O.
>>> from suffix_tree import Tree >>> tree = Tree() >>> tree.add(1, "xabxac") >>> tree.add(2, "awyawxawxz") >>> tree.find("abx") True >>> tree.find("awx") True >>> tree.find("abc") False
>>> tree = Tree("A": "xabxac", "B": "awyawxawxz">) >>> tree.find_id("A", "abx") True >>> tree.find_id("B", "abx") False >>> tree.find_id("B", "awx") True
>>> tree = Tree( . < . "A": "sandollar", . "B": "sandlot", . "C": "handler", . "D": "grand", . "E": "pantry", . > . ) >>> for k, length, path in tree.common_substrings(): . print(k, length, path) . 2 4 s a n d 3 3 a n d 4 3 a n d 5 2 a n
>>> tree = Tree("A": "xabxac", "B": "awyawxawxz">) >>> for C, path in sorted(tree.maximal_repeats()): . print(C, path) . 1 a w 1 a w x 2 a 2 x 2 x a
About
A Generalized Suffix Tree for any Python iterable using Ukkonen’s algorithm, with Lowest Common Ancestor retrieval.
Saved searches
Use saved searches to filter your results more quickly
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. You switched accounts on another tab or window. Reload to refresh your session.
A suffix tree implementation in Python
XLuyu/suffix_tree
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Name already in use
A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Sign In Required
Please sign in to use Codespaces.
Launching GitHub Desktop
If nothing happens, download GitHub Desktop and try again.
Launching GitHub Desktop
If nothing happens, download GitHub Desktop and try again.
Launching Xcode
If nothing happens, download Xcode and try again.
Launching Visual Studio Code
Your codespace will open once ready.
There was a problem preparing your codespace, please try again.
Latest commit
Git stats
Files
Failed to load latest commit information.
README.md
A suffix tree implementation in Python
Ukkonen E. On-line construction of suffix trees[J]. Algorithmica, 1995, 14(3): 249-260. https://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf
Notice: In this paper, the template use 1-based indexing. But here I use 0-based indexing.
st = SuffixTree() # create an empty suffix tree st.append('AAATGATCATCAACC') # feed the tree a string to construct st.append('ACAACAGCCAGG') # feed more if you like print st.match_pattern_suffix('CATCAACCACAACAGCCAGGTTGTAGGCGA')
This code utilizes graphviz to visualize constructed tree and matching process.
- Install graphviz and make sure it is in PATH
(type «dot -version» in command line/shell to check whether it prints version info) - Install graphviz python binding by pip install graphviz
- In the code, you can call st.plot_tree() to visualize it after construction.
*The generated .png files are in the visualization subdirectory» - You can also match_pattern_suffix(pattern,visual=True) to generate image files for each step in matching. The red node in image indicates current state, the string in the red node indicates matched part on some edge from this node