Userbot and Telegram bot for parsing channels
darrso/parse_channels
Parse channels Telegram bot
A bot that lets you parse Telegram channels using a userbot.
- parse_channels/bot/python/config.py: add the bot token (from @BotFather), the admin id (your id), the chat id (all posts will be forwarded to this chat), and the admin chat name.
- parse_channels/user_bot/grab_config.py: add the api id and api hash from https://my.telegram.org
Once you've edited everything, run main.py from bot and grab_pyrogram.py from user_bot.
mmat16/telegram_channel_parser
To use this parser, you need to install Python 3 and several third-party libraries with the following command:
On Windows: pip install -r requirements.txt
On macOS and Linux: pip3 install -r requirements.txt
You will also need to register your own Telegram application. To do so, go to https://my.telegram.org/apps, log in to your Telegram account, and create an application (Create new application). Fill in:
- App title: the application name (any name will do)
- Short name: an abbreviated name (letters and digits only, 5-32 characters)
- Platform: select Other
The remaining fields can be left empty. Click Create application. At this point Telegram often refuses to let you proceed for no apparent reason; the main thing is not to give up. Sometimes clicking the button again without changing anything helps; sometimes you need to change the App title or Short name. Once your application is registered, the next page shows its registration details. Keep all of them in a safe place; the parser itself only needs the App api_id and App api_hash fields. Put their values into the variables of the same names in config.py.
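As a rough illustration of the last step, config.py might end up looking like the sketch below. The values are placeholders, and the `check_credentials` helper is hypothetical (it is not part of the parser). It relies on the assumption that Telegram issues api_id as a positive integer and api_hash as a 32-character lowercase hexadecimal string.

```python
import re

# Placeholder values: replace them with the App api_id and App api_hash
# shown on https://my.telegram.org/apps after the application is created.
api_id = 1234567
api_hash = "0123456789abcdef0123456789abcdef"


def check_credentials(app_id, app_hash):
    """Hypothetical sanity check: return a list of problems (empty if none)."""
    problems = []
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(app_id, int) or isinstance(app_id, bool) or app_id <= 0:
        problems.append("api_id must be a positive integer")
    if not isinstance(app_hash, str) or not re.fullmatch(r"[0-9a-f]{32}", app_hash):
        problems.append("api_hash must be 32 lowercase hex characters")
    return problems


if __name__ == "__main__":
    print(check_credentials(api_id, api_hash) or "credentials look well-formed")
```

A check like this only catches copy-and-paste mistakes (a missing character, quotes around the id); it cannot tell whether Telegram will accept the credentials.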
Once the libraries are installed and the application is registered, the parser is ready to use:
- go to the source directory and start the parser with the command python3 parser.py
- on the first run you will need to confirm the login through Telegram (it is best to disable two-factor authentication while doing this):
- when prompted in the console, enter the phone number linked to your Telegram account
- at the next prompt, enter the Telegram confirmation code
Once the script receives the link, message collection starts immediately.
A folder named after the channel ID appears in the script directory, along with a .log file that records the script's activity. Inside the channel folder, subfolders named after message IDs are created. Each contains a text file with the message text and its embedded hyperlinks, and a text file with "metadata": the link to the message and the date and time it was sent. Any media attached to the message is downloaded into the same folder.
By default (on the first run), the script collects messages from the last three months. If, on a later run, the script directory already contains a folder with previously collected messages from the channel, only new messages are collected.
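The first-run versus repeat-run behaviour described above can be sketched roughly as follows. This is not the parser's actual code: the function names are hypothetical, "three months" is approximated as 90 days, and the sketch assumes that message folders are named with the numeric message ID.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path


def last_collected_id(channel_dir):
    """Return the highest message ID that already has a folder, or None."""
    path = Path(channel_dir)
    if not path.is_dir():
        return None
    # Only subfolders with purely numeric names count as message folders.
    ids = [int(p.name) for p in path.iterdir() if p.is_dir() and p.name.isdigit()]
    return max(ids, default=None)


def collection_start(channel_dir, now=None, days=90):
    """Decide where collection should start.

    Returns ("after_id", message_id) when earlier results exist,
    otherwise ("since", cutoff) with a cutoff `days` days before `now`.
    """
    last_id = last_collected_id(channel_dir)
    if last_id is not None:
        return ("after_id", last_id)
    now = now or datetime.now(timezone.utc)
    return ("since", now - timedelta(days=days))
```

Under these assumptions, a channel folder that already holds subfolders 7, 10 and 25 yields `("after_id", 25)`, while a missing or empty folder falls back to the date cutoff.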
This project provides a simple scraping tool for telegram channels’ text and metadata.
Miracle-Aligner/telegram-channel-scraper
- Install Selenium, following its installation guide.
- Install the required packages using pip install -r requirements.txt
- Import the channel parser:
  from utils.channel_parser import ChannelParser
- Create a ChannelParser instance:
  parser = ChannelParser(channel_name, start_date)
- Perform scraping:
  parser.scrape()
- Save the result:
  parser.save_json(path_to_save)
A usage example can also be found in the demo.
class ChannelParser(channel_name, start_date, finish_date=None, timezone='Europe/Kiev', get_media=True, get_text=True, get_meta=True)
Provides methods to scrape Telegram channel texts and metadata and to save the result as JSON.
Params:
channel_name : Name of the Telegram channel to parse.
start_date : The date of the oldest messages to scrape.
finish_date : The date of the newest messages to scrape. Default is None, in which case scraping continues up to the newest message.
timezone : Preferred timezone. Acceptable values are those listed in pytz.all_timezones.
get_media : Boolean. If True, collects metadata about photos and videos in a post.
get_text : Boolean. If True, collects post texts.
get_meta : Boolean. If True, collects metadata about a post: date, views, and the is reply / is forward / is edited flags.

save_json(path)
Saves the scraping_result object into a JSON file.
:param path : Absolute file path for saving
:return: String describing the save status.

scrape()
Scrapes Telegram channel content according to the user-given channel name, flags, and starting date.
:return: Scraped result in the form of a list
The result can be returned as a list or saved as a JSON file.
Which of the following fields are scraped depends on the flags passed to the ChannelParser constructor:

text : text of the post.
lang : language code according to the polyglot module. Equals "unknown" if the language cannot be detected.

has_photo : boolean flag. True if the post has an attached photo.
photo_urls : list of links to attached photos.
has_video : boolean flag. True if the post has an attached video.
videos_meta : list of meta information about attached videos, each entry containing:
  length : length of the attached video.
  thumbnail_link : link to the thumbnail of the attached video.

channel_url : link to the scraped channel.
post_id : the post's id, unique within the channel.
datetime : date of publication in a datetime format.
views : string with the number of views.
is_reply : boolean flag. True if the post is a reply.
reply_to : link to the post that was replied to.
is_forwarded : boolean flag. True if the post is forwarded.
forwarded_from : link to the original post.
is_edited : boolean flag. True if the post is edited.

The result saved as a JSON file follows this JSON Schema:
{
  "type": "array",
  "items": [
    {
      "type": "object",
      "properties": {
        "channel_url": { "type": "string" },
        "post_id": { "type": "string" },
        "has_photo": { "type": "boolean" },
        "photo_urls": { "type": "array", "items": {} },
        "has_video": { "type": "boolean" },
        "videos_meta": {
          "type": "array",
          "items": [
            {
              "length": { "type": "string" },
              "thumbnail_link": { "type": "string" }
            }
          ]
        },
        "text": { "type": "string" },
        "lang": { "type": "string" },
        "datetime": { "type": "string" },
        "views": { "type": "string" },
        "is_reply": { "type": "boolean" },
        "reply_to": { "type": "string" },
        "is_forwarded": { "type": "boolean" },
        "forwarded_from": { "type": "string" },
        "is_edited": { "type": "boolean" }
      }
    }
  ]
}
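To show how a file with this shape might be consumed, here is a minimal sketch that uses only the standard library. The helper names `load_posts` and `summarize_posts` are hypothetical, and the sample records are made up for illustration. Because the fields present depend on the constructor flags, the sketch reads every field with `.get()` rather than assuming it exists.

```python
import json
from collections import Counter


def load_posts(path):
    """Load the list of post objects written by parser.save_json(path)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def summarize_posts(posts):
    """Count posts per language and list the ids of posts with photos."""
    langs = Counter(post.get("lang", "unknown") for post in posts)
    with_photos = [post.get("post_id") for post in posts if post.get("has_photo")]
    return langs, with_photos


if __name__ == "__main__":
    # Made-up records that follow the field names of the schema.
    sample = [
        {"post_id": "101", "lang": "en", "has_photo": True, "views": "1200"},
        {"post_id": "102", "lang": "uk", "has_photo": False, "views": "800"},
        {"post_id": "103", "lang": "en", "has_photo": True, "views": "950"},
    ]
    langs, with_photos = summarize_posts(sample)
    print(dict(langs))
    print(with_photos)
```

Note that views and datetime are strings in the schema, so any numeric or date comparison needs an explicit conversion whose format the schema does not specify.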