- google 3.0.0
- Download files
- Source Distribution
- Built Distribution
- Hashes for google-3.0.0.tar.gz
- Hashes for google-3.0.0-py2.py3-none-any.whl
- yagooglesearch 1.7.0
- Metadata
- Project description
- yagooglesearch — Yet another googlesearch
- Overview
- Terms and Conditions
- Installation
- pip
- setup.py
- Usage
- HTTP 429 detection and recovery (optional)
- HTTP and SOCKS5 proxy support
- HTTPS proxies and SSL/TLS certificates
- &tbs= URL filter clarification
- Verbatim search
- time-based filters
- Limitations
- License
- Contact
- Acknowledgements
- googlesearch
- Prerequisites
- Installing
- Option 1: From PyPI
- Option 2: From Git
- Usage
- Search
- How to use
- SearchResultElement
- Extra
- Exceptions
- Deployment
- Built With
- Authors
- License
google 3.0.0
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Uploaded Jul 11, 2020 (source)
Built Distribution
Uploaded Jul 11, 2020 (py2, py3)
Hashes for google-3.0.0.tar.gz
Algorithm | Hash digest |
---|---|
SHA256 | 143530122ee5130509ad5e989f0512f7cb218b2d4eddbafbad40fd10e8d8ccbe |
MD5 | cd61f970f40babca924f2760b467ed63 |
BLAKE2b-256 | 8997b49c69893cddea912c7a660a4b6102c6b02cd268f8c7162dd70b7c16f753 |
Hashes for google-3.0.0-py2.py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 889cf695f84e4ae2c55fbc0cfdaf4c1e729417fa52ab1db0485202ba173e4935 |
MD5 | 30c719790d3e7e57e9482d108eda355c |
BLAKE2b-256 | ac3517c9141c4ae21e9a29a43acdfd848e3e468a810517f862cad07977bf8fe9 |
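A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using only the Python standard library; the helper names and the local file path in the usage comment are assumptions for illustration and are not part of the google package.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sha256_matches(path, expected_hex):
    """Compare the file's digest with a published one, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Hypothetical usage, assuming the sdist was saved to the current directory:
# sha256_matches(
#     "google-3.0.0.tar.gz",
#     "143530122ee5130509ad5e989f0512f7cb218b2d4eddbafbad40fd10e8d8ccbe",
# )
```

Reading in chunks keeps memory use flat regardless of file size.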
yagooglesearch 1.7.0
A Python library for executing intelligent, realistic-looking, and tunable Google searches.
Metadata
License: BSD 3-Clause "New" or "Revised" License
Tags: python, google, search, googlesearch
Requires: Python >=3.6
Project description
yagooglesearch — Yet another googlesearch
Overview
yagooglesearch is a Python library for executing intelligent, realistic-looking, and tunable Google searches. It simulates real human Google search behavior to avoid rate limiting by Google (the dreaded HTTP 429 response), and if Google does block it with an HTTP 429, it includes logic to back off and keep trying. The library does not use the Google API and is heavily based on the googlesearch library. The features include:
- Search client attributes that can be tuned mid-search
- Returning a list of URLs instead of a generator
- HTTP 429 / rate-limit detection (Google is blocking your IP for making too many search requests) and recovery
- Randomizing delay times between retrieving paged search results (i.e., clicking on page 2 for more results)
- HTTP(S) and SOCKS5 proxy support
- Leveraging requests library for HTTP requests and cookie management
- Adds "&filter=0" by default to search URLs to prevent any omission or filtering of search results by Google
- Console and file logging
- Python 3.6+
Terms and Conditions
This code is supplied as-is and you are fully responsible for how it is used. Scraping Google Search results may violate their Terms of Service. Another Python Google search library had some interesting information/discussion on it:
Google’s preferred method is to use their API.
Installation
pip
pip install yagooglesearch
setup.py
git clone https://github.com/opsdisk/yagooglesearch
cd yagooglesearch
virtualenv -p python3.7 .venv
source .venv/bin/activate
python setup.py install
Usage
Low and slow is the strategy when executing Google searches using yagooglesearch. If you start getting HTTP 429 responses, Google has rightfully detected you as a bot and will block your IP for a set period of time. yagooglesearch is not able to bypass CAPTCHA, but you can do this manually by performing a Google search from a browser and proving you are a human.
The criteria and thresholds for getting blocked are unknown, but in general, randomizing the user agent, waiting enough time between paged search results (7-17 seconds), and waiting enough time between different Google searches (30-60 seconds) should suffice. Your mileage will definitely vary though. Using this library with Tor will likely get you blocked quickly.
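The waiting strategy described above can be sketched with the standard library. This is only an illustration of drawing randomized delays from the ranges mentioned in the text; the function names are hypothetical and this is not the library's internal code.

```python
import random


def page_delay_seconds(rng=random):
    """Delay between paged results, drawn uniformly from 7 to 17 seconds."""
    return rng.uniform(7, 17)


def search_delay_seconds(rng=random):
    """Delay between separate searches, drawn uniformly from 30 to 60 seconds."""
    return rng.uniform(30, 60)


# Hypothetical usage between two separate searches:
# import time
# time.sleep(search_delay_seconds())
```

Passing a seeded `random.Random` instance as `rng` makes the delays reproducible in tests.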
HTTP 429 detection and recovery (optional)
If yagooglesearch detects an HTTP 429 response from Google, it will sleep for http_429_cool_off_time_in_minutes minutes and then try again. Each time an HTTP 429 is detected, it increases the wait time by a factor of http_429_cool_off_factor.
The goal is to have yagooglesearch worry about HTTP 429 detection and recovery and not put the burden on the script using it.
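To make the growth of the cool-off time concrete, here is a small sketch that computes the sequence of waits implied by the description above. The function name is hypothetical, and the values 45 and 1.5 are example inputs chosen for illustration, not documented defaults.

```python
def cool_off_schedule(initial_minutes, factor, detections):
    """Return the wait (in minutes) applied after each successive HTTP 429.

    The first wait is `initial_minutes`; every later wait is the previous
    one multiplied by `factor`.
    """
    waits = []
    wait = initial_minutes
    for _ in range(detections):
        waits.append(wait)
        wait *= factor
    return waits


# Example inputs (assumed): 45 minutes initially, growing by a factor of 1.5.
print(cool_off_schedule(45, 1.5, 3))  # [45, 67.5, 101.25]
```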
If you do not want yagooglesearch to handle HTTP 429s and would rather handle them yourself, pass yagooglesearch_manages_http_429s=False when instantiating the yagooglesearch object. If an HTTP 429 is detected, the string "HTTP_429_DETECTED" is added to the list object that is returned, and it is up to you to decide what the next step should be. The list object will contain any URLs found before the HTTP 429 was detected.
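When handling HTTP 429s yourself, the returned list mixes real URLs with the marker string. A minimal sketch of separating the two is shown below; the helper name is hypothetical, and `results` is assumed to be the list returned by a search run with yagooglesearch_manages_http_429s=False.

```python
HTTP_429_MARKER = "HTTP_429_DETECTED"


def split_search_results(results):
    """Separate real URLs from the HTTP 429 marker string.

    Returns a tuple (urls, was_rate_limited).
    """
    urls = [item for item in results if item != HTTP_429_MARKER]
    return urls, HTTP_429_MARKER in results


# Simulated return value, for illustration only:
urls, was_rate_limited = split_search_results(
    ["https://example.com/a", "HTTP_429_DETECTED"]
)
```

The calling script can then decide whether to wait, switch proxies, or stop, based on `was_rate_limited`.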
HTTP and SOCKS5 proxy support
yagooglesearch supports the use of a proxy. The provided proxy is used for the entire life cycle of the search to make it look more human, instead of rotating through various proxies for different portions of the search. The general search life cycle is:
- Simulated «browsing» to google.com
- Executing the search and retrieving the first page of results
- Simulated clicking through the remaining paged (page 2, page 3, etc.) search results
To use a proxy, provide a proxy string when initializing a yagooglesearch.SearchClient object:
Supported proxy schemes are based on those supported in the Python requests library (https://docs.python-requests.org/en/master/user/advanced/#proxies):
- http
- https
- socks5: "causes the DNS resolution to happen on the client, rather than on the proxy server." You likely do not want this since all DNS lookups would source from where yagooglesearch is being run instead of the proxy.
- socks5h: "If you want to resolve the domains on the proxy server, use socks5h as the scheme." This is the best option if you are using SOCKS because the DNS lookup and Google search are both sourced from the proxy IP address.
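As a rough illustration of the scheme list above, the sketch below validates a proxy string and builds the kind of proxies mapping that the requests library accepts. The helper name is hypothetical, and this is not necessarily how yagooglesearch handles the proxy internally.

```python
from urllib.parse import urlsplit

SUPPORTED_SCHEMES = {"http", "https", "socks5", "socks5h"}


def build_requests_proxies(proxy):
    """Validate a proxy URL and return a requests-style proxies mapping."""
    scheme = urlsplit(proxy).scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"Unsupported proxy scheme: {scheme!r}")
    # requests looks up the proxy by the scheme of the target URL; using the
    # same proxy for both keeps the whole search on a single exit point.
    return {"http": proxy, "https": proxy}


proxies = build_requests_proxies("socks5h://127.0.0.1:9050")
```

The address 127.0.0.1:9050 is only a placeholder for a local SOCKS proxy.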
HTTPS proxies and SSL/TLS certificates
If you are using a self-signed certificate for an HTTPS proxy, you will likely need to disable SSL/TLS verification when either:
If you want to use multiple proxies, that burden is on the script utilizing the yagooglesearch library to instantiate a new yagooglesearch.SearchClient object with the different proxy. Below is an example of looping through a list of proxies:
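One possible way to loop through proxies is to assign them to queries round-robin, so that each search uses a single proxy for its whole life cycle. The sketch below covers only the pairing logic; the helper name, the example proxy addresses, and the commented-out SearchClient call (including the `proxy` keyword and `search()` method) are assumptions to be checked against the library's documentation.

```python
from itertools import cycle


def pair_queries_with_proxies(queries, proxies):
    """Assign proxies to queries round-robin, one proxy per whole search."""
    if not proxies:
        raise ValueError("At least one proxy is required")
    proxy_cycle = cycle(proxies)
    return [(query, next(proxy_cycle)) for query in queries]


pairs = pair_queries_with_proxies(
    ["site:example.com", "site:example.org", "site:example.net"],
    ["socks5h://127.0.0.1:9050", "http://localhost:8080"],
)

# Hypothetical usage; requires yagooglesearch and network access:
# import yagooglesearch
# for query, proxy in pairs:
#     client = yagooglesearch.SearchClient(query, proxy=proxy)
#     urls = client.search()
```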
If you have a GOOGLE_ABUSE_EXEMPTION cookie value, it can be passed into google_exemption when instantiating the SearchClient object.
&tbs= URL filter clarification
The &tbs= parameter is used to specify either verbatim or time-based filters.
Verbatim search
time-based filters
Time filter | &tbs= URL parameter | Notes |
---|---|---|
Past hour | qdr:h | |
Past day | qdr:d | Past 24 hours |
Past week | qdr:w | |
Past month | qdr:m | |
Past year | qdr:y | |
Custom | cdr:1,cd_min:1/1/2021,cd_max:6/1/2021 | See yagooglesearch.get_tbs() function |
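The values in the table can be assembled programmatically. The sketch below is a hypothetical stand-in and does not reproduce the signature of yagooglesearch.get_tbs(); it only shows how the relative codes and the custom date-range string from the table fit together.

```python
from datetime import date

# Relative time filters, as listed in the table above.
RELATIVE_TBS = {
    "hour": "qdr:h",
    "day": "qdr:d",
    "week": "qdr:w",
    "month": "qdr:m",
    "year": "qdr:y",
}


def _month_day_year(value):
    """Format a date as month/day/year without zero padding, e.g. 6/1/2021."""
    return f"{value.month}/{value.day}/{value.year}"


def custom_range_tbs(start, end):
    """Build the custom date-range value for the &tbs= parameter."""
    if start > end:
        raise ValueError("start must not be after end")
    return f"cdr:1,cd_min:{_month_day_year(start)},cd_max:{_month_day_year(end)}"


tbs = custom_range_tbs(date(2021, 1, 1), date(2021, 6, 1))
# tbs == "cdr:1,cd_min:1/1/2021,cd_max:6/1/2021"
```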
Limitations
Currently, the .filter_search_result_urls() function will remove any URL with the word "google" in it. This is to prevent the returned search URLs from being polluted with Google URLs. Note this if you are trying to explicitly search for results that may have "google" in the URL, such as site:google.com computer.
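The effect of this limitation can be illustrated with a simplified filter. The function below is a hypothetical sketch, not the library's .filter_search_result_urls() implementation, and the case-insensitive match is an assumption.

```python
def drop_google_urls(urls):
    """Remove every URL containing the substring "google" (case-insensitive)."""
    return [url for url in urls if "google" not in url.lower()]


kept = drop_google_urls(
    [
        "https://example.com/page",
        "https://support.google.com/answer",
        "https://example.org/google-tips",
    ]
)
# Only "https://example.com/page" survives; the third URL is dropped even
# though it is not hosted by Google, which is the limitation described above.
```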
License
Distributed under the BSD 3-Clause License. See LICENSE for more information.
Contact
Acknowledgements
googlesearch
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.
Prerequisites
You will need Python 3 to use this module
Minimum required version: according to Vermin, Python 3.2 is needed.
Always check if your Python version works with googlesearch before using it in production
Installing
Option 1: From PyPI
pip install python-googlesearch
Make sure to install python-googlesearch, because the name googlesearch cannot be assigned to any package on PyPI.
Even though you install python-googlesearch, the name googlesearch is used for the imports and the CLI for convenience.
Option 2: From Git
pip install git+https://github.com/Animenosekai/googlesearch
You can check if you successfully installed it by printing out its version:
$ python -c
googlesearch v1.1.1
$ googlesearch --version
googlesearch v1.1.1
Usage
You can use googlesearch in Python by importing it in your script:
You can use googlesearch in other apps by accessing it through the CLI version:
$ googlesearch --query Python
An interactive version of the CLI is also available
$ googlesearch
Query >
---------------- SEARCH RESULT ----------------
Description:
URL:
Related Searches:
You can get help on this version by using:
$ googlesearch --help
usage: googlesearch QUERY
Search
The Search class represents a Google Search.
It lets you retrieve the different results/websites (Search.results) and the related searches (Search.related_searches).
How to use
This class loads its results lazily.
When you initialize it with Search(), it takes a query as the required parameter and the following optional parameters:
- language : The language to request the results in (not every result will be in the given language, because results are influenced by many factors, including the location of your IP address). This needs to be a two-letter ISO 639-1 language code (default: "en")
- number_of_results : The maximum number of results to request from Google Search (this won't give you the exact number of results) (default: 10)
- retry_count : A positive integer representing the number of retries done before raising an exception (useful as googlesearch seems to fail sometimes) (default: 3)
- parser : The BeautifulSoup parser to use (default: "html.parser")
It will only load and parse the website when results or related_searches is accessed.
parser is the BeautifulSoup parser used to parse the website.
results is a list of googlesearch.models.SearchResultElement .
related_searches is a list of Search elements.
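The lazy-loading behavior described above can be illustrated with a small stand-in class. This is a sketch of the general pattern only; `LazySearch` and its `fetch` argument are hypothetical and do not reflect the actual implementation of Search.

```python
class LazySearch:
    """Hypothetical stand-in illustrating lazy loading; not the library's Search."""

    def __init__(self, query, fetch):
        self.query = query
        self._fetch = fetch    # callable that performs the request and parsing
        self._results = None   # nothing is fetched at construction time

    @property
    def results(self):
        # The fetch runs on first access only; later accesses reuse the cache.
        if self._results is None:
            self._results = self._fetch(self.query)
        return self._results


calls = []


def fake_fetch(query):
    """Stands in for the network request so the sketch runs offline."""
    calls.append(query)
    return ["https://example.com/" + query]


search = LazySearch("python", fake_fetch)
# No request has been made yet; it happens when search.results is first read.
```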
SearchResultElement
This class represents a result and is initialized by googlesearch.
It holds the following information:
- url : The URL of the website
- title : The title of the website
- displayed_url : The URL displayed on Google Search
- description : The description of the website
Extra
Every class has the as_dict function, which converts the object into a dictionary. For Search, the as_dict function will convert the other Search objects in related_searches to strings containing their queries.
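A rough sketch of this conversion is shown below. The classes `ResultSketch` and `SearchSketch` are hypothetical stand-ins, and the dictionary keys are assumptions; only the four result fields and the flattening of related searches to query strings come from the description above.

```python
from dataclasses import asdict, dataclass


@dataclass
class ResultSketch:
    """Hypothetical stand-in with the four fields listed for SearchResultElement."""

    url: str
    title: str
    displayed_url: str
    description: str

    def as_dict(self):
        return asdict(self)


class SearchSketch:
    """Hypothetical stand-in for Search, holding already-loaded data."""

    def __init__(self, query, results=(), related_searches=()):
        self.query = query
        self.results = list(results)
        self.related_searches = list(related_searches)

    def as_dict(self):
        return {
            "query": self.query,
            "results": [result.as_dict() for result in self.results],
            # Related searches are reduced to their query strings, which also
            # avoids recursing into further searches.
            "related_searches": [s.query for s in self.related_searches],
        }


result = ResultSketch("https://example.com", "Example", "example.com", "An example site")
search = SearchSketch("python", [result], [SearchSketch("python tutorial")])
as_dictionary = search.as_dict()
```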
Exceptions
All of the exceptions inherit from the GoogleSearchException exception.
You can find a list of exceptions in the exceptions.py file.
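Because every exception shares one base class, a single except clause can cover all library errors. The sketch below defines local stand-ins so that it runs without the package installed; `ParsingFailed` and `run_safely` are hypothetical names, and real code would import GoogleSearchException from the library instead.

```python
class GoogleSearchException(Exception):
    """Local stand-in for the library's base exception class."""


class ParsingFailed(GoogleSearchException):
    """Hypothetical subclass name, used only for this illustration."""


def run_safely(action):
    """Run `action`; return (value, None), or (None, error) on a library error."""
    try:
        return action(), None
    except GoogleSearchException as error:
        return None, error


def failing_action():
    raise ParsingFailed("could not parse the results page")


value, error = run_safely(failing_action)
# value is None and error is the ParsingFailed instance that was raised.
```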
Deployment
This module is currently in development and might contain bugs.
Feel free to use it in production if you consider it suitable for your use case, but be aware that you may encounter issues.
Built With
- beautifulsoup4 — To parse the HTML
- requests — To make HTTP requests
- pyuseragents — To create the User-Agent HTTP header
- inquirer — To make a beautiful CLI interface
Authors
License
This project is licensed under the MIT License — see the LICENSE file for more details