Google search Python library

google 3.0.0

Download files

Source Distribution

Uploaded Jul 11, 2020 (source)

Built Distribution

Uploaded Jul 11, 2020 (py2, py3)

Hashes for google-3.0.0.tar.gz

Algorithm | Hash digest
SHA256 | 143530122ee5130509ad5e989f0512f7cb218b2d4eddbafbad40fd10e8d8ccbe
MD5 | cd61f970f40babca924f2760b467ed63
BLAKE2b-256 | 8997b49c69893cddea912c7a660a4b6102c6b02cd268f8c7162dd70b7c16f753

Hashes for google-3.0.0-py2.py3-none-any.whl

Algorithm | Hash digest
SHA256 | 889cf695f84e4ae2c55fbc0cfdaf4c1e729417fa52ab1db0485202ba173e4935
MD5 | 30c719790d3e7e57e9482d108eda355c
BLAKE2b-256 | ac3517c9141c4ae21e9a29a43acdfd848e3e468a810517f862cad07977bf8fe9


yagooglesearch 1.7.0

A Python library for executing intelligent, realistic-looking, and tunable Google searches.

Metadata

License: BSD 3-Clause "New" or "Revised" License

Tags: python, google, search, googlesearch

Requires: Python >=3.6

Project description

yagooglesearch — Yet another googlesearch

Overview

yagooglesearch is a Python library for executing intelligent, realistic-looking, and tunable Google searches. It simulates real human Google search behavior to avoid rate limiting by Google (the dreaded HTTP 429 response), and if Google does respond with an HTTP 429, it includes logic to back off and keep trying. The library does not use the Google API and is heavily based on the googlesearch library. The features include:

  • Search client attributes that can be tuned mid-search
  • Returning a list of URLs instead of a generator
  • HTTP 429 / rate-limit detection (Google is blocking your IP for making too many search requests) and recovery
  • Randomizing delay times between retrieving paged search results (i.e., clicking on page 2 for more results)
  • HTTP(S) and SOCKS5 proxy support
  • Leveraging requests library for HTTP requests and cookie management
  • Adds "&filter=0" by default to search URLs to prevent any omission or filtering of search results by Google
  • Console and file logging
  • Python 3.6+

Terms and Conditions

This code is supplied as-is and you are fully responsible for how it is used. Scraping Google Search results may violate their Terms of Service. Another Python Google search library had some interesting information/discussion on it:

Google’s preferred method is to use their API.

Installation

pip

pip install yagooglesearch

setup.py

git clone https://github.com/opsdisk/yagooglesearch
cd yagooglesearch
virtualenv -p python3.7 .venv
source .venv/bin/activate
python setup.py install

Usage

Low and slow is the strategy when executing Google searches using yagooglesearch. If you start getting HTTP 429 responses, Google has rightfully detected you as a bot and will block your IP for a set period of time. yagooglesearch is not able to bypass CAPTCHA, but you can do this manually by performing a Google search from a browser and proving you are a human.

The criteria and thresholds for getting blocked are unknown, but in general, randomizing the user agent, waiting long enough between paged search results (7-17 seconds), and waiting long enough between different Google searches (30-60 seconds) should suffice. Your mileage will definitely vary though. Using this library with Tor will likely get you blocked quickly.
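The pacing described above can be sketched with the standard library alone. This is a minimal illustration and not yagooglesearch's own code: the helper names page_delay and search_delay are made up for this example, and the ranges are simply the ones quoted in the paragraph above.

```python
import random

# Delay ranges in seconds, taken from the guidance above.
PAGE_DELAY_RANGE = (7, 17)     # between paged results of one search
SEARCH_DELAY_RANGE = (30, 60)  # between two different searches


def page_delay(rng: random.Random) -> float:
    """Pick a random wait before requesting the next results page."""
    return rng.uniform(*PAGE_DELAY_RANGE)


def search_delay(rng: random.Random) -> float:
    """Pick a random wait before starting a different search."""
    return rng.uniform(*SEARCH_DELAY_RANGE)


if __name__ == "__main__":
    rng = random.Random()
    print(f"wait {page_delay(rng):.1f}s before page 2")
    print(f"wait {search_delay(rng):.1f}s before the next query")
```

A calling script would pass the returned value to time.sleep() before issuing the next request.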

HTTP 429 detection and recovery (optional)

If yagooglesearch detects an HTTP 429 response from Google, it will sleep for http_429_cool_off_time_in_minutes minutes and then try again. Each time an HTTP 429 is detected, it increases the wait time by a factor of http_429_cool_off_factor.
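To see how quickly the wait grows, the cool-off rule can be worked through by hand. The function below is a hypothetical helper, not part of yagooglesearch, and it assumes that the first detection sleeps for the initial value and that the factor is applied before each later sleep; the library may order these steps differently.

```python
from typing import List


def cool_off_schedule(initial_minutes: float, factor: float, detections: int) -> List[float]:
    """Return the sleep time in minutes after each consecutive HTTP 429.

    Assumption: the initial value is used as-is for the first sleep and is
    multiplied by the factor before every following one.
    """
    waits = []
    wait = float(initial_minutes)
    for _ in range(detections):
        waits.append(wait)
        wait *= factor
    return waits


if __name__ == "__main__":
    # 45 minutes with a factor of 1.5, three detections in a row.
    print(cool_off_schedule(45, 1.5, 3))  # [45.0, 67.5, 101.25]
```

Under these assumptions, three consecutive blocks already add up to more than three and a half hours of waiting, which is why conservative delays between requests matter.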

The goal is to have yagooglesearch worry about HTTP 429 detection and recovery and not put the burden on the script using it.

If you do not want yagooglesearch to handle HTTP 429s and would rather handle them yourself, pass yagooglesearch_manages_http_429s=False when instantiating the yagooglesearch object. If an HTTP 429 is detected, the string "HTTP_429_DETECTED" is added to the list object that is returned, and it is up to you to decide the next step. The list object will contain any URLs found before the HTTP 429 was detected.
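When handling HTTP 429s yourself, the returned list has to be checked for the marker string before it is used. A minimal sketch, assuming the marker appears as an ordinary list element next to the URLs as described above; split_results is a hypothetical helper name and the sample list is invented.

```python
from typing import List, Tuple

HTTP_429_MARKER = "HTTP_429_DETECTED"  # marker string named in the text above


def split_results(results: List[str]) -> Tuple[List[str], bool]:
    """Separate real URLs from the rate-limit marker.

    Returns (urls, rate_limited).
    """
    rate_limited = HTTP_429_MARKER in results
    urls = [item for item in results if item != HTTP_429_MARKER]
    return urls, rate_limited


if __name__ == "__main__":
    # Invented sample standing in for the list returned by a search.
    sample = ["https://example.com/a", "https://example.com/b", "HTTP_429_DETECTED"]
    urls, rate_limited = split_results(sample)
    if rate_limited:
        print(f"blocked after {len(urls)} URLs; back off before retrying")
```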

HTTP and SOCKS5 proxy support

yagooglesearch supports the use of a proxy. The provided proxy is used for the entire life cycle of the search to make it look more human, instead of rotating through various proxies for different portions of the search. The general search life cycle is:

  1. Simulated "browsing" to google.com
  2. Executing the search and retrieving the first page of results
  3. Simulated clicking through the remaining paged (page 2, page 3, etc.) search results

To use a proxy, provide a proxy string when initializing a yagooglesearch.SearchClient object.

Supported proxy schemes are based on those supported by the Python requests library (https://docs.python-requests.org/en/master/user/advanced/#proxies):
  • http
  • https
  • socks5 - "causes the DNS resolution to happen on the client, rather than on the proxy server." You likely do not want this since all DNS lookups would source from where yagooglesearch is being run instead of the proxy.
  • socks5h - "If you want to resolve the domains on the proxy server, use socks5h as the scheme." This is the best option if you are using SOCKS because the DNS lookup and Google search are sourced from the proxy IP address.
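A proxy string can be checked against the schemes listed above before it is handed to the search client. The sketch below uses only urllib.parse from the standard library; check_proxy and socks_dns_is_local are hypothetical helper names and the addresses are placeholders.

```python
from urllib.parse import urlsplit

# Schemes listed above.
SUPPORTED_SCHEMES = {"http", "https", "socks5", "socks5h"}


def check_proxy(proxy: str) -> str:
    """Validate a proxy string and return its scheme.

    Raises ValueError for any scheme outside the supported list.
    """
    scheme = urlsplit(proxy).scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported proxy scheme: {scheme!r}")
    return scheme


def socks_dns_is_local(proxy: str) -> bool:
    """True for plain socks5, where DNS lookups happen on the client."""
    return check_proxy(proxy) == "socks5"


if __name__ == "__main__":
    for proxy in ("socks5://127.0.0.1:9050", "socks5h://127.0.0.1:9050"):
        note = "DNS resolved locally" if socks_dns_is_local(proxy) else "ok"
        print(proxy, "->", note)
```

Flagging plain socks5 early makes it harder to leak DNS lookups from the machine running the search by accident.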

HTTPS proxies and SSL/TLS certificates

If you are using a self-signed certificate for an HTTPS proxy, you will likely need to disable SSL/TLS verification, either when instantiating the yagooglesearch.SearchClient object or afterwards on the existing client object.

If you want to use multiple proxies, that burden is on the script utilizing the yagooglesearch library: it must instantiate a new yagooglesearch.SearchClient object for each different proxy.
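Looping through a list of proxies can be sketched as a round-robin assignment that uses the modulus of the query index. The helper pair_queries_with_proxies is hypothetical, and the client creation is left as a comment so the sketch runs without the library or network access.

```python
from typing import List, Tuple


def pair_queries_with_proxies(queries: List[str], proxies: List[str]) -> List[Tuple[str, str]]:
    """Assign proxies to queries round-robin using the index modulus."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    return [(query, proxies[i % len(proxies)]) for i, query in enumerate(queries)]


if __name__ == "__main__":
    proxies = ["socks5h://127.0.0.1:9050", "socks5h://127.0.0.1:9051"]
    queries = ["python", "ssh tunneling", "are dragons real?"]
    for query, proxy in pair_queries_with_proxies(queries, proxies):
        # A real script would create a new yagooglesearch.SearchClient here,
        # one per (query, proxy) pair, and run the search with it.
        print(f"{query!r} -> {proxy}")
```

Each query keeps a single proxy for its whole life cycle, which matches the library's goal of making one search look like one visitor.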

If you have a GOOGLE_ABUSE_EXEMPTION cookie value, it can be passed into google_exemption when instantiating the SearchClient object.

&tbs= URL filter clarification

The &tbs= parameter is used to specify either verbatim or time-based filters.

time-based filters

Time filter | &tbs= URL parameter | Notes
Past hour | qdr:h |
Past day | qdr:d | Past 24 hours
Past week | qdr:w |
Past month | qdr:m |
Past year | qdr:y |
Custom | cdr:1,cd_min:1/1/2021,cd_max:6/1/2021 | See yagooglesearch.get_tbs() function
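The values in the table above are plain strings, so they can be assembled by hand. The sketch below is a standalone illustration and not the library's get_tbs() function: custom_tbs is a hypothetical helper that reproduces the M/D/YYYY form shown in the Custom row.

```python
from datetime import date

# Relative time filters from the table above.
RELATIVE_TBS = {
    "hour": "qdr:h",
    "day": "qdr:d",
    "week": "qdr:w",
    "month": "qdr:m",
    "year": "qdr:y",
}


def custom_tbs(start: date, end: date) -> str:
    """Build a custom date-range value in the form shown in the table."""
    if start > end:
        raise ValueError("start must not be after end")

    def fmt(day: date) -> str:
        # Month/day/year without zero padding, matching the table's example.
        return f"{day.month}/{day.day}/{day.year}"

    return f"cdr:1,cd_min:{fmt(start)},cd_max:{fmt(end)}"


if __name__ == "__main__":
    print(RELATIVE_TBS["week"])
    print(custom_tbs(date(2021, 1, 1), date(2021, 6, 1)))
```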

Limitations

Currently, the .filter_search_result_urls() function removes any URL with the word "google" in it. This prevents the returned search URLs from being polluted with Google URLs. Keep this in mind if you are explicitly searching for results that may have "google" in the URL, such as site:google.com computer
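The effect of this limitation is easy to reproduce with a substring filter. This is an illustrative sketch only: drop_google_urls is a hypothetical function, not the library's implementation, and the case-insensitive match is an assumption.

```python
from typing import Iterable, List


def drop_google_urls(urls: Iterable[str]) -> List[str]:
    """Remove every URL that contains the substring "google".

    Assumption: the comparison ignores case.
    """
    return [url for url in urls if "google" not in url.lower()]


if __name__ == "__main__":
    found = [
        "https://example.com/",
        "https://support.google.com/websearch",
        "https://example.org/google-tips",  # not a Google site, still dropped
    ]
    print(drop_google_urls(found))
```

The third sample URL shows the side effect: a substring filter also discards third-party pages that merely mention "google" in their path.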

License

Distributed under the BSD 3-Clause License. See LICENSE for more information.


googlesearch

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

Prerequisites

You will need Python 3 to use this module.

Minimum required version: according to Vermin, Python 3.2 is needed.

Always check that your Python version works with googlesearch before using it in production.

Installing

Option 1: From PyPI

pip install python-googlesearch

Make sure to install python-googlesearch, because the name googlesearch is not available to packages on PyPI.

Although the package is installed as python-googlesearch, the name googlesearch is used for imports and for the CLI, for convenience.

Option 2: From Git

pip install git+https://github.com/Animenosekai/googlesearch

You can check if you successfully installed it by printing out its version:

$ python -c …
googlesearch v1.1.1
$ googlesearch --version
googlesearch v1.1.1

Usage

You can use googlesearch in Python by importing it in your script.

You can use googlesearch in other apps by accessing it through the CLI version:

$ googlesearch --query Python

An interactive version of the CLI is also available:

$ googlesearch

The session prompts for a query ("Query >") and then prints a SEARCH RESULT block with Description and URL fields, followed by a Related Searches list.

You can get help on this version by using:

$ googlesearch --help
usage: googlesearch … QUERY …

The Search class represents a Google Search.

It lets you retrieve the different results/websites (Search.results) and the related searches (Search.related_searches).

How to use

This class loads its results lazily.

When you initialize it with Search(), it takes a query as the required parameter and the following optional parameters:

  • language: The language to request the results in, as a two-letter ISO 639-1 language code. Not every result will be in this language, because results are influenced by many factors, including the location of your IP address (default: "en")
  • number_of_results: The maximum number of results to request from Google Search; the number actually returned may differ (default: 10)
  • retry_count: A positive integer giving the number of retries before an exception is raised (useful because googlesearch sometimes fails) (default: 3)
  • parser: The BeautifulSoup parser to use (default: "html.parser")

The website is only loaded and parsed when results or related_searches is accessed.
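The lazy-loading behavior can be illustrated with functools.cached_property (Python 3.8+). LazySearch below is a stand-in written for this example and not the library's Search class; the fetch callable is injected so the sketch runs without network access.

```python
from functools import cached_property
from typing import Callable, List


class LazySearch:
    """Stand-in that defers the expensive fetch until results are read."""

    def __init__(self, query: str, fetch: Callable[[str], List[str]]):
        self.query = query
        self._fetch = fetch  # injected so the sketch needs no network access

    @cached_property
    def results(self) -> List[str]:
        # Runs on first access only; the value is then cached on the instance.
        return self._fetch(self.query)


if __name__ == "__main__":
    def fake_fetch(query: str) -> List[str]:
        print(f"fetching results for {query!r}")
        return [f"https://example.com/{query}"]

    search = LazySearch("python", fake_fetch)  # nothing is fetched yet
    print(search.results)  # triggers the fetch
    print(search.results)  # served from the cache
```

Constructing the object is cheap; the cost is paid once, on the first attribute access.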

parser is the BeautifulSoup parser used to parse the website.

results is a list of googlesearch.models.SearchResultElement.

related_searches is a list of Search elements.

SearchResultElement

This class represents a result and is initialized by googlesearch.

It holds the following information:

  • url : The URL of the website
  • title : The title of the website
  • displayed_url : The URL displayed on Google Search
  • description : The description of the website

Extra

Every class has an as_dict function that converts the object into a dictionary. For Search, as_dict converts the other Search objects in related_searches to strings containing their queries.
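The conversion described above can be sketched with dataclasses. ResultSketch and SearchSketch are stand-ins invented for this example; the field names follow the lists above, while the dictionary keys are an assumption about what the real as_dict returns.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class ResultSketch:
    """Stand-in for a single search result."""
    url: str
    title: str
    displayed_url: str
    description: str

    def as_dict(self) -> Dict[str, str]:
        return asdict(self)


@dataclass
class SearchSketch:
    """Stand-in for a search with results and related searches."""
    query: str
    results: List[ResultSketch] = field(default_factory=list)
    related_searches: List["SearchSketch"] = field(default_factory=list)

    def as_dict(self) -> Dict[str, Any]:
        return {
            "query": self.query,
            "results": [result.as_dict() for result in self.results],
            # Related searches are flattened to their query strings.
            "related_searches": [s.query for s in self.related_searches],
        }


if __name__ == "__main__":
    related = SearchSketch("python tutorial")
    result = ResultSketch("https://www.python.org/", "Welcome to Python.org",
                          "www.python.org", "The official home of Python")
    print(SearchSketch("python", [result], [related]).as_dict())
```

Flattening related searches to strings keeps the dictionary finite and easy to serialize, since each related search could otherwise pull in related searches of its own.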

Exceptions

All of the exceptions inherit from the GoogleSearchException exception.

You can find a list of exceptions in the exceptions.py file.
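Because every exception shares one base class, a caller can catch the base class to cover all of them. The sketch below defines local stand-ins so that it runs without the library installed: GoogleSearchException mirrors the name given above, ParsingFailed is a hypothetical subclass invented for this example, and run_with_retries is an illustrative helper that treats retry_count as the total number of attempts.

```python
class GoogleSearchException(Exception):
    """Local stand-in for the library's base exception."""


class ParsingFailed(GoogleSearchException):
    """Hypothetical subclass, invented for this example."""


def run_with_retries(action, retry_count=3):
    """Call action() until it succeeds or retry_count attempts have failed."""
    last_error = None
    for _ in range(retry_count):
        try:
            return action()
        except GoogleSearchException as error:  # also catches every subclass
            last_error = error
    raise GoogleSearchException(f"gave up after {retry_count} attempts") from last_error


if __name__ == "__main__":
    attempts = []

    def flaky():
        attempts.append(1)
        if len(attempts) < 3:
            raise ParsingFailed("no results block found")
        return "ok"

    print(run_with_retries(flaky), "after", len(attempts), "attempts")
```

Catching the base class keeps calling code stable even if more specific exception types are added later.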

Deployment

This module is currently in development and might contain bugs.

Feel free to use it in production if you judge it suitable, keeping in mind that you may encounter issues.

Built With

  • beautifulsoup4 — To parse the HTML
  • requests — To make HTTP requests
  • pyuseragents — To create the User-Agent HTTP header
  • inquirer — To make a beautiful CLI interface

License

This project is licensed under the MIT License — see the LICENSE file for more details
