Parsing Google Images with Python

GoogleImageScraper 2.3.5

This is a library for retrieving urls and downloading images from Google Images.

Metadata

License: MIT

Author: Jibble

Requires: Python >= 3.4

Project description

Google Image Scraper

This is a library for retrieving and downloading images from Google Images. It takes an input query and optional arguments, searches Google Images, and retrieves image objects. These images may be protected by copyright, so do not use them in ways that could get you into legal trouble, such as commercial use.

This library is inspired by google-images-download by hardikvasa, and adds a few quality-of-life improvements, such as being able to retrieve URLs as well. It would not be possible without their work and the work of the people who continue it.

Arguments

Both of the main functions take one required argument and two optional arguments:

Argument | Types | Description
query | str, list | Either a string or a list containing the keywords to search for. A string query is split into separate keywords on spaces.
limit | int | The number of images to search for. Cannot be greater than 100. *Defaults to 1*
arguments | dict | A dictionary of optional values, all of which are listed here. They are split into two categories: search arguments and download arguments.

Download arguments

Argument | Types | Description
download_format | str | Specifies a file extension to download all images as. Must be a valid image file extension recognized by PIL. *Note: This takes considerably longer with large numbers of images*
directory | str | The name of the directory to download the images to. It is created automatically in the directory the function is called from, unless the directory already exists or path is specified.
path | str | The path in which to create the download directory.
timeout | int, float | The maximum time, in seconds, that the program will wait to retrieve a single image.
verbose | bool | Set to True to print progress updates to the console.
Search Arguments

Argument | Accepted Values | Description
color | 'red', 'orange', 'yellow', 'green', 'teal', 'blue', 'purple', 'pink', 'white', 'gray', 'black', 'brown' | Filters images by the dominant color.
color_type | 'full', 'grayscale', 'transparent' | Filters images by color type: full color, grayscale, or transparent.
license | 'creative_commons', 'other_licenses' | Filters images by usage license.
type | 'face', 'photo', 'clipart', 'lineart', 'gif' | Filters by the type of image to search for. *Not to be confused with search_format*
time | 'past_day', 'past_week', 'past_month', 'past_year' | Only finds images posted within the specified time.
aspect_ratio | 'tall', 'square', 'wide', 'panoramic' | Specifies the aspect ratio of the images.
search_format | 'jpg', 'gif', 'png', 'bmp', 'svg', 'webp', 'ico', 'raw' | Filters out images that are not in the specified format. To download images as a specific format, use the 'download_format' argument instead.
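
The accepted values in the table above can be checked before a search is run. The following is a minimal sketch and not part of GoogleImageScraper: the helper name validate_search_arguments and its return format are assumptions made for illustration, and the library performs its own validation (see ArgumentError under Included Errors).

```python
# Hypothetical helper, not part of GoogleImageScraper: checks a dictionary
# of search arguments against the accepted values listed in the table above.
ACCEPTED_SEARCH_VALUES = {
    "color": ("red", "orange", "yellow", "green", "teal", "blue", "purple",
              "pink", "white", "gray", "black", "brown"),
    "color_type": ("full", "grayscale", "transparent"),
    "license": ("creative_commons", "other_licenses"),
    "type": ("face", "photo", "clipart", "lineart", "gif"),
    "time": ("past_day", "past_week", "past_month", "past_year"),
    "aspect_ratio": ("tall", "square", "wide", "panoramic"),
    "search_format": ("jpg", "gif", "png", "bmp", "svg", "webp", "ico", "raw"),
}


def validate_search_arguments(arguments):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name, value in arguments.items():
        accepted = ACCEPTED_SEARCH_VALUES.get(name)
        if accepted is None:
            # Not a search argument (e.g. a download argument): skip it.
            continue
        if value not in accepted:
            problems.append(f"{name}={value!r} is not one of {list(accepted)}")
    return problems


if __name__ == "__main__":
    # 'past_hour' is not an accepted value for 'time', so one problem is listed.
    print(validate_search_arguments({"color": "teal", "time": "past_hour"}))
```

Returning a list of problems rather than raising on the first one lets a caller report every invalid value at once.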

Usage

There are four available functions: download, urls, image_objects, and download_image. The last one works differently from the others:

download:

This will download images based on the arguments. The returned values will follow this format:
Each of the images in the list of images will follow a particular format as well:

urls:

This function simply returns a list of image urls from the search terms.

image_objects:

This function is a little more niche, but it may be useful to some people. Instead of returning a list of image urls like with the urls function, it returns a list of image objects containing useful data, structured like so:

The usage is similar to the previous functions:

download_image:

Use this function to download an image via url. This function is different from the rest in that it takes different input arguments, provided below:

Argument | Types | Description
url | str | The URL to download the image from. *required*
name | str | The name of the file. Do not include the file extension. *required*
path | str | The path to download the image to.
download_format | str | The format to download the image in. Converting takes somewhat longer.
overwrite | bool | Whether to overwrite files with the same name. Defaults to True. Raises FileExistsError if False and the file exists.

Errors

You may not always get the number of images specified in the limit argument. This happens when an image fails to download, for example because the file is not in an image format or the request times out. The 'errors' item in the dictionary returned by download lets you keep track of this. For example, if your limit is 100 and 3 images throw errors, you get 97 images back and the 'errors' item is 3. If your limit is 20 and 3 images throw errors, however, you still get 20 images back and the 'errors' item is 0. This is because at most 100 URLs can be found in one query, so higher limits increase the chance that errors will cut into your limit.
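
The arithmetic in the two examples above can be written out as a short sketch. This is a simplified model of the described behaviour, not the library's code: the function name expected_result is hypothetical, and it assumes the query finds the full 100 URLs.

```python
# Simplified model of the behaviour described above; not the library's code.
# The value 100 comes from the text: at most 100 URLs are found per query.
MAX_URLS_PER_QUERY = 100


def expected_result(limit, failed):
    """Return (images_returned, errors) for a limit and a failure count."""
    if not 1 <= limit <= MAX_URLS_PER_QUERY:
        raise ValueError("limit must be between 1 and 100")
    # URLs that can still be downloaded after the failures are set aside.
    usable = max(MAX_URLS_PER_QUERY - failed, 0)
    returned = min(limit, usable)
    return returned, limit - returned


if __name__ == "__main__":
    print(expected_result(100, 3))  # (97, 3)
    print(expected_result(20, 3))   # (20, 0)
```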

Included Errors

Error | Description
LimitError | Raised when the limit argument is above 100 or not the proper type.
ArgumentError | Raised when an invalid value is given for an argument.
QueryError | Raised if there is no query or the query is not the proper type.
UnpackError | Raised if no images are found on the page.
DownloadError | Exclusive to the download_image function. Raised if the image failed to download.
A few real examples are listed here:

Contains a script that can be used to query google image and download all the images that are found.

Menchit-ai/parse-google-image

README.md

This script parses Google Images with Selenium. It is launched from the command line and lets you download all the images you want in a single query.

To set up the environment, simply run pip install -r requirements.txt inside the project directory.

To launch the script, use the following command: python parse_google_image.py [query] [nb_of_images]. For example:

python parse_google_image.py "damaged apple fruit" 4

will give you the first 4 images returned when searching for damaged apple fruit in Google Images.

By default, the script downloads your images into a folder that has the same name as your query, inside a dataset directory. Continuing with the previous example, the files are organised as follows:

.
+-- dataset
|   +-- damaged_apple_fruit
|   |   +-- damaged_apple_fruit(1).jpg
|   |   +-- damaged_apple_fruit(2).jpg
|   |   +-- damaged_apple_fruit(3).jpg
|   |   +-- damaged_apple_fruit(4).jpg
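
The naming scheme in this layout can be sketched as a small path builder. This is an illustration rather than the script's own code: the function name image_path is hypothetical, and both the replacement of spaces with underscores and the .jpg extension are inferred from the example.

```python
from pathlib import Path


def image_path(query, index, root="dataset", extension="jpg"):
    """Build the path of one downloaded image, following the layout above.

    Assumption inferred from the example: spaces in the query become
    underscores, and the folder name is reused as the file name prefix.
    """
    folder = query.strip().replace(" ", "_")
    return Path(root) / folder / f"{folder}({index}).{extension}"


if __name__ == "__main__":
    print(image_path("damaged apple fruit", 1).as_posix())
```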

You can use any folder you want in place of dataset with the --directory or -d option. For example, to store the images inside a damaged folder, use the command:

python parse_google_image.py "damaged apple fruit" 4 --directory damaged

You can set the verbose option to 0, 1 or 2 depending on the output you want. 0 hides all output from the script, 1 only shows how many images you have downloaded, and 2 shows the full output of the script. This option is added to the command with --verbose or -v. Example:

E:\parse-google-image>python parse_google_image.py "damaged apple fruit" 4 --directory damaged --verbose 2
Search : damaged apple fruit ; number : 4 ; scrolls : 1
damaged/damaged_apple_fruit/damaged_apple_fruit(0).jpg downloaded !
damaged/damaged_apple_fruit/damaged_apple_fruit(1).jpg downloaded !
damaged/damaged_apple_fruit/damaged_apple_fruit(2).jpg downloaded !
damaged/damaged_apple_fruit/damaged_apple_fruit(3).jpg downloaded !
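
A command-line interface like the one described above could be declared with argparse roughly as follows. This is a sketch and not the script's actual source: the function name build_parser, the help texts, and the default verbosity of 1 are assumptions. The dataset default comes from the text.

```python
import argparse


def build_parser():
    """Declare the arguments described above (a sketch, not the real script)."""
    parser = argparse.ArgumentParser(prog="parse_google_image.py")
    parser.add_argument("query",
                        help="search terms, quoted if they contain spaces")
    parser.add_argument("nb_of_images", type=int,
                        help="number of images to download")
    parser.add_argument("-d", "--directory", default="dataset",
                        help="root folder for the downloaded images")
    # The default of 1 is an assumption; the README does not state it.
    parser.add_argument("-v", "--verbose", type=int, choices=(0, 1, 2),
                        default=1,
                        help="0: silent, 1: download count, 2: full output")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.query, args.nb_of_images, args.directory, args.verbose)
```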

For now, the scrolling is overkill: the script tries to load every possible image without clicking "show more results", although it could be reduced to the minimum scrolling needed. Not all images are downloaded: regular URLs are handled without problems, but some formats may cause issues. For example, data:image/jpeg;base64 is handled, but some other formats are not. The application will probably crash if your query returns no results at all.
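
To show what handling a data:image/jpeg;base64 source involves, here is a minimal sketch using only the standard library. It is not taken from the script: the function name decode_data_uri is hypothetical, and it rejects anything that is not base64-encoded rather than guessing.

```python
import base64


def decode_data_uri(uri):
    """Return (mime_type, raw_bytes) for a base64-encoded data URI.

    Raises ValueError for regular URLs and for data URIs that are not
    base64-encoded, so the caller can handle those cases separately.
    """
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    # Everything before the first comma is the header, the rest is payload.
    header, separator, payload = uri[len("data:"):].partition(",")
    if not separator or not header.endswith(";base64"):
        raise ValueError("only base64-encoded data URIs are supported")
    mime_type = header[: -len(";base64")] or "text/plain"
    return mime_type, base64.b64decode(payload)


if __name__ == "__main__":
    # The first three bytes of a JPEG file, encoded for the example.
    sample = "data:image/jpeg;base64," + base64.b64encode(b"\xff\xd8\xff").decode()
    print(decode_data_uri(sample))
```

The returned bytes can be written straight to a file opened in binary mode.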

Parsing images with Python

Introduction

Let's learn how to parse images from Google by search query using Python.

The program works as follows: the user enters a query, the number of images to parse, and the path to the folder where all the images should be saved. The chosen number of images is then downloaded into the chosen folder.

Installing the icrawler module

To parse images we will use a module called icrawler. Let's install it:

pip install icrawler

Writing the code

First, import GoogleImageCrawler from icrawler:

from icrawler.builtin import GoogleImageCrawler

Create three variables to hold the user's input:

name = input('What query should images be parsed for?\n')
quantity = int(input('How many images should be parsed?\n'))
path = input('Where should the images be saved?\n')

Create a GoogleImageCrawler object and pass it the storage parameter, a dictionary whose root_dir key is set to the folder where the images will be stored:

google_crawler = GoogleImageCrawler(storage={'root_dir': path})

All that remains is to download the images by calling the crawl() method:

google_crawler.crawl(keyword=name, max_num=quantity)

keyword – the query for Google Images

max_num – the number of images to download
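
The program above passes user input straight to the crawler. As one possible refinement, the sketch below validates the input first. The function names build_crawl_request and download_images are hypothetical and the validation rules are assumptions; the GoogleImageCrawler calls themselves are the ones used in the article.

```python
def build_crawl_request(keyword, max_num, root_dir):
    """Validate user input and return (storage, crawl_kwargs)."""
    keyword = keyword.strip()
    if not keyword:
        raise ValueError("the search query must not be empty")
    if max_num < 1:
        raise ValueError("the number of images must be at least 1")
    storage = {"root_dir": root_dir}
    return storage, {"keyword": keyword, "max_num": max_num}


def download_images(keyword, max_num, root_dir):
    """Download up to max_num images for keyword into root_dir."""
    # Imported here so the validation helper works without icrawler installed.
    from icrawler.builtin import GoogleImageCrawler

    storage, crawl_kwargs = build_crawl_request(keyword, max_num, root_dir)
    GoogleImageCrawler(storage=storage).crawl(**crawl_kwargs)


if __name__ == "__main__":
    print(build_crawl_request(" cats ", 3, "images"))
```

Keeping the validation separate from the network call makes it possible to check the input handling without running a crawl.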

Conclusion

In this article we wrote a simple program for parsing images with Python.

I hope you enjoyed the article. Good luck! 🙂
