GoogleImageScraper 2.3.5
This is a library for retrieving urls and downloading images from Google Images.
Metadata
License: MIT
Author: Jibble
Requires: Python >= 3.4
Project description
Google Image Scraper
This is a library for retrieving and downloading images from Google Images.
It uses an input query and arguments to search for and retrieve image objects. These images may be protected under copyright, and you should not do anything punishable with them, such as using them commercially. This library is inspired by google-images-download by hardikvasa, but adds a few quality-of-life improvements, such as the ability to retrieve urls as well. This library would not be possible, however, without their work and the people who are working to continue it.
Arguments
Both of the main functions take one required argument and two optional arguments:
Argument | Types | Description |
---|---|---|
query | str, list | Either a string or a list containing the keywords to search for. If the query is a string, it will be separated into keywords by spaces. *required* |
limit | int | The number of images to search for. Cannot be greater than 100. *Defaults to 1* |
arguments | dict | A dictionary containing many optional values, all of which are listed below. They are split into two categories: search arguments and download arguments. |
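To make the documented query and limit rules concrete, here is a small illustrative sketch. These helpers are not part of the library; they only restate the constraints described above (string queries split on spaces, limit an int no greater than 100):

```python
def normalize_query(query):
    """A string query is split into keywords on spaces; a list is kept as-is."""
    if isinstance(query, str):
        return query.split()
    if isinstance(query, list):
        return query
    raise TypeError("query must be a str or a list")

def check_limit(limit):
    """limit must be an int and cannot be greater than 100 (default is 1)."""
    if not isinstance(limit, int) or isinstance(limit, bool):
        raise TypeError("limit must be an int")
    if limit > 100:
        raise ValueError("limit cannot be greater than 100")
    return limit
```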
Download arguments
Argument | Types | Description |
---|---|---|
download_format | str | Specifies a file extension to download all images as. Must be a valid image file extension recognized by PIL. *Note: This takes considerably longer with large amounts of images* |
directory | str | This specifies the directory name to download the images to. This will automatically be created in the directory the function is called in, unless the directory already exists or path is specified. |
path | str | This specifies the path to create the download directory in. |
timeout | int, float | The maximum time, in seconds, the program will wait to retrieve a single image. |
verbose | bool | Set to True in order to print updates on progress to the console. |
Search Arguments
Argument | Accepted Values | Description |
---|---|---|
color | 'red', 'orange', 'yellow', 'green', 'teal', 'blue', 'purple', 'pink', 'white', 'gray', 'black', 'brown' | Filter images by the dominant color. |
color_type | 'full', 'grayscale', 'transparent' | Filter images by color type: full color, grayscale, or transparent. |
license | 'creative_commons', 'other_licenses' | Filter images by the usage license. |
type | 'face', 'photo', 'clipart', 'lineart', 'gif' | Filter by the type of image to search for. *Not to be confused with search_format* |
time | 'past_day', 'past_week', 'past_month', 'past_year' | Only find images posted within the specified time. |
aspect_ratio | 'tall', 'square', 'wide', 'panoramic' | Specify the aspect ratio of the images. |
search_format | 'jpg', 'gif', 'png', 'bmp', 'svg', 'webp', 'ico', 'raw' | Filter out images that are not the specified format. If you would like to download images as a specific format, use the download_format argument instead. |
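The accepted values from the table above can be collected into one place for local validation. This is an illustrative sketch only, not library code; the library performs its own validation and raises ArgumentError on bad values:

```python
# Accepted values for the search arguments, as documented above.
SEARCH_ARGUMENTS = {
    "color": {"red", "orange", "yellow", "green", "teal", "blue",
              "purple", "pink", "white", "gray", "black", "brown"},
    "color_type": {"full", "grayscale", "transparent"},
    "license": {"creative_commons", "other_licenses"},
    "type": {"face", "photo", "clipart", "lineart", "gif"},
    "time": {"past_day", "past_week", "past_month", "past_year"},
    "aspect_ratio": {"tall", "square", "wide", "panoramic"},
    "search_format": {"jpg", "gif", "png", "bmp", "svg", "webp", "ico", "raw"},
}

def check_search_arguments(arguments):
    """Return the names of search arguments whose values are not accepted.

    Download arguments (directory, path, timeout, ...) are ignored.
    """
    return [key for key, value in arguments.items()
            if key in SEARCH_ARGUMENTS and value not in SEARCH_ARGUMENTS[key]]
```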
Usage
There are four available functions: download, urls, image_objects, and download_image, the last of which works differently than the others.
download:
This will download images based on the arguments and return a dictionary of results, including an 'errors' item (described under Errors below).
Each of the images in the returned list of images follows a particular format as well.
urls:
This function simply returns a list of image urls from the search terms.
image_objects:
This function is a little more niche, but it may be useful to some people. Instead of returning a list of image urls like with the urls function, it returns a list of image objects containing useful data, structured like so:
The usage is similar to the previous functions:
download_image:
Use this function to download an image via url. This function is different from the rest in that it takes different input arguments, provided below:
Argument | Types | Description |
---|---|---|
url | str | The url to download the image from. *required* |
name | str | The name of the file. Do not include file extension. *required* |
path | str | The path to download the image to. |
download_format | str | The format to download the image in. Takes a while longer |
overwrite | bool | Whether to overwrite files with the same name. Defaults to True. Raises FileExistsError if False and the file exists. |
Errors
You may not always receive the number of images specified in the limit argument. This happens when an image fails to download, whether because it is not in an image format or because the request times out. When downloading a large number of images, this can keep your limit from being reached. The 'errors' item in the dictionary returned by download is how you keep track of that. For example, if your limit was 100 and 3 images threw errors, you would get 97 images back and the 'errors' item would be 3. If your limit was 20, however, and 3 images threw errors, you would still get 20 images back and the 'errors' item would be 0. This is because a maximum of 100 urls can be found in one query, so higher limits increase the chance that errors will cut into your limit.
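The arithmetic above can be sketched as a simple model (illustrative only, not library code): up to 100 urls form the pool, failed downloads are skipped and replaced by spare urls while the pool lasts, and 'errors' only counts the remaining shortfall against your limit:

```python
def downloads_and_errors(limit, failures, pool_size=100):
    """Model of the documented behavior: `failures` of the up-to-100
    found urls fail to download; spare urls in the pool compensate
    until it runs out, and 'errors' is whatever shortfall remains."""
    usable = pool_size - failures
    downloaded = min(limit, usable)
    errors = limit - downloaded
    return downloaded, errors
```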
Included Errors
Error | Description |
---|---|
LimitError | Raised when the limit argument is above 100 or not the proper type. |
ArgumentError | Raised when an invalid value is given for an argument. |
QueryError | Raised if there is no query or the query is not the proper type. |
UnpackError | Raised if no images are found on the page. |
DownloadError | Exclusive to the download_image function. Raised if the image failed to download. |
A few real examples are listed here:
Menchit-ai/parse-google-image
Contains a script that can be used to query Google Images and download all the images that are found.
This script is used to parse Google Images with selenium. It can be launched from the command line and allows you to directly download all the images that you want in one query.
To set up the environment, simply run pip install -r requirements.txt inside the project directory.
To launch the script, use the following command: python parse_google_image.py [query] [nb_of_images]. For example:
python parse_google_image.py "damaged apple fruit" 4
will download the first 4 images obtained by typing damaged apple fruit into Google Images.
By default, the script downloads your images into a folder with the same name as your query, inside a dataset directory. Continuing with the previous example, we get this structure:
.
+-- dataset
|   +-- damaged_apple_fruit
|   |   +-- damaged_apple_fruit(1).jpg
|   |   +-- damaged_apple_fruit(2).jpg
|   |   +-- damaged_apple_fruit(3).jpg
|   |   +-- damaged_apple_fruit(4).jpg
You can replace the dataset folder with any folder you want using the --directory or -d option in the command. For example, to store the data inside a damaged folder:
python parse_google_image.py "damaged apple fruit" 4 --directory damaged
You can set the verbose option to 0, 1, or 2 depending on the output you want: 0 hides all output from the script, 1 only shows how many images you have downloaded, and 2 shows the script's full output. This option is added to the command with --verbose or -v. Example:
E:\parse-google-image>python parse_google_image.py "damaged apple fruit" 4 --directory damaged --verbose 2
Search : damaged apple fruit ; number : 4 ; scrolls : 1
damaged/damaged_apple_fruit/damaged_apple_fruit(0).jpg downloaded !
damaged/damaged_apple_fruit/damaged_apple_fruit(1).jpg downloaded !
damaged/damaged_apple_fruit/damaged_apple_fruit(2).jpg downloaded !
damaged/damaged_apple_fruit/damaged_apple_fruit(3).jpg downloaded !
For now, the scrolling is overkill: it tries to load every image possible without clicking on "show more results", but it could try to scroll as little as possible. Not all images are downloaded: the urls are handled without problems, but some formats may cause issues. For example, data:image/jpeg;base64 is handled, but some other formats are not. The application will probably crash if your query gives absolutely no results.
Parsing images in Python
Introduction
We will learn how to parse images from Google by search query using Python.
The program works as follows: the user enters a query, the number of images to parse, and the path to the folder in which to save all the images. The requested number of images is then downloaded to the chosen folder.
Installing the icrawler module
We will use a module called icrawler to parse the images. Install it:
pip install icrawler
Writing the code
First, import GoogleImageCrawler from icrawler:
from icrawler.builtin import GoogleImageCrawler
Create three variables into which the user will enter data:
name = input('What query should images be parsed for?\n')
quantity = int(input('How many images should be parsed?\n'))
path = input('Where should the images be saved?\n')
Create a GoogleImageCrawler object, passing the storage parameter set to the location of the folder where the images will end up:
google_crawler = GoogleImageCrawler(storage={'root_dir': path})
All that remains is to download the images. To do that, call the crawl() method:
google_crawler.crawl(keyword=name, max_num=quantity)
keyword – the query for Google Images
max_num – the number of images to download
Conclusion
In this article we wrote a simple program for parsing images in Python.
I hope you enjoyed the article. Good luck! 🙂