- heureka/py-url-generator
- Building URLs in Python
- The standard way
- The manual way
- The Furl way
- Conclusion
README.md
UrlGenerator is a simple library that allows you to generate URLs from a single JSON configuration for different programming languages.
Other programming languages
There is also a UrlGenerator for PHP, which accepts the same configuration, so you can combine multiple projects in Python and PHP using the same configuration file.
The main entry point to this library is the function url_generator.get_url, which accepts the following parameters:
The path string defines the path through the route configuration using dot notation (see the #configuration section below).
Params is a dictionary of route parameters. It should contain both GET parameters and template parameters, mixed together, as defined in the configuration.
example: lang="sk", productId=12345
    from url_generator import UrlGenerator

    ug = UrlGenerator('path/to/your/config.json', env="dev", lang="sk")
    ug.get_url("heureka.category.index", productId=12345)
The path to the configuration file must be passed through the constructor to the UrlGenerator instance (as the first parameter). Check tests/test.json for a better understanding.
The structure is plain JSON (support for comments is planned for the future).
Simple configuration file can look like this:
Keys in the configuration that are prefixed by the @ symbol are considered keywords. Each route is defined by keywords. Note that the @scheme and @host keywords are mandatory.
Keywords are prefixed by the @ symbol to distinguish them from path nodes.
The @scheme keyword represents the URL scheme as defined by RFC 1738, usually http or https.
A separate keyword represents the subdomain.
The @host keyword represents the host (and can contain a port if necessary).
example: "@host": "www.heureka.cz"
The @host_postfix keyword represents a postfix for the host, for example for the dev environment.
example: "@host_postfix": "test.dev"
The @path keyword represents the URL path as defined by RFC 1738, such as / for the index or /iphone-7/recenze for a product detail.
example: "@path": "/obchody/czc-cz/recenze"
The @query keyword represents the list of allowed query parameters with their internal and external representation.
For example, if the configuration contains "@query": {"index": "i"}, then the index parameter in the call ug.get_url('example_site', index=10) will be compiled according to the configuration into i=10, and the returned URL should look like https://www.example.com/?i=10 .
Note that query parameters are not mandatory, and cannot be set as mandatory.
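The internal-to-external renaming described above can be illustrated with a minimal sketch. This is not the library's code: `QUERY_MAP` and `compile_query` are hypothetical names, and silently dropping parameters that are not listed in the mapping is an assumption made for the illustration.

```python
from urllib.parse import urlencode

# Hypothetical mapping mirroring an "@query" entry: internal name -> external name.
QUERY_MAP = {"index": "i"}


def compile_query(query_map, **params):
    """Rename known internal parameters to their external form and encode them.

    Assumption: parameters missing from the mapping are ignored.
    """
    external = {
        query_map[name]: value
        for name, value in params.items()
        if name in query_map
    }
    return urlencode(external)


print(compile_query(QUERY_MAP, index=10))  # i=10
```

Because every query parameter is optional, calling `compile_query(QUERY_MAP)` with no parameters simply yields an empty string.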
The @fragment keyword represents the fragment (anchor identifier) as defined by RFC 1738, like #section.
example: "@fragment": "section"
Every key in the configuration that is neither a keyword (prefixed by @) nor a template condition is considered a path node.
Path nodes should use the underscore_case naming convention.
A path string is a dot-joined sequence of path nodes, like heureka.category.index .
For example, in the following configuration some_site and index are path nodes. On the other hand, @host and @path are keywords.
With the preceding configuration we can call get_url('some_site.index') and the generated URL will be https://example.com/index.php .
Using the path string we define which URL we want to receive. The URL Generator parses the config file and collects keywords from the given path node and all its parents.
In the example below, we can call get_url('example.russian') and the response will be https://www.example.ru/information .
Note that the URL Generator uses the @host keyword from the example.russian path node, but the @path keyword comes from example . Finally, the @scheme keyword is defined in the global space (root path).
This example also shows keyword overloading: the @host keyword is defined in the example path node and overloaded in the example.russian path node.
This way we can build complex structures like heureka.product.detail.reviews.only_certified without too many repeated definitions in the configuration.
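The inheritance and overloading rules can be sketched in a few lines of plain Python. This is an illustration only, not how the library is implemented: `CONFIG`, `resolve_keywords`, and `build_url` are hypothetical, and the `www.example.com` host on the `example` node is an assumed value chosen to match the description.

```python
# Hypothetical configuration modelled on the example.russian description.
CONFIG = {
    "@scheme": "https",
    "example": {
        "@host": "www.example.com",  # assumed value for the parent node
        "@path": "/information",
        "russian": {"@host": "www.example.ru"},
    },
}


def own_keywords(node):
    """Return only the keyword entries (keys starting with @) of one node."""
    return {key: value for key, value in node.items() if key.startswith("@")}


def resolve_keywords(config, path_string):
    """Collect keywords from the root down to the requested path node.

    Keywords found deeper in the tree overwrite inherited ones.
    """
    node = config
    keywords = own_keywords(node)
    for name in path_string.split("."):
        node = node[name]  # raises KeyError for an unknown path node
        keywords.update(own_keywords(node))
    return keywords


def build_url(config, path_string):
    kw = resolve_keywords(config, path_string)
    return f"{kw['@scheme']}://{kw['@host']}{kw.get('@path', '')}"


print(build_url(CONFIG, "example.russian"))  # https://www.example.ru/information
print(build_url(CONFIG, "example"))          # https://www.example.com/information
```

Only `@host` differs between the two calls; `@scheme` and `@path` are inherited unchanged, which is what keeps deep path strings free of repeated definitions.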
In the configuration we can define template parameters using the {parameter_name} syntax. Those parameters are then expected in the get_url(path, **params) function call (as keyword arguments).
Template parameters can also be defined globally in the UrlGenerator constructor, and those will be shared by all get_url function calls.
It is not recommended to use template parameters in values for the @scheme and @query keywords.
In the example below we define the top-level domain using the language parameter, so we can call get_url(path, language='cz', page=1) and the result will be https://www.example.cz?p=1 .
Note that query parameters and template parameters are passed together, mixed, as keyword arguments in get_url(path, **kwargs).
We can define template conditions in the configuration to provide separate configuration for a given parameter value.
This way we can define configuration that applies only to a given language, environment, etc.
Template conditions use the {parameter_name}=expected_value syntax and can contain the same rules as path nodes.
Rules defined inside a template condition are processed only if the given parameter equals the expected value.
In the example below we define a {lang}=spanish condition, so if we call get_url("example", lang="spanish") it returns https://www.ejemplar.es .
Note that rules are processed from top to bottom in the file. A later rule has priority, so a template condition must be placed after the values it overloads to have any effect.
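The top-to-bottom processing of conditions can be sketched as follows. This is a rough illustration under assumptions, not the library's implementation: the `{lang}=spanish` key spelling, the `www.example.com` default host, and the `apply_conditions` helper are all hypothetical.

```python
import re

# Hypothetical configuration; the "{lang}=spanish" key spelling is an assumption.
CONFIG = {
    "@scheme": "https",
    "@host": "www.example.com",  # assumed default host
    "{lang}=spanish": {"@host": "www.ejemplar.es"},
}

# Matches keys of the form {parameter_name}=expected_value.
CONDITION = re.compile(r"^\{(\w+)\}=(.*)$")


def apply_conditions(node, params):
    """Walk keys top to bottom; a matching condition overrides earlier keywords."""
    keywords = {}
    for key, value in node.items():
        match = CONDITION.match(key)
        if key.startswith("@"):
            keywords[key] = value
        elif match and str(params.get(match.group(1))) == match.group(2):
            keywords.update(apply_conditions(value, params))
    return keywords


kw = apply_conditions(CONFIG, {"lang": "spanish"})
print(f"{kw['@scheme']}://{kw['@host']}")  # https://www.ejemplar.es
```

Since Python dictionaries preserve insertion order, moving the condition above the plain `@host` entry in this sketch would let the default host win, which mirrors the ordering rule stated above.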
The main advantage of the URLGenerator is that it can share the configuration through multiple programming languages. Therefore, it is necessary to keep the individual language versions compatible with each other.
So, when you create a pull request in this repository, please consider contributing the same functionality to the other repositories listed in the Other programming languages section.
Project owners should never merge code which breaks compatibility with other language versions.
The author of the first idea is Pavel Škoda.
Building URLs in Python
Building URLs is really common in applications and APIs because most of the applications tend to be pretty interconnected. But how should we do it in Python? Here’s my take on the subject.
Let’s see how the different options compare.
The standard way
Python has a built-in library that is specifically made for parsing URLs, called urllib.parse.
You can use the urllib.parse.urlsplit function to break a URL string into a five-item named tuple. The items are parsed from the general URL structure:
scheme://netloc/path?query#fragment
The opposite of breaking a URL into parts is to build it using the urllib.parse.urlunsplit function.
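A short round trip shows both functions side by side. The URL used here is made up for the illustration.

```python
from urllib.parse import urlsplit, urlunsplit

url = "https://example.com/api/v1/book/12?format=mp3#reviews"
parts = urlsplit(url)

# The named tuple exposes the five items by name.
print(parts.scheme)    # https
print(parts.netloc)    # example.com
print(parts.path)      # /api/v1/book/12
print(parts.query)     # format=mp3
print(parts.fragment)  # reviews

# urlunsplit accepts any five-item iterable, so the round trip restores the URL.
rebuilt = urlunsplit(parts)
print(rebuilt == url)  # True
```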
If you check the library documentation you’ll notice that there is also a urlparse function. The difference between it and the urlsplit function is an additional item in the parse result for path parameters.
https://www.example.com/some/path;parameter=12?q=query
Path parameters are separated with a semicolon from the path and located before the query arguments that start with a question mark. Most of the time you don’t need them but it is good to know that they exist.
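To see the difference in practice, here is a small comparison of the two functions on the URL above.

```python
from urllib.parse import urlparse, urlsplit

url = "https://www.example.com/some/path;parameter=12?q=query"

# urlparse returns six items and separates the path parameters.
parsed = urlparse(url)
print(parsed.path)    # /some/path
print(parsed.params)  # parameter=12
print(parsed.query)   # q=query

# urlsplit returns five items and leaves the parameters inside the path.
split = urlsplit(url)
print(split.path)     # /some/path;parameter=12
```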
So how would you then build a URL with urllib.parse?
Let’s assume that you want to call some API and need a function for building the API URL. The required URL could be for example:
https://example.com/api/v1/book/12?format=mp3&token=abbadabba
Here is how we could build the URL:
    import os
    from urllib.parse import urlunsplit, urlencode

    # Scheme and host are configurable through the environment.
    SCHEME = os.environ.get("API_SCHEME", "https")
    NETLOC = os.environ.get("API_NETLOC", "example.com")

    def build_api_url(book_id, format, token):
        path = f"/api/v1/book/{book_id}"
        query = urlencode(dict(format=format, token=token))
        # The last item is the fragment, which is left empty here.
        return urlunsplit((SCHEME, NETLOC, path, query, ""))
Calling the function works as expected:
    >>> build_api_url(12, "mp3", "abbadabba")
    'https://example.com/api/v1/book/12?format=mp3&token=abbadabba'
I used environment variables for the scheme and netloc because typically your program is calling a specific API endpoint that you might want to configure via the environment.
I also introduced the urlencode function which transforms a dictionary to a series of key=value pairs separated with & characters. This can be handy if you have lots of query arguments as a dictionary of values can be easier to manipulate.
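Besides joining the pairs, urlencode also escapes characters that would otherwise break the query string. The values below are invented for the illustration, and `doseq=True` is used to expand a list into repeated keys.

```python
from urllib.parse import urlencode

params = {"format": "mp3", "title": "war & peace", "tag": ["a", "b"]}

# Spaces become "+", "&" is percent-encoded, and the list yields one pair per item.
query = urlencode(params, doseq=True)
print(query)  # format=mp3&title=war+%26+peace&tag=a&tag=b
```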
The urllib.parse library also contains urljoin which is similar to os.path.join . It can be used to build URLs by combining a base URL with a path. Let’s modify the example code a bit.
    import os
    from urllib.parse import urljoin, urlencode

    BASE_URL = os.environ.get("BASE_URL", "https://example.com/")

    def build_api_url(book_id, format, token):
        path = f"/api/v1/book/{book_id}"
        # urljoin does not add the "?" separator, so it is prepended here.
        query = "?" + urlencode(dict(format=format, token=token))
        return urljoin(BASE_URL, path + query)
This time the whole base URL comes from the environment. The path and query are combined with the base URL using the urljoin function. Notice that this time the question mark at the beginning of the query needs to be set manually.
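One thing to watch with urljoin is that it follows the rules for resolving relative references, so leading and trailing slashes change the result. The base URLs below are made up to show the three typical cases.

```python
from urllib.parse import urljoin

# Base ends with a slash and the path is relative: the path is appended.
appended = urljoin("https://example.com/api/v1/", "book/12")
print(appended)  # https://example.com/api/v1/book/12

# Path starts with a slash: it replaces the whole path of the base.
replaced = urljoin("https://example.com/api/v1/", "/book/12")
print(replaced)  # https://example.com/book/12

# Base has no trailing slash: its last segment is dropped before joining.
dropped = urljoin("https://example.com/api/v1", "book/12")
print(dropped)   # https://example.com/api/book/12
```

In the function above the path starts with a slash, so any path already present in BASE_URL would be discarded.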
The manual way
Libraries can be nice, but sometimes you just want to get things done without thinking that much. Here’s a straightforward way to build a URL manually.
    import os

    # Strip a trailing slash so the path below can always start with one.
    BASE_URL = os.environ.get("BASE_URL", "https://example.com").rstrip("/")

    def build_api_url(book_id, format, token):
        return f"{BASE_URL}/api/v1/book/{book_id}?format={format}&token={token}"
The f-strings in Python make this quite clean, especially with URLs that always have the same structure and not that many parameters. The BASE_URL initialization strips the trailing forward slash from the environment variable. This way the user doesn’t have to remember if it should be included or not.
Note that I haven’t added any validation for the input parameters in these examples, so you may need to take that into consideration.
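As one possible way to add such validation to the manual approach, the sketch below coerces the id to an integer and percent-encodes the other values with urllib.parse.quote. The hard-coded BASE_URL and the choice of checks are assumptions for the illustration, so adapt them to what your API actually accepts.

```python
from urllib.parse import quote

BASE_URL = "https://example.com"  # hard-coded to keep the sketch self-contained


def build_api_url(book_id, format, token):
    # int() raises ValueError for ids that are not numeric.
    book_id = int(book_id)
    # quote() with safe="" percent-encodes every reserved character.
    return (
        f"{BASE_URL}/api/v1/book/{book_id}"
        f"?format={quote(format, safe='')}&token={quote(token, safe='')}"
    )


print(build_api_url(12, "mp3", "abba dabba&x=1"))
# https://example.com/api/v1/book/12?format=mp3&token=abba%20dabba%26x%3D1
```

Without the escaping, the `&x=1` part of the token would be read by the server as a separate query argument.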
The Furl way
Then there is a library called furl which aims to make URL parsing and manipulation easy. It can be installed with pip:
    $ python3 -m pip install furl
    import os
    from furl import furl

    BASE_URL = os.environ.get("BASE_URL", "https://example.com")

    def build_api_url(book_id, format, token):
        f = furl(BASE_URL)
        # The /= operator appends to the path.
        f /= f"/api/v1/book/{book_id}"
        f.args["format"] = format
        f.args["token"] = token
        return f.url
There are a few more lines here compared to the previous example. First we need to initialize a furl object from the base URL. The path can then be appended using the /= operator, which the library defines for this purpose.
The query arguments can be set with the args property dictionary. Finally, the final URL can be built by accessing the url property.
Here’s an alternative implementation using the set() method to change the path and query arguments of an existing URL.
    def build_api_url(book_id, format, token):
        return (
            furl(BASE_URL)
            .set(path=f"/api/v1/book/{book_id}", args={"format": format, "token": token})
            .url
        )
In addition to building URLs, Furl lets you modify existing URLs and parse parts of them. You can find many more examples in the API documentation.
Conclusion
These are just some examples on how to create URLs. Which one do you prefer?