Python get url xml

Python — How to Read XML from URL?

In Python, you can use the urllib.request and xml.etree.ElementTree modules to fetch and parse XML from a URL. The following is an example:

Python Program Example — Read XML from URL

The Python program below downloads and reads the Oracle Database RSS feed. It opens the URL (https://blogs.oracle.com/oraclepartners/database-7/rss) with the urlopen function of the urllib.request module, then parses the XML data with the parse function of the xml.etree.ElementTree module.

from urllib.request import urlopen
from xml.etree.ElementTree import parse

# Open the feed URL; the returned object is file-like, so parse() can read it.
var_url = urlopen('https://blogs.oracle.com/oraclepartners/database-7/rss')
xmldoc = parse(var_url)

# Each RSS entry is an <item> element nested inside <channel>.
for item in xmldoc.iterfind('channel/item'):
    title = item.findtext('title')
    date = item.findtext('pubDate')
    link = item.findtext('link')
    print(title)
    print(date)
    print(link)
    print()

Output

Webcast: Oracle Database 19c: Strategy, Features & New Customers – April 9
Mon, 01 Apr 2019 13:09:04 +0000
https://blogs.oracle.com/oraclepartners/webcast%3A-oracle-database-19c%3A-strategy%2C-features-new-customers-%E2%80%93-april-9

Win Over Financial Services Prospects with MySQL Enterprise Edition
Thu, 28 Mar 2019 21:10:44 +0000
https://blogs.oracle.com/oraclepartners/win-over-financial-services-prospects-with-mysql-enterprise-edition

How will you design the future for Data & Analytics? - April 26, 2019
Thu, 21 Mar 2019 12:38:22 +0000
https://blogs.oracle.com/oraclepartners/how-will-you-design-the-future-for-data-analytics-april-26%2C-2019

MySQL Enterprise Edition - High Availablity Campaign Now Available
Thu, 07 Mar 2019 23:00:00 +0000
https://blogs.oracle.com/oraclepartners/mysql-enterprise-edition-high-availablity-campaign-now-available

Stay Ahead of the Game with Autonomous Database Training .
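To try the same traversal without contacting the server, the parsing step can be run against an in-memory document. The following is a minimal sketch under stated assumptions: the sample feed and the helper name read_items are made up for this illustration, and it uses fromstring instead of parse because the input is bytes, not a file-like object.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, shaped like a minimal RSS 2.0 document.
SAMPLE_RSS = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>First post</title>
      <pubDate>Mon, 01 Apr 2019 13:09:04 +0000</pubDate>
      <link>https://example.com/first</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second</link>
    </item>
  </channel>
</rss>
"""


def read_items(xml_bytes):
    """Return a list of (title, date, link) tuples, one per <item>."""
    root = ET.fromstring(xml_bytes)  # root is the <rss> element
    items = []
    for item in root.iterfind('channel/item'):
        # findtext returns None when the child element is missing.
        items.append((
            item.findtext('title'),
            item.findtext('pubDate'),
            item.findtext('link'),
        ))
    return items


for title, date, link in read_items(SAMPLE_RSS):
    print(title, date, link)
```

Note that the second sample item has no pubDate, so its date comes back as None; real feeds can omit fields in the same way.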

How to Download an XML File with Python

Really? An article on downloading and saving an XML file? “Just use requests mate!”, I hear you all saying. Well, it’s not that simple. At least, it wasn’t as straightforward as that for a beginner like me. Here’s why.

Parsing is Different to Saving

For sure, experts and beginners alike will have used requests to pull down the contents of a web page. Generally it’s for the purpose of parsing or scraping that page for specific data elements.

What if you wanted to actually save that web page to your local drive? Things get slightly different. You’re no longer just reading a text rendered version of the page, you’re trying to save the actual page in its original state.

This is what I found slightly confusing. I wasn’t dealing with a text = r.text situation anymore, I was trying to maintain the original format of the page as well, tabs and all.

Why XML?

I’m talking XML here because I was/am trying to download the actual XML file for an RSS feed I wanted to parse offline. For those of you playing at home, this is for our PyBites Code Challenge 17 (hint hint!).

Why Download when you can just Parse the feed itself?

Good question! It’s about best practice and just being nice.

In the case of our code challenge (PCC17), how many times are you going to run your Py script while building the app to test if it works? Every time you run that script with your requests.get code in place, you’re making a call to the target web server.

This generates unnecessary traffic and load on that server which is a pretty crappy thing to do!

The nicer and Pythonic thing to do is to have a separate script that does the request once and saves the required data to a local file. Your primary scraping or analysis script then references the local file.
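One way to arrange that split is sketched below. It assumes the feed is cached in a local file and fetched only when that file is missing. The function names cached_feed and parse_titles are illustrative, and the sketch uses the standard library’s urllib.request rather than requests so that it has no third-party dependency.

```python
import os
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def cached_feed(url, path):
    """Download url to path only if path does not exist yet; return path."""
    if not os.path.exists(path):
        with urlopen(url) as response, open(path, 'wb') as file:
            file.write(response.read())  # raw bytes, saved unchanged
    return path


def parse_titles(path):
    """Parse the local copy and return the titles of all feed items."""
    tree = ET.parse(path)
    return [item.findtext('title') for item in tree.iterfind('channel/item')]
```

Calling parse_titles(cached_feed(URL, 'feed.xml')) repeatedly then touches the server only on the first run; deleting feed.xml forces a fresh download.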

Get to the code already!

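import requests

URL = "http://insert.your/feed/here.xml"

# Fetch the feed once.
response = requests.get(URL)

# 'wb' opens the file for writing in binary mode; response.content is bytes.
with open('feed.xml', 'wb') as file:
    file.write(response.content)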

It all looks pretty familiar so I won’t go into detail on the usual suspects.

What I’m doing in this code is the following:

  • Pulling the XML content down using requests.get.
  • Using a with statement to create a file called feed.xml. (If the file exists it’ll be overwritten.)
  • Writing the contents of the requests response into the file feed.xml.

Here’s why it was a learning exercise for me:

As I open/create the feed.xml file, I’m using the mode wb. This means I’m opening the file for writing, and whatever I write to it has to be bytes, not text.

If you fail to choose the binary mode (say, you open the file with plain w) and still write response.content, you’ll get an error:

Traceback (most recent call last):
  File "pull_xml.py", line 12, in <module>
    file.write(response.content)
TypeError: write() argument must be str, not bytes

This confused the hell out of me and resulted in me wasting time trying to convert the requests response data to different formats or writing to the external file one line at a time (which meant I lost formatting anyway!).

The binary mode is required to write the actual content of the XML page to your external file in the original format.
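The difference between the two modes can be reproduced without any network call. In this small sketch the bytes literal stands in for response.content, and the temporary directory and file name are arbitrary choices made for the demonstration.

```python
import os
import tempfile

# Stand-in for response.content: raw bytes, tabs and newlines included.
payload = b'<?xml version="1.0"?>\n<rss>\n\t<channel/>\n</rss>\n'
path = os.path.join(tempfile.mkdtemp(), 'feed.xml')

# Text mode ('w') expects str, so writing bytes raises TypeError.
try:
    with open(path, 'w') as file:
        file.write(payload)
    text_mode_error = None
except TypeError as exc:
    text_mode_error = str(exc)

# Binary mode ('wb') accepts bytes and stores them byte for byte.
with open(path, 'wb') as file:
    file.write(payload)

with open(path, 'rb') as file:
    saved = file.read()

print(text_mode_error)
print(saved == payload)
```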

Speaking of content. Notice in the final write statement I’m using response.content? Have any idea how long I spent thinking my use of the usual response.text was the only way to do this? Too damn long!

Using the content option allows you to dump the entire XML file (as is) into your own local XML file. Brilliant!

Note for beginners: If you’re reading other people’s code, be prepared to see with statements where files are opened as f. The same applies to the requests module, where the line will generally read r = requests.get(URL). I’ve used full form names for the sake of this article, thus the words file and response in my code.

Conclusion

This is one of those things that we all just get used to doing. Pulling a feed down and saving it to a file is something Bob has done a thousand times so no longer has to give it any extra thought.

For me, however, this took an entire night* of playing around because I’d never done it before and was assuming (silly me!) that the parsing code I’ve been using requests for was all I needed.

I also found that I had to scour a ton of StackOverflow posts and other documentation just to get my head wrapped around this concept correctly.

So with this finally cleared up, it’s time to go attack some feeds!

Keep Calm and Code in Python!

*Not really an entire night. I do need my beauty sleep!

Getting XML from the Server [Python Code]

To retrieve XML from the server using Python, send a GET request and specify the «Accept: application/xml» HTTP header in the request. The Accept header tells the server that the client expects XML data; without it, the server may return data in a different format. The «Content-Type: application/xml» response header informs the client that the server returned XML. In this Python GET XML example, we send a GET request to the ReqBin echo URL with the «Accept: application/xml» request header.

GET /echo/get/xml HTTP/1.1
Host: reqbin.com
Accept: application/xml

Python code for GET XML example
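The request shown above could be sent from Python along these lines. This is a hand-written sketch, not the generated snippet: it uses the standard library’s urllib.request, the helper names build_xml_request and fetch_xml are hypothetical, and the full URL https://reqbin.com/echo/get/xml is assembled from the Host and path of the request on the assumption that the endpoint is served over HTTPS.

```python
from urllib.request import Request, urlopen


def build_xml_request(url):
    """Create a GET request that asks the server for XML."""
    return Request(url, headers={'Accept': 'application/xml'}, method='GET')


def fetch_xml(url, timeout=10):
    """Send the request and return (content_type, body_bytes)."""
    with urlopen(build_xml_request(url), timeout=timeout) as response:
        content_type = response.headers.get('Content-Type', '')
        return content_type, response.read()
```

Calling fetch_xml('https://reqbin.com/echo/get/xml') would perform the actual request; the returned content type can then be checked before the body is handed to a parser.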

What is XML?

XML (Extensible Markup Language) is a markup language for structured information: data, documents, configuration, and more. XML is called extensible because it doesn’t fix the set of tags used in documents: you can define markup to suit the needs of a particular domain, limited only by the rules of XML syntax. An XML file is a plain text file that uses tags to describe both the structure of a document and the data itself, which makes it convenient to store and transport.
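As a small illustration of that extensibility, the sketch below invents its own vocabulary and serializes it with xml.etree.ElementTree. The tag names library, book, title and year, as well as the values, are made up for this example.

```python
import xml.etree.ElementTree as ET

# The tag names are arbitrary; XML only requires that they nest properly.
library = ET.Element('library')
book = ET.SubElement(library, 'book', attrib={'id': '1'})
ET.SubElement(book, 'title').text = 'Example title'
ET.SubElement(book, 'year').text = '2024'

# encoding='unicode' returns a str instead of bytes.
xml_text = ET.tostring(library, encoding='unicode')
print(xml_text)
```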

What is HTTP GET request?

HTTP GET is the most popular of the nine commonly used HTTP methods. The GET request method is used to retrieve data from the specified URL, should not carry data in the request body, and should not change the server state. The GET method is defined as idempotent, which means that several identical GET requests should have the same effect on the server as a single request.

GET /echo/ HTTP/1.1
Host: reqbin.com

Get XML Example

GET /echo/get/xml HTTP/1.1
Host: reqbin.com
Accept: application/xml

Why is it important to send the Accept header in the XML request?

The «Accept: application/xml» header indicates that the client expects to see XML in the server’s response. Without this header, the server can return data in a different format.

When the server returns XML in its response, it notifies the client using the «Content-Type: application/xml» response header.

Content-Type: application/xml
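On the client side, that header can be checked before the body is passed to a parser. The helper below is a hypothetical sketch: it treats application/xml, text/xml and any media type ending in +xml (such as application/rss+xml) as XML, which is an assumption of this example and not something every server guarantees.

```python
def is_xml_content_type(header_value):
    """Return True if a Content-Type header value names an XML media type."""
    # Drop parameters such as "; charset=utf-8" and normalize the case.
    media_type = header_value.split(';', 1)[0].strip().lower()
    return media_type in ('application/xml', 'text/xml') or media_type.endswith('+xml')


print(is_xml_content_type('application/xml; charset=utf-8'))
print(is_xml_content_type('application/json'))
```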

How to view XML directly in the browser?

Most browsers can display XML documents right in the browser. To do this, enter the address of the XML document in the address bar, and the browser will display the XML in a convenient tree format.

How to post XML to the server?

To post XML to the server, you need to make a POST request, include the XML in the request body and specify the correct MIME type. The MIME type for XML is «application/xml». The «Content-Type: application/xml» request header tells the server that the request body is XML, and the «Accept: application/xml» request header tells the server that the client expects XML in return. Below is an example of sending XML to the ReqBin echo URL:

POST /echo/post/xml HTTP/1.1
Host: reqbin.com
Content-Type: application/xml
Accept: application/xml

 name rating
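A POST request of this shape could be built in Python roughly as follows. This is a sketch under stated assumptions: it uses the standard library’s urllib.request, the helper names are hypothetical, the URL assumes the echo endpoint is served over HTTPS, and the payload is a made-up placeholder that does not reproduce the body of the example request.

```python
from urllib.request import Request, urlopen


def build_xml_post(url, xml_bytes):
    """Create a POST request whose body is XML and which asks for XML back."""
    headers = {
        'Content-Type': 'application/xml',  # describes the request body
        'Accept': 'application/xml',        # format wanted in the response
    }
    return Request(url, data=xml_bytes, headers=headers, method='POST')


def post_xml(url, xml_bytes, timeout=10):
    """Send the request and return the response body as bytes."""
    with urlopen(build_xml_post(url, xml_bytes), timeout=timeout) as response:
        return response.read()


# Hypothetical payload, for illustration only.
payload = b'<?xml version="1.0" encoding="utf-8"?><item><name>example</name></item>'
request = build_xml_post('https://reqbin.com/echo/post/xml', payload)
print(request.get_method(), request.full_url)
```

Building the request separately from sending it makes the headers easy to inspect; post_xml performs the actual network call.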
