HTML table to Java object
I need to convert an HTML table to a Java object. So far I can't find a good way to implement this. An example of the table is below:
I don’t quite understand; could you elaborate? Do you want to make a table object with fields name and address? So Table < Table(name, address)
Better, but there are still a few more loose ends. Are you trying to use JSP? Are you trying to parse the html doc for the table’s content and use it to seed the members of your object?
3 Answers
I think in your case you want to use Jsoup, a nice Java library for parsing web pages. Once you have parsed the data you want from the web page using Jsoup’s selectors, creating a Java object with it should be straightforward.
Document doc = Jsoup.parse(input, "UTF-8", "http://somewebsite.com/");
Elements row1name = doc.select("tr");     // placeholder selector
Elements row1address = doc.select("tr");  // placeholder selector
MyClass table1 = new MyClass(row1name, row1address);
Something like that (the selectors for row1name and row1address are wrong; you have to look at the docs to verify the proper way to do it, I don’t remember). I hope that helps.
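To make the "rows to objects" step concrete, here is a minimal sketch that needs nothing beyond the JDK. It swaps jsoup for the JDK's built-in XML parser, so it only works on well-formed markup; with jsoup the same loop would use doc.select("tr") and row.select("td") instead. The Person class, the two-column name/address layout and the sample table are assumptions made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TableToObjects {

    // Hypothetical row type: one object per table row.
    static class Person {
        final String name;
        final String address;

        Person(String name, String address) {
            this.name = name;
            this.address = address;
        }

        @Override
        public String toString() {
            return name + " @ " + address;
        }
    }

    // Parses well-formed table markup and maps each data row to a Person.
    static List<Person> parse(String xhtml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse table markup", e);
        }

        List<Person> people = new ArrayList<>();
        NodeList rows = doc.getElementsByTagName("tr");
        for (int i = 0; i < rows.getLength(); i++) {
            NodeList cells = ((Element) rows.item(i)).getElementsByTagName("td");
            if (cells.getLength() < 2) {
                continue; // skip header rows (th only) and incomplete rows
            }
            people.add(new Person(
                    cells.item(0).getTextContent().trim(),
                    cells.item(1).getTextContent().trim()));
        }
        return people;
    }

    public static void main(String[] args) {
        String table = "<table>"
                + "<tr><th>Name</th><th>Address</th></tr>"
                + "<tr><td>Alice</td><td>1 Main St</td></tr>"
                + "<tr><td>Bob</td><td>2 Side Rd</td></tr>"
                + "</table>";
        for (Person p : parse(table)) {
            System.out.println(p);
        }
    }
}
```

Keeping the mapping in one small method means only the cell lookups change if the real table has more columns or a different order.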
My answer probably won’t be useful to the writer of this question (I am 3 years late, so not the right timing I guess), but I think it will be useful for many other developers who might come across this answer.
Today, I just released (in the name of my company) a complete HTML to POJO framework that you can use to map HTML to any POJO class with just a few annotations. The library itself is quite handy and features many other things, all the while being very pluggable. You can have a look at it right here: https://github.com/whimtrip/jwht-htmltopojo
How to use: Basics
Imagine we need to parse the following HTML page:
A la bonne Franquette
French cuisine restaurant for gourmet of fellow french people
in London
Restaurant n*18,190. Ranked 113 out of 1,550 restaurants
Veal Cutlet
4.5/5 stars
Chef Mr. Frenchie
Ratatouille
3.6/5 stars
Chef Mr. Frenchie and Mme. French-Cuisine
Let’s create the POJOs we want to map it to:
public class Restaurant {

    @Selector(value = "div.restaurant > h1")
    private String name;

    @Selector(value = "div.restaurant > p:nth-child(2)")
    private String description;

    @Selector(value = "div.restaurant > div:nth-child(3) > p > span")
    private String location;

    @Selector(
        value = "div.restaurant > p:nth-child(4)",
        format = "^Restaurant n\\*([0-9,]+). Ranked ([0-9,]+) out of ([0-9,]+) restaurants$",
        indexForRegexPattern = 1,
        useDeserializer = true,
        deserializer = ReplacerDeserializer.class,
        preConvert = true,
        postConvert = false
    )
    // so that the number becomes a valid number, as they are shown in this format: 18,190
    @ReplaceWith(value = ",", with = "")
    private Long id;

    @Selector(
        value = "div.restaurant > p:nth-child(4)",
        format = "^Restaurant n\\*([0-9,]+). Ranked ([0-9,]+) out of ([0-9,]+) restaurants$",
        // This time, we want the second regex group and not the first one anymore
        indexForRegexPattern = 2,
        useDeserializer = true,
        deserializer = ReplacerDeserializer.class,
        preConvert = true,
        postConvert = false
    )
    // so that the number becomes a valid number, as they are shown in this format: 18,190
    @ReplaceWith(value = ",", with = "")
    private Integer rank;

    @Selector(value = ".meal")
    private List<Meal> meals;

    // getters and setters
}
And now the Meal class as well:
We provide some more explanations of the above code on our GitHub page.
For the moment, let’s see how to scrape this.
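// needs java.io.IOException, java.nio.charset.StandardCharsets,
// java.nio.file.Files and java.nio.file.Paths
private static final String MY_HTML_FILE = "my-html-file.html";

public static void main(String[] args) throws IOException {

    HtmlToPojoEngine htmlToPojoEngine = HtmlToPojoEngine.create();

    HtmlAdapter<Restaurant> adapter = htmlToPojoEngine.adapter(Restaurant.class);

    // If there were several restaurants in the same page,
    // you would need to create a parent POJO containing
    // a list of Restaurants as shown with the meals here
    Restaurant restaurant = adapter.fromHtml(getHtmlBody());

    // That's it, do some magic now!
}

private static String getHtmlBody() throws IOException {
    // Assumed body (not shown here): read the whole HTML file into a String
    return new String(Files.readAllBytes(Paths.get(MY_HTML_FILE)), StandardCharsets.UTF_8);
}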
Another short example can be found here
Hope this will help someone out there!
How to get a table from an html page using JAVA
I am working on a project where I am trying to fetch financial statements from the internet and use them in a Java application to automatically create ratios and charts. The site I am using requires a login and password to get to the tables.
The tag is TBODY, but there are 2 other TBODYs in the HTML. How can I use Java to print my table to a txt file which I can then use in my application? What would be the best way to go about this, and what should I read up on?
1 Answer
If this were my project, I’d look into using an HTML parser, something like jsoup (although others are available). The jsoup site has a tutorial, and after playing with it a while, you’ll likely find it pretty easy to use.
For example, for an HTML table like so:
jsoup could parse it like so:
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class TableEg {
    public static void main(String[] args) {
        String html = "http://publib.boulder.ibm.com/infocenter/iadthelp/v7r1/topic/"
                + "com.ibm.etools.iseries.toolbox.doc/htmtblex.htm";
        try {
            Document doc = Jsoup.connect(html).get();
            Elements tableElements = doc.select("table");

            // header cells
            Elements tableHeaderEles = tableElements.select("thead tr th");
            System.out.println("headers");
            for (int i = 0; i < tableHeaderEles.size(); i++) {
                System.out.println(tableHeaderEles.get(i).text());
            }
            System.out.println();

            // data rows
            Elements tableRowElements = tableElements.select(":not(thead) tr");
            for (int i = 0; i < tableRowElements.size(); i++) {
                Element row = tableRowElements.get(i);
                System.out.println("row");
                Elements rowItems = row.select("td");
                for (int j = 0; j < rowItems.size(); j++) {
                    System.out.println(rowItems.get(j).text());
                }
                System.out.println();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Resulting in the following output:
headers
ACCOUNT
NAME
BALANCE

row
0000001
Customer1
100.00

row
0000002
Customer2
200.00

row
0000003
Customer3
550.00
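For the second half of the question (getting the table into a txt file that the application can read back later), a minimal JDK-only sketch might look like this. It assumes the rows have already been collected into lists of cell text, for example inside the jsoup loop above; the class name TableToTextFile and the tab-separated layout are my own choices, not something jsoup prescribes. If the page holds several tbody elements, doc.select("tbody").get(n) would pick one by position, and which index is the right one depends on the page.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TableToTextFile {

    // Writes one line per table row, with cells separated by tabs.
    static void writeRows(List<List<String>> rows, Path target) {
        List<String> lines = new ArrayList<>();
        for (List<String> row : rows) {
            lines.add(String.join("\t", row));
        }
        try {
            Files.write(target, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the file back into rows of cells (the inverse of writeRows).
    static List<List<String>> readRows(Path source) {
        try {
            List<List<String>> rows = new ArrayList<>();
            for (String line : Files.readAllLines(source, StandardCharsets.UTF_8)) {
                // limit -1 keeps trailing empty cells
                rows.add(Arrays.asList(line.split("\t", -1)));
            }
            return rows;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("ACCOUNT", "NAME", "BALANCE"),
                Arrays.asList("0000001", "Customer1", "100.00"),
                Arrays.asList("0000002", "Customer2", "200.00"));
        Path out = Files.createTempFile("table", ".txt");
        writeRows(rows, out);
        System.out.println("Wrote " + readRows(out).size() + " rows to " + out);
    }
}
```

A tab-separated file is easy to split again, but it breaks if a cell itself contains a tab; a CSV library would be the safer choice for messy data.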
parsing/extracting a HTML Table, Website in Java
I want to parse the data for each cell, for example all 5 cells under «Montag» (Monday). I have tried several ways of parsing this website using jsoup, but I have not had any success with it. My main goal is to show the contents in a ListView in an Android app. For now I have tried to print the contents to a Java console. Both languages are accepted :). Any help is appreciated.
1 Answer
Here are the steps you would need to follow:
Eg 1: Enter «//tr[1]//td[1]» in the query and it will give all table elements at position (1,1)
Eg 2: «/html/body[@class='tt']/center/table[1]/tbody/tr[4]/td[3]/table/tbody/tr/td» will give you all 15 values under Montag.
Eg 3: «/html/body[@class='tt']/center/table[1]/tbody/tr/td/table/tbody/tr/td» will give you all 380 entries of the table
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) throws IOException {
        Document doc = Jsoup.connect("http://www.kantschule-falkensee.de/uploads/dmiadgspahw/klassen/A_Klasse_11.htm").get();
        Elements rows = doc.select("tr");
        for (Element row : rows) {
            // print every cell of the row on one line
            Elements columns = row.select("td");
            for (Element column : columns) {
                System.out.print(column.text());
            }
            System.out.println();
        }
    }
}
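The XPath queries from the examples above can also be evaluated from plain Java with the JDK's javax.xml.xpath package. This is only a sketch: the small timetable markup and the class name XPathTableQuery are made up for illustration, and a real page would first have to be turned into well-formed XML, because the JDK parser rejects sloppy HTML.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathTableQuery {

    // Evaluates an XPath expression against well-formed markup and
    // returns the trimmed text of every matching node, in document order.
    static List<String> select(String xhtml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression,
                    new InputSource(new StringReader(xhtml)),
                    XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(
                    "Bad XPath or malformed markup: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical, simplified timetable (not the real page).
        String table = "<table>"
                + "<tr><td>Zeit</td><td>Montag</td><td>Dienstag</td></tr>"
                + "<tr><td>1</td><td>Mathe</td><td>Deutsch</td></tr>"
                + "<tr><td>2</td><td>Physik</td><td>Englisch</td></tr>"
                + "</table>";

        // Cell at position (1,1), as in Eg 1
        System.out.println(select(table, "//tr[1]//td[1]"));             // prints [Zeit]
        // Everything under the Montag column, skipping the header row
        System.out.println(select(table, "//tr[position() > 1]/td[2]")); // prints [Mathe, Physik]
    }
}
```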
How to parse HTML table using jsoup?
I am trying to parse HTML using jsoup. This is my first time working with jsoup, and I have read some tutorials on it as well. Below is the HTML table I am trying to parse. As you can see, it has three tr elements for now (I have shortened it to three table rows for the sake of understanding, but in general there will be more). I would like to extract the Cluster Name from the table together with its corresponding host names. For example, I would extract Titan as the cluster name and all of its host names whose status is down. For the Titan cluster there are two host names, machineA.abc.com and machineB.abc.com; machineA's status is up but machineB's status is down. So I would print Titan as the cluster name and machineB.abc.com as the host name, since it is down. Is this possible using jsoup?
Alert Cluster Name IP addr Host Name Type Status Free Version Restart Time UpTime(Days) Last probed Last up
Hist VI   Titan 10.100.111.77 machineA.abc.com up 88% 2.0.5-SNAPSHOT 2014-07-04 01:49:08,220 381 07-14 20:01:59 07-14 20:01:59
Hist VI   10.200.192.99 machineB.abc.com down 85% 2.0.5-SNAPSHOT 2014-07-04 01:52:20,613 103 07-14 20:01:59 07-14 20:01:59
So far, I am able to extract the whole HTML table using jsoup, but I am not sure how to extract the cluster name and the host names which are down:
URL url = new URL("url_name");
Document doc = Jsoup.parse(url, 3000);
Alert Cluster Name IP addr Host Name Type Status Free Version Restart Time UpTime(Days) Last probed Last up
Hist VI   Titan 10.100.111.77 machineA.abc.com up 88% 2.0.5-SNAPSHOT 2014-07-04 01:49:08,220 381 07-14 20:01:59 07-14 20:01:59
Hist VI   10.200.192.99 machineB.abc.com down 85% 2.0.5-SNAPSHOT 2014-07-04 01:52:20,613 103 07-14 20:01:59 07-14 20:01:59
Hist VI   Goldy 10.100.111.77 machineH.pqr.com up 88% 2.0.5-SNAPSHOT 2014-07-04 01:49:08,220 381 07-14 20:01:59 07-14 20:01:59
As you can see above, I now have two cluster names, Titan and Goldy, and I want to find all the machines which are down for the Titan cluster only.
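One way to approach this, sketched under some assumptions: rows that belong to the same cluster leave the Cluster Name cell blank, so the last non-blank name has to be carried forward while walking the rows. The sketch below works on plain lists of cell text (which could be produced with jsoup by calling row.select("td") and then text() on each cell). The class name DownHostFinder, the three-column layout and the column indexes are hypothetical and would need adjusting to the real table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DownHostFinder {

    // Assumed column positions within each row; adjust to the real table.
    static final int CLUSTER = 0;
    static final int HOST = 1;
    static final int STATUS = 2;

    // Returns the host names of all rows in the wanted cluster whose status is "down".
    static List<String> downHosts(List<List<String>> rows, String wantedCluster) {
        List<String> result = new ArrayList<>();
        String currentCluster = "";
        for (List<String> row : rows) {
            if (row.size() <= STATUS) {
                continue; // header or malformed row
            }
            String cluster = clean(row.get(CLUSTER));
            if (!cluster.isEmpty()) {
                currentCluster = cluster; // a new cluster block starts here
            }
            boolean down = "down".equalsIgnoreCase(clean(row.get(STATUS)));
            if (down && currentCluster.equals(wantedCluster)) {
                result.add(clean(row.get(HOST)));
            }
        }
        return result;
    }

    // Blank cells may hold a non-breaking space, which String.trim() does not remove.
    static String clean(String cell) {
        return cell.replace('\u00a0', ' ').trim();
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("Titan", "machineA.abc.com", "up"),
                Arrays.asList("\u00a0", "machineB.abc.com", "down"),
                Arrays.asList("Goldy", "machineH.pqr.com", "up"));
        System.out.println(downHosts(rows, "Titan")); // prints [machineB.abc.com]
    }
}
```

Carrying the cluster name in a variable keeps this to a single pass over the rows, which matters little for three rows but stays simple when the table grows.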