How to get the entire document HTML as a string?
Stop upvoting John’s bolded comment! The answer he links to replaces && with &amp;&amp;, so it breaks all your inline <script> tags! You should use document.documentElement.outerHTML instead, but note that it doesn’t grab the <!DOCTYPE> declaration, so you’ll need to add that yourself.
Get the root element with document.documentElement then get its .innerHTML :
const txt = document.documentElement.innerHTML; alert(txt);
or its .outerHTML to get the <html> tag as well:
const txt = document.documentElement.outerHTML; alert(txt);
Worked like a charm, thank you! Is there any way to get the size of any or all files linked to the document as well, including JS and CSS files?
@CMCDragonkai: You could get the doctype separately and prepend it to the markup string. Not ideal, I know, but possible.
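A minimal sketch of the "get the doctype separately and prepend it" idea from the comment above. The helper names doctypeToString and getFullHtml are made up for illustration; the sketch only assumes the standard document.doctype properties (name, publicId, systemId) and documentElement.outerHTML.

```javascript
// Rebuild the <!DOCTYPE ...> declaration from a DocumentType node.
// Returns an empty string when the document has no doctype.
function doctypeToString(doctype) {
  if (!doctype) return "";
  let s = "<!DOCTYPE " + doctype.name;
  if (doctype.publicId) s += ' PUBLIC "' + doctype.publicId + '"';
  if (!doctype.publicId && doctype.systemId) s += " SYSTEM";
  if (doctype.systemId) s += ' "' + doctype.systemId + '"';
  return s + ">";
}

// Prepend the rebuilt doctype to the root element's outerHTML.
// `doc` is expected to look like a browser Document.
function getFullHtml(doc) {
  const declaration = doctypeToString(doc.doctype);
  return (declaration ? declaration + "\n" : "") + doc.documentElement.outerHTML;
}

// In a browser you would call: getFullHtml(document)
```

As with outerHTML itself, the result is a serialization of the current DOM, not necessarily the exact bytes the server sent.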
Note that none of these answers, this one included, necessarily gives you content that is hash-identical to saving the page to a file or to what view-source shows. It seems the DOM normalizes some parts of the literal response content, such as capitalising the DOCTYPE declaration.
new XMLSerializer().serializeToString(document)
in browsers newer than IE 9
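Since XMLSerializer is not available everywhere, one way to use it is behind a feature check. This is only a sketch: serializeDocument is a hypothetical helper name, and the optional second parameter exists purely so a different serializer can be passed in (for example when testing outside a browser).

```javascript
// Serialize a document with XMLSerializer when the environment provides it,
// otherwise fall back to the root element's outerHTML.
function serializeDocument(doc, SerializerCtor) {
  const Ctor =
    SerializerCtor ||
    (typeof XMLSerializer !== "undefined" ? XMLSerializer : null);
  if (Ctor) {
    // Includes the doctype, but note the result is XML-style serialization.
    return new Ctor().serializeToString(doc);
  }
  // Fallback: no doctype, just the <html> element and its contents.
  return doc.documentElement.outerHTML;
}

// In a browser you would call: serializeDocument(document)
```

The two branches do not produce identical strings, so pick one deliberately if you need stable output.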
This was the first correct answer according to date/time stamps. Parts of the page such as the XML declaration will not be included and browsers will manipulate the code when using the other "answers". This is the only post that should be up-voted (dos’s posted three days later). People need to pay attention!
This is not entirely correct, since serializeToString performs an HTML encode. For example, if your code contains styles defining fonts such as "Times New Roman", Times, serif, the quotes will get HTML-encoded. Perhaps that is not important to some of you, but to me it is.
@John Well, the OP actually asks for "the entire HTML within the html tags", and the selected best answer by Colin Burnett does achieve this. This particular answer (Erik’s) will include the html tags and the doctype. That said, this was totally a diamond in the rough for me and exactly what I was looking for! Your comment helped too because it made me spend more time with this answer, so thanks 🙂
I think people should be careful with this one, specifically because it returns a value that is not the actual html that your browser receives. In my case, it added attributes to the html tag that the server never actually sent 🙁
I tried the various answers to see what is returned. I’m using the latest version of Chrome.
The suggestion document.documentElement.innerHTML; returned <head> ... </body>.