While using HTML Tidy, I needed to remove the DOCTYPE prolog to prevent an
‘org.xml.sax.SAXParseException: Already seen doctype.’ exception.
The regex is quite simple. The only catch is that the selection has to include \n and \r as well, and the match has to be non-greedy.
convertedData = convertedData.replaceAll("<!DOCTYPE((.|\n|\r)*?)\">", "");
This will consume multi-line as well as single-line declarations, for example:
/*
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
*/
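For anyone who wants to try this outside of a larger program, here is a minimal, self-contained sketch. The class name DoctypeStripper and the method stripDoctype are hypothetical, made up for illustration. Note that it does not use the exact pattern from above: it assumes a slightly different one, (?i)<!DOCTYPE[^>]*>, which stops at the first > instead of at the first "> and so should also cope with a short declaration such as <!DOCTYPE html>. A negated character class like [^>] matches line breaks in Java, so multi-line declarations are still covered. It would not handle a DOCTYPE with an internal subset that itself contains a > character.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of HTML Tidy or any library.
final class DoctypeStripper {

    // Assumed variant of the pattern in the post:
    // (?i)   -> match "doctype" in any letter case
    // [^>]*  -> everything up to the first '>', line breaks included
    private static final Pattern DOCTYPE = Pattern.compile("(?i)<!DOCTYPE[^>]*>");

    // Returns the markup with every DOCTYPE declaration removed.
    static String stripDoctype(String markup) {
        return DOCTYPE.matcher(markup).replaceAll("");
    }

    public static void main(String[] args) {
        String convertedData =
                "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n"
              + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n"
              + "<html><body>Hello</body></html>";

        // Same idea as the one-liner above, applied before handing the
        // text to the XML parser.
        convertedData = stripDoctype(convertedData);
        System.out.println(convertedData);
    }
}
```

The pattern is compiled once and reused, which is a small saving over calling String.replaceAll repeatedly if many documents are processed.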
Hi Greg,
I am using a Java library in my Android project, and I am getting exactly this exception. The library is packaged as a .jar and uses SAXParser internally. How can I apply your solution above in that case? Could you please give me a concrete example in Java?
Thanks
Regards..