After a bit of searching I found a good way to do this.
Libraries needed (in order of use): Apache HttpComponents, JTidy, Saxon and json-lib.
That's a lot of dependencies, but they get the job done.
First, fetch the page with HttpClient from HttpComponents.
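// DefaultHttpClient is deprecated since HttpClient 4.3; HttpClients.createDefault() is the newer equivalent
DefaultHttpClient client = new DefaultHttpClient();
HttpGet get = new HttpGet(resourcePath);
HttpResponse httpresponse = client.execute(get);
// Raw (possibly malformed) HTML of the requested page; close it when done
InputStream stream = httpresponse.getEntity().getContent();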
At this point the InputStream holds the raw source of the requested resource.
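If you are on Java 11 or later and would rather not pull in HttpComponents just for this step, the JDK's own java.net.http.HttpClient can do the same fetch. This is a swapped-in alternative, not what I used above; the class name PageFetcher is made up for illustration. Unlike the snippet above, it also rejects non-2xx responses so you don't end up tidying an error page.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: JDK 11+ replacement for the HttpComponents fetch step.
public class PageFetcher {
    // Follow redirects; the JDK client's default policy is Redirect.NEVER.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    // Sends a GET for resourcePath and returns the raw body stream; the caller must close it.
    public static InputStream fetch(String resourcePath) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(resourcePath)).GET().build();
        HttpResponse<InputStream> response =
                CLIENT.send(request, HttpResponse.BodyHandlers.ofInputStream());
        int status = response.statusCode();
        if (status < 200 || status >= 300) {
            response.body().close(); // release the connection before failing
            throw new IOException("Unexpected HTTP status " + status + " for " + resourcePath);
        }
        return response.body();
    }
}
```

The returned stream can be handed to tidy.parseDOM exactly like the one from HttpComponents.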
Now it's JTidy's turn: it does the hard work of turning even the ugliest HTML on the internet into a clean, well-formed XHTML document that an XML parser can handle.
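Tidy tidy = new Tidy();
tidy.setQuiet(true);         // suppress non-essential console output
tidy.setShowWarnings(false); // don't report markup warnings
// Second argument is an optional stream for the tidied markup; null skips writing it
Document doc = tidy.parseDOM(stream, null);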
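To see why the Tidy step matters, here is a small sketch using only the JDK's built-in XML parser (JTidy itself is not involved). Typical real-world HTML with an unclosed p and a bare br is rejected outright, while a hand-written well-formed version of the same markup parses fine. The second string is my own stand-in for tidied output, not actual JTidy output; the class name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {
    // True if the markup parses as XML (is well-formed), false if the XML parser rejects it.
    public static boolean isWellFormed(String markup)
            throws ParserConfigurationException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler ignores warnings and recoverable errors and rethrows fatal ones,
        // which also keeps the parser from printing "[Fatal Error]" to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Typical real-world HTML: unclosed <p>, <br> without a closing slash.
        System.out.println(isWellFormed("<html><body><p>Hello<br></body></html>"));        // false
        // Well-formed equivalent, the kind of markup an XML parser accepts.
        System.out.println(isWellFormed("<html><body><p>Hello<br /></p></body></html>")); // true
    }
}
```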
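The result is an org.w3c.dom.Document, which you can navigate with the usual DOM methods (getElementsByTagName, getElementById, ...), but I prefer XPath, and Saxon is our boy here. Note that XPathFactory.newInstance() returns whichever JAXP implementation it finds first, which may be the JDK's built-in engine rather than Saxon.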
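Here is a rough comparison of the two styles on the same query ("hrefs of all links inside div class=content"). It uses the JDK's DocumentBuilder on a well-formed XHTML string as a stand-in for the Document that JTidy returns; the class and method names are made up for illustration. The two versions agree on this sample, though they can diverge on edge cases (nested content divs, links without href).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DomVsXPath {
    // Parses a well-formed XHTML string (stand-in for the Document JTidy returns).
    public static Document parse(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
    }

    // DOM API: walk every <div>, keep those with class="content", then collect their <a href>.
    public static List<String> linksWithDom(Document doc) {
        List<String> hrefs = new ArrayList<>();
        NodeList divs = doc.getElementsByTagName("div");
        for (int i = 0; i < divs.getLength(); i++) {
            Element div = (Element) divs.item(i);
            if (!"content".equals(div.getAttribute("class"))) continue;
            NodeList anchors = div.getElementsByTagName("a");
            for (int j = 0; j < anchors.getLength(); j++) {
                hrefs.add(((Element) anchors.item(j)).getAttribute("href"));
            }
        }
        return hrefs;
    }

    // XPath: the same selection as a single expression returning href attribute nodes.
    public static List<String> linksWithXPath(Document doc) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xPath.evaluate(
                "//div[@class='content']//a/@href", doc, XPathConstants.NODESET);
        List<String> hrefs = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            hrefs.add(nodes.item(i).getNodeValue());
        }
        return hrefs;
    }
}
```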
XPath xPath = XPathFactory.newInstance().newXPath();
// xPathExpression holds the query string
XPathExpression expr = xPath.compile(xPathExpression);
NodeList nodelist = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
You can choose different return types (BOOLEAN, STRING, NUMBER, NODE, NODESET): just change the XPathConstants constant and cast the result to the matching Java type (Boolean, String, Double, Node, NodeList).
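A small sketch with each return type side by side, again using a hand-written XHTML string parsed by the JDK in place of JTidy's output (the sample markup and class name are illustrative only):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XPathReturnTypes {
    public static final String SAMPLE = "<html><head><title>Prices</title></head><body>"
            + "<ul><li>3</li><li>4</li><li>5</li></ul></body></html>";

    public static Document parse(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
    }

    // Evaluates one expression per return type and joins the results into a string.
    public static String summarize(Document doc) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        // STRING -> java.lang.String
        String title = (String) xPath.compile("/html/head/title").evaluate(doc, XPathConstants.STRING);
        // NUMBER -> java.lang.Double
        Double total = (Double) xPath.compile("sum(//li)").evaluate(doc, XPathConstants.NUMBER);
        // BOOLEAN -> java.lang.Boolean
        Boolean hasTable = (Boolean) xPath.compile("count(//table) > 0").evaluate(doc, XPathConstants.BOOLEAN);
        // NODE -> org.w3c.dom.Node (first match only)
        Node firstItem = (Node) xPath.compile("//li").evaluate(doc, XPathConstants.NODE);
        // NODESET -> org.w3c.dom.NodeList (all matches)
        NodeList items = (NodeList) xPath.compile("//li").evaluate(doc, XPathConstants.NODESET);
        return title + " " + total + " " + hasTable + " "
                + firstItem.getTextContent() + " " + items.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(summarize(parse(SAMPLE))); // prints "Prices 12.0 false 3 3"
    }
}
```

Note that NUMBER always comes back as a Double, even when the value is a whole number.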
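The final step is to copy the extracted values into a POJO and convert it to a JSON string with json-lib.
// indentFactor is the number of spaces added per indentation level
String jsonString = JSONObject.fromObject(myPojo).toString(indentFactor);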
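The snippet doesn't show how myPojo gets filled from the XPath results. One possible way is sketched below, assuming a hypothetical ScrapedPage bean with a title and a list of link URLs; the class, its fields and the XPath expressions are all illustrative choices, not part of any library. It exposes plain getters and setters because json-lib builds the JSON from JavaBean properties.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical POJO: a page title plus the links found on it, as JavaBean properties.
public class ScrapedPage {
    private String title;
    private List<String> links = new ArrayList<>();

    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
    public List<String> getLinks() { return links; }
    public void setLinks(List<String> links) { this.links = links; }

    // Fills the POJO from the tidied Document; the expressions are illustrative assumptions.
    public static ScrapedPage fromDocument(Document doc) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        ScrapedPage page = new ScrapedPage();
        page.setTitle((String) xPath.evaluate("/html/head/title", doc, XPathConstants.STRING));
        // Only anchors that actually carry an href attribute
        NodeList anchors = (NodeList) xPath.evaluate("//a[@href]", doc, XPathConstants.NODESET);
        for (int i = 0; i < anchors.getLength(); i++) {
            page.getLinks().add(((Element) anchors.item(i)).getAttribute("href"));
        }
        return page;
    }
}
```

With json-lib on the classpath, the last step would then be something like JSONObject.fromObject(page).toString(2), which should produce an object with title and links properties.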
That's all.