public class HtmlLoadOptions
Constructor Summary |
---|
HtmlLoadOptions()
Initializes a new instance of this class with default values. |
HtmlLoadOptions(java.lang.String password)
A shortcut to initialize a new instance of this class with the specified password to load an encrypted document. |
HtmlLoadOptions(int loadFormat, java.lang.String password, java.lang.String baseUri)
A shortcut to initialize a new instance of this class with properties set to the specified values. |
Property Getters/Setters Summary | ||
---|---|---|
java.lang.String | getBaseUri() | |
void | setBaseUri(java.lang.String value) | |
Gets or sets the string that will be used to resolve relative URIs found in the document into absolute URIs when required. Can be null or an empty string. Default is null. | ||
boolean | getConvertMetafilesToPng() | |
void | setConvertMetafilesToPng(boolean value) | |
Gets or sets whether to convert metafile images to PNG format. | ||
boolean | getConvertShapeToOfficeMath() | |
void | setConvertShapeToOfficeMath(boolean value) | |
Gets or sets whether to convert shapes with EquationXML to Office Math objects. | ||
java.nio.charset.Charset | getEncoding() | |
void | setEncoding(java.nio.charset.Charset value) | |
Gets or sets the encoding that will be used to load an HTML, TXT, or CHM document if the encoding is not specified inside the document. Can be null. Default is null. | ||
FontSettings | getFontSettings() | |
void | setFontSettings(FontSettings value) | |
Specifies document font settings. | ||
LanguagePreferences | getLanguagePreferences() | |
Gets the language preferences that will be used when the document is loaded. | ||
int | getLoadFormat() | |
void | setLoadFormat(int value) | |
Specifies the format of the document to be loaded.
Default is |
||
int | getMswVersion() | |
void | setMswVersion(int value) | |
Specifies that the document loading process should match a specific MS Word version.
Default value is |
||
java.lang.String | getPassword() | |
void | setPassword(java.lang.String value) | |
Gets or sets the password for opening an encrypted document. Can be null or an empty string. Default is null. | ||
int | getPreferredControlType() | |
void | setPreferredControlType(int value) | |
Gets or sets the preferred type of document nodes that will represent imported <input> and <select> elements.
Default value is |
||
boolean | getPreserveIncludePictureField() | |
void | setPreserveIncludePictureField(boolean value) | |
Gets or sets whether to preserve the INCLUDEPICTURE field when reading Microsoft Word formats. The default value is false. | ||
IResourceLoadingCallback | getResourceLoadingCallback() | |
void | setResourceLoadingCallback(IResourceLoadingCallback value) | |
Controls how external resources (images, style sheets) are loaded when a document is imported from HTML or MHTML. | ||
boolean | getSupportVml() | |
void | setSupportVml(boolean value) | |
Gets or sets a value indicating whether to support VML images. | ||
java.lang.String | getTempFolder() | |
void | setTempFolder(java.lang.String value) | |
Enables the use of temporary files when reading a document. By default this property is null and no temporary files are used. | ||
boolean | getUpdateDirtyFields() | |
void | setUpdateDirtyFields(boolean value) | |
Specifies whether to update the fields with the dirty attribute. | ||
IWarningCallback | getWarningCallback() | |
void | setWarningCallback(IWarningCallback value) | |
Called during a load operation, when an issue is detected that might result in data or formatting fidelity loss. | ||
int | getWebRequestTimeout() | |
void | setWebRequestTimeout(int value) | |
The number of milliseconds to wait before the web request times out. The default value is 100000 milliseconds (100 seconds). |
public HtmlLoadOptions()
Example:
Shows how to support VML while parsing a document.

    HtmlLoadOptions loadOptions = new HtmlLoadOptions();

    // If the value is true, VML code is taken into account while parsing the loaded document.
    // doSupportVml is a boolean defined outside this snippet.
    loadOptions.setSupportVml(doSupportVml);

    // This document contains an image within "<!--[if gte vml 1]>" tags,
    // and a different image within "<![if !vml]>" tags.
    // Upon loading the document, only the contents of the first tag are shown if VML is enabled,
    // and only the contents of the second tag are shown otherwise.
    Document doc = new Document(getMyDir() + "VML conditional.htm", loadOptions);

    // Only one of the two images is loaded, depending on the value of loadOptions.getSupportVml().
    Shape imageShape = (Shape) doc.getChild(NodeType.SHAPE, 0, true);

    if (doSupportVml)
        TestUtil.verifyImageInShape(400, 400, ImageType.JPEG, imageShape);
    else
        TestUtil.verifyImageInShape(400, 400, ImageType.PNG, imageShape);
public HtmlLoadOptions(java.lang.String password)
password - The password to open an encrypted document. Can be null or an empty string.
Example:
Shows how to encrypt an HTML document and then open it using a password.

    // Create and sign an encrypted HTML document from an encrypted .docx.
    CertificateHolder certificateHolder = CertificateHolder.create(getMyDir() + "morzal.pfx", "aw");

    SignOptions signOptions = new SignOptions();
    signOptions.setComments("Comment");
    signOptions.setSignTime(new Date());
    signOptions.setDecryptionPassword("docPassword");

    String inputFileName = getMyDir() + "Encrypted.docx";
    String outputFileName = getArtifactsDir() + "HtmlLoadOptions.EncryptedHtml.html";
    DigitalSignatureUtil.sign(inputFileName, outputFileName, certificateHolder, signOptions);

    // This .html document needs a password to be decrypted, opened, and have its contents accessed.
    // The password is supplied through the HtmlLoadOptions password property.
    HtmlLoadOptions loadOptions = new HtmlLoadOptions("docPassword");
    Assert.assertEquals(loadOptions.getPassword(), signOptions.getDecryptionPassword());

    Document doc = new Document(outputFileName, loadOptions);
    Assert.assertEquals(doc.getText().trim(), "Test encrypted document.");
public HtmlLoadOptions(int loadFormat, java.lang.String password, java.lang.String baseUri)
loadFormat - The format of the document to be loaded, for example LoadFormat.HTML.
password - The password to open an encrypted document. Can be null or an empty string.
baseUri - The string that will be used to resolve relative URIs to absolute. Can be null or an empty string.
Example:
Shows how to specify a base URI when opening an HTML document.

    // If an .html document contains an image linked by a relative URI
    // while the image is in a different location, the relative URI must be resolved into an absolute one
    // by creating an HtmlLoadOptions and providing a base URI.
    HtmlLoadOptions loadOptions = new HtmlLoadOptions(LoadFormat.HTML, "", getImageDir());

    Assert.assertEquals(LoadFormat.HTML, loadOptions.getLoadFormat());

    Document doc = new Document(getMyDir() + "Missing image.html", loadOptions);

    // While the image was broken in the input .html, it was successfully found in our base URI.
    Shape imageShape = (Shape) doc.getChildNodes(NodeType.SHAPE, true).get(0);
    Assert.assertTrue(imageShape.isImage());

    // The image will be displayed correctly by the output document.
    doc.save(getArtifactsDir() + "HtmlLoadOptions.BaseUri.docx");
public java.lang.String getBaseUri() / public void setBaseUri(java.lang.String value)
This property is used to resolve relative URIs into absolute in the following cases:
Example:
Shows how to open an HTML document with images from a stream using a base URI.

    // try-with-resources closes the stream even if loading fails.
    try (InputStream stream = new FileInputStream(getMyDir() + "Document.html")) {
        // Pass the URI of the base folder while loading
        // so that any images with relative URIs in the HTML document can be found.
        LoadOptions loadOptions = new LoadOptions();
        loadOptions.setBaseUri(getImageDir());

        Document doc = new Document(stream, loadOptions);

        // Verify that the first shape of the document contains a valid image.
        Shape shape = (Shape) doc.getChild(NodeType.SHAPE, 0, true);
        Assert.assertTrue(shape.isImage());
        Assert.assertNotNull(shape.getImageData().getImageBytes());
        Assert.assertEquals(32.0, ConvertUtil.pointToPixel(shape.getWidth()), 0.01);
        Assert.assertEquals(32.0, ConvertUtil.pointToPixel(shape.getHeight()), 0.01);
    }
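As a general illustration of what resolving a relative URI against a base URI involves, the sketch below uses only `java.net.URI` from the standard library. It is not Aspose.Words code and does not claim to mirror how the library resolves URIs internally; the class name `BaseUriSketch`, the method `resolveAgainstBase`, and the example.com addresses are hypothetical.

```java
import java.net.URI;

/** Illustrates standard URI reference resolution; not part of Aspose.Words. */
final class BaseUriSketch {

    /**
     * Resolves a reference against a base URI.
     * Relative references are merged with the base;
     * absolute references are returned unchanged.
     */
    static URI resolveAgainstBase(String baseUri, String reference) {
        return URI.create(baseUri).resolve(reference);
    }
}
```

For instance, `resolveAgainstBase("https://example.com/docs/", "images/logo.png")` yields `https://example.com/docs/images/logo.png`. Without the trailing slash on the base, standard resolution replaces the last path segment and gives `https://example.com/images/logo.png`, so a base URI that names a folder should usually end with a slash.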
public boolean getConvertMetafilesToPng() / public void setConvertMetafilesToPng(boolean value)
public boolean getConvertShapeToOfficeMath() / public void setConvertShapeToOfficeMath(boolean value)
public java.nio.charset.Charset getEncoding() / public void setEncoding(java.nio.charset.Charset value)
This property is used only when loading HTML, TXT, or CHM documents.
If the encoding is not specified inside the document and this property is null, then the system will try to automatically detect the encoding.
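The order of preference described above (the encoding declared in the document, then this property, then automatic detection) can be sketched with plain `java.nio.charset` types. This only illustrates the precedence; the class `EncodingFallbackSketch`, the method `choose`, and its three parameters are hypothetical, and the detection step is internal to Aspose.Words, so it is represented here by a value passed in.

```java
import java.nio.charset.Charset;

/** Illustrates the documented order of preference; not part of Aspose.Words. */
final class EncodingFallbackSketch {

    /**
     * Picks a charset in this order:
     * 1) the encoding declared inside the document, if any;
     * 2) the caller-supplied option value, if not null;
     * 3) the result of automatic detection.
     * All three parameters are hypothetical inputs for illustration.
     */
    static Charset choose(Charset declaredInDocument, Charset optionValue, Charset detected) {
        if (declaredInDocument != null) {
            return declaredInDocument;
        }
        if (optionValue != null) {
            return optionValue;
        }
        return detected;
    }
}
```

Under this reading, a value passed to `setEncoding` matters only for documents that do not declare their own encoding.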
public FontSettings getFontSettings() / public void setFontSettings(FontSettings value)
When loading some formats, Aspose.Words may need to resolve fonts. For example, when loading HTML documents, Aspose.Words may resolve fonts to perform font fallback.
If set to null, the default static font settings are used.
The default value is null.
public LanguagePreferences getLanguagePreferences()
public int getLoadFormat() / public void setLoadFormat(int value)
It is recommended that you specify the
public int getMswVersion() / public void setMswVersion(int value)
public java.lang.String getPassword() / public void setPassword(java.lang.String value)
You need to know the password to open an encrypted document. If the document is not encrypted, set this to null or an empty string.
Example:
Shows how to sign an encrypted document file.

    // Create an X.509 certificate from a PKCS#12 store, which should contain a private key.
    CertificateHolder certificateHolder = CertificateHolder.create(getMyDir() + "morzal.pfx", "aw");

    // Create a comment, date, and decryption password which will be applied with our new digital signature.
    SignOptions signOptions = new SignOptions();
    signOptions.setComments("Comment");
    signOptions.setSignTime(new Date());
    signOptions.setDecryptionPassword("docPassword");

    // Set a local system filename for the unsigned input document,
    // and an output filename for its new digitally signed copy.
    String inputFileName = getMyDir() + "Encrypted.docx";
    String outputFileName = getArtifactsDir() + "DigitalSignatureUtil.DecryptionPassword.docx";

    DigitalSignatureUtil.sign(inputFileName, outputFileName, certificateHolder, signOptions);
public int getPreferredControlType() / public void setPreferredControlType(int value)
Example:
Shows how to set the preferred type of document nodes that will represent imported <input> and <select> elements.

    // A small HTML document containing a single <select> element.
    final String html = "\r\n<html>\r\n<select name='ComboBox' size='1'>\r\n"
            + "<option value='val1'>item1</option>\r\n<option value='val2'></option>\r\n</select>\r\n</html>\r\n";

    // Ask the loader to represent form controls as structured document tags.
    HtmlLoadOptions htmlLoadOptions = new HtmlLoadOptions();
    htmlLoadOptions.setPreferredControlType(HtmlControlType.STRUCTURED_DOCUMENT_TAG);

    Document doc = new Document(new ByteArrayInputStream(html.getBytes("UTF-8")), htmlLoadOptions);
    NodeCollection nodes = doc.getChildNodes(NodeType.STRUCTURED_DOCUMENT_TAG, true);

    // The imported <select> element is the first structured document tag.
    StructuredDocumentTag tag = (StructuredDocumentTag) nodes.get(0);
public boolean getPreserveIncludePictureField() / public void setPreserveIncludePictureField(boolean value)
By default, the INCLUDEPICTURE field is converted into a shape object. You can override that if you need the field to be preserved, for example, if you wish to update it programmatically. Note, however, that this approach is not common for Aspose.Words. Use it at your own risk.
One possible use case is using a MERGEFIELD as a child field to dynamically change the source path of the picture. In this case, the INCLUDEPICTURE field needs to be preserved in the model.
public IResourceLoadingCallback getResourceLoadingCallback() / public void setResourceLoadingCallback(IResourceLoadingCallback value)
public boolean getSupportVml() / public void setSupportVml(boolean value)
public java.lang.String getTempFolder() / public void setTempFolder(java.lang.String value)
By default this property is null and no temporary files are used.
The folder must exist and be writable, otherwise an exception will be thrown.
Aspose.Words automatically deletes all temporary files when reading is complete.
Example:
Shows how to load a document using temporary files.

    // Note that such an approach can reduce memory usage but degrades speed.
    LoadOptions loadOptions = new LoadOptions();
    loadOptions.setTempFolder("C:\\TempFolder\\");

    // Ensure that the directory exists and load.
    new File(loadOptions.getTempFolder()).mkdir();

    Document doc = new Document(getMyDir() + "Document.docx", loadOptions);
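Since the folder must exist and be writable, a caller may want to check it before passing it to `setTempFolder`. A minimal sketch using only `java.nio.file` follows; the class `TempFolderCheck` and the method `isUsableTempFolder` are hypothetical helpers, not part of Aspose.Words, and the check cannot rule out the folder changing between the check and the load.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical pre-flight check for a temporary folder; not part of Aspose.Words. */
final class TempFolderCheck {

    /** Returns true when the path names an existing directory that this process can write to. */
    static boolean isUsableTempFolder(String folder) {
        if (folder == null || folder.isEmpty()) {
            return false;
        }
        Path path = Paths.get(folder);
        return Files.isDirectory(path) && Files.isWritable(path);
    }
}
```

A caller could, for example, skip `setTempFolder` and load without temporary files when the check returns false, rather than letting the load throw.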
public boolean getUpdateDirtyFields() / public void setUpdateDirtyFields(boolean value)
Specifies whether to update the fields with the dirty attribute.
Example:
Shows how to use a special property for updating field results.

    Document doc = new Document();
    DocumentBuilder builder = new DocumentBuilder(doc);

    // Give the document's built-in "Author" property a value, and then display it with a field.
    doc.getBuiltInDocumentProperties().setAuthor("John Doe");
    FieldAuthor field = (FieldAuthor) builder.insertField(FieldType.FIELD_AUTHOR, true);

    Assert.assertFalse(field.isDirty());
    Assert.assertEquals("John Doe", field.getResult());

    // Update the property. The field still displays the old value.
    doc.getBuiltInDocumentProperties().setAuthor("John & Jane Doe");

    Assert.assertEquals("John Doe", field.getResult());

    // Since the field's value is out of date, we can mark it as "dirty".
    // This value will stay out of date until we update the field manually with the Field.update() method.
    field.isDirty(true);

    // If we save without calling an update method,
    // the field will keep displaying the out-of-date value in the output document.
    String path = getArtifactsDir() + "Filed.UpdateDirtyFields.docx";
    try (OutputStream docStream = new FileOutputStream(path)) {
        doc.save(docStream, SaveFormat.DOCX);
    }

    // The LoadOptions object has an option to update all fields
    // marked as "dirty" when loading the document.
    // updateDirtyFields is a boolean defined outside this snippet.
    LoadOptions options = new LoadOptions();
    options.setUpdateDirtyFields(updateDirtyFields);

    // Reload from the saved file; the output stream itself cannot be read back.
    doc = new Document(path, options);

    Assert.assertEquals("John & Jane Doe", doc.getBuiltInDocumentProperties().getAuthor());

    field = (FieldAuthor) doc.getRange().getFields().get(0);

    // Updating dirty fields like this automatically sets their "IsDirty" flag to false.
    if (updateDirtyFields) {
        Assert.assertEquals("John & Jane Doe", field.getResult());
        Assert.assertFalse(field.isDirty());
    } else {
        Assert.assertEquals("John Doe", field.getResult());
        Assert.assertTrue(field.isDirty());
    }
public IWarningCallback getWarningCallback() / public void setWarningCallback(IWarningCallback value)
public int getWebRequestTimeout() / public void setWebRequestTimeout(int value)
Example:
Shows how to set a time limit for web requests that occur when loading an HTML document that links external resources.

    public void webRequestTimeout() throws Exception {
        // Create a new HtmlLoadOptions object and verify its timeout threshold for a web request.
        HtmlLoadOptions options = new HtmlLoadOptions();

        // When loading an HTML document with resources externally linked by a web address URL,
        // web requests that fetch these resources and fail to complete within this time limit will be aborted.
        Assert.assertEquals(100000, options.getWebRequestTimeout());

        // Set a WarningCallback that will record all warnings that occur during loading.
        ListDocumentWarnings warningCallback = new ListDocumentWarnings();
        options.setWarningCallback(warningCallback);

        // Load such a document and verify that a shape with image data has been created,
        // provided the request to get that image took place within the timeout limit.
        // {AsposeLogoUrl} is a placeholder kept from the source; substitute the URL of the linked image.
        String html = "\r\n<html>\r\n<img src=\"{AsposeLogoUrl}\" alt=\"Aspose logo\" style=\"width:400px;height:400px;\">\r\n</html>\r\n";

        // The HTML is held in a string, so load it from an in-memory stream rather than a file.
        Document doc = new Document(new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)), options);
        Shape imageShape = (Shape) doc.getChild(NodeType.SHAPE, 0, true);

        Assert.assertEquals(7498, imageShape.getImageData().getImageBytes().length);
        Assert.assertEquals(0, warningCallback.warnings().size());

        // Set an unreasonable timeout limit and load the document again.
        options.setWebRequestTimeout(0);
        doc = new Document(new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)), options);

        // If a request fails to complete within the timeout limit, a shape with image data will still be produced.
        // However, the image will be the red 'x' that commonly signifies missing images.
        imageShape = (Shape) doc.getChild(NodeType.SHAPE, 0, true);
        Assert.assertEquals(924, imageShape.getImageData().getImageBytes().length);

        // A timeout like this will also accumulate warnings that can be picked up by a WarningCallback implementation.
        Assert.assertEquals(WarningSource.HTML, warningCallback.warnings().get(0).getSource());
        Assert.assertEquals(WarningType.DATA_LOSS, warningCallback.warnings().get(0).getWarningType());
        Assert.assertEquals("Couldn't load a resource from '{AsposeLogoUrl}'.", warningCallback.warnings().get(0).getDescription());

        Assert.assertEquals(WarningSource.HTML, warningCallback.warnings().get(1).getSource());
        Assert.assertEquals(WarningType.DATA_LOSS, warningCallback.warnings().get(1).getWarningType());
        Assert.assertEquals("Image has been replaced with a placeholder.", warningCallback.warnings().get(1).getDescription());

        doc.save(getArtifactsDir() + "HtmlLoadOptions.WebRequestTimeout.docx");
    }

    /**
     * Stores all warnings occurring during a document loading operation in a list.
     */
    private static class ListDocumentWarnings implements IWarningCallback {
        public void warning(WarningInfo info) {
            mWarnings.add(info);
        }

        public ArrayList<WarningInfo> warnings() {
            return mWarnings;
        }

        private final ArrayList<WarningInfo> mWarnings = new ArrayList<>();
    }
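Because the timeout is an int number of milliseconds, callers who keep timeouts as `java.time.Duration` values may want an explicit conversion. The sketch below is a hypothetical helper (`TimeoutMillis.toMillis` is not an Aspose.Words API) that uses only the standard library and fails on values that are negative or do not fit in an int.

```java
import java.time.Duration;

/** Hypothetical helper for producing an int millisecond timeout; not part of Aspose.Words. */
final class TimeoutMillis {

    /**
     * Converts a Duration to whole milliseconds.
     * Throws IllegalArgumentException for negative durations and
     * ArithmeticException when the value does not fit in an int.
     */
    static int toMillis(Duration timeout) {
        if (timeout.isNegative()) {
            throw new IllegalArgumentException("timeout must not be negative");
        }
        return Math.toIntExact(timeout.toMillis());
    }
}
```

`toMillis(Duration.ofSeconds(100))` returns 100000, the default stated for this property; the result could then be passed to `setWebRequestTimeout`.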