Wednesday, November 7, 2007

Alfresco : XML Metadata Extraction

I went through the wiki page and tried desperately to make sense of the explanation, but unlike other wiki pages, this one (http://wiki.alfresco.com/wiki/Metadata_Extraction) didn't lead me to a straight answer.
So I did some investigation in the source, and XML metadata extraction turned out to be a breeze.
The approach to extraction that I am following is:
  1. I have my own Content Type 'License' defined in my content model.
  2. Given an XML file representing this content object, I would like to pull information from the XML and populate the properties of the Content Type.
  3. Thus, if my Content Type has the properties First Name and Last Name, and the XML has tags representing these names that can be reached with an XPath, then I should be able to tie them to my Content Type License.
  4. I kept my XML file simple, and it looks like this:
    [XML File]
  5. Alfresco 2.1 provides an XML extractor by default, in the form of a Java class.
  6. It also provides an XML extractor selector class, which peeks inside the XML, identifies it as a particular kind of object based on an XPath, and then passes extraction control to the appropriate registered extractor. All of this, including the mapping, is defined in the custom-metadata-extractors-context.xml file, which is placed in the extensions folder.
  7. The mapping mechanism operates in two steps. The first step extracts data from the XML and places it in a variable defined by the extractor. The second step puts the data from that variable into the actual Content Type object.
    So in the example XML, the value at the XPath /license/xlname/text() is first placed in a variable xlname and then moved from there to the ccs:lname property of the Content Type. This appears to be a reverse process.
  8. So in order to get this working, all we need to do is place the custom-metadata-extractors-context.xml file in the extensions folder and set a rule on the space that will be ingesting the XML data file. The rule should listen for inbound documents with the XML mime-type and run the 'extract metadata from xml' action.
  9. Once the XML document is placed in the space, viewing the details of this document will show the extracted metadata assigned to the properties.
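To make the two-step mapping from step 7 concrete, here is a minimal sketch in plain Java using only the JDK's javax.xml.xpath. It is not Alfresco's extractor code. The class name TwoStepMappingSketch, the sample XML, the xfname variable, and the ccs:fname property are my own assumptions for illustration; only /license/xlname/text(), xlname, and ccs:lname come from the example above.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Illustration only: a hypothetical class, not part of Alfresco.
class TwoStepMappingSketch {

    // Step 1: evaluate each XPath and store the result under its variable name.
    static Map<String, String> extract(String xml, Map<String, String> variableToXPath) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, String> raw = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : variableToXPath.entrySet()) {
            try {
                // evaluate(String, InputSource) returns the string value of the match,
                // or an empty string when nothing matches.
                String value = xpath.evaluate(
                        entry.getValue(), new InputSource(new StringReader(xml)));
                if (!value.isEmpty()) {
                    raw.put(entry.getKey(), value);
                }
            } catch (XPathExpressionException e) {
                throw new IllegalStateException("Bad XPath: " + entry.getValue(), e);
            }
        }
        return raw;
    }

    // Step 2: move each variable's value onto the content model property it maps to.
    static Map<String, String> mapToProperties(
            Map<String, String> raw, Map<String, String> variableToProperty) {
        Map<String, String> properties = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : raw.entrySet()) {
            String property = variableToProperty.get(entry.getKey());
            if (property != null) {
                properties.put(property, entry.getValue());
            }
        }
        return properties;
    }

    public static void main(String[] args) {
        // Assumed sample document; the real XML file is not shown in this post.
        String xml = "<license><xfname>Jane</xfname><xlname>Doe</xlname></license>";

        Map<String, String> variableToXPath = new LinkedHashMap<>();
        variableToXPath.put("xfname", "/license/xfname/text()"); // assumed
        variableToXPath.put("xlname", "/license/xlname/text()"); // from the post

        Map<String, String> variableToProperty = new LinkedHashMap<>();
        variableToProperty.put("xfname", "ccs:fname"); // assumed
        variableToProperty.put("xlname", "ccs:lname"); // from the post

        Map<String, String> raw = extract(xml, variableToXPath);
        // prints {ccs:fname=Jane, ccs:lname=Doe}
        System.out.println(mapToProperties(raw, variableToProperty));
    }
}
```

In this sketch an XPath that matches nothing yields an empty string, and the variable is simply skipped, so the corresponding property is never set.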
I am presently not sure how the Alfresco WCM mechanism picks up the custom metadata extractor. My only speculation is that the mechanism iterates through the extractors to find a match for the mime-type, and when it reaches XML it proceeds further based on the selector that we have configured. [TODO: Find the exact pick-up mechanism for extractors]

2 comments :

rihanna said...

Hi author of this article, if you are still active on this page, I have a problem. Maybe you can help me?
I followed your instructions but I can't extract metadata to my aspect. As far as I know, the problem is in the XPath.
How do I extract metadata from an XML file?
Thanks.

rihanna said...

Hey, I asked the wrong question. I know how to extract metadata, but the problem is that my XPath is null.
What do I need to do to get a working XPath that gives me results?
