On 07/03/2013, at 9:12 AM, [hidden email] wrote:
> My brother works at the university library, and they are moving to WordPress for their web pages.
> They are moving their current content MANUALLY (thousands of pages), and to show them how insane this is (and maybe the choice of WordPress, too), I want to show him how to migrate with Funnelweb.
> This is a typical page:
> ... but I cannot find any useful div/class to extract as "body text". It's all tables (and until now, Funnelweb has worked out of the box for me).
> Maybe it is possible to somehow remove some of the <tr> and <td>s. It looks like the body text is usually in the second <td> of the second <tr>
"Usually" is the key word. You have to find the pattern or groups of patterns.
You can use an XPath like //tr[2]/td[2], which is "in the second <td> of the second <tr>".
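As a rough way to try a positional rule before putting it in a pipeline, here is a minimal sketch using only Python's standard library. The sample page is a made-up, well-formed stand-in; real pages are usually messy HTML and would need a forgiving parser such as lxml, which is not used here.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a table-based page layout (not the real site).
PAGE = """
<html><body><table>
  <tr><td>logo</td><td>banner</td></tr>
  <tr><td>menu</td><td><h2>Opening hours</h2><p>Body text here.</p></td></tr>
</table></body></html>
""".strip()

root = ET.fromstring(PAGE)

# Second <td> of the second <tr>; XPath positions are 1-based.
cell = root.find(".//tr[2]/td[2]")

heading = cell.find("h2").text
paragraph = cell.find("p").text
print(heading, "/", paragraph)
```

If the printed cell is not the one you expected, the layout probably has extra wrapper rows or nested tables, and the indexes need adjusting.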
Sometimes I've used width values if they are unique enough.
You can also do some more advanced XPath by looking inside elements.
For example, if it's the td with the h2 in it, you can use //td[.//h2].
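To see what "the td with the h2 in it" selects, here is a small sketch on a made-up page. Note the assumption: Python's standard-library ElementTree does not support the `[.//h2]` predicate, so the filter is written out in Python instead; with a full XPath engine such as lxml the expression could be used directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: the h2 sits inside a div, one level below the td.
PAGE = """
<html><body><table>
  <tr><td>menu</td><td><div><h2>Title</h2></div><p>Body</p></td></tr>
  <tr><td>footer</td><td>links</td></tr>
</table></body></html>
""".strip()

root = ET.fromstring(PAGE)

# Emulates //td[.//h2]: keep every <td> that has an <h2> anywhere below it.
cells = [td for td in root.iter("td") if td.find(".//h2") is not None]

for td in cells:
    print(td.find("p").text)
```

Because `.//h2` looks at descendants at any depth, the td still matches even though the h2 is wrapped in a div.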
Remember you can also use multiple template sections, and if one of the compulsory elements doesn't match, it will go on to the next template.
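The fallback idea can be sketched in plain Python. This is only an illustration of "try the next template when a compulsory rule fails", not Funnelweb's actual implementation; the `TEMPLATES` structure, the `extract` helper and the sample page are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule sets: field name -> (path, compulsory?)
TEMPLATES = [
    {"title": (".//h1", True), "text": (".//div[@id='content']", True)},
    {
        "title": (".//h2", True),
        "text": (".//tr[2]/td[2]", True),
        "description": (".//p[@class='lead']", False),
    },
]

def extract(root, templates):
    """Return (index, fields) for the first template whose compulsory paths all match."""
    for index, template in enumerate(templates):
        fields = {}
        for name, (path, compulsory) in template.items():
            node = root.find(path)
            if node is None:
                if compulsory:
                    break  # a compulsory rule failed: give up on this template
                continue   # an optional rule failed: just skip the field
            fields[name] = "".join(node.itertext()).strip()
        else:
            return index, fields  # no break, so every compulsory rule matched
    return None, {}

PAGE = """
<html><body><table>
  <tr><td>logo</td><td>banner</td></tr>
  <tr><td>menu</td><td><h2>Opening hours</h2><p>Body text here.</p></td></tr>
</table></body></html>
""".strip()

index, fields = extract(ET.fromstring(PAGE), TEMPLATES)
print(index, fields)
```

Here the first rule set fails because the page has no h1, so the second, table-based rule set is used and the optional description is simply left out.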
You can also enable transmogrify.htmlcontentextractor.auto, which will try to find the xpath for you.
Plone-Users mailing list
On 7 March 2013, at 00:57, "Dylan Jay-4 [via Plone]" <[hidden email]> wrote:
This is very useful. I googled a little and also found [contains(@class,'someclass')].
Does this work the same way, and how "deep" does this go?
(let's say you have
Would a rule like //div[.//h2] work for both?
That said, I still cannot get it to work without using CSS classes or ids (importing another Plone site using ids works great). Could something be wrong with xml on my OS X (missing libs or something)?
I made this (test) pipeline, and I cannot understand why it adds blank pages (the title field is empty and the body text field is empty). All pages have a "title" and a "body", so something should show up.
include = funnelweb.remote
title = text //title
description = optional //nothing
text = html //body