Re: Creating content from XML feed

Tom Lazar

Re: Creating content from XML feed

I have started pondering the very same problem while writing a proposal
for a (potential) client. since I haven't got the job yet, I haven't
delved any further so far, but having said that, I think:

- cronjobbing and wgetting on the server seems unavoidable (and also not
really a problem ;-)
- the problem will be to notify the plone instance that something has
changed.
- I'm thinking of using webdav via localhost to push the new file into the
zodb, as this would trigger the creation of a new content object.
- I would then have to look into the practice of overriding the PUT method
of that ATCT and registering it as a content type. if it's the only .xml you
will be dealing with it should be easy; it could be tricky if you want to
deal with multiple types of xml (perhaps parsing for the DTD or a similar
identifier?)
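
A minimal sketch of the "push via webdav on localhost" step, using only the Python standard library. It only shows how the PUT request could be built; the URL, credentials, and file name in the usage comment are hypothetical, and whether the PUT ends up creating the custom type depends on how the target folder and the type's PUT handling are set up.

```python
import base64
import urllib.request


def build_put_request(url, xml_bytes, user, password):
    """Build (but do not send) an HTTP PUT request carrying the XML file."""
    # HTTP Basic auth: base64 of "user:password"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, data=xml_bytes, method="PUT")
    request.add_header("Content-Type", "text/xml")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Usage from the cron-driven script (all values below are assumptions):
#
#   with open("events.xml", "rb") as handle:
#       payload = handle.read()
#   req = build_put_request(
#       "http://localhost:8080/Plone/feeds/events.xml", payload, "admin", "secret")
#   with urllib.request.urlopen(req) as response:
#       print(response.status)
```

Keeping the request construction separate from the sending makes it easy to check the headers without a running Zope instance.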

if you're looking into a comfortable way of extracting the data from your  
xml to create an ATCT instance you might want to look into amara[1] - not  
exactly light-weight but very pythonic and very convenient ;-)

hth, I'd be very interested in any progress you make in that field. you
can also send me pmail at [hidden email]

best regards,

tom

[1] http://uche.ogbuji.net/uche.ogbuji.net/tech/4suite/amara/


On Fri, 14 Oct 2005 18:08:47 +0200, Gareth Adams  
<[hidden email]> wrote:

> Hi,
>
> Our sister site produces a list of events as an XML feed refreshed every
> 30 minutes, this includes static fields like 'id', and potentially dynamic
> fields like 'ticketsremaining'.
>
> I'm trying to work out how I can set my site up to automatically
> create/update content items (into a custom ATType) based on the XML feed.
> I can handle the logic and the deciding what fields to update or not, but
> I have no idea how to go about the mechanics of carrying out the update.
> I'm thinking:
>
> - The script will be triggered most likely by cron, will it need to load a
> URL in a browser to do that, or can I use a simple standalone python
> script? (I'm guessing I can but I haven't delved into Plone/Python enough
> to know what to import)
>
> - Content will be created from a local copy of the xml feed which will
> probably be `wget`ted over from their server just before the script is run
>
> - Deleting created content won't be an issue as the xml file provides a
> snapshot of forthcoming events, our site will also act as an archive
>
> Has anyone tried anything like this before, have any opinions whether I've
> got the right idea, or know for a fact that this won't work?
>
> Thank you, you brilliant people,
> Gareth



--
Tom Lazar
http://tomster.org/blog



-------------------------------------------------------
This SF.Net email is sponsored by:
Power Architecture Resource Center: Free content, downloads, discussions,
and more. http://solutions.newsforge.com/ibmarch.tmpl
_______________________________________________
Plone-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/plone-users
Cyrille Bonnet

Re: Creating content from XML feed

For multiple content types (and metadata), you might want to look into
Marshaller:

http://www.zopemag.com/Issue009/Section_Articles/article_PloneWebDAV.html

Cheers

Cyrille

Jon Stahl

Re: Creating content from XML feed

It seems like CMFFeed (http://linux.co.uk/Pages/projects/cmffeed) might be a
possible starting point. It's designed to turn incoming RSS feeds into
first-class content items. (Which I think is a *huge* need in Plone, BTW.) It
seems like your problem is similar, although you are dealing with a custom type
of XML feed rather than a well-known standard like RSS.

I'm not enough of a programmer to say if CMFFeed does things in the best
possible way. It seems to use ZEO threads to do the fetching of content in the
background.

I would be very, very interested in hearing more about anyone who has used this
or other tools to create a solid RSS aggregator that turns aggregated items into
first-class content objects. (I've been writing a bit about this and other
community-collaboration type features at
http://blogs.onenw.org/jon/archives/2005/10/15/)

best,
jon

---
Jon Stahl
ONE/Northwest
www.onenw.org



-----
Jon Stahl, Director of Web Solutions
ONE/Northwest - Online Networking for the Environment
http://www.onenw.org
Kapil Thangavelu

Re: Re: Creating content from XML feed

another option, that i'd use, is to set up a zope zeo client and use its
zopectl run function to execute a custom script in a cron job. this gives
direct access to the application data with periodic out-of-process execution.
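
a rough sketch of what such a script could look like, assuming the feed has already been parsed into a list of dicts keyed by field name. the type name "FeedEvent", the folder path, and the script file name are made up for illustration; invokeFactory, objectIds, _getOb and the archetypes update method are the usual zope / plone calls, but check them against your own versions before relying on this.

```python
def sync_events(folder, events, type_name="FeedEvent"):
    """Create missing items and update existing ones.

    `folder` is the Plone folder holding the items; `events` is a list of
    dicts that each carry an 'id' key plus the field values to store.
    Returns the ids that were created and the ids that were updated.
    `type_name` is a hypothetical custom Archetypes type.
    """
    created, updated = [], []
    existing = set(folder.objectIds())
    for event in events:
        item_id = event["id"]
        fields = {key: value for key, value in event.items() if key != "id"}
        if item_id in existing:
            updated.append(item_id)
        else:
            folder.invokeFactory(type_name, item_id)
            existing.add(item_id)
            created.append(item_id)
        # Archetypes objects accept field values as keyword arguments here.
        folder._getOb(item_id).update(**fields)
    return created, updated


# When started as `zopectl run sync_events.py`, zopectl binds the Zope root
# object to the name `app`. The path below is an assumption:
#
#   import transaction
#   folder = app.unrestrictedTraverse("/Plone/events")
#   sync_events(folder, parsed_events)
#   transaction.commit()
```

the cron entry then only has to fetch the feed and call zopectl run on the zeo client; no browser or http request is involved.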

the marshall product isn't going to help out of the box here; either way
you're doing the deserialization and the mapping of elements to attributes
and content types yourself. marshall is a framework for plugging in
serializers / deserializers, and is the basis for a forthcoming content
import / export system for plone.

for parsing xml, i'd recommend the feedparser module for rss feeds and
elementtree or lxml for others. there are a few python/xml object bindings,
but imho they're overkill for this sort of thing.
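
for a custom feed, the elementtree route could be as small as the sketch below. it uses xml.etree.ElementTree, the standard-library version of elementtree, and assumes a made-up layout in which each event is an <event> element whose children are the fields; only 'id' and 'ticketsremaining' come from the original question, the rest of the structure is a guess.

```python
import xml.etree.ElementTree as ET


def parse_events(xml_text):
    """Return one dict per <event> element, mapping child tag to text."""
    root = ET.fromstring(xml_text)
    events = []
    for node in root.iter("event"):
        # Each direct child of <event> becomes one field; empty elements
        # yield an empty string instead of None.
        events.append({child.tag: (child.text or "").strip() for child in node})
    return events


# Hypothetical feed layout, for illustration only.
SAMPLE = """
<events>
  <event>
    <id>concert-01</id>
    <ticketsremaining>12</ticketsremaining>
  </event>
</events>
"""

if __name__ == "__main__":
    for event in parse_events(SAMPLE):
        print(event["id"], event["ticketsremaining"])
```

the resulting list of dicts is plain python data, so the logic that decides which fields to create or update stays independent of the xml library.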

cheers,

-kapil
