five.intid hashes aren't always stable

Hanno Schlichting

five.intid hashes aren't always stable

Hi.

five.intid includes a key reference definition in its
KeyReferenceToPersistent class and specifies its hash as:

def __hash__(self):
    return hash((self.dbname, self.object._p_oid, ))

This hash is used as the primary key for many of the internals in five.intid.

I had so far assumed that this hash would be stable: with the same
input arguments, the hash function should produce the same hash again.
Since both the poid and the database name of an object are persistent
and never change, I concluded it would be possible to recreate the
intid.

While actually trying this for a site, I ran into a problem. My
development machine was using a 64bit Python, whereas the production
machine was using a 32bit Python.

Python's built-in hash function uses the full width of the platform's
native integer, so it produces different results on 32-bit and 64-bit
systems for the same input arguments.

One potential problem this might create is in the following scenario:
You are working on a demo site and prepare some content on your local
laptop running a 64bit Python. You copy over the database to a
production machine running a 32bit Python. At this point you can get
OverflowErrors from the intid internals, as it has integers that don't
fit in the 32bit range.

This situation is somewhat artificial. I'm still wondering if it would
be better to use a different hashing algorithm here, one that is not
susceptible to 32/64 bit changes or any other code optimizations that
might occur in different Python versions.
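
A minimal sketch of what such a platform-independent hash could look
like, assuming the key is built from the database name and the 8-byte
oid. The function name `stable_key_hash` and the choice of SHA-256
truncated to 31 bits are illustrative assumptions, not what five.intid
or zope.keyreference actually do:

```python
import hashlib
import struct


def stable_key_hash(dbname, oid):
    """Return a deterministic, non-negative 31-bit integer for (dbname, oid).

    Hypothetical helper: unlike the built-in hash(), the result does not
    depend on the word size or the Python version.
    """
    if isinstance(dbname, str):
        dbname = dbname.encode("utf-8")
    # Length-prefix the name so that different (name, oid) splits of the
    # same byte sequence do not produce the same payload.
    payload = struct.pack(">I", len(dbname)) + dbname + oid
    digest = hashlib.sha256(payload).digest()
    # Take the first four bytes as a big-endian unsigned int and drop the
    # top bit so the value fits in a signed 32-bit integer.
    return struct.unpack(">I", digest[:4])[0] & 0x7FFFFFFF


oid = struct.pack(">Q", 42)  # stand-in for an 8-byte ZODB oid
print(stable_key_hash("main", oid))
```

Truncating to 31 bits keeps the value inside a 32-bit key range but
does not remove collisions; it only makes them reproducible across
platforms.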

Hanno

------------------------------------------------------------------------------
The Palm PDK Hot Apps Program offers developers who use the
Plug-In Development Kit to bring their C/C++ apps to Palm for a share
of $1 Million in cash or HP Products. Visit us here for more details:
http://p.sf.net/sfu/dev2dev-palm
_______________________________________________
Plone-developers mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/plone-developers
Martin Aspeli

Re: five.intid hashes aren't always stable

On 30 July 2010 23:15, Hanno Schlichting <[hidden email]> wrote:

> This situation is somewhat artificial. I'm still wondering if it would
> be better to use a different hashing algorithm here, which is not
> susceptible to 32/64 bit changes or any other code optimizations that
> might occur in different Python versions.

What would the BBB implications be of changing this now?

Martin

------------------------------------------------------------------------------
Hanno Schlichting

Re: five.intid hashes aren't always stable

On Fri, Jul 30, 2010 at 5:54 PM, Martin Aspeli <[hidden email]> wrote:
> On 30 July 2010 23:15, Hanno Schlichting <[hidden email]> wrote:
>> This situation is somewhat artificial. I'm still wondering if it would
>> be better to use a different hashing algorithm here, which is not
>> susceptible to 32/64 bit changes or any other code optimizations that
>> might occur in different Python versions.
>
> What would the BBB implications be of changing this now?

The main problem I can think of is collisions. But these can already
occur today.

Say you have a production database on a 32-bit system and migrate it
over to a 64-bit system. Now add some new objects and generate intids
for them. At this point a new object can get the same integer value
that a different, existing object was already given. I haven't studied
the actual algorithm in detail to determine the likelihood of this. [1]
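
As a rough feel for that likelihood, here is a back-of-the-envelope
sketch using the birthday approximation. It assumes the hash values are
uniformly distributed over the key space, which Python's tuple hash
does not guarantee, so treat the numbers as an optimistic estimate
rather than a statement about the real algorithm. The function name is
hypothetical:

```python
import math


def collision_probability(n_objects, key_bits):
    """Approximate chance that at least two of n_objects share a key.

    Birthday approximation: 1 - exp(-n(n-1) / (2 * 2**key_bits)),
    assuming keys are drawn uniformly at random.
    """
    space = 2.0 ** key_bits
    exponent = n_objects * (n_objects - 1) / (2.0 * space)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-exponent)


for n in (1000, 100000, 1000000):
    print(n, collision_probability(n, 31), collision_probability(n, 63))
```

Under that uniformity assumption, a 31-bit key space makes a collision
more likely than not somewhere below a hundred thousand objects, while
a 63-bit space stays far from that at the same sizes.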

Any change of algorithm faces the same problem. As we are using the
full unconstrained integer range here (and for performance reasons it
should continue to be all integers) we would likely need to migrate
all existing intids to the new algorithm.

Hanno

[1] The relevant parts appear to be here (the code is identical in
Python 2.4 to 2.7):

PyObject_Hash in http://svn.python.org/projects/python/branches/release26-maint/Objects/object.c
tuplehash in http://svn.python.org/projects/python/branches/release26-maint/Objects/tupleobject.c
string_hash in http://svn.python.org/projects/python/branches/release26-maint/Objects/stringobject.c
int_hash in http://svn.python.org/projects/python/branches/release26-maint/Objects/intobject.c

------------------------------------------------------------------------------
Hanno Schlichting

Re: five.intid hashes aren't always stable

On Fri, Jul 30, 2010 at 8:46 PM, Hanno Schlichting <[hidden email]> wrote:
> The main problem I can think of is collisions. But these can already
> occur today.

Thinking about this some more, it should even be possible to get
collisions inside the same environment. Once you use intid in a
multi-database environment, you should be able to get the same intid
for different combinations of database name and poid. The algorithm is
a fairly simple multiplication with some added static constants in it.

The same problem applies to zope.keyreference, where this code was taken from.

Hanno

------------------------------------------------------------------------------
Yuri

Re: five.intid hashes aren't always stable

On 30/07/2010 20:53, Hanno Schlichting wrote:

> On Fri, Jul 30, 2010 at 8:46 PM, Hanno Schlichting <[hidden email]> wrote:
>> The main problem I can think of is collisions. But these can already
>> occur today.
>
> Thinking about this some more, it should even be possible to get
> collisions inside the same environment. Once you use intid in a
> multi-database environment you should be able to get the same intid
> for different combinations of database name and poid. The algorithm is
> a fairly simple multiplication with some added static constants in it.
>
> The same problem applies to zope.keyreference where this code was taken from.

  Knuth, where are you today? :)

Knuth (1962) "The Art of Computer Programming"
http://en.wikipedia.org/wiki/The_Art_of_Computer_Programming

The_Internet_Developer (2010) "The Art of Lazy Programming"



------------------------------------------------------------------------------
Alexander Limi

Re: five.intid hashes aren't always stable

On Sun, Aug 1, 2010 at 11:48 PM, Yuri <[hidden email]> wrote:
> Knuth, where are you today? :)
>
> Knuth (1962) "The Art of Computer Programming"
> http://en.wikipedia.org/wiki/The_Art_of_Computer_Programming
>
> The_Internet_Developer (2010) "The Art of Lazy Programming"

intid wasn't designed to be unique; it was designed to be fast and usable in certain situations.

plone.uuid should give us what we need on that front. It has been PLIPed for inclusion in 4.1.

--
Alexander Limi · http://limi.net

------------------------------------------------------------------------------
Yuri

Re: five.intid hashes aren't always stable

On 02/08/2010 10:25, Alexander Limi wrote:

> intid wasn't designed to be unique, it was designed to be fast and
> usable in certain situations.
>
> plone.uuid <http://dev.plone.org/plone/browser/plone.uuid> should give
> us what we need on that front — PLIPed for inclusion in 4.1
> <http://dev.plone.org/plone/ticket/10778>.


Can we also include a unique, auto-incrementing atomic counter? This is
a common feature. For example, Poi uses:

    def _renameAfterCreation(self, check_auto_id=False):
        parent = self.getTracker()
        maxId = 0
        for id in parent.objectIds():
            try:
                intId = int(id)
                maxId = max(maxId, intId)
            except (TypeError, ValueError):
                pass
        newId = str(maxId + 1)
        # Can't rename without a subtransaction commit when using
        # portal_factory!
        transaction.savepoint(optimistic=True)
        self.setId(newId)

This can lead to various problems: database conflicts when many people
submit issues at the same time, Archetypes calling the same code three
times on renaming, portal_factory dependencies, and so on.

Since the ZODB is transaction-aware, it should be trivial, I suppose.

------------------------------------------------------------------------------
Martin Aspeli

Re: five.intid hashes aren't always stable

On 2 August 2010 16:38, Yuri <[hidden email]> wrote:

> Can we include also a unique - autoincrementing atomic counter? This is
> a common feature. For example, Poi uses:

Not in that PLIP we can't. Sorry. Unrelated functionality.

>      def _renameAfterCreation(self, check_auto_id=False):
>         parent = self.getTracker()
>         maxId = 0
>         for id in parent.objectIds():
>             try:
>                 intId = int(id)
>                 maxId = max(maxId, intId)
>             except (TypeError, ValueError):
>                 pass
>         newId = str(maxId + 1)
>         # Can't rename without a subtransaction commit when using
>         # portal_factory!
>         transaction.savepoint(optimistic=True)
>         self.setId(newId)
>
> This can lead to various problems: database conflicts when many people
> submit issues at the same time, Archetypes calling the same code three
> times on renaming, portal_factory dependencies, and so on.
>
> Since the ZODB is transaction-aware, it should be trivial, I suppose.

Any central counter is going to have the same problem, which is why
you don't build central counters like that. ;)

Martin

------------------------------------------------------------------------------
Yuri

Re: five.intid hashes aren't always stable

On 02/08/2010 10:42, Martin Aspeli wrote:

> Any central counter is going to have the same problem, which is why
> you don't build central counters like that. ;)
>    

Relational databases have done it for ages, I think.

------------------------------------------------------------------------------
Hanno Schlichting

Re: five.intid hashes aren't always stable

In reply to this post by yuri-2
On Mon, Aug 2, 2010 at 10:38 AM, Yuri <[hidden email]> wrote:
> Can we include also a unique - autoincrementing atomic counter? This is
> a common feature.

Global incrementing counters are a bad idea.

If your application code really needs an incrementing counter, most of
the time you can use a BTrees.Length.Length instance. This has proper
conflict resolution. When conflict errors occur, you have no guarantee
that every value gets assigned, but that's usually an acceptable
trade-off.

The counter on the catalog tool uses such a construct for example.

Hanno

------------------------------------------------------------------------------
Yuri

Re: five.intid hashes aren't always stable

On 02/08/2010 10:49, Hanno Schlichting wrote:

> On Mon, Aug 2, 2010 at 10:38 AM, Yuri<[hidden email]>  wrote:
>    
>> Can we include also a unique - autoincrementing atomic counter? This is
>> a common feature.
>>      
> Global incrementing counters are a bad idea.
>
> If your application code really needs an incrementing counter, most of
> the time you can use a BTree.Length instance. This has proper conflict
> resolution. In case of conflict errors, you don't have a guarantee to
> get all values assigned, but that's usually a trade-off that is
> acceptable.
>
> The counter on the catalog tool uses such a construct for example.
>    
http://www.zope.org/Products/CMF/CMF-2.0.0-alpha/CHANGES.txt

CMFUid/UniqueIdGeneratorTool.py: Replaced the old BTree.Length.Length
implementation by a simple counter. Using a BTree.Length.Length object
as counter may have caused setting the same unique id to multiple
objects under high load. The tools counter gets automigrated on the
first access. This is a forward port from CMF-1_5-branch before the CMF
1.5.2 release.

NOW -> http://pypi.python.org/pypi/Products.CMFUid

class UniqueIdGeneratorTool(UniqueObject, SimpleItem):

     """Generator of unique ids.

     This is a dead simple implementation using a counter. May cause
     ConflictErrors under high load and the values are predictable.
     """

     implements(IUniqueIdGenerator)

     id = 'portal_uidgenerator'
     alternative_id = 'portal_standard_uidgenerator'
     meta_type = 'Unique Id Generator Tool'

     security = ClassSecurityInfo()

     security.declarePrivate('__init__')
     def __init__(self):
         """Initialize the generator
         """
         # The previous ``BTrees.Length.Length`` implementation may cause
         # double unique ids under high load. So for the moment we just use
         # a simple counter.
         self._uid_counter = 0

     security.declarePrivate('__call__')
     def __call__(self):
         """See IUniqueIdGenerator.
         """
         # For sites that have already used CMF 1.5.1 (and older) the
         # BTrees.Length.Length object has to be migrated to an integer.
         if isinstance(self._uid_counter, Length):
             self._uid_counter = self._uid_counter()
         self._uid_counter += 1
         return self._uid_counter

     security.declarePrivate('convert')
     def convert(self, uid):
         """See IUniqueIdGenerator.
         """
         return int(uid)

InitializeClass(UniqueIdGeneratorTool)
registerToolInterface('portal_uidgenerator', IUniqueIdGenerator)

Mmmmm, something is not clear here...

Maybe a good idea to do it in C? At least it should work! :)

------------------------------------------------------------------------------
Wichert Akkerman

Re: five.intid hashes aren't always stable

In reply to this post by yuri-2
On 8/2/10 10:45 , Yuri wrote:

> On 02/08/2010 10:42, Martin Aspeli wrote:
>> Any central counter is going to have the same problem, which is why
>> you don't build central counters like that. ;)
>>
>
>    Relational Databases do it from ages, I think.

Not in high end systems for the same reason. For big systems it is
common to assign ranges of ids to nodes instead of having a central counter.
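
A minimal sketch of that range-based scheme, with hypothetical class
names (`RangeAllocator`, `Node`) and an in-process lock standing in for
whatever coordination a real cluster would use. The central allocator
is only touched once per block, so nodes rarely contend with each
other:

```python
import threading


class RangeAllocator:
    """Central authority that hands out disjoint blocks of ids."""

    def __init__(self, block_size=100):
        self._block_size = block_size
        self._next_start = 0
        self._lock = threading.Lock()

    def allocate(self):
        # The only contended operation: one short critical section per block.
        with self._lock:
            start = self._next_start
            self._next_start += self._block_size
        return range(start, start + self._block_size)


class Node:
    """A worker that assigns ids from its own block without coordination."""

    def __init__(self, allocator):
        self._allocator = allocator
        self._ids = iter(())

    def new_id(self):
        try:
            return next(self._ids)
        except StopIteration:
            # Block exhausted: fetch a fresh one from the allocator.
            self._ids = iter(self._allocator.allocate())
            return next(self._ids)


allocator = RangeAllocator(block_size=3)
a, b = Node(allocator), Node(allocator)
ids = [a.new_id(), b.new_id(), a.new_id(), b.new_id()]
print(ids)  # [0, 3, 1, 4]
```

The ids are unique but not globally sequential, and any ids left in a
block when a node goes away are simply never used.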

Wichert.

------------------------------------------------------------------------------
Hanno Schlichting

Re: five.intid hashes aren't always stable

In reply to this post by yuri-2
On Mon, Aug 2, 2010 at 11:00 AM, Yuri <[hidden email]> wrote:
> NOW -> http://pypi.python.org/pypi/Products.CMFUid

You could also use the conflict-resolution-optimized version of this from
https://svn.enfoldsystems.com/trac/public/browser/enfold.fixes/trunk/src/enfold/fixes/cmfuid.py

In order to sustain really high write rates Enfold had to patch almost
all of the "global counter" implementations we have.

Hanno

------------------------------------------------------------------------------
Yuri

Re: five.intid hashes aren't always stable

On 02/08/2010 11:11, Hanno Schlichting wrote:

> On Mon, Aug 2, 2010 at 11:00 AM, Yuri<[hidden email]>  wrote:
>    
>> NOW ->  http://pypi.python.org/pypi/Products.CMFUid
>>      
> You could also the conflict resolution optimized version of this from
> https://svn.enfoldsystems.com/trac/public/browser/enfold.fixes/trunk/src/enfold/fixes/cmfuid.py
>
> In order to sustain really high write rates Enfold had to patch almost
> all of the "global counter" implementations we have.
>
> Hanno
>    

Hence we need a single "global counter" implementation that is fixed,
well written, and works.
The code above is beyond my understanding, sorry; I cannot see how the
conflict resolution fix works. I think it has something to do with
using the _v_ prefix.

Thanks for all the info and suggestions, it has been very interesting :)

------------------------------------------------------------------------------
Hanno Schlichting

Re: five.intid hashes aren't always stable

In reply to this post by Wichert Akkerman
On Mon, Aug 2, 2010 at 11:10 AM, Wichert Akkerman <[hidden email]> wrote:
> Not in high end systems for the same reason. For big systems it is
> common to assign ranges of ids to nodes instead of having a central counter.

Which happens to be the same strategy the ZODB uses to assign poids.
Each store connection gets a pool of poids to assign to new objects :)

Hanno

------------------------------------------------------------------------------
Laurence Rowe

Re: five.intid hashes aren't always stable

In reply to this post by Wichert Akkerman
On 2 August 2010 10:10, Wichert Akkerman <[hidden email]> wrote:

> On 8/2/10 10:45 , Yuri wrote:
>> On 02/08/2010 10:42, Martin Aspeli wrote:
>>> Any central counter is going to have the same problem, which is why
>>> you don't build central counters like that. ;)
>>>
>>
>>    Relational Databases do it from ages, I think.
>
> Not in high end systems for the same reason. For big systems it is
> common to assign ranges of ids to nodes instead of having a central counter.

The implementation in
http://svn.zope.org/zope.intid/trunk/src/zope/intid/__init__.py is
scalable and correct - intids are assigned sequentially on each thread
at a random position in the keyspace.
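
A minimal standalone sketch of that strategy, assuming a plain Python
set in place of the persistent BTree of references. The class and
attribute names are hypothetical, not zope.intid's API, and unlike the
real utility this model records the id as used inside the generator
itself:

```python
import random


class IdGenerator:
    """Model of 'sequential ids starting from a random position'."""

    def __init__(self, maxint=2 ** 31 - 1, rng=None):
        self.maxint = maxint
        self.used = set()  # stands in for the persistent mapping of ids
        self._rng = rng or random.Random()
        self._next = None  # per-process hint for the next candidate id

    def generate(self):
        nextid = self._next
        while True:
            if nextid is None:
                # No hint yet, or the last candidate was taken: jump to a
                # random position in the key space.
                nextid = self._rng.randrange(0, self.maxint)
            uid = nextid
            if uid not in self.used:
                nextid += 1
                if nextid > self.maxint:
                    nextid = None
                self._next = nextid
                self.used.add(uid)
                return uid
            nextid = None


gen = IdGenerator(rng=random.Random(0))
first = gen.generate()
second = gen.generate()
print(first, second)  # second follows directly after first
```

Because every process starts at its own random position, two writers
rarely pick the same candidate, and consecutive ids from one process
stay close together in the key space.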

Using a BTrees.Length.Length() is just bound to cause problems. While
there is conflict resolution to keep the length value correct, it is
not an atomic counter and there is nothing to ensure the *values* are
unique.
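
To illustrate the point, here is a pure-Python model with no ZODB
involved. The function name is hypothetical and only mimics the
additive merge that a Length's conflict resolution performs; the two
"transactions" are simulated with plain variables:

```python
def resolve_length_conflict(old, committed, new):
    """Merge two concurrent changes to a length by adding both deltas."""
    return committed + new - old


# Two transactions start from the same snapshot of the counter.
snapshot = 5

# Each one reads the value, uses value + 1 as its "unique" id, and
# then increments its own copy of the counter.
id_from_txn_a = snapshot + 1
state_a = snapshot + 1

id_from_txn_b = snapshot + 1
state_b = snapshot + 1

# Transaction A commits first; B conflicts and is resolved automatically.
final_length = resolve_length_conflict(snapshot, state_a, state_b)

print(final_length)                  # 7: the length is right
print(id_from_txn_a, id_from_txn_b)  # 6 6: but both got the same id
```

The merged length counts both increments, yet neither transaction ever
saw the other's value, so the ids derived from it collide.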

plone.uuid is solving a different problem, a general universally
unique id (replacing AT's poorly named UID() method). IntIds are
useful as catalogue keys. Eventually they should be used by
portal_catalog for its rid, so results from the various catalogues can
be combined.

Laurence

------------------------------------------------------------------------------
Martin Aspeli

Re: five.intid hashes aren't always stable

On 2 August 2010 17:56, Laurence Rowe <[hidden email]> wrote:

>>>    Relational Databases do it from ages, I think.
>>
>> Not in high end systems for the same reason. For big systems it is
>> common to assign ranges of ids to nodes instead of having a central counter.
>
> The implementation in
> http://svn.zope.org/zope.intid/trunk/src/zope/intid/__init__.py is
> scalable and correct - intids are assigned sequentially on each thread
> at a random position in the keyspace.
>
> Using a BTrees.Length.Length() is just bound to cause problems. While
> there is conflict resolution to keep the length value correct, it is
> not an atomic counter and there is nothing to ensure the *values* are
> unique.
>
> plone.uuid is solving a different problem, a general universally
> unique id (replacing AT's poorly named UID() method). IntIds are
> useful as catalogue keys. Eventually they should be used by
> portal_catalog for its rid, so results from the various catalogues can
> be combined.

The bigger question in all of this thread is: why?

zope.intid uses ints for performance reasons. For any other purpose, a
sequential int id is not all that interesting: it's not stable across
databases, not stable across export/import, and not useful in any
human-oriented manner (since the numbers will quickly get very big and
seem arbitrary).

Yuri was talking about sequential ids in Poi, which is a wholly
different thing, since the point there is to have a per-folder counter
(each container starts from 0). That's something else again.

Martin

------------------------------------------------------------------------------
Yuri

Re: five.intid hashes aren't always stable

In reply to this post by Laurence Rowe
On 02/08/2010 11:56, Laurence Rowe wrote:

> On 2 August 2010 10:10, Wichert Akkerman<[hidden email]>  wrote:
>    
>> On 8/2/10 10:45 , Yuri wrote:
>>      
>>> On 02/08/2010 10:42, Martin Aspeli wrote:
>>>> Any central counter is going to have the same problem, which is why
>>>> you don't build central counters like that. ;)
>>>>
>>>>          
>>>     Relational Databases do it from ages, I think.
>>>        
>> Not in high end systems for the same reason. For big systems it is
>> common to assign ranges of ids to nodes instead of having a central counter.
>>      
> The implementation in
> http://svn.zope.org/zope.intid/trunk/src/zope/intid/__init__.py is
> scalable and correct - intids are assigned sequentially on each thread
> at a random position in the keyspace.
>    

    def _generateId(self):
        """Generate an id which is not yet taken.

        This tries to allocate sequential ids so they fall into the
        same BTree bucket, and randomizes if it stumbles upon a
        used one.
        """
        nextid = getattr(self, '_v_nextid', None)
        while True:
            if nextid is None:
                nextid = self._randrange(0, self.family.maxint)
            uid = nextid
            if uid not in self.refs:
                nextid += 1
                if nextid > self.family.maxint:
                    nextid = None
                self._v_nextid = nextid
                return uid
            nextid = None


Is this one correct?

How could I implement a unique incremental counter in my code? Can you
provide an example or a pointer? Thanks :)
