(I’m only mostly joking)
GP's post is probably the "unityped" critique of dynamically typed languages by Robert Harper: https://existentialtype.wordpress.com/2011/03/19/dynamic-lan...
But dealing with Python infrastructure is so awful as to make the whole experience just bad.
uv fixes a lot of that, but I think it will be some time before it's used everywhere, and I have zero hope that the Python devs will ever do the right thing and officially replace Pip with uv.
> Hash values are integers. They are used to quickly compare dictionary keys during a dictionary lookup.
They're not to be used as a crypto digest. The downside here, then, is that if -1 and -2 are used as dict keys, they'll end up bucketed together. Apparently that hasn't been enough of an issue over the years to bother changing it. One advantage is that those values are common enough that the collision might break code that incorrectly expects hash() to have predictable outputs; if an obscure value like 0xffffffff had been used instead, lots of busted tests might never stumble across it.
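For reference, in CPython both of those values really do come out of hash() identical:

>>> hash(-1), hash(-2)
(-2, -2)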
Note that they'll only end up bucketed together in the underlying hashmap. The dictionary will still disambiguate the keys.
>>> a = -1
>>> b = -2
>>> {a: a, b: b}
{-1: -1, -2: -2}
A mitigating factor is that dict keys have to be hashable, which implies immutability, which generally implies a small number of native, fast types like str, numbers, or simple data structures like tuples. You could have some frozendict monstrosity as keys, but 1) that's not something you see often, and 2) don't do that. If you must, define a fast __eq__. For example, an object mapping to a database row might look like:
def __eq__(self, other):
    if self.pk != other.pk:
        return False  # fail quickly on a primary-key mismatch
    # Only when the primary keys match do the deep comparison
    return self.tuple_of_values == other.tuple_of_values
But again, that's just not really something that's done.

It may be an argument for ensuring that absolutely everything that is an object can hash: the object hasher must not have error states. Nobody wants the overhead of a simple hash code being wrapped in a result type.
The pigeonhole principle says there are an infinite number of inputs with the same hash output. Trying to figure out how that happens can be fun and enlightening, but why? “Because it just does, that’s all.”
It's very much trying to go _under_ the abstraction layer to investigate its behavior. Because it's interesting.
This is very similar to how people investigate performance quirks or security issues.
I am not a Python expert, but...
Python is described as OO and I thought it was dynamically typed
Not quite "without any typing" but close
Strong versus weak is not super solidly defined as referring only to values and not variables.
Correct.
> Strong versus weak is not super solidly defined as referring only to values and not variables.
You're right. It's not super solidly defined as referring only to variables and not values, either. If dynamic typing were the same as weak typing, there'd be no point in having both phrases.
> Correct.
Is it still unclear with the clarification you didn't quote? I can't tell if I need to explain more or not.
> It's not super solidly defined as referring only to variables and not values, either.
I didn't mean to suggest it was.
> If dynamic typing is the same as weak typing, there'd be no point in having both phrases.
It's not the same. There's a bunch of things referred to as "weak".
I would call that weakly typed since there is no step before your code runs that validates (or even tries to validate) that your program satisfies a "type system". But since the definition of "strong" and "weak" typing has always been vague, I will just say that it is stupidly typed[1], and that I am a pedantic nerd.
1. https://danieltuveson.github.io/programming/languages/stupid...
Using both Python and mandatorily statically typed languages regularly and professionally (my main working languages being C#, Python, and a mix of JS and TS), that's not my experience at all.
(Of course, Python has a variety of optional static typecheckers with differing degrees of type inference, as well as supporting explicit type specifications; it's not untyped unless you choose to use it that way.)
I'm pretty competent in Python, C, Java, JavaScript, assembly, and a number of other languages†, and my experience in Python is nothing like what you're describing, although I do have my own frustrations with it. It's possible that when you're more experienced you'll have a different perspective, but it sounds like you're stuck inside a pretty small bubble right now.
______
† Last year I also programmed in C++, Lua, C#, bash, Tcl, Emacs Lisp, Forth, Golang, OCaml, Scheme, Perl, Common Lisp, and the ngspice scripting language.
Yea, nah!
You cannot have both.
Particularly, there is a common definition of “typed” which is exactly equivalent to “statically typed”, under which all so-called “dynamically typed” languages are actually “untyped”, and within that system strong and weak typing are either a meaningless distinction or one within the set of statically typed languages.
There's also, of course, a common usage within which “dynamically typed” is meaningful and strong vs. weak typing is a separate distinction, usually mostly discussed within languages with dynamic typing, though languages like C being both statically typed and weakly typed has been discussed.
>>> "abc" + 321
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can only concatenate str (not "int") to str
I rest my case.

I program in assembly language. Memory is just a big array of bytes. Values don't have types, but each instruction chooses what type to interpret memory as - e.g. uint8, int16, float32, x86 real-mode pointer, long-mode pointer, etc.
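A rough Python analogue, using the struct module: the same eight bytes, read back as whatever "type" the format string picks (just an illustrative sketch):

>>> import struct
>>> raw = struct.pack("<d", 1.5)    # 8 bytes holding an IEEE-754 float64
>>> struct.unpack("<d", raw)[0]     # read back as float64
1.5
>>> struct.unpack("<q", raw)[0]     # same bytes read as int64
4609434218613702656
>>> list(raw)                       # same bytes read as uint8s
[0, 0, 0, 0, 0, 0, 248, 63]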
That would mean languages like Tcl. Or, for that matter, B.
print(hash(2**61 - 2))
2305843009213693950
print(hash(2**61 - 1))
0
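That comes straight from how CPython hashes ints: they're reduced modulo the Mersenne prime 2^61 - 1 on typical 64-bit builds, exposed as sys.hash_info.modulus, so the value just below the prime maps to itself and the prime itself wraps to 0:

>>> import sys
>>> sys.hash_info.modulus                                    # 2**61 - 1 on a 64-bit build
2305843009213693951
>>> hash(2**61 - 2) == (2**61 - 2) % sys.hash_info.modulus
True
>>> hash(2**61 - 1) == (2**61 - 1) % sys.hash_info.modulus   # both sides are 0
True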
I can’t think of any other language where this kind of thing happens, which means other developers won’t expect it either.
I can see the bug report now: “certain records cause errors, occurs only for about one in a few billion.”
If you're adding a new type to the core python language then you have to be aware of this, but if you're hacking the C implementation to change the core language then you're probably pretty well versed in the cpython internals, or at least surrounded by people that are.
>>> class Foo:
...     def __hash__(self):
...         return -1
...
>>> f = Foo()
>>> print(hash(f))
-2
So if you have a custom __hash__ function that you expect to return -1 for a certain value, and you then test the value of the hash of an object rather than testing the property of the object directly, this might bite you. But I cannot think of any reasonable case where you might want to do that.

The standard (not custom) hash() function in CPython, which calls your custom method and returns the actual value CPython uses for bucketing, will return -2 in that case, though.
Which is necessary so that, e.g., following the single mathematical function for hashing of numeric types published in the Python Language Reference preserves "x==y implies hash(x)==hash(y)" across custom and built-in numeric types in CPython, where hash(-1) does not follow that pattern.

Even though for -1 the divergence from the rule is in __hash__ itself. This is also consistent with the basic rule that dunder methods are for communication with the standard machinery, but external consumers should use the standard machinery, not the dunder method.
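A small sketch of that invariant in action (the MinusOne class here is made up for illustration): an object that compares equal to -1 and follows the documented rule by returning -1 from __hash__ still ends up interchangeable with the int -1, because hash() maps both to -2:

>>> class MinusOne:
...     def __eq__(self, other):
...         return other == -1
...     def __hash__(self):
...         return -1              # the documented rule, taken literally
...
>>> hash(MinusOne()) == hash(-1)   # both come out as -2 via hash()
True
>>> {-1: "found"}[MinusOne()]      # equal objects hash equal, so lookup works
'found'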
>>> -1 in d
False
What gives?
-- https://docs.python.org/3/reference/datamodel.html#object.__...
> The general contract of hashCode is:
> * If two objects are equal according to the equals method, then calling the hashCode method on each of the two objects must produce the same integer result.
> * It is not required that if two objects are unequal according to the equals method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
-- https://docs.oracle.com/en/java/javase/23/docs/api/java.base...
Well, you can also use an errno-like system. It has its own set of drawbacks as well, but it removes the "I need to reserve a sentinel value" problem.
In those situations, the application must begin by clearing errno to zero.
Then, it checks for a certain error value or values from the function which are ambiguous: they could be legit or indicate an error.
If one of those values occurs, and if errno is nonzero, then the error occurred.
This is how you deal with, for instance the strtol (string to long int) function. If there is a range error, strtol returns LONG_MIN or LONG_MAX. Those are also valid values in the range of long, but when no error has occurred, they are produced without errno being touched.
strtol can also return 0 in another error case, when the input is such that no conversion can be performed. ISO C doesn't require errno to be set to anything in this case, unfortunately. The case is distinguished from a legitimate zero by the original pointer to the string being stored in *endptr (if the caller specifies endptr that is not null).
ISO C and POSIX library functions do not reset errno to zero. They either leave it alone or set it to a nonzero value.
If you need to use the above trick and are working inside a C library function, you have to save the original errno value before storing a zero in it, and then put that value back if no error has happened.
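Here's a sketch of that clear-then-check dance driven from Python through ctypes, just to make the steps concrete (assumes a Unix-like libc; parse_long is a made-up helper, not a real API):

import ctypes, ctypes.util, errno, os

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.strtol.restype = ctypes.c_long
libc.strtol.argtypes = [ctypes.c_char_p, ctypes.POINTER(ctypes.c_char_p), ctypes.c_int]

def parse_long(text):
    ctypes.set_errno(0)                     # step 1: clear errno before the call
    end = ctypes.c_char_p()
    value = libc.strtol(text, ctypes.byref(end), 10)
    if ctypes.get_errno() == errno.ERANGE:  # step 2: disambiguate LONG_MIN/LONG_MAX
        raise OverflowError(os.strerror(errno.ERANGE))
    if end.value == text:                   # endptr unchanged: no conversion happened
        raise ValueError("no digits found")
    return value

print(parse_long(b"42"))   # 42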
This is the kind of crap ignorant C developers do and then pretend is the language's fault.
You can jump, you can return a reference to a complex type, a tuple, whatever.
This is just wrong. C doesn't feature 'error handling' as a dedicated form of branching the way many higher level languages do and you're in no way required to use return codes as an error signal. This is a case of bad API design and is entirely python's fault.
>>> class X:
... def __hash__(self): return -1
...
>>> hash(X())
-2
So, it ends up being a matter of optimizing for the common case, but still being reasonably performant for the worst case.
It just doesn't seem to have been identified as a real issue, probably because the most commonly hashed external data is character strings. (For those, there are countermeasures like a properly scrambled hashing function, modified by a seed that can be randomized.)
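You can watch that countermeasure in action: str hashes are randomized per interpreter process (see PEP 456), while int hashes are not. A quick sketch:

import subprocess, sys

cmd = [sys.executable, "-c", "print(hash('spam'), hash(12345))"]
print(subprocess.run(cmd, capture_output=True, text=True).stdout.strip())
print(subprocess.run(cmd, capture_output=True, text=True).stdout.strip())
# The str hash (first number) almost certainly differs between the two child runs,
# unless PYTHONHASHSEED is pinned in the environment; the int hash stays 12345.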
>>> import sys
>>> {i * sys.hash_info.modulus for i in range(5000)}    # ~0.40s
>>> {i * sys.hash_info.modulus for i in range(50000)}   # ~29.34s
>>> {i for i in range(50000)}                           # ~0.06s
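For context, every key in those first two comprehensions lands on the same hash value, which is what turns the set build quadratic:

>>> {hash(i * sys.hash_info.modulus) for i in range(5)}
{0}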
What else is unmitigated shit in Python? :)