CPython Data Types

Python's Data Types
CSCI 1470

by K. Yue

1. Python Data Type

In Python, data have types that determine:
1. What are their legitimate values?
2. What functions can be applied on the values?
3. How are the values stored?

Example: for

i = 32

Legitimate value: any integer that can be held by available memory.
Functions: e.g., + , -, max, ...

Functions that cannot be used: e.g., len(i), i[2], i.upper(), ...

Storage: the value is stored as a binary number, together with other information, such as sign, storage length, etc.

How is the data type int stored?

Java:

It is always stored at a fixed length of 4 Bytes.
The range of legitimate int values: [-2,147,483,648 (which is -2^31), 2,147,483,647 (which is 2^31 - 1)].
An int literal outside this range is a compile-time error, and arithmetic that goes beyond the range overflows silently and wraps around (e.g., i = 2147483647 * 2; stores -2).
It is fast and efficient.

Python:

It has a flexible length and can store a very large int value.
For example, sys.getsizeof(x) returns the storage size of x in Bytes. In Python 3.13, we have this example:

>>> import sys
>>> sys.getsizeof(1)
28
>>> sys.getsizeof(8783783748374838473847398478348937483748324932487394)
48

It is not as fast but very flexible and easy to use.

Python is dynamically typed.
- The data type is determined dynamically at runtime.
- No type declaration is needed. Data types are inferred.
- The type of a variable can be changed during runtime. To be more exact, a variable can be assigned to a different object with a different type.

Example:
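
A minimal sketch of dynamic typing. The variable names are chosen only for illustration. The same name x is bound first to an int object and then to a str object, and type() reports the type of whichever object x currently refers to.

```python
# The names x, first_type and second_type are hypothetical, for illustration only.
x = 32                  # x refers to an int object; no declaration is needed
first_type = type(x)
print(first_type)       # <class 'int'>

x = "thirty-two"        # x now refers to a different object, of type str
second_type = type(x)
print(second_type)      # <class 'str'>
```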

In contrast, Java is statically typed. For an int i:
1. It must be declared before using. E.g., int i;
2. The variable i cannot be assigned to a value of another data type.
3. The variable i will remain as a 4 Byte int during its lifetime.

1.1 Important Python Data Types

The following diagram shows the most common data types in Python:

Python has many built-in types/classes as shown in the table below. We will cover the ones emphasized in red italic font. You do not need to know all of them now. We will cover them in detail later.

Some major built-in types/classes in Python:

Numeric Types:
int: For integer numbers.

float: For floating-point numbers.

complex: For complex numbers.
Sequence Types:
list: Mutable ordered sequences of items.

tuple: Immutable ordered sequences of items.

str: Immutable sequences of Unicode characters (strings).

bytes: Immutable sequences of bytes.

bytearray: Mutable sequences of bytes.

range: Immutable sequences of integers, commonly used for iteration.
Mapping Type:
dict: Collections of key-value pairs (insertion order is preserved since Python 3.7).
Set Types:
set: Mutable unordered collections of unique items.

frozenset: Immutable unordered collections of unique items.
Other Essential Built-in Types/Classes:
bool: Represents Boolean values (True and False), a subclass of int.

object: The base class for all classes in Python; every class implicitly inherits from object.

type: Represents the type of an object; classes themselves are instances of type.

NoneType: Represents the None object, indicating the absence of a value.

In this class, we use the terms type and class interchangeably. There are actually very subtle differences between them.

For comparison, Java:

has primitive data types (store values only): including various number data types and char.
has non-primitive data types (store references to storage): including array, String, class, etc.
does not have built-in language-level types such as dictionary, set, tuple, list and range; comparable collections (e.g., HashMap, HashSet, ArrayList) come from its standard library.

1.2 Type Classifications

1.2.1 Primitive versus Composite Type

Primitive (atomic) data types represent a single atomic value.
- Examples: int, float, bool, NoneType.
Composite data types are collections of data items that can be accessed individually.
- Examples: str, list, tuple, dict, set.

Example:

# Primitive/atomic types
i = 1
pi = 3.14159265
t = True
s = None

# Composite types:
l = [1,2,3,4] # a list.
print(l[0])   # 1
m = (4,5,6,7) # a tuple
print(m[0])   # 4

1.2.2 Mutable vs Immutable Types (advanced topic)

Mutable and immutable types each have their merits. Performance is a key consideration. We will not go into details in this class.
Mutable data types are objects whose values can be modified in place after creation.
Operations on mutable objects can change their internal state without creating a new object in memory.
Common built-in mutable types to be covered in this class:
1. list
2. set
3. dict
Immutable data types are objects whose values cannot be changed after creation.
Operations that appear to modify an immutable object create a new object instead.
Common built-in immutable types covered in this class:
1. int
2. float
3. str
4. tuple
5. bool
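
A minimal sketch of the difference, using id() to check whether an operation produced a new object. All variable names are assumptions made for illustration.

```python
# Mutable: a list is changed in place, so its identity stays the same.
nums = [1, 2, 3]
id_before = id(nums)
nums.append(4)
same_object = id(nums) == id_before
print(f"list is the same object after append: {same_object}")

# Immutable: str methods return a new object and leave the original untouched.
name = "abc"
upper = name.upper()
unchanged = name == "abc"
print(f"name: {name}; upper: {upper}")

# Trying to modify a str in place raises TypeError.
try:
    name[0] = "X"
    assignment_failed = False
except TypeError:
    assignment_failed = True
print(f"in-place assignment to a str failed: {assignment_failed}")
```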

2. Internal Representations In Bits and Bytes

Data are stored as bits and bytes in computer systems.

Example:

A byte may contain 00101010.

Since long strings of bits (just 0 and 1) are hard to read, one hexadecimal digit is used to represent each group of 4 bits.
Hexadecimal numbers use a base of 16:
- 0-9: 0-9
- A (or a): 10
- B (or b): 11
- C (or c): 12
- D (or d): 13
- E (or e): 14
- F (or f): 15
In contrast, decimal numbers use a base of 10.

Example:

Binary: 101010 -> Hexadecimal: 2A
Binary: 1110111101 -> Hexadecimal: 3BD

Conversion of binary value to hexadecimal value:

Pad the binary value with leading 0s to make groups of 4 bits. E.g., 1110111101 -> 001110111101.
Convert each group from left to right to hexadecimal. E.g.,
1. 0011 -> 3
2. 1011 -> B
3. 1101 -> D
Concatenate the results from left to right. E.g., 3, B, D -> 3BD
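
A sketch of this pad-and-group conversion, joining the hexadecimal digits in the same left-to-right order as the groups. This is not the course's bin_to_hex.py; the function name bin_to_hex_sketch is an assumption made for illustration.

```python
def bin_to_hex_sketch(bits: str) -> str:
    """Convert a string of 0s and 1s to an uppercase hexadecimal string."""
    # Round the length up to a multiple of 4 and pad with leading zeros.
    padded_len = -(-len(bits) // 4) * 4
    padded = bits.zfill(padded_len)

    hex_digits = "0123456789ABCDEF"
    result = ""
    # Walk the groups of 4 bits from left to right.
    for start in range(0, len(padded), 4):
        group = padded[start:start + 4]
        result += hex_digits[int(group, 2)]
    return result

print(bin_to_hex_sketch("1110111101"))  # 3BD
print(bin_to_hex_sketch("101010"))      # 2A
```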

bin_to_hex.py: converts an input binary value to a hexadecimal value. Please download and play with it. You do not need to fully understand the program at this time.

Conversion of hexadecimal value to binary value:

Convert each hexadecimal digit to a 4-bit binary value.
Concatenate the results from left to right.
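
A sketch of the reverse direction, where each hexadecimal digit becomes one 4-bit group. This is not the course's hex_to_bin.py; the function name hex_to_bin_sketch is an assumption made for illustration.

```python
def hex_to_bin_sketch(hex_str: str) -> str:
    """Convert a hexadecimal string to a binary string, 4 bits per digit."""
    # int(ch, 16) accepts both upper- and lower-case hexadecimal digits.
    groups = [format(int(ch, 16), "04b") for ch in hex_str]
    return "".join(groups)

print(hex_to_bin_sketch("3BD"))  # 001110111101
print(hex_to_bin_sketch("2a"))   # 00101010
```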

hex_to_bin.py: converts an input hexadecimal value to a binary value. Please download and play with it. You do not need to fully understand the program at this time.

Conversion of decimal values to binary and hexadecimal values

You do not need to fully understand the following algorithm.

Algorithm Convert(i, b)
     Convert the decimal non-negative integer i to an integer string of the base b (b=2: binary, b=16: hexadecimal)
Output: the integer string in base b.
Steps:
[1] If i == 0 return '0'
[2] curr_num <- i
[3] result <- ''
[4] while (curr_num > 0)
   [4.1] remainder <- curr_num % b
   [4.2] digit <- the char symbol representing remainder in the base b.
   [4.3] result <- digit + result
   [4.4] curr_num <- curr_num // b
[5] return result
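
Conversion by repeated division can be sketched in Python as follows. This is not the course's dec_to_bin_hex.py; the names convert and DIGITS are assumptions made for illustration. Each new digit is placed in front of the result because the remainders come out from the least significant digit to the most significant one.

```python
DIGITS = "0123456789ABCDEF"  # symbols for bases up to 16

def convert(i: int, b: int) -> str:
    """Return the non-negative integer i as a string in base b (2 <= b <= 16)."""
    if i < 0 or not 2 <= b <= 16:
        raise ValueError("i must be non-negative and b must be between 2 and 16")
    if i == 0:
        return "0"
    result = ""
    curr_num = i
    while curr_num > 0:
        remainder = curr_num % b
        result = DIGITS[remainder] + result  # prepend the new digit
        curr_num //= b                       # divide by the base, not by the digit
    return result

print(convert(42, 2))    # 101010
print(convert(957, 16))  # 3BD
```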

dec_to_bin_hex.py: converts an input decimal value to a binary string and a hexadecimal string. Please download and play with it. You do not need to fully understand the program at this time.

Example:

Three common data types:

integer: an integer
float: a number in which the decimal point can float around. E.g., 234929.703125, 2.34929703125, 2349297031.25.
string: a sequence of characters.

Example:

The Byte 00101010:

Interpreted as an integer: 42
Interpreted as a floating point number (after padding with leading 0's to make 4 bytes): 5.885453550164232 * 10^-44.
Interpreted as a string: '*'
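
The three interpretations can be reproduced with the standard library. This is a sketch, not the course's encoding_3.py, and the variable names are assumptions. The float reading assumes a big-endian 4-byte IEEE 754 single-precision value.

```python
import struct

raw = bytes([0b00101010])             # the single byte 00101010

as_int = int.from_bytes(raw, "big")   # read the byte as an unsigned integer
as_str = raw.decode("utf-8")          # read the byte as a UTF-8 character

# Pad with leading zero bytes to 4 bytes, then read as a single-precision float.
padded = raw.rjust(4, b"\x00")
as_float = struct.unpack(">f", padded)[0]

print(f"integer: {as_int}")
print(f"float: {as_float}")
print(f"string: {as_str!r}")
```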

encoding_3.py: input a hexadecimal number and show how it may be interpreted as integer, float and UTF-8 strings. Please download and play with it. You do not need to fully understand the program at this time.

3. Common types

3.1 int

Some int operators:
1. Arithmetic
2. Assignment
3. Comparison
4. Bitwise operations

int_1.py:

#   int types
#   int arithmetic operators.
a = 20
b = 3
print(f"a: {a}")
print(f"b: {b}")
print(f"a+b: {a+b}")
print(f"a-b: {a-b}")
print(f"a*b: {a*b}")
print(f"a/b: {a/b}")
print(f"a//b: {a//b}")
print(f"a%b: {a%b}")
print(f"a**b: {a**b}")

c = a/b
print(f"type(a): {type(a)}")
print(f"type(b): {type(b)}")
print(f"type(c): {type(c)}")

#   int assignments.
x = 4
print(f"x: {x}")
x += 2
print(f"after x+= 2; x: {x}")
x -= 3
print(f"after x-= 3; x: {x}")
x *= 6
print(f"after x*= 6; x: {x}")

x %= 4
print(f"after x%=4; x: {x}")

#   Comparison operators
p = 5
q = 7
print(f"p > q: {p > q}")
print(f"p >= q: {p >= q}")
print(f"p == q: {p == q}")
print(f"p < q: {p < q}")
print(f"p <= q: {p <= q}")

#   bitwise operation.
s = 5 # binary: 0101
t = 3 # binary: 0011
print(f"s: {s}; binary: {s:08b}")
print(f"t: {t}; binary: {t:08b}")
print(f"s & t: {s & t}; binary: {s&t:08b}") # bitwise and
print(f"s | t: {s | t}; binary: {s|t:08b}") # bitwise or
print(f"s ^ t: {s ^ t}; binary: {s^t:08b}") # bitwise exclusive or
print(f"s<<1: {s<<1}; binary: {s<<1:08b}") # shift left 1 bit.

3.2 float

Floating point numbers have a decimal point that floats. E.g. 3.13, 313.0, 31.300013, 3.1e-08 (i.e. 3.1 * 10 ** -8), etc.
Floats are generally implemented using the C double type: 8 Bytes.
This conforms to the IEEE 754 double-precision binary floating-point standard (binary64).
Floating-point numbers may have precision problems and rounding errors.

Example: In IDLE:

>>> 1/3
0.3333333333333333
>>> 0.1 + 0.57
0.6699999999999999
>>> 0.1 + 0.2
0.30000000000000004

If exact decimal (including fixed-point) arithmetic is needed, use the Python module 'decimal'.
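
A small sketch contrasting float with decimal.Decimal for the sum 0.1 + 0.2. The variable names are assumptions. The Decimal values are built from strings, so they hold exactly 0.1 and 0.2 rather than the nearest binary floats.

```python
from decimal import Decimal

float_sum = 0.1 + 0.2
decimal_sum = Decimal("0.1") + Decimal("0.2")

print(f"float_sum == 0.3: {float_sum == 0.3}")                        # False
print(f"decimal_sum == Decimal('0.3'): {decimal_sum == Decimal('0.3')}")  # True
print(f"decimal_sum: {decimal_sum}")                                  # 0.3
```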

Some float operators (similar to int):

Arithmetic
Assignment
comparison

There is no bitwise operator for float.

3.3 str

A string literal may use ' or ".
Triple quoted strings support multi-line string literals.
\ is the escape character in string literals.
f strings are string literals with an 'f' before the opening quote. They support the special syntax {expression_to_be_evaluated}: the expression is evaluated and its value is inserted at that location.
The string data type will be covered in more detail later.
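
A short sketch of the four points above. All variable names are assumptions made for illustration.

```python
# Single and double quotes produce the same kind of string.
single = 'hello'
double = "hello"
same = single == double

# A triple quoted string may span several lines.
multi = """line one
line two"""
line_count = len(multi.split("\n"))

# \t is an escape sequence for a single tab character.
escaped = "tab\there"

# An f string evaluates the expressions inside {}.
name = "world"
greeting = f"hello, {name}! 1 + 2 = {1 + 2}"

print(same)        # True
print(line_count)  # 2
print(greeting)    # hello, world! 1 + 2 = 3
```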

3.4 list

A Python list is a built-in data structure used to store an ordered collection (or container) of items (or elements).
Important characteristics:
1. Ordered: [1,2] is not the same as [2,1]. It is known as a sequence data type.
2. Index based: an individual element can be accessed by an integer starting with 0.
3. Can contain duplicate elements: e.g., [1,2,2,3,2,2,3].
4. Elements may have different types: e.g., [1, 'hello', 1.89e40, [3+10, 'world']]
5. Mutable: a list object can be updated.
  - More specifically, a list is an object whose state or value can be changed after its creation, without creating a new object in memory.
Lists will be covered in more detail later.

Example:

list_1.py: download and try it out. Use the techniques in the program to tinker and explore lists.

# some basic list examples.
l = [10,20,30]
m = [30,20,10]

print(f"l: {l}")
print(f"m: {m}")
print(f"m[0]: {m[0]}")
print(f"m[1]: {m[1]}")
print(f"m[2]: {m[2]}")
print(f"l==m: {l==m}")
print(f"l==[10,20,30]: {l==[10,20,30]}")

# no need to know for the moment: lexicographical comparison on lists.
print(f"l>m: {l>m}")
print(f"m>l: {m>l}")

# append: add to the end.
print("""l.append(20)
l.append(30)
l.append(20)""")
l.append(20)
l.append(30)
l.append(20)
print(f"l: {l}")
print(f"len(l): {len(l)}")
print(f"l.count(20): {l.count(20)}")
l.reverse() # reverses the list in place and returns None.
print(f"after l.reverse(); l: {l}")

print(f"l+m:{l+m}")

p = [1, 'hello', 1.89e40, [3+10, 'world']]

print(f"p: {p}")
print(f"p[0]: {p[0]}")
print(f"p[1]: {p[1]}")
print(f"p[2]: {p[2]}")
print(f"p[3]: {p[3]}")
print(f"p[3][0]: {p[3][0]}")
print(f"p[3][1]: {p[3][1]}")

q = [10,20,30]
print("lists are mutable")
print(f"q: {q}")
print(f"id(q): {id(q)}")
print("q.pop()")
q.pop()
print(f"q: {q}")
print(f"id(q): {id(q)}")
print("ints are immutable: assigning a new value rebinds i to a different object")
i = 1
print(f"i: {i}")
print(f"id(i): {id(i)}")
i = 20
print(f"i: {i}")
print(f"id(i): {id(i)}")

3.5 tuple

Tuples are like lists: ordered, indexed, heterogeneous (elements can be of different types).
Tuples use () instead of [].
However, tuples are immutable.
Tuples are used for:
1. A sequence that should not be changed. Advantages: integrity and performance.
2. A tuple can be used as a key in a dictionary.
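
A minimal sketch of the second use. The dictionary labels and its contents are hypothetical. A tuple of ints works as a key, while a list in the same position raises TypeError because lists are mutable and therefore unhashable.

```python
# Hypothetical data: (x, y) coordinates mapped to labels.
labels = {(0, 0): "origin", (1, 2): "point A"}
print(labels[(1, 2)])  # point A

# A list cannot be used as a key.
try:
    bad = {[0, 0]: "origin"}
    list_key_allowed = True
except TypeError:
    list_key_allowed = False
print(f"list allowed as key: {list_key_allowed}")  # False
```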

Example:

tuple_1.py: download and try it out. Use the techniques in the program to tinker and explore tuples.

# some basic tuple examples.
l = (10,20,30)
m = (30,20,10)

print(f"l: {l}")
print(f"m: {m}")
print(f"m[0]: {m[0]}")
print(f"m[1]: {m[1]}")
print(f"m[2]: {m[2]}")
print(f"l==m: {l==m}")
print(f"l==[10,20,30]: {l==[10,20,30]}")

# no need to know for the moment: lexicographical comparison on tuples.
print(f"l>m: {l>m}")
print(f"m>l: {m>l}")

""" Methods that change lists do not exist for tuples. E.g., each of the
following calls would raise AttributeError:
l.append(20)
l.append(30)
l.append(20)
"""

print(f"l: {l}")
print(f"len(l): {len(l)}")
print(f"l.count(20): {l.count(20)}")

print(f"l+m:{l+m}")

p = (1, 'hello', 1.89e40, (3+10, 'world'))

print(f"p: {p}")
print(f"p[0]: {p[0]}")
print(f"p[1]: {p[1]}")
print(f"p[2]: {p[2]}")
print(f"p[3]: {p[3]}")
print(f"p[3][0]: {p[3][0]}")
print(f"p[3][1]: {p[3][1]}")

q = (10,20,30)
print("tuples are immutable")
print(f"q: {q}")
print(f"id(q): {id(q)}")

print(
"""ql = list(q)
ql.reverse()
ql.append(90)
q = tuple(ql)""")
print("If you really need to change a tuple, convert it to a list.")
ql = list(q)
ql.reverse()
ql.append(90)
q = tuple(ql)
print(f"q: {q}")
print(f"id(q): {id(q)}")

3.6 bool

The boolean type of Python expresses truth values.
It has only two literals: True and False.
It is used in conditional and iteration statements when conditions are needed.
bool is a subtype of int.
Logical operators: and, or, not.
The logical operators and and or use short-circuit evaluations. If the result is known after the first argument, the evaluation of the second argument is skipped.
In "cond_1 and cond_2", cond_1 can serve as a guard for cond_2. If cond_1 is false, cond_2 will not be evaluated.

Example:

bool_1.py: download and try it out. Use the techniques in the program to tinker and explore the bool type.

#   some basic bool examples.

x = True
y = False

print(f"x: {x}")
print(f"y: {y}")

print(f"not(x): {not(x)}")
print(f"x and y: {x and y}")
print(f"x and not y: {x and not y}")
print(f"x or y: {x or y}")

print("Short circuit evaluations")
li = [10,20]
print(f"li[1]: {li[1]}")
if li[1]:
    print(f"li[1]: true")
else:
    print(f"li[1]: false")

k = 4
print(f"k: {k}")

try:
    if li[k]:
        print(f"li[k]: true")
except IndexError:
    print("Index out of range error for li[k]")

try:
    if x or li[k]:
        print(f"x or li[k]: true")
    else:
        print(f"x or li[k]: false")
except IndexError:
    print("Index out of range error for 'x or li[k]'")

try:
    if x and li[k]:
        print(f"x and li[k]: true")
    else:
        print(f"x and li[k]: false")
except IndexError:
    print("Index out of range error for 'x and li[k]'")

try:
    if y and li[k]:
        print(f"y and li[k]: true")
    else:
        print(f"y and li[k]: false")
except IndexError:
    print("Index out of range error for 'y and li[k]'")

try:
    if y or li[k]:
        print(f"y or li[k]: true")
    else:
        print(f"y or li[k]: false")
except IndexError:
    print("Index out of range error for 'y or li[k]'")

3.7 NoneType and None

None is a special constant representing the absence of a value or a null value in Python.
None is of the type NoneType.
NoneType has only one value/object: None.
None behaves like False in a Boolean context. If a bool value is needed, None is interpreted as False.
There are good use cases of None which will not be covered now.

Advanced topic (not in examination):

None or Null are handled differently in different languages:

In Java, null is a special literal representing a missing value of a reference to objects or reference types. It is not available for primitive types, such as int and float. For example, if i is an integer, i = null is not allowed.

In SQL, null is a value/marker that can be used in all data types. Thus, if x is null, you can compare it to an integer, such as 'x < 3', which will return null.

In Python, None is an object of the type NoneType. Thus, if x is None, a comparison such as 'x < 3' raises a TypeError exception.
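
A minimal sketch of the Python behavior described above. The variable names are assumptions made for illustration.

```python
x = None

type_name = type(x).__name__   # 'NoneType'
is_none = x is None            # the usual way to test for None
truth_value = bool(x)          # None is interpreted as False

# Ordering comparisons between None and an int are not defined.
try:
    x < 3
    comparison_error = False
except TypeError:
    comparison_error = True

print(f"type: {type_name}; is None: {is_none}; bool: {truth_value}")
print(f"'x < 3' raised TypeError: {comparison_error}")
```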

3.8 dict

A Python dictionary (dict) is a mutable collection of key-value pairs. Since Python 3.7, a dict preserves the insertion order of its keys.
An English dictionary maps a word (key) to its meaning (value).
Dictionaries and maps have the same meaning in many languages.
Hash tables (or hashes) are a common technique for implementing dictionaries.
Dictionaries are examples of abstract data types (ADT).
An ADT is a conceptual model for a data structure that defines its behavior and operations, independent of any implementation details.
Dictionary syntax uses {}: {key1: value1, key2: value2, ...}
The key of a dict must be hashable, which in practice means an immutable object such as an int, a str, or a tuple of immutable items.

Example:

dict_1.py: download and try it out. Use the techniques in the program to tinker and explore dict.

#   some basic dict examples.

d1 = {}
print(f"d1: {d1}")
print(f"id(d1): {id(d1)}")

print(
"""d1['a'] = 100
d1[2] = 'hello, world'""")

d1['a'] = 100
d1[2] = 'hello, world'
print(f"d1: {d1}")
print("d1['a'] = [10,20,30]")
d1['a'] = [10,20,30]
print(f"d1: {d1}")
print(f"d1['a']: {d1['a']}")

print(
"""d1['a'] = 50
d1[1.33] = 90
d1['hello'] = 3.1415""")

d1['a'] = 50
d1[1.33] = 90
d1['hello'] = 3.1415
print(f"d1: {d1}")

#   KeyError: key does not exist in the dict.
try:
    print(f"d1['world']: {d1['world']}")
except KeyError:
    print("key error for d1['world']")

#   Use the get() to avoid raising KeyError if the key does not exist.
print(f"d1.get('a'): {d1.get('a')}")
print(f"d1.get('world'): {d1.get('world')}")

#   key exists?
print(f"'hello' in d1: {'hello' in d1}")
print(f"'world' in d1: {'world' in d1}")

print(f"id(d1): {id(d1)}")

d2 = {'a': 100, 'b': 'hello', 'c': 1.25, 'd': 40}
print(f"d2: {d2}")

# setdefault: set a key to a value only if it does not already exist.
print(f"d2.get('e'): {d2.get('e')}")
print("d2.setdefault('e', 'orange juice')")
d2.setdefault('e', 'orange juice')
print(f"d2.get('e'): {d2.get('e')}")

print(f"d2.get('a'): {d2.get('a')}")
print("d2.setdefault('a', 'apple juice')")
d2.setdefault('a', 'apple juice')
print(f"d2.get('a'): {d2.get('a')}")

#   remove a key-value pair.
print(f"d2: {d2}")
print("del d2['c']")
del d2['c']
print(f"d2: {d2}")