Advanced Python Tutorial

To begin with, Python material today generally only needs to cover Python 3. Python 2 has well-known major problems and is no longer maintained.

Official Python-related websites

  • Python official website: https://www.python.org/
  • Anaconda: https://www.anaconda.com/ , https://anaconda.org/
    • Spyder: https://docs.spyder-ide.org/current/index.html
    • JupyterLab: https://jupyterlab.readthedocs.io/en/stable/
    • Tsinghua University open source software mirror: https://mirrors.tuna.tsinghua.edu.cn/help/anaconda/
  • Jupyter nbviewer: https://nbviewer.org/
  • PyCharm: https://www.jetbrains.com/zh-cn/pycharm/

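Websites for learning Python basics

Books on basic and advanced Python

  • 《Python编程:从入门到实践》
    • By an American author. This book works very well both as an introduction to the Python language and as an intermediate-level follow-up.
    • In my opinion it is much easier to read than 《Python 基础教程》 by a Norwegian author, which most people used to recommend (it was the 2nd edition at the time).
  • 《量化投资以 Python 为工具》
    • Do not be misled by the title: the book covers the most basic Python and applied finance, and is well suited to junior students who are new to Python data analysis.
  • 《Python 金融风险管理 FRM》
    • 基础篇 (fundamentals volume): I have not read it, so no comment.
    • ※ 实战篇 (practice volume): recommended. It is a good book for learning the basics of financial derivatives and comes with complete Python code, so it suits students and also works as a reference book.
  • 《机器学习及 Python 应用》
    • The author is the well-known Chen Qiang (陈强), who wrote 《高级计量经济学及Stata应用》.
    • However, it seems clear that this book was written by his students, and it does not go especially deep.
  • To be continued...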

Python web scraping: requests, Selenium + PhantomJS

requests

requests Example 1: Scraping images from Konachan with Python

  • See: https://accelerator086.github.io/2020/03/28/Python-Konachan/

requests Example 2: Scraping images from csgowallpaper with Python

  • See: https://github.com/Accelerator086/python-scrapy-csgowallpaper

Data Science: Numpy, Pandas, Scipy, Sympy

Numpy

  • numpy is mainly used to work with matrices and list-like data; using numpy fluently requires being fluent with Python's built-in data structures.
  • Official Website: https://numpy.org/
  • Tutorials:
    • Runoob: https://www.runoob.com/numpy/numpy-tutorial.html

Numpy-financial

  • https://numpy.org/numpy-financial/latest/
  • The financial functions in NumPy are deprecated and eventually will be removed from NumPy; see NEP-32 for more information. This package is the replacement for the deprecated NumPy financial functions.
  • The importable name of the package is numpy_financial. The recommended alias is npf.
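
As a rough illustration of the kind of financial function that moved out of NumPy, the sketch below computes a net present value with plain NumPy. The helper name npv_sketch and the cash flows are invented for this example, and it assumes the first cash flow occurs at time 0; check the numpy_financial documentation before relying on the same convention there.

```python
import numpy as np

def npv_sketch(rate, cashflows):
    """Net present value, treating cashflows[0] as occurring at time 0 (hypothetical helper)."""
    cashflows = np.asarray(cashflows, dtype=float)
    periods = np.arange(cashflows.size)          # 0, 1, 2, ...
    # Discount each cash flow back to time 0 and add them up
    return float(np.sum(cashflows / (1.0 + rate) ** periods))

# Illustrative cash flows: pay 100 today, receive 60 in each of the next two periods
flows = [-100.0, 60.0, 60.0]
print(npv_sketch(0.05, flows))
```

With a discount rate of zero the result is simply the sum of the cash flows, which is a quick sanity check for any implementation.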

Numpy Example 1

#coding:utf-8
# 1
a=[1,2]
print (a)

# 2
import numpy as np
a=[0,1,2,3]
print (np.max(a),np.mean(a))

np_ar=np.array([[1,2,3],[4,5,6]])
print (type(np_ar))
print (np_ar)
print (np_ar[:,:2])
np_ar[:,:2]=0
print (np_ar)
print (np_ar[:,:2].shape)

a=np.array(['1','2','3'])
print (a)
a_=a.astype(float)
print (a_)

x=np.array([[0,1,2,3,4,5],\
[10,11,12,13,14,15],\
[20,21,22,23,24,25],\
[30,31,32,33,34,35],\
[40,41,42,43,44,45],\
[50,51,52,53,54,55]])
print (x)
print (x[0,3:5])
print (x[4:,4:])
print (x[:,2])
print (x[2::2,::2])
print (x[(0,1,2,3,4),(1,2,3,4,5)])
mask=np.array([1,0,1,0,0,1],dtype=bool)
print (x[mask,2])
print (np_ar>=2)
print (np_ar[np_ar>=2])
print (x.transpose())
print (x.reshape((2,18)))
print (np.fliplr(x))
print (np.flipud(x))

a=np.array([1,2,3,4,5])
print (np.log(a))
a=np.array([1,3,4,5,8])
print (np.diff(a))

# 3
import fractions
x=fractions.Fraction(1,2)
print (x)

a=np.array([[2,-2,-4],[-1,3,4],[1,-2,-3]])
b=np.array([0,0,0])
U,s,V=np.linalg.svd(a)
c=np.compress(s<1e-10,V,axis=0)
print (np.compress(s<1e-10,V,axis=0))

# 4
from numpy import random
#random.seed(675)
# Without a fixed seed, the values differ on every run
print (random.rand(2,3))
print (random.randint(0,10))

print (random.binomial(n=5,p=0.5,size=5))
print (random.uniform(-1,1,5))
print (random.normal(size=(2,3)))
print (random.normal(loc=0,scale=5,size=(2,3)))

# 5
import numpy as np
a=np.array([[0,1,2,3,4,5],[10,11,12,13,14,15],[20,21,22,23,24,25],[30,31,32,33,34,35],[40,41,42,43,44,45],[50,51,52,53,54,55]])
print(a)
b=a[0,3:5]
print (b)
c=a[2::2,::2]
print (c)
d=a[(0,1,2,3,4),(1,2,3,4,5)]
print (d)
e=a[3:,[0,2,5]]
print (e)
mask=np.array([1,0,1,0,0,1],dtype=bool)
# Boolean mask: picks the entries where the mask is True
f=a[mask,2]
print (f)
print (np.linalg.eig(a))

a=np.arange(15).reshape(3,5)
print(a)

b=np.array([[1,2,3,4,5],[6,7,8,9,10]])
print(b.shape)

random.seed(666)
a=np.floor(10*np.random.random((2,2)))
b=np.floor(10*np.random.random((2,2)))
#print(a)
c=np.vstack((a,b))
d=np.hstack((a,b))
print(c,'\n',d)
e=np.concatenate((a,b),axis=0)
f=np.concatenate((a,b),axis=1)
print(e,'\n',f)

# 6
import numpy as np
a=np.arange(15).reshape(3,5)
print(a)
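
The commented-out random.seed(675) line in the example above controls reproducibility. The same idea can be sketched with NumPy's Generator interface, np.random.default_rng. The seed value is borrowed from that commented-out line and the variable names are illustrative.

```python
import numpy as np

# Two generators built from the same seed produce the same stream
rng_a = np.random.default_rng(675)
rng_b = np.random.default_rng(675)
draw_a = rng_a.normal(loc=0, scale=5, size=(2, 3))
draw_b = rng_b.normal(loc=0, scale=5, size=(2, 3))
print(np.array_equal(draw_a, draw_b))   # identical because the seeds match

# A generator created without a seed is initialised from OS entropy,
# so its output changes from run to run
rng_c = np.random.default_rng()
print(rng_c.integers(0, 10, size=5))    # integers in [0, 10)
```

Keeping the generator in a variable, instead of seeding the global state, makes it easier to see which part of a script depends on which random stream.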

Pandas

  • In my view, pandas is an advanced version of numpy; once you have mastered pandas, it can basically replace numpy
  • Official Website:https://pandas.pydata.org/

Scipy

  • Scipy is built on numpy and extends its functionality, so whenever Scipy can do the job, use Scipy
  • Main features of Scipy: linear algebra, numerical analysis, optimization, descriptive statistics, and more
  • Official Website: https://scipy.org/
  • Scipy Example 1: Numerical Methods in Finance
    • See https://github.com/Accelerator086/Numerical-Methods-in-Finance-Using-Scipy/

Sympy

  • SymPy is a Python library for symbolic computation.

Sympy Example 1

from sympy import *
x = symbols('x')
print(integrate(x*(tan(x)**2), x))

Visualization: matplotlib, seaborn

Matplotlib

  • Official Website:https://matplotlib.org/

Matplotlib Example 1

import numpy as np
from numpy import random
import matplotlib.pyplot as plt

random.seed(666)

data_10=random.binomial(n=10,p=0.5,size=10000)
data_5=random.binomial(n=5,p=0.5,size=10000)
data_1=random.binomial(n=1,p=0.5,size=10000)

data=[data_10,data_5,data_1]
n_list=[10,5,1]
color=['#5dbe80','#2d9ed8','#efab40']
fig,ax=plt.subplots()
for row,c,n in zip(data,color,n_list):
    ax.hist(row,color=c,label='$B($'+str(n)+',0.5'+'$)$')
ax.legend(loc='upper right',fontsize=10)
plt.axis('tight')
plt.grid(True)
plt.savefig('random binomial.svg',format='svg')
plt.show()
[Figure: random binomial]

To insert an SVG image in Markdown, see:

https://stackoverflow.com/questions/13808020/include-an-svg-hosted-on-github-in-markdown

seaborn

Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

  • Official Website:https://seaborn.pydata.org/

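Machine Learning: Sklearn, Statsmodels, Linearmodels

Linear regression is one of the areas where sklearn and Statsmodels overlap. Note that with Statsmodels you need to add a column of ones to X to fit a linear regression with an intercept; otherwise the fitted model has no intercept term. For prediction (especially with a training set / test set split), it is best to use sklearn.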
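
A minimal sketch of why the column of ones matters, using plain NumPy least squares instead of Statsmodels itself. The data and coefficients are invented for illustration; in Statsmodels the corresponding step is sm.add_constant.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
# Illustrative data-generating process with a true intercept of 3.0
y = 3.0 + x @ np.array([1.5, -2.0]) + 0.1 * rng.normal(size=200)

# Without a constant column the fit is forced through the origin
beta_no_const, *_ = np.linalg.lstsq(x, y, rcond=None)

# Prepending a column of ones lets the first coefficient act as the intercept
x_const = np.column_stack((np.ones(len(x)), x))
beta_const, *_ = np.linalg.lstsq(x_const, y, rcond=None)

print(beta_no_const)   # two slope estimates only
print(beta_const)      # intercept estimate followed by the two slopes
```

The second fit should recover an intercept close to 3.0, while the first has no coefficient that could absorb it.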

Sklearn

  • The sklearn package is mainly used for Machine Learning
  • Official User Guide:https://scikit-learn.org/stable/user_guide.html

Statsmodels

  • The Statsmodels package is mainly used for Time Series Analysis
  • Official User Guide:https://www.statsmodels.org/stable/user-guide.html

linearmodels

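  • The Python package linearmodels, released by Kevin Sheppard, can be used for financial econometrics, for example simple methods such as Fama-MacBeth and GMM
    • Of course, you can also try implementing these in Stata; see https://fintechprofessor.com/2017/12/10/fama-and-macbeth-1973-fastest-regression-in-stata/
    • Kevin Sheppard has also released many other very useful packages:
      • Python: linearmodels, arch
      • MATLAB: MFE Toolbox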
  • Official User Guide:https://bashtage.github.io/linearmodels/

Deep Learning: Keras, Tensorflow, Pytorch

Keras

Tensorflow

Pytorch

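  • The risk management department of one of the "三中一华" firms (a nickname for a group of leading Chinese securities firms; not the "华" one, because my school is blacklisted there) uses Pytorch for macro forecasting and early-warning work. [Deep Learning & Reinforcement Learning]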

Python and quantitative investing: akshare, QuantLib

A note here: I will not recommend Tushare.

akshare

  • Official User Guide:https://www.akshare.xyz/

QuantLib

Python Examples

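Exercise 1: Write a grouping function

Given an M*N array, compute the 20%, 40%, 60%, and 80% percentiles of each row.

For each row, compute the mean of the values in each group: 0-20%, 20%-40%, 40%-60%, 60%-80%, and 80%-100%.

Generate a 100*1000 array of random numbers, call the grouping function above, and print the results.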

np.random.seed(1234)

np.random.randn(100,1000)

import numpy as np
np.random.seed(1234)

# np.nanpercentile(data,20,axis=1)
def sort(data):
    # Row-wise 20/40/60/80 percentiles, one column per percentile
    list1=np.nanpercentile(data,20,axis=1)
    list2=np.nanpercentile(data,40,axis=1)
    list3=np.nanpercentile(data,60,axis=1)
    list4=np.nanpercentile(data,80,axis=1)
    percen=np.column_stack((list1,list2,list3,list4))

    # Mean of each row's values within the five percentile groups
    means=np.zeros((np.size(data,axis=0),5),dtype=float,order='c')
    for i in range(np.size(data,axis=0)):
        row=data[i]
        means[i,0]=np.mean(row[row<=list1[i]])
        means[i,1]=np.mean(row[(row<=list2[i]) & (row>=list1[i])])
        means[i,2]=np.mean(row[(row<=list3[i]) & (row>=list2[i])])
        means[i,3]=np.mean(row[(row<=list4[i]) & (row>=list3[i])])
        means[i,4]=np.mean(row[row>=list4[i]])

    return percen,means


datas=np.random.randn(100,1000)
percen,means=sort(datas)
print(percen)
print(means)

Or, more concise code:

import numpy as np
np.random.seed(1234)
datas=np.random.randn(100,1000)
per=np.zeros((100,4))
data_mean=np.zeros((100,5))
def group(data,per,data_mean):
    data.sort(axis=1)  # sorts each row in place, so the caller's array is modified
    for i in range(np.size(data,axis=0)):
        for j in range(np.size(per,axis=1)):
            per[i,j]=np.nanpercentile(data[i,:], 20*(j+1))
        for k in range(np.size(data_mean,axis=1)):
            data_mean[i,k]=data[i,int(((np.size(data,axis=1)/5)*k)):int(((np.size(data,axis=1)/5)*(k+1)))].mean()
    return per,data_mean

per,data_mean=group(datas,per,data_mean)
print(per)
print(data_mean)
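
The same grouping can also be written without explicit loops. This is only a sketch: the helper name group_vectorized is made up, it assumes the data contains no NaN values (so np.percentile is used instead of np.nanpercentile), and it assumes the number of columns is divisible by 5.

```python
import numpy as np

def group_vectorized(data):
    """Row-wise 20/40/60/80 percentiles and quintile means (hypothetical helper)."""
    n_rows, n_cols = data.shape
    if n_cols % 5 != 0:
        raise ValueError("this sketch assumes the column count is divisible by 5")
    # np.percentile with a list of q returns shape (4, n_rows); transpose to (n_rows, 4)
    per = np.percentile(data, [20, 40, 60, 80], axis=1).T
    # Sort a copy of each row, cut it into 5 equal blocks, and average each block
    blocks = np.sort(data, axis=1).reshape(n_rows, 5, n_cols // 5)
    return per, blocks.mean(axis=2)

np.random.seed(1234)
datas = np.random.randn(100, 1000)
per, data_mean = group_vectorized(datas)
print(per.shape, data_mean.shape)
```

Because np.sort returns a copy, the input array is left untouched, unlike the in-place data.sort(axis=1) call in the loop version.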

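Exercise 2: Write an OLS regression function (without using ready-made functions from packages)

Compute the OLS regression coefficients (written in matrix form), the t-values, and R2.

Generate a 100*1 array of random numbers as y and a 100*3 array of random numbers as x, then print the coefficients, t-values, and R2.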

np.random.seed(1234)

import numpy as np
np.random.seed(1234)
x=np.random.randn(100,3)
y=np.random.randn(100,1)

# Check: statsmodels
import statsmodels.api as sm
x1=sm.add_constant(x) # skip this step (and use x directly) for a regression without an intercept
model=sm.OLS(y,x1)
result=model.fit()
print(result.summary())

# Check: sklearn
from sklearn.linear_model import LinearRegression
linear_reg = LinearRegression().fit(x, y)
print(linear_reg.coef_)
print(linear_reg.intercept_)
print(linear_reg.score(x,y))

# OLS computed by hand with matrix algebra
def OLS_funcs(x,y):
    n=np.size(x,axis=0)
    # Prepend a column of ones so that the model has an intercept
    x_counting=np.concatenate((np.ones((n,1)),x),axis=1)
    k=np.size(x_counting,axis=1)
    xtx_inv=np.linalg.inv(x_counting.T @ x_counting)
    beta=xtx_inv @ x_counting.T @ y   # (X'X)^(-1) X'y
    resi=y-x_counting @ beta
    RSS=(resi.T @ resi).item()
    TSS=((y-y.mean()).T @ (y-y.mean())).item()
    R2=1-RSS/TSS
    # Standard errors use the residual variance RSS/(n-k); t-value = coefficient / standard error
    se=np.sqrt(RSS/(n-k)*np.diag(xtx_inv))
    t_beta=(beta[:,0]/se).tolist()
    return beta,R2,t_beta
beta,R2,t_beta=OLS_funcs(x, y)
print(beta,R2,t_beta)

Exercise 3: The three-door problem

Monty Hall problem: https://zh.wikipedia.org/wiki/%E8%92%99%E6%8F%90%E9%9C%8D%E7%88%BE%E5%95%8F%E9%A1%8C

from numpy import random
#random.seed(675)

n_tests=10000
winning_doors=random.randint(0,3,n_tests)
change_mind_wins=0
insist_wins=0

for winning_door in winning_doors:
    first_try=random.randint(0,3)
    remaining_choices=[i for i in range(3) if i!=first_try]
    wrong_choices=[i for i in range(3) if i!=winning_door]
    if first_try in wrong_choices:
        wrong_choices.remove(first_try)
    # The host opens one of the wrong doors that the player did not pick
    screened_out=random.choice(wrong_choices)
    remaining_choices.remove(screened_out)

    changed_mind_try=remaining_choices[0]
    change_mind_wins +=1 if changed_mind_try==winning_door else 0
    insist_wins +=1 if first_try==winning_door else 0
print ('You win {1} out of {0} tests if you changed your mind\n'
       'You win {2} out of {0} tests if you insist on the initial choice'.format(n_tests,change_mind_wins,insist_wins))
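
The simulation above can be shortened by noting that the host always removes a wrong door, so switching wins exactly when the first pick was wrong. The sketch below uses only the standard library; the helper name monty_hall_counts and the seed are illustrative.

```python
import random

def monty_hall_counts(n_tests, seed=None):
    """Return (switch_wins, stay_wins) over n_tests simulated games (hypothetical helper)."""
    rng = random.Random(seed)
    stay_wins = 0
    for _ in range(n_tests):
        winning_door = rng.randrange(3)
        first_try = rng.randrange(3)
        # Staying wins only when the first pick was already correct
        stay_wins += (first_try == winning_door)
    # The host always removes a wrong door, so switching wins in every other game
    switch_wins = n_tests - stay_wins
    return switch_wins, stay_wins

switch_wins, stay_wins = monty_hall_counts(10000, seed=675)
print(switch_wins / 10000, stay_wins / 10000)
```

The two printed frequencies should land near 2/3 and 1/3 respectively.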

Legacy Python code

1

Find the start and end positions of the longest run of 1s in a list

Example inputs include

test = [0 0 0 0 0 0 0 0 0 0 0 0 0]

test = [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]

test = [0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1]

test = [0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 1 1 1 1 1]

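#coding:utf-8

a=[0,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,0,1,1]

index_start=[]   # index of the first 1 in each run
index_end=[]     # index of the last 1 in each run
in_run=False

for i,v in enumerate(a):
    if v==1 and not in_run:
        in_run=True
        index_start.append(i)
    if v==0 and in_run:
        in_run=False
        index_end.append(i-1)
if in_run:
    # the list ends while still inside a run of 1s
    index_end.append(len(a)-1)

print (index_start,index_end)

# Report the longest run (the first one if several share the maximum length)
if index_start:
    lengths=[e-s+1 for s,e in zip(index_start,index_end)]
    k=lengths.index(max(lengths))
    print (index_start[k],index_end[k])
else:
    print ('no 1s in the list')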

2-1

class account:
    def __init__(self,name,balance):
        self.name=name
        self.balance=balance

    def deposit(self,amount):
        self.balance += amount

    def withdraw(self,amount):
        if amount>self.balance:
            print('Insufficient balance, transaction failed')
        else:
            self.balance -= amount

sam = account('Sam',1000)
sam.deposit(500)
sam.withdraw(1200)
print(sam.balance)

2-2

class account:
    def __init__(self,name,balance):
        self.name=name
        self.__balance=balance   # double underscore: name-mangled, not meant for outside access

    def deposit(self,amount):
        self.__balance += amount

    def withdraw(self,amount):
        if amount>self.__balance:
            print('Insufficient balance, transaction failed')
        else:
            self.__balance -= amount

    def get_balance(self):
        return(self.__balance)

    def set_balance(self,amount):
        self.__balance = amount

sam = account('Sam',1000)
sam.deposit(500)
sam.withdraw(1200)
print(sam.get_balance())
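
Version 2-2 hides the balance behind a double-underscore attribute with explicit getter and setter methods. One common alternative, sketched here under the assumption that a read-only balance is what is wanted, is a property. The class name Account and the choice to raise ValueError are illustrative and not part of the original code.

```python
class Account:
    """Bank account whose balance can be read but not assigned directly."""

    def __init__(self, name, balance):
        self.name = name
        self._balance = balance

    @property
    def balance(self):
        # Read-only view of the internal balance
        return self._balance

    def deposit(self, amount):
        self._balance += amount

    def withdraw(self, amount):
        if amount > self._balance:
            raise ValueError('insufficient balance')
        self._balance -= amount


sam = Account('Sam', 1000)
sam.deposit(500)
sam.withdraw(1200)
print(sam.balance)   # 300
```

Callers read sam.balance like a plain attribute, while an assignment such as sam.balance = 0 raises AttributeError because no setter is defined.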

3-1

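# 1
# The functions are named my_sum and add_one so they do not shadow the built-ins sum and range
def my_sum(arg1, arg2):
    # Return the sum of the two arguments
    total = arg1 + arg2
    print ("Inside the function : ", total)
    return total

def add_one(total):
    tell=total+1
    return tell

# Call my_sum
total = my_sum(10, 20)
print ("Outside the function : ", total)
tell=add_one(total)
print(tell)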

# 2
def describe_pet(pet_name,animal_type='dog'):
    print('\nI have a '+animal_type+'.')
    print('My '+animal_type+"'s name is "+ pet_name.title()+'.')
    return

describe_pet(pet_name='willie')
describe_pet('willie')
describe_pet(pet_name='harry',animal_type='hamster')
describe_pet('harry','hamster')

# 3
def build_person(first_name,last_name,age=''):
    first_name=first_name.title()
    last_name=last_name.title()
    person={'first':first_name, 'last':last_name}
    if age:
        person['age']=age
    return person

musician=build_person('jimi','hendrix',27)
print(musician)

def build_profile(first, last,**user_info):
    profile={}
    profile['first name']=first
    profile['last name']=last
    for key,value in user_info.items():
        profile[key]=value
    return profile

user_profile=build_profile('albert','einstein',location='princeton',field='physics')
print(user_profile)

# 4
import sys
for i in sys.path:
    print(i)
a='\n'
print(a)
for j in dir(sys):
    print(j)
#print(sys.getdefaultencoding())


# 5
import this

3-2

#coding:utf-8

def my_print(args):
    print (args)

# Tower of Hanoi: move n disks from peg a to peg c, using peg b as the spare
def move(n, a, b, c):
    if n==1:
        my_print ((a, '-->', c))
    else:
        move(n-1,a,c,b)
        move(1,a,b,c)
        move(n-1,b,a,c)

move (3, 'a', 'b', 'c')


i = 0
while i < 10:
    i += 1
    if i%2 > 0: # skip the output for odd numbers
        continue
    print (i) # prints the even numbers 2, 4, 6, 8, 10


import random
s = random.randint(1, 9) # secret integer between 1 and 9
#print(s)
m = int(input('Enter an integer: '))
while m != s:
    if m > s:
        print('Too high')
    else:
        print('Too low')
    m = int(input('Enter an integer: '))
print('OK')


while 1:
    s = random.randint(1, 3)
    if s == 1:
        ind = "rock"
    elif s == 2:
        ind = "scissors"
    elif s == 3:
        ind = "paper"
    m = input('Enter rock, scissors, or paper; enter "end" to quit: ')
    blist = ['rock', "scissors", "paper"]
    if (m not in blist) and (m != 'end'):
        print ("Invalid input, please try again!")
    elif (m not in blist) and (m == 'end'):
        print ("\nExiting the game...")
        break
    elif m == ind :
        print ("The computer chose: " + ind + ", it's a tie!")
    elif (m == 'rock' and ind =='scissors') or (m == 'scissors' and ind =='paper') or (m == 'paper' and ind =='rock'):
        print ("The computer chose: " + ind +", you win!")
    elif (m == 'rock' and ind =='paper') or (m == 'scissors' and ind =='rock') or (m == 'paper' and ind =='scissors'):
        print ("The computer chose: " + ind +", you lose!")


import sys
import time

result = []
while True:
    # Roll three dice, each between 1 and 6
    result.append(int(random.uniform(1,7)))
    result.append(int(random.uniform(1,7)))
    result.append(int(random.uniform(1,7)))
    print (result)
    count = 0
    index = 2
    pointStr = ""
    while index >= 0:
        currPoint = result[index]
        count += currPoint
        index -= 1
        pointStr += " "
        pointStr += str(currPoint)
    if count <= 11:
        sys.stdout.write(pointStr + " -> " + "small" + "\n")
        time.sleep( 1 ) # sleep for one second
    else:
        sys.stdout.write(pointStr + " -> " + "big" + "\n")
        time.sleep( 1 ) # sleep for one second
        break # stop after the first "big" roll
    result = []


i = 1
while i:
    j = 1
    while j:
        print (j, "*", i, " = ", i * j, ' ', end='') # stay on the same line within a row
        if i == j:
            break

        j += 1
        if j >= 10:
            break

    print ("\n")
    i += 1
    if i >= 10:
        break


sequence = [12, 34, 34, 23, 45, 76, 89]
for i, j in enumerate(sequence):
    print (i,j)


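prime = []
for num in range(2,100): # iterate over the numbers from 2 to 99
    for i in range(2,num): # try each candidate factor
        if num%i == 0: # found the first factor
            break # leave the inner loop
    else: # the loop's else branch runs only when no break occurred
        prime.append(num)
print (prime)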

for i in range(1,11):
    for k in range(1,i):
        print (k)
    print ("\n")

i=1
while i:
    j=1
    while j:
        print (j)
        if i == j:
            break
        j += 1
        if j>=10:
            break
    print ('\n')
    i+=1
    if i>=10:
        break


num=[]
for i in range(2,100):
    for j in range(2,i):
        if(i%j==0):
            break
    else:
        num.append(i)
print(num)


i=1
#j=1
while i<=9:
    if i<=5:
        print ("*"*i)

    elif i<=9 :
        j=i-2*(i-5)
        print("*"*j)
    i+=1
else :
    print("")


dictionary = {}
flag = 'a'
while flag in ('a', 'c'):
    flag = input("Add or look up a word? (a/c) ")
    if flag == "a" : # add
        word = input("Enter the word (key): ")
        definition = input("Enter the definition (value): ")
        dictionary[str(word)] = str(definition) # add to the dictionary
        print ("Added successfully!")
        pape = input("Show the dictionary? (a/0) ") # read
        if pape == 'a':
            print (dictionary)
        else :
            continue
    elif flag == 'c':
        check_word = input("Word to look up: ") # lookup
        if check_word in dictionary: # found
            print ("The word exists! " ,check_word, dictionary[check_word])
        else: # not found
            print ("Sorry, that word does not exist!")
    else: # stop
        print ("error type")
        break


import datetime
i = datetime.datetime.now()
print ("The current date and time is %s" % i)
print ("The date and time in ISO format is %s" % i.isoformat() )
print ("The current year is %s" %i.year)
print ("The current month is %s" %i.month)
print ("The current day is %s" %i.day)
print ("In dd/mm/yyyy format: %s/%s/%s" % (i.day, i.month, i.year) )
print ("The current hour is %s" %i.hour)
print ("The current minute is %s" %i.minute)
print ("The current second is %s" %i.second)


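# Function that modifies its argument
def changeme(mylist):
    "Modify the list passed in"
    mylist.append([1, 2, 3, 4])
    print ("Value inside the function: ", mylist)
    return

# Call changeme
mylist = [10, 20, 30]
print ("Value outside the function (before the call): ", mylist)
changeme(mylist)
print ("Value outside the function (after the call): ", mylist)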

import os
# Print the current working directory
print (os.getcwd())


class Employee:
    'Base class for all employees'
    empCount = 0

    def __init__(self, name, salary):
        self.name = name
        self.salary = salary
        Employee.empCount += 1

    def displayCount(self):
        print ("Total Employee %d" % Employee.empCount)

    def displayEmployee(self):
        print ("Name : ", self.name, ", Salary: ", self.salary)

# Create the first Employee object
emp1 = Employee("Zara", 2000)
# Create the second Employee object
emp2 = Employee("Manni", 5000)
emp1.displayEmployee()
emp2.displayEmployee()
print ("Total Employee %d" % Employee.empCount)
print ("Employee.__doc__:", Employee.__doc__)
print ("Employee.__name__:", Employee.__name__)
print ("Employee.__module__:", Employee.__module__)
print ("Employee.__bases__:", Employee.__bases__)
print ("Employee.__dict__:", Employee.__dict__)

import re
print(re.match('www', 'www.runoob.com').span()) # pattern matches at the start of the string
print(re.match('com', 'www.runoob.com')) # pattern is not at the start, so the result is None

4: date-to-format

#coding:utf-8
months=['January',
'February',
'March',
'April',
'May',
'June',
'July',
'August',
'September',
'October',
'November',
'December']
endings=['st','nd','rd']+17*['th']+['st','nd','rd']+7*['th']+['st']

year=input('Year:')
month=input('Month(1-12):')
day=input('Day(1-31):')

month_number=int(month)
day_number=int(day)

month_name=months[month_number-1]
ordinal=day+endings[day_number-1]

print (month_name+' '+ordinal+', '+year)