Python notes (reading data from a txt file)
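Machine learning work often involves reading data from a txt file. These notes cover two ways of doing it.
Data contents
- The file has four tab-separated columns: the first three are feature values and the last is the class label.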
40920	8.326976	0.953952	3
14488	7.153469	1.673904	2
26052	1.441871	0.805124	1
75136	13.147394	0.428964	1
38344	1.669788	0.134296	1
72993	10.141740	1.032955	1
35948	6.830792	1.213192	3
42666	13.276369	0.543880	3
67497	8.631577	0.749278	1
35483	12.273169	1.508053	3
Method 1: reading manually
import numpy as np


def file2matrix(filename):
    """Read a tab-separated file into a feature matrix and a list of labels."""
    with open(filename) as fr:
        lines = fr.readlines()  # read once; the file is closed when the block exits
    returnMat = np.zeros((len(lines), 3))  # one row per line, three feature columns
    classLabelVector = []  # class label of each row
    for index, line in enumerate(lines):
        listFromLine = line.strip().split('\t')
        returnMat[index, :] = [float(x) for x in listFromLine[0:3]]
        classLabelVector.append(int(listFromLine[-1]))
    return returnMat, classLabelVector


dataMat, dataLabel = file2matrix("datingTestSet2.txt")
print(dataMat, dataLabel)
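As an aside, the manual loop can usually be replaced by a single call to NumPy's `np.loadtxt`. This is not one of the two methods in these notes, only a minimal sketch for comparison. It assumes every field is numeric and tab-separated. The in-memory `sample` is a hypothetical stand-in for `datingTestSet2.txt`, so the snippet runs without the file.

```python
import io

import numpy as np

# Hypothetical in-memory stand-in for datingTestSet2.txt:
# four tab-separated columns, the last one being the class label.
sample = io.StringIO(
    "40920\t8.326976\t0.953952\t3\n"
    "14488\t7.153469\t1.673904\t2\n"
    "26052\t1.441871\t0.805124\t1\n"
)

# loadtxt parses every field as a float; delimiter='\t' matches the file layout.
data = np.loadtxt(sample, delimiter='\t')

features = data[:, :3]            # first three columns: feature values
labels = data[:, -1].astype(int)  # last column: class labels as integers

print(features.shape)   # (3, 3)
print(labels.tolist())  # [3, 2, 1]
```

With the real file, passing the file name in place of `sample` should behave the same way, provided the file contains no non-numeric fields.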
Method 2: using pandas
import pandas as pd

# read_table splits on tabs by default; header=None because the file has no header row
df_news = pd.read_table('datingTestSet2.txt', header=None)
print(df_news)
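To get the same feature matrix and label list that Method 1 returns, the DataFrame can be sliced by column position. The following is a minimal sketch under a few assumptions: the in-memory `sample` is a hypothetical stand-in for `datingTestSet2.txt`, and the names `df`, `features` and `labels` are chosen only for illustration.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for datingTestSet2.txt:
# four tab-separated columns, the last one being the class label.
sample = io.StringIO(
    "40920\t8.326976\t0.953952\t3\n"
    "14488\t7.153469\t1.673904\t2\n"
    "26052\t1.441871\t0.805124\t1\n"
)

# header=None gives the columns integer names 0..3 instead of using the first row.
df = pd.read_table(sample, header=None)

features = df.iloc[:, :3].to_numpy()  # first three columns as a float array
labels = df.iloc[:, -1].tolist()      # last column as a list of ints

print(features.shape)  # (3, 3)
print(labels)          # [3, 2, 1]
```

Selecting by position with `iloc` avoids depending on column names, which fits a file without a header row.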
For details, see the following documents:
- Official pandas documentation: https://pandas.pydata.org/pandas-docs/version/0.18.1/generated/pandas.read_table.html
- pandas.core.frame.DataFrame: https://blog.csdn.net/daydayup_668819/article/details/82315565