SWDesk

[Python] Sentence Similarity

inhae 2021. 6. 29. 10:28

Python source code for analyzing the similarity of sentences such as news headlines and article text.

def CheckSimilarity01():
	from konlpy.tag import Okt
	from sklearn.feature_extraction.text import TfidfVectorizer
	from sklearn.metrics.pairwise import cosine_similarity
	import matplotlib.pyplot as plt
	import seaborn as sns

	# cBExcel is my own Excel helper class (not shown in this post).
	# LoadData returns the sheet as a table that has a 'Title' column.
	excel1 = cBExcel()
	filePathname = "./TestData/NN_친환경 제조업.xlsx"
	sheetName = "Test01"
	data1 = excel1.LoadData(filePathname, sheetName)
	titlesDF1 = data1['Title']

	# Create the Okt tagger once, outside the loop; building it per title is slow.
	okt = Okt()
	titleList = []
	twitterList = []
	for title1 in titlesDF1:
		titleList.append(title1)
		# Keep only the nouns of each title, joined by spaces
		twitter_nouns = ' '.join(okt.nouns(str(title1)))
		print(twitter_nouns)
		twitterList.append(twitter_nouns)

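	# Turn each noun string into a TF-IDF vector.
	# Note: the default token_pattern drops one-character tokens,
	# so single-syllable nouns are ignored here.
	tfidf_vectorizer = TfidfVectorizer(min_df=1)
	tfidf_matrix_twitter = tfidf_vectorizer.fit_transform(twitterList)

	# Pairwise cosine similarity between all titles (an N x N matrix)
	similarity = cosine_similarity(tfidf_matrix_twitter, tfidf_matrix_twitter)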
	print(similarity)
	# Compare each title with every later title (upper triangle of the matrix)
	# and print the later title when the similarity exceeds 0.1.
	for index1 in range(len(titleList)):
		for index2 in range(index1 + 1, len(titleList)):
			sim1 = similarity[index1][index2]
			print("[sim1]", sim1)
			if sim1 > 0.1:
				print(titleList[index2])

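	# The tick labels are Korean titles, so the font family set here
	# must be installed on the system and contain Korean glyphs.
	plt.rc('font', family='Gothic')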
	sns.heatmap(similarity, xticklabels=titleList, yticklabels=titleList, cmap='viridis')
	plt.show()
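
To make it clearer what the TF-IDF and cosine similarity steps in the function above compute, here is a minimal sketch in plain Python with no external libraries. It assumes raw term counts, the smoothed IDF `ln((1 + n) / (1 + df)) + 1`, and L2 normalization, which are the scikit-learn defaults as far as I know. Unlike `TfidfVectorizer`, it simply splits on whitespace and keeps one-character tokens. The helper names `tfidf_vectors` and `cosine` and the sample noun strings are made up for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return L2-normalized TF-IDF vectors (as dicts) for whitespace-separated docs."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Raw term count times smoothed IDF: ln((1 + n) / (1 + df)) + 1
        vec = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # Scale to unit length; an empty document stays an empty vector
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    return sum(w * v.get(term, 0.0) for term, w in u.items())


# Hypothetical noun strings, standing in for the output of okt.nouns()
docs = ["친환경 제조업 투자", "친환경 제조업 지원", "반도체 수출 증가"]
vecs = tfidf_vectors(docs)
similarity = [[cosine(u, v) for v in vecs] for u in vecs]
for row in similarity:
    print([round(x, 3) for x in row])
```

Titles that share nouns get a score between 0 and 1, a title compared with itself gets 1, and titles with no nouns in common get 0.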

<Analysis results>
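
When reading a similarity matrix like this, it can help to list the matching pairs together instead of printing only one side of each match. The following is a rough sketch of that idea, not part of the original code. The function name `similar_pairs`, the labels, and the 3x3 matrix are hypothetical values for illustration only.

```python
def similar_pairs(similarity, labels, threshold=0.1):
    """Return (label_a, label_b, score) for every pair above the threshold."""
    pairs = []
    n = len(labels)
    for i in range(n):
        # Only the upper triangle: the matrix is symmetric and the
        # diagonal is each item compared with itself
        for j in range(i + 1, n):
            score = similarity[i][j]
            if score > threshold:
                pairs.append((labels[i], labels[j], score))
    # Most similar pairs first
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical labels and similarity values, for illustration only
labels = ["A", "B", "C"]
similarity = [
    [1.0, 0.54, 0.0],
    [0.54, 1.0, 0.25],
    [0.0, 0.25, 1.0],
]
for a, b, score in similar_pairs(similarity, labels):
    print(f"{a} <-> {b}: {score:.2f}")
```

The same function should also work on the NumPy array returned by `cosine_similarity`, since it only uses `similarity[i][j]` indexing.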
