Fit/Transform does not give top n matches #81

I was using get_matches() to get the top 5 matches. Now that I am moving to production, I thought of using fit/transform, but it seems to return only the top match for each item. Is there any way to get the top 5 matches with fit/transform?

I am matching current text notes (long, non-semantic text) against historical ones. The historical data will be large, in the lakhs (hundreds of thousands) of rows. To make the code more efficient, I plan to pass the historical text notes to fit and the current text notes to transform, and to retrain monthly.

### Sample Current/Input Data:

ID | Alloc_No | Text_Notes
-- | -- | --
2354657 | 78 | RHJ…..//32456hjfg//vkcmEGHJJJYMM
4354657 | 35 | TFHGDVASFHC4636587//5748UJKNM
345676 | 889 | WUSERHIFKDJVN//23475//IUOSJDFGKV
34747586 | 57 | YWEIHFDSK//2435467//WEKSFDHLV
465768 | 3777 | 324TYVHBJN//435465//HUJNKHJKN

### Sample Reference/Historical Data:

ID | Alloc_No | Text_Notes
-- | -- | --
2354657 | 78 | RHJ…..//32456hjfg//vkcmEGHJJJYMM
4354657 | 35 | TFHGDVASFHC4636587//5748UJKNM
345676 | 889 | WUSERHIFKDJVN//23475//IUOSJDFGKV
34747586 | 57 | YWEIHFDSK//2435467//WEKSFDHLV
465768 | 3777 | 324TYVHBJN//435465//HUJNKHJKN
2354657 | 78 | RHJ…..//32456hjfg//vkcmEGHJJJYMM
4354657 | 35 | TFHGDVASFHC4636587//5748UJKNM
345676 | 889 | WUSERHIFKDJVN//23475//IUOSJDFGKV
34747586 | 57 | YWEIHFDSK//2435467//WEKSFDHLV
465768 | 3777 | 324TYVHBJN//435465//HUJNKHJKN


### Sample Code:

### Old Code using get_matches():
 
import pandas as pd
from polyfuzz import PolyFuzz

# Pass the reference (historical) text notes as "to_list"
to_list = Ref.Text_Notes.to_list()

dict2 = {}  # maps each input ID to its top matches
for i in range(Input.shape[0]):
    # Pass the new text notes one at a time, score them against the reference items, then keep the top 5
    from_list = [Input.Text_Notes[i]]
    model = PolyFuzz("TF-IDF").match(from_list, to_list)
    matches = model.get_matches().sort_values(by='Similarity', ascending=False)
    matches1 = pd.merge(matches, Ref, left_index=True, right_index=True)
    dict1 = matches1[['ID', 'Similarity', 'Alloc_No', 'From', 'To']].to_dict('index')
    list1 = list(dict1.items())[:5]
    dict2.update({Input['ID'][i]: list1})
dict2
 
 
### New Changed Code for Production using Fit/Transform:
 
from polyfuzz import PolyFuzz

# Fit on the reference (historical) text notes
# Frequency - monthly

from_list = Ref.Text_Notes.to_list()
model = PolyFuzz("TF-IDF")
model.fit(from_list)
model.save("TF-IDF")

# Match the new text notes
# Frequency - daily

model = PolyFuzz.load("TF-IDF")  # load the saved model once, outside the loop
dict2 = {}
for i in range(Input.shape[0]):
    to_list = [Input.Text_Notes[i]]
    matches = model.transform(to_list)
    print(matches)
dict2
 
The issue is that transform does not give me a similarity score for every reference item, only the top 1 match, whereas I need the top 5.
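To make the desired behaviour concrete, here is a minimal standard-library sketch of "fit once on the historical notes, then ask for the top n matches per new note". It is not PolyFuzz code: `TopNIndex`, `char_ngrams`, and the smoothed IDF formula are hypothetical names and choices made up for illustration, and the scoring is a plain character-trigram TF-IDF with cosine similarity.

```python
import heapq
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lower-cased string."""
    text = text.lower()
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class TopNIndex:
    """Hypothetical helper (not a PolyFuzz class): fit once, query many times."""

    def fit(self, reference):
        self.reference = list(reference)
        counts = [Counter(char_ngrams(t)) for t in self.reference]
        n_docs = len(counts)
        # Number of reference notes that contain each n-gram
        doc_freq = Counter(g for c in counts for g in c)
        # Smoothed inverse document frequency (an assumed weighting)
        self.idf = {g: math.log((1 + n_docs) / (1 + df)) + 1
                    for g, df in doc_freq.items()}
        self.vectors = [self._normalise(c) for c in counts]
        return self

    def _normalise(self, counts):
        # TF-IDF weights scaled to unit length; unseen n-grams are ignored
        weights = {g: tf * self.idf[g] for g, tf in counts.items() if g in self.idf}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        return {g: w / norm for g, w in weights.items()} if norm else {}

    def top_n(self, query, n=5):
        """Return up to n (reference_text, cosine_similarity) pairs, best first."""
        q = self._normalise(Counter(char_ngrams(query)))
        scored = (
            (sum(w * vec.get(g, 0.0) for g, w in q.items()), idx)
            for idx, vec in enumerate(self.vectors)
        )
        best = heapq.nlargest(n, scored)
        return [(self.reference[idx], round(score, 3)) for score, idx in best]


# Reference rows taken from the sample data above
reference = [
    "TFHGDVASFHC4636587//5748UJKNM",
    "WUSERHIFKDJVN//23475//IUOSJDFGKV",
    "YWEIHFDSK//2435467//WEKSFDHLV",
    "324TYVHBJN//435465//HUJNKHJKN",
]
index = TopNIndex().fit(reference)
matches = index.top_n("YWEIHFDSK//2435467//WEKSFDHLV", n=3)
for text, score in matches:
    print(text, score)
```

This sketch scores every reference note for each query, so it would be slow with lakhs of rows; a sparse matrix or an inverted index would be needed at that scale. It may also be worth checking whether the installed PolyFuzz version accepts a `top_n` argument on `polyfuzz.models.TFIDF`; that is an assumption to verify against the library documentation, not something confirmed in this issue.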
