Asrestore: Restore data from .asb files (AER-4539)

ldt

#1

Is it possible to get LDT data out of an Aerospike backup without using the asrestore utility? asrestore is not working for me.


#2

Hi

Please let us know what is not working in asrestore, and we will help you with it.


#3

Hi, here is the full log: https://github.com/MaxJust/public-view/blob/master/README.md (thanks in advance)


#4

Hi, please try with a single thread. If you see a similar issue, could you share one of your backup files, or part of one, so that we can find the root cause of the asrestore issue with your LDT data?


#5

Can you please tell me what "try with a single thread" means?


#6

"Try with a single thread" means running asrestore with "-t 1" as an argument.

However, the asrestore error message suggests that the LDT key is missing from your LDT data, so I need your backup LDT data to find the actual issue.


#7

How can I give you a private link to download the archive (about 2 GB, LZO-compressed)? Maybe by Skype or email?


#8

Before sending, please check the following on the source cluster where you took the backup.

1. Run the following command to check whether your LDT data has a "key" field as part of the llist data:

execute llist.scan('bin_name') on namespace_name.set_name where pk = 'key_name'

2. If it does not, let us know how you inserted your LDT data, because inserting LDT data without a key field is not allowed.

3. For further analysis I don't need the entire backup file. Just send me 20-50 lines of it. My email address is jyoti@aerospike.com.
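For anyone who needs to produce such an excerpt, here is a minimal Python sketch that copies the first few dozen lines of a backup file into a small sample file. It assumes the archive has already been decompressed and that the backup is line-oriented, as the request for "20-50 lines" implies. The function name `write_sample` and the file names in the usage comment are hypothetical.

```python
from itertools import islice


def write_sample(backup_path, sample_path, max_lines=50):
    """Copy at most max_lines leading lines of backup_path to sample_path.

    Files are opened in binary mode so the bytes are copied unchanged.
    Returns the number of lines actually written.
    """
    written = 0
    with open(backup_path, "rb") as src, open(sample_path, "wb") as dst:
        # islice stops after max_lines, so a multi-gigabyte file is never
        # read in full.
        for line in islice(src, max_lines):
            dst.write(line)
            written += 1
    return written


# Hypothetical usage; both file names are placeholders:
# write_sample("backup.asb", "backup_sample.asb", max_lines=50)
```

Only the leading lines are read, so this stays fast even on a very large backup. Note that the excerpt will contain real record data, so check it before emailing it.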


#9

I think I found the problem (very strange behaviour). Here is the output:

Corrupted record:

aql> select * from der.queue where pk = 'knsrostov.ru'                
+----------------+---------------+------+
| sld            | LDTCONTROLBIN | urlq |
+----------------+---------------+------+
| "knsrostov.ru" |               |      |
+----------------+---------------+------+
1 row in set (0.001 secs)

aql> execute llist.scan('urlq') on der.queue where pk = 'knsrostov.ru'
Error: (100) /opt/aerospike/sys/udf/lua/ldt/ldt_common.lua:751: 1422:LDT-Sub Record Open Error

When I try to get the size, or to add or remove records, the commands execute successfully. Here is an example:

aql> execute llist.add('urlq', 'test') on der.queue where pk = 'knsrostov.ru'                 
+-----+
| add |
+-----+
| 0   |
+-----+
1 row in set (0.001 secs)

aql> execute llist.size('urlq') on der-queue where pk = 'knsrostov.ru'    
+-------+
| size  |
+-------+
| 33220 |
+-------+
1 row in set (0.000 secs)

aql> execute llist.remove('urlq', 'test') on der.queue where pk = 'knsrostov.ru'
+--------+
| remove |
+--------+
| 0      |
+--------+
1 row in set (0.000 secs)

aql> execute llist.size('urlq') on der.queue where pk = 'knsrostov.ru'
+-------+
| size  |
+-------+
| 33219 |
+-------+
1 row in set (0.001 secs)

But when I try to execute a scan, I get an error:

aql> execute llist.scan('urlq') on der.queue where pk = 'knsrostov.ru'
Error: (100) /opt/aerospike/sys/udf/lua/ldt/ldt_common.lua:751: 1422:LDT-Sub Record Open Error

Some records have this error, while others work as expected and return data on the scan command. What is wrong? How can I repair it and/or prevent it?


#10

I scanned all records and found 165 corrupted LDTs with the following error:

/opt/aerospike/sys/udf/lua/ldt/ldt_common.lua:751: 1422:LDT-Sub Record Open Error

The other 17,000 records are OK. Each corrupted record has between 1,000 and 30,000 subrecords.
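A sweep like this can be sketched in Python as follows. This is only an illustration under stated assumptions: `find_corrupted` is a hypothetical helper, and `scan` stands in for whatever call performs `llist.scan` on one record and raises on failure. No real Aerospike client API is used here. The error marker is the message text quoted above.

```python
# Error text as it appears in the aql output quoted in this thread.
SUBREC_OPEN_ERROR = "1422:LDT-Sub Record Open Error"


def find_corrupted(keys, scan):
    """Split keys into (ok, corrupted) lists.

    scan(key) is any callable that scans one record's llist and raises an
    exception on failure. A key counts as corrupted only when the exception
    message contains SUBREC_OPEN_ERROR; any other error is re-raised so it
    is not silently miscounted.
    """
    ok, corrupted = [], []
    for key in keys:
        try:
            scan(key)
        except Exception as exc:
            if SUBREC_OPEN_ERROR in str(exc):
                corrupted.append(key)
            else:
                raise
        else:
            ok.append(key)
    return ok, corrupted
```

Keeping the list of corrupted keys, rather than only a count, makes it easier to re-check the same records after a fix or a restore.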

I have only two questions:

  • Is it possible to recover the data, and if so, how?
  • How can I protect my data from this situation in the future?

#11

Max83,

Which version of the server are you using?

Did you have any cluster view change event, such as a node going down, a new node being added, or a rolling upgrade from a version earlier than 3.6.1?

– R


LDT Data corruption after aerospike restart (AER-4539)
#12

I started collecting data on version 3.6.1 and upgraded to 3.6.2 some time ago. The problem appeared after a server restart.

P.S. I have only one node, so the number of nodes has not changed.


#13

@max83,

Thanks. We have captured this issue in JIRA ticket AER-4539.