Here is the description of the performance tests and results for the re-engineered BDII.

All the following tests were conducted on a dual 1GHz Intel Pentium III machine with 512MB of RAM. All the tests involve gathering information from an information provider script and pushing the data into the LDAP database. The information provider scripts were created by doing ldapsearches on the LCG2 production grid and writing the output to a file. The information provider script then prints out the contents of the file, thus simulating the real information in the grid information system. ldapsearch was not used directly because of the varying time delays that occur when querying an MDS-based grid information system.
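Such a provider script amounts to replaying a captured LDIF dump. A minimal sketch (the filename handling is illustrative; the real scripts used dumps captured from the LCG2 production grid):

```python
#!/usr/bin/env python
# Minimal information-provider sketch: replay a previously captured
# ldapsearch dump, so that test timings do not depend on live MDS delays.
import sys

def replay(path):
    # Print the cached LDIF verbatim, exactly as a live provider would.
    with open(path) as f:
        sys.stdout.write(f.read())

if __name__ == "__main__" and len(sys.argv) > 1:
    replay(sys.argv[1])
```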
 
Three different entry points to the grid information were used.

The top level: one file of 1.8MB.
The regional level: three files of 780k, 823k and 217k.
The GRIS level: 29 files (CE files ~71k, SE files ~5k).

All three entry points produced the same 1.8MB of information: 658 LDAP entries for 24 sites.

For each test two times were measured and recorded. The first, and usually largest, time is how long it takes to add all the entries to an empty database. The second is how long it takes to do an update on that database.

The top level:        20s 7s
The regional level:  29s 7s
The GRIS level:      16s 9s

While simultaneously querying the database using 5 streams:

The top level:       21s 12s
The regional level:  40s 28s
The GRIS level:      20s 15s

While simultaneously querying the database using 10 streams:

The top level:       24s 16s
The regional level:  50s 39s
The GRIS level:      24s 17s



26/05/2004

While in production, the BDII started to show problems as the number of sites approached 50. It was noticed that the information returned by an ldapsearch was not always consistent. The performance test was run again:

For 50 sites with no query load.
The GRIS level:      235s 62s

As the cron job was running every two minutes, this meant that for 50% of the time the BDII was updating. While an update is in progress, incoming queries are queued until it completes. When the queue is too large, new queries are rejected. There is an attribute in the slapd.conf file, conn_max_pending, that can be used to specify the number of queries that can be queued. This was increased, and an investigation was done to try and improve the speed of the update.
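The relevant slapd.conf fragment looks like this (the value shown is illustrative, not the one used; see the slapd.conf man page for the default):

```
# slapd.conf, global section: allow more pending requests
# per connection before new queries are rejected.
conn_max_pending 1000
```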

There were two areas that were slowing down the BDII: the addition of a new node and the updating of the DB. The addition of a new node was slow due to the number of calls being made to the DB. To resolve this problem the DNs were sorted by length, shortest first. This ensures that a parent is always created before its children, so the resulting list can be added to the DB in one call. The second problem was that all the forked processes that query the LDIF sources were simultaneously trying to write to the DB. To improve this, each forked process wrote its output to a file and the main process read all these files back to create one LDIF source. The update could then be done with one command. Instead of writing to a file, a named pipe could be used to increase performance even more.
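The parent-before-child ordering can be obtained with a plain sort on DN length; a sketch (not the actual BDII code):

```python
# Sort LDAP entries so that every parent DN precedes its children.
# A child DN always contains its parent's DN as a suffix, so it is
# strictly longer; sorting by DN length (shortest first) therefore
# guarantees a valid add order for a single bulk ldapadd call.
def sort_entries(entries):
    """entries: dict mapping DN string -> LDIF attribute text.
    Returns (dn, attributes) pairs in safe add order."""
    return sorted(entries.items(), key=lambda item: len(item[0]))
```

The sorted list can then be written out as one LDIF file and pushed to the database in a single call.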

It was noticed during the restructuring that a query load would significantly slow down the DB update even if it was done in one call. To circumvent this problem two DBs are used: one for read and one for write. When the write DB has been updated, the DB files are moved to the read DB. This is done with both slapds stopped, and a firewall is temporarily used to pause new incoming queries.
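The swap step can be sketched as below; the slapd stop/start and firewall steps are left as comments because the commands and paths are site-specific (and hypothetical here):

```python
import os
import shutil

def swap_databases(write_dir, read_dir):
    """Move freshly written DB files from the write slapd's directory
    into the read slapd's directory. In the real setup both slapds
    are stopped first and a firewall rule briefly holds new queries."""
    # 1. Stop both slapds (hypothetical commands):
    #      /etc/init.d/slapd-read stop ; /etc/init.d/slapd-write stop
    # 2. Raise a temporary firewall rule so incoming queries are paused.
    for name in os.listdir(write_dir):
        shutil.move(os.path.join(write_dir, name),
                    os.path.join(read_dir, name))
    # 3. Drop the firewall rule and restart both slapds.
```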

For 54 sites with no query load.

Time for searches: 5 s
Time to create nodes: 3 s
Time to update DB: 2 s 


Note: The time for the searches depends on the underlying speed of response from MDS.

dbnosync (slapd.conf option: disable synchronous database writes, speeding up updates)


500 Sites Scalability Test.

The aim of this test was to see if the BDII would work with 500 sites. A script was made to create an LDIF file (17.1KB) containing example information from a site; each LDIF file created has a different UniqueID. Another script was used to start 50 slapd servers on one machine. Each slapd was populated using the "ldapadd" command and the LDIF files. 10 machines were set up in this way, resulting in 500 slapd servers, each containing the same information but with a different UniqueID. A BDII was set up and one machine (50 sites) was added at a time. The results are in the table below.
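The site-generation step amounts to substituting a UniqueID into a fixed template; a sketch (the DN and attribute names are illustrative, not the actual schema used):

```python
# Generate an example site LDIF with a UniqueID substituted in, so that
# 500 otherwise identical slapd servers each hold distinct entries.
# Template contents are illustrative; the real file was ~17.1KB of
# example site information.
TEMPLATE = """dn: mds-vo-name={uid},o=grid
objectClass: GlobusTop
mds-vo-name: {uid}
"""

def make_site_ldif(uid):
    """Return the LDIF text for one simulated site."""
    return TEMPLATE.format(uid=uid)
```

Each generated file is then loaded into its own slapd with ldapadd.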


Sites   Search   Read/Sort   Add   Modify   Total   (all times in seconds)
50      1        0           3     2        6
100     2        0           5     4        11
150     4        1           8     7        20
200     5        1           10    10       26
250     6        2           13    12       33
300     7        2           16    15       40
350     9        3           18    17       47
400     10       3           22    20       55
450     11       4           25    24       64
500     13       4           27    25       69