Tuesday, 28 July 2015

Week 9

Salut! This week was a bittersweet one. It was immensely frustrating and gratifying at the same time. I spent a large part of the week looking for hard-to-find bugs and fixing them. I'll attempt to list these nasty creatures below:
  • The first one relates to a mistake in the construction of the list r in same_charclass? in the slope = 0 case. A ',' (comma) was used for combining two elements in the list, but cons should have been used instead.
  • A couple more changes, related to computing unsafe factors in factor_newton and avoiding repeated computations in factor_minmult1, were rendered redundant by the final modification (more on it in the last point).
  • I also fixed the order of multiplication in the adjoint case in try_factorization. Earlier, the inverse of the leading coefficient of srl was used as the left operand in the multiplication, though it should have been the right one.
  • Operators passed as arguments to same_charclass? are now made monic before further computation is performed on them. This avoids the errors arising from cases where the operators have different leading coefficients.
  • I figured out that I was passing the wrong singularity and factor to factor_minmult1 and hence I changed the code so that factor_minmult1
    is no longer used. I now believe that the factorizer should work as far as the methods from chapters 2 and 3 are concerned.
"If you want to shine like a sun. First burn like a sun." -- Dr. APJ Abdul Kalam.

Wednesday, 22 July 2015

Week 8

Bonjour! This week, I modified factor_global to incorporate the remarks made regarding it on the mailing list. This involved adding infinity as a singularity only if it was known that infinity was not a regular point of the operator. A new function, inf_singularity?, was used for this. Other changes included using rootOf instead of zerosOf for computing the roots of a factored polynomial and using already known factors instead of computing the lcm of denominators and then factoring. compute_bound was fixed to not ignore ramified exponential parts and to compute the trace correctly. Further, code corresponding to the v'(e_i-e_j) term was also added. try_factorization(2) was altered to take advantage of the routine for solving the Hermite-Padé problem posted by Waldek on fricas-devel. I also implemented an exported function to compute generalized exponents. It takes an operator and a point p as inputs (if p ~= 0, then it is moved to 0 and the result has x-p, or 1/x if p = infinity, in place of x). The output is a list of equivalence classes of generalized exponents up to conjugation over Q((x)), [ecs, ecr, ect], where ecs is written in terms of ecr=ect.

This week, I intend to fix the remaining problems in the current code, if any, and move on to implementing the eigenring-related functions as described in chapter 5 of the van Hoeij thesis. The interesting thing about this method is that it works best for cases that are difficult to factorize via the methods in chapters 2 & 3, i.e., the routines implemented so far.

The next update will be posted (hopefully) in a week's time ;)

Tuesday, 14 July 2015

Week 7

As mentioned in the previous post, this week I worked on extending factor_global, which is essentially the main function that has to be implemented; the routine exported by the package, factor, is just a wrapper around it. To do this, the Padé approximation functions were added, including lift_pade2, valuation_pade2, smaller_pade2?, wipe_pade2 and smallest_pade2. Currently, these do not use guessHolo as suggested on the mailing list, but that will be changed soon. This also enabled me to complete try_factorization and try_factorization2. I also implemented the linchpin of this project, factor. However, it is not all rosy: calls to factor do not terminate for most input operators, though the function does work with powers of D as arguments (maybe more, but those are all I tested). try_factorization takes as input a local factor, the maximal order to look for, the Newton polygon bound, the point of singularity, the global operator to be factored and the extra-singularities bound, and produces as output a global factorisation or "failed". It is used in factor_minmult1 and factor_global. factor_minmult1 takes a local factor R with multiplicity 1 and produces a factorization of the global operator f. If f is returned unfactored, then it is irreducible (provided the specified bound is correct).

This week, I intend to straighten out the kinks remaining in my code to ensure that the methods from chapters 2 and 3 of the van Hoeij thesis are correctly implemented. This Sunday marks the second milestone point of my project.

"May the force be with you" -- Star Wars.

Thursday, 9 July 2015

Week 6

Sorry for the delay. There was a problem with the internet connection. Coming back to the task at hand, this week I wrote the documentation for the factor_* functions and their helpers. Now, all the subroutines related to factorization in the k((x))[xD] domain are complete. The small addition that I mentioned last week, regarding the splitting of coefficients in a basis of the algebraic extension in lift_rightfactor, was also finished. There was a problem remaining in factor_riccati when the "split over k((x))" option was used, but that was resolved via a patch to OREPCTO.spad submitted by Waldek on the mailing list. I also implemented the same_charclass?, l_p and compute_bound functions. The same_charclass? function tests whether two irreducible operators have the same exponential parts. l_p(f) produces the localisation of f at the point x = p as described in Section 3.3.4 of the van Hoeij thesis. compute_bound returns the bound for the number of extra singularities in a first order right factor. This week I intend to extend the factor_global routine, which is already a work in progress, and write the functions required for the same.

"May the odds be ever in your favor." -- Suzanne Collins.

Wednesday, 1 July 2015

Week 5

I'm truly having a blast this summer doing this project! Speaking of which, the project has reached its halfway point and the mid-term evaluations are upon us.

Coming back to business, this week I implemented the coprime index > 1 factorization algorithm that takes a first order right hand factor with coefficients in an algebraic extension of k((x)) and produces an irreducible right hand factor belonging to k((x))[xD]. This involved writing two functions, make_rightfactor and lift_rightfactor. The lift_rightfactor routine required the use of undetermined coefficients. Two options for representing these were mentioned on the mailing list. One possibility was to use Polynomial(US), but this approach could be problematic as the domain performs a lot of comparisons of its coefficients to zero while doing its internal calculations. So, it was decided to go with Vector(US). Another decision concerned which method to use for solving the resulting system of linear equations. Gaussian elimination might have had problems with finding pivots, so Cramer's rule was used instead (see the formula below). A small addition regarding splitting of coefficients in a basis of the algebraic extension is still to be finished in lift_rightfactor, after which I'll start with the global factorization subroutines.
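
For reference, Cramer's rule gives each unknown directly as a ratio of determinants, so no pivot selection is needed (this is just the standard formula, not project-specific code): for a system A x = b with A invertible, x_i = det(A_i) / det(A), where A_i denotes A with its i-th column replaced by b.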

"May the stars watch over you."  -- Christopher Paolini.

Tuesday, 23 June 2015

Week 4

Hope you have been taking good care of yourself! As mentioned in the previous post, the first thing I did last week was to resolve the bug in the function implementing step one of the pipeline, listream_coefs. So, now all the functions related to factorization via the Newton method work as expected. Hence, I added documentation for them.
I also wrote an initial version of the factor_riccati and the factor_op functions along with their helpers: substitute, ramification_of and iram_of. The factor_riccati function requires us to find a zero of a polynomial that is irreducible over F, the domain of coefficients. Two approaches for this task were mentioned on the mailing list - one was to use the domain SimpleAlgebraicExtension, but this approach had a problem with return values. Thus, I went with the second approach, which is also used by the integrator, viz. using Expression R (with R usually Integer) as the base domain.
This week, I plan to modify the factor_* routines according to the comments received and implement the coprime index > 1 algorithm. This will be followed by the global factorization methods. For further updates, stay tuned!

Wednesday, 17 June 2015

Week 3

This week I fixed a small problem in the lifting code regarding how I was initializing the start_D variable. Also, it was mentioned on the mailing list that the lifting step should be done in a lazy way (i.e. on demand). So, I changed the lift_newton function to an incremental version and added routines to proceed according to the following pipeline mentioned by Waldek:

stream of [l_extra, r_extra] --> list of streams of coefficients
  --> list of Laurent series --> operator

The last two steps of the pipeline were quite easy to implement. The main work was in the first step. There is still an issue remaining in step 1 regarding how to handle zero series, and I'm working on resolving that at the moment. The next task for me is to add documentation for all the undocumented functions relating to factorization via the Newton method. This week, I look forward to really starting to write code for the Riccati-solution-based functions. Take care.

Tuesday, 9 June 2015

Week 2

In the second week, I implemented the lift_newton function that is used to lift coprime index 1 factorizations. plug_delta, coeffx, coefs_operator and coefs_poly were the helper functions used in the process. coefs_operator takes a polynomial and returns an operator with the given valuation, while coefs_poly takes an operator and returns a polynomial after extracting its relevant part. plug_delta simply converts a polynomial to a differential operator by substituting δ = xD in place of x. coeffx(f, e) returns the coefficient of x^e in f while simultaneously substituting x in place of δ. The initial version of lift_newton that I implemented was buggy, but thanks to the review on the mailing list, I was able to correct it. The plan now is to tie up any loose ends in the functions finished so far, which all deal with factorization using the Newton method. If there are no more problems with them, then I'll go on to implement the functions related to factorization using a Riccati solution: factor_riccati and ramification_of. That's all from me. Good-bye, until next time.
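
As a small worked example of the δ = xD substitution (my own illustration, not taken from the code, using the operator identity Dx = xD + 1): plug_delta(x^2 + 3x + 1) = (xD)^2 + 3(xD) + 1 = x^2 D^2 + xD + 3xD + 1 = x^2 D^2 + 4xD + 1, since (xD)^2 = x(Dx)D = x(xD + 1)D = x^2 D^2 + xD.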

Tuesday, 2 June 2015

GSoC 2015 - Week 1

My proposal "Factorization of LODOs in FriCAS - the van Hoeij algorithm" has been accepted for Google Summer of Code 2015. So, I'll again be working under lmonade this summer (and I'm glad to be working under the same umbrella organization). My mentors for this project are Waldek Hebisch and Ralf Hemmecke. The official coding period began last week (May 25) and so I have started to code. The source code can be found here.

I've finished the implementation of the newtonpolygon function that returns the extreme points of the Newton polygon, the associated slopes and Newton polynomials of an operator.
Further, I also implemented the factor_newton function that performs coprime index 1 factorizations, for which the op_with_slope function was a helper. However, the lifting step of the algorithm is still to be done and that is what I'm working on at present. So, the plan for this week is to finish the algorithm for coprime index 1 factorizations by implementing lift_newton and coefs_operator. I was a little stuck on how to actually implement the lifting algorithm, but Waldek clarified my doubts to a large extent on the fricas-devel mailing list. I'm still not completely confident but am able to work on the functions, so I guess I'll finish the implementation first and then modify it according to the comments received. Okay, that's it for now. See you next week!

Wednesday, 20 August 2014

Week 11 - 'pencils down'

Sorry for not updating this blog for the last couple of weeks. I forgot about it after the semester began. I'll use this blog post to summarize the project's progress over the past month. Before that, however, I'd like to note that the current module does indeed perform LLL, LLL with removals and ULLL, which was the primary goal of the project.

So, starting off where I left off in the previous post, augmenting the identity matrix to the basis matrix actually plays a part in improving the numerical stability of ULLL. Hence, I modified the ULLL function to be similar to the one in flint-1.6.
Also, the default LLL functions (which are essentially wrappers for ULLL) were added. The LLL-reducedness checkers were also changed to implement the certification method introduced by Villard. It will not always work, but if it does, the result is guaranteed. It is efficient, and much faster than using the provable precision in the L^2 algorithm or using exact arithmetic for testing.

The Gram matrix version of the provable is_reduced functions was also added.
This was followed by an implementation of a modified LLL reduction algorithm due to Storjohann, which has better complexity in terms of the lattice dimension. However, it didn't fare well against the other LLL implementations (specifically classical LLL) during profiling. The mpf versions of the is_reduced functions were removed because they weren't provably correct, and mpfr versions were written instead.
Adding a function which uses mpf and epsilons to emulate the behaviour of changing the rounding mode is a possibility, as mentioned on flint-devel. Finally, the documentation was updated and should now (hopefully) explain all the strategies and types used in this module (along with the function descriptions, of course).

I'd like to thank my mentors, William Hart and Curtis Bright, for their valuable time, support and patience. This project wouldn't have been possible without their help. I'd also like to thank Fredrik Johansson for reviewing some of my code and Burcin Erocal from the lmonade organization for his advice. I'm grateful to the Google OSPO team for running the GSoC program. This has been a great summer and contributing to FLINT has been a tremendous learning experience for me.

Monday, 28 July 2014

Week 10

This week, I removed the redundant store_trans parameter from the context object. Now, the capturing matrix is always updated unless the user passes NULL as the value. Also, in the special case that the user inputs a d x d identity and the number of columns of the input basis is greater than the number of rows (i.e. the embedding dimension is greater than the lattice dimension), I avoid updating the vectors during reduction itself and instead do a matrix multiplication at the end.

Also, as the mpf and wrapper versions of the LLL subroutines are supposed to guarantee that the output is indeed LLL-reduced, we need to check this in the most efficient way. This means that a floating-point test for reducedness should be used before falling back to the exact arithmetic version (which is slower). Thus, I've added is_reduced functions to the module which first test using doubles, then mpfs if the first test reports the matrix as non-reduced, and finally fmpq.

An initial implementation of the ULLL function was also added. It is different from the one in flint-1.6 because it does not perform any adjoining operations to the matrix. Instead, the option to store the unimodular transformations is utilised here. Also, the original did not use recursion on the truncated data, but the current code does. However, it needs to be tested.

This week, I plan to add test code and documentation for the ULLL function and document the functions for checking LLL-reducedness.

Tuesday, 22 July 2014

Week 9

This week, I updated the removals code to use a heuristic lower bound, mentioned by Curtis on flint-devel, on the final GS norm when removing the last vector during the execution of the reduction algorithm (after a failed Lovasz test involving kappa = d - 1). This was required because the proof of a theorem in the L^2 paper assumes that the final vector is LLL-reduced while deriving the error bound on the accuracy of the norm. This, of course, doesn't matter in the case of standard LLL, but it matters here because we need to be sure about the accuracy, lest we remove something useful.

Besides this, I also unified the code for the cases where the exact Gram matrix is computed or input, as mentioned in the todo section of my previous post. This required factoring out the row exponents from the Gram matrix rather than from the basis itself, because the latter is not always available (i.e. when fl->rt == GRAM).

Also, I changed the code dealing with the matrix capturing the unimodular transformations so that it makes no assumptions about its column dimension. Earlier, I was assuming U to be a d x d matrix which was to be updated to satisfy the relation B* = UB, where B* is the basis obtained by LLL-reducing B, i.e. U was the change-of-basis matrix. However, now U can be any matrix with the same number of rows as B.

I think the main implementation of LLL and LLL with removals is completed now, modulo a few (hopefully minor) changes. So, I plan to at least start working on the ULLL function this week.

Monday, 14 July 2014

Week 8

The eighth week of GSoC involved a lot of debugging to find the reasons for the bad behaviour of the removals function. In theory, the last vector can be removed from the basis if its squared GS length becomes greater than the bound at any point during the execution of the algorithm. However, it is not so straightforward in practice if the GSO is known only approximately. I think this may be why the version in flint-1.6 removed vectors only at the end, as the algorithm seems to show much better behaviour in this case, i.e. the norms were more accurate.

I added the wrapper function for LLL with removals optimised for knapsack lattices. Test code for this was also written. Knapsack LLL differs from the textbook version in that it occasionally performs early size reductions on the input basis, which speeds things up in the knapsack case. Speaking of LLL with removals, the code was modified to remove the numerical inaccuracies plaguing it. Earlier, because a floating point approximation of the Gram-Schmidt orthogonalisation was being used, the norm was sometimes incorrectly flagged as being greater than the bound, which led to the removal of some useful vectors. The documentation was also updated and is now up to date with the module. Thus, the prerequisites for ULLL have now been completed.

This week I plan to unify the code for performing LLL on the input matrix in the two cases: when the input is a lattice basis and an exact Gram matrix is to be used for computing the GSO, and when the input is the Gram matrix itself.

Tuesday, 8 July 2014

Week 7

This week, I added support for LLL with removals on a Gram matrix. This finds application in vector rational number reconstruction. Also, the inaccuracy checks (incorrect Lovasz tests and too many Babai loops) were improved to be similar to those used in fpLLL.

Also, the LLL function specialized to knapsack-type lattices, for which the Babai functions were implemented last week, was added. It performs early size reductions, which tend to make things faster for knapsack-type lattice bases. It is also a prerequisite for the ULLL with removals function. Another feature implemented was early removal of vectors in LLL with removals. Now, vectors whose squared GS lengths are greater than the input bound are removed from the basis during the reduction algorithm itself, to avoid the unnecessary overhead of keeping them updated during the computations.

This week, I plan to write the wrapper function for knapsack LLL with removals and fix any loose ends which may be remaining. The plan is to get the existing module ready before I start working on the actual ULLL algorithm itself.

Thursday, 3 July 2014

Week 6

Well, this week's update is late. Sorry for that. I'll try to summarize the preceding week here. I added the heuristic version of LLL with removals, along with its test code and documentation. In the MPFR version in flint-1.6, the squared GS length is divided by 8 instead of 2, as mentioned in the comments. I don't know if this is an oversight or an extra precaution; if the latter, I see no reason for it. As I mentioned in my previous post, however, I avoid division altogether. LLL with removals was completed when its arbitrary precision variant was added and the wrapper function written. Its return value is the new dimension of the basis to be considered for further computation.

This brings us to the third major and perhaps most important part of this project: implementing ULLL with removals. This is because of its sub-quadratic time complexity in the size of the entries. Also, it is numerically more stable. It isn't mentioned in the literature, and hence Bill graciously offered to write a paper on it for reference. Before I start work on the actual ULLL function, however, I need to implement an LLL with removals optimised for knapsack-type lattices, as it is used in ULLL. This requires a few Babai-like functions as well. These procedures will only reduce the kappa'th vector against the vectors up to cur_kappa (an index before which the basis is assumed to be LLL reduced) and not up to kappa - 1 (see the formula below). The Babai functions added differ in their way of computing the dot products, just like the former versions.
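
In symbols (my own paraphrase of standard Babai size reduction, not the exact FLINT code), such a procedure performs b_kappa <- b_kappa - round(mu[kappa][j]) * b_j for j = cur_kappa, cur_kappa - 1, ..., 1, where mu[kappa][j] are the GSO coefficients, instead of letting j run from kappa - 1 downwards.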

Along with the progress reported on the mailing list, one important thing to do now is to update the documentation, which is lagging behind. I hope I'll be able to find time for this task this week.

"The documentation needs documentation."   -- a Bellevue Linux Users Group member, 2005

Monday, 23 June 2014

Week 5

The previous week was quite interesting. I wrote the wrapper function fmpz_lll_wrapper and documented it. Test code for it was also added. Thus, the first milestone of the project was reached. Improvements to the module implemented thus far were made according to mentor comments. In particular, the is_reduced functions and fmpz_mat_lll were improved to avoid storing GS vectors, as suggested by Curtis on the mailing list. The function fmpz_lll_wrapper should now provide functionality identical to fpLLL and flint-1.6, but an additional feature in this version is that the Gram matrix can also be supplied for reduction instead of the lattice basis. This should be useful because the only other software I've seen that allows passing a Gram matrix as input is Magma, which isn't open source.

The next step is LLL with removals. There seem to be 2 definitions of LLL_with_removal in the literature. Both versions accept a basis (say B) and a bound (say N). Now, the intuitive definition seems to be that B is LLL reduced with bound N if for every vector b in B, norm(b) <= N. However, according to these papers by Mark van Hoeij, et al., it actually deals with the length of the Gram-Schmidt (GS) vectors, i.e. the last vector is removed if its GS length is greater than N. Of course, for computational simplicity (and accuracy) I take the bound as a bound on the squared GS length. Another point to be observed is that in flint-1.6, the bound is compared with half the squared GS length, probably because an fp approximation of the GS lengths is used. This is done to avoid removing something valuable. Also, the documentation is a bit ambiguous, as it mentions that the bound is for the "target vectors". I'm going with van Hoeij's definition because he mentions it in the context of factoring polynomials, which is a possible use of LLL in FLINT.
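Stated concretely (my own restatement of the condition, not a quote from the papers): if b_1, ..., b_d is the current basis and b_1*, ..., b_d* are its GS vectors, then the last vector is removed whenever norm(b_d*)^2 > N, and this check is applied repeatedly to the trailing vectors.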
I am not sure if LLL with removals requires a version for the Gram matrix as input, as the only mentions of it in the literature relate to factoring polynomials, which pass a lattice basis to the procedure. So, I haven't written a Gram matrix variant for now, although I may implement it later if it turns out to be good to have.
My version of LLL_with_removal works even when I directly compare the bound with the squared GS norm because I avoid conversion to doubles and instead compare fmpz_t's. This was ascertained from the test code for fmpz_lll_d_with_removal. Documentation for fmpz_lll_d_with_removal and fmpz_mat_is_reduced_with_removal in the corresponding modules was added.

This week, I plan to write the fmpz_lll_d_heuristic_with_removal function, along with its test code and documentation. If things are smooth, maybe I can even work on the arbitrary precision variant.

Monday, 16 June 2014

Week 4

This last week has been a productive one. I implemented check_babai_heuristic (the Babai version using mpfs) and documented it. Helper functions for this were also implemented in the fmpz and fmpz_vec modules. The fmpz_lll_mpf2 function was also written. The "2" in the name signifies that it takes the precision to be used for storing the temporary variables and the GSO data (and the approximate Gram matrix if fl->rt == APPROX) as an argument; in other words, mpf_init2 is used for initialising any intermediate mpf_t's. The wrapper for this is fmpz_lll_mpf, which increases the precision until the LLL is successful or, God forbid, the precision maxes out. Test code for lll_mpf was added. Also, the test code now uses the fmpz_mat versions of is_reduced and is_reduced_gram, which use exact arithmetic to check if a basis is LLL reduced and help cover edge cases.

Also, there's some good news to report. The lll_d and lll_d_heuristic functions now work on all test cases without failure for Z_BASIS as the input matrix! With GRAM, they sometimes fail due to the inadequate precision of doubles for storing large values. I can confirm that fmpz_lll_d works for the 35-dimensional lattice on the web page for the L^2 paper by Nguyen and Stehle (tested against fplll-4.0.4). I also tested fmpz_lll_mpf with the 55-dimensional lattice which makes NTL's LLL_FP (with delta=0.99) loop forever. It works! :) The output produced is the same as when the matrix is passed to fpLLL with the "-m proved" option.

This week, I look forward to writing the LLL wrapper fmpz_lll_wrapper and documenting it, besides improving the part of the module implemented so far and fixing bugs, if any. I also plan to document those functions which were left undocumented, and to shift fmpq_mat_lll to the fmpz_mat module and rename it to avoid confusion.

Monday, 9 June 2014

Week 3

Thus begins another exciting week of coding. This week I look forward to implementing arbitrary precision floating point LLL using the mpf_t data type provided by GMP (which is a dependency of FLINT). Based on the discussion on the mailing list, and as it isn't a big issue, I've decided to get the code working with mpf for now, since some helper functions are already implemented in the mpf_vec and mpf_mat modules (the other option is Arb, which will be used eventually).

Last week, I wrote the heuristic version of LLL (which always uses heuristic dot products that attempt to detect cancellation) over doubles and worked on improving the textbook L^2 and Gram matrix variants. I think the textbook L^2 implementation needs more work before it can be proclaimed ready for use. Though it works on most of the test cases, there are still some corner cases to consider. For use in fmpz_lll_check_babai_heuristic, some changes were also made to the mpf_vec, fmpz and fmpz_vec modules to facilitate conversion between fmpz and mpf and vice versa. Also, _mpf_vec_dot2 now signals possible cancellation through an integer return value.
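
To illustrate the idea behind such a cancellation-detecting dot product, here is a simplified double-precision sketch of my own (the real routine works on mpf vectors and uses a different threshold; the 1e-12 cutoff below is only for illustration):

#include <math.h>
#include <stdio.h>

/* Computes the dot product of a and b and reports whether significant
   cancellation may have occurred: if the magnitude of the result is tiny
   compared with the sum of the magnitudes of the individual terms, most
   leading bits have cancelled and the result may be inaccurate. */
int dot_heuristic(const double *a, const double *b, int n, double *res)
{
    double sum = 0.0, mag = 0.0;
    int i;
    for (i = 0; i < n; i++)
    {
        sum += a[i] * b[i];
        mag += fabs(a[i] * b[i]);
    }
    *res = sum;
    return (mag > 0.0 && fabs(sum) < 1e-12 * mag);  /* 1 = possible cancellation */
}

int main(void)
{
    double a[3] = {1e8, -1e8, 1e-8};
    double b[3] = {1.0, 1.0, 1.0};
    double d;
    int cancel = dot_heuristic(a, b, 3, &d);
    printf("dot = %g, possible cancellation = %d\n", d, cancel);
    return 0;
}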

Monday, 2 June 2014

Week 2

Two weeks have passed since GSoC started and it has been an eventful journey so far. I've implemented the fmpz_lll_d function which performs LLL over doubles on a basis matrix with integer (fmpz_t) entries using either an approximate Gram matrix (having floating point entries) or an exact Gram matrix (with multiprecision integer entries). An LLL context object has been created for supplying LLL parameters and other options to the function based on Bill sir's suggestion on the mailing list.

I wrote test code for fmpz_lll_heuristic_dot and though the tests pass, I still have to try out the different methods (including using _d_vec_dot_heuristic) to see which one is faster and gives more reliable results. A few helper functions were implemented for use in fmpz_lll_d and its test code. Two such functions are fmpz_mat_get_d_mat for converting an fmpz_mat into a d_mat and fmpz_mat_rq_d for RQ decomposition in the fmpz_mat module. However, these require including d_mat.h in fmpz_mat.h, leading to a lot of recompilation every time d_mat.h is modified. I don't know if this is an issue or not.

One thing I noticed when testing the LLL function was that when the approximate Gram matrix is used, all tests pass when using randntrulike, randajtai and randsimdioph (with the bitlengths specified in the test code, of course). However, there is a need to shift to arbitrary precision for some cases with randintrel (bits <= 200). And when the exact Gram matrix of the lattice basis is used, randntrulike sometimes fails as well. Is this to be expected, or is there an error in the implementation?

This week I plan to make changes to the code according to the mentor comments and to write the fmpz_lll_d_heuristic function, its test code and documentation.

Tuesday, 27 May 2014

GSoC :- Week 1

My proposal "Implementing the LLL algorithm in FLINT" has been accepted for Google Summer of Code 2014. The official coding period began last week (May 19) and so I have started to code. The source code can be found here.

What did you work on this week?

- Implemented the _fmpz_vec_dot function in fmpz_vec module for use in fmpz_lll_heuristic_dot.

- Implemented the fmpz_lll_heuristic_dot function for use in fmpz_lll_check_babai_heuristic_d.

- Implemented the _fmpz_vec_get_d_vec_2exp function in fmpz_vec module for use in fmpz_lll_check_babai.

- Wrote the fmpz_lll_check_babai function.

- Wrote the fmpz_lll_check_babai_heuristic_d function.

- Made changes to d_mat, d_vec modules and fmpq_mat implementation of classical LLL based on mentor comments.

What do you plan on doing next week?

- Change fmpz_lll_heuristic_dot to use the new _d_vec_dot_heuristic function and test it.

- Document the Babai functions.

- Write the fmpz_lll_d function.

- Test, profile and document fmpz_lll_d.

- Increase the pace, maybe? :)

Are you stuck on anything?

I am not stuck because of these issues, but would certainly appreciate comments to improve the code.

- How to check if a double is NaN? The IEEE standard specifies that f != f is true only if f is NaN, so I use this as the check. However, compilers may optimize this out, so it requires checking with different compilers. The condition cannot be changed to use the isnan function from math.h, as that was introduced in C99.

- Will +-0 cause a problem? -0.0 == 0.0 returns true, so it seems to be okay.

- Hackish way of finding ulps in _d_vec_dot_heuristic. Is there a better way that doesn't use functions like nextafter, which is C99-only?

- y = rint(x); is not ANSI C. I used if(x < 0) y = ceil(x-0.5); else y = floor(x+0.5); instead (see the sketch at the end of this post). Here's the output on my machine with the default rounding mode:
rint(5.5) = 6.0,  rint(-1.5) = -2.0
ceil(5.5-0.5) = 5.0,  floor(5.5+0.5) = 6.0,  ceil(-1.5-0.5) = -2.0,  floor(-1.5+0.5) = -1.0
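
For completeness, here is a small self-contained sketch of the two portability workarounds above (the NaN self-comparison and the rint emulation). This is my own illustration, not the FLINT code; note that the emulation rounds ties away from zero, whereas rint rounds ties to even in the default mode (they agree on the values shown above, but differ e.g. at 0.5):

#include <math.h>
#include <stdio.h>

/* Portable NaN test: under IEEE 754 a NaN compares unequal to itself.
   (Aggressive optimisations such as -ffast-math can break this.) */
static int is_nan(double f)
{
    return f != f;
}

/* ANSI C emulation of rounding to the nearest integer, ties away from zero. */
static double round_emulated(double x)
{
    return (x < 0) ? ceil(x - 0.5) : floor(x + 0.5);
}

int main(void)
{
    double zero = 0.0;
    printf("is_nan(0/0) = %d\n", is_nan(zero / zero));
    printf("round_emulated(5.5)  = %.1f\n", round_emulated(5.5));
    printf("round_emulated(-1.5) = %.1f\n", round_emulated(-1.5));
    return 0;
}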