#############
Go Apropos
#############
HanishKVC, 2022


Overview
#########

General
========

go doc requires one to already know which standard package to look in, to get
info about the same. However if one doesn't know which package may contain what
they want, they will have to grep the src directory or do a web search.

This provides something like the apropos command wrt man pages, but here it
searches for matching symbols/comments/package names from among the packages in
the go source directory.

NOTE: Symbol refers to a const or var or type or func name.

As I wanted to look at Go lang a bit, I found this need and thus this. However
I haven't really looked at Go or read through the Go documentation, so this code
could be far away from the conventions and concepts in Go land. This is based
on some quick random scanning of docs and go src, followed by compilation
errors and potentially flawed logical guess work. At the same time, it should
do the job and be useful when exploring Go lang.


AutoCache Mode
===============

To speed up normal use of the program, by default it works in auto cache mode,
where it uses a previously created cache containing meta data wrt symbols /
pkg names / comments, when searching for these. If the cache file is missing
and or appears out of sync with the go source directory, then the cache file
will be freshly created.

However one can force the program into a non cache mode, if required, by
passing the following args

    goapropos --autocache=false searchtoken

    goapropos --autocache=false --usecache=false searchtoken

in which case it will parse through the system / specified go source dir.

NOTE: Currently this logic uses the modtime of the go source directory as the
version check. So if a new go source package install / update or go lang suite
install / update occurs, this modtime should almost always change and in turn
the autocache logic should trigger a cache update.
However if someone changes the contents of only some subdirectories within the
go src dir, then the src dir's modtime may not change. In order to update the
cache in such a situation, one will have to manually either touch the go src
dir's modtime or run

    goapropos --autocache=false --createcache

NOTE: In autocache mode the createcache and usecache flags will be manipulated
by the autocache logic, overriding any user commandline setup wrt the same.


License
=========

GPL, BSD

Thank you all for all the fish :)


Usage
######

Normal Use
============

NOTE: Remember to pass the named arguments/flags before any unnamed args to
the program.

One can specify the token/substring to match / search for wrt symbols in
packages by using either the cmdline arg --find or by just specifying it on the
cmdline ie.

    goapropos --find search_token

    OR

    goapropos search_token

By default it tries to find matching exported symbols only from the available
packages. However if one wants the logic to use both internal and exported
symbols of the packages when trying to find a match, one needs to specify the
cmdline argument --allsymbols. This needs to be used either when creating a
cache of the symbols and or when doing a noncache based search ie

    goapropos --autocache=false --createcache --allsymbols

    goapropos --autocache=false --allsymbols search_token

By default it tries to search through all the packages in the identified go
source directory. However if required one can filter the packages that will be
searched by using the --findpkg argument. The token given through the findpkg
argument will be used to filter the package names (including ~import path
prefix~ potentially) for a match.

    goapropos --findpkg packagename_token search_token

If one wants to get a list of package names which match a given token, one
can run goapropos with only the findpkg argument and no find argument.

    goapropos --findpkg packagename_search_token

NOTE: Using --findpkg also prints the filenames corresponding to the matching
packages.

To get all the exported symbols of all the packages, use

    goapropos --find ""

To get all the exported symbols of a specific package, use

    goapropos --findpkg "packagename$" --find ""

If one wants to find symbols based on their comments, if any, then they can
use --findcmt to specify a match token wrt comments. Any symbols whose
comments match the specified token will be shown to the user.

NOTE: If the comment being searched for is found at a generic file level,
rather than wrt a specific symbol within it, then the package name of the file
is shown.

    goapropos --findcmt cmt_search_token
        ex: goapropos --findcmt "device"

The tokens specified are used to match package names or the symbols or their
comments, as the case may be, by using a case insensitive search, by default.
If one wants to use case sensitive matching, pass --casesensitive.

By default the search token provided (be it wrt package names or symbols or
comments) is treated as a regular expression. However if required one can
change from regular expression to a simple if-string-contains-substring logic
by specifying --matchmode contains

    goapropos search_token_regexp
        ex: goapropos fmt

    goapropos --matchmode contains search_token_substring
        ex: goapropos --matchmode contains fmt

    goapropos --matchmode regexp search_token_regexp
        ex: goapropos --matchmode regexp "fm.*t"
        ex: goapropos --matchmode regexp "fm+t"

NOTE: by default the search tokens as well as the pkg names/symbols/comments
will be converted to upper case if casesensitive search is disabled (which is
the default). Any implications of this wrt regexp need to be kept in mind.
Enabling casesensitive searching will disable this automatic upper case
conversion and in turn will leave the search token, as well as the pkg names
and symbols and comments, as they are.
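
The two match modes and the upper case conversion can be sketched as below.
This is only an illustration under assumptions: the names Matcher, NewMatcher,
containsMatcher and regexpMatcher are hypothetical and need not correspond to
what the goapropos source uses. It also shows one implication of upper casing
a regexp: the class ``\d`` (digits) turns into ``\D`` (non digits).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Matcher is a hypothetical interface hiding the match mode from the caller.
type Matcher interface {
	Matches(text string) bool
}

// containsMatcher does a plain substring check.
type containsMatcher struct {
	token         string
	caseSensitive bool
}

func (m containsMatcher) Matches(text string) bool {
	if !m.caseSensitive {
		text = strings.ToUpper(text)
	}
	return strings.Contains(text, m.token)
}

// regexpMatcher reuses a regexp that is compiled once.
type regexpMatcher struct {
	re            *regexp.Regexp
	caseSensitive bool
}

func (m *regexpMatcher) Matches(text string) bool {
	if !m.caseSensitive {
		text = strings.ToUpper(text)
	}
	return m.re.MatchString(text)
}

// NewMatcher builds a matcher for mode "contains" or "regexp". When
// caseSensitive is false the token is upper cased, regexp syntax included.
func NewMatcher(mode, token string, caseSensitive bool) (Matcher, error) {
	if !caseSensitive {
		token = strings.ToUpper(token)
	}
	switch mode {
	case "contains":
		return containsMatcher{token: token, caseSensitive: caseSensitive}, nil
	case "regexp":
		re, err := regexp.Compile(token)
		if err != nil {
			return nil, err
		}
		return &regexpMatcher{re: re, caseSensitive: caseSensitive}, nil
	}
	return nil, fmt.Errorf("unknown match mode %q", mode)
}

func main() {
	m, _ := NewMatcher("regexp", "fm.*t", false)
	fmt.Println(m.Matches("fmt")) // true: "FM.*T" matches "FMT"

	// Upper casing turns `\d+` into `\D+`, which no longer matches digits.
	m, _ = NewMatcher("regexp", `\d+`, false)
	fmt.Println(m.Matches("123")) // false

	m, _ = NewMatcher("regexp", `\d+`, true)
	fmt.Println(m.Matches("123")) // true
}
```

So a token relying on escape classes such as ``\d`` or ``\w`` is best used
together with --casesensitive, assuming the program upper cases the token in
the way sketched here.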

Go Source directory
======================

When looking for the go source directory, by default it uses the src directory
under GOROOT (or else under a directory within /usr/share or /usr/local/share
or /usr/lib whose name matches the pattern "go-*" or "golang*") as the go
source directory wrt packages to search and or as the source for the meta data
saved into the cache.

If required one can explicitly set the go source directory by using the
cmdline arg --basepath

    goapropos --basepath <base_path> search_token

NOTE: The logic doesn't follow symbolic links under the go source directory.

NOTE: If the go source directory auto identified by goapropos is wrong, and
one is required to set a new basepath, then the basepath argument needs to be
passed always, so that autocaching remains in sync.


Skipping files
================

One can skip files matching certain predefined substrings in their name or
path by using --skipfiles. One can specify multiple matching tokens to filter
out source files from different-paths/... by using skipfiles multiple times.

    --skipfiles "substring" (skip files containing substring in their path)

    goapropos --skipfiles "/src/cmd/" --skipfiles "/src/internal/" findme

If autocache true
--------------------

As the program may autoupdate the cache at any time, if one needs to always
skip certain go source files, then one is required to always pass the
skipfiles argument.

However if one wants the cache to contain these files, but at the same time
wants to temporarily skip certain paths / files and search, then one will have
to request the program to avoid using the cache and then search ie

    goapropos --autocache=false --skipfiles "path/to/skip" search_token

NOTE: disabling autocache above is critical, because otherwise, if for any
reason the autocache logic decides to update the cache while one has passed
the skipfiles argument, then the cache will no longer contain data wrt these
skipped files.

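
The repeatable --skipfiles argument and the substring based skipping described
in this section could be sketched with the standard flag package as below. The
names stringList, shouldSkip and parseArgs, as well as the sample paths, are
hypothetical and only for illustration. The sketch also shows why named
arguments have to come before unnamed ones: the flag package stops parsing at
the first non flag argument.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// stringList is a hypothetical flag.Value which collects every occurrence
// of a repeated flag.
type stringList []string

func (l *stringList) String() string { return strings.Join(*l, ",") }

func (l *stringList) Set(v string) error {
	*l = append(*l, v)
	return nil
}

// shouldSkip reports whether path contains any of the skip substrings.
func shouldSkip(path string, skips []string) bool {
	for _, s := range skips {
		if strings.Contains(path, s) {
			return true
		}
	}
	return false
}

// parseArgs returns the collected skipfiles tokens and the remaining
// unnamed arguments.
func parseArgs(args []string) ([]string, []string, error) {
	fs := flag.NewFlagSet("goapropos-sketch", flag.ContinueOnError)
	var skips stringList
	fs.Var(&skips, "skipfiles", "skip files whose path contains this substring (repeatable)")
	if err := fs.Parse(args); err != nil {
		return nil, nil, err
	}
	return skips, fs.Args(), nil
}

func main() {
	skips, rest, err := parseArgs([]string{
		"--skipfiles", "/src/cmd/", "--skipfiles", "/src/internal/", "findme",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("search token:", rest)

	// Sample paths, made up for this sketch.
	for _, p := range []string{
		"/usr/lib/go/src/fmt/print.go",
		"/usr/lib/go/src/cmd/go/main.go",
	} {
		fmt.Printf("%-34s skip=%v\n", p, shouldSkip(p, skips))
	}
}
```
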
NOTE: In some cases explicitly disabling autocache mode and in turn explicitly
creating a new cache with the unwanted files skipped may allow one to avoid
the need to pass skipfiles each time. However this is not permanent and will
get overridden once the program autoupdates the cache file.

    goapropos --autocache=false --skipfiles "path/to/skip" --createcache


Cache management and use
===========================

To avoid having to parse the go source files each time the program is called,
it supports the creation and in turn use of a cache with the required data.

In autocache mode, which is the default, this cache is managed automatically.

However if for some reason one wants to manually control caching, then one
will have to instruct the program to stop autocache management and in turn
pass additional cache related arguments.

Use the flag --createcache to create / update the cache of data wrt package
symbols, paths and comments.

    goapropos --autocache=false --createcache

Use the flag --usecache to use a previously created cache rather than freshly
parsing through the go source files.

    goapropos --autocache=false --usecache search_token

NOTE: One needs to first create the cache before trying to use it. Whenever
the go language package is updated, one will have to recreate the cache file
to match the same.


TODO
######

DONE: An optional simple regular expression based token matching option has
been added.

DONE: Allow searching through package / identifier comments, if possible.
Have added logic to extract comments at a basic level.

TODO: Comments at the block level wrt const or var containing multiple
definitions need to be accounted for.

Maybe simplify by using ParseDir on dirs, with no need to look at the src
files individually and separately. However this may not skip the test go
source files in them, parsing of which can be avoided by walking through the
files and calling ParseFile on them, like the current flow. Think of this
later.

DONE: Maybe add support to cache the package identifiers/paths/comments map.
This will in turn require a cmdline argument to force rebuilding of this
cache, when required.

DONE: Compile regexp match tokens, later.

Maybe later add support for searching multiple different comment tokens and
or symbol tokens. Currently one can search for either a single symbol or a
single comment or one symbolORcomment match token pair together.

DONE: Build separate maps wrt each go routine, and then merge them at the end.
This should allow the go routines to run without blocking when trying to
update the map / db, unlike today, when they block as they need to synchronise
when trying to update a single map / db.
[DONE-NOTE: didn't gain much, if any, performance, because the availability of
multiple go routines working on multiple files in parallel seems to keep this
contention from becoming the hot path.]

DONE: Maintain Packages with path info wrt base dir for the package.


Note
######

AST and Parsing
=================

From an initial quick glance at the golang source, found go/ast and its
Inspect function. In turn, to feed Inspect, found parser.ParseFile to parse go
source files. However on using them found that no package ast node or comment
related nodes (comment/commentgroup) were getting found at any level, by
looking at the call back function of Inspect. Then there was also that mode
argument to ParseFile which I had not yet looked at.

From another quick glance at the source files in go/ast, go/doc and go/parser,
as also looking at go doc parser, I can see a parser.ParseDir, which seems to
return package nodes (as a given source directory could have multiple pkgs).
Also found bits about the Mode type and in turn ParseComments.
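
A small sketch of this flow is given below: parser.ParseFile is called with
the parser.ParseComments mode and ast.Inspect then visits the declarations.
The helper name declaredSymbols and the sample source are made up for this
illustration and are not taken from goapropos. Note that for an ungrouped
const / var / type, go/parser attaches the doc comment to the GenDecl rather
than to the spec inside it, so the sketch falls back to the GenDecl comment.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
	"strings"
)

// src is a made up go source file used only for this sketch.
const src = `// Package demo is a tiny example.
package demo

// Answer is the answer.
const Answer = 42

// Greet returns a greeting.
func Greet() string { return "hi" }
`

// declaredSymbols returns the package name and a map from each symbol
// declared at the top level of the file to its doc comment.
func declaredSymbols(filename, source string) (string, map[string]string, error) {
	fset := token.NewFileSet()
	// Without parser.ParseComments the Doc fields below stay empty.
	f, err := parser.ParseFile(fset, filename, source, parser.ParseComments)
	if err != nil {
		return "", nil, err
	}
	syms := map[string]string{}
	ast.Inspect(f, func(n ast.Node) bool {
		switch d := n.(type) {
		case *ast.FuncDecl:
			syms[d.Name.Name] = strings.TrimSpace(d.Doc.Text())
			return false // do not descend into the function body
		case *ast.GenDecl:
			for _, spec := range d.Specs {
				switch s := spec.(type) {
				case *ast.TypeSpec:
					syms[s.Name.Name] = docText(s.Doc, d.Doc)
				case *ast.ValueSpec:
					for _, name := range s.Names {
						syms[name.Name] = docText(s.Doc, d.Doc)
					}
				}
			}
			return false
		}
		return true
	})
	return f.Name.Name, syms, nil
}

// docText prefers the comment on the spec and falls back to the comment on
// the enclosing declaration. CommentGroup.Text is safe on a nil receiver.
func docText(specDoc, declDoc *ast.CommentGroup) string {
	if specDoc != nil {
		return strings.TrimSpace(specDoc.Text())
	}
	return strings.TrimSpace(declDoc.Text())
}

func main() {
	pkg, syms, err := declaredSymbols("demo.go", src)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	names := make([]string, 0, len(syms))
	for name := range syms {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s.%s: %s\n", pkg, name, syms[name])
	}
}
```
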

By using go/ast and in turn the nodes that it extracts while inspecting the go
source files, realised that the ast.Ident node is triggered for both own as
well as others' symbols found in a go source file (because the go source file
refers to symbols from the packages it uses), while the GenDecl/ValueSpec/
TypeSpec/FuncDecl nodes are triggered only for own symbols, which the go
source file being inspected defines.


String Matches
===============

For simple string matching based goapropos/find searches, the strings.Contains
version was found to be twice as fast as the uncompiled-regexp version.
However using a compiled re and in turn reusing the re, which is possible in
this program's flow, makes the re version as fast as the strings.Contains
version.

The matcher interface added to support both strings.Contains and
re.MatchString in a seamless manner adds a tiny pico bit of overhead compared
to direct use of strings.Contains.


Duplicate Symbol Names
========================

Potentially the same symbol name could represent different things within a
package. For example a type and a method (associated with the same or a
different type) could have the same symbol name.

For now the type and comments wrt such a duplicated symbol name are collated
together into the same entry in the database. The type tag within the type
field is not duplicated. So when printing such a symbol, it may show more than
one type tag, and its comment will be collated from across all the duplicates.

May handle this situation differently in future.


Changelog
############

Rather major changes

20220624
=========

fullcomments flag and logic to print comments wrt the symbols in an indented
manner.

Add trimmed comments wrt symbols into the db and in turn the cache. So also
update the cache version, so that any existing old caches will be overwritten
with a newer cache containing these trimmed comments.


20220622+
==========

Add basepath wrt package names, to help differentiate between different
packages having the same package name.

Try to keep the logic os agnostic, be it wrt path seps, as well as how the
pkg's basepath is stored and shown (to always include '/').

Print part of the comments wrt symbols.

Handle the situation of duplicate symbol names in a package in a crude way for
now.

Add a Program tag name and a cache file format version to the cache version
file. The version number ensures that if there is a change to the logic which
changes the cache file format, then the program can automatically recreate the
cache file.

Print pkgname on the same line as pkg file paths and pkg symbols, along with
spacing around, to hopefully make it cleaner and easier to read.

Make sorted and non sorted result prints match and be the cleaner version.


20220620+
==========

Cleanup compare a bit, built around structs embedding MatcherConfig and
MatcherType specific members.

Less dependency on globals and more on passed around args wrt Matcher and db.

The hunt for go src files starts with GOROOT.

Switch to regexp as the default matchmode.


20220617+
==========

Cleanup db related types and print and find.

Add simple test and benchmark helper functions.

Update printing of find related results to be simple, clean and tagged wrt
what it represents.

Add sample Test and Benchmark functions.

Add a Matcher interface with implementations for strings-contains and
re-compiled.

Compile the re in regexp match mode; this makes the re version match the speed
of the simple contains version.

Make the re matcher work with a ptr to the underlying regexp.Regexp, rather
than with a normal value of the Regexp, when using matcher methods. This
squeezes out a few msecs. More importantly, however, it helped realise that
matching a type to its supported interfaces during assignment doesn't do the
equivalent of the auto-adjust that method calling does between a type and its
pointer.


20220616
==========

Auto manage cache by default.

Recover from an insane (minimal check currently) / unavailable db.


20220615+
===========

Sorted find related results at end or as they are found.

Add type info (ie Const/Var or Type or Func) wrt each Symbol.

Differentiate within ValueSpec (ie Const or Var).


20220614+
==========

Use go routines to see how things go. Here the walking of the dir is made
parallel to the handling of the file. Logically this shouldn't and doesn't
change performance much, rightly so. Rather the overhead of the go routines,
if any, makes the overall logic a bit slower compared to the NoGoRoutines
version.

Wrt each symbol store just the comments related string, thus simplifying the
flow. This should simplify the json loading and in turn seems to speed up
usecache based search by around 20-25%.

Add support for Multiple Go Routines wrt file handling, so that even when
there is contention, like trying to update the shared global db, or an io
delay, like reading a file, there is some other go routine to make use of the
available cpu/processing resources. This version is about 25% faster than the
No Go routines version.

Use independent maps wrt each go routine, so that there is no contention for a
shared global database while they are running. Then at the end build a merged
global database. The multiple go routines seem to be hiding any contention wrt
the shared global db, as had been hoped, so this version didn't change
performance much. This also indicates that there is enough io bandwidth to
spare on the test machine, and potentially in general on other machines also,
to allow the parallel go routines to munch on additional files.

Make both the raw source file parsing and the cache based paths handle find++
queries in equivalent ways.


20220612
==========

Cache and use the Maps/Database created wrt Pkg symbols, paths and comments.
User needs to pass named arguments to enable the creation as well as the use
of the cache.


20220610
==========

Avoid populating the apropos's pkg symbols database with symbols from other
pkgs used by a given go source file, ie avoid using the ast.Ident node.

A bit more flexible find go source directory logic.

Link the comment at the block level wrt consts or vars or types to the members
of the block, so that a search for any part of such a comment will list all
the members of that block.

If a comment level search matches any of the generic level comments in any of
the files belonging to a package, instead of the comment specific to a symbol,
then the package name will be shown.


20220608
==========

An almost basic level of go apropos logic has been implemented.