NEWS FLASH: New toolbox version (0.4b) released November 2020! Get it on the Download page.
Welcome, new Internet friend. We're glad that you have found your way to the homepage of our little toolbox. We call it the "Deep Learning In Neuroimaging: Exploration, Analysis, Tools, and Education" package, or DeLINEATE for short. The intent is to provide a set of tools that make it easier to use "deep" neural networks for data analysis in research -- although we also have support for PyMVPA, in order to facilitate more conventional multivariate pattern analyses (MVPA) and make it easier to compare "deep learning" approaches to conventional MVPA. And the primary intended use case is for analysis of neuroimaging datasets (e.g., fMRI, EEG, MEG), although there's nothing to stop people from using it for all kinds of other classification tasks and data types.
This is a Python toolbox. This decision is mainly based on the fact that our primary backend for deep learning, Keras, is a Python toolbox. Python is also the language of choice for PyMVPA and a bunch of other neat open-source data analysis stuff. So it makes sense, even though some of us (e.g. the guy writing this paragraph) have not fully drunk the Python Kool-Aid and still believe deep down that C is the One True Language. But we digress.
Anyway, if we want to stay up to date with developments in the deep learning field, Python is probably the place to be. You won't necessarily need to know how to write Python code to use our toolbox -- it does not require any programming per se to configure and run an analysis job -- but you will need to be able to run stuff in a Python environment. On macOS and Linux, you might be able to get by with the system built-in Python, although certain purists might yell at you for that. On Windows, you're going to have to install Python manually no matter what.
If you're not a Python expert who has this all figured out already, you may want to simply install Anaconda, regardless of your OS. One big advantage of Anaconda is that it includes most of the common libraries, like NumPy and SciPy, that machine learning toolboxes typically require.
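Not sure what your current environment already has? A quick check along these lines (generic Python, nothing DeLINEATE-specific) will tell you which Python you're running and whether the core numeric libraries are in place:

    # Quick environment check: which Python is this, and are NumPy/SciPy installed?
    import sys
    import numpy
    import scipy

    print("Python:", sys.version.split()[0])
    print("NumPy:", numpy.__version__)
    print("SciPy:", scipy.__version__)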
For more info, read on to the "Design Philosophy" and "Getting Started" sections below, and/or see the documentation. The major documentation for the toolbox is replicated both in the repository and on the website here in the Documentation section. However, there is also a little additional information in the README.md file of the repository, so you might want to download the release version of the toolbox and check out the README there. Or, if you don't want to commit to an actual download, just click through to the Bitbucket repository from the Download page and the contents of the README.md file should be what you see on the overview page.
Like all great open-source projects, this one is a continual work in progress. (Insert 90s-era construction-worker GIF here.) We try to keep our documentation more complete and up-to-date than many such projects, but with the rate at which machine-learning tools are evolving these days, doing so perfectly is close to impossible. So you may find some gaps here and there.
If you just want to jump right in and figure it out, head on over to the Download and/or Contact/Contribute pages linked at the top, where you can find a release version to download and a link to the code repository on Bitbucket. Actually using the toolbox might take a little help from the dev team, especially while the user-base is still relatively limited, so please feel free to get in touch and we can give you a hand. Your questions will also help inform which parts of the docs and website need fleshing out most, and which features we should focus on adding next.
Good question. There are many different answers of varying quality and sincerity. Probably the best reason is that most people don't have any idea what it is but they know all the big cool tech companies are doing it, so you can get suckers to mentally file you next to Google. This makes them more likely to throw money at your project. It also gives you a license to say lots of silly and/or intentionally misleading things about artificial intelligence, if you're into that. We think this should be enough to satisfy most people.
However, we are not most people.
So if you are one of those poor souls with a sincere desire to extract signal from the sad and disgusting hairball that is neuroimaging data, we might also be able to help you out a little. Detailed explanation of the approach's virtues relative to the forms of null-hypothesis significance testing commonly encountered in neuroscience would require us to first provide you with a correct understanding of null-hypothesis significance testing, which sounds tedious and mostly pointless. The super short version is that consistently classifying your cross-validated test set data with better than chance accuracy is a substantially more powerful and less error-prone way to answer questions that are morally identical to those addressed by your t-tests and ANOVAs. If that's what you're doing now, deep learning with cross-validation represents a strict upgrade for you.
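To make that concrete, here's a rough sketch of the kind of question you'd be asking -- is cross-validated test accuracy better than chance? -- using SciPy's binomial distribution. The numbers are made up for illustration, and this snippet isn't part of the toolbox; it just assumes you've already tallied up predictions on your held-out test items:

    # Illustrative only: is cross-validated test-set accuracy above chance?
    from scipy.stats import binom

    n_test_items = 200   # total held-out items across all cross-validation folds
    n_correct = 124      # how many of them the classifier got right
    chance_level = 0.5   # e.g., two equally likely classes

    # One-sided p-value: probability of doing this well (or better) by luck alone.
    p_value = binom.sf(n_correct - 1, n_test_items, chance_level)
    print("accuracy = %.3f, p = %.4f" % (n_correct / float(n_test_items), p_value))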
If you are doing something substantially more sensible/Bayesian, your case is rare enough not to be covered by our landing page, and you should instead develop your own ideas and/or chat with us about how the techniques might be useful to you. The answer is likely to be exploratory in nature.
If you don't fall into any of these camps or you are still not convinced, we would direct you to any one of several recent opinion pieces by more prominent neuroscientists talking about how promising and important the techniques are. We're pretty sure they're right, right? Prognosticators about technology have never, to our knowledge, been wrong before. And if our study of science fiction tells us anything, teaching machines to learn is always a net good for humanity and never, ever leads to them escaping their shackles and eating us for fuel.
Anyway, if the extremely compelling arguments above aren't good enough for you, let us appeal to your sense of pragmatism. If you have between $1000 and $2000 (or so) to shell out for a PC with a semi-decent NVIDIA GPU, and you are currently doing some kind of CPU-bound MVPA that you wish could run faster, and you don't particularly care about which specific classification algorithm you use... you could probably get at least as good performance out of a deep-learning classifier and run it anywhere from maybe 20-100x faster, depending on specific hardware and implementation details. We have heard the sages say that time is money, so if you want to spend a little bit of money and save a good chunk of time -- step right this way. And, conveniently, a good deep-learning PC shares many features with a gaming PC, so getting into deep learning will also allow you to fuel your Cyberpunk 2077 habit and waste all that time you saved by classifying data faster, all on your institution's or funding agency's dime! (Note: For legal reasons, we must state that the creators of the DeLINEATE toolbox do not actually condone using your grant-funded deep learning computers to play video games, even though it is quite feasible and it would be very difficult for anyone to actually catch you at it.)
There are several principles that have guided our work thus far on the DeLINEATE toolbox. You may not agree with our choices in all cases, but in those cases, you are probably wrong.
Simplicity. We have tried to keep our code as simple as possible. This doesn't just mean fewer lines of code, although that's part of it. We have also tried to write code that is straightforward to read and figure out (with minimal syntactic weirdness). This sometimes means we end up writing MORE lines of code than if each line were a snarl of crazy tricks, but we would rather write something that someone with only medium familiarity with the Python language can easily understand. (It helps that most of us also have only medium familiarity with the Python language, so oftentimes we don't know enough l33t h4x0r tricks to write anything truly horrifying-looking.)
We also strive for simplicity in terms of the level of abstraction we target. We have opted to provide a thin layer of abstraction over our Keras and PyMVPA backends. Basically, we wanted SOMETHING to make it easy to batch up analyses without having to re-write a bunch of lines of very similar Python code each time, but we didn't want to create ANOTHER weird, sprawling hierarchy of objects and classes that people would have to learn -- what we mainly wanted to do was try to CONTAIN the weird, sprawling hierarchies of our backends into a manageable format. So, the result is that we have generalized the things we can generalize -- for example, the possible schemes for splitting your data into training, validation, and/or test subsets are pretty similar regardless of what kind of analysis you're doing, so we provide one-size-fits-all functions for that stuff. But for the implementation details of a particular analysis -- like a support vector machine (SVM) in PyMVPA or a convolutional neural network (CNN) in Keras -- we mostly just take in the parameters you'd normally give to PyMVPA or Keras and pass them straight along. The main difference is that you don't have to write any actual code to use our stuff, and sometimes we figure out some of the reasonable default parameters for you to streamline the process.
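For a sense of what "passing them straight along" means in the Keras case, here's a bare-bones snippet written directly against Keras (assuming a TensorFlow-backed install) showing the sort of layer and compile parameters the backend ultimately receives. The layer sizes and input shape are arbitrary placeholders, and this is not our job-file syntax -- see the Documentation for that:

    # Illustrative only: the kind of Keras model/compile parameters that a job
    # ultimately hands to the Keras backend. Shapes and sizes are placeholders.
    from tensorflow.keras import layers, models

    model = models.Sequential([
        layers.Conv1D(16, kernel_size=5, activation='relu', input_shape=(300, 64)),
        layers.MaxPooling1D(2),
        layers.Flatten(),
        layers.Dense(32, activation='relu'),
        layers.Dense(2, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    # model.fit(train_data, train_labels, validation_data=(val_data, val_labels))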
Simplicity, to us, also means minimizing dependencies. It is very fashionable these days to just import code packages from all over the place so that you (the hypothetical programmer) have to write as little code as possible to get the job done. The problem with this approach is that it makes it very hard to write code that is generalizable and maintainable. Maybe some of the packages you used don't run well on another operating system that you never bothered to test. Maybe the user has a version of one of your dependencies that is too old or too new. There is a tendency in open-source software to pass the buck when something breaks -- if it's not MY code that is breaking on your system, but a bug in one of the packages I rely on, then it's not MY fault, is it? Except it is, partially, because I chose to write code that relied on something that turned out to be unreliable.
Now, you have to have dependencies somewhere. You can't exactly invent your own custom operating system and programming language for every project (believe us, we've looked into it). But you can minimize the NUMBER of dependencies and write code that assumes as little as possible about what exact version of things the end user has.
Transparency, ease-of-use. We group these together because sometimes they go along with each other and other times they conflict. When they conflict, transparency tends to win around here. Meaning that: When you design an analysis with DeLINEATE, we want to be sure that you are doing what you think you're doing. We want our file formats to be easy for humans to read, and if you know enough Python to look through our code, we want the code itself to be easy to take in at a glance.
With that said -- a lot of the analyses you might do with this toolbox are complex. We don't want to make things TOO easy on you by oversimplifying the situation, because then folks might assume something works one way and be very sad when they find out that it actually works another way. So we don't tend to hide a lot of parameters or provide super-lazy default options in an attempt to sweep all the complexity under the rug -- we'd rather put all our cards on the table. But we will try our best to make the writing on those cards nice and big and easy to read, so at least we aren't ADDING any more complexity or confusion than we absolutely have to.
Flexibility. This is a virtue unto itself, but it partly arises from our emphasis on simplicity. Specifically: The simplicity of our architecture makes it pretty straightforward to use this toolbox in two ways. Beginners can use it by configuring some relatively easy-to-read, text-based job files that don't require any Python coding; however, folks with more specialized needs can just use the toolbox as a code library and write their own Python to do all kinds of crazy things that are not currently possible with the basic job file format.
Throughout the toolbox, we have tried to keep the code pretty modular so that it is easy to write your own plugin-style functions for things like alternate data formats or different cross-validation schemes. Those more advanced abilities do require you to write your own Python code, but not very much of it. And, of course, we would be happy to accept any custom functions people write into the permanent code base, if you write anything that you think would be helpful to the more general community; check out the Contact/Contribute page for details.
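To give a flavor of what one of those small plugin-style functions might look like -- this is a generic sketch, not the toolbox's actual plugin signature (see the docs for that) -- here's a leave-one-group-out splitter in a few lines of NumPy:

    # Generic sketch of a custom cross-validation splitter; the exact function
    # signature the toolbox expects may differ from this -- check the docs.
    import numpy as np

    def leave_one_group_out(group_labels):
        """Yield (train_indices, test_indices), holding out one group
        (e.g., one subject or one scanner run) at a time."""
        group_labels = np.asarray(group_labels)
        for group in np.unique(group_labels):
            test_mask = (group_labels == group)
            yield np.where(~test_mask)[0], np.where(test_mask)[0]

    # Example: six trials drawn from runs 1, 1, 2, 2, 3, 3
    for train_idx, test_idx in leave_one_group_out([1, 1, 2, 2, 3, 3]):
        print("train:", train_idx, "test:", test_idx)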
For full details: See the README.md file in the code repository and the Documentation page on this site. (We'll note again that these things are fairly "complete" in some sense of the word, and to our knowledge up-to-date, but they will continually be works in progress. So if you want to get started and there isn't enough documentation to get you where you need to be, get in touch with us -- we'll be happy to help out, as long as you promise to pay it forward by contributing what you can to the documentation effort.)
Minimum requirements: For bare minimum operation, all you should need is a computer that has some semi-recent version of Python on it (Python 2.7+ or any flavor of Python 3), plus the administrative privileges to install toolboxes and such as needed. Any major OS (Linux, Windows, Mac) should work. If all you are doing is using the PyMVPA backend to make PyMVPA analyses easier, read no further. However, if you actually want to do deep learning analyses in any kind of reasonable time frame:
Minimum requirements for deep learning if you are sane: Add to the above: Some kind of CUDA-capable NVIDIA-brand GPU (including the many manufacturers of NVIDIA-designed GPUs -- MSI, Gigabyte, EVGA, etc., are all fine). Most recent models should work; for good performance, you probably want something within the last couple of generations, along the lines of a GeForce GTX 1060 or better. All analyses can run in CPU-based mode, but deep-learning analyses will be substantially sped up (on the order of 20x or more, depending on many small details) if you run them on a GPU. Unfortunately this rules out most recent Macs. Windows and Linux should both work. Of course we recommend Linux because *nix OSes are better in general for almost all types of scientific computing -- but if you MUST use Windows, we can make that work.
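If you want to confirm that the deep learning backend can actually see your GPU, a quick check like this (assuming a TensorFlow 2.x install underneath Keras; other setups will differ) should tell you:

    # Quick check that a CUDA-capable GPU is visible to the backend.
    # Assumes TensorFlow 2.x under Keras; adjust for your own setup.
    import tensorflow as tf

    gpus = tf.config.list_physical_devices('GPU')
    if gpus:
        print("Found GPU(s):", [gpu.name for gpu in gpus])
    else:
        print("No GPU found -- deep learning will fall back to (much slower) CPU mode.")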
Next steps: If you have your hardware, and you have Python on your computer, the next steps will basically be to:
1. Install the backend packages you want to use: Keras (and whatever Keras itself needs underneath, typically TensorFlow) for deep learning, and/or PyMVPA for conventional MVPA -- plus the appropriate NVIDIA CUDA drivers and libraries if you're going the GPU route.
2. Download the toolbox itself from the Download page, or clone the Bitbucket repository.
3. Set up a job file describing your data and analysis, and run it. (A quick way to sanity-check that the backends are importable is sketched below.)
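Here's that promised sanity check -- a minimal, generic smoke test (package names assume the usual distributions: tensorflow.keras for Keras, mvpa2 for PyMVPA) to confirm that the backends you care about are importable:

    # Minimal smoke test: can we import the backends the toolbox relies on?
    try:
        from tensorflow import keras
        print("Keras backend OK (version %s)" % keras.__version__)
    except ImportError:
        print("Keras/TensorFlow not found -- needed for deep learning analyses.")

    try:
        import mvpa2
        print("PyMVPA backend OK")
    except ImportError:
        print("PyMVPA not found -- only needed for conventional MVPA analyses.")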
Fair warning: Each of the steps above can be a little complicated / annoying, as anyone who has ever had to configure any kind of scientific computing environment can probably surmise. We have tried to make our toolbox the LEAST annoying step but there is still a lot of nonsense that goes along with this stuff. It's not too bad once you figure it all out the first time, but that first time can be a bit of a doozy. We are happy to help (again, with the request that in return, you contribute back some info to our documentation to help it grow and improve). So, see how far you can get with our documentation, and get in touch with us (via the Contact page) for more details.
We don't need... roads. But we do have a road map, regardless! There are NUMEROUS features we are working on adding to the toolbox in the near future. This ranges from little stuff (e.g., functions for loading alternate data types, different cross-validation schemes) to big architectural changes (e.g., adding entirely new backends to supplement the current Keras/PyMVPA options). If you want to see our current wish list, check out the Issues page of our code repository (which is linked from the Download page). Feel free to make your own suggestions (either on the repository, or via contacting us) -- we are certainly open to user feedback with regard to which issues get tackled first!